Enforced data size in channels 2.1.7 prevents large file uploads #1240

Open
EliotBerriot opened this Issue Feb 6, 2019 · 16 comments

@EliotBerriot commented Feb 6, 2019

Channels 2.1.7 was released a few days ago (kudos and thank you for the maintenance effort!) and we noticed a breaking change in our application.

Large file uploads that were working before are now failing, and we understand it's related to this specific commit a1ecd5e by @Zarathustra2.

I'm not questioning the rationale behind that commit: enforcing a maximum size does look better than before from a security/stability perspective. However, it is a breaking change for our app (and possibly others).

Since we were not setting DATA_UPLOAD_MAX_MEMORY_SIZE in our settings file, the default value (2.5 MiB) applies, which is considerably lower than our average uploaded file size.

In theory, we could stay on channels==2.1.7 and set DATA_UPLOAD_MAX_MEMORY_SIZE to match our requirements, but AFAIU it would apply to the whole payload, not only to file size. And we'd like to allow large files, but not excessively large POST data.

Also, as stated in Django's documentation about DATA_UPLOAD_MAX_MEMORY_SIZE:

The check is done when accessing request.body or request.POST and is calculated against the total request size excluding any file upload data.

Based on that, I do think there is a bug in the way the check was implemented in channels, because it does not exclude file data.

I can try working on a PR if you agree with my suggested changes:

  • Document the breaking change in the changelog, so people know what to do when upgrading
  • Exclude uploaded files when computing the payload size

Let me know your thoughts and I'll start working on a fix :)
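To make the distinction concrete, here is a hypothetical illustration (my own helper names, not the actual channels or Django code) of the two size computations being discussed: what the a1ecd5e check effectively measured versus what Django documents for DATA_UPLOAD_MAX_MEMORY_SIZE.

```python
# Hypothetical illustration (not the actual channels patch): two ways of
# measuring a multipart payload against Django's 2.5 MiB default.

DATA_UPLOAD_MAX_MEMORY_SIZE = 2_621_440  # Django's default (2.5 MiB)

def whole_body_size(fields, files):
    """What the a1ecd5e check effectively measured: everything, files included."""
    return sum(len(v) for v in fields.values()) + sum(len(v) for v in files.values())

def form_field_size(fields):
    """What Django documents: form fields only, plus a 2-byte '&='-style
    overhead per field, excluding file upload data."""
    return sum(len(name) + len(value) + 2 for name, value in fields.items())

fields = {"title": b"my song"}
files = {"audio": b"\x00" * 10_000_000}  # a 10 MB upload
```

With Django's documented semantics, a 10 MB upload with tiny form fields passes; counted against the whole body, it fails.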

@carltongibson (Member) commented Feb 6, 2019

Hi @EliotBerriot. OK, yes, from the description it sounds as if something is amiss: the change (plus your requirements) has unveiled a latent issue.

First step: can you put together a reproduction, in a test case or sample project, so we can see exactly what's going on?
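For anyone building such a reproduction, a multipart payload with one large file part and tiny form fields can be assembled by hand (the helper below is a sketch with made-up names, not from the channels test suite). Handing the result to the request class should raise RequestDataTooBig on channels 2.1.7 even though Django's documented check would accept it.

```python
# Sketch of a reproducer payload: total size exceeds the 2.5 MiB default
# purely because of file data, while ordinary form fields stay tiny.

BOUNDARY = b"----testboundary"

def multipart_body(fields, files):
    """Assemble a minimal multipart/form-data body from dicts of bytes."""
    parts = []
    for name, value in fields.items():
        parts.append(b"--" + BOUNDARY + b"\r\n")
        parts.append(b'Content-Disposition: form-data; name="' + name + b'"\r\n\r\n')
        parts.append(value + b"\r\n")
    for name, value in files.items():
        parts.append(b"--" + BOUNDARY + b"\r\n")
        parts.append(b'Content-Disposition: form-data; name="' + name + b'"; filename="upload.bin"\r\n')
        parts.append(b"Content-Type: application/octet-stream\r\n\r\n")
        parts.append(value + b"\r\n")
    parts.append(b"--" + BOUNDARY + b"--\r\n")
    return b"".join(parts)

body = multipart_body({b"title": b"hello"}, {b"file": b"\x00" * 3_000_000})
```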

@Zarathustra2 (Contributor) commented Feb 6, 2019

Hi,

@carltongibson @EliotBerriot first of all, sorry that my commit has brought you trouble.

Unfortunately, my Django knowledge is not good enough to fix this by myself, but I would be happy to help since I introduced the bug.

I think this is also related to #1171. A possible solution could also move the DATA_UPLOAD_MAX_MEMORY_SIZE check into the read method and provide lazy evaluation, as originally implemented in Django.
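A rough sketch of that lazy-evaluation idea (hypothetical code, neither Django's nor channels' actual implementation): count bytes as they are consumed through read() and raise only once the limit is actually exceeded, instead of rejecting up front on the size of the whole body.

```python
import io

# Hypothetical sketch: enforce the size limit lazily, at read() time.

class RequestDataTooBig(Exception):
    pass

class LimitedStream(io.RawIOBase):
    def __init__(self, stream, limit):
        self.stream = stream
        self.limit = limit
        self.bytes_read = 0

    def read(self, size=-1):
        data = self.stream.read(size)
        self.bytes_read += len(data)
        if self.limit is not None and self.bytes_read > self.limit:
            raise RequestDataTooBig(
                "Request body exceeded settings.DATA_UPLOAD_MAX_MEMORY_SIZE."
            )
        return data
```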

@carltongibson (Member) commented Feb 6, 2019

Yep. That makes sense. @Zarathustra2, this isn't your fault. 🙂 The more developed handling just isn't there yet. Very happy to see input. (That test case would still be where I'd start...)

@EliotBerriot (Author) commented Feb 7, 2019

@carltongibson @EliotBerriot first of all, sorry that my commit has brought you trouble.

Please don't be, things break and it's nobody's fault :)

The more developed handling just isn’t there yet. Very happy to see input. (That test case would still be where I’d start...)

I'm not sure if I'll find my way around the fix, but I feel confident enough to write a test case, and I'll be working on that this afternoon!

@carltongibson (Member) commented Feb 7, 2019

Awesome. Thanks @EliotBerriot! (It’s much easier from there, since there’s something to play with.)

EliotBerriot added commits to EliotBerriot/channels that referenced this issue Feb 7, 2019
EliotBerriot referenced a pull request that will close this issue, Feb 7, 2019: "Enforced data size in channels 2.1.7 prevents large file uploads" #1242 (2 of 2 tasks complete)
@jpic (Contributor) commented Feb 7, 2019

@EliotBerriot did you also try the following suggestion? If not, can you explain why it wouldn't work?

we could also move the check of DATA_UPLOAD_MAX_MEMORY_SIZE to the read method and provide lazy evalution as it is originaly implemented in Django.

@EliotBerriot (Author) commented Feb 7, 2019

@jpic I didn't; the concrete steps to do that were not really clear to me. Also, I wanted to avoid touching code I did not understand (this is my first contact with the channels codebase).

Since the read() method is currently a member of Django's own http.HttpRequest class, it would also mean either reimplementing or overriding it, and I'm not sure how that would fix the problem. The current implementation fails fast, which is probably better; the real issue IMHO is that it is not compatible with what Django actually does with regard to file handling.

@jpic (Contributor) commented Feb 7, 2019

the concrete steps to do that were not really clear for me

Same here, and here's what I figured out while searching. You're gonna love this, or I'm completely lost 😂

Turns out that:

  • your tests are valid,
  • they also pass without the check/raise in the constructor introduced by a1ecd5e,
  • because the test does not catch the exception raised from the constructor,
  • see for yourself, after removing the check channels has in the constructor, and removing the pytest.raises:
================================== FAILURES ===================================
_____ RequestTests.test_size_check_ignore_files_but_honor_other_post_data ______

self = <tests.test_http.RequestTests testMethod=test_size_check_ignore_files_but_honor_other_post_data>

    def test_size_check_ignore_files_but_honor_other_post_data(self):
        ....
        with override_settings(DATA_UPLOAD_MAX_MEMORY_SIZE=1):
>           request.POST

tests/test_http.py:233: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
channels/http.py:139: in _get_post
    self._load_post_and_files()
.tox/py37-dj21/lib/python3.7/site-packages/django/http/request.py:310: in _load_post_and_files
    self._post, self._files = self.parse_file_upload(self.META, data)
.tox/py37-dj21/lib/python3.7/site-packages/django/http/request.py:269: in parse_file_upload
    return parser.parse()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <django.http.multipartparser.MultiPartParser object at 0x7fd0e1a38358>

    def parse(self):
         ...
    
                    # Avoid reading more than DATA_UPLOAD_MAX_MEMORY_SIZE.
                    if settings.DATA_UPLOAD_MAX_MEMORY_SIZE is not None:
                        read_size = settings.DATA_UPLOAD_MAX_MEMORY_SIZE - num_bytes_read
    
                    # This is a post field, we can just set it in the post
                    if transfer_encoding == 'base64':
                        raw_data = field_stream.read(size=read_size)
                        num_bytes_read += len(raw_data)
                        try:
                            data = base64.b64decode(raw_data)
                        except binascii.Error:
                            data = raw_data
                    else:
                        data = field_stream.read(size=read_size)
                        num_bytes_read += len(data)
    
                    # Add two here to make the check consistent with the
                    # x-www-form-urlencoded check that includes '&='.
                    num_bytes_read += len(field_name) + 2
                    if (settings.DATA_UPLOAD_MAX_MEMORY_SIZE is not None and
                            num_bytes_read > settings.DATA_UPLOAD_MAX_MEMORY_SIZE):
>                       raise RequestDataTooBig('Request body exceeded settings.DATA_UPLOAD_MAX_MEMORY_SIZE.')
E                       django.core.exceptions.RequestDataTooBig: Request body exceeded settings.DATA_UPLOAD_MAX_MEMORY_SIZE.

@EliotBerriot, both your tests still pass after completely reverting a1ecd5e, see for yourself in https://github.com/jpic/channels/tree/testrevert

My recommendation would be to keep your security tests, which we always need to have, but revert a1ecd5e entirely.

@carltongibson (Member) commented Feb 7, 2019

Hey @EliotBerriot and @jpic. Thanks for your efforts here. Super. We need to be 100% clear here, so this is all to the good!

This line here contrasts markedly with Django's:

channels/channels/http.py

Lines 131 to 132 in 8499a42

self._body = body
assert isinstance(self._body, bytes), "Body is not bytes"

body is bytes. So have we already got it in memory, and do we need to bail on that to avoid potential memory exhaustion? (Current status: I need to look.)

@jpic (Contributor) commented Feb 7, 2019

I might not be able to provide further relevant clues without actually reading the WSGI and ASGI specs themselves. Meanwhile, there's something that I find interesting in the django code you have pointed out.


            # Limit the maximum request data size that will be handled in-memory.
            if (settings.DATA_UPLOAD_MAX_MEMORY_SIZE is not None and
                    int(self.META.get('CONTENT_LENGTH') or 0) > settings.DATA_UPLOAD_MAX_MEMORY_SIZE):
                raise RequestDataTooBig('Request body exceeded settings.DATA_UPLOAD_MAX_MEMORY_SIZE.')

Does that mean that this check is completely "declarative"? I mean, what happens if I put a long shellcode in the User-Agent header value along with a small Content-Length header? Will that really prevent the User-Agent header value from being read into memory at all? If so, how? Thanks in advance for bearing with me.

@carltongibson (Member) commented Feb 7, 2019

No problem. 🙂 The read() call is key. In WSGI land we are given a file-like object, which we can then pull into memory. (How the server in front of us handled that is not our problem.)

So yes, we have to look at what we're getting here, and work out where best to place our limits.
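The distinction being drawn here can be sketched as follows (a hypothetical helper, illustrative only): the Content-Length comparison is declarative and trusts the header, while real protection against a lying client comes from capping how much is actually read from the file-like object the server hands us.

```python
import io

# Hypothetical helper, not Django or channels code: fail fast on the
# declared size, but also cap the actual read so an under-declared
# Content-Length cannot exhaust memory.

def read_body(stream, content_length, max_size):
    if max_size is not None and content_length > max_size:
        # Declarative check: trusts the header, catches honest clients.
        raise ValueError("declared Content-Length over limit")
    # Cap the real read: a lying client still cannot exceed the limit.
    data = stream.read(max_size + 1 if max_size is not None else -1)
    if max_size is not None and len(data) > max_size:
        raise ValueError("actual body over limit")
    return data
```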

@carltongibson (Member) commented Feb 7, 2019

@jpic You knew this would be fun right! 😀

@jpic (Contributor) commented Feb 7, 2019

Maybe we should open a new issue about this, because Eliot's patch itself works as intended:

  • the new tests seem good to have,
  • the patch to the runtime code does fix a regression.

Whether or not we want to reconsider a1ecd5e and the connected code should not block Eliot's PR, because it contains a bugfix that looks critical.

After all, the patch fixes a regression, adds extra tests to prove that everything is still fine, and secures channels in case Django changes the contract.

@jpic (Contributor) commented Feb 7, 2019

@carltongibson you bet 😂

About a1ecd5e, I can understand that it's trying to reproduce the Django code you have pointed out, which legitimizes it after all. But it's not clear to me how boundary content is excluded from the check in Django, since Content-Length should include the boundary data length.

Is it because they use self._body or just self instead of self.body in the case of multipart/form-data, which completely bypasses the check in the self.body property ?

It seems the code from Django in the body property (pointed out by @carltongibson above) is not applicable in the case of multipart/form-data requests. In that case, Django actually avoids calling self.body and calls self.parse_file_upload, which in turn calls MultiPartParser.parse, which does the check that Eliot's tests prove still works and covers the channels code itself.

So, it turns out I now understand the justification for the parent change a1ecd5e, as well as Eliot's change as-is.

Thanks y'all for sharing some of your insight ;)
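The control flow described above can be sketched like this (a simplified, from-memory stand-in, not verbatim Django source): multipart requests never touch request.body, so the Content-Length check in the body property is bypassed, and MultiPartParser.parse does its own per-field accounting instead.

```python
# Hypothetical stand-ins for HttpRequest and _load_post_and_files,
# just for this sketch.

class FakeRequest:
    def __init__(self, content_type, body):
        self.META = {"CONTENT_TYPE": content_type}
        self.body_accessed = False
        self._body = body

    @property
    def body(self):
        # In real Django, the DATA_UPLOAD_MAX_MEMORY_SIZE check against
        # Content-Length fires here, on first access.
        self.body_accessed = True
        return self._body

def load_post_and_files(request):
    if request.META["CONTENT_TYPE"].startswith("multipart/form-data"):
        # Multipart: stream-parse without touching request.body, so the
        # body-property check never runs.
        return "parsed-multipart"
    # Everything else reads request.body, triggering the check.
    return request.body
```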

@carltongibson (Member) commented Feb 12, 2019

So, just an update here.

I am looking into this. In particular, how we're loading the request body prior to instantiating the AsgiRequest:

channels/channels/http.py

Lines 208 to 214 in a660d81

# See if the message has body, and if it's the end, launch into
# handling (and a synchronous subthread)
if "body" in message:
    body += message["body"]
if not message.get("more_body", False):
    await self.handle(body)
    return

I want to look into whether we can wrap that somehow in a file-like object, which would then allow us to leverage Django's established patterns here. (I don't know at all whether that'll work yet, but...) It would be nice to lazily pull the request body data into memory.

I'd rather measure twice, cut once here. I'm not convinced that there's anything critical at stake, in the sense that we need to rush out a half-thought patch: adjusting DATA_UPLOAD_MAX_MEMORY_SIZE is a reasonable approach in the meantime, and I'm presuming there's always a reverse proxy in play (nginx etc.) that can (also) impose sensible limits.
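A minimal sketch of that file-like wrapper idea (hypothetical, not channels code; real ASGI receive is async, so a plain iterable is used here for brevity):

```python
import io

# Expose a sequence of ASGI "http.request" body messages as a readable
# stream, so the body could be pulled lazily instead of concatenated
# into a bytes object up front.

class AsgiBodyStream(io.RawIOBase):
    def __init__(self, messages):
        # messages: iterable of dicts like {"body": b"...", "more_body": bool}
        self._chunks = (m.get("body", b"") for m in messages)
        self._buffer = b""

    def read(self, size=-1):
        if size is None or size < 0:
            # Drain everything that remains.
            data, self._buffer = self._buffer + b"".join(self._chunks), b""
            return data
        while len(self._buffer) < size:
            chunk = next(self._chunks, None)
            if chunk is None:
                break
            self._buffer += chunk
        data, self._buffer = self._buffer[:size], self._buffer[size:]
        return data
```

Django's MultiPartParser already consumes its input through read(), so a wrapper along these lines could let Django's existing lazy checks apply unchanged.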

@jpic (Contributor) commented Feb 12, 2019

@carltongibson security became a non-issue for me when I considered that:

  • Django also bypasses this kind of check for this kind of request,
  • we're always supposed to have at least nginx, if not a WAF, in front of Django/channels,
  • every time I build a site that takes uploads seriously, it turns out I have to change the default nginx settings for that purpose anyway.

So at the end of the day the pull request has no impact on actual security, and it still fixes a regression that's pretty critical for anyone supporting file uploads.
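For reference, the nginx setting usually involved is client_max_body_size, whose default of 1m rejects large uploads at the proxy regardless of Django settings (the value below is illustrative):

```nginx
# Raise the proxy-level upload cap; the default of 1m returns
# 413 Request Entity Too Large before Django ever sees the body.
client_max_body_size 100m;
```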
