Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support reading multipart data with \n (LF) lines #3492

Merged
merged 5 commits into from
Jan 15, 2019

Conversation

kxepal
Copy link
Member

@kxepal kxepal commented Jan 6, 2019

What do these changes do?

While RFC clearly says about CRLF newlines, there quite a lot of
implementations which uses just LF. Even Python's stdlib produces
multiparts with \n newlines by default for compatibility reasons.

We wouldn't change how we produce multipart content - here we follow
RFC. However, we can detect \n lines quite easily which makes their
support quite cheap.

Are there changes in behavior for the user?

No notable, except now they can read multipart data with "popular" newlines.

Related issue number

#2302

@codecov-io
Copy link

codecov-io commented Jan 6, 2019

Codecov Report

Merging #3492 into master will decrease coverage by 0.02%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3492      +/-   ##
==========================================
- Coverage   97.95%   97.93%   -0.03%     
==========================================
  Files          44       44              
  Lines        8554     8561       +7     
  Branches     1377     1378       +1     
==========================================
+ Hits         8379     8384       +5     
- Misses         71       72       +1     
- Partials      104      105       +1
Impacted Files Coverage Δ
aiohttp/multipart.py 96.14% <100%> (+0.04%) ⬆️
aiohttp/web_fileresponse.py 96.55% <0%> (-1.15%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3a0c7bf...e9c8c33. Read the comment docs.

@asvetlov
Copy link
Member

asvetlov commented Jan 8, 2019

Thanks for the PR!
Does newline style autodetection make sense?

@kxepal
Copy link
Member Author

kxepal commented Jan 8, 2019

Honestly, I'm still not sure, but if it solves user problems and while it's quite cheap (let's not take in account tests...) in implementation it feels fine. So far only @thehesiod reported issue about.

We can hold this change for awhile and gather more user feedback to make sure we have strong reasons to relax RFC implemenation on reader part. But it feels for me it will be long time to wait.

Copy link
Contributor

@thehesiod thehesiod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not fix StreamReader.readline, called via MultipartReader.next

@thehesiod
Copy link
Contributor

oh, I may have tested this incorrectly, checking

@kxepal
Copy link
Member Author

kxepal commented Jan 8, 2019

@thehesiod Hey! (:
Could you provide some code as an example for test case? Because we already have tests for next and readline and both are fine with both newlines.

@thehesiod
Copy link
Contributor

ya I was using the wrong version, sorry, and now and now getting issues starting up multiple threaded servers after conforming to the new APIs. How can I dismiss my review? I'm still trying to get my branch working with these changes, will report back when I have findings.

@kxepal
Copy link
Member Author

kxepal commented Jan 8, 2019

IIRC, on merge block right below the comments there should be some button about review cancellation. As backup plan, I can just change the latest commit hash via rebase over master or just ask @asvetlov for sudo git merge. In anyway, no worries about (:

@thehesiod
Copy link
Contributor

thehesiod commented Jan 8, 2019

ok so, after a few tries it works, will have to look into it more later, seems like a multithreading issue with aiohttp (one loop and server per thread with new release), but for now:

  • I was calling web.Request.multipart(), however there's no param to set for the newline, so I now have to do: MultipartReader(req.headers, req.content, newline=b'\n'), should we add a param to this method?
  • for automatic detection, this would be really nice because otherwise your server can only accept one type or the other, but not both. Similar for reading multipart, somehow you'd have to detect/know what the newlines are otherwise. I'd prefer having an auto-detect mechanism.

thanks guys! sorry about that little disruption, got my virtualenvs confused :)

@thehesiod
Copy link
Contributor

btw I don't see anything about cancelling my review there, perhaps asvetlov or you can dismiss my review

@kxepal
Copy link
Member Author

kxepal commented Jan 8, 2019

The initial idea is about to autodetect on MultipartReader side, while BodyPartReader have to be informed about what kind of newline to expect. So it should work in seconds case. But since the former one is deep in implementation details, everything should work for you without worry about what kind of newline is there.

@kxepal
Copy link
Member Author

kxepal commented Jan 8, 2019

Hm...I should make a test case for this. Hold on.

@thehesiod
Copy link
Contributor

sorry not following, from what I understand of this PR, you need to state upfront to your multipart reader what the newline is....which means if I write a server, two things are true:

  1. It needs to know upfront what the multipart newline format is
  2. It can only accept or format or another

These are two big negatives. In my case it's ok, because I know upfront what it is, and I don't need to support mixed newline requests. However in general this is not true if you have a generic server, you can't control what format clients will send their newlines.

@kxepal
Copy link
Member Author

kxepal commented Jan 8, 2019

No, you don't need to care about what newline is. It handles by this part of changes for you. In other words: first boundary dictates newline format fot the rest ones.

@thehesiod
Copy link
Contributor

very cool! 🙇

@asvetlov
Copy link
Member

asvetlov commented Jan 8, 2019

Sorry, I don't follow the design.

Looks like newline argument is accepted here: https://github.com/aio-libs/aiohttp/pull/3492/files#diff-235eeadd1960735e1191879aa500b7b7R538
but overridden later here: https://github.com/aio-libs/aiohttp/pull/3492/files#diff-235eeadd1960735e1191879aa500b7b7R651

Did I miss something?

@kxepal
Copy link
Member Author

kxepal commented Jan 8, 2019

@asvetlov
Multipart is recursive format. Once we detect newline at the very first boundary we have to pass it around for nested readers. So we have to have it as an argument and have to rewrite it during parsing, but only once.
I just noticed that current implementation allows to have mix of different newlines - that's need to forbid for good.

@asvetlov
Copy link
Member

Is it ready to merge?

@kxepal
Copy link
Member Author

kxepal commented Jan 13, 2019

Sorry, not yet. Will be able to finish this a bit later today.

While RFC clearly says about `CRLF` newlines, there quite a lot of
implementations which uses just `LF`. Even Python's stdlib produces
multiparts with `\n` newlines by default for compatibility reasons.

We wouldn't change how we produce multipart content - here we follow
RFC. However, we can detect `\n` lines quite easily which makes their
support quite cheap.
@kxepal
Copy link
Member Author

kxepal commented Jan 14, 2019

@asvetlov
Ok, I think it's ready. I was tries to catch the mixed newlines case when CRLF and LF may vary, but it seems to be not issue for us. PR rebased. If everythink ok for you and @thehesiod let's merge it.

Just for case. That's a strange case, but it seems we pass it.
aiohttp/multipart.py Outdated Show resolved Hide resolved
This argument is not designed to be defined by users. Depending on
parsing multipart newline format it will be chosen automatically and
due to recursive native of multipart format it have to be passed around
for nested readers.
@asvetlov asvetlov merged commit fcedc66 into aio-libs:master Jan 15, 2019
@asvetlov
Copy link
Member

Thanks!

@thehesiod
Copy link
Contributor

Nice work, thank you!!

@kxepal
Copy link
Member Author

kxepal commented Jan 15, 2019

🍻

@lock
Copy link

lock bot commented Jan 15, 2020

This thread has been automatically locked since there has not been
any recent activity after it was closed. Please open a new issue for
related bugs.

If you feel like there's important points made in this discussion,
please include those exceprts into that new issue.

@lock lock bot added the outdated label Jan 15, 2020
@lock lock bot locked as resolved and limited conversation to collaborators Jan 15, 2020
@psf-chronographer psf-chronographer bot added the bot:chronographer:provided There is a change note present in this PR label Jan 15, 2020
@@ -0,0 +1 @@
Support reading multipart data with `\n` (`LF`) lines
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm.. we need to watch for incorrect single quotes in RST...

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bot:chronographer:provided There is a change note present in this PR outdated
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants