Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use multipart_form instead of multipart-form-data #45

Merged
merged 2 commits into from
May 4, 2021

Conversation

dinosaure
Copy link
Contributor

As mentioned in #3, this PR is a first-shot to be able to stream and upload large files with dream. I tried to respect as much as I can the initial interface and tweak a bit some details but:

  1. the initial example still works with the new interface
  2. I added an other example to upload and save a file at a specific location (which completely ignore security values)

I'm not very convince by the API and the internal state of the request but I don't have the time to figure out about a better interface 馃槙. The state uses a trick about object to generate unique ID and I think this is the only obscure part of my code.

Then, the "state-machine" is pretty simple and we don't need the ancient plumbing with streams & tasks.

Finally, the multipart_form package (what is needed for this PR) is unreleased but it's about few functions such as *.to_string ot Header.to_list. Hope that this PR is good, feel free to share a feedback 馃檪 .

@tmattio
Copy link
Contributor

tmattio commented Apr 22, 2021

Just skimming in, @dinosaure you might want to provide a script to generate the 100MB file (can be in a dune rule to automate it for users) instead of committing it to the repo, otherwise, it'll slow down the git clones for contributors

@dinosaure
Copy link
Contributor Author

Just skimming in, @dinosaure you might want to provide a script to generate the 100MB file (can be in a dune rule to automate it for users) instead of committing it to the repo, otherwise, it'll slow down the git clones for contributors

Ah yes, I forget about that, sorry.

@aantron
Copy link
Owner

aantron commented Apr 22, 2021

Thanks! I'm reviewing the code now.

I'm initially puzzled by this phrase, though:

this PR is a first-shot to be able to stream and upload large files with dream.

AFAIK the current dependency multipart-form-data is also able to both stream and upload large files, and Dream uses it correspondingly (as you must have seen in the code). Can you positively confirm that multipart-form-data cannot stream? What does this comment refer to? Does multipart-form-data do poor flow control, leading to an accumulation of data and growth of internal buffers? Does it buffer large chunks for processing in some way, that is not obvious from the API? Does it have some bugs with streaming? Something else?

dream.opam Outdated
@@ -95,6 +95,10 @@ depends: [
"ppx_expect" {with-test}
]

pin-depends: [
[ "multipart_form" "git+https://github.com/dinosaure/multipart_form#e4cbe0db5c6a03e02e9018c25e6ba0dbc5de4809" ]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just leaving a note here, to resolve it once there is a multipart_form release, so we can remove the pin-depends and add a constraint instead.

src/dream.mli Outdated Show resolved Hide resolved
example/g-stream-upload/dune Outdated Show resolved Hide resolved
src/pure/inmost.ml Outdated Show resolved Hide resolved
src/middleware/upload.ml Show resolved Hide resolved
src/middleware/upload.ml Outdated Show resolved Hide resolved
src/middleware/upload.ml Outdated Show resolved Hide resolved
@dinosaure
Copy link
Contributor Author

AFAIK the current dependency multipart-form-data is also able to both stream and upload large files, and Dream uses it correspondingly (as you must have seen in the code). Can you positively confirm that multipart-form-data cannot stream? What does this comment refer to? Does multipart-form-data do poor flow control, leading to an accumulation of data and growth of internal buffers? Does it buffer large chunks for processing in some way, that is not obvious from the API? Does it have some bugs with streaming? Something else?

So multipart-form-data can stream out the output but I think it does that in the wrong way. From experimenting with @Armael on multipart_form, we noticed several issues about Lwt_stream and lwt in general. We did the choice to use Lwt_stream.create_bounded internally because it seems that lwt does not take the opportunity to reschedule properly between the reader and the writer (even if the reader is plugged to a waiting syscall - such as read).

We noticed that we don't let any chance to OCaml to garbage-collect previous chunks and this is why we decided to use Lwt_stream.create_bounded to blocks and:

  1. reschedule to the writer
  2. clean old chunks

We tried to play with Lwt.pause () and even if it's better, it's less deterministic than a use of Lwt_stream.create_bounded. Then, by design, I really think that we should let the opportunity to use Lwt.async/Lwt.both to the end-user to show that we really able to concurrently execute the consumer & the producer. multipart-form-data does not really highlight such usage - and I think it's why you wrote such plumbing before.

Moreover, multipart_form comes from a huge work about RFCs where RFC 7578 is only the top. The bedrock of all of these stuffs is the RFC 822 unfortunately which can allow many (many ...) surprises. For example, we can talk about the FWS token which allows to write a field value into multiple lines (and, as far as I can see, multipart-form-data does not handle such case - which is pretty common if it's combined with the limit of 80 characters per line). So I developed unstrctrd to takes care about this "detail".

RFC 7578 deprecates Content-Transfer-Encoding, however we can have such way to encode values. It's specially true for the quoted-printable encoding which is handled by pecu. The base64 encoding exists too and in a different way than what we usally think about base64. It's why we extended ocaml-base64 to handle RFC 2045 contents. All of that are behind multipart_form - you don't need to deal with quoted-printable or base64.

I think, if we want to compare multiple implementations of multipart/form-data, we should take about such details. In most of our cases, all of these details don't really appear for an usual case but I already discovered these mysteries in the web. I know that we have a huge gap between what the RFC describes and what we commonly use - and thanks to @Armael to help to fill this gap. But I'm convince that the right way to handle is not to discard historic RFCs which are the part of the result by (historic) inheritance.

@aantron aantron force-pushed the master branch 5 times, most recently from fb17623 to a3bc25b Compare April 28, 2021 10:55
@dinosaure dinosaure force-pushed the mutlipart_form branch 3 times, most recently from 52995fb to cfaee2e Compare May 3, 2021 20:20
@dinosaure
Copy link
Contributor Author

I pushed-force a new version (without any changes on examples due to a cryptic compilation error with ppx - feel free to push on my branch to fix it if you want). The internal state to implement upload and upload_part is much more easier (about maintenance - in my view) and I merged values with the same name to correctly handle multiple files.

Let me know if something is not good for you 馃憤.

@aantron
Copy link
Owner

aantron commented May 4, 2021

Just to put down in writing here messages from chat, the cryptic errors are caused by ocaml-ppx/ppxlib#221, and it should be possible to fix them, for the time being, by downgrading ppxlib to 0.20.0. Once there is a fix, I'll have Dream require the fixed version, so that this doesn't come up again (hopefully!) (#56).

@dinosaure
Copy link
Contributor Author

I forced-pushed the last version which includes headers field into the part type. I'm not a good writer for documentation (comparing to you) but I tried to explain on my way (in examples/w-stream-upload) what is going on. I think, this PR is mostly done regardless your feedback 馃憤. So if you have some concerns, feel free to tell me.

PS: the fix about ppxlib.0.20.0. works, thanks for that 馃憤.

@aantron aantron merged commit dccd5ff into aantron:master May 4, 2021
@aantron
Copy link
Owner

aantron commented May 4, 2021

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants