Streaming large requests #1897
Comments
I am also interested in this.
Looking deeper at RequestStream, it looks like the primary reason this stream is seekable is for parsing multipart requests. It seems that if the buffering of the request stream could be lazy, it would solve the problem. However, it looks like the RequestStream object itself is writable primarily for test cases. Being writable feels wrong. Would a largish change to make RequestStream lazy and read-only be welcome at all?
@adamhathcock It's hard to say what the implications are without seeing any code. I assume there's a reason it's done the way it is today. @grumpydev?
It's as it is for multipart requests, where you need to be able to seek back and forth. As long as it switches to a filestream, I'm not sure I understand the problem - you'd never stream directly to the "destination file"; you'd want to make sure the request was complete before you potentially overwrite what was already there. If what you're building is essentially just a proxy for a request to S3, why not upload straight to S3 with a secure URL? Failing that, I'd probably fall back on something more lightweight, like a raw OWIN middleware for handling this - using a web framework, even a light one like Nancy, as a proxy seems pretty heavy to me.
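The raw-OWIN suggestion could look roughly like this - a minimal sketch, assuming the Microsoft.Owin `OwinMiddleware` base class; the `/upload` path and temp-file destination are illustrative placeholders, not anything from Nancy:

```csharp
using System.IO;
using System.Threading.Tasks;
using Microsoft.Owin;

// Hypothetical middleware that handles uploads outside Nancy entirely,
// copying the raw (unbuffered) request body straight to a destination.
public class StreamingUploadMiddleware : OwinMiddleware
{
    public StreamingUploadMiddleware(OwinMiddleware next) : base(next) { }

    public override async Task Invoke(IOwinContext context)
    {
        if (context.Request.Path.StartsWithSegments(new PathString("/upload")))
        {
            // context.Request.Body is the raw request stream here, so it can
            // be copied directly to the destination (e.g. an S3 PUT) without
            // Nancy's RequestStream buffering it to memory or disk first.
            using (var destination = File.Create(Path.GetTempFileName()))
            {
                await context.Request.Body.CopyToAsync(destination);
            }
            context.Response.StatusCode = 201;
            return;
        }

        await Next.Invoke(context);
    }
}
```

Registered before Nancy in the OWIN pipeline, only non-upload requests would ever reach Nancy's buffering.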
S3 is just an example, really; we do proxy files to various things. Another scenario is one where we have a continuous stream that is an array of JSON objects. On the response side, we gather the relevant data, create the object, and send it on the response with zero buffering, as there can be a lot of this data. We'd want a client to be able to send a continuous stream for the server to process. The server would only have to validate a single JSON object at a time, and if the stream dies early, it doesn't invalidate the objects already received successfully. I guess I wanted to stick with Nancy since I can have non-buffered responses and keep the code centered around Nancy; I wanted the request side to be able to look the same.
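The request side of that pattern could be sketched with an incremental JSON reader such as Json.NET's `JsonTextReader`, which yields one array element at a time instead of buffering the whole body. This is a sketch, not code from the thread; `Item` and `Process` are placeholder names:

```csharp
using System.IO;
using Newtonsoft.Json;

// Placeholder element type for the incoming JSON array.
public class Item
{
    public string Id { get; set; }
}

public static class StreamingJson
{
    // Reads a (possibly very long) JSON array from a raw request stream one
    // object at a time, so each element can be validated and processed
    // independently. If the stream dies partway through, the objects already
    // handled remain valid.
    public static void ProcessJsonArray(Stream body)
    {
        var serializer = new JsonSerializer();
        using (var reader = new JsonTextReader(new StreamReader(body)))
        {
            while (reader.Read())
            {
                if (reader.TokenType == JsonToken.StartObject)
                {
                    var item = serializer.Deserialize<Item>(reader);
                    Process(item);
                }
            }
        }
    }

    private static void Process(Item item)
    {
        // Placeholder: validate / forward a single object.
    }
}
```

This only works if the stream handed to it is the raw request body, which is exactly what Nancy's eager buffering prevents today.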
If you want to have a stab at making it lazy (making sure multipart and the test framework don't break), then we'll certainly bring it in if there are no downsides - it just seemed a bit of an odd requirement to me :)
Here's another use case for this requirement: I'm working on an API that, among other things, accepts potentially large (occasionally multi-gigabyte) files for import. Various headers (including authentication headers) are checked before reading the request body, and may result in the request being rejected. If it's rejected, I'd like that to happen before the server responds HTTP 100 (Continue), in the case that the client expects Continue. So I don't really care (much) that the stream gets buffered at least to disk once it starts being read, but I'd like a chance to reject the request before any of the body is read, since I don't want to waste (substantial) bandwidth, IO, and filesystem cache on a request that's just going to be rejected. As far as I can tell, there is no way to do that, since it looks like the whole stream is buffered even during BeforeRequest in the pipeline. I see the "PRs accepted" suggestion, and I might take you up on that, but I'm noting the use case here in case I don't.
I'm currently testing my change: master...adamhathcock:non_buffered_requests Since I don't buffer the request body unless someone is sending multipart or a form, you can reject the request without reading the body.
I'm actually accepting multipart, since files uploaded from a browser come in that way, but this is a great starting point - thanks. I'll probably go with a fork where the built-in multipart parsing is broken, if it really needs the buffering. I think the current approach for multipart goes too far in trying to abstract away the fact that what's coming in is a stream with parts in a particular order; it would have been fine to have to request each part in order (as they're read from the stream). That said, I can very well understand that sort of change couldn't be merged, as it would break any consumers of the current approach.
It would be easy to make things lazier so that Form or Multipart content is only parsed on access. That still buffers everything in memory on access, though. However, it would take care of your use case.
It still switches to the disk buffer if the stream is large, right? |
Yup. |
I had to add a static config option to stop always looking in the body of a POST for the HTTP method override flag. Also, Mono doesn't build and I don't know why.
Just checked - the Mono build does work now.
Hi. Unfortunately it doesn't work for the multipart request changes here. In short: you can create your own NancyHttpRequestHandler and NancyHandler, then implement a class that wraps the request stream and reports that it can seek (otherwise Nancy will create a writable stream and copy the request stream data into it). The request Content-Type must also be different from the content types Nancy parses automatically, for instance application/octet-stream; in that case Nancy will not try to parse it, your module route handler will be invoked, and you can process the request stream however you want.
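The stream wrapper described above could be sketched like this - assuming Nancy's buffering decision hinges on `CanSeek`, as the comment says. The class name is illustrative, and `Seek` is never actually called as long as the Content-Type is one Nancy does not parse:

```csharp
using System;
using System.IO;

// Wraps the raw request stream and reports CanSeek = true, so Nancy treats
// the body as already seekable and skips copying it into its own buffer.
public class PseudoSeekableStream : Stream
{
    private readonly Stream inner;

    public PseudoSeekableStream(Stream inner)
    {
        this.inner = inner;
    }

    public override bool CanRead  { get { return inner.CanRead; } }
    public override bool CanSeek  { get { return true; } } // the important lie
    public override bool CanWrite { get { return false; } }

    // Length/Position may still throw on a raw network stream; that's fine
    // as long as nothing actually asks for them.
    public override long Length { get { return inner.Length; } }

    public override long Position
    {
        get { return inner.Position; }
        set { throw new NotSupportedException(); }
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        return inner.Read(buffer, offset, count);
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        throw new NotSupportedException();
    }

    public override void Flush() { }

    public override void SetLength(long value)
    {
        throw new NotSupportedException();
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        throw new NotSupportedException();
    }
}
```

This is a deliberate violation of the `Stream` contract, so it only holds together for routes whose Content-Type (e.g. `application/octet-stream`) guarantees Nancy never seeks.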
I'm implementing a file upload where the payload goes directly to another service (S3 in this case).
I was noticing that everything gets buffered, and it looks like Nancy always buffers requests. I found this issue: #1014. It looks like the conclusion there was just a bug, with no real answer other than that my choice is between a MemoryStream and a FileStream.
The only reason I can see that a seekable stream is required is multipart processing. Is that correct? Is there a way to enable a non-seeking stream path for a request, or is everything built around having a seekable request?
Thanks.