Skip to content
This repository has been archived by the owner on Jan 24, 2021. It is now read-only.

Streaming large requests #1897

Open
adamhathcock opened this issue Apr 16, 2015 · 15 comments
Open

Streaming large requests #1897

adamhathcock opened this issue Apr 16, 2015 · 15 comments

Comments

@adamhathcock
Copy link

I'm implementing a file upload and the pay load goes directly to another service (S3 in this case.)

I was noticing everything gets buffered and it looks like Nancy always buffers requests. I found this issue: #1014 And it looks like the conclusion was just a bug but no real answer other than my choice is between a MemoryStream or a FileStream.

The only reason I see that a seekable stream is required is for the multipart processing. Is that correct? Is there a way to enable a non-seeking stream path for a request or is everything built around having a seekable request?

Thanks.

@alanedwardes
Copy link

I am also interested in this.

@adamhathcock
Copy link
Author

Looking deeper at RequestStream, it looks like the primary reason this stream is seekable is for parsing multipart requests. It seems if the buffering of the request stream could be lazy, then it would solve the problem.

However, it looks like the RequestStream object itself is writable primarily for test cases. Being writable feels wrong. Would a largish change to make the RequestStream lazy and readonly be welcome at all?

@khellang
Copy link
Member

@adamhathcock It's hard to say what the implications is without seeing any code. I assume there's a reason it's done as it is today. @grumpydev?

@grumpydev
Copy link
Member

It's as it is for multipart requests where you need to be able to seek back and forth, as long as it switches to a filestream I'm not sure I understand the problem - you'd never stream directly to the "destination file", you'd want to make sure the request was complete before you potentially overwrite what was already there? If what you're building is essentially just a proxy for a request to s3 why not just upload straight to s3 with a secure url? Failing that I'd probably fall back on something more lightweight like potentially a raw OWIN middleware for handling this - using a web framework, even a light one like nancy, as a proxy seems pretty heavy to me.

@adamhathcock
Copy link
Author

S3 is just an example really. Though we do proxy files to various things.

Another scenario is that where we have a continuous stream that is an array of JSON objects. On the response side, we gather the relevant data create the object and send it on the response with zero buffering as there can be a lot of this data. We'd want to be able for a client to send a continuous stream for the server to process. The server would only have to validate a single JSON object at a time. If the stream dies early, it doesn't invalidate the successful objects received.

I guess I wanted to stick with Nancy as I can have non-buffered responses and code centered around Nancy and other things. I wanted the request side to be able to look the same.

@grumpydev
Copy link
Member

If you want to have a stab at making it lazy (making sure multipart and the test framework doesn't break) then we'll certainly bring it in if there's no downsides, just seemed a bit of an odd requirement to me :)

@bscSCORM
Copy link

Here's another use case for this requirement: I'm working on a API that among other things accepts potentially large (occasionally multi-gigabyte) files for import. Various headers (including authentication headers) are checked before reading the request body, and may result in the request being rejected. In the case it's rejected, I'd like to do that before the server responds HTTP 100 (Continue), in the case that the client expects Continue.

So in this case, I don't really care (much) that the stream gets buffered at least to disk once it starts being read, but I'd like a chance to reject the request before any of the body is read, since I don't want to waste (substantial) bandwidth , IO, and filesystem cache for a request that's just going to be rejected.

So far as I can tell, there is no way to do that since it looks like the whole stream is buffered even during BeforeRequest in the pipeline.

I see the "PRs accepted" suggestion, and I might take you up on that, but I'm noting the use case here in case I don't.

@adamhathcock
Copy link
Author

I'm currently testing my change: master...adamhathcock:non_buffered_requests

Since I don't buffer the request body unless someone is sending multipart or a form, then you can reject the request and not read the body.

@bscSCORM
Copy link

I'm actually accepting multipart since files uploaded from a browser come in that way, but this is a great starting point thanks. I'll probably go with a fork where the builtin multipart parsing is broken if it really needs the buffering.

I think the current approach for multipart goes too far trying to abstract away that what is coming in is a stream in a particular order, and it would have been fine to have to request each part in order (as they're read from the stream). That said, I can very well understand that sort of change couldn't be merged as it would break any consumers of the current approach.

@adamhathcock
Copy link
Author

It would be easy to make things lazier and it would only parse Form or Multipart content on access. This still buffers everything in memory on access though. However, it would take care of your use-case.

@bscSCORM
Copy link

It still switches to the disk buffer if the stream is large, right?

@adamhathcock
Copy link
Author

Yup.

@adamhathcock
Copy link
Author

I had to add a static config option to stop always looking in the body of a POST for the http method override flag.

Also, mono doesn't build and I don't know why.

@adamhathcock
Copy link
Author

Just checked. The mono build does work now.

@vadimline
Copy link

Hi
I found workaround how to upload large files if you host nancy using ASP.NET

Unfortunately it doesn't work for multipart request

changes here

In two words you can create own NancyHttpRequestHandler and NancyHandler, then you need to implement class that will wrap request stream and will report that it can seek(Otherwise nancy will create writeable stream and will copy request stream data in it) and request Content-Type must be different from content types that nancy parsing automatically, for instance application/octet-stream, in this case nancy will not try to parse it and your module route handler will be invoked so you can process request stream as you want.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants