Support for pipeable streams #588
@h2non streams are supported in the API, just not via a pipeable interface:

```js
var stream = fs.createReadStream('image.jpg');
s3.putObject({Bucket: ..., Body: stream}, function (err, data) {});
```

This format is more conducive to many of the other features provided by the SDK, namely parameter validation and bound parameters. Another important point is that S3 streams must be rewindable for retry support and signing in certain regions, which is not well supported by Node.js streams. That said, supporting …
Thanks for the reply. Sounds reasonable. I see your point, and it could definitely be great to introduce support for a pipeable interface. The main challenge here is keeping the existing interface consistent. As far as I've seen after a brief code review, putObject() (and indeed almost all of the service API methods) returns an instance of Request, so maybe the prototype chain could be extended with the required methods. I wrote a trivial example to demonstrate a possible approach (though it breaks the current interface): it uses a small module which acts as a wrapper.
Unfortunately we can't really consider any new functionality for this SDK that introduces breaking changes. Also, we would still have to answer questions of how this approach would deal with retryability and multiple passes over the body payload (for signing and other customizations). Without an answer to those concerns, I don't think we would recommend this approach for general usage. That said, it's possible this kind of project might be best suited as a "third-party library" that extends the SDK within the npm ecosystem. If it became a popular library, we could consider it for inclusion in the next major version of the SDK, when we can make breaking changes. How does that sound?
Thanks for the fast reply. Let's separate things. Forget about the module; I just made it to satisfy my own needs (which are partially related to the topic we're discussing here, but it wasn't built specifically for this purpose). Ideally no third-party library should be needed to accomplish this feature. Never break interfaces: 100% agree, this is a must-follow software principle. Note that I was just pointing at a representative example, not a real production-focused implementation, and it should be considered only as a starting point for inspiration. I can see your point about extending the SDK with a sort of plugin or middleware system, but I believe that is simply not required for the feature we're discussing here, since it's a low-level feature and probably too coupled to the internal state logic to be fully externalized. The point I was raising is, indeed, the idea of implementing a stream-like interface. Obviously my knowledge of the internal SDK core is very limited, so what's your analysis of this, and of the possible constraints, from a more low-level point of view? Thanks
Relying on a stream-like interface would break the current contract, since retries are part of the SDK contract (this is just one example). Streams are inherently incompatible with retryability, since they cannot be rewound and replayed. More on this below.
Node.js streams are uni-directional: they have no native concept of seeking or length. Basically they simply read or write data, and that's it. Unfortunately this is not sufficient for the functionality we expose. Seeking is necessary for most of the SDK's concerns, namely signing payloads, checksum validation, and potentially retrying failed requests, since in all of these cases we need to read the stream multiple times (at least twice, in order to sign and then send the payload). Length is also required for payloads sent to Amazon S3. Using bare streams would disable all of the above-mentioned features, which are core features the SDK natively supports.
This isn't quite feasible. Many payloads, especially for interfaces that we want to use streaming operations for (namely S3), can be extremely large (500MB, 1GB, or larger), and it would not be reasonable (or possible) to buffer these streams entirely into memory in order to operate on them. Buffering into memory also defeats the purpose of streaming, i.e., to keep memory usage low. Basically, if we exposed a generic streaming interface, it would come with the caveat that many of the core SDK features would be turned off for this stream, including but not limited to signature version 4 signing.
Effectively, the analysis has always been that Node.js streams are an insufficient abstraction to support our use case. We need to support seekable streams (something we use in all other SDKs) so that we can read a payload multiple times without buffering the entire payload into memory. Reading the payload multiple times is necessary for signing, generating extra checksums (a second read pass over the input), sending, verifying the response body checksums, and possibly retrying if a request error occurred, which potentially requires another pass at signing the request, and definitely another one at sending it. In total, we may read through a payload as many as 15 times in a single request cycle (in the case of 3 retries); this is simply not possible with the standard Node.js stream interface. We work around this for file streams (fs.createReadStream) via a hack that allows us to re-open the stream, but this approach is not applicable to the general case, so we would never be able to promote this as a generic interface. Hope that explains some things!
Absolutely!
You're totally right, it's simply non-viable. Indeed, V8's maximum allocated memory is limited to 1 GB.
In node/io.js there are duplex streams, so technically they're bi-directional if we consider them as a whole abstract data type. For instance, network sockets use both directions.
That's true, but you can discover the byte length via the … To simplify things a bit, I think that currently we have: … And the idea is to have something like: … Thank you for the reply! PS: To be honest, I feel the potential benefit of this feature doesn't justify the time invested discussing and thinking about it, but it's definitely interesting to dig into.
Hi, +1 for adding piping support. I, too, was excited at first to see the upload method taking readable streams as parameters, and immediately changed the implementation of one of our applications to use that. FYI: the application takes a readable stream and pipes it to several writable streams at (virtually) the same time, writing to files, to S3, and to other destinations (all implemented as Transform streams). Anyway, as you probably guessed, stuff didn't work as expected, since the aws-sdk internally consumes the buffers (you are adding a …). However, …
I am sorry to say, but I have to call bull$hit on this :-) This functionality is exactly what Transform streams can be used for, after all. We've been doing this with "uploadPart" internally so far (buffer the part you want to retry / sign / etc.). Piping streams is one of the most powerful and useful features node / iojs have to offer; I reckon all aws-sdk-js users would benefit greatly if this were supported. Cheerio!
Btw: a quick workaround for piping a readable stream into several writables, while also handing a readable stream to the "upload" method, is to pipe the original readable stream into another readable (Transform) stream, then hand that reference to the upload method (e.g. an instance of the stream.PassThrough class):
This way the buffer of s3ref will get consumed w/o interfering with the original readable stream.
@achselschweisz Your workaround works like a charm. I am using it to upload from one host (with the …). I am not sure I understand the necessity of piping the readable stream into the PassThrough, though:

```js
request.get('somehostname')
  .on('response', response => upload({Bucket: bucket, Key: 'test-hub/UnityHubSetup.exe', Body: response}))
```
@Mic75 Sorry it's been some time since your question:
So basically, by using your own stream (or the pass-through), which you can pipe into the aws-sdk, you still have control over it despite the fact that the stream on the aws-sdk's end is being consumed.
Greetings! We're closing this issue because it has been open a long time, hasn't been updated in a while, and may not be getting the attention it deserves. We encourage you to check whether this is still an issue in the latest release; if you find that it is still a problem, please feel free to comment or open a new issue.
I don't understand why this isn't supported yet. Streams today are to node/io.js what classes are to C++.
It's especially useful for S3:
Are there definitely no plans to support it?