This repository has been archived by the owner on Jul 19, 2024. It is now read-only.

SDK v10: should support chunked encoding / unknown content length #336

Closed
rocketraman opened this issue Jul 5, 2018 · 5 comments

Comments

@rocketraman

rocketraman commented Jul 5, 2018

I've been using SDK v10, and aside from a dependency issue that required a local build from source, I have been very pleased. I've been able to integrate Kotlin coroutines with the RxJava Flowable&lt;ByteBuffer&gt; in SDK v10 to create a completely async flow of bytes end-to-end through a system.
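To illustrate the kind of end-to-end async flow described above without pulling in RxJava, here is a minimal sketch using the JDK 9 `java.util.concurrent.Flow` API as a stand-in for the SDK's `Flowable<ByteBuffer>`. The class name `AsyncByteFlow` and the sample data are hypothetical; the point is that buffers are published as they become available, with no up-front total length.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

public class AsyncByteFlow {
    public static void main(String[] args) throws Exception {
        // Stand-in for the SDK's Flowable<ByteBuffer>: bytes are pushed
        // downstream as they are produced, with no known total length.
        SubmissionPublisher<ByteBuffer> publisher = new SubmissionPublisher<>();
        CountDownLatch done = new CountDownLatch(1);
        StringBuilder received = new StringBuilder();

        publisher.subscribe(new Flow.Subscriber<ByteBuffer>() {
            private Flow.Subscription subscription;
            public void onSubscribe(Flow.Subscription s) {
                subscription = s;
                s.request(1); // backpressure: request one buffer at a time
            }
            public void onNext(ByteBuffer buf) {
                received.append(StandardCharsets.UTF_8.decode(buf));
                subscription.request(1);
            }
            public void onError(Throwable t) { done.countDown(); }
            public void onComplete() { done.countDown(); }
        });

        for (String part : new String[]{"hello ", "streamed ", "world"}) {
            publisher.submit(ByteBuffer.wrap(part.getBytes(StandardCharsets.UTF_8)));
        }
        publisher.close();
        done.await();
        System.out.println(received); // hello streamed world
    }
}
```

A consumer like this never needs the total byte count; the gap discussed below is that the upload API, unlike this subscriber, does.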

However, I've noticed what seems to be a pretty big gap: when doing async flows like this, it is quite likely that on a PUT / upload, the Content-Length of the data is not known in advance -- the bytes are written to the wire as they are created.

Currently, the length parameter (which sets the Content-Length header) is a required field on the upload API call. That means the data being uploaded must be buffered in full before the call can be made, which loses most of the scalability benefit of a completely async flow.
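To make the cost concrete, here is a hypothetical sketch of what a caller is forced to do today: drain the entire stream into memory just to learn the length. The `drain` helper and class name are illustrative, not part of the SDK.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class BufferForLength {
    // Because the upload API requires a length up front, the caller must
    // collect the whole stream into memory before uploading anything.
    static byte[] drain(List<ByteBuffer> buffers) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (ByteBuffer buf : buffers) {
            byte[] chunk = new byte[buf.remaining()];
            buf.get(chunk);
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        List<ByteBuffer> stream = List.of(
            ByteBuffer.wrap("part1-".getBytes(StandardCharsets.UTF_8)),
            ByteBuffer.wrap("part2".getBytes(StandardCharsets.UTF_8)));
        byte[] body = drain(stream);
        // Only now is the Content-Length known -- the entire body sat in memory.
        System.out.println(body.length); // 11
    }
}
```

For a multi-gigabyte upload this buffering defeats the purpose of streaming in the first place.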

It seems like this is a limitation of the underlying Azure Storage REST API rather than of the Java SDK (https://docs.microsoft.com/en-us/rest/api/storageservices/put-blob shows Content-Length as a required field), but I really think this needs to be improved. The API should support upload with chunked transfer encoding.

@rickle-msft
Contributor

Hi, @rocketraman. Thank you so much for your continued feedback! We are very pleased to hear that you have had success in building async workflows! :)

You are correct that this is unfortunately a limitation of the service itself, which does not currently accept chunked encoding. However, there is discussion about adding an uploadFromStream API, which should not require the length to be known a priori. You can see the implementation in the Go SDK here for reference. We will likely try to follow a similar design/interface/pattern, but we have not yet started the development or design for this feature as it would be implemented with Rx.

We will surely take your feedback into consideration when prioritizing our upcoming features for this library.

@rickle-msft
Contributor

We discussed this a bit more, and I think I misunderstood a bit. Even with this uploadFromStream API, we would have to buffer at least a block's worth of data to set the Content-Length header. This API would just abstract that detail away from the user. Server-side support is needed in order to avoid buffering altogether.
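The block-at-a-time buffering described above can be sketched roughly as follows: incoming ByteBuffers of arbitrary size are re-chunked into fixed-size blocks, so that each block's Content-Length is known when it is staged (as a PutBlock-style call would require). The `BlockChunker` class and its methods are hypothetical, not SDK code.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class BlockChunker {
    // Re-chunk a stream of arbitrarily sized ByteBuffers into fixed-size
    // blocks; each emitted block has a known length and could be sent with
    // its own PutBlock-style request.
    static List<byte[]> toBlocks(List<ByteBuffer> input, int blockSize) {
        List<byte[]> blocks = new ArrayList<>();
        ByteBuffer staging = ByteBuffer.allocate(blockSize);
        for (ByteBuffer buf : input) {
            while (buf.hasRemaining()) {
                staging.put(buf.get());
                if (!staging.hasRemaining()) {
                    blocks.add(flush(staging)); // staging block is full
                }
            }
        }
        if (staging.position() > 0) {
            blocks.add(flush(staging)); // final partial block
        }
        return blocks;
    }

    static byte[] flush(ByteBuffer staging) {
        staging.flip();
        byte[] block = new byte[staging.remaining()];
        staging.get(block);
        staging.clear();
        return block;
    }

    public static void main(String[] args) {
        List<ByteBuffer> stream = List.of(
            ByteBuffer.wrap(new byte[3]),
            ByteBuffer.wrap(new byte[5]));
        List<byte[]> blocks = toBlocks(stream, 4);
        System.out.println(blocks.size());        // 2
        System.out.println(blocks.get(0).length); // 4
        System.out.println(blocks.get(1).length); // 4
    }
}
```

Note the `staging` buffer is exactly the memory cost being discussed: one block's worth of data held back from the wire at all times.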

@rocketraman
Author

rocketraman commented Jul 6, 2018

@rickle-msft It looks like the Go SDK uses the PutBlock API to do this -- I suppose the Java SDK could likewise take each ByteBuffer from the Rx Flowable and use PutBlock to send it to the server. However, this seems horribly inefficient in comparison to a single PUT with chunked transfer encoding. Furthermore, if you wish to reduce the number of PutBlock calls by sending larger blocks, then you do have to buffer inside the SDK, as you said -- so now you have a lib which uses more memory and is also more complex to maintain.

Given you've closed this issue, I'm guessing your team has decided to wait until server-side support is available for a chunked transfer encoding, at which point the existing v10 API (less the length parameter) will be completely sufficient, as each ByteBuffer obtained from the Flowable can simply be sent as a chunk. Is that right?

FYI: I created this Azure storage feedback item: https://feedback.azure.com/forums/217298-storage/suggestions/34758091-support-chunked-transfer-encoding-on-blob-put. I'd appreciate a voice of support for this internally. Thank you.

@rocketraman
Author

And also: should this issue be left open as a reminder / placeholder / known issue for future searchers, until chunked transfer encoding server-side support is available, and this can be implemented?

@rickle-msft
Contributor

@rocketraman You are correct about why I closed the issue. Apologies if that was a bit abrupt. I am not sure when the team that builds the service has plans to support chunked encoding, but I have passed on the message to the team that makes such decisions and upvoted your feature request.

You are also correct that this will be the target design for our support of this feature: content streaming without the need to know the Content-Length first.

I don't think we need to keep this issue open because it's not currently an issue this library has any means of resolving, and I can direct anyone else looking for chunked-encoding support to this discussion. As soon as the service does support chunked encoding, we will open another work item to support it and track it that way.

Thank you again for your feedback and your interest in the new library. I will do my best to keep you posted on any news regarding this feature as it comes to me.
