This repository has been archived by the owner on Jul 19, 2024. It is now read-only.

SDK v10: should support chunked encoding / unknown content length #336

Closed
rocketraman opened this issue Jul 5, 2018 · 5 comments

Comments

@rocketraman

rocketraman commented Jul 5, 2018

I've been using SDK v10, and aside from a dependency issue that required a local build from source, I have been very pleased. I've been able to integrate Kotlin coroutines with the RxJava Flowable&lt;ByteBuffer&gt; in SDK v10 to create a completely async flow of bytes end-to-end through a system.
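To illustrate the kind of end-to-end async flow described above without pulling in RxJava, here is a minimal sketch using the JDK 9 `java.util.concurrent.Flow` API as a stand-in for the SDK's `Flowable<ByteBuffer>`. The class name `AsyncByteFlow` and the sample data are hypothetical; the point is that buffers are published as they become available, with no up-front total length.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

public class AsyncByteFlow {
    public static void main(String[] args) throws Exception {
        // Stand-in for the SDK's Flowable<ByteBuffer>: bytes are pushed
        // downstream as they are produced, with no known total length.
        SubmissionPublisher<ByteBuffer> publisher = new SubmissionPublisher<>();
        CountDownLatch done = new CountDownLatch(1);
        StringBuilder received = new StringBuilder();

        publisher.subscribe(new Flow.Subscriber<ByteBuffer>() {
            private Flow.Subscription subscription;
            public void onSubscribe(Flow.Subscription s) {
                subscription = s;
                s.request(1); // backpressure: request one buffer at a time
            }
            public void onNext(ByteBuffer buf) {
                received.append(StandardCharsets.UTF_8.decode(buf));
                subscription.request(1);
            }
            public void onError(Throwable t) { done.countDown(); }
            public void onComplete() { done.countDown(); }
        });

        for (String part : new String[]{"hello ", "streamed ", "world"}) {
            publisher.submit(ByteBuffer.wrap(part.getBytes(StandardCharsets.UTF_8)));
        }
        publisher.close();
        done.await();
        System.out.println(received); // hello streamed world
    }
}
```

A consumer like this never needs the total byte count; the gap discussed below is that the upload API, unlike this subscriber, does.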

However, I've noticed what seems to be a pretty big gap: when doing async flows like this, it is quite likely that on a PUT / upload, the Content-Length of the data is not known in advance -- the bytes are written to the wire as they are created.

Currently, the length parameter (which sets the Content-Length header) is a required field on the upload API call. That means the data being uploaded must be buffered in full before the call can be made, which loses most of the scalability benefit of a completely async flow.
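To make the cost concrete, here is a hypothetical sketch of what a caller is forced to do today: drain the entire stream into memory just to learn the length. The `drain` helper and class name are illustrative, not part of the SDK.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class BufferForLength {
    // Because the upload API requires a length up front, the caller must
    // collect the whole stream into memory before uploading anything.
    static byte[] drain(List<ByteBuffer> buffers) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (ByteBuffer buf : buffers) {
            byte[] chunk = new byte[buf.remaining()];
            buf.get(chunk);
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        List<ByteBuffer> stream = List.of(
            ByteBuffer.wrap("part1-".getBytes(StandardCharsets.UTF_8)),
            ByteBuffer.wrap("part2".getBytes(StandardCharsets.UTF_8)));
        byte[] body = drain(stream);
        // Only now is the Content-Length known -- the entire body sat in memory.
        System.out.println(body.length); // 11
    }
}
```

For a multi-gigabyte upload this buffering defeats the purpose of streaming in the first place.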

It seems like this is a limitation of the underlying Azure Storage REST API rather than of the Java SDK (https://docs.microsoft.com/en-us/rest/api/storageservices/put-blob shows Content-Length as a required field), but I really think this needs to be improved. The API should support upload with chunked transfer encoding.

@rickle-msft
Contributor

Hi, @rocketraman. Thank you so much for your continued feedback! We are very pleased to hear that you have had success in building async workflows! :)

You are correct that this is unfortunately a limitation of the service itself, which does not currently accept chunked encoding. However, there is discussion about adding an uploadFromStream API, which should not require the length to be known a priori. You can see the implementation in the Go SDK here for reference. We will likely try to follow a similar design/interface/pattern, but we have not yet started the development or design for this feature as it would be implemented with Rx.

We will surely take your feedback into consideration when prioritizing our upcoming features for this library.

@rickle-msft
Contributor

We discussed this a bit more, and I think I misunderstood a bit. Even with this uploadFromStream API, we would have to buffer at least a block's worth of data to set the Content-Length header. This API would just abstract that detail away from the user. Server-side support is needed in order to avoid buffering altogether.
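The block-at-a-time buffering described above can be sketched roughly as follows: incoming ByteBuffers of arbitrary size are re-chunked into fixed-size blocks, so that each block's Content-Length is known when it is staged (as a PutBlock-style call would require). The `BlockChunker` class and its methods are hypothetical, not SDK code.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class BlockChunker {
    // Re-chunk a stream of arbitrarily sized ByteBuffers into fixed-size
    // blocks; each emitted block has a known length and could be sent with
    // its own PutBlock-style request.
    static List<byte[]> toBlocks(List<ByteBuffer> input, int blockSize) {
        List<byte[]> blocks = new ArrayList<>();
        ByteBuffer staging = ByteBuffer.allocate(blockSize);
        for (ByteBuffer buf : input) {
            while (buf.hasRemaining()) {
                staging.put(buf.get());
                if (!staging.hasRemaining()) {
                    blocks.add(flush(staging)); // staging block is full
                }
            }
        }
        if (staging.position() > 0) {
            blocks.add(flush(staging)); // final partial block
        }
        return blocks;
    }

    static byte[] flush(ByteBuffer staging) {
        staging.flip();
        byte[] block = new byte[staging.remaining()];
        staging.get(block);
        staging.clear();
        return block;
    }

    public static void main(String[] args) {
        List<ByteBuffer> stream = List.of(
            ByteBuffer.wrap(new byte[3]),
            ByteBuffer.wrap(new byte[5]));
        List<byte[]> blocks = toBlocks(stream, 4);
        System.out.println(blocks.size());        // 2
        System.out.println(blocks.get(0).length); // 4
        System.out.println(blocks.get(1).length); // 4
    }
}
```

Note the `staging` buffer is exactly the memory cost being discussed: one block's worth of data held back from the wire at all times.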

@rocketraman
Author

rocketraman commented Jul 6, 2018

@rickle-msft It looks like the Go SDK uses the PutBlock API to do this -- I suppose the Java SDK could likewise take each ByteBuffer from the Rx Flowable and use PutBlock to send it to the server. However, this seems horribly inefficient in comparison to a single PUT with chunked transfer encoding. Furthermore, if you wish to reduce the number of PutBlock calls by sending larger blocks, then you do have to buffer inside the SDK, as you said -- so now you have a lib which uses more memory and is also more complex to maintain.

Given you've closed this issue, I'm guessing your team has decided to wait until server-side support is available for a chunked transfer encoding, at which point the existing v10 API (less the length parameter) will be completely sufficient, as each ByteBuffer obtained from the Flowable can simply be sent as a chunk. Is that right?

FYI: I created this Azure storage feedback item: https://feedback.azure.com/forums/217298-storage/suggestions/34758091-support-chunked-transfer-encoding-on-blob-put. I'd appreciate a voice of support for this internally. Thank you.

@rocketraman
Author

And also: should this issue be left open as a reminder / placeholder / known issue for future searchers, until chunked transfer encoding server-side support is available, and this can be implemented?

@rickle-msft
Contributor

@rocketraman You are correct about why I closed the issue. Apologies if that was a bit abrupt. I am not sure when the team that builds the service has plans to support chunked encoding, but I have passed on the message to the team that makes such decisions and upvoted your feature request.

You are also correct that this will be the target design for our support of this feature: content streaming without the need to know the Content-Length first.

I don't think we need to keep this issue open because it's not currently an issue this library has any means of resolving, and I can direct anyone else looking for chunked-encoding support to this discussion. As soon as the service does support chunked encoding, we will open another work item to support it and track it that way.

Thank you again for your feedback and your interest in the new library. I will do my best to keep you posted on any news regarding this feature as it comes to me.
