In my open source project I'm using Aws::S3::Object#get with a block to stream the S3 object on-demand:
chunks = object.enum_for(:get, **options)
chunks #=> #<Enumerator>
chunks.next # first chunk of data
chunks.next # next chunk of data
I'm wrapping this functionality into a custom IO-like object, so that people can stream using the IO#read semantics, which under the hood retrieves subsequent chunks of data when needed.
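To make the wrapper idea concrete, here is a minimal sketch of such an IO-like object. The `ChunkedIO` class and its `chunks:`/`size:` parameters are hypothetical names used for illustration, not part of aws-sdk-ruby. It only assumes an Enumerator that yields String chunks, like the one returned by `enum_for(:get)` above.

```ruby
# Hypothetical IO-like wrapper around an Enumerator of String chunks.
# Not part of aws-sdk-ruby; a sketch of the approach described above.
class ChunkedIO
  # Total size in bytes, if the caller knew it up front (may be nil).
  attr_reader :size

  def initialize(chunks:, size: nil)
    @chunks = chunks
    @size   = size
    @buffer = String.new(encoding: Encoding::BINARY)
    @eof    = false
  end

  # Follows IO#read semantics: without a length it returns all remaining
  # data ("" at EOF); with a length it returns up to that many bytes,
  # or nil once the data is exhausted.
  def read(length = nil)
    # Pull chunks only until the request can be satisfied.
    fetch_chunk until @eof || (length && @buffer.bytesize >= length)

    return @buffer.slice!(0, @buffer.bytesize) if length.nil?
    return nil if @buffer.empty? && length > 0

    @buffer.slice!(0, length)
  end

  private

  # Appends the next chunk to the buffer, or marks EOF when none remain.
  def fetch_chunk
    @buffer << @chunks.next.b
  rescue StopIteration
    @eof = true
  end
end

# Stand-in for the S3 chunk enumerator, for demonstration only.
io = ChunkedIO.new(chunks: ["hello ", "wor", "ld"].each, size: 11)
io.read(4) # first four bytes, fetched from the first chunk only
io.read    # everything that remains
```

With the SDK, the enumerator would presumably be `object.enum_for(:get)` and `size:` would come from `object.content_length`, which is the call that currently costs the extra HEAD request.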
I'd like the content length to be known at the start of streaming, so that it can be returned when the user calls #size. Currently I have to make an additional #head_object request to retrieve Aws::S3::Object#content_length. However, the #get_object request that Object#get makes already receives the response headers before it starts streaming the body, and I think those headers hold the same information as the response headers of a #head_object call.
So it would be convenient if Object#content_length, #content_type and the other metadata were filled in by Object#get as soon as the response headers are received (before the response body is read). That way I could avoid the additional HEAD request to S3 for this use case. It only retrieves data that we technically already have, but which currently becomes available only once the whole S3 object has been downloaded.
In code, I would like this behaviour:
object.get do |chunk|
  # this currently makes an additional HEAD request, but doesn't have to,
  # because it could fill this info in from the response headers of the
  # `#get_object` call that we already retrieved at this point
  object.content_length
end
I think it should be possible to somehow use the notification about response headers that Seahorse transmits:
resp.signal_headers(status_code, headers)
Source: aws-sdk-ruby/gems/aws-sdk-core/lib/seahorse/client/net_http/handler.rb, line 80 in e5ef638