Feature request: Retrieving data from response headers when streaming S3 objects #1913

@janko

In my open source project I'm using Aws::S3::Object#get with a block to stream the S3 object on-demand:

chunks = object.enum_for(:get, **options)
chunks #=> #<Enumerator>
chunks.next # first chunk of data
chunks.next # next chunk of data

I'm wrapping this functionality into a custom IO-like object, so that people can stream using IO#read semantics; under the hood it retrieves subsequent chunks of data when needed.
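To make the wrapper idea concrete, here is a minimal sketch of such an IO-like object built on top of a chunk enumerator. The class name `ChunkedReader` and its `size:` argument are hypothetical and only for illustration; this is not the actual implementation from my project, and it uses a plain array enumerator in place of the S3 one so that it runs without the SDK.

```ruby
# Hypothetical IO-like wrapper around an Enumerator of String chunks.
# The name and API are illustrative assumptions, not part of aws-sdk.
class ChunkedReader
  # size is the content length, if it is known before streaming starts
  attr_reader :size

  def initialize(chunks, size: nil)
    @chunks = chunks      # Enumerator yielding String chunks
    @size   = size
    @buffer = String.new  # bytes fetched but not yet returned
  end

  # Mimics IO#read: with no length, returns everything that is left;
  # with a length, returns up to that many bytes, or nil at EOF.
  def read(length = nil)
    if length.nil?
      loop { @buffer << @chunks.next } # loop stops on StopIteration
      data = @buffer
      @buffer = String.new
      return data
    end

    # Fetch chunks lazily until the request can be satisfied
    while @buffer.bytesize < length
      begin
        @buffer << @chunks.next
      rescue StopIteration
        break
      end
    end

    return nil if @buffer.empty? && length > 0

    data    = @buffer.byteslice(0, length)
    @buffer = @buffer.byteslice(length, @buffer.bytesize).to_s
    data
  end
end

# Stand-in for `object.enum_for(:get, **options)`
chunks = ["hello ", "wor", "ld"].each
io = ChunkedReader.new(chunks, size: 11)
```

The `size:` value is exactly what currently has to come from a separate request, because nothing in the enumerator exposes it.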

I'd like the content length to be known when streaming starts, so that it can be returned when the user calls #size. Currently I have to make an additional #head_object request to retrieve Aws::S3::Object#content_length. However, the #get_object request that Object#get makes already receives the response headers before it starts streaming, and I think they hold the same information as the response headers of a #head_object call.

So it would be convenient if Object#content_length, #content_type and other data were filled in by Object#get as soon as the response headers are received (before the response body is read). That way I could avoid the additional HEAD request to S3 for this use case. That request only retrieves data that we technically already have, but which currently becomes available only once the whole S3 object has been downloaded.

In code, I would like this behaviour:

object.get do |chunk|
  # this currently makes an additional HEAD request, but doesn't have to,
  # because it could fill this info in from the response headers of the
  # `#get_object` call that we already retrieved at this point
  object.content_length
end

I think it should be possible to somehow use the notification about response headers that Seahorse transmits:

resp.signal_headers(status_code, headers)
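As a rough sketch of the idea, the snippet below captures the header values in a listener that fires before the first body chunk is yielded. `SketchResponse` is a stand-in I wrote so the example runs without the SDK; it only imitates the listener pattern and is not the Seahorse class. `ObjectInfo` and `stream_with_info` are hypothetical names as well, and the real wiring inside the SDK may need to look different.

```ruby
# Stand-in for a response object that notifies listeners.
# This is NOT the SDK's class, only an imitation of the pattern.
class SketchResponse
  def initialize
    @listeners = { headers: [], data: [] }
  end

  def on_headers(&block)
    @listeners[:headers] << block
  end

  def on_data(&block)
    @listeners[:data] << block
  end

  # Called once, when the response headers have been received
  def signal_headers(status_code, headers)
    @listeners[:headers].each { |l| l.call(status_code, headers) }
  end

  # Called for every chunk of the response body
  def signal_data(chunk)
    @listeners[:data].each { |l| l.call(chunk) }
  end
end

# Hypothetical holder for the data that Object#content_length and
# Object#content_type would expose.
ObjectInfo = Struct.new(:content_length, :content_type)

# Fills in `info` from the headers, then yields each chunk along with it.
def stream_with_info(resp, info, &block)
  resp.on_headers do |_status_code, headers|
    info.content_length = Integer(headers["content-length"])
    info.content_type   = headers["content-type"]
  end
  resp.on_data { |chunk| block.call(chunk, info) }
end
```

Because the headers are signalled before any data, the block already sees a populated `info` on the first chunk, with no HEAD request involved.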
