Boto3 S3 StreamingBody().read() reads once and returns nothing after that #564

Closed
awlamb opened this Issue Mar 25, 2016 · 9 comments

7 participants

awlamb commented Mar 25, 2016

>>> a = client.get_object(Bucket='imgtest',Key='testimage1.jpg')
>>> a['Body'].read()
b'...\xadk\xc9,\xda\xe7\xcb\xb7$\x91\xf7\xb3\xd3>\xd5V...'
>>> a['Body'].read()
b''

Complete bytes removed for brevity. I get an object and read it. Then I read it again, but no bytes are returned.

If this stream acts like a normal file IO stream, how can I seek back to the beginning of the stream? seek() does not seem to be a method on the StreamingBody object.

awlamb closed this Mar 25, 2016

awlamb reopened this Mar 25, 2016

kyleknap (Member) commented Mar 25, 2016

The class is described here. We will look into getting this ported over to, or linked from, the boto3 docs.

As the docs note, calling read() with no amount specified reads all of the remaining data, so a second read() call has no bytes left to return and gives back b''.

There is also no seek() on the stream, because the data is streamed directly from the server. The only way we could add a seek() method would be to keep all of the data in memory, which is not a good idea since a body can be gigabytes in size.
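For illustration, here is a minimal sketch of reading the body in bounded pieces instead of all at once. It assumes only that the body exposes read(amt) as described above. io.BytesIO stands in for a real StreamingBody, and iter_body / sha256_of_body are hypothetical helper names. Recent botocore releases may also offer an iter_chunks() method on StreamingBody, so check the version you have.

```python
import hashlib
import io
from typing import BinaryIO, Iterator


def iter_body(body: BinaryIO, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield chunks of at most chunk_size bytes until the stream is exhausted."""
    while True:
        chunk = body.read(chunk_size)
        if not chunk:  # b'' means end of stream, the same b'' a second read() returns
            break
        yield chunk


def sha256_of_body(body: BinaryIO) -> str:
    """Hash a stream chunk by chunk, without holding the whole body in memory."""
    digest = hashlib.sha256()
    for chunk in iter_body(body):
        digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    body = io.BytesIO(b"x" * 200_000)  # stand-in for response['Body']
    print(sha256_of_body(body))
    print(body.read())  # b'': the stream has already been consumed
```

The stream is still read only once. The difference is that memory use is bounded by chunk_size rather than by the object size.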

haizaar commented Oct 27, 2016

Is there any particular reason that this is still an open ticket?

awlamb closed this Oct 27, 2016

danielmorozoff commented Aug 20, 2017

Is there a reason why StreamingBody is not seekable?
This becomes quite problematic when attempting to download portions of large files asynchronously. What is the recommended way to do this?

robehickman commented Dec 5, 2017

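@danielmorozoff get_object supports a Range parameter. HTTP byte ranges are inclusive at both ends, so to fetch chunk_size bytes starting at amount_read, the end offset is amount_read + chunk_size - 1:

client.get_object(Bucket=bucket, Key=key, Range='bytes={}-{}'.format(amount_read, amount_read + chunk_size - 1))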
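Here is a sketch of applying ranged reads to the concurrent case @danielmorozoff describes. It splits the object into inclusive byte ranges and fetches them in parallel. It assumes the object size is known in advance (for example from head_object()['ContentLength']) and that one low-level client can be shared across threads. byte_ranges, fetch_range and download_in_parts are hypothetical names. The parts are joined in memory only to keep the example short.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, List, Tuple


def byte_ranges(size: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Split [0, size) into inclusive (start, end) pairs for HTTP Range headers."""
    return [(start, min(start + chunk_size, size) - 1)
            for start in range(0, size, chunk_size)]


def fetch_range(client: Any, bucket: str, key: str, start: int, end: int) -> bytes:
    # Inclusive range: this requests end - start + 1 bytes.
    resp = client.get_object(Bucket=bucket, Key=key, Range="bytes={}-{}".format(start, end))
    return resp["Body"].read()


def download_in_parts(client: Any, bucket: str, key: str, size: int,
                      chunk_size: int = 8 * 1024 * 1024, workers: int = 4) -> bytes:
    ranges = byte_ranges(size, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, so the parts reassemble correctly.
        parts = pool.map(lambda r: fetch_range(client, bucket, key, *r), ranges)
        return b"".join(parts)
```

Each range is an independent request, so a failed part can be retried on its own. This is also why no seek() is needed on any single body.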

alanjds commented Apr 5, 2018

One way to allow .seek() would be for botocore's StreamingBody to receive a factory that opens the _raw_stream, rather than the already-opened stream. Seeking to 0 would then just mean re-opening the _raw_stream.

See: https://github.com/boto/botocore/blob/master/botocore/response.py#L42
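Here is a rough sketch of that idea outside botocore. It assumes a hypothetical opener(start) callable that returns a fresh binary stream beginning at byte start. ReopeningReader is not a botocore class. It only illustrates that seek() can be implemented by issuing a new request instead of buffering. io.BytesIO stands in for the network stream.

```python
import io
from typing import BinaryIO, Callable, Optional


class ReopeningReader:
    """Read-only stream whose seek() re-opens the source at the new offset.

    opener(start) is a hypothetical factory that must return a fresh binary
    stream positioned at byte `start`. Nothing is buffered, so a seek costs
    one new request rather than memory.
    """

    def __init__(self, opener: Callable[[int], BinaryIO]) -> None:
        self._opener = opener
        self._pos = 0
        self._stream = opener(0)

    def read(self, amt: Optional[int] = None) -> bytes:
        data = self._stream.read() if amt is None else self._stream.read(amt)
        self._pos += len(data)
        return data

    def tell(self) -> int:
        return self._pos

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        # SEEK_END is not supported because it would require knowing the total size.
        if whence == io.SEEK_CUR:
            offset += self._pos
        elif whence != io.SEEK_SET:
            raise io.UnsupportedOperation("only SEEK_SET and SEEK_CUR are supported")
        if offset < 0:
            raise ValueError("negative seek position")
        if offset != self._pos:
            self._stream.close()
            self._stream = self._opener(offset)  # one new "request" per seek
            self._pos = offset
        return self._pos

    def close(self) -> None:
        self._stream.close()
```

Against S3, the opener could be something like lambda start: client.get_object(Bucket=bucket, Key=key, Range='bytes={}-'.format(start))['Body'] (untested here). Seeking to or past the end of the object would likely be rejected by S3 with an invalid-range error rather than returning b''.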

ryanermita commented Aug 17, 2018

Is there any workaround for using seek() with StreamingBody?

ryanermita commented Aug 17, 2018

I solved this by using _raw_stream, as per @alanjds's comment above.
Is this a good solution, or is there a better one?

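import io
# temp_file is the dict returned by client.get_object(...); _raw_stream is a private botocore attribute.
raw_stream = temp_file['Body']._raw_stream.read().decode('utf-8-sig')  # 'utf-8-sig' strips a leading BOM
stream_csv = io.StringIO(raw_stream, newline=None)
stream_csv.seek(0)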

alanjds commented Aug 17, 2018

@ryanermita I was thinking of a way to seek without putting the whole file in memory.

If you have no problem filling memory with the file, a cleaner way is to wrap the result of streaming_body.read() in an io.BytesIO (or decode it first and wrap it in an io.StringIO, since read() returns bytes), then seek that buffer as you are already doing.
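Here is a minimal sketch of that in-memory approach. It assumes the whole object fits comfortably in RAM and is UTF-8 CSV, possibly with a BOM. buffer_body and read_csv_rows are hypothetical helpers, and io.BytesIO stands in for response['Body'].

```python
import csv
import io
from typing import BinaryIO, List


def buffer_body(body: BinaryIO) -> io.BytesIO:
    """Read a whole (non-seekable) body into a seekable in-memory buffer."""
    return io.BytesIO(body.read())


def read_csv_rows(buf: io.BytesIO, encoding: str = "utf-8-sig") -> List[List[str]]:
    """Parse the buffer as CSV; safe to call repeatedly because it rewinds first."""
    buf.seek(0)
    text = io.TextIOWrapper(buf, encoding=encoding, newline="")
    try:
        return list(csv.reader(text))
    finally:
        text.detach()  # keep buf open after the text wrapper is discarded
```

Decoding with utf-8-sig drops a leading BOM. detach() matters because a TextIOWrapper closes its underlying buffer when it is cleaned up.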

ryanermita commented Aug 17, 2018

I will try this one, thank you @alanjds 👍
