S3 write timeout (Ruby 2.0) #241
Comments
Actually it turns out that the failure is somewhat intermittent. After leaving my computer idle for a while, the above script actually works a few times before starting to fail again. Once it starts failing, it fails pretty consistently. Note that it always succeeds on Ruby 1.9, so this is definitely something 2.0-related.

Is this the entire script, or are you doing anything else (threading perhaps)?

Hi Trevor, that's the entire script. This is on a Linux system, if that's significant. I'll try to replicate it on another box when I get some time, to see if it's a local oddity.
OK, so I can replicate this on an EC2 Ubuntu Server 12.04 instance set up as follows:
It doesn't happen as frequently as it does on my local box, but it does happen. Most times it happens when I rerun after leaving the box idle for a few minutes. Edit: now I've managed to get it into a state where it's happening pretty much every time I run it.
I've been receiving this error a lot; it turned out for me it was threading (too many concurrent CPU-heavy jobs in my worker queue). I do wonder if this may have had a hand in some of the errors I logged whilst only having one concurrent job queue (one thread per process), but they certainly weren't as common as you've mentioned. I'll be doing a big upload again very soon and will carefully monitor the failures.

One of the differences I can see between some of my code and yours is that I'm passing in my own IO object (built on top of StringIO with some convenience methods to implement tempfiles if a path is requested). You could attempt to read the file into memory as a string, if space allows, and pass that in, which would remove any potential issues caused by the file from the equation. Everything else in your case is vanilla, so if it's not your file, it likely is the library. G
The error still occurs even if you write a simple string instead of a file, so it doesn't seem to be related to the file being uploaded at all. |
Also experiencing intermittent failures. Currently just using the following workaround in place of calls to S3Object's
@ampedandwired The error message you are getting,

You mentioned you were able to reproduce this issue using strings (and not just files). It also does not appear to be threading-related (your example does not create multiple threads). I'm not able to reproduce this issue locally. It would be helpful if you could enable some detailed logging so I can get more insight into what is going on. Can you try this:

```ruby
# when creating s3, enable a few loggers
s3 = AWS::S3.new(:logger => Logger.new($stdout), :http_wire_trace => true)
```

I'm hoping this will help shed some light on what is happening so we can get this fixed. Thanks!
Here's a full trace from two consecutive tests, the first one succeeded the second did not:
The code being run is as follows:

```ruby
require 'aws'

s3 = AWS::S3.new(:logger => Logger.new($stdout), :http_wire_trace => true)
emr = AWS::EMR.new
bucket_name = 'xxxxx'

puts "Bucket exists: #{s3.buckets[bucket_name].exists?}"
emr.job_flows.each { |jf| puts jf.name }
puts "Uploading file"
s3.buckets[bucket_name].objects['foo'].write("I'm a little teapot")
```
In terms of replicating the problem, did you try running it on an EC2 Ubuntu instance? I was able to replicate the problem there on a clean box (using the setup documented in my comment above), so hopefully you'll be able to duplicate my results that way.

I've just spun up a clean EC2 Ubuntu instance and I'm going to try to replicate the issue again.
I'm still unable to reproduce the issue. Here are the exact steps I took:

```ruby
require 'aws'

s3 = AWS::S3.new(:logger => Logger.new($stdout), :http_wire_trace => true)
emr = AWS::EMR.new
bucket_name = 'aws-sdk'

puts "Bucket exists: #{s3.buckets[bucket_name].exists?}"
emr.job_flows.each { |jf| puts jf.name }
puts "Uploading file"
s3.buckets[bucket_name].objects['foo'].write("I'm a little teapot")
```

I've run this 50+ times with a few pauses. I will come back to this after I've let the instance idle.
We also ran into this problem after upgrading to ruby-2.0.0-p0. We saw a significant increase in S3 upload retries immediately after the upgrade. It doesn't seem to be related to the data, as our retries eventually succeed. This test case will fail for me sometimes on ruby-2.0:

```ruby
require 'aws'

s3 = AWS::S3.new(
  :access_key_id => ENV['AWSID'],
  :secret_access_key => ENV['AWSKEY'],
)
bucket = ENV['BUCKET']
pid = Process.pid
size = 1024 * 1024 * 25

0.upto(100).each do |i|
  name = "#{pid}.#{i}"
  puts "generating #{name}"
  File.open('/dev/urandom', 'rb') do |urand|
    File.open("/tmp/#{name}", 'wb') do |out|
      out.write(urand.read(size))
    end
  end
  puts "uploading #{name}"
  File.open("/tmp/#{name}", 'rb') do |f|
    s3.buckets[bucket].objects[name].write(f,
      :multipart_threshold => 1024 * 1024 * 20,
    )
  end
  puts "finished #{name}"
end
```
Since this seems to be specific to Ruby 2.0, I wonder if those who are experiencing this issue on p0 can also reproduce it on 2.0.0-p195, which was released last week. There may have been some related changes that could have fixed this issue.
Some further notes:

I wonder what regions everybody else is using?
Hi guys. Not sure if this is related, or a separate issue: thoughtbot/paperclip#751 (comment) Those of us using Paperclip have been encountering timeouts caused by the aws-sdk gem after version 1.6.2. This is a major issue, and one that's been around for some time. I don't know if there has been any coordination on fixing it yet.

@uberllama Thank you for looping us in. I'll chime in on the Paperclip thread. After a quick scan of the other issue, they do not appear to be related, but I could be wrong.
I think I got the same issue, my test case is to upload a bunch (~40) of smaller files:
I had the error locally as well, but increasing the timeouts using

helped. The curious thing is that on Heroku the error clearly comes before the timeout runs out. I'll now test what happens if I exchange aws-sdk with some other S3 upload gem (fog).
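The exact timeout options were lost in this thread's formatting. In the v1 aws-sdk API, the relevant settings are `:http_open_timeout` and `:http_read_timeout`; a sketch with illustrative values:

```ruby
require 'aws'

# Illustrative values only; tune for your workload.
AWS.config(
  :http_open_timeout => 15,    # seconds to wait when opening a connection
  :http_read_timeout => 300    # seconds to wait for response data to arrive
)
```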
OK, if I exchange aws-sdk with fog, it works for me on Heroku + Ruby 2.0.0 as well.
We're seeing this bug as well; it's causing some of our backup scripts to fail on occasion. Like a few others have mentioned above, this is a script that has run without incident for a few years, but started failing in the past week. The failures coincide with our move to Ruby 2.0. (Also: though we use Paperclip elsewhere, this script has nothing to do with Paperclip.)

The timeout errors are sporadic, and during a given backup, one or more files may be successfully copied while a later file fails to copy due to the timeout error. They seem to happen when there are multiple files being transferred and a particular file in the succession of writes happens to take a bit longer. I've done some logging during these writes and, though my sample size is a bit small, it appears that writes where the connection is open for 20 seconds are failing, even though our timeout thresholds are well above that. Some sample output from the script with instrumentation for debugging is included below. Notable is this section:

which is where the timeout error occurs. So far (admittedly, in a small sample of trials), this same script works fine as long as the elapsed time stays under 20 seconds. I'll do some more experimenting and post an update if we glean new info, but I'm wondering if there's some reason, in Ruby 2.0, that the http_read_timeout (or is there another setting in play when doing writes?) is actually 20 seconds, even though it is configured at a value well above that. Here's the output:
Experiencing this on Ruby 2.0.0p195. Same symptoms: intermittent massive write timeout.

I'm also experiencing this issue on ruby 2.0.0p195. It was working fine on 1.9.3p194. ruby 2.0.0p195 (2013-05-14 revision 40734) [i686-linux]
@frogandcode suggested trying to establish a new connection per upload, so I switched our backup to use the method below. We haven't noticed any issues so far, and it appears some writes may even be faster than before. I wouldn't consider it the ultimate solution, but it might be helpful if you're writing important backups.

```ruby
def write(name, file)
  @bucket.objects[name].write(file)
ensure
  # Ensure the HTTP pool is emptied after each write.
  AWS.config.http_handler.pool.empty!
end
```
+1 Today I deployed a ruby 2.0.0p195 upgrade and immediately started encountering this issue with aws-sdk 1.11.3. It is an intermittent error, but I do so many S3 writes that I see it every few minutes.
I've reverted to the following nested retries until we can get a fix. Retrying 4 times catches 100% of the failures (~20k uploads per day):

```ruby
# Dirty dirty dirty (S3Bug)..
begin
  tempobject.rewind
  obj = s3_bucket.objects[uid].write(tempobject)
rescue Exception => e
  puts "Upload Failed once.."
  begin
    tempobject.rewind
    obj = s3_bucket.objects[uid].write(tempobject)
  rescue Exception => e
    puts "Upload Failed twice.."
    begin
      tempobject.rewind
      obj = s3_bucket.objects[uid].write(tempobject)
    rescue Exception => e
      puts "Upload Failed three times.."
      tempobject.rewind
      obj = s3_bucket.objects[uid].write(tempobject)
    end
  end
end
```
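The nested rescues above can be flattened into a single bounded retry loop. A minimal sketch (not from this thread; `upload_with_retries` is an invented name, and the block stands in for the `s3_bucket.objects[uid].write` call):

```ruby
# Retry an upload of a rewindable IO up to `attempts` times.
# Rewinding before every attempt matters: a partially-consumed stream is
# exactly what makes the broken Net::HTTP retry hang (see comments above).
def upload_with_retries(io, attempts: 4)
  tries = 0
  begin
    tries += 1
    io.rewind
    yield io
  rescue StandardError
    retry if tries < attempts
    raise
  end
end
```

Usage would look like `upload_with_retries(tempobject) { |io| s3_bucket.objects[uid].write(io) }`.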
@thinkgareth Could you try testing out the master branch and see if that resolves your issue (without the nested 4x retry block)? I believe it might.
I have exactly the same issue here. I tried all the different methods (extending timeouts, emptying the HTTP pool, etc.), and it seems that using File or Pathname instead of an IO stream may help a bit; the error rate decreases. However, no matter what I did, the intermittent errors never disappeared. I am using Heroku, Ruby 2.0, and the latest Sinatra, Unicorn, and AWS SDK.
@cdunn I'm reading through the code where request_transport is retried. There appears to be a bug in their retry logic: when they call req.exec the second time, no attempt is made to rewind the body stream first. Simply put, the first request succeeds in sending some bytes, but not all of them, before it encounters an error. Net::HTTP decides to retry the request but fails to rewind the body stream, causing fewer bytes to be sent than indicated by the Content-Length header. S3 keeps waiting for the additional bytes, never gets them, and then fails the request. The SDK's retry logic always rewinds the body stream before it attempts to resend a request; Net::HTTP makes no such attempt. I'm going to try to file this as a bug report and see if we can't get this fixed.
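The failure mode described above is easy to demonstrate with a plain StringIO standing in for the request body (a toy illustration, not SDK code):

```ruby
require 'stringio'

body = StringIO.new("I'm a little teapot")

# First attempt: Net::HTTP reads and sends the whole body.
first_attempt = body.read

# A naive retry that re-reads the same stream without rewinding gets
# nothing, while the Content-Length header still promises the full body;
# S3 waits for the missing bytes and eventually times the request out.
broken_retry = body.read

# Rewinding first, as the SDK's own retry logic does, sends the full body.
body.rewind
correct_retry = body.read
```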
The workaround proposed by @cdunn works for me to remove the timeout, but I still get tons of retries. They most often come from the EOF error. Oddly enough, running on a single thread produces the most retries, while they decrease as I ramp up to 5 threads. Seems like this would point to it not being congestion-related...

I ended up using the 'fog' gem to upload to S3, and I get no more timeouts or retries.
@pickerflicker Ruby's Net::HTTP changed between 1.9.3 and 2.0 in a way that causes this error to occur. In 1.9.3, when the connection fails or times out, the SDK retries the request. In Ruby 2.0.0, Net::HTTP attempts to retry the failed request itself, but in a broken way: it retries the PUT request without first rewinding the body stream. This is okay if the request payload is a string, but the Ruby SDK uses IO objects for streaming requests. Users of 1.9.3 are likely not reporting issues because their requests succeed when the SDK retries.

The timeout issue likely happens (this I have difficulty verifying) when S3 closes a long-running HTTP session. This may be due to congestion or latent network issues; it's hard to say. You are not seeing this issue with fog for probably two reasons. First, fog uses Excon for HTTP requests, not Net::HTTP. Second, it does not use persistent HTTP connections by default; a new connection is established (and closed) for each request. You might experience the infrequent delay of S3 closing a connection if you enable persistent connections, but you likely would not experience the second issue, as Excon would not retry your request without rewinding the stream.

There are a number of possible workarounds. I'm trying to get a bug report and fix put together for Net::HTTP. Until then, you can disable retries of PUT requests, following @cdunn's suggestion:

```ruby
Net::HTTP::IDEMPOTENT_METHODS_.delete("PUT")
```

Also, it would be possible to put together a simple Excon-based handler.
Thanks for clarifying @trevorrowe. I wasn't exactly sure why the fog gem worked. I'm using Ruby 2.1.0. My original issue was not an actual error, since the second PUT request would always be quick and successful. My issue was that the occasional 20s timeout is just too long and ruins the user experience. I've tried adding the Net::HTTP workaround to my Rails project initializers, but I still saw these occasional timeouts.
@pickerflicker Here is a sample http handler that uses Excon:

```ruby
require 'excon'

class ExconHandler
  def handle(req, resp, &read_block)
    options = {
      :method => req.http_method.downcase.to_sym,
      :path => req.path,
      :query => req.querystring,
      :headers => req.headers,
      :body => req.body_stream,
    }
    options[:response_block] = read_block if block_given?
    connection = Excon.new(req.endpoint)
    excon_resp = connection.request(options)
    resp.status = excon_resp.status.to_i
    resp.headers = excon_resp.headers
    resp.body = excon_resp.body unless block_given?
  end
end

AWS.config(http_handler: ExconHandler.new)
```

I would be interested to know if using this resolves your issue. There are a few features missing from this implementation, but they could all be addressed without much effort.
@trevorrowe Sorry, there's no good way for me to try this out now. This timeout issue didn't show up in QA (probably not enough requests) and I can't experiment on production.

@pickerflicker What sort of frequency were you seeing the timeouts + retries in production previously?

@trevorrowe The timeouts would occur at least once every 2 hours. I would see a long POST request take a little over 20 seconds in New Relic. These requests all had the same signature: a first PUT request timing out after 20 seconds, followed by one retry which was always successful and quick.
The problem isn't solved yet; please let us know the progress.
We've been able to resolve these issues in the V2 Ruby SDK by patching Net::HTTP. These patches are:

You can see the patches in the v2 SDK here:

@mikhailov If you are interested in testing a patch, I'd be willing to backport these into a branch of the v1 Ruby SDK.
@trevorrowe that's great, we can test the patch, thanks! A question about s3_endpoint: we still use it, but it seems to be deprecated in favor of region. Can we still use the s3_endpoint syntax with aws-sdk-ruby v2?
@mikhailov you can use |
Net::HTTP has a few bugs regarding Expect-100-continue and how it retries idempotent HTTP requests. This commit enables two patches that address these issues. Without these patches, the user may experience S3 timeout issues. Unfortunately, the built-in retry logic is buggy: when *some* of the body is sent before a network failure, it fails to rewind the body before sending the retry. This results in a partial body of fewer than "Content-Length" bytes. S3 eventually times out the request waiting for all of the bytes. This commit makes two major changes to avoid this scenario:

* It enables the Expect-100-continue HTTP logic BEFORE sending bytes to Amazon S3. This eliminates most of the common failure scenarios where bytes are sent and the request is broken by the remote end (especially for auth errors).
* It patches the Net::HTTP transport method to rewind the request body stream, if the body responds to #rewind, before attempting a retry.

Previously, the 100-continue behavior and the second patch were opt-in, but this is problematic enough for Ruby 2 users that this commit enables them by default. See #241
@mikhailov I've pushed a branch, s3-write-timeout-patch, which backports some changes from the V2 SDK into this repo. Please feel free to give this a spin and let me know if it resolves the issue you are experiencing.

Thanks for your hard work over such a long period of time on this! Very exciting.
👏 |
I should add a caveat that the fix only applies for Ruby versions 1.9.3+. Ruby 1.8.7 and Ruby 1.9.2 do not support HTTP Expect-100-continue and will therefore continue to have their connections closed by Amazon S3 under certain circumstances.

Ruby 1.8.7 is retired, and 1.9.3 will be EOL soon (support for Ruby version 1.9.3 ends on February 23, 2015). 🎆 🚢 👍
For some reason aws-sdk is throwing timeout errors, and this was a known problem (aws/aws-sdk-ruby#241) which is currently supposed to be fixed. Anyway, for some reason this happens only with Tempfiles that were changed by ImageMagick; I tried calling #fsync but it didn't work. So, to avoid this bug, we simply reopen the tempfile to get a fresh file descriptor again.
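The reopen trick described above looks roughly like this (a sketch; the S3 upload call itself is omitted, and the file contents are a placeholder):

```ruby
require 'tempfile'

tmp = Tempfile.new(['image', '.png'])
tmp.write('bytes written by an external tool')  # stand-in for ImageMagick output
tmp.flush

# Reopen by path to get a fresh file descriptor, instead of handing the
# original (possibly stale) Tempfile handle to the SDK's write call.
fresh = File.open(tmp.path, 'rb')
data = fresh.read   # in real code: s3_object.write(fresh)
fresh.close
tmp.close!
```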
This is an odd one. The following code generates a timeout for me when run with ruby 2.0.0-p0, but it's OK with 1.9.3-p385. Using aws-sdk 1.9.2.

The calls to `bucket.exists?` and `job_flows.each` are significant: if you remove either of these, the write succeeds. The error generated: