
S3 write timeout (Ruby 2.0) #241

Closed
ampedandwired opened this issue Apr 27, 2013 · 80 comments

@ampedandwired

This is an odd one. The following code generates a timeout for me when run with ruby 2.0.0-p0 but it's OK with 1.9.3-p385. Using aws-sdk 1.9.2.

require 'aws'
AWS.config(access_key_id: 'KEY', secret_access_key: 'SECRET')
s3 = AWS::S3.new
emr = AWS::EMR.new
bucket_name = 'some-bucket'
s3.buckets[bucket_name].exists?
emr.job_flows.each { }
puts "Uploading file"
s3.buckets[bucket_name].objects['foo.sh'].write(file: '/path/to/file.txt')

The calls to bucket.exists? and job_flows.each are significant - if you remove either of these the write succeeds.

The error generated:

Uploading file
/home/charles/.rbenv/versions/2.0.0-p0/gemsets/awstest/gems/aws-sdk-1.9.2/lib/aws/core/client.rb:360:in `return_or_raise': Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed. (AWS::S3::Errors::RequestTimeout)
    from /home/charles/.rbenv/versions/2.0.0-p0/gemsets/awstest/gems/aws-sdk-1.9.2/lib/aws/core/client.rb:461:in `client_request'
    from (eval):3:in `put_object'
    from /home/charles/.rbenv/versions/2.0.0-p0/gemsets/awstest/gems/aws-sdk-1.9.2/lib/aws/s3/s3_object.rb:1669:in `write_with_put_object'
    from /home/charles/.rbenv/versions/2.0.0-p0/gemsets/awstest/gems/aws-sdk-1.9.2/lib/aws/s3/s3_object.rb:608:in `write'
    from awstest.rb:15:in `<main>'
@ampedandwired
Author

Actually it turns out that the failure is somewhat intermittent. After leaving my computer idling for a while the above script actually works a few times before starting to fail again. Once it starts failing it fails pretty consistently.

Note that it always succeeds on ruby 1.9, so this is definitely something 2.0 related.

@trevorrowe
Member

Is this the entire script or are you doing anything else (threading perhaps)?

@ampedandwired
Author

Hi Trevor, that's the entire script. This is on a Linux system if that's significant. I'll try and replicate it on another box when I get some time to see if it's a local oddity.

@ampedandwired
Author

OK, so I can replicate this on an EC2 Ubuntu Server 12.04 instance set up as follows:

sudo apt-get update
sudo apt-get install git 
sudo apt-get install build-essential readline-dev libssl-dev libxml2-dev libxslt1-dev
git clone git://github.com/sstephenson/rbenv.git ~/.rbenv
git clone https://github.com/sstephenson/ruby-build.git ~/.rbenv/plugins/ruby-build
echo 'export PATH="$HOME/.rbenv/bin:$PATH"' >> ~/.bashrc
echo 'eval "$(rbenv init -)"' >> ~/.bashrc
bash -l
rbenv install 2.0.0-p0
rbenv local 2.0.0-p0
gem install aws-sdk
ruby awstest.rb

It doesn't happen as frequently as it does on my local box, but it does happen. Most times it happens when I rerun after leaving the box idle for a few minutes.

Edit: now I've managed to get it into a state where it's happening pretty much every time I run it.

@deckchairhq

I've been receiving this error a lot; for me it turned out to be threading (too many concurrent CPU-heavy jobs in my worker queue).

I do wonder if this may have had a hand in some of the errors I logged while running only one concurrent job queue (one thread per process), but they certainly weren't as common as you've described. I'll be doing a big upload again very soon and will monitor the failures carefully.

One of the differences I can see between my code and yours is that I'm passing in my own IO object (built on top of StringIO with some convenience methods to implement tempfiles if a path is requested). You could try reading the file into memory as a string, if space allows, and passing that in, which would remove any potential issues caused by the file from the equation. Everything else in your case is vanilla, so if it's not your file, it's likely the library.

G

@ampedandwired
Author

The error still occurs even if you write a simple string instead of a file, so it doesn't seem to be related to the file being uploaded at all.

@schmielson

Also experiencing intermittent failures. Currently just using the following workaround in place of calls to S3Object's write method:

  # path - The String path of the file to be uploaded.
  # s3_obj - The S3Object handle to be written to.
  def upload_to_s3(path, s3_obj)
    retries = S3CONFIG[:upload_retries]
    begin
      s3_obj.write(File.open(path, 'rb', :encoding => 'BINARY'))
    rescue => ex
      retries -= 1
      if retries > 0
        puts "ERROR during S3 upload: #{ex.inspect}. Retries: #{S3CONFIG[:upload_retries] - retries}"
        retry
      else
        # oh well, we tried...
        raise
      end
    end
  end

@trevorrowe
Member

@ampedandwired The error message you are getting, Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed. (AWS::S3::Errors::RequestTimeout), indicates that Amazon S3 is closing the HTTP connection because it waited too long for data. This happens when a request declares a Content-Length of X bytes but then sends fewer. S3 keeps waiting for the remaining bytes, which never arrive, and eventually kills the connection.

You mentioned you were able to reproduce this issue using strings (and not just files). It also does not appear to be threading related (your example does not create multiple threads).

I'm not able to reproduce this issue locally. It would be helpful if you could enable some detailed logging so I can get more insight into what is going on. Can you try this:

# when creating s3, enable a few loggers
s3 = AWS::S3.new(:logger => Logger.new($stdout), :http_wire_trace => true)

I'm hoping this will help shed some light on what is happening so we can get this fixed. Thanks!

@ampedandwired
Author

Here's a full trace from two consecutive tests, the first one succeeded the second did not:

$ ruby test.rb
opening connection to xxxxx.s3.amazonaws.com:443...
opened
starting SSL for xxxxx.s3.amazonaws.com:443...
SSL established
<- "GET /?versioning HTTP/1.1\r\nContent-Type: \r\nUser-Agent: aws-sdk-ruby/1.10.0 ruby/2.0.0 x86_64-linux\r\nDate: Thu, 16 May 2013 03:38:02 GMT\r\nAuthorization: AWS XXXXXXXXX:xxxxxxxxxx=\r\nAccept-Encoding: gzip;q=1.0,deflate;q=0.6,identity;q=0.3\r\nAccept: */*\r\nHost: xxxxx.s3.amazonaws.com\r\n\r\n"
-> "HTTP/1.1 200 OK\r\n"
-> "x-amz-id-2: xxxxxxxxxx\r\n"
-> "x-amz-request-id: xxxxxxxxxx\r\n"
-> "Date: Thu, 16 May 2013 03:38:06 GMT\r\n"
-> "Transfer-Encoding: chunked\r\n"
-> "Server: AmazonS3\r\n"
-> "\r\n"
-> "71\r\n"
reading 113 bytes...
-> "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<VersioningConfiguration xmlns=\"http://s3.amazonaws.com/doc/2006-03-01/\"/>"
read 113 bytes
reading 2 bytes...
-> "\r\n"
read 2 bytes
-> "0\r\n"
-> "\r\n"
Conn keep-alive
I, [2013-05-16T13:38:05.396659 #12764]  INFO -- : [AWS S3 200 2.933784 0 retries] get_bucket_versioning(:bucket_name=>"xxxxx")  

Bucket exists: true
Uploading file
Conn close because of keep_alive_timeout
opening connection to xxxxx.s3.amazonaws.com:443...
opened
starting SSL for xxxxx.s3.amazonaws.com:443...
SSL established
<- "PUT /foo HTTP/1.1\r\nContent-Type: \r\nContent-Length: 19\r\nUser-Agent: aws-sdk-ruby/1.10.0 ruby/2.0.0 x86_64-linux\r\nDate: Thu, 16 May 2013 03:38:07 GMT\r\nAuthorization: AWS XXXXXXXXX:xxxxxxxxxx=\r\nAccept-Encoding: gzip;q=1.0,deflate;q=0.6,identity;q=0.3\r\nAccept: */*\r\nHost: xxxxx.s3.amazonaws.com\r\n\r\n"
-> "HTTP/1.1 200 OK\r\n"
-> "x-amz-id-2: xxxxxxxxxx\r\n"
-> "x-amz-request-id: xxxxxxxxxx\r\n"
-> "Date: Thu, 16 May 2013 03:38:11 GMT\r\n"
-> "ETag: \"8622b9718771d75e07734684d6efa1dd\"\r\n"
-> "Content-Length: 0\r\n"
-> "Server: AmazonS3\r\n"
-> "\r\n"
reading 0 bytes...
-> ""
read 0 bytes
Conn keep-alive
I, [2013-05-16T13:38:10.843515 #12764]  INFO -- : [AWS S3 200 3.394982 0 retries] put_object(:bucket_name=>"xxxxx",:content_length=>19,:data=>#<StringIO:0x007feabcba6dd8>,:key=>"foo")  




$ ruby test.rb
opening connection to xxxxx.s3.amazonaws.com:443...
opened
starting SSL for xxxxx.s3.amazonaws.com:443...
SSL established
<- "GET /?versioning HTTP/1.1\r\nContent-Type: \r\nUser-Agent: aws-sdk-ruby/1.10.0 ruby/2.0.0 x86_64-linux\r\nDate: Thu, 16 May 2013 03:38:12 GMT\r\nAuthorization: AWS XXXXXXXXX:xxxxxxxxxx=\r\nAccept-Encoding: gzip;q=1.0,deflate;q=0.6,identity;q=0.3\r\nAccept: */*\r\nHost: xxxxx.s3.amazonaws.com\r\n\r\n"
-> "HTTP/1.1 200 OK\r\n"
-> "x-amz-id-2: xxxxxxxxxx\r\n"
-> "x-amz-request-id: xxxxxxxxxx\r\n"
-> "Date: Thu, 16 May 2013 03:38:16 GMT\r\n"
-> "Transfer-Encoding: chunked\r\n"
-> "Server: AmazonS3\r\n"
-> "\r\n"
-> "71\r\n"
reading 113 bytes...
-> "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<VersioningConfiguration xmlns=\"http://s3.amazonaws.com/doc/2006-03-01/\"/>"
read 113 bytes
reading 2 bytes...
-> "\r\n"
read 2 bytes
-> "0\r\n"
-> "\r\n"
Conn keep-alive
I, [2013-05-16T13:38:15.724986 #12863]  INFO -- : [AWS S3 200 2.919649 0 retries] get_bucket_versioning(:bucket_name=>"xxxxx")  

Bucket exists: true
Uploading file
<- "PUT /foo HTTP/1.1\r\nContent-Type: \r\nContent-Length: 19\r\nUser-Agent: aws-sdk-ruby/1.10.0 ruby/2.0.0 x86_64-linux\r\nDate: Thu, 16 May 2013 03:38:17 GMT\r\nAuthorization: AWS XXXXXXXXX:xxxxxxxxxx=\r\nAccept-Encoding: gzip;q=1.0,deflate;q=0.6,identity;q=0.3\r\nAccept: */*\r\nHost: xxxxx.s3.amazonaws.com\r\n\r\n"
Conn close because of error end of file reached, and retry
opening connection to xxxxx.s3.amazonaws.com:443...
opened
starting SSL for xxxxx.s3.amazonaws.com:443...
SSL established
<- "PUT /foo HTTP/1.1\r\nContent-Type: \r\nContent-Length: 19\r\nUser-Agent: aws-sdk-ruby/1.10.0 ruby/2.0.0 x86_64-linux\r\nDate: Thu, 16 May 2013 03:38:17 GMT\r\nAuthorization: AWS XXXXXXXXX:xxxxxxxxxx=\r\nAccept-Encoding: gzip;q=1.0,deflate;q=0.6,identity;q=0.3\r\nAccept: */*\r\nHost: xxxxx.s3.amazonaws.com\r\n\r\n"
-> "HTTP/1.1 400 Bad Request\r\n"
-> "x-amz-request-id: 9606878C35487466\r\n"
-> "x-amz-id-2: xxxxxxxxxx\r\n"
-> "Content-Type: application/xml\r\n"
-> "Transfer-Encoding: chunked\r\n"
-> "Date: Thu, 16 May 2013 03:38:40 GMT\r\n"
-> "Connection: close\r\n"
-> "Server: AmazonS3\r\n"
-> "\r\n"
-> "15c\r\n"
reading 348 bytes...
-> "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<Error><Code>RequestTimeout</Code><Message>Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed.</Message><RequestId>9606878C35487466</RequestId><HostId>xxxxxxxxxx</HostId></Error>"
read 348 bytes
reading 2 bytes...
-> "\r\n"
read 2 bytes
-> "0\r\n"
-> "\r\n"
Conn close
I, [2013-05-16T13:38:40.555424 #12863]  INFO -- : [AWS S3 400 22.893523 0 retries] put_object(:bucket_name=>"xxxxx",:content_length=>19,:data=>#<StringIO:0x007fbe45eaad98>,:key=>"foo") AWS::S3::Errors::RequestTimeout Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed.

/home/charles/.rbenv/versions/2.0.0-p0/gemsets/awstest/gems/aws-sdk-1.10.0/lib/aws/core/client.rb:360:in `return_or_raise': Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed. (AWS::S3::Errors::RequestTimeout)
  from /home/charles/.rbenv/versions/2.0.0-p0/gemsets/awstest/gems/aws-sdk-1.10.0/lib/aws/core/client.rb:461:in `client_request'
  from (eval):3:in `put_object'
  from /home/charles/.rbenv/versions/2.0.0-p0/gemsets/awstest/gems/aws-sdk-1.10.0/lib/aws/s3/s3_object.rb:1704:in `write_with_put_object'
  from /home/charles/.rbenv/versions/2.0.0-p0/gemsets/awstest/gems/aws-sdk-1.10.0/lib/aws/s3/s3_object.rb:608:in `write'
  from test.rb:9:in `<main>'

The code being run is as follows:

require 'aws'
s3 = AWS::S3.new(:logger => Logger.new($stdout), :http_wire_trace => true)
emr = AWS::EMR.new
bucket_name = 'xxxxx'
puts "Bucket exists: #{s3.buckets[bucket_name].exists?}"
emr.job_flows.each { |jf| puts jf.name }
puts "Uploading file"
s3.buckets[bucket_name].objects['foo'].write("I'm a little teapot")

@ampedandwired
Author

In terms of replicating the problem, did you try running it on an EC2 Ubuntu instance? I was able to replicate the problem there on a clean box (using the setup documented in my comment above), so hopefully you'll be able to duplicate my results that way.

@trevorrowe
Member

I've just spun up a clean EC2 Ubuntu instance and I'm going to try to replicate the issue again.

@trevorrowe
Member

I'm still unable to reproduce the issue. Here are the exact steps I took:

  • Started a fresh EC2 Instance (AWS.ec2.instances.create(:image_id => 'ami-856f02ec', :key_name => '...', :security_groups => ['...'])) -- this is a 64 bit EBS backed instance running Ubuntu 12.04.2 LTS (Precise Pangolin)
  • SSH'd into the instance and ran the following commands:
    • sudo apt-get update
    • sudo apt-get install git
    • sudo apt-get install build-essential libreadline6 libreadline6-dev libssl-dev libxml2-dev libxslt1-dev (readline-dev was unavailable, so used libreadline6-dev)
    • git clone git://github.com/sstephenson/rbenv.git ~/.rbenv
    • git clone https://github.com/sstephenson/ruby-build.git ~/.rbenv/plugins/ruby-build
    • echo 'export PATH="$HOME/.rbenv/bin:$PATH"' >> ~/.bashrc
    • echo 'eval "$(rbenv init -)"' >> ~/.bashrc
    • bash -l
    • rbenv install 2.0.0-p0
    • rbenv local 2.0.0-p0
    • gem install aws-sdk (this installed v1.10.0)
    • exported my credentials to AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
    • cat > awstest.rb (pasted the following code)
require 'aws'
s3 = AWS::S3.new(:logger => Logger.new($stdout), :http_wire_trace => true)
emr = AWS::EMR.new
bucket_name = 'aws-sdk'
puts "Bucket exists: #{s3.buckets[bucket_name].exists?}"
emr.job_flows.each { |jf| puts jf.name }
puts "Uploading file"
s3.buckets[bucket_name].objects['foo'].write("I'm a little teapot")

I've run this 50+ times with a few pauses. I will come back to this after I've let the instance idle.

@psanford

We also ran into this problem after upgrading to ruby-2.0.0-p0. We saw a significant increase in S3 upload retries immediately after the upgrade. It doesn't seem to be related to the data, as our retries eventually succeed.

This test case will fail for me sometimes on ruby-2.0:

require 'aws'

s3 = AWS::S3.new(
  :access_key_id     => ENV['AWSID'],
  :secret_access_key => ENV['AWSKEY'],
)

bucket = ENV['BUCKET']

pid = Process.pid
size = 1024 * 1024 * 25

0.upto(100).each do |i|
  name = "#{pid}.#{i}"

  puts "generating #{name}"
  File.open('/dev/urandom', 'rb') do |urand|
    File.open("/tmp/#{name}", 'wb') do |out|
      out.write(urand.read(size))
    end
  end

  puts "uploading #{name}"
  File.open("/tmp/#{name}", 'rb') do |f|
    s3.buckets[bucket].objects[name].write(f,
      :multipart_threshold => 1024*1024*20,
    )
  end

  puts "finished #{name}"
end

@lsegal
Contributor

lsegal commented May 19, 2013

Since this seems to be specific to Ruby 2.0, I wonder if those who are experiencing this issue on p0 can also reproduce it on 2.0.0-p195 which was released last week. There may have been some related changes that could have fixed this issue.

@ampedandwired
Author

Some further notes:

  • I can readily replicate the problem using the exact steps and test code posted by Trevor above. I am using AMI ami-04ea7a3e (Ubuntu 12.04 Server) on a t1.micro instance in the Sydney region.
  • The region of the S3 bucket seems to make a difference for me. I was only able to produce the error when using a bucket in the Sydney region. Other regions (standard, California, Singapore) worked fine.
  • Using Ruby 2.0.0-p195 gives the same result
  • Using a different AWS account and newly created bucket gives the same result

@lsegal
Contributor

lsegal commented May 20, 2013

I wonder what regions everybody else is using?

@uberllama

Hi guys. Not sure if this is related, or a separate issue: thoughtbot/paperclip#751 (comment)

Those of us using Paperclip have been encountering timeouts caused by the aws-sdk gem after version 1.6.2. This is a major issue, and one that's been around for some time. I don't know if there has been any coordination on fixing it yet.

@trevorrowe
Member

@uberllama Thank you for looping us in. I'll chime in on the paperclip thread. After a quick scan of the other issue, they do not appear to be related, but I could be wrong.

@endymonium

I think I got the same issue, my test case is to upload a bunch (~40) of smaller files:

  • Heroku + Ruby 1.9.3: does work
  • Heroku + Ruby 2.0.0: does not work and gives a timeout, most of the time after the 2nd file
  • local + Ruby 2.0.0: does work

I had the error also locally, but increasing the timeouts using

AWS.config(:http_open_timeout => 120, :http_read_timeout => 120)

helped. The curious thing is that on Heroku the error clearly comes before the timeout runs out.

I'll now test what happens if I exchange aws-sdk with some other s3 upload gem (fog).

@endymonium

OK, if I swap aws-sdk for fog, it works for me on Heroku + Ruby 2.0.0 as well.

@frogandcode

We're seeing this bug as well; it's causing some of our backup scripts to fail on occasion. As a few others have mentioned above, this is a script that has run without incident for a few years but started failing in the past week. The failures coincide with our move to Ruby 2.0. (Also: though we use Paperclip elsewhere, this script has nothing to do with Paperclip.)

The timeout errors are sporadic, and during a given backup, one or more files may be successfully copied while later a file fails to be copied due to the timeout error. They seem to happen when there are multiple files being transferred and a particular file in the succession of writes happens to take a bit longer. I've done some logging during these writes and though my sample size is a bit small, it appears that writes where the connection is open for 20 seconds are failing, even though our timeout thresholds are well above that. Some sample output from the script with instrumentation for debugging is included below. Notable is this section:

Writing to S3: searches/production-searches-2013-06-05.txt.bz2
Elapsed: 21.825005663

which is where the timeout error occurs. So far (admittedly, in a small sample of trials), this same script works fine as long as the elapsed time stays under 20 seconds. I'll do some more experimenting and post an update if we glean new info, but I'm wondering if there's some reason, in Ruby 2.0, the http_read_timeout (or is there another setting in play when doing writes?) is actually 20 seconds, even though it is configured at a value well above that.

Here's the output:

irb(main):001:0> Backup.new.backup_application_logs
Total start: 2013-06-06 11:58:10 -0400
Start: 2013-06-06 11:58:10 -0400
http_open_timeout: 120
http_read_timeout: 120
http_idle_timeout: 60
http_wire_trace: true
Writing to S3: ad_events/production-ad_events-2013-06-05.txt.bz2
Elapsed: 0.512415023
Start: 2013-06-06 11:58:10 -0400
http_open_timeout: 120
http_read_timeout: 120
http_idle_timeout: 60
http_wire_trace: true
Writing to S3: attachment_views/production-attachment_views-2013-06-05.txt.bz2
Elapsed: 2.644124356
Start: 2013-06-06 11:58:13 -0400
http_open_timeout: 120
http_read_timeout: 120
http_idle_timeout: 60
http_wire_trace: true
Writing to S3: profile_views/production-profile_views-2013-06-05.txt.bz2
Elapsed: 6.695234678
Start: 2013-06-06 11:58:20 -0400
http_open_timeout: 120
http_read_timeout: 120
http_idle_timeout: 60
http_wire_trace: true
Writing to S3: profile_clicks/production-profile_clicks-2013-06-05.txt.bz2
Elapsed: 0.631335565
Start: 2013-06-06 11:58:20 -0400
http_open_timeout: 120
http_read_timeout: 120
http_idle_timeout: 60
http_wire_trace: true
Writing to S3: project_views/production-project_views-2013-06-05.txt.bz2
Elapsed: 0.672840081
Start: 2013-06-06 11:58:21 -0400
http_open_timeout: 120
http_read_timeout: 120
http_idle_timeout: 60
http_wire_trace: true
Writing to S3: project_clicks/production-project_clicks-2013-06-05.txt.bz2
Elapsed: 0.167696896
Start: 2013-06-06 11:58:21 -0400
http_open_timeout: 120
http_read_timeout: 120
http_idle_timeout: 60
http_wire_trace: true
Writing to S3: searches/production-searches-2013-06-05.txt.bz2
Elapsed: 21.825005663
Total elapsed: 33.149053171
AWS::S3::Errors::RequestTimeout: Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed.
    from /var/www/vhosts/dribbble.com/rails/shared/bundle/ruby/2.0.0/gems/aws-sdk-1.11.0/lib/aws/core/client.rb:360:in `return_or_raise'
    from /var/www/vhosts/dribbble.com/rails/shared/bundle/ruby/2.0.0/gems/aws-sdk-1.11.0/lib/aws/core/client.rb:461:in `client_request'
    from (eval):3:in `put_object'
    from /var/www/vhosts/dribbble.com/rails/shared/bundle/ruby/2.0.0/gems/aws-sdk-1.11.0/lib/aws/s3/s3_object.rb:1704:in `write_with_put_object'
    from /var/www/vhosts/dribbble.com/rails/shared/bundle/ruby/2.0.0/gems/aws-sdk-1.11.0/lib/aws/s3/s3_object.rb:608:in `write'

@gotwalt

gotwalt commented Jun 6, 2013

Experiencing this on Ruby-2.0.0p195. Same symptoms - intermittent massive write timeout.

@phstc

phstc commented Jun 14, 2013

I'm also experiencing this issue on ruby 2.0.0p195. It was working fine on 1.9.3p194.

ruby 2.0.0p195 (2013-05-14 revision 40734) [i686-linux]

@tristandunn

@frogandcode suggested trying to establish a new connection per upload, so I switched our backup to use the method below. We haven't noticed any issues so far, and it appears some writes may even be faster than before. I wouldn't consider it the ultimate solution, but it might be helpful if you're writing important backups.

def write(name, file)
  @bucket.objects[name].write(file)
ensure
  # Ensure the HTTP pool is emptied after each write.
  AWS.config.http_handler.pool.empty!
end

@esilverberg

+1

Today I deployed a ruby 2.0.0p195 upgrade and immediately started encountering this issue with aws-sdk 1.11.3. It is an intermittent error, but I do so many S3 writes that I see it every few minutes.

@deckchairhq

I've reverted to the following nested retries until we can get a fix. Retrying 4 times catches 100% of the failures (~20k uploads per day):

      # Dirty dirty dirty(S3Bug)..
      begin
        tempobject.rewind
        obj = s3_bucket.objects[ uid ].write( tempobject )
      rescue Exception => e
        puts "Upload Failed once.."

        begin
          tempobject.rewind
          obj = s3_bucket.objects[ uid ].write( tempobject )
        rescue Exception => e
          puts "Upload Failed twice.."

          begin
            tempobject.rewind
            obj = s3_bucket.objects[ uid ].write( tempobject )
          rescue Exception => e
            puts "Upload Failed three times.."

            tempobject.rewind
            obj = s3_bucket.objects[ uid ].write( tempobject )
          end
        end
      end
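[Editor's note] The four nested begin/rescue blocks above can be flattened into a single retry loop. This is a hedged sketch using the names from the comment (`s3_bucket`, `uid`, `tempobject`), not code posted in the thread:

```ruby
# Retry an S3 write up to max_attempts times, rewinding the IO before
# each attempt so a partial earlier send can't truncate the retry.
def write_with_retries(s3_bucket, uid, tempobject, max_attempts = 4)
  attempts = 0
  begin
    attempts += 1
    tempobject.rewind
    s3_bucket.objects[uid].write(tempobject)
  rescue StandardError => e
    if attempts < max_attempts
      puts "Upload failed (attempt #{attempts}): #{e.message}"
      retry
    else
      raise
    end
  end
end
```

Note that rescuing StandardError, rather than Exception as in the nested version, avoids swallowing interrupts and other signals that should not trigger a retry.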

@trevorrowe
Member

@thinkgareth Could you test out the master branch and see if it resolves your issue (without the nested 4x retry block)? I believe it might.

@dytsai

dytsai commented Jun 29, 2013

I have exactly the same issue here. I tried all the different methods (extending timeouts, emptying the HTTP pool, etc.), and it seems that using File or Pathname instead of an IO stream may help a bit: the error rate decreases. However, no matter what I did, the intermittent errors never disappeared. I am using Heroku, Ruby 2.0, and the latest Sinatra, Unicorn, and AWS SDK.

@trevorrowe
Member

@cdunn I'm reading through the code where transport_request is retried. There appears to be a bug in their retry logic: when they call req.exec the second time, no attempt is made to rewind the body stream first.

Simply put, the first request succeeds in sending some bytes, but not all of them, before it encounters an error. Net::HTTP decides to retry the request but fails to rewind the body stream, causing fewer bytes to be sent than the Content-Length header indicates. S3 keeps waiting for the additional bytes, never gets them, and then fails the request.

The SDK's retry logic always rewinds the body stream before it attempts to resend a request; Net::HTTP makes no such attempt. I'm going to file this as a bug report and see if we can get it fixed.
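[Editor's note] The failure mode described here can be reproduced without any network involved: after a partial send, an un-rewound IO yields fewer bytes than the declared Content-Length. A standalone sketch, not SDK code:

```ruby
require 'stringio'

body = StringIO.new("hello world")   # Content-Length would be 11
content_length = body.size

body.read(5)                         # first attempt sends 5 bytes, then the connection drops

# Broken retry: re-reading the same stream without rewinding yields only
# the remaining 6 bytes against a declared Content-Length of 11, so S3
# waits for the missing bytes and eventually times out the request.
resent = body.read
puts "resent #{resent.bytesize} of #{content_length} bytes"  # resent 6 of 11 bytes

# The fix: rewind the body stream before retrying.
body.rewind
raise "short body" unless body.read.bytesize == content_length
```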

@trevorrowe trevorrowe reopened this Apr 25, 2014
@tcassanego

The workaround proposed by @cdunn removes the timeout for me, but I still get tons of retries. They most often come from the EOF error. Oddly enough, running on a single thread produces the most retries, while they decrease as I ramp up to 5 threads. Seems like this points to it not being congestion related...

@pickerflicker

I ended up using 'fog' gem to upload to S3 and I get no more timeouts or retries.

@trevorrowe
Member

@pickerflicker Ruby's Net::HTTP changed between 1.9.3 and 2.0 in a way that causes this error to occur. In 1.9.3, when the connection fails or times out, the SDK retries the request. In Ruby 2.0.0, Net::HTTP attempts to retry the failed request itself, but in a broken way: it retries the PUT request without first rewinding the body stream. This is okay if the request payload is a string, but the Ruby SDK uses IO objects for streaming requests.

Users of 1.9.3 are likely not reporting issues because their requests succeed when the SDK retries. The timeout likely happens (though I have difficulty verifying this) when S3 closes a long-running HTTP session. This may be due to congestion or latent network issues; it's hard to say.

You are not seeing this issue with fog for probably two reasons. First, fog uses Excon for HTTP requests, not Net::HTTP. Second, it does not use persistent HTTP connections by default; a new connection is established (and closed) for each request. You might experience the infrequent delay of S3 closing a connection if you enable persistent connections, but you likely would not experience the second issue, as Excon would not retry your request without rewinding the stream.

There are a number of possible work-arounds. I'm trying to get a bug report and fix put together for Net::HTTP. Until then, you can disable retries of PUT requests, following @cdunn's suggestion:

Net::HTTP::IDEMPOTENT_METHODS_.delete("PUT")

Also, it would be possible to put together a simple Excon handler for aws-sdk. A simple implementation would not use persistent connections and should avoid the problems you are experiencing. I'll try to update this thread with a sample implementation.

@pickerflicker

Thanks for clarifying @trevorrowe. I wasn't exactly sure why the fog gem worked.

I'm using Ruby 2.1.0. My original issue was not an actual error, since the second PUT request would always be quick and successful. My issue was that the occasional 20s timeout is just too long and ruins the user experience. I've tried adding the Net::HTTP workaround to my Rails project initializers, but I still saw these occasional timeouts.

@trevorrowe
Member

@pickerflicker Here is a sample http handler that uses Excon:

require 'excon'

class ExconHandler

  def handle(req, resp, &read_block)
    options = {
      :method => req.http_method.downcase.to_sym,
      :path => req.path,
      :query => req.querystring,
      :headers => req.headers,
      :body => req.body_stream,
    }
    options[:response_block] = read_block if block_given?
    connection = Excon.new(req.endpoint)
    excon_resp = connection.request(options)
    resp.status = excon_resp.status.to_i
    resp.headers = excon_resp.headers
    resp.body = excon_resp.body unless block_given?
  end

end

AWS.config(http_handler: ExconHandler.new)

I would be interested to know if using this resolves your issue. There are a few features missing from this implementation:

  • Custom SSL certs
  • Proxy support
  • Custom timeouts

These could all be addressed without much effort.

@pickerflicker

@trevorrowe Sorry, there's no good way for me to try this out now. This timeout issue didn't show up in QA (probably not enough requests), and I can't experiment in production.

@trevorrowe
Member

@pickerflicker What sort of frequency were you seeing the timeouts + retries in production previously?

@pickerflicker

@trevorrowe The timeouts would occur at least once every 2 hours. I would see a long POST request take a little over 20 seconds in New Relic. These requests all had the same signature: a first PUT request that timed out after 20 seconds, followed by one retry that was always quick and successful.

@mikhailov

The problem isn't solved yet; could you let us know the progress?

@trevorrowe
Member

We've been able to resolve these issues in the V2 Ruby SDK by patching Net::HTTP. These patches are:

  • Fix buggy expect 100-continue behavior
  • Fix retry logic for "idempotent" requests

You can see the patches in the v2 SDK here:

https://github.com/aws/aws-sdk-core-ruby/blob/master/aws-sdk-core/lib/seahorse/client/net_http/patches.rb

@mikhailov If you are interested in testing out a patch, I'd be willing to backport these into a branch of the v1 Ruby SDK.

@mikhailov

@trevorrowe That's great, we can test the patch, thanks! A question about s3_endpoint: we still use it, but it seems to be deprecated in favor of region. Can we still use the s3_endpoint syntax with aws-sdk-ruby v2?

@lsegal
Contributor

lsegal commented Oct 13, 2014

@mikhailov you can use endpoint instead of s3_endpoint as mentioned in that blog post.

trevorrowe added a commit that referenced this issue Oct 13, 2014
Net::HTTP has a few bugs regarding Expect-100 continue and how it
retries idempotent HTTP requests. This commit enables two patches
that address these issues. Without these patches, the user may
experience S3 timeout issues.

Unfortunately this built-in retry logic is buggy: when *some* of the
body is sent before a network failure, it fails to rewind the body
before sending the retry. This results in a partial body that is
fewer than "Content-Length" bytes. S3 eventually times out the
request waiting for all of the bytes.

This commit makes two major changes to avoid this scenario:

* It enables the expect-100 continue HTTP logic BEFORE sending
  bytes to Amazon S3. This eliminates most of the common failure
  scenarios where bytes are sent and the request is broken by the
  remote end (especially for auth errors).

* It patches the Net::HTTP transport method to rewind the
  request body stream if the body responds to #rewind before
  attempting a retry.

Previously, the 100-continue behavior and the 2nd patch were
opt-in, but this is problematic enough for Ruby 2 users that this
commit enables them by default.

See #241
@trevorrowe
Member

@mikhailov I've pushed a branch, s3-write-timeout-patch, which back ports some changes from the V2 SDK into this repo. Please feel free to give this a spin and let me know if it resolves the issue you are experiencing.

@uberllama

Thanks for your hard work over such a long period of time on this! Very exciting.

@gonzedge

👏

@trevorrowe
Member

I should add a caveat that the fix only applies to Ruby 1.9.3+. Ruby 1.8.7 and Ruby 1.9.2 do not support HTTP Expect 100-continue and therefore will continue to have their connections closed by Amazon S3 under certain circumstances.

@justincase

Ruby 1.8.7 is retired, and support for Ruby 1.9.3 will end on February 23, 2015.

🎆 🚢 👍

janko added a commit to janko/image_processing that referenced this issue Oct 17, 2015
For some reason aws-sdk is throwing timeout errors, and this was a known
problem (aws/aws-sdk-ruby#241) which is currently supposed to be fixed.
Anyway, for some reason this happens only with Tempfiles that were
changed by ImageMagick; I tried calling #fsync but it didn't work. So,
to avoid this bug, we simply reopen the tempfile to get a fresh file
descriptor again.