Cookbook downloads fail when using Ruby AWS SDK gem (`aws-sdk`) #8466

Open
dayglojesus opened this issue May 6, 2019 · 4 comments

@dayglojesus

commented May 6, 2019

Description

Overview

Chef's lib/chef/http.rb relies on the built-in retry routine of Ruby's net/http. This routine, and the unexpected behaviour it produces, are outlined in the following links:

https://github.com/ruby/ruby/blob/trunk/lib/net/http.rb#L1527-L1532
https://engineering.wework.com/ruby-users-be-wary-of-net-http-f284747288b2
https://jvns.ca/blog/2016/03/04/whats-up-with-ruby-http-libraries/

When this native retry routine is disabled, Chef::HTTP#streaming_request will fail to retry on normally recoverable EOF errors:

#<EOFError: end of file reached>
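The retry behaviour described above can be modelled in a few lines. This is a simplified sketch and not the actual net/http source: `transport_with_retry` and `flaky_request` are hypothetical names, and the method list is passed in as a parameter so that the effect of an empty list is easy to see.

```ruby
# Simplified model of the single retry that net/http performs internally.
# NOTE: `transport_with_retry` is a hypothetical helper, not Net::HTTP's API.
def transport_with_retry(http_method, idempotent_methods)
  retried = false
  begin
    yield
  rescue EOFError
    # Retry once, and only for methods listed as idempotent.
    if !retried && idempotent_methods.include?(http_method)
      retried = true
      retry
    end
    raise
  end
end

# A fake request that hits EOF on the first call and succeeds on the second.
def flaky_request(state)
  state[:calls] += 1
  raise EOFError, "end of file reached" if state[:calls] == 1
  :ok
end

# With a populated list, the EOF is absorbed by the retry.
state = { calls: 0 }
with_list = transport_with_retry("GET", %w[GET HEAD PUT DELETE OPTIONS TRACE]) do
  flaky_request(state)
end

# With an empty list, the same EOF propagates to the caller.
state = { calls: 0 }
without_list = begin
  transport_with_retry("GET", []) { flaky_request(state) }
rescue EOFError => e
  e
end

puts "with list:    #{with_list.inspect}"
puts "without list: #{without_list.inspect}"
```

In this model the caller only ever sees the EOF when the list is empty, which matches the symptom reported here.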

Background

We recently introduced a new custom Ohai plugin to our deployment that exposed this behaviour. This plugin leveraged the aws-sdk gem to fetch instance metadata.

When the plugin loads and the EC2 client establishes a connection, the SDK's "seahorse" client applies a patch to Ruby's native net/http that empties its list of idempotent (retryable) methods:

Net::HTTP::IDEMPOTENT_METHODS_.clear

https://github.com/aws/aws-sdk-ruby/blob/master/gems/aws-sdk-core/lib/seahorse/client/net_http/patches.rb#L11-L27
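One way to check whether this has happened in a running process is to inspect the constant directly. The snippet below is a minimal sketch: `net_http_retry_disabled?` is a hypothetical helper name, and `IDEMPOTENT_METHODS_` is an internal detail of net/http that may differ between Ruby versions, so the check guards against it being absent.

```ruby
require "net/http"

# Returns true when the idempotent-method list exists but is empty,
# meaning the built-in net/http retry can never trigger.
# NOTE: `net_http_retry_disabled?` is a hypothetical helper name.
def net_http_retry_disabled?(http_class = Net::HTTP)
  return false unless http_class.const_defined?(:IDEMPOTENT_METHODS_)

  http_class.const_get(:IDEMPOTENT_METHODS_).empty?
end

puts "net/http retry disabled: #{net_http_retry_disabled?}"
```

Running this before and after `require 'aws-sdk'` and the first client call should show whether the list was emptied in that process.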

Chef Version

I think this will be reproducible on ALL versions of Chef, but it was primarily tested on Chef version 14.8.12.

Platform Version

ALL -- reproducible on Windows, RHEL and Debian variants

Replication Case

This behaviour is difficult to isolate and reproduce. It seems to only occur when the resultant cookbook downloads reach a point of critical mass.

I can reproduce it with certain combinations of cookbooks, but not others -- the larger the runlist, the better the chance of seeing an EOF error.

I have not been able to fully discern the common denominator.

  1. setup a new Chef node
  2. apply a very large runlist and allow it to finish
  3. install the aws-sdk gem: gem install -N aws-sdk
  4. install this ohai plugin to /etc/chef/ohai/plugins/dummy.rb
    Ohai.plugin(:Aws) do
      provides 'aws'
      collect_data(:default) do
        require 'aws-sdk'
        aws Mash.new
        Aws::EC2::Resource.new(region: region)
      end
    end
  5. test it: ohai -d /etc/chef/ohai/plugins aws
  6. purge the cookbooks cache: rm -rf /etc/chef/cache/cookbooks
  7. re-apply the very large runlist

Client Output

The only relevant output on a standard run is this:

  ================================================================================
  Error Syncing Cookbooks:
  ================================================================================

  Authentication Error:
  ---------------------
  Received an EOF on transport socket.  This almost always indicates a network
  error external to chef-client.  Some causes include:

    - Blocking ICMP Dest Unreachable (breaking Path MTU Discovery)
    - IPsec or VPN tunnelling / TCP Encapsulation MTU issues
    - Jumbo frames configured only on one side (breaking Path MTU)
    - Jumbo frames configured on a LAN that does not support them
    - Proxies or Load Balancers breaking large POSTs
    - Broken TCP offload in network drivers/hardware

...

  System Info:
  ------------
  chef_version=14.8.12
  platform=centos
  platform_version=7.6.1810
  ruby=ruby 2.4.5p335 (2018-10-18 revision 65137) [x86_64-linux]
  program_name=/bin/chef-client
  executable=/usr/bin/chef-client

• A debug run produces nothing relevant.
• Subsequent Chef runs work (presumably because the cookbooks are already partially downloaded).

Stacktrace

No stack trace file is produced when this exception occurs, but by inserting a pry binding into the rescue block of Chef::HTTP#streaming_request, you can see what's happening...

[1] pry(#<Chef::ServerAPI>)> /usr/lib64/ruby/gems/2.4.0/gems/pry-0.12.2/lib/pry/repl.rb:199:in `input_readline'
/usr/lib64/ruby/gems/2.4.0/gems/pry-0.12.2/lib/pry/repl.rb:185:in `block in read_line'
/usr/lib64/ruby/gems/2.4.0/gems/pry-0.12.2/lib/pry/repl.rb:130:in `handle_read_errors'
/usr/lib64/ruby/gems/2.4.0/gems/pry-0.12.2/lib/pry/repl.rb:171:in `read_line'
/usr/lib64/ruby/gems/2.4.0/gems/pry-0.12.2/lib/pry/repl.rb:98:in `read'
/usr/lib64/ruby/gems/2.4.0/gems/pry-0.12.2/lib/pry/repl.rb:68:in `block in repl'
/usr/lib64/ruby/gems/2.4.0/gems/pry-0.12.2/lib/pry/repl.rb:67:in `loop'
/usr/lib64/ruby/gems/2.4.0/gems/pry-0.12.2/lib/pry/repl.rb:67:in `repl'
/usr/lib64/ruby/gems/2.4.0/gems/pry-0.12.2/lib/pry/repl.rb:38:in `block in start'
/usr/lib64/ruby/gems/2.4.0/gems/pry-0.12.2/lib/pry/input_lock.rb:59:in `__with_ownership'
/usr/lib64/ruby/gems/2.4.0/gems/pry-0.12.2/lib/pry/input_lock.rb:77:in `with_ownership'
/usr/lib64/ruby/gems/2.4.0/gems/pry-0.12.2/lib/pry/repl.rb:38:in `start'
/usr/lib64/ruby/gems/2.4.0/gems/pry-0.12.2/lib/pry/repl.rb:13:in `start'
/usr/lib64/ruby/gems/2.4.0/gems/pry-0.12.2/lib/pry/pry_class.rb:200:in `start'
/usr/lib64/ruby/gems/2.4.0/gems/pry-0.12.2/lib/pry/core_extensions.rb:43:in `pry'
/usr/lib64/ruby/gems/2.4.0/gems/chef-14.8.12/lib/chef/http.rb:268:in `rescue in streaming_request'
/usr/lib64/ruby/gems/2.4.0/gems/chef-14.8.12/lib/chef/http.rb:219:in `streaming_request'
/usr/lib64/ruby/gems/2.4.0/gems/chef-14.8.12/lib/chef/cookbook/synchronizer.rb:298:in `download_file'
/usr/lib64/ruby/gems/2.4.0/gems/chef-14.8.12/lib/chef/cookbook/synchronizer.rb:274:in `sync_file'
/usr/lib64/ruby/gems/2.4.0/gems/chef-14.8.12/lib/chef/cookbook/synchronizer.rb:161:in `block (2 levels) in sync_cookbooks'
/usr/lib64/ruby/gems/2.4.0/gems/chef-14.8.12/lib/chef/util/threaded_job_queue.rb:52:in `block (3 levels) in process'
/usr/lib64/ruby/gems/2.4.0/gems/chef-14.8.12/lib/chef/util/threaded_job_queue.rb:50:in `loop'
/usr/lib64/ruby/gems/2.4.0/gems/chef-14.8.12/lib/chef/util/threaded_job_queue.rb:50:in `block (2 levels) in process'
[2] pry(#<Chef::ServerAPI>)> e
=> #<EOFError: end of file reached>

Once the "seahorse" patch is applied and the native Ruby http.rb retry routine is neutralized, Chef catches the EOF exception here:

https://github.com/chef/chef/blob/master/lib/chef/http.rb#L257

@knightorc

commented May 6, 2019

Of note:

We install Chef via gems on Linux, but this was also the case with the Omnibus RPM.

@tas50

Member

commented May 6, 2019

@dayglojesus we could potentially work around this, but this seems like an issue for Amazon to fix. They're abusing http in a way that impacts other consumers.

@dayglojesus

Author

commented May 6, 2019

@tas50 Agreed, we cannot call this a Chef problem. Injecting a patch on Ruby core was a poor decision, but given the precedent for this, it might be prudent for Chef to protect its users against this. I would be happy to submit a lil retry patch for EOF akin to the other HTTP error rescue block.
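
For illustration, a retry of this kind could look roughly like the sketch below. It is not Chef's code and not the proposed patch: `with_eof_retry`, `max_retries` and `base_delay` are hypothetical names with arbitrary defaults, and the point is only that the retry lives in the caller, so it keeps working when net/http's own retry has been switched off.

```ruby
# Sketch of an application-level retry for EOFError that does not depend on
# net/http's internal retry.
# NOTE: `with_eof_retry`, `max_retries` and `base_delay` are hypothetical
# names chosen for this example; they are not part of Chef::HTTP.
def with_eof_retry(max_retries: 5, base_delay: 0.5)
  attempt = 0
  begin
    yield
  rescue EOFError
    attempt += 1
    raise if attempt > max_retries # give up and surface the original error

    sleep(base_delay * attempt) # linear back-off between attempts
    retry
  end
end

# Usage: a block that hits EOF twice before succeeding.
attempts = 0
result = with_eof_retry(max_retries: 3, base_delay: 0.1) do
  attempts += 1
  raise EOFError, "end of file reached" if attempts < 3

  "downloaded"
end

puts "#{result} after #{attempts} attempts"
```

Retrying in the caller is only safe for requests that can be repeated without side effects, such as the cookbook file downloads in the stack trace above.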

@dayglojesus

Author

commented May 6, 2019

@tas50 wow, I'm looking at the issues in aws-sdk and it appears this issue has been around for a while...
aws/aws-sdk-ruby#1167
aws/aws-sdk-ruby#1752
aws/aws-sdk-ruby#1979

I am going to open another issue over there, but if they haven't fixed it by now...
