Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CredHub interactions take 97% of ATC CPU time #2373

Closed
DanielJonesEB opened this issue Jul 12, 2018 · 5 comments
Closed

CredHub interactions take 97% of ATC CPU time #2373

DanielJonesEB opened this issue Jul 12, 2018 · 5 comments

Comments

@DanielJonesEB
Copy link

Bug Report

Given a Concourse with secrets backed by CredHub, we were seeing 97% of the ATC's CPU time going to TLS handshakes when it was looking up secrets.

This is because credhub-cli (understandably, as a CLI library) was not using HTTP Keepalive, forcing each secret lookup to use a new connection, and this a new TLS handshake. The issue was fixed in this PR: cloudfoundry/credhub-cli#45

This has since been fixed as of commit cloudfoundry/credhub-cli@0887387. Bumping to this version will see a significant performance improvement for those using CredHub.

  • Concourse version: 3.14.1
  • Deployment type (BOSH/Docker/binary): BOSH
  • Infrastructure/IaaS: AWS
  • Did this used to work? No.

Before:
screen shot 2018-07-06 at 13 27 30

After:
screen shot 2018-07-06 at 13 27 37

Difference:
screen shot 2018-07-06 at 13 28 11

@jama22
Copy link
Member

jama22 commented Jul 12, 2018

Pretty sure this is similar, if not the same, as #2300

@DanielJonesEB
Copy link
Author

DanielJonesEB commented Jul 12, 2018

'ello @jama-pivotal! We think this is orthogonal. Proper engineers wot write proper code could tell you more, but it's to do with the way the stream is being read or something, so instantiation isn't relevant (EDIT: unless it establishes a connection on construction?).

@takeyourhatoff
Copy link
Contributor

@jama-pivotal Hi, without the fix for #2300 then cloudfoundry/credhub-cli#45 will have no effect on ATC. They are two separate problems however. I actually had to downgrade concourse to v3.13 to expose this issue. The reason #2300 was so bad was because as part of the client construction credhub-cli reads and parses all of the system CA files.

With neither issue fixed, the flame graph is all parsing system root certificates, with #2300 fixed, it's all in TLS handshakes to credhub, and with both fixed ATC can actually get on and do work 🎉

@DanielJonesEB
Copy link
Author

Hey @jama-pivotal,

We (read: @takeyourhatoff) just realised the fix in #2300 doesn't work because the method that does the lazy initialisation had a non-pointer receiver, so was acting on a fresh instance of the lazyCredhub struct each time.

We're just testing a fix for that, so disregard this until we've had a chance to address #2300. We'll verify it with a profiler before raising a PR.

@jama22
Copy link
Member

jama22 commented Aug 13, 2018

thanks for the context @takeyourhatoff @DanielJonesEB . Gonna watch the PR closely for the fix, looking forward to it 🙏

@vito vito added this to the v4.0.0 milestone Aug 28, 2018
@vito vito closed this as completed Aug 28, 2018
Operations automation moved this from Icebox to Done Aug 28, 2018
@vito vito added the accepted label Aug 30, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

No branches or pull requests

4 participants