
High CPU usage as a result of constant CredHub client construction #2300

Closed
vito opened this issue Jun 19, 2018 · 6 comments
@vito
Member

vito commented Jun 19, 2018

A number of users have reported high CPU usage with v3.14.1. This seems to be coming from the lazy CredHub client construction, which was introduced to fix #2154 but seems to be unintentionally creating a CredHub client repeatedly rather than just once (on first use).

I think this is happening because this doesn't return a pointer: https://github.com/concourse/atc/blob/master/creds/credhub/manager.go#L98 ... and so the client is always assigned on a copy.

We should also make sure this is safe for concurrent use. I believe the current code won't be happy with -race once we make it a pointer. This may be a good time to use sync.Once: https://golang.org/pkg/sync/#Once
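A minimal sketch of the fix being described: a pointer-receiver wrapper that uses sync.Once so the client is built exactly once, even under concurrent access. The names (lazyClient, newClient) are illustrative, not the actual atc code; with a value receiver instead, the assignment would land on a copy and every call would rebuild the client, which is the bug above.

```go
package main

import (
	"fmt"
	"sync"
)

// client stands in for the real CredHub client; constructing it is expensive.
type client struct{}

var constructions int

func newClient() *client {
	constructions++ // count how many times we actually build a client
	return &client{}
}

// lazyClient builds the client at most once, on first use.
// Methods must use a pointer receiver: with a value receiver, l.client
// would be assigned on a copy, and every call would construct a new client.
type lazyClient struct {
	once   sync.Once
	client *client
}

func (l *lazyClient) get() *client {
	l.once.Do(func() {
		l.client = newClient()
	})
	return l.client
}

func main() {
	lc := &lazyClient{}
	for i := 0; i < 3; i++ {
		lc.get()
	}
	fmt.Println(constructions) // prints 1: constructed once despite three calls
}
```

Because sync.Once contains a mutex, the struct must not be copied after first use, which is another reason the wrapper has to be passed around as a pointer.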

  • Concourse version: v3.14.1
  • Deployment type (BOSH/Docker/binary): any that use CredHub
  • Infrastructure/IaaS: any
  • Browser (if applicable): n/a
  • Did this used to work? yes
@jmcarp
Contributor

jmcarp commented Jun 20, 2018

😱

mhuangpivotal added a commit to vmware-archive/atc that referenced this issue Jun 20, 2018
concourse/concourse#2300

Signed-off-by: Nader Ziada <nziada@pivotal.io>
nader-ziada added a commit that referenced this issue Jun 27, 2018
#2300

Submodule src/github.com/concourse/atc ef5c21f..ca5bba4:
  > add once.Do around creation of credhub client and make lazyCredhub a pointer

Signed-off-by: Mark Huang <mhuang@pivotal.io>
@engrun

engrun commented Jun 28, 2018

We have stumbled upon this issue as well (see the Discuss thread).
Does anyone have an estimated date for a fix?

@jama22
Member

jama22 commented Jun 28, 2018

Update for y'all: our fix has moved through the pipeline, so it should land in the next release.

@iferunsewe

After investigating with a profiler, we don't think the proposed fix works. More details are in this PR: vmware-archive/atc#291

@heycait

heycait commented Jul 13, 2018

We (the Pivotal RelEng team) deploy with a nightly build containing the initial fix. But our ATC VMs are still working pretty hard (~30% CPU across 8 cores and 4 ATC VMs), and we're frequently seeing "too many open files" errors, so the issue may not be completely fixed by the two commits linked above.

@vito vito added the accepted label Jul 16, 2018
@vito vito closed this as completed Jul 16, 2018
@vito
Member Author

vito commented Jul 16, 2018

Yep, we've merged vmware-archive/atc#291 now which should actually fix it. Thanks again @iferunsewe!

@vito vito modified the milestone: v4.0.0 Jul 16, 2018
@vito vito modified the milestone: v4.0.0 Jul 25, 2018