concourse worker making excessive writes to disk #2906
Some more things we learned:
Can you suggest other data we should be collecting, or other things to look for? We are starting to impact other teams in the data center with our excessive disk usage, so we would like to get to the bottom of this.
We are having somewhat similar issues on 4.2.2. I'm not sure if it's the same, but we also get various failed-to-destroy and other volume-management errors. This causes the worker to become stalled.
Could you try switching the volume driver to

Just a quick note: this sounds more like a support request than a discrete bug report. I would suggest using the forums or our Discord channel for cases like this, where more investigation is required and the behavior doesn't seem particularly related to Concourse code. This just makes our lives easier and keeps the issues from becoming an ever-growing backlog. 🙂 Thanks for the investigation you've done so far, though!
Bug Report
We have a Linux worker running in vSphere that seems to be making excessive writes to disk, averaging 50,000 Kbps. Redeploying the worker fixed the issue for perhaps 24 hours (bringing the disk write rate down to ~7,000 Kbps) before it spiked up again.
Steps to Reproduce
We are using the following opsfiles from concourse-bosh-deployment:
cluster/external-worker.yml
cluster/operations/worker-ephemeral-disk.yml
cluster/operations/windows-worker.yml
cluster/operations/windows-worker-ephemeral-disk.yml
At some point we removed our Windows worker to see if it was making a difference, but the high write rate persisted.
When we SSH onto the worker and run `iotop`, we see that the offending process appears to be using the `loop0` device:

```
# iotop -o -d 5 -a
```
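For cross-checking the `iotop` numbers, the kernel's own counters can be read directly. The following is a minimal sketch (the function names are mine, not from Concourse or baggageclaim; it assumes a Linux worker and that the suspect device is `loop0`): `/proc/diskstats` reports cumulative 512-byte sectors written per device, and for loop devices sysfs exposes the backing file at `/sys/block/<dev>/loop/backing_file`.

```python
import time
from pathlib import Path


def loop_backing_file(device="loop0", sysfs_root="/sys/block"):
    """Return the file backing a loop device, read from sysfs."""
    # For loop devices the kernel exposes the backing file at
    # /sys/block/<device>/loop/backing_file.
    return (Path(sysfs_root) / device / "loop" / "backing_file").read_text().strip()


def sectors_written(diskstats_text, device):
    """Return cumulative 512-byte sectors written for a device,
    parsed from /proc/diskstats-formatted text."""
    for line in diskstats_text.splitlines():
        fields = line.split()
        # Fields: major, minor, device name, then I/O counters; the
        # 7th counter (overall field 10) is "sectors written".
        if len(fields) >= 10 and fields[2] == device:
            return int(fields[9])
    raise ValueError(f"device {device!r} not found in diskstats output")


def write_rate_kbps(device="loop0", interval=5.0):
    """Sample /proc/diskstats twice and return write throughput in KB/s."""
    before = sectors_written(Path("/proc/diskstats").read_text(), device)
    time.sleep(interval)
    after = sectors_written(Path("/proc/diskstats").read_text(), device)
    # Sectors in /proc/diskstats are always 512 bytes.
    return (after - before) * 512 / 1024 / interval
```

On a worker showing this symptom, `loop_backing_file("loop0")` would presumably point somewhere under baggageclaim's volumes directory, which should help narrow down which volume is taking the writes.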
Also relevant: `loop0` is being used by baggageclaim.

Expected Results
50,000 Kbps seems very high. We were expecting lower I/O activity. This worker is only running a `cf push` every 5 minutes and a `curl` every 1 minute. It's hard to imagine what it could be writing to disk!

Actual Results
Version Info
We don't know :(

Actually, this used to work: the worker was up for 30+ days before we started seeing the issue (judging from performance charts in vCenter). However, when we did a fresh redeploy, the issue recurred within ~24 hours.