Limit logging by cloudprober container #679
Comments
Hi @ls692, we changed the underlying logger in Aug 2023 (#462) to use Go's structured logger (log/slog) instead of the Google logger (https://github.com/golang/glog). Interestingly, the new logger doesn't write to disk at all: it simply emits logs to stderr, and it does that regardless of whether logs are going to cloud-logging or not.

Does the internal version have the change above? Do you see any logs being written to /tmp at all? There should not be any, but I am wondering if the system logger is picking them up. There is also a way to configure journald instead of dropping everything:
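A minimal sketch of that journald route (not necessarily the exact snippet proposed here; docker's journald log driver and its tag option are documented flags, and the rest of the invocation mirrors the startup script command quoted below):

# Route container stdout/stderr into journald instead of the default json-file driver.
docker run --log-driver "journald" --log-opt "tag=cloudprober" \
  --net host --privileged -v /tmp:/tmp "${IMAGE}:${VERSION}"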
It does look like stderr is being pulled in by journald, at least on the VMs that are functioning fine. It's not going to /tmp but to /var/log/journal, and it is being reported as the output of "cloudprober_updater.sh". I will try disabling logging for this module to get back to the old behavior.
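If it helps to confirm where the output ends up, journald's own footprint and the entries attributed to that script can be checked directly (assuming the script name is the syslog identifier, as reported above):

# How much disk the journal is currently using.
journalctl --disk-usage
# Recent entries tagged with the updater script's identifier.
journalctl -t cloudprober_updater.sh --since "1 hour ago"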
After some debugging, I am now fairly confident that it is the cloudprober logger that is causing the disk-usage issues. journald has protection via SystemMaxUse (1 GB here) and SystemKeepFree (default of 15% of the disk size), so journald should not have used all of the disk. I do see disk usage when inspecting a sample container (details at the bottom). The docker documentation (link in the description) also indicates that the default json-file logger can use up all disk space, but again I could not verify this on an existing "bad" VM.

The only reason we seem to be running into this is that some of the probes are very noisy (e.g., during initialization, when things are broken), so in most other cases the default logger generates logs of a pretty reasonable size. However, given the high cost of fixing a broken VM (recreating it), limiting logging might be a good tradeoff.
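For reference, those journald caps live in journald.conf or a drop-in file; a minimal sketch with illustrative values (the documented defaults are 10% and 15% of the filesystem for SystemMaxUse and SystemKeepFree, respectively):

# /etc/systemd/journald.conf.d/90-size-limits.conf  (path and values illustrative)
[Journal]
SystemMaxUse=1G
SystemKeepFree=2G

# Apply without a reboot:
sudo systemctl restart systemd-journald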
Thanks, Lenin, for looking further! I had no idea that docker writes container logs to a local file with no default size limit (this is problematic in general). Looking at https://docs.docker.com/config/containers/logging/configure/, we have the following options:
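For example, one of those options is setting daemon-wide defaults in /etc/docker/daemon.json; a sketch with illustrative values (keys as documented at the link above; this only affects containers created after the daemon restarts):

{
  "log-driver": "local",
  "log-opts": {
    "max-size": "100m",
    "max-file": "3"
  }
}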
We can restore the old behavior, but it had a problem: if logging fails for some reason (e.g., if the logging API is not enabled, or we run out of quota), the user may never find out what's going on -- or maybe the logging package will output something when that happens; I'll try to find out. Also, it might make sense to change the startup script option to set …
As per Google's cloud logging package, it seems it will print to stderr if there is a problem flushing logs.
I prefer journald a bit more because it allows for better configuration of logging. Let me propose a CL for that.
Lenin, assuming this is resolved.
Describe the bug
We are seeing several cases of cloudprober fully using up the local disk.

This is happening with versions from early Nov, and we are not sure of the root cause, as we are unable to log into the broken VMs :) due to disk-full errors. I suspect that cloud-logging calls fail or get backed up and we start logging to disk, and once the disk gets full, we are unable to log in.
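For anyone who can still get a shell (e.g., via the serial console), the likely culprits can be sized up roughly like this (paths are docker's defaults for the json-file driver; a diagnostic sketch, not from the original report):

# Which docker subdirectory is eating the disk?
du -xh --max-depth=1 /var/lib/docker | sort -h
# Size of per-container json-file logs.
ls -lh /var/lib/docker/containers/*/*-json.log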
From https://docs.docker.com/config/containers/logging/configure/ it seems that we can configure a max size. Is there any risk in adding something like, say,

docker run -e "SYSVARS=${VARS}" --env-file <(env | grep CLOUDPROBER_) \
  --log-opt "max-size=100m" --log-opt "max-file=3" --log-driver "local" \
  --net host --privileged -v /tmp:/tmp "${IMAGE}:${VERSION}"

to third_party/cloudprober/tools/cloudprober_startup.sh, to use the local logging driver and cap logs at 3 files of 100 MB each, as a quick fix, and work on making this configurable later on?
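If we go this route, the effective limits on the running container could be verified with something like (container name assumed here):

docker inspect --format '{{ .HostConfig.LogConfig }}' cloudprober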