[v1.1.4] Wild logging when disk full #21756
That "no space left on device" is the error message associated with ENOSPC. I agree with your assessment: it appears that this got into a recursive loop trying to write the "no space left on device" log message and then got a stack overflow. @a-robinson is this the kind of thing you've been looking at, or a new behavior?
This is not what I've been looking at. I haven't seen this before. It should be easy to track down, though -- from the logs it looks like the logging exit function is trying to log something, which is calling the logging exit function, and so on.
Trying to write to a file when we're out of disk will trigger exitLocked, but exitLocked tries to write to its file one last time in order to help users understand why the process is exiting. This is very valuable most of the time, when the problem isn't that the machine is out of disk, but it shouldn't cause a stack overflow when the machine is out of space.

Fixes cockroachdb#21756

Release note (bug fix): fix a stack overflow in the code for shutting down a server when out of disk space
BUG REPORT
Please supply the header (i.e. the first few lines) of your most recent
log file for each node in your cluster. On most unix-based systems
running with defaults, this boils down to the output of
grep -F '[config]' cockroach-data/logs/cockroach.log
When log files are not available, supply the output of
cockroach version
and all flags/environment variables passed to
cockroach start
instead.
What did you do?
I have a 3-node cluster.
I filled up the 80 GB partition holding the CockroachDB store directory on the 1st node.
Requests were succeeding.
I then started filling up the same partition on the 2nd node, but when I was halfway there I sent another request to node 1 and found that it took a very long time to respond.
I checked the logs and found that the process appeared to be logging in a tight loop.
After a few seconds, the process crashed; the logs and stack trace are attached.
It looks like it tries to log an error when it fails to write a log.
What did you expect to see?
I expected CockroachDB to exit on ENOSPC.
What did you see instead?
Wild logging followed by a crash.
logs.txt