Occasional dbfarm corruption upon database restart #7152
Comments
This is a known problem that occasionally happens. Unfortunately, we have never had sufficient information to find its cause. If you can give us any more information, we're more than happy to investigate. A workaround (don't forget to create a backup of the current dbfarm first!) is to create the missing file and fill it with dummy data until it reaches the expected size. The BBP.dir file should tell you the data type. This way, you can at least get the database restarted and save the remaining data. |
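For anyone trying the workaround, here is a minimal sketch of the "fill it with dummy data" step in Python. It is not part of MonetDB: `pad_file` is a hypothetical helper, the zero-byte fill is an assumption (check the data type in BBP.dir to decide whether zeros are an acceptable dummy value), and the path and expected size must come from your own error message. Back up the dbfarm before running anything like this.

```python
import os


def pad_file(path, expected_size, chunk=1 << 20):
    """Create `path` if it is missing and append zero bytes until it is
    `expected_size` bytes long. Returns the number of bytes added.

    Hypothetical helper: the zero fill is an assumption, not something
    MonetDB prescribes.
    """
    current = os.path.getsize(path) if os.path.exists(path) else 0
    if current > expected_size:
        # Never truncate: a larger file is not the case this workaround covers.
        raise ValueError(f"{path} is already {current} bytes")
    remaining = expected_size - current
    with open(path, "ab") as f:
        # Write in chunks so very large files do not need one huge buffer.
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(b"\x00" * n)
            remaining -= n
    return expected_size - current


# Usage (placeholders, take the real values from the error in the log):
# pad_file("/path/to/dbfarm/<db>/bat/<missing file>", <expected size in bytes>)
```

The helper only ever appends, so it cannot damage a file that already holds data; the affected columns will still contain dummy values and should be treated as lost.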
Thanks Jennie. Just a thought, wouldn't it be useful to make the workaround automatic, and then inform the user that tables x,y,w are corrupt? |
Maybe it's not much, but something else I noticed:
|
Checked in a possible fix (rolled forward changes from the Jun branch). |
We made some fixes recently on the Jul2021 branch. Please check once the Jul2021-SP1 comes out if this still happens. |
Unfortunately this still happens (Jan2022, git head). A seemingly fine database was running on a system that somehow was leaking 11G of disk space. Then, when I tried to restart the db, it refused with:
So this is most likely the file that held those 11G. It had already been deleted from disk, but it was still open in MonetDB. I'm not sure what I can do to help debug this, but it is quite serious. |
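In case it helps others catch this state while the server is still running: on Linux, a file that has been deleted but is still held open shows up under /proc/<pid>/fd with a " (deleted)" suffix on its link target. The sketch below lists such files for a given process. It is Linux-specific, `deleted_open_files` is a hypothetical helper rather than a MonetDB tool, and you need permission to inspect the server process.

```python
import os


def deleted_open_files(pid):
    """Return (fd, link_target, size_in_bytes) for every file that process
    `pid` still holds open although it was deleted, largest first.

    Linux-only sketch: relies on /proc/<pid>/fd and the " (deleted)" suffix.
    """
    fd_dir = f"/proc/{pid}/fd"
    found = []
    for name in os.listdir(fd_dir):
        link = os.path.join(fd_dir, name)
        try:
            target = os.readlink(link)
            if not target.endswith(" (deleted)"):
                continue
            # stat() follows the /proc link to the open file itself,
            # so the size is still available after the unlink.
            size = os.stat(link).st_size
        except OSError:
            # The descriptor was closed in the meantime, or is not readable.
            continue
        found.append((int(name), target, size))
    return sorted(found, key=lambda entry: entry[2], reverse=True)


# Usage (the pid is a placeholder for the running server process):
# for fd, target, size in deleted_open_files(<pid>):
#     print(fd, size, target)
```

A large entry pointing into the dbfarm would match the symptom described above: disk space that is in use but not visible in any directory listing.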
When you notice this again on a still running database, could you attach a debugger and call
It would then be interesting to correlate the output with the files present in the database, so if you could also list all files inside the database at the same time (i.e. when the server is stopped in the debugger), and upload those two results, that would (hopefully) be helpful. |
Describe the bug
For the third time, on different databases, a properly shut down database would not restart, with the following errors found in merovingian.log:
Storage is local SSD, so I tend to rule out storage-related issues.
To Reproduce
Unfortunately I am not able to reproduce it reliably. I can only say it never happened before Oct2020, and it has now happened 3 times, so I guess there is a bug in the storage layer triggered by some corner case.
I know it's hard to find the cause without a test, I just hope this can ring a bell.
Software versions