missing BAT for a column leads to crash in gtr_update_delta #3647

Closed
monetdb-team opened this issue Nov 30, 2020 · 0 comments

Date: 2015-01-10 00:21:37 +0100
From: sorear
To: SQL devs <>
Version: 11.19.7 (Oct2014-SP1)
CC: @njnes

Last updated: 2015-05-07 12:37:43 +0200

Comment 20551

Date: 2015-01-10 00:21:37 +0100
From: sorear

User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36

One of our MonetDB servers recently entered a state where a particular column existed in sys._columns but was missing from sql_catalog_nme. This led to several crashes, as follows. create_col (sql/storage/bat/bat_storage.c:965) returns LOG_ERROR and creates an invalid (?) sql_delta record with a zero BAT id. load_column (sql/storage/store.c:517; line numbers for Oct2014SP1) does not check this error return, and the server starts normally.
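To illustrate the pattern described above, here is a minimal, self-contained sketch. It is not the MonetDB source: every name in it (sk_column, sk_delta, sk_catalog_find, SK_OK, SK_ERR, and the two loaders) is hypothetical, and the catalog is simulated with a fixed list of names. It only shows the difference between a loader that ignores the creation error and one that propagates it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; not the MonetDB definitions. */
enum { SK_OK = 0, SK_ERR = -1 };

typedef struct {
    int bat_id;              /* 0 means "no BAT attached" */
} sk_delta;

typedef struct {
    const char *name;
    sk_delta delta;
    int usable;              /* set once the column is considered loaded */
} sk_column;

/* Simulated catalog lookup: returns a non-zero id, or 0 if the name is missing. */
static int sk_catalog_find(const char *name)
{
    static const char *known[] = { "t_a", "t_b" };
    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++)
        if (strcmp(known[i], name) == 0)
            return (int)i + 1;
    return 0;
}

/* Attaches the BAT id to the column; reports an error when none is found. */
static int sk_create_col(sk_column *c)
{
    c->delta.bat_id = sk_catalog_find(c->name);
    return c->delta.bat_id != 0 ? SK_OK : SK_ERR;
}

/* The pattern from the report: the error is dropped and the column looks fine. */
static void sk_load_column_unchecked(sk_column *c)
{
    (void)sk_create_col(c);
    c->usable = 1;
}

/* A checked variant: the failure is visible to the caller at load time. */
static int sk_load_column_checked(sk_column *c)
{
    if (sk_create_col(c) != SK_OK) {
        c->usable = 0;
        return SK_ERR;
    }
    assert(c->delta.bat_id != 0);  /* invariant after a successful create */
    c->usable = 1;
    return SK_OK;
}
```

With the unchecked loader, a column named "t_missing" ends up marked usable while its BAT id is still 0, which is the state that later code trips over. The checked loader returns SK_ERR instead, so the problem surfaces at startup rather than on first access.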

However, any attempt to read the batless column results in a server segfault, and (much more problematically) all checkpoints will crash, since gtr_update walks over all columns and gtr_update_delta (./sql/storage/bat/bat_storage.c:1475) does not check whether the inserts-BAT actually exists before checking to see if it has elements.
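The missing check can be sketched as follows. Again this is an assumption-laden illustration rather than the actual gtr_update_delta code: ck_bat, ck_delta, ck_pending_inserts and ck_checkpoint are hypothetical names, and "flushing" is reduced to resetting a counter. The point is only that the walk tests for the existence of the inserts-BAT before asking how many elements it holds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the MonetDB definitions. */
typedef struct {
    size_t count;            /* number of pending inserted rows */
} ck_bat;

typedef struct {
    ck_bat *inserts;         /* NULL models a column whose BAT is missing */
} ck_delta;

/* Returns the number of pending inserts, or -1 if the delta has no BAT. */
static long ck_pending_inserts(const ck_delta *d)
{
    if (d == NULL || d->inserts == NULL)
        return -1;           /* guard: never dereference a missing BAT */
    return (long)d->inserts->count;
}

/*
 * Walks all columns, "flushes" the ones with pending inserts, and skips
 * columns without a BAT. Returns how many columns were skipped as broken;
 * the total number of flushed rows is written to *flushed.
 */
static size_t ck_checkpoint(ck_delta *cols, size_t n, size_t *flushed)
{
    size_t broken = 0;

    assert(flushed != NULL);
    *flushed = 0;
    for (size_t i = 0; i < n; i++) {
        long pending = ck_pending_inserts(&cols[i]);
        if (pending < 0) {   /* missing BAT: report it instead of crashing */
            broken++;
            continue;
        }
        if (pending > 0) {
            *flushed += (size_t)pending;
            cols[i].inserts->count = 0;
        }
    }
    return broken;
}
```

Whether a real checkpoint should skip such a column, abort the checkpoint, or refuse to start at all is a design decision this sketch does not settle; it only shows that a single broken column need not take down the whole walk.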

After crashing, MonetDB reloads the same invalid state, and crashes in the same way on the next attempt to checkpoint (translating into crashes every 30 seconds triggered by the store_update timer).

I was able to get the system back into a stable state by running DROP COLUMN on the offending columns.

Detailed instructions for corrupting a MonetDB database in this way with gdb can be provided on request.

I do not currently understand how the corruption was created in the first place; if I can reproduce the corruption I will open a separate ticket for it. If it helps, the column in question had just been dropped and the transaction where the ALTER TABLE DROP COLUMN was run had committed successfully. There may have been an unrelated crash during the gtr_update run that was supposed to flush the DROP COLUMN which set this in motion.

Reproducible: Always

Comment 20575

Date: 2015-01-26 18:54:30 +0100
From: @njnes

The BAT should exist. A possible cause for losing the BAT was fixed.
