Possible sql_catalog corruption due to unclean backuped tail #4036

Closed
monetdb-team opened this issue Nov 30, 2020 · 0 comments

monetdb-team commented Nov 30, 2020

Date: 2016-07-07 14:46:13 +0200
From: anthonin.bonnefoy
To: GDK devs <>
Version: 11.21.5 (Jul2015)

Last updated: 2016-07-22 09:56:12 +0200

Comment 22240

Date: 2016-07-07 14:46:13 +0200
From: anthonin.bonnefoy

User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0
Build Identifier:

sql_catalog_nme and sql_catalog_bid can be backed up as a result of a GDKupgradevarheap call.
On commit, they can be moved (logger_switch_bat, bm_subcommit); however, the backed-up tail is never cleaned up.
After enough commits, they can grab an old BAT id that still has an old backup.
If a restart occurs at that moment, the backup is restored and a discrepancy will (probably) occur between BBP.dir and the real size.
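The failure sequence above can be mimicked with plain files. This is a toy model under stated assumptions, not MonetDB code: the directory layout, the BAT id `42`, and the helpers `upgrade_heap`, `subcommit_buggy` and `restart_restore` are all hypothetical, and the real GDK restore logic is more involved. It only shows why a leftover BACKUP/<bid>.tail is dangerous once the id is reused.

```shell
#!/bin/bash
# Toy model of the stale-backup hazard. Hypothetical layout and helper names;
# this is NOT the actual GDK implementation.
set -euo pipefail

farm="$(mktemp -d)"
trap 'rm -rf "$farm"' EXIT
mkdir -p "$farm/bat/BACKUP"

bid=42   # hypothetical BAT id

# GDKupgradevarheap-like step: save the current tail into BACKUP before changing it.
upgrade_heap() {
    printf 'old catalog tail\n' > "$farm/bat/$1.tail"
    cp "$farm/bat/$1.tail" "$farm/bat/BACKUP/$1.tail"
}

# Buggy subcommit: the BAT's file goes away, but its BACKUP copy is ignored.
subcommit_buggy() {
    rm -f "$farm/bat/$1.tail"
}

# Restart: any tail file still in BACKUP is treated as valid and restored.
restart_restore() {
    for f in "$farm"/bat/BACKUP/*.tail; do
        [[ -e $f ]] || continue   # unmatched glob stays literal; skip it
        mv "$f" "$farm/bat/$(basename "$f")"
    done
}

upgrade_heap "$bid"
subcommit_buggy "$bid"

# Later, the same id is reused for a different BAT with new content.
printf 'new catalog tail\n' > "$farm/bat/$bid.tail"

restart_restore
cat "$farm/bat/$bid.tail"
```

Running it prints `old catalog tail`: the restore step overwrites the reused id's new data with the stale backup, the kind of mismatch that ends up as the BBP.dir/size discrepancy described above.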

Reproducible: Always

Steps to Reproduce:

  1. Start a farm, preferably on a tmpfs
    mkdir /tmp/farm
    sudo mount -t tmpfs -o size=10G tmpfs /tmp/farm
    monetdbd create /tmp/farm
    monetdbd start /tmp/farm
    monetdb create toto
    monetdb release toto

  2. Lower the commit interval for faster testing, in sql/storage/store.c:

    -           for (t = 30000; t > 0 && !need_flush; t -= timeout) {
    +           for (t = 2000; t > 0 && !need_flush; t -= timeout) {
  3. Run this script:

#!/bin/bash
set -e

seq 5000 | awk 'BEGIN { OFS = "\t" } ; { print $1,$1,$1,$1,$1,$1,$1,$1,$1,$1 }' > /tmp/data.csv

base="/tmp/farm/toto/"
mclient toto -s "create schema sact;" 2> /dev/null || true

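# Look up the BAT's file in the backed-up BBP.dir; if a backup tail for it
# exists, kill mserver5 so the farm is left with that backup in place.
check_bat()
{
    local name="$1"
    local bid="$(grep "$name" "$base/bat/BACKUP/BBP.dir" | awk '{print $5}' | xargs basename)"
    if [[ -f $base/bat/BACKUP/${bid}.tail ]]; then
        echo "Got $bid for $name"
        pkill mserver5
        exit 0
    fi
}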

for (( i = 0; i < 100000; i++ )); do

 table="sact.table_${i}"

 create_query="CREATE TABLE $table (toto TEXT, toto2 TEXT, toto3 TEXT, toto4 TEXT, toto5 TEXT, toto6 TEXT, toto7 TEXT, toto8 TEXT, toto9 TEXT, toto10 TEXT);"

 mclient toto -s "$create_query" > /dev/null

 mclient toto -s "COPY INTO $table FROM '/tmp/data.csv' DELIMITERS '\t','\n','\"' NULL AS '<NULL>';" > /dev/null
 mclient toto -s "drop table $table;" > /dev/null

 check_bat "sql_catalog_bid"
 check_bat "sql_catalog_nme"

done

The script kills mserver5 while an invalid backup is present. On restart, mserver5 crashes with "!FATAL: logger_load: inconsistent database, catalog does not exist".
It might be necessary to run the script several times (I usually hit the problem after 2 or 3 runs).

Comment 22241

Date: 2016-07-08 15:51:29 +0200
From: MonetDB Mercurial Repository <>

Changeset 5ea6939edb61, made by Sjoerd Mullender <sjoerd@acm.org> in the MonetDB repo, refers to this bug.

For complete details, see http://dev.monetdb.org/hg/MonetDB?cmd=changeset;node=5ea6939edb61

Changeset description:

Fix for bug #4036.
GDKupgradevarheap creates a backup for persistent BATs in the BACKUP
directory.  If the BAT is then made transient in the same transaction,
the subcommit that does the work ignored the BACKUP file for the BAT,
so it remained, causing havoc later when the BAT ID was reused.  Now,
subcommit will move the file to the SUBCOMMIT directory which gets
renamed and then deleted when the transaction commits.
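The fix described in the changeset can be sketched with the same toy model. This is a simplified illustration, not the code from changeset 5ea6939edb61: the helper names `subcommit_fixed` and `commit_transaction`, the BAT id `42`, and the name the SUBCOMMIT directory is renamed to (`DELETE_ME` here) are assumptions. It shows the intended effect: once the transaction commits, no backup for the BAT id survives to be restored later.

```shell
#!/bin/bash
# Toy model of the fix. Hypothetical helper names; the DELETE_ME directory
# name is an assumption, not taken from the changeset.
set -euo pipefail

farm="$(mktemp -d)"
trap 'rm -rf "$farm"' EXIT
mkdir -p "$farm/bat/BACKUP"

bid=42   # hypothetical BAT id

# Backup created earlier by the heap upgrade.
printf 'old catalog tail\n' > "$farm/bat/BACKUP/$bid.tail"

# Fixed subcommit: instead of ignoring the backup, move it into SUBCOMMIT.
subcommit_fixed() {
    mkdir -p "$farm/bat/SUBCOMMIT"
    if [[ -e "$farm/bat/BACKUP/$1.tail" ]]; then
        mv "$farm/bat/BACKUP/$1.tail" "$farm/bat/SUBCOMMIT/$1.tail"
    fi
}

# Commit: SUBCOMMIT is renamed out of the way, then deleted.
commit_transaction() {
    mv "$farm/bat/SUBCOMMIT" "$farm/bat/DELETE_ME"
    rm -rf "$farm/bat/DELETE_ME"
}

subcommit_fixed "$bid"
commit_transaction

# The id can now be reused: no stale backup is left for a restart to restore.
printf 'new catalog tail\n' > "$farm/bat/$bid.tail"
if [[ -e "$farm/bat/BACKUP/$bid.tail" ]]; then
    echo "stale backup present"
else
    echo "no stale backup"
fi
```

Running it prints `no stale backup`. Renaming the directory before deleting it means a crash mid-cleanup leaves a directory that is clearly marked as obsolete rather than one that a restart might mistake for live backups.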

Comment 22242

Date: 2016-07-08 15:52:35 +0200
From: @sjoerdmullender

This was a great bug report. Thanks!

It took me a while, but I was able to nail the problem and implement a fix.
