Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sys.reuse() corrupts data #3544

Closed
monetdb-team opened this issue Nov 30, 2020 · 0 comments
Closed

sys.reuse() corrupts data #3544

monetdb-team opened this issue Nov 30, 2020 · 0 comments

Comments

@monetdb-team
Copy link

@monetdb-team monetdb-team commented Nov 30, 2020

Date: 2014-08-18 15:27:16 +0200
From: Richard Hughes <<richard.monetdb>>
To: SQL devs <>
Version: 11.17.21 (Jan2014-SP3)
CC: @njnes

Last updated: 2014-10-31 14:14:56 +0100

Comment 20050

Date: 2014-08-18 15:27:16 +0200
From: Richard Hughes <<richard.monetdb>>

I know that vacuum/reuse/shrink are not officially supported, but I'm hoping that either a quick fix or an alternate way of accomplishing what we want is possible. We're not using foreign keys at all. A rationale for why I'm using unsupported functions is at the bottom of this bug report.

Test script:

START TRANSACTION;
CREATE TABLE "sys"."test_201306" (
"id" BIGINT,
"s1" VARCHAR(127),
"s2" VARCHAR(127),
"s3" VARCHAR(127),
"i1" BIGINT,
"s4" VARCHAR(127),
"s5" VARCHAR(127)
);
COPY 5 RECORDS INTO "sys"."test_201306" FROM stdin USING DELIMITERS '\t','\n','"';
4906 "1-4520.10.1.4.82" "Et8oB.Kf0vmo40-8yTpHAdf" "10.1.4.82" 78911881069979992 "" ""
4907 "1-4564.10.1.4.82" "vn5nB.Kf0vmo40-8yTpHAdf" "10.1.4.82" 78911881069979991 "" ""
4908 "1-4576.10.1.4.82" "sp3pB.Kf0vmo40-8yTpHAdf" "10.1.4.82" 78911881069979993 "" ""
4909 "1-2560.10.1.4.82" "p5dsV5B_3vmo40-8yTpHAdf" "10.1.4.82" 78911885093920348 "" ""
4910 "1-1596.10.1.4.82" "8Q1Z7RnE4vmo40-8yTpHAdf" "10.1.4.82" 78911885778309629 "" ""
COMMIT;
create table x as select * from test_201306 with data;
select id from x;
call sys.reuse('sys','x');
select id from x;
call sys.reuse('sys','x');
select * from x;

Expected results:

+------+
| id |
+======+
| 4906 |
| 4907 |
| 4908 |
| 4909 |
| 4910 |
+------+
5 tuples
+------+
| id |
+======+
| 4906 |
| 4907 |
| 4908 |
| 4909 |
| 4910 |
+------+
5 tuples
+------+--------------+---------------+-----------+-------------------+------+
| id | s1 | s2 | s3 | i1 | s4 |>
+======+==============+===============+===========+===================+======+
| 4906 | 1-4520.10.1. | Et8oB.Kf0vmo4 | 10.1.4.82 | 78911881069979992 | |
: : 4.82 : 0-8yTpHAdf : : : :
| 4907 | 1-4564.10.1. | vn5nB.Kf0vmo4 | 10.1.4.82 | 78911881069979991 | |
: : 4.82 : 0-8yTpHAdf : : : :
| 4908 | 1-4576.10.1. | sp3pB.Kf0vmo4 | 10.1.4.82 | 78911881069979993 | |
: : 4.82 : 0-8yTpHAdf : : : :
| 4909 | 1-2560.10.1. | p5dsV5B_3vmo4 | 10.1.4.82 | 78911885093920348 | |
: : 4.82 : 0-8yTpHAdf : : : :
| 4910 | 1-1596.10.1. | 8Q1Z7RnE4vmo4 | 10.1.4.82 | 78911885778309629 | |
: : 4.82 : 0-8yTpHAdf : : : :
+------+--------------+---------------+-----------+-------------------+------+
5 tuples !1 column dropped!

Actual results:

+------+
| id |
+======+
| 4906 |
| 4907 |
| 4908 |
| 4909 |
| 4910 |
+------+
5 tuples
+----------------------+
| id |
+======================+
| 4910 |
| 4907 |
| 4908 |
| 4909 |
| -2604246222170760229 |
+----------------------+
5 tuples
+----------------------+------------------+----------------------+-----------+
| id | s1 | s2 | s3 |>
+======================+==================+======================+===========+
| 4910 | 1-1596.10.1.4.82 | 8Q1Z7RnE4vmo40-8yTpH | 10.1.4.82 |
: : : Adf : :
| 4907 | 1-4564.10.1.4.82 | vn5nB.Kf0vmo40-8yTpH | 10.1.4.82 |
: : : Adf : :
| 4908 | 1-4576.10.1.4.82 | sp3pB.Kf0vmo40-8yTpH | 10.1.4.82 |
: : : Adf : :
| 4909 | 1-2560.10.1.4.82 | p5dsV5B_3vmo40-8yTpH | 10.1.4.82 |
: : : Adf : :
| -2604246222170760229 | | | |
+----------------------+------------------+----------------------+-----------+
5 tuples !3 columns dropped!

The conditions under which it goes wrong are extremely random. If you can't reproduce it then try adding or removing some columns or calling reuse() some more times. It always seems to be the last row of the table that is corrupted.

Happens with both Jan2014SP3 release and current Jan2014 HEAD.

Our rationale for using reuse() is twofold: we are replicating from a standard row-store into MonetDB for the obvious analytics performance reasons. Our database is mostly-insert but very occasionally a bunch of rows will need to be updated. We used to handle this with UPDATE statements but they were taking on the order of 100ms per row and were causing our replication to get embarrassingly far behind reality; the fix for this was to switch to DELETE+INSERT.

Having made that switch we noticed that performance would dive for queries on any table which had ever had a DELETE on it; the performance dive seemed to be associated with an inability to keep an index on that table.

Unrelated to the deletions, we also observed that some of our columns were taking up far more disk space than we expected. Notably, varchar columns which have a large number of distinct values (but not all-distinct) appear to switch into a mode whereby they would treat all values as distinct, and thereby consume the full amount of disk space required for that scenario. A call to sys.reuse() would rehash the values of these columns and hence reclaim several hundred GB of disk space.

Before jumping for the undocumented features we tried using "create table bar as select * from foo with data" to repack the table but, in addition to being considerably slower than reuse(), this did nothing to rehash varchar columns.

I sincerely apologize for my evilness, but hope that you can forgive me enough to look in to the problem in any case.

Comment 20054

Date: 2014-08-18 17:39:06 +0200
From: Richard Hughes <<richard.monetdb>>

I just crashed mserver5 (Jan2014 HEAD). I'm not sure whether it's related to this or not.

(gdb) bt
0 DELTAproject (result=0x7fe3933d6190, sub=,
col=, uid=, uval=0x7fe3933d6170,
ins=) at sql.c:2175
1 0x00007fe87c8a247b in runMALsequence (cntxt=0x9cc0, mb=0x7fe392f43b40,
startpc=0, stoppc=0, stk=0x7fe3933d48c0, env=0x7fe3d4096630,
pcicaller=0x0) at mal_interpreter.c:654
2 0x00007fe87c8a42c5 in DFLOWworker (T=0x9cc0) at mal_dataflow.c:358
3 0x00007fe87b2360a4 in start_thread (arg=0x7fe3d309f700)
at pthread_create.c:309
4 0x00007fe87af6b04d in clone ()
at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb) p res
$1 = (BAT *) 0x0

I was getting "BATproject: does not match always" from queries for a few minutes before this finally happened. The interesting thing was that I was getting that from queries on one table where sys.reuse() had been called recently and from queries on one table where sys.reuse() had never been called.

Comment 20055

Date: 2014-08-19 14:47:44 +0200
From: Richard Hughes <<richard.monetdb>>

Created attachment 295
Proposed fix

How's this for a fix?

Attached file: reuse-bug3544.patch (text/plain, 954 bytes)
Description: Proposed fix

Comment 20094

Date: 2014-08-26 08:31:46 +0200
From: @njnes

patch checked into oct2014 which solves this bug.

Comment 20397

Date: 2014-10-31 14:14:56 +0100
From: @sjoerdmullender

Oct2014 has been released.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
1 participant