Wrong answer from merge table after content changes #6374
Closed
Comments
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Date: 2017-07-17 14:07:00 +0200
From: Richard Hughes <<richard.monetdb>>
To: SQL devs <>
Version: 11.25.23 (Dec2016-SP5)
CC: @njnes, shawpolakcrax12
Last updated: 2019-06-07 09:47:18 +0200
Comment 25492
Date: 2017-07-17 14:07:00 +0200
From: Richard Hughes <<richard.monetdb>>
Build: default branch 8125f999c846 (2017-7-14)
Reproducible: always
sql>create table sub1 (i int);
operation successful (27.770ms)
sql>create table sub2 (i int);
operation successful (13.348ms)
sql>create merge table mt as select * from sub1 with no data;
operation successful (12.755ms)
sql>alter table mt add table sub1;
operation successful (5.060ms)
sql>alter table mt add table sub2;
operation successful (5.471ms)
sql>insert into sub1 values (1),(2);
2 affected rows (6.190ms)
sql>insert into sub2 values (11),(12);
2 affected rows (6.607ms)
sql>alter table sub1 set read only;
operation successful (6.666ms)
sql>alter table sub2 set read only;
operation successful (6.624ms)
sql>analyze sys.sub1;
sql>analyze sys.sub2;
sql>plan select count() from mt where i >= 5 and i < 100;
+------------------------------------------------------------------+
| rel |
+==================================================================+
| project ( |
| | group by ( |
| | | select ( |
| | | | table(sys.sub2) [ "mt"."i", "mt"."%TID%" NOT NULL ] COUNT |
| | | ) [ int "5" <= "mt"."i" < int "100" ] |
| | ) [ ] [ sys.count() NOT NULL as "L3"."L3" ] |
| ) [ "L3" NOT NULL as "L4"."L3" ] |
+------------------------------------------------------------------+
7 tuples (1.274ms)
sql>select count() from mt where i >= 5 and i < 100;
+------+
| L3 |
+======+
| 2 |
+------+
1 tuple (3.162ms)
sql>alter table sub1 set read write;
operation successful (13.473ms)
sql>insert into sub1 values (7),(8);
2 affected rows (6.402ms)
sql>alter table sub1 set read only;
operation successful (6.637ms)
sql>analyze sys.sub1;
sql>plan select count() from mt where i >= 5 and i < 100;
+------------------------------------------------------------------+
| rel |
+==================================================================+
| project ( |
| | group by ( |
| | | select ( |
| | | | table(sys.sub2) [ "mt"."i", "mt"."%TID%" NOT NULL ] COUNT |
| | | ) [ int "5" <= "mt"."i" < int "100" ] |
| | ) [ ] [ sys.count() NOT NULL as "L3"."L3" ] |
| ) [ "L3" NOT NULL as "L4"."L3" ] |
+------------------------------------------------------------------+
7 tuples (1.264ms)
sql>select count() from mt where i >= 5 and i < 100;
+------+
| L3 |
+======+
| 2 |
+------+
1 tuple (4.480ms)
sql>select T.name,S.* from sys.statistics S join sys.columns C on S.column_id=C.id join sys.tables T on T.id=C.table_id where C.name='i' and T.name like 'sub_';
+------+-----------+------+-------+----------------------------+--------+-------+--------+------+--------+--------+--------+-----------+
| name | column_id | type | width | stamp | sample | count | unique | nils | minval | maxval | sorted | revsorted |
+======+===========+======+=======+============================+========+=======+========+======+========+========+========+===========+
| sub1 | 8265 | int | 4 | 2017-07-17 11:15:28.000000 | 4 | 4 | 4 | 0 | 1 | 8 | true | false |
| sub2 | 8268 | int | 4 | 2017-07-17 11:15:28.000000 | 2 | 2 | 2 | 0 | 11 | 12 | true | false |
+------+-----------+------+-------+----------------------------+--------+-------+--------+------+--------+--------+--------+-----------+
2 tuples (66.370ms)
Notice that the final plan is wrong, resulting in the wrong count(*) immediately below. The plan should include 'sub1' and the count should be 4. The contents of sys.statistics is correct.
What's happening is that there's an in-memory cache of the statistics in sql_column::min/max. The following patch, which simply bodges out all usage of those values, confirms this by making the above test work:
diff -r 8125f999c846 sql/storage/store.c
--- a/sql/storage/store.c Fri Jul 14 16:22:22 2017 +0200
+++ b/sql/storage/store.c Mon Jul 17 12:28:45 2017 +0100
@@ -4724,11 +4724,6 @@
sql_schema *sys = find_sql_schema(tr, "sys");
sql_table *stats = find_sql_table(sys, "statistics");
I'm not sure what the right fix is, however I've looked at all the uses of sql_trans_ranges() and concluded that the cache isn't used enough to be important for performance for us. Removing it entirely, therefore, might not necessarily be wrong.
Comment 25522
Date: 2017-07-28 20:47:43 +0200
From: @njnes
added cleanup code in the sql_analyze call.
Comment 25524
Date: 2017-07-28 22:45:49 +0200
From: MonetDB Mercurial Repository <>
Changeset 6e732bac4683 made by Niels Nes niels@cwi.nl in the MonetDB repo, refers to this bug.
For complete details, see https//devmonetdborg/hg/MonetDB?cmd=changeset;node=6e732bac4683
Changeset description:
Comment 25530
Date: 2017-07-31 11:28:10 +0200
From: Richard Hughes <<richard.monetdb>>
This fix doesn't work, I'm afraid.
Running the original test across two connections, one doing all the writes and one read-only, still reproduces the problem:
=====connection 1======
create table sub1 (i int);
create table sub2 (i int);
create merge table mt as select * from sub1 with no data;
alter table mt add table sub1;
alter table mt add table sub2;
insert into sub1 values (1),(2);
insert into sub2 values (11),(12);
alter table sub1 set read only;
alter table sub2 set read only;
analyze sys.sub1;
analyze sys.sub2;
=====connection 2======
sql>plan select count() from mt where i >= 5 and i < 100;
+------------------------------------------------------------------+
| rel |
+==================================================================+
| project ( |
| | group by ( |
| | | select ( |
| | | | table(sys.sub2) [ "mt"."i", "mt"."%TID%" NOT NULL ] COUNT |
| | | ) [ int "5" <= "mt"."i" < int "100" ] |
| | ) [ ] [ sys.count() NOT NULL as "L3"."L3" ] |
| ) [ "L3" NOT NULL as "L4"."L3" ] |
+------------------------------------------------------------------+
7 tuples (1.488ms)
sql>select count() from mt where i >= 5 and i < 100;
+------+
| L3 |
+======+
| 2 |
+------+
1 tuple (3.514ms)
=====connection 1======
alter table sub1 set read write;
insert into sub1 values (7),(8);
alter table sub1 set read only;
analyze sys.sub1;
=====connection 2======
sql>plan select count() from mt where i >= 5 and i < 100;
+------------------------------------------------------------------+
| rel |
+==================================================================+
| project ( |
| | group by ( |
| | | select ( |
| | | | table(sys.sub2) [ "mt"."i", "mt"."%TID%" NOT NULL ] COUNT |
| | | ) [ int "5" <= "mt"."i" < int "100" ] |
| | ) [ ] [ sys.count() NOT NULL as "L3"."L3" ] |
| ) [ "L3" NOT NULL as "L4"."L3" ] |
+------------------------------------------------------------------+
7 tuples (1.322ms)
sql>select count() from mt where i >= 5 and i < 100;
+------+
| L3 |
+======+
| 2 |
+------+
1 tuple (3.568ms)
It looks to me like sql_analyze() is wiping out only its own copy of sql_column, not the one(s) belonging to other connections.
Comment 25650
Date: 2017-09-20 08:55:50 +0200
From: @njnes
the reset_column now also clears the cache on column changes.
The text was updated successfully, but these errors were encountered: