Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Persistent hashes stored and then ignored. Storage info not in sync with actual indices. #6492

Closed
monetdb-team opened this issue Nov 30, 2020 · 0 comments

Comments

@monetdb-team
Copy link

@monetdb-team monetdb-team commented Nov 30, 2020

Date: 2017-12-11 16:36:48 +0100
From: @swingbit
To: GDK devs <>
Version: 11.27.9 (Jul2017-SP2)

Last updated: 2018-02-12 16:12:05 +0100

Comment 25982

Date: 2017-12-11 16:36:48 +0100
From: @swingbit

User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.62 Safari/537.36
Build Identifier:

Source code compiled with

define PERSISTENTHASH 1
undef DISABLE_PARENT_HASH

Working on this table (any table with many strings will do)
sql>\d bva_obj_dict
CREATE TABLE "spinque"."bva_obj_dict" (
"id" INTEGER NOT NULL,
"idstr" CHARACTER LARGE OBJECT NOT NULL,
"prob" FLOAT(51) NOT NULL
);

Moreover, the table is marked as READ ONLY.

sql>select * from storage where "table"='bva_obj_dict' and column='idstr';
+---------+--------------+--------+------+----------+----------+----------+-----------+------------+-----------+--------+-------+----------+--------+-----------+--------+----------+
| schema | table | column | type | mode | location | count | typewidth | columnsize | heapsize | hashes | phash | imprints | sorted | revsorted | unique | orderidx |
+=========+==============+========+======+==========+==========+==========+===========+============+===========+========+=======+==========+========+===========+========+==========+
| spinque | bva_obj_dict | idstr | clob | readonly | 12/1271 | 14414658 | 58 | 57658632 | 464470386 | 0 | false | 0 | false | false | null | 0 |
+---------+--------------+--------+------+----------+----------+----------+-----------+------------+-----------+--------+-------+----------+--------+-----------+--------+----------+

If I run:

SELECT * from bva_obj_dict where idstr = 'something';

this creates a hash for column idstr. It takes 8 seconds, and here I can see my hash:

sql>select * from storage where "table"='bva_obj_dict' and column='idstr';
+---------+--------------+--------+------+----------+----------+----------+-----------+------------+-----------+-----------+-------+----------+--------+-----------+--------+----------+
| schema | table | column | type | mode | location | count | typewidth | columnsize | heapsize | hashes | phash | imprints | sorted | revsorted | unique | orderidx |
+=========+==============+========+======+==========+==========+==========+===========+============+===========+===========+=======+==========+========+===========+========+==========+
| spinque | bva_obj_dict | idstr | clob | readonly | 12/1271 | 14414658 | 58 | 57658632 | 464470386 | 124780584 | false | 0 | false | false | null | 0 |
+---------+--------------+--------+------+----------+----------+----------+-----------+------------+-----------+-----------+-------+----------+--------+-----------+--------+----------+

However, this hash is not persistent and is gone after a few seconds.

The issue is that it should have been persistent, as I compiled with PERSISTENDHASH = 1 (and I'm not even working on a view).

I followed the hash creation with gdb, it actually DOES get into BAThashsync and persist it.
I can even see the file on my filesystem:

$ ll bat/12/1271*
-rw-------. 1 roberto roberto 57671680 Nov 12 12:39 bat/12/1271.tail
-rw-------. 1 roberto roberto 124846080 Dec 11 16:27 bat/12/1271.thash
-rw-------. 1 roberto roberto 464846848 Dec 11 15:45 bat/12/1271.theap

However it seems to be forgotten.
Next time the hash is needed, it is recreated and persisted again.

Side-question: why is the modification time of the heap so recent? That table has been created on Nov 12, made READ ONLY and never modified again. Can the heap file have been modified?

Reproducible: Always

$ mserver5 --version
MonetDB 5 server v11.27.12 (64-bit, 128-bit integers)
This is an unreleased version
Copyright (c) 1993-July 2008 CWI
Copyright (c) August 2008-2017 MonetDB B.V., all rights reserved
Visit https://www.monetdb.org/ for further information
Found 15.6GiB available memory, 8 available cpu cores
Libraries:
libpcre: 8.41 2017-07-05 (compiled with 8.41)
openssl: OpenSSL 1.1.0g 2 Nov 2017 (compiled with OpenSSL 1.1.0g-fips 2 Nov 2017)
libxml2: 2.9.4 (compiled with 2.9.4)
Compiled by: roberto@photon.hq.spinque.com (x86_64-unknown-linux-gnu)
Compilation: gcc -g -Werror -Wall -Wextra -W -Werror-implicit-function-declaration -Wpointer-arith -Wdeclaration-after-statement -Wundef -Wformat=2 -Wno-format-nonliteral -Winit-self -Winvalid-pch -Wmissing-declarations -Wmissing-format-attribute -Wmissing-prototypes -Wold-style-definition -Wpacked -Wunknown-pragmas -Wvariadic-macros -fstack-protector-all -Wstack-protector -Wpacked-bitfield-compat -Wsync-nand -Wjump-misses-init -Wmissing-include-dirs -Wlogical-op -Wduplicated-cond -Wduplicated-branches -Wrestrict -Wnested-externs -Wunreachable-code
Linking : /usr/bin/ld -m elf_x86_64

Comment 25983

Date: 2017-12-11 16:38:40 +0100
From: @swingbit

And, for what is worth, this is with sequential_pipe

Comment 25984

Date: 2017-12-11 20:59:19 +0100
From: @swingbit

The same thing happens with ORDERED INDEX.

I create an index, it shows in storage (its size), and after a few seconds it's gone.

Comment 25985

Date: 2017-12-11 21:12:54 +0100
From: @swingbit

Correction. It is not "after a few seconds", but when the mclient session is closed.

sql>\d x
CREATE TABLE "sys"."x" (
"idstr" CHARACTER LARGE OBJECT,
CONSTRAINT "x_idstr_unique" UNIQUE ("idstr")
);

sql>select column,"unique",orderidx from storage where "table"='x' and column='idstr';
+--------+--------+----------+
| column | unique | orderidx |
+========+========+==========+
| idstr | null | 0 |
+--------+--------+----------+
1 tuple (7.360ms)
sql>create ordered index idx on x(idstr);
operation successful (230.850ms)

sql>select column,"unique",orderidx from storage where "table"='x' and column='idstr';
+--------+--------+----------+
| column | unique | orderidx |
+========+========+==========+
| idstr | null | 3177888 |
+--------+--------+----------+
1 tuple (12.909ms)

sql>select column,"unique",orderidx from storage where "table"='x' and column='idstr';
+--------+--------+----------+
| column | unique | orderidx |
+========+========+==========+
| idstr | null | 3177888 |
+--------+--------+----------+
1 tuple (10.924ms)

$ mclient ...

Welcome to mclient, the MonetDB/SQL interactive terminal (unreleased)
Database: MonetDB v11.27.12 (unreleased), 'mapi:monetdb://photon.hq.spinque.com:55500/pacombi01'
Type \q to quit, ? for a list of available commands
auto commit mode: on
sql>select column,"unique",orderidx from storage where "table"='x' and column='idstr';
+--------+--------+----------+
| column | unique | orderidx |
+========+========+==========+
| idstr | null | 0 |
+--------+--------+----------+
1 tuple (15.250ms)
sql>

Also: why is column "unique" from storage always null (even after running ANALYZE), give that this column has UNIQUE CONSTRAINT?

Comment 25987

Date: 2017-12-12 10:07:31 +0100
From: @sjoerdmullender

(In reply to Roberto Cornacchia from comment 0)

Side-question: why is the modification time of the heap so recent? That
table has been created on Nov 12, made READ ONLY and never modified again.
Can the heap file have been modified?

When the server starts up, it has no way of knowing in what state the database was left when the previous session ended. When you were to append data to a table, the SQL layer might well append the data to the bats and also write in the write-ahead log. If that transaction was never finished due to a crash, the bat was appended to, but the count remained as it was. For fixed-sized data types that's totally fine. For variable-sized data, the offset heap is also fine, but the data heap may have been changed in a way that is not consistent with the bat size. In particular, in string heaps, the hash table at the start of the string heap may have been modified. So at startup, the variable-sized heaps are repaired (a fairly simple operation). This involves rewriting the hash table. In your case, the repair code recognizes that nothing was in fact modified (the newly created hash table is identical to what it was), so it doesn't explicitly write the changes back to disk. However, since the heap is most likely memory-mapped and the repair code did rewrite the hash table, the OS does write the hash table back. This causes the update in the modification time.

Comment 25988

Date: 2017-12-12 10:19:28 +0100
From: @sjoerdmullender

(In reply to Roberto Cornacchia from comment 3)

Correction. It is not "after a few seconds", but when the mclient session is
closed.

sql>\d x
CREATE TABLE "sys"."x" (
"idstr" CHARACTER LARGE OBJECT,
CONSTRAINT "x_idstr_unique" UNIQUE ("idstr")
);

sql>select column,"unique",orderidx from storage where "table"='x' and
column='idstr';
+--------+--------+----------+
| column | unique | orderidx |
+========+========+==========+
| idstr | null | 0 |
+--------+--------+----------+
1 tuple (7.360ms)
sql>create ordered index idx on x(idstr);
operation successful (230.850ms)

sql>select column,"unique",orderidx from storage where "table"='x' and
column='idstr';
+--------+--------+----------+
| column | unique | orderidx |
+========+========+==========+
| idstr | null | 3177888 |
+--------+--------+----------+
1 tuple (12.909ms)

sql>select column,"unique",orderidx from storage where "table"='x' and
column='idstr';
+--------+--------+----------+
| column | unique | orderidx |
+========+========+==========+
| idstr | null | 3177888 |
+--------+--------+----------+
1 tuple (10.924ms)

$ mclient ...

Welcome to mclient, the MonetDB/SQL interactive terminal (unreleased)
Database: MonetDB v11.27.12 (unreleased),
'mapi:monetdb://photon.hq.spinque.com:55500/pacombi01'
Type \q to quit, ? for a list of available commands
auto commit mode: on
sql>select column,"unique",orderidx from storage where "table"='x' and
column='idstr';
+--------+--------+----------+
| column | unique | orderidx |
+========+========+==========+
| idstr | null | 0 |
+--------+--------+----------+
1 tuple (15.250ms)
sql>

Also: why is column "unique" from storage always null (even after running
ANALYZE), give that this column has UNIQUE CONSTRAINT?

This may well be a red herring. After client disconnect, the bats are most likely unloaded, including the index heaps. When you then ask about this information again, the server doesn't see a loaded index heap, so it doesn't report a size for it. However, when the server (GDK layer) actually wants to use the ordered index, it checks for its existence and loads it if it is (still) valid. Try a point select. After this, when you look in sys.storage you'll find the ordered index again.

By the way, you can also use sys.storage('sys','bva_obj_dict', 'idstr').

The UNIQUE constraint doesn't say anything about NULL: you can still have multiple NULLs in the column, so the UNIQUE constraint doesn't map to the key property (called unique by sys.storage). The key property reflects the knowledge the GDK layer has about the uniqueness of the column, and in this case, it just doesn't know anything yet.

Comment 25989

Date: 2017-12-12 10:49:29 +0100
From: MonetDB Mercurial Repository <>

Changeset 394dbcf8bb99 made by Sjoerd Mullender sjoerd@acm.org in the MonetDB repo, refers to this bug.

For complete details, see https//devmonetdborg/hg/MonetDB?cmd=changeset;node=394dbcf8bb99

Changeset description:

Don't ignore persisted hash tables.
This fixes bug #6492.

Comment 25991

Date: 2017-12-12 11:25:40 +0100
From: @swingbit

Hi Sjoerd,

About the persistent hash: thanks for the fix.

About the ordered index: I had the impression it was related to the persistent hash issue, but apparently I was wrong. Thanks for the explanation, I see why it happens. I would argue it is still something to fix because the catalogue not reflecting the actual state of the db can be very confusing. Especially because \d tells you that an ordered index exists, while the catalogue says the opposite. But the index works indeed, so it is definitely lower priority than I had thought.

About the unique column in the storage table: you are right that there was no NOT NULL constraint. However, even with both UNIQUE and NOT NULL constraints, and after running ANALYZE, the "unique" column in storage still shows null. Another non-critical but to-be-fixed point?

As a side note, I actually wasn't sure how NULLs would be treated with a UNIQUE constraint. MonetDB allows them through, following the principle that two NULLs are not equal. That is, as far as I understand the behaviour described in SQL standard. Other vendors, including Microsoft and Oracle, follow this principle in most operations, but not in the unique constraint, allowing at most one NULL. That is pretty ugly and confusing - I'm glad MonetDB cares about consistency.

Comment 25992

Date: 2017-12-12 11:26:28 +0100
From: @swingbit

Oh and thanks for the detailed explanation about the heap file modification time.

Comment 25993

Date: 2017-12-12 14:45:09 +0100
From: @sjoerdmullender

(In reply to Roberto Cornacchia from comment 8)

Oh and thanks for the detailed explanation about the heap file modification
time.

That's also fixed.

Comment 25994

Date: 2017-12-12 14:59:47 +0100
From: @sjoerdmullender

(In reply to Roberto Cornacchia from comment 7)

Hi Sjoerd,

About the persistent hash: thanks for the fix.

About the ordered index: I had the impression it was related to the
persistent hash issue, but apparently I was wrong. Thanks for the
explanation, I see why it happens. I would argue it is still something to
fix because the catalogue not reflecting the actual state of the db can be
very confusing. Especially because \d tells you that an ordered index
exists, while the catalogue says the opposite. But the index works indeed,
so it is definitely lower priority than I had thought.

Now that ordered indexes are (also) created in other cases than just when you use CREATE ORDERED INDEX, and also since they are dropped when the column changes, the \d doesn't reflect whether an ordered index actually exists. We should probably recreate the index after the column was modified if it was created using the CREATE ORDERED INDEX statement.

It would be nice if sys.storage would at least acknoledge the indexes when they're not yet loaded. The question is, should it check whether the unloaded indexes are actually valid (that happens when we want to use it)?

About the unique column in the storage table: you are right that there was
no NOT NULL constraint. However, even with both UNIQUE and NOT NULL
constraints, and after running ANALYZE, the "unique" column in storage still
shows null. Another non-critical but to-be-fixed point?

Even when UNIQUE and NOT NULL are both in effect, there can still be duplicate values in the BAT. Namely ones of which all duplicates were deleted at the SQL level (but not at the BAT level since we don't delete).

As a side note, I actually wasn't sure how NULLs would be treated with a
UNIQUE constraint. MonetDB allows them through, following the principle that
two NULLs are not equal. That is, as far as I understand the behaviour
described in SQL standard. Other vendors, including Microsoft and Oracle,
follow this principle in most operations, but not in the unique constraint,
allowing at most one NULL. That is pretty ugly and confusing - I'm glad
MonetDB cares about consistency.

Anyway, I'm closing this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
1 participant