Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extremely slow INSERT INTO <table> SELECT #7253

Closed
swingbit opened this issue Feb 18, 2022 · 36 comments
Closed

Extremely slow INSERT INTO <table> SELECT #7253

swingbit opened this issue Feb 18, 2022 · 36 comments
Assignees
Labels
bug
Milestone

Comments

@swingbit
Copy link

@swingbit swingbit commented Feb 18, 2022

Describe the bug
An INSERT INTO <table> SELECT .. statement that takes less than 1 second in 11.39 takes about 45 minutes in 11.43.

I could not yet make a reproducible script for the issue. I see it happening regularly on a production ETL.
The query is

INSERT INTO relations SELECT * FROM newlinks;

where:

sql> \d newlinks
CREATE VIEW newlinks AS
SELECT contract as subject, (SELECT id FROM dict WHERE idstr = 'supplier') as predicate, partner as object, CAST(1 as DOUBLE) as prob
FROM contract_partner;

sql>\d relations
CREATE TABLE "spinque"."relations" (
	"subject"   INTEGER,
	"predicate" INTEGER,
	"object"    INTEGER,
	"prob"      DOUBLE
);
ALTER TABLE "spinque"."relations" ALTER COLUMN "prob" SET DEFAULT 1.000000;

sql>\d contract_partner
CREATE TABLE "spinque"."contract_partner" (
	"cotract"   INTEGER,
	"partner" INTEGER,
);

What gdb says:

#0  HASHgetlink (i=143692107, h=0x7ffa800aec30) at /opt/monetdb/SpinqueMonetDB-11.43.202202171906/gdk/gdk_hash.h:137
#1  HASHdelete_locked (b=b@entry=0x7ffab8150100, p=p@entry=85696165, v=v@entry=0x7ff831007a94) at /opt/monetdb/SpinqueMonetDB-11.43.202202171906/gdk/gdk_hash.c:1314
#2  0x00007ffadd4ab8a9 in BATappend_or_update (b=b@entry=0x7ffab8150100, p=p@entry=0x7ffa8fe18cb0, positions=0x7ffa35d40ee8, positions@entry=0x0, n=n@entry=0x7ffa8fe1b050, mayappend=mayappend@entry=true, autoincr=<optimized out>, autoincr@entry=false, force=true) at /opt/monetdb/SpinqueMonetDB-11.43.202202171906/gdk/gdk_batop.c:1556
#3  0x00007ffadd4ae0ab in BATupdate (b=b@entry=0x7ffab8150100, p=p@entry=0x7ffa8fe18cb0, n=n@entry=0x7ffa8fe1b050, force=force@entry=true) at /opt/monetdb/SpinqueMonetDB-11.43.202202171906/gdk/gdk_batop.c:1622
#4  0x00007ffacee6230c in delta_append_bat (tr=tr@entry=0x7ffaa0213630, batp=batp@entry=0x7ffac47ac6a8, id=<optimized out>, offset=<optimized out>, offset@entry=0, offsets=offsets@entry=0x7ffa8fe18cb0, i=i@entry=0x7ffa8fe1b050, storage_type=0x0) at /opt/monetdb/SpinqueMonetDB-11.43.202202171906/sql/storage/bat/bat_storage.c:2074
#5  0x00007ffacee651f1 in append_col_execute (cnt=134430, tt=5, storage_type=<optimized out>, incoming_data=0x7ffa8fe1b050, offsets=0x7ffa8fe18cb0, offset=0, id=<optimized out>, delta=0x7ffac47ac6a8, tr=0x7ffaa0213630) at /opt/monetdb/SpinqueMonetDB-11.43.202202171906/sql/storage/bat/bat_storage.c:2183
#6  append_col_execute (storage_type=<optimized out>, tt=5, cnt=134430, incoming_data=0x7ffa8fe1b050, offsets=0x7ffa8fe18cb0, offset=0, id=<optimized out>, delta=0x7ffac47ac6a8, tr=0x7ffaa0213630) at /opt/monetdb/SpinqueMonetDB-11.43.202202171906/sql/storage/bat/bat_storage.c:2174
#7  append_col (tr=0x7ffaa0213630, c=0x7ffab814ffd0, offset=0, offsets=0x7ffa8fe18cb0, data=0x7ffa8fe1b050, cnt=134430, tpe=5) at /opt/monetdb/SpinqueMonetDB-11.43.202202171906/sql/storage/bat/bat_storage.c:2212
#8  0x00007ffaced80645 in mvc_append_wrap (cntxt=<optimized out>, mb=<optimized out>, stk=<optimized out>, pci=<optimized out>) at /opt/monetdb/SpinqueMonetDB-11.43.202202171906/sql/backends/monet5/sql.c:1784
#9  0x00007ffadd8f9ba3 in runMALsequence (cntxt=cntxt@entry=0x13a1dd0, mb=mb@entry=0x7ffaa0150090, startpc=startpc@entry=1, stoppc=stoppc@entry=0, stk=stk@entry=0x7ffa93a1ac30, env=env@entry=0x0, pcicaller=0x0) at /opt/monetdb/SpinqueMonetDB-11.43.202202171906/monetdb5/mal/mal_interpreter.c:656
#10 0x00007ffadd8fb59b in runMAL (cntxt=cntxt@entry=0x13a1dd0, mb=mb@entry=0x7ffaa0150090, mbcaller=mbcaller@entry=0x0, env=env@entry=0x0) at /opt/monetdb/SpinqueMonetDB-11.43.202202171906/monetdb5/mal/mal_interpreter.c:335
#11 0x00007ffaced8d85b in SQLrun (c=c@entry=0x13a1dd0, m=m@entry=0x7ffaa015c4f0) at /opt/monetdb/SpinqueMonetDB-11.43.202202171906/sql/backends/monet5/sql_execute.c:246
#12 0x00007ffaced8e79b in SQLengineIntern (c=0x13a1dd0, be=0x7ffaa0213730) at /opt/monetdb/SpinqueMonetDB-11.43.202202171906/sql/backends/monet5/sql_execute.c:699
#13 0x00007ffadd90b297 in runPhase (phase=4, c=0x13a1dd0) at /opt/monetdb/SpinqueMonetDB-11.43.202202171906/monetdb5/mal/mal_scenario.c:451
#14 runPhase (phase=4, c=0x13a1dd0) at /opt/monetdb/SpinqueMonetDB-11.43.202202171906/monetdb5/mal/mal_scenario.c:446
#15 runScenarioBody (once=<optimized out>, c=<optimized out>) at /opt/monetdb/SpinqueMonetDB-11.43.202202171906/monetdb5/mal/mal_scenario.c:477
#16 runScenario (c=c@entry=0x13a1dd0, once=once@entry=0) at /opt/monetdb/SpinqueMonetDB-11.43.202202171906/monetdb5/mal/mal_scenario.c:508
#17 0x00007ffadd90b7e2 in MSserveClient (c=c@entry=0x13a1dd0) at /opt/monetdb/SpinqueMonetDB-11.43.202202171906/monetdb5/mal/mal_session.c:531
#18 0x00007ffadd90bea7 in MSscheduleClient (command=command@entry=0x7ffaa010a5b0 "", challenge=challenge@entry=0x7ffac47acd83 "L7BcGu7iGu9", fin=0x7ffac80063e0, fout=fout@entry=0x7ffac8003de0, protocol=protocol@entry=PROTOCOL_9, blocksize=blocksize@entry=8190) at /opt/monetdb/SpinqueMonetDB-11.43.202202171906/monetdb5/mal/mal_session.c:388
#19 0x00007ffadd98c8ae in doChallenge (data=<optimized out>) at /opt/monetdb/SpinqueMonetDB-11.43.202202171906/monetdb5/modules/mal/mal_mapi.c:220
#20 0x00007ffadd4f98aa in THRstarter (a=0x7ffac80062c0) at /opt/monetdb/SpinqueMonetDB-11.43.202202171906/gdk/gdk_utils.c:1638
#21 0x00007ffadd539229 in thread_starter (arg=0x7ffac800c130) at /opt/monetdb/SpinqueMonetDB-11.43.202202171906/gdk/gdk_system.c:833
#22 0x00007ffadcedfa87 in start_thread (arg=<optimized out>) at pthread_create.c:435
#23 0x00007ffadcf638d4 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100

This is b:

(gdb) p *b
$2 = {hseqbase = 0, creator_tid = 65, batCacheid = 439, batCopiedtodisk = true, batDirtyflushed = false, batDirtydesc = false, batTransient = false, batRestricted = 1 '\001', batRole = PERSISTENT, unused = 0, batSharecnt = 0, batInserted = 221697766, batCount = 221697766, batCapacity = 232488960, T = {
    id = 0x7ffadd6d3aa3 "t", width = 4, type = 6 '\006', shift = 2 '\002', varsized = false, key = false, nonil = true, nil = false, sorted = false, revsorted = false, nokey = {0, 0}, nosorted = 0, norevsorted = 0, minpos = 9223372036854775807, maxpos = 9223372036854775807, unique_est = 30862161, 
    seq = 9223372036854775808, heap = 0x7ffab81502d0, baseoff = 0, vheap = 0x0, hash = 0x7ffa800aec30, imprints = 0x0, orderidx = 0x0, strimps = 0x0, props = 0x0}, theaplock = {lock = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, 
          __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, name = "heaplock439", '\000' <repeats 20 times>}, thashlock = {lock = {__data = {__readers = 3, __writers = 0, __wrphase_futex = 1, __writers_futex = 1, __pad3 = 0, __pad4 = 0, __cur_writer = 6208, __shared = 0, __rwelision = 0 '\000', 
        __pad1 = "\000\000\000\000\000\000", __pad2 = 0, __flags = 0}, __size = "\003\000\000\000\000\000\000\000\001\000\000\000\001", '\000' <repeats 11 times>, "@\030", '\000' <repeats 29 times>, __align = 3}, name = "hashlock439", '\000' <repeats 20 times>}, batIdxLock = {lock = {__data = {__lock = 0, 
        __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, name = "BATlock439", '\000' <repeats 21 times>}}

This is n:

(gdb) p *n
$4 = {hseqbase = 0, creator_tid = 38450, batCacheid = 515, batCopiedtodisk = false, batDirtyflushed = false, batDirtydesc = true, batTransient = true, batRestricted = 1 '\001', batRole = TRANSIENT, unused = 0, batSharecnt = 0, batInserted = 0, batCount = 134430, batCapacity = 134656, T = {id = 0x7ffadd6d3aa3 "t", 
    width = 4, type = 6 '\006', shift = 2 '\002', varsized = false, key = false, nonil = true, nil = false, sorted = true, revsorted = false, nokey = {0, 0}, nosorted = 0, norevsorted = 0, minpos = 9223372036854775807, maxpos = 9223372036854775807, unique_est = 0, seq = 9223372036854775808, heap = 0x7ffa8fe55fa0, 
    baseoff = 0, vheap = 0x0, hash = 0x0, imprints = 0x0, orderidx = 0x0, strimps = 0x0, props = 0x0}, theaplock = {lock = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, 
    name = "heaplock515", '\000' <repeats 20 times>}, thashlock = {lock = {__data = {__readers = 0, __writers = 0, __wrphase_futex = 0, __writers_futex = 0, __pad3 = 0, __pad4 = 0, __cur_writer = 0, __shared = 0, __rwelision = 0 '\000', __pad1 = "\000\000\000\000\000\000", __pad2 = 0, __flags = 0}, 
      __size = '\000' <repeats 55 times>, __align = 0}, name = "hashlock515", '\000' <repeats 20 times>}, batIdxLock = {lock = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, 
      __align = 0}, name = "BATlock515", '\000' <repeats 21 times>}}

This is p:

(gdb) p *p
$5 = {hseqbase = 0, creator_tid = 38450, batCacheid = 520, batCopiedtodisk = false, batDirtyflushed = false, batDirtydesc = true, batTransient = true, batRestricted = 1 '\001', batRole = TRANSIENT, unused = 0, batSharecnt = 0, batInserted = 0, batCount = 134430, batCapacity = 134656, T = {id = 0x7ffadd6d3aa3 "t", 
    width = 8, type = 7 '\a', shift = 3 '\003', varsized = false, key = true, nonil = true, nil = false, sorted = true, revsorted = false, nokey = {0, 0}, nosorted = 0, norevsorted = 0, minpos = 9223372036854775807, maxpos = 9223372036854775807, unique_est = 0, seq = 9223372036854775808, heap = 0x7ffa8fe4ff60, 
    baseoff = 0, vheap = 0x0, hash = 0x0, imprints = 0x0, orderidx = 0x0, strimps = 0x0, props = 0x0}, theaplock = {lock = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, 
    name = "heaplock520", '\000' <repeats 20 times>}, thashlock = {lock = {__data = {__readers = 0, __writers = 0, __wrphase_futex = 0, __writers_futex = 0, __pad3 = 0, __pad4 = 0, __cur_writer = 0, __shared = 0, __rwelision = 0 '\000', __pad1 = "\000\000\000\000\000\000", __pad2 = 0, __flags = 0}, 
      __size = '\000' <repeats 55 times>, __align = 0}, name = "hashlock520", '\000' <repeats 20 times>}, batIdxLock = {lock = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, 
      __align = 0}, name = "BATlock520", '\000' <repeats 21 times>}}

It seems that the time is spent appending the new values one by one, calling HASHdelete_locked() and HASHinsert_locked() every time.

I don't really understand why this operation doesn't just result in a simple BATappend().
What in the original query should trigger a row-at-a-time update?

Software versions

  • MonetDB version number 11.43.10
  • OS and version: Fedora 35
  • self-installed and compiled
@PedroTadim
Copy link
Contributor

@PedroTadim PedroTadim commented Feb 18, 2022

Two questions. Can you show the MAL plan? Is the insert also slow on version 11.41?

@sjoerdmullender
Copy link
Member

@sjoerdmullender sjoerdmullender commented Feb 18, 2022

Inserting data into a table nowadays does a sort of "auto vacuum". Unused slots (due to previous deletes) may get reused for new data. That is why we don't simply do an append, but do an update instead. The update may well end up appending the data if no slots were available.
Given that your stack trace shows HASHdelete_locked, I'm guessing hash maintenance is taking a lot of time. Perhaps we should simple delete the hash if too many values are updated. Then of course the question is, where is the boundary?

@swingbit
Copy link
Author

@swingbit swingbit commented Feb 18, 2022

Here's the trace (75 minutes this time):

sql>trace INSERT INTO relations SELECT * FROM newlinks;
134430 affected rows
+------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| usec       | statement                                                                                                                                                                                  |
+============+============================================================================================================================================================================================+
|          9 |     X_1=0@0:void := querylog.define("trace insert into \"relations\" select * from newlinks;":str, "sequential_pipe":str, 42:int);                                                        |
|          2 |     X_4=0:int := sql.mvc();                                                                                                                                                                |
|         29 |     C_5=[134430]:bat[:oid] := sql.tid(X_4=0:int, "spinque":str, "contract_partner":str);                                                                                                   |
|         21 |     X_8=[134430]:bat[:int] := sql.bind(X_4=0:int, "spinque":str, "contract_partner":str, "contract":str, 0:int);                                                                           |
|          7 |     X_11=[134430]:bat[:int] := sql.bind(X_4=0:int, "spinque":str, "contract_partner":str, "partner":str, 0:int);                                                                           |
|          6 |     C_13=[56645471]:bat[:oid] := sql.tid(X_4=0:int, "spinque":str, "globaldict":str);                                                                                                      |
|          8 |     X_15=[56645471]:bat[:str] := sql.bind(X_4=0:int, "spinque":str, "globaldict":str, "idstr":str, 0:int);                                                                                 |
|          6 |     (X_17=[0]:bat[:oid], X_18=[0]:bat[:str]) := sql.bind(X_4=0:int, "spinque":str, "globaldict":str, "idstr":str, 2:int);                                                                  |
|          2 |     X_20=[56645471]:bat[:str] := sql.delta(X_15=[56645471]:bat[:str], X_17=[0]:bat[:oid], X_18=[0]:bat[:str]);                                                                             |
|         10 |     X_21=[56645471]:bat[:int] := sql.bind(X_4=0:int, "spinque":str, "globaldict":str, "id":str, 0:int);                                                                                    |
|          5 |     (X_23=[0]:bat[:oid], X_24=[0]:bat[:int]) := sql.bind(X_4=0:int, "spinque":str, "globaldict":str, "id":str, 2:int);                                                                     |
|          6 |     X_26=[56645471]:bat[:int] := sql.delta(X_21=[56645471]:bat[:int], X_23=[0]:bat[:oid], X_24=[0]:bat[:int]);                                                                             |
|         38 |     C_29=[1]:bat[:oid] := algebra.thetaselect(X_20=[56645471]:bat[:str], C_13=[56645471]:bat[:oid], "supplier":str, "==":str); # hashselect on parent                                      |
|          9 |     X_32=[1]:bat[:int] := algebra.projection(C_29=[1]:bat[:oid], X_26=[56645471]:bat[:int]);                                                                                               |
|          4 |     X_33=[134430]:bat[:int] := algebra.projection(C_5=[134430]:bat[:oid], X_8=[134430]:bat[:int]);                                                                                         |
|          4 |     X_34=[134430]:bat[:int] := algebra.projection(C_5=[134430]:bat[:oid], X_11=[134430]:bat[:int]);                                                                                        |
|       2503 |     (X_35=[134430]:bat[:oid], X_36=[134430]:bat[:oid]) := algebra.crossproduct(X_33=[134430]:bat[:int], X_32=[1]:bat[:int], true:bit);                                                     |
|         13 |     C_38=[134430]:bat[:oid] := bat.mirror(X_33=[134430]:bat[:int]);                                                                                                                        |
|         80 |     C_39=[0]:bat[:oid] := algebra.difference(C_38=[134430]:bat[:oid], X_35=[134430]:bat[:oid], nil:BAT, nil:BAT, false:bit, false:bit, nil:lng); # leftjoin; mergejoin_void; select: dense |
|          9 |     X_45=[134430]:bat[:int] := algebra.projection(X_35=[134430]:bat[:oid], X_33=[134430]:bat[:int]);                                                                                       |
|          6 |     X_46=[0]:bat[:int] := algebra.projection(C_39=[0]:bat[:oid], X_33=[134430]:bat[:int]);                                                                                                 |
|        195 |     X_47=[134430]:bat[:int] := bat.append(X_45=[134430]:bat[:int], X_46=[0]:bat[:int], true:bit);                                                                                          |
|          6 |     X_48=[134430]:bat[:int] := algebra.projection(X_35=[134430]:bat[:oid], X_34=[134430]:bat[:int]);                                                                                       |
|          5 |     X_49=[0]:bat[:int] := algebra.projection(C_39=[0]:bat[:oid], X_34=[134430]:bat[:int]);                                                                                                 |
|        198 |     X_50=[134430]:bat[:int] := bat.append(X_48=[134430]:bat[:int], X_49=[0]:bat[:int], true:bit);                                                                                          |
|        391 |     X_51=[134430]:bat[:int] := algebra.projection(X_36=[134430]:bat[:oid], X_32=[1]:bat[:int]); # project1_int                                                                             |
|         11 |     X_54=[0]:bat[:int] := algebra.project(C_39=[0]:bat[:oid], nil:int);                                                                                                                    |
|        140 |     X_55=[134430]:bat[:int] := bat.append(X_51=[134430]:bat[:int], X_54=[0]:bat[:int], true:bit);                                                                                          |
|        172 |     X_59=[134430]:bat[:dbl] := algebra.project(X_47=[134430]:bat[:int], 1:dbl);                                                                                                            |
|          3 |     X_60=134430:lng := aggr.count(X_47=[134430]:bat[:int]);                                                                                                                                |
|     235699 |     (X_61=0@0:oid, X_62=[134430]:bat[:oid]) := sql.claim(X_4=0:int, "spinque":str, "relations":str, X_60=134430:lng);                                                                     |
|    2806401 |     X_66=0:int := sql.append(X_4=0:int, "spinque":str, "relations":str, "subject":str, X_61=0@0:oid, X_62=[134430]:bat[:oid], X_47=[134430]:bat[:int]);                                   |
|     155852 |     X_69=0:int := sql.append(X_4=0:int, "spinque":str, "relations":str, "predicate":str, X_61=0@0:oid, X_62=[134430]:bat[:oid], X_55=[134430]:bat[:int]);                                 |
| 4476792972 |     X_72=0:int := sql.append(X_4=0:int, "spinque":str, "relations":str, "object":str, X_61=0@0:oid, X_62=[134430]:bat[:oid], X_50=[134430]:bat[:int]);                                    |
|     274721 |     X_75=0:int := sql.append(X_4=0:int, "spinque":str, "relations":str, "prob":str, X_61=0@0:oid, X_62=[134430]:bat[:oid], X_59=[134430]:bat[:dbl]);                                      |
|         35 |     X_77=0@0:void := sql.depend("spinque":str, "relations":str, X_60=134430:lng);                                                                                                         |
|         32 |     X_79=0:int := sql.affectedRows(X_75=0:int, X_60=134430:lng);                                                                                                                           |
+------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
37 tuples

@swingbit
Copy link
Author

@swingbit swingbit commented Feb 18, 2022

@sjoerdmullender Indeed many rows were deleted from the receiving table.

@PedroTadim at the moment I don't have a 11.41 deployment where to test this and it takes hours to get to the state where this happens. So unless really necessary I would avoid that.

@sjoerdmullender
Copy link
Member

@sjoerdmullender sjoerdmullender commented Feb 18, 2022

It looks like some 134000 rows were added. How big is the receiving table?

@swingbit
Copy link
Author

@swingbit swingbit commented Feb 18, 2022

The receiving table is 130301154 221697766

@sjoerdmullender
Copy link
Member

@sjoerdmullender sjoerdmullender commented Feb 18, 2022

While in the debugger, can you also print *b->thash (aka *b->T.hash -- I recommend compiling with -ggdb3)

@swingbit
Copy link
Author

@swingbit swingbit commented Feb 18, 2022

This operation was in a transaction. To test again and check what you asked, I rolled back and restarted the same transaction.
This time it took 4 seconds.

Is it perhaps because the receiving bat had had its gaps filled by the previous attempt? Even though that transaction was rolled back?

@swingbit
Copy link
Author

@swingbit swingbit commented Feb 18, 2022

Is CMAKE_BUILD_TYPE=RelWithDebInfo enough to enable ggdb3? I'm afraid not, right?

@sjoerdmullender
Copy link
Member

@sjoerdmullender sjoerdmullender commented Feb 18, 2022

Also add -DCMAKE_C_FLAGS_RELWITHDEBINFO="-O2 -ggdb3"

@swingbit
Copy link
Author

@swingbit swingbit commented Feb 18, 2022

This happened yesterday already and I hoped it was a glitch. But it seems to happen consistently after this INSERT INTO takes place.
If I stop the db and restart it, it hangs here (with 100% CPU):

#0  0x00007f920bd37b3c in seg_delete_range (tr=tr@entry=0x21717d70, t=t@entry=0x21aafcf0, s=s@entry=0x21ab0440, Seg=Seg@entry=0x7ffccc16b488, start=10290288, cnt=1) at /opt/monetdb/SpinqueMonetDB-11.43.202202171906/sql/storage/bat/bat_storage.c:2311
#1  0x00007f920bd3bba4 in delete_range (cnt=<optimized out>, start=<optimized out>, s=0x21ab0440, t=0x21aafcf0, tr=0x21717d70) at /opt/monetdb/SpinqueMonetDB-11.43.202202171906/sql/storage/bat/bat_storage.c:2337
#2  load_storage (id=<optimized out>, s=0x21ab0440, t=0x21aafcf0, tr=0x21717d70) at /opt/monetdb/SpinqueMonetDB-11.43.202202171906/sql/storage/bat/bat_storage.c:3065
#3  create_del (tr=0x21717d70, t=0x21aafcf0) at /opt/monetdb/SpinqueMonetDB-11.43.202202171906/sql/storage/bat/bat_storage.c:3121
#4  0x00007f920bd1bcca in load_table (tid=8763, rt_triggercols=<optimized out>, rt_triggers=<optimized out>, rt_keycols=<optimized out>, rt_keys=<optimized out>, rt_idxcols=<optimized out>, rt_idx=<optimized out>, rt_cols=<optimized out>, rt_parts=<optimized out>, rt_tables=<optimized out>, s=<optimized out>, 
    tr=0x21717d70) at /opt/monetdb/SpinqueMonetDB-11.43.202202171906/sql/storage/store.c:717
#5  load_schema (tr=tr@entry=0x21717d70, rt_schemas=rt_schemas@entry=0x200f62d0, rt_tables=rt_tables@entry=0x2184bf60, rt_parts=rt_parts@entry=0x21801910, rt_cols=rt_cols@entry=0x21826870, rt_idx=rt_idx@entry=0x20024ca0, rt_idxcols=<optimized out>, rt_keys=<optimized out>, rt_keycols=<optimized out>, 
    rt_triggers=<optimized out>, rt_triggercols=<optimized out>) at /opt/monetdb/SpinqueMonetDB-11.43.202202171906/sql/storage/store.c:1112
#6  0x00007f920bd24bbe in load_trans (tr=0x21717d70) at /opt/monetdb/SpinqueMonetDB-11.43.202202171906/sql/storage/store.c:1319
#7  store_load (pa=<optimized out>, store=<optimized out>) at /opt/monetdb/SpinqueMonetDB-11.43.202202171906/sql/storage/store.c:2037
#8  store_init (debug=debug@entry=0, store_tpe=store_tpe@entry=store_bat, readonly=readonly@entry=0, singleuser=singleuser@entry=0) at /opt/monetdb/SpinqueMonetDB-11.43.202202171906/sql/storage/store.c:2125
#9  0x00007f920bd5f293 in mvc_init (debug=0, store_tpe=store_bat, ro=ro@entry=0, su=su@entry=0) at /opt/monetdb/SpinqueMonetDB-11.43.202202171906/sql/server/sql_mvc.c:141
#10 0x00007f920bc643d7 in SQLinit (c=0x11682c0) at /opt/monetdb/SpinqueMonetDB-11.43.202202171906/sql/backends/monet5/sql_scenario.c:408
#11 SQLprelude (cntxt=0x11682c0, mb=<optimized out>, stk=<optimized out>, pci=<optimized out>) at /opt/monetdb/SpinqueMonetDB-11.43.202202171906/sql/backends/monet5/sql_scenario.c:138
#12 0x00007f921a7e747c in malIncludeModules (c=c@entry=0x11682c0, modules=modules@entry=0x7ffccc16bd00, listing=listing@entry=0, no_mapi_server=no_mapi_server@entry=0) at /opt/monetdb/SpinqueMonetDB-11.43.202202171906/monetdb5/mal/mal_prelude.c:488
#13 0x00007f921a7e57b6 in malBootstrap (modules=modules@entry=0x7ffccc16bd00, embedded=embedded@entry=0) at /opt/monetdb/SpinqueMonetDB-11.43.202202171906/monetdb5/mal/mal_session.c:57
#14 0x00007f921a7c1ebc in mal_init (modules=modules@entry=0x7ffccc16bd00, embedded=embedded@entry=0) at /opt/monetdb/SpinqueMonetDB-11.43.202202171906/monetdb5/mal/mal.c:90
#15 0x00000000004031a5 in main (argc=<optimized out>, av=<optimized out>) at /opt/monetdb/SpinqueMonetDB-11.43.202202171906/tools/mserver/mserver5.c:809

When it happened yesterday I let it go the whole night, but it never completed.

Can it be related?

@swingbit
Copy link
Author

@swingbit swingbit commented Feb 18, 2022

Ouch!

The problem above (stuck at startup) doesn't happen after the INSERT INTO, but after the big receiving table has had the deletions.

I had taken a tar of the database in the state it was just before the INSERT INTO, to make sure I could reproduce the same conditions.
Unfortunately it doesn't load anymore (issue above at startup).

So I need to recreate the db from scratch, it will take a couple of hours :/

@sjoerdmullender
Copy link
Member

@sjoerdmullender sjoerdmullender commented Feb 18, 2022

Did you tar a running database? If so, you might want to consider the snaphot command in monetdb.

@swingbit
Copy link
Author

@swingbit swingbit commented Feb 18, 2022

Yes, I did it with the hot snapshot.

@sjoerdmullender
Copy link
Member

@sjoerdmullender sjoerdmullender commented Feb 18, 2022

One of the things I wanted to know is how many duplicates there are, which translates into an estimate of the hash chain lengths. But I realized I might be able to get this information from the already provided information.
b->batCount = 221697766, b->unique_est = 30862161, so if the estimate is somewhat accurate, and there aren't too many real collisions (different values, same bucket) then the average chain length is at least 7.
Both the HASHdelete and the HASHinsert functions need to go through at least part of the chain for every time they get called. They need to find the value being deleted and keep the chain in the correct order when inserting, and they need to see whether there are other elements in the chain with the same value (to maintain the nunique value in the hash structure).

@swingbit
Copy link
Author

@swingbit swingbit commented Feb 18, 2022

I will be able to be back in gdb in a few minutes I think.

In the meantime, I am also worried about the startup issue, which does not depend on the insert.

@sjoerdmullender
Copy link
Member

@sjoerdmullender sjoerdmullender commented Feb 18, 2022

Yes, doesn't sound too good.
Can you make a separate ticket, and perhaps somehow share the tar (privately)?

@swingbit
Copy link
Author

@swingbit swingbit commented Feb 18, 2022

Yes, I'll make a separate ticket for the other issue.

Here it is:

(gdb) p *b->T.hash
$1 = {type = 6, width = 4 '\004', mask1 = 67108863, mask2 = 134217727, nbucket = 121867298, nunique = 30861887, nheads = 30861887, Bckt = 0x7feabe8f0038, Link = 0x7feadb9e0000, heaplink = {free = 886791064, size = 886833152, base = 0x7feadb9e0000 "\377\377\377\377\377\377\377\377", 
    filename = "05/542.thashl", '\000' <repeats 26 times>, refs = 0, farmid = 0 '\000', cleanhash = false, dirty = true, remove = false, wasempty = false, storage = STORE_MMAP, newstorage = STORE_MMAP, parentid = 354}, heapbckt = {free = 487469248, size = 487522304, base = 0x7feabe8f0000 "\004", 
    filename = "05/542.thashb", '\000' <repeats 26 times>, refs = 0, farmid = 0 '\000', cleanhash = false, dirty = true, remove = false, wasempty = false, storage = STORE_MMAP, newstorage = STORE_MMAP, parentid = 354}}

@swingbit
Copy link
Author

@swingbit swingbit commented Feb 18, 2022

Possibly related (and reproducible): #7254

@swingbit
Copy link
Author

@swingbit swingbit commented Feb 24, 2022

@sjoerdmullender is this information enough? I needed to rollback the deployment where I observed this, but I can reproduce it if we need more info.

@sjoerdmullender
Copy link
Member

@sjoerdmullender sjoerdmullender commented Feb 24, 2022

Can you run the slow query (and perhaps also the fast one) after the query SELECT sys.debug(3145728); (revert to normal behavior with SELECT sys.debug(0);)? You should find the output in the file mdbtrace.log inside the database directory.
Alternatively, start mserver5 with option -d3145728, but all other queries will then also produce copious output.

@swingbit
Copy link
Author

@swingbit swingbit commented Feb 24, 2022

Result of

SELECT sys.debug(3145728);
INSERT INTO relations SELECT * FROM newlinks;
2022-02-24 14:58:48    M_DEBUG ALGO     client1              monetdb5/mal/mal_interpreter.c:655             runMALsequence       calling sql.claim
2022-02-24 14:58:48    M_DEBUG ALGO     client1              gdk/gdk_bat.c:290                              COLnew2              -> tmp_573#0@0[oid]TSRN
2022-02-24 14:58:48    M_DEBUG ALGO     client1              monetdb5/mal/mal_interpreter.c:655             runMALsequence       calling sql.append
2022-02-24 14:58:48    M_DEBUG ACCELERATOR client1              gdk/gdk_hash.c:543                             BATcheckhash         tmp_540#221697766@0[int]P!s!r!kH: reusing persisted hash
2022-02-24 14:58:48    M_DEBUG ALGO     client1              gdk/gdk_batop.c:1594                           BATappend_or_update  BATreplace(tmp_540#221697766@0[int]PH,tmp_573#134430@0[oid]TSKN,tmp_607#134430@0[int]TN) 206598 usec
2022-02-24 14:58:48    M_DEBUG ALGO     client1              monetdb5/mal/mal_interpreter.c:655             runMALsequence       calling sql.append
2022-02-24 14:58:49    M_DEBUG ALGO     client1              gdk/gdk_batop.c:1594                           BATappend_or_update  BATreplace(tmp_541#221697766@0[int]P,tmp_573#134430@0[oid]TSKN,tmp_702#134430@0[int]TSR) 91419 usec
2022-02-24 14:58:49    M_DEBUG ALGO     client1              monetdb5/mal/mal_interpreter.c:655             runMALsequence       calling sql.append
2022-02-24 14:58:49    M_DEBUG ACCELERATOR client1              gdk/gdk_hash.c:543                             BATcheckhash         tmp_542#221697766@0[int]P!s!rH: reusing persisted hash
2022-02-24 15:36:19    M_DEBUG ALGO     client1              gdk/gdk_batop.c:1594                           BATappend_or_update  BATreplace(tmp_542#221697766@0[int]PH,tmp_573#134430@0[oid]TSKN,tmp_572#134430@0[int]TSN) 2250483646 usec
2022-02-24 15:36:19    M_DEBUG ALGO     client1              monetdb5/mal/mal_interpreter.c:655             runMALsequence       calling sql.append
2022-02-24 15:36:19    M_DEBUG ALGO     client1              gdk/gdk_batop.c:1594                           BATappend_or_update  BATreplace(tmp_543#221697766@0[dbl]P,tmp_573#134430@0[oid]TSKN,tmp_616#134430@0[dbl]TSRN) 167507 usec
2022-02-24 15:36:19    M_DEBUG ALGO     client1              monetdb5/mal/mal_interpreter.c:655             runMALsequence       calling sql.depend
2022-02-24 15:36:19    M_DEBUG ALGO     client1              monetdb5/mal/mal_interpreter.c:655             runMALsequence       calling sql.affectedRows

Then the same, after a rollback:

2022-02-24 15:45:35    M_DEBUG ALGO     client1              monetdb5/mal/mal_interpreter.c:655             runMALsequence       calling sql.claim
2022-02-24 15:45:35    M_DEBUG ALGO     client1              gdk/gdk_bat.c:290                              COLnew2              -> tmp_704#0@0[oid]TSRN
2022-02-24 15:45:35    M_DEBUG ALGO     client1              monetdb5/mal/mal_interpreter.c:655             runMALsequence       calling sql.append
2022-02-24 15:45:35    M_DEBUG ACCELERATOR client1              gdk/gdk_hash.c:585                             BATcheckhash         tmp_540#221697766@0[int]PH: already has hash, waited 0 usec
2022-02-24 15:45:35    M_DEBUG ALGO     client1              gdk/gdk_batop.c:1594                           BATappend_or_update  BATreplace(tmp_540#221697766@0[int]PH,tmp_704#134430@0[oid]TSKN,tmp_573#134430@0[int]TN) 98759 usec
2022-02-24 15:45:35    M_DEBUG ALGO     client1              monetdb5/mal/mal_interpreter.c:655             runMALsequence       calling sql.append
2022-02-24 15:45:35    M_DEBUG ALGO     client1              gdk/gdk_batop.c:1594                           BATappend_or_update  BATreplace(tmp_541#221697766@0[int]P,tmp_704#134430@0[oid]TSKN,tmp_611#134430@0[int]TSR) 23394 usec
2022-02-24 15:45:35    M_DEBUG ALGO     client1              monetdb5/mal/mal_interpreter.c:655             runMALsequence       calling sql.append
2022-02-24 15:45:35    M_DEBUG ACCELERATOR client1              gdk/gdk_hash.c:585                             BATcheckhash         tmp_542#221697766@0[int]PH: already has hash, waited 0 usec
2022-02-24 15:45:38    M_DEBUG ALGO     client1              gdk/gdk_batop.c:1594                           BATappend_or_update  BATreplace(tmp_542#221697766@0[int]PH,tmp_704#134430@0[oid]TSKN,tmp_607#134430@0[int]TSN) 2536721 usec
2022-02-24 15:45:38    M_DEBUG ALGO     client1              monetdb5/mal/mal_interpreter.c:655             runMALsequence       calling sql.append
2022-02-24 15:45:38    M_DEBUG ALGO     client1              gdk/gdk_batop.c:1594                           BATappend_or_update  BATreplace(tmp_543#221697766@0[dbl]P,tmp_704#134430@0[oid]TSKN,tmp_546#134430@0[dbl]TSRN) 37990 usec
2022-02-24 15:45:38    M_DEBUG ALGO     client1              monetdb5/mal/mal_interpreter.c:655             runMALsequence       calling sql.depend
2022-02-24 15:45:38    M_DEBUG ALGO     client1              monetdb5/mal/mal_interpreter.c:655             runMALsequence       calling sql.affectedRows

@swingbit
Copy link
Author

@swingbit swingbit commented Feb 25, 2022

I'm trying to replicate the insert/replace issue in a reproducible script, but no luck so far.

The script should more or less be like:

  • create large bat
  • delete a sizeable portion
  • append another sizeable bat
  • everything with hash maintenance on the receiving bat

The debug output above would suggest that the occurs when a persistent hash is available on the receiving bat.
I can trigger the creation of a hash table with a point query, but how to trigger that it is persistent? analyze didn't help either.

@sjoerdmullender
Copy link
Member

@sjoerdmullender sjoerdmullender commented Feb 25, 2022

The hash should be persistent automatically, but only if the table on which it is created is already persistent. And there probably lies the rub. It may well be that you have to restart the server inbetween, i.e. after creating and filling the table, and before the point query.

@swingbit
Copy link
Author

@swingbit swingbit commented Feb 25, 2022

This is the script I have so far (still not triggering the issue):

  • create 100M random numbers with 10M unique values
start transaction;

drop table if exists x;
CREATE TABLE x AS
  SELECT rand() % (10 * 1000 * 1000) as a1
  FROM sys.generate_series(1, 100 * 1000 * 1000);

commit;
  • Trigger creation of hash table
select count(*) from x where a1=1000;
  • Delete about 10%
start transaction;

DELETE FROM x
  WHERE a1 IN (
    SELECT rand() % (10 * 1000 * 1000)
    FROM sys.generate_series(1, 1000 * 1000)
  );

commit;
  • create a 100K table and append it to the previous one
start transaction;

drop table if exists t;
CREATE TABLE t AS
  SELECT rand() as a1
  FROM sys.generate_series(1, 100 * 1000);

set optimizer='sequential_pipe';
trace INSERT INTO x SELECT * FROM t;

rollback;

I have tried to stop ad restart the server in between each of these steps, the hash table never seems to be persistent (I can see the hash with sys.storage() after the point query, but then it disappears).

@sjoerdmullender
Copy link
Member

@sjoerdmullender sjoerdmullender commented Feb 25, 2022

  1. create and fill the table as above.
  2. restart server.
  3. select * from x where a1 = 10*1000*1000; -- does not occur, but mserver doesn't know
    after this I have a persistent hash table.

@sjoerdmullender
Copy link
Member

@sjoerdmullender sjoerdmullender commented Feb 25, 2022

The difference between the queries is that the count(*) splits the table in mitosis pieces, whereas the straight select doesn't. Why the difference, I don't know.

@swingbit
Copy link
Author

@swingbit swingbit commented Feb 25, 2022

Just to be sure: how much should I trust what sys.storage() says?

If I do what you said, after the point query I can see a hash in sys.storage().
But if I stop and restart the server again, hashes is back to 0.

In any case, the 3 steps above + deletion + insert still do not trigger the issue unfortunately.

@njnes njnes added the bug label Feb 25, 2022
@sjoerdmullender
Copy link
Member

@sjoerdmullender sjoerdmullender commented Feb 25, 2022

I didn't look at sys.storage(), I looked at the file system. After my set of queries, there were two large hash related files for a bat (*.thashl and *.tashb) which stay around.
I guess sys.storage only looks at loaded hash files, and persistent hashes aren't necessarily loaded (they will be when they're getting used).

@swingbit
Copy link
Author

@swingbit swingbit commented Feb 25, 2022

Indeed, I can confirm that the hashes are persistent.

But apparently something is still missing to trigger the issue.
This is the debug output on the INSERT INTO with the non-failing script:

2022-02-25 16:14:54    M_DEBUG ALGO     client1              monetdb5/mal/mal_interpreter.c:655             runMALsequence       calling sql.claim
2022-02-25 16:14:54    M_DEBUG ALGO     client1              gdk/gdk_bat.c:290                              COLnew2              -> tmp_501#0@0[oid]TSRN
2022-02-25 16:14:54    M_DEBUG ALGO     client1              monetdb5/mal/mal_interpreter.c:655             runMALsequence       calling sql.append
2022-02-25 16:14:54    M_DEBUG ACCELERATOR client1              gdk/gdk_hash.c:543                             BATcheckhash         tmp_471#99999999@0[int]P!s!rNH: reusing persisted hash
2022-02-25 16:14:55    M_DEBUG ALGO     client1              gdk/gdk_batop.c:1594                           BATappend_or_update  BATreplace(tmp_471#99999999@0[int]PNH,tmp_501#99999@0[oid]TSKN,tmp_504#99999@0[int]VN) 580179 usec
2022-02-25 16:14:55    M_DEBUG ALGO     client1              monetdb5/mal/mal_interpreter.c:655             runMALsequence       calling sql.depend
2022-02-25 16:14:55    M_DEBUG ALGO     client1              monetdb5/mal/mal_interpreter.c:655             runMALsequence       calling sql.affectedRows

@swingbit
Copy link
Author

@swingbit swingbit commented Feb 25, 2022

However, now that the issue slow start issue is solved, I can bootstrap from the tar backup of the original db, so I can have the issue under gdb in a few minutes.

Please let me know if there is something I can still check.
Actually, if you are willing to, we can even arrange a remote gdb session.

@sjoerdmullender
Copy link
Member

@sjoerdmullender sjoerdmullender commented Feb 25, 2022

A remote gdb session would probably be fruitful. But let's not do that today, it's getting a bit late.

@swingbit
Copy link
Author

@swingbit swingbit commented Feb 25, 2022

No, of course.

I am actually right now trying a different approach.
I cannot share the whole tar, but if I load the tar and drop all tables except those two involved in the insert, that would be fine (they are only integers).
Then I could take another tar and share it.
Trying that, will let you know.

@sjoerdmullender
Copy link
Member

@sjoerdmullender sjoerdmullender commented Feb 28, 2022

I now know what causes the problem. You have a lot of duplicate values in the column of which a whole bunch were deleted. Now that you're trying to insert new values, those slots get reused. It is at this point that the old value is deleted from the hash table. In order to find the value in the chain, we have to go through a long list because of all the duplicates. This apparently happens for a lot of slots, so that is taking a lot of time. If you're adding new values that also already occur many times, we will need to go through the list again to find the proper place to add a link in the chain (the chains are ordered).
I have to think about how to fix this (the fix will involve throwing out the hash table, the thinking is about when to do that and when not).

I just checked, the value that had been deleted and is now being replaced is 23607200. There are still 15643023 occurrences of that value in the table, not counting the already deleted ones. That's a very long chain to have to go through, so perhaps this is a possible trigger to delete the hash.

@swingbit swingbit changed the title Extremely slow INSERT INTO AS SELECT Extremely slow INSERT INTO <table> SELECT Feb 28, 2022
monetdb-team pushed a commit that referenced this issue Feb 28, 2022
@sjoerdmullender
Copy link
Member

@sjoerdmullender sjoerdmullender commented Feb 28, 2022

I pushed a fix. The fix involves throwing away the hash table in certain conditions (more than 1000 links in the collission/duplicates list). This value of 1000 is tunable using the --set hash_destroy_chain_length=<n> option.
So, give it a whirl.

@swingbit
Copy link
Author

@swingbit swingbit commented Feb 28, 2022

Yep, seems fixed indeed, thanks!

@sjoerdmullender sjoerdmullender added this to the NEXTRELEASE milestone Feb 28, 2022
@sjoerdmullender sjoerdmullender removed this from the NEXTRELEASE milestone Apr 8, 2022
@sjoerdmullender sjoerdmullender added this to the Jan2022-SP2 milestone Apr 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug
Projects
None yet
Development

No branches or pull requests

4 participants