Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sys.storage call can take a long time #4040

Closed
monetdb-team opened this issue Nov 30, 2020 · 0 comments
Closed

sys.storage call can take a long time #4040

monetdb-team opened this issue Nov 30, 2020 · 0 comments

Comments

@monetdb-team
Copy link

@monetdb-team monetdb-team commented Nov 30, 2020

Date: 2016-07-26 16:18:44 +0200
From: anthonin.bonnefoy
To: SQL devs <>
Version: 11.23.7 (Jun2016-SP1)
CC: frederic.jolliton+monetdb, @mlkersten, @njnes

Last updated: 2016-12-21 13:08:28 +0100

Comment 22261

Date: 2016-07-26 16:18:44 +0200
From: anthonin.bonnefoy

User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0
Build Identifier:

On database with a lot of tables with few datas, a sys.storage call can take a lot of time (from 10 minutes to 1 hour).

Reproducible: Always

Steps to Reproduce:

1.Fill a database with the given script

!/bin/bash
set -e

data="TOTO 1467287718762007 1467287721544850 7 250 149825120988417 198620281134081 0 0 3241196578 3227017985 64483 57837 0 0 0 0 92 0 0 Ethernet/IPv4/IPv6/TCP be9dccdb-61ed-40e4-96d9-c958ec421096 314 0 3 0 0 0 3 0 314 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 144936151 20211589026057001 0 0 0 0 0 0 0 0 0 "

rm -f /tmp/data
for (( i = 0; i < 100; i++ )); do
echo "$data" >> /tmp/data
done

for (( i = 0; i < 10000; i++ )); do
table="test_$i"
query="$query CREATE TABLE $table ( a1 TEXT, a2 BIGINT, a3 BIGINT, a4 SMALLINT, a5 SMALLINT, a6 INT, a7 INT, a8 BIGINT, a9 BIGINT, a10 INT, a11 INT, a12 BIGINT, a13 HUGEINT, a14 BIGINT, a15 HUGEINT, a16 BIGINT, a17 HUGEINT, a18 INT, a19 INT, a20 SMALLINT, a21 SMALLINT, a22 SMALLINT, a23 SMALLINT, a24 INT, a25 INT, a26 TEXT, a27 INT, a28 TEXT, a29 UUID, a30 BIGINT, a31 BIGINT, a32 BIGINT, a33 BIGINT, a34 BIGINT, a35 BIGINT, a36 BIGINT, a37 BIGINT, a38 BIGINT, a39 BIGINT, a40 BIGINT, a41 BIGINT, a42 BIGINT, a43 BIGINT, a44 BIGINT, a45 BIGINT, a46 BIGINT, a47 BIGINT, a48 BIGINT, a49 BIGINT, a50 BIGINT, a51 INT, a52 INT, a53 BIGINT, a54 BIGINT, a55 HUGEINT, a56 BIGINT, a57 BIGINT, a58 HUGEINT, a59 BIGINT, a60 BIGINT, a61 HUGEINT, a62 BIGINT, a63 BIGINT, a64 HUGEINT, a65 BIGINT, a66 BIGINT, a67 HUGEINT, a68 BIGINT, a69 BIGINT, a70 HUGEINT, a71 BIGINT, a72 BIGINT, a73 HUGEINT, a74 BIGINT, a75 BIGINT, a76 HUGEINT, a77 UUID);"

 query="$query COPY INTO $table FROM '/tmp/data' DELIMITERS ' ','\n','\"' NULL AS '<NULL>';"

 if [[ $((i % 100)) == 0 ]]; then
     SECONDS=0
     mclient nova -s "$query" > /dev/null
     echo "Took $SECONDS seconds to insert"
     query=""
 fi
 if [[ $((i % 1000)) == 0 ]]; then
     time mclient nova -s "select * from storage;" > /dev/null
 fi

done

A backtrace during "select count(*) from sys.storage;" query gives most of the time

Thread 2 (Thread 0x7f84964f2700 (LWP 17652)):
0 0x00007f84fb8c9add in read () at ../sysdeps/unix/syscall-template.S:81
1 0x00007f84fca0ab01 in read (__nbytes=, __buf=, __fd=) at /usr/include/x86_64-linux-gnu/bits/unistd.h:44
2 GDKload (farmid=23, nme=0x7f8160b12070 "", ext=0x7f84fcbd52ab "theap", size=8760, maxsize=0x7f8160b12070, mode=1622220912) at gdk_storage.c:480
3 0x00007f84fc94c5cc in HEAPload_intern (trunc=, suffix=, ext=, nme=, h=) at gdk_heap.c:668
4 HEAPload (h=0x1874a50, nme=0x1874b10 "05/00/50020", ext=0x7f84fcbd52ab "theap", trunc=1618412448) at gdk_heap.c:678
5 0x00007f84fca0bdb7 in BATload_intern (i=i@entry=20496, lock=lock@entry=1) at gdk_storage.c:889
6 0x00007f84fc94011f in getBBPdescriptor (i=, lock=) at gdk_bbp.c:2527
7 0x00007f84fc9452cb in BBPdescriptor (i=) at gdk_bbp.c:2543
8 0x00007f84f8a904a6 in BATdescriptor (i=) at ../../../gdk/gdk.h:2673
9 temp_descriptor (b=) at bat_utils.c:35
10 0x00007f84f8a8d7e7 in delta_bind_bat (temp=0, access=, bat=0x7f84964f1770) at bat_storage.c:172
11 bind_col (tr=0x194cfe0, c=0x1ff, access=) at bat_storage.c:192
12 0x00007f84f8979572 in sql_storage (cntxt=, mb=, stk=0x7f81866792d0, pci=) at sql.c:4828
13 0x00007f84fcf4f1a0 in runMALsequence (cntxt=0x17, mb=0x7f8161f9efb0, startpc=8760, stoppc=-1, stk=0x7f81866792d0, env=0x28021, pcicaller=0x0) at mal_interpreter.c:631
14 0x00007f84fcf506bb in callMAL (cntxt=0x17, cntxt@entry=0x7f84f8f4a950, mb=0x7f8160b12070, mb@entry=0x7f8161f9efb0, env=0x7f84964f1bc8, argv=0x7f84964f1bd0, debug=111 'o') at mal_interpreter.c:447
15 0x00007f84f898022c in SQLexecutePrepared (c=0x7f84f8f4a950, be=be@entry=0x7f8177311370, q=0x7f8160f00d20) at sql_execute.c:328
16 0x00007f84f89807e9 in SQLengineIntern (c=0x7f84f8f4a950, be=0x7f8177311370) at sql_execute.c:390
17 0x00007f84fcf6c6f7 in runPhase (phase=4, c=0x7f84f8f4a950) at mal_scenario.c:515
18 runScenarioBody (c=c@entry=0x7f84f8f4a950) at mal_scenario.c:559
19 0x00007f84fcf6d43d in runScenario (c=c@entry=0x7f84f8f4a950) at mal_scenario.c:579
20 0x00007f84fcf6d968 in MSserveClient (dummy=dummy@entry=0x7f84f8f4a950) at mal_session.c:439
21 0x00007f84fcf6dfcf in MSscheduleClient (command=command@entry=0x7f84c086a770 "", challenge=challenge@entry=0x7f84964f1e60 "3eh7JgdWm", fin=0x7f8161c3d250, fout=fout@entry=0x7f81778d4cd0) at mal_session.c:319
22 0x00007f84fd01f336 in doChallenge (data=) at mal_mapi.c:184
23 0x00007f84fca24513 in thread_starter (arg=) at gdk_system.c:459
24 0x00007f84fb8c30a4 in start_thread (arg=0x7f84964f2700) at pthread_create.c:309
25 0x00007f84fb5f887d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Since most bat are small, they have a storage mode of STORE_MEM, so they are loaded in memory.
So, most of the tail files of the database are read.
Moreover, the loaded base is never used for sys.storage.

Comment 22311

Date: 2016-08-19 17:00:11 +0200
From: @njnes

Is this still the case on the Jun2016 version?

Comment 22338

Date: 2016-08-30 17:28:13 +0200
From: Frédéric Jolliton <<frederic.jolliton+monetdb>>

Are you referring to the time it takes for querying storage()? Yes, it is still extremely slow with Jun2016.

After the script given in this issue is started and about 300-600 tables are created, the "select * from storage" query take a huge amount of time (while for fewer tables, the response was below 1s). After 30 minutes without a response, I gave up and stopped the process.

I was unable to run gdb due to some limitation I haven't figured, but however I checked some stats with strace for a few minutes range and I get:

$ strace -p -f -c
% time seconds usecs/call calls errors syscall


78.81 5.165505 71 73075 select
21.18 1.388000 1236 1123 fdatasync
0.01 0.000387 0 1243 close
0.01 0.000337 0 1260 17 open
0.00 0.000159 0 1110 write
0.00 0.000000 0 119 fstat
0.00 0.000000 0 136 119 mkdir


100.00 6.554388 78066 136 total

It seems that MonetDB perform lot and lot of fdatasync at this point, which slow the process enormously.

I assume the ~79% of select are for all threads waiting on some locks hold by the thread performing the fdatasync.

The version tested:

MonetDB 5 server v11.23.8 (64-bit, 64-bit oids, 128-bit integers)
This is an unreleased version
Copyright (c) 1993-July 2008 CWI
Copyright (c) August 2008-2016 MonetDB B.V., all rights reserved
Visit http://www.monetdb.org/ for further information
Found 15.6GiB available memory, 4 available cpu cores
Libraries:
libpcre: 8.39 2016-06-14 (compiled with 8.39)
openssl: OpenSSL 1.0.2h 3 May 2016 (compiled with OpenSSL 1.0.2h 3 May 2016)
libxml2: 2.9.4 (compiled with 2.9.4)
Compiled by: fjolliton@localhost (x86_64-pc-linux-gnu)
Compilation: gcc -O3 -fomit-frame-pointer -pipe -Werror -Wall -Wextra -W -Werror-implicit-function-declaration -Wpointer-arith -Wdeclaration-after-statement -Wundef -Wformat=2 -Wno-format-nonliteral -Winit-self -Winvalid-pch -Wmissing-declarations -Wmissing-format-attribute -Wmissing-prototypes -Wold-style-definition -Wpacked -Wunknown-pragmas -Wvariadic-macros -fstack-protector-all -Wstack-protector -Wpacked-bitfield-compat -Wsync-nand -Wjump-misses-init -Wmissing-include-dirs -Wlogical-op -Wunreachable-code -D_FORTIFY_SOURCE=2
Linking : /usr/bin/ld -m elf_x86_64

Comment 22339

Date: 2016-08-30 17:28:27 +0200
From: Frédéric Jolliton <<frederic.jolliton+monetdb>>

Version changed.

Comment 22340

Date: 2016-08-30 21:16:35 +0200
From: @mlkersten

FYI the storage() function has to access all columns of all tables.
So multiply your number of tables 600 with say 77 and you have 46200 store_funcs.bind_col(tr, c, RDONLY) calls effectively searching and opening a file.

Storage is not an internally maintained table, because it reflects the actual state of the column.

Comment 22341

Date: 2016-08-30 22:06:31 +0200
From: Frédéric Jolliton <<frederic.jolliton+monetdb>>

To give more data about what we observe, here is the pattern we got, using a modified version of the provided script (only the "echo" are changed):

Tables creation: 3 s
Total tables: 100
Querying sys.storage(): 1 s
Tables creation: 4 s
Total tables: 200
Querying sys.storage(): 0 s
Tables creation: 4 s
Total tables: 300
Querying sys.storage(): 843 s
Tables creation: 5 s
Total tables: 400
Querying sys.storage(): 0 s
Tables creation: 5 s
Total tables: 500
Querying sys.storage(): 0 s
Tables creation: 4 s
Total tables: 600
Querying sys.storage(): 1 s
Tables creation: 4 s
Total tables: 700
Querying sys.storage(): 1 s
Tables creation: 4 s
Total tables: 800
Querying sys.storage(): 1 s
Tables creation: 5 s
Total tables: 900
Querying sys.storage(): 1693 s
[..]

Usually it is very fast to query the whole storage table (under a second for ~900 tables and up to 6s for ~6000 tables). But sometimes, it take up to 20 minutes (in my last run).

Now, maybe it is not sys.storage() by itself, but the fact that the journal might be played at the same time (seeing all theses fdatasync) or something like that.

It you think it's a normal behavior, feel free to close the issue.

Comment 24463

Date: 2016-10-06 08:25:09 +0200
From: @njnes

the peaks are caused by the persistent log flush. Indeed in that case many fdatasyncs are done (which is needed to make sure the data reached the disk).
I do think however that we have some problem, because in your described case, there are no changes, ie no costly flush should be done.

Comment 24464

Date: 2016-10-06 08:28:58 +0200
From: @mlkersten

Just to be sure, running the script should include the result of uptime to assess if there is no external process running on your box that causes resource fights. Think for example about backup-scripts running in the background.

Comment 24471

Date: 2016-10-10 08:18:55 +0200
From: @njnes

Frédéric I think we should close this bug report. The observed behaviour is indeed caused by the log flushes. These give all these fdatasyncs. Once all tables are created and flushed, I don't see (extreem) times spend in the
storage query.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
1 participant