You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36
Build Identifier:
We are trying to improve performance of a particular pattern of queries that is going against a 100 million row, 50 column Monet table (all columns defined as int datatype).
We are hosting the table on a server that has 132 cores and close to 1 TB of memory and data sitting on a high speed SSD.
The pattern of queries as follows –
Select (column 2), (column 3), (column4) from table X where column 1 IN (x,y,z,…………..100k ids) ;
where column 1 is the primary key for the table.
Above query takes an hour and half to complete. So we attempted various optimization approaches – we partitioned the table, ran order index on primary key but were not able to bring it to less than 40 mins.
We did notice that this particular query pattern with large list of members in IN clause is not using more than 1 core when executing. Other query patterns are using multiple cores.
Any suggestions on what we can possibly tune or look into?
Thanks
Reproducible: Always
Steps to Reproduce:
1.We should have table of around 100 million rows and 50-100 columns(all columns defined as int datatype).
2.SQL queries with a large list of members in IN clause as per below example
Select (column 2), (column 3), (column4) from table X where column 1 IN (x,y,z,…………..100k ids) ;
Actual Results:
Select (column 2), (column 3), (column4) from table X where column 1 IN (x,y,z,…………..100k ids) ;
query takes an hour and half to complete.
So we attempted various optimization approaches – we partitioned the table, ran order index on primary key but were not able to bring it to less than 40 mins.
We are noticing that particular query pattern with large list of members in IN clause is not using more than 1 core when executing. Other query patterns are using multiple cores.
I want to add few more details to what Akhilesh has provided. This is also seen by me on both "Jul2017-SP4" and "Aug2018-SP2".
Incidentally, The SQL is pretty much the same. That is Select with IN Clause with 150,000 IDS. Tried out with both sequential_pipe and default_pipe on both "Jul2017-SP4" and "Aug2018-SP2", and for both performance is same. Please do note that default_pipe on Aug2018-SP2 the performance does increase by mere 10 seconds not much.
Any hints/workaround for this kind of case. 150 Id's is for one of our smaller tables. The standard tables the size is pretty huge and it takes hours to complete. I have provided a 5000 Id's example in "reproducer" section, which takes greater than 2 mins. We tried debugging to see where it is getting slowed down and that info is provided below. We unfortunately cannot wait this long for results and would like your views/workaround on this.
Thanks,
Arshad
Debugging/System Details below
The SQL is very similar like below which is already given:
Select (column 2), (column 3), (column4) from table X where column 1 IN (x,y,z,…………..100k ids) ;
Tracing the SQL execution, what was seen was maximum time was taken in candscan_int(). The real wait is done under macro scan_sel(candscan, o = candlist++, w = (BUN) (((oid *) Tloc(s,q?(q - 1):0)) + 1)). I am not sure if this is tree/BUN build up or build up/processing both.
(gdb) info break
Num Type Disp Enb Address What
1 breakpoint keep y 0x00007f69c3952200 in candscan_int at gdk_select.c:850
breakpoint already hit 74100 times
c
2 breakpoint keep y 0x00007f69b2b2aa90 in exp_bin at rel_bin.c:345
breakpoint already hit 74631 times
c
3 breakpoint keep y 0x00007f69c396a9e0 in BATselect at gdk_select.c:1186
breakpoint already hit 74113 times
c
(gdb)
What was observed was that rel_bin() is first executed. Once that is finished candscan_int()->scan_sel() is called. The below top snapshot is taken when scan_sel() is running. This takes up 99% of the CPU.
$ mserver5 --version
MonetDB 5 server v11.31.13 "Aug2018-SP2" (64-bit)
Copyright (c) 1993 - July 2008 CWI
Copyright (c) August 2008 - 2019 MonetDB B.V., all rights reserved
Visit https://www.monetdb.org/ for further information
Found 1007.3GiB available memory, 128 available cpu cores
Libraries:
libpcre: 8.32 2012-11-30 (compiled with 8.32)
openssl: OpenSSL 1.0.2k 26 Jan 2017 (compiled with OpenSSL 1.0.2k-fips 26 Jan 2017)
libxml2: 2.9.1 (compiled with 2.9.1)
Compiled by: monetadmin@lnx1490.ch3.prod.i.com (x86_64-pc-linux-gnu)
Compilation: gcc -std=gnu99 -g -O2
Linking : /bin/ld -m elf_x86_64 -Wl,-Bsymbolic-functions
Reproducible: Always
Steps to Reproduce:
/* Running SQL with 150K IN ids is very slow. Here query.sql has just 5000 IDS and it takes more than two minutes */
1.mclient -is -d DB -u USR < query.sql
This is fixed in the current default branch. The current situation is that for every value in the in-value-list a separate select query is executed and merged into a result holder. The new implementation uses join mechanics which is much faster.
Anytime soon new source tar ball will be released (SP3)? I currently do not have a hg setup with me, I always prefer a stable tar ball source release. So it will take some time before I could verify this. If you could please provide the bug ID against which the code went in, I can hand pick and patch the SP2 and test out.
Thanks
Arshad
Comment 26941
Date: 2019-03-21 15:52:47 +0100
From: Martin van Dinther <<martin.van.dinther>>
As a workaround (till the fix has been released) for the query
Select c2, c3, c4 from X
where c1 IN (1,2,3,4,5,6,7,8,9,10);
you could try to use following query construct:
Select c2, c3, c4 from X
join (values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) as invalues(inval) on c1 = inval;
This will do a join instead of the IN (...).
If this workaround does not improve your query performance you could also use a (temporary) table to store the values to match and next join with that table:
create local temporary table invalues(inval int primary key) on commit preserve rows;
insert into invalues values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10);
select * from invalues;
Next you can do:
Select c2, c3, c4 from X
join invalues on c1 = inval;
Anytime soon new source tar ball will be released (SP3)? I currently do not
have a hg setup with me, I always prefer a stable tar ball source release.
So it will take some time before I could verify this. If you could please
provide the bug ID against which the code went in, I can hand pick and patch
the SP2 and test out.
Thanks
Arshad
As Aris said, this is the the default branch. The plans are to release from the Apr2019 branch this coming April. Some as yet undetermined time after that we will start working on releasing from what is currently the default branch. So I'm afraid it's still quite some time until this change gets packaged in a release.
The text was updated successfully, but these errors were encountered:
Date: 2019-03-01 07:07:04 +0100
From: Akhilesh <<akhilesh.visouriya>>
To: SQL devs <>
Version: 11.27.13 (Jul2017-SP4)
CC: @aris-koning, arshad.super, martin.van.dinther, @njnes
Last updated: 2019-11-28 10:00:05 +0100
Comment 26913
Date: 2019-03-01 07:07:04 +0100
From: Akhilesh <<akhilesh.visouriya>>
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36
Build Identifier:
We are trying to improve performance of a particular pattern of queries that is going against a 100 million row, 50 column Monet table (all columns defined as int datatype).
We are hosting the table on a server that has 132 cores and close to 1 TB of memory and data sitting on a high speed SSD.
The pattern of queries as follows –
Select (column 2), (column 3), (column4) from table X where column 1 IN (x,y,z,…………..100k ids) ;
where column 1 is the primary key for the table.
Above query takes an hour and half to complete. So we attempted various optimization approaches – we partitioned the table, ran order index on primary key but were not able to bring it to less than 40 mins.
We did notice that this particular query pattern with large list of members in IN clause is not using more than 1 core when executing. Other query patterns are using multiple cores.
Any suggestions on what we can possibly tune or look into?
Thanks
Reproducible: Always
Steps to Reproduce:
1.We should have table of around 100 million rows and 50-100 columns(all columns defined as int datatype).
2.SQL queries with a large list of members in IN clause as per below example
Select (column 2), (column 3), (column4) from table X where column 1 IN (x,y,z,…………..100k ids) ;
Actual Results:
Select (column 2), (column 3), (column4) from table X where column 1 IN (x,y,z,…………..100k ids) ;
query takes an hour and half to complete.
So we attempted various optimization approaches – we partitioned the table, ran order index on primary key but were not able to bring it to less than 40 mins.
We are noticing that particular query pattern with large list of members in IN clause is not using more than 1 core when executing. Other query patterns are using multiple cores.
Comment 26916
Date: 2019-03-04 08:06:34 +0100
From: Arshad <<arshad.super>>
Hello,
I want to add few more details to what Akhilesh has provided. This is also seen by me on both "Jul2017-SP4" and "Aug2018-SP2".
Incidentally, The SQL is pretty much the same. That is Select with IN Clause with 150,000 IDS. Tried out with both sequential_pipe and default_pipe on both "Jul2017-SP4" and "Aug2018-SP2", and for both performance is same. Please do note that default_pipe on Aug2018-SP2 the performance does increase by mere 10 seconds not much.
Any hints/workaround for this kind of case. 150 Id's is for one of our smaller tables. The standard tables the size is pretty huge and it takes hours to complete. I have provided a 5000 Id's example in "reproducer" section, which takes greater than 2 mins. We tried debugging to see where it is getting slowed down and that info is provided below. We unfortunately cannot wait this long for results and would like your views/workaround on this.
Thanks,
Arshad
Debugging/System Details below
The SQL is very similar like below which is already given:
Select (column 2), (column 3), (column4) from table X where column 1 IN (x,y,z,…………..100k ids) ;
Tracing the SQL execution, what was seen was maximum time was taken in candscan_int(). The real wait is done under macro scan_sel(candscan, o = candlist++, w = (BUN) (((oid *) Tloc(s,q?(q - 1):0)) + 1)). I am not sure if this is tree/BUN build up or build up/processing both.
(gdb) info break
Num Type Disp Enb Address What
1 breakpoint keep y 0x00007f69c3952200 in candscan_int at gdk_select.c:850
breakpoint already hit 74100 times
c
2 breakpoint keep y 0x00007f69b2b2aa90 in exp_bin at rel_bin.c:345
breakpoint already hit 74631 times
c
3 breakpoint keep y 0x00007f69c396a9e0 in BATselect at gdk_select.c:1186
breakpoint already hit 74113 times
c
(gdb)
(gdb) Breakpoint 1, candscan_int (b=b@entry=0x7f699ed8b880, s=s@entry=0x7f699e8ece70, bn=bn@entry=0x7f699f0c31d0, tl=tl@entry=0x7f691d2ee818, th=th@entry=0x7f691d2ee818, equi=equi@entry=1, anti=anti@entry=0, r=r@entry=0, q=q@entry=82831, cnt=cnt@entry=0, off=off@entry=0, dst=dst@entry=0x7f691d32cde0, candlist=candlist@entry=0x7f691d4bbe00, maximum=maximum@entry=82831, use_imprints=use_imprints@entry=0, hval=, lval=, hi=, li=) at gdk_select.c:850
850 scan_sel(candscan, o = candlist++, w = (BUN) (((oid *) Tloc(s,q?(q - 1):0)) + 1))
What was observed was that rel_bin() is first executed. Once that is finished candscan_int()->scan_sel() is called. The below top snapshot is taken when scan_sel() is running. This takes up 99% of the CPU.
top - 05:09:31 up 314 days, 6:26, 7 users, load average: 2.84, 2.86, 2.56
Threads: 7 total, 1 running, 6 sleeping, 0 stopped, 0 zombie
%Cpu(s): 2.4 us, 1.0 sy, 0.0 ni, 96.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 10562755+total, 982952 free, 56946828+used, 48582438+buff/cache
KiB Swap: 33554428 total, 9442496 free, 24111932 used. 48075398+avail Mem
436748 monetad+ 20 0 26.939g 0.026t 0.026t R 99.3 2.6 1:06.53 mserver5
436482 monetad+ 20 0 26.939g 0.026t 0.026t S 0.3 2.6 0:00.07 mserver5
436463 monetad+ 20 0 26.939g 0.026t 0.026t S 0.0 2.6 0:00.06 mserver5
My Server/system details:
$ mserver5 --version
MonetDB 5 server v11.31.13 "Aug2018-SP2" (64-bit)
Copyright (c) 1993 - July 2008 CWI
Copyright (c) August 2008 - 2019 MonetDB B.V., all rights reserved
Visit https://www.monetdb.org/ for further information
Found 1007.3GiB available memory, 128 available cpu cores
Libraries:
libpcre: 8.32 2012-11-30 (compiled with 8.32)
openssl: OpenSSL 1.0.2k 26 Jan 2017 (compiled with OpenSSL 1.0.2k-fips 26 Jan 2017)
libxml2: 2.9.1 (compiled with 2.9.1)
Compiled by: monetadmin@lnx1490.ch3.prod.i.com (x86_64-pc-linux-gnu)
Compilation: gcc -std=gnu99 -g -O2
Linking : /bin/ld -m elf_x86_64 -Wl,-Bsymbolic-functions
Reproducible: Always
Steps to Reproduce:
/* Running SQL with 150K IN ids is very slow. Here query.sql has just 5000 IDS and it takes more than two minutes */
1.mclient -is -d DB -u USR < query.sql
Actual Results:
password:
+------+
| L3 |
+======+
| 7408 |
+------+
1 tuple (133.6s)
Expected Results:
It should be much quicker than 2 minutes.
Comment 26931
Date: 2019-03-19 09:13:34 +0100
From: @aris-koning
Hi Arshad,
This is fixed in the current default branch. The current situation is that for every value in the in-value-list a separate select query is executed and merged into a result holder. The new implementation uses join mechanics which is much faster.
Kind regards,
Aris
Comment 26940
Date: 2019-03-21 09:40:41 +0100
From: Arshad <<arshad.super>>
Hi Aris,
Thanks for the fix!!
Anytime soon new source tar ball will be released (SP3)? I currently do not have a hg setup with me, I always prefer a stable tar ball source release. So it will take some time before I could verify this. If you could please provide the bug ID against which the code went in, I can hand pick and patch the SP2 and test out.
Thanks
Arshad
Comment 26941
Date: 2019-03-21 15:52:47 +0100
From: Martin van Dinther <<martin.van.dinther>>
As a workaround (till the fix has been released) for the query
Select c2, c3, c4 from X
where c1 IN (1,2,3,4,5,6,7,8,9,10);
you could try to use following query construct:
Select c2, c3, c4 from X
join (values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) as invalues(inval) on c1 = inval;
This will do a join instead of the IN (...).
If this workaround does not improve your query performance you could also use a (temporary) table to store the values to match and next join with that table:
create local temporary table invalues(inval int primary key) on commit preserve rows;
insert into invalues values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10);
select * from invalues;
Next you can do:
Select c2, c3, c4 from X
join invalues on c1 = inval;
Comment 26943
Date: 2019-03-22 14:51:36 +0100
From: @sjoerdmullender
(In reply to Arshad from comment 3)
As Aris said, this is the the default branch. The plans are to release from the Apr2019 branch this coming April. Some as yet undetermined time after that we will start working on releasing from what is currently the default branch. So I'm afraid it's still quite some time until this change gets packaged in a release.
The text was updated successfully, but these errors were encountered: