Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LIKE: batstr.like+algebra.uselect called instead of pcre.like_filter #3179

Closed
monetdb-team opened this issue Nov 30, 2020 · 0 comments
Closed

LIKE: batstr.like+algebra.uselect called instead of pcre.like_filter #3179

monetdb-team opened this issue Nov 30, 2020 · 0 comments
Assignees
Labels

Comments

@monetdb-team
Copy link

@monetdb-team monetdb-team commented Nov 30, 2020

Date: 2012-11-07 15:09:36 +0100
From: @swingbit
To: @njnes
Version: 11.13.3 (Oct2012)
CC: bugs-sql, @njnes

Last updated: 2013-02-19 13:17:57 +0100

Comment 17882

Date: 2012-11-07 15:09:36 +0100
From: @swingbit

User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.60 Safari/537.11
Build Identifier:

Take the following query:

sql>select name,func from functions where name like '%optimizers%';
+------------+----------------------------------------------------+
| name | func |
+============+====================================================+
| optimizers | -- MONETDB KERNEL SECTION |
: : -- optimizer pipe catalog :
: : create function optimizers () :
: : returns table (name string, def string, status st :
: : ring) :
: : external name sql.optimizers; :
+------------+----------------------------------------------------+
1 tuple (2.156ms)

And its explain (I used the sequential_pipe to get a more readable explain):

sql>explain select name,func from functions where name like '%optimizers%';
+-----------------------------------------------------------------------+
| mal |
+=======================================================================+
| function user.s14_1{autoCommit=true}(A0:str):void; |
| X_23 := nil:bat[:oid,:str]; |
| X_33 := nil:bat[:oid,:str]; |
| barrier X_62 := language.dataflow(); |
| X_3 := sql.mvc(); |
| X_4:bat[:oid,:str] := sql.bind(X_3,"sys","functions","name",0); |
| X_9:bat[:oid,:str] := sql.bind(X_3,"sys","functions","name",2); |
| X_11 := algebra.kdifference(X_4,X_9); |
| X_12 := algebra.kunion(X_11,X_9); |
| X_13:bat[:oid,:str] := sql.bind(X_3,"sys","functions","name",1); |
| X_15 := algebra.kunion(X_12,X_13); |
| X_16:bat[:oid,:oid] := sql.bind_dbat(X_3,"sys","functions",1); |
| X_17 := bat.reverse(X_16); |
| X_18 := algebra.kdifference(X_15,X_17); |
| X_19 := batcalc.str(X_18); |
| X_20 := pcre.like_filter(X_19,A0,"":str); |
| X_21 := algebra.markT(X_20,0@0:oid); |
| X_22 := bat.reverse(X_21); |
| X_23 := algebra.leftjoin(X_22,X_15); |
| X_24:bat[:oid,:str] := sql.bind(X_3,"sys","functions","func",0); |
| X_26:bat[:oid,:str] := sql.bind(X_3,"sys","functions","func",2); |
| X_27 := algebra.kdifference(X_24,X_26); |
| X_28 := algebra.kunion(X_27,X_26); |
| X_29:bat[:oid,:str] := sql.bind(X_3,"sys","functions","func",1); |
| X_32 := algebra.kunion(X_28,X_29); |
| X_33 := algebra.leftjoin(X_22,X_32); |
| exit X_62; |
| X_34 := sql.resultSet(2,1,X_23); |
| sql.rsColumn(X_34,"sys.functions","name","varchar",256,0,X_23); |
| sql.rsColumn(X_34,"sys.functions","func","varchar",8196,0,X_33); |
| X_45 := io.stdout(); |
| sql.exportResult(X_45,X_34); |
| end s14_1; |
+-----------------------------------------------------------------------+
33 tuples (2.937ms)

Notice that pcre.like_filter() is used.

Now, I define the following function:

create function contains(str string, substr string)
returns boolean
begin
return str like '%'||substr||'%';
end;

And get the explain for the following query (equivalent to the previous one):

sql>explain select name,func from functions where contains(name, 'optimizers');
+-----------------------------------------------------------------------+
| mal |
+=======================================================================+
| function user.s21_1{autoCommit=true}(A0:str):void; |
| X_22 := nil:bat[:oid,:str]; |
| X_36 := nil:bat[:oid,:str]; |
| barrier X_70 := language.dataflow(); |
| X_3 := sql.mvc(); |
| X_4:bat[:oid,:str] := sql.bind(X_3,"sys","functions","name",0); |
| X_9:bat[:oid,:str] := sql.bind(X_3,"sys","functions","name",2); |
| X_11 := algebra.kdifference(X_4,X_9); |
| X_12 := algebra.kunion(X_11,X_9); |
| X_13:bat[:oid,:str] := sql.bind(X_3,"sys","functions","name",1); |
| X_15 := algebra.kunion(X_12,X_13); |
| X_16:bat[:oid,:oid] := sql.bind_dbat(X_3,"sys","functions",1); |
| X_17 := bat.reverse(X_16); |
| X_18 := algebra.kdifference(X_15,X_17); |
| X_59 := calc.+("%":str,A0); |
| X_60 := calc.+(X_59,"%":str); |
| X_61 := batstr.like(X_18,X_60); |
| X_19 := algebra.uselect(X_61,true:bit); |
| X_20 := algebra.markT(X_19,0@0:oid); |
| X_21 := bat.reverse(X_20); |
| X_22 := algebra.leftjoin(X_21,X_18); |
| X_23:bat[:oid,:str] := sql.bind(X_3,"sys","functions","func",0); |
| X_28:bat[:oid,:str] := sql.bind(X_3,"sys","functions","func",2); |
| X_30 := algebra.kdifference(X_23,X_28); |
| X_31 := algebra.kunion(X_30,X_28); |
| X_32:bat[:oid,:str] := sql.bind(X_3,"sys","functions","func",1); |
| X_34 := algebra.kunion(X_31,X_32); |
| X_35 := algebra.kdifference(X_34,X_17); |
| X_36 := algebra.leftjoin(X_21,X_35); |
| exit X_70; |
| X_37 := sql.resultSet(2,1,X_22); |
| sql.rsColumn(X_37,"sys.functions","name","varchar",256,0,X_22); |
| sql.rsColumn(X_37,"sys.functions","func","varchar",8196,0,X_36); |
| X_45 := io.stdout(); |
| sql.exportResult(X_45,X_37); |
| end s21_1; |
+-----------------------------------------------------------------------+
36 tuples (3.790ms)

Notice that, this time, pcre.like_filter() is replaced now with batstr.like() + algebra.uselect().

In principle I wouldn't consider this as a bug, as long as this difference comes for a reason.
However, I mark it as a bug because:

  1. the batstr.like() + algebra.uselect() version is MUCH slower, so it should not be used unless there is a good reason
  2. when I tried both versions on larger tables, the batstr.like() + algebra.uselect() MISSED RESULTS

Reproducible: Always

Steps to Reproduce:

  1. try both versions mentioned above
  2. preferably on large string columns

MonetDB 5 server v11.11.12 (64-bit, 64-bit oids)
This is an unreleased version
Copyright (c) 1993-July 2008 CWI
Copyright (c) August 2008-2012 MonetDB B.V., all rights reserved
Visit http://www.monetdb.org/ for further information
Found 35.5GiB available memory, 8 available cpu cores
Libraries:
libpcre: 7.8 2008-09-05 (compiled with 7.8)
openssl: OpenSSL 1.0.0d 8 Feb 2011 (compiled with OpenSSL 1.0.0d-fips 8 Feb 2011)
libxml2: 2.7.7 (compiled with 2.7.7)
Compiled by: roberto@spinque01.ins.cwi.nl (x86_64-unknown-linux-gnu)
Compilation: gcc -g -Werror -Wall -Wextra -W -Werror-implicit-function-declaration -Wpointer-arith -Wdeclaration-after-statement -Wformat=2 -Wno-format-nonliteral -Winit-self -Winvalid-pch -Wmissing-declarations -Wmissing-format-attribute -Wmissing-prototypes -Wold-style-definition -Wpacked -Wunknown-pragmas -Wvariadic-macros -fstack-protector-all -Wstack-protector -Wpacked-bitfield-compat -Wsync-nand -Wmissing-include-dirs
Linking : /usr/bin/ld -m elf_x86_64

Comment 18000

Date: 2012-11-25 15:20:19 +0100
From: @njnes

The mal plan with function will first inline the contains function, then
rewrite it. The later works on bases of equal names, but different module name
(ie batstr instead of str.

So we need to align names of single value, bulk and select operators,
str.like, batstr.like and likesubselect (former filter).
To go from batstr.like + subselect (former uselect) a new optimizer should
be introduced.

Comment 18021

Date: 2012-11-27 10:05:39 +0100
From: @grobian

keep alias in the loop

Comment 18084

Date: 2012-11-27 14:35:11 +0100
From: @njnes

Changeset 89a9ebc92eda made by Niels Nes niels@cwi.nl in the MonetDB repo, refers to this bug.

For complete details, see http//devmonetdborg/hg/MonetDB?cmd=changeset;node=89a9ebc92eda

Changeset description:

added tests for Bugs 3193, 3179 and 3172

Fixed bug #3191, we now allow the order by column expressions to refer to
the lower level projection. This is very limited support voor order by with
expressions. Aliases is still preferred.

Fixed bug #3179 (or feature request). We now rewrite batstr.*like + subselect into
*likeselect. Also some cleanup of pushselect optimizer and mergetable related
helper functions in opt_support.c

Fixed bug #3172. We still don't allow multi row input in table functions. But
we now don't crash on it anymore. Still work on mal.multiplex needed (requires
multiple outputs)!

Comment 18510

Date: 2013-02-19 13:17:57 +0100
From: @sjoerdmullender

Feb2013 has been released.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
2 participants