Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BAT sortedness info ignored on ORDER BY and TOPN #3368

Closed
monetdb-team opened this issue Nov 30, 2020 · 0 comments
Closed

BAT sortedness info ignored on ORDER BY and TOPN #3368

monetdb-team opened this issue Nov 30, 2020 · 0 comments

Comments

@monetdb-team
Copy link

@monetdb-team monetdb-team commented Nov 30, 2020

Date: 2013-09-24 10:52:23 +0200
From: @swingbit
To: SQL devs <>
Version: 11.15.15 (Feb2013-SP4)
CC: @njnes

Last updated: 2013-12-03 13:59:38 +0100

Comment 19170

Date: 2013-09-24 10:52:23 +0200
From: @swingbit

User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.57 Safari/537.36
Build Identifier:

The query below triggers a topN on a SORTED column. The column is sorted as it only contains '1' values. I indeed checked with the debugger, for this column the sortedness properties read as:

[ 33@0, "tsorted", "1" ]
[ 34@0, "trevsorted", "1" ]

Still, the time taken by pqueue.topn_max() instruction clearly shows that real sorting is taking place.

The trace:

-- Use sequential optimizer just to have a more readable trace. This does not affect the bug report.
sql>set optimizer='sequential_pipe';
operation successful (0.331ms)

-- This triggers a TOPN on a SORTED column
sql>trace select * from dict order by prob limit 10;
+--------------------------------+------+---------+--------------------------+
| idstr | id | type | prob |
+================================+======+=========+==========================+
| entity.hashtag:skoob | 1 | hashtag | 1 |
| entity.lang:en | 2 | lang | 1 |
| entity.lang:es | 3 | lang | 1 |
| entity.lang:ja | 4 | lang | 1 |
| entity.tweet:32629192428359680 | 5 | tweet | 1 |
| entity.tweet:32629397479493632 | 6 | tweet | 1 |
| entity.tweet:32629410402140161 | 7 | tweet | 1 |
| entity.tweet:32629426382438401 | 8 | tweet | 1 |
| entity.tweet:32629756331560960 | 9 | tweet | 1 |
| entity.hashtag:icantdateyou | 0 | hashtag | 1 |
+--------------------------------+------+---------+--------------------------+
10 tuples (339.959ms)
+--------+----------------------------------------------------------------------------------------------------------------+
| ticks | stmt |
+========+================================================================================================================+
| 4 | X_2 := sql.mvc(); |
| 19 | X_3=<tmp_7605>[10430345] := sql.bind(X_2=0,"spinque","dict","prob",0); |
| 329160 | X_9=<tmp_22416>[10] := pqueue.topn_max(X_3=<tmp_7605>[10430345],10:wrd); |
| 23 | X_10=<tmpr_22157>[10] := algebra.subslice(X_9=<tmp_22416>[10],0:wrd,9:wrd); |
| 14 | X_11=<tmp_7574>[10430345] := sql.bind(X_2=0,"spinque","dict","idstr",0); |
| 29 | X_13=<tmp_22416>[10] := algebra.leftfetchjoin(X_10=<tmpr_22157>[10],X_11=<tmp_7574>[10430345]); |
| 17 | X_14=<tmpr_22607>[10] := algebra.subslice(X_13=<tmp_22416>[10],0:wrd,9:wrd); |
| 9 | X_15=<tmp_22362>[10] := algebra.leftfetchjoin(X_14=<tmpr_22607>[10],X_13=<tmp_22416>[10]); |
| 10 | X_58=<tmp_22204>[10] := algebra.leftfetchjoin(X_14=<tmpr_22607>[10],X_10=<tmpr_22157>[10]); |
| 14 | X_25:bat[:oid,:dbl] =<tmp_22157>[10] := algebra.leftfetchjoin(X_58=<tmp_22204>[10],X_3=<tmp_7605>[10430345]); |
| 7 | X_22=<tmp_7502>[10430345] := sql.bind(X_2=0,"spinque","dict","type",0); |
| 18 | X_24:bat[:oid,:str] =<tmp_22607>[10] := algebra.leftfetchjoin(X_58=<tmp_22204>[10],X_22=<tmp_7502>[10430345]); |
| 7 | X_17=<tmp_7504>[10430345] := sql.bind(X_2=0,"spinque","dict","id",0); |
| 13 | X_21:bat[:oid,:int] =<tmp_22473>[10] := algebra.leftfetchjoin(X_58=<tmp_22204>[10],X_17=<tmp_7504>[10430345]); |
| 7 | X_27 := sql.resultSet(4,1,X_15=<tmp_22362>[10]); |
| 7 | sql.rsColumn(X_27=1,"spinque.dict","idstr","clob",0,0,X_15=<tmp_22362>[10]); |
| 5 | sql.rsColumn(X_27=1,"spinque.dict","id","int",32,0,X_21=<tmp_22473>:bat[:oid,:int][10]); |
| 4 | sql.rsColumn(X_27=1,"spinque.dict","type","clob",0,0,X_24=<tmp_22607>:bat[:oid,:str][10]); |
| 4 | sql.rsColumn(X_27=1,"spinque.dict","prob","double",51,0,X_25=<tmp_22157>:bat[:oid,:dbl][10]); |
| 2 | X_43 := io.stdout(); |
| 99 | sql.exportResult(X_43=="104d2":streams,X_27=1); |
| 3 | end s1_7; |
| 330642 | X_5:void := user.s1_7(); |
+--------+----------------------------------------------------------------------------------------------------------------+
23 tuples (340.215ms)

The same happens with algebra.subsort(). Here as well, sorting should be a no-op. Again, I have verified with the debugger that the "prob" column has both "tsorted" and "trevsorted" set to 1, as it is constant.

sql>trace select * from dict order by prob;

[.... long result ....]

+----------+-----------------------------------------------------------------------------------------------------------------------------------------------+
| ticks | stmt |
+==========+===============================================================================================================================================+
| 3 | X_2 := sql.mvc(); |
| 18 | X_3=<tmp_7605>[10430345] := sql.bind(X_2=0,"spinque","dict","prob",0); |
| 1044287 | (X_8=<tmp_22157>[10430345],r1_12=<tmp_22607>[10430345],r2_12=<tmp_22416>[10430345]) := algebra.subsort(X_3=<tmp_7605>[10430345],false,false); |
| 19 | X_12=<tmp_7574>[10430345] := sql.bind(X_2=0,"spinque","dict","idstr",0); |
| 179487 | X_14=<tmp_22416>[10430345] := algebra.leftfetchjoin(r1_12=<tmp_22607>[10430345],X_12=<tmp_7574>[10430345]); |
| 300968 | X_21=<tmp_22157>[10430345] := algebra.leftfetchjoin(r1_12=<tmp_22607>[10430345],X_3=<tmp_7605>[10430345]); |
| 16 | X_18=<tmp_7502>[10430345] := sql.bind(X_2=0,"spinque","dict","type",0); |
| 162502 | X_20=<tmp_22362>[10430345] := algebra.leftfetchjoin(r1_12=<tmp_22607>[10430345],X_18=<tmp_7502>[10430345]); |
| 15 | X_15=<tmp_7504>[10430345] := sql.bind(X_2=0,"spinque","dict","id",0); |
| 175569 | X_17=<tmp_22473>[10430345] := algebra.leftfetchjoin(r1_12=<tmp_22607>[10430345],X_15=<tmp_7504>[10430345]); |
| 15 | X_23 := sql.resultSet(4,1,X_14=<tmp_22416>[10430345]); |
| 7 | sql.rsColumn(X_23=3,"spinque.dict","idstr","clob",0,0,X_14=<tmp_22416>[10430345]); |
| 4 | sql.rsColumn(X_23=3,"spinque.dict","id","int",32,0,X_17=<tmp_22473>[10430345]); |
| 5 | sql.rsColumn(X_23=3,"spinque.dict","type","clob",0,0,X_20=<tmp_22362>[10430345]); |
| 5 | sql.rsColumn(X_23=3,"spinque.dict","prob","double",51,0,X_21=<tmp_22157>[10430345]); |
| 2 | X_38 := io.stdout(); |
| 54714576 | sql.exportResult(X_38=="104d2":streams,X_23=3); |
| 3 | end s2_7; |
| 56647146 | X_5:void := user.s2_7(); |

Same issue when using "row_number() OVER(ORDER BY prob)", with "prob" again being a constant column (thus sorted).

Reproducible: Always

Comment 19199

Date: 2013-09-27 11:57:08 +0200
From: @njnes

We need to add these cases to the (pqueue) topn code (ie use slice (0/n or count-n, count) depending on order/min or max que).

Comment 19201

Date: 2013-09-27 15:29:56 +0200
From: @njnes

within the pqueue code we use slice now.

Comment 19202

Date: 2013-09-27 15:44:59 +0200
From: @swingbit

What about algebra.subsort() ?
I experience the same problem.

Comment 19264

Date: 2013-10-09 17:17:53 +0200
From: @swingbit

I need to reopen this, as only the pqueue case has been resolved, but not the algebra.subsort case (see second part of the original report)

Comment 19300

Date: 2013-10-23 10:41:11 +0200
From: MonetDB Mercurial Repository <>

Changeset 6cfed0167e2e made by Sjoerd Mullender sjoerd@acm.org in the MonetDB repo, refers to this bug.

For complete details, see http//devmonetdborg/hg/MonetDB?cmd=changeset;node=6cfed0167e2e

Changeset description:

Added another shortcut in BATsubsort.
If the input is already sorted and we need to return group
information, we can take a shortcut if the input is also key or
constant.
This should fix bug #3368.

Comment 19301

Date: 2013-10-23 10:41:54 +0200
From: @sjoerdmullender

Roberto, can you check with the latest fix?

Comment 19305

Date: 2013-10-24 18:32:58 +0200
From: @swingbit

Confirmed!

Comment 19385

Date: 2013-12-03 13:59:38 +0100
From: @sjoerdmullender

Feb2013-SP6 has been released.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
1 participant