
Performance degradation on multi column sort #6387

Closed
monetdb-team opened this issue Nov 30, 2020 · 0 comments
Comments


Date: 2017-08-07 19:34:42 +0200
From: Manuel <>
To: SQL devs <>
Version: 11.27.5 (Jul2017-SP1)

Last updated: 2017-10-26 14:01:44 +0200

Comment 25557

Date: 2017-08-07 19:34:42 +0200
From: Manuel <>


I noticed a quite drastic performance degradation once I added more than one ORDER BY column.

For example the query
select * from "fdb6d859a24DM_FULFile_defau_" order by "name_" limit 500;

takes 0.826 seconds on my machine

whereas the query

select "id_", "name_" from "fdb6d859a24DM_FULFile_defau_" order by "name_", "id_" limit 500;

takes over 12 seconds on the same machine.

Am I doing anything wrong? Is there a way to improve performance on these kinds of queries?

Reproducible: Always

Steps to Reproduce:

1. Import the data from hr.sql
2. Compare the performance of the two queries

Actual Results:

An ORDER BY clause with multiple columns causes a very significant performance drop.

Comment 25558

Date: 2017-08-07 19:46:01 +0200
From: Manuel <>

Created attachment 566
Sample data

I am not able to attach the SQL file as it is too large. This is the schema of the table:

CREATE TABLE "sys"."fdb6d859a24DM_FULFile_defau_" (
"id_" BIGINT,
"name_" VARCHAR(255),
"sex_" VARCHAR(255),
"race_" VARCHAR(255),
"location_" VARCHAR(255),
"department_" VARCHAR(255),
"level_" BIGINT,
"date_of_birth_" TIMESTAMP
);

"id_" assumes the values 1, 2, 3, ....
The table contains 20 million records. A sample of the records is attached.

Attached file: human resources 10k.csv (application/vnd.ms-excel, 617844 bytes)
Description: Sample data

Comment 25594

Date: 2017-08-23 13:15:50 +0200
From: @sjoerdmullender

The implementation seems to be inefficient.

If you can live with a smaller limit, try that. For instance, compare the two queries with limit 5 instead of limit 500. It should make a huge difference.
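The reason a smaller limit can help: a top-N selection only needs to track the N best rows seen so far, so its per-row cost grows with the limit, not with the table size alone. A minimal sketch of this general principle (not MonetDB's actual code), using Python's `heapq` on a compound (name_, id_) key:

```python
import heapq

# Hypothetical sample data standing in for (name_, id_) pairs.
rows = [(name, i) for i, name in enumerate(["smith", "jones", "adams"] * 1000)]

# heapq.nsmallest keeps only N candidates at a time, so the cost is
# roughly O(len(rows) * log N) instead of sorting all rows completely.
top5 = heapq.nsmallest(5, rows)      # small limit: tiny heap, cheap updates
top500 = heapq.nsmallest(500, rows)  # larger limit: bigger heap, more work per row
```

Tuples compare lexicographically in Python, which mirrors a multi-column ORDER BY: rows are ranked by `name_` first and `id_` second.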

Comment 25596

Date: 2017-08-25 14:48:54 +0200
From: MonetDB Mercurial Repository <>

Changeset f65ce66374ad made by Sjoerd Mullender sjoerd@acm.org in the MonetDB repo, refers to this bug.

For complete details, see https://dev.monetdb.org/hg/MonetDB?cmd=changeset;node=f65ce66374ad

Changeset description:

Reimplemented BATfirstn_grouped to use higher level functions.
This speeds up two column order bys with largish limits by several
orders.
Fix for bug #6387.

Comment 25597

Date: 2017-08-25 15:27:39 +0200
From: MonetDB Mercurial Repository <>

Changeset 7a7283729ef0 made by Sjoerd Mullender sjoerd@acm.org in the MonetDB repo, refers to this bug.

For complete details, see https://dev.monetdb.org/hg/MonetDB?cmd=changeset;node=7a7283729ef0

Changeset description:

Reimplemented some versions of BATfirstn to not use a nested loop.
This should fix the performance problem of bug #6387.
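The trade-off the changeset describes can be illustrated abstractly. With a nested loop, every incoming row is inserted into a sorted prefix by linear search, costing roughly O(rows × N) for a limit of N; a bounded heap brings this down to O(rows × log N). The sketch below is a loose illustration of that contrast, not the actual BATfirstn implementation:

```python
import heapq

def firstn_nested_loop(rows, n):
    # Old-style approach (sketch): keep a sorted prefix of the n
    # smallest rows, inserting each new row by linear scan.
    best = []
    for row in rows:
        i = 0
        while i < len(best) and best[i] <= row:
            i += 1
        if i < n:
            best.insert(i, row)
            if len(best) > n:
                best.pop()
    return best

def firstn_heap(rows, n):
    # Heap-based approach: one pass, bounded heap of size n.
    return heapq.nsmallest(n, rows)
```

Both return the same result; only the cost differs, which is why the reimplementation speeds up queries with "largish" limits the most.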

Comment 25610

Date: 2017-09-05 10:59:52 +0200
From: @sjoerdmullender

In my own test using the part table from TPC-H scale factor 100, time went down from about 47.5 seconds to 3.2 seconds.
