colexec: add optimized variants of external sort #45192

yuzefovich opened this issue Feb 19, 2020 · 1 comment
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-queries SQL Queries Team

yuzefovich commented Feb 19, 2020

When #44978 merges, all three variants of the in-memory sort operator will be able to fall back to disk. However, the general external sorter will be used in all cases, regardless of which in-memory sorter is used. We could be more efficient and take advantage of the limit and/or the partial ordering in the external sorter as well.

Update: when #66303 merges, we will have the optimized variant for the top K case; however, the segmented sort ("sort chunks") optimization won't be added yet. Making the chunker spooler implement the Resetter interface is difficult because sortChunksOp needs special resetting behavior of its own.

Jira issue: CRDB-5172

@yuzefovich yuzefovich added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Feb 19, 2020
@yuzefovich yuzefovich added this to Triage in BACKLOG, NO NEW ISSUES: SQL Execution via automation Feb 19, 2020
@yuzefovich yuzefovich moved this from Triage to [BACKLOG] Enhancements/Features in BACKLOG, NO NEW ISSUES: SQL Execution Feb 19, 2020
@yuzefovich yuzefovich removed this from [VECTORIZED BACKLOG] Enhancements/Features/Investigations in BACKLOG, NO NEW ISSUES: SQL Execution Jun 10, 2021
@yuzefovich yuzefovich added this to Triage in SQL Queries via automation Jun 10, 2021
@yuzefovich yuzefovich added the E-quick-win Likely to be a quick win for someone experienced. label Jun 10, 2021
@yuzefovich yuzefovich moved this from Triage to Backlog in SQL Queries Jun 10, 2021
@yuzefovich yuzefovich removed the E-quick-win Likely to be a quick win for someone experienced. label Jun 10, 2021
@yuzefovich yuzefovich self-assigned this Jun 10, 2021
@yuzefovich yuzefovich moved this from Backlog to 21.2 High Likelihood (90%) in SQL Queries Jun 10, 2021
@yuzefovich yuzefovich removed their assignment Jun 10, 2021
@yuzefovich yuzefovich moved this from 21.2 High Likelihood (90%) to Backlog in SQL Queries Jun 10, 2021
craig bot pushed a commit that referenced this issue Jun 16, 2021
66303: colexec: optimize the external sort for top K case r=yuzefovich a=yuzefovich

**colexec: extend external sort benchmark for top K case**

Release note: None

**colexec: optimize the external sort for top K case**

Previously, if the top K sort spilled to disk, we used the general
external sort. However, we could easily optimize that case with the
knowledge that only K tuples are needed by the output.

Namely, we can use the in-memory top K sort (in order to create each new
partition of the desired size) and also limit the size of each merged
partition to K tuples. This commit adds these optimizations.
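
The idea in the paragraph above can be illustrated with a minimal sketch. It is not the colexec implementation: plain `[]int` slices stand in for on-disk partitions, and the names `topKRun`, `mergeLimited`, and `externalTopK` are hypothetical, introduced only for this example. Each chunk that fits in memory is reduced to its K smallest values before being "spilled", and every merge stops after K values.

```go
package main

import (
	"fmt"
	"sort"
)

// topKRun sorts one in-memory chunk and keeps at most the k smallest
// values. The result stands in for a partition written to disk.
func topKRun(chunk []int, k int) []int {
	run := append([]int(nil), chunk...) // copy so the input is not mutated
	sort.Ints(run)
	if len(run) > k {
		run = run[:k]
	}
	return run
}

// mergeLimited merges two sorted runs and stops once k values have
// been produced, so a merged partition never exceeds k tuples.
func mergeLimited(a, b []int, k int) []int {
	out := make([]int, 0, k)
	i, j := 0, 0
	for len(out) < k && (i < len(a) || j < len(b)) {
		if j >= len(b) || (i < len(a) && a[i] <= b[j]) {
			out = append(out, a[i])
			i++
		} else {
			out = append(out, b[j])
			j++
		}
	}
	return out
}

// externalTopK simulates a top K sort that spills: chunkSize models the
// number of tuples that fit in memory at once (an assumption of this sketch).
func externalTopK(input []int, chunkSize, k int) []int {
	if k <= 0 || chunkSize <= 0 {
		return nil
	}
	var merged []int
	for start := 0; start < len(input); start += chunkSize {
		end := start + chunkSize
		if end > len(input) {
			end = len(input)
		}
		run := topKRun(input[start:end], k)
		merged = mergeLimited(merged, run, k)
	}
	return merged
}

func main() {
	input := []int{9, 3, 7, 1, 8, 2, 6, 5, 4, 0}
	fmt.Println(externalTopK(input, 4, 3)) // prints [0 1 2]
}
```

In this sketch no partition, initial or merged, ever holds more than K values, which is the source of the savings over a general external sort.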

Addresses: #45192.

Release note: None

66379: colexecbase: add casts from decimals to ints and floats r=yuzefovich a=yuzefovich

This commit adds vectorized casts from decimals to ints (of all widths)
and floats.

Addresses: #48135.

Release note: None

66412: sqlproxyccl: minor fixes and enhancements to the proxy handler and denylist r=jaylim-crl a=jaylim-crl

#### sqlproxyccl: allow denylist entries that do not expire

Previously, we assumed that all denylist entries have an expiration key. When
denylist entries do not specify an expiration key, the entries are marked as
expired right away since their values default to the zero time instant. It is
cumbersome for operators to have to specify an expiration when the intention
is for the rule never to expire. This patch changes the behavior of
the denylist such that entries without any expiration keys represent rules
that do not expire.
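
A minimal sketch of the behavior described above, assuming the expiration is stored as a `time.Time` whose zero value means "no expiration key". The `denyEntry` type and `expiredAt` method are hypothetical and are not the actual sqlproxyccl types.

```go
package main

import (
	"fmt"
	"time"
)

// denyEntry is a hypothetical denylist rule used only for illustration.
type denyEntry struct {
	Reason     string
	Expiration time.Time // zero value: no expiration key was specified
}

// expiredAt reports whether the rule has expired at the given instant.
// An entry without an expiration is treated as never expiring, rather
// than as expired since the zero time instant.
func (e denyEntry) expiredAt(now time.Time) bool {
	if e.Expiration.IsZero() {
		return false
	}
	return !now.Before(e.Expiration)
}

// sampleNow is a fixed instant so the example is deterministic.
var sampleNow = time.Date(2021, time.June, 16, 0, 0, 0, 0, time.UTC)

func main() {
	permanent := denyEntry{Reason: "no expiration key"}
	past := denyEntry{Reason: "expired", Expiration: sampleNow.Add(-time.Hour)}
	future := denyEntry{Reason: "still active", Expiration: sampleNow.Add(time.Hour)}
	// prints: false true false
	fmt.Println(permanent.expiredAt(sampleNow), past.expiredAt(sampleNow), future.expiredAt(sampleNow))
}
```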

#### sqlproxyccl: minor fixes around the proxy handler

In #65164, we migrated the sqlproxy in the CC code to the DB repository, and
there were a few buglets:
- sqlproxy crashes when the tenant ID supplied in the connection string is 0
  because roachpb.MakeTenantID panics when the tenant ID is 0.
- sqlproxy leaks internal parsing errors to the client.

This patch hides internal parsing errors, and replaces them with friendly
user-facing errors (e.g. "Invalid cluster name"). We also add a bounds check
to the parsed tenant ID so that the process does not crash on an invalid
tenant ID. More tests were added as well.
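
The two fixes above can be sketched roughly as follows. This is an illustration under assumptions, not the proxy handler's code: `parseTenantID` and `errInvalidTenantID` are hypothetical names, and the error text is made up for the example. The function rejects tenant ID 0 before any constructor that panics on 0 could be reached, and it returns one user-facing error instead of the internal parsing error.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// errInvalidTenantID is a hypothetical user-facing error; the message
// deliberately omits internal parsing details.
var errInvalidTenantID = errors.New("invalid tenant ID")

// parseTenantID converts the tenant ID taken from a connection string
// into a number. It hides strconv's error from the caller and applies
// a bounds check so that 0 is never passed on.
func parseTenantID(s string) (uint64, error) {
	id, err := strconv.ParseUint(s, 10, 64)
	if err != nil {
		// Do not leak the internal parsing error to the client.
		return 0, errInvalidTenantID
	}
	if id == 0 {
		// Bounds check: tenant ID 0 is rejected instead of crashing later.
		return 0, errInvalidTenantID
	}
	return id, nil
}

func main() {
	for _, s := range []string{"42", "0", "abc"} {
		id, err := parseTenantID(s)
		fmt.Printf("%q -> id=%d err=%v\n", s, id, err)
	}
}
```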

Release note: None

Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: Jay Lim <jay@cockroachlabs.com>
@jlinder jlinder added the T-sql-queries SQL Queries Team label Jun 16, 2021
@mgartner mgartner moved this from Backlog (DO NOT ADD NEW ISSUES) to New Backlog in SQL Queries May 25, 2023
@github-actions

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!
