sql: improve inverted index validation observability and runtime #54481

ajwerner · 2020-09-16T23:16:55Z

Is your feature request related to a problem? Please describe.

Inverted index creation is reasonably efficient: it uses the usual bulk-io index addition goodness. At the end of creating the index we run a validation step. Unlike for unique indexes, this step is purely a sanity check. Unfortunately, this check is far less optimized than the index construction itself. It runs two operations in parallel: one counts the entries in the new index, and the other reads the primary index to count the entries that the table data implies. The two counts are then compared to confirm they match. As a result, inverted index validation can be very slow, and the general lack of observability makes this dramatically worse.
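To make the shape of the check concrete, here is a minimal toy sketch in Python. It is not the CockroachDB implementation: the in-memory `primary` dict, the `(element, pk)` entry encoding, the one-entry-per-distinct-element rule, and every function name are assumptions made purely for illustration. It only shows the idea described above: count the entries in the new index, count the entries implied by the primary data, run both counts concurrently, and compare.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy "primary index": primary key -> array column value (hypothetical data).
primary = {1: ["a", "b"], 2: ["b"], 3: []}


def build_inverted_index(rows):
    """Toy encoding (an assumption): one (element, pk) entry per distinct element per row."""
    return sorted((elem, pk) for pk, arr in rows.items() for elem in set(arr))


def count_index_entries(index):
    """Count the entries actually present in the new index."""
    return len(index)


def count_implied_entries(rows):
    """Count the entries the primary data implies the index should hold."""
    return sum(len(set(arr)) for arr in rows.values())


def validate(rows, index):
    """Run both counts concurrently and report whether they match."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        actual = pool.submit(count_index_entries, index)
        expected = pool.submit(count_implied_entries, rows)
        return actual.result() == expected.result()


inverted = build_inverted_index(primary)
ok = validate(primary, inverted)
```

Note that a pure count comparison like this can only detect a mismatch in the number of entries, which is consistent with the step being a sanity check rather than a correctness proof.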

Describe the solution you'd like

- Determine whether we need this validation at all.
  - Why are we even running these queries? What leads us to lack confidence in the adder?
- Provide visibility into the progress of validation (assuming we keep it around).
  - Very loosely related to ui: show estimated schema change progress #10265
  - We may actually have an answer here, pending verification: it seems like sql: No way to get insight into VALIDATE CONSTRAINT progress #26639 landed in 20.1, so we may be able to leverage it.
- Optimize the validation to use distribution and parallelism. Reading the data should not take as long as writing it unless we're doing a bad job!
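As a rough illustration of the last two points (progress visibility and parallelism), the implied-entry count could be split across spans of the primary key space, with a progress fraction reported as each span completes. This is a sketch under stated assumptions, not a description of how CockroachDB distributes work: `split_spans`, `count_span`, `parallel_count`, and the `on_progress` callback are hypothetical names, and threads stand in for distributed workers.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def split_spans(keys, n_spans):
    """Split the sorted primary keys into at most n_spans contiguous spans."""
    keys = sorted(keys)
    size = max(1, -(-len(keys) // n_spans))  # ceiling division
    return [keys[i:i + size] for i in range(0, len(keys), size)]


def count_span(rows, span):
    """Count the index entries implied by the rows in one span (toy encoding)."""
    return sum(len(set(rows[pk])) for pk in span)


def parallel_count(rows, n_spans=4, on_progress=None):
    """Count implied entries span by span, reporting the fraction of spans done."""
    spans = split_spans(rows.keys(), n_spans)
    total = 0
    with ThreadPoolExecutor(max_workers=n_spans) as pool:
        futures = [pool.submit(count_span, rows, span) for span in spans]
        for done, future in enumerate(as_completed(futures), start=1):
            total += future.result()
            if on_progress is not None:
                on_progress(done / len(spans))
    return total


# Hypothetical data: row pk holds (pk % 3) distinct array elements.
rows = {pk: [f"e{j}" for j in range(pk % 3)] for pk in range(10)}
progress = []
total = parallel_count(rows, n_spans=4, on_progress=progress.append)
```

The progress fractions collected here are the kind of value that could be surfaced to a user watching a long-running validation; how that would be wired into job progress reporting is outside the scope of this sketch.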

I think my preference would be to remove the validation from the creation path and move it into something akin to SCRUB (but better because that thing is not good). Then, wherever we move it, we should provide good observability and better performance.

Additional context

Related to #26639

Jira issue: CRDB-3756

blathers-crl · 2020-09-16T23:16:57Z

Hi @ajwerner, I've guessed the C-ategory of your issue and suitably labeled it. Please re-label if inaccurate.

While you're here, please consider adding an A- label to help keep our repository tidy.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

ajwerner added this to Triage in SQL Foundations via automation Sep 16, 2020

blathers-crl bot added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Sep 16, 2020

ajwerner added this to Triage in Disaster Recovery Backlog via automation Sep 16, 2020

ajwerner moved this from Triage to Backlog in SQL Foundations Sep 22, 2020

mwang1026 moved this from Triage to Schema Change Backfills in Disaster Recovery Backlog Sep 22, 2020

kenliu added the T-disaster-recovery label Dec 5, 2020

jlinder added the T-sql-schema-deprecated Use T-sql-foundations instead label Jun 16, 2021

ajwerner mentioned this issue Mar 15, 2022

sql: provide syntax to scan an index directly #59549

Open

exalate-issue-sync bot removed the T-disaster-recovery label Mar 21, 2022

postamar moved this from Backlog to Cold storage in SQL Foundations Nov 10, 2022

exalate-issue-sync bot added T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) and removed T-sql-schema-deprecated Use T-sql-foundations instead labels May 10, 2023

dt removed this from Schema Change Backfills in Disaster Recovery Backlog Feb 13, 2024
