sql: improve inverted index validation observability and runtime #54481

Open
3 tasks
ajwerner opened this issue Sep 16, 2020 · 1 comment
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)

Comments

@ajwerner
Contributor

ajwerner commented Sep 16, 2020

Is your feature request related to a problem? Please describe.

Inverted index creation is reasonably efficient: it uses the usual bulk-io index addition goodness. At the end of creating the index we run a validation step. Unlike the validation of unique indexes, this step is purely a sanity check. Unfortunately, this check is far less optimized than the index construction itself. It works by running two operations in parallel: one counts the entries in the new index, the other reads the primary index to count the entries it implies, and the two counts are then compared to confirm they match. This can make inverted index validation very slow, and the general lack of observability makes the problem dramatically worse.
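To make the shape of that check concrete, here is a minimal toy sketch in Python. It is not CockroachDB's code. The data model (a dict standing in for the primary index, a list of `(key, primary_key)` pairs standing in for the inverted index) and every function name are assumptions made for illustration, as is the rule that a row contributes one entry per distinct array element.

```python
from concurrent.futures import ThreadPoolExecutor


def inverted_keys(value):
    """Hypothetical key derivation: one index key per distinct array element."""
    return set(value)


def build_inverted_index(primary):
    """Toy stand-in for the bulk index build: a list of (key, primary_key) entries."""
    return [(key, pk) for pk, value in primary.items() for key in inverted_keys(value)]


def count_index_entries(index):
    """Count the entries actually present in the new index."""
    return sum(1 for _ in index)


def count_implied_entries(primary):
    """Scan the primary index and count the entries each row should produce."""
    return sum(len(inverted_keys(value)) for value in primary.values())


def validate(primary, index):
    """Run both counts concurrently and report whether they agree."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        actual = pool.submit(count_index_entries, index)
        implied = pool.submit(count_implied_entries, primary)
        return actual.result() == implied.result()


# Toy data: primary key -> array column value.
primary = {1: ["a", "b"], 2: ["b", "b", "c"], 3: []}
index = build_inverted_index(primary)
```

In this toy model `validate(primary, index)` succeeds, and dropping any entry from `index` makes it fail. Note that a count comparison only detects a mismatch in the number of entries, not which entries are wrong, which is consistent with describing it as a sanity check.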

Describe the solution you'd like

  • Determine whether we need this validation at all.
    • Why are we even running these queries? What leads us to lack confidence in the adder?
  • Provide visibility into the progress of validation (assuming we keep it around).
  • Optimize the validation to use distribution and parallelism. Reading the data should not take as long as writing it unless we're doing a bad job!
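As a rough illustration of the last two bullets, the sketch below splits the rows into spans, counts each span in a worker pool, and reports the fraction of spans completed through a callback. All names here (`count_span`, `split_into_spans`, `parallel_count`, `on_progress`) are hypothetical, and Python threads only illustrate the structure; a real implementation would distribute the spans across nodes.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def count_span(rows):
    """Count the index entries implied by one span (assumed: distinct elements per row)."""
    return sum(len(set(value)) for value in rows)


def split_into_spans(rows, span_size):
    """Cut the row list into contiguous chunks of at most span_size rows."""
    return [rows[i:i + span_size] for i in range(0, len(rows), span_size)]


def parallel_count(rows, span_size=2, workers=4, on_progress=None):
    """Count implied entries span by span, reporting the fraction of spans done."""
    spans = split_into_spans(rows, span_size)
    total = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(count_span, span) for span in spans]
        for done, future in enumerate(as_completed(futures), start=1):
            total += future.result()
            if on_progress is not None:
                on_progress(done / len(spans))
    return total


progress = []
rows = [["a", "b"], ["b", "b", "c"], [], ["d"], ["e", "f", "g"]]
total = parallel_count(rows, on_progress=progress.append)
```

The callback is invoked once per finished span, so whatever consumes it (a job progress field, a log line) sees validation advancing instead of a single long silent step.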

I think my preference would be to remove the validation from the creation path and move it into something akin to SCRUB (but better because that thing is not good). Then, wherever we move it, we should provide good observability and better performance.

Additional context

Related to #26639

Jira issue: CRDB-3756

@ajwerner ajwerner added this to Triage in SQL Foundations via automation Sep 16, 2020
@blathers-crl

blathers-crl bot commented Sep 16, 2020

Hi @ajwerner, I've guessed the C-ategory of your issue and suitably labeled it. Please re-label if inaccurate.

While you're here, please consider adding an A- label to help keep our repository tidy.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@blathers-crl blathers-crl bot added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Sep 16, 2020
@ajwerner ajwerner added this to Triage in Disaster Recovery Backlog via automation Sep 16, 2020
@ajwerner ajwerner moved this from Triage to Backlog in SQL Foundations Sep 22, 2020
@mwang1026 mwang1026 moved this from Triage to Schema Change Backfills in Disaster Recovery Backlog Sep 22, 2020
@jlinder jlinder added the T-sql-schema-deprecated Use T-sql-foundations instead label Jun 16, 2021
@postamar postamar moved this from Backlog to Cold storage in SQL Foundations Nov 10, 2022
@exalate-issue-sync exalate-issue-sync bot added T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) and removed T-sql-schema-deprecated Use T-sql-foundations instead labels May 10, 2023
@dt dt removed this from Schema Change Backfills in Disaster Recovery Backlog Feb 13, 2024
Projects
SQL Foundations
  
Cold storage
Development

No branches or pull requests

3 participants