
Create index where the included columns would be maintained by CRDB as all the columns needed to delete records from the base table. #80244

Open
jhatcher9999 opened this issue Apr 20, 2022 · 2 comments
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)

Comments

@jhatcher9999
Collaborator

jhatcher9999 commented Apr 20, 2022

Is your feature request related to a problem? Please describe.
When you delete records from a table (e.g., DELETE FROM tbl WHERE id BETWEEN 10 AND 20), not only do you need an index that helps resolve the predicate, but the SQL optimizer also needs to retrieve all of the key columns of the primary index and of every secondary index, so it can remove the row's entries from each of them. The query plan will show an index scan/seek, then an INDEX JOIN back to the primary table, and then a DELETE.

This index join can be expensive. The need for this index join can be removed by creating a covering index that includes all the fields that are keys for the primary/secondary indexes. However, this is not very intuitive. Nor is it easy to maintain. For instance, if you created a covering index to help with deletes and later created another index on the table but failed to alter your "delete index" to include the fields used in your new index, then your DELETE queries would revert to having an INDEX JOIN in their execution plan.

Example:

CREATE TABLE customer ( id uuid DEFAULT gen_random_uuid(), name varchar, dob timestamp );
INSERT INTO customer ( id, name, dob ) VALUES ( DEFAULT, 'jim', '1776-07-04' );
CREATE INDEX ix_customer_name ON customer ( name );
CREATE INDEX ix_customer_dob ON customer ( dob );
root@localhost:26257/defaultdb> SELECT * FROM customer;
                   id                  | name |         dob
---------------------------------------+------+----------------------
  feb962b7-0be8-461f-9fba-1d4a5acfb510 | jim  | 1776-07-04 00:00:00
root@localhost:26257/defaultdb> EXPLAIN DELETE FROM customer WHERE id = 'feb962b7-0be8-461f-9fba-1d4a5acfb510';
                                          info
----------------------------------------------------------------------------------------
  distribution: local
  vectorized: true

  • delete
  │ from: customer
  │ auto commit
  │
  └── • filter <-  At larger data volumes, this will turn into an INDEX JOIN -- but same general idea
      │ estimated row count: 1
      │ filter: id = 'feb962b7-0be8-461f-9fba-1d4a5acfb510'
      │
      └── • scan
            estimated row count: 1 (100% of the table; stats collected 21 seconds ago)
            table: customer@primary
            spans: FULL SCAN
CREATE INDEX ix_customer_id_for_delete ON customer ( id ) INCLUDE ( name, dob );
root@localhost:26257/defaultdb> EXPLAIN DELETE FROM customer WHERE id = 'feb962b7-0be8-461f-9fba-1d4a5acfb510';
                                                info
----------------------------------------------------------------------------------------------------
  distribution: local
  vectorized: true

  • delete
  │ from: customer
  │ auto commit
  │
  └── • scan
        estimated row count: 1 (100% of the table; stats collected 38 seconds ago)
        table: customer@ix_customer_id_for_delete
        spans: [/'feb962b7-0be8-461f-9fba-1d4a5acfb510' - /'feb962b7-0be8-461f-9fba-1d4a5acfb510']

Describe the solution you'd like
I would like to suggest an index that would work like:

CREATE INDEX id_delete ON tbl STORING NEEDED DELETE COLUMNS;

where this index would handle updating itself whenever schema events occurred that necessitated adding a field to the index to support more performant DELETEs.
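To make the intent concrete: against the customer table from the example above, the proposed syntax would effectively expand to the hand-written covering index below, and CRDB would keep the stored column list in sync as indexes come and go. (This is a sketch of the intended semantics, not existing syntax; STORING, COVERING, and INCLUDE are interchangeable in CRDB today.)

```sql
-- Hypothetical expansion of:
--   CREATE INDEX id_delete ON customer STORING NEEDED DELETE COLUMNS;
-- 'id' is the primary key column; name and dob are stored because they
-- are the key columns of ix_customer_name and ix_customer_dob.
CREATE INDEX id_delete
    ON customer ( id )
    STORING ( name, dob );
```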

Describe alternatives you've considered
Maintaining these indexes manually.
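For example (a sketch continuing the customer schema above, with a hypothetical email column), every new secondary index forces a manual rebuild of the covering index, since the stored column list of an existing index cannot be altered in place:

```sql
-- Add a new secondary index; its key column (email) must now also be
-- stored in the delete-covering index.
ALTER TABLE customer ADD COLUMN email varchar;
CREATE INDEX ix_customer_email ON customer ( email );

-- The maintenance step that is easy to forget: drop and recreate the
-- covering index so DELETEs stay free of the INDEX JOIN.
DROP INDEX ix_customer_id_for_delete;
CREATE INDEX ix_customer_id_for_delete
    ON customer ( id )
    STORING ( name, dob, email );
```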

Additional context
I'm testing this on CRDB 21.2.9.

Jira issue: CRDB-15794

@jhatcher9999 jhatcher9999 added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Apr 20, 2022
@rytaft rytaft added this to Triage in SQL Queries via automation Apr 20, 2022
@blathers-crl blathers-crl bot added the T-sql-queries SQL Queries Team label Apr 20, 2022
@michae2 michae2 added this to Triage in SQL Foundations via automation Apr 26, 2022
@blathers-crl blathers-crl bot added the T-sql-schema-deprecated Use T-sql-foundations instead label Apr 26, 2022
@michae2 michae2 removed this from Triage in SQL Queries Apr 26, 2022
@michae2
Collaborator

michae2 commented Apr 26, 2022

We think most of the work here would be related to schema changes.

@ajwerner
Contributor

ajwerner commented May 3, 2022

This is some rather bespoke syntactic sugar. I'm putting it in the backlog. Feel free to lobby a PM about it.

@ajwerner ajwerner moved this from Triage to Backlog in SQL Foundations May 3, 2022
@jlinder jlinder added sync-me and removed sync-me labels May 20, 2022
@postamar postamar moved this from Backlog to Cold storage in SQL Foundations Nov 10, 2022
@healthy-pod healthy-pod added T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) and removed T-sql-schema-deprecated Use T-sql-foundations instead labels May 17, 2023
@exalate-issue-sync exalate-issue-sync bot removed the T-sql-queries SQL Queries Team label May 23, 2023