Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(query): create drop inverted index #14859

Merged
merged 12 commits into from Mar 9, 2024

Conversation

b41sh
Copy link
Member

@b41sh b41sh commented Mar 6, 2024

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

  • Parser support create/drop inverted index
  • TableMeta adds indexes field
  • Add create_table_index and drop_table_index api

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):

This change is Reviewable

@b41sh b41sh requested a review from drmingdrmer as a code owner March 6, 2024 09:06
@github-actions github-actions bot added the pr-feature this PR introduces a new feature to the codebase label Mar 6, 2024
Copy link
Member

@drmingdrmer drmingdrmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 14 of 43 files at r1.
Reviewable status: 14 of 44 files reviewed, 1 unresolved discussion (waiting on @b41sh and @sundy-li)


src/meta/api/src/schema_api_impl.rs line 4013 at r1 (raw file):

            UnknownIndex::new(&tenant_index.index_name, "drop index with different type"),
        )));
    }

An index is uniquely identified by index_type at line 3985, does it have to check index_type?

Code quote:

    if index_type != &index_meta.index_type {
        return Err(KVAppError::AppError(AppError::UnknownIndex(
            UnknownIndex::new(&tenant_index.index_name, "drop index with different type"),
        )));
    }

@b41sh
Copy link
Member Author

b41sh commented Mar 6, 2024

Reviewed 14 of 43 files at r1.
Reviewable status: 14 of 44 files reviewed, 1 unresolved discussion (waiting on @b41sh and @sundy-li)

src/meta/api/src/schema_api_impl.rs line 4013 at r1 (raw file):

            UnknownIndex::new(&tenant_index.index_name, "drop index with different type"),
        )));
    }

An index is uniquely identified by index_type at line 3985, does it have to check index_type?

Code quote:

    if index_type != &index_meta.index_type {
        return Err(KVAppError::AppError(AppError::UnknownIndex(
            UnknownIndex::new(&tenant_index.index_name, "drop index with different type"),
        )));
    }

The aggregating index and the inverted index have different drop syntaxes,

DROP AGGREGATING INDEX idx1;
DROP INVERTED INDEX idx1;

The main reason for checking the index type here is to avoid using the wrong drop syntax, for example:

CREATE AGGREGATING INDEX my_agg_index AS SELECT MIN(a), MAX(c) FROM agg;
DROP INVERTED INDEX my_agg_index;

@drmingdrmer
Copy link
Member

drmingdrmer commented Mar 6, 2024

The aggregating index and the inverted index have different drop syntaxes,

DROP AGGREGATING INDEX idx1;
DROP INVERTED INDEX idx1;

The main reason for checking the index type here is to avoid using the wrong drop syntax, for example:

CREATE AGGREGATING INDEX my_agg_index AS SELECT MIN(a), MAX(c) FROM agg;
DROP INVERTED INDEX my_agg_index;

Is it allowed for an aggregating index and an inverted index to have same index name? For example:

CREATE AGGREGATING INDEX aaa ...;
CREATE INVERTED INDEX aaa ...;

@b41sh
Copy link
Member Author

b41sh commented Mar 6, 2024

The aggregating index and the inverted index have different drop syntaxes,

DROP AGGREGATING INDEX idx1;
DROP INVERTED INDEX idx1;

The main reason for checking the index type here is to avoid using the wrong drop syntax, for example:

CREATE AGGREGATING INDEX my_agg_index AS SELECT MIN(a), MAX(c) FROM agg;
DROP INVERTED INDEX my_agg_index;

Is it allowed for an aggregating index and an inverted index to have same index name? For example:

CREATE AGGREGATING INDEX aaa ...;
CREATE INVERTED INDEX aaa ...;

Same name is not allowed.

Copy link
Member

@drmingdrmer drmingdrmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed all commit messages.
Reviewable status: 14 of 45 files reviewed, 2 unresolved discussions (waiting on @b41sh and @sundy-li)


src/meta/app/src/schema/index.rs line 96 at r3 (raw file):

    AGGREGATING = 1,
    JOIN = 2,
    INVERTED = 3,

Adding a variant is a incompatible change. In a cluster of new-query and old-query running together, the old-query that can not read INVERTED will return an error

@sundy-li
Copy link
Member

sundy-li commented Mar 7, 2024

Adding a variant is a incompatible change. In a cluster of new-query and old-query running together, the old-query that can not read INVERTED will return an error

Seems reasonable, new-query should remove this new meta to downgrade.

@drmingdrmer
Copy link
Member

Reviewed all commit messages.
Reviewable status: 14 of 45 files reviewed, 2 unresolved discussions (waiting on @b41sh and @sundy-li)

src/meta/app/src/schema/index.rs line 96 at r3 (raw file):

    AGGREGATING = 1,
    JOIN = 2,
    INVERTED = 3,

Adding a variant is a incompatible change. In a cluster of new-query and old-query running together, the old-query that can not read INVERTED will return an error

@TCeason had solved a similar compatibility issue like this. What's the correct way for adding a enum variant?

@b41sh
Copy link
Member Author

b41sh commented Mar 7, 2024

Adding a variant is a incompatible change. In a cluster of new-query and old-query running together, the old-query that can not read INVERTED will return an error

@TCeason had solved a similar compatibility issue like this. What's the correct way for adding a enum variant?

We can store the data for the inverted index separately, but this will bring some duplicate code, is there a better way to solve this problem?

@TCeason
Copy link
Collaborator

TCeason commented Mar 7, 2024

We can store the data for the inverted index separately, but this will bring some duplicate code, is there a better way to solve this problem?

PriviligeType is bitflag. Use from_bits_truncate.

let privileges =
            BitFlags::<mt::principal::UserPrivilegeType, u64>::from_bits_truncate(p.privileges);

@TCeason
Copy link
Collaborator

TCeason commented Mar 7, 2024

Reviewed all commit messages.
Reviewable status: 14 of 45 files reviewed, 2 unresolved discussions (waiting on @b41sh and @sundy-li)

src/meta/app/src/schema/index.rs line 96 at r3 (raw file):

    AGGREGATING = 1,
    JOIN = 2,
    INVERTED = 3,

Adding a variant is a incompatible change. In a cluster of new-query and old-query running together, the old-query that can not read INVERTED will return an error

@TCeason had solved a similar compatibility issue like this. What's the correct way for adding a enum variant?

index_type: FromPrimitive::from_i32(p.index_type).ok_or_else(|| Incompatible {
                reason: format!("invalid IndexType: {}", p.index_type),
            })?,

Now index type uses from i32. So we can add a compat test for it? like this:

  1. in new query: add inverted index; add AGGREGATING index
  2. revert version to main without the inverted index
  3. try to query the inverted index; try to query the agg index.

@drmingdrmer
Copy link
Member

If there is an incompatible variant, from_pb will return an Error. Is it acceptable? @b41sh

index_type: FromPrimitive::from_i32(p.index_type).ok_or_else(|| Incompatible {

image

@b41sh
Copy link
Member Author

b41sh commented Mar 7, 2024

If there is an incompatible variant, from_pb will return an Error. Is it acceptable? @b41sh

Returning an error makes the index unusable, which is unacceptable.

Inverted indexes are for a single table, whereas aggregating indexes are for multiple tables, so it seems better to define them separately. Can we define a new inverted index and put it in TableMeta?

@ariesdevil
Copy link
Member

If there is an incompatible variant, from_pb will return an Error. Is it acceptable? @b41sh

Returning an error makes the index unusable, which is unacceptable.

Inverted indexes are for a single table, whereas aggregating indexes are for multiple tables, so it seems better to define them separately. Can we define a new inverted index and put it in TableMeta?↳

Aggregating index also for single table, but one table can have multiple aggregating indexes.

@drmingdrmer
Copy link
Member

Can we define a new inverted index and put it in TableMeta?

It would be more clear and neat.

@b41sh b41sh marked this pull request as draft March 8, 2024 00:14
@b41sh b41sh marked this pull request as ready for review March 8, 2024 10:47
@b41sh b41sh requested a review from drmingdrmer March 8, 2024 10:47
@b41sh b41sh requested a review from ariesdevil March 8, 2024 10:47
Copy link
Member

@drmingdrmer drmingdrmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 6 of 72 files at r4, all commit messages.
Reviewable status: 7 of 78 files reviewed, 3 unresolved discussions (waiting on @ariesdevil, @b41sh, and @sundy-li)


src/meta/app/src/schema/table.rs line 258 at r4 (raw file):

pub struct TableIndex {
    pub name: String,
    pub columns: Vec<String>,

If a column is renamed, columns becomes invalid.
The column_id should be used instead of String. column_id are unique.

Refer to:

pub next_column_id: ColumnId,

Copy link
Member

@drmingdrmer drmingdrmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 72 files at r4, 3 of 15 files at r5, all commit messages.
Reviewable status: 9 of 78 files reviewed, 4 unresolved discussions (waiting on @ariesdevil, @b41sh, and @sundy-li)


src/meta/api/src/schema_api_impl.rs line 3186 at r5 (raw file):

                column_ids: req.column_ids.clone(),
            };
            indexes.insert(req.name.clone(), index);

The column_ids is passed in via the CreateTableIndexReq, the column ids may have changed in the TableMeta fetched at line 3115.

There are two options to address this issue:

  • Confirm that the column ids from CreateTableIndexReq present in the TableMeta'
  • Specify column by names in the CreateTableIdnexReq, and translate them to column-id in this method.

@b41sh b41sh requested a review from drmingdrmer March 9, 2024 03:29
Copy link
Member

@drmingdrmer drmingdrmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 30 of 72 files at r4, 3 of 15 files at r5, 3 of 10 files at r6, all commit messages.
Reviewable status: 42 of 78 files reviewed, 2 unresolved discussions (waiting on @ariesdevil, @b41sh, and @sundy-li)

@b41sh b41sh added this pull request to the merge queue Mar 9, 2024
Merged via the queue into datafuselabs:main with commit a9d1fac Mar 9, 2024
71 of 72 checks passed
@b41sh b41sh deleted the feat-inverted-index-2 branch March 9, 2024 05:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-feature this PR introduces a new feature to the codebase
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants