-
Notifications
You must be signed in to change notification settings - Fork 4.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HIVE-26144: Add keys/indexes to support highly concurrent workload #3214
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @kovjanos. I left some general questions under the JIRA.
Also, I am wondering if you tested the upgrade scripts with tables having data. What happens to existing rows when you alter the table to introduce the new PK column? Are they populated automatically?
mysql part (on MariaDB) was tested in production environment. Let me add the other tests and results here... |
@zabetak Tested with Derby, MySQL, PostgreSQL and MSSQL and worked - see below. Versions tested:
Derby
PostgreSQL:
MySQL:
MSSQL:
|
That doesn't work with Oracle, even plain table has uniqueness issues - see below. Do we need to have this also on ORACLE? I can add the column to keep schema consistent, but it can't be a PRIMARY KEY (might not even be required for the DELETEs in Oracle if it works differently).
|
standalone-metastore/metastore-server/src/main/sql/derby/hive-schema-4.0.0-alpha-2.derby.sql
Show resolved
Hide resolved
@@ -570,7 +571,8 @@ CREATE TABLE COMPLETED_TXN_COMPONENTS ( | |||
CTC_PARTITION varchar(767), | |||
CTC_TIMESTAMP timestamp DEFAULT CURRENT_TIMESTAMP NOT NULL, | |||
CTC_WRITEID bigint, | |||
CTC_UPDATE_DELETE char(1) NOT NULL | |||
CTC_UPDATE_DELETE char(1) NOT NULL, | |||
CTC_ID bigint PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review @deniskuzZ! Based on the above tests, that's only a problem for Oracle as others - Derby, PgSQL, MySQL, MSSQL - all generate uniq value for the multi-line inserts. As soon as out from other ticket I'll test the Oracle case if the cleaner queries do better plans or not with the identity to see if a sequence based column would be needed or just an identity column to keep schema consistent across all engines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, sorry that's not a TXN_ID column, might be worse to check HIVE-23048: Use sequences for TXN_ID generation
.
What are the problematic queries that could benefit from TC_ID/CTC_ID PK?
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. |
What changes were proposed in this pull request?
Missing keys/index is to be added to the HMS backend db schema
Why are the changes needed?
On a high-concurrency test we found that backend database is doing full table scans in some cases where the table has missing key/index.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Integration tests for all database types: