[FLINK-27843] Schema evolution for data file meta #376
Hi @JingsongLi @tsreaper, could you help review this PR when you're free? Thanks.
JingsongLi
left a comment
It looks very nice! I will take a deep look~
...e-store-core/src/main/java/org/apache/flink/table/store/file/schema/SchemaEvolutionUtil.java
...re-core/src/main/java/org/apache/flink/table/store/file/operation/AbstractFileStoreScan.java
...-core/src/main/java/org/apache/flink/table/store/file/operation/AppendOnlyFileStoreScan.java
...re-core/src/main/java/org/apache/flink/table/store/file/operation/KeyValueFileStoreScan.java
...re-core/src/main/java/org/apache/flink/table/store/file/schema/SchemaFieldTypeExtractor.java
...re-core/src/main/java/org/apache/flink/table/store/file/schema/SchemaFieldTypeExtractor.java
JingsongLi
left a comment
Left some comments
Thanks @JingsongLi, I have updated the code.
Thanks for the update. We can improve
Done.
JingsongLi
left a comment
Looks good to me!
Currently, the table store uses the latest schema id to read the data file meta. When the schema evolves, it will cause errors, for example:
When the table store reads the field stats from the data file meta, it should map the fields of schema 1 to those of schema 0 according to their field ids.
When reading the data file meta, this PR reads and parses the data according to the schema id recorded in the meta file, and creates an index mapping between the table schema and the meta file schema, so that the table store can read the file meta data correctly through its latest schema.
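The id-based mapping described above can be sketched as follows. This is a minimal illustration, not the code in this PR: the class `IndexMappingSketch`, the `Field` type, and the use of `-1` for a field missing from the meta file schema are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IndexMappingSketch {

    /** Hypothetical field holder; only the id is used for matching. */
    static final class Field {
        final int id;
        final String name;

        Field(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    /**
     * For each field of the table (latest) schema, returns the position of the field
     * with the same id in the data file's schema, or -1 if the file has no such field.
     */
    static int[] createIndexMapping(List<Field> tableFields, List<Field> dataFields) {
        Map<Integer, Integer> idToDataIndex = new HashMap<>();
        for (int i = 0; i < dataFields.size(); i++) {
            idToDataIndex.put(dataFields.get(i).id, i);
        }
        int[] mapping = new int[tableFields.size()];
        for (int i = 0; i < tableFields.size(); i++) {
            mapping[i] = idToDataIndex.getOrDefault(tableFields.get(i).id, -1);
        }
        return mapping;
    }

    public static void main(String[] args) {
        // Schema 0 (used when the file was written): a(id 0), b(id 1), c(id 2).
        List<Field> schema0 =
                Arrays.asList(new Field(0, "a"), new Field(1, "b"), new Field(2, "c"));
        // Schema 1 (latest): b dropped, c renamed to d (same id 2), e added (id 3).
        List<Field> schema1 =
                Arrays.asList(new Field(0, "a"), new Field(2, "d"), new Field(3, "e"));

        System.out.println(Arrays.toString(createIndexMapping(schema1, schema0)));
        // prints [0, 2, -1]
    }
}
```

Matching by id rather than by name or position means a renamed field still resolves to its old stats, while a field added after the file was written is reported as absent.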
The main code changes are as follows:
- SchemaFieldTypeExtractor to extract key fields for ChangelogValueCountFileStoreTable and ChangelogWithKeyFileStoreTable
- SchemaEvolutionUtil to create the index mapping from the table schema to the meta file schema
- FieldStatsArraySerializer to read field stats with a given index mapping
The main tests include:
- SchemaEvolutionUtilTest to create the index mapping between two schemas
- FieldStatsArraySerializerTest to read meta from the table schema
- AppendOnlyTableFileMetaFilterTest, ChangelogValueCountFileMetaFilterTest and ChangelogWithKeyFileMetaFilterTest to filter on an old field, a new field, a partition field and a primary key in the data file meta during table scan
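To illustrate how field stats might be read through an index mapping, here is a hedged sketch. The names `FieldStatsMappingSketch`, `FieldStats` and `toTableOrder` are hypothetical and are not the serializer's real API. The sketch also assumes that a field absent from the file is treated as null in every row, which is one plausible choice and not necessarily what this PR does.

```java
public class FieldStatsMappingSketch {

    /** Hypothetical stats holder: min, max and null count of one field in one file. */
    static final class FieldStats {
        final Object min;
        final Object max;
        final long nullCount;

        FieldStats(Object min, Object max, long nullCount) {
            this.min = min;
            this.max = max;
            this.nullCount = nullCount;
        }
    }

    /**
     * Reorders stats stored in file-schema order into table-schema order.
     * indexMapping[i] is the file position of table field i, or -1 if the file lacks it.
     */
    static FieldStats[] toTableOrder(FieldStats[] fileStats, int[] indexMapping, long rowCount) {
        FieldStats[] result = new FieldStats[indexMapping.length];
        for (int i = 0; i < indexMapping.length; i++) {
            int fileIndex = indexMapping[i];
            // Assumption: a field added after the file was written is null in all rows.
            result[i] = fileIndex < 0
                    ? new FieldStats(null, null, rowCount)
                    : fileStats[fileIndex];
        }
        return result;
    }

    public static void main(String[] args) {
        FieldStats[] fileStats = {
            new FieldStats(1, 10, 0),     // a
            new FieldStats("x", "z", 2),  // b (dropped in the latest schema)
            new FieldStats(5L, 50L, 1)    // c (renamed to d in the latest schema)
        };
        FieldStats[] tableStats = toTableOrder(fileStats, new int[] {0, 2, -1}, 100L);
        for (FieldStats s : tableStats) {
            System.out.println(s.min + " " + s.max + " " + s.nullCount);
        }
    }
}
```

With stats in table-schema order, a scan filter written against the latest schema can be evaluated against files written under an older schema without knowing which schema produced them.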