-
Notifications
You must be signed in to change notification settings - Fork 1.3k
[FLINK-29612] Extract changelog files out of DataFileMeta#extraFiles #316
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| changelogFiles.add(changelogWriter.result()); | ||
| } catch (Exception e) { | ||
| // exception occurs, clean up written file | ||
| writerFactory.deleteFile(fileMeta.fileName()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to delete file, already in newFiles
| compactChanges.addAll(collectChanges(committable.compactAfter(), FileKind.ADD)); | ||
|
|
||
| if (createEmptyCommit || !appendChanges.isEmpty()) { | ||
| List<ManifestEntry> appendMergeTree = new ArrayList<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No mergetree, appendDataFiles?
| if (fileMeta == null) { | ||
| for (String extraFile : extraFiles) { | ||
| writerFactory.deleteFile(extraFile); | ||
| Iterator<KeyValue> iterator = memTable.mergeIterator(keyComparator, mergeFunction); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Writing data first will sort the data in the writer buffer, which will make the changelog different from the input order.
But it may not be a bad thing, because in #315 It is impossible to maintain the input order.
It is better to note the following sequence here.
| @Test | ||
| public void testStreamingChangelogCompatibility02() throws Exception { | ||
| // already contains 2 commits | ||
| CompatibilityTestUtils.unzip("compatibility/0.2-changelog-table.zip", tablePath.getPath()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename to table-changelog-0.2.zip?
JingsongLi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
| newChangesListName = manifestList.write(newChangesManifests); | ||
|
|
||
| // write changelog into manifest files | ||
| changelogMetas.addAll(manifestFile.write(changelogFiles)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we avoid generating redundant files without changelog?
Currently changelog files are stored as extra files in
DataFileMeta. However for the full compaction changelog we're about to introduce, it cannot be added as extra files because their statistics might be different from the corresponding merge tree files.We need to extract changelog files out of
DataFileMeta#extraFiles.