@tsreaper

Currently, changelog files are stored as extra files in DataFileMeta. However, the full compaction changelog we're about to introduce cannot be added as extra files, because its statistics might differ from those of the corresponding merge tree files.

We need to extract changelog files out of DataFileMeta#extraFiles.
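
A minimal sketch of the separation described above. The types `FileMeta` and `Increment` here are hypothetical, simplified stand-ins invented for illustration, not the project's actual classes. The point is only that a changelog file carries its own metadata (and so its own statistics) in a separate list, instead of being a bare file name attached to a data file.

```java
import java.util.ArrayList;
import java.util.List;

public class ChangelogSeparationSketch {

    /** Hypothetical stand-in for per-file metadata: a name plus one statistic. */
    public static final class FileMeta {
        public final String fileName;
        public final long rowCount;

        public FileMeta(String fileName, long rowCount) {
            this.fileName = fileName;
            this.rowCount = rowCount;
        }
    }

    /** Hypothetical result of one write: data files and changelog files kept apart. */
    public static final class Increment {
        public final List<FileMeta> newFiles = new ArrayList<>();
        public final List<FileMeta> changelogFiles = new ArrayList<>();
    }

    public static Increment example() {
        Increment increment = new Increment();
        // Assumed numbers: the merge tree file holds the merged result (2 rows).
        increment.newFiles.add(new FileMeta("data-0", 2));
        // The changelog file holds every change, so its statistics differ (5 rows).
        increment.changelogFiles.add(new FileMeta("changelog-0", 5));
        return increment;
    }

    public static void main(String[] args) {
        Increment increment = example();
        System.out.println(increment.newFiles.get(0).rowCount);
        System.out.println(increment.changelogFiles.get(0).rowCount);
    }
}
```

With extra files, only the name `changelog-0` could be recorded next to `data-0`, and the differing row count would have nowhere to live.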

    changelogFiles.add(changelogWriter.result());
} catch (Exception e) {
    // exception occurs, clean up written file
    writerFactory.deleteFile(fileMeta.fileName());

No need to delete the file here; it is already in newFiles.

compactChanges.addAll(collectChanges(committable.compactAfter(), FileKind.ADD));

if (createEmptyCommit || !appendChanges.isEmpty()) {
    List<ManifestEntry> appendMergeTree = new ArrayList<>();

Avoid "MergeTree" in the name; how about appendDataFiles?

if (fileMeta == null) {
for (String extraFile : extraFiles) {
writerFactory.deleteFile(extraFile);
Iterator<KeyValue> iterator = memTable.mergeIterator(keyComparator, mergeFunction);

Writing the data first sorts the records in the writer buffer, so the changelog order will differ from the input order.

That may not be a bad thing, though, because in #315 it is impossible to maintain the input order anyway.

It would be better to add a note about this ordering here.
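
To illustrate the ordering concern, here is a small sketch with a hypothetical `Record` type and helper methods (none of these names come from the PR). It only models the sort step, not the merge function: once the buffer is sorted by key, anything emitted from it no longer follows arrival order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class BufferOrderSketch {

    /** Hypothetical record: a key plus the sequence number in which it arrived. */
    public static final class Record {
        public final int key;
        public final long sequence;

        public Record(int key, long sequence) {
            this.key = key;
            this.sequence = sequence;
        }
    }

    /** Keys in arrival order, i.e. what an input-ordered changelog would contain. */
    public static List<Integer> inputKeys(List<Record> input) {
        List<Integer> keys = new ArrayList<>();
        for (Record r : input) {
            keys.add(r.key);
        }
        return keys;
    }

    /** Keys after the buffer is sorted by key, then by sequence number. */
    public static List<Integer> sortedKeys(List<Record> input) {
        List<Record> buffer = new ArrayList<>(input);
        buffer.sort(
                Comparator.comparingInt((Record r) -> r.key)
                        .thenComparingLong(r -> r.sequence));
        return inputKeys(buffer);
    }

    /** Sample input arriving with keys 3, 1, 2. */
    public static List<Record> sample() {
        List<Record> input = new ArrayList<>();
        input.add(new Record(3, 0));
        input.add(new Record(1, 1));
        input.add(new Record(2, 2));
        return input;
    }

    public static void main(String[] args) {
        System.out.println(inputKeys(sample()));  // arrival order
        System.out.println(sortedKeys(sample())); // order after sorting the buffer
    }
}
```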

@Test
public void testStreamingChangelogCompatibility02() throws Exception {
    // already contains 2 commits
    CompatibilityTestUtils.unzip("compatibility/0.2-changelog-table.zip", tablePath.getPath());

rename to table-changelog-0.2.zip?

@JingsongLi left a comment

Looks good to me!

newChangesListName = manifestList.write(newChangesManifests);

// write changelog into manifest files
changelogMetas.addAll(manifestFile.write(changelogFiles));

Can we avoid generating redundant manifest files when there is no changelog?
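
One way to address this, sketched with a hypothetical `CountingManifestWriter` (an assumption for illustration, not the project's manifest API): guard the write with an emptiness check so that no file is produced when there are no changelog entries.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SkipEmptyChangelogSketch {

    /** Hypothetical writer that produces one file per call and counts the calls. */
    public static final class CountingManifestWriter {
        private int filesWritten = 0;

        public List<String> write(List<String> entries) {
            filesWritten++;
            return Collections.singletonList("manifest-" + filesWritten);
        }

        public int filesWritten() {
            return filesWritten;
        }
    }

    /** Writes changelog entries into manifest files, skipping the write when empty. */
    public static List<String> writeChangelogManifests(
            CountingManifestWriter writer, List<String> changelogFiles) {
        List<String> changelogMetas = new ArrayList<>();
        if (!changelogFiles.isEmpty()) {
            changelogMetas.addAll(writer.write(changelogFiles));
        }
        return changelogMetas;
    }

    public static void main(String[] args) {
        CountingManifestWriter writer = new CountingManifestWriter();
        writeChangelogManifests(writer, new ArrayList<>());
        System.out.println(writer.filesWritten());
    }
}
```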

@tsreaper merged commit 1430491 into apache:master on Oct 14, 2022
@tsreaper deleted the extract-changelog branch on October 14, 2022 08:48
2 participants