[core] Introduce DeletionVectorIndexFileWriter #3402

Merged
JingsongLi merged 9 commits into apache:master from YannByron:core_dvindex_writer on May 30, 2024

Conversation

@YannByron
Contributor

Purpose

Support merging multiple deletion vectors within the same partition and bucket and writing them to one index file.

Linked issue: close #xxx

Tests

API and Format

Documentation

private final boolean isWrittenToMulitFiles;
private final long targetSizeInBytes;

private boolean written = false;
Contributor

This class looks a bit strange. Why does it need member variables? It only provides one method, which does not appear to need any preserved state.

written = true;
writtenSizeInBytes += currentSize;
}
result.add(writer.closeWriter());
Contributor

It is better to use try-with-resources to avoid a resource leak.
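A minimal sketch of that pattern, for illustration only. `SketchWriter` and `writeAll` are hypothetical stand-ins, not the PR's classes; the point is that the writer is closed even when a write throws.

```java
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Map;

// Hypothetical stand-in for the writer under review; not Paimon's actual class.
class TryWithResourcesSketch {

    static class SketchWriter implements Closeable {
        private final DataOutputStream out =
                new DataOutputStream(new ByteArrayOutputStream());
        boolean closed = false;

        // Writes one entry and returns the number of bytes it occupied.
        long write(String key, byte[] value) throws IOException {
            int before = out.size();
            out.writeUTF(key);
            out.writeInt(value.length);
            out.write(value);
            return out.size() - before;
        }

        @Override
        public void close() throws IOException {
            out.close();
            closed = true;
        }
    }

    // The try-with-resources statement closes the writer on both the
    // normal path and the exceptional path.
    static long writeAll(SketchWriter writer, Map<String, byte[]> input)
            throws IOException {
        long total = 0L;
        try (SketchWriter w = writer) {
            for (Map.Entry<String, byte[]> entry : input.entrySet()) {
                total += w.write(entry.getKey(), entry.getValue());
            }
        }
        return total;
    }
}
```

Without the `try` block, an exception thrown by `write` would skip the explicit close call and leave the underlying stream open.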

return new SingleIndexFileWriter(fileIO, indexPathFactory.newPath());
}

static class SingleIndexFileWriter {
Contributor

Make this private, and maybe it can be an inner (non-static) class instead of a static nested class.
That way, you can inline createWriter into the SingleIndexFileWriter constructor.
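A rough sketch of what this suggestion could look like. The outer class `InnerWriterSketch` and the `Supplier<String>` path factory are hypothetical simplifications; the real code uses `PathFactory` and `FileIO`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical outer class standing in for the index file writer under review.
class InnerWriterSketch {

    // Stand-in for the PathFactory field; each call yields a fresh path.
    private final Supplier<String> indexPathFactory;

    InnerWriterSketch(Supplier<String> indexPathFactory) {
        this.indexPathFactory = indexPathFactory;
    }

    // Creates two writers and returns the paths they picked for themselves.
    List<String> newWriterPaths() {
        List<String> paths = new ArrayList<>();
        paths.add(new SingleIndexFileWriter().path);
        paths.add(new SingleIndexFileWriter().path);
        return paths;
    }

    // A private inner (non-static) class can read the outer instance's fields,
    // so no createWriter helper is needed to pass the path in.
    private class SingleIndexFileWriter {
        private final String path;

        private SingleIndexFileWriter() {
            this.path = indexPathFactory.get();
        }
    }
}
```

The trade-off is that every inner instance holds a reference to its enclosing object, which is acceptable when the writer never outlives it.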

@YannByron force-pushed the core_dvindex_writer branch from c682030 to ca79eca on May 29, 2024 06:53
long currentSize = writer.write(entry.getKey(), entry.getValue());

if (isWrittenToMulitFiles
&& !writer.hasWritten()
Contributor

Doesn't !writer.hasWritten() always return true?

Maybe you should remove it.


private final PathFactory indexPathFactory;
private final FileIO fileIO;
private final boolean isWrittenToMulitFiles;
Contributor

typo: isWrittenToMultiFiles

}
}

class SingleIndexFileWriter implements Closeable {
Contributor

private


private long writtenSizeInBytes = 0L;

public SingleIndexFileWriter(FileIO fileIO, Path path) throws IOException {
Contributor

public SingleIndexFileWriter() throws IOException {
     this.path = indexPathFactory.newPath();
     .....
}

public DeletionVectorIndexFileWriter(
FileIO fileIO,
PathFactory pathFactory,
BucketMode bucketMode,
Contributor

Maybe we don't need bucketMode here; we can just pass a long targetSizePerIndexFile. If it is BUCKET_UNAWARE, targetSizePerIndexFile can be Long.MAX_VALUE.

The same applies to DeletionVectorsIndexFile.
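To illustrate why a single size threshold can replace a mode flag, here is a small sketch. `TargetSizeSketch`, `groupBySize`, and the rolling rule (start a new file once the current one has reached the target) are assumptions for illustration, not the PR's actual logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: decides which entries would share an index file.
class TargetSizeSketch {

    // Groups keys in iteration order. A new group starts once the bytes
    // accumulated in the current group have reached targetSizePerIndexFile.
    static List<List<String>> groupBySize(
            Map<String, Long> entrySizes, long targetSizePerIndexFile) {
        List<List<String>> files = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long writtenSizeInBytes = 0L;
        for (Map.Entry<String, Long> entry : entrySizes.entrySet()) {
            if (!current.isEmpty() && writtenSizeInBytes >= targetSizePerIndexFile) {
                files.add(current);
                current = new ArrayList<>();
                writtenSizeInBytes = 0L;
            }
            current.add(entry.getKey());
            writtenSizeInBytes += entry.getValue();
        }
        if (!current.isEmpty()) {
            files.add(current);
        }
        return files;
    }
}
```

With a target of `Long.MAX_VALUE` the threshold is never reached for realistic sizes, so all entries land in one group and the writer needs no separate boolean for the single-file case.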

Contributor Author

Fine. That's what I thought at first, but I didn't think it mattered much.


public List<IndexFileMeta> write(Map<String, DeletionVector> input) throws IOException {
if (input.isEmpty()) {
return emptyIndexFile();
Contributor

Is this possible? Why do we need to return an empty index file?

Contributor Author

This empty index file is needed to overwrite the previous one during compaction.
I will reconsider this when adding support for the compact operation on deletion vectors.

Contributor Author

For now, this just follows the original logic.

Contributor

This may only be needed to overwrite the old deletion file, as there is no proactive message to delete the deletion file for primary key tables.

But for append tables, maybe there is no need to generate an empty deletion file here?

Contributor

@JingsongLi left a comment

+1

@JingsongLi merged commit cec1709 into apache:master on May 30, 2024