@tsreaper

Currently we have two types of files to write:

  • Data files (LSM tree files), where a level 0 data file is a single file and a level >= 1 data file is a set of rolling files. Statistics for these files are needed for pruning when scanning.
  • Extra files (changelog files), just a list of records. No statistics are needed.

However, the current writers are all based on MetricFileWriter, which always produces statistics, even for extra files that do not need them.

We'd like to refactor the writers and group them into SingleFileWriter and RollingFileWriter. StatsCollectingSingleFileWriter should be a subclass of SingleFileWriter that additionally produces statistics, and data file writers should be subclasses of StatsCollectingSingleFileWriter or RollingFileWriter, depending on their level. For extra file writers, extending SingleFileWriter is enough.
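
As a rough illustration of the proposed grouping, here is a minimal in-memory sketch. Only the three class names come from the description above; the method names, the min/max statistics, the single type parameter, and rolling by record count are assumptions made for the example and are not taken from the actual patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, in-memory stand-ins that only illustrate the hierarchy.

/** Writes records to exactly one "file" (modeled here as a list). No statistics. */
class SingleFileWriter<T> {
    protected final List<T> records = new ArrayList<>();

    void write(T record) {
        records.add(record);
    }

    long recordCount() {
        return records.size();
    }
}

/** A SingleFileWriter that additionally tracks statistics (here just min and max). */
class StatsCollectingSingleFileWriter<T extends Comparable<T>> extends SingleFileWriter<T> {
    private T min;
    private T max;

    @Override
    void write(T record) {
        super.write(record);
        if (min == null || record.compareTo(min) < 0) {
            min = record;
        }
        if (max == null || record.compareTo(max) > 0) {
            max = record;
        }
    }

    T min() {
        return min;
    }

    T max() {
        return max;
    }
}

/** Rolls over to a new stats-collecting writer once the current one is full. */
class RollingFileWriter<T extends Comparable<T>> {
    private final int maxRecordsPerFile; // assumed rolling criterion
    private final List<StatsCollectingSingleFileWriter<T>> files = new ArrayList<>();
    private StatsCollectingSingleFileWriter<T> current;

    RollingFileWriter(int maxRecordsPerFile) {
        this.maxRecordsPerFile = maxRecordsPerFile;
    }

    void write(T record) {
        if (current == null || current.recordCount() >= maxRecordsPerFile) {
            current = new StatsCollectingSingleFileWriter<>();
            files.add(current);
        }
        current.write(record);
    }

    int fileCount() {
        return files.size();
    }
}
```

In this sketch a changelog writer would use SingleFileWriter directly, a level 0 data file writer would use StatsCollectingSingleFileWriter, and a level >= 1 data file writer would use RollingFileWriter.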

@tsreaper marked this pull request as ready for review on September 14, 2022 04:59

@JingsongLi left a comment

Thanks for the contribution. I left some comments.

LOG.warn(
"Failed to open the bulk writer, closing the output stream and throw the error.",
e);
IOUtils.closeQuietly(out);

No need to close the output stream (out) here; the caller will invoke abort.

// Abort this writer to clear uncommitted files.
writer.abort();

writer.close();

We don't need close to return a result.
We can just make RecordWriter.close return void.

import java.util.function.Supplier;

/** A {@link RollingFileWriter} to write {@link KeyValue}s into several rolling data files. */
public class KeyValueDataRollingFileWriter extends RollingFileWriter<KeyValue, DataFileMeta> {

Do we need to create a separate class? I don't think it contains any logic.

+ (currentWriter == null ? null : currentWriter.path())
+ ". Cleaning up.",
e);
abort();

Add documentation to FileWriter.abort: the implementation needs to be reentrant.
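
To make the requirement concrete, here is a minimal sketch that reads "reentrant" as "safe to call more than once": the writer aborts itself when a write fails, and the caller may then call abort again without harm. AbortableFileWriter and all of its members are hypothetical names for illustration and do not mirror the real FileWriter interface.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical writer whose abort() can safely be called any number of times. */
class AbortableFileWriter {
    private final Path path;
    private final OutputStream out;
    private boolean aborted = false;

    AbortableFileWriter(Path path) {
        this.path = path;
        try {
            this.out = Files.newOutputStream(path);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    void write(byte[] bytes) {
        try {
            out.write(bytes);
        } catch (IOException e) {
            // The writer cleans up its own file before rethrowing.
            abort();
            throw new UncheckedIOException(e);
        }
    }

    /** Closes the stream and deletes the file. Later calls are no-ops. */
    void abort() {
        if (aborted) {
            return;
        }
        aborted = true;
        try {
            out.close();
        } catch (IOException ignored) {
            // Best effort: the file is deleted below regardless.
        }
        try {
            Files.deleteIfExists(path);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    boolean isAborted() {
        return aborted;
    }
}
```

The guard flag is the whole point of the sketch: without it, a second abort from the caller would try to close and delete again after the writer has already cleaned up.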

}

try {
if (writer == null) {

This is a single-file writer, so it should produce exactly one file rather than zero.
We can create the writer in the constructor. This avoids various inconsistencies caused by not producing a file.
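
A minimal sketch of this suggestion, assuming a plain text file and a hypothetical class name EagerSingleFileWriter (neither is from the patch): the underlying file is opened in the constructor instead of lazily on the first write, so the writer produces one file even when no record is ever written.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch: the output file is opened eagerly in the constructor. */
class EagerSingleFileWriter implements AutoCloseable {
    private final Path path;
    private final BufferedWriter out;
    private long recordCount = 0;

    EagerSingleFileWriter(Path path) {
        this.path = path;
        try {
            // Opening here guarantees the file exists even if write() is never called.
            this.out = Files.newBufferedWriter(path, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    void write(String record) {
        try {
            out.write(record);
            out.newLine();
            recordCount++;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    long recordCount() {
        return recordCount;
    }

    Path path() {
        return path;
    }

    @Override
    public void close() {
        try {
            out.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With eager creation there is no "writer == null" state to check on close, which is the kind of inconsistency the comment refers to.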

+ (currentWriter == null ? null : currentWriter.path())
+ ". Cleaning up.",
e);
abort();

Add documentation to FileWriter.write: the writer now cleans up its file by itself when an exception occurs in write.

@JingsongLi left a comment

Looks good to me!

@tsreaper merged commit 736f936 into apache:master on Sep 21, 2022