Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core] Data files with delete records should not be upgraded directly to max level #2962

Merged
merged 2 commits into from
Mar 18, 2024

Conversation

tsreaper
Copy link
Contributor

@tsreaper tsreaper commented Mar 7, 2024

Purpose

Data files on max level should not have delete records. However when upgrading files to max level we currently does not perform this check.

Tests

TableWriteTest#testUpgradeToMaxLevel.

API and Format

Yes. DataFileMeta is changed, and thus ManifestEntry is changed. However as long as we can pass PrimaryKeyFileStoreTableTest#testStreamingChangelogCompatibility02 there should not be any issue.

Documentation

No.

// Why don't we keep addRowCount and deleteRowCount?
// Because in previous versions of DataFileMeta, we only keep rowCount.
// We have to keep the compatibility.
private final long totalRowCount;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we still use rowCount naming?
Because in schema, it is still _ROW_COUNT...

// Because in previous versions of DataFileMeta, we only keep rowCount.
// We have to keep the compatibility.
private final long totalRowCount;
private final @Nullable Long addRowCount;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is better to introduce deleteRowCount?

@tsreaper tsreaper force-pushed the upgrade branch 3 times, most recently from 06ccb22 to 53f7067 Compare March 15, 2024 06:33
Copy link
Contributor

@JingsongLi JingsongLi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please manually test that the newly written table can be read by the old code.

private final long rowCount;
private final @Nullable Long deleteRowCount;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put this field to last class member.

@tsreaper
Copy link
Contributor Author

Please manually test that the newly written table can be read by the old code.

I've tested by hand, that the old reader code can read this newly written table.

Copy link
Contributor

@JingsongLi JingsongLi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@JingsongLi JingsongLi merged commit 554c858 into apache:master Mar 18, 2024
8 checks passed
zhu3pang pushed a commit to zhu3pang/incubator-paimon that referenced this pull request Mar 29, 2024
@nonggialiang
Copy link

Please manually test that the newly written table can be read by the old code.

I've tested by hand, that the old reader code can read this newly written table.

When i use 0.5 code to read a table written by 0.8-SNAPSHOT, i got this exception. Is that as expected? @tsreaper

Caused by: java.lang.IndexOutOfBoundsException: Index: 13, Size: 13
        at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_201]
        at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_201]
        at org.apache.paimon.format.avro.FieldReaderFactory$RowReader.<init>(FieldReaderFactory.java:455) ~[blob_p-96ee7771d44cf819eb6a119821982bdbeb1ca209-f036c42e0b082802409d42965f38fe86:0.5-SNAPSHOT]
        at org.apache.paimon.format.avro.FieldReaderFactory$RowReader.<init>(FieldReaderFactory.java:447) ~[blob_p-96ee7771d44cf819eb6a119821982bdbeb1ca209-f036c42e0b082802409d42965f38fe86:0.5-SNAPSHOT]
        at org.apache.paimon.format.avro.FieldReaderFactory.visitRecord(FieldReaderFactory.java:147) ~[blob_p-96ee7771d44cf819eb6a119821982bdbeb1ca209-f036c42e0b082802409d42965f38fe86:0.5-SNAPSHOT]
        at org.apache.paimon.format.avro.FieldReaderFactory.visitRecord(FieldReaderFactory.java:44) ~[blob_p-96ee7771d44cf819eb6a119821982bdbeb1ca209-f036c42e0b082802409d42965f38fe86:0.5-SNAPSHOT]
        at org.apache.paimon.format.avro.AvroSchemaVisitor.visit(AvroSchemaVisitor.java:41) ~[blob_p-96ee7771d44cf819eb6a119821982bdbeb1ca209-f036c42e0b082802409d42965f38fe86:0.5-SNAPSHOT]
        at org.apache.paimon.format.avro.FieldReaderFactory.visitUnion(FieldReaderFactory.java:70) ~[blob_p-96ee7771d44cf819eb6a119821982bdbeb1ca209-f036c42e0b082802409d42965f38fe86:0.5-SNAPSHOT]
        at org.apache.paimon.format.avro.FieldReaderFactory.visitUnion(FieldReaderFactory.java:44) ~[blob_p-96ee7771d44cf819eb6a119821982bdbeb1ca209-f036c42e0b082802409d42965f38fe86:0.5-SNAPSHOT]
        at org.apache.paimon.format.avro.AvroSchemaVisitor.visit(AvroSchemaVisitor.java:44) ~[blob_p-96ee7771d44cf819eb6a119821982bdbeb1ca209-f036c42e0b082802409d42965f38fe86:0.5-SNAPSHOT]
        at org.apache.paimon.format.avro.FieldReaderFactory$RowReader.<init>(FieldReaderFactory.java:456) ~[blob_p-96ee7771d44cf819eb6a119821982bdbeb1ca209-f036c42e0b082802409d42965f38fe86:0.5-SNAPSHOT]
        at org.apache.paimon.format.avro.FieldReaderFactory.createRowReader(FieldReaderFactory.java:436) ~[blob_p-96ee7771d44cf819eb6a119821982bdbeb1ca209-f036c42e0b082802409d42965f38fe86:0.5-SNAPSHOT]
        at org.apache.paimon.format.avro.AvroRowDatumReader.setSchema(AvroRowDatumReader.java:54) ~[blob_p-96ee7771d44cf819eb6a119821982bdbeb1ca209-f036c42e0b082802409d42965f38fe86:0.5-SNAPSHOT]
        at org.apache.paimon.shade.org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:145) ~[blob_p-96ee7771d44cf819eb6a119821982bdbeb1ca209-f036c42e0b082802409d42965f38fe86:0.5-SNAPSHOT]
        at org.apache.paimon.shade.org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:143) ~[blob_p-96ee7771d44cf819eb6a119821982bdbeb1ca209-f036c42e0b082802409d42965f38fe86:0.5-SNAPSHOT]
        at org.apache.paimon.shade.org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:134) ~[blob_p-96ee7771d44cf819eb6a119821982bdbeb1ca209-f036c42e0b082802409d42965f38fe86:0.5-SNAPSHOT]
        at org.apache.paimon.shade.org.apache.avro.file.DataFileReader.openReader(DataFileReader.java:74) ~[blob_p-96ee7771d44cf819eb6a119821982bdbeb1ca209-f036c42e0b082802409d42965f38fe86:0.5-SNAPSHOT]
        at org.apache.paimon.format.avro.AvroBulkFormat$AvroReader.createReaderFromPath(AvroBulkFormat.java:81) ~[blob_p-96ee7771d44cf819eb6a119821982bdbeb1ca209-f036c42e0b082802409d42965f38fe86:0.5-SNAPSHOT]
        at org.apache.paimon.format.avro.AvroBulkFormat$AvroReader.<init>(AvroBulkFormat.java:68) ~[blob_p-96ee7771d44cf819eb6a119821982bdbeb1ca209-f036c42e0b082802409d42965f38fe86:0.5-SNAPSHOT]
        at org.apache.paimon.format.avro.AvroBulkFormat$AvroReader.<init>(AvroBulkFormat.java:58) ~[blob_p-96ee7771d44cf819eb6a119821982bdbeb1ca209-f036c42e0b082802409d42965f38fe86:0.5-SNAPSHOT]
        at org.apache.paimon.format.avro.AvroBulkFormat.createReader(AvroBulkFormat.java:55) ~[blob_p-96ee7771d44cf819eb6a119821982bdbeb1ca209-f036c42e0b082802409d42965f38fe86:0.5-SNAPSHOT]
        at org.apache.paimon.format.avro.AvroBulkFormat.createReader(AvroBulkFormat.java:41) ~[blob_p-96ee7771d44cf819eb6a119821982bdbeb1ca209-f036c42e0b082802409d42965f38fe86:0.5-SNAPSHOT]
        at org.apache.paimon.utils.FileUtils.createFormatReader(FileUtils.java:128) ~[blob_p-96ee7771d44cf819eb6a119821982bdbeb1ca209-f036c42e0b082802409d42965f38fe86:0.5-SNAPSHOT]
        at org.apache.paimon.utils.ObjectsFile.read(ObjectsFile.java:97) ~[blob_p-96ee7771d44cf819eb6a119821982bdbeb1ca209-f036c42e0b082802409d42965f38fe86:0.5-SNAPSHOT]
        at org.apache.paimon.operation.AbstractFileStoreScan.readManifestFileMeta(AbstractFileStoreScan.java:341) ~[blob_p-96ee7771d44cf819eb6a119821982bdbeb1ca209-f036c42e0b082802409d42965f38fe86:0.5-SNAPSHOT]
        at org.apache.paimon.operation.AbstractFileStoreScan.lambda$null$2(AbstractFileStoreScan.java:238) ~[blob_p-96ee7771d44cf819eb6a119821982bdbeb1ca209-f036c42e0b082802409d42965f38fe86:0.5-SNAPSHOT]
        at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:267) ~[?:1.8.0_201]
        at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175) ~[?:1.8.0_201]
        at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382) ~[?:1.8.0_201]
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) ~[?:1.8.0_201]
        at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) ~[?:1.8.0_201]
        at java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:747) ~[?:1.8.0_201]
        at java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:721) ~[?:1.8.0_201]
        at java.util.stream.AbstractTask.compute(AbstractTask.java:316) ~[?:1.8.0_201]
        at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731) ~[?:1.8.0_201]
        at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) ~[?:1.8.0_201]
        at java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:401) ~[?:1.8.0_201]
        at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734) ~[?:1.8.0_201]
        at java.util.stream.ReduceOps$ReduceOp.evaluateParallel(ReduceOps.java:714) ~[?:1.8.0_201]
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233) ~[?:1.8.0_201]
        at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) ~[?:1.8.0_201]
        at org.apache.paimon.operation.AbstractFileStoreScan.lambda$doPlan$3(AbstractFileStoreScan.java:240) ~[blob_p-96ee7771d44cf819eb6a119821982bdbeb1ca209-f036c42e0b082802409d42965f38fe86:0.5-SNAPSHOT]
        at org.apache.paimon.utils.ParallellyExecuteUtils$1.lambda$advanceIfNeeded$0(ParallellyExecuteUtils.java:78) ~[blob_p-96ee7771d44cf819eb6a119821982bdbeb1ca209-f036c42e0b082802409d42965f38fe86:0.5-SNAPSHOT]
        at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590) ~[?:1.8.0_201]
        at java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1582) ~[?:1.8.0_201]
        at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) ~[?:1.8.0_201]
        at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) ~[?:1.8.0_201]
        at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) ~[?:1.8.0_201]
        at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157) ~[?:1.8.0_201]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants