[Bug] The length of DeletionFile is incorrect #3313

@suxiaogang223

Description

Search before asking

  • I searched in the issues and found nothing similar.

Paimon version

0.8-SNAPSHOT

Compute Engine

JavaAPI

Minimal reproduce step

Nothing to do

What doesn't meet your expectations?

I'm trying to support deletion vectors in Doris' PaimonNativeReader. When I use the offset and length from DeletionFile to read the corresponding bytes of the HDFS file locally, deserializing them into a RoaringBitmap fails. I found that the correct way is to read length + 4 bytes starting at offset.
I guess these 4 extra bytes hold the serialized length of the DeletionVector, which is written in front of the DeletionVector when it is stored in the index file. The read method below is consistent with that: it first reads an int and compares it with deletionFile.length().

    static DeletionVector read(FileIO fileIO, DeletionFile deletionFile) throws IOException {
        Path path = new Path(deletionFile.path());
        try (SeekableInputStream input = fileIO.newInputStream(path)) {
            input.seek(deletionFile.offset());
            DataInputStream dis = new DataInputStream(input);
            int actualLength = dis.readInt();
            if (actualLength != deletionFile.length()) {
                throw new RuntimeException(
                        "Size not match, actual size: "
                                + actualLength
                                + ", expert size: "
                                + deletionFile.length()
                                + ", file path: "
                                + path);
            }
            int magicNum = dis.readInt();
            if (magicNum == BitmapDeletionVector.MAGIC_NUMBER) {
                return BitmapDeletionVector.deserializeFromDataInput(dis);
            } else {
                throw new RuntimeException("Invalid magic number: " + magicNum);
            }
        }
    }

Maybe we should add 4 to the length or to the offset in DeletionFile, because the current meaning of these fields is very confusing.
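
The layout implied by the read method above can be sketched in plain Java. This is only an illustration under assumptions: the class name, the helper methods, and the MAGIC value are hypothetical (the real constant is BitmapDeletionVector.MAGIC_NUMBER), and the bitmap bytes are stand-ins rather than a real serialized RoaringBitmap. It assumes each entry is stored as a 4-byte size, then a 4-byte magic number, then the bitmap bytes, with the stored size counting the magic number and the bitmap but not the size field itself.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class DeletionFileLayoutSketch {
    // Hypothetical placeholder, not Paimon's real BitmapDeletionVector.MAGIC_NUMBER.
    public static final int MAGIC = 0x12345678;

    /** Writes one entry as [int size][int magic][bitmap bytes]; size = 4 + bitmap length. */
    public static byte[] writeEntry(byte[] bitmapBytes) {
        int length = 4 + bitmapBytes.length; // the value assumed to end up in DeletionFile.length()
        ByteBuffer buf = ByteBuffer.allocate(4 + length); // big-endian, like DataOutputStream
        buf.putInt(length);
        buf.putInt(MAGIC);
        buf.put(bitmapBytes);
        return buf.array();
    }

    /** Copies the whole entry out of the file: note the extra 4 bytes for the size prefix. */
    public static byte[] copyRegion(byte[] file, int offset, int length) {
        return Arrays.copyOfRange(file, offset, offset + 4 + length);
    }

    /** Mirrors the checks in the read method: size prefix, then magic number, then bitmap. */
    public static byte[] parseRegion(byte[] region, int expectedLength) {
        ByteBuffer buf = ByteBuffer.wrap(region);
        int actualLength = buf.getInt();
        if (actualLength != expectedLength) {
            throw new IllegalStateException(
                    "Size mismatch, actual: " + actualLength + ", expected: " + expectedLength);
        }
        int magic = buf.getInt();
        if (magic != MAGIC) {
            throw new IllegalStateException("Invalid magic number: " + magic);
        }
        byte[] bitmapBytes = new byte[expectedLength - 4];
        buf.get(bitmapBytes); // throws BufferUnderflowException if the region is too short
        return bitmapBytes;
    }

    public static void main(String[] args) {
        byte[] bitmap = {10, 20, 30};
        byte[] entry = writeEntry(bitmap);

        // Simulate an index file with 7 unrelated bytes before the entry.
        int offset = 7;
        byte[] file = new byte[offset + entry.length];
        System.arraycopy(entry, 0, file, offset, entry.length);

        int length = 4 + bitmap.length;
        byte[] region = copyRegion(file, offset, length);
        System.out.println(Arrays.equals(bitmap, parseRegion(region, length)));
    }
}
```

Under these assumptions, a reader that copies only length bytes from offset ends up with a region that is 4 bytes too short, which would explain the deserialization error described above.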

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!

Metadata

Assignees

No one assigned

Labels

bug (Something isn't working)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
