Add cross-version remapped merges to xl-meta #19765

Merged
merged 7 commits into from
May 19, 2024

Conversation

klauspost
Contributor

@klauspost klauspost commented May 17, 2024

Description

Adds -xver, which can be used with -export and -combine to attempt to combine files across versions when the data is suspected to be the same. Overlapping data is compared.

xl-meta can now also read external part files.

Bonus: Make inspect accept wildcards.

How to test this PR?

λ ./xl-meta -export -combine -xver sample.zip
...
Base version "null/5870230c".
Read shard 2 Data shards 12 Parity 4 (null/5870230c/shard-02-of-16.data)
Read shard 3 Data shards 12 Parity 4 (null/5870230c/shard-03-of-16.data)
Read shard 4 Data shards 12 Parity 4 (null/5870230c/shard-04-of-16.data)
Read shard 5 Data shards 12 Parity 4 (null/5870230c/shard-05-of-16.data)
Read shard 6 Data shards 12 Parity 4 (null/5870230c/shard-06-of-16.data)
Read shard 8 Data shards 12 Parity 4 (null/5870230c/shard-08-of-16.data)
Read shard 10 Data shards 12 Parity 4 (null/5870230c/shard-10-of-16.data)
Read shard 12 Data shards 12 Parity 4 (null/5870230c/shard-12-of-16.data)
Reading version "null/9f2b4b49".
Read shard 1 Data shards 11 Parity 5 (null/9f2b4b49/shard-01-of-16.data)
Read shard 7 Data shards 11 Parity 5 (null/9f2b4b49/shard-07-of-16.data)
Read shard 9 Data shards 11 Parity 5 (null/9f2b4b49/shard-09-of-16.data)
Read shard 11 Data shards 11 Parity 5 (null/9f2b4b49/shard-11-of-16.data)
Read shard 13 Data shards 11 Parity 5 (null/9f2b4b49/shard-13-of-16.data)
Read shard 14 Data shards 11 Parity 5 (null/9f2b4b49/shard-14-of-16.data)
Read shard 15 Data shards 11 Parity 5 (null/9f2b4b49/shard-15-of-16.data)
Read shard 16 Data shards 11 Parity 5 (null/9f2b4b49/shard-16-of-16.data)
Data overlaps (3938 bytes). Combining with "null/9f2b4b49".
Attempting to reconstruct using parity sets:
* Setup: Data shards: 12 - Parity blocks: 4
Have 8 complete remapped data shards and 0 complete parity shards. Could NOT reconstruct: too few shards given
3384 bytes missing. Truncating 0 from end.
Wrote output to null/5870230c-00230-429-34b66a9a-1d4c-4dfd-8425-9d8609952504-00001.parquet.truncated
Base version "null/9f2b4b49".
Read shard 1 Data shards 11 Parity 5 (null/9f2b4b49/shard-01-of-16.data)
Read shard 7 Data shards 11 Parity 5 (null/9f2b4b49/shard-07-of-16.data)
Read shard 9 Data shards 11 Parity 5 (null/9f2b4b49/shard-09-of-16.data)
Read shard 11 Data shards 11 Parity 5 (null/9f2b4b49/shard-11-of-16.data)
Read shard 13 Data shards 11 Parity 5 (null/9f2b4b49/shard-13-of-16.data)
Read shard 14 Data shards 11 Parity 5 (null/9f2b4b49/shard-14-of-16.data)
Read shard 15 Data shards 11 Parity 5 (null/9f2b4b49/shard-15-of-16.data)
Read shard 16 Data shards 11 Parity 5 (null/9f2b4b49/shard-16-of-16.data)
Reading version "null/5870230c".
Read shard 2 Data shards 12 Parity 4 (null/5870230c/shard-02-of-16.data)
Read shard 3 Data shards 12 Parity 4 (null/5870230c/shard-03-of-16.data)
Read shard 4 Data shards 12 Parity 4 (null/5870230c/shard-04-of-16.data)
Read shard 5 Data shards 12 Parity 4 (null/5870230c/shard-05-of-16.data)
Read shard 6 Data shards 12 Parity 4 (null/5870230c/shard-06-of-16.data)
Read shard 8 Data shards 12 Parity 4 (null/5870230c/shard-08-of-16.data)
Read shard 10 Data shards 12 Parity 4 (null/5870230c/shard-10-of-16.data)
Read shard 12 Data shards 12 Parity 4 (null/5870230c/shard-12-of-16.data)
Data overlaps (3938 bytes). Combining with "null/5870230c".
Attempting to reconstruct using parity sets:
* Setup: Data shards: 11 - Parity blocks: 5
Have 7 complete remapped data shards and 4 complete parity shards. Could reconstruct completely
0 bytes missing. Truncating 0 from end.
Wrote output to null/9f2b4b49-00230-429-34b66a9a-1d4c-4dfd-8425-9d8609952504-00001.parquet.complete

It first tries moving data into 5870230c, but there are not enough shards to reconstruct. It then moves data into 9f2b4b49 and gets enough data to reconstruct the file.
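The two outcomes in the log follow from the usual Reed-Solomon rule that a set with k data shards can be rebuilt from any k complete shards, data or parity. A minimal sketch of that arithmetic, using the numbers from the log; `canReconstruct` is a hypothetical helper for illustration and not a function from xl-meta, and it only counts complete shards:

```go
package main

import "fmt"

// canReconstruct reports whether an erasure set with dataShards data shards
// can be fully rebuilt from the given number of complete shards.
// Hypothetical helper: Reed-Solomon needs at least dataShards complete
// shards in total, and it does not matter whether they are data or parity.
func canReconstruct(dataShards, completeData, completeParity int) bool {
	return completeData+completeParity >= dataShards
}

func main() {
	// Base 5870230c: 12 data shards, 8 complete data + 0 parity.
	fmt.Println(canReconstruct(12, 8, 0)) // false
	// Base 9f2b4b49: 11 data shards, 7 complete data + 4 parity.
	fmt.Println(canReconstruct(11, 7, 4)) // true
}
```

With 9f2b4b49 as the base, the 4 parity shards bring the total to 11, which is exactly the number of data shards that layout needs.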

The overall method: We lay out the file:

.................................................................

Then we add the data shards from the first version, marking an x for each byte we have:

xxxxxxxxxxxxxx.............xxxxxxxxx.........xxxxxxxxxxxxxxxxxxxx

Then we do the same for the other version:

..........xxxxxxxxxxxxxxxxxxxxxxxxxx.........xxxxxxxx............

We then merge these two:

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.........xxxxxxxxxxxxxxxxxxxx

We also check if the overlapping bytes match.

If this gives us all the data, we are done. If not, we use whatever parity shards we have to reconstruct the remainder.

If both versions have the same data and parity shard counts, we also merge the parity shards.
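The merge step described above can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: the `version` type, `fromPattern`, and `merge` are hypothetical names, and each version is assumed to already have its bytes placed at their final file offsets.

```go
package main

import (
	"errors"
	"fmt"
)

// version holds the bytes recovered from one object version, laid out at
// their final file offsets. have[i] reports whether data[i] is known.
// Hypothetical type for illustration; not taken from xl-meta.
type version struct {
	data []byte
	have []bool
}

// fromPattern builds a version from a diagram such as "xx..x":
// an 'x' at position i means byte i of content is known.
func fromPattern(pattern string, content []byte) *version {
	v := &version{data: make([]byte, len(content)), have: make([]bool, len(content))}
	for i := range content {
		if i < len(pattern) && pattern[i] == 'x' {
			v.data[i] = content[i]
			v.have[i] = true
		}
	}
	return v
}

// merge copies bytes known in src but missing in dst into dst.
// Bytes known in both versions must match; if any differ, dst is left
// untouched. It returns the overlap size and the bytes still missing.
func merge(dst, src *version) (overlap, missing int, err error) {
	if len(dst.data) != len(src.data) {
		return 0, 0, errors.New("size mismatch")
	}
	// First pass: verify the overlap before modifying dst.
	for i := range dst.data {
		if dst.have[i] && src.have[i] {
			if dst.data[i] != src.data[i] {
				return 0, 0, errors.New("overlapping bytes differ")
			}
			overlap++
		}
	}
	// Second pass: fill the gaps and count what is still missing.
	for i := range dst.data {
		if !dst.have[i] && src.have[i] {
			dst.data[i] = src.data[i]
			dst.have[i] = true
		}
		if !dst.have[i] {
			missing++
		}
	}
	return overlap, missing, nil
}

func main() {
	content := []byte("0123456789")
	a := fromPattern("xxx..x..xx", content)
	b := fromPattern("..xxx...x.", content)
	overlap, missing, err := merge(a, b)
	fmt.Printf("overlap=%d missing=%d err=%v\n", overlap, missing, err)
}
```

Checking the overlap before copying means a mismatch leaves the base version unchanged. Any bytes still missing after the merge would be left to parity reconstruction.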

Types of changes

  • New feature (non-breaking change which adds functionality)

@klauspost klauspost changed the title Add crossversion merges to xl-meta Add cross-version remapped merges to xl-meta May 17, 2024
Member

@krisis krisis left a comment

LGTM, the parts I understand now. Great stuff!

@harshavardhana harshavardhana merged commit 2c7bcee into minio:master May 19, 2024
20 checks passed
@klauspost klauspost deleted the xlmeta-add-xver branch May 19, 2024 16:12