Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split files into smaller chunks when replicating snapshots #12795

Closed
deepthidevaki opened this issue May 17, 2023 · 5 comments · Fixed by #19490
Closed

Split files into smaller chunks when replicating snapshots #12795

deepthidevaki opened this issue May 17, 2023 · 5 comments · Fixed by #19490
Assignees
Labels
area/performance Marks an issue as performance related area/resilience component/raft component/zeebe Related to the Zeebe component/team kind/toil Categorizes an issue or PR as general maintenance, i.e. cleanup, refactoring, etc. support Marks an issue as related to a customer support request

Comments

@deepthidevaki
Copy link
Contributor

deepthidevaki commented May 17, 2023

Description

A snapshot consists of multiple files. When replicating a snapshot, we put one file into a single request. This can cause issues because it takes longer to send the request and can hit the configured timeout. In high latency networks this can happen more frequent. We have configured max file sizes for rocksdb. Even then the files can be upto 64MB.

A better way to handle it is to split each files to smaller chunks, and send one small chunk at a time. Smaller packets are faster to send. Besides, this also lowers memory footprint, as we don't have to load the whole file into memory. This might also help to relax the limit on rocksdb sst file size.

relates to https://jira.camunda.com/browse/SUPPORT-16901

@deepthidevaki deepthidevaki added kind/toil Categorizes an issue or PR as general maintenance, i.e. cleanup, refactoring, etc. area/performance Marks an issue as performance related area/resilience component/raft labels May 17, 2023
@megglos
Copy link
Contributor

megglos commented May 25, 2023

ZDP-Triage:

@megglos megglos added the support Marks an issue as related to a customer support request label May 25, 2023
@megglos
Copy link
Contributor

megglos commented Jun 9, 2023

@romansmirnov romansmirnov added the component/zeebe Related to the Zeebe component/team label Mar 5, 2024
@EuroLew
Copy link
Contributor

EuroLew commented Jun 11, 2024

Notes:

The implementation of this will come in two parts the sender PR and the receiver PR.

The sender:
A ChunkedFileBasedSnapshotChunkReader will be created an will replace the current FileBasedSnapshotChunkReader however the chunk size designated will be Integer.MAX_VALUE so therefore no functional change should occur each chunk will still contain the full file. In addition the chunkName has been changed, previously this was the file name however since there could be multiple chunks per file the chunkName will now be appended with the file part e.g. file-1, file-2,file-3. As a result a fileName field has been added to the snapshot chunk implementation used instead of relying on string splits to extract (There is a fair amount of changes required to add the fileName filed maybe it would be simpler to rely on string split but this locks in the chunk naming scheme so a specific format so not ideal)

The receiver:

This involves adding support for chunks which only contain parts of files, currently correctness checks occur where file existence is taken as a failure however with chunked file snapshots this no longer holds these checks will be removed or refactored. In addition there is the question of checksums, before each chunk was a file and the file snapshot was provided. Currently the new implementation would involve snapshots of the file chunks. I think this is a fine solution, if we know all the file parts are not corrupted the full file should be correct? Open to thoughts on this. Lastly there will need to be changes to how files are written per chunk as now a chunk could represent not just a new file to create and write but also appending content to a file.

@deepthidevaki
Copy link
Contributor Author

Have you also looked into how to make these changes with out breaking rolling update?

@EuroLew
Copy link
Contributor

EuroLew commented Jun 12, 2024

Have you also looked into how to make these changes with out breaking rolling update?

Yes changing chunkName to be a chunk identifier and adding a fileName field will break rolling update. Instead keep chunkName to be the fileName (not a very apt name this can be changed in a later version). and add a field called chunkIdentifier? Which has format file_name-<file_part> for use during debugging (Do you think this field is even needed?)

github-merge-queue bot pushed a commit that referenced this issue Jun 18, 2024
## Description
This will result in split file snapshot chunks being sent to a broker
which supports it.

Changes:
- Install Response, added a `preferredChunKSize` field which defaults to
0 for old versions.
- Changed `FileBasedSnapshotChunkReader.next()` to instead read files
chunk by chunk.
- Added `chunkSize` field to the `FileBasedSnapshotChunkReader`.
- Changed `FileBasedSnapshotChunkReader.nextId()` from a `<file-name>`
to `<file name>-<file-part>` format e.g. file-1, file-2.
- Added `fileBlockPosition` and `totalFileSize` field to `SnapshotChunk`
in order to aid in processing on the receiver side as it is useful to
know the current position in the file for next expected chunk checks and
total file size for any completion actions when all blocks have been
received for example file flushing. (Can be calculated with data.length,
position and file size if it is the last chunk for a given file)

## Related issues
Sender side of #12795
EuroLew added a commit that referenced this issue Jun 20, 2024
## Description

Add config support for chunk size.

closes #12795
EuroLew added a commit that referenced this issue Jun 24, 2024
## Description

Add config support for chunk size.

closes #12795
esraagamal6 pushed a commit that referenced this issue Jun 26, 2024
## Description

Built on the changes in #19361 as new chunk fields are needed.

- [ ] Removed file existence checks as they will fail for chunked files.
- [ ] Now writes file using the chunk position so order of chunks
doesn't matter (for writing files will still fail checksums)
- [ ] Now support partially building metadata if file is chunked.
- [ ] Now enforces order of chunks using `chunkId` in `PassiveRole`

## Related issues

closes #12795
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/performance Marks an issue as performance related area/resilience component/raft component/zeebe Related to the Zeebe component/team kind/toil Categorizes an issue or PR as general maintenance, i.e. cleanup, refactoring, etc. support Marks an issue as related to a customer support request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants