## 11017: feat(backup/s3): optionally compress backup contents

r=oleschoenburg a=oleschoenburg

Introduces a new configuration option for the S3 backup store: `ZEEBE_BROKER_DATA_BACKUP_S3_ENABLECOMPRESSION`. For now, this is disabled by default.

When enabled, files above a hard-coded threshold (currently 8 MiB) are compressed with [Zstandard](https://github.com/facebook/zstd). We first compress to a temporary file before uploading; in-memory or streaming compression is impractical. For each file, the compression algorithm used is stored as metadata in the manifest. On restore, files that have a compression algorithm recorded in the metadata are downloaded to a temporary file and then decompressed to the actual target path.

Compression is handled by [commons-compress](https://commons.apache.org/proper/commons-compress/index.html), which provides a consistent interface for many compression algorithms. If we decide to change the default compression algorithm or make it configurable, restoring will use whatever algorithm was used to compress, thus preserving backwards compatibility. Currently, the compression algorithm is hard-coded to Zstandard. This requires an additional dependency on the native wrapper library, which provides binaries for all supported architectures: https://github.com/luben/zstd-jni#binary-releases

## Alternatives

At first I considered archiving and compressing the entire backup, or at least all segment and snapshot files together. However, this could easily exceed the current limitation of 5 GiB per file upload. Per-file compression has an upside, though: files can be compressed in parallel, and peak disk usage does not increase by much.

Another alternative was to implement this independently of the backup store and let the Zeebe broker handle compression. I believe handling it in the store itself makes more sense, as some backends may have support for native compression.
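As an aside, the compress-to-a-temporary-file flow and the algorithm-metadata round trip described above can be sketched as follows. This is an illustrative, self-contained sketch, not the actual Zeebe code: the PR uses commons-compress with Zstandard, while this stand-in uses the JDK's built-in GZIP so it runs without extra dependencies. All class and constant names here are hypothetical.

```java
// Hedged sketch of the backup compression flow. The real implementation
// uses commons-compress with Zstandard; java.util.zip's GZIP stands in
// here so the sketch runs with the JDK alone.
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public final class BackupCompression {
  // The PR hard-codes an 8 MiB threshold; smaller files are uploaded as-is.
  static final long COMPRESSION_THRESHOLD = 8 * 1024 * 1024;
  static final String ALGORITHM = "gzip"; // the PR records "zstd" instead

  /** The compressed temp file plus the algorithm name to record in the manifest. */
  record Compressed(Path file, String algorithm) {}

  /**
   * Compresses the file to a temporary file and returns it together with the
   * algorithm name to store as metadata. Empty if the file is below the threshold.
   */
  static Optional<Compressed> maybeCompress(final Path source) throws IOException {
    if (Files.size(source) < COMPRESSION_THRESHOLD) {
      return Optional.empty();
    }
    final Path tmp = Files.createTempFile("backup-", ".compressed");
    try (InputStream in = Files.newInputStream(source);
        OutputStream out = new GZIPOutputStream(Files.newOutputStream(tmp))) {
      in.transferTo(out);
    }
    return Optional.of(new Compressed(tmp, ALGORITHM));
  }

  /** On restore: decompress using whatever algorithm the manifest recorded. */
  static void decompressTo(final Path downloaded, final String algorithm, final Path target)
      throws IOException {
    if (!ALGORITHM.equals(algorithm)) {
      throw new IOException("Unknown compression algorithm: " + algorithm);
    }
    try (InputStream in = new GZIPInputStream(Files.newInputStream(downloaded));
        OutputStream out = Files.newOutputStream(target)) {
      in.transferTo(out);
    }
  }
}
```

Dispatching on the recorded algorithm name at restore time is what keeps future algorithm changes backwards compatible: old backups decompress with whatever algorithm they were written with.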
Additionally, letting the broker handle compression would again increase peak disk usage. In the worst case, if the backup content is not compressible at all, this could double the required disk space.

## Improvements

We should consider using a dedicated thread pool or another mechanism to control how many files are (de)compressed in parallel. We don't have any data yet, but it is easy to imagine that doing a lot of compression in parallel could have a considerable impact on CPU usage, disk usage, and disk I/O. It might even make sense to use such a mechanism more broadly for parallel upload and download of files, to also control the impact on network I/O.

Some parts of the backup might not be very compressible. In particular, parts of the snapshot are already compressed by RocksDB. It might make sense to not even attempt to compress snapshot files.

closes #10846

Co-authored-by: Ole Schönburg <ole.schoenburg@gmail.com>
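As an addendum, the dedicated thread pool suggested under Improvements could look roughly like this: a minimal sketch assuming a fixed-size `ExecutorService` is an acceptable way to cap concurrent (de)compression work. The class name and pool sizing are hypothetical, not part of the PR.

```java
// Hedged sketch: bound how many files are (de)compressed concurrently so
// compression work cannot saturate CPU and disk I/O. Illustrative only.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public final class BoundedCompressionPool {
  // Caps concurrent (de)compression tasks, independent of upload parallelism.
  private final ExecutorService pool;

  BoundedCompressionPool(final int maxConcurrent) {
    pool = Executors.newFixedThreadPool(maxConcurrent);
  }

  /** Runs the given (de)compression tasks with bounded parallelism and joins them. */
  <T> List<T> runAll(final List<Callable<T>> tasks) throws InterruptedException {
    final var futures = pool.invokeAll(tasks); // blocks until all tasks complete
    final var results = new ArrayList<T>(futures.size());
    for (final var future : futures) {
      try {
        results.add(future.get());
      } catch (final ExecutionException e) {
        throw new RuntimeException(e.getCause());
      }
    }
    return results;
  }

  void shutdown() {
    pool.shutdown();
  }
}
```

The same bounding mechanism could be reused for parallel uploads and downloads, as the Improvements section suggests, to also limit network I/O pressure.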
Showing 20 changed files with 525 additions and 125 deletions.