New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reuse already created checksum collections on persist #14184
Conversation
Useful, if we want to avoid re-reading files etc.
Should allow to use pre calculated checksums, instead of always recalculating the value.
New persist method accepts SFV checksum, but needs to be reworked that we not recalculate value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀 I think we'll need one where you pass the checksum value itself (either as a long or as a byte array), and not the byte array of contents (since RocksDB will give us the checksum and not the file contents). It might also be useful if we try to replicate snapshot files in smaller chunks 🤷
Anyway, we'll cross that bridge when we get there 😄
@Override | ||
public void updateFromBytes(final String fileName, final byte[] bytes) { | ||
combinedChecksum.update(fileName.getBytes(UTF_8)); | ||
final Checksum checksum = new CRC32C(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔧 You can create it once for the whole class, and just call CRC32C#reset
before every usage. Very minor optimization 🙃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I keep it for now.
The CRC323C#reset :D
public void reset() {
this.crc = -1;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you look at how it works, this is the initial value 🙃 Then when you pass things it gets updated. So it is, in fact, resetting the state to the initial values ;)
@@ -62,7 +62,7 @@ private static boolean isNotMetadataFile(final Path file) { | |||
|
|||
public static void persist(final Path checksumPath, final ImmutableChecksumsSFV checksum) | |||
throws IOException { | |||
try(final var stream = new FileOutputStream(checksumPath.toFile())) { | |||
try (final var stream = new FileOutputStream(checksumPath.toFile())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ups 🙈 my bad
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah :D seems to be from the other PR :D
Yeah, let's do that when we are there, meaning when we are actually migrating. |
bors r+ |
14175: Refactor checksum collection writing and usage r=Zelldon a=Zelldon ## Description The first part of PR: Instead of constructing a byte array with byte output stream and writers, and then later writing again to a stream, we now get an OutputStream in and write directly to the stream Additionally PR contains #14184 - Support to update the checksum collection from bytes, which might be received or read from the file already. This is useful, if we want to avoid re-reading files etc. - Support to persist snapshot with given checksum collection. - collection of checksum is no longer recreated and directly written to the directory. <!-- Please explain the changes you made here. --> ## Related issues <!-- Which issues are closed by this PR or are related --> related #14045 Co-authored-by: Christopher Zell <zelldon91@googlemail.com> Co-authored-by: Christopher Kujawa (Zell) <zelldon91@googlemail.com>
14175: Refactor checksum collection writing and usage r=Zelldon a=Zelldon ## Description The first part of PR: Instead of constructing a byte array with byte output stream and writers, and then later writing again to a stream, we now get an OutputStream in and write directly to the stream Additionally PR contains #14184 - Support to update the checksum collection from bytes, which might be received or read from the file already. This is useful, if we want to avoid re-reading files etc. - Support to persist snapshot with given checksum collection. - collection of checksum is no longer recreated and directly written to the directory. <!-- Please explain the changes you made here. --> ## Related issues <!-- Which issues are closed by this PR or are related --> related #14045 Co-authored-by: Christopher Zell <zelldon91@googlemail.com> Co-authored-by: Christopher Kujawa (Zell) <zelldon91@googlemail.com>
Merge conflict. |
Description
Related issues
relate #14045
Definition of Done
Not all items need to be done depending on the issue and the pull request.
Code changes:
backport stable/1.3
) to the PR, in case that fails you need to create backports manually.Testing:
Documentation:
Other teams:
If the change impacts another team an issue has been created for this team, explaining what they need to do to support this change.
Please refer to our review guidelines.