AVRO-2905: Fix Utf8 hash cache #955
byte[] bytes = getBytesFor(string);
int length = bytes.length;
if (length > MAX_LENGTH) {
  throw new AvroRuntimeException("String length " + length + " exceeds maximum allowed");
}
I'm not sure, but this seemed like a bug to me as well. MAX_LENGTH should be enforced at all times, right?
If it is, the other 2 constructors need to be changed as well.
I agree, that seems reasonable to me! Do you want to make the change in the PR directly or create a JIRA?
I just added it to this PR 😄
@RyanSkraba 🙇🙏
Thanks for the contribution! We can merge this -- I'll give it a day if you want to fix the marker value for the not-yet-calculated hash or the string (byte) length constructors!
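The point raised in this thread, that a length limit only helps if every constructor enforces it, can be illustrated with a minimal sketch. Everything here is hypothetical: the class BoundedText, its small MAX_LENGTH value, and the use of IllegalArgumentException (instead of AvroRuntimeException, to keep the example self-contained) are assumptions for illustration, not the actual Utf8 code.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a byte-backed string type; not the real Utf8 class.
final class BoundedText {
    // Assumed limit, kept tiny so the check is easy to exercise.
    static final int MAX_LENGTH = 16;

    private final byte[] bytes;

    // String constructor: encodes, then delegates so the check runs on the byte length.
    BoundedText(String string) {
        this(string.getBytes(StandardCharsets.UTF_8));
    }

    // Byte-array constructor: the single place where the limit is enforced.
    BoundedText(byte[] bytes) {
        checkLength(bytes.length);
        this.bytes = bytes.clone();
    }

    // Copy constructor: also delegates, so it cannot bypass the check.
    BoundedText(BoundedText other) {
        this(other.bytes);
    }

    private static void checkLength(int length) {
        if (length > MAX_LENGTH) {
            throw new IllegalArgumentException(
                "String length " + length + " exceeds maximum allowed");
        }
    }

    int byteLength() {
        return bytes.length;
    }
}
```

Routing every constructor through one checked path is only one way to keep the limit consistent; repeating the check in each constructor would work as well.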
@@ -119,16 +120,21 @@ public Utf8 setByteLength(int newLength) {
    }
    this.length = newLength;
    this.string = null;
    this.hasHash = false;
    this.hash = 0;
Hello! For consistency with how Schema caches the hash, what do you think about using Integer.MIN_VALUE instead of 0? Not a big deal, except that all "zeroed" byte arrays will otherwise have hashCode 0 and be recalculated every time.
I personally prefer 0 because all fields get initialized as 0 in Java...
Would it be okay to leave unchanged?
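The trade-off discussed in this thread can be sketched as follows. This is an illustrative example only: the class SentinelHashSketch, its computations counter, and the simple 31-based byte hash are assumptions, not the code in this PR. It shows that when 0 marks "hash not yet calculated", an all-zero byte array (whose hash is legitimately 0) is rehashed on every call, while a marker such as Integer.MIN_VALUE lets that value be cached.

```java
// Hypothetical sketch of a lazily cached hash with a configurable "not yet calculated" marker.
final class SentinelHashSketch {
    private final byte[] bytes;
    private final int sentinel;  // marker value meaning "hash not yet calculated"
    private int hash;            // cached hash, or the sentinel
    int computations;            // counts full recalculations, for illustration only

    SentinelHashSketch(byte[] bytes, int sentinel) {
        this.bytes = bytes.clone();
        this.sentinel = sentinel;
        this.hash = sentinel;
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == sentinel) {
            // Cache miss, or a real hash that happens to equal the sentinel.
            computations++;
            h = 0;
            for (byte b : bytes) {
                h = h * 31 + b;
            }
            hash = h;
        }
        return h;
    }
}
```

With four zero bytes, two calls to hashCode() trigger two computations when the sentinel is 0 and one when it is Integer.MIN_VALUE. Either marker still returns the correct hash; the only cost of a collision with the sentinel is the repeated calculation.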
Thanks for the contribution! I cherry-picked to branch-1.10 so this will appear in the next 1.10.x release as well.