Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AVRO-2905: Fix Utf8 hash cache #955

Merged
merged 2 commits into from Sep 30, 2020
Merged

Conversation

lukemin89
Copy link

Make sure you have checked all steps below.

Jira

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain Javadoc that explain what it does

@probot-autolabeler probot-autolabeler bot added the Java Pull Requests for Java binding label Sep 8, 2020
Comment on lines 129 to 133
byte[] bytes = getBytesFor(string);
int length = bytes.length;
if (length > MAX_LENGTH) {
throw new AvroRuntimeException("String length " + length + " exceeds maximum allowed");
}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im not sure, but this seemed like a bug to me as well.
MAX_LENGTH should be kept at all time right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is, the other 2 constructors need to be changed as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, that seems reasonable to me! Do you want to make the change in the PR directly or create a JIRA?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just added to this PR 😄

@lukemin89
Copy link
Author

@RyanSkraba 🙇🙏

@RyanSkraba RyanSkraba self-requested a review September 8, 2020 09:50
Copy link
Contributor

@RyanSkraba RyanSkraba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution! We can merge this -- I'll give it a day if you want to fix the marker value for the not-yet-calculated hash or the string (byte) length constructors!

@@ -119,16 +120,21 @@ public Utf8 setByteLength(int newLength) {
}
this.length = newLength;
this.string = null;
this.hasHash = false;
this.hash = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello! For consistency with how Schema caches the hash, what do you think about using Integer.MIN_VALUE instead of 0? Not a big deal, except that all "zeroed" byte arrays will otherwise have hashCode 0 and be recalculated every time.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally prefer 0 because all fields get initialized as 0 in Java...
Would it be okay to leave unchanged?

Comment on lines 129 to 133
byte[] bytes = getBytesFor(string);
int length = bytes.length;
if (length > MAX_LENGTH) {
throw new AvroRuntimeException("String length " + length + " exceeds maximum allowed");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, that seems reasonable to me! Do you want to make the change in the PR directly or create a JIRA?

@RyanSkraba RyanSkraba merged commit 9a6aa43 into apache:master Sep 30, 2020
RyanSkraba pushed a commit that referenced this pull request Sep 30, 2020
* AVRO-2905: Fix Utf8 hash cache

* AVRO-2905: Reflect Comment, add length check
@RyanSkraba
Copy link
Contributor

Thanks for the contribution! I cherry-picked to branch-1.10 so this will appear in the next 1.10.x release as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Java Pull Requests for Java binding
Projects
None yet
2 participants