Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Issue 9438][pulsar-client] Fix inconsistent equals and hashCode for MessageIds #9440

Merged
merged 1 commit into from
Apr 21, 2021

Conversation

gmethvin
Copy link
Member

@gmethvin gmethvin commented Feb 3, 2021

Fixes #9438

Motivation

Since MessageId is an opaque interface that has several subclasses, in general those subclasses should try to behave the same way with respect to one another if they refer to the same message. They should also follow the standard contract for Java equals and hashCode methods.

Modifications

Provided a single equals and hashCode implementation for MessageIdImpl that takes into account the batch version as well. Changed TopicMessageIdImpl to not take into account the topic anymore.

Verifying this change

  • Extended unit test for BatchMessageIdImpl
  • Added unit test for TopicMessageIdImpl

@sijie sijie added this to the 2.8.0 milestone Feb 3, 2021
@gmethvin gmethvin force-pushed the fix-messageId-equals branch 4 times, most recently from 57b7ddb to 5395426 Compare February 4, 2021 00:13
@gmethvin
Copy link
Member Author

gmethvin commented Feb 4, 2021

I also noticed the compareTo implementation seemed a bit inconsistent, so I tried to simplify it, but I'm not sure what the intention was exactly so I may not have implemented it properly.

@sijie
Copy link
Member

sijie commented Feb 8, 2021

@merlimat Can you review it again?

@gmethvin
Copy link
Member Author

@codelipenghui @315157973 responded to your comments.

I get that the implementation for this is a little weird, so let me know if you have any other suggestions to achieve the same goal. Ideally we would have single message ID type that can represent either batched or unbatched messages, but I'm not sure how feasible that is without breaking changes.

@eolivelli
Copy link
Contributor

Thinking out loud:

What about keeping your new tests cases, they express exactly the problem.
and fix the "equals" and "hashCode" methods of all of the implementations of MessageId in a way that all of the combinations work?
(it would be great to be able to mark MessageId as a 'sealed' interface).

in order to implement hashCode you could add a static utility method that receives all of the relevant fields:

public static int computeMessageIdHashCode(long ledgerId, long entryId, long partitionIndex, long batchIndex)

the same can be done for the equals method

public static booleanMessageIdEquals(long ledgerId1, long entryId1, long partitionIndex1, long batchIndex1, long ledgerId2, long entryId2, long partitionIndex2, long batchIndex2 )

@gmethvin
Copy link
Member Author

@eolivelli That sounds workable. I'd assume we'd want to do the same for compareTo as well?

@eolivelli
Copy link
Contributor

Sure

@315157973
Copy link
Contributor

315157973 commented Mar 20, 2021

in order to implement hashCode you could add a static utility method that receives all of the relevant fields:
public static int computeMessageIdHashCode(long ledgerId, long entryId, long partitionIndex, long batchIndex)

batchIndex is an attribute that only BatchMessageId has, and it doesn't seem appropriate to put it in the top MessageId.

I don’t know much about the historical reasons, why is there such a parameter as NO_BATCH in BatchMessageIdImpl

There is a special treatment for NO_BATCH in equals. When batchIndex=-1 (NO_BATCH), only ledgerId, entryId and partitionIndex need to be equal.

Can the HashCode also be specially processed? When batchIndex=-1, batchIndex is not allowed to participate in the calculation of hashcode

@eolivelli

@eolivelli
Copy link
Contributor

The constraint is that if two objects are 'equals' to each other they have the same hash code.

@315157973
Copy link
Contributor

315157973 commented Mar 20, 2021

The constraint is that if two objects are 'equals' to each other they have the same hash code.

If we do special processing in hashCode as in the euqals method, when batchIndex=-1, batchIndex is not allowed to participate in the calculation of hashcode, so the hashCode is equal

The problem now is that special processing is done in equals, but not in hashcode:

@gmethvin
Copy link
Member Author

@315157973 I'm also unsure about why we have NO_BATCH. Can an unbatched message can't be treated like a batched message with an index of 0?

@315157973
Copy link
Contributor

@315157973 I'm also unsure about why we have NO_BATCH. Can an unbatched message can't be treated like a batched message with an index of 0?

The current hashcode is like this
return (int) (31 * (ledgerId + 31 * entryId) + (31 * (long) partitionIndex) + batchIndex);

If it is consistent with the logic of equal, it should be:

if(batchIndex == NO_BATCH){
return (int) (31 * (ledgerId + 31 * entryId) + (31 * (long) partitionIndex));
}
return (int) (31 * (ledgerId + 31 * entryId) + (31 * (long) partitionIndex) + batchIndex);

cc @eolivelli

@eolivelli
Copy link
Contributor

@315157973 works for me.
Let's add tests that cover all edge cases

@gmethvin
Copy link
Member Author

@315157973 Let me clarify what I'm asking.

As you mentioned, the current BatchMessageIdImpl#equals takes into account all fields including the batchIndex:

. If there is no batchIndex it is assumed to be NO_BATCH (-1).

And BatchMessageIdImpl#hashCode is:

return (int) (31 * (ledgerId + 31 * entryId) + (31 * (long) partitionIndex) + batchIndex);

So for BatchMessageIdImpl, it does appear consistent with equals, as it takes into account the batchIndex.

The inconsistency is that MessageIdImpl#hashCode does not take into account the batchIndex:

return (int) (31 * (ledgerId + 31 * entryId) + partitionIndex);

This can be fixed by using the the same implementation as in BatchMessageIdImpl#hashCode and assuming batchIndex = NO_BATCH, like I have done in this pull request.

I think you are suggesting changing BatchMessageIdImpl#hashCode to:

if(batchIndex == NO_BATCH){
   return (int) (31 * (ledgerId + 31 * entryId) + (31 * (long) partitionIndex));
}
return (int) (31 * (ledgerId + 31 * entryId) + (31 * (long) partitionIndex) + batchIndex);

Which would only be necessary if equals treats batchIndex = NO_BATCH as equivalent to batchIndex = 0, which it does not currently. But should it?

Or are you suggesting only treating batchIndex = NO_BATCH like batchIndex = 0 only for the purposes of hashCode?

@gmethvin
Copy link
Member Author

I talked to @sijie and he said that batchIndex = 0 and batchIndex = NO_BATCH should not be treated as not equal, so it should be fine for them to be both unequal and have unequal hash codes.

@gmethvin
Copy link
Member Author

@315157973 @eolivelli I updated to delegate to the static method.

I think I understand what you're saying now. Your change to BatchMessageIdImpl#hashCode works if you keep the existing code in MessageIdImpl#hashCode. The problem is now you have the logic in two different places, which I'd like to avoid, since that's probably why we ended up with this inconsistency to begin with. Having the logic in one place makes the code a lot easier to understand and reduces the chance of future breakage.

@gmethvin gmethvin force-pushed the fix-messageId-equals branch 2 times, most recently from b43cf81 to f946510 Compare March 25, 2021 00:50
@315157973
Copy link
Contributor

works for me

@gmethvin
Copy link
Member Author

Thanks @315157973. I added some tests for some of the edge cases, but I'd appreciate feedback on any other tests I can add.

Copy link
Contributor

@codelipenghui codelipenghui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -73,7 +73,7 @@ public void testGreaterThan() {
assertTrue(batchMessageId2.compareTo(batchMessageId3) > 0, "Expected to be greater than");
assertTrue(batchMessageId2.compareTo(batchMessageId4) > 0, "Expected to be greater than");
assertTrue(batchMessageId2.compareTo(batchMessageId5) > 0, "Expected to be greater than");
assertTrue(batchMessageId3.compareTo(batchMessageId4) > 0, "Expected to be greater than");
assertTrue(batchMessageId4.compareTo(batchMessageId3) > 0, "Expected to be greater than");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting catch

@eolivelli
Copy link
Contributor

@merlimat can you take a final look ?

Copy link
Contributor

@315157973 315157973 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job!

@codelipenghui
Copy link
Contributor

@merlimat Could you please help review it again

Copy link
Contributor

@golden-yang golden-yang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job! LGTM

Copy link
Contributor

@merlimat merlimat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@merlimat merlimat merged commit f56ae72 into apache:master Apr 21, 2021
@gmethvin gmethvin deleted the fix-messageId-equals branch April 21, 2021 22:58
@eolivelli
Copy link
Contributor

This patch is too dangerous for 2.7.2 IMHO, I have removed the release/2.7.2 label
@codelipenghui

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Inconsistencies in MessageId equals and hashCode
7 participants