-
Notifications
You must be signed in to change notification settings - Fork 445
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update TabletFile methods #2854
Conversation
* Remove unused validate method in StoredTabletFile * Add compareTo/equals/hashCode to StoredTabletFile * Add javadoc to TabletFile compareTo/equals/hashCode to explain need for commutativity * Add secondary comparison to compareTo for StoredTabletFile cases, to ensure consistent ordering * Simplify equals and short-circuit if same object in TabletFile
* The ordering of this class depends on the normal String ordering of the normalized paths first, | ||
* then any original path, if one exists as a StoredTabletFile, second. To ensure subclasses don't | ||
* break things, and to preserve commutativity, subclasses should defer to this implementation. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The addition of using StoredTabletFile original path in the comparison breaks the implied contract that
(x.compareTo(y) == 0 ) ==> (x.equals(y))
Collections that use hashCode and equals (HashSet / Map) will behave differently than collections that use compareTo (TreeSet / Map) This is similar to the way BigDecimal
handles compareTo
and equals
But is is unusual.
This seems that it could introduce issues. If somehow there were two metadata entries that had different "original" paths but resolved to the same normalized path and if they were put into a HashSet and a TreeSet the collections will have different contents.
For example:
If you have two StoredTabletFiles:
StoredTabletFile file1 = new StoredTabletFile(
"hdfs://nn.somewhere.com:86753/accumulo/tables/" + id + "/" + dir + "/" + filename);
StoredTabletFile file2 = new StoredTabletFile(
"hdfs://nn.somewhere.com:86753/./accumulo/tables/" + id + "/" + dir + "/" + filename);
And stored them in a TreeSet and a HashSet you end up with different collections (both show the "original name(s)" in the collection:
TreeSet - 2 entries
hdfs://nn.somewhere.com:86753/./accumulo/tables/2a/t-0003/C0004.rf
hdfs://nn.somewhere.com:86753/accumulo/tables/2a/t-0003/C0004.rf
HashSet - 1 entry
hdfs://nn.somewhere.com:86753/accumulo/tables/2a/t-0003/C0004.rf
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Collections that use hashCode and equals (HashSet / Map) will behave differently than collections that use compareTo (TreeSet / Map) This is similar to the way
BigDecimal
handlescompareTo
andequals
But is is unusual.
Yeah, that's definitely a problem.
The original intent of |
Currently (without these changes) StoredTabletFile does not have equals, hashCode or CompareTo - they just by default use the parent file (TabletFile). Currently, that means that any metadata entry in StoredTabletFile is ignored when equals, hashCode or compareTo is called. Any StoredTableFIle will compare with other StoredTableFiles, or a TabletFile without considering the metadata string as long as the normalized path compares / is equal. These changes just make that explicit for equals() and hashCode() - but the change also modifies compareTo() so that the metadata string in StoredTabletFile is part of the comparison. With what @milleruntime suggested is intent of the StoredTabletFile class - this change aligns the compareTo() with his statement that objects with different metadata string should not compare as equal (comapreTo(..) will not equal 0 ). However, two objects with different metadata entries, but the same normalized path will be equal and have the same hash code. |
Had a discussion with @milleruntime about this - while these change make the existing code work - it may be better to make other changes in the code so that classes that inherit from |
Okay, it looks like there are several issues, and this PR isn't going to cut it:
I had considered removing the compareTo method, and defining an explicit comparator. That would address the places where we are trying to compare using the natural ordering of the base class and the subclass. But, it doesn't address these other issues. Creating an interface at the base might help make things less confusing... I'm not sure. Either way, it looks like we're going to have to dig through every use of these classes in any kind of data structure, to evaluate for correct reliance on the equals, hashCode, and compareTo implementations. Closing this PR, because it's not the right fix. |
It was possible before these changes. These changes didn't change that. In fact, we got breakages when we tried to make that impossible. So, there is currently code relying on these being equal. |
for commutativity
ensure consistent ordering