Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update TabletFile methods #2854

Closed
wants to merge 1 commit into from

Conversation

ctubbsii
Copy link
Member

@ctubbsii ctubbsii commented Aug 6, 2022

  • Remove unused validate method in StoredTabletFile
  • Add compareTo/equals/hashCode to StoredTabletFile
  • Add javadoc to TabletFile compareTo/equals/hashCode to explain need
    for commutativity
  • Add secondary comparison to compareTo for StoredTabletFile cases, to
    ensure consistent ordering
  • Simplify equals and short-circuit if same object in TabletFile

* Remove unused validate method in StoredTabletFile
* Add compareTo/equals/hashCode to StoredTabletFile
* Add javadoc to TabletFile compareTo/equals/hashCode to explain need
  for commutativity
* Add secondary comparison to compareTo for StoredTabletFile cases, to
  ensure consistent ordering
* Simplify equals and short-circuit if same object in TabletFile
* The ordering of this class depends on the normal String ordering of the normalized paths first,
* then any original path, if one exists as a StoredTabletFile, second. To ensure subclasses don't
* break things, and to preserve commutativity, subclasses should defer to this implementation.
*/
Copy link
Contributor

@EdColeman EdColeman Aug 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The addition of using StoredTabletFile original path in the comparison breaks the implied contract that

(x.compareTo(y) == 0 ) ==> (x.equals(y))

Collections that use hashCode and equals (HashSet / Map) will behave differently than collections that use compareTo (TreeSet / Map) This is similar to the way BigDecimal handles compareTo and equals But is is unusual.

This seems that it could introduce issues. If somehow there were two metadata entries that had different "original" paths but resolved to the same normalized path and if they were put into a HashSet and a TreeSet the collections will have different contents.

For example:

If you have two StoredTabletFiles:

    StoredTabletFile file1 = new StoredTabletFile(
            "hdfs://nn.somewhere.com:86753/accumulo/tables/" + id + "/" + dir + "/" + filename);

    StoredTabletFile file2 = new StoredTabletFile(
            "hdfs://nn.somewhere.com:86753/./accumulo/tables/" + id + "/" + dir + "/" + filename);

And stored them in a TreeSet and a HashSet you end up with different collections (both show the "original name(s)" in the collection:

TreeSet - 2 entries
   hdfs://nn.somewhere.com:86753/./accumulo/tables/2a/t-0003/C0004.rf
   hdfs://nn.somewhere.com:86753/accumulo/tables/2a/t-0003/C0004.rf

HashSet - 1 entry
   hdfs://nn.somewhere.com:86753/accumulo/tables/2a/t-0003/C0004.rf

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Collections that use hashCode and equals (HashSet / Map) will behave differently than collections that use compareTo (TreeSet / Map) This is similar to the way BigDecimal handles compareTo and equals But is is unusual.

Yeah, that's definitely a problem.

@milleruntime
Copy link
Contributor

The original intent of StoredTabletFile was to have a more restrictive type of a TabletFile. A StoredTabletFile should have an exact string of what is in the metadata to prevent junk from never being deleted, while a TabletFile may not. To me, that implies that they should not be equal if refer to different metadata entries. Is it possible to have a TabletFile equal to a StoredTabletFile with these changes?

@EdColeman
Copy link
Contributor

Currently (without these changes) StoredTabletFile does not have equals, hashCode or CompareTo - they just by default use the parent file (TabletFile). Currently, that means that any metadata entry in StoredTabletFile is ignored when equals, hashCode or compareTo is called. Any StoredTableFIle will compare with other StoredTableFiles, or a TabletFile without considering the metadata string as long as the normalized path compares / is equal.

These changes just make that explicit for equals() and hashCode() - but the change also modifies compareTo() so that the metadata string in StoredTabletFile is part of the comparison. With what @milleruntime suggested is intent of the StoredTabletFile class - this change aligns the compareTo() with his statement that objects with different metadata string should not compare as equal (comapreTo(..) will not equal 0 ). However, two objects with different metadata entries, but the same normalized path will be equal and have the same hash code.

@EdColeman
Copy link
Contributor

Had a discussion with @milleruntime about this - while these change make the existing code work - it may be better to make other changes in the code so that classes that inherit from TabletFile do not need to rely on this behavior in these changes. While we have looked at StoredTabletFile ScanServerRefTabletFile also extends TabletFile and may have similar issues.

@ctubbsii
Copy link
Member Author

Okay, it looks like there are several issues, and this PR isn't going to cut it:

  1. compareTo and equals and hashCode currently rely on the implementations in the base class. This makes it unreliable for using the StoredTabletFile class in data structures in the context of requiring separate entries for each metadata entry when it is found stored non-normalized.
  2. Overriding any of these methods can break existing code that relies on the current behavior. But, the current behavior might be broken, too (because of bullet point number 1 above).
  3. There are a few cases where we place TabletFile in the same data structure as the StoredTabletFile, requiring them to have cohesive implementations for these methods.
  4. ScanServerRefTabletFile complicates things further, because it overrides equals and hashCode, forcing incompatibility with the base class implementation, if these happen to get stored in the same data structure.

I had considered removing the compareTo method, and defining an explicit comparator. That would address the places where we are trying to compare using the natural ordering of the base class and the subclass. But, it doesn't address these other issues.

Creating an interface at the base might help make things less confusing... I'm not sure. Either way, it looks like we're going to have to dig through every use of these classes in any kind of data structure, to evaluate for correct reliance on the equals, hashCode, and compareTo implementations.

Closing this PR, because it's not the right fix.

@ctubbsii ctubbsii closed this Aug 11, 2022
@ctubbsii
Copy link
Member Author

Is it possible to have a TabletFile equal to a StoredTabletFile with these changes?

It was possible before these changes. These changes didn't change that. In fact, we got breakages when we tried to make that impossible. So, there is currently code relying on these being equal.

@ctubbsii ctubbsii deleted the tabletFileUtilMethods branch August 11, 2022 19:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants