Skip to content

Equals#150

Merged
kroenlein merged 6 commits intomainfrom
equals
Oct 6, 2021
Merged

Equals#150
kroenlein merged 6 commits intomainfrom
equals

Conversation

@kroenlein
Copy link
Copy Markdown
Collaborator

As per PLA-8254, this implements GEMD object comparison, crawling over an entire material history and respecting the relationship between LinkByUIDs and the Base Entities they reference.

@kroenlein kroenlein requested a review from maxhutch September 27, 2021 02:58
Copy link
Copy Markdown
Contributor

@maxhutch maxhutch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this looks like a reasonable approach. It'd be nice to get a bit more test coverage of the material history crawling logic; the unit tests added all look very simple. A few questions inline.

Comment thread gemd/entity/object/process_spec.py Outdated
return self._output_material

# To avoid infinite recursion, fast return on revisit
global eq_seen
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this just a global because you can't change the signature of __eq__? Why put the global in the class instead of outside of it?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes; you can't change the sig of eq and flow of control moves between several modules / is re-entrant. I wish there was a way to get a tighter scope, but a closure wouldn't get ahead of it. It's possible there's a better thread/exception safe approach, but I was also trying to maintain velocity.

Not opposed to moving the declaration to the top of file, but wrote it this way to be close to where it is used in the file.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better approach implemented, using the object itself as the state vector.

Comment thread gemd/entity/object/process_spec.py Outdated
else:
result = False

eq_seen.remove((self, other))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be missed if an exception is raised between the equ_seen.add and this. I suggest using a try/catch/finally to make sure its removed.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd initially skipped the try-catch because of performance concerns, but it seems like this would be prudent.

# Note the hash function checks if objects are identical, as opposed to the equals method,
# which checks if fields are equal. This is because BaseEntities are fundamentally
# mutable objects.
def __hash__(self):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of this definition? Doesn't it automatically inherit from its parent?

Copy link
Copy Markdown
Collaborator Author

@kroenlein kroenlein Sep 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It actually does not; you need to explicitly define a __hash__ function every time you define an __eq__. I went into this expecting ^^ would have been correct.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That might be worth mentioning in the inline comments


def __eq__(self, other):
if (self, other) in eq_seen: # Cycle encountered
return True # This will functionally be & with the correct result of ==
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this get lit up in the tests?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. When the Process Spec iterates over its Ingredient Specs, each ingredient.process bounces through this check.

@kroenlein kroenlein requested a review from maxhutch September 30, 2021 14:58
maxhutch
maxhutch previously approved these changes Oct 4, 2021
Copy link
Copy Markdown
Contributor

@maxhutch maxhutch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. It might benefit from a bit more developer docs in the comments.

result = all(ing in other.ingredients for ing in self.ingredients)
elif (not self.ingredients and self.uids) \
or (not other.ingredients and other.uids):
result = True # One can be empty if you flattened
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So they are equal if they have matching ingredients OR one of them doesn't have any ingredients defined?

Copy link
Copy Markdown
Collaborator Author

@kroenlein kroenlein Oct 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly correct. If they have matching ingredients OR (one of them has no ingredients AND there was a UID dictionary to validate).

Intended to compensate for the fact that ingredients is set implicitly by a relationship that wouldn't be initialized if you just compared the result of dataset.process_runs.get() in citrine-python.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make and self.uids and and other.uids a little more explicit? The way that python automatically casts structs to booleans is really easy to mess up / easy to be confused by when reading. Logically, this makes sense, though.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those aren't compared here - they are tested in DictSerializable.__eq__. I'm intending to make sure that any uid exists. I would be open to changing that line to (len() == 0 / len() != 0) if that reads better.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's what I meant.

@kroenlein kroenlein requested a review from maxhutch October 6, 2021 17:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants