This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

add equality check for index field; allennlp interpret #3073

Merged
merged 10 commits into from
Jul 18, 2019

Conversation

Eric-Wallace
Contributor

This is needed in AllenNLP interpret to check for equality between two reading comprehension model outputs.

@matt-gardner
Contributor

Can you add a test case to this that fails before this change and passes afterward?

@Eric-Wallace
Contributor Author

Actually, it looks like there is a test to make sure we don't have this functionality, see here https://github.com/allenai/allennlp/blob/master/allennlp/tests/data/fields/index_field_test.py#L41. Any reason why that is there @joelgrus?

@joelgrus
Contributor

originally an IndexField was only equal to itself. then there was an issue to allow comparing it to its value:

#1926

so we changed that, but I added a test that it wasn't equal to a different index field because that was not desired / expected behavior (at the time, maybe it is now)

@joelgrus
Contributor

that said, I mostly feel like it's the right behavior (if you look at the original issue, I wasn't a huge fan of the proposed change) and that you probably shouldn't be checking fields for equality this way. What's the argument for allowing it?

@Eric-Wallace
Contributor Author

I'm using this in an update to the interpret PR. It's possible to avoid this PR by adding an extra check for whether the object I want to compare is an IndexField. I can do that if that's preferred.

@matt-gardner
Contributor

Imagine you want to compare two Instances to see if they're representing the same thing. They will be different objects, but comparable to each other. There are probably other places where we would want to do this also, but the interpret code definitely needs this, because there are loops where we try changing the inputs of an Instance and check to see if the output has changed.

More broadly, though, why should we have some Field types be comparable across different objects, but not IndexField?
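
The kind of loop described above can be sketched roughly as follows. This is an illustration only, not the interpret code: `Label`, `tokens_removed_until_change`, and `toy_predict` are hypothetical names, and the "model" is a toy function. The point is that each prediction produces fresh objects, so the stopping check only works if those objects compare by state.

```python
from typing import Callable, Dict, List


class Label:
    """Hypothetical output field that compares by state, not identity."""

    def __init__(self, value: str) -> None:
        self.value = value

    def __eq__(self, other) -> bool:
        if isinstance(self, other.__class__):
            return self.__dict__ == other.__dict__
        return NotImplemented


def tokens_removed_until_change(
    tokens: List[str],
    predict: Callable[[List[str]], Dict[str, Label]],
) -> int:
    """Drop trailing tokens until any predicted field differs from the original."""
    reference = predict(tokens)
    current = list(tokens)
    removed = 0
    while len(current) > 1:
        current = current[:-1]
        removed += 1
        outputs = predict(current)  # fresh objects on every call
        # != uses __eq__ under the hood, so this compares state.
        if any(outputs[name] != reference[name] for name in reference):
            break
    return removed


def toy_predict(tokens: List[str]) -> Dict[str, Label]:
    # Toy model (assumption): "positive" only while the word "good" is present.
    return {"label": Label("positive" if "good" in tokens else "negative")}


print(tokens_removed_until_change(["a", "good", "movie", "indeed"], toy_predict))
```

Without a state-based `__eq__` on `Label`, every comparison would report a change on the first iteration, because the reference and the new output are always different objects.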

@matt-gardner
Contributor

In particular, this is the default implementation:

    def __eq__(self, other) -> bool:
        if isinstance(self, other.__class__):
            return self.__dict__ == other.__dict__

Seems like classes that override this should only be more permissive, not less.
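
To make that default concrete, here is a minimal sketch using hypothetical stand-in classes (`Field`, `Tokens`, and `Index` below are illustrations, not AllenNLP's classes). It assumes the base method returns `NotImplemented` for unrelated types, which the quoted snippet does not show. Because dict equality compares values with `==`, the check recurses into nested fields.

```python
class Field:
    """Hypothetical stand-in for a base field class (not AllenNLP's Field)."""

    def __eq__(self, other) -> bool:
        # Compare state, not identity, when self is an instance of other's class.
        if isinstance(self, other.__class__):
            return self.__dict__ == other.__dict__
        # Assumption: defer to Python's default handling for unrelated types.
        return NotImplemented


class Tokens(Field):
    def __init__(self, tokens):
        self.tokens = list(tokens)


class Index(Field):
    def __init__(self, index, sequence):
        self.index = index
        self.sequence = sequence


# Distinct objects with equal nested state compare equal.
same = Index(3, Tokens(["hi"])) == Index(3, Tokens(["hi"]))
# A different underlying sequence makes the outer comparison fail.
different = Index(3, Tokens(["hi"])) == Index(3, Tokens(["bye"]))
print(same, different)
```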

@joelgrus
Contributor

I get the use case, I'm just not sure that two Instances should be equal (in a Python object sense) whenever their fields contain the same data. If I have two IndexFields that both contain 3 but have different underlying SequenceFields should they be equal? It's not entirely clear to me how surprising that would be or if it might cause hard-to-debug errors.

@matt-gardner
Contributor

If they have different underlying sequence fields, self.__dict__ == other.__dict__ will be false. Functions should be contravariant in their input types, and having the base Field.__eq__ be more permissive than subtypes feels like it breaks this.

@matt-gardner
Contributor

>>> from allennlp.data.fields import Field, IndexField
>>> f = IndexField(3, None)                           
>>> g = IndexField(3, None)                           
>>> f == g                                            
False                                                 
>>> Field.__eq__(f, g)                                
True                                                  

@joelgrus
Contributor

but that's not the proposed change?

if isinstance(other, IndexField):
    return self.sequence_index == other.sequence_index

or am I missing something?

@matt-gardner
Contributor

I haven't actually looked at the proposed change, other than to ask for a test. I was responding to your arguments. I definitely agree that that change is wrong, it should just fall back to a __dict__ comparison.

More tests, just for kicks:

>>> from allennlp.data.fields import Field, IndexField, TextField
>>> from allennlp.data import Token           
>>> t = TextField([Token('hi')], None)        
>>> t2 = TextField([Token('bye')], None)      
>>> t3 = TextField([Token('hi')], None)       
>>> t == t3                                   
True                                          
>>> t == t2                                   
False                                         
>>> f = IndexField(3, t)                      
>>> g = IndexField(3, t2)                     
>>> f == g                                    
False                                         
>>> Field.__eq__(f, g)                        
False                                         
>>> g = IndexField(3, t3)                     
>>> f == g                                    
False                                         
>>> Field.__eq__(f, g)                        
True                                          

@joelgrus
Contributor

my arguments were about the proposed change 😢

@matt-gardner
Contributor

@Eric-Wallace, adding something like the examples I have above would be a good way to test this. Check both cases: that equality holds when the underlying sequence fields test equal, and that it fails when they don't.

@matt-gardner
Contributor

Sorry =).

@matt-gardner
Contributor

Are you happy with what I'm suggesting?

@matt-gardner
Contributor

Also, to be super clear, I would propose this:

    def __eq__(self, other) -> bool:
        # Allow equality checks to ints that are the sequence index
        if isinstance(other, int):
            return self.sequence_index == other
        return super().__eq__(other)
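
A self-contained sketch of how that proposal behaves, assuming a base class with the `__dict__`-based default quoted earlier in the thread. `Field` and `IndexLike` here are hypothetical stand-ins rather than the real AllenNLP classes, and plain lists stand in for the underlying sequence field.

```python
class Field:
    """Hypothetical stand-in for the base class; compares instance state."""

    def __eq__(self, other) -> bool:
        if isinstance(self, other.__class__):
            return self.__dict__ == other.__dict__
        return NotImplemented


class IndexLike(Field):
    """Hypothetical stand-in for IndexField."""

    def __init__(self, sequence_index: int, sequence_field) -> None:
        self.sequence_index = sequence_index
        self.sequence_field = sequence_field

    def __eq__(self, other) -> bool:
        # Allow equality checks to ints that are the sequence index
        if isinstance(other, int):
            return self.sequence_index == other
        # Everything else falls back to the state comparison in the base class.
        return super().__eq__(other)


f = IndexLike(3, ["hi"])
g = IndexLike(3, ["hi"])   # distinct object, same state
h = IndexLike(3, ["bye"])  # same index, different underlying sequence

print(f == 3, f == g, f == h)
```

With this shape the subclass only adds the int shortcut, so it accepts everything the base class accepts and is never stricter than it.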

@joelgrus
Contributor

yeah, that seems fine

@Eric-Wallace
Contributor Author

If the SequenceFields are two different objects but have the same value, then this won't recursively do the equals, right?

In my use case I have:

{'sequence_index': 12, 'sequence_field': <allennlp.data.fields.text_field.TextField object at 0x2af2ef5c2978>}
{'sequence_index': 12, 'sequence_field': <allennlp.data.fields.text_field.TextField object at 0x2af2ef5dc898>}

Note the objects are different. Matt's proposed change will say these two index fields are not equal.

@matt-gardner
Contributor

Look at the examples I gave. It'll work.

@Eric-Wallace
Contributor Author

But I just ran this and it didn't work. I think TextField needs an equals method?

@matt-gardner
Contributor

Is that because you're changing the underlying text field? It clearly works in my example. What's different about what you're doing?

@matt-gardner
Contributor

matt-gardner commented Jul 17, 2019

It's probably best to just make a test case for this that shows your use case. So:

  1. Update the __eq__ method as I suggested.
  2. Write some test cases, including the simple ones I gave, and whatever case is failing for you.

Once you have failing test cases with this implementation, I'll be in a much better situation to help figure out any remaining issues.

@Eric-Wallace
Contributor Author

Yeah I have two text fields that have the same everything but are different objects.

@matt-gardner
Contributor

See t and t3 above where this works just fine. Again, just make the test cases and show me the failure.

@Eric-Wallace
Contributor Author

Mine was failing because the _token_indexers on my TextFields are different objects.

@Eric-Wallace
Contributor Author

because we are deepcopying the instances each time I think
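
The effect of deep copying on equality can be reproduced in isolation. In this sketch `ByIdentity` and `ByState` are hypothetical classes, not AllenNLP types: one relies on Python's default identity-based equality, the other defines the `__dict__`-based `__eq__` discussed in this thread.

```python
import copy


class ByIdentity:
    """No __eq__ defined, so == falls back to object identity."""

    def __init__(self, namespace: str) -> None:
        self.namespace = namespace


class ByState:
    """Defines __eq__ over __dict__, so copies with equal state compare equal."""

    def __init__(self, namespace: str) -> None:
        self.namespace = namespace

    def __eq__(self, other) -> bool:
        if isinstance(self, other.__class__):
            return self.__dict__ == other.__dict__
        return NotImplemented


a = ByIdentity("tokens")
b = ByState("tokens")

# deepcopy always produces a new object, so identity-based equality fails.
identity_equal = a == copy.deepcopy(a)
# State-based equality survives the copy.
state_equal = b == copy.deepcopy(b)
print(identity_equal, state_equal)
```

Any object nested inside a field that lacks a state-based `__eq__` will make the enclosing `__dict__` comparison fail after a deep copy, which matches the symptom described above.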

@matt-gardner
Contributor

I am comfortable adding a similar __eq__ implementation to TokenIndexer.

@Eric-Wallace
Contributor Author

Eric-Wallace commented Jul 18, 2019

Seems there already was one, and the problem was in CharacterTokenizer. I just added an equals like the above that checks the __dict__ for equality.

@joelgrus
Contributor

so this seems like we're creating an enormously complicated solution to a not complicated problem?

you want to change the minimal number of inputs so that the outputs change.

why not just compare the outputs directly rather than shoving them into existing fields [?] and comparing the fields?

@joelgrus
Contributor

like, it's a bad code smell for me that suddenly we have to care very deeply about when fields are equal and when token indexers [!] are equal when all we really want to do is compare some model outputs. Am I missing something?

@@ -156,7 +156,8 @@ def attack_from_json(self,
     current_instance_labeled = self.predictor.predictions_to_labeled_instances(current_instance,
                                                                                outputs)[0]
     # if the prediction has changed, then stop
-    if any(current_instance_labeled[field] != fields_to_compare[field] for field in fields_to_compare):
+    if any(not current_instance_labeled[field].__eq__(fields_to_compare[field])
Contributor

Don't call .__eq__, just use ==. I only did that in my example to force using the base class implementation.
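
A short sketch of the difference, using hypothetical classes `Base` and `Strict`: the `==` operator dispatches to the most specific `__eq__` and handles `NotImplemented` for you, while calling a dunder method directly runs exactly the implementation you name and returns its raw result.

```python
class Base:
    def __init__(self, value: int) -> None:
        self.value = value

    def __eq__(self, other):
        if isinstance(self, other.__class__):
            return self.__dict__ == other.__dict__
        return NotImplemented


class Strict(Base):
    def __eq__(self, other):
        # Deliberately stricter than the base class: identity only.
        return self is other


a, b = Strict(3), Strict(3)

operator_result = a == b         # dispatches to Strict.__eq__
forced_base = Base.__eq__(a, b)  # explicitly bypasses the override

# Calling __eq__ directly can also leak NotImplemented, which == resolves.
raw_result = (1).__eq__(1.0)
resolved = 1 == 1.0

print(operator_result, forced_base, raw_result, resolved)
```

Once the subclass's `__eq__` is fixed, `==` and `!=` pick it up automatically, so call sites need no changes.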

Contributor

Oh, you shouldn't need to change this line at all. Why do you need to change this? Changing the other parts should have been sufficient.

Contributor Author

Oh, I see. I didn't know how equality works in Python.

@matt-gardner
Contributor

In general, yes I think it is reasonable to expect our data objects to test for equality of state, not equality of id. This is something we already support. We have run into this issue before, which is why both the Field and TokenIndexer already have the __eq__ methods implemented. If you look at the diff, there are only two places that are changed, and both of them are bringing things in line with what was already there.

It's possible that there was a better way to check for equality of model outputs (I'm not certain about that, though, because the code needs to be entirely generic about what model it gets, and it forced a particular design). But, again, what they did worked for everything except IndexField out of the box. This is just making things consistent.

@Eric-Wallace
Contributor Author

Ok, this should do it.

Contributor

@matt-gardner matt-gardner left a comment

LGTM. @joelgrus, I'm going to merge this soon, unless you're still strongly opposed to this.

@matt-gardner matt-gardner merged commit a1476c0 into allenai:master Jul 18, 2019
reiyw pushed a commit to reiyw/allennlp that referenced this pull request Nov 12, 2019
* add equality check for index field; allennlp interpret

* add test

* change hotflip to use equals method

* tests per matt

* newline

* change input reduction to eq also

* undo

* add newline

* fix pylutn