This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

Pass instance metadata through to forward, use it to compute official BiDAF metrics #216

Merged
merged 3 commits into from Aug 30, 2017

Conversation

matt-gardner
Contributor

There are two main things here:

  1. I made the metadata that gets associated with each instance available in Model.forward(), if it's requested. I used some reflection in order to make this work without requiring Model.forward() to accept a metadata argument. This is a bit magical, but I didn't really want to require every model to have to accept this argument.

  2. I used this metadata inside of BiDAF to compute the official metrics.
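The reflection trick in point 1 can be sketched roughly as follows. This is a minimal illustration of the idea, not the code in this PR; the helper name `forward_with_optional_metadata` and the argument names are hypothetical.

```python
import inspect

def forward_with_optional_metadata(model, tensor_inputs, metadata):
    # Inspect the model's forward() signature and pass `metadata` only if
    # the model declares a parameter for it. Models that don't care about
    # metadata never have to mention it.
    parameters = inspect.signature(model.forward).parameters
    if "metadata" in parameters:
        return model.forward(**tensor_inputs, metadata=metadata)
    return model.forward(**tensor_inputs)
```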

@DeNeutoy
Contributor

I'm not thrilled with this, but I can see it's necessary for computing the BiDAF metrics. There's quite a bit of special casing for the "metadata" key, which makes me think the forward signature is fundamentally wrong if we need this sort of info to do forward passes of models (or that doing this sort of thing is fine in a separate training script but doesn't belong in the library), and the reflection isn't great, as you said. But ultimately my only complaint here is more to do with building stuff from params, so LGTM.

@matt-gardner
Contributor Author

We could have a MetadataField, and just remove all of the special casing and reflection, requiring the model to deal with the metadata if it uses a reader that produces it... Well, we'd probably still need a little bit of special casing when producing arrays, but not nearly as much as there is now. That might be a bit cleaner. What do you think?

@DeNeutoy
Contributor

One more idea to avoid this: pass in a character representation of the passage without tokenising it first. That would make the predicted indices line up and remove the need to pass metadata. It's still a bit clunky, though, and isn't particularly memory efficient. Is there a reason I haven't thought of why this won't work?

Otherwise, I'm not sure about the MetadataField idea, because ideally you'd want it to be a mixin, as it isn't really something which is constrained to any particular field, and that seems just as obscure as having to special case stuff in what we already have.

@matt-gardner
Contributor Author

I'm not sure what you mean by "a character representation of the passage without tokenizing it first", or how having that would solve the problem.

For the MetadataField, it would be a field that does not get indexed and does not get converted into numpy arrays or pytorch variables, but otherwise holds whatever you want, like a question ID, original passage string, or whatever. Not sure how a mixin would solve anything here.
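The idea described here could be sketched as something like the following. This is a hypothetical illustration of the proposal, not an existing class in the library, and the method names are assumptions based on how fields are discussed above.

```python
class MetadataField:
    """A field that is never indexed and never converted to arrays or
    pytorch variables; it just carries arbitrary per-instance data
    (a question ID, the original passage string, ...) through the
    data pipeline untouched."""

    def __init__(self, metadata):
        self.metadata = metadata

    def count_vocab_items(self, counter):
        pass  # nothing to index, so vocabulary counting is a no-op

    def as_array(self, padding_lengths):
        # Returned verbatim instead of being converted to a numpy array;
        # this is the small special case in batching that the comment
        # above anticipates.
        return self.metadata
```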

@DeNeutoy
Contributor

DeNeutoy commented Aug 30, 2017

The problem is that the predicted char span is for a tokenised version of the passage but the gold char span is for a non-tokenised version of the passage, right? So what if we passed in a non-tokenised character representation of the exact answer as another "label". Maybe I'm not understanding the problem at all, but it seems like it's possible to do exact string matching using indexed tensors where the indexed tensor is a padded version of the exact string answer.

@matt-gardner
Contributor Author

The problem is that I need a predicted character span, and all I have is a predicted token span. I need a way to go from the token span to the character span. I don't know of a way to do that other than keeping track of character offsets.
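The offset bookkeeping described here amounts to something like the sketch below, assuming the tokenizer records a `(start_char, end_char)` pair for each token; the function name and argument layout are illustrative, not the PR's actual code.

```python
def token_span_to_char_span(token_offsets, token_span):
    # token_offsets: one (start_char, end_char) pair per token, recorded
    # at tokenization time. token_span: (start_token, end_token), inclusive.
    # The predicted character span starts where the first token starts and
    # ends where the last token ends.
    start_token, end_token = token_span
    return (token_offsets[start_token][0], token_offsets[end_token][1])
```

With these offsets in hand, the predicted character span can be sliced directly out of the original, untokenized passage string for the official metric computation.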

@matt-gardner matt-gardner merged commit fb73633 into allenai:master Aug 30, 2017
@matt-gardner matt-gardner deleted the tokenizer_offsets branch August 30, 2017 16:30
schmmd pushed a commit that referenced this pull request Feb 26, 2020