-
Notifications
You must be signed in to change notification settings - Fork 2.2k
Better multi-word predicates in Open IE predictors #1759
Better multi-word predicates in Open IE predictors #1759
Conversation
…span_metric` parameter to the SRL model to accomodate for Open IE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mostly stylistic things, plus one logic question
""" | ||
Return the word indices of a predicate in BIO tags. | ||
""" | ||
return [ind for (ind, tag) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: you don't have to use parens here
another nit: you could probably fit this on one line (and if you can't, the in
is a weird place to split it)
""" | ||
# Get predicate word indices from both predictions | ||
pred_ind1, pred_ind2 = map(get_predicate_indices, | ||
[tags1, tags2]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is an unusual use of map
and makes it less obvious what you're doing. just do
pred_ind1 = get_predicate_indices(tags1)
pred_ind2 = get_predicate_indices(tags2)
it's the same number of lines and much clearer
""" | ||
# Allow tags1 to add elements to tags2 | ||
return [tag2 if (tag2 != 'O')\ | ||
else tag1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put this else
on the previous line, it will make the code clearer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: the parens aren't really necessary here either
the embedded predicate ("run"). | ||
""" | ||
pred_dict: Dict[str, List[str]] = {} | ||
merged_outputs = list(map(join_mwp, outputs)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I am not a fan of list(map
, just use a list comprehension instead
return " ".join([sent_tokens[pred_id].text | ||
for pred_id in get_predicate_indices(tags)]) | ||
|
||
def check_predicates_subsumed(tags1: List[str], tags2: List[str]) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like the check
in the name, it makes it read awkward when you use it later. I'd call it something like predicates_are_subsumed
|
||
# Else - check if this predicate if subsumed by another predicate | ||
for pred2_text, tags2 in zip(predicate_texts, merged_outputs): | ||
if (tags1 != tags2) and check_predicates_subsumed(tags1, tags2): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see, if it were called predicates_are_subsumed
, this line would read much clearer
also nit: you don't need parens here
for (tag1, tag2) in zip(tags1, tags2)] | ||
|
||
def consolidate_predictions(outputs: List[List[str]], sent_tokens: List[Token]) -> Dict[str, List[str]]: | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible to have (say) predicate_a < predicate_b < predicate_c ?
it seems like your logic wouldn't work in that case
@joelgrus I addressed your style comments, and changed the predicate merging logic to be able to merge more than 2 predicates (also added a test for that). PTAL, thanks! |
# Consolidate | ||
pred_dict = consolidate_predictions(predictions, sent_tokens) | ||
|
||
# Check that only "decided to join" is left |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
pred_ind2 = get_predicate_indices(tags2) | ||
|
||
# Return if pred_ind1 is contained in pred_ind2 | ||
return any(set.intersection(set(pred_ind1), set(pred_ind2))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this returns true if they intersect, is that ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's ok, maybe just the docstring and comment are wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, intersection is what I was going for. The docstrings were wrong.
def merge_overlapping_predictions(tags1: List[str], tags2: List[str]) -> List[str]: | ||
""" | ||
Merge two predictions into one. Assumes the predicate in tags1 are contained in | ||
the predicate of tags2. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is "contained in" correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same, leftover docstring. I fixed that.
Consolidate multiword predicate (e.g., "decided to run"), in which case we don't need to return the embedded predicate ("run").