Closed
Description
When a pattern to exclude from the context is also matched inside the main matcher's pattern, the entity is always discarded, even when the excluded pattern does not fall in the configured window. This happens, for instance, when we want to extract "ASA 5" but not "5-ASA". In this example, we put a negative window on the excluded pattern so that it only applies before the entity, yet the contextual matcher still discards the entity. This occurs because the pattern to exclude (here 5) is also part of the main pattern, which looks for an integer after ASA. If we instead want to discard "other-ASA", the pipe works, since "other" is not part of the main pattern.
To sum up, the window seems to be counted from the end_char of the entity, whereas in this case we would expect it to be counted from start_char.
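The suspected behaviour can be illustrated with a minimal sketch in plain Python. This is not the EDS-NLP implementation: `left_window`, `is_excluded` and the `anchor` argument are hypothetical names introduced only for this illustration, and the text is assumed to be pre-split into tokens. It only shows how the choice of reference point changes which tokens a negative window covers.

```python
import re


def left_window(tokens, start, end, window, anchor):
    """Return the tokens covered by a negative window (hypothetical helper).

    anchor="start" counts back from the first token of the entity,
    anchor="end" counts back from the position just after its last token.
    """
    ref = start if anchor == "start" else end
    return tokens[max(0, ref + window):ref]


def is_excluded(tokens, start, end, exclude_regex, window, anchor):
    """Tell whether the exclusion regex matches inside the window."""
    context = " ".join(left_window(tokens, start, end, window, anchor))
    return re.search(exclude_regex, context) is not None


tokens = ["ASA", "5"]  # the entity "ASA 5" spans tokens[0:2]

# Counting from the end of the entity: the window contains the entity's own
# "5", so the entity is discarded (the behaviour reported here).
from_end = is_excluded(tokens, 0, 2, r"5", -5, anchor="end")

# Counting from the start of the entity: the window is empty, so the entity
# is kept (the behaviour we would expect).
from_start = is_excluded(tokens, 0, 2, r"5", -5, anchor="start")

print(from_end, from_start)  # True False
```

With the start anchor, a "5" located before the entity (as in "5 - ASA 2", with the entity on the last two tokens) would still be caught by the window, which is the intended use of the exclusion.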
How to reproduce the bug
import edsnlp
import edsnlp.pipes as eds

asa_pattern = r"\basa\b ?:? ?([1-5]|[A-Z]{1,3})"
exclude_asa_ttt = r"5"
asa = dict(
    source="asa",
    regex=asa_pattern,
    regex_attr="NORM",
    exclude=dict(
        regex=exclude_asa_ttt,
        window=-5,
    ),
)

nlp = edsnlp.blank("eds")
nlp.add_pipe(eds.sentences())
nlp.add_pipe(
    eds.contextual_matcher(patterns=[asa], label="asa")
)

text = "ASA 5"
docs = nlp(text)
docs.ents
Your Environment
- EDS-NLP Version Used: 0.15