Merge pull request #276 from DeNeutoy/matcher-hearst
Spacy Matcher for hearst patterns
DeNeutoy committed Oct 15, 2020
2 parents b470961 + 622a827 commit df9e83b
Showing 4 changed files with 768 additions and 0 deletions.
28 changes: 28 additions & 0 deletions README.md
@@ -229,6 +229,34 @@ for umls_ent in entity._.kb_ents:
AR protein, human, Androgen Receptor, Dihydrotestosterone Receptor, AR, DHTR, NR3C4, ...
```

### Hearst Patterns (v3.0)

This component implements the Hearst patterns from [Automatic Acquisition of Hyponyms from Large Text Corpora](https://www.aclweb.org/anthology/C92-2082.pdf) using the spaCy Matcher component.

Passing `extended=True` to the `HyponymDetector` enables the extended set of Hearst patterns, which adds hyponymy relations with higher recall but lower precision (e.g. X compared to Y, X similar to Y).

This component adds a doc-level attribute to the spaCy `Doc`: `doc._.hearst_patterns`, a list of tuples, one per extracted hyponym pair. Each tuple contains:

- The relation rule used to extract the hyponym (type: `str`)
- The more general concept (type: `spacy.tokens.Span`)
- The more specific concept (type: `spacy.tokens.Span`)


#### Usage:

```python
import spacy
from scispacy.hyponym_detector import HyponymDetector

nlp = spacy.load("en_core_sci_sm")
hyponym_pipe = HyponymDetector(nlp, extended=True)
nlp.add_pipe(hyponym_pipe, last=True)

doc = nlp("Keystone plant species such as fig trees are good for the soil.")

print(doc._.hearst_patterns)
# Output: [('such_as', Keystone plant species, fig trees)]
```
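
To show how the tuples in `doc._.hearst_patterns` might be consumed downstream, here is a minimal sketch that groups the specific concepts under each general concept. The `group_hyponyms` helper and the sample data are hypothetical and not part of scispacy; plain strings stand in for the `Span` objects so that the snippet runs without a model installed.

```python
from collections import defaultdict

# Hypothetical stand-in data shaped like `doc._.hearst_patterns`:
# (rule name, general concept, specific concept). In the real pipeline the
# last two items are spaCy Span objects; plain strings are used here.
hearst_patterns = [
    ("such_as", "Keystone plant species", "fig trees"),
    ("such_as", "Keystone plant species", "oak trees"),
    ("such_as", "mammals", "dogs"),
]


def group_hyponyms(patterns):
    """Map each general concept to the list of specific concepts found for it."""
    grouped = defaultdict(list)
    for _rule, general, specific in patterns:
        # A Span exposes its text as `.text`; a plain string is used as-is.
        general_text = getattr(general, "text", general)
        specific_text = getattr(specific, "text", specific)
        grouped[general_text].append(specific_text)
    return dict(grouped)


hyponyms_by_concept = group_hyponyms(hearst_patterns)
for general, specifics in hyponyms_by_concept.items():
    print(f"{general}: {', '.join(specifics)}")
```

With a real pipeline, passing `doc._.hearst_patterns` to the helper in place of the stand-in list should work the same way, since only the `.text` of each span is read.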


## Citing
