In [2]:
import spacy
from spacy.matcher import Matcher, PhraseMatcher
from spacy.tokens import Span
from spacy import displacy

In [3]:
nlp = spacy.load('en_core_web_sm')

In [4]:
doc = nlp("SpaceNews is a print and digital publication that covers business \
and political news in the space and satellite industry. SpaceNews provides news, \
commentary and analysis to an audience of government officials, politicians and \
executives within the space industry.")

In [5]:
for i, sentence in enumerate(doc.sents):
    print(f"{i+1}: {sentence}")

1: SpaceNews is a print and digital publication that covers business and political news in the space and satellite industry.
2: SpaceNews provides news, commentary and analysis to an audience of government officials, politicians and executives within the space industry.


### Building a Matcher Object

In [6]:
matcher = Matcher(nlp.vocab)

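Create a list of patterns to identify the target. Each pattern is a list of dictionaries, one dictionary per token; the pattern here matches a single token whose lowercase form is "spacenews".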

In [7]:
patterns = [
    [{"LOWER":'spacenews'}]
]

In [8]:
matcher.add(key='SpaceNews', patterns=patterns)

In [9]:
found_matches = matcher(doc)

The found_matches variable holds one tuple for every match found in the doc object. Each tuple contains a match ID (the hash of the pattern key), the index of the first matched token, and the index one past the last matched token, so doc[start:end] is the matched span.

In [10]:
found_matches

[(10501091333728194545, 0, 1), (10501091333728194545, 20, 21)]

In [11]:
for match_id, start, end in found_matches:
    m_id = nlp.vocab.strings[match_id]
    print(m_id)

SpaceNews
SpaceNews
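The loop above prints only the pattern key. As a minimal sketch of how the start and end indices can be used as well, the following slices the doc to recover each matched span. It assumes a blank English pipeline (`spacy.blank("en")`) and a shortened example sentence, neither of which is part of the notebook; the names `nlp_blank`, `demo_doc`, and `demo_matcher` are illustrative.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: a blank pipeline is enough here because LOWER is a lexical
# attribute and does not need any trained components.
nlp_blank = spacy.blank("en")
demo_doc = nlp_blank("SpaceNews covers the space industry. SpaceNews provides news.")

demo_matcher = Matcher(nlp_blank.vocab)
demo_matcher.add("SpaceNews", [[{"LOWER": "spacenews"}]])

matched = []
for match_id, start, end in demo_matcher(demo_doc):
    key = nlp_blank.vocab.strings[match_id]  # hash -> string key
    span = demo_doc[start:end]               # end index is exclusive
    matched.append((key, start, end, span.text))
    print(key, start, end, span.text)
```

Because the end index is exclusive, a single-token match always satisfies end == start + 1.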


### Adding a Named Entity to a Span

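First, check whether the model already recognizes SpaceNews as an entity. In this run it is labeled ORG in both sentences. When a name is not recognized, we can add it manually by wrapping a match in a Span and assigning it to doc.ents.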

In [12]:
for ent in doc.ents:
    print(f"{ent.text:50} {ent.label_:15} {spacy.explain(ent.label_)}")

SpaceNews                                          ORG             Companies, agencies, institutions, etc.
SpaceNews                                          ORG             Companies, agencies, institutions, etc.
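The cell above lists entities the model found on its own. For a name the model misses, one possible approach is to turn each Matcher hit into a Span with a label and merge it into doc.ents. The sketch below assumes a blank English pipeline, so the doc starts with no entities, and a shortened example sentence; `nlp_blank`, `demo_doc`, and `demo_matcher` are illustrative names, and the ORG label is chosen by hand.

```python
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Span
from spacy.util import filter_spans

# Assumption: blank pipeline, so demo_doc.ents is empty before we add to it.
nlp_blank = spacy.blank("en")
demo_doc = nlp_blank("SpaceNews provides news and analysis.")

demo_matcher = Matcher(nlp_blank.vocab)
demo_matcher.add("SpaceNews", [[{"LOWER": "spacenews"}]])

# Wrap every match in a Span that carries the entity label.
new_ents = [
    Span(demo_doc, start, end, label="ORG")
    for _, start, end in demo_matcher(demo_doc)
]

# doc.ents rejects overlapping spans; filter_spans keeps the longest
# span of any overlapping group before we assign.
demo_doc.ents = filter_spans(list(demo_doc.ents) + new_ents)

entities = [(ent.text, ent.label_) for ent in demo_doc.ents]
print(entities)
```

With a trained pipeline the same pattern applies, but existing entities may overlap the new spans, which is why the sketch filters before assigning.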


### Visualizing Named Entity Recognitions

In [58]:
for sent in doc.sents:
    displacy.render(nlp(sent.text), style='ent', jupyter=True)
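Outside a notebook, displacy can return the markup as a string instead of displaying it. The following is a minimal sketch assuming a blank English pipeline with one entity set by hand; `nlp_blank`, `demo_doc`, and `html` are illustrative names rather than part of the notebook.

```python
import spacy
from spacy import displacy
from spacy.tokens import Span

# Assumption: blank pipeline with a manually assigned entity, so the
# example does not depend on a trained model.
nlp_blank = spacy.blank("en")
demo_doc = nlp_blank("SpaceNews provides news and analysis.")
demo_doc.ents = [Span(demo_doc, 0, 1, label="ORG")]

# jupyter=False makes render() return the HTML markup as a string.
html = displacy.render(demo_doc, style="ent", jupyter=False)
print(len(html))
```

The returned string can be written to an .html file or embedded in a web page.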