# Named entity recognition with spaCy

[spaCy](https://spacy.io/) is an NLP library that handles basic tasks such as tokenization, morphological analysis, and dependency parsing. It also ships with pre-trained models for named entity recognition (NER) and sentence segmentation, among other things.

![spacy](static/spaCy.png)

First, we need to install spaCy and download a model for the language we want to analyze. Here, we download the small English model, `en_core_web_sm`:

In [None]:
import sys

!{sys.executable} -m pip install spacy
!{sys.executable} -m spacy download en_core_web_sm

### An example

The following code snippet should identify several named entities (e.g., person names, place names, organizations, and monetary sums):

In [None]:
import spacy

# load the model that will be used for the task
nlp = spacy.load('en_core_web_sm')

# an example sentence
s = 'Mathias Creutz Inc., a newcomer from Finland, has acquired Microsoft for $30,000,000,000.'

# parse the text with the loaded model
doc = nlp(s)

# print the text, its location and the named entity tag
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)
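The values printed above can be read as follows: `ent.start_char` and `ent.end_char` are character offsets into the original string (the end offset is exclusive), and `ent.label_` is the entity type. The sketch below illustrates this in plain Python, without loading a model. The two entities and their labels are written by hand for illustration and are not actual model output, and `make_entity` is a hypothetical helper, not part of spaCy.

```python
from collections import defaultdict

s = 'Mathias Creutz Inc., a newcomer from Finland, has acquired Microsoft for $30,000,000,000.'

def make_entity(text, label, source=s):
    """Hypothetical helper: build a (text, start_char, end_char, label) tuple."""
    start = source.index(text)
    return (text, start, start + len(text), label)

# Hand-written entities for illustration only; a real model may return others.
entities = [make_entity('Finland', 'GPE'), make_entity('Microsoft', 'ORG')]

grouped = defaultdict(list)
for text, start, end, label in entities:
    # Slicing the source string with the offsets recovers the entity text.
    assert s[start:end] == text
    grouped[label].append(text)

grouped = dict(grouped)
print(grouped)
```

With a real `doc`, the same grouping can be fed from `(ent.text, ent.start_char, ent.end_char, ent.label_) for ent in doc.ents`.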

You can do many other things with spaCy. For example, you can inspect the lemma and part-of-speech tag assigned to each token in the sentence:

In [None]:
for token in doc:
    # show the token itself next to its lemma and part-of-speech tag
    print(f'token: {token.text}\t\tlemma: {token.lemma_}\t\tPoS: {token.pos_}')

Try to modify the sentence in the example. You can also test other languages. And if you like the simplicity this provides, check the documentation (see more info at the end of this page).

### Highlighting named entities in the text

It is also possible to visualize the NER annotations directly in the text using the `displacy` module:

In [None]:
from spacy import displacy

text = """With a degree of frustration, George tried various piano teachers for some two years (circa. 1911) 
before finally being introduced to Charles Hambitzer by Jack Miller (circa. 1913), the pianist in the Beethoven 
Symphony Orchestra. Until his death in 1918, Hambitzer remained Gershwin's musical mentor and taught him conventional 
piano technique, introduced him to music of the European classical tradition, and encouraged him to attend orchestral 
concerts. Following such concerts, young Gershwin would essentially try to play, on the piano at home, the music he had 
heard from recall, and without sheet music. As a matter of course, Gershwin later studied with the classical composer 
Rubin Goldmark and avant-garde composer-theorist Henry Cowell, thus formalizing his classical music training.
In 1913, Gershwin left school at the age of 15 and found his first job as a "song plugger". His employer was Jerome H. 
Remick and Company, a Detroit-based publishing firm with a branch office on New York City's Tin Pan Alley, and he earned 
$15 a week.
"""

nlp = spacy.load('en_core_web_sm')
doc = nlp(text)
displacy.render(doc, style='ent', page=True)

When running Python from the command line rather than in a notebook, you can save the visualization as an HTML file as follows:

```
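import pathlib

# jupyter=False makes render() return the HTML string instead of displaying it
html = displacy.render(doc, style='ent', page=True, jupyter=False)
output_path = pathlib.Path("gershwin_ner.html")
# write_text opens, writes, and closes the file in one step
output_path.write_text(html, encoding="utf-8")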
```

There are many more options for producing visualizations. For instance, you can integrate the HTML directly [in a Flask application](https://spacy.io/usage/visualizers).

Read through the list of different [linguistic features that spaCy offers](https://spacy.io/usage/linguistic-features). Try some of them out. Is there anything here that you would like to use for your project?

<sup>Dmitry Kan and Mathias Creutz</sup>