### Execute this cell to install the required Python modules

After you have installed them once, you can delete this cell.

In [1]:
!pip install spacy
!pip install tabulate
!python -m spacy download en_core_web_sm
# !python -m spacy download en_core_web_lg

Collecting spacy
  Downloading spacy-2.3.5-cp38-cp38-win_amd64.whl (9.7 MB)
Collecting wasabi<1.1.0,>=0.4.0
  Downloading wasabi-0.8.0-py3-none-any.whl (23 kB)
Collecting thinc<7.5.0,>=7.4.1
  Downloading thinc-7.4.5-cp38-cp38-win_amd64.whl (910 kB)
Collecting catalogue<1.1.0,>=0.0.7
  Downloading catalogue-1.0.0-py2.py3-none-any.whl (7.7 kB)
Collecting preshed<3.1.0,>=3.0.2
  Downloading preshed-3.0.5-cp38-cp38-win_amd64.whl (112 kB)
Collecting plac<1.2.0,>=0.9.6
  Downloading plac-1.1.3-py2.py3-none-any.whl (20 kB)
Collecting murmurhash<1.1.0,>=0.28.0
  Downloading murmurhash-1.0.5-cp38-cp38-win_amd64.whl (21 kB)
Collecting blis<0.8.0,>=0.4.0
  Downloading blis-0.7.4-cp38-cp38-win_amd64.whl (6.5 MB)
Collecting cymem<2.1.0,>=2.0.2
  Downloading cymem-2.0.5-cp38-cp38-win_amd64.whl (36 kB)
Collecting srsly<1.1.0,>=1.0.2
  Downloading srsly-1.0.5-cp38-cp38-win_amd64.whl (178 kB)
Installing collected packages: wasabi, catalogue, plac, murmurhash, srsly, blis, cymem, preshed, thinc, spacy


# Named Entity Recognition & Parts of Speech Tagging

Using [spaCy's pre-trained named entity recognizer (NER) and part-of-speech (POS) tagger](https://spacy.io/api/annotation#named-entities)

### Import dependencies

In [2]:
import spacy
from spacy import displacy
from tabulate import tabulate

# Load the small English pipeline (tokenizer, tagger, parser, NER); the "sm" model has no static word vectors
nlp = spacy.load("en_core_web_sm")

### Create a text variable

In [3]:
text = """A live 1957 recording of John Coltrane and Thelonious Monk sat in the Library of Congress’s 
archives unnoticed for 48 years, before the library’s Magnetic Recording Laboratory supervisor 
Larry Appelbaum found it. For 60 years, Verve stored a live recording of Ella Fitzgerald 
performing at Zardi’s Jazzland in Hollywood, before releasing it in 2017 just after what would 
have been her 100th birthday.
"""

### Process the text into a spaCy `Doc`

In [4]:
doc = nlp(text)

In [5]:
list(doc.noun_chunks)

[A live 1957 recording,
 John Coltrane,
 Thelonious Monk,
 the Library,
 Congress,
 archives,
 48 years,
 the library,
 Magnetic Recording Laboratory supervisor,
 Larry Appelbaum,
 it,
 60 years,
 Verve,
 a live recording,
 Ella Fitzgerald,
 Zardi,
 Jazzland,
 Hollywood,
 it,
 what,
 her 100th birthday]

### Analyze Parts of Speech

In [6]:
print("Nouns:\n", [token.lemma_ for token in doc if token.pos_ == "NOUN"])
print("Proper Nouns:\n", [token.lemma_ for token in doc if token.pos_ == "PROPN"])
print("\nNoun phrases:\n",[chunk.text for chunk in doc.noun_chunks])
print("\nVerbs:\n", [token.lemma_ for token in doc if token.pos_ == "VERB"])
print("\nAdjectives:\n", [token.lemma_ for token in doc if token.pos_ == "ADJ"])

Nouns:
 ['recording', 'archive', 'year', 'library', 'supervisor', 'year', 'recording', 'birthday']
Proper Nouns:
 ['John', 'Coltrane', 'Thelonious', 'Monk', 'Library', 'Congress', '’s', 'Magnetic', 'Recording', 'Laboratory', 'Larry', 'Appelbaum', 'Verve', 'Ella', 'Fitzgerald', 'Zardi', 'Jazzland', 'Hollywood']

Noun phrases:
 ['A live 1957 recording', 'John Coltrane', 'Thelonious Monk', 'the Library', 'Congress', 'archives', '48 years', 'the library', 'Magnetic Recording Laboratory supervisor', 'Larry Appelbaum', 'it', '60 years', 'Verve', 'a live recording', 'Ella Fitzgerald', 'Zardi', 'Jazzland', 'Hollywood', 'it', 'what', 'her 100th birthday']

Verbs:
 ['sit', 'find', 'store', 'perform', 'release', 'would']

Adjectives:
 ['live', 'unnoticed', 'live', '100th']
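
The cell above walks the `Doc` once per part of speech. A sketch of collecting the lemmas in a single pass, grouped by their coarse `pos_` tag. It only reads the `lemma_` and `pos_` attributes that spaCy tokens expose, so the sample below uses a hypothetical `FakeToken` stand-in instead of a loaded model; with spaCy you would pass `doc` directly:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class FakeToken(NamedTuple):
    """Hypothetical stand-in exposing the two attributes read from spaCy tokens."""
    lemma_: str
    pos_: str


def lemmas_by_pos(tokens: Iterable, wanted=("NOUN", "PROPN", "VERB", "ADJ")) -> Dict[str, List[str]]:
    """Group token lemmas by coarse POS tag in one pass, keeping document order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for token in tokens:
        if token.pos_ in wanted:
            groups[token.pos_].append(token.lemma_)
    return dict(groups)


# Hand-written sample tokens; in the notebook: lemmas_by_pos(doc)
sample = [
    FakeToken("Verve", "PROPN"),
    FakeToken("store", "VERB"),
    FakeToken("a", "DET"),
    FakeToken("live", "ADJ"),
    FakeToken("recording", "NOUN"),
]
result = lemmas_by_pos(sample)
for pos, lemmas in result.items():
    print(pos, lemmas)
```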


### Extract Entities

In [7]:
ent_list = [[entity.text, entity.label_] for entity in doc.ents]
print(tabulate(ent_list, headers=['Entity', 'Entity Type']))

Entity                         Entity Type
-----------------------------  -------------
1957                           DATE
John Coltrane                  ORG
Thelonious Monk                PERSON
Congress                       ORG
48 years                       DATE
Magnetic Recording Laboratory  ORG
Larry Appelbaum                PERSON
60 years                       DATE
Verve                          ORG
Ella Fitzgerald                PERSON
Zardi’s Jazzland               ORG
Hollywood                      GPE
2017                           DATE
100th                          ORDINAL
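
The table lists entities one by one; for a quick summary, the `(text, label)` pairs can be counted and grouped by label with the standard library. A sketch, with the pairs copied from the table above (in the notebook, `ent_list` holds the same data). A per-label view also makes model mistakes easier to spot, such as the small model labelling John Coltrane as `ORG`:

```python
from collections import Counter, defaultdict

# (entity text, label) pairs as printed in the table above
ent_pairs = [
    ["1957", "DATE"], ["John Coltrane", "ORG"], ["Thelonious Monk", "PERSON"],
    ["Congress", "ORG"], ["48 years", "DATE"], ["Magnetic Recording Laboratory", "ORG"],
    ["Larry Appelbaum", "PERSON"], ["60 years", "DATE"], ["Verve", "ORG"],
    ["Ella Fitzgerald", "PERSON"], ["Zardi’s Jazzland", "ORG"], ["Hollywood", "GPE"],
    ["2017", "DATE"], ["100th", "ORDINAL"],
]

# How many entities of each type were found
label_counts = Counter(label for _, label in ent_pairs)

# All entity texts for each type, in document order
by_label = defaultdict(list)
for ent_text, label in ent_pairs:
    by_label[label].append(ent_text)

print(label_counts.most_common())
print(by_label["PERSON"])
```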


# spaCy Visualizations

### Entity labeling

In [8]:
sentence_spans = list(doc.sents)
displacy.render(sentence_spans, style="ent")

### Dependency Parsing

In [9]:
displacy.render(sentence_spans, style="dep", options={"word_spacing":15})