## spaCy 101: simple examples of using spaCy for NLP

[spaCy](https://spacy.io/) is an open-source, multilingual NLP library. Its components are not state of the art (SOTA), but they are robust, easy to use, and fast.

This notebook demonstrates some of its basics.

You may need to run the following first:
 * `pip install spacy`
 * `python -m spacy download en_core_web_md`

In [1]:
import spacy
from spacy import displacy

### Load one of spaCy's language models. This is a medium-sized one for English
 * A common convention is to name the returned object nlp
 * It is callable, so we'll use it like a function to process the text with spaCy's analytic pipeline

In [2]:
nlp = spacy.load("en_core_web_md")


### Input text is just a string, which can be a phrase, sentence, paragraph, or more

In [3]:
text = "Joe Biden was elected president of the United States in November 2020."

### Use nlp to process the text and save the result in variable doc

In [4]:
doc = nlp(text)

spaCy's default **pipeline** performs these steps, saving the results in a `Doc` object that doc now refers to.
 * breaks the text into **tokens** (e.g., words, punctuation) and segments it into **sentences**
 * tags tokens with their part of speech (e.g., noun, verb, preposition, ...)
 * parses the tokens to recognize **dependencies** between them
 * identifies **named entities** and assigns each a type (e.g., PERSON, LOC, ORG, ...)

Additional steps (e.g., coreference) can be added as needed.
![](spacy_pipeline.png)
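
The idea that a pipeline is a list of steps that can be extended is easiest to see on an empty pipeline. The following is a minimal sketch, assuming spaCy v3, where `add_pipe` accepts a component name. The names `blank_nlp`, `sample_doc`, and `sample_sentences`, and the sample text, are chosen for this illustration and are not part of the notebook. It starts from a tokenizer-only English pipeline and adds spaCy's rule-based `sentencizer` component.

```python
import spacy

# Start from an empty English pipeline: tokenizer only, no trained components.
# (blank_nlp is an illustrative name, kept separate from the notebook's nlp.)
blank_nlp = spacy.blank("en")
print(blank_nlp.pipe_names)  # no components yet

# Add the built-in rule-based sentence segmenter by name (spaCy v3 API).
blank_nlp.add_pipe("sentencizer")
print(blank_nlp.pipe_names)

# With the sentencizer in place, doc.sents becomes available.
sample_doc = blank_nlp("This is the first sentence. This is the second sentence.")
sample_sentences = [sent.text for sent in sample_doc.sents]
print(sample_sentences)
```

Reading `nlp.pipe_names` on the model loaded earlier lists the components of the default pipeline in the same way.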

In [5]:
# print each of the sentences in the document
for sentence in doc.sents: 
    print(sentence)

Joe Biden was elected president of the United States in November 2020.


### Display the dependency diagram
 * This shows how spaCy parsed the structure of the sentence


In [6]:
displacy.render(doc, style="dep")

### Show the text with the named entities that were found and their types
* The default types are the 18 types from [OntoNotes](https://catalog.ldc.upenn.edu/docs/LDC2013T19/OntoNotes-Release-5.0.pdf)

In [7]:
displacy.render(doc, style="ent")
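
The label abbreviations shown in the rendering are not always self-explanatory. As a brief sketch, `spacy.explain` looks up a short description of a label in spaCy's built-in glossary. The three labels below are the ones spaCy finds in the example sentence; `labels` and `descriptions` are illustrative names.

```python
import spacy

# spacy.explain returns a short description string for a known label,
# or None when the label is not in the glossary.
labels = ["PERSON", "GPE", "DATE"]
descriptions = {label: spacy.explain(label) for label in labels}
for label, description in descriptions.items():
    print(f"{label}: {description}")
```

No language model needs to be loaded for this lookup, since the glossary ships with the library itself.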

### Access the strings of the entities and noun_chunks found
* Noun chunks are what spaCy calls basic noun phrases

In [8]:
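# print each named entity with its type label
for entity in doc.ents:
    print(f" Entity: {entity.text}, {entity.label_}")
# print each noun chunk (named chunk rather than np, which usually means NumPy)
for chunk in doc.noun_chunks:
    print(f" NP: {chunk.text}")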


 Entity: Joe Biden, PERSON
 Entity: the United States, GPE
 Entity: November 2020, DATE
 NP: Joe Biden
 NP: president
 NP: the United States
 NP: November


***
*The End*
***