In machine learning, spaCy is a widely used open-source library for advanced natural language processing (NLP). When we work with a lot of text, we eventually want to know more about it. For example:
* What is it about?
* What do the words mean in context? 
* Who does what to whom? 
* Which companies and which products are mentioned? 
* Which texts are similar to each other?

### Why spaCy?

spaCy is designed specifically for production use and helps us build applications that **process and understand large volumes of text**. It can be used to build information extraction or natural language understanding systems, or to preprocess text for deep learning.

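To get started, install spaCy with pip (the leading `!` runs the command from inside a notebook cell):

`!pip install spacy`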

We also need at least one of spaCy’s language models. spaCy can analyze text in many languages, including English, German, Spanish and French, each with its own models. We’re going to work with English text for this simple analysis, so go ahead and download spaCy’s small English model, again via the command line:

`python -m spacy download en_core_web_sm`
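
Before loading the model, it can help to confirm that the download actually worked. The sketch below relies on spaCy's `spacy.util.is_package` helper, which reports whether a name maps to a pip-installed package. The `model_is_installed` wrapper is a hypothetical convenience, not part of spaCy.

```python
from spacy.util import is_package

MODEL_NAME = "en_core_web_sm"


def model_is_installed(name: str) -> bool:
    """Return True if `name` is installed as a Python package (hypothetical helper)."""
    return is_package(name)


installed = model_is_installed(MODEL_NAME)
if installed:
    print(f"{MODEL_NAME} is installed and can be loaded with spacy.load().")
else:
    print(f"{MODEL_NAME} is missing; run: python -m spacy download {MODEL_NAME}")
```

Running this check first gives a clearer message than the error `spacy.load` raises when the model is absent.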

### Tokenization
Text processing now comes down to loading the language model and passing strings directly to it. Let’s see what it does with a sample review:

In [1]:
import spacy
nlp = spacy.load("en_core_web_sm")
review = "I'am so happy I went to this awesome Vegas buffet!"


In [2]:
doc = nlp(review)

To inspect the result, loop over the tokens of the processed document and print each token's text, part-of-speech tag, lemma, and stop-word flag:

In [4]:
for token in doc:
    print(token.text, token.pos_, token.lemma_, token.is_stop)

I'am NOUN i'am False
so ADV so True
happy ADJ happy False
I PRON I True
went VERB go False
to ADP to True
this DET this True
awesome ADJ awesome False
Vegas PROPN Vegas False
buffet NOUN buffet False
! PUNCT ! False
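
Not every attribute printed above needs the trained model. As a minimal sketch, `spacy.blank("en")` creates a pipeline that contains only the rule-based English tokenizer, so lexical flags such as `is_punct` and `is_alpha` are available while `pos_` stays empty. The sentence is a shortened variant of the review, and the variable names are illustrative.

```python
import spacy

# A blank pipeline: rule-based English tokenizer only, no trained components.
blank_nlp = spacy.blank("en")
blank_doc = blank_nlp("I went to this awesome Vegas buffet!")

tokens = [token.text for token in blank_doc]
punctuation = [token.text for token in blank_doc if token.is_punct]
pos_tags = [token.pos_ for token in blank_doc]  # empty strings: no tagger has run

for token in blank_doc:
    print(token.text, token.is_alpha, token.is_punct)
```

Part-of-speech tags and dependency labels, by contrast, come from the trained components in `en_core_web_sm`.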


`spaCy` does not return the tokens as a plain Python list, but the document can be indexed and sliced like one:

In [5]:
print(doc[:5])

I'am so happy I went
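
Indexing and slicing return different object types, which the following sketch illustrates. It uses a tokenizer-only pipeline (`spacy.blank("en")`) and a shortened sentence so that it runs without the downloaded model. The variable names are illustrative.

```python
import spacy
from spacy.tokens import Span, Token

tokenizer_only = spacy.blank("en")
sliced_doc = tokenizer_only("I went to this awesome Vegas buffet!")

first_token = sliced_doc[0]   # an integer index returns a Token
phrase = sliced_doc[3:7]      # a slice returns a Span, a view onto the Doc
as_list = [token.text for token in sliced_doc]  # build a plain list when one is needed

print(type(first_token).__name__, first_token.text)
print(type(phrase).__name__, phrase.text, phrase.start, phrase.end)
print(as_list)
```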


### spaCy Dependencies

NLP poses many unique challenges, both syntactic and semantic. spaCy assigns a dependency label to each token as the text passes through the language model. Let’s check the dependencies in our review:

In [6]:
for token in doc:
    print(token.text, token.dep_)

I'am nsubj
so advmod
happy ROOT
I nsubj
went ccomp
to prep
this det
awesome amod
Vegas compound
buffet pobj
! punct
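
Each label describes the relation between a token and its syntactic head, so the labels are easier to read next to `token.head` and `token.children`. The sketch below assumes spaCy v3, where a `Doc` can be built directly from `words`, `heads`, and `deps`. The annotations are hand-written for a shortened sentence and are not model output. With the downloaded model, the same attributes are available on the `doc` returned by `nlp(review)`.

```python
import spacy
from spacy.tokens import Doc

vocab = spacy.blank("en").vocab

# Hand-written annotations (assumed for illustration, not model output).
words = ["I", "went", "to", "this", "awesome", "Vegas", "buffet", "!"]
heads = [1, 1, 1, 6, 6, 6, 2, 1]  # absolute index of each head; the root points to itself
deps = ["nsubj", "ROOT", "prep", "det", "amod", "compound", "pobj", "punct"]
parsed = Doc(vocab, words=words, heads=heads, deps=deps)

for token in parsed:
    children = [child.text for child in token.children]
    print(f"{token.text:<8} {token.dep_:<9} head={token.head.text:<7} children={children}")

# The root is the token labelled ROOT; a subtree collects a token and its descendants.
root = next(token for token in parsed if token.dep_ == "ROOT")
buffet_phrase = " ".join(token.text for token in parsed[6].subtree)
print(root.text, "|", buffet_phrase)
```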


It looks somewhat interesting, but visualizing these relationships tells a fuller story. Import the `displacy` submodule, then render the dependency tree from the document:

In [7]:
from spacy import displacy
displacy.serve(doc)





Using the 'dep' visualizer
Serving on http://0.0.0.0:5000 ...

Shutting down server on port 5000.
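
`displacy.serve` blocks until the server is stopped. When only the markup is needed, `displacy.render` returns it as a string instead. The sketch below uses displaCy's manual input format with a tiny hand-written parse (an assumption for illustration, not model output) so that it runs without the downloaded model. The output file name is also hypothetical.

```python
import tempfile
from pathlib import Path

from spacy import displacy

# Hand-written parse in displaCy's "manual" input format (assumed for illustration).
manual_parse = {
    "words": [
        {"text": "I", "tag": "PRON"},
        {"text": "went", "tag": "VERB"},
        {"text": "home", "tag": "ADV"},
    ],
    "arcs": [
        {"start": 0, "end": 1, "label": "nsubj", "dir": "left"},
        {"start": 1, "end": 2, "label": "advmod", "dir": "right"},
    ],
}

# jupyter=False forces a returned string even when running inside a notebook.
svg = displacy.render(manual_parse, style="dep", manual=True, jupyter=False)

output_path = Path(tempfile.gettempdir()) / "dependency_tree.svg"
output_path.write_text(svg, encoding="utf-8")
print(f"Wrote {len(svg)} characters of SVG to {output_path}")
```

For a document processed by the model, `displacy.render(doc, style="dep", jupyter=False)` should work the same way without `manual=True`.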


### Named Entity Recognition with Spacy

Machine learning practitioners often need to identify key entities, such as people, places, and organizations, in unstructured text. This task, called Named Entity Recognition (NER), runs automatically as the text passes through the language model. To see which tokens it identifies as named entities in our restaurant review, iterate over `doc.ents`:

In [8]:
for ent in doc.ents:
    print(ent.text, ent.label_)

Vegas PERSON


It recognizes **Vegas** as a named entity, but what does the label `PERSON` mean? If we don’t know what an abbreviation means, we can ask `spaCy` to explain it. Note that the small model has mislabelled the entity here: Vegas is a city, not a person.

In [9]:
spacy.explain("PERSON")

'People, including fictional'

In [10]:
# Another example

spacy.explain("GPE")

'Countries, cities, states'
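
`spacy.explain` is not limited to entity labels. As a small sketch, the loop below looks up the dependency labels that appeared in the review's parse. The `glossary` dictionary is an illustrative name, and the last call shows that an unknown label yields `None` rather than a description.

```python
import spacy

# Dependency labels taken from the parse of the review above.
labels = ["nsubj", "advmod", "ccomp", "prep", "det", "amod", "compound", "pobj", "punct"]

glossary = {label: spacy.explain(label) for label in labels}
for label, description in glossary.items():
    print(f"{label:<9} {description}")

# Labels missing from spaCy's glossary return None (spaCy may also emit a warning).
unknown = spacy.explain("NOT_A_REAL_LABEL")
print(unknown)
```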

Additionally, `displaCy`'s `serve` method can highlight named entities when the `style` argument is set to `'ent'`:

In [11]:
displacy.serve(doc, style='ent')


Using the 'ent' visualizer
Serving on http://0.0.0.0:5000 ...

Shutting down server on port 5000.


The coloured spans mark named entities by type. Consider this more complex example with four different types of entities:

In [12]:
document = nlp("One year ago, I visited the Eiffel Tower with Jeff in Paris, France")
displacy.serve(document, style='ent')


Using the 'ent' visualizer
Serving on http://0.0.0.0:5000 ...

Shutting down server on port 5000.
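
The entity visualizer works on character offsets and labels. The sketch below feeds displaCy's manual input format with a few hand-labelled spans for the same sentence, which also shows the structure the visualizer consumes. The labels are assumptions for illustration, not the model's predictions, and `mark` is a hypothetical helper.

```python
from spacy import displacy

text = "One year ago, I visited the Eiffel Tower with Jeff in Paris, France"


def mark(substring: str, label: str) -> dict:
    """Build one manual entity entry from the first occurrence of `substring` (hypothetical helper)."""
    start = text.index(substring)
    return {"start": start, "end": start + len(substring), "label": label}


# Hand-labelled spans (assumed), listed in order of appearance.
manual_ents = {
    "text": text,
    "ents": [mark("Jeff", "PERSON"), mark("Paris", "GPE"), mark("France", "GPE")],
    "title": None,
}

# jupyter=False returns the HTML as a string instead of displaying it.
html = displacy.render(manual_ents, style="ent", manual=True, jupyter=False)
print(html[:200])
```

For a document processed by the model, each `ent` in `document.ents` exposes the same information through `ent.start_char`, `ent.end_char`, and `ent.label_`.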
