## Introduction to spaCy

<b>spaCy</b> is an NLP framework written in Cython, and it is much faster than the NLTK framework.

#### Advantages:
1. Super fast
2. It's accurate
3. Pretrained word vectors
4. Beautiful visualizations
5. It also has its own deep learning framework for NLP tasks

#### Installation:
pip install -U spacy

#### Note:
Note that the installation doesn't automatically download the English model. We need to do that ourselves, for example with `python -m spacy download en_core_web_sm`.

In [3]:
# Get the character offset of each token in the text with token.idx
import spacy
nlp = spacy.load('en')  # Load the English model ('en' is a shortcut name used by older spaCy versions)
doc = nlp('Hello     world!')
for token in doc:
    print('"' + token.text + '"', token.idx)



    Only loading the 'en' tokenizer.

"Hello" 0
"    " 6
"world" 10
"!" 15

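The `idx` values above are character offsets into the original string, not token positions. A minimal pure-Python check of that reading, using the offsets printed above (the `tokens` list is copied by hand from that output, not produced by spaCy here):

```python
# Text and (token text, idx) pairs copied from the cell output above.
text = 'Hello     world!'
tokens = [("Hello", 0), ("    ", 6), ("world", 10), ("!", 15)]

# Slicing the original string at each offset should recover the token text.
recovered = [text[idx:idx + len(tok)] for tok, idx in tokens]
print(recovered == [tok for tok, _ in tokens])
```

The space at offset 5 is not listed as a token, most likely because spaCy stores a single trailing space with the preceding token (see `token.whitespace_`) and only emits the extra spaces as a separate whitespace token.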

In [4]:
# The Token class exposes many word-level attributes. Here are a few examples:
doc = nlp("Next week I'll   be in Madrid.")
for token in doc:
    print("{0}\t{1}\t{2}\t{3}\t{4}\t{5}\t{6}\t{7}".format(
        token.text,
        token.idx,
        token.lemma_,
        token.is_punct,
        token.is_space,
        token.shape_,
        token.pos_,
        token.tag_
    ))

Next	0		False	False	Xxxx		
week	5		False	False	xxxx		
I	10	-PRON-	False	False	X	PRON	PRP
'll	11	will	False	False	'xx	VERB	MD
  	15		False	True	  		
be	17		False	False	xx		
in	20		False	False	xx		
Madrid	23		False	False	Xxxxx		
.	29		True	False	.		
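The `shape_` column above can be read as a character-class pattern: uppercase letters become `X`, lowercase letters become `x`, other characters stay as they are, and long runs are cut off (note that "Madrid" gives `Xxxxx`, not `Xxxxxx`). The sketch below reproduces the shapes shown in that output. `approx_shape` and its `max_run` parameter are hypothetical names for illustration, the mapping of digits to `d` is an assumption that the output above does not exercise, and this is not spaCy's own implementation.

```python
def approx_shape(text, max_run=4):
    """Approximate a spaCy-style word shape (hypothetical helper, not spaCy's code)."""
    shape = []
    last = ""
    run = 0
    for ch in text:
        if ch.isalpha():
            mapped = "X" if ch.isupper() else "x"
        elif ch.isdigit():
            mapped = "d"  # assumption: digits map to "d"
        else:
            mapped = ch   # punctuation and spaces are kept unchanged
        # Count how long the current run of identical shape characters is.
        run = run + 1 if mapped == last else 1
        last = mapped
        # Keep at most max_run repeated characters per run.
        if run <= max_run:
            shape.append(mapped)
    return "".join(shape)


for word in ["Next", "week", "I", "'ll", "Madrid", "."]:
    print(word, approx_shape(word))
```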


In [5]:
# Sentence detection
import spacy
import en_core_web_sm

nlp = en_core_web_sm.load()  # Load the small English model package
doc = nlp('This is a sentence. This is another one. And this is a third, for good measure.')
for sent in doc.sents:
    print(sent)

This is a sentence.
This is another one.
And this is a third, for good measure.


In [6]:
# Part Of Speech Tagging
doc = nlp("Next week I'll be in Madrid.")
print([(token.text, token.tag_) for token in doc])

[('Next', 'JJ'), ('week', 'NN'), ('I', 'PRP'), ("'ll", 'MD'), ('be', 'VB'), ('in', 'IN'), ('Madrid', 'NNP'), ('.', '.')]


### Named Entity Recognition:
Named entity recognition (NER) with spaCy is straightforward, and the pretrained model often performs well.

spaCy includes a fast entity recognition model that can identify entity phrases in a document. Entities can be of different types, such as person, location, organization, date, and numeral. They can be accessed through the `.ents` property.

In [7]:
doc = nlp("I just bought 2 shares at 9 a.m. because the stock went up 30% in just 2 days according to the WSJ")
for ent in doc.ents:
    print(ent.text, ent.label_)
 
# For the earlier sentence "Next week I'll be in Madrid." the expected entities are:
# Next week DATE
# Madrid GPE

2 CARDINAL
9 a.m. because the stock went up 30% in just 2 days according to the WSJ TIME


### Dependency Parsing

In [8]:
doc = nlp('Wall Street Journal just published an interesting piece on crypto currencies')
 
for token in doc:
    print("{0}/{1} <--{2}-- {3}/{4}".format(
        token.text, token.tag_, token.dep_, token.head.text, token.head.tag_))

Wall/NNP <--compound-- Journal/NNP
Street/NNP <--compound-- Journal/NNP
Journal/NNP <--nsubj-- published/VBD
just/RB <--advmod-- published/VBD
published/VBD <--ROOT-- published/VBD
an/DT <--det-- piece/NN
interesting/JJ <--amod-- piece/NN
piece/NN <--dobj-- published/VBD
on/IN <--prep-- piece/NN
crypto/JJ <--amod-- currencies/NNS
currencies/NNS <--pobj-- on/IN
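Each line of this output is one arc from a token to its head, so the whole parse forms a tree rooted at the token labeled `ROOT`. As a rough sketch, the tree can be rebuilt in plain Python from the arcs printed above. The `arcs` list is copied by hand from that output, and keying on the word text only works here because every word in the sentence is unique; with real spaCy objects one would use `token.head` and `token.children` instead.

```python
from collections import defaultdict

# (token, dependency label, head) triples copied from the output above.
arcs = [
    ("Wall", "compound", "Journal"),
    ("Street", "compound", "Journal"),
    ("Journal", "nsubj", "published"),
    ("just", "advmod", "published"),
    ("published", "ROOT", "published"),
    ("an", "det", "piece"),
    ("interesting", "amod", "piece"),
    ("piece", "dobj", "published"),
    ("on", "prep", "piece"),
    ("crypto", "amod", "currencies"),
    ("currencies", "pobj", "on"),
]

# Map each head to the list of tokens that depend on it.
children = defaultdict(list)
root = None
for token, dep, head in arcs:
    if dep == "ROOT":
        root = token  # the root is its own head in this output
    else:
        children[head].append(token)


def print_tree(word, depth=0):
    """Print the subtree under `word`, indented by depth."""
    print("  " * depth + word)
    for child in children[word]:
        print_tree(child, depth + 1)


print_tree(root)
```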


### Word Vectors
The vectors are attached to spaCy objects: Token, Lexeme (a sort of unattached token, part of the vocabulary), Span and Doc. Multi-token objects (Span and Doc) average the vectors of their constituent tokens.
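To make the averaging concrete, here is a minimal sketch in plain Python. The vectors are made-up toy values, not real spaCy vectors, and `average_vectors` is a hypothetical helper name.

```python
def average_vectors(vectors):
    """Element-wise mean of a list of equally sized vectors."""
    n = len(vectors)
    return [sum(components) / n for components in zip(*vectors)]


# Toy 3-dimensional "token vectors" (made-up values for illustration).
toy_token_vectors = [
    [1.0, 2.0, 3.0],
    [3.0, 4.0, 5.0],
]

# The "document vector" is the mean of the token vectors.
doc_vector = average_vectors(toy_token_vectors)
print(doc_vector)
```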

<b>A few properties of word vectors:</b>

1. If two words are similar, they appear in similar contexts
2. Word vectors are computed taking into account the context (surrounding words)
3. Given the two previous observations, similar words should have similar word vectors
4. Using vectors we can derive relationships between words

### Computing similarity
spaCy can compute the similarity between any two of these objects (Token, Span, Doc and Lexeme) through their `similarity()` method.
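As far as I know, the default similarity score is the cosine similarity of the two objects' vectors. A minimal sketch of that measure in plain Python follows; the vectors and the names `cat`, `dog` and `car` are made-up toy values, not output from a spaCy model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equally sized vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


# Toy vectors (made-up values): "cat" and "dog" point in similar directions.
cat = [1.0, 0.9, 0.1]
dog = [0.9, 1.0, 0.2]
car = [0.1, 0.2, 1.0]

print(cosine_similarity(cat, dog) > cosine_similarity(cat, car))
```

Because only the direction of the vectors matters, scaling a vector does not change its similarity to another one.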

### spaCy Architecture
The spaCy architecture is built upon three building blocks: `Doc` (the document, the big encompassing container), `Token` (most of the time, a word) and `Span` (a sequence of consecutive Tokens).
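A small sketch of how the three containers relate, assuming spaCy is installed. It uses `spacy.blank("en")`, which builds a tokenizer-only English pipeline, so no model download is needed; the example sentence is arbitrary.

```python
import spacy
from spacy.tokens import Doc, Span, Token

# Tokenizer-only English pipeline (no tagger, parser or NER).
nlp = spacy.blank("en")

doc = nlp("Hello world!")  # Doc: container for the whole text
first = doc[0]             # Token: indexing a Doc with an integer
span = doc[0:2]            # Span: slicing a Doc

print(type(doc).__name__, type(first).__name__, type(span).__name__)
print(span.text)
```

A `Span` is a view onto its `Doc` rather than a copy, so its tokens are the same tokens the `Doc` holds.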

### Deep Learning
spaCy is well suited to preparing text for deep learning. It interoperates with TensorFlow, PyTorch, scikit-learn, Gensim and the rest of Python's AI ecosystem. With spaCy, you can construct linguistically sophisticated statistical models for a variety of NLP problems.