# spaCy Basics
**spaCy** (https://spacy.io/) is an open-source Python library that parses and "understands" large volumes of text. Separate models are available that cater to specific languages (English, French, German, etc.).

In this section we'll install and set up spaCy to work with Python, and then introduce some concepts related to Natural Language Processing.

# Installation and Setup

Installation is a two-step process. First, install spaCy using either conda or pip. Next, download the specific model you want, based on language.<br> For more info visit https://spacy.io/usage/

### 1. From the command line or terminal:
> `conda install -c conda-forge spacy`
> <br>*or*<br>
> `pip install -U spacy`

> ### Alternatively you can create a virtual environment:
> `conda create -n spacyenv python=3 spacy=2`

### 2. Next, also from the command line (you must run this as admin or use sudo):

> `python -m spacy download en`

> ### If successful, you should see a message like:

> **`Linking successful`**<br>
> `    C:\Anaconda3\envs\spacyenv\lib\site-packages\en_core_web_sm -->`<br>
> `    C:\Anaconda3\envs\spacyenv\lib\site-packages\spacy\data\en`<br>
> ` `<br>
> `    You can now load the model via spacy.load('en')`

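Before moving on, it can help to confirm that both the library and the model package are importable from the environment you are running. The snippet below is a minimal sketch that uses only the standard library. `is_installed` is a hypothetical helper name, and the check assumes the model was installed as the `en_core_web_sm` package shown in the message above.

```python
import importlib.util


def is_installed(module_name):
    """Return True if `module_name` can be imported in this environment."""
    return importlib.util.find_spec(module_name) is not None


# Report the status of the library and of the English model package.
for name in ("spacy", "en_core_web_sm"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If either line reports `missing`, re-run the corresponding install step from inside the same environment.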

# Working with spaCy in Python

This is a typical set of instructions for importing and working with spaCy. Don't be surprised if this takes a while: spaCy has a fairly large library to load.

In [1]:
import spacy

In [2]:
# Load the small English model (the package behind the 'en' shortcut) and name it nlp
nlp = spacy.load('en_core_web_sm')

In [3]:
# Create a Doc object by passing the text to nlp
doc = nlp(u"Tesla is looking at buying U.S. startup for $6 billion")

In [8]:
# spaCy keeps "U.S." together as a single token
# token.pos_ is the part-of-speech tag as a string; token.pos is the same tag as an integer ID
# token.dep_ is the syntactic dependency label

for token in doc:
    print(token.text, token.pos_, token.dep_)
    print(token.text, token.pos)

Tesla PROPN nsubj
Tesla 95
is VERB aux
is 99
looking VERB ROOT
looking 99
at ADP prep
at 84
buying VERB pcomp
buying 99
U.S. PROPN compound
U.S. 95
startup NOUN dobj
startup 91
for ADP prep
for 84
$ SYM quantmod
$ 98
6 NUM compound
6 92
billion NUM pobj
billion 92

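Each token exposes its labels twice: `token.pos_` is the readable string, while `token.pos` is the integer ID that spaCy stores internally. For readable output you usually only need the string forms. As a sketch, the hypothetical helper `describe_tokens` below (not part of spaCy) lines them up in columns. It only assumes that each token has the `text`, `pos_` and `dep_` attributes used above.

```python
def describe_tokens(doc):
    """Return one aligned line per token: text, coarse POS tag, dependency label."""
    rows = [(token.text, token.pos_, token.dep_) for token in doc]
    # Pad the text column to the longest token so the labels line up.
    width = max((len(text) for text, _, _ in rows), default=0)
    return [f"{text:<{width}}  {pos:<6}{dep}" for text, pos, dep in rows]
```

With the `doc` from cell In [3], `print("\n".join(describe_tokens(doc)))` prints the table. For any label you do not recognise, `spacy.explain` returns a short description, for example `spacy.explain('nsubj')`.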

In [9]:
# Show the processing pipeline as (name, component) pairs
nlp.pipeline

[('tagger', <spacy.pipeline.Tagger at 0x1fa0d6af988>),
 ('parser', <spacy.pipeline.DependencyParser at 0x1fa0d6aa7c8>),
 ('ner', <spacy.pipeline.EntityRecognizer at 0x1fa0d6aad68>)]

In [10]:
nlp.pipe_names

['tagger', 'parser', 'ner']
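
`nlp.pipeline` is a list of `(name, component)` pairs, and `nlp.pipe_names` holds the same names without the components. The sketch below makes that relationship explicit. `pipeline_summary` is a hypothetical helper, not a spaCy function, and it relies only on the pair structure shown in the output above.

```python
def pipeline_summary(nlp):
    """Return (name, component class name) for each pipeline step, in order."""
    return [(name, type(component).__name__) for name, component in nlp.pipeline]
```

For the model loaded above, this should pair `'tagger'` with `Tagger`, `'parser'` with `DependencyParser`, and `'ner'` with `EntityRecognizer`.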

In [18]:
# Tokenisation: splitting the text into tokens

doc2 = nlp(u"Tesla isn't looking into      startups anymore")

In [19]:
# spaCy splits "isn't" into "is" and "n't" and labels "n't" as a negation (neg)
# A run of extra whitespace becomes its own SPACE token
# The spaCy docs list the remaining POS tags
for token in doc2:
    print(token.text,token.pos_,token.dep_)

Tesla PROPN nsubj
is VERB aux
n't ADV neg
looking VERB ROOT
into ADP prep
      SPACE 
startups NOUN pobj
anymore ADV advmod

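The `SPACE` token in the output above comes from the run of extra spaces in the input. If you want to skip such tokens, spaCy tokens provide an `is_space` flag. A minimal sketch, where `visible_tokens` is a hypothetical helper name:

```python
def visible_tokens(doc):
    """Return the text of every token that is not purely whitespace."""
    return [token.text for token in doc if not token.is_space]
```

Applied to `doc2`, this should return the seven word tokens and leave out the whitespace token.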

In [20]:
doc2[0].pos

95

In [21]:
doc2[0].pos_

'PROPN'

In [22]:
# Syntactic dependency label
doc2[0].dep_

'nsubj'

In [23]:
# Additional token attributes:
#   .text    the original text of the token
#   .lemma_  the base form of the word (.lemma is its integer ID)
# Dealing with a long string and grabbing a span of it
doc3 = nlp(u'Although commonly attributed to John Lennon from his song "Beautiful Boy", \
the phrase "Life is what happens to us while we are making other plans" was written by \
cartoonist Allen Saunders and published in Reader\'s Digest in 1957, when Lennon was 17.')

In [24]:
life_quote = doc3[16:30]

In [25]:
print(life_quote)

"Life is what happens to us while we are making other plans"


In [26]:
# Slicing a Doc returns a Span object
type(life_quote)

spacy.tokens.span.Span

In [27]:
type(doc3)

spacy.tokens.doc.Doc
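
The slice `doc3[16:30]` uses token positions, not character positions, and the two numbers have to be found by counting tokens. A hedged alternative is to locate the quotation-mark tokens and derive the slice boundaries from them. The hypothetical helper below assumes that every `"` in the text is its own token and that quotes come in opening and closing pairs, as they do in `doc3`.

```python
def quoted_slices(doc):
    """Return (start, end) token indices for each quoted passage, quotes included."""
    quote_positions = [i for i, token in enumerate(doc) if token.text == '"']
    # Pair positions as (opening, closing); end is exclusive, hence the + 1.
    openings = quote_positions[0::2]
    closings = quote_positions[1::2]
    return [(start, end + 1) for start, end in zip(openings, closings)]
```

For `doc3` this should yield two slices, the second of which corresponds to `doc3[16:30]`. Each pair can be used directly as `doc3[start:end]`.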

In [28]:
# Sentence segmentation

doc4 = nlp(u"This is first sentence. This is another sentence. This is last sentence.")

In [30]:
# Print each sentence
for sentence in doc4.sents:
    print(sentence)

This is first sentence.
This is another sentence.
This is last sentence.


In [31]:
doc4[6]

is

In [35]:
# Returns None (so no output is shown) because this token is not the start of a sentence
doc4[6].is_sent_start

In [40]:
# Returns True because this token starts the second sentence
doc4[5].is_sent_start

True
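
In the cells above, `is_sent_start` produced no output for a token inside a sentence (the value was `None`, which a notebook does not display) and `True` for a token that starts one. When you need a plain boolean, or the positions where sentences begin, you can normalise the value yourself. A minimal sketch, where `sentence_start_indices` is a hypothetical helper:

```python
def sentence_start_indices(doc):
    """Return the positions of tokens flagged as starting a sentence."""
    # bool() maps both None and False to False, so only True flags are kept.
    return [i for i, token in enumerate(doc) if bool(token.is_sent_start)]
```

Each returned index can be used to look up the token, for example `doc4[5]` for the first token of the second sentence.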