In [1]:
#!pip install spacy

# Why we can use spaCy instead of NLTK

##### <font color='green'>For basic NLP tasks, spaCy is generally faster and more efficient than NLTK. The trade-off is that the user cannot choose between alternative algorithmic implementations.</font>

## Let's work with spaCy now

In [2]:
import spacy

In [3]:
#!python -m spacy download en_core_web_lg
#!python -m spacy download en_core_web_sm

If the download commands above are not run (uncomment them first), `en_core_web_sm`, the model we use below, will fail to load.
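If you are not sure whether the model is installed, a defensive load can help. The sketch below assumes `spacy.util.is_package` reports whether the model package is present, and falls back to a blank English pipeline (tokenizer only, no tagger or parser) when it is not. The fallback and the name `nlp_checked` are illustrative choices, not part of the original notebook.

```python
import spacy
from spacy.util import is_package

MODEL_NAME = "en_core_web_sm"  # the model used in the rest of this notebook

if is_package(MODEL_NAME):
    nlp_checked = spacy.load(MODEL_NAME)
else:
    # Assumption: a tokenizer-only pipeline is an acceptable stand-in here.
    print(f"{MODEL_NAME} is not installed; run: python -m spacy download {MODEL_NAME}")
    nlp_checked = spacy.blank("en")

# Lists the pipeline components that were actually loaded
print(nlp_checked.pipe_names)
```

With the blank fallback, tokenisation still works, but `pos_` and `dep_` stay empty because no tagger or parser is present.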

In [4]:
nlp=spacy.load('en_core_web_sm')

In [5]:
s=nlp(u'GFG is looking for data science interns')

**Simple tokenisation**

In [6]:
for token in s:
    print(token)

GFG
is
looking
for
data
science
interns


In [7]:
for token in s:
    print(token.text)  # .text returns the token as a plain Python string

GFG
is
looking
for
data
science
interns


In [8]:
s=nlp(u'The cost of Iphone in U.K is 699$')

In [9]:
for token in s:
    print(token.text)

The
cost
of
Iphone
in
U.K
is
699
$


**spaCy is smart enough to keep U.K as a single token, while 699 and $ are split into separate tokens**
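Tokenisation in spaCy is non-destructive: each token remembers its character offset and trailing whitespace, so the original string can be rebuilt. The sketch below uses `spacy.blank("en")`, a tokenizer-only pipeline, on the assumption that it applies the same English tokenizer rules as `en_core_web_sm`; the variable names are illustrative.

```python
import spacy

tokenizer_only = spacy.blank("en")  # tokenizer only; no model download needed
text = "The cost of Iphone in U.K is 699$"
doc = tokenizer_only(text)

for token in doc:
    # idx is the character offset of the token in the original string;
    # whitespace_ is the whitespace that follows the token, if any
    print(token.idx, repr(token.text), repr(token.whitespace_))

# Joining every token with its trailing whitespace restores the input
rebuilt = "".join(token.text_with_ws for token in doc)
print(rebuilt == text)
```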

### <font color='blue'>Some more functions</font>

**pos** - the part of speech of the token. It prints an integer ID; we will see later how these IDs map to part-of-speech names.

In [10]:
for token in s:
    print(token.text,token.pos)

The 90
cost 92
of 85
Iphone 96
in 85
U.K 96
is 87
699 93
$ 99


**If we want the readable name of the part of speech, we use <font color='red'> pos_ </font>, with an underscore at the end**

In [11]:
for token in s:
    print(token.text,token.pos_)

The DET
cost NOUN
of ADP
Iphone PROPN
in ADP
U.K PROPN
is AUX
699 NUM
$ SYM
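The integer from `token.pos` and the string from `token.pos_` are two views of the same value. As a rough sketch, the IDs can be turned back into names through the vocabulary's string store, and the same IDs are available as constants in `spacy.symbols`. A blank English pipeline is used here only to get a vocabulary without downloading a model; that choice is an assumption of this example.

```python
import spacy
from spacy import symbols

# Tokenizer-only pipeline; only its string store is used here
vocab = spacy.blank("en").vocab

# IDs copied from the token.pos output above
for pos_id in (90, 92, 85, 96, 87, 93, 99):
    print(pos_id, vocab.strings[pos_id])

# The same IDs are exposed as named constants
print(symbols.NOUN, symbols.PROPN)
```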


**Inference** 
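- U.K is a proper noun (PROPN)
- cost is a noun (NOUN)
- 699 is a number (NUM)
- $ is a symbol (SYM)

The remaining tokens are tagged as well: The is a determiner (DET), of and in are adpositions (ADP), Iphone is a proper noun (PROPN), and is is an auxiliary (AUX).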
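When a tag name is unfamiliar, `spacy.explain` returns a short description of it (or `None` for a label it does not know). The loop below is a small sketch that looks up the tags seen in the output above; no model is needed for this.

```python
import spacy

# Tags taken from the pos_ output above
for tag in ("DET", "NOUN", "ADP", "PROPN", "AUX", "NUM", "SYM"):
    print(tag, "->", spacy.explain(tag))
```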

**Getting the dependency label of each token with the <font color='red'> dep_ </font> attribute**

In [12]:
s=nlp(u"He isn't going to       play today")

In [13]:
for token in s:
    print(token.text,token.pos_,token.dep_)

He PRON nsubj
is AUX aux
n't PART neg
going VERB ROOT
to PART aux
       SPACE dep
play VERB xcomp
today NOUN npadvmod


**Inference**

- isn't is split into is and n't, and the two are labelled differently: is is an auxiliary (aux) and n't is treated as a negation (neg)
- Because of the extra spaces between to and play, the whitespace itself appears in the output as a separate token (SPACE)
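If the whitespace token is not wanted, it can be filtered out with the `is_space` flag, and `spacy.explain` can describe the dependency labels. This is a minimal sketch using a tokenizer-only pipeline (an assumption made so that it runs without a model); the variable names are illustrative.

```python
import spacy

tokenizer_only = spacy.blank("en")  # tokenizer only; no tagger or parser
doc = tokenizer_only("He isn't going to       play today")

# Keep only tokens that are not pure whitespace
words = [token for token in doc if not token.is_space]

print([token.text for token in doc])
print([token.text for token in words])

# Short descriptions of the dependency labels seen in the output above
for label in ("nsubj", "aux", "neg", "xcomp", "npadvmod"):
    print(label, "->", spacy.explain(label))
```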

**We can also get a specific token by indexing**

In [14]:
s[0]

He

In [15]:
s[1]

is

In [16]:
s[2]

n't

In [17]:
s[0].pos_  #giving us the part of speech of the first token

'PRON'
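Indexing works like a Python sequence: negative indices count from the end, `len()` gives the number of tokens, and a slice returns a `Span` rather than a list. The sketch below assumes a tokenizer-only pipeline, which is enough for indexing; `pos_` would be empty on it because no tagger is loaded.

```python
import spacy
from spacy.tokens import Span

tokenizer_only = spacy.blank("en")  # tokenizer only; no model download needed
doc = tokenizer_only("He isn't going to play today")

print(len(doc))   # number of tokens in the document
print(doc[-1])    # last token

# Slicing a Doc gives a Span covering those tokens
first_three = doc[0:3]
print(type(first_three).__name__, "|", first_three.text)
```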

**To print the individual sentences of a string**

In [18]:
s=nlp(u"This is the first sentence. I gave given fullstop please check. Let's study now")

In [19]:
for sentence in s.sents:
    print(sentence)

This is the first sentence.
I gave given fullstop please check.
Let's study now
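In the cell above, the sentence boundaries come from the loaded model. When only sentence splitting is needed, a lighter rule-based alternative is the `sentencizer` component, which splits on punctuation. The sketch below assumes spaCy v3, where components are added by name with `add_pipe`; it is a swapped-in technique, not what the cell above uses.

```python
import spacy

rule_based = spacy.blank("en")
# spaCy v3 style: add the built-in component by its string name
rule_based.add_pipe("sentencizer")

doc = rule_based("This is the first sentence. I gave given fullstop please check. Let's study now")

# Each sentence is a Span; .text gives it as a plain string
sentences = [sent.text for sent in doc.sents]
for sentence in sentences:
    print(sentence)
```

Because the rule only looks at punctuation, it can disagree with the model-based splitter on text with abbreviations or missing full stops.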
