<a href="https://colab.research.google.com/github/Saurav-23/Saurav_N-Edac-2020/blob/master/Spacy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
import spacy

In [None]:
nlp = spacy.load('en_core_web_sm')

# Create a Doc object
doc = nlp(u'Tesla is looking at buying U.S. startup for $6 million')

# Print each token with its coarse part-of-speech tag and dependency label
for token in doc:
  print(token.text, token.pos_, token.dep_)

Tesla PROPN nsubj
is AUX aux
looking VERB ROOT
at ADP prep
buying VERB pcomp
U.S. PROPN compound
startup NOUN dobj
for ADP prep
$ SYM quantmod
6 NUM compound
million NUM pobj


In [None]:
nlp.pipeline

[('tok2vec', <spacy.pipeline.tok2vec.Tok2Vec at 0x782af24cff40>),
 ('tagger', <spacy.pipeline.tagger.Tagger at 0x782af24cf700>),
 ('parser', <spacy.pipeline.dep_parser.DependencyParser at 0x782af24c8660>),
 ('attribute_ruler',
  <spacy.pipeline.attributeruler.AttributeRuler at 0x782af24d5d00>),
 ('lemmatizer',
  <spacy.lang.en.lemmatizer.EnglishLemmatizer at 0x782af20fbf80>),
 ('ner', <spacy.pipeline.ner.EntityRecognizer at 0x782af25a3e60>)]

In [None]:
nlp.pipe_names

['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']

In [None]:
doc2 = nlp(u"Tesla isn't looking into startups anymore.")

for token in doc2:
  print(token.text, token.pos_, token.dep_)

Tesla PROPN nsubj
is AUX aux
n't PART neg
looking VERB ROOT
into ADP prep
startups NOUN pobj
anymore ADV advmod
. PUNCT punct


In [None]:
doc2

Tesla isn't looking into startups anymore.

In [None]:
doc2[0]

Tesla

In [None]:
type(doc2)

spacy.tokens.doc.Doc

**Part of Speech Tagging (POS)**

After the text has been split into tokens, the next step is to assign a part of speech to each token. In the example above, "Tesla" was recognised as a proper noun. This step relies on statistical modeling rather than a fixed lookup: for example, words that follow "the" are typically nouns, so the surrounding context helps predict the tag.
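
To make the "words that follow 'the' are typically nouns" idea concrete, here is a minimal pure-Python sketch of that single rule. It is only a toy illustration and not how spaCy's tagger works; the helper name `guess_tags` and the placeholder tag `X` (unknown) are assumptions made for this example.

```python
def guess_tags(tokens):
    """Tag a token NOUN if it directly follows 'the', otherwise X (unknown).

    Hypothetical helper for illustration only; a real tagger weighs
    far more context than the single preceding word.
    """
    tags = []
    prev = None
    for tok in tokens:
        if prev is not None and prev.lower() == "the":
            tags.append("NOUN")
        else:
            tags.append("X")
        prev = tok
    return tags


# The rule works when a noun comes straight after "the" ...
print(list(zip(["the", "startup"], guess_tags(["the", "startup"]))))

# ... but it mislabels the adjective "first" here.
tokens = "This is the first sentence".split()
print(list(zip(tokens, guess_tags(tokens))))
```

The second call tags "first" as a noun even though it is an adjective, which is why a statistical model trained on many examples is used instead of a single hand-written rule.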


In [None]:
doc2[0].pos_

'PROPN'

In [None]:
doc2[0].dep_

'nsubj'

In [None]:
spacy.explain('PROPN')

'proper noun'

In [None]:
spacy.explain('nsubj')

'nominal subject'

In [None]:
# Lemmas (the base form of the word):
print(doc2[4].text)
print(doc2[4].lemma_)

into
into


In [None]:
# Simple parts of speech and detailed tags:
print(doc2[4].pos_)
print(doc2[4].tag_ + ' / ' + spacy.explain(doc2[4].tag_))

ADP
IN / conjunction, subordinating or preposition


In [None]:
# Word shapes (note: doc is the first Doc created above, so doc[5] is 'U.S.'):
print(doc2[0].text + ' : ' + doc2[0].shape_)
print(doc[5].text + ' : ' + doc[5].shape_)

Tesla : Xxxxx
U.S. : X.X.
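
The shapes printed above can be approximated without spaCy. The sketch below is an assumption about how such a shape string can be derived (uppercase letters become `X`, lowercase letters become `x`, digits become `d`, other characters are kept, and runs of the same shape character are cut off after four). The function name `word_shape` is a stand-alone helper written for this example; it is not a call into the library.

```python
def word_shape(text):
    """Return an approximate orthographic shape for a token string.

    Illustrative sketch only: maps characters to X / x / d and
    truncates runs of the same shape character after four.
    """
    shape = []
    last = ""
    run = 0
    for ch in text:
        if ch.isalpha():
            sc = "X" if ch.isupper() else "x"
        elif ch.isdigit():
            sc = "d"
        else:
            sc = ch  # punctuation and symbols are kept as they are
        if sc == last:
            run += 1
        else:
            run = 0
            last = sc
        if run < 4:
            shape.append(sc)
    return "".join(shape)


print(word_shape("Tesla"))  # Xxxxx
print(word_shape("U.S."))   # X.X.
```

Shapes like these are useful as features because tokens with the same capitalisation and punctuation pattern often behave alike, even when the words themselves differ.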


In [None]:
# Boolean values:
print(doc2[0].is_alpha)
print(doc2[0].is_stop)

True
False


**Sentences**

In [None]:
doc4 = nlp(u'This is the first sentence. This is another sentence. This is the last sentence.')

In [None]:
for sent in doc4.sents:
  print(sent)

This is the first sentence.
This is another sentence.
This is the last sentence.


In [None]:
doc4[6].is_sent_start

True

In [None]:
mystring = '"We\'re moving to L.A.!"'

In [None]:
print(mystring)

"We're moving to L.A.!"


In [None]:
doc = nlp(mystring)

In [None]:
for token in doc:
  print(token.text)

"
We
're
moving
to
L.A.
!
"


In [None]:
doc2 = nlp(u"We're here to help! Send snail-mail, email support@oursite.com or visit us at http://www.oursite.com")

In [25]:
for t in doc2:
  print(t)

We
're
here
to
help
!
Send
snail
-
mail
,
email
support@oursite.com
or
visit
us
at
http://www.oursite.com


In [27]:
doc3 = nlp(u"A 5 km NYC cab ride costs $10.30")

In [28]:
for t in doc3:
  print(t)

A
5
km
NYC
cab
ride
costs
$
10.30


In [34]:
doc4 = nlp(u"Let's visit St. Louis in the U.S. next year.")

In [35]:
for t in doc4:
  print(t)

Let
's
visit
St.
Louis
in
the
U.S.
next
year
.


In [36]:
len(doc4)

11

In [37]:
doc4.vocab

<spacy.vocab.Vocab at 0x782af25cd750>

In [38]:
len(doc4.vocab)

808

In [42]:
doc5 = nlp(u"It is better to give than receive.")

In [43]:
doc5[0]

It

In [44]:
doc5[2:5]

better to give

In [45]:
doc8 = nlp(u'Apple to build a Hong Kong Factory for $6 million')

In [48]:
for token in doc8:
  print(token.text, end=' | ')

Apple | to | build | a | Hong | Kong | Factory | for | $ | 6 | million | 

In [49]:
for entity in doc8.ents:
  print(entity)

Apple
Hong Kong Factory
$6 million


In [51]:
for entity in doc8.ents:
  print(entity)
  print(entity.label_)
  print(str(spacy.explain(entity.label_)))
  print('\n')

Apple
ORG
Companies, agencies, institutions, etc.


Hong Kong Factory
GPE
Countries, cities, states


$6 million
MONEY
Monetary values, including unit




In [53]:
doc9 = nlp(u'Autonomous cars shift insurance liability towards manufacturers.')

In [54]:
for chunk in doc9.noun_chunks:
  print(chunk)

Autonomous cars
insurance liability
manufacturers


**Tokenisation Visualized**

In [55]:
from spacy import displacy

In [56]:
doc = nlp(u"Apple is going to build a U.K. factory for $6 million.")

In [59]:
displacy.render(doc, style='dep', jupyter=True, options={'distance': 110})

In [60]:
doc = nlp(u"Over the last quarter Apple sold nearly 20 thousand iPods for a profit of $6 million.")

In [61]:
displacy.render(doc, style='ent', jupyter=True)

In [64]:
doc = nlp(u"This is a sentence.")
displacy.serve(doc,style='dep')


Using the 'dep' visualizer
Serving on http://0.0.0.0:5000 ...

Shutting down server on port 5000.
