# spaCy for NLP
- https://domino.ai/data-science-dictionary/spacy
- https://www.geeksforgeeks.org/tokenization-using-spacy-library/
- https://spacy.io/usage/spacy-101

In [3]:
#!pip install spacy

In [14]:
#!python -m spacy download en_core_web_sm

In [37]:
#!python -m spacy download en  # the "en" shortcut is deprecated in spaCy v3; use the full package name en_core_web_sm

In [10]:
import spacy

In [6]:
nlp = spacy.blank('en')

In [23]:
doc = nlp('Apple is looking at buying U.K. startup for $1 billion')
doc

Apple is looking at buying U.K. startup for $1 billion

In [24]:
for token in doc:
    print(token)

Apple
is
looking
at
buying
U.K.
startup
for
$
1
billion
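
The tokenizer above splits `$` off `1` but keeps `U.K.` as one token. According to the spaCy docs, it first splits on whitespace and then applies prefix, suffix and infix rules plus special-case exceptions. Below is a minimal sketch of adding your own exception with `nlp.tokenizer.add_special_case`, reusing the "gimme" example from the spaCy docs. It assumes spaCy v3 and uses only the blank English pipeline, so no model download is needed. `nlp.tokenizer.explain` prints which rule produced each token.

```python
import spacy
from spacy.symbols import ORTH

nlp = spacy.blank("en")  # tokenizer only

# Special case: the exact string "gimme" is always split into "gim" + "me"
# (the ORTH pieces must concatenate back to the original string)
nlp.tokenizer.add_special_case("gimme", [{ORTH: "gim"}, {ORTH: "me"}])

tokens = [t.text for t in nlp("gimme that")]
print(tokens)  # ['gim', 'me', 'that']

# Debugging aid: which tokenizer rule produced each token of the original sentence
for rule, text in nlp.tokenizer.explain("Apple is looking at buying U.K. startup for $1 billion"):
    print(rule, text)
```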


In [25]:
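nlp = spacy.load("en_core_web_sm")
# Re-process the sentence with the trained pipeline: the earlier doc came from
# spacy.blank('en'), which only tokenizes (no tags, parse or entities)
doc = nlp('Apple is looking at buying U.K. startup for $1 billion')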

In [26]:
nlp.pipe_names

['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']
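
`pipe_names` lists the components that run after the tokenizer, in order. A blank pipeline starts with none, and components can be added or temporarily switched off. The following is a sketch assuming spaCy v3, where `select_pipes` disables components inside a `with` block and `pipe_names` lists only the active ones. The built-in `sentencizer` is used because it needs no downloaded model.

```python
import spacy

nlp = spacy.blank("en")
print(nlp.pipe_names)  # [] (the tokenizer itself is not listed)

# Add spaCy's built-in rule-based sentence splitter
nlp.add_pipe("sentencizer")
print(nlp.pipe_names)  # ['sentencizer']

# Temporarily disable components, e.g. to skip work that isn't needed
with nlp.select_pipes(disable=["sentencizer"]):
    active_inside = list(nlp.pipe_names)  # disabled components are not listed
    doc = nlp("Processed without the sentencizer.")

# The disabled component is restored when the with-block exits
active_after = list(nlp.pipe_names)
print(active_inside, active_after)
```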

In [27]:
for token in doc:
    print(token.text, end=', ')

Apple, is, looking, at, buying, U.K., startup, for, $, 1, billion, 
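
Printing with `end=', '` leaves a trailing separator after the last token. This sketch uses the blank English pipeline, so no model is required. It joins the token texts instead. It also checks that tokenization is non-destructive: `token.text_with_ws` keeps each token's trailing whitespace, so the original string can be rebuilt exactly.

```python
import spacy

nlp = spacy.blank("en")
text = "Apple is looking at buying U.K. startup for $1 billion"
doc = nlp(text)

# str.join puts the separator only *between* tokens
joined = ", ".join(token.text for token in doc)
print(joined)

# Each token remembers whether whitespace followed it, so nothing is lost
rebuilt = "".join(token.text_with_ws for token in doc)
print(rebuilt == text)  # True
```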

In [28]:
for token in doc:
    print(token.text, token.pos_, token.dep_)

Apple PROPN nsubj
is AUX aux
looking VERB ROOT
at ADP prep
buying VERB pcomp
U.K. PROPN dobj
startup NOUN dep
for ADP prep
$ SYM quantmod
1 NUM compound
billion NUM pobj
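
`token.dep_` is the label of the arc from the token to its syntactic head. The parse forms a tree that can be walked with `token.head`, `token.children` and `token.subtree`. To show this without the trained parser, the sketch below builds a `Doc` from a hand-written annotation. The sentence, heads and labels are hypothetical and are not model output. It assumes spaCy v3, where `heads` are absolute token indices.

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")

# Hypothetical hand-written parse, for illustration only (not produced by a model)
words = ["I", "like", "green", "apples"]
heads = [1, 1, 3, 1]  # absolute index of each token's head; the root ("like") heads itself
deps = ["nsubj", "ROOT", "amod", "dobj"]
doc = Doc(nlp.vocab, words=words, heads=heads, deps=deps)

for token in doc:
    # relation label, head word, and direct dependents
    print(token.text, token.dep_, token.head.text, [child.text for child in token.children])

root = next(token for token in doc if token.dep_ == "ROOT")
# subtree yields the token plus all of its descendants, in document order
object_phrase = [t.text for t in doc[3].subtree]
print(root.text, object_phrase)  # like ['green', 'apples']
```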


In [29]:
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.tag_, token.dep_, token.shape_, token.is_alpha, token.is_stop)

Apple Apple PROPN NNP nsubj Xxxxx True False
is be AUX VBZ aux xx True True
looking look VERB VBG ROOT xxxx True False
at at ADP IN prep xx True True
buying buy VERB VBG pcomp xxxx True False
U.K. U.K. PROPN NNP dobj X.X. False False
startup startup NOUN NN dep xxxx True False
for for ADP IN prep xxx True True
$ $ SYM $ quantmod $ False False
1 1 NUM CD compound d False False
billion billion NUM CD pobj xxxx True False
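
Only some of these attributes need a trained model. `lemma_`, `pos_`, `tag_` and `dep_` are set by pipeline components, while `shape_`, `is_alpha` and `is_stop` are lexical attributes computed from the language data. The sketch below shows the difference using the blank English pipeline. It assumes spaCy v3. `like_num` is an extra lexical attribute not printed above.

```python
import spacy

nlp = spacy.blank("en")  # tokenizer only: no tagger, parser or lemmatizer
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")

for token in doc:
    # Lexical attributes: available without any trained component
    print(token.text, token.shape_, token.is_alpha, token.is_stop, token.like_num)

# Model-dependent attributes stay empty without the corresponding components
print(repr(doc[0].pos_), repr(doc[0].dep_))  # '' ''
```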


## Named Entities
- A named entity is a “real-world object” that’s assigned a name – for example, a person, a country, a product or a book title. spaCy can recognize various types of named entities in a document, by asking the model for a prediction. Because models are statistical and strongly depend on the examples they were trained on, this doesn’t always work perfectly and might need some tuning later, depending on your use case.

In [30]:
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)

Apple 0 5 ORG
U.K. 27 31 GPE
$1 billion 44 54 MONEY
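
`start_char` and `end_char` are character offsets into `doc.text`, while `ent.start` and `ent.end` are token indices. The sketch below runs on a blank pipeline with no NER model. It sets the three entities printed above by hand using `Doc.char_span`. Under its default strict alignment, `char_span` returns `None` when the offsets don't fall on token boundaries.

```python
import spacy

nlp = spacy.blank("en")  # no NER component; entities are set by hand below
text = "Apple is looking at buying U.K. startup for $1 billion"
doc = nlp(text)

# (start_char, end_char, label) copied from the output above
offsets = [(0, 5, "ORG"), (27, 31, "GPE"), (44, 54, "MONEY")]
doc.ents = [doc.char_span(start, end, label=label) for start, end, label in offsets]

for ent in doc.ents:
    # character offsets into doc.text, then token indices into doc
    print(ent.text, ent.start_char, ent.end_char, ent.start, ent.end, ent.label_)

# "App" ends in the middle of the token "Apple", so no span can be created
print(doc.char_span(0, 3))  # None
```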


## Visualise Dependency Parses and Entities
- https://spacy.io/usage/visualizers

In [32]:
from spacy import displacy

In [34]:
displacy.serve(doc, style="dep", auto_select_port=True)




Using the 'dep' visualizer
Serving on http://0.0.0.0:5001 ...

Shutting down server on port 5001.


In [38]:
sentence_spans = list(doc.sents)
displacy.serve(sentence_spans, style="dep", auto_select_port=True)


Using the 'dep' visualizer
Serving on http://0.0.0.0:5001 ...

Shutting down server on port 5001.
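
`doc.sents` relies on the parser in `en_core_web_sm`. On a blank pipeline it raises an error because no sentence boundaries are set. If a full parse isn't needed, spaCy's rule-based `sentencizer` component is a lighter alternative. The following is a sketch assuming spaCy v3:

```python
import spacy

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")  # rule-based: a sentence ends after punctuation such as . ! ?

doc = nlp("This is a sentence. This is another one.")
sentences = [sent.text for sent in doc.sents]
print(sentences)  # ['This is a sentence.', 'This is another one.']
```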


In [39]:
displacy.serve(doc, style="ent", auto_select_port=True)


Using the 'ent' visualizer
Serving on http://0.0.0.0:5001 ...

Shutting down server on port 5001.


In [40]:
# Add a title above the visualisation (displaCy reads doc.user_data["title"])
doc.user_data["title"] = "This is a title"
displacy.serve(doc, style="ent", auto_select_port=True)




Using the 'ent' visualizer
Serving on http://0.0.0.0:5002 ...

Shutting down server on port 5002.
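
`displacy.serve` starts a local web server. Inside Jupyter, spaCy recommends `displacy.render` instead, which displays the visualisation inline. Passing `jupyter=False` returns the markup as a string. The sketch below uses a blank pipeline with one hand-set entity, which is not model output. It also includes the title set via `doc.user_data`.

```python
import spacy
from spacy import displacy

nlp = spacy.blank("en")
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")
doc.ents = [doc.char_span(0, 5, label="ORG")]  # hand-set entity, for illustration only
doc.user_data["title"] = "This is a title"

# jupyter=False: return the HTML as a string instead of displaying it
html = displacy.render(doc, style="ent", jupyter=False)
print(html[:200])
```

The returned string can be saved to an `.html` file and opened in a browser. With `style="dep"`, the doc needs a dependency parse.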


In [45]:
#!python -m spacy download en_core_web_lg
#!python -m spacy download en_core_web_md

In [46]:
nlp = spacy.load("en_core_web_md")
tokens = nlp("dog cat banana afskfsd")

for token in tokens:
    print(token.text, token.has_vector, token.vector_norm, token.is_oov)

dog True 75.254234 False
cat True 63.188496 False
banana True 31.620354 False
afskfsd False 0.0 True
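
`token.vector_norm` is the Euclidean (L2) norm of `token.vector`. `afskfsd` has no vector, so its norm is 0.0 and `is_oov` is True. spaCy's `similarity()` methods (e.g. `tokens[0].similarity(tokens[1])` for dog/cat) default to the cosine similarity of these vectors. They return 0.0, with a warning, when a vector is empty. Below is a pure-Python sketch of both formulas on toy 2-d vectors. The `en_core_web_md` vectors have 300 dimensions.

```python
import math

def vector_norm(v):
    # Euclidean (L2) norm: sqrt of the sum of squared components
    return math.sqrt(sum(x * x for x in v))

def cosine_similarity(a, b):
    # Cosine of the angle between a and b; 0.0 if either vector is all zeros
    na, nb = vector_norm(a), vector_norm(b)
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

# Toy vectors for illustration only
print(vector_norm([3.0, 4.0]))                    # 5.0
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))  # ~1.0 (same direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
print(cosine_similarity([1.0, 2.0], [0.0, 0.0]))  # 0.0 (zero vector, like "afskfsd")
```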
