# spaCy

By Alberto Valdés.

**Mail 1:** anvaldes@uc.cl

**Mail 2:** alberto.valdes.gonzalez.96@gmail.com

In [1]:
import warnings
warnings.filterwarnings("ignore")

In [2]:
import time
import spacy
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.pyplot import figure
from matplotlib import image as mpimg

spaCy is an open-source library for natural language processing, written in Python and Cython by Matt Honnibal. It provides tokenization, part-of-speech tagging, dependency parsing, named entity recognition, word vectors and similarity, and visualization of these annotations. It was first released in February 2015 and is still actively developed and used in a wide range of environments.

In [3]:
start = time.time()

### 1. Linguistic annotations

In [4]:
!python -m spacy download en_core_web_sm

Collecting en-core-web-sm==3.7.1
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl (12.8 MB)
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')


In [5]:
nlp = spacy.load("en_core_web_sm")

In [6]:
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")

In [7]:
for token in doc:
    # Text, coarse part-of-speech tag and syntactic dependency label of each token
    print("Text:", token.text)
    print("Pos:", token.pos_)
    print("Dep:", token.dep_)
    print("-"*50)

Text: Apple
Pos: PROPN
Dep: nsubj
--------------------------------------------------
Text: is
Pos: AUX
Dep: aux
--------------------------------------------------
Text: looking
Pos: VERB
Dep: ROOT
--------------------------------------------------
Text: at
Pos: ADP
Dep: prep
--------------------------------------------------
Text: buying
Pos: VERB
Dep: pcomp
--------------------------------------------------
Text: U.K.
Pos: PROPN
Dep: dobj
--------------------------------------------------
Text: startup
Pos: NOUN
Dep: dep
--------------------------------------------------
Text: for
Pos: ADP
Dep: prep
--------------------------------------------------
Text: $
Pos: SYM
Dep: quantmod
--------------------------------------------------
Text: 1
Pos: NUM
Dep: compound
--------------------------------------------------
Text: billion
Pos: NUM
Dep: pobj
--------------------------------------------------
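Because every token carries a coarse part-of-speech tag (`pos_`) and a dependency label (`dep_`), selecting tokens by annotation is a plain filter. The sketch below copies the triples printed above into a hypothetical `annotations` list so it runs without loading a model; with a real `Doc`, the same comprehensions would iterate over `doc` and read `token.pos_` and `token.dep_` instead.

```python
# (text, pos_, dep_) triples copied from the output above
annotations = [
    ("Apple", "PROPN", "nsubj"),
    ("is", "AUX", "aux"),
    ("looking", "VERB", "ROOT"),
    ("at", "ADP", "prep"),
    ("buying", "VERB", "pcomp"),
    ("U.K.", "PROPN", "dobj"),
    ("startup", "NOUN", "dep"),
    ("for", "ADP", "prep"),
    ("$", "SYM", "quantmod"),
    ("1", "NUM", "compound"),
    ("billion", "NUM", "pobj"),
]

# The ROOT token is the syntactic head of the whole sentence (here the main verb)
root = [text for text, pos, dep in annotations if dep == "ROOT"]

# Proper nouns and verbs, selected by their coarse POS tag
proper_nouns = [text for text, pos, dep in annotations if pos == "PROPN"]
verbs = [text for text, pos, dep in annotations if pos == "VERB"]

print("Root:", root)                  # Root: ['looking']
print("Proper nouns:", proper_nouns)  # Proper nouns: ['Apple', 'U.K.']
print("Verbs:", verbs)                # Verbs: ['looking', 'buying']
```

If a label such as `pcomp` or `quantmod` is unfamiliar, `spacy.explain(label)` returns a short human-readable description of it.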


### 2. Tokenization

In [8]:
nlp = spacy.load("en_core_web_sm")

In [9]:
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")

In [10]:
# Print every token produced by the tokenizer
for token in doc:
    print(token.text)

Apple
is
looking
at
buying
U.K.
startup
for
$
1
billion
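spaCy's tokenizer first splits the text on whitespace and then, for each chunk, checks special-case exceptions (so "U.K." stays whole) and splits off prefixes and suffixes (so "$1" becomes "$" and "1"). The pure-Python sketch below imitates that idea with tiny, hypothetical rule sets (`EXCEPTIONS`, `PREFIXES`, `SUFFIXES`) and a hypothetical `toy_tokenize` function; the real rules are far larger and also handle infixes, so treat it as an illustration rather than spaCy's implementation.

```python
# Hypothetical, heavily simplified rule sets (spaCy's real ones are much larger)
EXCEPTIONS = {"U.K.", "Dr.", "e.g."}       # special cases kept as one token
PREFIXES = ("$", "(", '"')                 # characters split off the front
SUFFIXES = (".", ",", "!", "?", ")", '"')  # characters split off the end


def toy_tokenize(text):
    tokens = []
    for chunk in text.split():  # step 1: split on whitespace
        prefixes, suffixes = [], []
        # step 2: peel off prefixes/suffixes until the chunk is an exception
        # or nothing more can be split
        while chunk not in EXCEPTIONS:
            if len(chunk) > 1 and chunk[0] in PREFIXES:
                prefixes.append(chunk[0])
                chunk = chunk[1:]
            elif len(chunk) > 1 and chunk[-1] in SUFFIXES:
                suffixes.insert(0, chunk[-1])
                chunk = chunk[:-1]
            else:
                break
        tokens.extend(prefixes)
        tokens.append(chunk)
        tokens.extend(suffixes)
    return tokens


print(toy_tokenize("Apple is looking at buying U.K. startup for $1 billion"))
# ['Apple', 'is', 'looking', 'at', 'buying', 'U.K.', 'startup', 'for', '$', '1', 'billion']
```

Checking exceptions before stripping suffixes is what keeps the final period of "U.K." attached, while the same period is split off a sentence-final word such as "hamburgers.".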


### 3. Named Entities

In [11]:
nlp = spacy.load("en_core_web_sm")

In [12]:
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")

In [13]:
# Entity text, character offsets (end is exclusive) and entity label
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)

Apple 0 5 ORG
U.K. 27 31 GPE
$1 billion 44 54 MONEY
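`start_char` and `end_char` are character offsets into the original string, with `end_char` exclusive, so `text[start_char:end_char]` gives back the entity text. The sketch below checks that against the offsets printed above and then uses them to mark the entities inline. `entities` and `bracket_entities` are hypothetical names, and the `[text](LABEL)` markup is an arbitrary display choice (spaCy's own visualizer for entities is displaCy).

```python
text = "Apple is looking at buying U.K. startup for $1 billion"

# (text, start_char, end_char, label_) tuples copied from the output above
entities = [("Apple", 0, 5, "ORG"), ("U.K.", 27, 31, "GPE"), ("$1 billion", 44, 54, "MONEY")]

# end_char is exclusive, so a plain slice recovers each entity's text
for ent_text, start, end, label in entities:
    assert text[start:end] == ent_text


def bracket_entities(text, entities):
    """Render entities as [text](LABEL), editing right to left so earlier offsets stay valid."""
    for ent_text, start, end, label in sorted(entities, key=lambda e: e[1], reverse=True):
        text = text[:start] + f"[{ent_text}]({label})" + text[end:]
    return text


print(bracket_entities(text, entities))
# [Apple](ORG) is looking at buying [U.K.](GPE) startup for [$1 billion](MONEY)
```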


### 4. Word Vectors and Similarity

In [14]:
!python -m spacy download en_core_web_md

Collecting en-core-web-md==3.7.1
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_md-3.7.1/en_core_web_md-3.7.1-py3-none-any.whl (42.8 MB)
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_md')


In [15]:
nlp = spacy.load("en_core_web_md")

In [16]:
tokens = nlp("dog cat banana afskfsd")

In [17]:
# Whether each token has a vector, the vector's L2 norm, and whether it is out of vocabulary
for token in tokens:
    print(token.text, token.has_vector, token.vector_norm, token.is_oov)

dog True 75.254234 False
cat True 63.188496 False
banana True 31.620354 False
afskfsd False 0.0 True
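`vector_norm` is the L2 (Euclidean) norm of the token's vector, and a word missing from the vector table gets an all-zero vector, which is why `afskfsd` reports a norm of 0.0 and `is_oov` True. The NumPy sketch below reproduces that bookkeeping with a hypothetical 4-dimensional table (`vectors`, `lookup`); the vectors in `en_core_web_md` have 300 dimensions, so the numbers are illustrative only.

```python
import numpy as np

DIM = 4  # hypothetical; en_core_web_md uses 300-dimensional vectors

# Hypothetical vector table: word -> vector
vectors = {
    "dog": np.array([1.0, 2.0, 2.0, 0.0]),
    "cat": np.array([0.0, 3.0, 4.0, 0.0]),
}


def lookup(word):
    # Words missing from the table fall back to an all-zero vector
    return vectors.get(word, np.zeros(DIM))


for word in ["dog", "cat", "afskfsd"]:
    vec = lookup(word)
    norm = float(np.linalg.norm(vec))  # square root of the sum of squared components
    # columns mimic: text, has_vector, vector_norm, is_oov
    print(word, word in vectors, norm, word not in vectors)
# dog True 3.0 False
# cat True 5.0 False
# afskfsd False 0.0 True
```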


In [18]:
nlp = spacy.load("en_core_web_md")

In [19]:
doc1 = nlp("I like salty fries and hamburgers.")

In [20]:
doc2 = nlp("Fast food tastes very good.")

In [21]:
print(doc1, "<->", doc2, doc1.similarity(doc2))

I like salty fries and hamburgers. <-> Fast food tastes very good. 0.691649353055761


In [22]:
french_fries = doc1[2:4]  # Span covering tokens 2 and 3: "salty fries"

In [23]:
burgers = doc1[5]  # Single token: "hamburgers"

In [24]:
print(french_fries, "<->", burgers, french_fries.similarity(burgers))

salty fries <-> hamburgers 0.6938489079475403
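By default, spaCy's `similarity` is the cosine similarity of the two objects' vectors, and a `Doc` or `Span` vector defaults to the average of its token vectors; that is how the span `doc1[2:4]` ("salty fries") can be compared with the single token `doc1[5]`. The NumPy sketch below spells this out with hypothetical 3-dimensional vectors, so its score will not match the values printed above. `average_vector` and `cosine_similarity` are hypothetical helpers; returning 0.0 when either vector is all zeros guards against division by zero, and to my knowledge spaCy also returns 0.0 (with a warning) in that case.

```python
import numpy as np


def average_vector(token_vectors):
    # Mean of the token vectors, the default vector of a Doc or Span
    return np.mean(np.stack(token_vectors), axis=0)


def cosine_similarity(a, b):
    norm_a, norm_b = np.linalg.norm(a), np.linalg.norm(b)
    if norm_a == 0 or norm_b == 0:
        return 0.0  # avoid dividing by zero for empty / out-of-vocabulary vectors
    return float(np.dot(a, b) / (norm_a * norm_b))


# Hypothetical 3-dimensional token vectors
salty = np.array([1.0, 0.0, 1.0])
fries = np.array([0.0, 1.0, 1.0])
hamburgers = np.array([0.0, 1.0, 0.0])

span_vector = average_vector([salty, fries])  # plays the role of doc1[2:4]
score = cosine_similarity(span_vector, hamburgers)
print(score)  # 1 / sqrt(6), roughly 0.408
```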


### 5. Vocab, Hashes and Lexemes

In [25]:
nlp = spacy.load("en_core_web_sm")

In [26]:
doc = nlp("I love coffee")

In [27]:
print(doc.vocab.strings["coffee"])  # string -> hash

3197928453018144401


In [28]:
print(doc.vocab.strings[3197928453018144401])  # hash -> string

coffee
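spaCy stores each string once in the vocabulary's `StringStore` and passes around a 64-bit hash of it, and the store works in both directions: string to hash and hash to string. The sketch below is a hypothetical `ToyStringStore` built on Python's `hashlib.blake2b` with an 8-byte digest; spaCy uses a 64-bit MurmurHash instead, so the numbers it produces differ from the 3197928453018144401 printed above. As far as I know, spaCy behaves like the sketch in one important respect: it can hash any string, but it can only turn a hash back into a string it has already seen.

```python
import hashlib


class ToyStringStore:
    """Hypothetical two-way string <-> 64-bit hash table, loosely modeled on spaCy's StringStore."""

    def __init__(self):
        self._hash_to_str = {}

    @staticmethod
    def hash_string(text):
        # Stand-in 64-bit hash (8-byte BLAKE2b digest), not spaCy's MurmurHash
        digest = hashlib.blake2b(text.encode("utf-8"), digest_size=8).digest()
        return int.from_bytes(digest, "little")

    def add(self, text):
        key = self.hash_string(text)
        self._hash_to_str[key] = text  # remember the string so the hash can be reversed
        return key

    def __getitem__(self, key):
        if isinstance(key, str):
            return self.hash_string(key)  # string -> hash works for any string
        return self._hash_to_str[key]     # hash -> string only for added strings (else KeyError)


store = ToyStringStore()
coffee_hash = store.add("coffee")
print(store["coffee"] == coffee_hash)  # True
print(store[coffee_hash])              # coffee
try:
    store[store["tea"]]                # "tea" was hashed but never added
except KeyError:
    print("unknown hash")
```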


### Time of execution

In [29]:
end = time.time()

In [30]:
delta = end - start

# Split the elapsed time into whole hours, minutes and seconds
hours, remainder = divmod(int(delta), 3600)
mins, secs = divmod(remainder, 60)
print(f'Executing this notebook took {hours} hours, {mins} minutes and {secs} seconds.')

Executing this notebook took 0 hours, 0 minutes and 8 seconds.
