# NLP - spaCy ML capabilities
In this project we explore some of the lightweight machine learning capabilities of the spaCy package, using the medium-sized English pipeline en_core_web_md, which ships with word vectors.

In [1]:
import spacy

In [2]:
# Download the en_core_web_md pipeline (only needed once per environment)
!python -m spacy download en_core_web_md

Collecting en-core-web-md==3.7.1
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_md-3.7.1/en_core_web_md-3.7.1-py3-none-any.whl (42.8 MB)
Installing collected packages: en-core-web-md
Successfully installed en-core-web-md-3.7.1
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_md')


In [3]:
nlp = spacy.load("en_core_web_md")

In [4]:
with open("/Users/williamearley/Personal Projects/NLP/data/wiki_us.txt", "r", encoding="utf-8") as f:
    text = f.read()

In [5]:
doc = nlp(text)
sentence1 = list(doc.sents)[0]
print(sentence1)

The United States of America (U.S.A. or USA), commonly known as the United States (U.S. or US) or America, is a country primarily located in North America.


In [6]:
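import numpy as np

our_word = "country"

# Look up the vector for our_word and ask the vector table for its 10 nearest neighbours.
query = np.asarray([nlp.vocab.vectors[nlp.vocab.strings[our_word]]])
ms = nlp.vocab.vectors.most_similar(query, n=10)
# most_similar returns (keys, best_rows, scores); keys are hashes, so map them back to strings.
words = [nlp.vocab.strings[w] for w in ms[0][0]]
scores = ms[2]  # similarity scores (higher means closer), not distances
print(words)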

['country—0,467', 'nationâ\x80\x99s', 'countries-', 'continente', 'Carnations', 'pastille', 'бесплатно', 'Argents', 'Tywysogion', 'Teeters']
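
To make the lookup above less of a black box, here is a minimal sketch of a nearest-neighbour ranking by cosine similarity, written in plain Python. It is not spaCy's implementation: the `toy_vectors` table, its 3-dimensional values, and the `cosine` and `most_similar` helpers are all hypothetical and exist only for illustration (the real en_core_web_md vectors have 300 dimensions).

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy vectors, invented for this sketch (not taken from the model).
toy_vectors = {
    "country": [0.9, 0.1, 0.0],
    "nation": [0.8, 0.2, 0.1],
    "fries": [0.0, 0.9, 0.3],
    "shoes": [0.1, 0.0, 0.9],
}

def most_similar(word, vectors, n=2):
    """Rank every key by cosine similarity to `word`, highest first."""
    query = vectors[word]
    scored = [(other, cosine(query, vec)) for other, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]

neighbours = most_similar("country", toy_vectors, n=2)
print(neighbours)
```

In this sketch the query word ranks first, because a vector's cosine similarity with itself is 1, and "nation" comes second because its toy vector points in nearly the same direction.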


In [7]:
doc1 = nlp("I like salty fries and hamburgers.")
doc2 = nlp("Fast food tastes very good.")

In [8]:
print(doc1, "<->", doc2, doc1.similarity(doc2))

I like salty fries and hamburgers. <-> Fast food tastes very good. 0.691649353055761


In [9]:
doc3 = nlp("Good morning world, how are you?")

In [10]:
print(doc1, "<->", doc3, doc1.similarity(doc3))

I like salty fries and hamburgers. <-> Good morning world, how are you? 0.6900988218109563


In [11]:
doc4 = nlp("I enjoy oranges.")
doc5 = nlp("I enjoy apples.")

In [12]:
print(doc4, "<->", doc5, doc4.similarity(doc5))

I enjoy oranges. <-> I enjoy apples. 0.977570143948367


In [13]:
doc6 = nlp("I enjoy shoes.")

In [14]:
print(doc4, "<->", doc6, doc4.similarity(doc6))

I enjoy oranges. <-> I enjoy shoes. 0.9522877873069583
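
The high score for "oranges" versus "shoes" is less surprising once you recall that, by default, spaCy's `Doc.similarity` compares the averaged token vectors of the two documents using cosine similarity. The sketch below illustrates that effect with hypothetical 3-dimensional vectors (the `toy` table and the helper functions are assumptions made for this example, and punctuation tokens are ignored): two short sentences that share most of their words end up with similar averages even when the differing words are far apart.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def average(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

# Hypothetical token vectors, invented for this sketch (not taken from the model).
toy = {
    "I": [0.5, 0.5, 0.0],
    "enjoy": [0.4, 0.6, 0.1],
    "oranges": [0.0, 0.9, 0.2],
    "shoes": [0.1, 0.0, 0.9],
}

def doc_vector(tokens):
    """Document vector as the average of its token vectors."""
    return average([toy[t] for t in tokens])

fruit = doc_vector(["I", "enjoy", "oranges"])
footwear = doc_vector(["I", "enjoy", "shoes"])

word_sim = cosine(toy["oranges"], toy["shoes"])  # the differing words alone
doc_sim = cosine(fruit, footwear)                # the averaged sentences
print(f"word-level: {word_sim:.3f}, document-level: {doc_sim:.3f}")
```

Here the shared tokens "I" and "enjoy" dominate both averages, so the document-level score is much higher than the word-level one. For very short sentences it can therefore be more informative to compare the distinguishing tokens directly.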
