## spaCy

![title](Images\Word-Cloud-Featured-Image.jpg)

## Chapter 1 - The NLP Object

spaCy provides a separate language class for each supported language. Here we use the English class to create an `nlp` object, then use that object to process sentences with the library.

<u> Contents </u> 
#### 1.1 - Printing Sentences
#### 1.2 - Indexing and Slicing Docs
#### 1.3 - Lexical Attributes
#### 1.4 - Statistical Packages



In [12]:
import spacy
from spacy.lang.en import English

# Create the nlp object
nlp = English()

# Process a text
doc = nlp("This is a sentence.")

# Print the document text
print(doc.text)

This is a sentence.


Here is the same example using the German language class, showing that the workflow is identical for other languages.

In [13]:
# Import the German language class
from spacy.lang.de import German

# Create the nlp object
nlp = German()

# Process a text (this is German for: "Kind regards!")
doc = nlp("Liebe Grüße!")

# Print the document text
print(doc.text)

Liebe Grüße!


In [14]:
# Iterating over a Doc yields its tokens one at a time
for token in doc:
    print(token)

Liebe
Grüße
!
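
As a minimal sketch of the same idea, `spacy.blank` can create a tokenizer-only pipeline from a language code instead of importing each language class by hand. The `samples` dictionary and the `blank_nlp` name are illustrative assumptions, not part of the original notebook.

```python
import spacy

# Sample texts keyed by language code (both sentences come from the cells above)
samples = {
    "en": "This is a sentence.",
    "de": "Liebe Grüße!",
}

tokens_by_lang = {}
for lang_code, text in samples.items():
    # spacy.blank() returns a blank pipeline (tokenizer only) for the language
    blank_nlp = spacy.blank(lang_code)
    doc_for_lang = blank_nlp(text)
    tokens_by_lang[lang_code] = [token.text for token in doc_for_lang]
    print(lang_code, tokens_by_lang[lang_code])
```

No trained model is needed here, because splitting text into tokens is rule-based for each language.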


#### 1.2 Indexing and Slicing Docs

In [15]:
#### Indexing
from spacy.lang.en import English
nlp = English()

# Process the text
doc = nlp("I like tree kangaroos and narwhals.")

# Select the first token
first_token = doc[0]

# Print the first token's text
print(first_token.text)


I


In [16]:
# enumerate() yields each token together with its index
for i, token in enumerate(doc):
    print(i)
    print(token)

0
I
1
like
2
tree
3
kangaroos
4
and
5
narwhals
6
.


In [17]:
#### Slicing
from spacy.lang.en import English

nlp = English()

# Process the text
doc = nlp("I like tree kangaroos and narwhals.")

# A slice of the Doc for "tree kangaroos"
tree_kangaroos = doc[2:4]
print(tree_kangaroos.text)

# A slice of the Doc for "tree kangaroos and narwhals" (without the ".")
tree_kangaroos_and_narwhals = doc[2:6]
print(tree_kangaroos_and_narwhals.text)

tree kangaroos
tree kangaroos and narwhals
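
The difference between indexing and slicing can be sketched as follows: indexing a `Doc` gives a `Token`, while slicing gives a `Span` that remembers where it starts and ends in the `Doc`. The variable names below are illustrative.

```python
from spacy.lang.en import English
from spacy.tokens import Span, Token

nlp = English()
doc = nlp("I like tree kangaroos and narwhals.")

# Indexing returns a single Token; negative indices count from the end
last_token = doc[-1]

# Slicing returns a Span; the end index is exclusive
tree_kangaroos = doc[2:4]

print(type(last_token).__name__, repr(last_token.text))
print(type(tree_kangaroos).__name__, tree_kangaroos.start, tree_kangaroos.end)
print(len(tree_kangaroos))
```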


#### 1.3 Lexical Attributes
We will use lexical attributes to find percentages in a text. We look for two consecutive tokens: a number followed by a percent sign.

Key Summary:
- `like_num` is an attribute of `Token` objects (not a method, and not an attribute of the `nlp` object) that is `True` when the token resembles a number.
- `token.text` is a plain string, so it can be compared with any other string, for example `token.text == "%"`.

In [18]:
from spacy.lang.en import English

nlp = English()

# Process the text
doc = nlp(
    "In 1990, more than 60% of people in East Asia were in extreme poverty. "
    "Now less than 4% are."
)

# Iterate over the tokens in the doc
for token in doc:
    # Check if the token resembles a number
    if token.like_num:
        # Get the next token in the document
        next_token = doc[token.i + 1]
        # Check if the next token's text equals "%"
        if next_token.text == "%":
            print("Percentage found:", token.text +"%")

Percentage found: 60%
Percentage found: 4%
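
`like_num` is one of several lexical attributes available on every token. The sketch below lists a few of them side by side; the example sentence and the `rows` variable are illustrative choices rather than part of the original notebook.

```python
from spacy.lang.en import English

nlp = English()
doc = nlp("It costs $5.")

# One row per token: index, text, and three boolean lexical attributes
rows = [
    (token.i, token.text, token.is_alpha, token.is_punct, token.like_num)
    for token in doc
]

print("index, text, is_alpha, is_punct, like_num")
for row in rows:
    print(row)
```

These attributes depend only on the token's text, not on its context, so they work without a trained model.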


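#### 1.4 Statistical Packages
`en_core_web_sm` is a small English pipeline package trained on web text. It is installed separately from spaCy itself (for example with `python -m spacy download en_core_web_sm`) before `spacy.load` can find it.

It can be used to:
- predict part-of-speech tags
- predict syntactic dependencies
- predict named entities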

In [19]:
nlp = spacy.load("en_core_web_sm")

doc = nlp("She ate the pizza")

for token in doc:
    # Print the text and the predicted part of speech tag
    print(token.text, token.pos_)

She PRON
ate VERB
the DET
pizza NOUN


#### More detailed version
Helps you understand the structure of a sentence through `pos_` (part-of-speech tag), `dep_` (dependency label) and `head.text` (the text of the token's syntactic parent).

In [28]:
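# The output below is for this sentence, so process it first
doc = nlp("Apple is looking at buying U.K startup for $1 billion")
for token in doc:
    print(token.text, token.pos_, token.dep_, token.head.text)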

Apple PROPN nsubj looking
is AUX aux looking
looking VERB ROOT looking
at ADP prep looking
buying VERB pcomp at
U.K PROPN compound startup
startup NOUN dobj buying
for ADP prep buying
$ SYM quantmod billion
1 NUM compound billion
billion NUM pobj for


#### Entity Detection
spaCy can also detect and classify named entities. The predicted entities are available through `doc.ents`, as seen below.

In [29]:
doc = nlp("Apple is looking at buying U.K startup for $1 billion")

In [32]:
for ent in doc.ents:
    print(ent.text, ent.label_)

Apple ORG
U.K ORG
$1 billion MONEY
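
The tag, dependency and entity labels printed above are abbreviations. As a small sketch, `spacy.explain` can look up a short description for each one; the selection of labels in `labels` is an illustrative choice, and no trained model is needed for this lookup.

```python
import spacy

# A few labels taken from the outputs above
labels = ["PROPN", "AUX", "nsubj", "dobj", "pobj", "ORG", "MONEY"]

# spacy.explain() returns a description string, or None for unknown labels
descriptions = {label: spacy.explain(label) for label in labels}

for label, description in descriptions.items():
    print(f"{label:<6} {description}")
```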


Note: the model package does not include the labelled data the model was trained on.