# Overview
This notebook collects my notes from the ***Advanced NLP with spaCy*** [online course](https://course.spacy.io/en/).

# First chapter


## Basic objects

In [1]:
import spacy

# the nlp object holds the processing pipeline and the language-specific rules (e.g. tokenization)
nlp = spacy.blank("en")

# calling nlp on a string returns a Doc object
doc = nlp("Toyota will notify owners, and dealers will clean and reseal the fuel pressure sensor mounting area, free of charge. The recall began March 3, 2015. Owners may contact Toyota customer service at 1-800-331-4331")

for t in doc:
    print(t.text)
    break
# a Doc is a sequence of Token objects, so it can be iterated over and indexed
# a slice of a Doc is called a Span
span = doc[1:3]

Toyota
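Here is a minimal sketch that makes the Doc/Token/Span relationship concrete. It uses only the blank English pipeline, so no model download is needed. The sample sentence comes from the text above. `Span(doc, start, end, label=...)` is spaCy's constructor for building a span by hand, and the `"DATE"` label is my own choice for the illustration.

```python
import spacy
from spacy.tokens import Span

# A blank English pipeline only tokenizes; no trained model is needed
nlp = spacy.blank("en")
doc = nlp("The recall began March 3, 2015.")

# Every Token stores its position in the parent Doc
first = doc[0]
print(first.text, first.i)  # The 0

# Slicing a Doc returns a Span; start and end are token indices (end is exclusive)
span = doc[3:7]
print(span.text, span.start, span.end)  # March 3, 2015 3 7

# A Span can also be built directly and given a label
date = Span(doc, 3, 7, label="DATE")
print(date.text, date.label_)  # March 3, 2015 DATE
```

A span does not copy any text. It only stores token offsets into its Doc.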


## Trained pipelines

In [2]:
nlp = spacy.blank("en")

# Process the text
doc = nlp(
    "In 1990, more than 60% of people in East Asia were in extreme poverty. "
    "Now less than 4% are."
)

# Iterate over the tokens in the doc
for token in doc:
    # Check if the token resembles a number and is not the last token
    # (otherwise doc[token.i + 1] would raise an IndexError)
    if token.like_num and token.i + 1 < len(doc):
        # Get the next token in the document
        next_token = doc[token.i + 1]
        # Check if the next token's text equals "%"
        if next_token.text == "%":
            print("Percentage found:", token.text)

Percentage found: 60
Percentage found: 4
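`like_num` is one of spaCy's lexical attributes. It looks only at the token's text, which is why it works in a blank pipeline. The small sketch below compares it with `is_alpha` and `is_punct` on a made-up sentence. Note that in English, `like_num` also matches number words such as "ten".

```python
import spacy

nlp = spacy.blank("en")
doc = nlp("In 1990, ten percent.")

# Lexical attributes depend only on the token text, so no trained model is required
for token in doc:
    print(f"{token.text:<8} like_num={token.like_num} is_alpha={token.is_alpha} is_punct={token.is_punct}")
```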


In [3]:
# trained pipelines add statistical components (tagger, parser, named entity recognizer) on top of spaCy
# this command downloads and installs the small English trained pipeline
! python -m spacy download en_core_web_sm

Collecting en-core-web-sm==3.5.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.5.0/en_core_web_sm-3.5.0-py3-none-any.whl (12.8 MB)
     --------------------------------------- 12.8/12.8 MB 11.3 MB/s eta 0:00:00
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')
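A trained pipeline such as `en_core_web_sm` is installed as an ordinary Python package, so a notebook can check for it before loading. The helper `load_pipeline` below is hypothetical (my own name, not part of spaCy). It assumes `spacy.util.is_package` to test for the installed package. If the package is missing, it falls back to a blank pipeline, which only tokenizes.

```python
import spacy
from spacy.language import Language


def load_pipeline(name: str = "en_core_web_sm", lang: str = "en") -> Language:
    """Hypothetical helper: load a trained pipeline if installed, else a blank one."""
    if spacy.util.is_package(name):
        return spacy.load(name)
    # Fallback: tokenizer only, no tagger, parser or NER
    return spacy.blank(lang)


nlp = load_pipeline()
# pipe_names lists the components; it is empty for a blank pipeline
print(nlp.lang, nlp.pipe_names)
```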


In [4]:
# let's load the package
nlp = spacy.load("en_core_web_sm")
our_phrase = "Toyota will notify owners, and dealers will clean and reseal the fuel pressure sensor mounting area, free of charge"
doc = nlp(our_phrase)

# for t in doc:
#     print(t.text, t.pos_)

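for t in doc:
    token_text = t.text    # the verbatim token text
    token_pos = t.pos_     # predicted part-of-speech tag
    token_dep = t.dep_     # predicted dependency label
    token_parent = t.head  # syntactic head token

# pos_: the predicted part-of-speech tag (e.g. "PROPN", "VERB")
# dep_: the predicted dependency label (e.g. "nsubj", "dobj")
# head: the syntactic head, i.e. the token that the current token is attached to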

# dep_ names the type of relation between a token and its head
# the trained pipeline also predicts named entities: labeled spans of tokens, available via doc.ents
for ent in doc.ents:
    print(ent.text, ent.label_)

# spacy.explain returns a short description of a tag or label
spacy.explain("GPE")

Toyota ORG


'Countries, cities, states'
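The sketch below shows how `pos_`, `dep_`, `head` and `doc.ents` fit together without depending on model predictions. It builds a `Doc` by hand from the opening clause of the example sentence. The annotations are hand-written for illustration and are not output of `en_core_web_sm`. The code assumes spaCy v3, where the `Doc` constructor accepts `pos`, `heads`, `deps` and IOB-style `ents` lists.

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")

# Hand-written annotations (NOT model predictions) for a short clause
words = ["Toyota", "will", "notify", "owners"]
pos = ["PROPN", "AUX", "VERB", "NOUN"]
deps = ["nsubj", "aux", "ROOT", "dobj"]
heads = [2, 2, 2, 2]  # absolute index of each token's head; the root points to itself
ents = ["B-ORG", "O", "O", "O"]  # token-level IOB entity tags

doc = Doc(nlp.vocab, words=words, pos=pos, heads=heads, deps=deps, ents=ents)

for t in doc:
    # text, part of speech, dependency label, and the head the token attaches to
    print(t.text, t.pos_, t.dep_, t.head.text)

# The root's children are the tokens whose head is the root
root = [t for t in doc if t.dep_ == "ROOT"][0]
print([child.text for child in root.children])

# Entities are labeled Spans; spacy.explain describes the label
for ent in doc.ents:
    print(ent.text, ent.label_, spacy.explain(ent.label_))
```

A trained pipeline fills in these same attributes automatically. The hand-built Doc just makes the head/child structure easy to inspect.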