
In [2]:
import spacy

# create a blank English pipeline (tokenizer only, no trained components)
nlp = spacy.blank("en")

In [3]:
doc = nlp("Hey there, my name is Abin! 5")
for token in doc:
  print(token.text)

Hey
there
,
my
name
is
Abin
!
5


In [4]:
# The Span object
#
# A Span object is a slice of the document consisting of one or more tokens.
# It's only a view of the Doc and doesn't contain any data itself.
#
# To create a span, you can use Python's slice notation. For example, 1:3
# creates a slice starting from the token at position 1, up to (but not
# including) the token at position 3.

span = doc[1:5]
print(span.text)

there, my name
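
To make the "view of the Doc" idea concrete, here is a small sketch using the same sentence. It only needs the blank English pipeline, so no trained model is assumed. The attributes used (`span.start`, `span.end`, `span.doc`, `token.i`) are standard spaCy attributes; the variable names are just for illustration.

```python
import spacy

nlp = spacy.blank("en")  # tokenizer-only pipeline, no model download needed
doc = nlp("Hey there, my name is Abin! 5")

span = doc[1:5]

# A Span records where it starts and ends in the parent Doc
print(span.start, span.end)

# The number of tokens in the slice
print(len(span))

# The span points back to the same Doc object rather than copying it
print(span.doc is doc)

# token.i is the token's index in the Doc, not its position inside the span
print([token.i for token in span])
```

Because the span refers back to the original Doc, the token indices it reports are always relative to the whole document.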


In [5]:
# Lexical Attributes
print("Index: ", [token.i for token in doc])
print("Text: ", [token.text for token in doc])
print()
print("is_alpha: ", [token.is_alpha for token in doc])
print("is_punct: ", [token.is_punct for token in doc])
print("like_num: ", [token.like_num for token in doc])

Index:  [0, 1, 2, 3, 4, 5, 6, 7, 8]
Text:  ['Hey', 'there', ',', 'my', 'name', 'is', 'Abin', '!', '5']

is_alpha:  [True, True, False, True, True, True, True, False, False]
is_punct:  [False, False, True, False, False, False, False, True, False]
like_num:  [False, False, False, False, False, False, False, False, True]
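
Lexical attributes become more useful when combined with token indices. As a rough sketch (the example sentence is made up for illustration), the snippet below collects numbers that are directly followed by a percent sign, using only `like_num`, `token.i` and the blank pipeline.

```python
import spacy

nlp = spacy.blank("en")  # lexical attributes don't need a trained model
doc = nlp("In 1990, more than 60% of people were affected. Now less than 4% are.")

percentages = []
for token in doc:
    # Only consider number-like tokens that are not the last token in the doc
    if token.like_num and token.i + 1 < len(doc):
        next_token = doc[token.i + 1]
        # Keep the number if the token right after it is a percent sign
        if next_token.text == "%":
            percentages.append(token.text)

print(percentages)
```

Note that "1990" is number-like too, but it is skipped because the token after it is a comma.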


In [10]:
# pipeline packages
# predicting the POS tags

# loading the small English pipeline
# (it has to be installed first: python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")

# Processing a text
doc = nlp("She ate the pizza.")

for token in doc:
  print(token.text, token.pos_)

She PRON
ate VERB
the DET
pizza NOUN
. PUNCT


In [11]:
# predicting syntactic dependencies

for token in doc:
  print(token.text, token.pos_, token.dep_, token.head.text)

She PRON nsubj ate
ate VERB ROOT ate
the DET det pizza
pizza NOUN dobj ate
. PUNCT punct ate


To describe syntactic dependencies, spaCy uses a standardized label scheme. Here are some common labels, illustrated with the sentence above:

The pronoun "She" is a nominal subject attached to the verb – in this case, to "ate".

The noun "pizza" is a direct object attached to the verb "ate". It is eaten by the subject, "she".

The determiner "the", also known as an article, is attached to the noun "pizza".
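
The relations described above form a tree that can be walked with `token.children` and `token.subtree`. The sketch below rebuilds the parse by hand from the output shown earlier, so it does not need `en_core_web_sm`; passing `heads` and `deps` to the `Doc` constructor assumes spaCy v3, and the hard-coded values are copied from the printed output, not predicted.

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")

# Parse copied from the output above; heads are absolute token indices
words = ["She", "ate", "the", "pizza", "."]
heads = [1, 1, 3, 1, 1]
deps = ["nsubj", "ROOT", "det", "dobj", "punct"]
doc = Doc(nlp.vocab, words=words, heads=heads, deps=deps)

# The root is the token labelled ROOT (its head is itself)
root = [token for token in doc if token.dep_ == "ROOT"][0]

# Children of the root carry the subject and object relations
subjects = [t.text for t in root.children if t.dep_ == "nsubj"]
objects = [t.text for t in root.children if t.dep_ == "dobj"]
print(root.text, subjects, objects)

# The subtree of "pizza" includes its determiner
print([t.text for t in doc[3].subtree])
```

With a trained pipeline, the same attribute lookups work on the `doc` returned by `nlp("She ate the pizza.")`.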

In [13]:
# predicting the named entities

doc = nlp("Apple is looking at buying U.K. startup for $1 billion")

for ent in doc.ents:
  print(ent.text, ent.label_)

Apple ORG
U.K. GPE
$1 billion MONEY


Named entities are "real world objects" that are assigned a name – for example, a person, an organization or a country.

The doc.ents property lets you access the named entities predicted by the named entity recognition model.

It returns a tuple of Span objects, so we can print the entity text and the entity label using the .label_ attribute.

In this case, the model is correctly predicting "Apple" as an organization, "U.K." as a geopolitical entity and "$1 billion" as money.
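
Since each entity is just a Span with a label, the structure of `doc.ents` can be explored without a trained model. In the sketch below the entities are set by hand on a blank pipeline to mirror the output above; `make_span` is a hypothetical helper written for this example, and nothing here is predicted by a model.

```python
import spacy

nlp = spacy.blank("en")  # no NER component, entities are assigned manually
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")

def make_span(doc, phrase, label):
    """Hypothetical helper: build a labelled Span from the first match of phrase."""
    start = doc.text.index(phrase)
    return doc.char_span(start, start + len(phrase), label=label)

# Mirror the entities shown in the output above
doc.ents = [
    make_span(doc, "Apple", "ORG"),
    make_span(doc, "U.K.", "GPE"),
    make_span(doc, "$1 billion", "MONEY"),
]

print(type(doc.ents).__name__)
for ent in doc.ents:
    # Character offsets show where each entity sits in the original text
    print(ent.text, ent.label_, ent.start_char, ent.end_char)
```

The character offsets are handy when entities need to be highlighted in, or mapped back to, the raw input string.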

*A quick tip: To get definitions for the most common tags and labels, you can use the spacy.explain helper function.*

*For example, "GPE" for geopolitical entity isn't exactly intuitive – but spacy.explain can tell you that it refers to countries, cities and states.*

*The same works for part-of-speech tags and dependency labels.*

In [15]:
print("GPE -> ", spacy.explain("GPE"))
print("NNP -> ", spacy.explain("NNP"))
print("dobj -> ", spacy.explain("dobj"))

GPE ->  Countries, cities, states
NNP ->  noun, proper singular
dobj ->  direct object
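
One thing to keep in mind is that `spacy.explain` returns `None` for terms it doesn't know. A small wrapper can make that case explicit; `describe` and the placeholder label `NOT_A_LABEL` below are made up for this sketch. Warnings are silenced because some spaCy versions warn about unknown terms.

```python
import warnings

import spacy

def describe(label):
    """Hypothetical helper: explain a label, with a fallback for unknown terms."""
    with warnings.catch_warnings():
        # Some spaCy versions emit a warning for unknown terms; ignore it here
        warnings.simplefilter("ignore")
        description = spacy.explain(label)
    return description if description is not None else "(no description available)"

for label in ["GPE", "NNP", "dobj", "NOT_A_LABEL"]:
    print(label, "->", describe(label))
```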
