# spaCy Demo
spaCy is a library for advanced Natural Language Processing in Python and Cython.
It's built on the very latest research, and was designed from day one to be used in real products.

```python
# initialization
import spacy
```

## Tokenization of a document using spaCy

```python
# Split a string into tokens
from spacy.tokenizer import Tokenizer
from spacy.lang.en import English

nlp = English()
# A bare Tokenizer built from the English vocabulary alone: it has no prefix,
# suffix or infix rules, so it splits on whitespace only
tokenizer = Tokenizer(nlp.vocab)
tokens = tokenizer("This is a demo-string")
for token in tokens:
    print(token)
```

```
This
is
a
demo-string
```
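The bare `Tokenizer(nlp.vocab)` has no punctuation rules, which is why "demo-string" stays whole. The sketch below compares it with the tokenizer that `English()` already carries (`nlp.tokenizer`). That tokenizer includes spaCy's English prefix, suffix and infix rules. One of the infix rules splits a hyphen that sits between letters.

```python
from spacy.lang.en import English
from spacy.tokenizer import Tokenizer

nlp = English()
text = "This is a demo-string"

# Tokenizer built from the vocab alone: whitespace splitting only
bare_tokens = [t.text for t in Tokenizer(nlp.vocab)(text)]

# The pipeline's own tokenizer applies the English punctuation rules,
# so the infix hyphen in "demo-string" becomes a separate token
default_tokens = [t.text for t in nlp.tokenizer(text)]

print(bare_tokens)
print(default_tokens)
```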


#### Adding special cases while Tokenizing

```python
from spacy.symbols import ORTH

# Always split "gimme" into the two tokens "gim" and "me"
special_case = [{ORTH: "gim"}, {ORTH: "me"}]
nlp.tokenizer.add_special_case("gimme", special_case)
print([w.text for w in nlp("gimme that")])
```

```
['gim', 'me', 'that']
```
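A special case can also set a token's `NORM`, and it still applies after the tokenizer has split off surrounding punctuation. The sketch below shows both on a fresh `English()` pipeline. Giving "gim" the norm "give" is an illustrative choice, not something spaCy does by default.

```python
from spacy.lang.en import English
from spacy.symbols import NORM, ORTH

nlp = English()
# Split "gimme" into two tokens and (illustratively) normalize "gim" to "give";
# the ORTH values must concatenate back to the original string
nlp.tokenizer.add_special_case("gimme", [{ORTH: "gim", NORM: "give"}, {ORTH: "me"}])

doc = nlp("gimme!")
# The "!" suffix is split off first, then "gimme" matches the special case
texts = [t.text for t in doc]
norms = [t.norm_ for t in doc]
print(texts)
print(norms)
```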


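## Getting Part-of-Speech Tags Using spaCy
For this part we will be using **spaCy's pre-trained** English pipeline `en_core_web_sm`, installed separately with `python -m spacy download en_core_web_sm`. We run a sample sentence through it and print the part-of-speech tag of each token.
The list below is the subset of the Universal POS (UPOS) tags that this example prints. The full UPOS set has 17 tags and also includes ADP, CCONJ, DET, INTJ, PUNCT, SCONJ, SYM and X.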

* ADJ: adjective
* ADV: adverb
* AUX: auxiliary verb
* NOUN: noun
* NUM: numeral
* PART: particle
* PRON: pronoun
* PROPN: proper noun
* VERB: verb

```python
nlp = spacy.load("en_core_web_sm")
text = "Apple is looking at buying U.K. startup for $1 billion"
doc = nlp(text)
# Readable names for the UPOS tags listed above; tokens with any other tag,
# such as the prepositions "at" and "for", are skipped
pos_text = {"ADJ": "adjective", "ADV": "adverb", "AUX": "auxiliary verb",
            "NOUN": "noun", "NUM": "numeral", "PART": "particle",
            "PRON": "pronoun", "PROPN": "proper noun", "VERB": "verb"}
for token in doc:
    if token.pos_ in pos_text:
        print(token.text, pos_text[token.pos_], sep=" : ")
```

```
Apple : proper noun
is : auxiliary verb
looking : verb
buying : verb
U.K. : proper noun
startup : noun
1 : numeral
billion : numeral
```
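A hand-written dictionary is not the only way to get readable tag names. `spacy.explain` looks up a tag or label string in spaCy's glossary and returns a short description, or `None` for an unknown string. The minimal sketch below needs no trained model. spaCy's wording can differ slightly from the list above.

```python
import spacy

# Look up descriptions for a few UPOS tags in spaCy's glossary
tags = ["ADJ", "ADP", "PROPN", "SYM"]
descriptions = {tag: spacy.explain(tag) for tag in tags}
for tag, description in descriptions.items():
    print(tag, description, sep=" : ")
```

In the part-of-speech loop, `spacy.explain(token.pos_)` could stand in for `pos_text[token.pos_]`. If you also drop the `if` filter, tokens such as "at" (ADP) are described rather than skipped.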


## Named entity recognition
spaCy features an extremely fast statistical entity recognition system that assigns labels to contiguous spans of tokens.
The default model identifies a variety of named and numeric entities, including companies, locations, organizations and products. You can add arbitrary classes to the entity recognition system, and update the model with new examples.
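One way to add a custom class without retraining is spaCy's rule-based `entity_ruler` component, which labels spans that match phrase or token patterns. The sketch below follows that rule-based route. It does not show how to update the statistical model with new examples. The `SEARCH_ENGINE` label and both patterns are made up for illustration. A blank English pipeline is used so that no trained model is needed.

```python
import spacy

# Blank English pipeline: a tokenizer and no trained components
nlp = spacy.blank("en")

# Rule-based entity recognizer; SEARCH_ENGINE is a hypothetical custom label
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "ORG", "pattern": "Google"},                       # exact phrase
    {"label": "SEARCH_ENGINE", "pattern": [{"LOWER": "bing"}]},  # case-insensitive token
])

doc = nlp("Google surpassed Microsoft's Bing search engine")
found = [(ent.text, ent.label_) for ent in doc.ents]
print(found)
```

You can also add the ruler to a trained pipeline such as `en_core_web_sm`. In that case, its position relative to the `ner` component decides which component's entities take precedence, for example `nlp.add_pipe("entity_ruler", before="ner")`.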

```python
# Example 1
text = "Google surpassed Microsoft's Bing search engine and is now worth $787.6 billion"
doc = nlp(text)
for entity in doc.ents:
    print(entity.text, entity.label_, sep=" : ")
```

```
Google : ORG
Microsoft : ORG
Bing : ORG
$787.6 billion : MONEY
```
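Each item in `doc.ents` is a `Span`. Besides `text` and `label_`, a span carries token indices (`start`, `end`) and character offsets (`start_char`, `end_char`) into the original string. The sketch below uses a blank pipeline and sets the entities by hand, so it runs without a trained model. The labels mirror the output above but are assigned manually here.

```python
import spacy
from spacy.tokens import Span

nlp = spacy.blank("en")
# Tokens: Google(0) surpassed(1) Microsoft(2) 's(3) Bing(4) search(5) engine(6)
doc = nlp("Google surpassed Microsoft's Bing search engine")

# Entities set by hand for illustration; a trained NER component fills doc.ents itself
doc.ents = [Span(doc, 0, 1, label="ORG"),
            Span(doc, 2, 3, label="ORG"),
            Span(doc, 4, 5, label="ORG")]

for ent in doc.ents:
    # Token span [start, end) and character span [start_char, end_char)
    print(ent.text, ent.label_, ent.start, ent.end, ent.start_char, ent.end_char)

offsets = [(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents]
```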


```python
# Example 2
text = "India currently has a GDP of $12 billion and it aims to overcome US some day"
doc = nlp(text)
for entity in doc.ents:
    print(entity.text, entity.label_, sep=" : ")
```

```
India : GPE
$12 billion : MONEY
US : GPE
```


```python
# Example 3
# Adjacent string literals are joined with no separator, so each piece ends
# with a space to keep the sentences apart
text = ("Jeff Bezos is trying to sponsor Amazon in Airports, such as the J.F.K. International Airport. "
        "Amazon is also trying to enter the food market by selling groceries, fruits etc. "
        "Currently Amazon holds 15% of the food market. "
        "This was a part of my report for the English essay. "
        "I got the first prize for submitting this report. "
        "The competition was held on 22nd August 2019 and I got the prize by 10 pm")
doc = nlp(text)
for entity in doc.ents:
    print(entity.text, entity.label_, sep=" : ")
```

```
Jeff Bezos : PERSON
Amazon : ORG
Airports : ORG
the J.F.K. International Airport : FAC
Amazon : ORG
Amazon : ORG
15% : PERCENT
English : NORP
first : ORDINAL
22nd August 2019 : DATE
10 pm : TIME
```
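"Amazon" is listed three times above because each mention is a separate entity. The sketch below uses `ent.sent` to show which sentence each mention comes from. To run on its own, it replaces the trained model with a blank pipeline, the rule-based `sentencizer` and an `entity_ruler` pattern for "Amazon"; these components are assumptions chosen for this sketch. The text is a shortened version of the Example 3 text.

```python
import spacy

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")             # rule-based sentence boundaries
ruler = nlp.add_pipe("entity_ruler")    # stands in for the trained NER here
ruler.add_patterns([{"label": "ORG", "pattern": "Amazon"}])

doc = nlp("Amazon is entering the food market. Currently Amazon holds 15% of the food market.")

# Each mention is its own Span, and ent.sent is the sentence that contains it
pairs = [(ent.text, ent.sent.text) for ent in doc.ents]
for mention, sentence in pairs:
    print(mention, sentence, sep=" : ")
```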


## Visualizers

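displaCy, spaCy's built-in visualizer, draws the dependency parse of a `Doc` (`style="dep"`) or highlights its named entities (`style="ent"`).

```python
import spacy
from spacy import displacy

# Load the pipeline in this cell too, so it still runs after a kernel restart
nlp = spacy.load("en_core_web_sm")
doc = nlp("This is a sentence")
# In a Jupyter notebook this draws the dependency tree inline
displacy.render(doc, style="dep")
```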
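Outside a notebook, `displacy.render(..., jupyter=False)` returns the markup as a string instead of displaying it, so you can write it to an HTML file. `displacy.serve` serves the visualization from a small local web server instead. The minimal sketch below renders the entity view. It uses a blank pipeline with hand-set entities, so no trained model is needed.

```python
import spacy
from spacy import displacy
from spacy.tokens import Span

nlp = spacy.blank("en")
doc = nlp("Google surpassed Microsoft's Bing search engine")
# Entities assigned by hand for illustration (tokens 0 and 4)
doc.ents = [Span(doc, 0, 1, label="ORG"), Span(doc, 4, 5, label="ORG")]

# jupyter=False: return the HTML markup as a string instead of displaying it
html = displacy.render(doc, style="ent", jupyter=False)
print(html[:200])
```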