## **NLP with spaCy**

spaCy is one of the most widely used frameworks for NLP. It can be used to implement tasks such as sentiment analysis, chatbots, text summarization, and intent and entity extraction.

More information about spaCy:  [www.spacy.io](https://spacy.io/)

This notebook shows basic examples for the following topics:
- Tokenization
- Sentence Tokenization
- Part-Of-Speech Tagging
- Named Entity Recognition

In [None]:
# Load resources for all following code cells
import spacy

# Load the small English pipeline: tokenizer, tagger, parser and NER
# (the 'sm' model does not ship with static word vectors)
sp = spacy.load('en_core_web_sm')

#### **Word-Tokenization**

Access the word tokens by iterating over the document object ``doc`` and print them.

In [None]:
# Create document (the u'' prefix is not needed in Python 3)
doc = sp('I am non-vegetarian, send me the menu at abs-xyz@gmail.com. "They are going to U.K. and the to the U.S.A"')

for token in doc:
    print(token.text)

I
am
non
-
vegetarian
,
send
me
the
menu
at
abs-xyz@gmail.com
.
"
They
are
going
to
U.K.
and
the
to
the
U.S.A
"

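Each token carries more than its text. The following is a minimal sketch, assuming spaCy v3 is installed, that inspects a few lexical attributes (`i`, `is_punct`, `like_email`) on a shortened version of the sentence above. It uses `spacy.blank("en")` instead of the loaded model, on the assumption that the rule-based tokenizer alone is sufficient for this purpose; the variable names `rows`, `words` and `emails` are illustrative only.

```python
import spacy

# Assumption: a blank English pipeline is enough here, because
# tokenization is rule-based and does not need a trained model.
nlp = spacy.blank("en")
doc = nlp("I am non-vegetarian, send me the menu at abs-xyz@gmail.com.")

# Collect index, text and two boolean flags for every token.
rows = [(token.i, token.text, token.is_punct, token.like_email) for token in doc]
for row in rows:
    print(row)

# Keep only the tokens that are not punctuation.
words = [token.text for token in doc if not token.is_punct]

# Keep only the tokens that look like an e-mail address.
emails = [token.text for token in doc if token.like_email]
print(emails)
```

Filtering on `is_punct` is a common first step before counting or comparing words, since punctuation marks are tokens in their own right.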

#### **Sentence-Tokenization**

In [None]:
# Print the whole sentences from the document 'doc'
for sentence in doc.sents:
    print(sentence)

I am non-vegetarian, send me the menu at abs-xyz@gmail.com.
"They are going to U.K. and the to the U.S.A"

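The cell above relies on the loaded pipeline to set sentence boundaries. As an alternative, and not what this notebook uses, spaCy v3 also provides a rule-based `sentencizer` component that splits on sentence-final punctuation and works without a trained model. A minimal sketch, assuming spaCy v3 and an example text chosen only for illustration:

```python
import spacy

# Assumption: spaCy v3, where built-in components are added by name.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")  # rule-based sentence boundary detection

doc = nlp("This is the first sentence. Here is the second one.")

# doc.sents yields Span objects; .text gives the sentence as a string.
sentences = [sent.text for sent in doc.sents]
for sentence in sentences:
    print(sentence)
```

The rule-based component is fast but only looks at punctuation, so it can split incorrectly on abbreviations in cases where a trained pipeline may do better.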

#### **Part-Of-Speech (POS) tagging**

To output POS tags in spaCy, we iterate over the word tokens in our document ``doc_POS`` and print the ``pos_`` attribute of each token.


In [None]:
# Create POS document
doc_POS = sp("I am going to complete this book by this weekend")

In [None]:
# Show the found POS items
for word in doc_POS:
    print(word.text + '-->' + word.pos_)

I-->PRON
am-->AUX
going-->VERB
to-->PART
complete-->VERB
this-->DET
book-->NOUN
by-->ADP
this-->DET
weekend-->NOUN
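The tag names in the output above are abbreviations. As a small sketch, `spacy.explain` (the same helper used in the NER section of this notebook) can be used to look up a short description for each of them. The list `pos_tags` simply repeats the tags printed above, and no trained model is needed for the lookup.

```python
import spacy

# The distinct POS tags that appear in the output above.
pos_tags = ["PRON", "AUX", "VERB", "PART", "DET", "NOUN", "ADP"]

# spacy.explain returns a short description, or None for unknown labels.
descriptions = {tag: spacy.explain(tag) for tag in pos_tags}

for tag, description in descriptions.items():
    print(f"{tag:<5} {description}")
```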


#### **Named Entity Recognition (NER)**

To output named entity labels in spaCy, we iterate over the entities in ``doc_ner.ents`` and print the ``label_`` attribute of each entity. ``spacy.explain`` adds a short description of the label.

In [None]:
# Create NER document
doc_ner = sp('Christiano Ronaldo was signed by Juventus for $105 million')

# Print each entity with its label and a short description of the label
for entity in doc_ner.ents:
    print(f"{entity.text} - {entity.label_} - {spacy.explain(entity.label_)}")

Christiano Ronaldo - PERSON - People, including fictional
Juventus - ORG - Companies, agencies, institutions, etc.
$105 million - MONEY - Monetary values, including unit
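Entities are spans of the document, so besides `text` and `label_` they also expose character offsets (`start_char`, `end_char`) into the original string. The sketch below illustrates this without a trained model: it swaps the statistical NER used above for spaCy v3's rule-based `entity_ruler` with a single hand-written pattern. The pattern and the variable name `entities` are assumptions made for this illustration only.

```python
import spacy

# Assumption: spaCy v3. A blank pipeline plus a rule-based entity_ruler
# stands in for the trained NER component used in the notebook.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([{"label": "ORG", "pattern": "Juventus"}])

text = "Christiano Ronaldo was signed by Juventus for $105 million"
doc = nlp(text)

# Each entity is a Span with a label and character offsets into `text`.
entities = [(ent.text, ent.label_, ent.start_char, ent.end_char) for ent in doc.ents]
for ent_text, label, start, end in entities:
    # Slicing the original string with the offsets recovers the entity text.
    print(ent_text, label, start, end, text[start:end])
```

The same attributes are available on the entities predicted by the trained pipeline, which makes it possible to highlight or replace entities in the original text.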


Copyright © 2021 IU International University of Applied Sciences