# NLP - Using Stanza library

- **Created by Andrés Segura Tinoco**
- **Created on January 21, 2022**
- **Updated on January 21, 2022**

**Stanza** is a Python NLP toolkit that supports 60+ human languages. It is built from highly accurate neural network components that can be trained and evaluated efficiently on your own annotated data, and it offers pretrained models for 100 treebanks. Stanza also provides a stable, officially maintained Python interface to the **Java Stanford CoreNLP Toolkit**.

In [1]:
import stanza

In [2]:
stanza.__version__

'1.2.3'

## Stanza Text Processing

The following are four English sentences from Chapter 1 (A Scandal in Bohemia) of *The Adventures of Sherlock Holmes* by Sir Arthur Conan Doyle.

In [3]:
# Sentences
en_text = """
To Sherlock Holmes she is always the woman. 
I have seldom heard him mention her under any other name. 
In his eyes she eclipses and predominates the whole of her sex. 
It was not that he felt any emotion akin to love for Irene Adler.
"""

### Step 1 - Downloading model
Download an English model into the default directory. This only needs to run the first time an English model is used, or when the model needs to be updated.

In [4]:
print("Downloading English model...")
# Uncomment on first use (or to update the model):
# stanza.download('en')

Downloading English model...
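Because the download only needs to happen once, one option is to check for the model directory before calling `stanza.download('en')`. The sketch below assumes that Stanza stores models under `~/stanza_resources` unless the `STANZA_RESOURCES_DIR` environment variable is set, and that a downloaded language gets its own `<dir>/<lang>` subfolder. The helpers `default_resources_dir` and `needs_download` are hypothetical and are not part of Stanza.

```python
import os
from pathlib import Path
from typing import Optional


def default_resources_dir() -> Path:
    # Assumption: Stanza honours STANZA_RESOURCES_DIR and otherwise
    # falls back to ~/stanza_resources.
    env_dir = os.environ.get("STANZA_RESOURCES_DIR")
    return Path(env_dir) if env_dir else Path.home() / "stanza_resources"


def needs_download(lang: str, resources_dir: Optional[Path] = None) -> bool:
    # Hypothetical check: treat the model as present if <dir>/<lang> exists.
    base = resources_dir if resources_dir is not None else default_resources_dir()
    return not (base / lang).is_dir()


print("Download needed:", needs_download("en"))
```

In the notebook this could guard the download cell, e.g. `if needs_download('en'): stanza.download('en')`. The check is only a directory test and does not validate model versions.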


### Step 2 - Creating pipeline

In [5]:
# Build an English pipeline, with all processors by default
print("Building an English pipeline...")
en_nlp = stanza.Pipeline('en')

Building an English pipeline...

2022-01-20 11:06:48 INFO: Loading these models for language: en (English):
| Processor | Package   |
-------------------------
| tokenize  | combined  |
| pos       | combined  |
| lemma     | combined  |
| depparse  | combined  |
| sentiment | sstplus   |
| ner       | ontonotes |

2022-01-20 11:06:48 INFO: Use device: cpu
2022-01-20 11:06:48 INFO: Loading: tokenize
2022-01-20 11:06:48 INFO: Loading: pos
2022-01-20 11:06:48 INFO: Loading: lemma
2022-01-20 11:06:48 INFO: Loading: depparse
2022-01-20 11:06:49 INFO: Loading: sentiment
2022-01-20 11:06:49 INFO: Loading: ner
2022-01-20 11:06:50 INFO: Done loading processors!
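The default pipeline loads all six processors listed in the log. `stanza.Pipeline` also accepts a comma-separated `processors` string, so a task that only needs, say, NER can load less. Processors depend on earlier ones, and the following sketch resolves those prerequisites into a valid order. The `REQUIRES` table is an assumption based on the English processors shown above (no `mwt` step appears for English in this version). Verify it against the Stanza processor documentation before relying on it. `resolve_processors` is a hypothetical helper.

```python
from typing import Dict, List, Set

# Assumed prerequisites for the English processors in the log above.
REQUIRES: Dict[str, List[str]] = {
    "tokenize": [],
    "pos": ["tokenize"],
    "lemma": ["tokenize", "pos"],
    "depparse": ["tokenize", "pos", "lemma"],
    "sentiment": ["tokenize"],
    "ner": ["tokenize"],
}


def resolve_processors(wanted: List[str]) -> str:
    """Return a comma-separated processor list with prerequisites first."""
    ordered: List[str] = []
    seen: Set[str] = set()

    def visit(name: str) -> None:
        if name in seen:
            return
        if name not in REQUIRES:
            raise ValueError(f"unknown processor: {name}")
        seen.add(name)
        for dep in REQUIRES[name]:
            visit(dep)          # depth-first: prerequisites are appended first
        ordered.append(name)

    for name in wanted:
        visit(name)
    return ",".join(ordered)


print(resolve_processors(["depparse"]))  # tokenize,pos,lemma,depparse
```

The resulting string could then be passed as `stanza.Pipeline('en', processors=resolve_processors(['ner']))`.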


In [6]:
# Process the text with the English pipeline; the result is an annotated Document
en_doc = en_nlp(en_text)
print(type(en_doc))

<class 'stanza.models.common.doc.Document'>


### Step 3 - Accessing annotations

**NLP task**: Sentence splitting. Show the number of sentences in the text.

In [7]:
print("No. sentences:", len(en_doc.sentences))

No. sentences: 4
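Stanza's tokenizer splits sentences with a neural model. For comparison, here is a sketch of the naive rule-based alternative: break after `.`, `!` or `?` whenever whitespace follows. The function `naive_sentence_split` is hypothetical and only illustrates why such rules are fragile. They work on the sample text but break on abbreviations like "Mr.".

```python
import re
from typing import List


def naive_sentence_split(text: str) -> List[str]:
    # Hypothetical rule-based baseline: split after '.', '!' or '?'
    # whenever whitespace follows.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


sample = (
    "To Sherlock Holmes she is always the woman. "
    "I have seldom heard him mention her under any other name."
)
print(naive_sentence_split(sample))
# The abbreviation is wrongly split off as its own "sentence":
print(naive_sentence_split("Mr. Holmes smiled. Watson did not."))
```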


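**NLP task**: Part-of-speech tagging, lemmatization and dependency parsing. For each word, show its text, lemma, universal POS tag, the index of its head word (0 for the root) and its dependency relation.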

In [8]:
for i, sent in enumerate(en_doc.sentences):
    print("[Sentence {}]".format(i+1))
    for word in sent.words:
        # text, lemma, UPOS tag, head index (1-based, 0 = root), dependency relation
        print("{:12s}\t{:12s}\t{:6s}\t{:d}\t{:12s}".format(
              word.text, word.lemma, word.pos, word.head, word.deprel))
    print("")

[Sentence 1]
To          	to          	ADP   	2	case        
Sherlock    	Sherlock    	PROPN 	8	obl         
Holmes      	Holmes      	PROPN 	2	flat        
she         	she         	PRON  	8	nsubj       
is          	be          	AUX   	8	cop         
always      	always      	ADV   	8	advmod      
the         	the         	DET   	8	det         
woman       	woman       	NOUN  	0	root        
.           	.           	PUNCT 	8	punct       

[Sentence 2]
I           	I           	PRON  	4	nsubj       
have        	have        	AUX   	4	aux         
seldom      	seldom      	ADV   	4	advmod      
heard       	hear        	VERB  	0	root        
him         	he          	PRON  	4	obj         
mention     	mention     	VERB  	4	xcomp       
her         	she         	PRON  	6	obj         
under       	under       	ADP   	11	case        
any         	any         	DET   	11	det         
other       	other       	ADJ   	11	amod        
name        	name        	NOUN  	6	obl         
.         
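
The `head` column is a 1-based index into the sentence's words, with 0 marking the root. The following sketch turns the Sentence 1 rows above into readable `head --deprel--> dependent` arcs. The rows are copied from that output, and `to_arcs` is a hypothetical helper. Stanza's `Sentence` objects also offer a `print_dependencies()` method for a similar view.

```python
from typing import List, Tuple

# (text, head, deprel) rows copied from the Sentence 1 output above.
SENTENCE_1: List[Tuple[str, int, str]] = [
    ("To", 2, "case"),
    ("Sherlock", 8, "obl"),
    ("Holmes", 2, "flat"),
    ("she", 8, "nsubj"),
    ("is", 8, "cop"),
    ("always", 8, "advmod"),
    ("the", 8, "det"),
    ("woman", 0, "root"),
    (".", 8, "punct"),
]


def to_arcs(words: List[Tuple[str, int, str]]) -> List[Tuple[str, str, str]]:
    arcs = []
    for text, head, deprel in words:
        # head is 1-based, so subtract 1 to index the list; 0 means root.
        head_text = "ROOT" if head == 0 else words[head - 1][0]
        arcs.append((head_text, deprel, text))
    return arcs


for head_text, deprel, text in to_arcs(SENTENCE_1):
    print(f"{head_text} --{deprel}--> {text}")
```

Here "woman" is the root because, in Universal Dependencies, a copular clause ("she is ... the woman") is headed by its predicate noun rather than by the verb "is".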

**NLP task**: Named Entity Recognition. Show each entity mention with its type and its start and end character offsets in the input text.

In [9]:
print("Mention text\tType\tStart-End")
for ent in en_doc.ents:
    print("{}\t{}\t{}-{}".format(ent.text, ent.type, ent.start_char, ent.end_char))

Mention text	Type	Start-End
Sherlock Holmes	PERSON	4-19
Irene Adler	PERSON	223-234
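The offsets index into the original input string, and the end offset is exclusive, so a plain slice recovers each mention. The sketch below rebuilds `en_text` character for character (the triple-quoted string starts with a newline, and each of the first three lines ends with a trailing space). This is why "Sherlock Holmes" starts at 4 rather than 3. The entity tuples are copied from the output above.

```python
from typing import List, Tuple

# Same string as en_text above, written out so the leading newline and
# trailing spaces are explicit.
en_text = (
    "\n"
    "To Sherlock Holmes she is always the woman. \n"
    "I have seldom heard him mention her under any other name. \n"
    "In his eyes she eclipses and predominates the whole of her sex. \n"
    "It was not that he felt any emotion akin to love for Irene Adler.\n"
)

# (text, type, start_char, end_char) copied from the NER output above.
ENTITIES: List[Tuple[str, str, int, int]] = [
    ("Sherlock Holmes", "PERSON", 4, 19),
    ("Irene Adler", "PERSON", 223, 234),
]

for text, ent_type, start, end in ENTITIES:
    # end_char is exclusive, matching Python slice semantics.
    span = en_text[start:end]
    print(f"{span!r} ({ent_type}) matches: {span == text}")
```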

