# Stanza examples

Additional information: https://stanfordnlp.github.io/stanza/usage.html

In [1]:
import stanza

Stanza will try to download the resources needed to build the specified pipeline.

You can download the models beforehand; see details here: https://stanfordnlp.github.io/stanza/faq.html#troubleshooting-download--installation

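Below is the definition of a pipeline for tokenization, POS tagging, and lemmatization. Passing `download_method="reuse_resources"` tells Stanza to reuse resources that are already downloaded instead of fetching the resource index again.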

In [2]:
nlp = stanza.Pipeline(lang='en', processors='tokenize,pos,lemma', download_method="reuse_resources")

2023-09-12 19:25:20 INFO: Loading these models for language: en (English):
| Processor | Package  |
------------------------
| tokenize  | combined |
| pos       | combined |
| lemma     | combined |

2023-09-12 19:25:20 INFO: Using device: cpu
2023-09-12 19:25:20 INFO: Loading: tokenize
2023-09-12 19:25:20 INFO: Loading: pos
2023-09-12 19:25:20 INFO: Loading: lemma
2023-09-12 19:25:20 INFO: Done loading processors!


Text processing:

In [3]:
doc = nlp('It is a truth universally acknowledged, that a single man in possession of a good fortune must be in want of a wife. However little known the feelings or views of such a man may be on his first entering a neighbourhood, this truth is so well fixed in the minds of the surrounding families, that he is considered as the rightful property of some one or other of their daughters.')

Accessing results:

In [4]:
for sentence in doc.sentences:
    for word in sentence.words:
        print(word.text, word.lemma, word.pos)

It it PRON
is be AUX
a a DET
truth truth NOUN
universally universally ADV
acknowledged acknowledge VERB
, , PUNCT
that that SCONJ
a a DET
single single ADJ
man man NOUN
in in ADP
possession possession NOUN
of of ADP
a a DET
good good ADJ
fortune fortune NOUN
must must AUX
be be AUX
in in ADP
want want NOUN
of of ADP
a a DET
wife wife NOUN
. . PUNCT
However however ADV
little little ADJ
known know VERB
the the DET
feelings feeling NOUN
or or CCONJ
views view NOUN
of of ADP
such such DET
a a DET
man man NOUN
may may AUX
be be AUX
on on ADP
his he PRON
first first ADJ
entering enter VERB
a a DET
neighbourhood neighbourhood NOUN
, , PUNCT
this this DET
truth truth NOUN
is be AUX
so so ADV
well well ADV
fixed fix VERB
in in ADP
the the DET
minds mind NOUN
of of ADP
the the DET
surrounding surround VERB
families family NOUN
, , PUNCT
that that SCONJ
he he PRON
is be AUX
considered consider VERB
as as ADP
the the DET
rightful rightful ADJ
property property NOUN
of of ADP
some some DET
one one
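The loop above only prints each word. To aggregate the same information instead, a couple of small helpers are enough. This is a minimal sketch that relies only on the `doc.sentences`, `sentence.words`, `word.lemma`, and `word.pos` attributes used in the loop; `pos_counts` and `lemmas_for` are hypothetical helper names, not part of Stanza.

```python
from collections import Counter


def pos_counts(doc):
    """Count POS tags (word.pos) over every word in a processed document."""
    return Counter(word.pos for sentence in doc.sentences for word in sentence.words)


def lemmas_for(doc, tag):
    """Return the lemmas of all words carrying the given POS tag, in document order."""
    return [
        word.lemma
        for sentence in doc.sentences
        for word in sentence.words
        if word.pos == tag
    ]


# Usage with the pipeline above (needs the English models on disk):
# doc = nlp('It is a truth universally acknowledged, ...')
# print(pos_counts(doc).most_common(3))
# print(lemmas_for(doc, 'NOUN'))
```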

Sentiment analysis pipeline:

In [6]:
nlp = stanza.Pipeline(lang='en', processors='tokenize,sentiment', download_method="reuse_resources")

2023-09-12 19:25:29 INFO: Loading these models for language: en (English):
| Processor | Package  |
------------------------
| tokenize  | combined |
| sentiment | sstplus  |

2023-09-12 19:25:29 INFO: Using device: cpu
2023-09-12 19:25:29 INFO: Loading: tokenize
2023-09-12 19:25:29 INFO: Loading: sentiment
2023-09-12 19:25:30 INFO: Done loading processors!


In [7]:
doc = nlp('It is a truth universally acknowledged, that a single man in possession of a good fortune must be in want of a wife. However little known the feelings or views of such a man may be on his first entering a neighbourhood, this truth is so well fixed in the minds of the surrounding families, that he is considered as the rightful property of some one or other of their daughters.')

The model assigns a sentiment label to each sentence (0 = negative, 1 = neutral, 2 = positive):

In [8]:
for i, sentence in enumerate(doc.sentences):
    print("%d -> %d" % (i, sentence.sentiment))

0 -> 1
1 -> 2
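The integer labels are easier to read once they are mapped to names. This minimal sketch assumes the convention 0 = negative, 1 = neutral, 2 = positive used by Stanza's sentiment processor, and relies on the `sentence.text` and `sentence.sentiment` attributes; `SENTIMENT_LABELS` and `sentence_sentiments` are hypothetical names.

```python
# Assumed label convention for Stanza's sentiment processor.
SENTIMENT_LABELS = {0: "negative", 1: "neutral", 2: "positive"}


def sentence_sentiments(doc):
    """Pair each sentence's text with a readable sentiment label."""
    return [
        (sentence.text, SENTIMENT_LABELS[sentence.sentiment])
        for sentence in doc.sentences
    ]


# Usage with the sentiment pipeline above:
# for text, label in sentence_sentiments(doc):
#     print(f"{label:>8}  {text}")
```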


This Yelp review contains two sentences:

In [9]:
doc = nlp('I could care less... The interior is just beautiful.')

In [10]:
for i, sentence in enumerate(doc.sentences):
    print("%d -> %d" % (i, sentence.sentiment))

0 -> 0
1 -> 2


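You can switch off sentence splitting with `tokenize_no_ssplit=True`; the text is then split into sentences only at blank lines (`\n\n`). Rather than feeding reviews to the model one at a time, join the individual reviews into one long document, separated by double newlines (`\n\n`).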
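Preparing such a batch is plain string handling. In this minimal sketch, the hypothetical helper `join_reviews` collapses any whitespace inside a review (including stray blank lines, which would otherwise split it) and rejects empty reviews, so that the number of blank-line-separated blocks matches the number of input reviews.

```python
def join_reviews(reviews):
    """Join reviews into one document, one review per blank-line-separated block."""
    cleaned = []
    for review in reviews:
        # Collapse newlines and repeated spaces so a review cannot be split internally.
        text = " ".join(review.split())
        if not text:
            # An empty review would shift the review/sentence alignment.
            raise ValueError("empty review")
        cleaned.append(text)
    return "\n\n".join(cleaned)


# Usage with a pipeline built with tokenize_no_ssplit=True:
# doc = nlp(join_reviews(list_of_reviews))
```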

In [11]:
nlp = stanza.Pipeline(lang='en', processors='tokenize,sentiment', download_method=None, tokenize_no_ssplit=True)

2023-09-12 19:25:37 INFO: Loading these models for language: en (English):
| Processor | Package  |
------------------------
| tokenize  | combined |
| sentiment | sstplus  |

2023-09-12 19:25:37 INFO: Using device: cpu
2023-09-12 19:25:37 INFO: Loading: tokenize
2023-09-12 19:25:37 INFO: Loading: sentiment
2023-09-12 19:25:37 INFO: Done loading processors!


In [12]:
doc = nlp('I could care less... The interior is just beautiful.')

In [13]:
for i, sentence in enumerate(doc.sentences):
    print("%d -> %d" % (i, sentence.sentiment))

0 -> 2
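With sentence splitting off, the whole review is scored as one unit. When several reviews are batched with `\n\n`, each review is expected to come back as one sentence, so the labels can be matched to the inputs by position. This minimal sketch assumes that one-review-per-sentence behavior and the 0/1/2 = negative/neutral/positive convention; `review_sentiments` is a hypothetical helper that fails loudly if the counts differ rather than silently misaligning labels.

```python
# Assumed label convention for Stanza's sentiment processor.
SENTIMENT_LABELS = {0: "negative", 1: "neutral", 2: "positive"}


def review_sentiments(reviews, doc):
    """Match each input review to the sentiment of its corresponding sentence."""
    sentences = doc.sentences
    if len(sentences) != len(reviews):
        # A review was split or two were merged; positional matching would be wrong.
        raise ValueError(f"expected {len(reviews)} sentences, got {len(sentences)}")
    return [
        (review, SENTIMENT_LABELS[sentence.sentiment])
        for review, sentence in zip(reviews, sentences)
    ]


# Usage with the pipeline above (tokenize_no_ssplit=True):
# reviews = ['I could care less... The interior is just beautiful.', 'Another review.']
# doc = nlp('\n\n'.join(reviews))
# print(review_sentiments(reviews, doc))
```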
