
## Installing Stanza

In [None]:
!pip install stanza

import stanza



## Downloading Models



In [None]:
# Download an English model into the default directory
print("Downloading English model...")
stanza.download('en')

# Note that you can use verbose=False to turn off all printed messages
print("Downloading Chinese model...")
stanza.download('zh', verbose=False)

Downloading English model...


Downloading https://raw.githubusercontent.com/stanfordnlp/stanza-resources/master/resources_1.2.0.json: 128kB [00:00, 20.9MB/s]                    
2021-03-01 06:05:13 INFO: Downloading default packages for language: en (English)...
Downloading http://nlp.stanford.edu/software/stanza/1.2.0/en/default.zip: 100%|██████████| 411M/411M [00:46<00:00, 8.76MB/s]
2021-03-01 06:06:07 INFO: Finished downloading models and saved to /root/stanza_resources.


Downloading Chinese model...


## Processing Text


### Constructing Pipeline



In [None]:
# Build an English pipeline, with all processors by default
print("Building an English pipeline...")
en_nlp = stanza.Pipeline('en')

# Build a Chinese pipeline, with customized processor list and no logging, and force it to use CPU
print("Building a Chinese pipeline...")
zh_nlp = stanza.Pipeline('zh', processors='tokenize,lemma,pos,depparse', verbose=False, use_gpu=False)

Building an English pipeline...
Building a Chinese pipeline...


### Annotating Text

Once a pipeline has been constructed, you can annotate a piece of text by passing the string to the pipeline object. The pipeline returns a `Document` object, from which the detailed annotations can be accessed. For example:


In [None]:
# Processing English text
en_doc = en_nlp("Barack Obama was born in Hawaii.  He was elected president in 2008.")
print(type(en_doc))

# Processing Chinese text
zh_doc = zh_nlp("达沃斯世界经济论坛是每年全球政商界领袖聚在一起的年度盛事。")
print(type(zh_doc))

<class 'stanza.models.common.doc.Document'>
<class 'stanza.models.common.doc.Document'>


## Accessing Annotations

Annotations can be accessed from the returned `Document` object.


In [None]:
for i, sent in enumerate(en_doc.sentences):
    print("[Sentence {}]".format(i+1))
    for word in sent.words:
        print("{:12s}\t{:12s}\t{:6s}\t{:d}\t{:12s}".format(\
              word.text, word.lemma, word.pos, word.head, word.deprel))
    print("")

[Sentence 1]
Barack      	Barack      	PROPN 	4	nsubj:pass  
Obama       	Obama       	PROPN 	1	flat        
was         	be          	AUX   	4	aux:pass    
born        	bear        	VERB  	0	root        
in          	in          	ADP   	6	case        
Hawaii      	Hawaii      	PROPN 	4	obl         
.           	.           	PUNCT 	4	punct       

[Sentence 2]
He          	he          	PRON  	3	nsubj:pass  
was         	be          	AUX   	3	aux:pass    
elected     	elect       	VERB  	0	root        
president   	president   	NOUN  	3	xcomp       
in          	in          	ADP   	6	case        
2008        	2008        	NUM   	3	obl         
.           	.           	PUNCT 	3	punct       
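The `head` column printed above is a 1-based index into the same sentence's word list, with 0 reserved for the root. As a minimal sketch that runs without Stanza, the rows for the first sentence are copied by hand into plain tuples and each head index is resolved to its governing word. The `rows` list and the `head_text` helper are illustrative names, not part of Stanza.

```python
# Rows copied from the printed output for sentence 1: (text, head, deprel)
rows = [
    ("Barack", 4, "nsubj:pass"),
    ("Obama", 1, "flat"),
    ("was", 4, "aux:pass"),
    ("born", 0, "root"),
    ("in", 6, "case"),
    ("Hawaii", 4, "obl"),
    (".", 4, "punct"),
]

def head_text(rows, head):
    """Return the text of the governing word; head 0 means the root."""
    return "ROOT" if head == 0 else rows[head - 1][0]

for text, head, deprel in rows:
    print(f"{text:8s} --{deprel}--> {head_text(rows, head)}")
```

With a live pipeline, the same lookup would be `sent.words[word.head - 1].text` for any word whose `head` is greater than 0.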



The following example iterates over all extracted named entity mentions and prints their character spans and types.

In [None]:
print("Mention text\tType\tStart-End")
for ent in en_doc.ents:
    print("{}\t{}\t{}-{}".format(ent.text, ent.type, ent.start_char, ent.end_char))

Mention text	Type	Start-End
Barack Obama	PERSON	0-12
Hawaii	GPE	25-31
2008	DATE	62-66
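The `start_char` and `end_char` values are offsets into the original input string, and the end offset is exclusive. A small sketch, using only the input text and the spans copied from the output above (no Stanza call is made, and the `spans` and `mentions` names are illustrative), shows that an ordinary slice recovers each mention:

```python
text = "Barack Obama was born in Hawaii.  He was elected president in 2008."

# (type, start_char, end_char) copied from the output above
spans = [("PERSON", 0, 12), ("GPE", 25, 31), ("DATE", 62, 66)]

# end_char is exclusive, so a plain Python slice returns the mention text
mentions = [(text[start:end], label) for label, start, end in spans]
for mention, label in mentions:
    print(f"{mention}\t{label}")
```

Because the offsets refer to the raw string, the double space after "Hawaii." counts toward the positions of everything that follows it.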


The same word-level loop works for the Chinese document:

In [None]:
for i, sent in enumerate(zh_doc.sentences):
    print("[Sentence {}]".format(i+1))
    for word in sent.words:
        print("{:12s}\t{:12s}\t{:6s}\t{:d}\t{:12s}".format(\
              word.text, word.lemma, word.pos, word.head, word.deprel))
    print("")

[Sentence 1]
达沃斯         	达沃斯         	PROPN 	4	nmod        
世界          	世界          	NOUN  	4	nmod        
经济          	经济          	NOUN  	4	nmod        
论坛          	论坛          	NOUN  	16	nsubj       
是           	是           	AUX   	16	cop         
每年          	每年          	DET   	10	det         
全           	全           	DET   	8	det         
球政          	球政          	NOUN  	10	nmod        
商界          	商界          	NOUN  	10	nmod        
领袖          	领袖          	NOUN  	11	nsubj       
聚           	聚           	VERB  	16	acl:relcl   
在           	在           	VERB  	11	mark        
一起          	一起          	NOUN  	11	obj         
的           	的           	PART  	11	mark:relcl  
年度          	年度          	NOUN  	16	nmod        
盛事          	盛事          	NOUN  	0	root        
。           	。           	PUNCT 	16	punct       



Alternatively, you can directly print a `Word` object to view all its annotations as a Python dict:

In [None]:
word = en_doc.sentences[0].words[0]
print(word)

{
  "id": 1,
  "text": "Barack",
  "lemma": "Barack",
  "upos": "PROPN",
  "xpos": "NNP",
  "feats": "Number=Sing",
  "head": 4,
  "deprel": "nsubj:pass",
  "misc": "start_char=0|end_char=6"
}
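In the output above, `feats` and `misc` are stored as pipe-separated `key=value` strings rather than nested objects. The helper below is a hypothetical convenience (the name `parse_kv` is not a Stanza function) that sketches one way to turn such a string into a regular dict:

```python
def parse_kv(field):
    """Split a 'key=value|key=value' string into a dict of strings."""
    if not field:
        return {}
    return dict(pair.split("=", 1) for pair in field.split("|"))

# Strings copied from the printed Word above
misc = parse_kv("start_char=0|end_char=6")
feats = parse_kv("Number=Sing")
print(int(misc["start_char"]), int(misc["end_char"]), feats)
```

The values stay as strings, so numeric fields such as the character offsets need an explicit `int()` conversion.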


In [None]:
stanza.download(lang="en")
nlp = stanza.Pipeline(lang="en")
doc = nlp("Question answering is a task where a sentence or sample of text is provided from which questions are asked and must be answered.")
for sentence in doc.sentences:
    print(sentence.ents)
    print(sentence.dependencies)

Downloading https://raw.githubusercontent.com/stanfordnlp/stanza-resources/master/resources_1.2.0.json: 128kB [00:00, 20.7MB/s]                    
2021-03-01 12:01:08 INFO: Downloading default packages for language: en (English)...
2021-03-01 12:01:09 INFO: File exists: /root/stanza_resources/en/default.zip.
2021-03-01 12:01:14 INFO: Finished downloading models and saved to /root/stanza_resources.
2021-03-01 12:01:14 INFO: Loading these models for language: en (English):
| Processor | Package   |
-------------------------
| tokenize  | combined  |
| pos       | combined  |
| lemma     | combined  |
| depparse  | combined  |
| sentiment | sstplus   |
| ner       | ontonotes |

2021-03-01 12:01:14 INFO: Use device: cpu
2021-03-01 12:01:14 INFO: Loading: tokenize
2021-03-01 12:01:14 INFO: Loading: pos
2021-03-01 12:01:15 INFO: Loading: lemma
2021-03-01 12:01:15 INFO: Loading: depparse
2021-03-01 12:01:15 INFO: Loading: sentiment
2021-03-01 12:01:16 INFO: Loading: ner
2021-03-01 12:01:17 

[]
[({
  "id": 2,
  "text": "answering",
  "lemma": "answer",
  "upos": "NOUN",
  "xpos": "NN",
  "feats": "Number=Sing",
  "head": 5,
  "deprel": "nsubj",
  "misc": "start_char=9|end_char=18"
}, 'compound', {
  "id": 1,
  "text": "Question",
  "lemma": "question",
  "upos": "NOUN",
  "xpos": "NN",
  "feats": "Number=Sing",
  "head": 2,
  "deprel": "compound",
  "misc": "start_char=0|end_char=8"
}), ({
  "id": 5,
  "text": "task",
  "lemma": "task",
  "upos": "NOUN",
  "xpos": "NN",
  "feats": "Number=Sing",
  "head": 0,
  "deprel": "root",
  "misc": "start_char=24|end_char=28"
}, 'nsubj', {
  "id": 2,
  "text": "answering",
  "lemma": "answer",
  "upos": "NOUN",
  "xpos": "NN",
  "feats": "Number=Sing",
  "head": 5,
  "deprel": "nsubj",
  "misc": "start_char=9|end_char=18"
}), ({
  "id": 5,
  "text": "task",
  "lemma": "task",
  "upos": "NOUN",
  "xpos": "NN",
  "feats": "Number=Sing",
  "head": 0,
  "deprel": "root",
  "misc": "start_char=24|end_char=28"
}, 'cop', {
  "id": 3,
  "tex

In [None]:
for sentence in doc.sentences:
    for word in sentence.words:
        print("{:12s}\t{:12s}\t{:6s}".format(word.text, word.lemma, word.pos))

Question    	question    	NOUN  
answering   	answer      	NOUN  
is          	be          	AUX   
a           	a           	DET   
task        	task        	NOUN  
where       	where       	SCONJ 
a           	a           	DET   
sentence    	sentence    	NOUN  
or          	or          	CCONJ 
sample      	sample      	NOUN  
of          	of          	ADP   
text        	text        	NOUN  
is          	be          	AUX   
provided    	provide     	VERB  
from        	from        	ADP   
which       	which       	DET   
questions   	question    	NOUN  
are         	be          	AUX   
asked       	ask         	VERB  
and         	and         	CCONJ 
must        	must        	AUX   
be          	be          	AUX   
answered    	answer      	VERB  
.           	.           	PUNCT 
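A quick way to summarize a tagged sentence is to count how often each part-of-speech tag occurs. The sketch below does this with the tags copied by hand from the output above, so it runs without loading a model. The `tags` and `counts` names are illustrative.

```python
from collections import Counter

# UPOS tags copied in order from the output above
tags = ("NOUN NOUN AUX DET NOUN SCONJ DET NOUN CCONJ NOUN ADP NOUN "
        "AUX VERB ADP DET NOUN AUX VERB CCONJ AUX AUX VERB PUNCT").split()

counts = Counter(tags)
for tag, n in counts.most_common():
    print(f"{tag:6s}\t{n}")
```

With a live pipeline, the same counter could presumably be built directly from the document, for example `Counter(word.upos for sent in doc.sentences for word in sent.words)`.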


In [None]:
stanza.download('hi')
hi_nlp = stanza.Pipeline('hi')

Downloading https://raw.githubusercontent.com/stanfordnlp/stanza-resources/master/resources_1.2.0.json: 128kB [00:00, 27.7MB/s]                    
2021-03-01 10:14:20 INFO: Downloading default packages for language: hi (Hindi)...
2021-03-01 10:14:21 INFO: File exists: /root/stanza_resources/hi/default.zip.
2021-03-01 10:14:23 INFO: Finished downloading models and saved to /root/stanza_resources.
2021-03-01 10:14:23 INFO: Loading these models for language: hi (Hindi):
| Processor | Package |
-----------------------
| tokenize  | hdtb    |
| pos       | hdtb    |
| lemma     | hdtb    |
| depparse  | hdtb    |

2021-03-01 10:14:23 INFO: Use device: cpu
2021-03-01 10:14:23 INFO: Loading: tokenize
2021-03-01 10:14:23 INFO: Loading: pos
2021-03-01 10:14:23 INFO: Loading: lemma
2021-03-01 10:14:23 INFO: Loading: depparse
2021-03-01 10:14:24 INFO: Done loading processors!


In [None]:
hindi_doc = hi_nlp("प्रश्न का उत्तर देना एक ऐसा कार्य है जहाँ एक वाक्य या पाठ का नमूना प्रदान किया जाता है जहाँ से प्रश्न पूछे जाते हैं और उसका उत्तर दिया जाना चाहिए।")


In [None]:
for sentence in hindi_doc.sentences:
    for word in sentence.words:
        print("{:12s}\t{:12s}\t{:6s}".format(word.text, word.lemma, word.pos))

प्रश्न      	प्रश्न      	NOUN  
का          	का          	ADP   
उत्तर       	उत्तर       	NOUN  
देना        	दे          	VERB  
एक          	एक          	NUM   
ऐसा         	ऐसा         	DET   
कार्य       	कार्य       	NOUN  
है          	है          	VERB  
जहाँ        	जहाँ        	PRON  
एक          	एक          	NUM   
वाक्य       	वाक्य       	NOUN  
या          	या          	CCONJ 
पाठ         	पाठ         	NOUN  
का          	का          	ADP   
नमूना       	नमूना       	NOUN  
प्रदान      	प्रदान      	NOUN  
किया        	कर          	VERB  
जाता        	जा          	AUX   
है          	है          	AUX   
जहाँ        	जहाँ        	PRON  
से          	से          	ADP   
प्रश्न      	प्रश्न      	NOUN  
पूछे        	पूछ         	VERB  
जाते        	जा          	AUX   
हैं         	है          	AUX   
और          	और          	CCONJ 
उसका        	वह          	PRON  
उत्तर       	उत्तर       	NOUN  
दिया        	दे          	VERB  
जाना        	जा          	AUX   
चाहिए     

In [None]:
for sentence in hindi_doc.sentences:
    print(sentence.ents)
    print(sentence.dependencies)

[]
[({
  "id": 3,
  "text": "उत्तर",
  "lemma": "उत्तर",
  "upos": "NOUN",
  "xpos": "NN",
  "feats": "Case=Nom|Gender=Masc|Number=Sing|Person=3",
  "head": 4,
  "deprel": "obj",
  "misc": "start_char=10|end_char=15"
}, 'nmod', {
  "id": 1,
  "text": "प्रश्न",
  "lemma": "प्रश्न",
  "upos": "NOUN",
  "xpos": "NN",
  "feats": "Case=Acc|Gender=Masc|Number=Sing|Person=3",
  "head": 3,
  "deprel": "nmod",
  "misc": "start_char=0|end_char=6"
}), ({
  "id": 1,
  "text": "प्रश्न",
  "lemma": "प्रश्न",
  "upos": "NOUN",
  "xpos": "NN",
  "feats": "Case=Acc|Gender=Masc|Number=Sing|Person=3",
  "head": 3,
  "deprel": "nmod",
  "misc": "start_char=0|end_char=6"
}, 'case', {
  "id": 2,
  "text": "का",
  "lemma": "का",
  "upos": "ADP",
  "xpos": "PSP",
  "feats": "AdpType=Post|Case=Nom|Gender=Masc|Number=Sing",
  "head": 1,
  "deprel": "case",
  "misc": "start_char=7|end_char=9"
}), ({
  "id": 4,
  "text": "देना",
  "lemma": "दे",
  "upos": "VERB",
  "xpos": "VM",
  "feats": "Case=Nom|VerbForm=Inf"

In [None]:
for i, sent in enumerate(hindi_doc.sentences):
    print("[Sentence {}]".format(i+1))
    for word in sent.words:
        print("{:12s}\t{:12s}\t{:6s}\t{:d}\t{:12s}".format(\
              word.text, word.lemma, word.pos, word.head, word.deprel))
    print("")

[Sentence 1]
प्रश्न      	प्रश्न      	NOUN  	3	nmod        
का          	का          	ADP   	1	case        
उत्तर       	उत्तर       	NOUN  	4	obj         
देना        	दे          	VERB  	8	nsubj       
एक          	एक          	NUM   	7	nummod      
ऐसा         	ऐसा         	DET   	7	det         
कार्य       	कार्य       	NOUN  	8	obj         
है          	है          	VERB  	0	root        
जहाँ        	जहाँ        	PRON  	17	obl         
एक          	एक          	NUM   	11	nummod      
वाक्य       	वाक्य       	NOUN  	15	nmod        
या          	या          	CCONJ 	13	cc          
पाठ         	पाठ         	NOUN  	11	conj        
का          	का          	ADP   	11	case        
नमूना       	नमूना       	NOUN  	17	obj         
प्रदान      	प्रदान      	NOUN  	17	compound    
किया        	कर          	VERB  	7	acl:relcl   
जाता        	जा          	AUX   	17	aux:pass    
है          	है          	AUX   	17	aux:pass    
जहाँ        	जहाँ        	PRON  	23	obl         
से          	से 

In [None]:
stanza.download('en', package='partut', verbose=True)

Downloading https://raw.githubusercontent.com/stanfordnlp/stanza-resources/master/resources_1.2.0.json: 128kB [00:00, 21.8MB/s]                    
2021-03-01 09:21:03 INFO: Downloading these customized packages for language: en (English)...
| Processor | Package |
-----------------------
| tokenize  | partut  |
| mwt       | partut  |
| pos       | partut  |
| lemma     | partut  |
| depparse  | partut  |
| pretrain  | partut  |

2021-03-01 09:21:03 INFO: File exists: /root/stanza_resources/en/tokenize/partut.pt.
2021-03-01 09:21:03 INFO: File exists: /root/stanza_resources/en/mwt/partut.pt.
2021-03-01 09:21:03 INFO: File exists: /root/stanza_resources/en/pos/partut.pt.
2021-03-01 09:21:03 INFO: File exists: /root/stanza_resources/en/lemma/partut.pt.


2021-03-01 09:21:03 INFO: File exists: /root/stanza_resources/en/depparse/partut.pt.
2021-03-01 09:21:04 INFO: File exists: /root/stanza_resources/en/pretrain/partut.pt.
2021-03-01 09:21:04 INFO: Finished downloading models and saved to /root/stanza_resources.


In [None]:
stanza.download('en', package='lines', processors='tokenize,pos', verbose=True)

Downloading https://raw.githubusercontent.com/stanfordnlp/stanza-resources/master/resources_1.2.0.json: 128kB [00:00, 28.5MB/s]                    
2021-03-01 09:27:12 INFO: Downloading these customized packages for language: en (English)...
| Processor | Package |
-----------------------
| tokenize  | lines   |
| pos       | lines   |
| pretrain  | lines   |



Downloading http://nlp.stanford.edu/software/stanza/1.2.0/en/tokenize/lines.pt: 100%|██████████| 626k/626k [00:00<00:00, 3.12MB/s]
Downloading http://nlp.stanford.edu/software/stanza/1.2.0/en/pos/lines.pt: 100%|██████████| 18.3M/18.3M [00:00<00:00, 20.8MB/s]
Downloading http://nlp.stanford.edu/software/stanza/1.2.0/en/pretrain/lines.pt: 100%|██████████| 107M/107M [00:24<00:00, 4.36MB/s]
2021-03-01 09:27:38 INFO: Finished downloading models and saved to /root/stanza_resources.


In [None]:
stanza.download('en', package='partut', verbose=True)
nlp = stanza.Pipeline('en', package='partut')

Downloading https://raw.githubusercontent.com/stanfordnlp/stanza-resources/master/resources_1.2.0.json: 128kB [00:00, 29.8MB/s]                    
2021-03-01 09:33:34 INFO: Downloading these customized packages for language: en (English)...
| Processor | Package |
-----------------------
| tokenize  | partut  |
| mwt       | partut  |
| pos       | partut  |
| lemma     | partut  |
| depparse  | partut  |
| pretrain  | partut  |

2021-03-01 09:33:35 INFO: File exists: /root/stanza_resources/en/tokenize/partut.pt.
2021-03-01 09:33:35 INFO: File exists: /root/stanza_resources/en/mwt/partut.pt.
2021-03-01 09:33:35 INFO: File exists: /root/stanza_resources/en/pos/partut.pt.
2021-03-01 09:33:35 INFO: File exists: /root/stanza_resources/en/lemma/partut.pt.
2021-03-01 09:33:35 INFO: File exists: /root/stanza_resources/en/depparse/partut.pt.
2021-03-01 09:33:35 INFO: File exists: /root/stanza_resources/en/pretrain/partut.pt.
2021-03-01 09:33:35 INFO: Finished downloading models and saved to /

In [None]:
type(nlp)

stanza.pipeline.core.Pipeline

In [None]:
stanza.download('hi', processors='tokenize,pos')
nlp = stanza.Pipeline('hi', processors='tokenize,pos', verbose=True)

Downloading https://raw.githubusercontent.com/stanfordnlp/stanza-resources/master/resources_1.2.0.json: 128kB [00:00, 26.3MB/s]                    
2021-03-01 10:08:13 INFO: Downloading these customized packages for language: hi (Hindi)...
| Processor | Package |
-----------------------
| tokenize  | hdtb    |
| pos       | hdtb    |
| pretrain  | hdtb    |

2021-03-01 10:08:13 INFO: File exists: /root/stanza_resources/hi/tokenize/hdtb.pt.
2021-03-01 10:08:13 INFO: File exists: /root/stanza_resources/hi/pos/hdtb.pt.
2021-03-01 10:08:13 INFO: File exists: /root/stanza_resources/hi/pretrain/hdtb.pt.
2021-03-01 10:08:13 INFO: Finished downloading models and saved to /root/stanza_resources.
2021-03-01 10:08:13 INFO: Loading these models for language: hi (Hindi):
| Processor | Package |
-----------------------
| tokenize  | hdtb    |
| pos       | hdtb    |

2021-03-01 10:08:13 INFO: Use device: cpu
2021-03-01 10:08:13 INFO: Loading: tokenize
2021-03-01 10:08:13 INFO: Loading: pos
2021-03-0

In [None]:
stanza.download('it', processors='tokenize,mwt', package='twittiro')
nlp = stanza.Pipeline('it', processors='tokenize,mwt', package='twittiro')


Downloading https://raw.githubusercontent.com/stanfordnlp/stanza-resources/master/resources_1.2.0.json: 128kB [00:00, 25.9MB/s]                    
2021-03-01 09:50:23 INFO: Downloading these customized packages for language: it (Italian)...
| Processor | Package  |
------------------------
| tokenize  | twittiro |
| mwt       | twittiro |

Downloading http://nlp.stanford.edu/software/stanza/1.2.0/it/tokenize/twittiro.pt: 100%|██████████| 632k/632k [00:00<00:00, 2.94MB/s]
Downloading http://nlp.stanford.edu/software/stanza/1.2.0/it/mwt/twittiro.pt: 100%|██████████| 564k/564k [00:00<00:00, 1.16MB/s]
2021-03-01 09:50:24 INFO: Finished downloading models and saved to /root/stanza_resources.
2021-03-01 09:50:24 INFO: Loading these models for language: it (Italian):
| Processor | Package  |
------------------------
| tokenize  | twittiro |
| mwt       | twittiro |

2021-03-01 09:50:24 INFO: Use device: cpu
2021-03-01 09:50:24 INFO: Loading: tokenize
2021-03-01 09:50:24 INFO: Loading: mwt
20

In [None]:
stanza.download('nl', processors={'ner': 'conll02'})
nlp = stanza.Pipeline('nl', processors={'ner': 'conll02'})

Downloading https://raw.githubusercontent.com/stanfordnlp/stanza-resources/master/resources_1.2.0.json: 128kB [00:00, 37.7MB/s]                    
2021-03-01 09:58:39 INFO: Downloading these customized packages for language: nl (Dutch)...
| Processor       | Package |
-----------------------------
| tokenize        | alpino  |
| pos             | alpino  |
| lemma           | alpino  |
| depparse        | alpino  |
| ner             | conll02 |
| forward_charlm  | ccwiki  |
| pretrain        | alpino  |
| backward_charlm | ccwiki  |

Downloading http://nlp.stanford.edu/software/stanza/1.2.0/nl/tokenize/alpino.pt: 100%|██████████| 628k/628k [00:00<00:00, 1.27MB/s]
Downloading http://nlp.stanford.edu/software/stanza/1.2.0/nl/pos/alpino.pt: 100%|██████████| 20.4M/20.4M [00:04<00:00, 4.17MB/s]
Downloading http://nlp.stanford.edu/software/stanza/1.2.0/nl/lemma/alpino.pt: 100%|██████████| 3.80M/3.80M [00:00<00:00, 9.85MB/s]
Downloading http://nlp.stanford.edu/software/stanza/1.2.0/nl/deppar

In [None]:
processor_dict = {
    'tokenize': 'gsd',
    'pos': 'hdt',
    'ner': 'conll03',
    'lemma': 'default'
}
stanza.download('de', processors=processor_dict, package=None)
nlp = stanza.Pipeline('de', processors=processor_dict, package=None)

Downloading https://raw.githubusercontent.com/stanfordnlp/stanza-resources/master/resources_1.2.0.json: 128kB [00:00, 28.0MB/s]                    
2021-03-01 10:08:16 INFO: Downloading these customized packages for language: de (German)...
| Processor       | Package  |
------------------------------
| tokenize        | gsd      |
| mwt             | gsd      |
| pos             | hdt      |
| lemma           | gsd      |
| ner             | conll03  |
| backward_charlm | newswiki |
| pretrain        | hdt      |
| forward_charlm  | newswiki |

2021-03-01 10:08:16 INFO: File exists: /root/stanza_resources/de/tokenize/gsd.pt.
2021-03-01 10:08:16 INFO: File exists: /root/stanza_resources/de/mwt/gsd.pt.
2021-03-01 10:08:16 INFO: File exists: /root/stanza_resources/de/pos/hdt.pt.
2021-03-01 10:08:16 INFO: File exists: /root/stanza_resources/de/lemma/gsd.pt.
2021-03-01 10:08:17 INFO: File exists: /root/stanza_resources/de/ner/conll03.pt.
2021-03-01 10:08:17 INFO: File exists: /root/stanza_

In [None]:
from stanza.pipeline.processor import Processor, register_processor, register_processor_variant

@register_processor("lowercase")
class LowercaseProcessor(Processor):
    ''' Processor that lowercases all text '''
    _requires = set(['tokenize'])
    _provides = set(['lowercase'])

    def __init__(self, config, pipeline, use_gpu):
        pass

    def _set_up_model(self, *args):
        pass

    def process(self, doc):
        doc.text = doc.text.lower()
        for sent in doc.sentences:
            for tok in sent.tokens:
                tok.text = tok.text.lower()

            for word in sent.words:
                word.text = word.text.lower()

        return doc
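The `@register_processor("lowercase")` decorator above makes the class available under the name `lowercase` in a pipeline's `processors` string. How Stanza implements this internally is not shown here. The following is only a generic sketch of the name-to-class registry pattern that such decorators typically follow, and `REGISTRY`, `register`, `Lowercase`, and `run` are all hypothetical names.

```python
# Hypothetical stand-in for a name -> class registry; not Stanza's real code
REGISTRY = {}

def register(name):
    """Return a class decorator that stores the class under `name`."""
    def decorator(cls):
        REGISTRY[name] = cls
        return cls
    return decorator

@register("lowercase")
class Lowercase:
    def process(self, text):
        return text.lower()

def run(processors, text):
    """Apply each named processor in order, like a tiny pipeline."""
    for name in processors.split(","):
        text = REGISTRY[name]().process(text)
    return text

print(run("lowercase", "Question Answering"))
```

The point of the pattern is that the pipeline only needs the processor's name, which is why the custom class can be requested simply by adding `lowercase` to the processor list.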

In [None]:
nlp = stanza.Pipeline(lang='en', processors='tokenize,lowercase')

doc = nlp("Question answering is a task where a sentence or sample of text is provided from which questions are asked and must be answered.")


2021-03-01 12:01:21 INFO: Loading these models for language: en (English):
| Processor | Package  |
------------------------
| tokenize  | combined |
| lowercase | default  |

2021-03-01 12:01:21 INFO: Use device: cpu
2021-03-01 12:01:21 INFO: Loading: tokenize
2021-03-01 12:01:21 INFO: Loading: lowercase
2021-03-01 12:01:21 INFO: Done loading processors!


In [None]:
# Collect the lowercased word texts and join them back into one string
s = []
for sentence in doc.sentences:
    for word in sentence.words:
        s.append(word.text)
print(" ".join(s))

question answering is a task where a sentence or sample of text is provided from which questions are asked and must be answered .


In [None]:
import stanza
stanza.download('en', package='craft')
nlp = stanza.Pipeline('en', package='craft')
doc = nlp('A single-cell transcriptomic atlas characterizes ageing tissues in the mouse.')
doc.sentences[0].print_dependencies()

Downloading https://raw.githubusercontent.com/stanfordnlp/stanza-resources/master/resources_1.2.0.json: 128kB [00:00, 23.9MB/s]                    
2021-03-01 11:24:29 INFO: Downloading these customized packages for language: en (English)...
| Processor | Package |
-----------------------
| tokenize  | craft   |
| pos       | craft   |
| lemma     | craft   |
| depparse  | craft   |
| pretrain  | craft   |

Downloading http://nlp.stanford.edu/software/stanza/1.2.0/en/tokenize/craft.pt: 100%|██████████| 637k/637k [00:00<00:00, 2.97MB/s]
Downloading http://nlp.stanford.edu/software/stanza/1.2.0/en/pos/craft.pt: 100%|██████████| 21.6M/21.6M [00:01<00:00, 19.3MB/s]
Downloading http://nlp.stanford.edu/software/stanza/1.2.0/en/lemma/craft.pt: 100%|██████████| 4.55M/4.55M [00:00<00:00, 9.79MB/s]
Downloading http://nlp.stanford.edu/software/stanza/1.2.0/en/depparse/craft.pt: 100%|██████████| 110M/110M [00:08<00:00, 12.7MB/s]
Downloading http://nlp.stanford.edu/software/stanza/1.2.0/en/pretrain

('A', 6, 'det')
('single', 4, 'amod')
('-', 4, 'punct')
('cell', 6, 'compound')
('transcriptomic', 6, 'amod')
('atlas', 7, 'nsubj')
('characterizes', 0, 'root')
('ageing', 9, 'compound')
('tissues', 7, 'obj')
('in', 12, 'case')
('the', 12, 'det')
('mouse', 7, 'obl')
('.', 7, 'punct')
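Each triple printed above is `(word text, head index, relation)`, where the head index is 1-based and 0 marks the root. As a sketch that needs no model, the triples are copied into a list and grouped by head so that the direct dependents of any word can be looked up. The `deps`, `children`, and `root_id` names are illustrative.

```python
from collections import defaultdict

# (text, head, deprel) triples copied from the print_dependencies() output above
deps = [
    ("A", 6, "det"), ("single", 4, "amod"), ("-", 4, "punct"),
    ("cell", 6, "compound"), ("transcriptomic", 6, "amod"),
    ("atlas", 7, "nsubj"), ("characterizes", 0, "root"),
    ("ageing", 9, "compound"), ("tissues", 7, "obj"),
    ("in", 12, "case"), ("the", 12, "det"), ("mouse", 7, "obl"),
    (".", 7, "punct"),
]

# Map each head id (1-based; 0 is the artificial root) to its dependents
children = defaultdict(list)
for text, head, deprel in deps:
    children[head].append((text, deprel))

# The root word is the one whose head is 0
root_id = next(i for i, (_, head, _) in enumerate(deps, start=1) if head == 0)
print(deps[root_id - 1][0], children[root_id])
```

Here the root verb "characterizes" governs its subject "atlas", its object "tissues", the oblique "mouse", and the final punctuation.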


In [None]:
# mimic pipeline with an i2b2 NER model
stanza.download('en', package='mimic', processors={'ner': 'i2b2'})
nlp = stanza.Pipeline('en', package='mimic', processors={'ner': 'i2b2'})
# annotate clinical text
doc = nlp('The patient had a dry cough and fever, they were treated with Paracetamol.')
# print out the entities
for ent in doc.entities:
    print(f'{ent.text}\t{ent.type}')

Downloading https://raw.githubusercontent.com/stanfordnlp/stanza-resources/master/resources_1.2.0.json: 128kB [00:00, 25.0MB/s]                    
2021-03-01 11:36:58 INFO: Downloading these customized packages for language: en (English)...
| Processor       | Package |
-----------------------------
| tokenize        | mimic   |
| pos             | mimic   |
| lemma           | mimic   |
| depparse        | mimic   |
| ner             | i2b2    |
| forward_charlm  | mimic   |
| pretrain        | mimic   |
| backward_charlm | mimic   |

2021-03-01 11:36:58 INFO: File exists: /root/stanza_resources/en/tokenize/mimic.pt.
2021-03-01 11:36:58 INFO: File exists: /root/stanza_resources/en/pos/mimic.pt.
2021-03-01 11:36:58 INFO: File exists: /root/stanza_resources/en/lemma/mimic.pt.
2021-03-01 11:36:58 INFO: File exists: /root/stanza_resources/en/depparse/mimic.pt.
2021-03-01 11:36:58 INFO: File exists: /root/stanza_resources/en/ner/i2b2.pt.
2021-03-01 11:36:58 INFO: File exists: /root/stanza

a dry cough	PROBLEM
fever	PROBLEM
Paracetamol	TREATMENT
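For clinical text it is often handy to group the extracted mentions by entity type. The sketch below does this with the three mentions copied from the output above, so it runs without the `mimic` and `i2b2` models. The `ents` and `by_type` names are illustrative.

```python
from collections import defaultdict

# (text, type) pairs copied from the output above
ents = [
    ("a dry cough", "PROBLEM"),
    ("fever", "PROBLEM"),
    ("Paracetamol", "TREATMENT"),
]

by_type = defaultdict(list)
for text, label in ents:
    by_type[label].append(text)

for label, texts in by_type.items():
    print(f"{label}: {', '.join(texts)}")
```

With a live pipeline, the pairs could be taken from `(ent.text, ent.type)` for each `ent` in `doc.entities`, as in the cell above.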
