# Stanford CoreNLP Tutorial

## Step 1: Install CoreNLP

First, we need to download CoreNLP.
The command below downloads the CoreNLP release (3.9.2, as of February 2019) together with the English models jar.

**This may take a while, depending on your internet connection speed... have a coffee ;)**

In [None]:
!wget https://nlp.stanford.edu/software/stanford-corenlp-full-2018-10-05.zip https://nlp.stanford.edu/software/stanford-english-corenlp-2018-10-05-models.jar

When the download has finished, unzip the archive with the following command:

In [None]:
!unzip stanford-corenlp-full-2018-10-05.zip 

**In the terminal, copy and paste the command below to move the English models jar into the CoreNLP directory:**

In [None]:
mv stanford-english-corenlp-2018-10-05-models.jar stanford-corenlp-full-2018-10-05 

Set the `CLASSPATH` and `CORENLP_HOME` environment variables to the location of the extracted Stanford CoreNLP directory:

In [1]:
import os
os.environ["CLASSPATH"]="/home/elvis/BIG-Oil-NLP/SP/stanford-corenlp-full-2018-10-05"
os.environ["CORENLP_HOME"]="/home/elvis/BIG-Oil-NLP/SP/stanford-corenlp-full-2018-10-05"
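
Before moving on, it can help to confirm that the configured directory really contains the CoreNLP jars. The sketch below is only a sanity check, assuming the jars sit directly inside the directory that `CORENLP_HOME` points to; the helper name `find_corenlp_jars` is hypothetical and not part of any library.

```python
import glob
import os


def find_corenlp_jars(corenlp_home):
    """Return the sorted list of .jar files found directly inside corenlp_home."""
    if not corenlp_home or not os.path.isdir(corenlp_home):
        return []
    return sorted(glob.glob(os.path.join(corenlp_home, "*.jar")))


if __name__ == "__main__":
    home = os.environ.get("CORENLP_HOME", "")
    jars = find_corenlp_jars(home)
    if jars:
        print(f"Found {len(jars)} jar file(s) in {home}")
    else:
        print("No jar files found; check the CORENLP_HOME path")
```

An empty result usually means the path is wrong or the archive was extracted somewhere else.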

## Step 2: Start the server and install the Python API

Before doing anything else, we need to install the library. We assume that Python 3.6 or later is already installed. As explained by the developers, the simplest way to install StanfordNLP is with pip:

In [None]:
!pip3 install stanfordnlp


Open a terminal and paste the commands below.
The `-mx8g` parameter specifies how much memory CoreNLP may use, in this case eight gigabytes. The `-timeout 5000` parameter sets the timeout in milliseconds.

In [None]:
cd stanford-corenlp-full-2018-10-05
java -mx8g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -timeout 5000

To shut the server down, go to the terminal window used to start it and press __Ctrl + C__.
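
While the server is running, it can be queried over HTTP. The following is a minimal sketch, assuming the server listens on its default port 9000 on `localhost` (the start command does not set a port); the helper names `build_request_url` and `annotate` are hypothetical. The annotator list travels as a JSON `properties` parameter in the URL and the text goes in the request body.

```python
import json
import urllib.parse
import urllib.request


def build_request_url(annotators, base_url="http://localhost:9000"):
    """Build the server URL with the annotators encoded as JSON properties."""
    properties = {"annotators": annotators, "outputFormat": "json"}
    query = urllib.parse.urlencode({"properties": json.dumps(properties)})
    return f"{base_url}/?{query}"


def annotate(text, annotators="tokenize,ssplit,pos", base_url="http://localhost:9000"):
    """POST the text to a running CoreNLP server and return the parsed JSON reply."""
    url = build_request_url(annotators, base_url)
    request = urllib.request.Request(url, data=text.encode("utf-8"), method="POST")
    # Client-side timeout in seconds; independent of the server's -timeout value
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the English models jar in place, a call such as `annotate("Stanford University is in California.")` should return a dictionary describing the sentences and their tokens; the call fails with a connection error if the server is not running.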

## Parsing with CoreNLP

Next, we download the models for the languages we want to work with.

**Each language only needs to be downloaded once**

**Each language requires more than 1 GB of disk space**

**It takes time... have another coffee ;)**

In [2]:
import stanfordnlp

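MODELS_DIR = '.'  # directory where the models are stored; the pipeline below reads it
# stanfordnlp.download('pt', MODELS_DIR)  # uncomment on the first run to download the Portuguese models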
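
Since each language only has to be downloaded once, the download call can be guarded by a check on the models directory. This is a sketch under one assumption: that the models for a treebank live in a folder named `<treebank>_models`, which is inferred from the `pt_bosque_models` paths that appear in the loading log later in this tutorial. The helper names `models_present` and `download_once` are hypothetical.

```python
import os


def models_present(models_dir, treebank="pt_bosque"):
    """Return True if the folder for the given treebank already exists in models_dir."""
    return os.path.isdir(os.path.join(models_dir, f"{treebank}_models"))


def download_once(lang, models_dir, treebank="pt_bosque"):
    """Download the models only when they are not on disk yet."""
    if models_present(models_dir, treebank):
        return False
    import stanfordnlp  # imported here so the check above works without the library

    stanfordnlp.download(lang, models_dir)
    return True
```

Calling `download_once('pt', '.')` would then skip the download on every run after the first.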

In [None]:
## Basic Example

In [None]:
# English is the default language
# Call stanfordnlp.Pipeline() to build a pipeline
# For Portuguese:

# Build the pipeline, setting the batch size of the part-of-speech processor
nlp = stanfordnlp.Pipeline(lang='pt', treebank='pt_bosque', processors='tokenize,mwt,pos,lemma,depparse', models_dir=MODELS_DIR, use_gpu=True, pos_batch_size=3000)
doc = nlp("Processamento de língua natural (PLN) é uma subárea da ciência da computação, inteligência artificial e da linguística que estuda os problemas da geração e compreensão automática de línguas humanas naturais.") # Run the pipeline on input text
doc.sentences[0].print_tokens() # Look at the result
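
To look at lemmas, tags and dependency heads rather than only the tokens, the words of each sentence can be walked directly. The sketch below assumes the `Word` attributes `index`, `text`, `lemma`, `upos`, `governor` and `dependency_relation` as exposed by stanfordnlp; check them against your installed version. The helper names `word_rows` and `print_word_table` are hypothetical.

```python
def word_rows(doc):
    """Collect (index, text, lemma, upos, governor, relation) for every word in doc."""
    rows = []
    for sentence in doc.sentences:
        for word in sentence.words:
            rows.append((word.index, word.text, word.lemma, word.upos,
                         word.governor, word.dependency_relation))
    return rows


def print_word_table(doc):
    """Print one tab-separated line per word."""
    for row in word_rows(doc):
        print("\t".join(str(value) for value in row))
```

After running the pipeline, `print_word_table(doc)` prints one line per word, with the governor column giving the index of the head word (0 for the root).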

In [None]:
config = {
'processors': 'tokenize,pos,depparse', # Comma-separated list of processors to use
'lang': 'pt', # Language code for the language to build the Pipeline in
'pos_pretrain_path': '/path/to/stanfordnlp_resources/pt_bosque_models/pt_bosque.pretrain.pt',
'depparse_pretrain_path': '/path/to/stanfordnlp_resources/pt_bosque_models/pt_bosque.pretrain.pt',
'output_format': 'conllu'
}
nlp = stanfordnlp.Pipeline(**config) # Initialize the pipeline using a configuration dict
doc = nlp('''
Processamento de língua natural (PLN) é uma subárea da ciência da computação, inteligência artificial 
e da linguística que estuda os problemas da geração e compreensão automática de línguas humanas naturais. 
Sistemas de geração de língua natural convertem informação de bancos de dados de computadores em linguagem 
compreensível ao ser humano e sistemas de compreensão de língua natural convertem ocorrências de linguagem 
humana em representações mais formais, mais facilmente manipuláveis por programas de computador. 
Alguns desafios do PLN são compreensão de língua natural, 
fazer com que computadores extraiam sentido de linguagem humana ou natural e geração de língua natural.''') # Run the pipeline on input text
print(doc.conll_file.conll_as_string())
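
The CoNLL-U string printed above has one token per line with ten tab-separated columns, and a blank line between sentences. A minimal sketch of turning that string back into Python structures follows; the function name `parse_conllu` is hypothetical, and lines that do not have exactly ten columns are simply skipped.

```python
CONLLU_FIELDS = ["id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc"]


def parse_conllu(conllu_text):
    """Parse a CoNLL-U string into a list of sentences, each a list of token dicts."""
    sentences, current = [], []
    for line in conllu_text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # comment or metadata line
        columns = line.split("\t")
        if len(columns) != len(CONLLU_FIELDS):
            continue  # not a well-formed token line
        current.append(dict(zip(CONLLU_FIELDS, columns)))
    if current:
        sentences.append(current)
    return sentences
```

For example, `parse_conllu(doc.conll_file.conll_as_string())` gives a list of sentences whose tokens can be filtered by `upos` or `deprel`.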

In [2]:
import stanfordnlp

config = {
        'lang': 'pt',
        'processors': 'tokenize,pos',
        'tokenize_pretokenized': True,
        'pos_pretrain_path': './portuguese-ud.tagger',
        'pos_batch_size': 1000
         }

nlp = stanfordnlp.Pipeline(**config)
pretokenized_text = [['PNL','é', 'desafiador','.'],['PNL','é', 'o', 'futuro']]
doc = nlp(pretokenized_text)
print(doc.conll_file.conll_as_string())

Use device: cpu
---
Loading: tokenize
With settings: 
{'model_path': '/home/juliana/stanfordnlp_resources/pt_bosque_models/pt_bosque_tokenizer.pt', 'pretokenized': True, 'lang': 'pt', 'shorthand': 'pt_bosque', 'mode': 'predict'}
---
Loading: pos
With settings: 
{'model_path': '/home/juliana/stanfordnlp_resources/pt_bosque_models/pt_bosque_tagger.pt', 'pretrain_path': './portuguese-ud.tagger', 'batch_size': 1000, 'lang': 'pt', 'shorthand': 'pt_bosque', 'mode': 'predict'}
Pretrained file exists but cannot be loaded from ./portuguese-ud.tagger, due to the following exception:
	invalid load key, '\xac'.


Exception: Vector file is not provided.
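
The `invalid load key, '\xac'` message suggests that `./portuguese-ud.tagger` is not a file the Python library can read: Java serialized object streams begin with the bytes `0xAC 0xED`, so the file is most likely a model written by the Java tagger rather than a stanfordnlp pretrain file. This is an interpretation of the log, not something the run above confirms. A small check along those lines, with the hypothetical helper name `looks_like_java_serialized`:

```python
JAVA_SERIALIZATION_MAGIC = b"\xac\xed"


def looks_like_java_serialized(path):
    """Return True if the file starts with the Java object-stream magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == JAVA_SERIALIZATION_MAGIC
```

If `looks_like_java_serialized('./portuguese-ud.tagger')` returns `True`, the file belongs to the Java side of the toolchain and should not be passed as `pos_pretrain_path`.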

# Model training
## POS tagger

To train a tagger model, you need to create a properties file that specifies the training parameters. The command below prints a sample properties file:

In [3]:
!java  -cp "*:stanford-corenlp-full-2018-10-05/*" edu.stanford.nlp.tagger.maxent.MaxentTagger -genprops

## Sample properties file for maxent tagger. This file is used for three main
## operations: training, testing, and tagging. It may also be used to dump
## the contents of a model.
## To train or test a model, or to tag something, run:
##   java edu.stanford.nlp.tagger.maxent.MaxentTagger -prop <properties file>
## Arguments can be overridden on the commandline, e.g.:
##   java ....MaxentTagger -prop <properties file> -testFile /other/file 

# Model file name (created at train time; used at tag and test time)
# (you can leave this blank and specify it on the commandline with -model)
# model = 

# Path to file to be operated on (trained from, tested against, or tagged)
# Specify -textFile <filename> to tag text in the given file, -trainFile <filename> to
# to train a model using data in the given file, or -testFile <filename> to test your
# model using data in the given file.  Alternatively, you may specify
# -dump <filename> to dump the parameters stored in a model or 
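
As an alternative to editing the generated sample by hand, the properties file can be written from Python. The sketch below uses a subset of the settings that appear in the training log further down in this section (`model`, `arch`, `trainFile`, `encoding`, `iterations`, `search`); the remaining settings from that log are left out, and the helper names `build_tagger_props` and `write_tagger_props` are hypothetical.

```python
def build_tagger_props(model, data_file, file_key="trainFile",
                       word_column=1, tag_column=3):
    """Return the text of a MaxentTagger properties file."""
    if file_key not in ("trainFile", "testFile"):
        raise ValueError("file_key must be 'trainFile' or 'testFile'")
    settings = {
        "model": model,
        "arch": "left3words,naacl2003unknowns,unicodeshapes(-1,1)",
        # TSV input: which columns hold the word and the tag, then the file path
        file_key: f"format=TSV,wordColumn={word_column},tagColumn={tag_column},{data_file}",
        "encoding": "utf-8",
        "iterations": "100",
        "search": "qn",
    }
    return "\n".join(f"{key} = {value}" for key, value in settings.items()) + "\n"


def write_tagger_props(path, **kwargs):
    """Write the properties text to path and return the path."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(build_tagger_props(**kwargs))
    return path
```

For instance, `write_tagger_props("portuguese-ud-postagger.props", model="portuguese-ud-postagger.sp", data_file="pt-ud-train_sem_metadados.conllu")` produces a file with the name used by the training command in this section.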

Once the properties file is ready, just run the tagger with the command below. CoreNLP prints the training progress to the screen. To evaluate the trained model, replace the `trainFile` line in the properties file with `testFile` and give the name of the file containing the test data.


In [None]:
!java -cp "javanlp-core.jar:stanford-corenlp-full-2018-10-05/*" edu.stanford.nlp.tagger.maxent.MaxentTagger -prop portuguese-ud-postagger.props

[main] INFO edu.stanford.nlp.tagger.maxent.MaxentTagger - ## tagger training invoked at Mon Feb 10 15:37:36 BRT 2020 with arguments:
                   model = portuguese-ud-postagger.sp
                    arch = left3words,naacl2003unknowns,unicodeshapes(-1,1)
            wordFunction = 
               trainFile = format=TSV,wordColumn=1,tagColumn=3,/home/elvis/BIG-Oil-NLP/SP/pt-ud-train_sem_metadados.conllu
         closedClassTags = 
 closedClassTagThreshold = 40
 curWordMinFeatureThresh = 2
                   debug = false
             debugPrefix = 
            tagSeparator = 
                encoding = utf-8
              iterations = 100
                    lang = 
    learnClosedClassTags = false
        minFeatureThresh = 2
           openClassTags = 
rareWordMinFeatureThresh = 10
          rareWordThresh = 5
                  search = qn
                    sgml = false
            sigmaSquared = 0.0
                   regL1 = 0.75
               tagInside = 
               

[main] INFO edu.stanford.nlp.maxent.iis.LambdaSolve - lambda 16550 too big: 279.8631578946973
[main] INFO edu.stanford.nlp.maxent.iis.LambdaSolve - lambda 18467 too big: 253.94736842094895
[main] INFO edu.stanford.nlp.maxent.iis.LambdaSolve - lambda 18865 too big: 307.2473684205311
[main] INFO edu.stanford.nlp.maxent.iis.LambdaSolve - lambda 18867 too big: 471.947368420488
[main] INFO edu.stanford.nlp.maxent.iis.LambdaSolve - lambda 21194 too big: 405.01578947333826
[main] INFO edu.stanford.nlp.maxent.iis.LambdaSolve - lambda 21658 too big: 213.2157894736362
[main] INFO edu.stanford.nlp.maxent.iis.LambdaSolve - lambda 22239 too big: 788.9684210526635
[main] INFO edu.stanford.nlp.maxent.iis.LambdaSolve - lambda 23360 too big: 580.0578947368397
[main] INFO edu.stanford.nlp.maxent.iis.LambdaSolve - lambda 24765 too big: 1600.4105263178415
[main] INFO edu.stanford.nlp.maxent.iis.LambdaSolve - lambda 26137 too big: 221.1105263156273
[main] INFO edu.stanford.nlp.maxent.iis.LambdaSolve - lamb

[main] INFO CoreNLP - class edu.stanford.nlp.maxent.CGRunner
[main] INFO edu.stanford.nlp.maxent.CGRunner - Iter. 5: neg. log cond. likelihood = 235655.37116005854 [8 calls to valueAt]
[main] INFO edu.stanford.nlp.optimization.QNMinimizer - Iter 5 evals 7 <D> [M 1.000E0] 2.357E5 6.40s |1.274E4| {2.052E-1} 1.919E-1 - 
[main] INFO CoreNLP - class edu.stanford.nlp.maxent.CGRunner
[main] INFO edu.stanford.nlp.maxent.CGRunner - Iter. 6: neg. log cond. likelihood = 206283.6543074275 [9 calls to valueAt]
[main] INFO edu.stanford.nlp.optimization.QNMinimizer - Iter 6 evals 8 <D> [M 1.000E0] 2.063E5 7.33s |1.047E4| {1.686E-1} 2.064E-1 - 
[main] INFO CoreNLP - class edu.stanford.nlp.maxent.CGRunner
[main] INFO edu.stanford.nlp.maxent.CGRunner - Iter. 7: neg. log cond. likelihood = 184760.31495985834 [10 calls to valueAt]
[main] INFO edu.stanford.nlp.optimization.QNMinimizer - Iter 7 evals 9 <D> [M 1.000E0] 1.848E5 8.26s |9.365E3| {1.508E-1} 2.141E-1 - 
[main] INFO CoreNLP - class edu.stanford.nl

[main] INFO edu.stanford.nlp.tagger.maxent.LambdaSolveTagger - Constraint 10956 not satisfied emp 0.0111 exp 0.0124 diff 0.0012 lambda 4.643
[main] INFO edu.stanford.nlp.tagger.maxent.LambdaSolveTagger - Constraint 13584 not satisfied emp 0.0173 exp 0.0159 diff 0.0014 lambda 3.2956
[main] INFO edu.stanford.nlp.tagger.maxent.LambdaSolveTagger - Constraint 18376 not satisfied emp 0.0115 exp 0.0105 diff 0.001 lambda 0.1018
[main] INFO edu.stanford.nlp.tagger.maxent.LambdaSolveTagger - Constraint 18865 not satisfied emp 0.0223 exp 0.0212 diff 0.0011 lambda 5.5534
[main] INFO edu.stanford.nlp.tagger.maxent.LambdaSolveTagger - Constraint 37622 not satisfied emp 0.0115 exp 0.0105 diff 0.001 lambda 0.1018
[main] INFO edu.stanford.nlp.optimization.QNMinimizer - Iter 30 evals 34 <D> [M 1.000E0] 4.561E4 31.34s |1.324E3| {2.133E-2} 4.600E-2 - 
[main] INFO CoreNLP - class edu.stanford.nlp.maxent.CGRunner
[main] INFO edu.stanford.nlp.maxent.CGRunner - Iter. 31: neg. log cond. likelihood = 43090.7500

[main] INFO CoreNLP - class edu.stanford.nlp.maxent.CGRunner
[main] INFO edu.stanford.nlp.maxent.CGRunner - Iter. 54: neg. log cond. likelihood = 20998.87226596783 [61 calls to valueAt]
[main] INFO edu.stanford.nlp.optimization.QNMinimizer - Iter 54 evals 60 <D> [M 1.000E0] 2.100E4 55.48s |9.512E2| {1.532E-2} 3.054E-2 - 
[main] INFO CoreNLP - class edu.stanford.nlp.maxent.CGRunner
[main] INFO edu.stanford.nlp.maxent.CGRunner - Iter. 55: neg. log cond. likelihood = 20494.92109728354 [62 calls to valueAt]
[main] INFO edu.stanford.nlp.optimization.QNMinimizer - Iter 55 evals 61 <D> [M 1.000E0] 2.049E4 56.41s |4.498E2| {7.245E-3} 3.020E-2 - 
[main] INFO CoreNLP - class edu.stanford.nlp.maxent.CGRunner
[main] INFO edu.stanford.nlp.maxent.CGRunner - Iter. 56: neg. log cond. likelihood = 20115.9978558483 [63 calls to valueAt]
[main] INFO edu.stanford.nlp.optimization.QNMinimizer - Iter 56 evals 62 <D> [M 1.000E0] 2.012E4 57.34s |4.352E2| {7.009E-3} 2.853E-2 - 
[main] INFO CoreNLP - class edu.

[main] INFO CoreNLP - class edu.stanford.nlp.maxent.CGRunner
[main] INFO edu.stanford.nlp.maxent.CGRunner - Iter. 79: neg. log cond. likelihood = 12567.199588450409 [89 calls to valueAt]
[main] INFO edu.stanford.nlp.optimization.QNMinimizer - Iter 79 evals 87 <D> [1M 4.029E-1] 1.257E4 81.34s |5.073E2| {8.170E-3} 2.073E-2 - 
[main] INFO CoreNLP - class edu.stanford.nlp.maxent.CGRunner
[main] INFO edu.stanford.nlp.maxent.CGRunner - Iter. 80: neg. log cond. likelihood = 12353.327543418922 [90 calls to valueAt]
[main] INFO edu.stanford.nlp.optimization.QNMinimizer - Iter 80 evals 89 <D> [M 1.000E0] 1.235E4 82.27s |4.112E2| {6.623E-3} 2.011E-2 - 
[main] INFO CoreNLP - class edu.stanford.nlp.maxent.CGRunner
[main] INFO edu.stanford.nlp.maxent.CGRunner - Iter. 81: neg. log cond. likelihood = 12183.322182492177 [91 calls to valueAt]
[main] INFO edu.stanford.nlp.optimization.QNMinimizer - Iter 81 evals 90 <D> [M 1.000E0] 1.218E4 83.20s |3.038E2| {4.893E-3} 1.819E-2 - 
[main] INFO CoreNLP - clas

[main] INFO CoreNLP - class edu.stanford.nlp.maxent.CGRunner
[main] INFO edu.stanford.nlp.maxent.CGRunner - Iter. 104: neg. log cond. likelihood = 8549.513486297074 [116 calls to valueAt]
[main] INFO edu.stanford.nlp.optimization.QNMinimizer - Iter 104 evals 114 <D> [1M 4.116E-1] 8.550E3 106.71s |2.127E2| {3.426E-3} 1.234E-2 - 
[main] INFO CoreNLP - class edu.stanford.nlp.maxent.CGRunner
[main] INFO edu.stanford.nlp.maxent.CGRunner - Iter. 105: neg. log cond. likelihood = 8466.302040411068 [117 calls to valueAt]
[main] INFO edu.stanford.nlp.optimization.QNMinimizer - Iter 105 evals 116 <D> [M 1.000E0] 8.466E3 107.74s |1.259E2| {2.028E-3} 1.213E-2 - 
[main] INFO CoreNLP - class edu.stanford.nlp.maxent.CGRunner
[main] INFO edu.stanford.nlp.maxent.CGRunner - Iter. 106: neg. log cond. likelihood = 8371.34022434521 [118 calls to valueAt]
[main] INFO edu.stanford.nlp.optimization.QNMinimizer - Iter 106 evals 117 <D> [M 1.000E0] 8.371E3 108.75s |1.062E2| {1.711E-3} 1.166E-2 - 
[main] INFO Cor

[main] INFO CoreNLP - class edu.stanford.nlp.maxent.CGRunner
[main] INFO edu.stanford.nlp.maxent.CGRunner - Iter. 129: neg. log cond. likelihood = 7051.506123680886 [143 calls to valueAt]
[main] INFO edu.stanford.nlp.optimization.QNMinimizer - Iter 129 evals 142 <D> [M 1.000E0] 7.052E3 132.11s |6.719E1| {1.082E-3} 5.493E-3 - 
[main] INFO CoreNLP - class edu.stanford.nlp.maxent.CGRunner
[main] INFO edu.stanford.nlp.maxent.CGRunner - Iter. 130: neg. log cond. likelihood = 7001.768191984616 [144 calls to valueAt]
[main] INFO edu.stanford.nlp.optimization.QNMinimizer - Iter 130 evals 143 <D> [M 1.000E0] 7.002E3 133.06s |7.932E1| {1.277E-3} 5.676E-3 - 
[main] INFO CoreNLP - class edu.stanford.nlp.maxent.CGRunner
[main] INFO edu.stanford.nlp.maxent.CGRunner - Iter. 131: neg. log cond. likelihood = 6960.475880093706 [145 calls to valueAt]
[main] INFO edu.stanford.nlp.optimization.QNMinimizer - Iter 131 evals 144 <D> [M 1.000E0] 6.960E3 133.99s |9.997E1| {1.610E-3} 5.607E-3 - 
[main] INFO Core

[main] INFO CoreNLP - class edu.stanford.nlp.maxent.CGRunner
[main] INFO edu.stanford.nlp.maxent.CGRunner - Iter. 154: neg. log cond. likelihood = 6348.561781025876 [168 calls to valueAt]
[main] INFO edu.stanford.nlp.optimization.QNMinimizer - Iter 154 evals 167 <D> [M 1.000E0] 6.349E3 155.78s |6.486E1| {1.045E-3} 3.183E-3 - 
[main] INFO CoreNLP - class edu.stanford.nlp.maxent.CGRunner
[main] INFO edu.stanford.nlp.maxent.CGRunner - Iter. 155: neg. log cond. likelihood = 6336.396247056182 [169 calls to valueAt]
[main] INFO edu.stanford.nlp.optimization.QNMinimizer - Iter 155 evals 168 <D> [M 1.000E0] 6.336E3 156.72s |1.277E2| {2.057E-3} 2.881E-3 - 
[main] INFO CoreNLP - class edu.stanford.nlp.maxent.CGRunner
[main] INFO edu.stanford.nlp.maxent.CGRunner - Iter. 156: neg. log cond. likelihood = 6307.083411030467 [170 calls to valueAt]
[main] INFO edu.stanford.nlp.optimization.QNMinimizer - Iter 156 evals 169 <D> [M 1.000E0] 6.307E3 157.68s |5.102E1| {8.217E-4} 3.056E-3 - 
[main] INFO Core

[main] INFO CoreNLP - class edu.stanford.nlp.maxent.CGRunner
[main] INFO edu.stanford.nlp.maxent.CGRunner - Iter. 179: neg. log cond. likelihood = 5983.500346466131 [194 calls to valueAt]
[main] INFO edu.stanford.nlp.optimization.QNMinimizer - Iter 179 evals 193 <D> [M 1.000E0] 5.984E3 180.14s |2.366E1| {3.811E-4} 1.786E-3 - 
[main] INFO CoreNLP - class edu.stanford.nlp.maxent.CGRunner
[main] INFO edu.stanford.nlp.maxent.CGRunner - Iter. 180: neg. log cond. likelihood = 5973.074615945627 [195 calls to valueAt]
[main] INFO edu.stanford.nlp.tagger.maxent.LambdaSolveTagger - Checking model correctness; x size 177080 , ysize 19
[main] INFO edu.stanford.nlp.optimization.QNMinimizer - Iter 180 evals 194 <D> [M 1.000E0] 5.973E3 181.06s |1.071E2| {1.724E-3} 1.810E-3 - 
[main] INFO CoreNLP - class edu.stanford.nlp.maxent.CGRunner
[main] INFO edu.stanford.nlp.maxent.CGRunner - Iter. 181: neg. log cond. likelihood = 5954.270965541998 [196 calls to valueAt]
[main] INFO edu.stanford.nlp.optimizatio

[main] INFO CoreNLP - class edu.stanford.nlp.maxent.CGRunner
[main] INFO edu.stanford.nlp.maxent.CGRunner - Iter. 204: neg. log cond. likelihood = 5763.33486066464 [221 calls to valueAt]
[main] INFO edu.stanford.nlp.optimization.QNMinimizer - Iter 204 evals 220 <D> [M 1.000E0] 5.763E3 204.83s |4.560E1| {7.345E-4} 1.123E-3 - 
[main] INFO CoreNLP - class edu.stanford.nlp.maxent.CGRunner
[main] INFO edu.stanford.nlp.maxent.CGRunner - Iter. 205: neg. log cond. likelihood = 5757.204078983362 [222 calls to valueAt]
[main] INFO edu.stanford.nlp.optimization.QNMinimizer - Iter 205 evals 221 <D> [M 1.000E0] 5.757E3 205.75s |2.577E1| {4.151E-4} 1.083E-3 - 
[main] INFO CoreNLP - class edu.stanford.nlp.maxent.CGRunner
[main] INFO edu.stanford.nlp.maxent.CGRunner - Iter. 206: neg. log cond. likelihood = 5751.546173194247 [223 calls to valueAt]
[main] INFO edu.stanford.nlp.optimization.QNMinimizer - Iter 206 evals 222 <D> [M 1.000E0] 5.752E3 206.68s |4.535E1| {7.304E-4} 1.040E-3 - 
[main] INFO CoreN

[main] INFO CoreNLP - class edu.stanford.nlp.maxent.CGRunner
[main] INFO edu.stanford.nlp.maxent.CGRunner - Iter. 229: neg. log cond. likelihood = 5655.7442228030095 [247 calls to valueAt]
[main] INFO edu.stanford.nlp.optimization.QNMinimizer - Iter 229 evals 246 <D> [M 1.000E0] 5.656E3 229.01s |1.273E1| {2.050E-4} 5.724E-4 - 
[main] INFO CoreNLP - class edu.stanford.nlp.maxent.CGRunner
[main] INFO edu.stanford.nlp.maxent.CGRunner - Iter. 230: neg. log cond. likelihood = 5652.921182380569 [248 calls to valueAt]
[main] INFO edu.stanford.nlp.optimization.QNMinimizer - Iter 230 evals 247 <D> [M 1.000E0] 5.653E3 229.94s |1.477E1| {2.378E-4} 5.520E-4 - 
[main] INFO CoreNLP - class edu.stanford.nlp.maxent.CGRunner
[main] INFO edu.stanford.nlp.maxent.CGRunner - Iter. 231: neg. log cond. likelihood = 5649.849517493466 [249 calls to valueAt]
[main] INFO edu.stanford.nlp.optimization.QNMinimizer - Iter 231 evals 248 <D> [M 1.000E0] 5.650E3 230.85s |5.544E1| {8.929E-4} 5.562E-4 - 
[main] INFO Cor

[main] INFO CoreNLP - class edu.stanford.nlp.maxent.CGRunner
[main] INFO edu.stanford.nlp.maxent.CGRunner - Iter. 254: neg. log cond. likelihood = 5589.950578447234 [273 calls to valueAt]
[main] INFO edu.stanford.nlp.optimization.QNMinimizer - Iter 254 evals 272 <D> [M 1.000E0] 5.590E3 252.85s |1.272E1| {2.048E-4} 3.586E-4 - 
[main] INFO CoreNLP - class edu.stanford.nlp.maxent.CGRunner
[main] INFO edu.stanford.nlp.maxent.CGRunner - Iter. 255: neg. log cond. likelihood = 5587.67576451437 [274 calls to valueAt]
[main] INFO edu.stanford.nlp.optimization.QNMinimizer - Iter 255 evals 273 <D> [M 1.000E0] 5.588E3 253.76s |1.281E1| {2.063E-4} 3.628E-4 - 
[main] INFO CoreNLP - class edu.stanford.nlp.maxent.CGRunner
[main] INFO edu.stanford.nlp.maxent.CGRunner - Iter. 256: neg. log cond. likelihood = 5585.5302873813525 [275 calls to valueAt]
[main] INFO edu.stanford.nlp.optimization.QNMinimizer - Iter 256 evals 274 <D> [M 1.000E0] 5.586E3 254.68s |2.789E1| {4.492E-4} 3.558E-4 - 
[main] INFO Core

[main] INFO CoreNLP - class edu.stanford.nlp.maxent.CGRunner
[main] INFO edu.stanford.nlp.maxent.CGRunner - Iter. 279: neg. log cond. likelihood = 5543.891790037346 [298 calls to valueAt]
[main] INFO edu.stanford.nlp.optimization.QNMinimizer - Iter 279 evals 297 <D> [M 1.000E0] 5.544E3 276.72s |1.431E1| {2.305E-4} 2.861E-4 - 
[main] INFO CoreNLP - class edu.stanford.nlp.maxent.CGRunner
[main] INFO edu.stanford.nlp.maxent.CGRunner - Iter. 280: neg. log cond. likelihood = 5542.744534010457 [299 calls to valueAt]
[main] INFO edu.stanford.nlp.optimization.QNMinimizer - Iter 280 evals 298 <D> [M 1.000E0] 5.543E3 277.67s |1.861E1| {2.997E-4} 2.720E-4 - 
[main] INFO CoreNLP - class edu.stanford.nlp.maxent.CGRunner
[main] INFO edu.stanford.nlp.maxent.CGRunner - Iter. 281: neg. log cond. likelihood = 5541.093162962464 [300 calls to valueAt]
[main] INFO edu.stanford.nlp.optimization.QNMinimizer - Iter 281 evals 299 <D> [M 1.000E0] 5.541E3 278.60s |1.456E1| {2.345E-4} 2.726E-4 - 
[main] INFO Core

[main] INFO CoreNLP - class edu.stanford.nlp.maxent.CGRunner
[main] INFO edu.stanford.nlp.maxent.CGRunner - Iter. 304: neg. log cond. likelihood = 5510.669477193124 [325 calls to valueAt]
[main] INFO edu.stanford.nlp.optimization.QNMinimizer - Iter 304 evals 324 <D> [M 1.000E0] 5.511E3 301.44s |1.139E1| {1.835E-4} 1.906E-4 - 
[main] INFO CoreNLP - class edu.stanford.nlp.maxent.CGRunner
[main] INFO edu.stanford.nlp.maxent.CGRunner - Iter. 305: neg. log cond. likelihood = 5509.294600130982 [326 calls to valueAt]
[main] INFO edu.stanford.nlp.optimization.QNMinimizer - Iter 305 evals 325 <D> [M 1.000E0] 5.509E3 302.35s |1.588E1| {2.558E-4} 1.876E-4 - 
[main] INFO CoreNLP - class edu.stanford.nlp.maxent.CGRunner
[main] INFO edu.stanford.nlp.maxent.CGRunner - Iter. 306: neg. log cond. likelihood = 5508.804788552146 [327 calls to valueAt]
[main] INFO edu.stanford.nlp.optimization.QNMinimizer - Iter 306 evals 326 <D> [M 1.000E0] 5.509E3 303.27s |1.820E1| {2.931E-4} 1.851E-4 - 
[main] INFO Core
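
The training log is long, but the quantity worth watching is the negative log conditional likelihood, which in the excerpts above falls from about 235655 at iteration 5 to about 5509 at iteration 306. A minimal sketch for extracting those values from saved log text, with the hypothetical helper name `parse_likelihoods`:

```python
import re

LIKELIHOOD_PATTERN = re.compile(
    r"Iter\. (\d+): neg\. log cond\. likelihood = ([0-9.]+(?:[eE][-+]?\d+)?)"
)


def parse_likelihoods(log_text):
    """Return a list of (iteration, negative log conditional likelihood) pairs."""
    return [(int(iteration), float(value))
            for iteration, value in LIKELIHOOD_PATTERN.findall(log_text)]
```

Plotting or printing the returned pairs makes it easy to see when the improvement per iteration becomes negligible.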

## Parser
With the tagger trained, we can train the parser. Note that the CoreNLP tools do not always follow the same configuration pattern: the POS tagger reads its parameters from a properties file, whereas the dependency parser is configured entirely on the command line.

In [None]:
!java -cp "javanlp-core.jar:stanford-corenlp-full-2018-10-05/*" -Xmx4g  edu.stanford.nlp.parser.nndep.DependencyParser -trainFile pt-ud-train_sem_metadados.conllu -embeddingSize 600 -model portuguese-ud-dep.sp

In [None]:
!java -cp "javanlp-core.jar:stanford-corenlp-full-2018-10-05/*" edu.stanford.nlp.parser.nndep.DependencyParser -model portuguese-dep-parser -testFile /path/to/stanford-corenlp-full-2018-10-05/bosque2.5/pt-ud-test.conllu

To run the parser on arbitrary text, just run the command:

In [None]:
!java -cp "javanlp-core.jar:/path/to/stanford-corenlp-full-2018-10-05/*" edu.stanford.nlp.parser.nndep.DependencyParser -model portuguese-dep-parser -tagger.model portuguese-ud.tagger -textFile arquivo.txt
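
Because the dependency parser takes everything on the command line, the three commands above differ only in their flags. The sketch below assembles such a command as an argument list that could be passed to `subprocess.run`; the helper name `parser_command` is hypothetical, and only flags that appear in the commands above are used in the example.

```python
import os

PARSER_CLASS = "edu.stanford.nlp.parser.nndep.DependencyParser"


def parser_command(classpath_entries, options, jvm_args=()):
    """Build the java command line for the dependency parser as a list of arguments."""
    # os.pathsep is ':' on Linux/macOS and ';' on Windows
    classpath = os.pathsep.join(classpath_entries)
    command = ["java", *jvm_args, "-cp", classpath, PARSER_CLASS]
    for flag, value in options.items():
        command += [f"-{flag}", str(value)]
    return command


if __name__ == "__main__":
    cmd = parser_command(
        ["javanlp-core.jar", "stanford-corenlp-full-2018-10-05/*"],
        {"model": "portuguese-dep-parser",
         "tagger.model": "portuguese-ud.tagger",
         "textFile": "arquivo.txt"},
    )
    print(" ".join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` avoids shell quoting of the `*` in the classpath, and JVM options such as `-Xmx4g` can be supplied through `jvm_args`.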