## Evaluation using a machine learning method

Embeddings can capture how close two concepts are, something a Bag of Words representation cannot do. Even so, every representation and preprocessing step has its advantages and disadvantages, and no single method is always the best. Therefore, to find out which representation is best for a given task, we need to evaluate which of them yields the best results on that task. Since evaluation is not the focus of this practice, we will only present the results; if you want, you can [watch the video lecture](https://www.youtube.com/watch?v=Ag06UuWTsr4&list=PLwIaU1DGYV6tUx10fCTw5aPnqypbbK_GJ&index=12) and [do the practice on evaluation](https://github.com/daniel-hasan/ap-de-maquina-cefetmg-avaliacao/archive/master.zip). In this part, we will only use evaluation to check which method performs best.  
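To make the first sentence concrete, here is a minimal sketch with NumPy comparing cosine similarity under both representations. The three-word vocabulary and the 3-dimensional vectors are made up for illustration only; real embeddings have many more dimensions.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Bag of Words over a toy vocabulary: each word is its own dimension (one-hot),
# so two different words never share a dimension.
vocab = ["feliz", "alegre", "triste"]
bow = {word: np.eye(len(vocab))[i] for i, word in enumerate(vocab)}

# Hypothetical embeddings (made-up values): related words get nearby vectors.
emb = {
    "feliz": np.array([0.9, 0.1, 0.3]),
    "alegre": np.array([0.8, 0.2, 0.3]),
    "triste": np.array([-0.7, 0.1, 0.4]),
}

print(cosine(bow["feliz"], bow["alegre"]))  # BoW: no relation between synonyms
print(cosine(emb["feliz"], emb["alegre"]))  # embeddings: close to 1
print(cosine(emb["feliz"], emb["triste"]))  # embeddings: much lower
```

With BoW, "feliz" and "alegre" are as unrelated as any other pair of words, while the embedding vectors encode that they are close.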

To keep this section self-contained, we will repeat all the preparation done in the previous sections.

**Creating the stopword list and the vocabulary:**

In [1]:
# reset to free memory
%reset -f
# end of the memory reset

from embeddings.utils import get_embedding, KDTreeEmbedding

# each emotion and a set of related words
emotion_words = {
    "pride": {"proud"},
    "elation": {"ecstatic", "euphoria", "exaltation", "exhilarating"},  # vs boredom
    "happiness": {"joy", "cheer", "bliss", "delight", "enjoy", "happy"},  # vs sad
    "satisfaction": {"comfortable", "contentment"},
    "relief": set(),  # set() instead of {}, which would be an empty dict
    "hope": {"buoyancy", "confident", "faith", "optimistic"},
    "interest": {"alert", "animation", "ardor", "curious", "enthusiasm"},
    "surprise": {"amazed", "astonishing", "dumbfounded", "thunderstruck"},
    "anxiety": {"anguish", "anxiety", "apprehensive", "jittery", "nervous", "worry"},
    "sadness": {"chagrin", "dejected", "gloom", "hopeless", "melancholy", "sad", "tear"},
    "boredom": {"ennui", "indifference", "tedious"},
    "shame": {"abashed", "ashamed", "embarrassing", "humiliating"},
    "guilt": {"blame", "contrition", "remorse"},
    "disgust": {"abhor", "aversion", "dislike", "disrelish", "nausea", "sick"},
    "contempt": {"denigration", "depreciate", "derision", "disdain", "scorn"},
    "hostile": set(),
    "anger": {"anger", "angry", "furious", "fury", "incense", "infuriating",
              "mad", "rage", "resent", "temper", "wrath"},
    "recognition": {"respect", "acknowledgement"},
}
dict_embedding = get_embedding("glove.pt.100.txt")
kdtree_embedding = KDTreeEmbedding(dict_embedding, "kdt_pt.p")

# read the stopwords (one per line); strip() also works if the last line has no newline
with open("datasets/stopwords.txt") as stop_file:
    stop_words = {line.strip() for line in stop_file if line.strip()}

# keywords to be considered: each emotion and its related words
set_vocabulary = set()
for key_word, related_words in emotion_words.items():
    set_vocabulary.add(key_word)
    set_vocabulary.update(related_words)

# KD-tree: expand the keywords with their 60 most similar words in the embedding
vocabulary_expanded = set()
for word in set_vocabulary:
    _, words = kdtree_embedding.get_most_similar_embedding(word, 60)
    vocabulary_expanded.update(words)

LInha com erro: 'afeta -0.536855 -0.007495 -0.013442 0.010075 -0.431695 -0.954242 -0.568022 0.298830 0.206329 0.221990 0.448505 0.324589 0.155598 -0.434498 -0.038841 0.351460 -0.219903 0.292218 -0.116971 0.102685 0.944467 0.388329 -0.330937 -0.755884 -0.164395 0.377288 -0.361163 -0.915998 0.161222 0.827306 -0.284279 0.053623 -0.500227 0.372490 -0.171850 -0.247056 0.115936 -0.017340 -0.118077 -0.008613 0.009058 -0.344892 0.526107 0.021267 0.123609 0.112071 0.277755 -0.655675 0.056385 -0.489364 -0.011241 -0.068256 -0.050418 0.283620 1.146130 -1.045703 0.120836 0.311448 -0.007991 -0.395445 -0.616343 -0.102998 0.801631 0.035789 0.522152 -0.000360 0.081070 0.359324 0.164685 0.103358 -0.434422 -0.047618 0.685093 -0.245462 0.899385 0.430083 -0.097732 -0.991104 0.267290 0.055047 0.469607 -0.454359 -0.206270 -0.075901 -0.702083 -0.149101 0.101842 -0.126275 0.175566 -0.050471 -0.131559 0.382135 -0.021810 -0.609549 0.137217 -0.443079 -0.909590 -0.520087 0.576502 -0.078572
'
Palavras ignoradas: 2
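The vocabulary expansion above asks the KD-tree for the 60 nearest neighbours of each keyword. The internals of `KDTreeEmbedding` are not shown in this section, so the following is only a sketch of the same idea using a brute-force Euclidean nearest-neighbour search with NumPy instead of a KD-tree (an exact KD-tree query returns the same neighbours, just faster on large vocabularies). The helper names and toy vectors are hypothetical, and skipping keywords missing from the embedding is an assumption.

```python
import numpy as np

def most_similar(word, dict_embedding, k):
    """Return the k words closest (Euclidean distance) to `word`.

    Brute-force stand-in for a KD-tree query; the word itself is
    included, since its distance to itself is zero.
    """
    words = list(dict_embedding)
    matrix = np.stack([dict_embedding[w] for w in words])
    dists = np.linalg.norm(matrix - dict_embedding[word], axis=1)
    nearest = np.argsort(dists)[:k]
    return [words[i] for i in nearest]

def expand_vocabulary(keywords, dict_embedding, k):
    """Union of the k nearest neighbours of every keyword found in the embedding."""
    expanded = set()
    for word in keywords:
        if word in dict_embedding:  # hypothetical choice: skip unknown words
            expanded.update(most_similar(word, dict_embedding, k))
    return expanded

# Toy 2-D embedding (hypothetical values).
toy = {
    "raiva": np.array([1.0, 0.0]),
    "fúria": np.array([0.9, 0.1]),
    "ódio": np.array([0.8, 0.2]),
    "alegria": np.array([-1.0, 0.0]),
    "felicidade": np.array([-0.9, -0.1]),
}
print(sorted(expand_vocabulary({"raiva"}, toy, k=2)))
```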




**Representations used**: we will evaluate the effect of stopword filtering and of a restricted vocabulary (the expanded keyword set) on two representations: bag of words and the average of the word embeddings.
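As a rough sketch of what the embedding average with `words_to_filter` and `words_to_consider` presumably computes, here is a plain NumPy version. It is not the code of `AggregateEmbeddings`, which is not shown here; the function name, the lowercase whitespace tokenization and the zero-vector fallback for texts with no usable word are assumptions.

```python
import numpy as np

def average_embedding(text, dict_embedding, dim,
                      words_to_filter=None, words_to_consider=None):
    """Average the vectors of the tokens of `text` (hypothetical helper)."""
    vectors = []
    for token in text.lower().split():
        if words_to_filter is not None and token in words_to_filter:
            continue  # e.g. a stopword
        if words_to_consider is not None and token not in words_to_consider:
            continue  # outside the restricted vocabulary
        if token in dict_embedding:  # words without a vector are ignored
            vectors.append(dict_embedding[token])
    if not vectors:
        return np.zeros(dim)  # assumed fallback when no token survives
    return np.mean(vectors, axis=0)

toy = {
    "eu": np.array([1.0, 1.0]),
    "odeio": np.array([0.0, 4.0]),
    "isso": np.array([2.0, 0.0]),
}
print(average_embedding("Eu odeio isso", toy, 2))  # mean of the three vectors
print(average_embedding("Eu odeio isso", toy, 2, words_to_filter={"eu", "isso"}))
print(average_embedding("Eu odeio isso", toy, 2, words_to_consider={"odeio", "isso"}))
```

Whatever the filtering, every text ends up as a vector of the embedding's dimension, which is what lets a classifier consume texts of any length.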

In [2]:
from embeddings.textual_representation import BagOfWords, AggregateEmbeddings, InstanceWisePreprocess

# build the representations
# average of the embeddings of all words
aggregate = AggregateEmbeddings(dict_embedding, "avg")
embedding = InstanceWisePreprocess("embbeding", aggregate)

# average of the embeddings, ignoring stopwords
aggregate_stop = AggregateEmbeddings(dict_embedding, "avg", words_to_filter=stop_words)
emb_nostop = InstanceWisePreprocess("emb_nostop", aggregate_stop)

# average of the embeddings, considering only the expanded vocabulary
aggregate_keywords_exp = AggregateEmbeddings(dict_embedding, "avg", words_to_consider=vocabulary_expanded)
emb_keywords_exp = InstanceWisePreprocess("emb_keywords_exp", aggregate_keywords_exp)

# bag of words restricted to the expanded vocabulary, and bag of words without stopwords
bow_keywords = BagOfWords("bow_keywords_exp", words_to_consider=vocabulary_expanded)
bow = BagOfWords("bow", stop_words=stop_words)

arr_representations = [embedding, emb_nostop, emb_keywords_exp, bow, bow_keywords]
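`BagOfWords` comes from the course's `embeddings.textual_representation` module, whose code is not shown here. As a rough sketch of what its `stop_words` and `words_to_consider` options presumably do, here is a count-based bag of words in plain Python; the function name, the lowercasing and the whitespace tokenization are assumptions.

```python
from collections import Counter

def bag_of_words(texts, stop_words=None, words_to_consider=None):
    """Return (vocabulary, matrix): one row of word counts per text (hypothetical helper)."""
    tokenized = []
    for text in texts:
        tokens = text.lower().split()
        if stop_words is not None:
            tokens = [t for t in tokens if t not in stop_words]
        if words_to_consider is not None:
            tokens = [t for t in tokens if t in words_to_consider]
        tokenized.append(tokens)

    # one column per distinct remaining word, in sorted order
    vocabulary = sorted({t for tokens in tokenized for t in tokens})
    index = {word: i for i, word in enumerate(vocabulary)}
    matrix = []
    for tokens in tokenized:
        row = [0] * len(vocabulary)
        for word, count in Counter(tokens).items():
            row[index[word]] = count
        matrix.append(row)
    return vocabulary, matrix

texts = ["Eu odeio isso", "isso é ódio ódio"]
print(bag_of_words(texts, stop_words={"eu", "é", "isso"}))
print(bag_of_words(texts, words_to_consider={"ódio"}))
```

Unlike the embedding average, the number of columns grows with the vocabulary, which is one reason restricting it to the expanded keywords can help. In an experiment, the vocabulary should be built from the training fold only, so the test fold does not leak into the representation.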

In [3]:
import pandas as pd
df_hate_speech = pd.read_csv("2019-05-28_portuguese_hate_speech_hierarchical_classification.csv",delimiter=",")

Below, a machine learning method is run for each representation. This may take a while, because a search for the best parameters of the algorithm is performed. One optimization you may need is to change the `n_jobs` parameter in the `obtem_metodo` method of the `OtimizacaoObjetivoRandomForest` class, in the file `embeddings/avaliacao_embedding.py`. This parameter controls how many threads are used to run the Random Forest. If your computer has more than 2 cores, the value can be slightly lower than the number of cores; otherwise, it is best to set `n_jobs=1`. If you want to see results faster, reduce the values of the `num_trials` and `num_folds` variables below. Note that `num_folds` must be greater than one.
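A minimal sketch, in plain Python, of the knobs mentioned above. `suggested_n_jobs` is a hypothetical helper encoding the `n_jobs` rule of thumb (reading "slightly lower" as one less than the core count is an assumption). `nested_splits` shows how `num_folds` and `num_folds_validacao` are typically combined in nested cross-validation: outer folds give the score reported per fold, and inner folds, drawn only from the outer training part, drive the hyperparameter search, which runs `num_trials` trials. The code of `calcula_experimento_representacao` is not shown in this section, so this scheme is inferred from the parameter names, not taken from it.

```python
import os
import random

def suggested_n_jobs(num_cores=None):
    """One less than the core count when there are more than 2 cores, else 1."""
    if num_cores is None:
        num_cores = os.cpu_count() or 1
    return num_cores - 1 if num_cores > 2 else 1

def k_fold_indices(indices, k, seed=1):
    """Split `indices` into k shuffled folds of (almost) equal size."""
    if k < 2:
        raise ValueError("the number of folds must be greater than one")
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]

def nested_splits(num_samples, num_folds, num_folds_validacao, seed=1):
    """Yield (train, test, inner_splits) for nested cross-validation.

    `test` is used only for the final score of the outer fold;
    `inner_splits` are (fit, validation) pairs taken from `train` only,
    for the hyperparameter search.
    """
    outer = k_fold_indices(range(num_samples), num_folds, seed)
    for i, test in enumerate(outer):
        train = [idx for j, fold in enumerate(outer) if j != i for idx in fold]
        inner = k_fold_indices(train, num_folds_validacao, seed)
        inner_splits = [
            ([idx for m, fold in enumerate(inner) if m != v for idx in fold], val)
            for v, val in enumerate(inner)
        ]
        yield train, test, inner_splits

for fold, (train, test, inner) in enumerate(nested_splits(20, num_folds=5, num_folds_validacao=3)):
    print(fold, len(train), len(test), [len(val) for _, val in inner])
```

This sketch does not stratify the folds by class; for an imbalanced label such as `Hate.speech`, a stratified split is usually preferable.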

In [4]:
import optuna
from embeddings.avaliacao_embedding import calcula_experimento_representacao, OtimizacaoObjetivoRandomForest

# machine learning method to be used
dict_metodo = {"random_forest": {"classe_otimizacao": OtimizacaoObjetivoRandomForest,
                                 "sampler": optuna.samplers.TPESampler(seed=1, n_startup_trials=10)},
              }

# experiment parameters (reduce num_folds and num_trials for faster results; num_folds must be > 1)
col_classe = "Hate.speech"
num_folds = 5
num_folds_validacao = 3
num_trials = 100

# run the experiment for each method and representation,
# reusing df_hate_speech, already loaded in the previous cell
for metodo, param_metodo in dict_metodo.items():
    for representation in arr_representations:
        print(f"===== Representação: {representation.nome}")
        nom_experimento = f"{metodo}_{representation.nome}"
        experimento = calcula_experimento_representacao(nom_experimento, representation, df_hate_speech,
                                                        col_classe, num_folds, num_folds_validacao, num_trials,
                                                        ClasseObjetivoOtimizacao=param_metodo['classe_otimizacao'],
                                                        sampler=param_metodo['sampler'])
        print(f"Representação: {representation.nome} concluida")

===== Representação: embbeding


[I 2021-08-31 04:39:57,905] A new study created in RDB with name: random_forest_embbeding_fold_0
[I 2021-08-31 04:41:02,477] Trial 0 finished with value: 0.46395400084146643 and parameters: {'min_samples_split': 9, 'max_features': 95, 'num_arvores': 30}. Best is trial 0 with value: 0.46395400084146643.
[I 2021-08-31 04:42:06,026] Trial 1 finished with value: 0.4696268945870831 and parameters: {'min_samples_split': 7, 'max_features': 75, 'num_arvores': 30}. Best is trial 1 with value: 0.4696268945870831.
[I 2021-08-31 04:43:11,512] Trial 2 finished with value: 0.4940078750901093 and parameters: {'min_samples_split': 5, 'max_features': 80, 'num_arvores': 35}. Best is trial 2 with value: 0.4940078750901093.
[I 2021-08-31 04:44:16,332] Trial 3 finished with value: 0.44886101768449277 and parameters: {'min_samples_split': 11, 'max_features': 80, 'num_arvores': 45}. Best is trial 2 with value: 0.4940078750901093.

[I 2021-08-31 05:11:15,516] Trial 27 finished with value: 0.5317898990674448 and parameters: {'min_samples_split': 3, 'max_features': 70, 'num_arvores': 45}. Best is trial 12 with value: 0.5834263121791105.
[I 2021-08-31 05:11:39,525] Trial 28 finished with value: 0.5801268222615478 and parameters: {'min_samples_split': 1, 'max_features': 85, 'num_arvores': 45}. Best is trial 12 with value: 0.5834263121791105.
[I 2021-08-31 05:12:02,078] Trial 29 finished with value: 0.45890047750861074 and parameters: {'min_samples_split': 9, 'max_features': 75, 'num_arvores': 45}. Best is trial 12 with value: 0.5834263121791105.
[I 2021-08-31 05:12:28,316] Trial 30 finished with value: 0.45004571995493475 and parameters: {'min_samples_split': 11, 'max_features': 70, 'num_arvores': 50}. Best is trial 12 with value: 0.5834263121791105.

[I 2021-08-31 05:26:09,525] Trial 59 finished with value: 0.4416826250932426 and parameters: {'min_samples_split': 17, 'max_features': 70, 'num_arvores': 40}. Best is trial 12 with value: 0.5834263121791105.
[I 2021-08-31 05:26:34,645] Trial 60 finished with value: 0.4691909945387964 and parameters: {'min_samples_split': 7, 'max_features': 70, 'num_arvores': 40}. Best is trial 12 with value: 0.5834263121791105.
[I 2021-08-31 05:27:00,925] Trial 61 finished with value: 0.5834263121791105 and parameters: {'min_samples_split': 1, 'max_features': 70, 'num_arvores': 45}. Best is trial 12 with value: 0.5834263121791105.
[I 2021-08-31 05:27:26,989] Trial 62 finished with value: 0.5820114387527547 and parameters: {'min_samples_split': 1, 'max_features': 70, 'num_arvores': 40}. Best is trial 12 with value: 0.5834263121791105.

[I 2021-08-31 05:39:19,528] Trial 90 finished with value: 0.44387377073408324 and parameters: {'min_samples_split': 13, 'max_features': 70, 'num_arvores': 45}. Best is trial 12 with value: 0.5834263121791105.
[I 2021-08-31 05:39:45,967] Trial 91 finished with value: 0.5834263121791105 and parameters: {'min_samples_split': 1, 'max_features': 70, 'num_arvores': 45}. Best is trial 12 with value: 0.5834263121791105.
[I 2021-08-31 05:40:11,831] Trial 92 finished with value: 0.5834263121791105 and parameters: {'min_samples_split': 1, 'max_features': 70, 'num_arvores': 45}. Best is trial 12 with value: 0.5834263121791105.
[I 2021-08-31 05:40:37,896] Trial 93 finished with value: 0.5834263121791105 and parameters: {'min_samples_split': 1, 'max_features': 70, 'num_arvores': 45}. Best is trial 12 with value: 0.5834263121791105.

0.5632093510979521


[I 2021-08-31 05:43:52,465] Trial 0 finished with value: 0.5567696124734333 and parameters: {'min_samples_split': 3, 'max_features': 80, 'num_arvores': 50}. Best is trial 0 with value: 0.5567696124734333.
[I 2021-08-31 05:44:15,843] Trial 1 finished with value: 0.44294751187994613 and parameters: {'min_samples_split': 11, 'max_features': 90, 'num_arvores': 35}. Best is trial 0 with value: 0.5567696124734333.
[I 2021-08-31 05:44:38,396] Trial 2 finished with value: 0.44019541185482064 and parameters: {'min_samples_split': 15, 'max_features': 95, 'num_arvores': 30}. Best is trial 0 with value: 0.5567696124734333.
[I 2021-08-31 05:45:02,894] Trial 3 finished with value: 0.44034071239243416 and parameters: {'min_samples_split': 17, 'max_features': 100, 'num_arvores': 45}. Best is trial 0 with value: 0.5567696124734333.

[I 2021-08-31 05:55:05,667] Trial 24 finished with value: 0.49527270327566164 and parameters: {'min_samples_split': 7, 'max_features': 90, 'num_arvores': 45}. Best is trial 10 with value: 0.5874529135920508.
[I 2021-08-31 05:55:33,555] Trial 25 finished with value: 0.5874529135920508 and parameters: {'min_samples_split': 1, 'max_features': 85, 'num_arvores': 50}. Best is trial 10 with value: 0.5874529135920508.
[I 2021-08-31 05:56:02,094] Trial 26 finished with value: 0.5270021569316506 and parameters: {'min_samples_split': 5, 'max_features': 80, 'num_arvores': 45}. Best is trial 10 with value: 0.5874529135920508.
[I 2021-08-31 05:56:33,062] Trial 27 finished with value: 0.5591365191474865 and parameters: {'min_samples_split': 3, 'max_features': 90, 'num_arvores': 50}. Best is trial 10 with value: 0.5874529135920508.

[I 2021-08-31 06:06:36,669] Trial 49 finished with value: 0.5842419355876779 and parameters: {'min_samples_split': 1, 'max_features': 70, 'num_arvores': 30}. Best is trial 44 with value: 0.5889817376578631.
[I 2021-08-31 06:07:05,247] Trial 50 finished with value: 0.549615263666075 and parameters: {'min_samples_split': 3, 'max_features': 80, 'num_arvores': 40}. Best is trial 44 with value: 0.5889817376578631.
[I 2021-08-31 06:07:34,618] Trial 51 finished with value: 0.5889817376578631 and parameters: {'min_samples_split': 1, 'max_features': 80, 'num_arvores': 50}. Best is trial 44 with value: 0.5889817376578631.
[I 2021-08-31 06:08:04,356] Trial 52 finished with value: 0.5828667593240708 and parameters: {'min_samples_split': 1, 'max_features': 75, 'num_arvores': 50}. Best is trial 44 with value: 0.5889817376578631.

[I 2021-08-31 06:23:15,586] Trial 84 finished with value: 0.5587396077705827 and parameters: {'min_samples_split': 3, 'max_features': 100, 'num_arvores': 50}. Best is trial 79 with value: 0.5935908236045333.
[I 2021-08-31 06:23:44,055] Trial 85 finished with value: 0.5587396077705827 and parameters: {'min_samples_split': 3, 'max_features': 100, 'num_arvores': 50}. Best is trial 79 with value: 0.5935908236045333.
[I 2021-08-31 06:24:13,845] Trial 86 finished with value: 0.5917586787608754 and parameters: {'min_samples_split': 1, 'max_features': 100, 'num_arvores': 50}. Best is trial 79 with value: 0.5935908236045333.
[I 2021-08-31 06:24:42,857] Trial 87 finished with value: 0.5917586787608754 and parameters: {'min_samples_split': 1, 'max_features': 100, 'num_arvores': 50}. Best is trial 79 with value: 0.5935908236045333.

0.583564869612888


[I 2021-08-31 06:31:10,111] Trial 0 finished with value: 0.5492851222904879 and parameters: {'min_samples_split': 3, 'max_features': 80, 'num_arvores': 45}. Best is trial 0 with value: 0.5492851222904879.
[I 2021-08-31 06:31:33,611] Trial 1 finished with value: 0.45560194004109017 and parameters: {'min_samples_split': 9, 'max_features': 70, 'num_arvores': 40}. Best is trial 0 with value: 0.5492851222904879.
[I 2021-08-31 06:31:57,991] Trial 2 finished with value: 0.43954852776107933 and parameters: {'min_samples_split': 15, 'max_features': 85, 'num_arvores': 50}. Best is trial 0 with value: 0.5492851222904879.
[I 2021-08-31 06:32:20,656] Trial 3 finished with value: 0.44359811734522275 and parameters: {'min_samples_split': 13, 'max_features': 100, 'num_arvores': 30}. Best is trial 0 with value: 0.5492851222904879.

[I 2021-08-31 06:42:29,749] Trial 27 finished with value: 0.5382465172935801 and parameters: {'min_samples_split': 3, 'max_features': 70, 'num_arvores': 50}. Best is trial 17 with value: 0.5797928214631566.
[I 2021-08-31 06:42:52,544] Trial 28 finished with value: 0.438580837094785 and parameters: {'min_samples_split': 21, 'max_features': 80, 'num_arvores': 45}. Best is trial 17 with value: 0.5797928214631566.
[I 2021-08-31 06:43:18,558] Trial 29 finished with value: 0.5492851222904879 and parameters: {'min_samples_split': 3, 'max_features': 80, 'num_arvores': 45}. Best is trial 17 with value: 0.5797928214631566.
[I 2021-08-31 06:43:44,873] Trial 30 finished with value: 0.5747915398504674 and parameters: {'min_samples_split': 1, 'max_features': 85, 'num_arvores': 40}. Best is trial 17 with value: 0.5797928214631566.

[I 2021-08-31 06:57:54,092] Trial 51 finished with value: 0.5768451782613814 and parameters: {'min_samples_split': 1, 'max_features': 90, 'num_arvores': 40}. Best is trial 17 with value: 0.5797928214631566.
[I 2021-08-31 06:59:10,622] Trial 52 finished with value: 0.5768451782613814 and parameters: {'min_samples_split': 1, 'max_features': 90, 'num_arvores': 40}. Best is trial 17 with value: 0.5797928214631566.
[I 2021-08-31 07:00:12,370] Trial 53 finished with value: 0.571390389844473 and parameters: {'min_samples_split': 1, 'max_features': 85, 'num_arvores': 35}. Best is trial 17 with value: 0.5797928214631566.
[I 2021-08-31 07:01:22,435] Trial 54 finished with value: 0.5768451782613814 and parameters: {'min_samples_split': 1, 'max_features': 90, 'num_arvores': 40}. Best is trial 17 with value: 0.5797928214631566.

[I 2021-08-31 07:38:10,699] Trial 89 finished with value: 0.5433183388595267 and parameters: {'min_samples_split': 3, 'max_features': 85, 'num_arvores': 30}. Best is trial 17 with value: 0.5797928214631566.
[I 2021-08-31 07:39:11,642] Trial 90 finished with value: 0.5792105813010731 and parameters: {'min_samples_split': 1, 'max_features': 80, 'num_arvores': 30}. Best is trial 17 with value: 0.5797928214631566.
[I 2021-08-31 07:40:09,032] Trial 91 finished with value: 0.5792105813010731 and parameters: {'min_samples_split': 1, 'max_features': 80, 'num_arvores': 30}. Best is trial 17 with value: 0.5797928214631566.
[I 2021-08-31 07:41:04,942] Trial 92 finished with value: 0.5792105813010731 and parameters: {'min_samples_split': 1, 'max_features': 80, 'num_arvores': 30}. Best is trial 17 with value: 0.5797928214631566.

0.5843185987336529


[I 2021-08-31 07:50:00,842] A new study created in RDB with name: random_forest_embbeding_fold_3
[I 2021-08-31 07:51:21,795] Trial 0 finished with value: 0.5561030060367801 and parameters: {'min_samples_split': 3, 'max_features': 100, 'num_arvores': 40}. Best is trial 0 with value: 0.5561030060367801.
[I 2021-08-31 07:52:20,077] Trial 1 finished with value: 0.44212646280614926 and parameters: {'min_samples_split': 13, 'max_features': 80, 'num_arvores': 35}. Best is trial 0 with value: 0.5561030060367801.
[I 2021-08-31 07:53:32,043] Trial 2 finished with value: 0.441062013640106 and parameters: {'min_samples_split': 19, 'max_features': 90, 'num_arvores': 30}. Best is trial 0 with value: 0.5561030060367801.
[I 2021-08-31 07:54:42,943] Trial 3 finished with value: 0.4431331120155457 and parameters: {'min_samples_split': 13, 'max_features': 80, 'num_arvores': 40}. Best is trial 0 with value: 0.5561030060367801.

[I 2021-08-31 08:26:47,905] Trial 32 finished with value: 0.5746000838428167 and parameters: {'min_samples_split': 1, 'max_features': 80, 'num_arvores': 40}. Best is trial 11 with value: 0.5884486312972249.
[I 2021-08-31 08:28:04,684] Trial 33 finished with value: 0.540670330866042 and parameters: {'min_samples_split': 3, 'max_features': 75, 'num_arvores': 40}. Best is trial 11 with value: 0.5884486312972249.
[I 2021-08-31 08:29:19,211] Trial 34 finished with value: 0.5746000838428167 and parameters: {'min_samples_split': 1, 'max_features': 80, 'num_arvores': 40}. Best is trial 11 with value: 0.5884486312972249.
[I 2021-08-31 08:30:40,984] Trial 35 finished with value: 0.519749598174898 and parameters: {'min_samples_split': 5, 'max_features': 80, 'num_arvores': 40}. Best is trial 11 with value: 0.5884486312972249.

[I 2021-08-31 09:59:58,522] Trial 63 finished with value: 0.5884486312972249 and parameters: {'min_samples_split': 1, 'max_features': 100, 'num_arvores': 40}. Best is trial 11 with value: 0.5884486312972249.
[I 2021-08-31 10:00:32,445] Trial 64 finished with value: 0.5561030060367801 and parameters: {'min_samples_split': 3, 'max_features': 100, 'num_arvores': 40}. Best is trial 11 with value: 0.5884486312972249.
[I 2021-08-31 10:01:05,896] Trial 65 finished with value: 0.5884486312972249 and parameters: {'min_samples_split': 1, 'max_features': 100, 'num_arvores': 40}. Best is trial 11 with value: 0.5884486312972249.
[I 2021-08-31 10:01:42,741] Trial 66 finished with value: 0.5551069076357159 and parameters: {'min_samples_split': 3, 'max_features': 95, 'num_arvores': 40}. Best is trial 11 with value: 0.5884486312972249.

0.5807189085616072


[I 2021-08-31 10:20:26,369] Trial 0 finished with value: 0.5741446398335216 and parameters: {'min_samples_split': 1, 'max_features': 70, 'num_arvores': 30}. Best is trial 0 with value: 0.5741446398335216.
[I 2021-08-31 10:20:58,245] Trial 1 finished with value: 0.509875427519041 and parameters: {'min_samples_split': 5, 'max_features': 100, 'num_arvores': 40}. Best is trial 0 with value: 0.5741446398335216.
[I 2021-08-31 10:21:26,042] Trial 2 finished with value: 0.43950110122050196 and parameters: {'min_samples_split': 13, 'max_features': 95, 'num_arvores': 30}. Best is trial 0 with value: 0.5741446398335216.
[I 2021-08-31 10:22:00,028] Trial 3 finished with value: 0.47756347816954875 and parameters: {'min_samples_split': 7, 'max_features': 90, 'num_arvores': 50}. Best is trial 0 with value: 0.5741446398335216.

[I 2021-08-31 10:32:14,798] Trial 24 finished with value: 0.4848240881952508 and parameters: {'min_samples_split': 7, 'max_features': 80, 'num_arvores': 35}. Best is trial 11 with value: 0.5789061522644361.
[I 2021-08-31 10:32:47,161] Trial 25 finished with value: 0.5055588723080777 and parameters: {'min_samples_split': 5, 'max_features': 75, 'num_arvores': 35}. Best is trial 11 with value: 0.5789061522644361.
[I 2021-08-31 10:33:18,184] Trial 26 finished with value: 0.5812845932288232 and parameters: {'min_samples_split': 1, 'max_features': 85, 'num_arvores': 30}. Best is trial 26 with value: 0.5812845932288232.
[I 2021-08-31 10:33:45,899] Trial 27 finished with value: 0.48294905327674104 and parameters: {'min_samples_split': 7, 'max_features': 90, 'num_arvores': 35}. Best is trial 26 with value: 0.5812845932288232.

[I 2021-08-31 10:51:51,948] Trial 58 finished with value: 0.4385366871188751 and parameters: {'min_samples_split': 17, 'max_features': 85, 'num_arvores': 45}. Best is trial 26 with value: 0.5812845932288232.
[I 2021-08-31 10:52:24,713] Trial 59 finished with value: 0.5377704788258647 and parameters: {'min_samples_split': 3, 'max_features': 80, 'num_arvores': 30}. Best is trial 26 with value: 0.5812845932288232.
[I 2021-08-31 10:53:03,142] Trial 60 finished with value: 0.5812845932288232 and parameters: {'min_samples_split': 1, 'max_features': 85, 'num_arvores': 30}. Best is trial 26 with value: 0.5812845932288232.
[I 2021-08-31 10:53:40,863] Trial 61 finished with value: 0.5812845932288232 and parameters: {'min_samples_split': 1, 'max_features': 85, 'num_arvores': 30}. Best is trial 26 with value: 0.5812845932288232.

[I 2021-08-31 11:08:59,016] Trial 87 finished with value: 0.5389634776385721 and parameters: {'min_samples_split': 3, 'max_features': 85, 'num_arvores': 30}. Best is trial 26 with value: 0.5812845932288232.
[I 2021-08-31 11:09:41,947] Trial 88 finished with value: 0.5696801869542932 and parameters: {'min_samples_split': 1, 'max_features': 90, 'num_arvores': 45}. Best is trial 26 with value: 0.5812845932288232.
[I 2021-08-31 11:10:17,138] Trial 89 finished with value: 0.5812845932288232 and parameters: {'min_samples_split': 1, 'max_features': 85, 'num_arvores': 30}. Best is trial 26 with value: 0.5812845932288232.
[I 2021-08-31 11:10:46,686] Trial 90 finished with value: 0.5389634776385721 and parameters: {'min_samples_split': 3, 'max_features': 85, 'num_arvores': 30}. Best is trial 26 with value: 0.5812845932288232.

0.6035348360655737
Representação: embbeding concluida
===== Representação: emb_nostop


[I 2021-08-31 11:15:58,277] A new study created in RDB with name: random_forest_emb_nostop_fold_0
[I 2021-08-31 11:16:34,697] Trial 0 finished with value: 0.5748437263223859 and parameters: {'min_samples_split': 1, 'max_features': 100, 'num_arvores': 40}. Best is trial 0 with value: 0.5748437263223859.
[I 2021-08-31 11:17:05,686] Trial 1 finished with value: 0.4927071341493232 and parameters: {'min_samples_split': 5, 'max_features': 75, 'num_arvores': 45}. Best is trial 0 with value: 0.5748437263223859.
[I 2021-08-31 11:17:39,529] Trial 2 finished with value: 0.4918501434390281 and parameters: {'min_samples_split': 5, 'max_features': 90, 'num_arvores': 50}. Best is trial 0 with value: 0.5748437263223859.
[I 2021-08-31 11:18:07,531] Trial 3 finished with value: 0.4416826250932426 and parameters: {'min_samples_split': 19, 'max_features': 75, 'num_arvores': 40}. Best is trial 0 with value: 0.5748437263223859.

[I 2021-08-31 11:25:57,269] Trial 18 finished with value: 0.5378833585550059 and parameters: {'min_samples_split': 3, 'max_features': 100, 'num_arvores': 40}. Best is trial 0 with value: 0.5748437263223859.
[I 2021-08-31 11:26:39,614] Trial 19 finished with value: 0.45170501639734423 and parameters: {'min_samples_split': 9, 'max_features': 95, 'num_arvores': 45}. Best is trial 0 with value: 0.5748437263223859.
[I 2021-08-31 11:27:20,371] Trial 20 finished with value: 0.5419619157090386 and parameters: {'min_samples_split': 3, 'max_features': 85, 'num_arvores': 50}. Best is trial 0 with value: 0.5748437263223859.
[I 2021-08-31 11:27:59,533] Trial 21 finished with value: 0.5739642091292675 and parameters: {'min_samples_split': 1, 'max_features': 100, 'num_arvores': 50}. Best is trial 0 with value: 0.5748437263223859.

[I 2021-08-31 11:44:24,843] Trial 48 finished with value: 0.44059943711237165 and parameters: {'min_samples_split': 21, 'max_features': 70, 'num_arvores': 40}. Best is trial 25 with value: 0.5762787775786111.
[I 2021-08-31 11:44:56,347] Trial 49 finished with value: 0.5249522913275675 and parameters: {'min_samples_split': 3, 'max_features': 75, 'num_arvores': 35}. Best is trial 25 with value: 0.5762787775786111.
[I 2021-08-31 11:45:30,286] Trial 50 finished with value: 0.4707567420813262 and parameters: {'min_samples_split': 7, 'max_features': 95, 'num_arvores': 30}. Best is trial 25 with value: 0.5762787775786111.
[I 2021-08-31 11:46:15,066] Trial 51 finished with value: 0.5739642091292675 and parameters: {'min_samples_split': 1, 'max_features': 100, 'num_arvores': 50}. Best is trial 25 with value: 0.5762787775786111.

[I 2021-08-31 12:03:12,676] Trial 79 finished with value: 0.530516663369042 and parameters: {'min_samples_split': 3, 'max_features': 75, 'num_arvores': 45}. Best is trial 25 with value: 0.5762787775786111.
[I 2021-08-31 12:03:44,208] Trial 80 finished with value: 0.5627805128631164 and parameters: {'min_samples_split': 1, 'max_features': 70, 'num_arvores': 50}. Best is trial 25 with value: 0.5762787775786111.
[I 2021-08-31 12:04:16,054] Trial 81 finished with value: 0.5755153886212588 and parameters: {'min_samples_split': 1, 'max_features': 75, 'num_arvores': 50}. Best is trial 25 with value: 0.5762787775786111.
[I 2021-08-31 12:04:47,411] Trial 82 finished with value: 0.5755153886212588 and parameters: {'min_samples_split': 1, 'max_features': 75, 'num_arvores': 50}. Best is trial 25 with value: 0.5762787775786111.

0.5722836493634672


[I 2021-08-31 12:14:17,420] Trial 0 finished with value: 0.44676444847002483 and parameters: {'min_samples_split': 13, 'max_features': 75, 'num_arvores': 45}. Best is trial 0 with value: 0.44676444847002483.
[I 2021-08-31 12:14:51,493] Trial 1 finished with value: 0.5888862873992109 and parameters: {'min_samples_split': 1, 'max_features': 75, 'num_arvores': 50}. Best is trial 1 with value: 0.5888862873992109.
[I 2021-08-31 12:15:27,197] Trial 2 finished with value: 0.52987069691204 and parameters: {'min_samples_split': 5, 'max_features': 90, 'num_arvores': 40}. Best is trial 1 with value: 0.5888862873992109.
[I 2021-08-31 12:15:54,194] Trial 3 finished with value: 0.43949396169147553 and parameters: {'min_samples_split': 21, 'max_features': 75, 'num_arvores': 30}. Best is trial 1 with value: 0.5888862873992109.

[I 2021-08-31 12:21:40,017] Trial 15 finished with value: 0.5833171092673157 and parameters: {'min_samples_split': 1, 'max_features': 85, 'num_arvores': 35}. Best is trial 1 with value: 0.5888862873992109.
[I 2021-08-31 12:22:05,213] Trial 16 finished with value: 0.4727006202524868 and parameters: {'min_samples_split': 9, 'max_features': 90, 'num_arvores': 35}. Best is trial 1 with value: 0.5888862873992109.
[I 2021-08-31 12:22:35,250] Trial 17 finished with value: 0.5588672980695312 and parameters: {'min_samples_split': 3, 'max_features': 90, 'num_arvores': 45}. Best is trial 1 with value: 0.5888862873992109.
[I 2021-08-31 12:23:01,072] Trial 18 finished with value: 0.4933636733455328 and parameters: {'min_samples_split': 7, 'max_features': 85, 'num_arvores': 30}. Best is trial 1 with value: 0.5888862873992109.

[I 2021-08-31 13:32:44,551] Trial 43 finished with value: 0.5908324264844841 and parameters: {'min_samples_split': 1, 'max_features': 80, 'num_arvores': 30}. Best is trial 23 with value: 0.5908324264844841.
[I 2021-08-31 13:33:14,767] Trial 44 finished with value: 0.562673122058338 and parameters: {'min_samples_split': 3, 'max_features': 80, 'num_arvores': 30}. Best is trial 23 with value: 0.5908324264844841.
[I 2021-08-31 13:33:42,997] Trial 45 finished with value: 0.5522966067658381 and parameters: {'min_samples_split': 3, 'max_features': 85, 'num_arvores': 35}. Best is trial 23 with value: 0.5908324264844841.
[I 2021-08-31 13:34:10,664] Trial 46 finished with value: 0.5908324264844841 and parameters: {'min_samples_split': 1, 'max_features': 80, 'num_arvores': 30}. Best is trial 23 with value: 0.5908324264844841.
[I 2021-08-31 13:34:42,623] Trial 47 finished with value: 0.5833121525408699 and parameters: {'min_samples_split

[I 2021-08-31 13:45:20,332] Trial 68 finished with value: 0.5632297703103532 and parameters: {'min_samples_split': 3, 'max_features': 95, 'num_arvores': 35}. Best is trial 56 with value: 0.5993704981904535.
[I 2021-08-31 13:46:12,176] Trial 69 finished with value: 0.5907704394109846 and parameters: {'min_samples_split': 1, 'max_features': 100, 'num_arvores': 40}. Best is trial 56 with value: 0.5993704981904535.
[I 2021-08-31 13:46:56,799] Trial 70 finished with value: 0.5931901268807919 and parameters: {'min_samples_split': 1, 'max_features': 100, 'num_arvores': 35}. Best is trial 56 with value: 0.5993704981904535.
[I 2021-08-31 13:47:31,533] Trial 71 finished with value: 0.5931901268807919 and parameters: {'min_samples_split': 1, 'max_features': 100, 'num_arvores': 35}. Best is trial 56 with value: 0.5993704981904535.
[I 2021-08-31 13:48:08,936] Trial 72 finished with value: 0.5931901268807919 and parameters: {'min_samples_s

[I 2021-08-31 14:03:42,561] A new study created in RDB with name: random_forest_emb_nostop_fold_2


0.5692207908391483


[I 2021-08-31 14:04:10,791] Trial 0 finished with value: 0.4849982286302021 and parameters: {'min_samples_split': 9, 'max_features': 100, 'num_arvores': 30}. Best is trial 0 with value: 0.4849982286302021.
[I 2021-08-31 14:04:38,739] Trial 1 finished with value: 0.5462305706356979 and parameters: {'min_samples_split': 3, 'max_features': 70, 'num_arvores': 40}. Best is trial 1 with value: 0.5462305706356979.
[I 2021-08-31 14:05:14,355] Trial 2 finished with value: 0.5799850951042554 and parameters: {'min_samples_split': 1, 'max_features': 100, 'num_arvores': 50}. Best is trial 2 with value: 0.5799850951042554.
[I 2021-08-31 14:05:40,776] Trial 3 finished with value: 0.5785830648620574 and parameters: {'min_samples_split': 1, 'max_features': 75, 'num_arvores': 35}. Best is trial 2 with value: 0.5799850951042554.
[I 2021-08-31 14:06:09,206] Trial 4 finished with value: 0.559507752765058 and parameters: {'min_samples_split': 3, '

[I 2021-08-31 14:15:05,415] Trial 22 finished with value: 0.5487223051500804 and parameters: {'min_samples_split': 3, 'max_features': 75, 'num_arvores': 45}. Best is trial 16 with value: 0.5834526696867123.
[I 2021-08-31 14:15:44,619] Trial 23 finished with value: 0.5739009852020913 and parameters: {'min_samples_split': 1, 'max_features': 85, 'num_arvores': 50}. Best is trial 16 with value: 0.5834526696867123.
[I 2021-08-31 14:16:17,709] Trial 24 finished with value: 0.5462305706356979 and parameters: {'min_samples_split': 3, 'max_features': 70, 'num_arvores': 40}. Best is trial 16 with value: 0.5834526696867123.
[I 2021-08-31 14:16:49,122] Trial 25 finished with value: 0.49633855805790555 and parameters: {'min_samples_split': 7, 'max_features': 80, 'num_arvores': 45}. Best is trial 16 with value: 0.5834526696867123.
[I 2021-08-31 14:17:19,725] Trial 26 finished with value: 0.5823841366666848 and parameters: {'min_samples_spl

[I 2021-08-31 14:33:58,909] Trial 57 finished with value: 0.439601282027515 and parameters: {'min_samples_split': 19, 'max_features': 85, 'num_arvores': 40}. Best is trial 16 with value: 0.5834526696867123.
[I 2021-08-31 14:34:33,972] Trial 58 finished with value: 0.5831104899022752 and parameters: {'min_samples_split': 1, 'max_features': 85, 'num_arvores': 45}. Best is trial 16 with value: 0.5834526696867123.
[I 2021-08-31 14:35:03,640] Trial 59 finished with value: 0.5538141003069318 and parameters: {'min_samples_split': 3, 'max_features': 90, 'num_arvores': 45}. Best is trial 16 with value: 0.5834526696867123.
[I 2021-08-31 14:35:32,082] Trial 60 finished with value: 0.5322732884073242 and parameters: {'min_samples_split': 5, 'max_features': 90, 'num_arvores': 35}. Best is trial 16 with value: 0.5834526696867123.
[I 2021-08-31 14:36:03,524] Trial 61 finished with value: 0.5831104899022752 and parameters: {'min_samples_spli

[I 2021-08-31 14:51:35,866] Trial 88 finished with value: 0.5834526696867123 and parameters: {'min_samples_split': 1, 'max_features': 80, 'num_arvores': 40}. Best is trial 16 with value: 0.5834526696867123.
[I 2021-08-31 14:52:08,743] Trial 89 finished with value: 0.5834526696867123 and parameters: {'min_samples_split': 1, 'max_features': 80, 'num_arvores': 40}. Best is trial 16 with value: 0.5834526696867123.
[I 2021-08-31 14:52:40,314] Trial 90 finished with value: 0.5193916905475938 and parameters: {'min_samples_split': 5, 'max_features': 80, 'num_arvores': 40}. Best is trial 16 with value: 0.5834526696867123.
[I 2021-08-31 14:53:14,025] Trial 91 finished with value: 0.5834526696867123 and parameters: {'min_samples_split': 1, 'max_features': 80, 'num_arvores': 40}. Best is trial 16 with value: 0.5834526696867123.
[I 2021-08-31 14:53:47,193] Trial 92 finished with value: 0.5834526696867123 and parameters: {'min_samples_spli

0.6033626670092498


[I 2021-08-31 14:57:51,488] Trial 0 finished with value: 0.49042906276879145 and parameters: {'min_samples_split': 7, 'max_features': 100, 'num_arvores': 40}. Best is trial 0 with value: 0.49042906276879145.
[I 2021-08-31 14:58:21,582] Trial 1 finished with value: 0.51566632352323 and parameters: {'min_samples_split': 5, 'max_features': 95, 'num_arvores': 45}. Best is trial 1 with value: 0.51566632352323.
[I 2021-08-31 14:58:51,338] Trial 2 finished with value: 0.5818056936202214 and parameters: {'min_samples_split': 1, 'max_features': 80, 'num_arvores': 45}. Best is trial 2 with value: 0.5818056936202214.
[I 2021-08-31 14:59:15,827] Trial 3 finished with value: 0.43892716427737327 and parameters: {'min_samples_split': 21, 'max_features': 70, 'num_arvores': 50}. Best is trial 2 with value: 0.5818056936202214.
[I 2021-08-31 14:59:46,583] Trial 4 finished with value: 0.4665596916413899 and parameters: {'min_samples_split': 9, '

[I 2021-08-31 15:44:33,659] Trial 28 finished with value: 0.5533673965722441 and parameters: {'min_samples_split': 3, 'max_features': 75, 'num_arvores': 50}. Best is trial 26 with value: 0.5818337029729599.
[I 2021-08-31 15:45:36,959] Trial 29 finished with value: 0.48766559830861017 and parameters: {'min_samples_split': 7, 'max_features': 80, 'num_arvores': 40}. Best is trial 26 with value: 0.5818337029729599.
[I 2021-08-31 15:46:47,958] Trial 30 finished with value: 0.46329033823507987 and parameters: {'min_samples_split': 9, 'max_features': 75, 'num_arvores': 40}. Best is trial 26 with value: 0.5818337029729599.
[I 2021-08-31 15:48:03,060] Trial 31 finished with value: 0.5779283070421088 and parameters: {'min_samples_split': 1, 'max_features': 70, 'num_arvores': 35}. Best is trial 26 with value: 0.5818337029729599.
[I 2021-08-31 15:49:07,185] Trial 32 finished with value: 0.573710549806277 and parameters: {'min_samples_spl

[I 2021-08-31 16:20:41,382] Trial 61 finished with value: 0.5779283070421088 and parameters: {'min_samples_split': 1, 'max_features': 70, 'num_arvores': 35}. Best is trial 26 with value: 0.5818337029729599.
[I 2021-08-31 16:21:53,850] Trial 62 finished with value: 0.5779283070421088 and parameters: {'min_samples_split': 1, 'max_features': 70, 'num_arvores': 35}. Best is trial 26 with value: 0.5818337029729599.
[I 2021-08-31 16:23:00,315] Trial 63 finished with value: 0.5779283070421088 and parameters: {'min_samples_split': 1, 'max_features': 70, 'num_arvores': 35}. Best is trial 26 with value: 0.5818337029729599.
[I 2021-08-31 16:24:03,760] Trial 64 finished with value: 0.546134760680379 and parameters: {'min_samples_split': 3, 'max_features': 70, 'num_arvores': 35}. Best is trial 26 with value: 0.5818337029729599.
[I 2021-08-31 16:25:09,038] Trial 65 finished with value: 0.5779283070421088 and parameters: {'min_samples_split

[I 2021-08-31 17:00:37,255] Trial 97 finished with value: 0.5844937138375026 and parameters: {'min_samples_split': 1, 'max_features': 80, 'num_arvores': 30}. Best is trial 96 with value: 0.5844937138375026.
[I 2021-08-31 17:01:37,310] Trial 98 finished with value: 0.44099015070043396 and parameters: {'min_samples_split': 19, 'max_features': 80, 'num_arvores': 30}. Best is trial 96 with value: 0.5844937138375026.
[I 2021-08-31 17:02:41,267] Trial 99 finished with value: 0.555802762546423 and parameters: {'min_samples_split': 3, 'max_features': 80, 'num_arvores': 30}. Best is trial 96 with value: 0.5844937138375026.


0.5711470188994466
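Every complete trial line in the log above follows the same pattern: `Trial N finished with value: V and parameters: {...}. Best is trial B with value: W.` Here is a minimal sketch of extracting trials from such a log with only the standard library and picking the best parameter set. It assumes the output was saved as plain text with the terminal color codes already removed. `TRIAL_RE` and `parse_trials` are hypothetical names, not part of Optuna. Lines cut off by the notebook display do not match the pattern, so they are skipped rather than guessed.

```python
import ast
import re

# Matches a complete Optuna trial line; the parameter dict must close on the same line,
# so lines truncated by the notebook display (no closing brace) are ignored.
TRIAL_RE = re.compile(
    r"Trial (?P<num>\d+) finished with value: (?P<value>[0-9.eE+-]+) "
    r"and parameters: (?P<params>\{[^}\n]*\})"
)


def parse_trials(log_text):
    """Return a list of (trial_number, value, params) tuples found in log_text."""
    trials = []
    for match in TRIAL_RE.finditer(log_text):
        params = ast.literal_eval(match.group("params"))  # the log prints a Python dict literal
        trials.append((int(match.group("num")), float(match.group("value")), params))
    return trials


# A few lines copied from the log above (color codes removed); the first one is truncated.
log = """
[I 2021-08-31 16:25:09,038] Trial 65 finished with value: 0.5779283070421088 and parameters: {'min_samples_split
[I 2021-08-31 17:00:37,255] Trial 97 finished with value: 0.5844937138375026 and parameters: {'min_samples_split': 1, 'max_features': 80, 'num_arvores': 30}. Best is trial 96 with value: 0.5844937138375026.
[I 2021-08-31 17:01:37,310] Trial 98 finished with value: 0.44099015070043396 and parameters: {'min_samples_split': 19, 'max_features': 80, 'num_arvores': 30}. Best is trial 96 with value: 0.5844937138375026.
[I 2021-08-31 17:02:41,267] Trial 99 finished with value: 0.555802762546423 and parameters: {'min_samples_split': 3, 'max_features': 80, 'num_arvores': 30}. Best is trial 96 with value: 0.5844937138375026.
"""

trials = parse_trials(log)
best_num, best_value, best_params = max(trials, key=lambda t: t[1])
print(len(trials), best_num, best_value, best_params)
```

If the Optuna `study` object is still available, `study.best_params` and `study.best_value` give the same information directly. Parsing the text is only worthwhile when the printed log is all that remains.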


[I 2021-08-31 17:03:10,035] A new study created in RDB with name: random_forest_emb_nostop_fold_4
[I 2021-08-31 17:04:18,398] Trial 0 finished with value: 0.4658426095974549 and parameters: {'min_samples_split': 9, 'max_features': 80, 'num_arvores': 45}. Best is trial 0 with value: 0.4658426095974549.
[I 2021-08-31 17:05:25,365] Trial 1 finished with value: 0.46779371354466875 and parameters: {'min_samples_split': 9, 'max_features': 100, 'num_arvores': 45}. Best is trial 1 with value: 0.46779371354466875.
[I 2021-08-31 17:06:28,457] Trial 2 finished with value: 0.5172054534119953 and parameters: {'min_samples_split': 5, 'max_features': 80, 'num_arvores': 35}. Best is trial 2 with value: 0.5172054534119953.
[I 2021-08-31 17:07:28,859] Trial 3 finished with value: 0.4385366871188751 and parameters: {'min_samples_split': 17, 'max_features': 100, 'num_arvores': 50}. Best is trial 2 with value: 0.5172054534119953.
[I 2021

[I 2021-08-31 17:23:45,089] Trial 18 finished with value: 0.4475168906024945 and parameters: {'min_samples_split': 11, 'max_features': 90, 'num_arvores': 40}. Best is trial 9 with value: 0.5817603255145944.
[I 2021-08-31 17:24:51,824] Trial 19 finished with value: 0.48726564090042346 and parameters: {'min_samples_split': 7, 'max_features': 100, 'num_arvores': 30}. Best is trial 9 with value: 0.5817603255145944.
[I 2021-08-31 17:26:05,342] Trial 20 finished with value: 0.5449511318003021 and parameters: {'min_samples_split': 3, 'max_features': 95, 'num_arvores': 40}. Best is trial 9 with value: 0.5817603255145944.
[I 2021-08-31 17:27:15,408] Trial 21 finished with value: 0.5788311931757981 and parameters: {'min_samples_split': 1, 'max_features': 95, 'num_arvores': 30}. Best is trial 9 with value: 0.5817603255145944.
[I 2021-08-31 17:28:24,366] Trial 22 finished with value: 0.5449955174151663 and parameters: {'min_samples_split

[I 2021-08-31 18:07:51,781] Trial 50 finished with value: 0.5170787598384212 and parameters: {'min_samples_split': 5, 'max_features': 100, 'num_arvores': 30}. Best is trial 9 with value: 0.5817603255145944.
[I 2021-08-31 18:08:28,973] Trial 51 finished with value: 0.5817603255145944 and parameters: {'min_samples_split': 1, 'max_features': 100, 'num_arvores': 30}. Best is trial 9 with value: 0.5817603255145944.
[I 2021-08-31 18:09:02,076] Trial 52 finished with value: 0.5817603255145944 and parameters: {'min_samples_split': 1, 'max_features': 100, 'num_arvores': 30}. Best is trial 9 with value: 0.5817603255145944.
[I 2021-08-31 18:09:46,085] Trial 53 finished with value: 0.5449955174151663 and parameters: {'min_samples_split': 3, 'max_features': 100, 'num_arvores': 30}. Best is trial 9 with value: 0.5817603255145944.
[I 2021-08-31 18:10:25,048] Trial 54 finished with value: 0.5788311931757981 and parameters: {'min_samples_spli

[I 2021-08-31 18:23:35,941] Trial 81 finished with value: 0.5817603255145944 and parameters: {'min_samples_split': 1, 'max_features': 100, 'num_arvores': 30}. Best is trial 9 with value: 0.5817603255145944.
[I 2021-08-31 18:24:08,190] Trial 82 finished with value: 0.5449955174151663 and parameters: {'min_samples_split': 3, 'max_features': 100, 'num_arvores': 30}. Best is trial 9 with value: 0.5817603255145944.
[I 2021-08-31 18:24:40,768] Trial 83 finished with value: 0.5817603255145944 and parameters: {'min_samples_split': 1, 'max_features': 100, 'num_arvores': 30}. Best is trial 9 with value: 0.5817603255145944.
[I 2021-08-31 18:25:11,264] Trial 84 finished with value: 0.5817603255145944 and parameters: {'min_samples_split': 1, 'max_features': 100, 'num_arvores': 30}. Best is trial 9 with value: 0.5817603255145944.
[I 2021-08-31 18:25:40,810] Trial 85 finished with value: 0.5817603255145944 and parameters: {'min_samples_spli

0.5956615153633092
Representação: emb_nostop concluida
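Five scores were printed for `emb_nostop` before the completion message. This sketch assumes each one is the result of the tuned Random Forest on one held-out fold (folds 0 to 4, in print order). Under that assumption, the representation can be summarized by the mean and the sample standard deviation of the five values. Only Python's `statistics` module is used.

```python
from statistics import mean, stdev

# Scores printed for emb_nostop in the output above, in the order they appear
# (assumed to be folds 0 to 4).
emb_nostop_scores = [
    0.5722836493634672,
    0.5692207908391483,
    0.6033626670092498,
    0.5711470188994466,
    0.5956615153633092,
]

avg = mean(emb_nostop_scores)
sd = stdev(emb_nostop_scores)  # sample standard deviation across the folds
print(f"emb_nostop: {avg:.4f} +/- {sd:.4f}")
```

Summarizing the other representations the same way puts their results on a common footing. The spread across folds shows how large a difference in means has to be before it is worth taking seriously.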
===== Representação: emb_keywords_exp


[I 2021-08-31 18:32:52,230] Trial 0 finished with value: 0.44387377073408324 and parameters: {'min_samples_split': 17, 'max_features': 100, 'num_arvores': 45}. Best is trial 0 with value: 0.44387377073408324.
[I 2021-08-31 18:33:16,767] Trial 1 finished with value: 0.44599138003818695 and parameters: {'min_samples_split': 13, 'max_features': 75, 'num_arvores': 40}. Best is trial 1 with value: 0.44599138003818695.
[I 2021-08-31 18:33:43,806] Trial 2 finished with value: 0.4726891781437161 and parameters: {'min_samples_split': 7, 'max_features': 75, 'num_arvores': 40}. Best is trial 2 with value: 0.4726891781437161.
[I 2021-08-31 18:34:13,804] Trial 3 finished with value: 0.4760556414786585 and parameters: {'min_samples_split': 7, 'max_features': 100, 'num_arvores': 50}. Best is trial 3 with value: 0.4760556414786585.
[I 2021-08-31 18:34:46,831] Trial 4 finished with value: 0.4894135904484093 and parameters: {'min_samples_split

[I 2021-08-31 18:46:22,397] Trial 25 finished with value: 0.5747181668768114 and parameters: {'min_samples_split': 1, 'max_features': 95, 'num_arvores': 45}. Best is trial 15 with value: 0.578335926762438.
[I 2021-08-31 18:46:56,452] Trial 26 finished with value: 0.49442509371907656 and parameters: {'min_samples_split': 5, 'max_features': 95, 'num_arvores': 45}. Best is trial 15 with value: 0.578335926762438.
[I 2021-08-31 18:47:33,157] Trial 27 finished with value: 0.5439189078767585 and parameters: {'min_samples_split': 3, 'max_features': 100, 'num_arvores': 45}. Best is trial 15 with value: 0.578335926762438.
[I 2021-08-31 18:48:04,530] Trial 28 finished with value: 0.4416826250932426 and parameters: {'min_samples_split': 21, 'max_features': 95, 'num_arvores': 45}. Best is trial 15 with value: 0.578335926762438.
[I 2021-08-31 18:48:37,019] Trial 29 finished with value: 0.44387377073408324 and parameters: {'min_samples_spli

[I 2021-08-31 19:02:09,415] Trial 53 finished with value: 0.5788024360743207 and parameters: {'min_samples_split': 1, 'max_features': 95, 'num_arvores': 40}. Best is trial 37 with value: 0.5801268222615478.
[I 2021-08-31 19:02:39,913] Trial 54 finished with value: 0.5419516770136591 and parameters: {'min_samples_split': 3, 'max_features': 100, 'num_arvores': 40}. Best is trial 37 with value: 0.5801268222615478.
[I 2021-08-31 19:03:09,143] Trial 55 finished with value: 0.5789103125122171 and parameters: {'min_samples_split': 1, 'max_features': 85, 'num_arvores': 40}. Best is trial 37 with value: 0.5801268222615478.
[I 2021-08-31 19:03:38,608] Trial 56 finished with value: 0.5769501170655121 and parameters: {'min_samples_split': 1, 'max_features': 85, 'num_arvores': 35}. Best is trial 37 with value: 0.5801268222615478.
[I 2021-08-31 19:04:04,906] Trial 57 finished with value: 0.539113896889931 and parameters: {'min_samples_spli

[I 2021-08-31 19:18:47,404] Trial 88 finished with value: 0.5789103125122171 and parameters: {'min_samples_split': 1, 'max_features': 85, 'num_arvores': 40}. Best is trial 37 with value: 0.5801268222615478.
[I 2021-08-31 19:19:18,064] Trial 89 finished with value: 0.5412245713938093 and parameters: {'min_samples_split': 3, 'max_features': 85, 'num_arvores': 40}. Best is trial 37 with value: 0.5801268222615478.
[I 2021-08-31 19:19:46,534] Trial 90 finished with value: 0.5496252444633153 and parameters: {'min_samples_split': 3, 'max_features': 90, 'num_arvores': 40}. Best is trial 37 with value: 0.5801268222615478.
[I 2021-08-31 19:20:14,355] Trial 91 finished with value: 0.5789103125122171 and parameters: {'min_samples_split': 1, 'max_features': 85, 'num_arvores': 40}. Best is trial 37 with value: 0.5801268222615478.
[I 2021-08-31 19:21:13,178] Trial 92 finished with value: 0.5789103125122171 and parameters: {'min_samples_spli

0.5757752461391146


[I 2021-08-31 19:31:08,142] Trial 0 finished with value: 0.5623688834556534 and parameters: {'min_samples_split': 3, 'max_features': 95, 'num_arvores': 35}. Best is trial 0 with value: 0.5623688834556534.
[I 2021-08-31 19:32:11,438] Trial 1 finished with value: 0.4509834609323633 and parameters: {'min_samples_split': 11, 'max_features': 95, 'num_arvores': 30}. Best is trial 0 with value: 0.5623688834556534.
[I 2021-08-31 19:33:23,477] Trial 2 finished with value: 0.5012956580987531 and parameters: {'min_samples_split': 7, 'max_features': 100, 'num_arvores': 50}. Best is trial 0 with value: 0.5623688834556534.
[I 2021-08-31 19:34:34,654] Trial 3 finished with value: 0.4475403920253566 and parameters: {'min_samples_split': 11, 'max_features': 100, 'num_arvores': 50}. Best is trial 0 with value: 0.5623688834556534.
[I 2021-08-31 19:35:37,147] Trial 4 finished with value: 0.44034071239243416 and parameters: {'min_samples_split': 

[I 2021-08-31 19:52:30,695] Trial 19 finished with value: 0.4404858780522259 and parameters: {'min_samples_split': 21, 'max_features': 85, 'num_arvores': 30}. Best is trial 10 with value: 0.5902738641054041.
[I 2021-08-31 19:53:38,135] Trial 20 finished with value: 0.5445581425109167 and parameters: {'min_samples_split': 3, 'max_features': 75, 'num_arvores': 40}. Best is trial 10 with value: 0.5902738641054041.
[I 2021-08-31 19:54:56,070] Trial 21 finished with value: 0.5902738641054041 and parameters: {'min_samples_split': 1, 'max_features': 85, 'num_arvores': 35}. Best is trial 10 with value: 0.5902738641054041.
[I 2021-08-31 19:56:13,099] Trial 22 finished with value: 0.5518359084534477 and parameters: {'min_samples_split': 3, 'max_features': 80, 'num_arvores': 35}. Best is trial 10 with value: 0.5902738641054041.
[I 2021-08-31 19:57:25,663] Trial 23 finished with value: 0.5921510948186549 and parameters: {'min_samples_spl

[I 2021-08-31 20:28:31,997] Trial 50 finished with value: 0.4443720731982907 and parameters: {'min_samples_split': 11, 'max_features': 70, 'num_arvores': 50}. Best is trial 23 with value: 0.5921510948186549.
[I 2021-08-31 20:29:46,212] Trial 51 finished with value: 0.5902738641054041 and parameters: {'min_samples_split': 1, 'max_features': 85, 'num_arvores': 35}. Best is trial 23 with value: 0.5921510948186549.
[I 2021-08-31 20:30:53,484] Trial 52 finished with value: 0.5902738641054041 and parameters: {'min_samples_split': 1, 'max_features': 85, 'num_arvores': 35}. Best is trial 23 with value: 0.5921510948186549.
[I 2021-08-31 20:32:10,560] Trial 53 finished with value: 0.5861673303346449 and parameters: {'min_samples_split': 1, 'max_features': 80, 'num_arvores': 35}. Best is trial 23 with value: 0.5921510948186549.
[I 2021-08-31 20:33:19,478] Trial 54 finished with value: 0.5561419382239664 and parameters: {'min_samples_spl

[I 2021-08-31 21:04:40,718] Trial 79 finished with value: 0.59353399157733 and parameters: {'min_samples_split': 1, 'max_features': 100, 'num_arvores': 30}. Best is trial 70 with value: 0.59353399157733.
[I 2021-08-31 21:05:55,064] Trial 80 finished with value: 0.5651061310738218 and parameters: {'min_samples_split': 3, 'max_features': 100, 'num_arvores': 30}. Best is trial 70 with value: 0.59353399157733.
[I 2021-08-31 21:07:13,359] Trial 81 finished with value: 0.59353399157733 and parameters: {'min_samples_split': 1, 'max_features': 100, 'num_arvores': 30}. Best is trial 70 with value: 0.59353399157733.
[I 2021-08-31 21:08:31,157] Trial 82 finished with value: 0.59353399157733 and parameters: {'min_samples_split': 1, 'max_features': 100, 'num_arvores': 30}. Best is trial 70 with value: 0.59353399157733.
[I 2021-08-31 21:09:48,025] Trial 83 finished with value: 0.59353399157733 and parameters: {'min_samples_split': 1, 'max_

0.5807536476264081


[I 2021-08-31 21:18:38,035] Trial 0 finished with value: 0.48217826473027703 and parameters: {'min_samples_split': 7, 'max_features': 75, 'num_arvores': 50}. Best is trial 0 with value: 0.48217826473027703.
[I 2021-08-31 21:19:06,186] Trial 1 finished with value: 0.4546426403588832 and parameters: {'min_samples_split': 11, 'max_features': 100, 'num_arvores': 45}. Best is trial 0 with value: 0.48217826473027703.
[I 2021-08-31 21:19:31,697] Trial 2 finished with value: 0.438580837094785 and parameters: {'min_samples_split': 21, 'max_features': 85, 'num_arvores': 40}. Best is trial 0 with value: 0.48217826473027703.
[I 2021-08-31 21:19:58,384] Trial 3 finished with value: 0.5506160734361295 and parameters: {'min_samples_split': 3, 'max_features': 80, 'num_arvores': 30}. Best is trial 3 with value: 0.5506160734361295.
[I 2021-08-31 21:20:27,067] Trial 4 finished with value: 0.5589504987324521 and parameters: {'min_samples_split':

[I 2021-08-31 21:30:31,519] Trial 25 finished with value: 0.5778765933713338 and parameters: {'min_samples_split': 1, 'max_features': 80, 'num_arvores': 35}. Best is trial 25 with value: 0.5778765933713338.
[I 2021-08-31 21:30:57,765] Trial 26 finished with value: 0.5142122616155539 and parameters: {'min_samples_split': 5, 'max_features': 80, 'num_arvores': 40}. Best is trial 25 with value: 0.5778765933713338.
[I 2021-08-31 21:31:23,177] Trial 27 finished with value: 0.47126240295689675 and parameters: {'min_samples_split': 9, 'max_features': 95, 'num_arvores': 35}. Best is trial 25 with value: 0.5778765933713338.
[I 2021-08-31 21:31:52,866] Trial 28 finished with value: 0.536452760092535 and parameters: {'min_samples_split': 3, 'max_features': 70, 'num_arvores': 40}. Best is trial 25 with value: 0.5778765933713338.
[I 2021-08-31 21:32:22,401] Trial 29 finished with value: 0.4778892696590991 and parameters: {'min_samples_spli

[I 2021-08-31 21:45:54,962] Trial 57 finished with value: 0.45223576435280727 and parameters: {'min_samples_split': 11, 'max_features': 90, 'num_arvores': 30}. Best is trial 32 with value: 0.5792105813010731.
[I 2021-08-31 21:46:29,227] Trial 58 finished with value: 0.4404959181417521 and parameters: {'min_samples_split': 15, 'max_features': 75, 'num_arvores': 35}. Best is trial 32 with value: 0.5792105813010731.
[I 2021-08-31 21:47:06,769] Trial 59 finished with value: 0.5213990574925113 and parameters: {'min_samples_split': 5, 'max_features': 85, 'num_arvores': 40}. Best is trial 32 with value: 0.5792105813010731.
[I 2021-08-31 21:47:36,522] Trial 60 finished with value: 0.5699182289453857 and parameters: {'min_samples_split': 1, 'max_features': 90, 'num_arvores': 30}. Best is trial 32 with value: 0.5792105813010731.
[I 2021-08-31 21:48:07,413] Trial 61 finished with value: 0.5751954409036435 and parameters: {'min_samples_s

[I 2021-08-31 22:02:30,638] Trial 89 finished with value: 0.5395660480162419 and parameters: {'min_samples_split': 3, 'max_features': 85, 'num_arvores': 40}. Best is trial 76 with value: 0.5797928214631566.
[I 2021-08-31 22:03:02,446] Trial 90 finished with value: 0.5757629747625321 and parameters: {'min_samples_split': 1, 'max_features': 80, 'num_arvores': 50}. Best is trial 76 with value: 0.5797928214631566.
[I 2021-08-31 22:03:32,345] Trial 91 finished with value: 0.5774962153337139 and parameters: {'min_samples_split': 1, 'max_features': 80, 'num_arvores': 40}. Best is trial 76 with value: 0.5797928214631566.
[I 2021-08-31 22:04:03,470] Trial 92 finished with value: 0.5774962153337139 and parameters: {'min_samples_split': 1, 'max_features': 80, 'num_arvores': 40}. Best is trial 76 with value: 0.5797928214631566.
[I 2021-08-31 22:04:32,621] Trial 93 finished with value: 0.5774962153337139 and parameters: {'min_samples_spli

0.5843185987336529


[I 2021-08-31 22:08:16,674] Trial 0 finished with value: 0.5237889226005511 and parameters: {'min_samples_split': 5, 'max_features': 95, 'num_arvores': 35}. Best is trial 0 with value: 0.5237889226005511.
[I 2021-08-31 22:08:49,634] Trial 1 finished with value: 0.5826332750480204 and parameters: {'min_samples_split': 1, 'max_features': 90, 'num_arvores': 50}. Best is trial 1 with value: 0.5826332750480204.
[I 2021-08-31 22:09:17,849] Trial 2 finished with value: 0.45964980460538757 and parameters: {'min_samples_split': 9, 'max_features': 95, 'num_arvores': 40}. Best is trial 1 with value: 0.5826332750480204.
[I 2021-08-31 22:09:49,258] Trial 3 finished with value: 0.4488777298654523 and parameters: {'min_samples_split': 11, 'max_features': 100, 'num_arvores': 50}. Best is trial 1 with value: 0.5826332750480204.
[I 2021-08-31 22:10:25,818] Trial 4 finished with value: 0.583425225957621 and parameters: {'min_samples_split': 1, 

[I 2021-08-31 22:23:52,046] Trial 31 finished with value: 0.5826332750480204 and parameters: {'min_samples_split': 1, 'max_features': 90, 'num_arvores': 50}. Best is trial 4 with value: 0.583425225957621.
[I 2021-08-31 22:24:24,661] Trial 32 finished with value: 0.5511819229301668 and parameters: {'min_samples_split': 3, 'max_features': 95, 'num_arvores': 50}. Best is trial 4 with value: 0.583425225957621.
[I 2021-08-31 22:25:00,618] Trial 33 finished with value: 0.5826332750480204 and parameters: {'min_samples_split': 1, 'max_features': 90, 'num_arvores': 50}. Best is trial 4 with value: 0.583425225957621.
[I 2021-08-31 22:25:33,318] Trial 34 finished with value: 0.518101371711874 and parameters: {'min_samples_split': 5, 'max_features': 85, 'num_arvores': 50}. Best is trial 4 with value: 0.583425225957621.
[I 2021-08-31 22:26:03,929] Trial 35 finished with value: 0.5515542611361218 and parameters: {'min_samples_split': 3, 'm

[I 2021-08-31 22:41:27,712] Trial 66 finished with value: 0.5515542611361218 and parameters: {'min_samples_split': 3, 'max_features': 90, 'num_arvores': 50}. Best is trial 4 with value: 0.583425225957621.
[I 2021-08-31 22:41:59,413] Trial 67 finished with value: 0.5826332750480204 and parameters: {'min_samples_split': 1, 'max_features': 90, 'num_arvores': 50}. Best is trial 4 with value: 0.583425225957621.
[I 2021-08-31 22:42:29,504] Trial 68 finished with value: 0.5757499961843607 and parameters: {'min_samples_split': 1, 'max_features': 85, 'num_arvores': 45}. Best is trial 4 with value: 0.583425225957621.
[I 2021-08-31 22:42:58,387] Trial 69 finished with value: 0.5551069076357159 and parameters: {'min_samples_split': 3, 'max_features': 95, 'num_arvores': 40}. Best is trial 4 with value: 0.583425225957621.
[I 2021-08-31 22:43:26,151] Trial 70 finished with value: 0.441062013640106 and parameters: {'min_samples_split': 19, '

0.5634474535854199


[I 2021-08-31 23:00:55,176] Trial 0 finished with value: 0.5324039644574317 and parameters: {'min_samples_split': 3, 'max_features': 70, 'num_arvores': 30}. Best is trial 0 with value: 0.5324039644574317.
[I 2021-08-31 23:01:28,598] Trial 1 finished with value: 0.5812845932288232 and parameters: {'min_samples_split': 1, 'max_features': 85, 'num_arvores': 30}. Best is trial 1 with value: 0.5812845932288232.
[I 2021-08-31 23:02:01,233] Trial 2 finished with value: 0.5754730448051064 and parameters: {'min_samples_split': 1, 'max_features': 95, 'num_arvores': 35}. Best is trial 1 with value: 0.5812845932288232.
[I 2021-08-31 23:02:29,346] Trial 3 finished with value: 0.4385366871188751 and parameters: {'min_samples_split': 17, 'max_features': 85, 'num_arvores': 40}. Best is trial 1 with value: 0.5812845932288232.
[I 2021-08-31 23:02:57,873] Trial 4 finished with value: 0.4413907298949162 and parameters: {'min_samples_split': 11, 

[I 2021-08-31 23:10:25,232] Trial 20 finished with value: 0.5373394245298925 and parameters: {'min_samples_split': 3, 'max_features': 85, 'num_arvores': 35}. Best is trial 1 with value: 0.5812845932288232.
[I 2021-08-31 23:10:53,176] Trial 21 finished with value: 0.5650237918440045 and parameters: {'min_samples_split': 1, 'max_features': 90, 'num_arvores': 35}. Best is trial 1 with value: 0.5812845932288232.
[I 2021-08-31 23:11:24,161] Trial 22 finished with value: 0.5791042144769887 and parameters: {'min_samples_split': 1, 'max_features': 95, 'num_arvores': 30}. Best is trial 1 with value: 0.5812845932288232.
[I 2021-08-31 23:11:51,035] Trial 23 finished with value: 0.4852580408241976 and parameters: {'min_samples_split': 7, 'max_features': 80, 'num_arvores': 30}. Best is trial 1 with value: 0.5812845932288232.
[I 2021-08-31 23:12:18,260] Trial 24 finished with value: 0.5389634776385721 and parameters: {'min_samples_split': 

[I 2021-08-31 23:25:15,368] Trial 51 finished with value: 0.5744991910338456 and parameters: {'min_samples_split': 1, 'max_features': 100, 'num_arvores': 30}. Best is trial 1 with value: 0.5812845932288232.
[I 2021-08-31 23:25:44,361] Trial 52 finished with value: 0.5791042144769887 and parameters: {'min_samples_split': 1, 'max_features': 95, 'num_arvores': 30}. Best is trial 1 with value: 0.5812845932288232.
[I 2021-08-31 23:26:13,049] Trial 53 finished with value: 0.545549454629798 and parameters: {'min_samples_split': 3, 'max_features': 90, 'num_arvores': 30}. Best is trial 1 with value: 0.5812845932288232.
[I 2021-08-31 23:26:42,911] Trial 54 finished with value: 0.5791042144769887 and parameters: {'min_samples_split': 1, 'max_features': 95, 'num_arvores': 30}. Best is trial 1 with value: 0.5812845932288232.
[I 2021-08-31 23:27:14,974] Trial 55 finished with value: 0.5390854203830963 and parameters: {'min_samples_split': 

[32m[I 2021-08-31 23:41:41,007][0m Trial 85 finished with value: 0.5389634776385721 and parameters: {'min_samples_split': 3, 'max_features': 85, 'num_arvores': 30}. Best is trial 1 with value: 0.5812845932288232.[0m
[32m[I 2021-08-31 23:42:09,550][0m Trial 86 finished with value: 0.5812845932288232 and parameters: {'min_samples_split': 1, 'max_features': 85, 'num_arvores': 30}. Best is trial 1 with value: 0.5812845932288232.[0m
[32m[I 2021-08-31 23:42:37,879][0m Trial 87 finished with value: 0.5702183661325418 and parameters: {'min_samples_split': 1, 'max_features': 80, 'num_arvores': 30}. Best is trial 1 with value: 0.5812845932288232.[0m
[32m[I 2021-08-31 23:43:03,554][0m Trial 88 finished with value: 0.4455999446341292 and parameters: {'min_samples_split': 11, 'max_features': 85, 'num_arvores': 30}. Best is trial 1 with value: 0.5812845932288232.[0m
[32m[I 2021-08-31 23:43:35,583][0m Trial 89 finished with value: 0.5812845932288232 and parameters: {'min_samples_split':

0.6035348360655737
Representação: emb_keywords_exp concluida
===== Representação: bow


[I 2021-08-31 23:50:11,469] A new study created in RDB with name: random_forest_bow_fold_0
[I 2021-09-01 03:11:18,681] Trial 0 finished with value: 0.7815944011621748 and parameters: {'min_samples_split': 5, 'max_features': 90, 'num_arvores': 50}. Best is trial 0 with value: 0.7815944011621748.
[I 2021-09-01 11:19:41,797] Trial 1 finished with value: 0.7814450931541131 and parameters: {'min_samples_split': 21, 'max_features': 90, 'num_arvores': 45}. Best is trial 0 with value: 0.7815944011621748.
[I 2021-09-01 11:33:33,764] Trial 2 finished with value: 0.7790714395450004 and parameters: {'min_samples_split': 21, 'max_features': 80, 'num_arvores': 35}. Best is trial 0 with value: 0.7815944011621748.
[I 2021-09-01 11:49:43,146] Trial 3 finished with value: 0.7881230677083457 and parameters: {'min_samples_split': 3, 'max_features': 70, 'num_arvores': 50}. Best is trial 3 with value: 0.7881230677083457.
[I 2021-09-01 12:

[I 2021-09-02 10:24:07,170] Trial 37 finished with value: 0.7816642241206478 and parameters: {'min_samples_split': 7, 'max_features': 75, 'num_arvores': 30}. Best is trial 14 with value: 0.7898298654519779.
[I 2021-09-02 10:37:24,093] Trial 38 finished with value: 0.7781873702298515 and parameters: {'min_samples_split': 11, 'max_features': 80, 'num_arvores': 40}. Best is trial 14 with value: 0.7898298654519779.
[I 2021-09-02 10:52:43,822] Trial 39 finished with value: 0.7804312949213047 and parameters: {'min_samples_split': 5, 'max_features': 90, 'num_arvores': 35}. Best is trial 14 with value: 0.7898298654519779.
[I 2021-09-02 11:04:08,093] Trial 40 finished with value: 0.7839736490160664 and parameters: {'min_samples_split': 1, 'max_features': 80, 'num_arvores': 45}. Best is trial 14 with value: 0.7898298654519779.
[I 2021-09-02 11:12:12,054] Trial 41 finished with value: 0.7882678941588869 and parameters: {'min_samples_spl

[I 2021-09-02 18:20:03,586] Trial 75 finished with value: 0.7893981382966498 and parameters: {'min_samples_split': 3, 'max_features': 70, 'num_arvores': 30}. Best is trial 60 with value: 0.7913187838147083.
[I 2021-09-02 18:30:32,651] Trial 76 finished with value: 0.7913187838147083 and parameters: {'min_samples_split': 1, 'max_features': 70, 'num_arvores': 30}. Best is trial 60 with value: 0.7913187838147083.
[I 2021-09-02 18:40:19,549] Trial 77 finished with value: 0.7893981382966498 and parameters: {'min_samples_split': 3, 'max_features': 70, 'num_arvores': 30}. Best is trial 60 with value: 0.7913187838147083.
[I 2021-09-02 18:49:21,765] Trial 78 finished with value: 0.7893981382966498 and parameters: {'min_samples_split': 3, 'max_features': 70, 'num_arvores': 30}. Best is trial 60 with value: 0.7913187838147083.
[I 2021-09-02 18:58:16,873] Trial 79 finished with value: 0.7913187838147083 and parameters: {'min_samples_spli

KeyboardInterrupt: 

Since experimentation is costly, all results are saved in the "resultados" folder, including the hyperparameter values tried with Optuna (the evaluation practice gives more details about the Optuna library). Macro F1 is a metric related to the hit rate: it is the mean of the per-class F1 scores, where each class's F1 is the harmonic mean of its precision and recall (if needed, [see the explanation in this video, topics 2 and 3](https://www.youtube.com/watch?v=u7o7CSeXaNs&list=PLwIaU1DGYV6tUx10fCTw5aPnqypbbK_GJ&index=13)). Analyze the results below: which representation performed best? Did restricting the vocabulary or removing stopwords help?
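Because macro F1 gives every class the same weight, it behaves quite differently from plain accuracy when the classes are imbalanced. The sketch below, in plain Python, computes each class's F1 as $2TP/(2TP+FP+FN)$ and macro F1 as their unweighted mean on a hypothetical toy example. It is not the `base_am` implementation; the function names and the labels are illustrative only.

```python
# Minimal sketch of per-class F1, macro F1 and accuracy in plain Python.
# The labels are a hypothetical toy example, not data from these experiments.

def f1_per_class(y_true, y_pred, classes):
    """F1 of each class, computed as 2*TP / (2*TP + FP + FN); 0.0 if undefined."""
    scores = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (every class counts the same)."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = f1_per_class(y_true, y_pred, classes)
    return sum(scores.values()) / len(scores)

def accuracy(y_true, y_pred):
    """Fraction of correct predictions."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Imbalanced toy data: 8 examples of class 0 and 2 of class 1,
# and a classifier that always predicts the majority class 0.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10

print(accuracy(y_true, y_pred))  # 0.8
print(macro_f1(y_true, y_pred))  # (16/18 + 0) / 2, about 0.444
```

A classifier that always predicts the majority class reaches 0.8 accuracy here but only about 0.44 macro F1, because its F1 for class 1 is zero.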

In [5]:
import os
import pandas as pd
from base_am.avaliacao import Experimento

arr_resultado = []
for resultado_csv in os.listdir("resultados"):
    if resultado_csv.endswith("csv"):
        nom_experimento = resultado_csv.split(".")[0]
        
        # load the results saved by previous runs
        experimento = Experimento(nom_experimento, [])
        experimento.carrega_resultados_existentes()

        # skip experiments without any saved fold (avoids a division by zero)
        num_folds = len(experimento.resultados)
        if num_folds == 0:
            continue

        # macro F1 averaged over the folds
        dict_resultados = {"nom_experimento": nom_experimento,
                           "macro-f1": sum(r.macro_f1 for r in experimento.resultados) / num_folds}
        # per-class F1, precision and recall, also averaged over the folds
        for classe in experimento.resultados[0].mat_confusao.keys():
            dict_resultados[f"f1-{classe}"] = sum(r.f1_por_classe[classe] for r in experimento.resultados) / num_folds
            dict_resultados[f"precision-{classe}"] = sum(r.precisao[classe] for r in experimento.resultados) / num_folds
            dict_resultados[f"recall-{classe}"] = sum(r.revocacao[classe] for r in experimento.resultados) / num_folds

        arr_resultado.append(dict_resultados)

resultado = pd.DataFrame.from_dict(arr_resultado)
resultado.sort_values(['macro-f1'], ascending = False, inplace = True)
resultado

[I 2021-09-03 08:45:53,168] Using an existing study with name 'random_forest_embbeding_fold_0' instead of creating a new one.
[I 2021-09-03 08:45:53,212] Using an existing study with name 'random_forest_embbeding_fold_1' instead of creating a new one.
[I 2021-09-03 08:45:53,250] Using an existing study with name 'random_forest_embbeding_fold_2' instead of creating a new one.
[I 2021-09-03 08:45:53,292] Using an existing study with name 'random_forest_embbeding_fold_3' instead of creating a new one.
[I 2021-09-03 08:45:53,332] Using an existing study with name 'random_forest_embbeding_fold_4' instead of creating a new one.
[I 2021-09-03 08:45:53,382] Using an existing study with name 'random_forest_emb_keywords_exp_fold_0' instead of creating a new one.
[I 2021-09-03 08:45:53,421] Using an existing study with name 'random_forest_emb_keywords_exp_fold_1' instead of creating a new one.
[I 2021-

| | nom_experimento | macro-f1 | f1-0 | precision-0 | recall-0 | f1-1 | precision-1 | recall-1 |
|---|---|---|---|---|---|---|---|---|
| 0 | random_forest_embbeding | 0.583069 | 0.882247 | 0.810996 | 0.967329 | 0.283891 | 0.611769 | 0.185323 |
| 2 | random_forest_emb_nostop | 0.582335 | 0.880947 | 0.810847 | 0.964414 | 0.283723 | 0.590858 | 0.186958 |
| 1 | random_forest_emb_keywords_exp | 0.581566 | 0.881958 | 0.810649 | 0.967117 | 0.281174 | 0.607288 | 0.183389 |


### Discussion

The table above is sorted in descending order of macro $F1$, a criterion that takes into account the classifier's correct predictions, false negatives and false positives to measure how well it classifies the instances of each class (the closer this value is to 1, the better the classifier).
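For reference, these are the standard definitions behind this criterion, for each class $c$ in the set of classes $C$. The symbols $TP_c$, $FP_c$ and $FN_c$ (the true positives, false positives and false negatives of class $c$) and $P_c$, $R_c$ (its precision and recall) are notation introduced here only for this illustration:

$$
P_c = \frac{TP_c}{TP_c + FP_c}, \qquad R_c = \frac{TP_c}{TP_c + FN_c}, \qquad F1_c = \frac{2\,P_c\,R_c}{P_c + R_c} = \frac{2\,TP_c}{2\,TP_c + FP_c + FN_c}
$$

$$
\text{macro-}F1 = \frac{1}{|C|} \sum_{c \in C} F1_c
$$

As a check with the first row of the table, $(0.882247 + 0.283891)/2 = 0.583069$, which is exactly its macro-f1 value; the other two rows match in the same way ($(0.880947 + 0.283723)/2 = 0.582335$ and $(0.881958 + 0.281174)/2 = 0.581566$).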

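The bag-of-words run does not appear in this table, because its experiment was interrupted (`KeyboardInterrupt`) before finishing. Still, in the Optuna trials logged for its first fold (random_forest_bow_fold_0), bag of words reached validation values of about 0.78 to 0.79, against at most about 0.58 in the trials of emb_keywords_exp. This suggests that bag of words is the representation with the highest macro $F1$, i.e., the one that best classifies the sentiment of the users' reviews. However, as mentioned earlier, this representation has limited generalization: it does not allow computing the distance between words and, consequently, building structures such as synonyms or analogies.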
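The generalization limitation can be illustrated with cosine similarity: in a bag-of-words space each word is its own dimension, so two different words never share a nonzero coordinate and their similarity is always 0, while dense embeddings can place related words close together. This is a minimal sketch in plain Python; the vocabulary and the 3-dimensional vectors are invented for illustration and do not come from the embeddings used in this practice.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Bag of words: one dimension per vocabulary word (toy vocabulary).
vocab = ["happy", "joy", "sad"]

def one_hot(word):
    """Bag-of-words vector of a single word: 1 in its own dimension, 0 elsewhere."""
    return [1.0 if w == word else 0.0 for w in vocab]

# Hypothetical 3-d embeddings, invented for this illustration
# (real pre-trained embeddings have hundreds of dimensions).
toy_embeddings = {
    "happy": [0.9, 0.8, 0.1],
    "joy": [0.8, 0.9, 0.2],
}

print(cosine(one_hot("happy"), one_hot("joy")))                # 0.0: no shared dimension
print(cosine(toy_embeddings["happy"], toy_embeddings["joy"]))  # about 0.99
```

However related two distinct words are, the bag-of-words representation alone cannot express it; the embedding space can.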

Bag of words is followed by random_forest_embbeding and then by random_forest_emb_nostop, the embedding representation that discards stopwords. I found this an interesting aspect: in theory, stopwords are words with little semantic contribution to extracting the sentiment of sentences, yet in this case keeping them gave a slightly higher macro $F1$ (0.583069 versus 0.582335). The difference is small, though: the three embedding variants in the table are within 0.002 of each other.
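One common way to turn word embeddings into a document vector is to average the vectors of its words (mean pooling, one of the simple models discussed by Shen et al., 2018). The sketch below shows how removing stopwords changes which vectors enter that average; many stopword lists (NLTK's English list, for example) include negations such as "not", which is one possible explanation for why stopwords can matter in sentiment tasks. The vectors, the stopword list and the `mean_pooling` helper are hypothetical, and this is not necessarily how the `emb_nostop` representation was built.

```python
# Hypothetical toy data: 2-d word vectors and a small stopword list, for illustration only.
toy_embeddings = {
    "the": [0.1, 0.1],
    "movie": [0.5, -0.2],
    "was": [0.0, 0.2],
    "not": [-0.6, 0.1],
    "good": [0.8, 0.4],
}
toy_stopwords = {"the", "was", "not"}

def mean_pooling(tokens, embeddings, stopwords=None):
    """Average the vectors of the tokens found in `embeddings`,
    skipping the ones in `stopwords` when a set is given.
    Returns None if no token is left."""
    vectors = [embeddings[t] for t in tokens
               if t in embeddings and (stopwords is None or t not in stopwords)]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

tokens = "the movie was not good".split()
with_stop = mean_pooling(tokens, toy_embeddings)                    # averages all 5 words
without_stop = mean_pooling(tokens, toy_embeddings, toy_stopwords)  # only "movie" and "good"
print(with_stop)
print(without_stop)
```

With the stopwords removed, the document vector keeps no trace of "not", so "the movie was not good" ends up represented exactly like "movie good".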

## Bibliography

Bolukbasi, T., Chang, K. W., Zou, J., Saligrama, V., & Kalai, A. (2016). **[Man is to computer programmer as woman is to homemaker? Debiasing word embeddings](https://arxiv.org/abs/1607.06520)**. 

Hartmann, N., Fonseca, E., Shulby, C., Treviso, M., Rodrigues, J., & Aluisio, S. (2017). **[Portuguese word embeddings: Evaluating on word analogies and natural language tasks](https://arxiv.org/abs/1708.06025)**.


Pennington, J., Socher, R., & Manning, C. D. (2014). **[GloVe: Global Vectors for Word Representation](https://nlp.stanford.edu/pubs/glove.pdf)**. In EMNLP 2014.


Scherer, K. R. (2005). **[What are emotions? And how can they be measured?](https://journals.sagepub.com/doi/pdf/10.1177/0539018405058216)**. Social Science Information, 44(4), 695-729.

Shen, D., Wang, G., Wang, W., Min, M. R., Su, Q., Zhang, Y., ... Carin, L. (2018). **[Baseline needs more love: On simple word-embedding-based models and associated pooling mechanisms](https://arxiv.org/pdf/1805.09843.pdf)**.




<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Licença Creative Commons" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.