# *Se* multiclass classification




01/11/2020





---



This notebook is part of a set of research experiments on Spanish *se* constructions. It contains the code for training and testing *se* multiclass classification models. The data is a gold standard corpus of 2,140 sentences (1,713 training sentences; 427 testing sentences) containing the word *se*. The multiclass classification task consists of assigning a single defining tag to every instance of *se* that appears in the corpus, that is, predicting the specific and exclusive properties of *se* in each sentence. The tag set used to annotate the corpus is composed of four tags: *expl*, *se-mark*, *iobj* and *obj*. The notebook is structured as follows:
    
1.   Preliminaries    
    1.1. Data loading    
    1.2. Data preparation    
    1.3. Additional packages    
    1.4. General imports

2.   Modelling and evaluation    
    2.1.   Baseline models    
    2.2.   Bag of words models     
    2.3.   HashingVectorizer models    
    2.4.   TF-IDF models    
    2.5.   BETO models    




---



## 1. Preliminaries

### 1.1. Data loading

Load the train and test data (the cell below reads the tab-separated files from the working directory). The train and test partitions are set apart beforehand so that the same test dataset can be reused if the train dataset grows. In addition, create a `preds` folder to save the predictions the models generate. Since the tag distribution in the gold standard corpus is very unbalanced, four different strategies are tested to raise the number of correct cases of the less frequent categories: 
1.   Benchmark: using the train and test datasets as they are.    
2.   Using the `scoring='f1_macro'` parameter in the randomized search (`RandomizedSearchCV`).    
3.   Using the `class_weight='balanced'` parameter in the classifier (`LinearSVC` or `RandomForestClassifier`).     
4.   Using an oversampled version of the train dataset (`se_classification_balanced_train`).    
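
The `class_weight='balanced'` strategy reweights each class inversely to its frequency. The following is a minimal sketch, assuming scikit-learn's documented heuristic `n_samples / (n_classes * count)`. The helper name `balanced_class_weights` is hypothetical, and the counts are the per-class support of the test set reported in the classification reports of this notebook, used here only as illustrative numbers because the training counts are not shown.

```python
def balanced_class_weights(counts):
    """Return n_samples / (n_classes * count) for every class label."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Illustrative counts only: test-set support for tags 0 to 3.
support = {0: 173, 1: 31, 2: 14, 3: 210}
weights = balanced_class_weights(support)

for label, weight in weights.items():
    print(label, round(weight, 2))
```

With counts like these, the rarest tag receives a weight roughly fifteen times larger than the most frequent one, which is the mechanism by which errors on rare tags are penalised more heavily during training.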



In [None]:
import pandas as pd
train = pd.read_csv('./se_classification_train.csv', delimiter='\t', index_col='id')
train_oversampling = pd.read_csv('./se_classification_balanced_train.csv', delimiter='\t', index_col='id')
test = pd.read_csv('./se_classification_test.csv', delimiter='\t', index_col='id')
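
The cells further down save predictions with `np.save('preds/...')`, which fails if the `preds` folder does not exist. A small sketch of one way to create it, assuming the folder should live next to the notebook (the helper name `ensure_dir` is hypothetical):

```python
from pathlib import Path


def ensure_dir(path):
    """Create the directory (and any missing parents) and return it as a Path."""
    directory = Path(path)
    directory.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return directory


preds_dir = ensure_dir("preds")
print(preds_dir.is_dir())
```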

Check the content saved in the variables `train`, `train_oversampling` and `test`.

In [None]:
train.head()

id,text,se_tag
1,"Finalmente, el aragonés se hizo con su tercera...",expl
2,Les recomienda que visiten la web (www.farmace...,expl
3,"Ahí se desfondó el Deportivo, igual que Guarda...",expl
4,Dio un golpe de timón para adjudicarse la prim...,iobj
5,"Sólo se dirigía a mí para pedirme cosas, que s...",expl


In [None]:
train_oversampling.head()

id,text,se_tag
1,"Finalmente, el aragonés se hizo con su tercera...",expl
2,Les recomienda que visiten la web (www.farmace...,expl
3,"Ahí se desfondó el Deportivo, igual que Guarda...",expl
4,Dio un golpe de timón para adjudicarse la prim...,iobj
5,"Sólo se dirigía a mí para pedirme cosas, que s...",expl
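
The notebook does not show how `se_classification_balanced_train` was produced. One common way to build such a file is random oversampling, where rows of the less frequent tags are duplicated until every tag is as frequent as the majority one. The sketch below illustrates that idea on toy data. It is an assumption about the general technique, not the procedure actually used for this corpus, and the function name is hypothetical.

```python
import random
from collections import Counter, defaultdict


def random_oversample(texts, tags, seed=0):
    """Duplicate randomly chosen minority rows until all tags match the majority count."""
    rng = random.Random(seed)
    rows_by_tag = defaultdict(list)
    for text, tag in zip(texts, tags):
        rows_by_tag[tag].append(text)

    target = max(len(rows) for rows in rows_by_tag.values())
    out_texts, out_tags = [], []
    for tag, rows in rows_by_tag.items():
        # Keep every original row and add random duplicates up to the target size.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for text in rows + extra:
            out_texts.append(text)
            out_tags.append(tag)
    return out_texts, out_tags


toy_texts = ["a", "b", "c", "d", "e", "f"]
toy_tags = ["expl", "expl", "expl", "se-mark", "se-mark", "obj"]
balanced_texts, balanced_tags = random_oversample(toy_texts, toy_tags)
print(Counter(balanced_tags))
```

Oversampling must only ever be applied to the training partition, which is consistent with the notebook keeping a single, untouched test set.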


In [None]:
test.head()

id,text,se_tag
1713,Las calabazas son el elemento decorativo con e...,se-mark
1714,"Según el alcalde de la localidad, José Dorado ...",expl
1715,En diez días empezará en Barcelona el juego me...,expl
1716,Sin inmutarse consideró importante ver los ext...,expl
1717,"Llega con desventaja a la segunda fase, donde ...",obj


### 1.2. Data preparation

Divide the train and test datasets into:    
*   X takes the variable *text* 
*   Y takes the variable *se_tag*

In addition, use `LabelEncoder` to assign a number (0 to 3) to each of the four values the *se_tag* field can take.

   

In [None]:
from sklearn.preprocessing import LabelEncoder

X_train = train['text'].values
X_train_oversampling = train_oversampling['text'].values
X_test = test['text'].values

# Fit the encoder once on the training tags and reuse the same mapping everywhere.
label_encoder = LabelEncoder()
Y_train = label_encoder.fit_transform(train['se_tag'].values)
Y_train_oversampling = label_encoder.transform(train_oversampling['se_tag'].values)
Y_test = label_encoder.transform(test['se_tag'].values)
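
`LabelEncoder` numbers the classes following the sorted order of the label strings, which is why *se-mark* ends up as tag 3 in the rest of the notebook. The pure-Python sketch below mimics that mapping for illustration only. The helper name `build_label_mapping` is hypothetical and this is not the scikit-learn implementation.

```python
def build_label_mapping(tags):
    """Map each distinct tag to an integer, following sorted order."""
    return {tag: index for index, tag in enumerate(sorted(set(tags)))}


mapping = build_label_mapping(["expl", "se-mark", "iobj", "obj", "expl"])
print(mapping)
# {'expl': 0, 'iobj': 1, 'obj': 2, 'se-mark': 3}
```

On the fitted encoder itself, `label_encoder.classes_` lists the tags in the same order, and `label_encoder.inverse_transform` turns predicted integers back into tag names.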

Check whether the number of tags is right.



In [None]:
set(Y_train)

{0, 1, 2, 3}

In [None]:
set(Y_train_oversampling)

{0, 1, 2, 3}

In [None]:
set(Y_test)

{0, 1, 2, 3}

### 1.3. Additional packages

The following packages are required to run and optimize the models in this notebook:

In [None]:
pip install transformers==3.5.1 scikit-optimize==0.8.1 spacy==3.0.* fasttext==0.9.2

Collecting transformers==3.5.1
  Downloading https://files.pythonhosted.org/packages/3a/83/e74092e7f24a08d751aa59b37a9fc572b2e4af3918cb66f7766c3affb1b4/transformers-3.5.1-py3-none-any.whl (1.3MB)
Collecting scikit-optimize==0.8.1
  Downloading https://files.pythonhosted.org/packages/8b/03/be33e89f55866065a02e515c5b319304a801a9f1027a9b311a9b1d1f8dc7/scikit_optimize-0.8.1-py2.py3-none-any.whl (101kB)
Collecting spacy==3.0.*
  Downloading https://files.pythonhosted.org/packages/c5/5d/20f8252a9dfe7057721136d83cecb1ca1e0936b21fd7a0a4889d1d6650a8/spacy-3.0.1-cp36-cp36m-manylinux2014_x86_64.whl (12.8MB)
Collecting fasttext==0.9.2
  Downloading https://files.pythonhosted.org/packages/f8/85/e2b368ab6d3528827b147fdb814f8189acc981a4bc2f99ab894650e05c40/fasttext-0.9.2.tar.gz (68kB)


### 1.4. General imports

In [None]:
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, HashingVectorizer, TfidfVectorizer
from sklearn.metrics import f1_score, classification_report
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

## 2. Modelling and evaluation

In this section, eight different models are trained with different configurations to maximize the number of correct answers. After training, the models are evaluated using the *macro avg F-score*. Some of the parameters that govern the training procedure are defined here:

In [None]:
TUNING_ITERATIONS = 30
VECTORIZER_BINARY = [True, False]
VECTORIZER_N_GRAM = [(1,1), (1,2), (1,3), (1,4), (1,5), (2,2), (3,3), (3,5), (5,5), (5,7), (7,7), (7,9), (10,10)]
RF_ESTIMATORS = [10, 100, 1000]
RF_MAX_DEPTH = [3, 5, 10, 15, 20, 25, 30, None]
SVC_C = [1e-4, 1e-3, 1e-2, 1e-1, 1e0, 1e1, 1e2, 1e3, 1e4]
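
To put these settings in perspective, the sketch below counts how many parameter combinations each search space contains and how many model fits a randomized search with `TUNING_ITERATIONS` candidates performs. It assumes 5 cross-validation folds, which is the `StratifiedKFold()` default and matches the "Fitting 5 folds" lines in the logs of this notebook. The variable names `rf_space`, `svc_space`, `N_FOLDS` and `total_fits` are introduced here for illustration.

```python
from itertools import product

TUNING_ITERATIONS = 30
VECTORIZER_BINARY = [True, False]
VECTORIZER_N_GRAM = [(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (2, 2), (3, 3),
                     (3, 5), (5, 5), (5, 7), (7, 7), (7, 9), (10, 10)]
RF_ESTIMATORS = [10, 100, 1000]
RF_MAX_DEPTH = [3, 5, 10, 15, 20, 25, 30, None]
SVC_C = [1e-4, 1e-3, 1e-2, 1e-1, 1e0, 1e1, 1e2, 1e3, 1e4]
N_FOLDS = 5  # assumption: default number of splits of StratifiedKFold()

# Full grids that an exhaustive search would have to visit.
rf_space = list(product(VECTORIZER_BINARY, VECTORIZER_N_GRAM, RF_ESTIMATORS, RF_MAX_DEPTH))
svc_space = list(product(VECTORIZER_BINARY, VECTORIZER_N_GRAM, SVC_C))

# The randomized search only samples TUNING_ITERATIONS candidates from each grid.
total_fits = TUNING_ITERATIONS * N_FOLDS

print(len(rf_space), len(svc_space), total_fits)
# 624 234 150
```

Sampling 30 candidates therefore covers about 5% of the random forest grid and about 13% of the linear SVC grid, which keeps the search time manageable at the cost of possibly missing the best configuration.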

### 2.1. Baseline models

#### 2.1.1. Baseline model

This model assigns the most frequent tag (*se-mark*, tag 3) to the whole test set and checks the number of correct answers.

In [None]:
import numpy as np
from sklearn.metrics import f1_score


baseline_preds = np.full(Y_test.shape, 3)
np.save('preds/baseline_preds', baseline_preds)
baseline_preds

array([3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,

In [None]:
from sklearn.metrics import classification_report
print(classification_report(Y_test, baseline_preds))

              precision    recall  f1-score   support

           0       0.00      0.00      0.00       173
           1       0.00      0.00      0.00        31
           2       0.00      0.00      0.00        14
           3       0.49      1.00      0.66       210

    accuracy                           0.49       428
   macro avg       0.12      0.25      0.16       428
weighted avg       0.24      0.49      0.32       428



  _warn_prf(average, modifier, msg_start, len(result))
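
The macro and weighted averages in the report above can be reproduced by hand from the per-class support, which shows why the macro average punishes a model that ignores the rare tags. The sketch below is a worked example for the all-*se-mark* baseline only. The helper name `f1` and the variable names are introduced for illustration, and undefined precision values are treated as 0, as scikit-learn does when it emits the warning shown above.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall, defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


support = {0: 173, 1: 31, 2: 14, 3: 210}  # test-set support per tag
predicted_tag = 3                          # the baseline always predicts tag 3
total = sum(support.values())

per_class_f1 = {}
for tag, count in support.items():
    if tag == predicted_tag:
        precision = count / total  # all predictions are this tag
        recall = 1.0               # so every true instance of it is found
    else:
        precision = 0.0            # never predicted
        recall = 0.0
    per_class_f1[tag] = f1(precision, recall)

macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)
weighted_f1 = sum(per_class_f1[tag] * support[tag] for tag in support) / total

print(round(macro_f1, 2), round(weighted_f1, 2))
# 0.16 0.32
```

Because the macro average gives each of the four tags the same weight, three tags with an F-score of 0 pull the result down to a quarter of the *se-mark* score, regardless of how few test sentences those tags cover.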


### 2.2. Bag of Words models

#### 2.2.1. Non-linear CountVectorizer
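
All the vectorizers in this section use `analyzer='char_wb'`, which builds character n-grams only from text inside word boundaries, padding each word with a space on both sides. The sketch below is a simplified pure-Python illustration of that idea for a given `ngram_range`. The function name `char_wb_ngrams` is hypothetical and the code is not the scikit-learn implementation.

```python
def char_wb_ngrams(text, ngram_range=(3, 5)):
    """Character n-grams taken inside word boundaries, each word padded with one space."""
    min_n, max_n = ngram_range
    ngrams = []
    for word in text.lower().split():
        padded = f" {word} "
        for n in range(min_n, max_n + 1):
            if n >= len(padded):
                # The padded word is not longer than n: keep it once and stop.
                ngrams.append(padded)
                break
            ngrams.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return ngrams


print(char_wb_ngrams("Se hizo", ngram_range=(3, 3)))
# [' se', 'se ', ' hi', 'hiz', 'izo', 'zo ']
```

With long ranges such as `(5, 7)`, a short word like *se* contributes a single padded token, so most of the signal comes from the neighbouring verb forms.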

##### 2.2.1.1. Non-linear CountVectorizer (benchmark)

In [None]:
cvrfgs_model = Pipeline([
    ('vectorizer', CountVectorizer(analyzer='char_wb')),
    ('classifier', RandomForestClassifier(random_state=42))
    ]
)

params = {
    'vectorizer__binary' : VECTORIZER_BINARY,
    'vectorizer__ngram_range': VECTORIZER_N_GRAM,
    'classifier__n_estimators' : RF_ESTIMATORS,
    'classifier__max_depth' : RF_MAX_DEPTH,
}

cvrfgs_model = RandomizedSearchCV(cvrfgs_model, params, n_iter=TUNING_ITERATIONS, n_jobs = -1, cv=StratifiedKFold(), random_state=12345, verbose=2)

cvrfgs_model.fit(X_train, Y_train)

print(cvrfgs_model.best_params_)

{'vectorizer__ngram_range': (3, 5), 'vectorizer__binary': True, 'classifier__n_estimators': 1000, 'classifier__max_depth': 30}


In [None]:
cvrfgs_preds = cvrfgs_model.predict(X_test)
np.save('preds/cvrfgs_preds', cvrfgs_preds)
print(classification_report(Y_test, cvrfgs_preds))

              precision    recall  f1-score   support

           0       0.55      0.73      0.63       173
           1       0.00      0.00      0.00        31
           2       0.00      0.00      0.00        14
           3       0.68      0.65      0.67       210

    accuracy                           0.61       428
   macro avg       0.31      0.34      0.32       428
weighted avg       0.56      0.61      0.58       428





##### 2.2.1.2. Non-linear CountVectorizer (`f1_macro`)

In [None]:
cvrfgs_model = Pipeline([
    ('vectorizer', CountVectorizer(analyzer='char_wb')),
    ('classifier', RandomForestClassifier(random_state=42))
    ]
)

params = {
    'vectorizer__binary' : VECTORIZER_BINARY,
    'vectorizer__ngram_range': VECTORIZER_N_GRAM,
    'classifier__n_estimators' : RF_ESTIMATORS,
    'classifier__max_depth' : RF_MAX_DEPTH,
}

cvrfgs_f1_macro_model = RandomizedSearchCV(cvrfgs_model, params, scoring='f1_macro', n_jobs = -1, cv=StratifiedKFold(), n_iter=TUNING_ITERATIONS, random_state=12345, verbose=2)

cvrfgs_f1_macro_model.fit(X_train, Y_train)

print(cvrfgs_f1_macro_model.best_params_)

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.


Fitting 5 folds for each of 30 candidates, totalling 150 fits


[Parallel(n_jobs=-1)]: Done  37 tasks      | elapsed:   37.5s
[Parallel(n_jobs=-1)]: Done 150 out of 150 | elapsed:  4.7min finished


{'vectorizer__ngram_range': (3, 5), 'vectorizer__binary': True, 'classifier__n_estimators': 1000, 'classifier__max_depth': 30}


In [None]:
cvrfgs_f1_macro_preds = cvrfgs_f1_macro_model.predict(X_test)
np.save('preds/cvrfgs_f1_macro_preds', cvrfgs_f1_macro_preds)
print(classification_report(Y_test, cvrfgs_f1_macro_preds))

              precision    recall  f1-score   support

           0       0.55      0.73      0.63       173
           1       0.00      0.00      0.00        31
           2       0.00      0.00      0.00        14
           3       0.68      0.65      0.67       210

    accuracy                           0.61       428
   macro avg       0.31      0.34      0.32       428
weighted avg       0.56      0.61      0.58       428





##### 2.2.1.3. Non-linear CountVectorizer (`class_weight='balanced'`)

In [None]:
cvrfgs_model = Pipeline([
    ('vectorizer', CountVectorizer(analyzer='char_wb')),
    ('classifier', RandomForestClassifier(random_state=42, class_weight='balanced'))
    ]
)

params = {
    'vectorizer__binary' : VECTORIZER_BINARY,
    'vectorizer__ngram_range': VECTORIZER_N_GRAM,
    'classifier__n_estimators' : RF_ESTIMATORS,
    'classifier__max_depth' : RF_MAX_DEPTH,
}

cvrfgs_class_weight_model = RandomizedSearchCV(cvrfgs_model, params, n_jobs = -1, cv=StratifiedKFold(), n_iter=TUNING_ITERATIONS, random_state=12345, verbose=2)

cvrfgs_class_weight_model.fit(X_train, Y_train)

print(cvrfgs_class_weight_model.best_params_)

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.


Fitting 5 folds for each of 30 candidates, totalling 150 fits


[Parallel(n_jobs=-1)]: Done  37 tasks      | elapsed:   36.5s
[Parallel(n_jobs=-1)]: Done 150 out of 150 | elapsed:  4.9min finished


{'vectorizer__ngram_range': (5, 7), 'vectorizer__binary': True, 'classifier__n_estimators': 1000, 'classifier__max_depth': None}


In [None]:
cvrfgs_class_weight_model_preds = cvrfgs_class_weight_model.predict(X_test)
np.save('preds/cvrfgs_class_weight_model_preds', cvrfgs_class_weight_model_preds)
print(classification_report(Y_test, cvrfgs_class_weight_model_preds))

              precision    recall  f1-score   support

           0       0.58      0.65      0.61       173
           1       1.00      0.03      0.06        31
           2       0.00      0.00      0.00        14
           3       0.66      0.74      0.70       210

    accuracy                           0.63       428
   macro avg       0.56      0.35      0.34       428
weighted avg       0.63      0.63      0.59       428





##### 2.2.1.4. Non-linear CountVectorizer (oversampling)

In [None]:
cvrfgs_model = Pipeline([
    ('vectorizer', CountVectorizer(analyzer='char_wb')),
    ('classifier', RandomForestClassifier(random_state=42))
    ]
)

params = {
    'vectorizer__binary' : VECTORIZER_BINARY,
    'vectorizer__ngram_range': VECTORIZER_N_GRAM,
    'classifier__n_estimators' : RF_ESTIMATORS,
    'classifier__max_depth' : RF_MAX_DEPTH,
}

cvrfgs_oversampling_model = RandomizedSearchCV(cvrfgs_model, params, n_jobs = -1, cv=StratifiedKFold(), n_iter=TUNING_ITERATIONS, random_state=12345, verbose=2)

cvrfgs_oversampling_model.fit(X_train_oversampling, Y_train_oversampling)

print(cvrfgs_oversampling_model.best_params_)

Fitting 5 folds for each of 30 candidates, totalling 150 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=-1)]: Done  37 tasks      | elapsed:   54.7s
[Parallel(n_jobs=-1)]: Done 150 out of 150 | elapsed:  6.7min finished


{'vectorizer__ngram_range': (5, 7), 'vectorizer__binary': True, 'classifier__n_estimators': 1000, 'classifier__max_depth': None}


In [None]:
cvrfgs_oversampling_preds = cvrfgs_oversampling_model.predict(X_test)
np.save('preds/cvrfgs_oversampling_preds', cvrfgs_oversampling_preds)
print(classification_report(Y_test, cvrfgs_oversampling_preds))

              precision    recall  f1-score   support

           0       0.57      0.61      0.59       173
           1       0.50      0.03      0.06        31
           2       0.00      0.00      0.00        14
           3       0.65      0.75      0.70       210

    accuracy                           0.62       428
   macro avg       0.43      0.35      0.34       428
weighted avg       0.59      0.62      0.59       428





#### 2.2.2. Linear CountVectorizer

##### 2.2.2.1. Linear CountVectorizer (benchmark)

In [None]:
cvovrgs_model = Pipeline([
    ('vectorizer', CountVectorizer(analyzer='char_wb')),
    ('classifier', OneVsRestClassifier(CalibratedClassifierCV(LinearSVC()), n_jobs=-1))
    ]
)

params = {
    'vectorizer__binary' : VECTORIZER_BINARY,
    'vectorizer__ngram_range': VECTORIZER_N_GRAM,
    'classifier__estimator__base_estimator__C': SVC_C,
}

cvovrgs_model = RandomizedSearchCV(cvovrgs_model, params, n_jobs = -1, cv=StratifiedKFold(), n_iter=TUNING_ITERATIONS, random_state=12345, verbose=2)

cvovrgs_model.fit(X_train, Y_train)

print(cvovrgs_model.best_params_)

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.


Fitting 5 folds for each of 30 candidates, totalling 150 fits


[Parallel(n_jobs=-1)]: Done  37 tasks      | elapsed:  1.1min
[Parallel(n_jobs=-1)]: Done 150 out of 150 | elapsed:  6.5min finished


{'vectorizer__ngram_range': (5, 7), 'vectorizer__binary': True, 'classifier__estimator__base_estimator__C': 0.0001}


In [None]:
cvovrgs_preds = cvovrgs_model.predict(X_test)
np.save('preds/cvovrgs_preds', cvovrgs_preds)
print(classification_report(Y_test, cvovrgs_preds))

              precision    recall  f1-score   support

           0       0.58      0.65      0.61       173
           1       0.00      0.00      0.00        31
           2       0.00      0.00      0.00        14
           3       0.65      0.71      0.68       210

    accuracy                           0.61       428
   macro avg       0.31      0.34      0.32       428
weighted avg       0.55      0.61      0.58       428





##### 2.2.2.2. Linear CountVectorizer (`f1_macro`)

In [None]:
cvovrgs_model = Pipeline([
    ('vectorizer', CountVectorizer(analyzer='char_wb')),
    ('classifier', OneVsRestClassifier(CalibratedClassifierCV(LinearSVC()), n_jobs=-1))
    ]
)

params = {
    'vectorizer__binary' : VECTORIZER_BINARY,
    'vectorizer__ngram_range': VECTORIZER_N_GRAM,
    'classifier__estimator__base_estimator__C': SVC_C,
}

cvovrgs_f1_macro_model = RandomizedSearchCV(cvovrgs_model, params, scoring='f1_macro', n_jobs = -1, cv=StratifiedKFold(), n_iter=TUNING_ITERATIONS, random_state=12345, verbose=2)

cvovrgs_f1_macro_model.fit(X_train, Y_train)

print(cvovrgs_f1_macro_model.best_params_)

Fitting 5 folds for each of 30 candidates, totalling 150 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=-1)]: Done  37 tasks      | elapsed:  1.1min
[Parallel(n_jobs=-1)]: Done 150 out of 150 | elapsed:  6.5min finished


{'vectorizer__ngram_range': (5, 7), 'vectorizer__binary': True, 'classifier__estimator__base_estimator__C': 0.0001}


In [None]:
cvovrgs_f1_macro_preds = cvovrgs_f1_macro_model.predict(X_test)
np.save('preds/cvovrgs_f1_macro_preds', cvovrgs_f1_macro_preds)
print(classification_report(Y_test, cvovrgs_f1_macro_preds))

              precision    recall  f1-score   support

           0       0.58      0.65      0.61       173
           1       0.00      0.00      0.00        31
           2       0.00      0.00      0.00        14
           3       0.65      0.71      0.68       210

    accuracy                           0.61       428
   macro avg       0.31      0.34      0.32       428
weighted avg       0.55      0.61      0.58       428





##### 2.2.2.3. Linear CountVectorizer (`class_weight='balanced'`)

In [None]:
cvovrgs_model = Pipeline([
    ('vectorizer', CountVectorizer(analyzer='char_wb')),
    ('classifier', OneVsRestClassifier(CalibratedClassifierCV(LinearSVC(class_weight='balanced')), n_jobs=-1))
    ]
)

params = {
    'vectorizer__binary' : VECTORIZER_BINARY,
    'vectorizer__ngram_range': VECTORIZER_N_GRAM,
    'classifier__estimator__base_estimator__C': SVC_C,
}

cvovrgs_class_weight_model = RandomizedSearchCV(cvovrgs_model, params, n_jobs = -1, cv=StratifiedKFold(), n_iter=TUNING_ITERATIONS, random_state=12345, verbose=2)

cvovrgs_class_weight_model.fit(X_train, Y_train)

print(cvovrgs_class_weight_model.best_params_)

Fitting 5 folds for each of 30 candidates, totalling 150 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=-1)]: Done  37 tasks      | elapsed:  1.1min
[Parallel(n_jobs=-1)]: Done 150 out of 150 | elapsed:  6.8min finished


{'vectorizer__ngram_range': (5, 7), 'vectorizer__binary': True, 'classifier__estimator__base_estimator__C': 0.0001}


In [None]:
cvovrgs_class_weight_preds = cvovrgs_class_weight_model.predict(X_test)
np.save('preds/cvovrgs_class_weight_preds', cvovrgs_class_weight_preds)
print(classification_report(Y_test, cvovrgs_class_weight_preds))

              precision    recall  f1-score   support

           0       0.57      0.64      0.60       173
           1       0.00      0.00      0.00        31
           2       0.00      0.00      0.00        14
           3       0.64      0.71      0.67       210

    accuracy                           0.61       428
   macro avg       0.30      0.34      0.32       428
weighted avg       0.54      0.61      0.57       428





##### 2.2.2.4. Linear CountVectorizer (`f1_macro`, `class_weight='balanced'`)

In [None]:
cvovrgs_model = Pipeline([
    ('vectorizer', CountVectorizer(analyzer='char_wb')),
    ('classifier', OneVsRestClassifier(CalibratedClassifierCV(LinearSVC(class_weight='balanced')), n_jobs=-1))
    ]
)

params = {
    'vectorizer__binary' : VECTORIZER_BINARY,
    'vectorizer__ngram_range': VECTORIZER_N_GRAM,
    'classifier__estimator__base_estimator__C': SVC_C,
}

cvovrgs_class_f1_model = RandomizedSearchCV(cvovrgs_model, params, scoring='f1_macro', n_jobs = -1, cv=StratifiedKFold(), n_iter=TUNING_ITERATIONS, random_state=12345, verbose=2)

cvovrgs_class_f1_model.fit(X_train, Y_train)

print(cvovrgs_class_f1_model.best_params_)

Fitting 5 folds for each of 30 candidates, totalling 150 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=-1)]: Done  37 tasks      | elapsed:  1.1min
[Parallel(n_jobs=-1)]: Done 150 out of 150 | elapsed:  6.9min finished


{'vectorizer__ngram_range': (5, 7), 'vectorizer__binary': True, 'classifier__estimator__base_estimator__C': 0.0001}


In [None]:
cvovrgs_class_f1_preds = cvovrgs_class_f1_model.predict(X_test)
np.save('preds/cvovrgs_class_f1_preds', cvovrgs_class_f1_preds)
print(classification_report(Y_test, cvovrgs_class_f1_preds))

              precision    recall  f1-score   support

           0       0.57      0.64      0.60       173
           1       0.00      0.00      0.00        31
           2       0.00      0.00      0.00        14
           3       0.64      0.71      0.67       210

    accuracy                           0.61       428
   macro avg       0.30      0.34      0.32       428
weighted avg       0.54      0.61      0.57       428





##### 2.2.2.5. Linear CountVectorizer (oversampling)

In [None]:
# Use the linear-model prefix (cvovrgs) so that the random forest
# oversampling model and its saved predictions are not overwritten.
cvovrgs_model = Pipeline([
    ('vectorizer', CountVectorizer(analyzer='char_wb')),
    ('classifier', OneVsRestClassifier(CalibratedClassifierCV(LinearSVC()), n_jobs=-1))
    ]
)

params = {
    'vectorizer__binary' : VECTORIZER_BINARY,
    'vectorizer__ngram_range': VECTORIZER_N_GRAM,
    'classifier__estimator__base_estimator__C': SVC_C,
}

cvovrgs_oversampling_model = RandomizedSearchCV(cvovrgs_model, params, n_jobs = -1, cv=StratifiedKFold(), n_iter=TUNING_ITERATIONS, random_state=12345, verbose=2)

cvovrgs_oversampling_model.fit(X_train_oversampling, Y_train_oversampling)

print(cvovrgs_oversampling_model.best_params_)

Fitting 5 folds for each of 30 candidates, totalling 150 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=-1)]: Done  37 tasks      | elapsed:  3.9min
[Parallel(n_jobs=-1)]: Done 150 out of 150 | elapsed: 23.0min finished


{'vectorizer__ngram_range': (5, 5), 'vectorizer__binary': False, 'classifier__estimator__base_estimator__C': 0.1}


In [None]:
cvovrgs_oversampling_preds = cvovrgs_oversampling_model.predict(X_test)
np.save('preds/cvovrgs_oversampling_preds', cvovrgs_oversampling_preds)
print(classification_report(Y_test, cvovrgs_oversampling_preds))

              precision    recall  f1-score   support

           0       0.55      0.68      0.61       173
           1       0.00      0.00      0.00        31
           2       0.00      0.00      0.00        14
           3       0.66      0.66      0.66       210

    accuracy                           0.60       428
   macro avg       0.30      0.33      0.32       428
weighted avg       0.54      0.60      0.57       428





### 2.3. HashingVectorizer models

#### 2.3.1. Non-linear HashingVectorizer
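
Unlike `CountVectorizer`, `HashingVectorizer` does not store a vocabulary: each n-gram is hashed straight to a column index, so different n-grams may collide in the same column. The sketch below illustrates that hashing trick with the standard library. It uses MD5 and a tiny number of columns purely for readability, whereas scikit-learn uses a signed MurmurHash3 and `2 ** 20` columns by default. The function name `hashed_counts` is hypothetical.

```python
import hashlib


def hashed_counts(tokens, n_features=16):
    """Count tokens in a fixed-size vector, using a hash to pick each column."""
    vector = [0] * n_features
    for token in tokens:
        digest = hashlib.md5(token.encode("utf-8")).digest()
        column = int.from_bytes(digest[:4], "little") % n_features
        vector[column] += 1  # collisions simply add up in the same column
    return vector


tokens = [" se", "se ", " hi", "hiz", "izo", "zo "]
vector = hashed_counts(tokens)
print(len(vector), sum(vector))
```

The fixed output width is what makes the approach memory-friendly, but it also means the learned features cannot be mapped back to the n-grams that produced them.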




##### 2.3.1.1. Non-linear HashingVectorizer (benchmark)

In [None]:
hvrf_model = Pipeline([
    ('vectorizer', HashingVectorizer(analyzer='char_wb')),
    ('classifier', RandomForestClassifier(random_state=42, n_jobs = -1))
    ]
)

params = {
    'vectorizer__binary' : VECTORIZER_BINARY,
    'vectorizer__ngram_range': VECTORIZER_N_GRAM,
    'classifier__n_estimators' : RF_ESTIMATORS,
    'classifier__max_depth' : RF_MAX_DEPTH,
}

hvrfgs_model = RandomizedSearchCV(hvrf_model, params, n_jobs = -1, cv=StratifiedKFold(), n_iter=TUNING_ITERATIONS, random_state=12345, verbose=2)

hvrfgs_model.fit(X_train, Y_train)
print(hvrfgs_model.best_params_)

Fitting 5 folds for each of 30 candidates, totalling 150 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=-1)]: Done  37 tasks      | elapsed:  3.1min
[Parallel(n_jobs=-1)]: Done 150 out of 150 | elapsed: 42.8min finished


{'vectorizer__ngram_range': (5, 7), 'vectorizer__binary': True, 'classifier__n_estimators': 1000, 'classifier__max_depth': None}


In [None]:
hvrfgs_preds = hvrfgs_model.predict(X_test)
np.save('preds/hvrfgs_preds', hvrfgs_preds)
print(classification_report(Y_test, hvrfgs_preds))

              precision    recall  f1-score   support

           0       0.55      0.66      0.60       173
           1       0.00      0.00      0.00        31
           2       0.00      0.00      0.00        14
           3       0.66      0.69      0.67       210

    accuracy                           0.61       428
   macro avg       0.30      0.34      0.32       428
weighted avg       0.55      0.61      0.57       428





##### 2.3.1.2. Non-linear HashingVectorizer (`f1_macro`)

In [None]:
hvrf_model = Pipeline([
    ('vectorizer', HashingVectorizer(analyzer='char_wb')),
    ('classifier', RandomForestClassifier(random_state=42, n_jobs = -1))
    ]
)

params = {
    'vectorizer__binary' : VECTORIZER_BINARY,
    'vectorizer__ngram_range': VECTORIZER_N_GRAM,
    'classifier__n_estimators' : RF_ESTIMATORS,
    'classifier__max_depth' : RF_MAX_DEPTH,
}

hvrfgs_f1_macro_model = RandomizedSearchCV(hvrf_model, params, n_jobs = -1, scoring='f1_macro', cv=StratifiedKFold(), n_iter=TUNING_ITERATIONS, random_state=12345, verbose=2)

hvrfgs_f1_macro_model.fit(X_train, Y_train)
print(hvrfgs_f1_macro_model.best_params_)

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.


Fitting 5 folds for each of 30 candidates, totalling 150 fits


[Parallel(n_jobs=-1)]: Done  37 tasks      | elapsed:  3.2min
[Parallel(n_jobs=-1)]: Done 150 out of 150 | elapsed: 43.2min finished


{'vectorizer__ngram_range': (5, 7), 'vectorizer__binary': True, 'classifier__n_estimators': 1000, 'classifier__max_depth': None}


In [None]:
hvrfgs_f1_macro_preds = hvrfgs_f1_macro_model.predict(X_test)
np.save('preds/hvrfgs_f1_macro_preds', hvrfgs_f1_macro_preds)
print(classification_report(Y_test, hvrfgs_f1_macro_preds))

              precision    recall  f1-score   support

           0       0.55      0.66      0.60       173
           1       0.00      0.00      0.00        31
           2       0.00      0.00      0.00        14
           3       0.66      0.69      0.67       210

    accuracy                           0.61       428
   macro avg       0.30      0.34      0.32       428
weighted avg       0.55      0.61      0.57       428





##### 2.3.1.3. Non-linear HashingVectorizer (`class_weight='balanced'`)

In [None]:
hvrf_model = Pipeline([
    ('vectorizer', HashingVectorizer(analyzer='char_wb')),
    ('classifier', RandomForestClassifier(random_state=42, n_jobs = -1, class_weight='balanced'))
    ]
)

params = {
    'vectorizer__binary' : VECTORIZER_BINARY,
    'vectorizer__ngram_range': VECTORIZER_N_GRAM,
    'classifier__n_estimators' : RF_ESTIMATORS,
    'classifier__max_depth' : RF_MAX_DEPTH,
}

hvrfgs_class_weight_model = RandomizedSearchCV(hvrf_model, params, n_jobs = -1, cv=StratifiedKFold(), n_iter=TUNING_ITERATIONS, random_state=12345, verbose=2)

hvrfgs_class_weight_model.fit(X_train, Y_train)
print(hvrfgs_class_weight_model.best_params_)

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.


Fitting 5 folds for each of 30 candidates, totalling 150 fits


[Parallel(n_jobs=-1)]: Done  37 tasks      | elapsed:  3.8min
[Parallel(n_jobs=-1)]: Done 150 out of 150 | elapsed: 53.6min finished


{'vectorizer__ngram_range': (3, 5), 'vectorizer__binary': True, 'classifier__n_estimators': 1000, 'classifier__max_depth': 30}


In [None]:
hvrfgs_class_weight_model_preds = hvrfgs_class_weight_model.predict(X_test)
np.save('preds/hvrfgs_class_weight_model_preds', hvrfgs_class_weight_model_preds)  # own file, so the f1_macro predictions are not overwritten
print(classification_report(Y_test, hvrfgs_class_weight_model_preds))

              precision    recall  f1-score   support

           0       0.54      0.63      0.58       173
           1       0.00      0.00      0.00        31
           2       0.00      0.00      0.00        14
           3       0.63      0.68      0.65       210

    accuracy                           0.59       428
   macro avg       0.29      0.33      0.31       428
weighted avg       0.53      0.59      0.55       428





##### 2.3.1.4. Non-linear HashingVectorizer (oversampling)

In [None]:
hvrf_model = Pipeline([
    ('vectorizer', HashingVectorizer(analyzer='char_wb')),
    ('classifier', RandomForestClassifier(random_state=42, n_jobs = -1))
    ]
)

params = {
    'vectorizer__binary' : VECTORIZER_BINARY,
    'vectorizer__ngram_range': VECTORIZER_N_GRAM,
    'classifier__n_estimators' : RF_ESTIMATORS,
    'classifier__max_depth' : RF_MAX_DEPTH,
}

hvrfgs_oversampling_model = RandomizedSearchCV(hvrf_model, params, n_jobs = -1, cv=StratifiedKFold(), n_iter=TUNING_ITERATIONS, random_state=12345, verbose=2)

hvrfgs_oversampling_model.fit(X_train_oversampling, Y_train_oversampling)
print(hvrfgs_oversampling_model.best_params_)

Fitting 5 folds for each of 30 candidates, totalling 150 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=-1)]: Done  37 tasks      | elapsed:  4.7min
[Parallel(n_jobs=-1)]: Done 150 out of 150 | elapsed: 62.5min finished


{'vectorizer__ngram_range': (5, 7), 'vectorizer__binary': True, 'classifier__n_estimators': 1000, 'classifier__max_depth': None}


In [None]:
hvrfgs_oversampling_preds = hvrfgs_oversampling_model.predict(X_test)
np.save('preds/hvrfgs_oversampling_preds', hvrfgs_oversampling_preds)
print(classification_report(Y_test, hvrfgs_oversampling_preds))

              precision    recall  f1-score   support

           0       0.60      0.60      0.60       173
           1       0.00      0.00      0.00        31
           2       0.00      0.00      0.00        14
           3       0.64      0.78      0.70       210

    accuracy                           0.62       428
   macro avg       0.31      0.34      0.32       428
weighted avg       0.55      0.62      0.58       428





#### 2.3.2. Linear HashingVectorizer

##### 2.3.2.1. Linear HashingVectorizer (benchmark)


In [None]:
hvovr_model = Pipeline([
    ('vectorizer', HashingVectorizer(analyzer='char_wb')),
    ('classifier', OneVsRestClassifier(CalibratedClassifierCV(LinearSVC()), n_jobs=-1))
])

params = {
    'vectorizer__binary' : VECTORIZER_BINARY,
    'vectorizer__ngram_range': VECTORIZER_N_GRAM,
    'classifier__estimator__base_estimator__C': SVC_C,
}

hvovr_model = RandomizedSearchCV(hvovr_model, params, n_jobs=-1, cv=StratifiedKFold(), n_iter=TUNING_ITERATIONS, random_state=12345, verbose=2)

hvovr_model.fit(X_train, Y_train)
print(hvovr_model.best_params_)

Fitting 5 folds for each of 30 candidates, totalling 150 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=-1)]: Done  37 tasks      | elapsed:  1.4min
[Parallel(n_jobs=-1)]: Done 150 out of 150 | elapsed:  8.8min finished


{'vectorizer__ngram_range': (5, 5), 'vectorizer__binary': False, 'classifier__estimator__base_estimator__C': 0.1}


In [None]:
hvovr_preds = hvovr_model.predict(X_test)
np.save('preds/hvovr_preds', hvovr_preds)
print(classification_report(Y_test, hvovr_preds))

              precision    recall  f1-score   support

           0       0.53      0.68      0.60       173
           1       0.00      0.00      0.00        31
           2       0.00      0.00      0.00        14
           3       0.65      0.64      0.65       210

    accuracy                           0.59       428
   macro avg       0.30      0.33      0.31       428
weighted avg       0.53      0.59      0.56       428



  _warn_prf(average, modifier, msg_start, len(result))
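The pipelines in this subsection feed `HashingVectorizer(analyzer='char_wb')` with different `ngram_range` values. The following is a minimal pure-Python sketch of what that feature extraction does conceptually: character n-grams are taken inside each whitespace-delimited word (padded with one space on each side) and then mapped to a fixed number of columns by hashing. The function names, the `md5` hash, and the tiny `n_features=16` are illustrative assumptions; scikit-learn uses a different hash function, a much larger feature space and, by default, signed and normalised values.

```python
import hashlib
import re


def char_wb_ngrams(text, ngram_range):
    """Character n-grams inside word boundaries; each word is padded with one space."""
    min_n, max_n = ngram_range
    ngrams = []
    for word in re.sub(r"\s+", " ", text).split():
        padded = f" {word} "
        for n in range(min_n, max_n + 1):
            if len(padded) <= n:
                # Padded word no longer than n: keep it once, whole, and stop increasing n
                ngrams.append(padded)
                break
            ngrams.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return ngrams


def hashed_counts(tokens, n_features=16):
    """Map tokens to a fixed-size count vector (md5 is only a stand-in hash)."""
    vector = [0] * n_features
    for token in tokens:
        digest = hashlib.md5(token.encode("utf-8")).hexdigest()
        vector[int(digest, 16) % n_features] += 1
    return vector


tokens = char_wb_ngrams("se fue", (3, 5))
print(tokens)
print(hashed_counts(tokens))
# With ngram_range=(5, 5), a short word such as "se" yields a single padded token
print(char_wb_ngrams("se", (5, 5)))
```

One consequence worth keeping in mind when reading the selected `ngram_range` values: under this scheme, with only long n-grams such as `(5, 5)` or `(5, 7)`, the word *se* itself contributes a single feature, so most of the signal has to come from the surrounding words.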


##### 2.3.2.2. Linear HashingVectorizer (`f1_macro`)


In [None]:
hvovr_model = Pipeline([
    ('vectorizer', HashingVectorizer(analyzer='char_wb')),
    ('classifier', OneVsRestClassifier(CalibratedClassifierCV(LinearSVC()), n_jobs=-1))
])

params = {
    'vectorizer__binary' : VECTORIZER_BINARY,
    'vectorizer__ngram_range': VECTORIZER_N_GRAM,
    'classifier__estimator__base_estimator__C': SVC_C,
}

hvovr_f1_macro_model = RandomizedSearchCV(hvovr_model, params, n_jobs=-1, scoring='f1_macro', cv=StratifiedKFold(), n_iter=TUNING_ITERATIONS, random_state=12345, verbose=2)

hvovr_f1_macro_model.fit(X_train, Y_train)
print(hvovr_f1_macro_model.best_params_)

Fitting 5 folds for each of 30 candidates, totalling 150 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=-1)]: Done  37 tasks      | elapsed:  1.2min
[Parallel(n_jobs=-1)]: Done 150 out of 150 | elapsed:  6.8min finished


{'vectorizer__ngram_range': (5, 5), 'vectorizer__binary': False, 'classifier__estimator__base_estimator__C': 0.1}


In [None]:
hvovr_f1_macro_preds = hvovr_f1_macro_model.predict(X_test)
np.save('preds/hvovr_f1_macro_preds', hvovr_f1_macro_preds)
print(classification_report(Y_test, hvovr_f1_macro_preds))

              precision    recall  f1-score   support

           0       0.53      0.68      0.60       173
           1       0.00      0.00      0.00        31
           2       0.00      0.00      0.00        14
           3       0.65      0.64      0.65       210

    accuracy                           0.59       428
   macro avg       0.30      0.33      0.31       428
weighted avg       0.53      0.59      0.56       428



  _warn_prf(average, modifier, msg_start, len(result))
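To make the gap between the `macro avg` and `weighted avg` rows easier to read, the sketch below recomputes both F1 averages from the per-class values printed in the report above. The dictionary names are illustrative; the numbers are copied from the report.

```python
# Per-class F1 and support copied from the classification report above
f1_per_class = {0: 0.60, 1: 0.00, 2: 0.00, 3: 0.65}
support = {0: 173, 1: 31, 2: 14, 3: 210}

# Macro average: every class counts the same, regardless of its size
macro_f1 = sum(f1_per_class.values()) / len(f1_per_class)

# Weighted average: each class counts in proportion to its support
weighted_f1 = sum(f1_per_class[c] * support[c] for c in support) / sum(support.values())

print(f"macro F1    = {macro_f1:.2f}")
print(f"weighted F1 = {weighted_f1:.2f}")
```

Because classes 1 and 2 are never predicted correctly, they pull the macro average down to roughly half of the weighted one, even though together they account for only 45 of the 428 test sentences.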


##### 2.3.2.3. Linear HashingVectorizer (`class_weight`)

In [None]:
hvovr_model = Pipeline([
    ('vectorizer', HashingVectorizer(analyzer='char_wb')),
    ('classifier', OneVsRestClassifier(CalibratedClassifierCV(LinearSVC(class_weight='balanced')), n_jobs=-1))
])

params = {
    'vectorizer__binary' : VECTORIZER_BINARY,
    'vectorizer__ngram_range': VECTORIZER_N_GRAM,
    'classifier__estimator__base_estimator__C': SVC_C,
}

hvovr_class_weight_model = RandomizedSearchCV(hvovr_model, params, n_jobs=-1, cv=StratifiedKFold(), n_iter=TUNING_ITERATIONS, random_state=12345, verbose=2)

hvovr_class_weight_model.fit(X_train, Y_train)
print(hvovr_class_weight_model.best_params_)

Fitting 5 folds for each of 30 candidates, totalling 150 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=-1)]: Done  37 tasks      | elapsed:  1.3min
[Parallel(n_jobs=-1)]: Done 150 out of 150 | elapsed:  7.1min finished


{'vectorizer__ngram_range': (5, 7), 'vectorizer__binary': True, 'classifier__estimator__base_estimator__C': 0.0001}


In [None]:
hvovr_class_weight_preds = hvovr_class_weight_model.predict(X_test)
np.save('preds/hvovr_class_weight_preds', hvovr_class_weight_preds)
print(classification_report(Y_test, hvovr_class_weight_preds))

              precision    recall  f1-score   support

           0       0.56      0.66      0.61       173
           1       0.00      0.00      0.00        31
           2       0.00      0.00      0.00        14
           3       0.66      0.70      0.68       210

    accuracy                           0.61       428
   macro avg       0.31      0.34      0.32       428
weighted avg       0.55      0.61      0.58       428



  _warn_prf(average, modifier, msg_start, len(result))
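As a rough illustration of what `class_weight='balanced'` does, the sketch below applies the formula documented by scikit-learn, `n_samples / (n_classes * count)`, to a set of class counts. The counts used here are the test-set supports from the report above and serve only as an illustrative assumption; the real weights are derived from the training labels, and the function name is hypothetical.

```python
def balanced_class_weights(class_counts):
    """Weights inversely proportional to class frequency: n_samples / (n_classes * count)."""
    n_samples = sum(class_counts.values())
    n_classes = len(class_counts)
    return {label: n_samples / (n_classes * count) for label, count in class_counts.items()}


# Illustrative counts only (test-set supports); the classifier uses the training labels
illustrative_counts = {0: 173, 1: 31, 2: 14, 3: 210}
weights = balanced_class_weights(illustrative_counts)

for label, weight in weights.items():
    print(f"class {label}: weight {weight:.2f}")
```

With these counts the rarest class receives a weight more than ten times larger than the most frequent one, which is the mechanism by which errors on the minority classes are penalised more heavily during training.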


##### 2.3.2.4. Linear HashingVectorizer (`f1_macro`, `class_weight`)

In [None]:
hvovr_model = Pipeline([
    ('vectorizer', HashingVectorizer(analyzer='char_wb')),
    ('classifier', OneVsRestClassifier(CalibratedClassifierCV(LinearSVC(class_weight='balanced')), n_jobs=-1))
])

params = {
    'vectorizer__binary' : VECTORIZER_BINARY,
    'vectorizer__ngram_range': VECTORIZER_N_GRAM,
    'classifier__estimator__base_estimator__C': SVC_C,
}

hvovr_class_f1_model = RandomizedSearchCV(hvovr_model, params, n_jobs=-1, scoring='f1_macro', cv=StratifiedKFold(), n_iter=TUNING_ITERATIONS, random_state=12345, verbose=2)

hvovr_class_f1_model.fit(X_train, Y_train)
print(hvovr_class_f1_model.best_params_)

Fitting 5 folds for each of 30 candidates, totalling 150 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=-1)]: Done  37 tasks      | elapsed:  1.2min
[Parallel(n_jobs=-1)]: Done 150 out of 150 | elapsed:  7.1min finished


{'vectorizer__ngram_range': (5, 7), 'vectorizer__binary': True, 'classifier__estimator__base_estimator__C': 0.0001}


In [None]:
hvovr_class_f1_preds = hvovr_class_f1_model.predict(X_test)
np.save('preds/hvovr_class_f1_preds', hvovr_class_f1_preds)
print(classification_report(Y_test, hvovr_class_f1_preds))

              precision    recall  f1-score   support

           0       0.56      0.66      0.61       173
           1       0.00      0.00      0.00        31
           2       0.00      0.00      0.00        14
           3       0.66      0.70      0.68       210

    accuracy                           0.61       428
   macro avg       0.31      0.34      0.32       428
weighted avg       0.55      0.61      0.58       428



  _warn_prf(average, modifier, msg_start, len(result))


##### 2.3.2.5. Linear HashingVectorizer (oversampling)

In [None]:
hvovr_model = Pipeline([
    ('vectorizer', HashingVectorizer(analyzer='char_wb')),
    ('classifier', OneVsRestClassifier(CalibratedClassifierCV(LinearSVC()), n_jobs=-1))
])

params = {
    'vectorizer__binary' : VECTORIZER_BINARY,
    'vectorizer__ngram_range': VECTORIZER_N_GRAM,
    'classifier__estimator__base_estimator__C': SVC_C,
}

hvovr_oversampling_model = RandomizedSearchCV(hvovr_model, params, n_jobs=-1, cv=StratifiedKFold(), n_iter=TUNING_ITERATIONS, random_state=12345, verbose=2)

hvovr_oversampling_model.fit(X_train_oversampling, Y_train_oversampling)
print(hvovr_oversampling_model.best_params_)

Fitting 5 folds for each of 30 candidates, totalling 150 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=-1)]: Done  37 tasks      | elapsed:  3.2min
[Parallel(n_jobs=-1)]: Done 150 out of 150 | elapsed: 27.1min finished


{'vectorizer__ngram_range': (1, 5), 'vectorizer__binary': True, 'classifier__estimator__base_estimator__C': 10.0}


In [None]:
hvovr_oversampling_preds = hvovr_oversampling_model.predict(X_test)
np.save('preds/hvovr_oversampling_preds', hvovr_oversampling_preds)
print(classification_report(Y_test, hvovr_oversampling_preds))

              precision    recall  f1-score   support

           0       0.55      0.69      0.61       173
           1       0.00      0.00      0.00        31
           2       0.00      0.00      0.00        14
           3       0.68      0.68      0.68       210

    accuracy                           0.61       428
   macro avg       0.31      0.34      0.32       428
weighted avg       0.55      0.61      0.58       428



  _warn_prf(average, modifier, msg_start, len(result))
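The notebook does not show how `se_classification_balanced_train` was built. Purely as an assumption, one common way to obtain such a file is random oversampling, where minority-class examples are duplicated until every class is as frequent as the largest one. The sketch below illustrates that idea on toy data; the function name, the seed and the example sentences are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen minority-class examples until all classes are equally frequent."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in zip(texts, labels):
        by_label[label].append(text)
    target = max(len(items) for items in by_label.values())
    out_texts, out_labels = [], []
    for label, items in by_label.items():
        # Sample with replacement from the class's own examples to fill the gap
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


# Toy data, not taken from the corpus
toy_texts = ["se fue", "se cayó", "se rompió", "se lo dio", "se vende", "se alquila"]
toy_labels = [3, 3, 3, 2, 0, 0]
balanced_texts, balanced_labels = random_oversample(toy_texts, toy_labels)
print(Counter(balanced_labels))
```

If the oversampled file was produced this way, duplicated sentences can land in both the training and validation folds of the cross-validated search, so cross-validation scores on the oversampled data may be optimistic compared with the held-out test set.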


### 2.4. TF-IDF models

#### 2.4.1. Non-linear TF-IDF

##### 2.4.1.1. Non-linear TF-IDF (benchmark)

In [None]:
tf_idf_rf_model = Pipeline([
    ('vectorizer', TfidfVectorizer(analyzer='char_wb')),
    ('classifier', RandomForestClassifier(random_state=42, n_jobs = -1))
])

params = {
    'vectorizer__binary' : VECTORIZER_BINARY,
    'vectorizer__ngram_range': VECTORIZER_N_GRAM,
    'classifier__n_estimators' : RF_ESTIMATORS,
}

tf_idf_rf_gs_model = RandomizedSearchCV(tf_idf_rf_model, params, n_jobs = -1, cv=StratifiedKFold(), n_iter=TUNING_ITERATIONS, random_state=12345, verbose=2)

tf_idf_rf_gs_model.fit(X_train, Y_train)

print(tf_idf_rf_gs_model.best_params_)

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.


Fitting 5 folds for each of 30 candidates, totalling 150 fits


[Parallel(n_jobs=-1)]: Done  37 tasks      | elapsed:  2.2min
[Parallel(n_jobs=-1)]: Done 150 out of 150 | elapsed: 10.6min finished


{'vectorizer__ngram_range': (5, 7), 'vectorizer__binary': True, 'classifier__n_estimators': 1000}


In [None]:
tf_idf_rf_gs_preds = tf_idf_rf_gs_model.predict(X_test)
np.save('preds/tf_idf_rf_gs_preds', tf_idf_rf_gs_preds)
print(classification_report(Y_test, tf_idf_rf_gs_preds))

              precision    recall  f1-score   support

           0       0.55      0.67      0.61       173
           1       0.00      0.00      0.00        31
           2       0.00      0.00      0.00        14
           3       0.67      0.70      0.68       210

    accuracy                           0.61       428
   macro avg       0.31      0.34      0.32       428
weighted avg       0.55      0.61      0.58       428



  _warn_prf(average, modifier, msg_start, len(result))
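The `TfidfVectorizer` step used in this section reweights n-gram counts so that features occurring in many sentences count less. The sketch below is a simplified pure-Python version of the weighting scikit-learn documents for its default settings (smoothed idf, `ln((1 + n) / (1 + df)) + 1`, followed by L2 normalisation). The function name and the toy token lists are illustrative assumptions, not part of the notebook.

```python
import math
from collections import Counter


def tfidf_vectors(tokenized_docs, binary=False):
    """TF-IDF with smoothed idf and L2 normalisation, one sparse dict per document."""
    n_docs = len(tokenized_docs)
    # Document frequency: in how many documents each token appears
    df = Counter(tok for doc in tokenized_docs for tok in set(doc))
    idf = {tok: math.log((1 + n_docs) / (1 + count)) + 1 for tok, count in df.items()}
    vectors = []
    for doc in tokenized_docs:
        tf = Counter(doc)
        # With binary=True the term frequency is clipped to 1 before idf weighting
        raw = {tok: (1 if binary else count) * idf[tok] for tok, count in tf.items()}
        norm = math.sqrt(sum(v * v for v in raw.values())) or 1.0
        vectors.append({tok: v / norm for tok, v in raw.items()})
    return vectors


# Toy documents already split into padded tokens
toy_docs = [[" se ", " fue "], [" se ", " cayó "], [" se ", " se ", " fue "]]
for vector in tfidf_vectors(toy_docs, binary=True):
    print({tok: round(weight, 3) for tok, weight in vector.items()})
```

In this toy example the token for *se* appears in every document and therefore receives the lowest idf, which mirrors the situation in the corpus, where every sentence contains *se* by construction.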


##### 2.4.1.2. Non-linear TF-IDF (`f1_macro`)


In [None]:
tfidfrf_model = Pipeline([
    ('vectorizer', TfidfVectorizer(analyzer='char_wb')),
    ('classifier', RandomForestClassifier(random_state=42, n_jobs = -1))
])

params = {
    'vectorizer__binary' : VECTORIZER_BINARY,
    'vectorizer__ngram_range': VECTORIZER_N_GRAM,
    'classifier__n_estimators' : RF_ESTIMATORS,
}

tfidfrfgs_f1_macro_model = RandomizedSearchCV(tfidfrf_model, params, scoring='f1_macro', n_jobs = -1, cv=StratifiedKFold(), n_iter=TUNING_ITERATIONS, random_state=12345, verbose=2)

tfidfrfgs_f1_macro_model.fit(X_train, Y_train)

print(tfidfrfgs_f1_macro_model.best_params_)

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.


Fitting 5 folds for each of 30 candidates, totalling 150 fits


[Parallel(n_jobs=-1)]: Done  37 tasks      | elapsed:  2.2min
[Parallel(n_jobs=-1)]: Done 150 out of 150 | elapsed: 10.5min finished


{'vectorizer__ngram_range': (5, 7), 'vectorizer__binary': True, 'classifier__n_estimators': 1000}


In [None]:
tfidfrfgs_f1_macro_preds = tfidfrfgs_f1_macro_model.predict(X_test)
np.save('preds/tfidfrfgs_f1_macro_preds', tfidfrfgs_f1_macro_preds)
print(classification_report(Y_test, tfidfrfgs_f1_macro_preds))

              precision    recall  f1-score   support

           0       0.55      0.67      0.61       173
           1       0.00      0.00      0.00        31
           2       0.00      0.00      0.00        14
           3       0.67      0.70      0.68       210

    accuracy                           0.61       428
   macro avg       0.31      0.34      0.32       428
weighted avg       0.55      0.61      0.58       428



  _warn_prf(average, modifier, msg_start, len(result))


##### 2.4.1.3. Non-linear TF-IDF (`class_weight='balanced'`)

In [None]:
tf_idf_rf_model = Pipeline([
    ('vectorizer', TfidfVectorizer(analyzer='char_wb')),
    ('classifier', RandomForestClassifier(random_state=42, n_jobs = -1, class_weight='balanced'))
])

params = {
    'vectorizer__binary' : VECTORIZER_BINARY,
    'vectorizer__ngram_range': VECTORIZER_N_GRAM,
    'classifier__n_estimators' : RF_ESTIMATORS,
}

tf_idf_rf_gs_model = RandomizedSearchCV(tf_idf_rf_model, params, n_jobs = -1, cv=StratifiedKFold(), n_iter=TUNING_ITERATIONS, random_state=12345, verbose=2)

tf_idf_rf_gs_model.fit(X_train, Y_train)

print(tf_idf_rf_gs_model.best_params_)

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.


Fitting 5 folds for each of 30 candidates, totalling 150 fits


[Parallel(n_jobs=-1)]: Done  37 tasks      | elapsed:  2.2min
[Parallel(n_jobs=-1)]: Done 150 out of 150 | elapsed: 11.1min finished


{'vectorizer__ngram_range': (5, 7), 'vectorizer__binary': True, 'classifier__n_estimators': 1000}


In [None]:
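# Use a distinct name and file so the benchmark predictions saved in 2.4.1.1 are not overwritten
tf_idf_rf_gs_class_weight_preds = tf_idf_rf_gs_model.predict(X_test)
np.save('preds/tf_idf_rf_gs_class_weight_preds', tf_idf_rf_gs_class_weight_preds)
print(classification_report(Y_test, tf_idf_rf_gs_class_weight_preds))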

              precision    recall  f1-score   support

           0       0.58      0.67      0.62       173
           1       1.00      0.03      0.06        31
           2       0.00      0.00      0.00        14
           3       0.67      0.72      0.70       210

    accuracy                           0.63       428
   macro avg       0.56      0.36      0.35       428
weighted avg       0.64      0.63      0.60       428



  _warn_prf(average, modifier, msg_start, len(result))


##### 2.4.1.4. Non-linear TF-IDF (oversampling)

In [None]:
tfidfrf_model = Pipeline([
    ('vectorizer', TfidfVectorizer(analyzer='char_wb')),
    ('classifier', RandomForestClassifier(random_state=42, n_jobs = -1))
])

params = {
    'vectorizer__binary' : VECTORIZER_BINARY,
    'vectorizer__ngram_range': VECTORIZER_N_GRAM,
    'classifier__n_estimators' : RF_ESTIMATORS,
}

tfidfrfgs_oversampling_model = RandomizedSearchCV(tfidfrf_model, params, n_jobs = -1, cv=StratifiedKFold(), n_iter=TUNING_ITERATIONS, random_state=12345, verbose=2)

tfidfrfgs_oversampling_model.fit(X_train_oversampling, Y_train_oversampling)

print(tfidfrfgs_oversampling_model.best_params_)

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.


Fitting 5 folds for each of 30 candidates, totalling 150 fits


[Parallel(n_jobs=-1)]: Done  37 tasks      | elapsed:  3.4min
[Parallel(n_jobs=-1)]: Done 150 out of 150 | elapsed: 16.9min finished


{'vectorizer__ngram_range': (5, 7), 'vectorizer__binary': True, 'classifier__n_estimators': 1000}


In [None]:
tfidfrfgs_oversampling_preds = tfidfrfgs_oversampling_model.predict(X_test)
np.save('preds/tfidfrfgs_oversampling_preds', tfidfrfgs_oversampling_preds)
print(classification_report(Y_test, tfidfrfgs_oversampling_preds))

              precision    recall  f1-score   support

           0       0.59      0.60      0.59       173
           1       0.50      0.03      0.06        31
           2       0.00      0.00      0.00        14
           3       0.64      0.77      0.70       210

    accuracy                           0.62       428
   macro avg       0.43      0.35      0.34       428
weighted avg       0.59      0.62      0.59       428



  _warn_prf(average, modifier, msg_start, len(result))


#### 2.4.2. Linear TF-IDF

##### 2.4.2.1. Linear TF-IDF (benchmark)

In [None]:
tfidfovrsvc_model = Pipeline([
    ('vectorizer', TfidfVectorizer(analyzer='char_wb')),
    ('classifier', OneVsRestClassifier(CalibratedClassifierCV(LinearSVC()), n_jobs=-1))
])

params = {
    'vectorizer__binary' : VECTORIZER_BINARY,
    'vectorizer__ngram_range': VECTORIZER_N_GRAM,
    'classifier__estimator__base_estimator__C': SVC_C,
}

tfidfovrsvcgs_model = RandomizedSearchCV(tfidfovrsvc_model, params, n_jobs =-1, cv=StratifiedKFold(), n_iter=TUNING_ITERATIONS, random_state=12345, verbose=2)

tfidfovrsvcgs_model.fit(X_train, Y_train)

print(tfidfovrsvcgs_model.best_params_)

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.


Fitting 5 folds for each of 30 candidates, totalling 150 fits


[Parallel(n_jobs=-1)]: Done  37 tasks      | elapsed:   53.2s
[Parallel(n_jobs=-1)]: Done 150 out of 150 | elapsed:  2.9min finished


{'vectorizer__ngram_range': (1, 5), 'vectorizer__binary': True, 'classifier__estimator__base_estimator__C': 0.001}


In [None]:
tfidfovrsvcgs_preds = tfidfovrsvcgs_model.predict(X_test)
np.save('preds/tfidfovrsvcgs_preds', tfidfovrsvcgs_preds)
print(classification_report(Y_test, tfidfovrsvcgs_preds))

              precision    recall  f1-score   support

           0       0.57      0.75      0.65       173
           1       0.00      0.00      0.00        31
           2       0.00      0.00      0.00        14
           3       0.72      0.70      0.71       210

    accuracy                           0.64       428
   macro avg       0.32      0.36      0.34       428
weighted avg       0.58      0.64      0.61       428



  _warn_prf(average, modifier, msg_start, len(result))


##### 2.4.2.2. Linear TF-IDF (`f1_macro`)

In [None]:
tfidfovrsvc_model = Pipeline([
    ('vectorizer', TfidfVectorizer(analyzer='char_wb')),
    ('classifier', OneVsRestClassifier(CalibratedClassifierCV(LinearSVC()), n_jobs=-1))
])

params = {
    'vectorizer__binary' : VECTORIZER_BINARY,
    'vectorizer__ngram_range': VECTORIZER_N_GRAM,
    'classifier__estimator__base_estimator__C': SVC_C,
}

tfidfovrsvcgs_f1_macro_model = RandomizedSearchCV(tfidfovrsvc_model, params, scoring='f1_macro', n_jobs =-1, cv=StratifiedKFold(), n_iter=TUNING_ITERATIONS, random_state=12345, verbose=2)

tfidfovrsvcgs_f1_macro_model.fit(X_train, Y_train)

print(tfidfovrsvcgs_f1_macro_model.best_params_)

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.


Fitting 5 folds for each of 30 candidates, totalling 150 fits


[Parallel(n_jobs=-1)]: Done  37 tasks      | elapsed:   52.5s
[Parallel(n_jobs=-1)]: Done 150 out of 150 | elapsed:  2.9min finished


{'vectorizer__ngram_range': (1, 5), 'vectorizer__binary': True, 'classifier__estimator__base_estimator__C': 0.001}


In [None]:
tfidfovrsvcgs_f1_macro_preds = tfidfovrsvcgs_f1_macro_model.predict(X_test)
np.save('preds/tfidfovrsvcgs_f1_macro_preds', tfidfovrsvcgs_f1_macro_preds)
print(classification_report(Y_test, tfidfovrsvcgs_f1_macro_preds))

              precision    recall  f1-score   support

           0       0.57      0.75      0.65       173
           1       0.00      0.00      0.00        31
           2       0.00      0.00      0.00        14
           3       0.72      0.70      0.71       210

    accuracy                           0.64       428
   macro avg       0.32      0.36      0.34       428
weighted avg       0.58      0.64      0.61       428



  _warn_prf(average, modifier, msg_start, len(result))


##### 2.4.2.3. Linear TF-IDF (`class_weight`)

In [None]:
tfidfovrsvc_model = Pipeline([
    ('vectorizer', TfidfVectorizer(analyzer='char_wb')),
    ('classifier', OneVsRestClassifier(CalibratedClassifierCV(LinearSVC(class_weight='balanced')), n_jobs=-1))
])

params = {
    'vectorizer__binary' : VECTORIZER_BINARY,
    'vectorizer__ngram_range': VECTORIZER_N_GRAM,
    'classifier__estimator__base_estimator__C': SVC_C,
}

tfidfovrsvcgs_class_weight_model = RandomizedSearchCV(tfidfovrsvc_model, params, n_jobs =-1, cv=StratifiedKFold(), n_iter=TUNING_ITERATIONS, random_state=12345, verbose=2)

tfidfovrsvcgs_class_weight_model.fit(X_train, Y_train)

print(tfidfovrsvcgs_class_weight_model.best_params_)

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.


Fitting 5 folds for each of 30 candidates, totalling 150 fits


[Parallel(n_jobs=-1)]: Done  37 tasks      | elapsed:   53.5s
[Parallel(n_jobs=-1)]: Done 150 out of 150 | elapsed:  2.9min finished


{'vectorizer__ngram_range': (3, 5), 'vectorizer__binary': False, 'classifier__estimator__base_estimator__C': 0.0001}


In [None]:
tfidfovrsvcgs_class_weight_preds = tfidfovrsvcgs_class_weight_model.predict(X_test)
np.save('preds/tfidfovrsvcgs_class_weight_preds', tfidfovrsvcgs_class_weight_preds)
print(classification_report(Y_test, tfidfovrsvcgs_class_weight_preds))

              precision    recall  f1-score   support

           0       0.55      0.71      0.62       173
           1       0.00      0.00      0.00        31
           2       0.00      0.00      0.00        14
           3       0.69      0.68      0.69       210

    accuracy                           0.62       428
   macro avg       0.31      0.35      0.33       428
weighted avg       0.56      0.62      0.59       428



  _warn_prf(average, modifier, msg_start, len(result))


##### 2.4.2.4. Linear TF-IDF (`f1_macro`, `class_weight`)

In [None]:
tfidfovrsvc_model = Pipeline([
    ('vectorizer', TfidfVectorizer(analyzer='char_wb')),
    ('classifier', OneVsRestClassifier(CalibratedClassifierCV(LinearSVC(class_weight='balanced')), n_jobs=-1))
])

params = {
    'vectorizer__binary' : VECTORIZER_BINARY,
    'vectorizer__ngram_range': VECTORIZER_N_GRAM,
    'classifier__estimator__base_estimator__C': SVC_C,
}

tfidfovrsvcgs_f1_class_model = RandomizedSearchCV(tfidfovrsvc_model, params, scoring='f1_macro', n_jobs =-1, cv=StratifiedKFold(), n_iter=TUNING_ITERATIONS, random_state=12345, verbose=2)

tfidfovrsvcgs_f1_class_model.fit(X_train, Y_train)

print(tfidfovrsvcgs_f1_class_model.best_params_)

Fitting 5 folds for each of 30 candidates, totalling 150 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=-1)]: Done  37 tasks      | elapsed:   53.6s
[Parallel(n_jobs=-1)]: Done 150 out of 150 | elapsed:  3.0min finished


{'vectorizer__ngram_range': (3, 5), 'vectorizer__binary': False, 'classifier__estimator__base_estimator__C': 0.0001}


In [None]:
tfidfovrsvcgs_f1_class_preds = tfidfovrsvcgs_f1_class_model.predict(X_test)
np.save('preds/tfidfovrsvcgs_f1_class_preds', tfidfovrsvcgs_f1_class_preds)
print(classification_report(Y_test, tfidfovrsvcgs_f1_class_preds))

              precision    recall  f1-score   support

           0       0.55      0.71      0.62       173
           1       0.00      0.00      0.00        31
           2       0.00      0.00      0.00        14
           3       0.69      0.68      0.69       210

    accuracy                           0.62       428
   macro avg       0.31      0.35      0.33       428
weighted avg       0.56      0.62      0.59       428



  _warn_prf(average, modifier, msg_start, len(result))


##### 2.4.2.5. Linear TF-IDF (oversampling)

In [None]:
tfidfovrsvc_model = Pipeline([
    ('vectorizer', TfidfVectorizer(analyzer='char_wb')),
    ('classifier', OneVsRestClassifier(CalibratedClassifierCV(LinearSVC()), n_jobs=-1))
])

params = {
    'vectorizer__binary' : VECTORIZER_BINARY,
    'vectorizer__ngram_range': VECTORIZER_N_GRAM,
    'classifier__estimator__base_estimator__C': SVC_C,
}

tfidfovrsvcgs_oversampling_model = RandomizedSearchCV(tfidfovrsvc_model, params, n_jobs =-1, cv=StratifiedKFold(), n_iter=TUNING_ITERATIONS, random_state=12345, verbose=2)

tfidfovrsvcgs_oversampling_model.fit(X_train_oversampling, Y_train_oversampling)

print(tfidfovrsvcgs_oversampling_model.best_params_)

Fitting 5 folds for each of 30 candidates, totalling 150 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=-1)]: Done  37 tasks      | elapsed:  2.0min
[Parallel(n_jobs=-1)]: Done 150 out of 150 | elapsed: 16.3min finished


{'vectorizer__ngram_range': (5, 5), 'vectorizer__binary': False, 'classifier__estimator__base_estimator__C': 0.1}


In [None]:
tfidfovrsvcgs_oversampling_preds = tfidfovrsvcgs_oversampling_model.predict(X_test)
np.save('preds/tfidfovrsvcgs_oversampling_preds', tfidfovrsvcgs_oversampling_preds)
print(classification_report(Y_test, tfidfovrsvcgs_oversampling_preds))

              precision    recall  f1-score   support

           0       0.57      0.65      0.61       173
           1       0.00      0.00      0.00        31
           2       0.00      0.00      0.00        14
           3       0.66      0.71      0.69       210

    accuracy                           0.61       428
   macro avg       0.31      0.34      0.32       428
weighted avg       0.56      0.61      0.58       428



### 2.5. Pre-trained language models (transformers)

#### 2.5.1. Collator

Collator that tokenizes a batch of texts, pads them to the longest sequence in the batch, and moves the resulting tensors to the available device.



In [None]:
import torch

class TextClassificationCollator:
    """Data collator for a text classification problem"""
    
    def __init__(self, tokenizer):
        """Initializes the collator with a tokenizer"""
        self.tokenizer = tokenizer
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    
    def encode_texts(self, texts):
        """Transforms an iterable of texts into a dictionary of model input tensors, stored on self.device"""
        # Tokenize and encode texts as tensors, padding to the longest text in the batch
        tensors = self.tokenizer.batch_encode_plus(list(texts), padding="longest", return_tensors="pt")
        # Move tensors to the selected device (GPU if available, otherwise CPU)
        for key in tensors:
            tensors[key] = tensors[key].to(self.device)
        return tensors
    
    def __call__(self, patterns):
        """Collate a batch of patterns
        
        Arguments:
            - patterns: iterable of texts, or of tuples in the form (text, class)
            
        Output: dictionary of torch tensors ready for model input
        """
        # Check kind of input
        if len(patterns) < 1: raise ValueError(f"At least one pattern is required for training, found {len(patterns)}")
        if not isinstance(patterns[0], (tuple, str)): raise ValueError(f"Each pattern must be one text, or a tuple with text and label. Found {patterns[0]}")
        # Only tuples carry labels; a bare two-character string such as "se" must not be read as (text, class)
        targets_provided = isinstance(patterns[0], tuple) and len(patterns[0]) == 2
        # Split texts and classes from the input list of tuples
        if targets_provided:
            texts, targets = zip(*patterns)
        else:
            texts = patterns
        # Encode inputs
        input_tensors = self.encode_texts(texts)
        # Return batch as a dictionary with all the input tensors and, if provided, the labels
        batch = {**input_tensors}
        if targets_provided:
            # Transform class labels to a tensor on the same device as the inputs
            batch["labels"] = torch.tensor(targets).long().to(self.device)
        return batch
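To illustrate what `padding="longest"` produces, here is a minimal pure-Python sketch that pads already-tokenized sequences to the length of the longest one in the batch and builds the matching attention mask. The token ids, the `pad_id=0` value and the function name are hypothetical; a real tokenizer chooses its own ids and padding token.

```python
def pad_batch(token_id_lists, pad_id=0):
    """Pad every sequence to the longest one in the batch and build an attention mask."""
    longest = max(len(ids) for ids in token_id_lists)
    input_ids = [ids + [pad_id] * (longest - len(ids)) for ids in token_id_lists]
    # 1 marks a real token, 0 marks padding that the model should ignore
    attention_mask = [[1] * len(ids) + [0] * (longest - len(ids)) for ids in token_id_lists]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Hypothetical token ids for three sentences of different lengths
toy_batch = [[4, 1057, 2389, 5], [4, 1057, 5], [4, 1057, 771, 9021, 5]]
padded = pad_batch(toy_batch)
for ids, mask in zip(padded["input_ids"], padded["attention_mask"]):
    print(ids, mask)
```

Padding per batch rather than to a fixed global length keeps the tensors as small as each batch allows, which matters when sentence lengths vary widely.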

#### 2.5.2. Model

Define a scikit-learn compatible class that wraps a Transformers model.

In [None]:
from copy import copy
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.metrics import f1_score
from transformers import AutoConfig, AutoModelForSequenceClassification, AutoTokenizer, Trainer, TrainingArguments

class TransformersClassifier(BaseEstimator, ClassifierMixin):
    def __init__(self, pretrained_model='dccuchile/bert-base-spanish-wwm-uncased', 
                 learning_rate=5e-5, num_train_epochs=1, per_device_train_batch_size=8, per_device_eval_batch_size=128, 
                 attention_probs_dropout_prob=0.1, hidden_dropout_prob=0.1, output_dir="./transformers_model", ):
        self.pretrained_model = pretrained_model
        self.learning_rate = learning_rate
        self.num_train_epochs = num_train_epochs
        self.per_device_train_batch_size = per_device_train_batch_size 
        self.attention_probs_dropout_prob = attention_probs_dropout_prob
        self.hidden_dropout_prob = hidden_dropout_prob
        self.output_dir = output_dir
        self.per_device_eval_batch_size = per_device_eval_batch_size

    def fit(self, X, y):
        # Clear GPU memory
        torch.cuda.empty_cache()
        # Prepare config
        num_labels = len(set(y))
        config = AutoConfig.from_pretrained(self.pretrained_model, num_labels=num_labels)
        config.attention_probs_dropout_prob = self.attention_probs_dropout_prob
        config.hidden_dropout_prob = self.hidden_dropout_prob
        # Prepare tokenizer
        tokenizer = AutoTokenizer.from_pretrained(self.pretrained_model)
        # Build collator
        collator = TextClassificationCollator(tokenizer)
        # Initialize model
        model = AutoModelForSequenceClassification.from_pretrained(self.pretrained_model, config=config)
        # Prepare training args
        training_args = TrainingArguments(
            output_dir=self.output_dir,
            overwrite_output_dir=True,
            per_device_eval_batch_size=self.per_device_eval_batch_size,
            disable_tqdm=True,
            learning_rate = self.learning_rate,
            num_train_epochs = self.num_train_epochs,
            per_device_train_batch_size = self.per_device_train_batch_size
        )
        # Initialize trainer
        self._trainer = Trainer(
            model=model,
            data_collator=collator,
            args=training_args,
            train_dataset=list(zip(X, y))
        )
        # Train
        self._trainer.train()
        # Return the fitted estimator, following the scikit-learn convention
        return self

    def predict(self, X):
        # Materialize X as a list so the Trainer can index it by position
        preds = self._trainer.predict(list(X))
        return np.argmax(preds.predictions, axis=1)

    def score(self, X, y):
        preds = self.predict(X)
        return f1_score(y, preds, average="macro")
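The `predict` and `score` methods above reduce to two small steps: taking the index of the largest logit per sentence, and averaging per-class F1 with equal weight for every class. The sketch below reproduces both steps in plain Python on made-up logits; the helper names and the numbers are illustrative assumptions, not outputs of the model.

```python
def argmax_rows(logits):
    """Index of the largest value in each row, i.e. the predicted class per example."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred."""
    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class that is never predicted correctly contributes an F1 of 0
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Made-up logits for four sentences and four classes
toy_logits = [
    [2.0, 0.1, 0.0, 0.3],
    [0.2, 0.1, 0.0, 1.5],
    [0.9, 0.8, 0.1, 0.2],
    [0.1, 0.2, 0.0, 3.0],
]
toy_true = [0, 3, 1, 3]
toy_pred = argmax_rows(toy_logits)
print(toy_pred, macro_f1(toy_true, toy_pred))
```

Using macro F1 as the estimator's `score` means that any search built on top of this class optimises for balanced performance across the four tags rather than for raw accuracy.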

#### 2.5.3. Model training

Train basic model

In [None]:
model = TransformersClassifier(pretrained_model='dccuchile/bert-base-spanish-wwm-uncased', num_train_epochs=4)
model.fit(X_train, Y_train)

Some weights of the model checkpoint at dccuchile/bert-base-spanish-wwm-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at dccuchile/bert-base-spanish-wwm-uncas

{'loss': 0.6036767578125, 'learning_rate': 2.0794392523364487e-05, 'epoch': 2.336448598130841}
{'epoch': 4.0}


Basic model metrics

In [None]:
from sklearn.metrics import classification_report
preds = model.predict(X_test)
print(classification_report(Y_test, preds))  
# 0.66-0.67 with BETO uncased and 4 epochs

              precision    recall  f1-score   support

           0       0.79      0.89      0.84       173
           1       0.69      0.65      0.67        31
           2       0.40      0.29      0.33        14
           3       0.93      0.87      0.90       210

    accuracy                           0.84       428
   macro avg       0.70      0.67      0.68       428
weighted avg       0.84      0.84      0.84       428



Run hyperparameter optimization

In [None]:
%%time
from skopt import BayesSearchCV
from skopt.space import Integer, Real
from sklearn.model_selection import StratifiedKFold

# Search space: lists are treated as categorical choices,
# Real/Integer as continuous/discrete ranges with the given prior
param_grid = {
    "pretrained_model": ['dccuchile/bert-base-spanish-wwm-uncased', 'dccuchile/bert-base-spanish-wwm-cased'],
    "learning_rate": Real(1e-6, 1e-4, "log-uniform"),
    "num_train_epochs": Integer(1, 10, "uniform"),
    "per_device_train_batch_size": [4, 8, 16, 32, 64],
    "attention_probs_dropout_prob": Real(0, 0.9, "uniform"),
    "hidden_dropout_prob": Real(0, 0.9, "uniform"),
}

# Stratified folds keep the unbalanced tag distribution in every split (5 folds by default)
cv_method = StratifiedKFold()

# Bayesian search over 30 candidates; a fit that raises an error is scored 0.0
metamodel = BayesSearchCV(model, param_grid, n_iter=30, verbose=3, cv=cv_method, random_state=12345, error_score=0.0)
metamodel.fit(X_train, Y_train)

Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] attention_probs_dropout_prob=0.36712704740889385, hidden_dropout_prob=0.8900230698958589, learning_rate=1.552111989984059e-05, num_train_epochs=5, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
Some weights of the model checkpoint at dccuchile/bert-base-spanish-wwm-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were n

{'epoch': 5.0}
[CV]  attention_probs_dropout_prob=0.36712704740889385, hidden_dropout_prob=0.8900230698958589, learning_rate=1.552111989984059e-05, num_train_epochs=5, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.203, total= 1.6min
[CV] attention_probs_dropout_prob=0.36712704740889385, hidden_dropout_prob=0.8900230698958589, learning_rate=1.552111989984059e-05, num_train_epochs=5, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:  1.6min remaining:    0.0s
Some weights of the model checkpoint at dccuchile/bert-base-spanish-wwm-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were no

{'epoch': 5.0}
[CV]  attention_probs_dropout_prob=0.36712704740889385, hidden_dropout_prob=0.8900230698958589, learning_rate=1.552111989984059e-05, num_train_epochs=5, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.208, total= 1.6min
[CV] attention_probs_dropout_prob=0.36712704740889385, hidden_dropout_prob=0.8900230698958589, learning_rate=1.552111989984059e-05, num_train_epochs=5, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:  3.2min remaining:    0.0s
Some weights of the model checkpoint at dccuchile/bert-base-spanish-wwm-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were no

{'epoch': 5.0}
[CV]  attention_probs_dropout_prob=0.36712704740889385, hidden_dropout_prob=0.8900230698958589, learning_rate=1.552111989984059e-05, num_train_epochs=5, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.223, total= 1.6min
[CV] attention_probs_dropout_prob=0.36712704740889385, hidden_dropout_prob=0.8900230698958589, learning_rate=1.552111989984059e-05, num_train_epochs=5, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 


Some weights of the model checkpoint at dccuchile/bert-base-spanish-wwm-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at dccuchile/bert-base-spanish-wwm-uncas

{'epoch': 5.0}
[CV]  attention_probs_dropout_prob=0.36712704740889385, hidden_dropout_prob=0.8900230698958589, learning_rate=1.552111989984059e-05, num_train_epochs=5, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.231, total= 1.6min
[CV] attention_probs_dropout_prob=0.36712704740889385, hidden_dropout_prob=0.8900230698958589, learning_rate=1.552111989984059e-05, num_train_epochs=5, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 


Some weights of the model checkpoint at dccuchile/bert-base-spanish-wwm-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at dccuchile/bert-base-spanish-wwm-uncas

{'epoch': 5.0}
[CV]  attention_probs_dropout_prob=0.36712704740889385, hidden_dropout_prob=0.8900230698958589, learning_rate=1.552111989984059e-05, num_train_epochs=5, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.229, total= 1.6min
Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] attention_probs_dropout_prob=0.4626460576204746, hidden_dropout_prob=0.2866560163294204, learning_rate=3.5353403590329405e-06, num_train_epochs=2, per_device_train_batch_size=8, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:  8.1min finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


Some weights of the model checkpoint at dccuchile/bert-base-spanish-wwm-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at dccuchile/bert-base-spanish-wwm-cased a

{'epoch': 2.0}
[CV]  attention_probs_dropout_prob=0.4626460576204746, hidden_dropout_prob=0.2866560163294204, learning_rate=3.5353403590329405e-06, num_train_epochs=2, per_device_train_batch_size=8, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.257, total= 1.1min
[CV] attention_probs_dropout_prob=0.4626460576204746, hidden_dropout_prob=0.2866560163294204, learning_rate=3.5353403590329405e-06, num_train_epochs=2, per_device_train_batch_size=8, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:  1.1min remaining:    0.0s
Some weights of the model checkpoint at dccuchile/bert-base-spanish-wwm-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not 

{'epoch': 2.0}
[CV]  attention_probs_dropout_prob=0.4626460576204746, hidden_dropout_prob=0.2866560163294204, learning_rate=3.5353403590329405e-06, num_train_epochs=2, per_device_train_batch_size=8, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.313, total=  49.2s
[CV] attention_probs_dropout_prob=0.4626460576204746, hidden_dropout_prob=0.2866560163294204, learning_rate=3.5353403590329405e-06, num_train_epochs=2, per_device_train_batch_size=8, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:  1.9min remaining:    0.0s
Some weights of the model checkpoint at dccuchile/bert-base-spanish-wwm-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not 

{'epoch': 2.0}
[CV]  attention_probs_dropout_prob=0.4626460576204746, hidden_dropout_prob=0.2866560163294204, learning_rate=3.5353403590329405e-06, num_train_epochs=2, per_device_train_batch_size=8, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.287, total=  49.7s
[CV] attention_probs_dropout_prob=0.4626460576204746, hidden_dropout_prob=0.2866560163294204, learning_rate=3.5353403590329405e-06, num_train_epochs=2, per_device_train_batch_size=8, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 


Some weights of the model checkpoint at dccuchile/bert-base-spanish-wwm-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at dccuchile/bert-base-spanish-wwm-cased a

{'epoch': 2.0}
[CV]  attention_probs_dropout_prob=0.4626460576204746, hidden_dropout_prob=0.2866560163294204, learning_rate=3.5353403590329405e-06, num_train_epochs=2, per_device_train_batch_size=8, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.283, total=  49.3s
[CV] attention_probs_dropout_prob=0.4626460576204746, hidden_dropout_prob=0.2866560163294204, learning_rate=3.5353403590329405e-06, num_train_epochs=2, per_device_train_batch_size=8, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 


Some weights of the model checkpoint at dccuchile/bert-base-spanish-wwm-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at dccuchile/bert-base-spanish-wwm-cased a

{'epoch': 2.0}
[CV]  attention_probs_dropout_prob=0.4626460576204746, hidden_dropout_prob=0.2866560163294204, learning_rate=3.5353403590329405e-06, num_train_epochs=2, per_device_train_batch_size=8, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.277, total=  48.8s
Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] attention_probs_dropout_prob=0.041508544006946244, hidden_dropout_prob=0.6683973763039132, learning_rate=7.488560492575054e-05, num_train_epochs=6, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:  4.3min finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
Some weights of the model checkpoint at dccuchile/bert-base-spanish-wwm-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassif

{'loss': 1.0813143310546875, 'learning_rate': 2.3220342612635823e-06, 'epoch': 5.813953488372093}
{'epoch': 6.0}
[CV]  attention_probs_dropout_prob=0.041508544006946244, hidden_dropout_prob=0.6683973763039132, learning_rate=7.488560492575054e-05, num_train_epochs=6, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.152, total= 2.1min
[CV] attention_probs_dropout_prob=0.041508544006946244, hidden_dropout_prob=0.6683973763039132, learning_rate=7.488560492575054e-05, num_train_epochs=6, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:  2.1min remaining:    0.0s
Some weights of the model checkpoint at dccuchile/bert-base-spanish-wwm-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not 

{'loss': 1.088072021484375, 'learning_rate': 2.3220342612635823e-06, 'epoch': 5.813953488372093}
{'epoch': 6.0}
[CV]  attention_probs_dropout_prob=0.041508544006946244, hidden_dropout_prob=0.6683973763039132, learning_rate=7.488560492575054e-05, num_train_epochs=6, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.156, total= 2.1min
[CV] attention_probs_dropout_prob=0.041508544006946244, hidden_dropout_prob=0.6683973763039132, learning_rate=7.488560492575054e-05, num_train_epochs=6, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:  4.2min remaining:    0.0s
Some weights of the model checkpoint at dccuchile/bert-base-spanish-wwm-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not 

{'loss': 1.0956588134765626, 'learning_rate': 2.3220342612635823e-06, 'epoch': 5.813953488372093}
{'epoch': 6.0}
[CV]  attention_probs_dropout_prob=0.041508544006946244, hidden_dropout_prob=0.6683973763039132, learning_rate=7.488560492575054e-05, num_train_epochs=6, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.156, total= 2.1min
[CV] attention_probs_dropout_prob=0.041508544006946244, hidden_dropout_prob=0.6683973763039132, learning_rate=7.488560492575054e-05, num_train_epochs=6, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 


Some weights of the model checkpoint at dccuchile/bert-base-spanish-wwm-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at dccuchile/bert-base-spanish-wwm-cased a

{'loss': 1.089074462890625, 'learning_rate': 2.3220342612635823e-06, 'epoch': 5.813953488372093}
{'epoch': 6.0}
[CV]  attention_probs_dropout_prob=0.041508544006946244, hidden_dropout_prob=0.6683973763039132, learning_rate=7.488560492575054e-05, num_train_epochs=6, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.155, total= 2.1min
[CV] attention_probs_dropout_prob=0.041508544006946244, hidden_dropout_prob=0.6683973763039132, learning_rate=7.488560492575054e-05, num_train_epochs=6, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 


Some weights of the model checkpoint at dccuchile/bert-base-spanish-wwm-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at dccuchile/bert-base-spanish-wwm-cased a

{'loss': 1.0919476318359376, 'learning_rate': 2.3220342612635823e-06, 'epoch': 5.813953488372093}
{'epoch': 6.0}
[CV]  attention_probs_dropout_prob=0.041508544006946244, hidden_dropout_prob=0.6683973763039132, learning_rate=7.488560492575054e-05, num_train_epochs=6, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.180, total= 2.1min
Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] attention_probs_dropout_prob=0.22314680177365268, hidden_dropout_prob=0.743192364069995, learning_rate=2.641952051079237e-05, num_train_epochs=9, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed: 10.6min finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
Some weights of the model checkpoint at dccuchile/bert-base-spanish-wwm-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClass

{'loss': 1.11713916015625, 'learning_rate': 9.352646795810219e-06, 'epoch': 5.813953488372093}
{'epoch': 9.0}
[CV]  attention_probs_dropout_prob=0.22314680177365268, hidden_dropout_prob=0.743192364069995, learning_rate=2.641952051079237e-05, num_train_epochs=9, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.166, total= 3.1min
[CV] attention_probs_dropout_prob=0.22314680177365268, hidden_dropout_prob=0.743192364069995, learning_rate=2.641952051079237e-05, num_train_epochs=9, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:  3.1min remaining:    0.0s
Some weights of the model checkpoint at dccuchile/bert-base-spanish-wwm-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were no

{'loss': 1.1155830078125, 'learning_rate': 9.352646795810219e-06, 'epoch': 5.813953488372093}
{'epoch': 9.0}
[CV]  attention_probs_dropout_prob=0.22314680177365268, hidden_dropout_prob=0.743192364069995, learning_rate=2.641952051079237e-05, num_train_epochs=9, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.194, total= 3.1min
[CV] attention_probs_dropout_prob=0.22314680177365268, hidden_dropout_prob=0.743192364069995, learning_rate=2.641952051079237e-05, num_train_epochs=9, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:  6.2min remaining:    0.0s
Some weights of the model checkpoint at dccuchile/bert-base-spanish-wwm-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were no

{'loss': 1.12138232421875, 'learning_rate': 9.352646795810219e-06, 'epoch': 5.813953488372093}
{'epoch': 9.0}
[CV]  attention_probs_dropout_prob=0.22314680177365268, hidden_dropout_prob=0.743192364069995, learning_rate=2.641952051079237e-05, num_train_epochs=9, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.222, total= 3.1min
[CV] attention_probs_dropout_prob=0.22314680177365268, hidden_dropout_prob=0.743192364069995, learning_rate=2.641952051079237e-05, num_train_epochs=9, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 


Some weights of the model checkpoint at dccuchile/bert-base-spanish-wwm-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at dccuchile/bert-base-spanish-wwm-uncas

{'loss': 1.1212049560546875, 'learning_rate': 9.352646795810219e-06, 'epoch': 5.813953488372093}
{'epoch': 9.0}
[CV]  attention_probs_dropout_prob=0.22314680177365268, hidden_dropout_prob=0.743192364069995, learning_rate=2.641952051079237e-05, num_train_epochs=9, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.211, total= 3.1min
[CV] attention_probs_dropout_prob=0.22314680177365268, hidden_dropout_prob=0.743192364069995, learning_rate=2.641952051079237e-05, num_train_epochs=9, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 


{'loss': 1.1248878173828125, 'learning_rate': 9.352646795810219e-06, 'epoch': 5.813953488372093}
{'epoch': 9.0}
[CV]  attention_probs_dropout_prob=0.22314680177365268, hidden_dropout_prob=0.743192364069995, learning_rate=2.641952051079237e-05, num_train_epochs=9, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.182, total= 3.1min
Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] attention_probs_dropout_prob=0.8685989173436767, hidden_dropout_prob=0.6623935066577347, learning_rate=1.0930694523889127e-06, num_train_epochs=4, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed: 15.4min finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
{'epoch': 4.0}
[CV]  attention_probs_dropout_prob=0.8685989173436767, hidden_dropout_prob=0.6623935066577347, learning_rate=1.0930694523889127e-06, num_train_epochs=4, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.281, total= 1.4min
[CV] attention_probs_dropout_prob=0.8685989173436767, hidden_dropout_prob=0.6623935066577347, learning_rate=1.0930694523889127e-06, num_train_epochs=4, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:  1.4min remaining:    0.0s
{'epoch': 4.0}
[CV]  attention_probs_dropout_prob=0.8685989173436767, hidden_dropout_prob=0.6623935066577347, learning_rate=1.0930694523889127e-06, num_train_epochs=4, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.185, total= 1.4min
[CV] attention_probs_dropout_prob=0.8685989173436767, hidden_dropout_prob=0.6623935066577347, learning_rate=1.0930694523889127e-06, num_train_epochs=4, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:  2.8min remaining:    0.0s
{'epoch': 4.0}
[CV]  attention_probs_dropout_prob=0.8685989173436767, hidden_dropout_prob=0.6623935066577347, learning_rate=1.0930694523889127e-06, num_train_epochs=4, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.183, total= 1.4min
[CV] attention_probs_dropout_prob=0.8685989173436767, hidden_dropout_prob=0.6623935066577347, learning_rate=1.0930694523889127e-06, num_train_epochs=4, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 


{'epoch': 4.0}
[CV]  attention_probs_dropout_prob=0.8685989173436767, hidden_dropout_prob=0.6623935066577347, learning_rate=1.0930694523889127e-06, num_train_epochs=4, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.286, total= 1.4min
[CV] attention_probs_dropout_prob=0.8685989173436767, hidden_dropout_prob=0.6623935066577347, learning_rate=1.0930694523889127e-06, num_train_epochs=4, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 


{'epoch': 4.0}
[CV]  attention_probs_dropout_prob=0.8685989173436767, hidden_dropout_prob=0.6623935066577347, learning_rate=1.0930694523889127e-06, num_train_epochs=4, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.295, total= 1.4min
Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] attention_probs_dropout_prob=0.3691514693845362, hidden_dropout_prob=0.7104066631507853, learning_rate=1.233186483718184e-06, num_train_epochs=9, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:  6.9min finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
Some weights of the model checkpoint at dccuchile/bert-base-spanish-wwm-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassif

{'epoch': 9.0}
[CV]  attention_probs_dropout_prob=0.3691514693845362, hidden_dropout_prob=0.7104066631507853, learning_rate=1.233186483718184e-06, num_train_epochs=9, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.163, total= 2.8min
[CV] attention_probs_dropout_prob=0.3691514693845362, hidden_dropout_prob=0.7104066631507853, learning_rate=1.233186483718184e-06, num_train_epochs=9, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:  2.8min remaining:    0.0s
{'epoch': 9.0}
[CV]  attention_probs_dropout_prob=0.3691514693845362, hidden_dropout_prob=0.7104066631507853, learning_rate=1.233186483718184e-06, num_train_epochs=9, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.292, total= 2.8min
[CV] attention_probs_dropout_prob=0.3691514693845362, hidden_dropout_prob=0.7104066631507853, learning_rate=1.233186483718184e-06, num_train_epochs=9, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:  5.6min remaining:    0.0s
{'epoch': 9.0}
[CV]  attention_probs_dropout_prob=0.3691514693845362, hidden_dropout_prob=0.7104066631507853, learning_rate=1.233186483718184e-06, num_train_epochs=9, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.260, total= 2.8min
[CV] attention_probs_dropout_prob=0.3691514693845362, hidden_dropout_prob=0.7104066631507853, learning_rate=1.233186483718184e-06, num_train_epochs=9, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 


{'epoch': 9.0}
[CV]  attention_probs_dropout_prob=0.3691514693845362, hidden_dropout_prob=0.7104066631507853, learning_rate=1.233186483718184e-06, num_train_epochs=9, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.176, total= 2.8min
[CV] attention_probs_dropout_prob=0.3691514693845362, hidden_dropout_prob=0.7104066631507853, learning_rate=1.233186483718184e-06, num_train_epochs=9, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 


{'epoch': 9.0}
[CV]  attention_probs_dropout_prob=0.3691514693845362, hidden_dropout_prob=0.7104066631507853, learning_rate=1.233186483718184e-06, num_train_epochs=9, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.230, total= 2.8min
Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] attention_probs_dropout_prob=0.03317383878440064, hidden_dropout_prob=0.3094794046302785, learning_rate=4.604040451225829e-05, num_train_epochs=3, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed: 14.0min finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
{'epoch': 3.0}
[CV]  attention_probs_dropout_prob=0.03317383878440064, hidden_dropout_prob=0.3094794046302785, learning_rate=4.604040451225829e-05, num_train_epochs=3, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.413, total= 1.0min
[CV] attention_probs_dropout_prob=0.03317383878440064, hidden_dropout_prob=0.3094794046302785, learning_rate=4.604040451225829e-05, num_train_epochs=3, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:  1.0min remaining:    0.0s
{'epoch': 3.0}
[CV]  attention_probs_dropout_prob=0.03317383878440064, hidden_dropout_prob=0.3094794046302785, learning_rate=4.604040451225829e-05, num_train_epochs=3, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.390, total= 1.0min
[CV] attention_probs_dropout_prob=0.03317383878440064, hidden_dropout_prob=0.3094794046302785, learning_rate=4.604040451225829e-05, num_train_epochs=3, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:  2.0min remaining:    0.0s
{'epoch': 3.0}
[CV]  attention_probs_dropout_prob=0.03317383878440064, hidden_dropout_prob=0.3094794046302785, learning_rate=4.604040451225829e-05, num_train_epochs=3, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.384, total= 1.0min
[CV] attention_probs_dropout_prob=0.03317383878440064, hidden_dropout_prob=0.3094794046302785, learning_rate=4.604040451225829e-05, num_train_epochs=3, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 


{'epoch': 3.0}
[CV]  attention_probs_dropout_prob=0.03317383878440064, hidden_dropout_prob=0.3094794046302785, learning_rate=4.604040451225829e-05, num_train_epochs=3, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.401, total= 1.0min
[CV] attention_probs_dropout_prob=0.03317383878440064, hidden_dropout_prob=0.3094794046302785, learning_rate=4.604040451225829e-05, num_train_epochs=3, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 


{'epoch': 3.0}
[CV]  attention_probs_dropout_prob=0.03317383878440064, hidden_dropout_prob=0.3094794046302785, learning_rate=4.604040451225829e-05, num_train_epochs=3, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.408, total= 1.0min
Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] attention_probs_dropout_prob=0.8431661252980369, hidden_dropout_prob=0.5125256937648924, learning_rate=2.8419114324804943e-05, num_train_epochs=2, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:  5.1min finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
{'epoch': 2.0}
[CV]  attention_probs_dropout_prob=0.8431661252980369, hidden_dropout_prob=0.5125256937648924, learning_rate=2.8419114324804943e-05, num_train_epochs=2, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.203, total=  44.6s
[CV] attention_probs_dropout_prob=0.8431661252980369, hidden_dropout_prob=0.5125256937648924, learning_rate=2.8419114324804943e-05, num_train_epochs=2, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:   44.6s remaining:    0.0s
{'epoch': 2.0}
[CV]  attention_probs_dropout_prob=0.8431661252980369, hidden_dropout_prob=0.5125256937648924, learning_rate=2.8419114324804943e-05, num_train_epochs=2, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.244, total=  44.2s
[CV] attention_probs_dropout_prob=0.8431661252980369, hidden_dropout_prob=0.5125256937648924, learning_rate=2.8419114324804943e-05, num_train_epochs=2, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:  1.5min remaining:    0.0s
{'epoch': 2.0}
[CV]  attention_probs_dropout_prob=0.8431661252980369, hidden_dropout_prob=0.5125256937648924, learning_rate=2.8419114324804943e-05, num_train_epochs=2, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.259, total=  44.4s
[CV] attention_probs_dropout_prob=0.8431661252980369, hidden_dropout_prob=0.5125256937648924, learning_rate=2.8419114324804943e-05, num_train_epochs=2, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 


{'epoch': 2.0}
[CV]  attention_probs_dropout_prob=0.8431661252980369, hidden_dropout_prob=0.5125256937648924, learning_rate=2.8419114324804943e-05, num_train_epochs=2, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.264, total=  44.3s
[CV] attention_probs_dropout_prob=0.8431661252980369, hidden_dropout_prob=0.5125256937648924, learning_rate=2.8419114324804943e-05, num_train_epochs=2, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 


{'epoch': 2.0}
[CV]  attention_probs_dropout_prob=0.8431661252980369, hidden_dropout_prob=0.5125256937648924, learning_rate=2.8419114324804943e-05, num_train_epochs=2, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.280, total=  44.8s
Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] attention_probs_dropout_prob=0.21035384276262642, hidden_dropout_prob=0.3577159636430006, learning_rate=9.098505378842472e-06, num_train_epochs=4, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:  3.7min finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
{'epoch': 4.0}
[CV]  attention_probs_dropout_prob=0.21035384276262642, hidden_dropout_prob=0.3577159636430006, learning_rate=9.098505378842472e-06, num_train_epochs=4, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.340, total= 1.4min
[CV] attention_probs_dropout_prob=0.21035384276262642, hidden_dropout_prob=0.3577159636430006, learning_rate=9.098505378842472e-06, num_train_epochs=4, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:  1.4min remaining:    0.0s

{'epoch': 4.0}
[CV]  attention_probs_dropout_prob=0.21035384276262642, hidden_dropout_prob=0.3577159636430006, learning_rate=9.098505378842472e-06, num_train_epochs=4, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.357, total= 1.4min
[CV] attention_probs_dropout_prob=0.21035384276262642, hidden_dropout_prob=0.3577159636430006, learning_rate=9.098505378842472e-06, num_train_epochs=4, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:  2.7min remaining:    0.0s

{'epoch': 4.0}
[CV]  attention_probs_dropout_prob=0.21035384276262642, hidden_dropout_prob=0.3577159636430006, learning_rate=9.098505378842472e-06, num_train_epochs=4, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.346, total= 1.4min
[CV] attention_probs_dropout_prob=0.21035384276262642, hidden_dropout_prob=0.3577159636430006, learning_rate=9.098505378842472e-06, num_train_epochs=4, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 



{'epoch': 4.0}
[CV]  attention_probs_dropout_prob=0.21035384276262642, hidden_dropout_prob=0.3577159636430006, learning_rate=9.098505378842472e-06, num_train_epochs=4, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.353, total= 1.4min
[CV] attention_probs_dropout_prob=0.21035384276262642, hidden_dropout_prob=0.3577159636430006, learning_rate=9.098505378842472e-06, num_train_epochs=4, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 



{'epoch': 4.0}
[CV]  attention_probs_dropout_prob=0.21035384276262642, hidden_dropout_prob=0.3577159636430006, learning_rate=9.098505378842472e-06, num_train_epochs=4, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.359, total= 1.4min


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:  6.9min finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] attention_probs_dropout_prob=0.6629296110884636, hidden_dropout_prob=0.4455171842906857, learning_rate=1.653381915174362e-06, num_train_epochs=8, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.

{'epoch': 8.0}
[CV]  attention_probs_dropout_prob=0.6629296110884636, hidden_dropout_prob=0.4455171842906857, learning_rate=1.653381915174362e-06, num_train_epochs=8, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.253, total= 2.5min
[CV] attention_probs_dropout_prob=0.6629296110884636, hidden_dropout_prob=0.4455171842906857, learning_rate=1.653381915174362e-06, num_train_epochs=8, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:  2.5min remaining:    0.0s

{'epoch': 8.0}
[CV]  attention_probs_dropout_prob=0.6629296110884636, hidden_dropout_prob=0.4455171842906857, learning_rate=1.653381915174362e-06, num_train_epochs=8, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.243, total= 2.5min
[CV] attention_probs_dropout_prob=0.6629296110884636, hidden_dropout_prob=0.4455171842906857, learning_rate=1.653381915174362e-06, num_train_epochs=8, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:  5.0min remaining:    0.0s

{'epoch': 8.0}
[CV]  attention_probs_dropout_prob=0.6629296110884636, hidden_dropout_prob=0.4455171842906857, learning_rate=1.653381915174362e-06, num_train_epochs=8, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.248, total= 2.5min
[CV] attention_probs_dropout_prob=0.6629296110884636, hidden_dropout_prob=0.4455171842906857, learning_rate=1.653381915174362e-06, num_train_epochs=8, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 



{'epoch': 8.0}
[CV]  attention_probs_dropout_prob=0.6629296110884636, hidden_dropout_prob=0.4455171842906857, learning_rate=1.653381915174362e-06, num_train_epochs=8, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.185, total= 2.5min
[CV] attention_probs_dropout_prob=0.6629296110884636, hidden_dropout_prob=0.4455171842906857, learning_rate=1.653381915174362e-06, num_train_epochs=8, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 



{'epoch': 8.0}
[CV]  attention_probs_dropout_prob=0.6629296110884636, hidden_dropout_prob=0.4455171842906857, learning_rate=1.653381915174362e-06, num_train_epochs=8, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.190, total= 2.5min


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed: 12.6min finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] attention_probs_dropout_prob=0.0, hidden_dropout_prob=0.04345946292316195, learning_rate=7.488774549005809e-06, num_train_epochs=4, per_device_train_batch_size=8, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.

{'loss': 0.6683850708007812, 'learning_rate': 2.04635118490275e-06, 'epoch': 2.9069767441860463}
{'epoch': 4.0}
[CV]  attention_probs_dropout_prob=0.0, hidden_dropout_prob=0.04345946292316195, learning_rate=7.488774549005809e-06, num_train_epochs=4, per_device_train_batch_size=8, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.414, total= 1.6min
[CV] attention_probs_dropout_prob=0.0, hidden_dropout_prob=0.04345946292316195, learning_rate=7.488774549005809e-06, num_train_epochs=4, per_device_train_batch_size=8, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:  1.6min remaining:    0.0s

{'loss': 0.6699216918945312, 'learning_rate': 2.04635118490275e-06, 'epoch': 2.9069767441860463}
{'epoch': 4.0}
[CV]  attention_probs_dropout_prob=0.0, hidden_dropout_prob=0.04345946292316195, learning_rate=7.488774549005809e-06, num_train_epochs=4, per_device_train_batch_size=8, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.406, total= 1.6min
[CV] attention_probs_dropout_prob=0.0, hidden_dropout_prob=0.04345946292316195, learning_rate=7.488774549005809e-06, num_train_epochs=4, per_device_train_batch_size=8, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:  3.2min remaining:    0.0s

{'loss': 0.68366162109375, 'learning_rate': 2.04635118490275e-06, 'epoch': 2.9069767441860463}
{'epoch': 4.0}
[CV]  attention_probs_dropout_prob=0.0, hidden_dropout_prob=0.04345946292316195, learning_rate=7.488774549005809e-06, num_train_epochs=4, per_device_train_batch_size=8, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.466, total= 1.6min
[CV] attention_probs_dropout_prob=0.0, hidden_dropout_prob=0.04345946292316195, learning_rate=7.488774549005809e-06, num_train_epochs=4, per_device_train_batch_size=8, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 



{'loss': 0.699948486328125, 'learning_rate': 2.04635118490275e-06, 'epoch': 2.9069767441860463}
{'epoch': 4.0}
[CV]  attention_probs_dropout_prob=0.0, hidden_dropout_prob=0.04345946292316195, learning_rate=7.488774549005809e-06, num_train_epochs=4, per_device_train_batch_size=8, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.419, total= 1.6min
[CV] attention_probs_dropout_prob=0.0, hidden_dropout_prob=0.04345946292316195, learning_rate=7.488774549005809e-06, num_train_epochs=4, per_device_train_batch_size=8, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 



{'loss': 0.7024149780273438, 'learning_rate': 2.04635118490275e-06, 'epoch': 2.9069767441860463}
{'epoch': 4.0}
[CV]  attention_probs_dropout_prob=0.0, hidden_dropout_prob=0.04345946292316195, learning_rate=7.488774549005809e-06, num_train_epochs=4, per_device_train_batch_size=8, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.420, total= 1.8min


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:  8.3min finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] attention_probs_dropout_prob=0.0, hidden_dropout_prob=0.14780354243941785, learning_rate=3.270187943309578e-05, num_train_epochs=2, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.

{'epoch': 2.0}
[CV]  attention_probs_dropout_prob=0.0, hidden_dropout_prob=0.14780354243941785, learning_rate=3.270187943309578e-05, num_train_epochs=2, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.405, total=  45.1s
[CV] attention_probs_dropout_prob=0.0, hidden_dropout_prob=0.14780354243941785, learning_rate=3.270187943309578e-05, num_train_epochs=2, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:   45.1s remaining:    0.0s

{'epoch': 2.0}
[CV]  attention_probs_dropout_prob=0.0, hidden_dropout_prob=0.14780354243941785, learning_rate=3.270187943309578e-05, num_train_epochs=2, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.405, total=  44.2s
[CV] attention_probs_dropout_prob=0.0, hidden_dropout_prob=0.14780354243941785, learning_rate=3.270187943309578e-05, num_train_epochs=2, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:  1.5min remaining:    0.0s

{'epoch': 2.0}
[CV]  attention_probs_dropout_prob=0.0, hidden_dropout_prob=0.14780354243941785, learning_rate=3.270187943309578e-05, num_train_epochs=2, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.418, total=  44.4s
[CV] attention_probs_dropout_prob=0.0, hidden_dropout_prob=0.14780354243941785, learning_rate=3.270187943309578e-05, num_train_epochs=2, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 



{'epoch': 2.0}
[CV]  attention_probs_dropout_prob=0.0, hidden_dropout_prob=0.14780354243941785, learning_rate=3.270187943309578e-05, num_train_epochs=2, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.420, total=  44.7s
[CV] attention_probs_dropout_prob=0.0, hidden_dropout_prob=0.14780354243941785, learning_rate=3.270187943309578e-05, num_train_epochs=2, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 



{'epoch': 2.0}
[CV]  attention_probs_dropout_prob=0.0, hidden_dropout_prob=0.14780354243941785, learning_rate=3.270187943309578e-05, num_train_epochs=2, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.422, total=  44.5s


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:  3.7min finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] attention_probs_dropout_prob=0.25058398844147933, hidden_dropout_prob=0.0, learning_rate=5.893355059857079e-06, num_train_epochs=10, per_device_train_batch_size=64, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.

{'epoch': 10.0}
[CV]  attention_probs_dropout_prob=0.25058398844147933, hidden_dropout_prob=0.0, learning_rate=5.893355059857079e-06, num_train_epochs=10, per_device_train_batch_size=64, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.370, total= 3.0min
[CV] attention_probs_dropout_prob=0.25058398844147933, hidden_dropout_prob=0.0, learning_rate=5.893355059857079e-06, num_train_epochs=10, per_device_train_batch_size=64, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:  3.0min remaining:    0.0s

{'epoch': 10.0}
[CV]  attention_probs_dropout_prob=0.25058398844147933, hidden_dropout_prob=0.0, learning_rate=5.893355059857079e-06, num_train_epochs=10, per_device_train_batch_size=64, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.364, total= 3.0min
[CV] attention_probs_dropout_prob=0.25058398844147933, hidden_dropout_prob=0.0, learning_rate=5.893355059857079e-06, num_train_epochs=10, per_device_train_batch_size=64, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:  6.0min remaining:    0.0s

{'epoch': 10.0}
[CV]  attention_probs_dropout_prob=0.25058398844147933, hidden_dropout_prob=0.0, learning_rate=5.893355059857079e-06, num_train_epochs=10, per_device_train_batch_size=64, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.346, total= 3.0min
[CV] attention_probs_dropout_prob=0.25058398844147933, hidden_dropout_prob=0.0, learning_rate=5.893355059857079e-06, num_train_epochs=10, per_device_train_batch_size=64, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 



{'epoch': 10.0}
[CV]  attention_probs_dropout_prob=0.25058398844147933, hidden_dropout_prob=0.0, learning_rate=5.893355059857079e-06, num_train_epochs=10, per_device_train_batch_size=64, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.390, total= 3.0min
[CV] attention_probs_dropout_prob=0.25058398844147933, hidden_dropout_prob=0.0, learning_rate=5.893355059857079e-06, num_train_epochs=10, per_device_train_batch_size=64, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 



{'epoch': 10.0}
[CV]  attention_probs_dropout_prob=0.25058398844147933, hidden_dropout_prob=0.0, learning_rate=5.893355059857079e-06, num_train_epochs=10, per_device_train_batch_size=64, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.386, total= 3.0min


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed: 15.1min finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] attention_probs_dropout_prob=0.0, hidden_dropout_prob=0.0, learning_rate=0.0001, num_train_epochs=1, per_device_train_batch_size=64, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.

{'epoch': 1.0}
[CV]  attention_probs_dropout_prob=0.0, hidden_dropout_prob=0.0, learning_rate=0.0001, num_train_epochs=1, per_device_train_batch_size=64, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.361, total=  23.7s
[CV] attention_probs_dropout_prob=0.0, hidden_dropout_prob=0.0, learning_rate=0.0001, num_train_epochs=1, per_device_train_batch_size=64, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:   23.7s remaining:    0.0s

{'epoch': 1.0}
[CV]  attention_probs_dropout_prob=0.0, hidden_dropout_prob=0.0, learning_rate=0.0001, num_train_epochs=1, per_device_train_batch_size=64, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.367, total=  23.5s
[CV] attention_probs_dropout_prob=0.0, hidden_dropout_prob=0.0, learning_rate=0.0001, num_train_epochs=1, per_device_train_batch_size=64, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:   47.2s remaining:    0.0s

{'epoch': 1.0}
[CV]  attention_probs_dropout_prob=0.0, hidden_dropout_prob=0.0, learning_rate=0.0001, num_train_epochs=1, per_device_train_batch_size=64, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.381, total=  23.0s
[CV] attention_probs_dropout_prob=0.0, hidden_dropout_prob=0.0, learning_rate=0.0001, num_train_epochs=1, per_device_train_batch_size=64, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 



{'epoch': 1.0}
[CV]  attention_probs_dropout_prob=0.0, hidden_dropout_prob=0.0, learning_rate=0.0001, num_train_epochs=1, per_device_train_batch_size=64, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.381, total=  23.1s
[CV] attention_probs_dropout_prob=0.0, hidden_dropout_prob=0.0, learning_rate=0.0001, num_train_epochs=1, per_device_train_batch_size=64, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 



{'epoch': 1.0}
[CV]  attention_probs_dropout_prob=0.0, hidden_dropout_prob=0.0, learning_rate=0.0001, num_train_epochs=1, per_device_train_batch_size=64, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.409, total=  23.6s


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:  1.9min finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] attention_probs_dropout_prob=0.0, hidden_dropout_prob=0.2133450718085503, learning_rate=1e-06, num_train_epochs=7, per_device_train_batch_size=64, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
Some weights of the model checkpoint at dccuchile/bert-base-spanish-wwm-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not

{'epoch': 7.0}
[CV]  attention_probs_dropout_prob=0.0, hidden_dropout_prob=0.2133450718085503, learning_rate=1e-06, num_train_epochs=7, per_device_train_batch_size=64, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.319, total= 2.1min
[CV] attention_probs_dropout_prob=0.0, hidden_dropout_prob=0.2133450718085503, learning_rate=1e-06, num_train_epochs=7, per_device_train_batch_size=64, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:  2.1min remaining:    0.0s

{'epoch': 7.0}
[CV]  attention_probs_dropout_prob=0.0, hidden_dropout_prob=0.2133450718085503, learning_rate=1e-06, num_train_epochs=7, per_device_train_batch_size=64, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.264, total= 2.1min
[CV] attention_probs_dropout_prob=0.0, hidden_dropout_prob=0.2133450718085503, learning_rate=1e-06, num_train_epochs=7, per_device_train_batch_size=64, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:  4.3min remaining:    0.0s

{'epoch': 7.0}
[CV]  attention_probs_dropout_prob=0.0, hidden_dropout_prob=0.2133450718085503, learning_rate=1e-06, num_train_epochs=7, per_device_train_batch_size=64, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.271, total= 2.1min
[CV] attention_probs_dropout_prob=0.0, hidden_dropout_prob=0.2133450718085503, learning_rate=1e-06, num_train_epochs=7, per_device_train_batch_size=64, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 



{'epoch': 7.0}
[CV]  attention_probs_dropout_prob=0.0, hidden_dropout_prob=0.2133450718085503, learning_rate=1e-06, num_train_epochs=7, per_device_train_batch_size=64, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.176, total= 2.2min
[CV] attention_probs_dropout_prob=0.0, hidden_dropout_prob=0.2133450718085503, learning_rate=1e-06, num_train_epochs=7, per_device_train_batch_size=64, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 



{'epoch': 7.0}
[CV]  attention_probs_dropout_prob=0.0, hidden_dropout_prob=0.2133450718085503, learning_rate=1e-06, num_train_epochs=7, per_device_train_batch_size=64, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.162, total= 2.1min


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed: 10.7min finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] attention_probs_dropout_prob=0.5811198071610473, hidden_dropout_prob=0.04922281119176865, learning_rate=2.139750698977984e-05, num_train_epochs=9, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.

{'loss': 0.9052290649414062, 'learning_rate': 1.7931762417415112e-05, 'epoch': 1.4577259475218658}
{'loss': 0.7354226684570313, 'learning_rate': 1.4466017845050382e-05, 'epoch': 2.9154518950437316}
{'loss': 0.5259566650390625, 'learning_rate': 1.1000273272685654e-05, 'epoch': 4.373177842565598}
{'loss': 0.36916845703125, 'learning_rate': 7.534528700320923e-06, 'epoch': 5.830903790087463}
{'loss': 0.231684326171875, 'learning_rate': 4.068784127956192e-06, 'epoch': 7.288629737609329}
{'loss': 0.184382568359375, 'learning_rate': 6.03039555591463e-07, 'epoch': 8.746355685131196}
{'epoch': 9.0}
[CV]  attention_probs_dropout_prob=0.5811198071610473, hidden_dropout_prob=0.04922281119176865, learning_rate=2.139750698977984e-05, num_train_epochs=9, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.538, total= 5.0min
[CV] attention_probs_dropout_prob=0.5811198071610473, hidden_dropout_prob=0.04922281119176865, learning_rate=2.139750698977984e-05, num_

[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:  5.0min remaining:    0.0s

{'loss': 0.8887540283203125, 'learning_rate': 1.7931762417415112e-05, 'epoch': 1.4577259475218658}
{'loss': 0.737337158203125, 'learning_rate': 1.4466017845050382e-05, 'epoch': 2.9154518950437316}
{'loss': 0.5693077392578125, 'learning_rate': 1.1000273272685654e-05, 'epoch': 4.373177842565598}
{'loss': 0.390347412109375, 'learning_rate': 7.534528700320923e-06, 'epoch': 5.830903790087463}
{'loss': 0.24666650390625, 'learning_rate': 4.068784127956192e-06, 'epoch': 7.288629737609329}
{'loss': 0.166180419921875, 'learning_rate': 6.03039555591463e-07, 'epoch': 8.746355685131196}
{'epoch': 9.0}
[CV]  attention_probs_dropout_prob=0.5811198071610473, hidden_dropout_prob=0.04922281119176865, learning_rate=2.139750698977984e-05, num_train_epochs=9, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.591, total= 5.0min
[CV] attention_probs_dropout_prob=0.5811198071610473, hidden_dropout_prob=0.04922281119176865, learning_rate=2.139750698977984e-05, num_t

[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed: 10.0min remaining:    0.0s

{'loss': 0.87962353515625, 'learning_rate': 1.7931762417415112e-05, 'epoch': 1.4577259475218658}
{'loss': 0.661344482421875, 'learning_rate': 1.4466017845050382e-05, 'epoch': 2.9154518950437316}
{'loss': 0.4279091796875, 'learning_rate': 1.1000273272685654e-05, 'epoch': 4.373177842565598}
{'loss': 0.3235322265625, 'learning_rate': 7.534528700320923e-06, 'epoch': 5.830903790087463}
{'loss': 0.20436328125, 'learning_rate': 4.068784127956192e-06, 'epoch': 7.288629737609329}
{'loss': 0.13380517578125, 'learning_rate': 6.03039555591463e-07, 'epoch': 8.746355685131196}
{'epoch': 9.0}
[CV]  attention_probs_dropout_prob=0.5811198071610473, hidden_dropout_prob=0.04922281119176865, learning_rate=2.139750698977984e-05, num_train_epochs=9, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.642, total= 5.1min
[CV] attention_probs_dropout_prob=0.5811198071610473, hidden_dropout_prob=0.04922281119176865, learning_rate=2.139750698977984e-05, num_train_epochs


{'loss': 0.8928929443359375, 'learning_rate': 1.7931762417415112e-05, 'epoch': 1.4577259475218658}
{'loss': 0.7714669189453125, 'learning_rate': 1.4466017845050382e-05, 'epoch': 2.9154518950437316}
{'loss': 0.5113388671875, 'learning_rate': 1.1000273272685654e-05, 'epoch': 4.373177842565598}
{'loss': 0.360656005859375, 'learning_rate': 7.534528700320923e-06, 'epoch': 5.830903790087463}
{'loss': 0.2364052734375, 'learning_rate': 4.068784127956192e-06, 'epoch': 7.288629737609329}
{'loss': 0.169201416015625, 'learning_rate': 6.03039555591463e-07, 'epoch': 8.746355685131196}
{'epoch': 9.0}
[CV]  attention_probs_dropout_prob=0.5811198071610473, hidden_dropout_prob=0.04922281119176865, learning_rate=2.139750698977984e-05, num_train_epochs=9, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.665, total= 5.1min
[CV] attention_probs_dropout_prob=0.5811198071610473, hidden_dropout_prob=0.04922281119176865, learning_rate=2.139750698977984e-05, num_trai


{'loss': 0.8764473876953125, 'learning_rate': 1.7931762417415112e-05, 'epoch': 1.4577259475218658}
{'loss': 0.75494970703125, 'learning_rate': 1.4466017845050382e-05, 'epoch': 2.9154518950437316}
{'loss': 0.4814168701171875, 'learning_rate': 1.1000273272685654e-05, 'epoch': 4.373177842565598}
{'loss': 0.287920654296875, 'learning_rate': 7.534528700320923e-06, 'epoch': 5.830903790087463}
{'loss': 0.1749130859375, 'learning_rate': 4.068784127956192e-06, 'epoch': 7.288629737609329}
{'loss': 0.142530517578125, 'learning_rate': 6.03039555591463e-07, 'epoch': 8.746355685131196}
{'epoch': 9.0}
[CV]  attention_probs_dropout_prob=0.5811198071610473, hidden_dropout_prob=0.04922281119176865, learning_rate=2.139750698977984e-05, num_train_epochs=9, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.643, total= 5.0min


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed: 25.2min finished
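
The `learning_rate` values logged during the `num_train_epochs=9` run above appear consistent with a linear decay from the initial rate to zero, with no warmup. The sketch below checks this against the first fold's log. It is only a consistency check under stated assumptions, not the notebook's own code: the logging interval of 500 optimizer steps and the resulting 343 steps per epoch are inferred from the logged `epoch` value, and `linear_decay_lr` is a hypothetical helper written for this illustration.

```python
import math

# Values copied from the log above (first fold, num_train_epochs=9, batch size 4).
BASE_LR = 2.139750698977984e-05
NUM_EPOCHS = 9

# Assumption: one log entry every 500 optimizer steps.
# The first entry reports epoch 1.4577259475218658, which implies 343 steps per epoch.
LOG_EVERY = 500
STEPS_PER_EPOCH = round(LOG_EVERY / 1.4577259475218658)
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS

# Logged learning rates, keyed by the (assumed) optimizer step.
LOGGED = {
    500: 1.7931762417415112e-05,
    1000: 1.4466017845050382e-05,
    1500: 1.1000273272685654e-05,
    2000: 7.534528700320923e-06,
    2500: 4.068784127956192e-06,
    3000: 6.03039555591463e-07,
}


def linear_decay_lr(base_lr, step, total_steps):
    """Hypothetical helper: linear decay from base_lr to 0, without warmup."""
    return base_lr * max(0.0, (total_steps - step) / total_steps)


# Compare the reconstructed schedule with each logged value.
for step, logged_lr in LOGGED.items():
    expected = linear_decay_lr(BASE_LR, step, TOTAL_STEPS)
    matches = math.isclose(expected, logged_lr, rel_tol=1e-6)
    print(f"step {step}: expected {expected:.6e}, logged {logged_lr:.6e}, match={matches}")
```

If the assumptions hold, the rate reaches almost zero by the last logged step, so the later epochs of this run contribute only very small updates.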


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] attention_probs_dropout_prob=0.6057180619823983, hidden_dropout_prob=0.04161782752970779, learning_rate=3.283134818290965e-05, num_train_epochs=10, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.

{'loss': 0.8865775146484375, 'learning_rate': 2.8045437369074426e-05, 'epoch': 1.4577259475218658}
{'loss': 0.7164647216796876, 'learning_rate': 2.3259526555239202e-05, 'epoch': 2.9154518950437316}
{'loss': 0.5023603515625, 'learning_rate': 1.8473615741403974e-05, 'epoch': 4.373177842565598}
{'loss': 0.28520458984375, 'learning_rate': 1.3687704927568749e-05, 'epoch': 5.830903790087463}
{'loss': 0.1622255859375, 'learning_rate': 8.901794113733521e-06, 'epoch': 7.288629737609329}
{'loss': 0.08736083984375, 'learning_rate': 4.115883299898295e-06, 'epoch': 8.746355685131196}
{'epoch': 10.0}
[CV]  attention_probs_dropout_prob=0.6057180619823983, hidden_dropout_prob=0.04161782752970779, learning_rate=3.283134818290965e-05, num_train_epochs=10, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.624, total= 5.5min
[CV] attention_probs_dropout_prob=0.6057180619823983, hidden_dropout_prob=0.04161782752970779, learning_rate=3.283134818290965e-05, num_tr

[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:  5.5min remaining:    0.0s

{'loss': 0.922276123046875, 'learning_rate': 2.8045437369074426e-05, 'epoch': 1.4577259475218658}
{'loss': 0.7674910888671875, 'learning_rate': 2.3259526555239202e-05, 'epoch': 2.9154518950437316}
{'loss': 0.6139119873046875, 'learning_rate': 1.8473615741403974e-05, 'epoch': 4.373177842565598}
{'loss': 0.4315439453125, 'learning_rate': 1.3687704927568749e-05, 'epoch': 5.830903790087463}
{'loss': 0.28943505859375, 'learning_rate': 8.901794113733521e-06, 'epoch': 7.288629737609329}
{'loss': 0.1570849609375, 'learning_rate': 4.115883299898295e-06, 'epoch': 8.746355685131196}
{'epoch': 10.0}
[CV]  attention_probs_dropout_prob=0.6057180619823983, hidden_dropout_prob=0.04161782752970779, learning_rate=3.283134818290965e-05, num_train_epochs=10, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.655, total= 5.5min
[CV] attention_probs_dropout_prob=0.6057180619823983, hidden_dropout_prob=0.04161782752970779, learning_rate=3.283134818290965e-05, num_t

[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed: 11.0min remaining:    0.0s

{'loss': 0.9309478149414062, 'learning_rate': 2.8045437369074426e-05, 'epoch': 1.4577259475218658}
{'loss': 0.7310270385742188, 'learning_rate': 2.3259526555239202e-05, 'epoch': 2.9154518950437316}
{'loss': 0.4825712890625, 'learning_rate': 1.8473615741403974e-05, 'epoch': 4.373177842565598}
{'loss': 0.34726953125, 'learning_rate': 1.3687704927568749e-05, 'epoch': 5.830903790087463}
{'loss': 0.22194482421875, 'learning_rate': 8.901794113733521e-06, 'epoch': 7.288629737609329}
{'loss': 0.144838134765625, 'learning_rate': 4.115883299898295e-06, 'epoch': 8.746355685131196}
{'epoch': 10.0}
[CV]  attention_probs_dropout_prob=0.6057180619823983, hidden_dropout_prob=0.04161782752970779, learning_rate=3.283134818290965e-05, num_train_epochs=10, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.651, total= 5.5min
[CV] attention_probs_dropout_prob=0.6057180619823983, hidden_dropout_prob=0.04161782752970779, learning_rate=3.283134818290965e-05, num_tra


{'loss': 0.904473876953125, 'learning_rate': 2.8045437369074426e-05, 'epoch': 1.4577259475218658}
{'loss': 0.789681396484375, 'learning_rate': 2.3259526555239202e-05, 'epoch': 2.9154518950437316}
{'loss': 0.52916943359375, 'learning_rate': 1.8473615741403974e-05, 'epoch': 4.373177842565598}
{'loss': 0.380943115234375, 'learning_rate': 1.3687704927568749e-05, 'epoch': 5.830903790087463}
{'loss': 0.185896484375, 'learning_rate': 8.901794113733521e-06, 'epoch': 7.288629737609329}
{'loss': 0.1382666015625, 'learning_rate': 4.115883299898295e-06, 'epoch': 8.746355685131196}
{'epoch': 10.0}
[CV]  attention_probs_dropout_prob=0.6057180619823983, hidden_dropout_prob=0.04161782752970779, learning_rate=3.283134818290965e-05, num_train_epochs=10, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.614, total= 5.5min
[CV] attention_probs_dropout_prob=0.6057180619823983, hidden_dropout_prob=0.04161782752970779, learning_rate=3.283134818290965e-05, num_trai


{'loss': 0.9049329833984375, 'learning_rate': 2.8045437369074426e-05, 'epoch': 1.4577259475218658}
{'loss': 0.80059033203125, 'learning_rate': 2.3259526555239202e-05, 'epoch': 2.9154518950437316}
{'loss': 0.6377781982421875, 'learning_rate': 1.8473615741403974e-05, 'epoch': 4.373177842565598}
{'loss': 0.40250634765625, 'learning_rate': 1.3687704927568749e-05, 'epoch': 5.830903790087463}
{'loss': 0.286391357421875, 'learning_rate': 8.901794113733521e-06, 'epoch': 7.288629737609329}
{'loss': 0.17134375, 'learning_rate': 4.115883299898295e-06, 'epoch': 8.746355685131196}
{'epoch': 10.0}
[CV]  attention_probs_dropout_prob=0.6057180619823983, hidden_dropout_prob=0.04161782752970779, learning_rate=3.283134818290965e-05, num_train_epochs=10, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.619, total= 5.5min


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed: 27.5min finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] attention_probs_dropout_prob=0.9, hidden_dropout_prob=0.1531449101061927, learning_rate=0.0001, num_train_epochs=10, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.

{'loss': 1.0764429931640624, 'learning_rate': 8.542274052478134e-05, 'epoch': 1.4577259475218658}
{'loss': 1.0620723876953124, 'learning_rate': 7.08454810495627e-05, 'epoch': 2.9154518950437316}
{'loss': 1.062134765625, 'learning_rate': 5.626822157434403e-05, 'epoch': 4.373177842565598}
{'loss': 1.042786376953125, 'learning_rate': 4.1690962099125366e-05, 'epoch': 5.830903790087463}
{'loss': 1.0551962890625, 'learning_rate': 2.7113702623906705e-05, 'epoch': 7.288629737609329}
{'loss': 1.04329345703125, 'learning_rate': 1.2536443148688048e-05, 'epoch': 8.746355685131196}
{'epoch': 10.0}
[CV]  attention_probs_dropout_prob=0.9, hidden_dropout_prob=0.1531449101061927, learning_rate=0.0001, num_train_epochs=10, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.156, total= 5.6min
[CV] attention_probs_dropout_prob=0.9, hidden_dropout_prob=0.1531449101061927, learning_rate=0.0001, num_train_epochs=10, per_device_train_batch_size=4, pretrained_model=d

[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:  5.6min remaining:    0.0s

{'loss': 1.0989385986328124, 'learning_rate': 8.542274052478134e-05, 'epoch': 1.4577259475218658}
{'loss': 1.0586090087890625, 'learning_rate': 7.08454810495627e-05, 'epoch': 2.9154518950437316}
{'loss': 1.057703369140625, 'learning_rate': 5.626822157434403e-05, 'epoch': 4.373177842565598}
{'loss': 1.05137548828125, 'learning_rate': 4.1690962099125366e-05, 'epoch': 5.830903790087463}
{'loss': 1.0471904296875, 'learning_rate': 2.7113702623906705e-05, 'epoch': 7.288629737609329}
{'loss': 1.04178515625, 'learning_rate': 1.2536443148688048e-05, 'epoch': 8.746355685131196}
{'epoch': 10.0}
[CV]  attention_probs_dropout_prob=0.9, hidden_dropout_prob=0.1531449101061927, learning_rate=0.0001, num_train_epochs=10, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.153, total= 5.6min
[CV] attention_probs_dropout_prob=0.9, hidden_dropout_prob=0.1531449101061927, learning_rate=0.0001, num_train_epochs=10, per_device_train_batch_size=4, pretrained_model=dc

[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed: 11.2min remaining:    0.0s

{'loss': 1.10276025390625, 'learning_rate': 8.542274052478134e-05, 'epoch': 1.4577259475218658}
{'loss': 1.090162841796875, 'learning_rate': 7.08454810495627e-05, 'epoch': 2.9154518950437316}
{'loss': 1.079825927734375, 'learning_rate': 5.626822157434403e-05, 'epoch': 4.373177842565598}
{'loss': 1.06939990234375, 'learning_rate': 4.1690962099125366e-05, 'epoch': 5.830903790087463}
{'loss': 1.05262744140625, 'learning_rate': 2.7113702623906705e-05, 'epoch': 7.288629737609329}
{'loss': 1.0617705078125, 'learning_rate': 1.2536443148688048e-05, 'epoch': 8.746355685131196}
{'epoch': 10.0}
[CV]  attention_probs_dropout_prob=0.9, hidden_dropout_prob=0.1531449101061927, learning_rate=0.0001, num_train_epochs=10, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.153, total= 5.6min
[CV] attention_probs_dropout_prob=0.9, hidden_dropout_prob=0.1531449101061927, learning_rate=0.0001, num_train_epochs=10, per_device_train_batch_size=4, pretrained_model=dc


{'loss': 1.0974371337890625, 'learning_rate': 8.542274052478134e-05, 'epoch': 1.4577259475218658}
{'loss': 1.0936380615234376, 'learning_rate': 7.08454810495627e-05, 'epoch': 2.9154518950437316}
{'loss': 1.07395263671875, 'learning_rate': 5.626822157434403e-05, 'epoch': 4.373177842565598}
{'loss': 1.049650390625, 'learning_rate': 4.1690962099125366e-05, 'epoch': 5.830903790087463}
{'loss': 1.05623876953125, 'learning_rate': 2.7113702623906705e-05, 'epoch': 7.288629737609329}
{'loss': 1.03886572265625, 'learning_rate': 1.2536443148688048e-05, 'epoch': 8.746355685131196}
{'epoch': 10.0}
[CV]  attention_probs_dropout_prob=0.9, hidden_dropout_prob=0.1531449101061927, learning_rate=0.0001, num_train_epochs=10, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.242, total= 5.6min
[CV] attention_probs_dropout_prob=0.9, hidden_dropout_prob=0.1531449101061927, learning_rate=0.0001, num_train_epochs=10, per_device_train_batch_size=4, pretrained_model=d


{'loss': 1.073612548828125, 'learning_rate': 8.542274052478134e-05, 'epoch': 1.4577259475218658}
{'loss': 1.0978017578125, 'learning_rate': 7.08454810495627e-05, 'epoch': 2.9154518950437316}
{'loss': 1.074471435546875, 'learning_rate': 5.626822157434403e-05, 'epoch': 4.373177842565598}
{'loss': 1.07447314453125, 'learning_rate': 4.1690962099125366e-05, 'epoch': 5.830903790087463}
{'loss': 1.0431787109375, 'learning_rate': 2.7113702623906705e-05, 'epoch': 7.288629737609329}
{'loss': 1.0462587890625, 'learning_rate': 1.2536443148688048e-05, 'epoch': 8.746355685131196}
{'epoch': 10.0}
[CV]  attention_probs_dropout_prob=0.9, hidden_dropout_prob=0.1531449101061927, learning_rate=0.0001, num_train_epochs=10, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.155, total= 5.6min


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed: 28.0min finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] attention_probs_dropout_prob=0.5485102676881799, hidden_dropout_prob=0.0, learning_rate=7.120945524538877e-05, num_train_epochs=10, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.

{'loss': 0.5979151000976562, 'learning_rate': 2.980860917248832e-05, 'epoch': 5.813953488372093}
{'epoch': 10.0}
[CV]  attention_probs_dropout_prob=0.5485102676881799, hidden_dropout_prob=0.0, learning_rate=7.120945524538877e-05, num_train_epochs=10, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.545, total= 3.4min
[CV] attention_probs_dropout_prob=0.5485102676881799, hidden_dropout_prob=0.0, learning_rate=7.120945524538877e-05, num_train_epochs=10, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:  3.4min remaining:    0.0s

{'loss': 0.43985409545898435, 'learning_rate': 2.980860917248832e-05, 'epoch': 5.813953488372093}
{'epoch': 10.0}
[CV]  attention_probs_dropout_prob=0.5485102676881799, hidden_dropout_prob=0.0, learning_rate=7.120945524538877e-05, num_train_epochs=10, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.740, total= 3.4min
[CV] attention_probs_dropout_prob=0.5485102676881799, hidden_dropout_prob=0.0, learning_rate=7.120945524538877e-05, num_train_epochs=10, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:  6.7min remaining:    0.0s

{'loss': 0.3979146423339844, 'learning_rate': 2.980860917248832e-05, 'epoch': 5.813953488372093}
{'epoch': 10.0}
[CV]  attention_probs_dropout_prob=0.5485102676881799, hidden_dropout_prob=0.0, learning_rate=7.120945524538877e-05, num_train_epochs=10, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.622, total= 3.4min
[CV] attention_probs_dropout_prob=0.5485102676881799, hidden_dropout_prob=0.0, learning_rate=7.120945524538877e-05, num_train_epochs=10, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 



{'loss': 0.5033441772460937, 'learning_rate': 2.980860917248832e-05, 'epoch': 5.813953488372093}
{'epoch': 10.0}
[CV]  attention_probs_dropout_prob=0.5485102676881799, hidden_dropout_prob=0.0, learning_rate=7.120945524538877e-05, num_train_epochs=10, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.641, total= 3.4min
[CV] attention_probs_dropout_prob=0.5485102676881799, hidden_dropout_prob=0.0, learning_rate=7.120945524538877e-05, num_train_epochs=10, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 



{'loss': 0.48504736328125, 'learning_rate': 2.980860917248832e-05, 'epoch': 5.813953488372093}
{'epoch': 10.0}
[CV]  attention_probs_dropout_prob=0.5485102676881799, hidden_dropout_prob=0.0, learning_rate=7.120945524538877e-05, num_train_epochs=10, per_device_train_batch_size=16, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.678, total= 3.4min


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed: 16.9min finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] attention_probs_dropout_prob=0.5505670976126504, hidden_dropout_prob=0.05908245884480623, learning_rate=1e-06, num_train_epochs=10, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
Some weights of the model checkpoint at dccuchile/bert-base-spanish-wwm-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were n

{'epoch': 10.0}
[CV]  attention_probs_dropout_prob=0.5505670976126504, hidden_dropout_prob=0.05908245884480623, learning_rate=1e-06, num_train_epochs=10, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.297, total= 3.2min
[CV] attention_probs_dropout_prob=0.5505670976126504, hidden_dropout_prob=0.05908245884480623, learning_rate=1e-06, num_train_epochs=10, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:  3.2min remaining:    0.0s

{'epoch': 10.0}
[CV]  attention_probs_dropout_prob=0.5505670976126504, hidden_dropout_prob=0.05908245884480623, learning_rate=1e-06, num_train_epochs=10, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.297, total= 3.1min
[CV] attention_probs_dropout_prob=0.5505670976126504, hidden_dropout_prob=0.05908245884480623, learning_rate=1e-06, num_train_epochs=10, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:  6.3min remaining:    0.0s

{'epoch': 10.0}
[CV]  attention_probs_dropout_prob=0.5505670976126504, hidden_dropout_prob=0.05908245884480623, learning_rate=1e-06, num_train_epochs=10, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.296, total= 3.2min
[CV] attention_probs_dropout_prob=0.5505670976126504, hidden_dropout_prob=0.05908245884480623, learning_rate=1e-06, num_train_epochs=10, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 


{'epoch': 10.0}
[CV]  attention_probs_dropout_prob=0.5505670976126504, hidden_dropout_prob=0.05908245884480623, learning_rate=1e-06, num_train_epochs=10, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.320, total= 3.2min
[CV] attention_probs_dropout_prob=0.5505670976126504, hidden_dropout_prob=0.05908245884480623, learning_rate=1e-06, num_train_epochs=10, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 


{'epoch': 10.0}
[CV]  attention_probs_dropout_prob=0.5505670976126504, hidden_dropout_prob=0.05908245884480623, learning_rate=1e-06, num_train_epochs=10, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.293, total= 3.2min


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed: 15.8min finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] attention_probs_dropout_prob=0.5609426282257046, hidden_dropout_prob=0.0, learning_rate=4.459646219248771e-05, num_train_epochs=10, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
Some weights of the model checkpoint at dccuchile/bert-base-spanish-wwm-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not

{'loss': 0.928730224609375, 'learning_rate': 3.809552018192099e-05, 'epoch': 1.4577259475218658}
{'loss': 0.7583128662109375, 'learning_rate': 3.159457817135427e-05, 'epoch': 2.9154518950437316}
{'loss': 0.4816934814453125, 'learning_rate': 2.5093636160787546e-05, 'epoch': 4.373177842565598}
{'loss': 0.308885986328125, 'learning_rate': 1.8592694150220825e-05, 'epoch': 5.830903790087463}
{'loss': 0.142725830078125, 'learning_rate': 1.2091752139654102e-05, 'epoch': 7.288629737609329}
{'loss': 0.075039794921875, 'learning_rate': 5.590810129087381e-06, 'epoch': 8.746355685131196}
{'epoch': 10.0}
[CV]  attention_probs_dropout_prob=0.5609426282257046, hidden_dropout_prob=0.0, learning_rate=4.459646219248771e-05, num_train_epochs=10, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.665, total= 5.4min
[CV] attention_probs_dropout_prob=0.5609426282257046, hidden_dropout_prob=0.0, learning_rate=4.459646219248771e-05, num_train_epochs=10, per_device_t

[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:  5.4min remaining:    0.0s

{'loss': 0.8965134887695313, 'learning_rate': 3.809552018192099e-05, 'epoch': 1.4577259475218658}
{'loss': 0.7119526977539062, 'learning_rate': 3.159457817135427e-05, 'epoch': 2.9154518950437316}
{'loss': 0.5003778076171875, 'learning_rate': 2.5093636160787546e-05, 'epoch': 4.373177842565598}
{'loss': 0.328534912109375, 'learning_rate': 1.8592694150220825e-05, 'epoch': 5.830903790087463}
{'loss': 0.173434326171875, 'learning_rate': 1.2091752139654102e-05, 'epoch': 7.288629737609329}
{'loss': 0.071444580078125, 'learning_rate': 5.590810129087381e-06, 'epoch': 8.746355685131196}
{'epoch': 10.0}
[CV]  attention_probs_dropout_prob=0.5609426282257046, hidden_dropout_prob=0.0, learning_rate=4.459646219248771e-05, num_train_epochs=10, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.716, total= 5.4min
[CV] attention_probs_dropout_prob=0.5609426282257046, hidden_dropout_prob=0.0, learning_rate=4.459646219248771e-05, num_train_epochs=10, per_device_

[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed: 10.8min remaining:    0.0s

{'loss': 0.9468153076171875, 'learning_rate': 3.809552018192099e-05, 'epoch': 1.4577259475218658}
{'loss': 0.7642442626953125, 'learning_rate': 3.159457817135427e-05, 'epoch': 2.9154518950437316}
{'loss': 0.465255126953125, 'learning_rate': 2.5093636160787546e-05, 'epoch': 4.373177842565598}
{'loss': 0.27895361328125, 'learning_rate': 1.8592694150220825e-05, 'epoch': 5.830903790087463}
{'loss': 0.17352099609375, 'learning_rate': 1.2091752139654102e-05, 'epoch': 7.288629737609329}
{'loss': 0.0810673828125, 'learning_rate': 5.590810129087381e-06, 'epoch': 8.746355685131196}
{'epoch': 10.0}
[CV]  attention_probs_dropout_prob=0.5609426282257046, hidden_dropout_prob=0.0, learning_rate=4.459646219248771e-05, num_train_epochs=10, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.643, total= 5.4min
[CV] attention_probs_dropout_prob=0.5609426282257046, hidden_dropout_prob=0.0, learning_rate=4.459646219248771e-05, num_train_epochs=10, per_device_train

{'loss': 0.87654052734375, 'learning_rate': 3.809552018192099e-05, 'epoch': 1.4577259475218658}
{'loss': 0.683949462890625, 'learning_rate': 3.159457817135427e-05, 'epoch': 2.9154518950437316}
{'loss': 0.3759061279296875, 'learning_rate': 2.5093636160787546e-05, 'epoch': 4.373177842565598}
{'loss': 0.1771209716796875, 'learning_rate': 1.8592694150220825e-05, 'epoch': 5.830903790087463}
{'loss': 0.096230712890625, 'learning_rate': 1.2091752139654102e-05, 'epoch': 7.288629737609329}
{'loss': 0.0425478515625, 'learning_rate': 5.590810129087381e-06, 'epoch': 8.746355685131196}
{'epoch': 10.0}
[CV]  attention_probs_dropout_prob=0.5609426282257046, hidden_dropout_prob=0.0, learning_rate=4.459646219248771e-05, num_train_epochs=10, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.631, total= 5.4min
[CV] attention_probs_dropout_prob=0.5609426282257046, hidden_dropout_prob=0.0, learning_rate=4.459646219248771e-05, num_train_epochs=10, per_device_trai

{'loss': 0.8985631103515626, 'learning_rate': 3.809552018192099e-05, 'epoch': 1.4577259475218658}
{'loss': 0.7414349365234375, 'learning_rate': 3.159457817135427e-05, 'epoch': 2.9154518950437316}
{'loss': 0.458435791015625, 'learning_rate': 2.5093636160787546e-05, 'epoch': 4.373177842565598}
{'loss': 0.258039306640625, 'learning_rate': 1.8592694150220825e-05, 'epoch': 5.830903790087463}
{'loss': 0.134004638671875, 'learning_rate': 1.2091752139654102e-05, 'epoch': 7.288629737609329}
{'loss': 0.053185791015625, 'learning_rate': 5.590810129087381e-06, 'epoch': 8.746355685131196}
{'epoch': 10.0}
[CV]  attention_probs_dropout_prob=0.5609426282257046, hidden_dropout_prob=0.0, learning_rate=4.459646219248771e-05, num_train_epochs=10, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.707, total= 5.4min


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed: 27.1min finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] attention_probs_dropout_prob=0.4306134471329855, hidden_dropout_prob=0.0, learning_rate=4.6566554126392125e-05, num_train_epochs=1, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.

{'epoch': 1.0}
[CV]  attention_probs_dropout_prob=0.4306134471329855, hidden_dropout_prob=0.0, learning_rate=4.6566554126392125e-05, num_train_epochs=1, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.394, total=  36.0s
[CV] attention_probs_dropout_prob=0.4306134471329855, hidden_dropout_prob=0.0, learning_rate=4.6566554126392125e-05, num_train_epochs=1, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:   36.0s remaining:    0.0s

{'epoch': 1.0}
[CV]  attention_probs_dropout_prob=0.4306134471329855, hidden_dropout_prob=0.0, learning_rate=4.6566554126392125e-05, num_train_epochs=1, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.408, total=  35.7s
[CV] attention_probs_dropout_prob=0.4306134471329855, hidden_dropout_prob=0.0, learning_rate=4.6566554126392125e-05, num_train_epochs=1, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:  1.2min remaining:    0.0s

{'epoch': 1.0}
[CV]  attention_probs_dropout_prob=0.4306134471329855, hidden_dropout_prob=0.0, learning_rate=4.6566554126392125e-05, num_train_epochs=1, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.419, total=  35.7s
[CV] attention_probs_dropout_prob=0.4306134471329855, hidden_dropout_prob=0.0, learning_rate=4.6566554126392125e-05, num_train_epochs=1, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 


{'epoch': 1.0}
[CV]  attention_probs_dropout_prob=0.4306134471329855, hidden_dropout_prob=0.0, learning_rate=4.6566554126392125e-05, num_train_epochs=1, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.420, total=  35.9s
[CV] attention_probs_dropout_prob=0.4306134471329855, hidden_dropout_prob=0.0, learning_rate=4.6566554126392125e-05, num_train_epochs=1, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 


{'epoch': 1.0}
[CV]  attention_probs_dropout_prob=0.4306134471329855, hidden_dropout_prob=0.0, learning_rate=4.6566554126392125e-05, num_train_epochs=1, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.420, total= 1.4min


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:  3.8min finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] attention_probs_dropout_prob=0.4946962679820598, hidden_dropout_prob=0.0, learning_rate=3.9571687375738534e-05, num_train_epochs=10, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.

{'loss': 0.9385889282226563, 'learning_rate': 3.380321982825478e-05, 'epoch': 1.4577259475218658}
{'loss': 0.7461533813476563, 'learning_rate': 2.8034752280771032e-05, 'epoch': 2.9154518950437316}
{'loss': 0.5861463623046875, 'learning_rate': 2.226628473328728e-05, 'epoch': 4.373177842565598}
{'loss': 0.4597294921875, 'learning_rate': 1.649781718580353e-05, 'epoch': 5.830903790087463}
{'loss': 0.28113818359375, 'learning_rate': 1.0729349638319777e-05, 'epoch': 7.288629737609329}
{'loss': 0.142553466796875, 'learning_rate': 4.960882090836027e-06, 'epoch': 8.746355685131196}
{'epoch': 10.0}
[CV]  attention_probs_dropout_prob=0.4946962679820598, hidden_dropout_prob=0.0, learning_rate=3.9571687375738534e-05, num_train_epochs=10, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.580, total= 5.5min
[CV] attention_probs_dropout_prob=0.4946962679820598, hidden_dropout_prob=0.0, learning_rate=3.9571687375738534e-05, num_train_epochs=10, per_device_

[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:  5.5min remaining:    0.0s

{'loss': 0.8377970581054688, 'learning_rate': 3.380321982825478e-05, 'epoch': 1.4577259475218658}
{'loss': 0.6105617065429687, 'learning_rate': 2.8034752280771032e-05, 'epoch': 2.9154518950437316}
{'loss': 0.315281982421875, 'learning_rate': 2.226628473328728e-05, 'epoch': 4.373177842565598}
{'loss': 0.16090625, 'learning_rate': 1.649781718580353e-05, 'epoch': 5.830903790087463}
{'loss': 0.0462977294921875, 'learning_rate': 1.0729349638319777e-05, 'epoch': 7.288629737609329}
{'loss': 0.0110638427734375, 'learning_rate': 4.960882090836027e-06, 'epoch': 8.746355685131196}
{'epoch': 10.0}
[CV]  attention_probs_dropout_prob=0.4946962679820598, hidden_dropout_prob=0.0, learning_rate=3.9571687375738534e-05, num_train_epochs=10, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.717, total= 5.5min
[CV] attention_probs_dropout_prob=0.4946962679820598, hidden_dropout_prob=0.0, learning_rate=3.9571687375738534e-05, num_train_epochs=10, per_device_tra

[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed: 10.9min remaining:    0.0s

{'loss': 0.8787210083007813, 'learning_rate': 3.380321982825478e-05, 'epoch': 1.4577259475218658}
{'loss': 0.6044907836914063, 'learning_rate': 2.8034752280771032e-05, 'epoch': 2.9154518950437316}
{'loss': 0.291861572265625, 'learning_rate': 2.226628473328728e-05, 'epoch': 4.373177842565598}
{'loss': 0.1618255615234375, 'learning_rate': 1.649781718580353e-05, 'epoch': 5.830903790087463}
{'loss': 0.063600830078125, 'learning_rate': 1.0729349638319777e-05, 'epoch': 7.288629737609329}
{'loss': 0.0043477783203125, 'learning_rate': 4.960882090836027e-06, 'epoch': 8.746355685131196}
{'epoch': 10.0}
[CV]  attention_probs_dropout_prob=0.4946962679820598, hidden_dropout_prob=0.0, learning_rate=3.9571687375738534e-05, num_train_epochs=10, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.649, total= 5.4min
[CV] attention_probs_dropout_prob=0.4946962679820598, hidden_dropout_prob=0.0, learning_rate=3.9571687375738534e-05, num_train_epochs=10, per_dev

{'loss': 0.8787764892578125, 'learning_rate': 3.380321982825478e-05, 'epoch': 1.4577259475218658}
{'loss': 0.6248033447265625, 'learning_rate': 2.8034752280771032e-05, 'epoch': 2.9154518950437316}
{'loss': 0.353352783203125, 'learning_rate': 2.226628473328728e-05, 'epoch': 4.373177842565598}
{'loss': 0.163471923828125, 'learning_rate': 1.649781718580353e-05, 'epoch': 5.830903790087463}
{'loss': 0.071937255859375, 'learning_rate': 1.0729349638319777e-05, 'epoch': 7.288629737609329}
{'loss': 0.030339111328125, 'learning_rate': 4.960882090836027e-06, 'epoch': 8.746355685131196}
{'epoch': 10.0}
[CV]  attention_probs_dropout_prob=0.4946962679820598, hidden_dropout_prob=0.0, learning_rate=3.9571687375738534e-05, num_train_epochs=10, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.649, total= 5.4min
[CV] attention_probs_dropout_prob=0.4946962679820598, hidden_dropout_prob=0.0, learning_rate=3.9571687375738534e-05, num_train_epochs=10, per_devic

{'loss': 0.8611641845703125, 'learning_rate': 3.380321982825478e-05, 'epoch': 1.4577259475218658}
{'loss': 0.7359736328125, 'learning_rate': 2.8034752280771032e-05, 'epoch': 2.9154518950437316}
{'loss': 0.3899293212890625, 'learning_rate': 2.226628473328728e-05, 'epoch': 4.373177842565598}
{'loss': 0.154192138671875, 'learning_rate': 1.649781718580353e-05, 'epoch': 5.830903790087463}
{'loss': 0.08099658203125, 'learning_rate': 1.0729349638319777e-05, 'epoch': 7.288629737609329}
{'loss': 0.050704345703125, 'learning_rate': 4.960882090836027e-06, 'epoch': 8.746355685131196}
{'epoch': 10.0}
[CV]  attention_probs_dropout_prob=0.4946962679820598, hidden_dropout_prob=0.0, learning_rate=3.9571687375738534e-05, num_train_epochs=10, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.667, total= 5.4min


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed: 27.2min finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] attention_probs_dropout_prob=0.5289975614009103, hidden_dropout_prob=0.10086717174739536, learning_rate=8.080825405896503e-05, num_train_epochs=10, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.

{'loss': 1.07624609375, 'learning_rate': 6.902862518739578e-05, 'epoch': 1.4577259475218658}
{'loss': 1.05771142578125, 'learning_rate': 5.7248996315826544e-05, 'epoch': 2.9154518950437316}
{'loss': 1.05904248046875, 'learning_rate': 4.5469367444257294e-05, 'epoch': 4.373177842565598}
{'loss': 1.05082373046875, 'learning_rate': 3.368973857268805e-05, 'epoch': 5.830903790087463}
{'loss': 1.05059033203125, 'learning_rate': 2.1910109701118796e-05, 'epoch': 7.288629737609329}
{'loss': 1.04964208984375, 'learning_rate': 1.0130480829549553e-05, 'epoch': 8.746355685131196}
{'epoch': 10.0}
[CV]  attention_probs_dropout_prob=0.5289975614009103, hidden_dropout_prob=0.10086717174739536, learning_rate=8.080825405896503e-05, num_train_epochs=10, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.152, total= 5.6min
[CV] attention_probs_dropout_prob=0.5289975614009103, hidden_dropout_prob=0.10086717174739536, learning_rate=8.080825405896503e-05, num_train_e

[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:  5.6min remaining:    0.0s

{'loss': 1.0740211181640624, 'learning_rate': 6.902862518739578e-05, 'epoch': 1.4577259475218658}
{'loss': 1.0605758056640624, 'learning_rate': 5.7248996315826544e-05, 'epoch': 2.9154518950437316}
{'loss': 1.060532470703125, 'learning_rate': 4.5469367444257294e-05, 'epoch': 4.373177842565598}
{'loss': 1.0503720703125, 'learning_rate': 3.368973857268805e-05, 'epoch': 5.830903790087463}
{'loss': 1.050845703125, 'learning_rate': 2.1910109701118796e-05, 'epoch': 7.288629737609329}
{'loss': 1.04433447265625, 'learning_rate': 1.0130480829549553e-05, 'epoch': 8.746355685131196}
{'epoch': 10.0}
[CV]  attention_probs_dropout_prob=0.5289975614009103, hidden_dropout_prob=0.10086717174739536, learning_rate=8.080825405896503e-05, num_train_epochs=10, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.156, total= 5.6min
[CV] attention_probs_dropout_prob=0.5289975614009103, hidden_dropout_prob=0.10086717174739536, learning_rate=8.080825405896503e-05, num_tr

[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed: 11.2min remaining:    0.0s

{'loss': 1.09622998046875, 'learning_rate': 6.902862518739578e-05, 'epoch': 1.4577259475218658}
{'loss': 1.065376708984375, 'learning_rate': 5.7248996315826544e-05, 'epoch': 2.9154518950437316}
{'loss': 1.050794921875, 'learning_rate': 4.5469367444257294e-05, 'epoch': 4.373177842565598}
{'loss': 1.060814208984375, 'learning_rate': 3.368973857268805e-05, 'epoch': 5.830903790087463}
{'loss': 1.04836083984375, 'learning_rate': 2.1910109701118796e-05, 'epoch': 7.288629737609329}
{'loss': 1.0480185546875, 'learning_rate': 1.0130480829549553e-05, 'epoch': 8.746355685131196}
{'epoch': 10.0}
[CV]  attention_probs_dropout_prob=0.5289975614009103, hidden_dropout_prob=0.10086717174739536, learning_rate=8.080825405896503e-05, num_train_epochs=10, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.317, total= 5.6min
[CV] attention_probs_dropout_prob=0.5289975614009103, hidden_dropout_prob=0.10086717174739536, learning_rate=8.080825405896503e-05, num_train


{'loss': 1.0855374755859375, 'learning_rate': 6.902862518739578e-05, 'epoch': 1.4577259475218658}
{'loss': 1.0691180419921875, 'learning_rate': 5.7248996315826544e-05, 'epoch': 2.9154518950437316}
{'loss': 1.0617890625, 'learning_rate': 4.5469367444257294e-05, 'epoch': 4.373177842565598}
{'loss': 1.050035400390625, 'learning_rate': 3.368973857268805e-05, 'epoch': 5.830903790087463}
{'loss': 1.054794921875, 'learning_rate': 2.1910109701118796e-05, 'epoch': 7.288629737609329}
{'loss': 1.03597509765625, 'learning_rate': 1.0130480829549553e-05, 'epoch': 8.746355685131196}
{'epoch': 10.0}
[CV]  attention_probs_dropout_prob=0.5289975614009103, hidden_dropout_prob=0.10086717174739536, learning_rate=8.080825405896503e-05, num_train_epochs=10, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.155, total= 5.6min
[CV] attention_probs_dropout_prob=0.5289975614009103, hidden_dropout_prob=0.10086717174739536, learning_rate=8.080825405896503e-05, num_train


{'loss': 1.071425537109375, 'learning_rate': 6.902862518739578e-05, 'epoch': 1.4577259475218658}
{'loss': 1.072355712890625, 'learning_rate': 5.7248996315826544e-05, 'epoch': 2.9154518950437316}
{'loss': 1.058560302734375, 'learning_rate': 4.5469367444257294e-05, 'epoch': 4.373177842565598}
{'loss': 1.055194091796875, 'learning_rate': 3.368973857268805e-05, 'epoch': 5.830903790087463}
{'loss': 1.0455595703125, 'learning_rate': 2.1910109701118796e-05, 'epoch': 7.288629737609329}
{'loss': 1.04616748046875, 'learning_rate': 1.0130480829549553e-05, 'epoch': 8.746355685131196}
{'epoch': 10.0}
[CV]  attention_probs_dropout_prob=0.5289975614009103, hidden_dropout_prob=0.10086717174739536, learning_rate=8.080825405896503e-05, num_train_epochs=10, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.153, total= 5.6min


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed: 28.0min finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] attention_probs_dropout_prob=0.00425392459679252, hidden_dropout_prob=0.017208961715406236, learning_rate=3.7408016305073955e-05, num_train_epochs=8, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.

{'epoch': 8.0}
[CV]  attention_probs_dropout_prob=0.00425392459679252, hidden_dropout_prob=0.017208961715406236, learning_rate=3.7408016305073955e-05, num_train_epochs=8, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.539, total= 2.5min
[CV] attention_probs_dropout_prob=0.00425392459679252, hidden_dropout_prob=0.017208961715406236, learning_rate=3.7408016305073955e-05, num_train_epochs=8, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:  2.5min remaining:    0.0s

{'epoch': 8.0}
[CV]  attention_probs_dropout_prob=0.00425392459679252, hidden_dropout_prob=0.017208961715406236, learning_rate=3.7408016305073955e-05, num_train_epochs=8, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.654, total= 2.5min
[CV] attention_probs_dropout_prob=0.00425392459679252, hidden_dropout_prob=0.017208961715406236, learning_rate=3.7408016305073955e-05, num_train_epochs=8, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:  5.0min remaining:    0.0s

{'epoch': 8.0}
[CV]  attention_probs_dropout_prob=0.00425392459679252, hidden_dropout_prob=0.017208961715406236, learning_rate=3.7408016305073955e-05, num_train_epochs=8, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.553, total= 2.5min
[CV] attention_probs_dropout_prob=0.00425392459679252, hidden_dropout_prob=0.017208961715406236, learning_rate=3.7408016305073955e-05, num_train_epochs=8, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 



{'epoch': 8.0}
[CV]  attention_probs_dropout_prob=0.00425392459679252, hidden_dropout_prob=0.017208961715406236, learning_rate=3.7408016305073955e-05, num_train_epochs=8, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.604, total= 2.5min
[CV] attention_probs_dropout_prob=0.00425392459679252, hidden_dropout_prob=0.017208961715406236, learning_rate=3.7408016305073955e-05, num_train_epochs=8, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 



{'epoch': 8.0}
[CV]  attention_probs_dropout_prob=0.00425392459679252, hidden_dropout_prob=0.017208961715406236, learning_rate=3.7408016305073955e-05, num_train_epochs=8, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.637, total= 2.5min


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed: 12.5min finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] attention_probs_dropout_prob=0.9, hidden_dropout_prob=0.007168347355307692, learning_rate=4.14821018930384e-05, num_train_epochs=10, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.

{'loss': 1.066190185546875, 'learning_rate': 3.54351482643156e-05, 'epoch': 1.4577259475218658}
{'loss': 1.047540283203125, 'learning_rate': 2.9388194635592803e-05, 'epoch': 2.9154518950437316}
{'loss': 1.042002197265625, 'learning_rate': 2.3341241006870005e-05, 'epoch': 4.373177842565598}
{'loss': 1.037540771484375, 'learning_rate': 1.7294287378147206e-05, 'epoch': 5.830903790087463}
{'loss': 1.0388828125, 'learning_rate': 1.1247333749424406e-05, 'epoch': 7.288629737609329}
{'loss': 1.03704541015625, 'learning_rate': 5.200380120701608e-06, 'epoch': 8.746355685131196}
{'epoch': 10.0}
[CV]  attention_probs_dropout_prob=0.9, hidden_dropout_prob=0.007168347355307692, learning_rate=4.14821018930384e-05, num_train_epochs=10, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.156, total= 5.6min
[CV] attention_probs_dropout_prob=0.9, hidden_dropout_prob=0.007168347355307692, learning_rate=4.14821018930384e-05, num_train_epochs=10, per_device_train_b

[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:  5.6min remaining:    0.0s

{'loss': 1.0666702880859376, 'learning_rate': 3.54351482643156e-05, 'epoch': 1.4577259475218658}
{'loss': 1.0526925048828124, 'learning_rate': 2.9388194635592803e-05, 'epoch': 2.9154518950437316}
{'loss': 1.056194580078125, 'learning_rate': 2.3341241006870005e-05, 'epoch': 4.373177842565598}
{'loss': 1.055684814453125, 'learning_rate': 1.7294287378147206e-05, 'epoch': 5.830903790087463}
{'loss': 1.0557587890625, 'learning_rate': 1.1247333749424406e-05, 'epoch': 7.288629737609329}
{'loss': 1.033478515625, 'learning_rate': 5.200380120701608e-06, 'epoch': 8.746355685131196}
{'epoch': 10.0}
[CV]  attention_probs_dropout_prob=0.9, hidden_dropout_prob=0.007168347355307692, learning_rate=4.14821018930384e-05, num_train_epochs=10, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.156, total= 5.6min
[CV] attention_probs_dropout_prob=0.9, hidden_dropout_prob=0.007168347355307692, learning_rate=4.14821018930384e-05, num_train_epochs=10, per_device_trai

[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed: 11.2min remaining:    0.0s

{'loss': 1.0788060302734375, 'learning_rate': 3.54351482643156e-05, 'epoch': 1.4577259475218658}
{'loss': 1.0502347412109374, 'learning_rate': 2.9388194635592803e-05, 'epoch': 2.9154518950437316}
{'loss': 1.047150634765625, 'learning_rate': 2.3341241006870005e-05, 'epoch': 4.373177842565598}
{'loss': 1.05912646484375, 'learning_rate': 1.7294287378147206e-05, 'epoch': 5.830903790087463}
{'loss': 1.04002294921875, 'learning_rate': 1.1247333749424406e-05, 'epoch': 7.288629737609329}
{'loss': 1.04771484375, 'learning_rate': 5.200380120701608e-06, 'epoch': 8.746355685131196}
{'epoch': 10.0}
[CV]  attention_probs_dropout_prob=0.9, hidden_dropout_prob=0.007168347355307692, learning_rate=4.14821018930384e-05, num_train_epochs=10, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.167, total= 5.6min
[CV] attention_probs_dropout_prob=0.9, hidden_dropout_prob=0.007168347355307692, learning_rate=4.14821018930384e-05, num_train_epochs=10, per_device_train


{'loss': 1.0653226318359375, 'learning_rate': 3.54351482643156e-05, 'epoch': 1.4577259475218658}
{'loss': 1.0649564208984375, 'learning_rate': 2.9388194635592803e-05, 'epoch': 2.9154518950437316}
{'loss': 1.044835205078125, 'learning_rate': 2.3341241006870005e-05, 'epoch': 4.373177842565598}
{'loss': 1.0515322265625, 'learning_rate': 1.7294287378147206e-05, 'epoch': 5.830903790087463}
{'loss': 1.04205908203125, 'learning_rate': 1.1247333749424406e-05, 'epoch': 7.288629737609329}
{'loss': 1.02607421875, 'learning_rate': 5.200380120701608e-06, 'epoch': 8.746355685131196}
{'epoch': 10.0}
[CV]  attention_probs_dropout_prob=0.9, hidden_dropout_prob=0.007168347355307692, learning_rate=4.14821018930384e-05, num_train_epochs=10, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.162, total= 5.6min
[CV] attention_probs_dropout_prob=0.9, hidden_dropout_prob=0.007168347355307692, learning_rate=4.14821018930384e-05, num_train_epochs=10, per_device_train_


{'loss': 1.0626864013671875, 'learning_rate': 3.54351482643156e-05, 'epoch': 1.4577259475218658}
{'loss': 1.0648590087890626, 'learning_rate': 2.9388194635592803e-05, 'epoch': 2.9154518950437316}
{'loss': 1.050308349609375, 'learning_rate': 2.3341241006870005e-05, 'epoch': 4.373177842565598}
{'loss': 1.046664794921875, 'learning_rate': 1.7294287378147206e-05, 'epoch': 5.830903790087463}
{'loss': 1.03724658203125, 'learning_rate': 1.1247333749424406e-05, 'epoch': 7.288629737609329}
{'loss': 1.035388671875, 'learning_rate': 5.200380120701608e-06, 'epoch': 8.746355685131196}
{'epoch': 10.0}
[CV]  attention_probs_dropout_prob=0.9, hidden_dropout_prob=0.007168347355307692, learning_rate=4.14821018930384e-05, num_train_epochs=10, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.155, total= 5.6min


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed: 28.0min finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] attention_probs_dropout_prob=0.5582518438325175, hidden_dropout_prob=0.01113854293589334, learning_rate=3.24214669053074e-05, num_train_epochs=10, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.

{'loss': 0.891261474609375, 'learning_rate': 2.7695305548848594e-05, 'epoch': 1.4577259475218658}
{'loss': 0.711755126953125, 'learning_rate': 2.2969144192389792e-05, 'epoch': 2.9154518950437316}
{'loss': 0.4674609375, 'learning_rate': 1.8242982835930987e-05, 'epoch': 4.373177842565598}
{'loss': 0.239456787109375, 'learning_rate': 1.3516821479472181e-05, 'epoch': 5.830903790087463}
{'loss': 0.142550537109375, 'learning_rate': 8.790660123013376e-06, 'epoch': 7.288629737609329}
{'loss': 0.07103076171875, 'learning_rate': 4.0644987665545726e-06, 'epoch': 8.746355685131196}
{'epoch': 10.0}
[CV]  attention_probs_dropout_prob=0.5582518438325175, hidden_dropout_prob=0.01113854293589334, learning_rate=3.24214669053074e-05, num_train_epochs=10, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.649, total= 5.5min
[CV] attention_probs_dropout_prob=0.5582518438325175, hidden_dropout_prob=0.01113854293589334, learning_rate=3.24214669053074e-05, num_train

[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:  5.5min remaining:    0.0s

{'loss': 0.8847770385742187, 'learning_rate': 2.7695305548848594e-05, 'epoch': 1.4577259475218658}
{'loss': 0.7331024780273437, 'learning_rate': 2.2969144192389792e-05, 'epoch': 2.9154518950437316}
{'loss': 0.4540281982421875, 'learning_rate': 1.8242982835930987e-05, 'epoch': 4.373177842565598}
{'loss': 0.25620947265625, 'learning_rate': 1.3516821479472181e-05, 'epoch': 5.830903790087463}
{'loss': 0.13089013671875, 'learning_rate': 8.790660123013376e-06, 'epoch': 7.288629737609329}
{'loss': 0.074353515625, 'learning_rate': 4.0644987665545726e-06, 'epoch': 8.746355685131196}
{'epoch': 10.0}
[CV]  attention_probs_dropout_prob=0.5582518438325175, hidden_dropout_prob=0.01113854293589334, learning_rate=3.24214669053074e-05, num_train_epochs=10, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.691, total= 5.5min
[CV] attention_probs_dropout_prob=0.5582518438325175, hidden_dropout_prob=0.01113854293589334, learning_rate=3.24214669053074e-05, num_t

[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed: 10.9min remaining:    0.0s

{'loss': 0.8982708129882813, 'learning_rate': 2.7695305548848594e-05, 'epoch': 1.4577259475218658}
{'loss': 0.6198584594726563, 'learning_rate': 2.2969144192389792e-05, 'epoch': 2.9154518950437316}
{'loss': 0.33041015625, 'learning_rate': 1.8242982835930987e-05, 'epoch': 4.373177842565598}
{'loss': 0.2063953857421875, 'learning_rate': 1.3516821479472181e-05, 'epoch': 5.830903790087463}
{'loss': 0.094479248046875, 'learning_rate': 8.790660123013376e-06, 'epoch': 7.288629737609329}
{'loss': 0.0477451171875, 'learning_rate': 4.0644987665545726e-06, 'epoch': 8.746355685131196}
{'epoch': 10.0}
[CV]  attention_probs_dropout_prob=0.5582518438325175, hidden_dropout_prob=0.01113854293589334, learning_rate=3.24214669053074e-05, num_train_epochs=10, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.698, total= 5.5min
[CV] attention_probs_dropout_prob=0.5582518438325175, hidden_dropout_prob=0.01113854293589334, learning_rate=3.24214669053074e-05, num_tr


{'loss': 0.9085072631835938, 'learning_rate': 2.7695305548848594e-05, 'epoch': 1.4577259475218658}
{'loss': 0.6814048461914063, 'learning_rate': 2.2969144192389792e-05, 'epoch': 2.9154518950437316}
{'loss': 0.4446671142578125, 'learning_rate': 1.8242982835930987e-05, 'epoch': 4.373177842565598}
{'loss': 0.2257677001953125, 'learning_rate': 1.3516821479472181e-05, 'epoch': 5.830903790087463}
{'loss': 0.127534423828125, 'learning_rate': 8.790660123013376e-06, 'epoch': 7.288629737609329}
{'loss': 0.074672119140625, 'learning_rate': 4.0644987665545726e-06, 'epoch': 8.746355685131196}
{'epoch': 10.0}
[CV]  attention_probs_dropout_prob=0.5582518438325175, hidden_dropout_prob=0.01113854293589334, learning_rate=3.24214669053074e-05, num_train_epochs=10, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.549, total= 5.5min
[CV] attention_probs_dropout_prob=0.5582518438325175, hidden_dropout_prob=0.01113854293589334, learning_rate=3.24214669053074e-05,


{'loss': 0.8305989379882812, 'learning_rate': 2.7695305548848594e-05, 'epoch': 1.4577259475218658}
{'loss': 0.6307247924804688, 'learning_rate': 2.2969144192389792e-05, 'epoch': 2.9154518950437316}
{'loss': 0.3372464599609375, 'learning_rate': 1.8242982835930987e-05, 'epoch': 4.373177842565598}
{'loss': 0.1381864013671875, 'learning_rate': 1.3516821479472181e-05, 'epoch': 5.830903790087463}
{'loss': 0.0722908935546875, 'learning_rate': 8.790660123013376e-06, 'epoch': 7.288629737609329}
{'loss': 0.0218458251953125, 'learning_rate': 4.0644987665545726e-06, 'epoch': 8.746355685131196}
{'epoch': 10.0}
[CV]  attention_probs_dropout_prob=0.5582518438325175, hidden_dropout_prob=0.01113854293589334, learning_rate=3.24214669053074e-05, num_train_epochs=10, per_device_train_batch_size=4, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.720, total= 5.4min


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed: 27.3min finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] attention_probs_dropout_prob=0.594666189808815, hidden_dropout_prob=0.0036390475755954596, learning_rate=3.5833975465238777e-05, num_train_epochs=6, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
Some weights of the model checkpoint at dccuchile/bert-base-spanish-wwm-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were n

{'epoch': 6.0}
[CV]  attention_probs_dropout_prob=0.594666189808815, hidden_dropout_prob=0.0036390475755954596, learning_rate=3.5833975465238777e-05, num_train_epochs=6, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.494, total= 1.9min
[CV] attention_probs_dropout_prob=0.594666189808815, hidden_dropout_prob=0.0036390475755954596, learning_rate=3.5833975465238777e-05, num_train_epochs=6, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:  1.9min remaining:    0.0s

{'epoch': 6.0}
[CV]  attention_probs_dropout_prob=0.594666189808815, hidden_dropout_prob=0.0036390475755954596, learning_rate=3.5833975465238777e-05, num_train_epochs=6, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.410, total= 1.9min
[CV] attention_probs_dropout_prob=0.594666189808815, hidden_dropout_prob=0.0036390475755954596, learning_rate=3.5833975465238777e-05, num_train_epochs=6, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:  3.8min remaining:    0.0s

{'epoch': 6.0}
[CV]  attention_probs_dropout_prob=0.594666189808815, hidden_dropout_prob=0.0036390475755954596, learning_rate=3.5833975465238777e-05, num_train_epochs=6, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.475, total= 1.9min
[CV] attention_probs_dropout_prob=0.594666189808815, hidden_dropout_prob=0.0036390475755954596, learning_rate=3.5833975465238777e-05, num_train_epochs=6, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 



{'epoch': 6.0}
[CV]  attention_probs_dropout_prob=0.594666189808815, hidden_dropout_prob=0.0036390475755954596, learning_rate=3.5833975465238777e-05, num_train_epochs=6, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.447, total= 1.9min
[CV] attention_probs_dropout_prob=0.594666189808815, hidden_dropout_prob=0.0036390475755954596, learning_rate=3.5833975465238777e-05, num_train_epochs=6, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased 



{'epoch': 6.0}
[CV]  attention_probs_dropout_prob=0.594666189808815, hidden_dropout_prob=0.0036390475755954596, learning_rate=3.5833975465238777e-05, num_train_epochs=6, per_device_train_batch_size=32, pretrained_model=dccuchile/bert-base-spanish-wwm-uncased, score=0.414, total= 1.9min


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:  9.6min finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] attention_probs_dropout_prob=0.11877989840278479, hidden_dropout_prob=0.0, learning_rate=9.05881071709675e-05, num_train_epochs=9, per_device_train_batch_size=8, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.

{'loss': 0.7121862182617188, 'learning_rate': 6.132838263254131e-05, 'epoch': 2.9069767441860463}
{'loss': 0.33038348388671873, 'learning_rate': 3.206865809411511e-05, 'epoch': 5.813953488372093}
{'loss': 0.08090625, 'learning_rate': 2.808933555688915e-06, 'epoch': 8.720930232558139}
{'epoch': 9.0}
[CV]  attention_probs_dropout_prob=0.11877989840278479, hidden_dropout_prob=0.0, learning_rate=9.05881071709675e-05, num_train_epochs=9, per_device_train_batch_size=8, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.624, total= 3.5min
[CV] attention_probs_dropout_prob=0.11877989840278479, hidden_dropout_prob=0.0, learning_rate=9.05881071709675e-05, num_train_epochs=9, per_device_train_batch_size=8, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:  3.5min remaining:    0.0s

{'loss': 0.7615147094726562, 'learning_rate': 6.132838263254131e-05, 'epoch': 2.9069767441860463}
{'loss': 0.4382083129882812, 'learning_rate': 3.206865809411511e-05, 'epoch': 5.813953488372093}
{'loss': 0.24873095703125, 'learning_rate': 2.808933555688915e-06, 'epoch': 8.720930232558139}
{'epoch': 9.0}
[CV]  attention_probs_dropout_prob=0.11877989840278479, hidden_dropout_prob=0.0, learning_rate=9.05881071709675e-05, num_train_epochs=9, per_device_train_batch_size=8, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.496, total= 3.5min
[CV] attention_probs_dropout_prob=0.11877989840278479, hidden_dropout_prob=0.0, learning_rate=9.05881071709675e-05, num_train_epochs=9, per_device_train_batch_size=8, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:  7.1min remaining:    0.0s

{'loss': 0.6813892211914062, 'learning_rate': 6.132838263254131e-05, 'epoch': 2.9069767441860463}
{'loss': 0.20207513427734375, 'learning_rate': 3.206865809411511e-05, 'epoch': 5.813953488372093}
{'loss': 0.03752581787109375, 'learning_rate': 2.808933555688915e-06, 'epoch': 8.720930232558139}
{'epoch': 9.0}
[CV]  attention_probs_dropout_prob=0.11877989840278479, hidden_dropout_prob=0.0, learning_rate=9.05881071709675e-05, num_train_epochs=9, per_device_train_batch_size=8, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.620, total= 3.5min
[CV] attention_probs_dropout_prob=0.11877989840278479, hidden_dropout_prob=0.0, learning_rate=9.05881071709675e-05, num_train_epochs=9, per_device_train_batch_size=8, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 



{'loss': 0.7203907470703125, 'learning_rate': 6.132838263254131e-05, 'epoch': 2.9069767441860463}
{'loss': 0.2709984130859375, 'learning_rate': 3.206865809411511e-05, 'epoch': 5.813953488372093}
{'loss': 0.04492236328125, 'learning_rate': 2.808933555688915e-06, 'epoch': 8.720930232558139}
{'epoch': 9.0}
[CV]  attention_probs_dropout_prob=0.11877989840278479, hidden_dropout_prob=0.0, learning_rate=9.05881071709675e-05, num_train_epochs=9, per_device_train_batch_size=8, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.524, total= 3.5min
[CV] attention_probs_dropout_prob=0.11877989840278479, hidden_dropout_prob=0.0, learning_rate=9.05881071709675e-05, num_train_epochs=9, per_device_train_batch_size=8, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 



{'loss': 0.6601710205078125, 'learning_rate': 6.132838263254131e-05, 'epoch': 2.9069767441860463}
{'loss': 0.17030615234375, 'learning_rate': 3.206865809411511e-05, 'epoch': 5.813953488372093}
{'loss': 0.0173367919921875, 'learning_rate': 2.808933555688915e-06, 'epoch': 8.720930232558139}
{'epoch': 9.0}
[CV]  attention_probs_dropout_prob=0.11877989840278479, hidden_dropout_prob=0.0, learning_rate=9.05881071709675e-05, num_train_epochs=9, per_device_train_batch_size=8, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.617, total= 3.5min


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed: 17.7min finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] attention_probs_dropout_prob=0.6093773596951091, hidden_dropout_prob=0.027175797976312522, learning_rate=2.007004297760599e-05, num_train_epochs=10, per_device_train_batch_size=64, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.

{'epoch': 10.0}
[CV]  attention_probs_dropout_prob=0.6093773596951091, hidden_dropout_prob=0.027175797976312522, learning_rate=2.007004297760599e-05, num_train_epochs=10, per_device_train_batch_size=64, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.401, total= 3.0min
[CV] attention_probs_dropout_prob=0.6093773596951091, hidden_dropout_prob=0.027175797976312522, learning_rate=2.007004297760599e-05, num_train_epochs=10, per_device_train_batch_size=64, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:  3.0min remaining:    0.0s

{'epoch': 10.0}
[CV]  attention_probs_dropout_prob=0.6093773596951091, hidden_dropout_prob=0.027175797976312522, learning_rate=2.007004297760599e-05, num_train_epochs=10, per_device_train_batch_size=64, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.444, total= 3.0min
[CV] attention_probs_dropout_prob=0.6093773596951091, hidden_dropout_prob=0.027175797976312522, learning_rate=2.007004297760599e-05, num_train_epochs=10, per_device_train_batch_size=64, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:  6.0min remaining:    0.0s

{'epoch': 10.0}
[CV]  attention_probs_dropout_prob=0.6093773596951091, hidden_dropout_prob=0.027175797976312522, learning_rate=2.007004297760599e-05, num_train_epochs=10, per_device_train_batch_size=64, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.443, total= 3.0min
[CV] attention_probs_dropout_prob=0.6093773596951091, hidden_dropout_prob=0.027175797976312522, learning_rate=2.007004297760599e-05, num_train_epochs=10, per_device_train_batch_size=64, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 



{'epoch': 10.0}
[CV]  attention_probs_dropout_prob=0.6093773596951091, hidden_dropout_prob=0.027175797976312522, learning_rate=2.007004297760599e-05, num_train_epochs=10, per_device_train_batch_size=64, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.409, total= 3.0min
[CV] attention_probs_dropout_prob=0.6093773596951091, hidden_dropout_prob=0.027175797976312522, learning_rate=2.007004297760599e-05, num_train_epochs=10, per_device_train_batch_size=64, pretrained_model=dccuchile/bert-base-spanish-wwm-cased 



{'epoch': 10.0}
[CV]  attention_probs_dropout_prob=0.6093773596951091, hidden_dropout_prob=0.027175797976312522, learning_rate=2.007004297760599e-05, num_train_epochs=10, per_device_train_batch_size=64, pretrained_model=dccuchile/bert-base-spanish-wwm-cased, score=0.410, total= 3.0min


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed: 15.1min finished

{'loss': 0.874583251953125, 'learning_rate': 3.9386595113926067e-05, 'epoch': 1.1682242990654206}
{'loss': 0.6801448974609375, 'learning_rate': 3.4176728035364416e-05, 'epoch': 2.336448598130841}
{'loss': 0.5423587646484375, 'learning_rate': 2.8966860956802765e-05, 'epoch': 3.5046728971962615}
{'loss': 0.387064208984375, 'learning_rate': 2.3756993878241114e-05, 'epoch': 4.672897196261682}
{'loss': 0.230538818359375, 'learning_rate': 1.8547126799679467e-05, 'epoch': 5.841121495327103}
{'loss': 0.13231640625, 'learning_rate': 1.3337259721117818e-05, 'epoch': 7.009345794392523}
{'loss': 0.064142333984375, 'learning_rate': 8.12739264255617e-06, 'epoch': 8.177570093457945}
{'loss': 0.05944580078125, 'learning_rate': 2.9175255639945227e-06, 'epoch': 9.345794392523365}
{'epoch': 10.0}
CPU times: user 5h 35min 27s, sys: 1h 38min 25s, total: 7h 13min 53s
Wall time: 7h 24min 31s


In [None]:
metamodel.best_params_

OrderedDict([('attention_probs_dropout_prob', 0.5609426282257046),
             ('hidden_dropout_prob', 0.0),
             ('learning_rate', 4.459646219248771e-05),
             ('num_train_epochs', 10),
             ('per_device_train_batch_size', 4),
             ('pretrained_model', 'dccuchile/bert-base-spanish-wwm-cased')])

Optimized model metrics

In [None]:
from sklearn.metrics import classification_report
preds = metamodel.predict(X_test)
print(classification_report(Y_test, preds))

              precision    recall  f1-score   support

           0       0.78      0.84      0.81       173
           1       0.75      0.58      0.65        31
           2       0.29      0.29      0.29        14
           3       0.90      0.87      0.89       210

    accuracy                           0.82       428
   macro avg       0.68      0.65      0.66       428
weighted avg       0.82      0.82      0.82       428
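
The gap between the macro and weighted averages in this report comes from class imbalance. As a rough check, both averages can be recomputed from the rounded per-class F1 scores and supports printed above. The variable names are illustrative, and because the inputs are already rounded, the results should only be expected to match the report to about two decimals.

```python
# Per-class F1 and support copied from the classification report above.
f1_scores = [0.81, 0.65, 0.29, 0.89]
supports = [173, 31, 14, 210]

# Macro average: every class counts the same, regardless of its size.
macro_f1 = sum(f1_scores) / len(f1_scores)

# Weighted average: each class counts in proportion to its support.
total = sum(supports)
weighted_f1 = sum(f * s for f, s in zip(f1_scores, supports)) / total

print(f"macro F1:    {macro_f1:.2f}")
print(f"weighted F1: {weighted_f1:.2f}")
```

Classes 1 and 2 account for only 45 of the 428 test sentences, so their lower F1 scores barely move the weighted average but pull the macro average down.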



Best optimized result

    OrderedDict([('attention_probs_dropout_prob', 0.371179371670504),
             ('hidden_dropout_prob', 0.0),
             ('learning_rate', 8.835158172392285e-05),
             ('num_train_epochs', 6),
             ('per_device_train_batch_size', 32),
             ('pretrained_model', 'dccuchile/bert-base-spanish-wwm-cased')])

                precision    recall  f1-score   support

              0       0.75      0.89      0.82       173
              1       0.71      0.65      0.68        31
              2       0.50      0.36      0.42        14
              3       0.95      0.84      0.89       210

        accuracy                           0.83       428
      macro avg       0.73      0.68      0.70       428
    weighted avg       0.84      0.83      0.83       428



### 2.6. Basic embeddings model

For reference, we also report results for a simple model based on pre-trained word embeddings. It represents each text as the average of the embeddings of the words it contains.

Install spacy model

In [None]:
!python -m spacy download en_core_web_lg

2021-02-07 10:20:21.230153: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
Collecting en-core-web-lg==3.0.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.0.0/en_core_web_lg-3.0.0-py3-none-any.whl (778.8MB)
Installing collected packages: en-core-web-lg
Successfully installed en-core-web-lg-3.0.0
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_lg')


Transform texts to embedding vectors

In [None]:
import numpy as np
import spacy

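def vectorize_texts(texts, spacy_model='en_core_web_lg'):
    # Load the spaCy pipeline once and reuse it for every text
    nlp = spacy.load(spacy_model)
    return np.array([vectorize_text(text, nlp) for text in texts])

def vectorize_text(text, nlp):
    """Return the average of the vectors of all tokens that have one."""
    tokens = nlp(text)
    vector = np.zeros(nlp.vocab.vectors_length)
    n_vectors = 0
    for token in tokens:
        if token.has_vector:
            vector += token.vector
            n_vectors += 1
    # Avoid dividing by zero when no token has a vector (or the text is empty)
    return vector / n_vectors if n_vectors else vector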

In [None]:
E_train = vectorize_texts(X_train)

In [None]:
E_train_oversampling = vectorize_texts(X_train_oversampling)

In [None]:
E_test = vectorize_texts(X_test)

#### 2.6.1. SpacyEmbeddings + RandomForestClassifier

In [None]:
spacy_rf_model = RandomForestClassifier(random_state=42, n_jobs = -1)

params = {
    'n_estimators' : RF_ESTIMATORS,
}

spacy_rf_gs_model = RandomizedSearchCV(spacy_rf_model, params, n_jobs = -1, cv=StratifiedKFold(), n_iter=TUNING_ITERATIONS, random_state=12345, verbose=2)

spacy_rf_gs_model.fit(E_train, Y_train)

print(spacy_rf_gs_model.best_params_)  

Fitting 5 folds for each of 3 candidates, totalling 15 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=-1)]: Done  15 out of  15 | elapsed:  1.0min finished


{'n_estimators': 1000}


In [None]:
spacy_rf_gs_preds = spacy_rf_gs_model.predict(E_test)
np.save('preds/spacy_rf_gs_preds', spacy_rf_gs_preds)
print(classification_report(Y_test, spacy_rf_gs_preds))

              precision    recall  f1-score   support

           0       0.51      0.59      0.55       173
           1       0.00      0.00      0.00        31
           2       0.00      0.00      0.00        14
           3       0.59      0.64      0.62       210

    accuracy                           0.55       428
   macro avg       0.28      0.31      0.29       428
weighted avg       0.50      0.55      0.52       428





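This modelling strategy gives very poor results: classes 1 and 2 are never predicted correctly (F1 of 0.00) and the macro-averaged F1 is only 0.29. Note that `en_core_web_lg` is an English pipeline, whereas the corpus is Spanish, which may partly explain the low scores.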

### 2.7. Word embeddings + recurrent mixing model

We further explore the previous results by developing a more advanced embeddings-based model, which mixes word embeddings through a Gated Recurrent Unit (GRU) layer.
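
To make the idea of "mixing" concrete, here is a minimal NumPy sketch of a GRU pass over a sequence of word vectors, followed by average pooling over time steps. It uses the textbook GRU equations without bias terms, toy dimensions, and random weights; all names and sizes are illustrative assumptions. The Keras model defined further below uses the library's own `GRU` layer, whose internals may differ in detail.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, params):
    """One GRU update for a single time step (no biases, for brevity)."""
    W_z, U_z, W_r, U_r, W_h, U_h = params
    z = sigmoid(W_z @ x_t + U_z @ h_prev)             # update gate
    r = sigmoid(W_r @ x_t + U_r @ h_prev)             # reset gate
    h_cand = np.tanh(W_h @ x_t + U_h @ (r * h_prev))  # candidate state
    return z * h_prev + (1.0 - z) * h_cand            # blend old and new state

rng = np.random.default_rng(0)
emb_dim, units, seq_len = 4, 3, 5  # toy sizes, not the notebook's

# Random toy weights in the order W_z, U_z, W_r, U_r, W_h, U_h.
params = []
for _ in range(3):
    params.append(rng.normal(scale=0.5, size=(units, emb_dim)))  # input weights
    params.append(rng.normal(scale=0.5, size=(units, units)))    # recurrent weights

# Stand-in for the embeddings of a 5-word sentence.
sentence = rng.normal(size=(seq_len, emb_dim))

h = np.zeros(units)
states = []
for x_t in sentence:
    h = gru_step(x_t, h, params)  # each state depends on all previous words
    states.append(h)

states = np.stack(states)              # shape (seq_len, units)
sentence_vector = states.mean(axis=0)  # average pooling over time steps
print(states.shape, sentence_vector.shape)
```

Unlike the plain average of word vectors used in section 2.6, each hidden state here depends on the words that precede it, so word order can influence the final sentence vector.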

First, load the fastText embeddings for Spanish.

In [None]:
import fasttext.util

fasttext.util.download_model('es', if_exists='ignore')
ft = fasttext.load_model('cc.es.300.bin')

Downloading https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.es.300.bin.gz




The following code defines a model with fasttext embeddings followed by a GRU mixing layer.

In [None]:
from keras.models import Model
from keras.layers import Embedding
from keras.layers import Bidirectional, Dense, GlobalAveragePooling1D, GRU, Input, SpatialDropout1D, Dropout
from keras.preprocessing.sequence import pad_sequences 
from keras.preprocessing.text import Tokenizer
from keras.utils import to_categorical
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.metrics import f1_score

class FastTextGRUClassifier(BaseEstimator, ClassifierMixin):
    """Model implementing FastText embeddings for words followed by a GRU recurrent deep network"""
    def __init__(self, spatial_dropout=0.2, gru_layers=1, gru_units=80, gru_dropout=0.0, dense_layers=1, dense_units=16, dense_dropout=0.0,
                 max_tokenizer_words=15000, max_sequence=45, epochs=1, batch_size=64):
        self.spatial_dropout = spatial_dropout
        self.gru_layers = gru_layers
        self.gru_units = gru_units
        self.gru_dropout = gru_dropout
        self.dense_layers = dense_layers
        self.dense_units = dense_units
        self.dense_dropout = dense_dropout
        self.max_tokenizer_words = max_tokenizer_words
        self.max_sequence = max_sequence
        self.epochs = epochs
        self.batch_size = batch_size

    def _create_embedding_matrix(self, fasttext_model):
        """Creates a weight matrix for an Embedding layer using a fasttext_model and a Tokenizer"""
        
        # Compute mean and standard deviation for words embeddings
        all_embs = np.stack([
            fasttext_model.get_word_vector(word) for word in self._tokenizer.word_index.keys()
        ])

        emb_mean, emb_std = all_embs.mean(), all_embs.std()
        
        embedding_size = fasttext_model.get_dimension()
        
        embedding_matrix = np.random.normal(emb_mean, emb_std, (self._tokenizer.num_words, embedding_size))
        for word, i in self._tokenizer.word_index.items():
            if i >= self._tokenizer.num_words: break
            embedding_vector = fasttext_model.get_word_vector(word)
            if embedding_vector is not None: 
                embedding_matrix[i] = embedding_vector
                
        return embedding_matrix

    def fit(self, X, y, fasttext_model):
        # Prepare tokenizer
        self._tokenizer = Tokenizer(num_words=self.max_tokenizer_words)
        self._tokenizer.fit_on_texts(X)

        # Transform texts to sequences
        tokenized = self._tokenizer.texts_to_sequences(X)
        sequences = pad_sequences(tokenized, maxlen=self.max_sequence)

        # One-hot encode output labels
        y_hot = to_categorical(y)

        # Build embeddings matrix
        embedding_matrix = self._create_embedding_matrix(fasttext_model)

        # Build neural network
        inp = Input(shape=(self.max_sequence, ))
        x = Embedding(self.max_tokenizer_words, embedding_matrix.shape[1], weights=[embedding_matrix], trainable=False)(inp)
        x = SpatialDropout1D(self.spatial_dropout)(x)
        x = Bidirectional(GRU(self.gru_units, dropout=self.gru_dropout, return_sequences=True))(x)
        for _ in range(1, self.gru_layers):
            x = GRU(self.gru_units, dropout=self.gru_dropout, return_sequences=True)(x)
        x = GlobalAveragePooling1D()(x)
        for _ in range(self.dense_layers):
            x = Dense(self.dense_units, activation="relu")(x)
            x = Dropout(self.dense_dropout)(x)
        x = Dense(len(set(y)), activation="softmax")(x)
        model = Model(inputs=inp, outputs=x)
        model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=["accuracy"])

        model.fit(sequences, y_hot, batch_size=self.batch_size, epochs=self.epochs, verbose=0)
        self._model = model
        return self  # scikit-learn convention: fit returns the estimator

    def predict(self, X):
        # Transform texts to sequences
        tokenized = self._tokenizer.texts_to_sequences(X)
        sequences = pad_sequences(tokenized, maxlen=self.max_sequence)

        # Predict with model
        preds = self._model.predict(sequences)
        return np.argmax(preds, axis=1)

    def score(self, X, y):
        preds = self.predict(X)
        return f1_score(y, preds, average="macro")
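
The embedding-matrix step in the class above can be tried in isolation. The sketch below replaces the real fastText model with a hypothetical stub (`StubFastText`, toy 3-dimensional vectors) and a hand-written word index, so it runs without Keras or fastText. It is an illustration of the indexing logic under those assumptions, not the notebook's exact code.

```python
import numpy as np

class StubFastText:
    """Hypothetical stand-in for a loaded fastText model (toy 3-d vectors)."""
    def __init__(self, dim=3):
        self._dim = dim

    def get_dimension(self):
        return self._dim

    def get_word_vector(self, word):
        # Deterministic toy vector: the word length repeated `dim` times.
        return np.full(self._dim, float(len(word)))

# Word index as a Keras Tokenizer would build it: 1-based, most frequent first.
word_index = {"se": 1, "casa": 2, "vende": 3, "hoy": 4}
num_words = 3  # only indices below 3 are kept, as with Tokenizer(num_words=3)

model = StubFastText()
rng = np.random.default_rng(0)

# Start from random rows, then overwrite the rows of the kept words.
matrix = rng.normal(0.0, 1.0, size=(num_words, model.get_dimension()))
for word, i in word_index.items():
    if i >= num_words:
        continue  # word falls outside the kept vocabulary
    matrix[i] = model.get_word_vector(word)

print(matrix)
```

Because the word index starts at 1, row 0 is never overwritten and keeps its random initialization. In the class above, `pad_sequences` pads with 0 by default, so padded positions are looked up in that row.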

Basic GRU (default parameters, trained for 20 epochs)

In [None]:
gru_model = FastTextGRUClassifier(epochs=20)
gru_model.fit(X_train, Y_train, ft)

In [None]:
gru_preds = gru_model.predict(X_test)
np.save('preds/gru_preds', gru_preds)
print(classification_report(Y_test, gru_preds))

              precision    recall  f1-score   support

           0       0.59      0.68      0.63       173
           1       0.00      0.00      0.00        31
           2       0.00      0.00      0.00        14
           3       0.70      0.76      0.73       210

    accuracy                           0.65       428
   macro avg       0.32      0.36      0.34       428
weighted avg       0.58      0.65      0.61       428





Hyperparameter optimization
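
Each candidate configuration in the search below is evaluated with 5-fold cross-validation, and candidates are compared by their mean fold score. Since `FastTextGRUClassifier.score` returns the macro-averaged F1 and no other scoring is passed to the search, the `score=` values in the log should be macro F1. As a small illustration, the following sketch averages the fold scores copied from the log for two of the candidates; the dictionary keys are abbreviated labels chosen here for readability.

```python
from statistics import mean

# Fold scores (macro F1) copied from the search log for two candidates.
fold_scores = {
    "dense_units=256, gru_units=64, epochs=109": [0.286, 0.316, 0.317, 0.326, 0.328],
    "dense_units=16, gru_units=32, epochs=107": [0.156, 0.156, 0.156, 0.155, 0.155],
}

# A candidate's overall score is the mean over its five folds.
mean_scores = {name: mean(scores) for name, scores in fold_scores.items()}
best = max(mean_scores, key=mean_scores.get)

for name, score in mean_scores.items():
    print(f"{name}: mean macro F1 = {score:.3f}")
print("best:", best)
```

The second candidate combines very high dropout values (above 0.8 for both the spatial and dense dropout), and its fold scores are roughly half those of the first.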

In [None]:
%%time
from skopt import BayesSearchCV
from skopt.space.space import Integer, Real
from sklearn.model_selection import StratifiedKFold

param_grid = {
    "spatial_dropout": Real(0, 0.9, "uniform"),
    "gru_layers": [1, 2, 3],
    "gru_units": [16, 32, 64, 128, 256, 512, 1024],
    "gru_dropout": Real(0, 0.9, "uniform"),
    "dense_layers": [1, 2, 3],
    "dense_units" : [16, 32, 64, 128, 256, 512, 1024],
    "dense_dropout": Real(0, 0.9, "uniform"),
    "epochs": Integer(50, 200, "uniform"),
}

cv_method = StratifiedKFold()

gru_bs_model = BayesSearchCV(FastTextGRUClassifier(), param_grid, n_iter=30, verbose=3, cv=cv_method, random_state=12345, error_score=0.0, fit_params={"fasttext_model": ft})
gru_bs_model.fit(X_train, Y_train)

Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] dense_dropout=0.36712704740889385, dense_layers=3, dense_units=256, epochs=109, gru_dropout=0.7118832782946124, gru_layers=3, gru_units=64, spatial_dropout=0.5377325391546796 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  dense_dropout=0.36712704740889385, dense_layers=3, dense_units=256, epochs=109, gru_dropout=0.7118832782946124, gru_layers=3, gru_units=64, spatial_dropout=0.5377325391546796, score=0.286, total=  37.2s
[CV] dense_dropout=0.36712704740889385, dense_layers=3, dense_units=256, epochs=109, gru_dropout=0.7118832782946124, gru_layers=3, gru_units=64, spatial_dropout=0.5377325391546796 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:   37.2s remaining:    0.0s


[CV]  dense_dropout=0.36712704740889385, dense_layers=3, dense_units=256, epochs=109, gru_dropout=0.7118832782946124, gru_layers=3, gru_units=64, spatial_dropout=0.5377325391546796, score=0.316, total=  37.3s
[CV] dense_dropout=0.36712704740889385, dense_layers=3, dense_units=256, epochs=109, gru_dropout=0.7118832782946124, gru_layers=3, gru_units=64, spatial_dropout=0.5377325391546796 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:  1.2min remaining:    0.0s


[CV]  dense_dropout=0.36712704740889385, dense_layers=3, dense_units=256, epochs=109, gru_dropout=0.7118832782946124, gru_layers=3, gru_units=64, spatial_dropout=0.5377325391546796, score=0.317, total=  36.9s
[CV] dense_dropout=0.36712704740889385, dense_layers=3, dense_units=256, epochs=109, gru_dropout=0.7118832782946124, gru_layers=3, gru_units=64, spatial_dropout=0.5377325391546796 
[CV]  dense_dropout=0.36712704740889385, dense_layers=3, dense_units=256, epochs=109, gru_dropout=0.7118832782946124, gru_layers=3, gru_units=64, spatial_dropout=0.5377325391546796, score=0.326, total=  38.3s
[CV] dense_dropout=0.36712704740889385, dense_layers=3, dense_units=256, epochs=109, gru_dropout=0.7118832782946124, gru_layers=3, gru_units=64, spatial_dropout=0.5377325391546796 
[CV]  dense_dropout=0.36712704740889385, dense_layers=3, dense_units=256, epochs=109, gru_dropout=0.7118832782946124, gru_layers=3, gru_units=64, spatial_dropout=0.5377325391546796, score=0.328, total=  38.1s
Fitting 5 f

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:  3.1min finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  dense_dropout=0.4626460576204746, dense_layers=2, dense_units=64, epochs=70, gru_dropout=0.13937691274336583, gru_layers=1, gru_units=32, spatial_dropout=0.3450716700768391, score=0.307, total=  16.7s
[CV] dense_dropout=0.4626460576204746, dense_layers=2, dense_units=64, epochs=70, gru_dropout=0.13937691274336583, gru_layers=1, gru_units=32, spatial_dropout=0.3450716700768391 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:   16.7s remaining:    0.0s


[CV]  dense_dropout=0.4626460576204746, dense_layers=2, dense_units=64, epochs=70, gru_dropout=0.13937691274336583, gru_layers=1, gru_units=32, spatial_dropout=0.3450716700768391, score=0.318, total=  16.8s
[CV] dense_dropout=0.4626460576204746, dense_layers=2, dense_units=64, epochs=70, gru_dropout=0.13937691274336583, gru_layers=1, gru_units=32, spatial_dropout=0.3450716700768391 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:   33.5s remaining:    0.0s


[CV]  dense_dropout=0.4626460576204746, dense_layers=2, dense_units=64, epochs=70, gru_dropout=0.13937691274336583, gru_layers=1, gru_units=32, spatial_dropout=0.3450716700768391, score=0.315, total=  17.1s
[CV] dense_dropout=0.4626460576204746, dense_layers=2, dense_units=64, epochs=70, gru_dropout=0.13937691274336583, gru_layers=1, gru_units=32, spatial_dropout=0.3450716700768391 
[CV]  dense_dropout=0.4626460576204746, dense_layers=2, dense_units=64, epochs=70, gru_dropout=0.13937691274336583, gru_layers=1, gru_units=32, spatial_dropout=0.3450716700768391, score=0.330, total=  16.2s
[CV] dense_dropout=0.4626460576204746, dense_layers=2, dense_units=64, epochs=70, gru_dropout=0.13937691274336583, gru_layers=1, gru_units=32, spatial_dropout=0.3450716700768391 
[CV]  dense_dropout=0.4626460576204746, dense_layers=2, dense_units=64, epochs=70, gru_dropout=0.13937691274336583, gru_layers=1, gru_units=32, spatial_dropout=0.3450716700768391, score=0.332, total=  17.1s
Fitting 5 folds for e

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:  1.4min finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  dense_dropout=0.041508544006946244, dense_layers=2, dense_units=1024, epochs=130, gru_dropout=0.37781661913068065, gru_layers=2, gru_units=32, spatial_dropout=0.697509239293135, score=0.323, total=  34.9s
[CV] dense_dropout=0.041508544006946244, dense_layers=2, dense_units=1024, epochs=130, gru_dropout=0.37781661913068065, gru_layers=2, gru_units=32, spatial_dropout=0.697509239293135 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:   34.9s remaining:    0.0s


[CV]  dense_dropout=0.041508544006946244, dense_layers=2, dense_units=1024, epochs=130, gru_dropout=0.37781661913068065, gru_layers=2, gru_units=32, spatial_dropout=0.697509239293135, score=0.327, total=  34.8s
[CV] dense_dropout=0.041508544006946244, dense_layers=2, dense_units=1024, epochs=130, gru_dropout=0.37781661913068065, gru_layers=2, gru_units=32, spatial_dropout=0.697509239293135 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:  1.2min remaining:    0.0s


[CV]  dense_dropout=0.041508544006946244, dense_layers=2, dense_units=1024, epochs=130, gru_dropout=0.37781661913068065, gru_layers=2, gru_units=32, spatial_dropout=0.697509239293135, score=0.332, total=  34.9s
[CV] dense_dropout=0.041508544006946244, dense_layers=2, dense_units=1024, epochs=130, gru_dropout=0.37781661913068065, gru_layers=2, gru_units=32, spatial_dropout=0.697509239293135 
[CV]  dense_dropout=0.041508544006946244, dense_layers=2, dense_units=1024, epochs=130, gru_dropout=0.37781661913068065, gru_layers=2, gru_units=32, spatial_dropout=0.697509239293135, score=0.319, total=  34.4s
[CV] dense_dropout=0.041508544006946244, dense_layers=2, dense_units=1024, epochs=130, gru_dropout=0.37781661913068065, gru_layers=2, gru_units=32, spatial_dropout=0.697509239293135 
[CV]  dense_dropout=0.041508544006946244, dense_layers=2, dense_units=1024, epochs=130, gru_dropout=0.37781661913068065, gru_layers=2, gru_units=32, spatial_dropout=0.697509239293135, score=0.314, total=  34.9s
F

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:  2.9min finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  dense_dropout=0.22314680177365268, dense_layers=3, dense_units=256, epochs=176, gru_dropout=0.44157822614380576, gru_layers=3, gru_units=32, spatial_dropout=0.0033130250892628894, score=0.312, total=  55.4s
[CV] dense_dropout=0.22314680177365268, dense_layers=3, dense_units=256, epochs=176, gru_dropout=0.44157822614380576, gru_layers=3, gru_units=32, spatial_dropout=0.0033130250892628894 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:   55.4s remaining:    0.0s


[CV]  dense_dropout=0.22314680177365268, dense_layers=3, dense_units=256, epochs=176, gru_dropout=0.44157822614380576, gru_layers=3, gru_units=32, spatial_dropout=0.0033130250892628894, score=0.336, total=  55.2s
[CV] dense_dropout=0.22314680177365268, dense_layers=3, dense_units=256, epochs=176, gru_dropout=0.44157822614380576, gru_layers=3, gru_units=32, spatial_dropout=0.0033130250892628894 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:  1.8min remaining:    0.0s


[CV]  dense_dropout=0.22314680177365268, dense_layers=3, dense_units=256, epochs=176, gru_dropout=0.44157822614380576, gru_layers=3, gru_units=32, spatial_dropout=0.0033130250892628894, score=0.323, total=  55.1s
[CV] dense_dropout=0.22314680177365268, dense_layers=3, dense_units=256, epochs=176, gru_dropout=0.44157822614380576, gru_layers=3, gru_units=32, spatial_dropout=0.0033130250892628894 
[CV]  dense_dropout=0.22314680177365268, dense_layers=3, dense_units=256, epochs=176, gru_dropout=0.44157822614380576, gru_layers=3, gru_units=32, spatial_dropout=0.0033130250892628894, score=0.341, total=  55.9s
[CV] dense_dropout=0.22314680177365268, dense_layers=3, dense_units=256, epochs=176, gru_dropout=0.44157822614380576, gru_layers=3, gru_units=32, spatial_dropout=0.0033130250892628894 
[CV]  dense_dropout=0.22314680177365268, dense_layers=3, dense_units=256, epochs=176, gru_dropout=0.44157822614380576, gru_layers=3, gru_units=32, spatial_dropout=0.0033130250892628894, score=0.298, total

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:  4.6min finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  dense_dropout=0.8685989173436767, dense_layers=2, dense_units=16, epochs=107, gru_dropout=0.5328882997083877, gru_layers=3, gru_units=32, spatial_dropout=0.8449996785268066, score=0.156, total=  35.9s
[CV] dense_dropout=0.8685989173436767, dense_layers=2, dense_units=16, epochs=107, gru_dropout=0.5328882997083877, gru_layers=3, gru_units=32, spatial_dropout=0.8449996785268066 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:   35.9s remaining:    0.0s


[CV]  dense_dropout=0.8685989173436767, dense_layers=2, dense_units=16, epochs=107, gru_dropout=0.5328882997083877, gru_layers=3, gru_units=32, spatial_dropout=0.8449996785268066, score=0.156, total=  37.1s
[CV] dense_dropout=0.8685989173436767, dense_layers=2, dense_units=16, epochs=107, gru_dropout=0.5328882997083877, gru_layers=3, gru_units=32, spatial_dropout=0.8449996785268066 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:  1.2min remaining:    0.0s


[CV]  dense_dropout=0.8685989173436767, dense_layers=2, dense_units=16, epochs=107, gru_dropout=0.5328882997083877, gru_layers=3, gru_units=32, spatial_dropout=0.8449996785268066, score=0.156, total=  36.6s
[CV] dense_dropout=0.8685989173436767, dense_layers=2, dense_units=16, epochs=107, gru_dropout=0.5328882997083877, gru_layers=3, gru_units=32, spatial_dropout=0.8449996785268066 
[CV]  dense_dropout=0.8685989173436767, dense_layers=2, dense_units=16, epochs=107, gru_dropout=0.5328882997083877, gru_layers=3, gru_units=32, spatial_dropout=0.8449996785268066, score=0.155, total=  36.7s
[CV] dense_dropout=0.8685989173436767, dense_layers=2, dense_units=16, epochs=107, gru_dropout=0.5328882997083877, gru_layers=3, gru_units=32, spatial_dropout=0.8449996785268066 
[CV]  dense_dropout=0.8685989173436767, dense_layers=2, dense_units=16, epochs=107, gru_dropout=0.5328882997083877, gru_layers=3, gru_units=32, spatial_dropout=0.8449996785268066, score=0.155, total=  36.3s
Fitting 5 folds for e

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:  3.0min finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  dense_dropout=0.3691514693845362, dense_layers=3, dense_units=16, epochs=179, gru_dropout=0.6137651209585483, gru_layers=1, gru_units=32, spatial_dropout=0.7269519404781556, score=0.294, total=  36.0s
[CV] dense_dropout=0.3691514693845362, dense_layers=3, dense_units=16, epochs=179, gru_dropout=0.6137651209585483, gru_layers=1, gru_units=32, spatial_dropout=0.7269519404781556 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:   36.0s remaining:    0.0s


[CV]  dense_dropout=0.3691514693845362, dense_layers=3, dense_units=16, epochs=179, gru_dropout=0.6137651209585483, gru_layers=1, gru_units=32, spatial_dropout=0.7269519404781556, score=0.300, total=  36.4s
[CV] dense_dropout=0.3691514693845362, dense_layers=3, dense_units=16, epochs=179, gru_dropout=0.6137651209585483, gru_layers=1, gru_units=32, spatial_dropout=0.7269519404781556 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:  1.2min remaining:    0.0s


[CV]  dense_dropout=0.3691514693845362, dense_layers=3, dense_units=16, epochs=179, gru_dropout=0.6137651209585483, gru_layers=1, gru_units=32, spatial_dropout=0.7269519404781556, score=0.310, total=  36.5s
[CV] dense_dropout=0.3691514693845362, dense_layers=3, dense_units=16, epochs=179, gru_dropout=0.6137651209585483, gru_layers=1, gru_units=32, spatial_dropout=0.7269519404781556 
[CV]  dense_dropout=0.3691514693845362, dense_layers=3, dense_units=16, epochs=179, gru_dropout=0.6137651209585483, gru_layers=1, gru_units=32, spatial_dropout=0.7269519404781556, score=0.300, total=  35.8s
[CV] dense_dropout=0.3691514693845362, dense_layers=3, dense_units=16, epochs=179, gru_dropout=0.6137651209585483, gru_layers=1, gru_units=32, spatial_dropout=0.7269519404781556 
[CV]  dense_dropout=0.3691514693845362, dense_layers=3, dense_units=16, epochs=179, gru_dropout=0.6137651209585483, gru_layers=1, gru_units=32, spatial_dropout=0.7269519404781556, score=0.313, total=  36.2s
Fitting 5 folds for e

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:  3.0min finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  dense_dropout=0.03317383878440064, dense_layers=2, dense_units=512, epochs=86, gru_dropout=0.7415784521982055, gru_layers=3, gru_units=64, spatial_dropout=0.17930928606619123, score=0.298, total=  31.2s
[CV] dense_dropout=0.03317383878440064, dense_layers=2, dense_units=512, epochs=86, gru_dropout=0.7415784521982055, gru_layers=3, gru_units=64, spatial_dropout=0.17930928606619123 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:   31.2s remaining:    0.0s


[CV]  dense_dropout=0.03317383878440064, dense_layers=2, dense_units=512, epochs=86, gru_dropout=0.7415784521982055, gru_layers=3, gru_units=64, spatial_dropout=0.17930928606619123, score=0.307, total=  31.1s
[CV] dense_dropout=0.03317383878440064, dense_layers=2, dense_units=512, epochs=86, gru_dropout=0.7415784521982055, gru_layers=3, gru_units=64, spatial_dropout=0.17930928606619123 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:  1.0min remaining:    0.0s


[CV]  dense_dropout=0.03317383878440064, dense_layers=2, dense_units=512, epochs=86, gru_dropout=0.7415784521982055, gru_layers=3, gru_units=64, spatial_dropout=0.17930928606619123, score=0.303, total=  31.4s
[CV] dense_dropout=0.03317383878440064, dense_layers=2, dense_units=512, epochs=86, gru_dropout=0.7415784521982055, gru_layers=3, gru_units=64, spatial_dropout=0.17930928606619123 
[CV]  dense_dropout=0.03317383878440064, dense_layers=2, dense_units=512, epochs=86, gru_dropout=0.7415784521982055, gru_layers=3, gru_units=64, spatial_dropout=0.17930928606619123, score=0.354, total=  30.7s
[CV] dense_dropout=0.03317383878440064, dense_layers=2, dense_units=512, epochs=86, gru_dropout=0.7415784521982055, gru_layers=3, gru_units=64, spatial_dropout=0.17930928606619123 
[CV]  dense_dropout=0.03317383878440064, dense_layers=2, dense_units=512, epochs=86, gru_dropout=0.7415784521982055, gru_layers=3, gru_units=64, spatial_dropout=0.17930928606619123, score=0.313, total=  30.7s
Fitting 5 f

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:  2.6min finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  dense_dropout=0.8431661252980369, dense_layers=2, dense_units=256, epochs=60, gru_dropout=0.5349015985256025, gru_layers=2, gru_units=16, spatial_dropout=0.8088179031533371, score=0.242, total=  19.7s
[CV] dense_dropout=0.8431661252980369, dense_layers=2, dense_units=256, epochs=60, gru_dropout=0.5349015985256025, gru_layers=2, gru_units=16, spatial_dropout=0.8088179031533371 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:   19.7s remaining:    0.0s


[CV]  dense_dropout=0.8431661252980369, dense_layers=2, dense_units=256, epochs=60, gru_dropout=0.5349015985256025, gru_layers=2, gru_units=16, spatial_dropout=0.8088179031533371, score=0.272, total=  19.1s
[CV] dense_dropout=0.8431661252980369, dense_layers=2, dense_units=256, epochs=60, gru_dropout=0.5349015985256025, gru_layers=2, gru_units=16, spatial_dropout=0.8088179031533371 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:   38.8s remaining:    0.0s


[CV]  dense_dropout=0.8431661252980369, dense_layers=2, dense_units=256, epochs=60, gru_dropout=0.5349015985256025, gru_layers=2, gru_units=16, spatial_dropout=0.8088179031533371, score=0.289, total=  19.4s
[CV] dense_dropout=0.8431661252980369, dense_layers=2, dense_units=256, epochs=60, gru_dropout=0.5349015985256025, gru_layers=2, gru_units=16, spatial_dropout=0.8088179031533371 
[CV]  dense_dropout=0.8431661252980369, dense_layers=2, dense_units=256, epochs=60, gru_dropout=0.5349015985256025, gru_layers=2, gru_units=16, spatial_dropout=0.8088179031533371, score=0.265, total=  19.3s
[CV] dense_dropout=0.8431661252980369, dense_layers=2, dense_units=256, epochs=60, gru_dropout=0.5349015985256025, gru_layers=2, gru_units=16, spatial_dropout=0.8088179031533371 
[CV]  dense_dropout=0.8431661252980369, dense_layers=2, dense_units=256, epochs=60, gru_dropout=0.5349015985256025, gru_layers=2, gru_units=16, spatial_dropout=0.8088179031533371, score=0.285, total=  19.2s
Fitting 5 folds for e

[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:  1.6min finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  dense_dropout=0.21035384276262642, dense_layers=2, dense_units=128, epochs=106, gru_dropout=0.5565299474116303, gru_layers=2, gru_units=32, spatial_dropout=0.43430233522902123, score=0.319, total=  29.6s
[CV] dense_dropout=0.21035384276262642, dense_layers=2, dense_units=128, epochs=106, gru_dropout=0.5565299474116303, gru_layers=2, gru_units=32, spatial_dropout=0.43430233522902123 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:   29.6s remaining:    0.0s


[CV]  dense_dropout=0.21035384276262642, dense_layers=2, dense_units=128, epochs=106, gru_dropout=0.5565299474116303, gru_layers=2, gru_units=32, spatial_dropout=0.43430233522902123, score=0.331, total=  29.6s
[CV] dense_dropout=0.21035384276262642, dense_layers=2, dense_units=128, epochs=106, gru_dropout=0.5565299474116303, gru_layers=2, gru_units=32, spatial_dropout=0.43430233522902123 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:   59.2s remaining:    0.0s


[CV]  dense_dropout=0.21035384276262642, dense_layers=2, dense_units=128, epochs=106, gru_dropout=0.5565299474116303, gru_layers=2, gru_units=32, spatial_dropout=0.43430233522902123, score=0.316, total=  29.6s
[CV] dense_dropout=0.21035384276262642, dense_layers=2, dense_units=128, epochs=106, gru_dropout=0.5565299474116303, gru_layers=2, gru_units=32, spatial_dropout=0.43430233522902123 
[CV]  dense_dropout=0.21035384276262642, dense_layers=2, dense_units=128, epochs=106, gru_dropout=0.5565299474116303, gru_layers=2, gru_units=32, spatial_dropout=0.43430233522902123, score=0.319, total=  29.3s
[CV] dense_dropout=0.21035384276262642, dense_layers=2, dense_units=128, epochs=106, gru_dropout=0.5565299474116303, gru_layers=2, gru_units=32, spatial_dropout=0.43430233522902123 
[CV]  dense_dropout=0.21035384276262642, dense_layers=2, dense_units=128, epochs=106, gru_dropout=0.5565299474116303, gru_layers=2, gru_units=32, spatial_dropout=0.43430233522902123, score=0.300, total=  29.6s


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:  2.5min finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] dense_dropout=0.6629296110884636, dense_layers=2, dense_units=32, epochs=166, gru_dropout=0.6431953593736593, gru_layers=2, gru_units=256, spatial_dropout=0.42356806707829314 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  dense_dropout=0.6629296110884636, dense_layers=2, dense_units=32, epochs=166, gru_dropout=0.6431953593736593, gru_layers=2, gru_units=256, spatial_dropout=0.42356806707829314, score=0.292, total= 1.1min
[CV] dense_dropout=0.6629296110884636, dense_layers=2, dense_units=32, epochs=166, gru_dropout=0.6431953593736593, gru_layers=2, gru_units=256, spatial_dropout=0.42356806707829314 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:  1.1min remaining:    0.0s


[CV]  dense_dropout=0.6629296110884636, dense_layers=2, dense_units=32, epochs=166, gru_dropout=0.6431953593736593, gru_layers=2, gru_units=256, spatial_dropout=0.42356806707829314, score=0.309, total= 1.1min
[CV] dense_dropout=0.6629296110884636, dense_layers=2, dense_units=32, epochs=166, gru_dropout=0.6431953593736593, gru_layers=2, gru_units=256, spatial_dropout=0.42356806707829314 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:  2.2min remaining:    0.0s


[CV]  dense_dropout=0.6629296110884636, dense_layers=2, dense_units=32, epochs=166, gru_dropout=0.6431953593736593, gru_layers=2, gru_units=256, spatial_dropout=0.42356806707829314, score=0.318, total= 1.1min
[CV] dense_dropout=0.6629296110884636, dense_layers=2, dense_units=32, epochs=166, gru_dropout=0.6431953593736593, gru_layers=2, gru_units=256, spatial_dropout=0.42356806707829314 
[CV]  dense_dropout=0.6629296110884636, dense_layers=2, dense_units=32, epochs=166, gru_dropout=0.6431953593736593, gru_layers=2, gru_units=256, spatial_dropout=0.42356806707829314, score=0.323, total= 1.1min
[CV] dense_dropout=0.6629296110884636, dense_layers=2, dense_units=32, epochs=166, gru_dropout=0.6431953593736593, gru_layers=2, gru_units=256, spatial_dropout=0.42356806707829314 
[CV]  dense_dropout=0.6629296110884636, dense_layers=2, dense_units=32, epochs=166, gru_dropout=0.6431953593736593, gru_layers=2, gru_units=256, spatial_dropout=0.42356806707829314, score=0.330, total= 1.1min


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:  5.6min finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] dense_dropout=0.6162658467335622, dense_layers=2, dense_units=256, epochs=144, gru_dropout=0.5062365776729225, gru_layers=1, gru_units=256, spatial_dropout=0.2366403623296717 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  dense_dropout=0.6162658467335622, dense_layers=2, dense_units=256, epochs=144, gru_dropout=0.5062365776729225, gru_layers=1, gru_units=256, spatial_dropout=0.2366403623296717, score=0.279, total=  39.4s
[CV] dense_dropout=0.6162658467335622, dense_layers=2, dense_units=256, epochs=144, gru_dropout=0.5062365776729225, gru_layers=1, gru_units=256, spatial_dropout=0.2366403623296717 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:   39.4s remaining:    0.0s


[CV]  dense_dropout=0.6162658467335622, dense_layers=2, dense_units=256, epochs=144, gru_dropout=0.5062365776729225, gru_layers=1, gru_units=256, spatial_dropout=0.2366403623296717, score=0.359, total=  39.9s
[CV] dense_dropout=0.6162658467335622, dense_layers=2, dense_units=256, epochs=144, gru_dropout=0.5062365776729225, gru_layers=1, gru_units=256, spatial_dropout=0.2366403623296717 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:  1.3min remaining:    0.0s


[CV]  dense_dropout=0.6162658467335622, dense_layers=2, dense_units=256, epochs=144, gru_dropout=0.5062365776729225, gru_layers=1, gru_units=256, spatial_dropout=0.2366403623296717, score=0.316, total=  39.4s
[CV] dense_dropout=0.6162658467335622, dense_layers=2, dense_units=256, epochs=144, gru_dropout=0.5062365776729225, gru_layers=1, gru_units=256, spatial_dropout=0.2366403623296717 
[CV]  dense_dropout=0.6162658467335622, dense_layers=2, dense_units=256, epochs=144, gru_dropout=0.5062365776729225, gru_layers=1, gru_units=256, spatial_dropout=0.2366403623296717, score=0.369, total=  39.9s
[CV] dense_dropout=0.6162658467335622, dense_layers=2, dense_units=256, epochs=144, gru_dropout=0.5062365776729225, gru_layers=1, gru_units=256, spatial_dropout=0.2366403623296717 
[CV]  dense_dropout=0.6162658467335622, dense_layers=2, dense_units=256, epochs=144, gru_dropout=0.5062365776729225, gru_layers=1, gru_units=256, spatial_dropout=0.2366403623296717, score=0.338, total=  39.5s


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:  3.3min finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] dense_dropout=0.0, dense_layers=3, dense_units=512, epochs=112, gru_dropout=0.0, gru_layers=1, gru_units=512, spatial_dropout=0.856983285449873 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  dense_dropout=0.0, dense_layers=3, dense_units=512, epochs=112, gru_dropout=0.0, gru_layers=1, gru_units=512, spatial_dropout=0.856983285449873, score=0.274, total=  59.8s
[CV] dense_dropout=0.0, dense_layers=3, dense_units=512, epochs=112, gru_dropout=0.0, gru_layers=1, gru_units=512, spatial_dropout=0.856983285449873 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:   59.8s remaining:    0.0s


[CV]  dense_dropout=0.0, dense_layers=3, dense_units=512, epochs=112, gru_dropout=0.0, gru_layers=1, gru_units=512, spatial_dropout=0.856983285449873, score=0.320, total=  59.4s
[CV] dense_dropout=0.0, dense_layers=3, dense_units=512, epochs=112, gru_dropout=0.0, gru_layers=1, gru_units=512, spatial_dropout=0.856983285449873 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:  2.0min remaining:    0.0s


[CV]  dense_dropout=0.0, dense_layers=3, dense_units=512, epochs=112, gru_dropout=0.0, gru_layers=1, gru_units=512, spatial_dropout=0.856983285449873, score=0.283, total= 1.0min
[CV] dense_dropout=0.0, dense_layers=3, dense_units=512, epochs=112, gru_dropout=0.0, gru_layers=1, gru_units=512, spatial_dropout=0.856983285449873 
[CV]  dense_dropout=0.0, dense_layers=3, dense_units=512, epochs=112, gru_dropout=0.0, gru_layers=1, gru_units=512, spatial_dropout=0.856983285449873, score=0.351, total=  59.6s
[CV] dense_dropout=0.0, dense_layers=3, dense_units=512, epochs=112, gru_dropout=0.0, gru_layers=1, gru_units=512, spatial_dropout=0.856983285449873 
[CV]  dense_dropout=0.0, dense_layers=3, dense_units=512, epochs=112, gru_dropout=0.0, gru_layers=1, gru_units=512, spatial_dropout=0.856983285449873, score=0.287, total=  59.9s


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:  5.0min finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] dense_dropout=0.9, dense_layers=1, dense_units=16, epochs=50, gru_dropout=0.9, gru_layers=1, gru_units=1024, spatial_dropout=0.0 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  dense_dropout=0.9, dense_layers=1, dense_units=16, epochs=50, gru_dropout=0.9, gru_layers=1, gru_units=1024, spatial_dropout=0.0, score=0.156, total= 1.0min
[CV] dense_dropout=0.9, dense_layers=1, dense_units=16, epochs=50, gru_dropout=0.9, gru_layers=1, gru_units=1024, spatial_dropout=0.0 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:  1.0min remaining:    0.0s


[CV]  dense_dropout=0.9, dense_layers=1, dense_units=16, epochs=50, gru_dropout=0.9, gru_layers=1, gru_units=1024, spatial_dropout=0.0, score=0.273, total= 1.1min
[CV] dense_dropout=0.9, dense_layers=1, dense_units=16, epochs=50, gru_dropout=0.9, gru_layers=1, gru_units=1024, spatial_dropout=0.0 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:  2.1min remaining:    0.0s


[CV]  dense_dropout=0.9, dense_layers=1, dense_units=16, epochs=50, gru_dropout=0.9, gru_layers=1, gru_units=1024, spatial_dropout=0.0, score=0.222, total= 1.1min
[CV] dense_dropout=0.9, dense_layers=1, dense_units=16, epochs=50, gru_dropout=0.9, gru_layers=1, gru_units=1024, spatial_dropout=0.0 
[CV]  dense_dropout=0.9, dense_layers=1, dense_units=16, epochs=50, gru_dropout=0.9, gru_layers=1, gru_units=1024, spatial_dropout=0.0, score=0.155, total= 1.1min
[CV] dense_dropout=0.9, dense_layers=1, dense_units=16, epochs=50, gru_dropout=0.9, gru_layers=1, gru_units=1024, spatial_dropout=0.0 
[CV]  dense_dropout=0.9, dense_layers=1, dense_units=16, epochs=50, gru_dropout=0.9, gru_layers=1, gru_units=1024, spatial_dropout=0.0, score=0.229, total= 1.0min


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:  5.2min finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] dense_dropout=0.8390663304905417, dense_layers=1, dense_units=1024, epochs=200, gru_dropout=0.0, gru_layers=2, gru_units=1024, spatial_dropout=0.0 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  dense_dropout=0.8390663304905417, dense_layers=1, dense_units=1024, epochs=200, gru_dropout=0.0, gru_layers=2, gru_units=1024, spatial_dropout=0.0, score=0.371, total= 8.4min
[CV] dense_dropout=0.8390663304905417, dense_layers=1, dense_units=1024, epochs=200, gru_dropout=0.0, gru_layers=2, gru_units=1024, spatial_dropout=0.0 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:  8.4min remaining:    0.0s


[CV]  dense_dropout=0.8390663304905417, dense_layers=1, dense_units=1024, epochs=200, gru_dropout=0.0, gru_layers=2, gru_units=1024, spatial_dropout=0.0, score=0.379, total= 8.3min
[CV] dense_dropout=0.8390663304905417, dense_layers=1, dense_units=1024, epochs=200, gru_dropout=0.0, gru_layers=2, gru_units=1024, spatial_dropout=0.0 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed: 16.7min remaining:    0.0s


[CV]  dense_dropout=0.8390663304905417, dense_layers=1, dense_units=1024, epochs=200, gru_dropout=0.0, gru_layers=2, gru_units=1024, spatial_dropout=0.0, score=0.338, total= 8.3min
[CV] dense_dropout=0.8390663304905417, dense_layers=1, dense_units=1024, epochs=200, gru_dropout=0.0, gru_layers=2, gru_units=1024, spatial_dropout=0.0 
[CV]  dense_dropout=0.8390663304905417, dense_layers=1, dense_units=1024, epochs=200, gru_dropout=0.0, gru_layers=2, gru_units=1024, spatial_dropout=0.0, score=0.371, total= 8.4min
[CV] dense_dropout=0.8390663304905417, dense_layers=1, dense_units=1024, epochs=200, gru_dropout=0.0, gru_layers=2, gru_units=1024, spatial_dropout=0.0 
[CV]  dense_dropout=0.8390663304905417, dense_layers=1, dense_units=1024, epochs=200, gru_dropout=0.0, gru_layers=2, gru_units=1024, spatial_dropout=0.0, score=0.375, total= 8.3min


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed: 41.7min finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] dense_dropout=0.7402548530860971, dense_layers=2, dense_units=1024, epochs=57, gru_dropout=0.4202306826684827, gru_layers=1, gru_units=1024, spatial_dropout=0.7253027831577475 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  dense_dropout=0.7402548530860971, dense_layers=2, dense_units=1024, epochs=57, gru_dropout=0.4202306826684827, gru_layers=1, gru_units=1024, spatial_dropout=0.7253027831577475, score=0.299, total= 1.2min
[CV] dense_dropout=0.7402548530860971, dense_layers=2, dense_units=1024, epochs=57, gru_dropout=0.4202306826684827, gru_layers=1, gru_units=1024, spatial_dropout=0.7253027831577475 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:  1.2min remaining:    0.0s


[CV]  dense_dropout=0.7402548530860971, dense_layers=2, dense_units=1024, epochs=57, gru_dropout=0.4202306826684827, gru_layers=1, gru_units=1024, spatial_dropout=0.7253027831577475, score=0.316, total= 1.3min
[CV] dense_dropout=0.7402548530860971, dense_layers=2, dense_units=1024, epochs=57, gru_dropout=0.4202306826684827, gru_layers=1, gru_units=1024, spatial_dropout=0.7253027831577475 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:  2.5min remaining:    0.0s


[CV]  dense_dropout=0.7402548530860971, dense_layers=2, dense_units=1024, epochs=57, gru_dropout=0.4202306826684827, gru_layers=1, gru_units=1024, spatial_dropout=0.7253027831577475, score=0.316, total= 1.2min
[CV] dense_dropout=0.7402548530860971, dense_layers=2, dense_units=1024, epochs=57, gru_dropout=0.4202306826684827, gru_layers=1, gru_units=1024, spatial_dropout=0.7253027831577475 
[CV]  dense_dropout=0.7402548530860971, dense_layers=2, dense_units=1024, epochs=57, gru_dropout=0.4202306826684827, gru_layers=1, gru_units=1024, spatial_dropout=0.7253027831577475, score=0.323, total= 1.3min
[CV] dense_dropout=0.7402548530860971, dense_layers=2, dense_units=1024, epochs=57, gru_dropout=0.4202306826684827, gru_layers=1, gru_units=1024, spatial_dropout=0.7253027831577475 
[CV]  dense_dropout=0.7402548530860971, dense_layers=2, dense_units=1024, epochs=57, gru_dropout=0.4202306826684827, gru_layers=1, gru_units=1024, spatial_dropout=0.7253027831577475, score=0.317, total= 1.2min


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:  6.3min finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] dense_dropout=0.9, dense_layers=1, dense_units=1024, epochs=50, gru_dropout=0.9, gru_layers=1, gru_units=1024, spatial_dropout=0.0 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  dense_dropout=0.9, dense_layers=1, dense_units=1024, epochs=50, gru_dropout=0.9, gru_layers=1, gru_units=1024, spatial_dropout=0.0, score=0.307, total= 1.1min
[CV] dense_dropout=0.9, dense_layers=1, dense_units=1024, epochs=50, gru_dropout=0.9, gru_layers=1, gru_units=1024, spatial_dropout=0.0 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:  1.1min remaining:    0.0s


[CV]  dense_dropout=0.9, dense_layers=1, dense_units=1024, epochs=50, gru_dropout=0.9, gru_layers=1, gru_units=1024, spatial_dropout=0.0, score=0.291, total= 1.1min
[CV] dense_dropout=0.9, dense_layers=1, dense_units=1024, epochs=50, gru_dropout=0.9, gru_layers=1, gru_units=1024, spatial_dropout=0.0 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:  2.2min remaining:    0.0s


[CV]  dense_dropout=0.9, dense_layers=1, dense_units=1024, epochs=50, gru_dropout=0.9, gru_layers=1, gru_units=1024, spatial_dropout=0.0, score=0.308, total= 1.1min
[CV] dense_dropout=0.9, dense_layers=1, dense_units=1024, epochs=50, gru_dropout=0.9, gru_layers=1, gru_units=1024, spatial_dropout=0.0 
[CV]  dense_dropout=0.9, dense_layers=1, dense_units=1024, epochs=50, gru_dropout=0.9, gru_layers=1, gru_units=1024, spatial_dropout=0.0, score=0.292, total= 1.1min
[CV] dense_dropout=0.9, dense_layers=1, dense_units=1024, epochs=50, gru_dropout=0.9, gru_layers=1, gru_units=1024, spatial_dropout=0.0 
[CV]  dense_dropout=0.9, dense_layers=1, dense_units=1024, epochs=50, gru_dropout=0.9, gru_layers=1, gru_units=1024, spatial_dropout=0.0, score=0.334, total= 1.1min


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:  5.5min finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] dense_dropout=0.7038706820776923, dense_layers=1, dense_units=1024, epochs=200, gru_dropout=0.0, gru_layers=3, gru_units=512, spatial_dropout=0.0 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  dense_dropout=0.7038706820776923, dense_layers=1, dense_units=1024, epochs=200, gru_dropout=0.0, gru_layers=3, gru_units=512, spatial_dropout=0.0, score=0.323, total= 3.7min
[CV] dense_dropout=0.7038706820776923, dense_layers=1, dense_units=1024, epochs=200, gru_dropout=0.0, gru_layers=3, gru_units=512, spatial_dropout=0.0 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:  3.7min remaining:    0.0s


[CV]  dense_dropout=0.7038706820776923, dense_layers=1, dense_units=1024, epochs=200, gru_dropout=0.0, gru_layers=3, gru_units=512, spatial_dropout=0.0, score=0.414, total= 3.7min
[CV] dense_dropout=0.7038706820776923, dense_layers=1, dense_units=1024, epochs=200, gru_dropout=0.0, gru_layers=3, gru_units=512, spatial_dropout=0.0 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:  7.5min remaining:    0.0s


[CV]  dense_dropout=0.7038706820776923, dense_layers=1, dense_units=1024, epochs=200, gru_dropout=0.0, gru_layers=3, gru_units=512, spatial_dropout=0.0, score=0.350, total= 3.7min
[CV] dense_dropout=0.7038706820776923, dense_layers=1, dense_units=1024, epochs=200, gru_dropout=0.0, gru_layers=3, gru_units=512, spatial_dropout=0.0 
[CV]  dense_dropout=0.7038706820776923, dense_layers=1, dense_units=1024, epochs=200, gru_dropout=0.0, gru_layers=3, gru_units=512, spatial_dropout=0.0, score=0.385, total= 3.7min
[CV] dense_dropout=0.7038706820776923, dense_layers=1, dense_units=1024, epochs=200, gru_dropout=0.0, gru_layers=3, gru_units=512, spatial_dropout=0.0 
[CV]  dense_dropout=0.7038706820776923, dense_layers=1, dense_units=1024, epochs=200, gru_dropout=0.0, gru_layers=3, gru_units=512, spatial_dropout=0.0, score=0.351, total= 3.7min


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed: 18.7min finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] dense_dropout=0.7681309803599284, dense_layers=2, dense_units=512, epochs=200, gru_dropout=0.6190918946917064, gru_layers=2, gru_units=16, spatial_dropout=0.0 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  dense_dropout=0.7681309803599284, dense_layers=2, dense_units=512, epochs=200, gru_dropout=0.6190918946917064, gru_layers=2, gru_units=16, spatial_dropout=0.0, score=0.325, total=  50.3s
[CV] dense_dropout=0.7681309803599284, dense_layers=2, dense_units=512, epochs=200, gru_dropout=0.6190918946917064, gru_layers=2, gru_units=16, spatial_dropout=0.0 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:   50.3s remaining:    0.0s


[CV]  dense_dropout=0.7681309803599284, dense_layers=2, dense_units=512, epochs=200, gru_dropout=0.6190918946917064, gru_layers=2, gru_units=16, spatial_dropout=0.0, score=0.312, total=  50.2s
[CV] dense_dropout=0.7681309803599284, dense_layers=2, dense_units=512, epochs=200, gru_dropout=0.6190918946917064, gru_layers=2, gru_units=16, spatial_dropout=0.0 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:  1.7min remaining:    0.0s


[CV]  dense_dropout=0.7681309803599284, dense_layers=2, dense_units=512, epochs=200, gru_dropout=0.6190918946917064, gru_layers=2, gru_units=16, spatial_dropout=0.0, score=0.349, total=  50.0s
[CV] dense_dropout=0.7681309803599284, dense_layers=2, dense_units=512, epochs=200, gru_dropout=0.6190918946917064, gru_layers=2, gru_units=16, spatial_dropout=0.0 
[CV]  dense_dropout=0.7681309803599284, dense_layers=2, dense_units=512, epochs=200, gru_dropout=0.6190918946917064, gru_layers=2, gru_units=16, spatial_dropout=0.0, score=0.325, total=  50.8s
[CV] dense_dropout=0.7681309803599284, dense_layers=2, dense_units=512, epochs=200, gru_dropout=0.6190918946917064, gru_layers=2, gru_units=16, spatial_dropout=0.0 
[CV]  dense_dropout=0.7681309803599284, dense_layers=2, dense_units=512, epochs=200, gru_dropout=0.6190918946917064, gru_layers=2, gru_units=16, spatial_dropout=0.0, score=0.306, total=  50.3s


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:  4.2min finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] dense_dropout=0.4449384648426246, dense_layers=1, dense_units=64, epochs=200, gru_dropout=0.0, gru_layers=3, gru_units=1024, spatial_dropout=0.0 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  dense_dropout=0.4449384648426246, dense_layers=1, dense_units=64, epochs=200, gru_dropout=0.0, gru_layers=3, gru_units=1024, spatial_dropout=0.0, score=0.322, total=11.2min
[CV] dense_dropout=0.4449384648426246, dense_layers=1, dense_units=64, epochs=200, gru_dropout=0.0, gru_layers=3, gru_units=1024, spatial_dropout=0.0 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed: 11.2min remaining:    0.0s


[CV]  dense_dropout=0.4449384648426246, dense_layers=1, dense_units=64, epochs=200, gru_dropout=0.0, gru_layers=3, gru_units=1024, spatial_dropout=0.0, score=0.335, total=11.1min
[CV] dense_dropout=0.4449384648426246, dense_layers=1, dense_units=64, epochs=200, gru_dropout=0.0, gru_layers=3, gru_units=1024, spatial_dropout=0.0 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed: 22.3min remaining:    0.0s


[CV]  dense_dropout=0.4449384648426246, dense_layers=1, dense_units=64, epochs=200, gru_dropout=0.0, gru_layers=3, gru_units=1024, spatial_dropout=0.0, score=0.284, total=11.1min
[CV] dense_dropout=0.4449384648426246, dense_layers=1, dense_units=64, epochs=200, gru_dropout=0.0, gru_layers=3, gru_units=1024, spatial_dropout=0.0 
[CV]  dense_dropout=0.4449384648426246, dense_layers=1, dense_units=64, epochs=200, gru_dropout=0.0, gru_layers=3, gru_units=1024, spatial_dropout=0.0, score=0.393, total=11.1min
[CV] dense_dropout=0.4449384648426246, dense_layers=1, dense_units=64, epochs=200, gru_dropout=0.0, gru_layers=3, gru_units=1024, spatial_dropout=0.0 
[CV]  dense_dropout=0.4449384648426246, dense_layers=1, dense_units=64, epochs=200, gru_dropout=0.0, gru_layers=3, gru_units=1024, spatial_dropout=0.0, score=0.321, total=11.1min


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed: 55.7min finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] dense_dropout=0.0, dense_layers=2, dense_units=16, epochs=200, gru_dropout=0.7411463210929242, gru_layers=1, gru_units=16, spatial_dropout=0.27794733329881566 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  dense_dropout=0.0, dense_layers=2, dense_units=16, epochs=200, gru_dropout=0.7411463210929242, gru_layers=1, gru_units=16, spatial_dropout=0.27794733329881566, score=0.310, total=  38.1s
[CV] dense_dropout=0.0, dense_layers=2, dense_units=16, epochs=200, gru_dropout=0.7411463210929242, gru_layers=1, gru_units=16, spatial_dropout=0.27794733329881566 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:   38.1s remaining:    0.0s


[CV]  dense_dropout=0.0, dense_layers=2, dense_units=16, epochs=200, gru_dropout=0.7411463210929242, gru_layers=1, gru_units=16, spatial_dropout=0.27794733329881566, score=0.327, total=  38.5s
[CV] dense_dropout=0.0, dense_layers=2, dense_units=16, epochs=200, gru_dropout=0.7411463210929242, gru_layers=1, gru_units=16, spatial_dropout=0.27794733329881566 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:  1.3min remaining:    0.0s


[CV]  dense_dropout=0.0, dense_layers=2, dense_units=16, epochs=200, gru_dropout=0.7411463210929242, gru_layers=1, gru_units=16, spatial_dropout=0.27794733329881566, score=0.332, total=  38.6s
[CV] dense_dropout=0.0, dense_layers=2, dense_units=16, epochs=200, gru_dropout=0.7411463210929242, gru_layers=1, gru_units=16, spatial_dropout=0.27794733329881566 
[CV]  dense_dropout=0.0, dense_layers=2, dense_units=16, epochs=200, gru_dropout=0.7411463210929242, gru_layers=1, gru_units=16, spatial_dropout=0.27794733329881566, score=0.349, total=  38.8s
[CV] dense_dropout=0.0, dense_layers=2, dense_units=16, epochs=200, gru_dropout=0.7411463210929242, gru_layers=1, gru_units=16, spatial_dropout=0.27794733329881566 
[CV]  dense_dropout=0.0, dense_layers=2, dense_units=16, epochs=200, gru_dropout=0.7411463210929242, gru_layers=1, gru_units=16, spatial_dropout=0.27794733329881566, score=0.307, total=  38.9s


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:  3.2min finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] dense_dropout=0.0, dense_layers=1, dense_units=64, epochs=200, gru_dropout=0.9, gru_layers=1, gru_units=1024, spatial_dropout=0.0 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  dense_dropout=0.0, dense_layers=1, dense_units=64, epochs=200, gru_dropout=0.9, gru_layers=1, gru_units=1024, spatial_dropout=0.0, score=0.300, total= 4.0min
[CV] dense_dropout=0.0, dense_layers=1, dense_units=64, epochs=200, gru_dropout=0.9, gru_layers=1, gru_units=1024, spatial_dropout=0.0 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:  4.0min remaining:    0.0s


[CV]  dense_dropout=0.0, dense_layers=1, dense_units=64, epochs=200, gru_dropout=0.9, gru_layers=1, gru_units=1024, spatial_dropout=0.0, score=0.296, total= 4.1min
[CV] dense_dropout=0.0, dense_layers=1, dense_units=64, epochs=200, gru_dropout=0.9, gru_layers=1, gru_units=1024, spatial_dropout=0.0 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:  8.0min remaining:    0.0s


[CV]  dense_dropout=0.0, dense_layers=1, dense_units=64, epochs=200, gru_dropout=0.9, gru_layers=1, gru_units=1024, spatial_dropout=0.0, score=0.320, total= 4.1min
[CV] dense_dropout=0.0, dense_layers=1, dense_units=64, epochs=200, gru_dropout=0.9, gru_layers=1, gru_units=1024, spatial_dropout=0.0 
[CV]  dense_dropout=0.0, dense_layers=1, dense_units=64, epochs=200, gru_dropout=0.9, gru_layers=1, gru_units=1024, spatial_dropout=0.0, score=0.324, total= 4.0min
[CV] dense_dropout=0.0, dense_layers=1, dense_units=64, epochs=200, gru_dropout=0.9, gru_layers=1, gru_units=1024, spatial_dropout=0.0 
[CV]  dense_dropout=0.0, dense_layers=1, dense_units=64, epochs=200, gru_dropout=0.9, gru_layers=1, gru_units=1024, spatial_dropout=0.0, score=0.310, total= 4.1min


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed: 20.2min finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] dense_dropout=0.0, dense_layers=3, dense_units=16, epochs=200, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.12258485064042754 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  dense_dropout=0.0, dense_layers=3, dense_units=16, epochs=200, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.12258485064042754, score=0.348, total= 4.1min
[CV] dense_dropout=0.0, dense_layers=3, dense_units=16, epochs=200, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.12258485064042754 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:  4.1min remaining:    0.0s


[CV]  dense_dropout=0.0, dense_layers=3, dense_units=16, epochs=200, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.12258485064042754, score=0.351, total= 4.1min
[CV] dense_dropout=0.0, dense_layers=3, dense_units=16, epochs=200, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.12258485064042754 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:  8.1min remaining:    0.0s


[CV]  dense_dropout=0.0, dense_layers=3, dense_units=16, epochs=200, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.12258485064042754, score=0.327, total= 4.1min
[CV] dense_dropout=0.0, dense_layers=3, dense_units=16, epochs=200, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.12258485064042754 
[CV]  dense_dropout=0.0, dense_layers=3, dense_units=16, epochs=200, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.12258485064042754, score=0.369, total= 4.1min
[CV] dense_dropout=0.0, dense_layers=3, dense_units=16, epochs=200, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.12258485064042754 
[CV]  dense_dropout=0.0, dense_layers=3, dense_units=16, epochs=200, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.12258485064042754, score=0.330, total= 4.1min


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed: 20.3min finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] dense_dropout=0.0, dense_layers=3, dense_units=128, epochs=200, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.40775029884205505 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  dense_dropout=0.0, dense_layers=3, dense_units=128, epochs=200, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.40775029884205505, score=0.308, total= 4.0min
[CV] dense_dropout=0.0, dense_layers=3, dense_units=128, epochs=200, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.40775029884205505 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:  4.0min remaining:    0.0s


[CV]  dense_dropout=0.0, dense_layers=3, dense_units=128, epochs=200, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.40775029884205505, score=0.338, total= 4.0min
[CV] dense_dropout=0.0, dense_layers=3, dense_units=128, epochs=200, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.40775029884205505 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:  8.1min remaining:    0.0s


[CV]  dense_dropout=0.0, dense_layers=3, dense_units=128, epochs=200, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.40775029884205505, score=0.360, total= 4.0min
[CV] dense_dropout=0.0, dense_layers=3, dense_units=128, epochs=200, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.40775029884205505 
[CV]  dense_dropout=0.0, dense_layers=3, dense_units=128, epochs=200, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.40775029884205505, score=0.417, total= 4.1min
[CV] dense_dropout=0.0, dense_layers=3, dense_units=128, epochs=200, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.40775029884205505 
[CV]  dense_dropout=0.0, dense_layers=3, dense_units=128, epochs=200, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.40775029884205505, score=0.355, total= 4.0min


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed: 20.2min finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] dense_dropout=0.0, dense_layers=3, dense_units=1024, epochs=200, gru_dropout=0.0, gru_layers=3, gru_units=1024, spatial_dropout=0.2649965115093502 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  dense_dropout=0.0, dense_layers=3, dense_units=1024, epochs=200, gru_dropout=0.0, gru_layers=3, gru_units=1024, spatial_dropout=0.2649965115093502, score=0.318, total=11.1min
[CV] dense_dropout=0.0, dense_layers=3, dense_units=1024, epochs=200, gru_dropout=0.0, gru_layers=3, gru_units=1024, spatial_dropout=0.2649965115093502 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed: 11.1min remaining:    0.0s


[CV]  dense_dropout=0.0, dense_layers=3, dense_units=1024, epochs=200, gru_dropout=0.0, gru_layers=3, gru_units=1024, spatial_dropout=0.2649965115093502, score=0.308, total=11.1min
[CV] dense_dropout=0.0, dense_layers=3, dense_units=1024, epochs=200, gru_dropout=0.0, gru_layers=3, gru_units=1024, spatial_dropout=0.2649965115093502 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed: 22.1min remaining:    0.0s


[CV]  dense_dropout=0.0, dense_layers=3, dense_units=1024, epochs=200, gru_dropout=0.0, gru_layers=3, gru_units=1024, spatial_dropout=0.2649965115093502, score=0.334, total=11.2min
[CV] dense_dropout=0.0, dense_layers=3, dense_units=1024, epochs=200, gru_dropout=0.0, gru_layers=3, gru_units=1024, spatial_dropout=0.2649965115093502 
[CV]  dense_dropout=0.0, dense_layers=3, dense_units=1024, epochs=200, gru_dropout=0.0, gru_layers=3, gru_units=1024, spatial_dropout=0.2649965115093502, score=0.374, total=11.2min
[CV] dense_dropout=0.0, dense_layers=3, dense_units=1024, epochs=200, gru_dropout=0.0, gru_layers=3, gru_units=1024, spatial_dropout=0.2649965115093502 
[CV]  dense_dropout=0.0, dense_layers=3, dense_units=1024, epochs=200, gru_dropout=0.0, gru_layers=3, gru_units=1024, spatial_dropout=0.2649965115093502, score=0.315, total=11.2min


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed: 55.6min finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] dense_dropout=0.9, dense_layers=1, dense_units=1024, epochs=200, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.34043448631950163 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  dense_dropout=0.9, dense_layers=1, dense_units=1024, epochs=200, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.34043448631950163, score=0.304, total= 4.1min
[CV] dense_dropout=0.9, dense_layers=1, dense_units=1024, epochs=200, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.34043448631950163 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:  4.1min remaining:    0.0s


[CV]  dense_dropout=0.9, dense_layers=1, dense_units=1024, epochs=200, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.34043448631950163, score=0.330, total= 4.1min
[CV] dense_dropout=0.9, dense_layers=1, dense_units=1024, epochs=200, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.34043448631950163 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:  8.2min remaining:    0.0s


[CV]  dense_dropout=0.9, dense_layers=1, dense_units=1024, epochs=200, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.34043448631950163, score=0.336, total= 4.1min
[CV] dense_dropout=0.9, dense_layers=1, dense_units=1024, epochs=200, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.34043448631950163 
[CV]  dense_dropout=0.9, dense_layers=1, dense_units=1024, epochs=200, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.34043448631950163, score=0.341, total= 4.1min
[CV] dense_dropout=0.9, dense_layers=1, dense_units=1024, epochs=200, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.34043448631950163 
[CV]  dense_dropout=0.9, dense_layers=1, dense_units=1024, epochs=200, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.34043448631950163, score=0.338, total= 4.1min


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed: 20.4min finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] dense_dropout=0.9, dense_layers=1, dense_units=1024, epochs=143, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.0 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  dense_dropout=0.9, dense_layers=1, dense_units=1024, epochs=143, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.0, score=0.342, total= 2.9min
[CV] dense_dropout=0.9, dense_layers=1, dense_units=1024, epochs=143, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.0 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:  2.9min remaining:    0.0s


[CV]  dense_dropout=0.9, dense_layers=1, dense_units=1024, epochs=143, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.0, score=0.380, total= 2.9min
[CV] dense_dropout=0.9, dense_layers=1, dense_units=1024, epochs=143, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.0 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:  5.9min remaining:    0.0s


[CV]  dense_dropout=0.9, dense_layers=1, dense_units=1024, epochs=143, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.0, score=0.303, total= 2.9min
[CV] dense_dropout=0.9, dense_layers=1, dense_units=1024, epochs=143, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.0 
[CV]  dense_dropout=0.9, dense_layers=1, dense_units=1024, epochs=143, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.0, score=0.416, total= 2.9min
[CV] dense_dropout=0.9, dense_layers=1, dense_units=1024, epochs=143, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.0 
[CV]  dense_dropout=0.9, dense_layers=1, dense_units=1024, epochs=143, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.0, score=0.334, total= 2.9min


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed: 14.7min finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] dense_dropout=0.38485749342124415, dense_layers=3, dense_units=1024, epochs=50, gru_dropout=0.0, gru_layers=1, gru_units=16, spatial_dropout=0.2963116592281174 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  dense_dropout=0.38485749342124415, dense_layers=3, dense_units=1024, epochs=50, gru_dropout=0.0, gru_layers=1, gru_units=16, spatial_dropout=0.2963116592281174, score=0.316, total=  13.5s
[CV] dense_dropout=0.38485749342124415, dense_layers=3, dense_units=1024, epochs=50, gru_dropout=0.0, gru_layers=1, gru_units=16, spatial_dropout=0.2963116592281174 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:   13.5s remaining:    0.0s


[CV]  dense_dropout=0.38485749342124415, dense_layers=3, dense_units=1024, epochs=50, gru_dropout=0.0, gru_layers=1, gru_units=16, spatial_dropout=0.2963116592281174, score=0.306, total=  13.3s
[CV] dense_dropout=0.38485749342124415, dense_layers=3, dense_units=1024, epochs=50, gru_dropout=0.0, gru_layers=1, gru_units=16, spatial_dropout=0.2963116592281174 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:   26.8s remaining:    0.0s


[CV]  dense_dropout=0.38485749342124415, dense_layers=3, dense_units=1024, epochs=50, gru_dropout=0.0, gru_layers=1, gru_units=16, spatial_dropout=0.2963116592281174, score=0.321, total=  13.7s
[CV] dense_dropout=0.38485749342124415, dense_layers=3, dense_units=1024, epochs=50, gru_dropout=0.0, gru_layers=1, gru_units=16, spatial_dropout=0.2963116592281174 
[CV]  dense_dropout=0.38485749342124415, dense_layers=3, dense_units=1024, epochs=50, gru_dropout=0.0, gru_layers=1, gru_units=16, spatial_dropout=0.2963116592281174, score=0.329, total=  13.5s
[CV] dense_dropout=0.38485749342124415, dense_layers=3, dense_units=1024, epochs=50, gru_dropout=0.0, gru_layers=1, gru_units=16, spatial_dropout=0.2963116592281174 
[CV]  dense_dropout=0.38485749342124415, dense_layers=3, dense_units=1024, epochs=50, gru_dropout=0.0, gru_layers=1, gru_units=16, spatial_dropout=0.2963116592281174, score=0.302, total=  13.3s


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:  1.1min finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] dense_dropout=0.3004218585546998, dense_layers=2, dense_units=1024, epochs=200, gru_dropout=0.9, gru_layers=1, gru_units=512, spatial_dropout=0.9 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  dense_dropout=0.3004218585546998, dense_layers=2, dense_units=1024, epochs=200, gru_dropout=0.9, gru_layers=1, gru_units=512, spatial_dropout=0.9, score=0.166, total= 1.7min
[CV] dense_dropout=0.3004218585546998, dense_layers=2, dense_units=1024, epochs=200, gru_dropout=0.9, gru_layers=1, gru_units=512, spatial_dropout=0.9 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:  1.7min remaining:    0.0s


[CV]  dense_dropout=0.3004218585546998, dense_layers=2, dense_units=1024, epochs=200, gru_dropout=0.9, gru_layers=1, gru_units=512, spatial_dropout=0.9, score=0.153, total= 1.7min
[CV] dense_dropout=0.3004218585546998, dense_layers=2, dense_units=1024, epochs=200, gru_dropout=0.9, gru_layers=1, gru_units=512, spatial_dropout=0.9 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:  3.4min remaining:    0.0s


[CV]  dense_dropout=0.3004218585546998, dense_layers=2, dense_units=1024, epochs=200, gru_dropout=0.9, gru_layers=1, gru_units=512, spatial_dropout=0.9, score=0.156, total= 1.7min
[CV] dense_dropout=0.3004218585546998, dense_layers=2, dense_units=1024, epochs=200, gru_dropout=0.9, gru_layers=1, gru_units=512, spatial_dropout=0.9 
[CV]  dense_dropout=0.3004218585546998, dense_layers=2, dense_units=1024, epochs=200, gru_dropout=0.9, gru_layers=1, gru_units=512, spatial_dropout=0.9, score=0.170, total= 1.7min
[CV] dense_dropout=0.3004218585546998, dense_layers=2, dense_units=1024, epochs=200, gru_dropout=0.9, gru_layers=1, gru_units=512, spatial_dropout=0.9 
[CV]  dense_dropout=0.3004218585546998, dense_layers=2, dense_units=1024, epochs=200, gru_dropout=0.9, gru_layers=1, gru_units=512, spatial_dropout=0.9, score=0.176, total= 1.7min


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:  8.7min finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] dense_dropout=0.7192262624819056, dense_layers=1, dense_units=16, epochs=200, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.5275237074213049 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  dense_dropout=0.7192262624819056, dense_layers=1, dense_units=16, epochs=200, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.5275237074213049, score=0.302, total= 4.0min
[CV] dense_dropout=0.7192262624819056, dense_layers=1, dense_units=16, epochs=200, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.5275237074213049 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:  4.0min remaining:    0.0s


[CV]  dense_dropout=0.7192262624819056, dense_layers=1, dense_units=16, epochs=200, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.5275237074213049, score=0.286, total= 3.9min
[CV] dense_dropout=0.7192262624819056, dense_layers=1, dense_units=16, epochs=200, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.5275237074213049 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:  7.9min remaining:    0.0s


[CV]  dense_dropout=0.7192262624819056, dense_layers=1, dense_units=16, epochs=200, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.5275237074213049, score=0.301, total= 4.0min
[CV] dense_dropout=0.7192262624819056, dense_layers=1, dense_units=16, epochs=200, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.5275237074213049 
[CV]  dense_dropout=0.7192262624819056, dense_layers=1, dense_units=16, epochs=200, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.5275237074213049, score=0.308, total= 3.9min
[CV] dense_dropout=0.7192262624819056, dense_layers=1, dense_units=16, epochs=200, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.5275237074213049 
[CV]  dense_dropout=0.7192262624819056, dense_layers=1, dense_units=16, epochs=200, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.5275237074213049, score=0.315, total= 4.0min


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed: 19.8min finished


Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] dense_dropout=0.0, dense_layers=1, dense_units=1024, epochs=110, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.41227104827233724 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  dense_dropout=0.0, dense_layers=1, dense_units=1024, epochs=110, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.41227104827233724, score=0.347, total= 2.3min
[CV] dense_dropout=0.0, dense_layers=1, dense_units=1024, epochs=110, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.41227104827233724 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:  2.3min remaining:    0.0s


[CV]  dense_dropout=0.0, dense_layers=1, dense_units=1024, epochs=110, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.41227104827233724, score=0.310, total= 2.3min
[CV] dense_dropout=0.0, dense_layers=1, dense_units=1024, epochs=110, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.41227104827233724 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:  4.6min remaining:    0.0s


[CV]  dense_dropout=0.0, dense_layers=1, dense_units=1024, epochs=110, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.41227104827233724, score=0.289, total= 2.3min
[CV] dense_dropout=0.0, dense_layers=1, dense_units=1024, epochs=110, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.41227104827233724 
[CV]  dense_dropout=0.0, dense_layers=1, dense_units=1024, epochs=110, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.41227104827233724, score=0.374, total= 2.3min
[CV] dense_dropout=0.0, dense_layers=1, dense_units=1024, epochs=110, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.41227104827233724 
[CV]  dense_dropout=0.0, dense_layers=1, dense_units=1024, epochs=110, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.41227104827233724, score=0.338, total= 2.3min


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed: 11.4min finished


CPU times: user 2h 18min 28s, sys: 20min 29s, total: 2h 38min 57s
Wall time: 6h 22min 40s
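
The verbose log above prints one score per fold, so the mean cross-validation score of each candidate has to be read off by hand. The following is a minimal sketch, not part of the original notebook, that parses log lines of this shape and averages the fold scores per parameter combination. The names `CV_LINE`, `mean_fold_scores` and `sample_log` are hypothetical, and the sample lines are copied from two of the candidates logged above.

```python
import re
from collections import defaultdict
from statistics import mean

# A finished fold is logged as "[CV]  <params>, score=<x>, total=<time>".
# Lines that only announce the start of a fold carry no score and are skipped.
CV_LINE = re.compile(r"^\[CV\]\s+(?P<params>.+), score=(?P<score>[0-9.]+), total=")


def mean_fold_scores(log_text):
    """Return the mean fold score for each parameter combination in the log."""
    folds = defaultdict(list)
    for line in log_text.splitlines():
        match = CV_LINE.match(line)
        if match:
            folds[match.group("params")].append(float(match.group("score")))
    return {params: round(mean(scores), 3) for params, scores in folds.items()}


# Sample lines copied from the log above (two candidates, five folds each).
sample_log = """\
[CV] dense_dropout=0.9, dense_layers=1, dense_units=1024, epochs=143, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.0
[CV]  dense_dropout=0.9, dense_layers=1, dense_units=1024, epochs=143, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.0, score=0.342, total= 2.9min
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:  2.9min remaining:    0.0s
[CV]  dense_dropout=0.9, dense_layers=1, dense_units=1024, epochs=143, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.0, score=0.380, total= 2.9min
[CV]  dense_dropout=0.9, dense_layers=1, dense_units=1024, epochs=143, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.0, score=0.303, total= 2.9min
[CV]  dense_dropout=0.9, dense_layers=1, dense_units=1024, epochs=143, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.0, score=0.416, total= 2.9min
[CV]  dense_dropout=0.9, dense_layers=1, dense_units=1024, epochs=143, gru_dropout=0.0, gru_layers=1, gru_units=1024, spatial_dropout=0.0, score=0.334, total= 2.9min
[CV]  dense_dropout=0.3004218585546998, dense_layers=2, dense_units=1024, epochs=200, gru_dropout=0.9, gru_layers=1, gru_units=512, spatial_dropout=0.9, score=0.166, total= 1.7min
[CV]  dense_dropout=0.3004218585546998, dense_layers=2, dense_units=1024, epochs=200, gru_dropout=0.9, gru_layers=1, gru_units=512, spatial_dropout=0.9, score=0.153, total= 1.7min
[CV]  dense_dropout=0.3004218585546998, dense_layers=2, dense_units=1024, epochs=200, gru_dropout=0.9, gru_layers=1, gru_units=512, spatial_dropout=0.9, score=0.156, total= 1.7min
[CV]  dense_dropout=0.3004218585546998, dense_layers=2, dense_units=1024, epochs=200, gru_dropout=0.9, gru_layers=1, gru_units=512, spatial_dropout=0.9, score=0.170, total= 1.7min
[CV]  dense_dropout=0.3004218585546998, dense_layers=2, dense_units=1024, epochs=200, gru_dropout=0.9, gru_layers=1, gru_units=512, spatial_dropout=0.9, score=0.176, total= 1.7min
"""

fold_means = mean_fold_scores(sample_log)
for params, score in fold_means.items():
    print(f"{score:.3f}  {params}")
```

On these two candidates, the one with `spatial_dropout=0.9` and `gru_dropout=0.9` averages well below the one without dropout in the GRU and embedding layers, which matches what the individual fold scores suggest.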


In [None]:
gru_bs_model.best_params_

OrderedDict([('dense_dropout', 0.8390663304905417),
             ('dense_layers', 1),
             ('dense_units', 1024),
             ('epochs', 200),
             ('gru_dropout', 0.0),
             ('gru_layers', 2),
             ('gru_units', 1024),
             ('spatial_dropout', 0.0)])

In [None]:
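# Predict on the held-out test set with the model refitted on the best parameters
gru_bs_preds = gru_bs_model.predict(X_test)
# Save the predictions in the preds folder (np.save appends the .npy extension)
np.save('preds/gru_bs_preds', gru_bs_preds)
# Per-class precision, recall and F1 on the test set
print(classification_report(Y_test, gru_bs_preds))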

              precision    recall  f1-score   support

           0       0.56      0.71      0.63       173
           1       0.29      0.19      0.23        31
           2       0.50      0.07      0.12        14
           3       0.71      0.64      0.68       210

    accuracy                           0.62       428
   macro avg       0.52      0.40      0.41       428
weighted avg       0.62      0.62      0.61       428



Best params:

      OrderedDict([('dense_dropout', 0.8390663304905417),
                  ('dense_layers', 1),
                  ('dense_units', 1024),
                  ('epochs', 200),
                  ('gru_dropout', 0.0),
                  ('gru_layers', 2),
                  ('gru_units', 1024),
                  ('spatial_dropout', 0.0)])

Best results:

                    precision    recall  f1-score   support

                 0       0.56      0.71      0.63       173
                 1       0.29      0.19      0.23        31
                 2       0.50      0.07      0.12        14
                 3       0.71      0.64      0.68       210

          accuracy                           0.62       428
         macro avg       0.52      0.40      0.41       428
      weighted avg       0.62      0.62      0.61       428
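
The gap between the macro and weighted averages in the report follows from the class imbalance: the macro average gives every class the same weight, while the weighted average scales each class by its support. A minimal sketch of that arithmetic, using the rounded per-class values from the report (the names `report`, `macro_avg` and `weighted_avg` are hypothetical and not part of the notebook):

```python
# Per-class F1 scores and supports copied from the classification report above.
report = {
    0: {"f1": 0.63, "support": 173},
    1: {"f1": 0.23, "support": 31},
    2: {"f1": 0.12, "support": 14},
    3: {"f1": 0.68, "support": 210},
}


def macro_avg(rows, metric):
    """Unweighted mean over classes: every class counts the same."""
    return sum(row[metric] for row in rows.values()) / len(rows)


def weighted_avg(rows, metric):
    """Mean over classes weighted by the number of test instances per class."""
    total = sum(row["support"] for row in rows.values())
    return sum(row[metric] * row["support"] for row in rows.values()) / total


macro_f1 = macro_avg(report, "f1")
weighted_f1 = weighted_avg(report, "f1")
print(f"macro F1:    {macro_f1:.3f}")
print(f"weighted F1: {weighted_f1:.3f}")
```

Because the inputs are already rounded to two decimals, the recomputed values can differ from the report in the last digit. The two minority classes (45 of 428 test instances) pull the macro F1 down, while the weighted F1 stays close to the scores of the two frequent classes.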
