### Universidad Nacional de Córdoba - Facultad de Matemática, Astronomía, Física y Computación

### Diplomatura en Ciencia de Datos, Aprendizaje Automático y sus Aplicaciones 2021
Search and Recommendation for Legal Texts

Mentor: Jorge E. Pérez Villella

# Supervised Learning Assignment

Team members:

- Correa Francisco
- Oviedo Christian


https://jurisprudencia.justiciacordoba.gob.ar/cgi-bin/koha/opac-search.pl?limit=yr%2Cst-numeric%3D2020&sort_by=pubdate_dsc
    

https://www.justiciacordoba.gob.ar/consultafallosnet/Pages/Default.aspx
    

The goal of this assignment is to consolidate what we have learned so far by re-analyzing the data from different perspectives (feature selection, redefinition of classes and subclasses) in order to obtain new results from the models already studied, adding ensemble learning to the analysis.

The idea is to learn to iterate within the data science process rather than settling for the results of the first pass.

We also take a closer look at stop words and at how to build a custom list.

To solve the classification problem, this assignment proposes training the following scikit-learn models: LogisticRegression and SGDClassifier.


Due date: September 12, 2021

# Stop words

In assignment 2, *Práctico Análisis y Visualización* (Analysis and Visualization), we applied several techniques to generate stop words. One technique we did not apply, and which we believe can be combined with the previous ones, is to treat words with a 'low' IDF as stop words. What counts as low is subjective, so we set the threshold using expert judgment.

## Identifying words with a "low" IDF

A low IDF means the word appears in many documents and therefore provides little relevant information for classification.
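
As a minimal illustration of this point, the sketch below fits a `TfidfVectorizer` on a hypothetical three-document corpus (not taken from the real data) and compares the IDF of a word present in every document with that of rarer words. The helper `smooth_idf` is an illustrative name; it reproduces the smoothed formula scikit-learn applies by default.

```python
import math
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical mini-corpus: "tribunal" appears in every document, "penal" in only one.
docs = [
    "tribunal resuelve causa penal",
    "tribunal resuelve causa laboral",
    "tribunal analiza despido laboral",
]

vectorizer = TfidfVectorizer().fit(docs)
# Map each vocabulary term to its learned IDF weight.
idf_by_word = {word: vectorizer.idf_[idx] for word, idx in vectorizer.vocabulary_.items()}

def smooth_idf(n_docs, doc_freq):
    # Formula used by TfidfVectorizer with its default smooth_idf=True.
    return math.log((1 + n_docs) / (1 + doc_freq)) + 1

for word in ("tribunal", "laboral", "penal"):
    print(f"{word}: {idf_by_word[word]:.3f}")
```

The word found in all three documents gets the minimum possible weight (1.0), while the word found in a single document gets the highest, which is why a low IDF is a reasonable signal for stop word candidates.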

In [503]:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

from sklearn.model_selection import train_test_split

We load the corpus curated in the previous assignments.

In [504]:
corpus_file_name = 'cleaned_corpus.csv'

#corpus_file_name = 'cleaned_corpus_4.csv'



cleaned_corpus_tmp = pd.read_csv(corpus_file_name)


cleaned_corpus = cleaned_corpus_tmp.drop(cleaned_corpus_tmp.columns[0], axis=1)

X = cleaned_corpus['text']  
y = cleaned_corpus['classifier']



In [505]:
cleaned_corpus.head()

| | text | id | classifier |
|---|---|---|---|
| 0 | dato causa sede ciudad cordoba dependencia juz... | 4de122c24ab1606c9d67f4ff9e656143 | Documentos/MENORES |
| 1 | univoco fecha materia revista familia tribunal... | 1f9cdcb2c2596656b540c1271fc2d843 | Documentos/MENORES |
| 2 | juzgado juventud violencia familiar 8ª cordoba... | 17dcae14592fc6e87680ccb4251d9395 | Documentos/MENORES |
| 3 | auto caratulado a. a. denuncia violencia gener... | 4b3ae58648b6267ebb332feec8002588 | Documentos/MENORES |
| 4 | juzg adolescencia violencia familiar 4ta cba s... | 1316026beaa1d7e6530bdfe7e54f7b5c | Documentos/MENORES |


We vectorize the corpus with *TfidfVectorizer*.

In [506]:
vectorizer = TfidfVectorizer()

X_train = X

vectorizer.fit(X_train)

TfidfVectorizer()

We build a data frame from the TF-IDF result so the values are easier to query.

In [507]:
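def create_idf_data_frame(vectorizer):
    # One row per vocabulary term, with its IDF weight.
    df_idf = pd.DataFrame(data=vectorizer.idf_, columns=["idf_weight"])
    # get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() replaces it.
    df_idf['word'] = vectorizer.get_feature_names_out()

    # Sort ascending so the lowest-IDF (most widespread) terms come first.
    sorted_df_idf = df_idf.sort_values(by=['idf_weight'])
    return sorted_df_idf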



sorted_df_idf = create_idf_data_frame(vectorizer)


sorted_df_idf.shape


(17964, 2)

In [508]:
print(f"Total number of terms: {sorted_df_idf.shape[0]}")

Total number of terms: 17964


We compute the 0.025, 0.05, 0.075, and 0.1 quantiles of the IDF values.

In [509]:
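# Restrict to the numeric column: recent pandas versions raise on the string 'word' column.
percent_df = sorted_df_idf[['idf_weight']].quantile([.025, .05, .075, .1], axis=0)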
percent_df

| | idf_weight |
|---|---|
| 0.025 | 1.964569 |
| 0.05 | 2.471817 |
| 0.075 | 2.833607 |
| 0.1 | 3.164964 |


We generate the stop word list. Note that the cutoff is arbitrary. An alternative is to vary this value while training models: in addition to a grid search over hyperparameters, vary the number of stop words removed from the corpus and check whether the models improve (in our case we judge improvement by F1-score, because the data sets are imbalanced).

As an arbitrary choice, we remove every word whose IDF is less than or equal to *3.164964*, the 0.1 quantile.

In [510]:
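# Cutoff: the 0.1 quantile of the IDF values.
limit = percent_df.loc[.1].values[0]

# Keep as stop words every term whose IDF is at or below the cutoff.
stop_words = sorted_df_idf[sorted_df_idf['idf_weight'] <= limit]['word'].values.tolist()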
#stop_words = []

In [511]:
print(f"Number of stop words: {len(stop_words)}")

Number of stop words: 1819
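
The idea of varying the cutoff and checking the F1-score could be sketched as follows. This is only an illustration under stated assumptions: the eight-document corpus and its labels are hypothetical, `low_idf_stop_words` is a helper name introduced here, and the stop word list is derived from the whole toy corpus (as in the cells above) rather than inside each fold.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

def low_idf_stop_words(texts, quantile):
    """Return the terms whose IDF is at or below the given quantile of all IDF values."""
    vec = TfidfVectorizer().fit(texts)
    limit = np.quantile(vec.idf_, quantile)
    return sorted(word for word, idx in vec.vocabulary_.items() if vec.idf_[idx] <= limit)

# Hypothetical toy corpus with two balanced classes.
texts = [
    "tribunal causa penal delito robo",
    "tribunal causa penal delito hurto",
    "tribunal causa penal condena prision",
    "tribunal penal delito condena fiscal",
    "tribunal causa laboral despido salario",
    "tribunal causa laboral despido indemnizacion",
    "tribunal causa laboral salario empleador",
    "tribunal laboral despido empleador sindicato",
]
labels = ["PENAL"] * 4 + ["LABORAL"] * 4

cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=42)
scores = {}
# None is the baseline without any IDF-based stop words.
for quantile in (None, 0.05, 0.1, 0.25):
    stop_words = None if quantile is None else low_idf_stop_words(texts, quantile)
    pipeline = make_pipeline(TfidfVectorizer(stop_words=stop_words), LogisticRegression())
    scores[quantile] = cross_val_score(
        pipeline, texts, labels, cv=cv, scoring="f1_macro"
    ).mean()

for quantile, score in scores.items():
    print(quantile, round(score, 3))
```

On the real corpus the same loop would be run with the actual texts and labels, and the quantile would become one more value to tune alongside the model hyperparameters.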


In [512]:
vectorizer = TfidfVectorizer(stop_words = stop_words )

X_train = cleaned_corpus['text']

vectorizer.fit(X_train)

TfidfVectorizer(stop_words=['10', '10305', '11', '12', '13', '130', '14', '15',
                            '17', '18', '1º', '20', '21', '26', '26061', '27',
                            '2ª', '2º', '30', '36', '39', '40', '468', '48',
                            '50', '5º', '75', '7676', '9459', '9944', ...])

In [513]:
sorted_df_idf = create_idf_data_frame(vectorizer)

In [514]:
sorted_df_idf

| | idf_weight | word |
|---|---|---|
| 6747 | 3.201331 | equivocar |
| 1339 | 3.201331 | ad |
| 11378 | 3.201331 | online |
| 13877 | 3.201331 | rezar |
| 11311 | 3.201331 | oficina |
| ... | ... | ... |
| 8800 | 5.804021 | incomprensible |
| 8803 | 5.804021 | inconcusa |
| 8806 | 5.804021 | inconducent |
| 2082 | 5.804021 | antropologia |

# Model training


We train the LogisticRegression and SGDClassifier models.

We fix the seed so the experiments are reproducible.

In [515]:
seed = 42

We split the data set into a training set and a test set.

In [516]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=seed)

In [517]:
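def get_vectors(X_train, X_test, vectorizer):
    # Fit the vocabulary and IDF weights on the training set only,
    # then reuse them to transform the test set.
    X_train_vect = vectorizer.fit_transform(X_train)
    X_test_vect = vectorizer.transform(X_test)

    return (X_train_vect, X_test_vect)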

In [518]:
vectorizer = TfidfVectorizer(stop_words = stop_words)
X_train_vect , X_test_vect = get_vectors(X_train, X_test, vectorizer)

X_train_vect.shape

(162, 13877)

In [519]:
X_test_vect.shape

(81, 13877)

We implement a grid search with cross-validation that accepts several scikit-learn models to fit.
The idea is for this method to let us easily try different models with different parameters (scikit-learn's GridSearchCV is built around a single estimator, so it does not directly compare different models). Based on these results, we then choose which models and parameters to present in the section *Classification using different models*.

The **train_models** function receives:
-	 Two dictionaries: the models and the parameters.
-	 The training and test sets
-	 The number of folds

The function trains every model with the given parameters using the given CV. Its results are converted to a data frame, which can then be sorted by different criteria (recall, f1-score, etc.).
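
The way a parameter dictionary expands into the individual combinations to train can be sketched as below. The helper name `expand_param_grid` is hypothetical, and applying each combination with `clone` and `set_params` is an assumption made for this illustration; the notebook's own code copies the model and sets the attributes directly.

```python
import itertools as it

from sklearn.base import clone
from sklearn.linear_model import LogisticRegression

def expand_param_grid(param_grid):
    """Turn {name: [values]} into a list of {name: value} dicts, one per combination."""
    names = sorted(param_grid)
    combos = it.product(*(param_grid[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]

# Same shape as one entry of the parameter dictionary: 1 solver x 2 penalties x 3 values of C.
grid = {"solver": ["liblinear"], "penalty": ["l2", "l1"], "C": [1.0, 0.7, 0.2]}
combos = expand_param_grid(grid)

print(len(combos))  # 6
print(combos[0])    # {'C': 1.0, 'penalty': 'l2', 'solver': 'liblinear'}

# One unfitted estimator per combination, leaving the base model untouched.
base_model = LogisticRegression()
candidates = [clone(base_model).set_params(**combo) for combo in combos]
```

Each candidate would then be fitted on every fold, which is why the number of fits grows as the product of the list lengths times the number of folds.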

In [520]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn import svm
from sklearn import linear_model

from sklearn.model_selection import KFold

from sklearn import metrics
from sklearn.metrics import roc_auc_score

import itertools as it

from collections import Counter

In [521]:

models1 = {
    'RandomForest': RandomForestClassifier(),
    'MultinomialNB': MultinomialNB(),
    'SVM_01': svm.SVC(),
    'SVM_02': svm.SVC(),
    'LogisticRegressionClassifier': linear_model.LogisticRegression(),
    'LogisticRegressionClassifier_01': linear_model.LogisticRegression(),
}

# One parameter grid per entry of models1, matched by key.
params1 = {
    'RandomForest': {"n_estimators": [100], "criterion": ["gini", "entropy"]},
    'LogisticRegressionClassifier': {"solver": ["liblinear", "sag", "saga", "lbfgs"], "multi_class": ["ovr"], "penalty": ["l2"], "C": [1.0, 0.7]},
    'LogisticRegressionClassifier_01': {"solver": ["liblinear"], "multi_class": ["ovr"], "penalty": ["l2", "l1"], "C": [1.0, 0.7, 0.2]},
    'SVM_01': {"kernel": ['poly'], "degree": [2, 3, 4, 5]},
    'SVM_02': {"kernel": ['linear', 'rbf', 'sigmoid']},
    'MultinomialNB': {"alpha": [1.0]}
}

In [522]:
from copy import copy, deepcopy



def roc_auc_score_macro(actual_class, pred_class, average = "macro"):

    roc_auc = roc_auc_score(actual_class, pred_class, average = average , multi_class ='ovr')
 
    return roc_auc



def roc_auc_score_multiclass(actual_class, pred_class, average = "macro"):

  #creating a set of all the unique classes using the actual class list
  unique_class = set(actual_class)
  roc_auc_dict = {}
  for per_class in unique_class:
    #creating a list of all the classes except the current class 
    other_class = [x for x in unique_class if x != per_class]

    #marking the current class as 1 and all other classes as 0
    new_actual_class = [0 if x in other_class else 1 for x in actual_class]
    new_pred_class = [0 if x in other_class else 1 for x in pred_class]

    #using the sklearn metrics method to calculate the roc_auc_score
    roc_auc = roc_auc_score(new_actual_class, new_pred_class, average = average , multi_class ='ovr')
    roc_auc_dict[per_class] = roc_auc

  return roc_auc_dict

#def train_model(model, folds_index, X_train, Y_train):

def generate_model_params(model_params):
    
    allNames = sorted(model_params)
    combinations = it.product(*(model_params[Name] for Name in allNames))
    return (list(combinations) , allNames)


def train_model(model_id, model, params_names, param_combination, folds_index, X_train, Y_train , X_test, Y_test , output_dict = True , random_state = None):
    
    param_combination = list(param_combination)
    print ("train model")
    print (f"{model} {params_names} {param_combination}")
   
    
    cloned_model = deepcopy(model)
    
    
    for param_name , param_value in zip(params_names,param_combination ):
        #print (f"{param_name} =  {param_value}")
        setattr(cloned_model , param_name , param_value)

    if type(random_state) == int:
        setattr(cloned_model , "random_state" , random_state)
        
    print (cloned_model)
    
    results = []
    
    for train_index, test_index in folds_index:
   
        cloned_model_tmp = deepcopy(cloned_model)
        #print (f"{train_index}")
        #print (f"{test_index}")
    
    
    #X_train, X_test, y_train, y_test
    
        # Split according to the CV fold: take the fold's training and validation rows of X_train with their labels.
        X_train_tmp, X_test_tmp = X_train[train_index], X_train[test_index]
        
        y_train_tmp, y_test_tmp = Y_train[train_index], Y_train[test_index] 
    
        cloned_model_tmp.fit(X_train_tmp,y_train_tmp)
       
    
        y_test_val_pred = cloned_model_tmp.predict(X_test_tmp)
        
        train_result = metrics.classification_report(y_test_tmp, y_test_val_pred , output_dict = output_dict )
        
        print(train_result)
        
        
        #roc_result = roc_auc_score(y_true = y_test_tmp, y_score = y_test_val_pred , multi_class = "ovr")
        
        roc_result = roc_auc_score_multiclass(actual_class=y_test_tmp, pred_class=y_test_val_pred)
        #roc_result_macro = roc_auc_score_macro(actual_class=y_test_tmp, pred_class=y_test_val_pred)
        
        
        results.append ((f"{model}","Train" , f"{params_names }", train_result , roc_result , f"{param_combination}" , f"{model}_{model_id}" ))
    
    
    cloned_model_tmp = deepcopy(cloned_model)
    
    
    cloned_model_tmp.fit(X_train,Y_train)
        
    y_test_pred = cloned_model_tmp.predict(X_test)
    
    test_result = metrics.classification_report(Y_test, y_test_pred , output_dict = output_dict )
    
    #roc_result = roc_auc_score(y_true = Y_test, y_score = y_test_pred , multi_class = "ovr")
    roc_result = roc_auc_score_multiclass(actual_class=Y_test, pred_class=y_test_pred)
    #roc_result_macro = roc_auc_score_macro(actual_class=y_test_tmp, pred_class=y_test_val_pred)
        
    results.append ((f"{model}","Test", f"{params_names} ", test_result , roc_result , f"{param_combination}" , f"{model}_{model_id}" ))
    
    print("Test")
    print(test_result)
    
    return results
    

def sum_train_values(results):
    # Average the macro precision, recall and f1-score over all result rows.
    total = (0, 0, 0)

    for model_result in results:
        macro_avg = model_result[3]['macro avg']
        total = (total[0] + macro_avg['precision'],
                 total[1] + macro_avg['recall'],
                 total[2] + macro_avg['f1-score'])  # macro f1-score

    n_rows = len(results)

    total = (total[0] / n_rows, total[1] / n_rows, total[2] / n_rows)
    print("Average")
    print(f"{total}")

    return total

def train_models(X_train,Y_train,X_test, Y_test, cv=5,shuffle=True, models=None ,params=None , output_dict = True , random_state = None):
    
    results = []
    
    kf = KFold(n_splits=cv, random_state=random_state, shuffle=shuffle )
   
    model_id = 0 
    
    folds_index = [(train_index, test_index) for train_index, test_index in kf.split(X_train)  ]

    for param_model in params.keys():
    
        params_combination, params_names = generate_model_params(params.get(param_model))
        #print (f"Model to run: {param_model}, parameters to try: {params_combination} , parameter names: {params_names} ")
        
        for param_combination in params_combination:
            #print (f"{param_model}: {param_combination} ")
            
            model_result = train_model(model_id = model_id, model = models.get(param_model),params_names = params_names, param_combination = param_combination, folds_index = folds_index, X_train = X_train, Y_train = Y_train , X_test = X_test, Y_test = Y_test , output_dict = output_dict , random_state = random_state )     
            
         
            results.extend( model_result  )
    
    
        model_id = model_id + 1  

    return results        
       

This function builds a data frame from the training results. Note that the ROC AUC is computed as a weighted sum of the per-class ROC AUC values, where each weight is the class's share of the test set. The ROC AUC obtained here therefore differs from the one shown in the plots: this function computes a weighted ROC AUC, whereas the yellowbrick library computes the micro and macro ROC AUC, which follow a different criterion (see https://www.scikit-yb.org/en/latest/api/classifier/rocauc.html). We consider our criterion more appropriate.

In [523]:
def toDataFrame(results, y_test):
    
    counter = Counter(y_test)
    total = counter['Documentos/FAMILIA'] + counter['Documentos/LABORAL'] + counter['Documentos/MENORES'] + counter['Documentos/PENAL']
    familia = counter['Documentos/FAMILIA'] / total
    laboral = counter['Documentos/LABORAL'] / total
    menores = counter['Documentos/MENORES'] / total
    penal = counter['Documentos/PENAL'] / total
    
    print ("Class weights by court division (fuero)")
    print (f"familia: {familia}, laboral: {laboral}, menores: {menores}, penal: {penal} ")
    
    filtered_values =  []
    columns = ["id", "modelo", "modo" , "parametros" , "valores" , "accuracy", "precision" , "recall" , "f1-score" , "roc_penal", "roc_familia" ,"roc_laboral" , "roc_menores" ,]
    for result in results:
        #print (f"{result[0]} {result[1]} {result[2]} {result[3]['macro avg']} \n")
        filtered_values.append(( result[6], result[0], result[1] , result[2] , result[5] , result[3]['accuracy'], result[3]['macro avg']['precision'] , result[3]['macro avg']['recall'] ,  result[3]['macro avg']['f1-score'] , result[4]["Documentos/PENAL"] , result[4]["Documentos/FAMILIA"] , result[4]["Documentos/LABORAL"] , result[4]["Documentos/MENORES"]))

    df= pd.DataFrame(data = filtered_values , columns = columns)
    
    df["roc_ponderado"] = (df["roc_penal"] * penal + df["roc_familia"] * familia + df["roc_laboral"] * laboral + df["roc_menores"] * menores)
    return df
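
The weighted ROC AUC described above can be sketched on a small example. The labels and predictions below are hypothetical, and `weighted_ovr_auc` is a helper name introduced only for this illustration. As in the functions above, each class is scored one-vs-rest from hard predictions rather than from probabilities.

```python
from collections import Counter

from sklearn.metrics import roc_auc_score

def weighted_ovr_auc(y_true, y_pred):
    """Per-class one-vs-rest ROC AUC plus its average weighted by class share in y_true."""
    counts = Counter(y_true)
    total = len(y_true)
    per_class = {}
    for label in counts:
        # Mark the current class as 1 and every other class as 0.
        true_bin = [1 if y == label else 0 for y in y_true]
        pred_bin = [1 if y == label else 0 for y in y_pred]
        per_class[label] = roc_auc_score(true_bin, pred_bin)
    weighted = sum(per_class[label] * counts[label] / total for label in counts)
    return per_class, weighted

# Hypothetical test set: 5 FAMILIA, 2 PENAL, 3 LABORAL documents.
y_true = ["Documentos/FAMILIA"] * 5 + ["Documentos/PENAL"] * 2 + ["Documentos/LABORAL"] * 3
y_pred = (["Documentos/FAMILIA"] * 5 + ["Documentos/PENAL"] * 2
          + ["Documentos/LABORAL", "Documentos/FAMILIA", "Documentos/FAMILIA"])

per_class, weighted = weighted_ovr_auc(y_true, y_pred)
macro = sum(per_class.values()) / len(per_class)
print(per_class)
print(f"weighted: {weighted:.3f}  macro: {macro:.3f}")
```

In this example the weighted value is 0.800 and the unweighted macro mean is about 0.822: the majority class pulls the weighted figure toward its own score, which is the behavior the weighting is meant to provide on an imbalanced data set.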


In [524]:
results  = train_models(X_train= X_train_vect, Y_train =y_train.values ,  X_test = X_test_vect, Y_test = y_test.values,  models = models1 , params = params1 , cv=5 , output_dict = True , random_state = seed )

train model
RandomForestClassifier() ['criterion', 'n_estimators'] ['gini', 100]
RandomForestClassifier(random_state=42)
{'Documentos/FAMILIA': {'precision': 0.7142857142857143, 'recall': 1.0, 'f1-score': 0.8333333333333333, 'support': 15}, 'Documentos/LABORAL': {'precision': 1.0, 'recall': 0.7142857142857143, 'f1-score': 0.8333333333333333, 'support': 7}, 'Documentos/MENORES': {'precision': 1.0, 'recall': 0.3333333333333333, 'f1-score': 0.5, 'support': 3}, 'Documentos/PENAL': {'precision': 1.0, 'recall': 0.75, 'f1-score': 0.8571428571428571, 'support': 8}, 'accuracy': 0.8181818181818182, 'macro avg': {'precision': 0.9285714285714286, 'recall': 0.699404761904762, 'f1-score': 0.7559523809523809, 'support': 33}, 'weighted avg': {'precision': 0.8701298701298702, 'recall': 0.8181818181818182, 'f1-score': 0.8088023088023087, 'support': 33}}


{'Documentos/FAMILIA': {'precision': 0.7857142857142857, 'recall': 1.0, 'f1-score': 0.88, 'support': 22}, 'Documentos/LABORAL': {'precision': 1.0, 'recall': 1.0, 'f1-score': 1.0, 'support': 2}, 'Documentos/MENORES': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 3}, 'Documentos/PENAL': {'precision': 1.0, 'recall': 0.5, 'f1-score': 0.6666666666666666, 'support': 6}, 'accuracy': 0.8181818181818182, 'macro avg': {'precision': 0.6964285714285714, 'recall': 0.625, 'f1-score': 0.6366666666666666, 'support': 33}, 'weighted avg': {'precision': 0.7662337662337662, 'recall': 0.8181818181818182, 'f1-score': 0.7684848484848484, 'support': 33}}
{'Documentos/FAMILIA': {'precision': 0.7142857142857143, 'recall': 1.0, 'f1-score': 0.8333333333333333, 'support': 15}, 'Documentos/LABORAL': {'precision': 1.0, 'recall': 0.875, 'f1-score': 0.9333333333333333, 'support': 8}, 'Documentos/MENORES': {'precision': 1.0, 'recall': 0.25, 'f1-score': 0.4, 'support': 4}, 'Documentos/PENAL': {'precision

{'Documentos/FAMILIA': {'precision': 0.6818181818181818, 'recall': 1.0, 'f1-score': 0.8108108108108109, 'support': 15}, 'Documentos/LABORAL': {'precision': 1.0, 'recall': 0.7142857142857143, 'f1-score': 0.8333333333333333, 'support': 7}, 'Documentos/MENORES': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 3}, 'Documentos/PENAL': {'precision': 1.0, 'recall': 0.75, 'f1-score': 0.8571428571428571, 'support': 8}, 'accuracy': 0.7878787878787878, 'macro avg': {'precision': 0.6704545454545454, 'recall': 0.6160714285714286, 'f1-score': 0.6253217503217503, 'support': 33}, 'weighted avg': {'precision': 0.7644628099173554, 'recall': 0.7878787878787878, 'f1-score': 0.7531102531102531, 'support': 33}}


{'Documentos/FAMILIA': {'precision': 0.7857142857142857, 'recall': 1.0, 'f1-score': 0.88, 'support': 22}, 'Documentos/LABORAL': {'precision': 1.0, 'recall': 0.5, 'f1-score': 0.6666666666666666, 'support': 2}, 'Documentos/MENORES': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 3}, 'Documentos/PENAL': {'precision': 1.0, 'recall': 0.6666666666666666, 'f1-score': 0.8, 'support': 6}, 'accuracy': 0.8181818181818182, 'macro avg': {'precision': 0.6964285714285714, 'recall': 0.5416666666666666, 'f1-score': 0.5866666666666667, 'support': 33}, 'weighted avg': {'precision': 0.7662337662337662, 'recall': 0.8181818181818182, 'f1-score': 0.7725252525252525, 'support': 33}}
{'Documentos/FAMILIA': {'precision': 0.7142857142857143, 'recall': 1.0, 'f1-score': 0.8333333333333333, 'support': 15}, 'Documentos/LABORAL': {'precision': 1.0, 'recall': 0.875, 'f1-score': 0.9333333333333333, 'support': 8}, 'Documentos/MENORES': {'precision': 1.0, 'recall': 0.25, 'f1-score': 0.4, 'support': 4}, 'Do

{'Documentos/FAMILIA': {'precision': 0.4838709677419355, 'recall': 1.0, 'f1-score': 0.6521739130434783, 'support': 15}, 'Documentos/LABORAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 7}, 'Documentos/MENORES': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 3}, 'Documentos/PENAL': {'precision': 1.0, 'recall': 0.25, 'f1-score': 0.4, 'support': 8}, 'accuracy': 0.5151515151515151, 'macro avg': {'precision': 0.3709677419354839, 'recall': 0.3125, 'f1-score': 0.2630434782608696, 'support': 33}, 'weighted avg': {'precision': 0.46236559139784944, 'recall': 0.5151515151515151, 'f1-score': 0.39341238471673257, 'support': 33}}
{'Documentos/FAMILIA': {'precision': 0.6875, 'recall': 1.0, 'f1-score': 0.8148148148148148, 'support': 22}, 'Documentos/LABORAL': {'precision': 1.0, 'recall': 0.5, 'f1-score': 0.6666666666666666, 'support': 2}, 'Documentos/MENORES': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 3}, 'Documentos/PENAL': {'precision': 0.0, 'rec

{'Documentos/FAMILIA': {'precision': 0.41935483870967744, 'recall': 1.0, 'f1-score': 0.5909090909090909, 'support': 13}, 'Documentos/LABORAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 6}, 'Documentos/MENORES': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 3}, 'Documentos/PENAL': {'precision': 1.0, 'recall': 0.1, 'f1-score': 0.18181818181818182, 'support': 10}, 'accuracy': 0.4375, 'macro avg': {'precision': 0.3548387096774194, 'recall': 0.275, 'f1-score': 0.19318181818181818, 'support': 32}, 'weighted avg': {'precision': 0.4828629032258065, 'recall': 0.4375, 'f1-score': 0.296875, 'support': 32}}
{'Documentos/FAMILIA': {'precision': 0.4838709677419355, 'recall': 1.0, 'f1-score': 0.6521739130434783, 'support': 15}, 'Documentos/LABORAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 2}, 'Documentos/MENORES': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 6}, 'Documentos/PENAL': {'precision': 1.0, 'recall': 0.11111111111111

{'Documentos/FAMILIA': {'precision': 0.4838709677419355, 'recall': 1.0, 'f1-score': 0.6521739130434783, 'support': 15}, 'Documentos/LABORAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 7}, 'Documentos/MENORES': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 3}, 'Documentos/PENAL': {'precision': 1.0, 'recall': 0.25, 'f1-score': 0.4, 'support': 8}, 'accuracy': 0.5151515151515151, 'macro avg': {'precision': 0.3709677419354839, 'recall': 0.3125, 'f1-score': 0.2630434782608696, 'support': 33}, 'weighted avg': {'precision': 0.46236559139784944, 'recall': 0.5151515151515151, 'f1-score': 0.39341238471673257, 'support': 33}}
{'Documentos/FAMILIA': {'precision': 0.6875, 'recall': 1.0, 'f1-score': 0.8148148148148148, 'support': 22}, 'Documentos/LABORAL': {'precision': 1.0, 'recall': 0.5, 'f1-score': 0.6666666666666666, 'support': 2}, 'Documentos/MENORES': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 3}, 'Documentos/PENAL': {'precision': 0.0, 'rec

{'Documentos/FAMILIA': {'precision': 0.4838709677419355, 'recall': 1.0, 'f1-score': 0.6521739130434783, 'support': 15}, 'Documentos/LABORAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 8}, 'Documentos/MENORES': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 4}, 'Documentos/PENAL': {'precision': 1.0, 'recall': 0.2, 'f1-score': 0.33333333333333337, 'support': 5}, 'accuracy': 0.5, 'macro avg': {'precision': 0.3709677419354839, 'recall': 0.3, 'f1-score': 0.2463768115942029, 'support': 32}, 'weighted avg': {'precision': 0.38306451612903225, 'recall': 0.5, 'f1-score': 0.35778985507246375, 'support': 32}}
{'Documentos/FAMILIA': {'precision': 0.41935483870967744, 'recall': 1.0, 'f1-score': 0.5909090909090909, 'support': 13}, 'Documentos/LABORAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 6}, 'Documentos/MENORES': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 3}, 'Documentos/PENAL': {'precision': 1.0, 'recall': 0.1, 'f1-score

{'Documentos/FAMILIA': {'precision': 0.5172413793103449, 'recall': 1.0, 'f1-score': 0.6818181818181819, 'support': 15}, 'Documentos/LABORAL': {'precision': 1.0, 'recall': 0.5, 'f1-score': 0.6666666666666666, 'support': 2}, 'Documentos/MENORES': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 6}, 'Documentos/PENAL': {'precision': 1.0, 'recall': 0.2222222222222222, 'f1-score': 0.3636363636363636, 'support': 9}, 'accuracy': 0.5625, 'macro avg': {'precision': 0.6293103448275862, 'recall': 0.4305555555555556, 'f1-score': 0.42803030303030304, 'support': 32}, 'weighted avg': {'precision': 0.5862068965517242, 'recall': 0.5625, 'f1-score': 0.46354166666666674, 'support': 32}}


Test
{'Documentos/FAMILIA': {'precision': 0.6285714285714286, 'recall': 1.0, 'f1-score': 0.7719298245614035, 'support': 44}, 'Documentos/LABORAL': {'precision': 1.0, 'recall': 0.25, 'f1-score': 0.4, 'support': 12}, 'Documentos/MENORES': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 10}, 'Documentos/PENAL': {'precision': 1.0, 'recall': 0.5333333333333333, 'f1-score': 0.6956521739130436, 'support': 15}, 'accuracy': 0.6790123456790124, 'macro avg': {'precision': 0.6571428571428571, 'recall': 0.4458333333333333, 'f1-score': 0.46689549961861176, 'support': 81}, 'weighted avg': {'precision': 0.6747795414462081, 'recall': 0.6790123456790124, 'f1-score': 0.6074036406098445, 'support': 81}}
train model
LogisticRegression() ['C', 'multi_class', 'penalty', 'solver'] [1.0, 'ovr', 'l2', 'lbfgs']
LogisticRegression(multi_class='ovr', random_state=42)


{'Documentos/FAMILIA': {'precision': 0.4838709677419355, 'recall': 1.0, 'f1-score': 0.6521739130434783, 'support': 15}, 'Documentos/LABORAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 7}, 'Documentos/MENORES': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 3}, 'Documentos/PENAL': {'precision': 1.0, 'recall': 0.25, 'f1-score': 0.4, 'support': 8}, 'accuracy': 0.5151515151515151, 'macro avg': {'precision': 0.3709677419354839, 'recall': 0.3125, 'f1-score': 0.2630434782608696, 'support': 33}, 'weighted avg': {'precision': 0.46236559139784944, 'recall': 0.5151515151515151, 'f1-score': 0.39341238471673257, 'support': 33}}
{'Documentos/FAMILIA': {'precision': 0.6875, 'recall': 1.0, 'f1-score': 0.8148148148148148, 'support': 22}, 'Documentos/LABORAL': {'precision': 1.0, 'recall': 0.5, 'f1-score': 0.6666666666666666, 'support': 2}, 'Documentos/MENORES': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 3}, 'Documentos/PENAL': {'precision': 0.0, 'rec

{'Documentos/FAMILIA': {'precision': 0.46875, 'recall': 1.0, 'f1-score': 0.6382978723404256, 'support': 15}, 'Documentos/LABORAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 8}, 'Documentos/MENORES': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 4}, 'Documentos/PENAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 5}, 'accuracy': 0.46875, 'macro avg': {'precision': 0.1171875, 'recall': 0.25, 'f1-score': 0.1595744680851064, 'support': 32}, 'weighted avg': {'precision': 0.2197265625, 'recall': 0.46875, 'f1-score': 0.2992021276595745, 'support': 32}}
{'Documentos/FAMILIA': {'precision': 0.41935483870967744, 'recall': 1.0, 'f1-score': 0.5909090909090909, 'support': 13}, 'Documentos/LABORAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 6}, 'Documentos/MENORES': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 3}, 'Documentos/PENAL': {'precision': 1.0, 'recall': 0.1, 'f1-score': 0.18181818181818182, 'support': 

{'Documentos/FAMILIA': {'precision': 0.4838709677419355, 'recall': 1.0, 'f1-score': 0.6521739130434783, 'support': 15}, 'Documentos/LABORAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 2}, 'Documentos/MENORES': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 6}, 'Documentos/PENAL': {'precision': 1.0, 'recall': 0.1111111111111111, 'f1-score': 0.19999999999999998, 'support': 9}, 'accuracy': 0.5, 'macro avg': {'precision': 0.3709677419354839, 'recall': 0.2777777777777778, 'f1-score': 0.21304347826086956, 'support': 32}, 'weighted avg': {'precision': 0.5080645161290323, 'recall': 0.5, 'f1-score': 0.3619565217391304, 'support': 32}}
Test
{'Documentos/FAMILIA': {'precision': 0.6197183098591549, 'recall': 1.0, 'f1-score': 0.7652173913043477, 'support': 44}, 'Documentos/LABORAL': {'precision': 1.0, 'recall': 0.16666666666666666, 'f1-score': 0.2857142857142857, 'support': 12}, 'Documentos/MENORES': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 10

{'Documentos/FAMILIA': {'precision': 0.46875, 'recall': 1.0, 'f1-score': 0.6382978723404256, 'support': 15}, 'Documentos/LABORAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 8}, 'Documentos/MENORES': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 4}, 'Documentos/PENAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 5}, 'accuracy': 0.46875, 'macro avg': {'precision': 0.1171875, 'recall': 0.25, 'f1-score': 0.1595744680851064, 'support': 32}, 'weighted avg': {'precision': 0.2197265625, 'recall': 0.46875, 'f1-score': 0.2992021276595745, 'support': 32}}
{'Documentos/FAMILIA': {'precision': 0.41935483870967744, 'recall': 1.0, 'f1-score': 0.5909090909090909, 'support': 13}, 'Documentos/LABORAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 6}, 'Documentos/MENORES': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 3}, 'Documentos/PENAL': {'precision': 1.0, 'recall': 0.1, 'f1-score': 0.18181818181818182, 'support': 

{'Documentos/FAMILIA': {'precision': 0.46875, 'recall': 1.0, 'f1-score': 0.6382978723404256, 'support': 15}, 'Documentos/LABORAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 8}, 'Documentos/MENORES': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 4}, 'Documentos/PENAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 5}, 'accuracy': 0.46875, 'macro avg': {'precision': 0.1171875, 'recall': 0.25, 'f1-score': 0.1595744680851064, 'support': 32}, 'weighted avg': {'precision': 0.2197265625, 'recall': 0.46875, 'f1-score': 0.2992021276595745, 'support': 32}}
{'Documentos/FAMILIA': {'precision': 0.40625, 'recall': 1.0, 'f1-score': 0.5777777777777777, 'support': 13}, 'Documentos/LABORAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 6}, 'Documentos/MENORES': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 3}, 'Documentos/PENAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 10}, 'accuracy': 0.40625, 'm

{'Documentos/FAMILIA': {'precision': 0.4838709677419355, 'recall': 1.0, 'f1-score': 0.6521739130434783, 'support': 15}, 'Documentos/LABORAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 7}, 'Documentos/MENORES': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 3}, 'Documentos/PENAL': {'precision': 1.0, 'recall': 0.25, 'f1-score': 0.4, 'support': 8}, 'accuracy': 0.5151515151515151, 'macro avg': {'precision': 0.3709677419354839, 'recall': 0.3125, 'f1-score': 0.2630434782608696, 'support': 33}, 'weighted avg': {'precision': 0.46236559139784944, 'recall': 0.5151515151515151, 'f1-score': 0.39341238471673257, 'support': 33}}
{'Documentos/FAMILIA': {'precision': 0.6666666666666666, 'recall': 1.0, 'f1-score': 0.8, 'support': 22}, 'Documentos/LABORAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 2}, 'Documentos/MENORES': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 3}, 'Documentos/PENAL': {'precision': 0.0, 'recall': 0.0, 'f1-sco

{'Documentos/FAMILIA': {'precision': 0.46875, 'recall': 1.0, 'f1-score': 0.6382978723404256, 'support': 15}, 'Documentos/LABORAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 8}, 'Documentos/MENORES': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 4}, 'Documentos/PENAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 5}, 'accuracy': 0.46875, 'macro avg': {'precision': 0.1171875, 'recall': 0.25, 'f1-score': 0.1595744680851064, 'support': 32}, 'weighted avg': {'precision': 0.2197265625, 'recall': 0.46875, 'f1-score': 0.2992021276595745, 'support': 32}}
{'Documentos/FAMILIA': {'precision': 0.41935483870967744, 'recall': 1.0, 'f1-score': 0.5909090909090909, 'support': 13}, 'Documentos/LABORAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 6}, 'Documentos/MENORES': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 3}, 'Documentos/PENAL': {'precision': 1.0, 'recall': 0.1, 'f1-score': 0.18181818181818182, 'support': 

{'Documentos/FAMILIA': {'precision': 0.46875, 'recall': 1.0, 'f1-score': 0.6382978723404256, 'support': 15}, 'Documentos/LABORAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 2}, 'Documentos/MENORES': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 6}, 'Documentos/PENAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 9}, 'accuracy': 0.46875, 'macro avg': {'precision': 0.1171875, 'recall': 0.25, 'f1-score': 0.1595744680851064, 'support': 32}, 'weighted avg': {'precision': 0.2197265625, 'recall': 0.46875, 'f1-score': 0.2992021276595745, 'support': 32}}


Test
{'Documentos/FAMILIA': {'precision': 0.5866666666666667, 'recall': 1.0, 'f1-score': 0.7394957983193278, 'support': 44}, 'Documentos/LABORAL': {'precision': 1.0, 'recall': 0.08333333333333333, 'f1-score': 0.15384615384615385, 'support': 12}, 'Documentos/MENORES': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 10}, 'Documentos/PENAL': {'precision': 1.0, 'recall': 0.3333333333333333, 'f1-score': 0.5, 'support': 15}, 'accuracy': 0.6172839506172839, 'macro avg': {'precision': 0.6466666666666667, 'recall': 0.35416666666666663, 'f1-score': 0.3483354880413704, 'support': 81}, 'weighted avg': {'precision': 0.6520164609053498, 'recall': 0.6172839506172839, 'f1-score': 0.5170860366938799, 'support': 81}}
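The `macro avg` and `weighted avg` entries in the report above can be recomputed from the per-class rows. The following is a minimal sketch in plain Python; the helper names `macro_avg` and `weighted_avg` are illustrative and are not part of the original notebook.

```python
# Per-class rows copied from the Test report directly above.
report = {
    "Documentos/FAMILIA": {"precision": 0.5866666666666667, "recall": 1.0,
                           "f1-score": 0.7394957983193278, "support": 44},
    "Documentos/LABORAL": {"precision": 1.0, "recall": 0.08333333333333333,
                           "f1-score": 0.15384615384615385, "support": 12},
    "Documentos/MENORES": {"precision": 0.0, "recall": 0.0,
                           "f1-score": 0.0, "support": 10},
    "Documentos/PENAL": {"precision": 1.0, "recall": 0.3333333333333333,
                         "f1-score": 0.5, "support": 15},
}


def macro_avg(report, metric):
    """Unweighted mean of a metric over the classes (hypothetical helper)."""
    return sum(row[metric] for row in report.values()) / len(report)


def weighted_avg(report, metric):
    """Mean of a metric over the classes, weighted by support (hypothetical helper)."""
    total = sum(row["support"] for row in report.values())
    return sum(row[metric] * row["support"] for row in report.values()) / total


for metric in ("precision", "recall", "f1-score"):
    print(metric,
          round(macro_avg(report, metric), 4),
          round(weighted_avg(report, metric), 4))
```

The support-weighted recall equals the reported accuracy (50 of 81 documents correct). The macro F1 stays low because MENORES scores 0.0 and LABORAL is almost never recovered, even though FAMILIA has recall 1.0.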
train model
LogisticRegression() ['C', 'multi_class', 'penalty', 'solver'] [0.7, 'ovr', 'l2', 'lbfgs']
LogisticRegression(C=0.7, multi_class='ovr', random_state=42)


{'Documentos/FAMILIA': {'precision': 0.4838709677419355, 'recall': 1.0, 'f1-score': 0.6521739130434783, 'support': 15}, 'Documentos/LABORAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 7}, 'Documentos/MENORES': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 3}, 'Documentos/PENAL': {'precision': 1.0, 'recall': 0.25, 'f1-score': 0.4, 'support': 8}, 'accuracy': 0.5151515151515151, 'macro avg': {'precision': 0.3709677419354839, 'recall': 0.3125, 'f1-score': 0.2630434782608696, 'support': 33}, 'weighted avg': {'precision': 0.46236559139784944, 'recall': 0.5151515151515151, 'f1-score': 0.39341238471673257, 'support': 33}}
{'Documentos/FAMILIA': {'precision': 0.6666666666666666, 'recall': 1.0, 'f1-score': 0.8, 'support': 22}, 'Documentos/LABORAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 2}, 'Documentos/MENORES': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 3}, 'Documentos/PENAL': {'precision': 0.0, 'recall': 0.0, 'f1-sco

{'Documentos/FAMILIA': {'precision': 0.46875, 'recall': 1.0, 'f1-score': 0.6382978723404256, 'support': 15}, 'Documentos/LABORAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 8}, 'Documentos/MENORES': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 4}, 'Documentos/PENAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 5}, 'accuracy': 0.46875, 'macro avg': {'precision': 0.1171875, 'recall': 0.25, 'f1-score': 0.1595744680851064, 'support': 32}, 'weighted avg': {'precision': 0.2197265625, 'recall': 0.46875, 'f1-score': 0.2992021276595745, 'support': 32}}
{'Documentos/FAMILIA': {'precision': 0.40625, 'recall': 1.0, 'f1-score': 0.5777777777777777, 'support': 13}, 'Documentos/LABORAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 6}, 'Documentos/MENORES': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 3}, 'Documentos/PENAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 10}, 'accuracy': 0.40625, 'm

{'Documentos/FAMILIA': {'precision': 0.46875, 'recall': 1.0, 'f1-score': 0.6382978723404256, 'support': 15}, 'Documentos/LABORAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 2}, 'Documentos/MENORES': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 6}, 'Documentos/PENAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 9}, 'accuracy': 0.46875, 'macro avg': {'precision': 0.1171875, 'recall': 0.25, 'f1-score': 0.1595744680851064, 'support': 32}, 'weighted avg': {'precision': 0.2197265625, 'recall': 0.46875, 'f1-score': 0.2992021276595745, 'support': 32}}


Test
{'Documentos/FAMILIA': {'precision': 0.5866666666666667, 'recall': 1.0, 'f1-score': 0.7394957983193278, 'support': 44}, 'Documentos/LABORAL': {'precision': 1.0, 'recall': 0.08333333333333333, 'f1-score': 0.15384615384615385, 'support': 12}, 'Documentos/MENORES': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 10}, 'Documentos/PENAL': {'precision': 1.0, 'recall': 0.3333333333333333, 'f1-score': 0.5, 'support': 15}, 'accuracy': 0.6172839506172839, 'macro avg': {'precision': 0.6466666666666667, 'recall': 0.35416666666666663, 'f1-score': 0.3483354880413704, 'support': 81}, 'weighted avg': {'precision': 0.6520164609053498, 'recall': 0.6172839506172839, 'f1-score': 0.5170860366938799, 'support': 81}}
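Several classes in these reports have precision 0.0 and recall 0.0, which is consistent with the classifier never predicting them. In that case scikit-learn's precision is undefined (0/0), and `classification_report` raises an `UndefinedMetricWarning` unless `zero_division` is set. The sketch below uses toy labels invented for illustration, not the notebook's data, to show how the report could be built without the warning.

```python
from sklearn.metrics import classification_report

# Toy labels (hypothetical): the classifier never predicts MENORES.
y_true = ["FAMILIA", "FAMILIA", "MENORES", "PENAL"]
y_pred = ["FAMILIA", "FAMILIA", "FAMILIA", "PENAL"]

# zero_division=0 reports the undefined precision as 0.0 and silences the warning.
report = classification_report(
    y_true, y_pred, output_dict=True, zero_division=0
)

print(report["MENORES"])
print(report["accuracy"])
```

Setting `zero_division=0` only changes how the undefined value is reported. It does not change the predictions, so a class with 0.0 precision still signals that the model ignores it.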
train model
LogisticRegression() ['C', 'multi_class', 'penalty', 'solver'] [1.0, 'ovr', 'l2', 'liblinear']
LogisticRegression(multi_class='ovr', random_state=42, solver='liblinear')
{'Documentos/FAMILIA': {'precision': 0.4838709677419355, 'recall': 1.0, 'f1-score': 0.652173913

{'Documentos/FAMILIA': {'precision': 0.6666666666666666, 'recall': 1.0, 'f1-score': 0.8, 'support': 22}, 'Documentos/LABORAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 2}, 'Documentos/MENORES': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 3}, 'Documentos/PENAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 6}, 'accuracy': 0.6666666666666666, 'macro avg': {'precision': 0.16666666666666666, 'recall': 0.25, 'f1-score': 0.2, 'support': 33}, 'weighted avg': {'precision': 0.4444444444444444, 'recall': 0.6666666666666666, 'f1-score': 0.5333333333333333, 'support': 33}}
{'Documentos/FAMILIA': {'precision': 0.4838709677419355, 'recall': 1.0, 'f1-score': 0.6521739130434783, 'support': 15}, 'Documentos/LABORAL': {'precision': 1.0, 'recall': 0.125, 'f1-score': 0.2222222222222222, 'support': 8}, 'Documentos/MENORES': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 4}, 'Documentos/PENAL': {'precision': 0.0, 'recall': 0.0, 'f1-score

{'Documentos/FAMILIA': {'precision': 0.41935483870967744, 'recall': 1.0, 'f1-score': 0.5909090909090909, 'support': 13}, 'Documentos/LABORAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 6}, 'Documentos/MENORES': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 3}, 'Documentos/PENAL': {'precision': 1.0, 'recall': 0.1, 'f1-score': 0.18181818181818182, 'support': 10}, 'accuracy': 0.4375, 'macro avg': {'precision': 0.3548387096774194, 'recall': 0.275, 'f1-score': 0.19318181818181818, 'support': 32}, 'weighted avg': {'precision': 0.4828629032258065, 'recall': 0.4375, 'f1-score': 0.296875, 'support': 32}}
{'Documentos/FAMILIA': {'precision': 0.46875, 'recall': 1.0, 'f1-score': 0.6382978723404256, 'support': 15}, 'Documentos/LABORAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 2}, 'Documentos/MENORES': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 6}, 'Documentos/PENAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'supp

{'Documentos/FAMILIA': {'precision': 0.45454545454545453, 'recall': 1.0, 'f1-score': 0.625, 'support': 15}, 'Documentos/LABORAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 7}, 'Documentos/MENORES': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 3}, 'Documentos/PENAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 8}, 'accuracy': 0.45454545454545453, 'macro avg': {'precision': 0.11363636363636363, 'recall': 0.25, 'f1-score': 0.15625, 'support': 33}, 'weighted avg': {'precision': 0.2066115702479339, 'recall': 0.45454545454545453, 'f1-score': 0.2840909090909091, 'support': 33}}
{'Documentos/FAMILIA': {'precision': 0.6666666666666666, 'recall': 1.0, 'f1-score': 0.8, 'support': 22}, 'Documentos/LABORAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 2}, 'Documentos/MENORES': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 3}, 'Documentos/PENAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 6}, 

{'Documentos/FAMILIA': {'precision': 0.46875, 'recall': 1.0, 'f1-score': 0.6382978723404256, 'support': 15}, 'Documentos/LABORAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 2}, 'Documentos/MENORES': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 6}, 'Documentos/PENAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 9}, 'accuracy': 0.46875, 'macro avg': {'precision': 0.1171875, 'recall': 0.25, 'f1-score': 0.1595744680851064, 'support': 32}, 'weighted avg': {'precision': 0.2197265625, 'recall': 0.46875, 'f1-score': 0.2992021276595745, 'support': 32}}
Test
{'Documentos/FAMILIA': {'precision': 0.5432098765432098, 'recall': 1.0, 'f1-score': 0.704, 'support': 44}, 'Documentos/LABORAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 12}, 'Documentos/MENORES': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 10}, 'Documentos/PENAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 15}, 'accuracy': 0.5432

{'Documentos/FAMILIA': {'precision': 0.46875, 'recall': 1.0, 'f1-score': 0.6382978723404256, 'support': 15}, 'Documentos/LABORAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 8}, 'Documentos/MENORES': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 4}, 'Documentos/PENAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 5}, 'accuracy': 0.46875, 'macro avg': {'precision': 0.1171875, 'recall': 0.25, 'f1-score': 0.1595744680851064, 'support': 32}, 'weighted avg': {'precision': 0.2197265625, 'recall': 0.46875, 'f1-score': 0.2992021276595745, 'support': 32}}
{'Documentos/FAMILIA': {'precision': 0.41935483870967744, 'recall': 1.0, 'f1-score': 0.5909090909090909, 'support': 13}, 'Documentos/LABORAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 6}, 'Documentos/MENORES': {'precision': 1.0, 'recall': 0.3333333333333333, 'f1-score': 0.5, 'support': 3}, 'Documentos/PENAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 1

Test
{'Documentos/FAMILIA': {'precision': 0.5866666666666667, 'recall': 1.0, 'f1-score': 0.7394957983193278, 'support': 44}, 'Documentos/LABORAL': {'precision': 1.0, 'recall': 0.08333333333333333, 'f1-score': 0.15384615384615385, 'support': 12}, 'Documentos/MENORES': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 10}, 'Documentos/PENAL': {'precision': 1.0, 'recall': 0.3333333333333333, 'f1-score': 0.5, 'support': 15}, 'accuracy': 0.6172839506172839, 'macro avg': {'precision': 0.6466666666666667, 'recall': 0.35416666666666663, 'f1-score': 0.3483354880413704, 'support': 81}, 'weighted avg': {'precision': 0.6520164609053498, 'recall': 0.6172839506172839, 'f1-score': 0.5170860366938799, 'support': 81}}
train model
SVC() ['degree', 'kernel'] [3, 'poly']
SVC(kernel='poly', random_state=42)
{'Documentos/FAMILIA': {'precision': 0.45454545454545453, 'recall': 1.0, 'f1-score': 0.625, 'support': 15}, 'Documentos/LABORAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support'

{'Documentos/FAMILIA': {'precision': 0.46875, 'recall': 1.0, 'f1-score': 0.6382978723404256, 'support': 15}, 'Documentos/LABORAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 8}, 'Documentos/MENORES': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 4}, 'Documentos/PENAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 5}, 'accuracy': 0.46875, 'macro avg': {'precision': 0.1171875, 'recall': 0.25, 'f1-score': 0.1595744680851064, 'support': 32}, 'weighted avg': {'precision': 0.2197265625, 'recall': 0.46875, 'f1-score': 0.2992021276595745, 'support': 32}}
{'Documentos/FAMILIA': {'precision': 0.41935483870967744, 'recall': 1.0, 'f1-score': 0.5909090909090909, 'support': 13}, 'Documentos/LABORAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 6}, 'Documentos/MENORES': {'precision': 1.0, 'recall': 0.3333333333333333, 'f1-score': 0.5, 'support': 3}, 'Documentos/PENAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 1

Test
{'Documentos/FAMILIA': {'precision': 0.5569620253164557, 'recall': 1.0, 'f1-score': 0.7154471544715447, 'support': 44}, 'Documentos/LABORAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 12}, 'Documentos/MENORES': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 10}, 'Documentos/PENAL': {'precision': 1.0, 'recall': 0.13333333333333333, 'f1-score': 0.23529411764705882, 'support': 15}, 'accuracy': 0.5679012345679012, 'macro avg': {'precision': 0.3892405063291139, 'recall': 0.2833333333333333, 'f1-score': 0.23768531802965087, 'support': 81}, 'weighted avg': {'precision': 0.4877324581965933, 'recall': 0.5679012345679012, 'f1-score': 0.4322109452031339, 'support': 81}}
train model
SVC() ['degree', 'kernel'] [4, 'poly']
SVC(degree=4, kernel='poly', random_state=42)
{'Documentos/FAMILIA': {'precision': 0.45454545454545453, 'recall': 1.0, 'f1-score': 0.625, 'support': 15}, 'Documentos/LABORAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 7},

{'Documentos/FAMILIA': {'precision': 0.46875, 'recall': 1.0, 'f1-score': 0.6382978723404256, 'support': 15}, 'Documentos/LABORAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 8}, 'Documentos/MENORES': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 4}, 'Documentos/PENAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 5}, 'accuracy': 0.46875, 'macro avg': {'precision': 0.1171875, 'recall': 0.25, 'f1-score': 0.1595744680851064, 'support': 32}, 'weighted avg': {'precision': 0.2197265625, 'recall': 0.46875, 'f1-score': 0.2992021276595745, 'support': 32}}
{'Documentos/FAMILIA': {'precision': 0.41935483870967744, 'recall': 1.0, 'f1-score': 0.5909090909090909, 'support': 13}, 'Documentos/LABORAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 6}, 'Documentos/MENORES': {'precision': 1.0, 'recall': 0.3333333333333333, 'f1-score': 0.5, 'support': 3}, 'Documentos/PENAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 1

Test
{'Documentos/FAMILIA': {'precision': 0.55, 'recall': 1.0, 'f1-score': 0.7096774193548387, 'support': 44}, 'Documentos/LABORAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 12}, 'Documentos/MENORES': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 10}, 'Documentos/PENAL': {'precision': 1.0, 'recall': 0.06666666666666667, 'f1-score': 0.125, 'support': 15}, 'accuracy': 0.5555555555555556, 'macro avg': {'precision': 0.3875, 'recall': 0.26666666666666666, 'f1-score': 0.2086693548387097, 'support': 81}, 'weighted avg': {'precision': 0.48395061728395067, 'recall': 0.5555555555555556, 'f1-score': 0.4086519315013939, 'support': 81}}
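To compare the configurations logged so far, the test-set accuracy and macro F1 can be put side by side. This is a minimal sketch that assumes each `Test` report belongs to the model line printed before it; the log is interleaved, so that pairing is not guaranteed. Only the three runs whose pairing looks unambiguous are included.

```python
# Test-set figures copied from the reports above. Pairing each report with the
# model line printed before it is an assumption about the log layout.
test_results = [
    ("LogisticRegression(C=0.7, multi_class='ovr')",
     0.6172839506172839, 0.3483354880413704),
    ("SVC(kernel='poly', degree=3)",
     0.5679012345679012, 0.23768531802965087),
    ("SVC(kernel='poly', degree=4)",
     0.5555555555555556, 0.2086693548387097),
]

# Rank by macro F1, which weights the four classes equally despite the imbalance.
ranked = sorted(test_results, key=lambda row: row[2], reverse=True)

for name, accuracy, macro_f1 in ranked:
    print(f"{name:<48} accuracy={accuracy:.3f} macro_f1={macro_f1:.3f}")

best_name = ranked[0][0]
```

Macro F1 is used for the ranking because FAMILIA makes up 44 of the 81 test documents, so accuracy alone rewards a model that predicts FAMILIA almost everywhere.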
train model
SVC() ['degree', 'kernel'] [5, 'poly']
SVC(degree=5, kernel='poly', random_state=42)
{'Documentos/FAMILIA': {'precision': 0.45454545454545453, 'recall': 1.0, 'f1-score': 0.625, 'support': 15}, 'Documentos/LABORAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 7}, 'Documentos/MENORES': {'precision': 0.

{'Documentos/FAMILIA': {'precision': 0.41935483870967744, 'recall': 1.0, 'f1-score': 0.5909090909090909, 'support': 13}, 'Documentos/LABORAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 6}, 'Documentos/MENORES': {'precision': 1.0, 'recall': 0.3333333333333333, 'f1-score': 0.5, 'support': 3}, 'Documentos/PENAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 10}, 'accuracy': 0.4375, 'macro avg': {'precision': 0.3548387096774194, 'recall': 0.3333333333333333, 'f1-score': 0.2727272727272727, 'support': 32}, 'weighted avg': {'precision': 0.2641129032258065, 'recall': 0.4375, 'f1-score': 0.28693181818181823, 'support': 32}}
{'Documentos/FAMILIA': {'precision': 0.4838709677419355, 'recall': 1.0, 'f1-score': 0.6521739130434783, 'support': 15}, 'Documentos/LABORAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 2}, 'Documentos/MENORES': {'precision': 1.0, 'recall': 0.16666666666666666, 'f1-score': 0.2857142857142857, 'support': 6}, 'Documentos/



{'Documentos/FAMILIA': {'precision': 0.6875, 'recall': 1.0, 'f1-score': 0.8148148148148148, 'support': 22}, 'Documentos/LABORAL': {'precision': 1.0, 'recall': 0.5, 'f1-score': 0.6666666666666666, 'support': 2}, 'Documentos/MENORES': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 3}, 'Documentos/PENAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 6}, 'accuracy': 0.696969696969697, 'macro avg': {'precision': 0.421875, 'recall': 0.375, 'f1-score': 0.37037037037037035, 'support': 33}, 'weighted avg': {'precision': 0.5189393939393939, 'recall': 0.696969696969697, 'f1-score': 0.5836139169472502, 'support': 33}}
{'Documentos/FAMILIA': {'precision': 0.46875, 'recall': 1.0, 'f1-score': 0.6382978723404256, 'support': 15}, 'Documentos/LABORAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 8}, 'Documentos/MENORES': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 4}, 'Documentos/PENAL': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0



Test
{'Documentos/FAMILIA': {'precision': 0.6285714285714286, 'recall': 1.0, 'f1-score': 0.7719298245614035, 'support': 44}, 'Documentos/LABORAL': {'precision': 1.0, 'recall': 0.25, 'f1-score': 0.4, 'support': 12}, 'Documentos/MENORES': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 10}, 'Documentos/PENAL': {'precision': 1.0, 'recall': 0.5333333333333333, 'f1-score': 0.6956521739130436, 'support': 15}, 'accuracy': 0.6790123456790124, 'macro avg': {'precision': 0.6571428571428571, 'recall': 0.4458333333333333, 'f1-score': 0.46689549961861176, 'support': 81}, 'weighted avg': {'precision': 0.6747795414462081, 'recall': 0.6790123456790124, 'f1-score': 0.6074036406098445, 'support': 81}}
train model
SVC() ['kernel'] ['sigmoid']
SVC(kernel='sigmoid', random_state=42)
{'Documentos/FAMILIA': {'precision': 0.8333333333333334, 'recall': 1.0, 'f1-score': 0.9090909090909091, 'support': 15}, 'Documentos/LABORAL': {'precision': 1.0, 'recall': 0.8571428571428571, 'f1-score': 0.92307692



In [525]:
result_logistic = toDataFrame(results , y_test.values)

Ponderado fuero
familia: 0.5432098765432098, laboral: 0.14814814814814814, menores: 0.12345679012345678, penal: 0.18518518518518517 


### Ranking the models (and the parameters used) by different metrics

#### Test: f1-score and accuracy

Because the dataset is imbalanced, we rank the models by f1-score. Accuracy is not a recommended metric for an imbalanced dataset, so we use it only to break ties.
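To illustrate why accuracy can mislead here, the following minimal sketch scores a hypothetical classifier that always predicts the majority class, using the test-set supports printed in the reports above (44, 12, 10 and 15 documents). The macro average for f1 is an assumption chosen to match the `macro avg` entries of those reports; `zero_division=0` sets the score to 0 for classes that are never predicted instead of emitting a warning.

```python
from sklearn.metrics import accuracy_score, f1_score

# Test-set class supports as printed in the classification reports above.
support = {"FAMILIA": 44, "LABORAL": 12, "MENORES": 10, "PENAL": 15}

# True labels, plus a hypothetical model that always answers the majority class.
y_true = [label for label, n in support.items() for _ in range(n)]
y_pred = ["FAMILIA"] * len(y_true)

accuracy = accuracy_score(y_true, y_pred)
# Macro f1 gives every class the same weight, regardless of its size.
macro_f1 = f1_score(y_true, y_pred, average="macro", zero_division=0)

print(f"accuracy={accuracy:.3f}  macro f1={macro_f1:.3f}")
```

The accuracy equals the share of the majority class (44/81), while the macro f1 stays low because three of the four classes are never predicted correctly.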

In [526]:
result_logistic[result_logistic["modo"] == "Test"].sort_values(by=["f1-score", "accuracy"], ascending=False)

| | id | modelo | modo | parametros | valores | accuracy | precision | recall | f1-score | roc_penal | roc_familia | roc_laboral | roc_menores | roc_ponderado |
| ---: | :--- | :--- | :--- | :--- | :--- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 125 | SVC()_4 | SVC() | Test | ['kernel'] | ['linear'] | 0.901235 | 0.94902 | 0.829167 | 0.871345 | 0.959091 | 0.905405 | 0.791667 | 0.9 | 0.89783 |
| 137 | SVC()_4 | SVC() | Test | ['kernel'] | ['sigmoid'] | 0.888889 | 0.944872 | 0.804167 | 0.852593 | 0.959091 | 0.891892 | 0.791667 | 0.85 | 0.884316 |
| 5 | RandomForestClassifier()_0 | RandomForestClassifier() | Test | ['criterion', 'n_estimators'] | ['gini', 100] | 0.82716 | 0.939655 | 0.716667 | 0.790686 | 0.8 | 0.810811 | 0.833333 | 0.8 | 0.810811 |
| 11 | RandomForestClassifier()_0 | RandomForestClassifier() | Test | ['criterion', 'n_estimators'] | ['entropy', 100] | 0.82716 | 0.939655 | 0.716667 | 0.790686 | 0.8 | 0.810811 | 0.833333 | 0.8 | 0.810811 |
| 17 | LogisticRegression()_1 | LogisticRegression() | Test | ['C', 'multi_class', 'penalty', 'solver'] | [1.0, 'ovr', 'l2', 'liblinear'] | 0.679012 | 0.657143 | 0.445833 | 0.466895 | 0.766667 | 0.648649 | 0.625 | 0.5 | 0.648649 |
| 29 | LogisticRegression()_1 | LogisticRegression() | Test | ['C', 'multi_class', 'penalty', 'solver'] | [1.0, 'ovr', 'l2', 'saga'] | 0.679012 | 0.657143 | 0.445833 | 0.466895 | 0.766667 | 0.648649 | 0.625 | 0.5 | 0.648649 |
| 65 | LogisticRegression()_2 | LogisticRegression() | Test | ['C', 'multi_class', 'penalty', 'solver'] | [1.0, 'ovr', 'l2', 'liblinear'] | 0.679012 | 0.657143 | 0.445833 | 0.466895 | 0.766667 | 0.648649 | 0.625 | 0.5 | 0.648649 |
| 131 | SVC()_4 | SVC() | Test | ['kernel'] | ['rbf'] | 0.679012 | 0.657143 | 0.445833 | 0.466895 | 0.766667 | 0.648649 | 0.625 | 0.5 | 0.648649 |
| 143 | MultinomialNB()_5 | MultinomialNB() | Test | ['alpha'] | [1.0] | 0.691358 | 0.65942 | 0.458333 | 0.466119 | 0.833333 | 0.662162 | 0.583333 | 0.5 | 0.662162 |
| 23 | LogisticRegression()_1 | LogisticRegression() | Test | ['C', 'multi_class', 'penalty', 'solver'] | [1.0, 'ovr', 'l2', 'sag'] | 0.666667 | 0.65493 | 0.425 | 0.436646 | 0.766667 | 0.635135 | 0.583333 | 0.5 | 0.635135 |


#### Train: f1-score and accuracy

Because the dataset is imbalanced, we rank the models by f1-score. **Note that the metrics are averaged per model**: each value is the mean of the results obtained with every parameter combination tried for that model.

In [527]:
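result_train = result_logistic[result_logistic["modo"] != "Test"]

# Average only the numeric metric columns per model id (recent pandas versions require numeric_only=True here)
result_train.groupby(["id"]).mean(numeric_only=True).sort_values(["f1-score", "accuracy"], ascending=False)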

| id | accuracy | precision | recall | f1-score | roc_penal | roc_familia | roc_laboral | roc_menores | roc_ponderado |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| RandomForestClassifier()_0 | 0.814867 | 0.856028 | 0.666131 | 0.707842 | 0.856667 | 0.809498 | 0.867262 | 0.608333 | 0.801956 |
| SVC()_4 | 0.769003 | 0.794243 | 0.642394 | 0.65856 | 0.821296 | 0.772465 | 0.79127 | 0.672222 | 0.771918 |
| MultinomialNB()_5 | 0.573106 | 0.434653 | 0.359167 | 0.330308 | 0.668333 | 0.587413 | 0.55 | 0.5 | 0.586064 |
| LogisticRegression()_1 | 0.520644 | 0.289211 | 0.291667 | 0.230111 | 0.545833 | 0.526957 | 0.5375 | 0.5 | 0.528686 |
| SVC()_3 | 0.505492 | 0.224659 | 0.275 | 0.204605 | 0.5 | 0.511146 | 0.5 | 0.55 | 0.512227 |
| LogisticRegression()_2 | 0.506376 | 0.208244 | 0.272894 | 0.199497 | 0.518704 | 0.512856 | 0.527083 | 0.5 | 0.514459 |
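As a minimal sketch of the per-model averaging used for the training ranking, the example below applies the same filter, group and sort steps to a tiny hypothetical results table. The ids `A_0` and `B_1`, the `Train` label and all metric values are invented for illustration; only the column names `id`, `modo`, `f1-score` and `accuracy` come from the notebook.

```python
import pandas as pd

# Hypothetical results: two parameter combinations per model id.
results = pd.DataFrame({
    "id": ["A_0", "A_0", "B_1", "B_1"],
    "modo": ["Train", "Train", "Train", "Test"],
    "f1-score": [0.6, 0.8, 0.3, 0.9],
    "accuracy": [0.7, 0.9, 0.5, 0.95],
})

# Keep every row that is not a test evaluation.
train = results[results["modo"] != "Test"]

# One row per model id: the mean of each numeric metric over its rows,
# ranked by f1-score first and accuracy second.
per_model = (
    train.groupby("id")
    .mean(numeric_only=True)
    .sort_values(["f1-score", "accuracy"], ascending=False)
)
print(per_model)
```

Because the ranking uses a mean, a model with one strong and several weak parameter combinations can rank below a model that is consistently average.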


## Results

When we treat the words in the lowest decile of IDF values as stop words (and remove them from the dataset), the models score lower on f1-score, accuracy and ROC AUC. This may suggest that we are reducing overfitting. For example, Random Forest yields the following f1-scores (the 10% row is the per-model training average reported in the table above):


| % removed      | f1-score     |
| :------------- | -----------: |
|  0%            | 0.962362     |
|  2.5%          | 0.911784     |
|  5%            | 0.884469     |
|  7.5%          | 0.811515     |
|  10%           | 0.707842     | 

To reproduce these results, change in line X the index used to look up the result data frame, for example:


- limit = percent_df.loc[.025].values[0] for 2.5%
- limit = percent_df.loc[.05].values[0] for 5%
- limit = percent_df.loc[.1].values[0] for 10%

To work with 0%, comment out *stop_words = df_idf[df_idf['idf_weight'] <= limit ]['word'].values.tolist()* and uncomment *#stop_words = []*
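The threshold selection described above can be sketched end to end as follows. This is a minimal example under stated assumptions: the four-sentence corpus is hypothetical, and the way `percent_df` is built (quantiles of the `idf_weight` column) is a guess that is consistent with the `percent_df.loc[...].values[0]` lookups, not necessarily the notebook's exact code.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus; the practical uses the legal documents instead.
corpus = [
    "tribunal resuelve divorcio alimentos",
    "tribunal resuelve despido indemnizacion",
    "tribunal condena robo prision",
    "tribunal guarda menor adopcion",
]

vectorizer = TfidfVectorizer()
vectorizer.fit(corpus)

# vocabulary_ maps each word to its column index; idf_ follows that order.
words = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
df_idf = pd.DataFrame({"word": words, "idf_weight": vectorizer.idf_})

# Assumed construction: the IDF value found at each candidate quantile.
percent_df = df_idf[["idf_weight"]].quantile([0.025, 0.05, 0.075, 0.1])

# Words whose IDF is at or below the chosen quantile become stop words.
limit = percent_df.loc[0.1].values[0]
stop_words = df_idf[df_idf["idf_weight"] <= limit]["word"].values.tolist()
# stop_words = []  # 0% baseline: no words removed

# Refit without the selected stop words.
filtered = TfidfVectorizer(stop_words=stop_words).fit(corpus)
print(sorted(stop_words))
print(len(vectorizer.vocabulary_), "->", len(filtered.vocabulary_))
```

In this toy corpus the words that appear in the most documents have the lowest IDF, so they are the first to be removed as the quantile grows.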


Note that when we remove the 10% of words with the lowest IDF, the models perform poorly.