# Práctica 3: Clasificación jerárquica

## **Objetivo**: El objetivo de esta práctica es introducir los conceptos de clasificación jerárquica

La práctica se ha realizado siguiendo la opción siguiente:

**OPCIÓN 2: Comparación de métodos**

Seleccione al menos dos algoritmos de los disponibles en la bibliotecas indicadas. Seleccione al menos tres problemas
de clasificación jerárquica de los repositorios indicados.
Realice las siguientes tareas:
1. Aplique los algoritmos seleccionados a los datasets
2. Compare los resultados y explique qué conclusiones se podrían obtener

In [32]:
from hiclass import LocalClassifierPerNode
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from hiclass import metrics
import pandas as pd
import arff

In [2]:
# Establecer semilla para reproducibilidad
SEED = 751

### Preparación de datos

Se van a utilizar los datasets facilitados en las diapositivas del tema de clasificación jerárquica:
- [TEsHierarchicaldatasets](https://github.com/geantrindade/TEsHierarchicalDatasets)
    - repbase
    - mips
- [lshtc](https://www.kaggle.com/competitions/lshtc/data): Large Scale Hierarchical Text Classification (LSHTC) Challenge.  El LSHTC Challenge es una competición de clasificación de textos jerárquicos que utiliza conjuntos de datos muy grandes.
    - train - Training set
    - test - Test set
    - hierarchy - Wikipedia hierarchy
    - AllZerosBenchmark - example submission file
    - knn-baseline - A simple flat kNN baseline
    - train-remapped, test-remapped - Training and Test sets reformatted per this forum thread



Que son dos datasets preparados para aplicar clasificación jerárquica.

### TEsHierarchicalDatasets

#### MIPS

In [4]:
train_data_path = "TEsHierarchicalDatasets/TEsHierarchicalDatasets-master/arff/mips1trainatt.arff"
test_data_path = "TEsHierarchicalDatasets/TEsHierarchicalDatasets-master/arff/mips1testatt.arff"

# Cargar los datos de entrenamiento y prueba
with open(train_data_path, 'r') as f:
        train_MIPS = arff.load(f)

with open(test_data_path, 'r') as f:
        test_MIPS = arff.load(f)

#### Extracción de datos

In [9]:
# Train
train_MIPS_metadata = train_MIPS.get('attributes', [])
train_MIPS_data = train_MIPS.get('data', [])
train_MIPS_classification = train_MIPS.get('classification_hierarchical', [])

print('Classification')
print(train_MIPS_classification)
print(train_MIPS_metadata)
df_train_MIPS = pd.DataFrame(train_MIPS_data, columns=[attr[0] for attr in train_MIPS_metadata])

# Test
test_MIPS_metadata = test_MIPS.get('attributes', [])
test_MIPS_data = test_MIPS.get('data', [])
test_MIPS_classification = test_MIPS.get('classification_hierarchical', [])
df_test_MIPS = pd.DataFrame(test_MIPS_data, columns=[attr[0] for attr in test_MIPS_metadata])

Classification
[]
[('AA', 'NUMERIC'), ('AT', 'NUMERIC'), ('AC', 'NUMERIC'), ('AG', 'NUMERIC'), ('TT', 'NUMERIC'), ('TA', 'NUMERIC'), ('TC', 'NUMERIC'), ('TG', 'NUMERIC'), ('CC', 'NUMERIC'), ('CA', 'NUMERIC'), ('CG', 'NUMERIC'), ('CT', 'NUMERIC'), ('GG', 'NUMERIC'), ('GA', 'NUMERIC'), ('GT', 'NUMERIC'), ('GC', 'NUMERIC'), ('AAA', 'NUMERIC'), ('AAT', 'NUMERIC'), ('AAC', 'NUMERIC'), ('AAG', 'NUMERIC'), ('ATA', 'NUMERIC'), ('ACA', 'NUMERIC'), ('AGA', 'NUMERIC'), ('ATT', 'NUMERIC'), ('ATC', 'NUMERIC'), ('ATG', 'NUMERIC'), ('ACC', 'NUMERIC'), ('ACT', 'NUMERIC'), ('ACG', 'NUMERIC'), ('AGG', 'NUMERIC'), ('AGC', 'NUMERIC'), ('AGT', 'NUMERIC'), ('CCC', 'NUMERIC'), ('CCT', 'NUMERIC'), ('CCA', 'NUMERIC'), ('CCG', 'NUMERIC'), ('CTC', 'NUMERIC'), ('CAC', 'NUMERIC'), ('CGC', 'NUMERIC'), ('CTT', 'NUMERIC'), ('CTA', 'NUMERIC'), ('CTG', 'NUMERIC'), ('CGG', 'NUMERIC'), ('CGA', 'NUMERIC'), ('CGT', 'NUMERIC'), ('CAA', 'NUMERIC'), ('CAT', 'NUMERIC'), ('CAG', 'NUMERIC'), ('GGG', 'NUMERIC'), ('GGT', 'NUMERIC'

#### Separación de atributos y etiqueta (X e y)

In [24]:
X_train_MIPS =df_train_MIPS[df_train_MIPS.columns[:-1]]
y_train_MIPS =df_train_MIPS[df_train_MIPS.columns[-1]]
y_train_MIPS = pd.DataFrame(y_train_MIPS)

X_test_MIPS =df_test_MIPS[df_train_MIPS.columns[:-1]]
y_test_MIPS =df_test_MIPS[df_train_MIPS.columns[-1]]
y_test_MIPS = pd.DataFrame(y_test_MIPS)

#### Jerarquía

In [25]:
y_train_MIPS[['level1', 'level2', 'level3','level4']] = y_train_MIPS['classification_hierarchical'].str.split('/', expand=True)
y_train_MIPS = y_train_MIPS.drop('classification_hierarchical', axis=1)
print('columns y_train',y_train_MIPS.columns)
y_train_MIPS

columns y_train Index(['level1', 'level2', 'level3', 'level4'], dtype='object')


Unnamed: 0,level1,level2,level3,level4
0,1,4,,
1,1,4,,
2,1,4,,
3,1,4,,
4,1,4,,
...,...,...,...,...
16800,2,1,,
16801,2,1,,
16802,2,1,,
16803,2,1,1,2


In [28]:
y_test_MIPS[['level1', 'level2', 'level3','level4']] = y_test_MIPS['classification_hierarchical'].str.split('/', expand=True)
y_test_MIPS = y_test_MIPS.drop('classification_hierarchical', axis=1)
print('columns y_test',y_test_MIPS.columns)
y_test_MIPS

columns y_test Index(['level1', 'level2', 'level3', 'level4'], dtype='object')


Unnamed: 0,level1,level2,level3,level4
0,1,4,,
1,1,4,,
2,1,4,,
3,1,4,,
4,1,4,,
...,...,...,...,...
1868,2,1,1,9
1869,2,1,,
1870,2,1,,
1871,2,1,,


### REPBASE

In [5]:
train_data_path = "TEsHierarchicalDatasets/TEsHierarchicalDatasets-master/arff/repbase1trainatt.arff"
test_data_path = "TEsHierarchicalDatasets/TEsHierarchicalDatasets-master/arff/repbase1testatt.arff"

# Cargar los datos de entrenamiento y prueba
with open(train_data_path, 'r') as f:
    train_REPBASE = arff.load(f)

with open(test_data_path, 'r') as f:
    test_REPBASE = arff.load(f)

#### Extracción de Datos

In [12]:
# Train
train_REPBASE_metadata = train_REPBASE.get('attributes', [])
train_REPBASE_data = train_REPBASE.get('data', [])
train_REPBASE_classification = train_REPBASE.get('classification_hierarchical', [])

print('Classification')
print(train_REPBASE_classification)
print(train_REPBASE_metadata)
df_train_REPBASE = pd.DataFrame(train_REPBASE_data, columns=[attr[0] for attr in train_REPBASE_metadata])

# Test
test_REPBASE_metadata = test_REPBASE.get('attributes', [])
test_REPBASE_data = test_REPBASE.get('data', [])
test_REPBASE_classification = test_REPBASE.get('classification_hierarchical', [])
df_test_REPBASE = pd.DataFrame(test_REPBASE_data, columns=[attr[0] for attr in test_REPBASE_metadata])

Classification
[]
[('AA', 'NUMERIC'), ('AT', 'NUMERIC'), ('AC', 'NUMERIC'), ('AG', 'NUMERIC'), ('TT', 'NUMERIC'), ('TA', 'NUMERIC'), ('TC', 'NUMERIC'), ('TG', 'NUMERIC'), ('CC', 'NUMERIC'), ('CA', 'NUMERIC'), ('CG', 'NUMERIC'), ('CT', 'NUMERIC'), ('GG', 'NUMERIC'), ('GA', 'NUMERIC'), ('GT', 'NUMERIC'), ('GC', 'NUMERIC'), ('AAA', 'NUMERIC'), ('AAT', 'NUMERIC'), ('AAC', 'NUMERIC'), ('AAG', 'NUMERIC'), ('ATA', 'NUMERIC'), ('ACA', 'NUMERIC'), ('AGA', 'NUMERIC'), ('ATT', 'NUMERIC'), ('ATC', 'NUMERIC'), ('ATG', 'NUMERIC'), ('ACC', 'NUMERIC'), ('ACT', 'NUMERIC'), ('ACG', 'NUMERIC'), ('AGG', 'NUMERIC'), ('AGC', 'NUMERIC'), ('AGT', 'NUMERIC'), ('CCC', 'NUMERIC'), ('CCT', 'NUMERIC'), ('CCA', 'NUMERIC'), ('CCG', 'NUMERIC'), ('CTC', 'NUMERIC'), ('CAC', 'NUMERIC'), ('CGC', 'NUMERIC'), ('CTT', 'NUMERIC'), ('CTA', 'NUMERIC'), ('CTG', 'NUMERIC'), ('CGG', 'NUMERIC'), ('CGA', 'NUMERIC'), ('CGT', 'NUMERIC'), ('CAA', 'NUMERIC'), ('CAT', 'NUMERIC'), ('CAG', 'NUMERIC'), ('GGG', 'NUMERIC'), ('GGT', 'NUMERIC'

#### Separación de atributos y etiqueta (X e y)

In [14]:
X_train_REPBASE =df_train_REPBASE[df_train_REPBASE.columns[:-1]]
y_train_REPBASE =df_train_REPBASE[df_train_REPBASE.columns[-1]]
y_train_REPBASE = pd.DataFrame(y_train_REPBASE)

X_test_REPBASE =df_test_REPBASE[df_train_REPBASE.columns[:-1]]
y_test_REPBASE =df_test_REPBASE[df_train_REPBASE.columns[-1]]
y_test_REPBASE = pd.DataFrame(y_test_REPBASE)

#### Jerarquía

In [26]:
y_train_REPBASE[['level1', 'level2', 'level3','level4']] = y_train_REPBASE['classification_hierarchical'].str.split('/', expand=True)
y_train_REPBASE = y_train_REPBASE.drop('classification_hierarchical', axis=1)
print('columns y_train',y_train_REPBASE.columns)
y_train_REPBASE

columns y_train Index(['level1', 'level2', 'level3', 'level4'], dtype='object')


Unnamed: 0,level1,level2,level3,level4
0,1,4,3,
1,1,4,3,
2,1,4,3,
3,1,4,3,
4,1,4,3,
...,...,...,...,...
31086,1,1,1,
31087,1,1,1,
31088,1,1,1,
31089,1,1,1,


In [27]:
y_test_REPBASE[['level1', 'level2', 'level3','level4']] = y_test_REPBASE['classification_hierarchical'].str.split('/', expand=True)
y_test_REPBASE = y_test_REPBASE.drop('classification_hierarchical', axis=1)
print('columns y_test',y_test_REPBASE.columns)
y_test_REPBASE

columns y_test Index(['level1', 'level2', 'level3', 'level4'], dtype='object')


Unnamed: 0,level1,level2,level3,level4
0,1,4,3,
1,1,4,3,
2,1,4,3,
3,1,4,3,
4,1,4,3,
...,...,...,...,...
3463,1,1,,
3464,1,1,,
3465,1,1,2,
3466,1,1,2,


## Modelos

Función para mostrar las métricas obtenidas con las predicciones del modelo entrenado

In [29]:
def metrics_calculation(Y_test, y_pred_df):
    Y_test['level3'] = Y_test['level3'].replace({'None': '-1', None: '-1'})
    Y_test['level4'] = Y_test['level4'].replace({'None': '-1', None: '-1'})

    # Ensure the 'level3' and 'level4' columns contain integer values
    Y_test['level3'] = Y_test['level3'].astype(int)
    Y_test['level4'] = Y_test['level4'].astype(int)

    # Replace 'None' values with -1 for level3 and level4 in y_pred_df
    y_pred_df['level3'] = y_pred_df['level3'].replace({'None': -1, None: '-1'})
    y_pred_df['level4'] = y_pred_df['level4'].replace({'None': -1, None: '-1'})

    # Ensure the 'level3' and 'level4' columns contain integer values in y_pred_df
    y_pred_df['level3'] = y_pred_df['level3'].astype(int)
    y_pred_df['level4'] = y_pred_df['level4'].astype(int)

    # Debugging code to print unique values in level3 and level4 columns
    unique_values_level3 = Y_test['level3'].unique()
    unique_values_level4 = Y_test['level4'].unique()
    print("Unique values in Y_test['level3']:", unique_values_level3)
    print("Unique values in Y_test['level4']:", unique_values_level4)

    unique_values_level3 = y_pred_df['level3'].unique()
    unique_values_level4 = y_pred_df['level4'].unique()
    print("Unique values in y_pred['level3']:", unique_values_level3)
    print("Unique values in y_pred['level4']:", unique_values_level4)

    # Calculate metrics for level1
    accuracy_score_level_1 = accuracy_score(Y_test['level1'], y_pred_df['level1'])
    precision_score_level_1 = precision_score(Y_test['level1'], y_pred_df['level1'], average='weighted')
    recall_score_level_1 = recall_score(Y_test['level1'], y_pred_df['level1'], average='weighted')
    f1_score_level_1 = f1_score(Y_test['level1'], y_pred_df['level1'], average='weighted')

    print(f'Metrics for level 1:')
    print(f'Accuracy: {accuracy_score_level_1}')
    print(f'Precision: {precision_score_level_1}')
    print(f'Recall: {recall_score_level_1}')
    print(f'F1 Score: {f1_score_level_1}')

    # Calculate metrics for level2
    accuracy_score_level_2 = accuracy_score(Y_test['level2'], y_pred_df['level2'])
    precision_score_level_2 = precision_score(Y_test['level2'], y_pred_df['level2'], average='weighted')
    recall_score_level_2 = recall_score(Y_test['level2'], y_pred_df['level2'], average='weighted')
    f1_score_level_2 = f1_score(Y_test['level2'], y_pred_df['level2'], average='weighted')

    print(f'Metrics for level 2:')
    print(f'Accuracy: {accuracy_score_level_2}')
    print(f'Precision: {precision_score_level_2}')
    print(f'Recall: {recall_score_level_2}')
    print(f'F1 Score: {f1_score_level_2}')

    # Calculate metrics for level3
    accuracy_score_level_3 = accuracy_score(Y_test['level3'].astype(int), y_pred_df['level3'].astype(int))
    precision_score_level_3 = precision_score(Y_test['level3'].astype(int), y_pred_df['level3'].astype(int), average='weighted')
    recall_score_level_3 = recall_score(Y_test['level3'].astype(int), y_pred_df['level3'].astype(int), average='weighted')
    f1_score_level_3 = f1_score(Y_test['level3'].astype(int), y_pred_df['level3'].astype(int), average='weighted')

    print(f'Metrics for level 3:')
    print(f'Accuracy: {accuracy_score_level_3}')
    print(f'Precision: {precision_score_level_3}')
    print(f'Recall: {recall_score_level_3}')
    print(f'F1 Score: {f1_score_level_3}')

    # Calculate metrics for level4
    accuracy_score_level_4 = accuracy_score(Y_test['level4'].astype(int), y_pred_df['level4'].astype(int))
    precision_score_level_4 = precision_score(Y_test['level4'].astype(int), y_pred_df['level4'].astype(int), average='weighted')
    recall_score_level_4 = recall_score(Y_test['level4'].astype(int), y_pred_df['level4'].astype(int), average='weighted')
    f1_score_level_4 = f1_score(Y_test['level4'].astype(int), y_pred_df['level4'].astype(int), average='weighted')

    print(f'Metrics for level 4:')
    print(f'Accuracy: {accuracy_score_level_4}')
    print(f'Precision: {precision_score_level_4}')
    print(f'Recall: {recall_score_level_4}')
    print(f'F1 Score: {f1_score_level_4}')

    ######Métricas hiclass
    ##accuracy_hc
    precision_hc = metrics.precision(Y_test, y_pred_df)
    precision_macro_hc = metrics._precision_macro(Y_test, y_pred_df)
    precision_micro_hc = metrics._precision_micro(Y_test, y_pred_df)
    print("precision_hc:  ", precision_hc," precision_macro_hc: ",precision_macro_hc,"  precision_micro_hc: ", precision_micro_hc)

    recall_hc = metrics.recall(Y_test, y_pred_df)
    recall_macro_hc = metrics._recall_macro(Y_test, y_pred_df)
    recall_micro_hc = metrics._recall_micro(Y_test, y_pred_df)
    print("recall_hc:  ", recall_hc," recall_macro_hc: ",recall_macro_hc,"  recall_micro_hc: ", recall_micro_hc)

    f1_score_hc = metrics.f1(Y_test, y_pred_df)
    f1_score_macro_hc = metrics._f_score_macro(Y_test, y_pred_df)
    f1_score_micro_hc = metrics._f_score_micro(Y_test, y_pred_df)
    print("f1_score_hc:  ", f1_score_hc," f1_score_macro_hc: ",f1_score_macro_hc,"  f1_score_micro_hc: ", f1_score_micro_hc)

## MIPS

### RandomForest

In [30]:
# Use random forest classifiers for every node
rf = RandomForestClassifier()
classifier = LocalClassifierPerNode(local_classifier=rf)

# Train local classifier per node
classifier.fit(X_train_MIPS, y_train_MIPS)

# Predict
y_pred = classifier.predict(X_test_MIPS)
#print('Predictions:')
#print(y_pred)

###Calcula metricas por level de la jerarquia
y_pred_df=pd.DataFrame(y_pred,columns=['level1','level2','level3','level4'])

metrics_calculation(y_test_MIPS, y_pred_df)

Predictions:
[['1' '1' '2' 'None']
 ['1' '4' 'None' 'None']
 ['1' '4' 'None' 'None']
 ...
 ['2' '1' 'None' 'None']
 ['1' '1' '2' 'None']
 ['1' '1' '1' 'None']]
Unique values in Y_test['level3']: [-1  1  2]
Unique values in Y_test['level4']: [-1  1  8  2  9  3]
Unique values in y_pred['level3']: [ 2 -1  1]
Unique values in y_pred['level4']: [-1  3  8  9  1]
Metrics for level 1:
Accuracy: 0.9386011745862253
Precision: 0.9367612486110916
Recall: 0.9386011745862253
F1 Score: 0.9343514107365632
Metrics for level 2:
Accuracy: 0.9802455953016551
Precision: 0.9806423993048249
Recall: 0.9802455953016551
F1 Score: 0.9764610619904538
Metrics for level 3:
Accuracy: 0.788574479444741
Precision: 0.7899606228906733
Recall: 0.788574479444741
F1 Score: 0.7838892630904393
Metrics for level 4:
Accuracy: 0.9369994660971703
Precision: 0.9240742069929473
Recall: 0.9369994660971703
F1 Score: 0.924882162293221


  _warn_prf(average, modifier, msg_start, len(result))


precision_hc:   0.9448091916290521  precision_macro_hc:  0.002135611318739989   precision_micro_hc:  1.0
recall_hc:   0.9585761865112407  recall_macro_hc:  0.002135611318739989   recall_micro_hc:  1.0
f1_score_hc:   0.9516429014259145  f1_score_macro_hc:  0.002135611318739989   f1_score_micro_hc:  0.9516429014259145


### AdaBoostClassifier

In [33]:
ada = AdaBoostClassifier()
classifier = LocalClassifierPerNode(local_classifier=ada)

# Train local classifier per node
classifier.fit(X_train_MIPS, y_train_MIPS)

# Predict
y_pred = classifier.predict(X_test_MIPS)
# print('Predictions:')
# print(y_pred)

# Calcula metricas por level de la jerarquia
y_pred_df=pd.DataFrame(y_pred,columns=['level1','level2','level3','level4'])

metrics_calculation(y_test_MIPS, y_pred_df)

Unique values in Y_test['level3']: [-1  1  2]
Unique values in Y_test['level4']: [-1  1  8  2  9  3]
Unique values in y_pred['level3']: [ 2 -1  1]
Unique values in y_pred['level4']: [-1  3  9  2  8  1]
Metrics for level 1:
Accuracy: 0.9167111585691404
Precision: 0.9112002314323564
Recall: 0.9167111585691404
F1 Score: 0.9113986360865303
Metrics for level 2:
Accuracy: 0.9733048585157501
Precision: 0.9697737786339952
Recall: 0.9733048585157501
F1 Score: 0.9705978810719436
Metrics for level 3:
Accuracy: 0.7197010144153764
Precision: 0.7175238631094614
Recall: 0.7197010144153764
F1 Score: 0.7172283658110912
Metrics for level 4:
Accuracy: 0.9145755472504005
Precision: 0.8886937402055967
Recall: 0.9145755472504005
F1 Score: 0.8969793044909753
precision_hc:   0.9314784053156147  precision_macro_hc:  0.002135611318739989   precision_micro_hc:  1.0
recall_hc:   0.9338051623646961  recall_macro_hc:  0.002135611318739989   recall_micro_hc:  1.0
f1_score_hc:   0.9326403326403327  f1_score_macro_hc:

### Gradient Boosting

In [34]:
grad = GradientBoostingClassifier()
classifier = LocalClassifierPerNode(local_classifier=grad)

# Train local classifier per node
classifier.fit(X_train_MIPS, y_train_MIPS)

# Predict
y_pred = classifier.predict(X_test_MIPS)

# print('Predictions:')
# print(y_pred)

# Calcula metricas por level de la jerarquia
y_pred_df=pd.DataFrame(y_pred,columns=['level1','level2','level3','level4'])

metrics_calculation(y_test_MIPS, y_pred_df)

### REPBASE

#### Random Forest

In [31]:
rf = RandomForestClassifier()
classifier = LocalClassifierPerNode(local_classifier=rf)

# Train local classifier per node
classifier.fit(X_train_REPBASE, y_train_REPBASE)

# Predict
y_pred = classifier.predict(X_test_REPBASE)
print('Predictions:')
print(y_pred)

###Calcula metricas por level de la jerarquia
y_pred_df=pd.DataFrame(y_pred,columns=['level1','level2','level3','level4'])

metrics_calculation(y_test_REPBASE, y_pred_df)

Predictions:
[['1' '1' '2' 'None']
 ['1' '4' '3' 'None']
 ['1' '4' '3' 'None']
 ...
 ['1' '1' '2' 'None']
 ['1' '1' '2' 'None']
 ['1' '1' '2' 'None']]
Unique values in Y_test['level3']: [ 3  1 -1  2  5  4]
Unique values in Y_test['level4']: [-1  6  3  2  1  9  8  7  4  5]
Unique values in y_pred['level3']: [ 2  3  1 -1  4  5]
Unique values in y_pred['level4']: [-1  2  1  6  3  9  8  7]
Metrics for level 1:
Accuracy: 0.9025374855824683
Precision: 0.9033175832871484
Recall: 0.9025374855824683
F1 Score: 0.9008468480450994
Metrics for level 2:
Accuracy: 0.9495386389850058
Precision: 0.9495628345549899
Recall: 0.9495386389850058
F1 Score: 0.9434506927240034
Metrics for level 3:
Accuracy: 0.701845444059977
Precision: 0.718658236878494
Recall: 0.701845444059977
F1 Score: 0.6877499337955144
Metrics for level 4:
Accuracy: 0.8143021914648212
Precision: 0.8275831298149581
Recall: 0.8143021914648212
F1 Score: 0.8110791577916809
precision_hc:   0.9234799694356511  precision_macro_hc:  0.00115340253

  _warn_prf(average, modifier, msg_start, len(result))


recall_hc:   0.905975583636753  recall_macro_hc:  0.0011534025374855825   recall_micro_hc:  1.0
f1_score_hc:   0.9146440348126926  f1_score_macro_hc:  0.0011534025374855825   f1_score_micro_hc:  0.9146440348126926


#### Ada Boost

In [None]:
ada = AdaBoostClassifier()
classifier = LocalClassifierPerNode(local_classifier=ada)

# Train local classifier per node
classifier.fit(X_train_REPBASE, y_train_REPBASE)

# Predict
y_pred = classifier.predict(X_test_REPBASE)
# print('Predictions:')
# print(y_pred)

# Calcula metricas por level de la jerarquia
y_pred_df=pd.DataFrame(y_pred,columns=['level1','level2','level3','level4'])

metrics_calculation(y_test_REPBASE, y_pred_df)

{'<ROOT>': {}}


#### Gradient Boost

In [None]:
grad = GradientBoostingClassifier()
classifier = LocalClassifierPerNode(local_classifier=grad)

# Train local classifier per node
classifier.fit(X_train_REPBASE, y_train_REPBASE)

# Predict
y_pred = classifier.predict(X_test_REPBASE)

# print('Predictions:')
# print(y_pred)

# Calcula metricas por level de la jerarquia
y_pred_df=pd.DataFrame(y_pred,columns=['level1','level2','level3','level4'])

metrics_calculation(y_test_REPBASE, y_pred_df)