# Masterclass: Sklearn Pipelines (II)

Our models are now trained (both without and with pipelines), so it is time to load them back and predict on the fresh new data that has just arrived.

In [1]:
import joblib
import numpy as np
import pandas as pd
import pickle

from matplotlib import pyplot as plt
from sklearn.metrics import classification_report

## Loading data and models

In [2]:
dataset_new = pd.read_csv("./data/titanic_test.csv")

X_test = dataset_new
y_test = dataset_new.Survived


In [3]:
# Load the pipeline model (pickle version)
with open('modelo_pipeline.pkl', 'rb') as archivo: # rb = read binary
    modelo_pipeline = pickle.load(archivo)


# Load the model trained without pipelines (joblib version, to show both approaches)
modelo_funciones = joblib.load('modelo_funciones.joblib')
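Since the saving step happened in the previous session, here is a minimal sketch of the full round trip with both libraries. The toy data, the `toy_model` name and the file names are hypothetical and only serve the illustration; the real models were saved elsewhere. Keep in mind that both `pickle.load` and `joblib.load` can execute arbitrary code while loading, so only open files you trust.

```python
import pickle
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy data (hypothetical): 20 rows, 2 numeric features, binary target
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = (X[:, 0] > 0).astype(int)

# Stand-in model, not the one used in this masterclass
toy_model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    pkl_path = str(Path(tmp) / "toy_model.pkl")
    joblib_path = str(Path(tmp) / "toy_model.joblib")

    # pickle works on file handles opened in binary mode
    with open(pkl_path, "wb") as f:  # wb = write binary
        pickle.dump(toy_model, f)
    with open(pkl_path, "rb") as f:  # rb = read binary
        from_pickle = pickle.load(f)

    # joblib takes the file path directly
    joblib.dump(toy_model, joblib_path)
    from_joblib = joblib.load(joblib_path)

# Both reloaded models should predict exactly like the original
same_pickle = np.array_equal(toy_model.predict(X), from_pickle.predict(X))
same_joblib = np.array_equal(toy_model.predict(X), from_joblib.predict(X))
print(same_pickle, same_joblib)
```

In practice, it is also safest to load with the same scikit-learn version that was used to save the model.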

In [4]:
modelo_pipeline

In [5]:
modelo_funciones

## Time to test: Pipeline

In [6]:
X_test

| Unnamed: 0 | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 892 | 0 | 3 | Kelly, Mr. James | male | 34.5 | 0 | 0 | 330911 | 7.8292 | | Q |
| 1 | 893 | 1 | 3 | Wilkes, Mrs. James (Ellen Needs) | female | 47.0 | 1 | 0 | 363272 | 7.0000 | | S |
| 2 | 894 | 0 | 2 | Myles, Mr. Thomas Francis | male | 62.0 | 0 | 0 | 240276 | 9.6875 | | Q |
| 3 | 895 | 0 | 3 | Wirz, Mr. Albert | male | 27.0 | 0 | 0 | 315154 | 8.6625 | | S |
| 4 | 896 | 1 | 3 | Hirvonen, Mrs. Alexander (Helga E Lindqvist) | female | 22.0 | 1 | 1 | 3101298 | 12.2875 | | S |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 413 | 1305 | 0 | 3 | Spector, Mr. Woolf | male | | 0 | 0 | A.5. 3236 | 8.0500 | | S |
| 414 | 1306 | 1 | 1 | Oliva y Ocana, Dona. Fermina | female | 39.0 | 0 | 0 | PC 17758 | 108.9000 | C105 | C |
| 415 | 1307 | 0 | 3 | Saether, Mr. Simon Sivertsen | male | 38.5 | 0 | 0 | SOTON/O.Q. 3101262 | 7.2500 | | S |
| 416 | 1308 | 0 | 3 | Ware, Mr. Frederick | male | | 0 | 0 | 359309 | 8.0500 | | S |


In [7]:
modelo_pipeline.predict(X_test)

array([0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1,
       1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1,
       1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1,
       1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1,
       1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0,
       0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1,
       0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1,
       1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1,
       0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0,
       1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1,
       0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1,
       0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,

In [8]:
print(classification_report(y_test, modelo_pipeline.predict(X_test)))

              precision    recall  f1-score   support

           0       0.80      0.81      0.81       260
           1       0.68      0.67      0.68       158

    accuracy                           0.76       418
   macro avg       0.74      0.74      0.74       418
weighted avg       0.76      0.76      0.76       418
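A quick reminder of how to read this report: precision is the share of rows predicted as a class that really belong to it, recall is the share of rows of a class that the model found, and support is the number of true rows per class. The sketch below checks those definitions on five hand-made labels (hypothetical values, not taken from the Titanic data), using `output_dict=True` to get the numbers as a dictionary instead of text.

```python
from sklearn.metrics import classification_report

# Hand-made labels (hypothetical) so the numbers can be verified by eye
y_true = [0, 0, 0, 1, 1]
y_pred = [0, 0, 1, 1, 0]

report = classification_report(y_true, y_pred, output_dict=True)

# Class 1: two rows were predicted as 1 and one of them is right -> precision 0.5
precision_1 = report["1"]["precision"]
# Class 1: there are two real 1s and the model found one -> recall 0.5
recall_1 = report["1"]["recall"]
# Three of the five predictions match the truth -> accuracy 0.6
accuracy = report["accuracy"]
# Three rows truly belong to class 0 -> support 3
support_0 = report["0"]["support"]

print(precision_1, recall_1, accuracy, support_0)
```

The `macro avg` row is the plain mean of the per-class scores, while `weighted avg` weights each class by its support.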



## Time to test: Filtering with functions...

In [9]:
print(classification_report(y_test, modelo_funciones.predict(X_test)))

ValueError: DataFrame.dtypes for data must be int, float, bool or category. When categorical type is supplied, the experimental DMatrix parameter`enable_categorical` must be set to `True`.  Invalid columns:Name: object, Sex: object, Ticket: object, Cabin: object, Embarked: object

Ahhh, shoot!! I have to recover all the functions we created back then... Well, better leave that for another day.
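The `ValueError` above is the whole point of this session: a model saved on its own only knows about the already-processed numeric features, while a pipeline carries its preprocessing with it. The original pipeline definition is not shown in this notebook, so what follows is only a minimal sketch under assumptions: the toy rows are made up, the choice of columns and steps is hypothetical, and `LogisticRegression` stands in for the real classifier.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy rows (hypothetical values) reusing some Titanic column names
train = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E", "F"],
    "Sex": ["male", "female", "female", "male", "female", "male"],
    "Age": [34.5, 47.0, None, 27.0, 22.0, 62.0],
    "Fare": [7.8, 7.0, 9.7, 8.7, 12.3, 108.9],
    "Survived": [0, 1, 1, 0, 1, 0],
})

# Preprocessing lives inside the pipeline; columns not listed (Name) are dropped
preprocess = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), ["Age", "Fare"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["Sex"]),
])
pipe = Pipeline([("preprocess", preprocess), ("model", LogisticRegression())])
pipe.fit(train.drop(columns="Survived"), train["Survived"])

# New raw data: text columns and a missing Age, just as it arrives
new_rows = pd.DataFrame({
    "Name": ["G", "H"],
    "Sex": ["female", "male"],
    "Age": [None, 40.0],
    "Fare": [15.0, 8.0],
})

# The full pipeline handles the raw rows without any extra work
preds = pipe.predict(new_rows)

# The bare classifier, without its preprocessing, cannot digest them
bare_model_failed = False
try:
    pipe.named_steps["model"].predict(new_rows)
except (ValueError, TypeError):
    bare_model_failed = True

print(preds, bare_model_failed)
```

With the function-based approach, every one of those preprocessing steps would have to be re-imported and re-applied by hand, in the same order, before calling `predict`.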
