### Exporting a sklearn model with **pickle**

The following example shows how to train a logistic regression model on the Pima Indians diabetes dataset. We save the model to a file and later load it to make predictions on the test set.

We import the required libraries. The package that handles saving the model is **pickle**.

In [3]:
import pandas
from sklearn import model_selection
from sklearn.linear_model import LogisticRegression
import pickle

We read the CSV file from a URL. The file has no header row, so we pass the column names explicitly.

In [5]:
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataframe = pandas.read_csv(url, names=names)
dataframe.head(5)


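| | preg | plas | pres | skin | test | mass | pedi | age | class |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |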


In [6]:
dataframe.describe()

| | preg | plas | pres | skin | test | mass | pedi | age | class |
|---|---|---|---|---|---|---|---|---|---|
| count | 768.0 | 768.0 | 768.0 | 768.0 | 768.0 | 768.0 | 768.0 | 768.0 | 768.0 |
| mean | 3.845052 | 120.894531 | 69.105469 | 20.536458 | 79.799479 | 31.992578 | 0.471876 | 33.240885 | 0.348958 |
| std | 3.369578 | 31.972618 | 19.355807 | 15.952218 | 115.244002 | 7.88416 | 0.331329 | 11.760232 | 0.476951 |
| min | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.078 | 21.0 | 0.0 |
| 25% | 1.0 | 99.0 | 62.0 | 0.0 | 0.0 | 27.3 | 0.24375 | 24.0 | 0.0 |
| 50% | 3.0 | 117.0 | 72.0 | 23.0 | 30.5 | 32.0 | 0.3725 | 29.0 | 0.0 |
| 75% | 6.0 | 140.25 | 80.0 | 32.0 | 127.25 | 36.6 | 0.62625 | 41.0 | 1.0 |
| max | 17.0 | 199.0 | 122.0 | 99.0 | 846.0 | 67.1 | 2.42 | 81.0 | 1.0 |


We convert the DataFrame to a NumPy array so we can slice the eight feature columns into X and the class label into Y.

In [12]:
array = dataframe.values
X = array[:,0:8]
Y = array[:,8]
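
To make the slicing concrete, here is a minimal sketch on a made-up 3-row array with the same nine-column layout. The values and the names toy, X_toy and Y_toy are illustrative and not taken from the dataset; it assumes NumPy is installed.

```python
import numpy as np

# Toy array: 3 rows, 9 columns (8 features followed by the class label).
# The values are illustrative only.
toy = np.arange(27, dtype=float).reshape(3, 9)

X_toy = toy[:, 0:8]  # every row, columns 0 through 7 (the end index is exclusive)
Y_toy = toy[:, 8]    # every row, column 8 only, which yields a 1-D array

print(X_toy.shape)  # (3, 8)
print(Y_toy.shape)  # (3,)
```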

We split the data into a training set and a test set, holding out 33% of the rows for testing. Fixing the seed makes the split reproducible.

In [13]:
test_size = 0.33
seed = 7
X_train, X_test, Y_train, Y_test = model_selection.train_test_split(X, Y, test_size=test_size, random_state=seed)
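
As a rough check on what this split produces, the sketch below works out the partition sizes by hand. It assumes scikit-learn rounds the test count up when test_size is a fraction, and it uses the 768 rows reported by describe(); the names n_samples, n_test and n_train are illustrative.

```python
import math

n_samples = 768   # row count reported by dataframe.describe()
test_size = 0.33  # same fraction passed to train_test_split

# Assumption: a fractional test_size is rounded up to a whole number of rows.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)  # 514 254
```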

We create and train the model

In [14]:
model = LogisticRegression()
model.fit(X_train, Y_train)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)

We save the model to disk (the 'wb' mode opens the file for writing in binary)

In [15]:
filename = 'finalized_model.sav'
# The with block closes the file as soon as the model is written
with open(filename, 'wb') as f:
    pickle.dump(model, f)

Some time later... we load the model from disk and compute its score (the mean accuracy on the test set)

In [16]:
# Open in binary read mode and close the file once the model is loaded
with open(filename, 'rb') as f:
    loaded_model = pickle.load(f)
result = loaded_model.score(X_test, Y_test)
print(result)

0.7559055118110236
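
The save-and-load cycle itself does not depend on scikit-learn. The following minimal sketch runs the same round trip on a plain dictionary standing in for a fitted model. The dictionary contents and the file name toy_model.sav are hypothetical, and the file is written to a temporary directory so nothing is left on disk.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a fitted model: any picklable Python object works.
toy_model = {"coef": [0.12, -0.03, 0.5], "intercept": -1.7}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_model.sav")

    # 'wb' = write binary; the with block closes the file even on errors.
    with open(path, "wb") as f:
        pickle.dump(toy_model, f)

    # 'rb' = read binary; pickle.load rebuilds an equal but separate object.
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored == toy_model)  # True
print(restored is toy_model)  # False
```

With a real estimator, a more telling check is to compare the predictions of the original and the reloaded model on the same inputs.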
