# Project 7: Deploying a recommendation system with Watson

In this project we put the model trained in Project 5 into production. That is, we upload it to the IBM cloud and then access it through calls to the Watson API to make predictions.

In [2]:
import warnings
warnings.filterwarnings("ignore")
import sklearn
from sklearn.datasets import load_files
moviedir = r'./dataset/movie_reviews' 
movie_reviews = load_files(moviedir, shuffle=True)
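
For readers unfamiliar with `load_files`, the sketch below shows the folder layout it expects: one subfolder per class, one text file per document. The temporary directory, the file names, and the four tiny reviews are hypothetical stand-ins, not the real `movie_reviews` dataset.

```python
import tempfile
from pathlib import Path

from sklearn.datasets import load_files

# Hypothetical miniature corpus: one subfolder per class, one file per review.
root = Path(tempfile.mkdtemp())
samples = {
    "neg": ["a dull and lifeless film", "boring plot and weak acting"],
    "pos": ["a wonderful and moving story", "great cast and sharp writing"],
}
for label, texts in samples.items():
    (root / label).mkdir()
    for i, text in enumerate(texts):
        (root / label / f"review_{i}.txt").write_text(text, encoding="utf-8")

toy_reviews = load_files(str(root), shuffle=True, random_state=0)

print(toy_reviews.target_names)   # class names taken from the folder names
print(type(toy_reviews.data[0]))  # documents are raw bytes unless `encoding` is given
print(toy_reviews.target)         # integer index into target_names for each document
```

Because no `encoding` is passed, the documents stay as bytes. `TfidfVectorizer` decodes them as UTF-8 by default, which is why the notebook can feed `movie_reviews.data` straight into the pipeline.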

In [3]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    movie_reviews.data, movie_reviews.target, test_size = 0.20, stratify=movie_reviews.target, random_state = 12)
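
The `stratify` argument keeps the class proportions identical in the train and test splits. A minimal sketch on synthetic labels (the 70/30 class balance and the variable names are made up for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced labels: 70 items of class 0 and 30 of class 1.
toy_y = np.array([0] * 70 + [1] * 30)
toy_X = np.arange(len(toy_y)).reshape(-1, 1)

toy_X_tr, toy_X_te, toy_y_tr, toy_y_te = train_test_split(
    toy_X, toy_y, test_size=0.20, stratify=toy_y, random_state=12)

# Both splits keep the original 70/30 ratio.
print(np.bincount(toy_y_tr), np.bincount(toy_y_te))
```

In the notebook, this is what yields 200 reviews of each class in the test set.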

In [4]:
from sklearn.externals import joblib
eclf = joblib.load('sentiment.pkl') 
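
The cell above loads `sentiment.pkl`, presumably serialized in Project 5. The sketch below illustrates the dump/load round trip on a toy model. It assumes the standalone `joblib` package is installed (`sklearn.externals.joblib` was deprecated in scikit-learn 0.21 and removed in 0.23); the toy documents, the `LogisticRegression` classifier, and the file name are placeholders rather than the original model.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Placeholder training data, not the movie review corpus.
toy_docs = ["great movie", "wonderful acting", "terrible movie", "awful acting"]
toy_targets = [1, 1, 0, 0]

toy_model = make_pipeline(TfidfVectorizer(), LogisticRegression())
toy_model.fit(toy_docs, toy_targets)

# Serialize to disk and load it back, as the notebook does with 'sentiment.pkl'.
toy_path = os.path.join(tempfile.mkdtemp(), "toy_sentiment.pkl")
joblib.dump(toy_model, toy_path)
restored_model = joblib.load(toy_path)

# The restored model should reproduce the original predictions.
same_predictions = bool((restored_model.predict(toy_docs) == toy_model.predict(toy_docs)).all())
print(same_predictions)
```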

## IBM Watson

**1) Load** the `WatsonMachineLearningAPIClient` library.

In [4]:
!pip uninstall watson-machine-learning-client -y

Uninstalling watson-machine-learning-client-1.0.375:
  Successfully uninstalled watson-machine-learning-client-1.0.375


In [5]:
!pip install watson-machine-learning-client-V4

Collecting watson-machine-learning-client-V4
  Downloading https://files.pythonhosted.org/packages/ca/2b/a6be1a2b36138835a66b007f128ffa1ab52398178c4b564534fdc1d30743/watson_machine_learning_client_V4-1.0.34-py3-none-any.whl (974kB)
Installing collected packages: watson-machine-learning-client-V4
Successfully installed watson-machine-learning-client-V4-1.0.34


In [5]:
import warnings
warnings.filterwarnings("ignore")

import json
#import numpy as np
#import pandas as pd
from watson_machine_learning_client import WatsonMachineLearningAPIClient

**2) Create** the variable holding the credentials that `Watson` needs.

In [6]:
wml_credentials={
  "apikey": "cKRbMjSYKI_KXy98jyuVM-AUzhSH2dadrD9abARoK2VB",
  "iam_apikey_description": "Auto-generated for key 7149a15e-ac7a-4bf7-a6eb-b1e6e4bc8ffe",
  "iam_apikey_name": "Credenciales de servicio-1",
  "iam_role_crn": "crn:v1:bluemix:public:iam::::serviceRole:Writer",
  "iam_serviceid_crn": "crn:v1:bluemix:public:iam-identity::a/086932dd35c7421f98e661d5c0f7d4be::serviceid:ServiceId-e4e224eb-1719-42a8-9412-c4f65f4475a4",
  "instance_id": "8d0f96fa-4b35-4947-b148-7a96af547eb1",
  "url": "https://us-south.ml.cloud.ibm.com"
}
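
Hard-coding an API key in a notebook exposes it to anyone who can read the file. One common alternative is to read the values from environment variables, as sketched below. The variable names (`WML_APIKEY`, `WML_INSTANCE_ID`, `WML_URL`) and the helper function are hypothetical, and the sketch assumes that `apikey`, `instance_id`, and `url` are the entries the client actually needs; the remaining keys in the dictionary above appear to be descriptive metadata.

```python
import os


def credentials_from_env(env=os.environ):
    """Build a credentials dict from environment variables (hypothetical names)."""
    required = {
        "apikey": "WML_APIKEY",
        "instance_id": "WML_INSTANCE_ID",
        "url": "WML_URL",
    }
    missing = [var for var in required.values() if var not in env]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {key: env[var] for key, var in required.items()}


# Usage with a stand-in mapping; in practice the default os.environ would be used.
example_env = {
    "WML_APIKEY": "<your-api-key>",
    "WML_INSTANCE_ID": "<your-instance-id>",
    "WML_URL": "https://us-south.ml.cloud.ibm.com",
}
wml_credentials_from_env = credentials_from_env(example_env)
print(sorted(wml_credentials_from_env))
```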

**3) Declare** the variable `client` and store in it a `WatsonMachineLearningAPIClient` object built with the credentials as its parameter.

In [7]:
client = WatsonMachineLearningAPIClient(wml_credentials)

**4) Create** a variable that stores the model's properties: its name and description, plus the model type and runtime.

In [8]:
model_props = {
    client.repository.ModelMetaNames.NAME: "Voting",
    client.repository.ModelMetaNames.DESCRIPTION: "Clasificación de reviews",
    client.repository.ModelMetaNames.TYPE: "scikit-learn_0.20", 
    client.repository.ModelMetaNames.RUNTIME_UID: "scikit-learn_0.20-py3" 
}

**5) Create** a pipeline whose first step is a `TfidfVectorizer` and whose second step is the best model obtained in Project 5. **Train** this pipeline.

In [9]:
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import accuracy_score
from sklearn.ensemble import AdaBoostClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import LinearSVC
vect = TfidfVectorizer()
rf_clf = RandomForestClassifier()
# AdaBoost over decision stumps (depth-1 trees)
ada_clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1), algorithm="SAMME.R")
mlp_clf = MLPClassifier(random_state=42)

# Soft voting averages the class probabilities predicted by the three models
clf = VotingClassifier(
    estimators=[('rf', rf_clf), ('ada', ada_clf), ('mlp', mlp_clf)],
    voting='soft')

pipeline = make_pipeline(vect, clf)
pipeline.fit(X_train, y_train)

Pipeline(memory=None,
     steps=[('tfidfvectorizer', TfidfVectorizer(analyzer='word', binary=False, decode_error='strict',
        dtype=<class 'numpy.float64'>, encoding='utf-8', input='content',
        lowercase=True, max_df=1.0, max_features=None, min_df=1,
        ngram_range=(1, 1), norm='l2', preprocessor=None, smooth...e, warm_start=False))],
         flatten_transform=None, n_jobs=None, voting='soft', weights=None))])
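
Putting the vectorizer inside the pipeline means the stored model accepts raw text directly, so no separate preprocessing step has to be shipped alongside it. The sketch below shows this on a toy corpus. The eight reviews and the reduced hyperparameters are made up for illustration, and the `algorithm` argument is left at its default so that the snippet also runs on recent scikit-learn releases.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier, VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Made-up miniature corpus: label 1 = positive, 0 = negative.
toy_texts = [
    "boring and slow", "boring script", "boring cast", "boring ending",
    "wonderful and fun", "wonderful script", "wonderful cast", "wonderful ending",
]
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]

toy_voter = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=20, random_state=0)),
        ("ada", AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), random_state=0)),
        ("mlp", MLPClassifier(max_iter=500, random_state=42)),
    ],
    voting="soft",
)
toy_pipeline = make_pipeline(TfidfVectorizer(), toy_voter)
toy_pipeline.fit(toy_texts, toy_labels)

# Raw strings go in; the pipeline vectorizes them before voting.
new_reviews = ["a wonderful film", "a boring film"]
toy_probabilities = toy_pipeline.predict_proba(new_reviews)
toy_predictions = toy_pipeline.predict(new_reviews)
print(np.round(toy_probabilities, 2))
print(toy_predictions)
```

With `voting="soft"`, each row of `toy_probabilities` is the average of the three models' class probabilities, and the predicted label is the class with the highest average.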

**6) Upload** the model to IBM Cloud using `client.repository.store_model` with the appropriate parameters.

In [10]:
published_model = client.repository.store_model(model=pipeline, 
                                                meta_props=model_props, 
                                                training_data=X_train, 
                                                training_target=y_train)

**7) Get** the model's `uid` and store it in a variable.

In [11]:
published_model_uid = client.repository.get_model_uid(published_model)

In [12]:
models_details = client.repository.list_models()

------------------------------------  ------  ------------------------  -----------------
GUID                                  NAME    CREATED                   TYPE
c2d9e717-8e50-4743-9c71-48c696cd442b  Voting  2019-09-08T18:30:42.385Z  scikit-learn_0.20
e52fd9ff-95c3-41f7-93ea-b2be1de48719  Voting  2019-09-07T01:20:36.904Z  scikit-learn_0.20
4796c63b-8ece-4211-a95c-5fdc7cb783b4  Voting  2019-09-07T01:00:13.924Z  scikit-learn_0.20
------------------------------------  ------  ------------------------  -----------------


**8) Load** the model by its `uid` and use it to predict on the test set.

In [None]:
loaded_model = client.repository.load(published_model_uid)
test_predictions = loaded_model.predict(X_test)

**9) Show** the resulting `classification_report`.

In [35]:
from sklearn.metrics import roc_auc_score
from sklearn.metrics import classification_report

print("roc_auc score: ", roc_auc_score(y_test, test_predictions))
print(classification_report(y_test, test_predictions))

roc_auc score:  0.8499999999999999
              precision    recall  f1-score   support

           0       0.86      0.84      0.85       200
           1       0.84      0.86      0.85       200

   micro avg       0.85      0.85      0.85       400
   macro avg       0.85      0.85      0.85       400
weighted avg       0.85      0.85      0.85       400
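
Note that `roc_auc_score` receives hard 0/1 predictions here rather than scores. In that case the ROC curve has a single interior point, and the area under it reduces to the mean of the two per-class recalls, which is consistent with the report above: (0.84 + 0.86) / 2 = 0.85. The sketch below checks this identity on made-up labels; a threshold-independent AUC would instead need scores such as `predict_proba(X)[:, 1]`.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, recall_score, roc_auc_score

# Made-up labels and hard predictions for illustration.
toy_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
toy_pred = np.array([0, 0, 0, 1, 1, 1, 0, 1])

auc_from_labels = roc_auc_score(toy_true, toy_pred)
recall_neg = recall_score(toy_true, toy_pred, pos_label=0)  # true negative rate
recall_pos = recall_score(toy_true, toy_pred, pos_label=1)  # true positive rate
mean_recall = (recall_neg + recall_pos) / 2

# With hard predictions, the three quantities coincide.
print(auc_from_labels, mean_recall, balanced_accuracy_score(toy_true, toy_pred))
```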

