# Build and deploy an SMS spam classifier with Watson Machine Learning
**An Introduction to the Watson Machine Learning Python Client** <br> <br>
This notebook will show you how to build and deploy an SMS Spam Classifier with Watson Machine Learning and IBM Watson Studio. <br> We will use the new <a href="http://wml-api-pyclient.mybluemix.net/" target="_blank" rel="noopener noreferrer">Watson Machine Learning API Client for Python</a>, which is available on `PyPI`. 
______
This notebook was tested in a `Python 3.6` Environment. 
This notebook can be used as a companion to another <a href="https://medium.com/@adammassachi/dsx-hybrid-mode-91b580450c5b" target="_blank" rel="noopener noreferrer">tutorial on our blog</a>.  <br>


## Table of Contents
1. [Load data](#load)
2. [Build model](#build)
3. [Save and deploy](#save)
4. [Make API requests](#api)

_____

## 1. Load data <a id="load"></a>
First, install the Watson Machine Learning library via `pip` if you have not yet installed it. <br> We will use this library to communicate with Watson Machine Learning. The `python client` allows anyone with a Watson Machine Learning instance to programmatically save, load, and deploy models, among other tasks. 

In [1]:
!pip install watson-machine-learning-client --upgrade


Requirement already up-to-date: watson-machine-learning-client in /opt/conda/envs/Python36/lib/python3.6/site-packages (1.0.378)


The data we are looking to classify are SMS Messages which have been labeled `spam` or `ham`. You can find the data in the community. You can either download the data set to your environment and add it to your project, or you can read the data directly into a data frame as shown below. For more details on loading and accessing data, see <a href="https://datascience.ibm.com/docs/content/analyze-data/load-and-access-data.html" target="_blank" rel="noopener noreferrer">Load and access data in a notebook</a>.


In [2]:
import pandas as pd
df = pd.read_csv("https://dataplatform.ibm.com/exchange-api/v1/entries/e39fb7848165baca7fc0395025ba4e48/data?accessKey=36100ef896c27e41fdfc4a3029071d50")

Our first step will be converting the string label into a numeric representation. <br> 
We can use the `pandas.Series` method `factorize()`, which returns a tuple of integer codes and the unique values. Indexing the result with `[0]` keeps only the integer codes.

In [3]:
df = df[df.columns[:2]]
df.columns = ['ham', 'text']
df['label'] = df.ham.factorize()[0]
df['text'] = df.text.apply(lambda x: x.lower())
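
To make the `factorize()` step concrete, here is a minimal sketch on a tiny, made-up Series (the values are illustrative and not taken from the data set). Codes are assigned in order of first appearance, which is why `ham`, the first label seen, maps to `0`.

```python
import pandas as pd

# Toy labels for illustration only (not the real data set)
labels = pd.Series(["ham", "ham", "spam", "ham"])

# factorize() returns (integer codes, unique values);
# codes follow the order in which each value first appears
codes, uniques = labels.factorize()

print(codes.tolist())    # [0, 0, 1, 0]
print(uniques.tolist())  # ['ham', 'spam']
```

Keeping `uniques` around is handy later if you want to map a numeric prediction back to its string label.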

In [4]:
df.head()

| | ham | text | label |
|---|---|---|---|
| 0 | ham | go until jurong point, crazy.. available only ... | 0 |
| 1 | ham | ok lar... joking wif u oni... | 0 |
| 2 | spam | free entry in 2 a wkly comp to win fa cup fina... | 1 |
| 3 | ham | u dun say so early hor... u c already then say... | 0 |
| 4 | ham | nah i don't think he goes to usf, he lives aro... | 0 |


<a id="build"></a>
## 2. Build a model 
We will use `scikit-learn` to create a `Naive Bayes` model. <br>We will use the `HashingVectorizer`, which converts the text of each SMS into a matrix representation suitable for modeling.

In [5]:
from sklearn.feature_extraction.text import HashingVectorizer
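# Note: `non_negative` was deprecated in scikit-learn 0.19 and removed in 0.21;
# on newer versions, pass alternate_sign=False instead.
vectorizer = HashingVectorizer(n_features=5000, stop_words='english', non_negative=True)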
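
As a rough illustration of what the vectorizer produces, the sketch below hashes two made-up messages. It assumes scikit-learn 0.19 or newer and therefore uses `alternate_sign=False` rather than `non_negative=True`; the name `demo_vectorizer` and the messages are placeholders, not part of the notebook.

```python
from sklearn.feature_extraction.text import HashingVectorizer

# Assumption: scikit-learn 0.19+, where alternate_sign=False keeps values non-negative
demo_vectorizer = HashingVectorizer(n_features=5000, stop_words='english', alternate_sign=False)

# Made-up messages for illustration
messages = ["free entry to win a prize", "ok see you at lunch"]

# HashingVectorizer is stateless, so transform() works without a prior fit()
hashed = demo_vectorizer.transform(messages)

print(hashed.shape)       # (2, 5000): one row per message, one column per hash bucket
print(hashed.min() >= 0)  # True: no negative feature values
```

Because no vocabulary is stored, the same vectorizer settings always map a given word to the same column, at the cost of occasional collisions between different words.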

We need to connect the output of the `vectorizer` to the input of the model. We will use `Multinomial Naive Bayes`, a Naive Bayes classifier that works well with the representation of our features: non-negative values derived from the word frequencies.


Next, we will use `train_test_split` in order to divide the data into `testing` and `training` sets so that we can evaluate the performance of the model.

In [6]:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(df['text'], df['label'], random_state=0)
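
When no `test_size` is given, `train_test_split` holds out 25% of the rows for testing. A minimal sketch on made-up data (the variable names here are placeholders) shows the resulting split sizes:

```python
from sklearn.model_selection import train_test_split

# Eight made-up messages and labels, for illustration only
texts = ["message %d" % i for i in range(8)]
labels = [0, 0, 0, 1, 0, 1, 0, 0]

# With no test_size given, a quarter of the rows are held out for testing
a_train, a_test, b_train, b_test = train_test_split(texts, labels, random_state=0)

print(len(a_train), len(a_test))  # 6 2
```

Fixing `random_state` makes the split reproducible, so the score you compute later does not change between runs.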

Next, we need to transform the text and fit the model.

In [7]:
# first transform the text data
transformed_x = vectorizer.fit_transform(x_train)

# import the modules and fit
from sklearn.naive_bayes import MultinomialNB
bn = MultinomialNB().fit(transformed_x, y_train)

We’ve got a fitted model in `bn`. Let’s create a pipeline and then evaluate its performance on the test data.

In [8]:
# make a pipe
from sklearn.pipeline import make_pipeline
pipe = make_pipeline(vectorizer, bn)
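
The following sketch, on made-up data, illustrates what the pipeline does: calling `predict_proba` on it should give the same result as transforming the text by hand and then calling the classifier. It assumes scikit-learn 0.19 or newer (hence `alternate_sign=False`), and the names `vec`, `nb`, and `demo_pipe` are placeholders.

```python
import numpy as np
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up training data for illustration (0 = ham, 1 = spam)
texts = ["win a free prize now", "free cash offer", "see you at lunch", "call me when you land"]
labels = [1, 1, 0, 0]

vec = HashingVectorizer(n_features=5000, alternate_sign=False)
nb = MultinomialNB().fit(vec.transform(texts), labels)

# Chain the stateless vectorizer and the already-fitted classifier
demo_pipe = make_pipeline(vec, nb)

new_texts = ["free prize inside"]
manual = nb.predict_proba(vec.transform(new_texts))  # two explicit steps
piped = demo_pipe.predict_proba(new_texts)           # one call on raw text

print(np.allclose(manual, piped))  # True
print(nb.classes_)                 # column order of the predict_proba output
```

The columns of `predict_proba` follow `classes_`, so with labels `0` and `1` the first column is the probability of class `0`.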

The pipe will sequentially transform the data according to the transformers specified, terminating in what scikit-learn calls an estimator. <br>Then, we can call `predict`, `score`, and so on.

In [9]:
pipe.predict_proba(["URGENT! You have built a model - scroll down to see more"])



array([[0.79965197, 0.20034803]])

Let's score.

In [10]:
pipe.score(x_test, y_test)



0.9619799139167863

About `96% accuracy` on the test set, not bad. You can experiment with different numbers of features and vectorizers for your model. You can also create other features that are not captured by the vectorizer, such as the length of the message. 
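
One possible way to add a message-length feature is to append it as an extra column to the hashed matrix. The sketch below is only an illustration on made-up data, assuming scikit-learn 0.19 or newer; the scaling choice and all names are assumptions rather than part of the original notebook.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.naive_bayes import MultinomialNB

# Made-up data for illustration (0 = ham, 1 = spam)
texts = ["win a free prize now, reply to claim", "free cash offer", "see you at lunch", "ok"]
labels = [1, 1, 0, 0]

vec = HashingVectorizer(n_features=5000, stop_words='english', alternate_sign=False)
hashed = vec.transform(texts)

# Extra feature: message length in characters, scaled to [0, 1]
# so that it does not swamp the normalized hashed values
lengths = np.array([[len(t)] for t in texts], dtype=float)
lengths = lengths / lengths.max()

# Append the length column to the 5000 hashed columns
features = hstack([hashed, csr_matrix(lengths)]).tocsr()
model = MultinomialNB().fit(features, labels)

print(features.shape)  # (4, 5001)
```

At prediction time the same length scaling would have to be applied to new messages, which is one reason to wrap such steps in a pipeline.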

<a id="save"></a>
## 3. Save and deploy
After creating a model, you might want to make use of its predictions later. In order to do this, we will persist models with Watson Machine Learning. With WML, you can easily `save` and `deploy` models, among other powerful features. `Saving` a model makes this model portable -- as long as you can connect to WML, you can load your saved models into your environment. `Deploying` a model exposes the predictive capacity of the model as an API endpoint, which you can consume in applications, for example. 

Use the client to save your model to the WML Repository. From there, you can load and deploy models as well. If you don't already have a WML account, you can get more details <a href="https://dataplatform.ibm.com/docs/content/analyze-data/ml-setup.html" target="_blank" rel="noopener noreferrer">here</a>.

First, we will import the library and specify our credentials. If you don't know where to find your credentials, they are available to you both in the Watson Studio Project and in IBM Cloud. 

In [11]:
from watson_machine_learning_client import WatsonMachineLearningAPIClient

wml_credentials = {
    "apikey": "",
    "username": "",
    "password": "",
    "instance_id": "",
    "url": ""
}
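
If you would rather not paste secrets into the notebook, one option is to read them from environment variables. This is a sketch only: the variable names such as `WML_APIKEY` are hypothetical, and which fields your instance actually requires depends on your account.

```python
import os

# Hypothetical environment variable names; pick whatever fits your setup
env_names = {
    "apikey": "WML_APIKEY",
    "username": "WML_USERNAME",
    "password": "WML_PASSWORD",
    "instance_id": "WML_INSTANCE_ID",
    "url": "WML_URL",
}

# Build the same dictionary shape as above, defaulting to empty strings
wml_credentials = {key: os.environ.get(name, "") for key, name in env_names.items()}

# Report which fields are still empty, without printing any secret values
missing = [key for key, value in wml_credentials.items() if not value]
print("Missing credential fields:", missing)
```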

In [13]:
client = WatsonMachineLearningAPIClient(wml_credentials)

We have connected to our WML Repository using the python package and our credentials. Publishing the model will save the model to our repository. 

In [14]:
metadata = {
            client.repository.ModelMetaNames.NAME: 'SPAM_MODEL',
            client.repository.ModelMetaNames.FRAMEWORK_NAME: 'scikit-learn',
            client.repository.ModelMetaNames.RUNTIME_NAME: 'python',
            client.repository.ModelMetaNames.RUNTIME_VERSION: '3.6'
}

In [15]:
# publish model 
published_model_details = client.repository.store_model(model=pipe, meta_props=metadata, training_data=pd.DataFrame(df['text']), training_target=df['label'])

Now that we have saved the model to the repository, we can load it into a Python object using its `uid`. First, let's retrieve the `uid`.

In [16]:
# get my model id
guid = client.repository.get_model_uid(published_model_details)
guid

'4bd39f73-bfff-4660-8e5a-141f18c8c196'

Load the model using its id and the `repository.load()` function. 

In [17]:
mod = client.repository.load(guid)
type(mod)

sklearn.pipeline.Pipeline

Notice the type is an `sklearn` pipeline, just like the one we created. 

In [18]:
# verify that the loaded model returns the same predictions as the model we developed in our environment. 
mod.predict_proba(["URGENT! You have built a model - scroll down to see more"])



array([[0.79965197, 0.20034803]])
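
Rather than comparing the two arrays by eye, you can check them programmatically with `numpy.allclose`. The sketch below uses a local `pickle` round trip as a stand-in for the repository round trip, since it has to run without a WML instance; the toy data and names are assumptions.

```python
import pickle

import numpy as np
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up training data for illustration (0 = ham, 1 = spam)
texts = ["win a free prize now", "urgent free cash offer", "see you at lunch", "call me later"]
labels = [1, 1, 0, 0]

local_pipe = make_pipeline(
    HashingVectorizer(n_features=5000, alternate_sign=False),
    MultinomialNB(),
).fit(texts, labels)

# Stand-in for saving to and loading from a model repository
restored_pipe = pickle.loads(pickle.dumps(local_pipe))

sample = ["URGENT! You have built a model - scroll down to see more"]
same = np.allclose(local_pipe.predict_proba(sample), restored_pipe.predict_proba(sample))
print(same)  # True
```

With the real objects, the equivalent check would compare `pipe.predict_proba(...)` against `mod.predict_proba(...)` on the same input.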

<a id="api"></a>
## 4. Test the API
Now that we have successfully saved and loaded the model, we can create an API endpoint and make requests. 

In [19]:
scoring_endpoint = client.deployments.create(name="SPAM-NEW", model_uid=guid)



#######################################################################################

Synchronous deployment creation for uid: '4bd39f73-bfff-4660-8e5a-141f18c8c196' started

#######################################################################################


INITIALIZING
DEPLOY_SUCCESS


------------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_uid='bf6c81d1-10e4-4917-b3fb-39cdddbdeb84'
------------------------------------------------------------------------------------------------




In [20]:
scoring_endpoint_url = scoring_endpoint['entity']['scoring_url']

Let's create some JSON to send. We will use `client.deployments.score(scoring_url, payload)`. <a href="http://wml-api-pyclient.mybluemix.net/" target="_blank" rel="noopener noreferrer">Read our docs</a> for more details. 

In [21]:
# create a payload
my_sms = "Send me to the API por favor"
# list of lists
payload = {"fields": ["text"], "values": [[my_sms]]}

# make a request
response = client.deployments.score(scoring_url=scoring_endpoint_url, payload=payload)
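
The payload is a plain dictionary with one inner list per row. Assuming the endpoint accepts several rows in one request (the list-of-lists shape suggests this, but it is not confirmed here), a small hypothetical helper such as `build_payload` could assemble it:

```python
import json

def build_payload(messages, field="text"):
    """Hypothetical helper: wrap each message in its own row under a single field."""
    return {"fields": [field], "values": [[message] for message in messages]}

batch_payload = build_payload(["Send me to the API por favor", "WIN a free prize now"])

# Inspect the JSON that would be sent
print(json.dumps(batch_payload, indent=2))
```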

Let's check out the response.

In [22]:
response

{'fields': ['prediction', 'probability'],
 'values': [[0, [0.8211305335459812, 0.17886946645401927]]]}
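
To work with the response in code, you can pair each entry in `fields` with the matching entry of every row in `values`. The sketch below reuses the response shown above as a literal. The `label_names` mapping follows the `factorize()` step (`ham` is `0`, `spam` is `1`), and reading the second probability as the spam probability is an assumption based on that class order.

```python
# The response shown above, copied as a literal so the example runs on its own
response = {'fields': ['prediction', 'probability'],
            'values': [[0, [0.8211305335459812, 0.17886946645401927]]]}

# Mapping from numeric label to name, following the factorize() step
label_names = {0: "ham", 1: "spam"}

# Turn each row into a dict keyed by field name
rows = [dict(zip(response['fields'], row)) for row in response['values']]

for row in rows:
    label = label_names[row['prediction']]
    spam_probability = row['probability'][1]  # assumed order: [ham, spam]
    print(label, round(spam_probability, 3))  # ham 0.179
```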



____________

### Author
Adam Massachi is a Data Scientist with the Watson Studio team at IBM. Before IBM, he worked on political campaigns, building and managing large volunteer operations and organizing campaign finance initiatives. Say hello <a href="https://twitter.com/adammassach?lang=en" target="_blank" rel="noopener noreferrer">@adammassach</a>!

Copyright © IBM Corp. 2018, 2019. This notebook and its source code are released under the terms of the MIT License.
