<img style="float: right;" src="./assets/solutions-microsoft-logo-small.png">

# AI on IaaS++

## Microsoft Cloud and AI Team

The Team Data Science Process (TDSP) is an agile, iterative data science methodology to deliver predictive analytics solutions and intelligent applications efficiently. TDSP helps improve team collaboration and learning. It contains a distillation of the best practices and structures from Microsoft and others in the industry that facilitate the successful implementation of data science initiatives. The goal is to help companies fully realize the benefits of their analytics program.

TDSP comprises the following key components:

 - A data science lifecycle definition
 - A standardized project structure
 - Infrastructure and resources for data science projects
 - Tools and utilities for project execution
    
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="./assets/aml-logo.png">**Note:** 
    
*You can follow a complete example of this process using Azure Machine Learning* 
<br>

- ["Biomedical entity recognition using Team Data Science Process (TDSP) Template"](https://docs.microsoft.com/en-us/azure/machine-learning/preview/scenario-tdsp-biomedical-recognition?toc=%2Fen-us%2Fazure%2Fmachine-learning%2Fteam-data-science-process%2Ftoc.json&bc=%2Fen-us%2Fazure%2Fbread%2Ftoc.json)</p>

*This workshop guides you through a series of exercises that teach you to implement the TDSP in your data science project, using only Python in a Notebook. You can change the **Setup** and **Lab** cells in this Notebook to use another language or platform, or to include more or fewer prompts, based on your audience's needs.*

For the labs below, look for the sections marked:

`# <TODO: REPLACE THIS COMMENT WITH PYTHON CODE>`

Some labs need only one line of code, but most need more. Read the entire code snippet to see what you need to do.

[Try to figure out the labs yourself, then search the web, then ask your neighbor - and if you're really stuck, check the answer-sheet](.\AnswerKey.txt) 

TODO:

3. Deploying models:
    - AzureML container/Kubernetes-based deployment (real-time now, batch coming soon); maybe a pointer to existing AzureML material
    - Deploying DL/Spark models for batch scoring on aztk/BatchAI/Databricks
    - Deploying models on TensorFlow Serving
    - Other ways to deploy models: natively on DSVM (for small data) using Flask etc., Azure Functions, App Services
    - Future ONNX model export and deployment (TBD)
 
    
<p style="border-bottom: 3px solid lightgrey;"></p>



REFERENCES: IoT - https://github.com/Azure/ai-toolkit-iot-edge

<h1><img style="float: left; margin: 0px 15px 15px 0px;" src="./assets/check.png">Phase Four - Deployment</h1>

Read the [Documentation Reference here](https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/lifecycle-deployment)

**Goal**
 - Deploy models with a data pipeline to a production or production-like environment for final user acceptance

**How to do it**
  - Deploy the model and pipeline to a production or production-like environment for application consumption

<p><img style="float: right; margin: 0px 15px 15px 0px;" src="./assets/aml-logo.png"><b>Using Azure Machine Learning for this Phase:</b></p>

<p><img style="float: left; margin: 0px 15px 15px 0px;" src="./assets/checkbox.png">[Deploy models in production](https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/deploy-models-in-production)</p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="./assets/checkbox.png">[Configure your environment to operationalize](https://docs.microsoft.com/en-us/azure/machine-learning/preview/cli-for-azure-machine-learning#o16n)</p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="./assets/checkbox.png">[Model management](https://docs.microsoft.com/en-us/azure/machine-learning/preview/deployment-setup-configuration)</p>

<p style="border-bottom: 1px solid lightgrey;"></p> 

### Lab 4.0 - Operationalize
Instructions:
1. Deploy the model and pipeline to a production or production-like environment so that one or more of the following targets can consume it:
  - Online websites
  - Spreadsheets
  - Dashboards
  - Line-of-business applications
  - Back-end applications

#### Lab verification
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="./assets/checkbox.png">Save a serialized model to disk in Python (a pickle file), and write code that shows the artifacts (schema definition) needed to call it.</p>
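
The "schema definition" mentioned above can be as simple as a mapping from field names to expected types that the service checks before scoring. The following is a minimal sketch using only the standard library; `INPUT_SCHEMA` and `validate_payload` are hypothetical names introduced for illustration, and only a handful of the churn fields are listed.

```python
import json

# Hypothetical schema: field name -> accepted Python types.
# Only a few of the churn fields are listed to keep the sketch short.
INPUT_SCHEMA = {
    "age": (int,),
    "annualincome": (int,),
    "calldroprate": (int, float),
    "state": (str,),
    "gender": (str,),
}


def validate_payload(raw_json, schema=INPUT_SCHEMA):
    """Parse a JSON string and return a list of schema problems (empty if valid)."""
    record = json.loads(raw_json)
    problems = []
    for field, expected_types in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(record[field], bool) or not isinstance(record[field], expected_types):
            problems.append(f"wrong type for {field}: {type(record[field]).__name__}")
    return problems


good = '{"age": 12, "annualincome": 168147, "calldroprate": 0.06, "state": "WA", "gender": "Male"}'
bad = '{"age": "twelve", "annualincome": 168147, "state": "WA", "gender": "Male"}'

print(validate_payload(good))
print(validate_payload(bad))
```

A client can read the same mapping to learn which fields to send, which is the role the schema artifact plays when you call a deployed model.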

In [5]:
#LAB4.0a - Create the Model File
# serialize the best performing model on disk:
print("Serialize the model to a model.pkl file in the root")
# <TODO: REPLACE THIS COMMENT WITH PYTHON CODE>

#/LAB4.0a

Export the model to outputs/model.pkl
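
If you have not serialized a Python object before, the round trip looks roughly like the sketch below. It is not the lab answer: a plain dict of made-up coefficients stands in for a trained model, and the file goes to a temporary directory so nothing in your project is overwritten. For scikit-learn estimators, `joblib.dump` and `joblib.load` are commonly used in the same way.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a trained model: a plain dict of coefficients.
model = {"weights": [0.5, -0.25], "bias": 0.1}

# Write to a temporary directory so the sketch does not touch a real model.pkl.
model_path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(model_path, "wb") as f:
    pickle.dump(model, f)

# Load the file back and confirm the restored object matches the original.
with open(model_path, "rb") as f:
    restored = pickle.load(f)

print(restored == model)
```

Only unpickle files you trust, because loading a pickle can execute arbitrary code.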


In [8]:
#LAB4.0b - Operationalization: Scoring the calls to the model
# Prepare the web service definition before deploying
# Import for the pickle
# joblib is a standalone package; sklearn.externals.joblib was removed in scikit-learn 0.23
import joblib

# load the model file
global model
model = joblib.load('model.pkl')

# Import for handling the JSON file
import json
import pandas as pd

# Set up a sample "call" from a client:
input_df = "{\"callfailurerate\": 0, \"education\": \"Bachelor or equivalent\", \"usesinternetservice\": \"No\", \"gender\": \"Male\", \"unpaidbalance\": 19, \"occupation\": \"Technology Related Job\", \"year\": 2015, \"numberofcomplaints\": 0, \"avgcallduration\": 663, \"usesvoiceservice\": \"No\", \"annualincome\": 168147, \"totalminsusedinlastmonth\": 15, \"homeowner\": \"Yes\", \"age\": 12, \"maritalstatus\": \"Single\", \"month\": 1, \"calldroprate\": 0.06, \"percentagecalloutsidenetwork\": 0.82, \"penaltytoswitch\": 371, \"monthlybilledamount\": 71, \"churn\": 0, \"numdayscontractequipmentplanexpiring\": 96, \"totalcallduration\": 5971, \"callingnum\": 4251078442, \"state\": \"WA\", \"customerid\": 1, \"customersuspended\": \"Yes\", \"numberofmonthunpaid\": 7, \"noadditionallines\": \"\\\\N\"}"

# Cleanup 
input_df_encoded = json.loads(input_df)
input_df_encoded = pd.DataFrame([input_df_encoded], columns=input_df_encoded.keys())
# Drop the columns the model was not trained on (the positional axis argument no longer works in pandas 2.x)
input_df_encoded = input_df_encoded.drop(columns=['year', 'month', 'churn'])

# Pre-process scoring data consistent with training data
columns_to_encode = ['customersuspended', 'education', 'gender', 'homeowner', 'maritalstatus', 'noadditionallines', 'occupation', 'state', 'usesinternetservice', 'usesvoiceservice']
dummies = pd.get_dummies(input_df_encoded[columns_to_encode])
input_df_encoded = input_df_encoded.join(dummies)
input_df_encoded = input_df_encoded.drop(columns_to_encode, axis=1)

columns_encoded = ['age', 'annualincome', 'calldroprate', 'callfailurerate', 'callingnum',
       'customerid', 'monthlybilledamount', 'numberofcomplaints',
       'numberofmonthunpaid', 'numdayscontractequipmentplanexpiring',
       'penaltytoswitch', 'totalminsusedinlastmonth', 'unpaidbalance',
       'percentagecalloutsidenetwork', 'totalcallduration', 'avgcallduration',
       'customersuspended_No', 'customersuspended_Yes',
       'education_Bachelor or equivalent', 'education_High School or below',
       'education_Master or equivalent', 'education_PhD or equivalent',
       'gender_Female', 'gender_Male', 'homeowner_No', 'homeowner_Yes',
       'maritalstatus_Married', 'maritalstatus_Single', 'noadditionallines_\\N',
       'occupation_Non-technology Related Job', 'occupation_Others',
       'occupation_Technology Related Job', 'state_AK', 'state_AL', 'state_AR',
       'state_AZ', 'state_CA', 'state_CO', 'state_CT', 'state_DE', 'state_FL',
       'state_GA', 'state_HI', 'state_IA', 'state_ID', 'state_IL', 'state_IN',
       'state_KS', 'state_KY', 'state_LA', 'state_MA', 'state_MD', 'state_ME',
       'state_MI', 'state_MN', 'state_MO', 'state_MS', 'state_MT', 'state_NC',
       'state_ND', 'state_NE', 'state_NH', 'state_NJ', 'state_NM', 'state_NV',
       'state_NY', 'state_OH', 'state_OK', 'state_OR', 'state_PA', 'state_RI',
       'state_SC', 'state_SD', 'state_TN', 'state_TX', 'state_UT', 'state_VA',
       'state_VT', 'state_WA', 'state_WI', 'state_WV', 'state_WY',
       'usesinternetservice_No', 'usesinternetservice_Yes',
       'usesvoiceservice_No', 'usesvoiceservice_Yes']

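# A single record only produces dummy columns for the categories it contains,
# so some of the training columns will be missing. Add those and fill them with 0:
for column_encoded in columns_encoded:
    if column_encoded not in input_df_encoded.columns:
        input_df_encoded[column_encoded] = 0

# Put the columns in the order the model saw during training
# (assumes columns_encoded lists the training columns in training order)
input_df_encoded = input_df_encoded[columns_encoded]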

# Return final prediction
pred = model.predict(input_df_encoded)

# (In production you would replace the print() statements here with a JSON response returned to the caller)
print('JSON sent to the prediction Model:', '\n')
print(input_df, '\n')
print('For the JSON string sent from the client, The prediction is returned as more JSON (0 = No churn, 1 = Churn):', '\n')
print(json.dumps(str(pred[0])))

#/LAB4.0b

JSON sent to the prediction Model: 

{"callfailurerate": 0, "education": "Bachelor or equivalent", "usesinternetservice": "No", "gender": "Male", "unpaidbalance": 19, "occupation": "Technology Related Job", "year": 2015, "numberofcomplaints": 0, "avgcallduration": 663, "usesvoiceservice": "No", "annualincome": 168147, "totalminsusedinlastmonth": 15, "homeowner": "Yes", "age": 12, "maritalstatus": "Single", "month": 1, "calldroprate": 0.06, "percentagecalloutsidenetwork": 0.82, "penaltytoswitch": 371, "monthlybilledamount": 71, "churn": 0, "numdayscontractequipmentplanexpiring": 96, "totalcallduration": 5971, "callingnum": 4251078442, "state": "WA", "customerid": 1, "customersuspended": "Yes", "numberofmonthunpaid": 7, "noadditionallines": "\\N"} 

For the JSON string sent from the client, The prediction is returned as more JSON (0 = No churn, 1 = Churn): 

"0"
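
The code comment in the lab notes that a production service would return JSON instead of printing. One way to structure that is sketched below, under stated assumptions: the `init()`/`run()` split mirrors the convention Azure Machine Learning scoring scripts use, `stub_predict` is a made-up rule standing in for the loaded model, and `TRAINING_COLUMNS` is a small hypothetical subset of the real column list. The sketch also shows the column-alignment step in plain Python.

```python
import json

# Hypothetical training-time column order (a small subset for illustration).
TRAINING_COLUMNS = ["age", "gender_Female", "gender_Male", "state_CA", "state_WA"]
CATEGORICAL = ["gender", "state"]

model = None


def stub_predict(rows):
    """Stand-in for model.predict: flags a row as churn (1) if age is over 60."""
    return [1 if row[0] > 60 else 0 for row in rows]


def init():
    """Load the model once at start-up. Assumption: a real service would
    call joblib.load('model.pkl') here instead of using the stub."""
    global model
    model = stub_predict


def encode_record(record):
    """One-hot encode one record and return values in training column order."""
    encoded = {}
    for key, value in record.items():
        if key in CATEGORICAL:
            encoded[f"{key}_{value}"] = 1
        else:
            encoded[key] = value
    # Columns this record did not produce default to 0; unknown extras are dropped.
    return [encoded.get(column, 0) for column in TRAINING_COLUMNS]


def run(raw_json):
    """Score one JSON request and return the result as a JSON string."""
    try:
        record = json.loads(raw_json)
        prediction = model([encode_record(record)])[0]
        return json.dumps({"churn": int(prediction)})
    except (ValueError, TypeError, AttributeError) as error:
        return json.dumps({"error": str(error)})


init()
print(run('{"state": "WA", "age": 12, "gender": "Male"}'))
print(run("not valid json"))
```

Returning an error object instead of raising keeps the response well-formed JSON even for bad input, which makes the service easier for client applications to consume.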


<p style="border-bottom: 3px solid lightgrey;"></p> 

<h1>Phase 4 wrap-up</h1>

This workshop introduced the Team Data Science Process and walked you through each step of implementing it. Regardless of platform or technology, you can use this process to guide your projects in Advanced Analytics from start to finish.

