### The Team Data Science Process Workshop
<img style="float: right;" src="./assets/solutions-microsoft-logo-small.png">

#### Microsoft Cloud and AI Team

The Team Data Science Process (TDSP) is an agile, iterative data science methodology to deliver predictive analytics solutions and intelligent applications efficiently. TDSP helps improve team collaboration and learning. It contains a distillation of the best practices and structures from Microsoft and others in the industry that facilitate the successful implementation of data science initiatives. The goal is to help companies fully realize the benefits of their analytics program.

The TDSP comprises the following key components:

 - A data science lifecycle definition
 - A standardized project structure
 - Infrastructure and resources for data science projects
 - Tools and utilities for project execution
    
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="./assets/aml-logo.png">**Note:** 
    
*You can follow a complete example of this process using Azure Machine Learning:*
<br>

- ["Biomedical entity recognition using Team Data Science Process (TDSP) Template"](https://docs.microsoft.com/en-us/azure/machine-learning/preview/scenario-tdsp-biomedical-recognition?toc=%2Fen-us%2Fazure%2Fmachine-learning%2Fteam-data-science-process%2Ftoc.json&bc=%2Fen-us%2Fazure%2Fbread%2Ftoc.json)</p>

*This workshop guides you through a series of exercises that teach you to implement the TDSP in your data science project, using only Python in a notebook. You can change the **Setup** and **Lab** cells in this notebook to use another language or platform, or to add more or fewer prompts to suit your audience.*

For the labs below, look for the sections marked:

`# <TODO: REPLACE THIS COMMENT WITH PYTHON CODE>`

Sometimes one line is enough, but usually you need more. Read the entire code snippet to see what you need to do.

[Try to figure out the labs yourself, then search the web, then ask your neighbor. If you're really stuck, check the answer sheet.](./AnswerKey.txt)

    
<p style="border-bottom: 3px solid lightgrey;"></p>

In [1]:
#LAB0 Setup - Get everything up to date, and add any pip installs you need here

# FYI - You can list all libraries in this environment if you like.
# (pip.get_installed_distributions() is not available in pip 10 and later;
#  importlib.metadata requires Python 3.8+.)
#from importlib.metadata import distributions
#installed_packages_list = sorted("%s==%s" % (d.metadata["Name"], d.version)
#  for d in distributions())
#print(installed_packages_list)

# Import Libraries for the Customer Churn Prediction Labs - Change for other uses
# For serializing output/input
import pickle

# Libraries for training and scoring
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Data and Numeric Manipulation
import pandas as pd
import numpy as np

# Working with files
import csv

#/LAB0 

<img style="float: center;" height="800" width="800" src="https://azure.github.io/LearnAI-Bootcamp/lab03.1-tdsp_and_aml/resources/docs/images/tdsp.png">

[We will use the documentation at this location for the class](https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/overview)

### Lab 0.0 - Clone the TDSP Structure (Note: already done for this lab)

*Note: If you have cloned this notebook, you already have this structure. Use this
information as a guide for your production environments. We recommend that you place the structures
at the root of your project, rather than under the **Azure-TDSP-ProjectTemplate** folder.*

Instructions (if you're using a local Jupyter Notebook server):
1. Install and configure git if you don't have it.
2. Change directories to the root of where you would like to lay out your template (your Jupyter Notebook root project folder, Visual Studio folder, etc.)
3. Run the following command from that directory: 

`git clone https://github.com/Azure/Azure-TDSP-ProjectTemplate.git`

#### Lab verification
 ■ Check that the file structure matches what you see [in this reference](https://github.com/Azure/Azure-TDSP-ProjectTemplate)
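If you cloned the template locally, a short script can confirm that the folders this workshop links to are present. This is a minimal sketch: it checks only the paths referenced by links in this notebook (not the template's full layout), and the helper name `missing_template_paths` is hypothetical. Compare against the reference repository for the complete structure.

```python
from pathlib import Path

# Paths taken from the template links used in this workshop (not an exhaustive list).
EXPECTED_PATHS = [
    "Docs/Project/Charter.md",
    "Docs/Project/Exit Report.md",
    "Docs/Data_Report",
    "Docs/Data_Dictionaries",
]


def missing_template_paths(root, expected=EXPECTED_PATHS):
    """Return the expected template paths that do not exist under `root`."""
    root = Path(root)
    return [p for p in expected if not (root / p).exists()]


# Example: point this at your clone, e.g. "./Azure-TDSP-ProjectTemplate"
# print(missing_template_paths("./Azure-TDSP-ProjectTemplate"))
```

An empty list means every path checked here was found.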
 
<p style="border-bottom: 1px solid lightgrey;"></p> 

### Lab 0.1 - Review and Download Project Planning Documents

Instructions:
1. [Open and review this reference](https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/team-data-science-process-project-templates)
2. Download either the Microsoft Project template or the Microsoft Excel file described at that location.

#### Lab verification
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="./assets/checkbox.png">Open the Microsoft Project template or the Microsoft Excel file. For the Excel file, you can use Microsoft Office 365, a compatible spreadsheet viewer, or the [free Microsoft Excel viewer](https://www.microsoft.com/en-us/download/details.aspx?id=10).</p>

<p style="border-bottom: 1px solid lightgrey;"></p>

<h1><img style="float: left; margin: 0px 15px 15px 0px;" src="./assets/check.png">Phase One - Business Understanding</h1>

Read the [Documentation Reference here](https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/lifecycle-business-understanding)

**Goals**
 - Specify the key variables that are to serve as the model targets and whose related metrics are used to determine the success of the project.
 - Identify the relevant data sources that the business has access to or needs to obtain.

**How to do it**
(There are two main tasks addressed in this stage)
 - Define objectives: Work with your customer and other stakeholders to understand and identify the business problems. Formulate questions that define the business goals that the data science techniques can target.
 - Identify data sources: Find the relevant data that helps you answer the questions that define the objectives of the project.
 <p style="border-bottom: 3px solid lightgrey;"></p> 

<p><img style="float: right; margin: 0px 15px 15px 0px;" src="./assets/aml-logo.png"><b>Using Azure Machine Learning for this Phase:</b></p>

<p><img style="float: left; margin: 0px 15px 15px 0px;" src="./assets/checkbox.png">[Identify scenarios and plan for advanced analytics data processing](https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/plan-your-environment)</p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="./assets/checkbox.png">[Fill out the project Charter Document](https://github.com/Azure/Azure-TDSP-ProjectTemplate/blob/master/Docs/Project/Charter.md)</p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="./assets/checkbox.png">[Identify Data Sources](https://github.com/Azure/Azure-TDSP-ProjectTemplate/blob/master/Docs/Data_Report/Data%20Defintion.md)</p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="./assets/checkbox.png">[Create a Data Dictionary](https://github.com/Azure/Azure-TDSP-ProjectTemplate/tree/master/Docs/Data_Dictionaries)</p>

<p style="border-bottom: 1px solid lightgrey;"></p> 

### Lab 1.0 - Define Objectives
Instructions:
1. Read the Business Case that follows.
2. Open the Charter.md file from the */Azure-TDSP-ProjectTemplate/Docs/Project* folder and edit it to contain as much information as time allows; at a minimum, answer the first five bullets. (You can also find [this document online](https://github.com/Azure/Azure-TDSP-ProjectTemplate/blob/master/Docs/Project/Charter.md).)

#### Business Case

*The Orange Telecom company in France is one of the largest operators of mobile and internet services in Europe and Africa and a global leader in corporate telecommunication services. They have 256 million customers worldwide, significant coverage in France, Spain, Belgium, Poland, Romania, Slovakia, and Moldova, and a large presence in Africa and the Middle East. Customer churn is an issue for any company. Orange would like to predict the propensity of customers to switch provider (churn), buy new products or services (appetency), or buy upgrades or add-ons proposed to them to make the sale more profitable (up-selling). For this effort, they want to focus on churn first.*

#### Lab verification
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="./assets/checkbox.png">Ensure you have the first five bullets documented.</p>

<p style="border-bottom: 1px solid lightgrey;"></p> 

### Lab 1.1 - Identify Data Sources
Instructions:
1. Using the business case, open the Charter.md file from the */Azure-TDSP-ProjectTemplate/Docs/Project* folder and edit it to answer the first bullet under **Architecture**. (You can also find [this document online](https://github.com/Azure/Azure-TDSP-ProjectTemplate/blob/master/Docs/Project/Charter.md).)
2. *Optional*: Develop a Data Movement strategy for these datasets, assuming large amounts of data from at least three locations. 

#### Lab verification
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="./assets/checkbox.png">Ensure you understand where this data would be found.</p>


<p style="border-bottom: 3px solid lightgrey;"></p> 

<h1><img style="float: left; margin: 0px 15px 15px 0px;" src="./assets/check.png">Phase Two - Data Acquisition and Understanding</h1>

Read the [Documentation Reference here](https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/lifecycle-data)

In the Data Acquisition and Understanding phase of the TDSP, you ingest or access data from various locations to answer the questions the organization has asked. In most cases, this data will be in multiple locations. Once the data is ingested into the system, you’ll need to examine it to see what it holds. All data needs cleaning, so after the inspection phase, you’ll replace missing values and add or change columns. You’ll cover more extensive data wrangling tasks in other labs.

In this section, we’ll use a single file-based dataset to train our model.

**Goals**

  - Produce a clean, high-quality data set whose relationship to the target variables is understood. Locate the data set in the appropriate analytics environment so you are ready to model.
  - Develop a solution architecture of the data pipeline that refreshes and scores the data regularly.

**How to do it**

  - Ingest the data into the target analytic environment.
  - Explore the data to determine if the data quality is adequate to answer the question.
  - Set up a data pipeline to score new or regularly refreshed data.
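To make the exploration step concrete, one option is a small report of shape, column types, missing values, and duplicate rows. The sketch below assumes pandas; the function name `data_quality_report` and the toy columns are illustrative assumptions, not part of the TDSP template.

```python
import numpy as np
import pandas as pd


def data_quality_report(df: pd.DataFrame) -> dict:
    """Summarize basic data-quality signals for a DataFrame."""
    return {
        "rows": df.shape[0],
        "columns": df.shape[1],
        "dtypes": df.dtypes.astype(str).to_dict(),
        # Missing (NaN/None) counts, listing only columns that have gaps
        "missing": {c: int(n) for c, n in df.isna().sum().items() if n > 0},
        # Rows that exactly repeat an earlier row
        "duplicate_rows": int(df.duplicated().sum()),
    }


# Toy data standing in for an ingested file
toy = pd.DataFrame({
    "age": [34, 51, np.nan, 34],
    "state": ["WA", "CA", "CA", "WA"],
    "churn": [0, 1, 0, 0],
})
report = data_quality_report(toy)
print(report)
```

Checks like these can run each time the pipeline refreshes the data, so quality problems surface before scoring.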

<p><img style="float: right; margin: 0px 15px 15px 0px;" src="./assets/aml-logo.png"><b>Using Azure Machine Learning for this Phase:</b></p>

<p><img style="float: left; margin: 0px 15px 15px 0px;" src="./assets/checkbox.png">[Load data into storage environments for analytics](https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/ingest-data)</p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="./assets/checkbox.png">[Explore data in the Team Data Science Process](https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/explore-data)</p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="./assets/checkbox.png">[Sample data in Azure blob containers, SQL Server, and Hive tables](https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/sample-data)</p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="./assets/checkbox.png">[Access datasets with Python using the Azure Machine Learning Python client library](https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/python-data-access)</p>

<p style="border-bottom: 1px solid lightgrey;"></p> 

### Lab 2.0 - Ingest data from a local source
Instructions:
 1. Use Python code in the cell below to load customer data from the file:
 `./data/CATelcoCustomerChurnTrainingSample.csv`
 
 #### Lab verification
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="./assets/checkbox.png">Ensure that you have 29 columns and 20,468 rows loaded</p>
 

In [2]:
#LAB2.0 - Read data and verify
# Read customer data from a single file
df = pd.read_csv('./data/CATelcoCustomerChurnTrainingSample.csv') 

# Ensure that you have 29 columns and 20,468 rows loaded
print('There should be 20468 observations of 29 variables:')
# <TODO: REPLACE THIS COMMENT WITH PYTHON CODE>

# Optional - Instead, read the data from the source. The GitHub page URL below returns HTML,
# not CSV, so append ?raw=true to it before passing it to pd.read_csv():
# https://github.com/Azure/MachineLearningSamples-ChurnPrediction/blob/master/data/CATelcoCustomerChurnTrainingSample.csv 
#/LAB2.0

There should be 20468 observations of 29 variables:


<p style="border-bottom: 1px solid lightgrey;"></p> 

### Lab 2.1 - Data Exploration and Understanding
Instructions:
 1. Using the dataframe you loaded with the pandas library, explore the data, focusing on its shape, column types, and missing values.

#### Lab verification
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="./assets/checkbox.png">Ensure that you understand the data and its layout, and know where values are missing.</p>

In [3]:
#LAB2.1 - Explore Data
# Explore the df Dataframe, using at least a five-number statistical summary.
# NOTE: Your exploration may be much different - experiment with graphics as well.

# Show the size and shape of data:
# <TODO: REPLACE THIS COMMENT WITH PYTHON CODE>

# Show the first and last 10 rows
# <TODO: REPLACE THIS COMMENT WITH PYTHON CODE>

# Show the dataframe structure:
# <TODO: REPLACE THIS COMMENT WITH PYTHON CODE>

# Check for missing values:
# <TODO: REPLACE THIS COMMENT WITH PYTHON CODE>

# perform a simple statistical display:    
# <TODO: REPLACE THIS COMMENT WITH PYTHON CODE>

#/LAB2.1
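The cell above asks for at least a five-number summary (minimum, first quartile, median, third quartile, maximum). `DataFrame.describe()` reports these alongside count, mean, and standard deviation; the sketch below computes only the five numbers with `DataFrame.quantile`, on a toy frame whose column names are assumptions.

```python
import pandas as pd


def five_number_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Min, Q1, median, Q3, and max for each numeric column."""
    numeric = df.select_dtypes(include="number")  # skip text/categorical columns
    summary = numeric.quantile([0.0, 0.25, 0.5, 0.75, 1.0])
    summary.index = ["min", "q1", "median", "q3", "max"]
    return summary


toy = pd.DataFrame({"age": [12, 20, 30, 40, 50], "state": list("ABCDE")})
summary = five_number_summary(toy)
print(summary)
```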

<p style="border-bottom: 3px solid lightgrey;"></p> 

<h1><img style="float: left; margin: 0px 15px 15px 0px;" src="./assets/check.png">Phase Three - Modeling</h1>

Read the [Documentation Reference here](TODO)

**Goals**
  - Determine the optimal data features for the machine-learning model.
  - Create an informative machine-learning model that predicts the target most accurately.
  - Create a machine-learning model that's suitable for production.

**How to do it**
  - Feature engineering: Create data features from the raw data to facilitate model training.
  - Model training: Find the model that answers the question most accurately by comparing their success metrics.
  - Determine if your model is suitable for production.
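Feature engineering for this churn data set relies on one-hot encoding of the categorical columns, as in Lab 3.0. A subtlety appears at scoring time: a single incoming record produces dummy columns only for the values it contains, so its columns must be aligned with the training columns. The sketch below shows one way to do this with `pd.get_dummies` and `DataFrame.reindex`; the toy columns are assumptions.

```python
import pandas as pd

train = pd.DataFrame({
    "age": [34, 51, 28],
    "gender": ["Male", "Female", "Male"],
    "homeowner": ["Yes", "No", "No"],
})

# One-hot encode categoricals for training and remember the resulting column order.
train_encoded = pd.get_dummies(train, columns=["gender", "homeowner"])
training_columns = list(train_encoded.columns)

# A single scoring record yields dummies only for the values it contains.
record = pd.DataFrame([{"age": 40, "gender": "Female", "homeowner": "Yes"}])
record_encoded = pd.get_dummies(record, columns=["gender", "homeowner"])

# Add missing dummy columns as 0, drop unseen ones, and match the training order.
record_aligned = record_encoded.reindex(columns=training_columns, fill_value=0)
print(record_aligned)
```

Keeping `training_columns` alongside the model is one way to record the input schema the model expects.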

<p><img style="float: right; margin: 0px 15px 15px 0px;" src="./assets/aml-logo.png"><b>Using Azure Machine Learning for this Phase:</b></p>

<p><img style="float: left; margin: 0px 15px 15px 0px;" src="./assets/checkbox.png">[Feature engineering in data science](https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/create-features)</p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="./assets/checkbox.png">[Feature selection](https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/select-features)</p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="./assets/checkbox.png">[Choose algorithms for Microsoft Azure Machine Learning](https://docs.microsoft.com/en-us/azure/machine-learning/studio/algorithm-choice?toc=%2Fen-us%2Fazure%2Fmachine-learning%2Fteam-data-science-process%2Ftoc.json&bc=%2Fen-us%2Fazure%2Fbread%2Ftoc.json)</p>


<p style="border-bottom: 1px solid lightgrey;"></p> 

### Lab 3.0 - Experiment and Select a Model
Instructions:

1. Using scikit-learn in Python, train and evaluate GaussianNB and DecisionTreeClassifier models in an experiment over your dataframe.

    a. For the Naive Bayes model, use a random seed of 42 and a test split of 0.3.

    b. For the Decision Tree model, use a minimum sample split of 20 (`min_samples_split=20`) and a random state of 99.

#### Lab verification
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="./assets/checkbox.png">Find the accuracy of each algorithm</p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="./assets/checkbox.png">Which performed best? Can you improve it?</p>


In [5]:
#LAB3.0 - Customer Churn Prediction Experiment
# For completeness of this example, let's re-import our libraries
import pickle
import pandas as pd
import numpy as np
import csv
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# We'll re-load the data as "CustomerDataFrame"
CustomerDataFrame = pd.read_csv('data/CATelcoCustomerChurnTrainingSample.csv')

# Fill all NA values with 0:
# <TODO: REPLACE THIS COMMENT WITH PYTHON CODE>

# Drop all duplicate observations:
# <TODO: REPLACE THIS COMMENT WITH PYTHON CODE>

# We don't need the 'year' or 'month' variables:
# <TODO: REPLACE THIS COMMENT WITH PYTHON CODE>

# Implement One-Hot Encoding for this model (https://machinelearningmastery.com/why-one-hot-encode-data-in-machine-learning/) 
columns_to_encode = list(CustomerDataFrame.select_dtypes(include=['category','object']))
dummies = pd.get_dummies(CustomerDataFrame[columns_to_encode])

# Drop the original categorical columns:
CustomerDataFrame = CustomerDataFrame.drop(columns_to_encode, axis=1)

# Re-join the dummies frame to the original data:
CustomerDataFrame = CustomerDataFrame.join(dummies)

# Show the new columns in the joined dataframe:
# <TODO: REPLACE THIS COMMENT WITH PYTHON CODE>

# Experiment using Naive Bayes:
nb_model = GaussianNB()
# <TODO: REPLACE THIS COMMENT WITH PYTHON CODE>
# <TODO: REPLACE THIS COMMENT WITH PYTHON CODE>

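train, test = train_test_split(CustomerDataFrame, random_state = random_seed, test_size = split_ratio)

# Separate the label from the features; use NumPy arrays for both fit and predict
# so the feature layout is identical (positional axis arguments to drop() fail in pandas 2.0+)
target = train['churn'].values
train = train.drop(columns='churn')
train = train.values
nb_model.fit(train, target)

expected = test['churn'].values
test = test.drop(columns='churn').values
predicted = nb_model.predict(test)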

# Print out the Naive Bayes Classification Accuracy:
# <TODO: REPLACE THIS COMMENT WITH PYTHON CODE>

# Experiment using Decision Trees:
dt_model = DecisionTreeClassifier(min_samples_split=20, random_state=99)
dt_model.fit(train, target)
# <TODO: REPLACE THIS COMMENT WITH PYTHON CODE>

# Print out the Decision Tree Accuracy:
print("Decision Tree Classification Accuracy", accuracy_score(expected, predicted))

#/LAB3.0


Index(['age', 'annualincome', 'calldroprate', 'callfailurerate', 'callingnum',
       'customerid', 'monthlybilledamount', 'numberofcomplaints',
       'numberofmonthunpaid', 'numdayscontractequipmentplanexpiring',
       'penaltytoswitch', 'totalminsusedinlastmonth', 'unpaidbalance',
       'percentagecalloutsidenetwork', 'totalcallduration', 'avgcallduration',
       'churn', 'customersuspended_No', 'customersuspended_Yes',
       'education_Bachelor or equivalent', 'education_High School or below',
       'education_Master or equivalent', 'education_PhD or equivalent',
       'gender_Female', 'gender_Male', 'homeowner_No', 'homeowner_Yes',
       'maritalstatus_Married', 'maritalstatus_Single', 'noadditionallines_\N',
       'occupation_Non-technology Related Job', 'occupation_Others',
       'occupation_Technology Related Job', 'state_AK', 'state_AL', 'state_AR',
       'state_AZ', 'state_CA', 'state_CO', 'state_CT', 'state_DE', 'state_FL',
       'state_GA', 'state_HI', 'state_IA'
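To judge which model performed best, it helps to compare both against a trivial baseline that always predicts the majority class: when churners are a minority of customers, raw accuracy can look high even for a model that never predicts churn. The sketch below uses synthetic data from `make_classification` as a stand-in for the encoded churn frame and mirrors the lab's settings (`random_state=42` and `test_size=0.3` for the split; `min_samples_split=20` and `random_state=99` for the tree). The 90/10 class imbalance is an assumption for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the encoded churn data (about 10% positives).
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.9], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

models = {
    "majority baseline": DummyClassifier(strategy="most_frequent"),
    "naive bayes": GaussianNB(),
    "decision tree": DecisionTreeClassifier(min_samples_split=20, random_state=99),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {scores[name]:.3f}")
```

A model whose accuracy is close to the baseline has learned little; metrics such as precision and recall on the churn class can also help separate the candidates.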

<p style="border-bottom: 3px solid lightgrey;"></p> 

<h1><img style="float: left; margin: 0px 15px 15px 0px;" src="./assets/check.png">Phase Four - Deployment</h1>

Read the [Documentation Reference here](https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/lifecycle-deployment)

**Goal**
 - Deploy models with a data pipeline to a production or production-like environment for final user acceptance

**How to do it**
  - Deploy the model and pipeline to a production or production-like environment for application consumption

<p><img style="float: right; margin: 0px 15px 15px 0px;" src="./assets/aml-logo.png"><b>Using Azure Machine Learning for this Phase:</b></p>

<p><img style="float: left; margin: 0px 15px 15px 0px;" src="./assets/checkbox.png">[Deploy models in production](https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/deploy-models-in-production)</p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="./assets/checkbox.png">[Configure your environment to operationalize](https://docs.microsoft.com/en-us/azure/machine-learning/preview/cli-for-azure-machine-learning#o16n)</p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="./assets/checkbox.png">[Model management](https://docs.microsoft.com/en-us/azure/machine-learning/preview/deployment-setup-configuration)</p>

<p style="border-bottom: 1px solid lightgrey;"></p> 

### Lab 4.0 - Operationalize
Instructions:
1. Deploy the model and pipeline to a production or production-like environment for application consumption to one or more targets:
  - Online websites
  - Spreadsheets
  - Dashboards
  - Line-of-business applications
  - Back-end applications

#### Lab verification
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="./assets/checkbox.png">Save a serialized model to disk in Python (a pickle file), and create code that shows the artifacts (schema definition) needed to call it.</p>

In [5]:
#LAB4.0a - Create the Model File
# serialize the best performing model on disk:
print ("Serialize the model to a model.pkl file in the root")
# <TODO: REPLACE THIS COMMENT WITH PYTHON CODE>

#/LAB4.0a

Export the model to outputs/model.pkl
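The Lab 4.0 verification asks for a serialized model on disk. Below is a minimal sketch of a pickle round trip, assuming a toy fitted classifier in place of your best Lab 3.0 model; it writes to a temporary directory, whereas the lab expects `model.pkl` in the notebook root. Only load pickle files you trust, because unpickling can execute arbitrary code.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy model standing in for the best-performing Lab 3.0 model.
X = np.array([[0, 1], [1, 0], [1, 1], [0, 0]])
y = np.array([0, 1, 1, 0])
model = DecisionTreeClassifier(random_state=99).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")
    # Serialize with the highest protocol supported by this Python version.
    with open(path, "wb") as f:
        pickle.dump(model, f, protocol=pickle.HIGHEST_PROTOCOL)
    # Load it back, as a scoring script would.
    with open(path, "rb") as f:
        restored = pickle.load(f)

same_predictions = (restored.predict(X) == model.predict(X)).all()
print("Predictions match after reload:", same_predictions)
```

Pickles are generally safest to load with the same scikit-learn version that created them.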


In [8]:
#LAB4.0b - Operationalization: Scoring the calls to the model
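# Prepare the web service definition before deploying
# Import for loading the pickle (sklearn.externals.joblib was removed in scikit-learn 0.23;
# the standalone joblib package, a scikit-learn dependency, replaces it)
import joblib

# Load the model file
model = joblib.load('model.pkl')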

# Import for handling the JSON file
import json
import pandas as pd

# Set up a sample "call" from a client:
input_df = "{\"callfailurerate\": 0, \"education\": \"Bachelor or equivalent\", \"usesinternetservice\": \"No\", \"gender\": \"Male\", \"unpaidbalance\": 19, \"occupation\": \"Technology Related Job\", \"year\": 2015, \"numberofcomplaints\": 0, \"avgcallduration\": 663, \"usesvoiceservice\": \"No\", \"annualincome\": 168147, \"totalminsusedinlastmonth\": 15, \"homeowner\": \"Yes\", \"age\": 12, \"maritalstatus\": \"Single\", \"month\": 1, \"calldroprate\": 0.06, \"percentagecalloutsidenetwork\": 0.82, \"penaltytoswitch\": 371, \"monthlybilledamount\": 71, \"churn\": 0, \"numdayscontractequipmentplanexpiring\": 96, \"totalcallduration\": 5971, \"callingnum\": 4251078442, \"state\": \"WA\", \"customerid\": 1, \"customersuspended\": \"Yes\", \"numberofmonthunpaid\": 7, \"noadditionallines\": \"\\\\N\"}"

# Cleanup 
input_df_encoded = json.loads(input_df)
input_df_encoded = pd.DataFrame([input_df_encoded], columns=input_df_encoded.keys())
input_df_encoded = input_df_encoded.drop(columns=['year', 'month', 'churn'])

# Pre-process scoring data consistent with training data
columns_to_encode = ['customersuspended', 'education', 'gender', 'homeowner', 'maritalstatus', 'noadditionallines', 'occupation', 'state', 'usesinternetservice', 'usesvoiceservice']
dummies = pd.get_dummies(input_df_encoded[columns_to_encode])
input_df_encoded = input_df_encoded.join(dummies)
input_df_encoded = input_df_encoded.drop(columns_to_encode, axis=1)

columns_encoded = ['age', 'annualincome', 'calldroprate', 'callfailurerate', 'callingnum',
       'customerid', 'monthlybilledamount', 'numberofcomplaints',
       'numberofmonthunpaid', 'numdayscontractequipmentplanexpiring',
       'penaltytoswitch', 'totalminsusedinlastmonth', 'unpaidbalance',
       'percentagecalloutsidenetwork', 'totalcallduration', 'avgcallduration',
       'customersuspended_No', 'customersuspended_Yes',
       'education_Bachelor or equivalent', 'education_High School or below',
       'education_Master or equivalent', 'education_PhD or equivalent',
       'gender_Female', 'gender_Male', 'homeowner_No', 'homeowner_Yes',
       'maritalstatus_Married', 'maritalstatus_Single', 'noadditionallines_\\N',
       'occupation_Non-technology Related Job', 'occupation_Others',
       'occupation_Technology Related Job', 'state_AK', 'state_AL', 'state_AR',
       'state_AZ', 'state_CA', 'state_CO', 'state_CT', 'state_DE', 'state_FL',
       'state_GA', 'state_HI', 'state_IA', 'state_ID', 'state_IL', 'state_IN',
       'state_KS', 'state_KY', 'state_LA', 'state_MA', 'state_MD', 'state_ME',
       'state_MI', 'state_MN', 'state_MO', 'state_MS', 'state_MT', 'state_NC',
       'state_ND', 'state_NE', 'state_NH', 'state_NJ', 'state_NM', 'state_NV',
       'state_NY', 'state_OH', 'state_OK', 'state_OR', 'state_PA', 'state_RI',
       'state_SC', 'state_SD', 'state_TN', 'state_TX', 'state_UT', 'state_VA',
       'state_VT', 'state_WA', 'state_WI', 'state_WV', 'state_WY',
       'usesinternetservice_No', 'usesinternetservice_Yes',
       'usesvoiceservice_No', 'usesvoiceservice_Yes']

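# Now that they are encoded, dummy columns for values absent from this record are missing.
# Add them as 0, drop any not seen in training, and restore the training column order
# (the model was fit on a NumPy array, so predict() matches features by position):
input_df_encoded = input_df_encoded.reindex(columns=columns_encoded, fill_value=0).astype(float)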

# Return final prediction
pred = model.predict(input_df_encoded)

# (In production, you would replace the print() calls here with a JSON response)
print('JSON sent to the prediction Model:', '\n')
print(input_df, '\n')
print('For the JSON string sent from the client, The prediction is returned as more JSON (0 = No churn, 1 = Churn):', '\n')
print(json.dumps(str(pred[0])))

#/LAB4.0b

JSON sent to the prediction Model: 

{"callfailurerate": 0, "education": "Bachelor or equivalent", "usesinternetservice": "No", "gender": "Male", "unpaidbalance": 19, "occupation": "Technology Related Job", "year": 2015, "numberofcomplaints": 0, "avgcallduration": 663, "usesvoiceservice": "No", "annualincome": 168147, "totalminsusedinlastmonth": 15, "homeowner": "Yes", "age": 12, "maritalstatus": "Single", "month": 1, "calldroprate": 0.06, "percentagecalloutsidenetwork": 0.82, "penaltytoswitch": 371, "monthlybilledamount": 71, "churn": 0, "numdayscontractequipmentplanexpiring": 96, "totalcallduration": 5971, "callingnum": 4251078442, "state": "WA", "customerid": 1, "customersuspended": "Yes", "numberofmonthunpaid": 7, "noadditionallines": "\\N"} 

For the JSON string sent from the client, The prediction is returned as more JSON (0 = No churn, 1 = Churn): 

"0"
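The cell above loads the model and scores one hard-coded JSON string. In a web service, that logic is usually split into a one-time load step and a per-request scoring step; Azure Machine Learning scoring scripts, for example, conventionally define `init()` and `run()` functions. The sketch below follows that shape but is self-contained: it trains a toy model instead of loading `model.pkl`, and the feature names and the `EXPECTED_COLUMNS` list are hypothetical.

```python
import json

import pandas as pd
from sklearn.naive_bayes import GaussianNB

# Hypothetical training-time column order (the "schema" the model expects).
EXPECTED_COLUMNS = ["age", "unpaidbalance", "homeowner_No", "homeowner_Yes"]
model = None


def init():
    """Load (here: train a toy) model once, when the service starts."""
    global model
    X = pd.DataFrame(
        [[25, 10, 1, 0], [60, 300, 0, 1], [35, 0, 0, 1], [50, 250, 1, 0]],
        columns=EXPECTED_COLUMNS,
    )
    y = [0, 1, 0, 1]
    model = GaussianNB().fit(X, y)


def run(raw_json: str) -> str:
    """Score one JSON record and return the prediction as JSON."""
    record = pd.DataFrame([json.loads(raw_json)])
    encoded = pd.get_dummies(record, columns=["homeowner"])
    # Align to the training schema; cast to float so every column has one numeric dtype.
    encoded = encoded.reindex(columns=EXPECTED_COLUMNS, fill_value=0).astype(float)
    prediction = model.predict(encoded)[0]
    return json.dumps({"churn": int(prediction)})


init()
result = run('{"age": 58, "unpaidbalance": 280, "homeowner": "Yes"}')
print(result)
```

Returning a JSON object with a named field, rather than a bare string, leaves room to add fields such as a probability later.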


<p style="border-bottom: 3px solid lightgrey;"></p> 

<h1><img style="float: left; margin: 0px 15px 15px 0px;" src="./assets/check.png">Phase Five - Customer Acceptance</h1>

Read the [Documentation Reference here](https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/lifecycle-acceptance)

**Goals**
 - Confirm that the pipeline, the model, and their deployment in a production environment satisfy the customer's objectives
 - Create a path for retraining your model

**How to do it**
  - System validation: Confirm that the deployed model and pipeline meet the customer's needs.
  - Project hand-off: Hand the project off to the entity that's going to run the system in production
  - Develop a "ground truth" mechanism and feed the new labels (if applicable) back into the retraining API
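Once ground-truth labels arrive (for example, whether a scored customer actually churned), they can be compared with the model's earlier predictions to decide when to retrain. A minimal sketch, assuming predictions and outcomes are joined on a customer id and that a hypothetical accuracy threshold of 0.8 triggers retraining; the threshold, column names, and `needs_retraining` helper are illustrative, not TDSP guidance.

```python
import pandas as pd
from sklearn.metrics import accuracy_score

RETRAIN_THRESHOLD = 0.8  # hypothetical; agree on the real value with the customer


def needs_retraining(predictions, outcomes, threshold=RETRAIN_THRESHOLD):
    """Join scored predictions with observed labels; return (retrain?, live accuracy)."""
    joined = predictions.merge(outcomes, on="customerid", how="inner")
    accuracy = accuracy_score(joined["churn_actual"], joined["churn_predicted"])
    return bool(accuracy < threshold), float(accuracy)


predictions = pd.DataFrame({"customerid": [1, 2, 3, 4, 5],
                            "churn_predicted": [0, 0, 1, 0, 1]})
outcomes = pd.DataFrame({"customerid": [1, 2, 3, 4, 5],
                         "churn_actual": [0, 1, 1, 1, 0]})
retrain, live_accuracy = needs_retraining(predictions, outcomes)
print(f"Live accuracy {live_accuracy:.2f}; retrain: {retrain}")
```

The joined rows, with their observed labels, are also the new training examples to feed back into retraining.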

<p><img style="float: right; margin: 0px 15px 15px 0px;" src="./assets/aml-logo.png"><b>Using Azure Machine Learning for this Phase:</b></p>

<p><img style="float: left; margin: 0px 15px 15px 0px;" src="./assets/checkbox.png">[Create an exit report of the project](https://github.com/Azure/Azure-TDSP-ProjectTemplate/blob/master/Docs/Project/Exit%20Report.md)</p>

<p style="border-bottom: 1px solid lightgrey;"></p> 

### Lab 5.0 - Testing and Handoff
Instructions:
 1. Review the [Create an exit report of the project](https://github.com/Azure/Azure-TDSP-ProjectTemplate/blob/master/Docs/Project/Exit%20Report.md) topic

#### Lab verification
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="./assets/checkbox.png">Documentation reviewed and downloaded</p>

<p style="border-bottom: 1px solid lightgrey;"></p> 


<p style="border-bottom: 3px solid lightgrey;"></p> 

<h1>Workshop wrap-up</h1>

This workshop introduced the Team Data Science Process and walked you through each step of implementing it. Regardless of platform or technology, you can use this process to guide your advanced analytics projects from start to finish.

