# Microsoft Azure AutoML Exercise

### Purpose and Challenge
The purpose of this notebook is for the user to build and deploy a machine learning application using the Azure Machine Learning service, which is in preview.

This notebook includes incomplete code: you need to fill in the missing lines to make it work properly. The exercise lets you practice the concepts explained in the previous example.


## 1. Acquire and Prepare Data

For this exercise, we use a simple dataset named "House price prediction", available at https://vincentarelbundock.github.io/Rdatasets/csv/Ecdat/Housing.csv .
A copy of it is in the same folder. We use pandas to read the file into a DataFrame, which we will later use to train the model.

The filename is 'Housing.csv' and the delimiter is ','. The yes/no columns are read with the `category` dtype.

In [None]:
import pandas as pd
import numpy as np
data = pd.read_csv('Housing.csv', dtype = {
                                            'driveway':'category',
                                            'recroom':'category',
                                            'fullbase':'category',
                                            'gashw':'category',
                                            'airco':'category', 
                                            'prefarea': 'category'}
                  )
#take a look at the data
data.head()

In [None]:
data.dtypes

In [None]:
# Convert every categorical yes/no column to 1/0 integers
for col in data.columns:
    if data[col].dtype.name == 'category':
        data[col] = (data[col] == 'yes').astype(int)
data.head()
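
The cells above read the yes/no columns as `category` and then map `'yes'` to 1 and anything else to 0. A minimal sketch of the same idea on a tiny in-memory CSV, using a boolean comparison, which keeps the DataFrame's own index. The three rows and the column subset are hypothetical values for illustration, not taken from `Housing.csv`.

```python
import io

import pandas as pd

# Hypothetical three-row sample with a subset of the Housing.csv columns.
csv_text = """price,lotsize,driveway,airco
50000,4000,yes,no
60000,5000,no,no
70000,6000,yes,yes
"""

binary_cols = ['driveway', 'airco']
df = pd.read_csv(io.StringIO(csv_text), dtype={c: 'category' for c in binary_cols})

# Map 'yes' -> 1 and everything else -> 0 for each categorical column.
for col in df.columns:
    if df[col].dtype.name == 'category':
        df[col] = (df[col] == 'yes').astype(int)

print(df)
print(df.dtypes)
```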

We will go straight into model training, without doing EDA (exploratory data analysis) or applying any feature engineering techniques.

## 2. Automated ML

Import the Azure ML libraries for automated ML.

In [None]:
import logging, os, random, time

import matplotlib.pyplot as plt
import numpy as np

import azureml.core
from azureml.core.experiment import Experiment
from azureml.core.workspace import Workspace
from azureml.train.automl import AutoMLConfig
from azureml.train.automl.run import AutoMLRun
from azureml.widgets import RunDetails
from azureml.core.model import Model
from azureml.pipeline.core import PipelineRun
from azureml.core.run import Run

Provide your Azure Machine Learning workspace details to connect to the workspace. You will need to complete a Microsoft multi-factor authentication (MFA) sign-in.

In [None]:
subscription_id =''
resource_group =''
workspace_name = ''
workspace_region = ""

#ws = Workspace.create(name = workspace_name, subscription_id=subscription_id, resource_group=resource_group, location=workspace_region, exist_ok=True) #to create a new workspace

# Load an already created workspace.
try:
    ws = Workspace(workspace_name=workspace_name, subscription_id=subscription_id, resource_group=resource_group)
    print('Workspace configuration succeeded. You are all set!')
except Exception:
    print('Workspace not found. Check the subscription ID, resource group and workspace name.')
    # Re-raise so later cells do not fail with a confusing NameError on `ws`
    raise

In [None]:
ws.get_details()

### Create the Experiment by assigning a name to it. 

In [None]:
experiment_name = 'predict_house_price'
path_project_folder = './'

experiment = Experiment(ws, experiment_name)
output = dict()

output['SDK version'] = azureml.core.VERSION
output['Subscription ID'] = ws.subscription_id
output['Workspace Name'] = ws.name
output['Resource Group'] = ws.resource_group
output['Location'] = ws.location
output['Experiment Name'] = experiment_name

pd.DataFrame(data = output, index=['']).T

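#### Splitting the data into training, validation and test sets

We use `train_test_split` from scikit-learn to hold out 30% of the rows for validation, and reuse the last 10 validation rows as a small test set.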

In [None]:
from sklearn.model_selection import train_test_split
X = data[[x for x in data.columns if x not in ['price', 'id']]]  # drop the target ('price') and any 'id' column from the features
target = data['price'].values
X_train, X_valid, y_train, y_valid = train_test_split(X, target, test_size=0.3, random_state=42)
X_test = X_valid.iloc[-10:]
y_test = y_valid[-10:]

In [None]:
X_train.shape, X_valid.shape, X_test.shape
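
In the split above, `X_test` is the last 10 rows of `X_valid`. Because AutoML uses `X_valid` to compare models, those 10 rows are not truly unseen. A minimal sketch of an alternative, assuming you want a test set that AutoML never sees: hold out the test rows first, then split the remainder into training and validation sets. The helper `three_way_split`, the synthetic `make_regression` data and the split sizes are illustrative assumptions; the `demo_` names avoid overwriting the notebook's variables.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split


def three_way_split(X, y, test_size=0.1, valid_size=0.3, random_state=42):
    """Hold out a test set first, then split the rest into train/validation."""
    X_rest, X_te, y_rest, y_te = train_test_split(
        X, y, test_size=test_size, random_state=random_state)
    X_tr, X_va, y_tr, y_va = train_test_split(
        X_rest, y_rest, test_size=valid_size, random_state=random_state)
    return X_tr, X_va, X_te, y_tr, y_va, y_te


# Synthetic stand-in for the housing features and target.
demo_X, demo_y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
demo_splits = three_way_split(demo_X, demo_y)
print([part.shape for part in demo_splits[:3]])
```

In the notebook this would be called as `X_train, X_valid, X_test, y_train, y_valid, y_test = three_way_split(X, target)`, after which only the first two pairs are passed to `AutoMLConfig`.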

In [None]:
# Helper: enable sending SDK diagnostic telemetry to Microsoft
from azureml.telemetry import set_diagnostics_collection
set_diagnostics_collection(send_diagnostics = True)

## 3. AutoML Configuration

In [None]:
automl_config = AutoMLConfig(task='regression',
                             primary_metric = 'r2_score',
                             iteration_timeout_minutes =5,
                             iterations=10,
                             max_cores_per_iteration = 1,
                             preprocess=False, 
                             X = X_train, 
                             y = y_train, 
                             X_valid = X_valid, 
                             y_valid = y_valid,
                             auto_blacklist = True, 
                             # n_cross_validations=3,  # alternative to passing X_valid/y_valid
                             debug_log = 'house_logs.log',
                             verbosity=logging.ERROR,
                             path = path_project_folder, 
                             whitelist_models = ['LightGBM', 'ElasticNet', 'SGDRegressor', 'RandomForestRegressor', 'XGBoostRegressor']
                            )
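
Conceptually, each AutoML iteration fits one candidate model, scores it on the validation set with the `primary_metric` (`r2_score` here) and keeps the best one. The sketch below is a much simplified local analogue of that loop built from plain scikit-learn estimators; it is not what Azure AutoML actually runs (the service also searches over hyperparameters and other settings). The helper `pick_best_model`, the candidate set and the synthetic data are assumptions for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import ElasticNet, SGDRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def pick_best_model(candidates, X_tr, y_tr, X_va, y_va):
    """Fit each candidate and return (best_name, scores) by validation R^2."""
    scores = {}
    for name, estimator in candidates.items():
        estimator.fit(X_tr, y_tr)
        # Score with the primary metric; the ground truth goes first.
        scores[name] = r2_score(y_va, estimator.predict(X_va))
    best_name = max(scores, key=scores.get)
    return best_name, scores


# Hypothetical candidate set, loosely mirroring some names in whitelist_models.
demo_candidates = {
    'ElasticNet': make_pipeline(StandardScaler(), ElasticNet(random_state=0)),
    'SGDRegressor': make_pipeline(StandardScaler(), SGDRegressor(random_state=0)),
    'RandomForestRegressor': RandomForestRegressor(n_estimators=50, random_state=0),
}

# Synthetic data; demo_ prefixes avoid overwriting the notebook's X_train, etc.
demo_X, demo_y = make_regression(n_samples=300, n_features=8, noise=5.0, random_state=0)
demo_X_tr, demo_X_va, demo_y_tr, demo_y_va = train_test_split(
    demo_X, demo_y, test_size=0.3, random_state=42)

best_name, scores = pick_best_model(demo_candidates, demo_X_tr, demo_y_tr, demo_X_va, demo_y_va)
print(scores)
print('best:', best_name)
```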

In [None]:
local_run = experiment.submit(automl_config, show_output=True)

In [None]:
RunDetails(local_run).show()

## 4. Get the Best Model


In [None]:
# get_output() returns the best child run and its fitted model
best_run, fitted = local_run.get_output()
print(best_run, fitted)

### Test the best model with the test data we split off earlier

In [None]:
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

y_pred = fitted.predict(X_test)
# scikit-learn metrics take the ground truth first: metric(y_true, y_pred)
print('r2_score :', r2_score(y_test, y_pred))
print('root mean squared error :', np.sqrt(mean_squared_error(y_test, y_pred)))
print('mean absolute error :', mean_absolute_error(y_test, y_pred))
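
scikit-learn's metric functions take the ground truth first, as in `r2_score(y_true, y_pred)`. R² is computed as 1 - SS_res / SS_tot, where SS_tot uses the variance of whichever array is passed as `y_true`, so swapping the arguments generally changes the result; MSE and MAE are symmetric and unaffected. A small sketch with hypothetical prices and predictions, which also checks that RMSE is the square root of MSE:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical true prices and predictions (illustration only).
y_true = np.array([50000.0, 62000.0, 75000.0, 41000.0])
y_hat = np.array([52000.0, 60000.0, 70000.0, 45000.0])

r2_correct = r2_score(y_true, y_hat)   # ground truth first
r2_swapped = r2_score(y_hat, y_true)   # arguments reversed
rmse = np.sqrt(mean_squared_error(y_true, y_hat))
rmse_manual = np.sqrt(np.mean((y_true - y_hat) ** 2))
mae = mean_absolute_error(y_true, y_hat)

print('r2 (y_true, y_pred):', r2_correct)
print('r2 (y_pred, y_true):', r2_swapped)
print('rmse:', rmse, 'manual:', rmse_manual)
print('mae:', mae)
```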

In [None]:
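# Blue: predicted vs. actual; green: actual vs. actual (the ideal y = x line)
test_pred = plt.scatter(y_test, y_pred, color='b')
test_test = plt.scatter(y_test, y_test, color='g')
plt.xlabel('actual price')
plt.ylabel('price')
plt.legend((test_pred, test_test), ('prediction', 'truth'), loc='upper left', fontsize=8)
plt.show()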

## 5. Deploy the Model
Now that we have successfully trained the model, we are ready to deploy it:

1. Register the model in our workspace.
2. Create a scoring script for the web service.
3. Create a YAML file for the environment.
4. Create a container image.
5. Deploy it as a web service.

### Model Registration

In [None]:
# model = Model.register(ws, model_name='housepriceprediction', model_path='model.pkl', 
#                        description='house price prediction model')

model = local_run.register_model(description='best fitted AutoML model for housing price prediction', tags = {'ml': 'price_prediction', 'type':'automl'})
# Name under which the model was registered; score.py needs it for Model.get_model_path
print(local_run.model_id)

In [None]:
model.id

### Create a scoring script for the web service

In [None]:
%%writefile score.py
# Scoring Script
import json
import numpy as np

try:
    import joblib  # standalone joblib (newer scikit-learn versions)
except ImportError:
    from sklearn.externals import joblib  # older scikit-learn versions

from azureml.core.model import Model

import azureml.train.automl  # AutoML runtime package the pickled model depends on

def init():
    global model
    # Retrieve the path to the model file using the registered model name.
    # Replace this name with the value printed by local_run.model_id.
    model_path = Model.get_model_path('AutoMLaa94670e5best')
    print(model_path)
    model = joblib.load(model_path)


def run(raw_data):
    # Parse the JSON payload; expects a single row under the "data" key
    data = np.array(json.loads(raw_data)['data']).reshape(1, -1)
    # Make the prediction and return it as a JSON list
    y_hat = model.predict(data)
    return json.dumps(y_hat.tolist())
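
Before building an image, it can help to exercise the `init()`/`run()` logic locally. The sketch below mimics the scoring script under two assumptions: a small scikit-learn `LinearRegression` fitted on toy data stands in for the AutoML model, and the model path is a local temporary file instead of the path `Model.get_model_path` would return inside the container.

```python
import json
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

# Stand-in model fitted on toy data y = 2*x0 + 3*x1 (assumption: any fitted
# regressor with .predict behaves the same way here).
demo_X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
demo_y = demo_X @ np.array([2.0, 3.0])
demo_path = os.path.join(tempfile.mkdtemp(), 'model.pkl')
joblib.dump(LinearRegression().fit(demo_X, demo_y), demo_path)

scoring_model = None


def init(model_path=demo_path):
    # In score.py the path comes from Model.get_model_path(<registered name>).
    global scoring_model
    scoring_model = joblib.load(model_path)


def run(raw_data):
    # Same parsing as score.py: one row under the "data" key.
    data = np.array(json.loads(raw_data)['data']).reshape(1, -1)
    return json.dumps(scoring_model.predict(data).tolist())


init()
print(run(json.dumps({'data': [1.0, 2.0]})))
```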

### Create a YAML file with the conda environment and dependencies

In [None]:
from azureml.core.conda_dependencies import CondaDependencies

myenv = CondaDependencies.create(conda_packages=['numpy', 'scikit-learn', 'pandas'], pip_packages=['azureml-train-automl'])

conda_env_file_name = 'my_conda_env.yml'
myenv.save_to_file('.',conda_env_file_name)

In [None]:
with open("my_conda_env.yml","r") as f:
    print(f.read())

### Create a Container Image

In [None]:
%%writefile docker_steps.dockerfile
RUN apt-get update && \
    apt-get upgrade -y && \
    apt-get install -y build-essential gcc g++ python-dev unixodbc unixodbc-dev

In [None]:
docker_file_name = "docker_steps.dockerfile"

In [None]:
from azureml.core.image import Image, ContainerImage
# Specify the runtime, the execution script, the Docker file, the conda env config file, optional tags and a description
image_config = ContainerImage.image_configuration(runtime='python',
                                                  execution_script='score.py',
                                                  docker_file=docker_file_name,
                                                  conda_file=conda_env_file_name,
                                                  tags=None,
                                                  description='Container image for deploying housing price prediction model!')

image = Image.create(name='housingpriceprediction',
                     models=[model],
                     image_config=image_config,
                     workspace=ws)

image.wait_for_creation(show_output=True)

### Deploy as a Web Service

First, define the Azure Container Instances (ACI) web service configuration; then deploy the web service.

In [None]:
from azureml.core.webservice import Webservice
from azureml.core.webservice import AciWebservice

aciconfig = AciWebservice.deploy_configuration(cpu_cores=1,
                                               memory_gb=1,
                                               tags={'ml': 'priceprediction',
                                                     'type': 'automl'},
                                               description='house price prediction exercise')



In [None]:
aci_service_name = 'automlhousepriceprediction'
aci_service = Webservice.deploy_from_image(deployment_config=aciconfig, 
                                          image = image, 
                                          name = aci_service_name,
                                          workspace=ws)

aci_service.wait_for_deployment(True)
print(aci_service)

In [None]:
print(aci_service.get_logs())

## Test the service

In [None]:
import requests
import json

# Send one row from the test set to the service for scoring
input_data = json.dumps({'data': X_test.iloc[0].values.tolist()})

headers = {'Content-Type': 'application/json'}

# For an AKS deployment you also need to pass the service key in the header
# (aks_service would be your AksWebservice object):
# api_key, _ = aks_service.get_keys()
# headers = {'Content-Type': 'application/json', 'Authorization': 'Bearer ' + api_key}

resp = requests.post(aci_service.scoring_uri, input_data, headers=headers)

print("POST to url", aci_service.scoring_uri)
print("input data:", input_data)
print("label:", y_test[0])
print("prediction:", resp.text)
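
The request body must be JSON of the form `{"data": [...]}` so that `score.py` can decode it. A minimal sketch using `json.dumps`, which handles quoting and fails loudly on values JSON cannot represent. One catch: NumPy scalars such as `np.int64` are not JSON-serializable, which is why the row is converted with `.tolist()` first. The feature names and values below are hypothetical.

```python
import json

import numpy as np
import pandas as pd

# Hypothetical single row with the same kind of integer features as X_test.
row = pd.Series({'lotsize': 5000, 'bedrooms': 3, 'bathrms': 1, 'airco': 0})

# .tolist() turns NumPy int64 values into plain Python ints.
payload = json.dumps({'data': row.values.tolist()})

# json.dumps rejects raw NumPy integer scalars.
try:
    json.dumps({'data': row.values[0]})
    numpy_scalar_ok = True
except TypeError:
    numpy_scalar_ok = False

# Decode exactly as score.py does: one row, reshaped to (1, n_features).
decoded = np.array(json.loads(payload)['data']).reshape(1, -1)
print(payload)
print(decoded.shape, numpy_scalar_ok)
```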

In [None]:
# Remove the deployed web service
aci_service.delete()


### Cancel an experiment run

First, retrieve the experiment and the ID of the running run from your workspace.

In [None]:
_experiment = ws.experiments['<experiment name>']
# Find the run ID in the Azure portal: your workspace > Experiments > running experiment
run_id = '<run ID of the running experiment>'

exp_running = Run(experiment=_experiment, run_id=run_id)
exp_running.cancel()