# Microsoft Azure AutoML Exercise

### Purpose and Challenge
The purpose of this notebook is for the user to build and deploy a machine learning application using the Azure Machine Learning service, which is in preview.

This notebook contains incomplete code cells that you must complete for it to run properly. The exercise lets you practice the material explained in the previous example.


## 1. Acquire and Prepare Data

For this exercise, we use a simple dataset named "House price prediction". You can find it here: https://vincentarelbundock.github.io/Rdatasets/csv/Ecdat/Housing.csv
We have it as a CSV file in the same folder. We will use pandas to read the file into a DataFrame, which we will use later to train the model.

The filename is 'Housing.csv'. The delimiter is ','.

In [None]:
import pandas as pd
import numpy as np
data = pd.read_csv('Housing.csv', dtype = {
                                            'driveway':'category',
                                            'recroom':'category',
                                            'fullbase':'category',
                                            'gashw':'category',
                                            'airco':'category', 
                                            'prefarea': 'category'}
                  )
#take a look at the data
data.head()

In [None]:
data.dtypes

Convert the categorical columns to numeric ones. As the DataFrame shows, they contain only binary values (yes/no), so we can translate them into 1/0.

In [None]:
for col in data.columns:
    if data[col].dtype.name == 'category':
        # 'yes' becomes 1, anything else becomes 0; the result keeps the DataFrame's index
        data[col] = (data[col] == 'yes').astype(int)
data.head()
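
The conversion above sends every value that is not exactly 'yes' to 0, including typos or missing values. As a hedged alternative, the sketch below uses an explicit mapping so that unexpected values surface as NaN instead. The toy DataFrame and its values are made up for illustration and are not part of the Housing data.

```python
import pandas as pd

# Toy frame for illustration only; 'Yes' (capitalized) is a deliberate typo.
toy = pd.DataFrame({
    'driveway': ['yes', 'no', 'Yes'],
    'airco': ['no', 'no', 'yes'],
})

yes_no = {'yes': 1, 'no': 0}

# Series.map leaves values that are missing from the dict as NaN,
# so anything other than 'yes'/'no' is easy to spot.
converted = toy.apply(lambda col: col.map(yes_no))

# Count the values that did not match the mapping, per column.
unmapped = converted.isna().sum()
print(unmapped)
```

A non-zero count for a column is a signal to inspect its raw values before training.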

We will go straight to model training, without doing EDA (exploratory data analysis) or applying any feature engineering techniques.

## 2. Automated ML

Import the Azure ML libraries for automated ML.

In [None]:
import logging, os, random, time

import matplotlib.pyplot as plt
import numpy as np

import azureml.core
from azureml.core.experiment import Experiment
from azureml.core.workspace import Workspace
from azureml.train.automl import AutoMLConfig
from azureml.train.automl.run import AutoMLRun
from azureml.widgets import RunDetails
from azureml.core.model import Model
from azureml.pipeline.core import PipelineRun
from azureml.core.run import Run

Provide your machine learning workspace details to load the workspace. You will need to complete Microsoft multi-factor authentication (MFA).

In [None]:
subscription_id = 'the subscription id'
resource_group = 'Resource group name'
workspace_name = 'the workspace name'
workspace_region = 'workspace region'

# load an existing workspace
try:
    # write here the code
except Exception:
    print('Workspace not found.')


# if you have not already created a workspace, you can create one with a single line of code

# load an already created workspace from its saved configuration file
ws = Workspace.from_config()

In [None]:
ws.get_details()

### Create the Experiment by assigning a name to it. 

In [None]:

path_project_folder = './'


output = dict()

output['SDK version'] = azureml.core.VERSION
output['Subscription ID'] = ws.subscription_id
output['Workspace Name'] = ws.name
output['Resource Group'] = ws.resource_group
output['Location'] = ws.location

pd.DataFrame(data = output, index=['']).T

#### Splitting the data into training, validation and test sets

We will use train_test_split from scikit-learn 

In [None]:
from sklearn.model_selection import train_test_split
# split the data into train, validation and test sets. Make sure you do not include the target and id columns in the feature sets.
# name the test split X_test and y_test, because later cells use those names.
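
One possible way to obtain three sets is to call train_test_split twice: first to hold out the test set, then to carve a validation set out of the remainder. The sketch below uses made-up lists instead of the housing data, and the 60/20/20 proportions and the random_state value are assumptions, not requirements of the exercise.

```python
from sklearn.model_selection import train_test_split

# Made-up features and target, for illustration only.
features = [[i, i * 2] for i in range(100)]
target = [i * 3 for i in range(100)]

# First split: hold out 20% of the rows as the test set.
X_rest, X_test, y_rest, y_test = train_test_split(
    features, target, test_size=0.2, random_state=42)

# Second split: 25% of the remaining 80% is 20% of the original data.
X_train, X_valid, y_train, y_valid = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=42)

print(len(X_train), len(X_valid), len(X_test))
```

Passing the features and the target to the same call keeps each row paired with its label in every split.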


In [None]:
#Helper
from azureml.telemetry import set_diagnostics_collection
set_diagnostics_collection(send_diagnostics = True)

## 3. AutoML Configuration

In [None]:
# fill in the parameters
automl_config = AutoMLConfig(task=,
                             primary_metric=,
                             iteration_timeout_minutes=5,
                             iterations=10,
                             max_cores_per_iteration=1,
                             preprocess=,
                             X=,
                             y=,
                             X_valid=,
                             y_valid=,
                             auto_blacklist=True,
                             # n_cross_validations=3,
                             debug_log='house_logs.log',
                             verbosity=logging.ERROR,
                             path=path_project_folder,
                             whitelist_models=[]
                            )

In [None]:
#you should create an experiment and submit it to be executed
experiment_name = 'write a name for your experiment'
experiment = # create an experiment 
local_run = #submit an experiment

In [None]:
#show the details of the execution
RunDetails(local_run).show()

## 4. Get the Best Model


In [None]:
# write the code to retrieve the best model; store it in a variable named `fitted`, which the next cells use.



### Test the best model with the test data that we split earlier

In [None]:
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

# scikit-learn metrics take the ground truth first, then the predictions
y_pred = fitted.predict(X_test)
print('r2_score :', r2_score(y_test, y_pred))
print('root mean squared error : ', np.sqrt(mean_squared_error(y_test, y_pred)))
print('mean absolute error : ', mean_absolute_error(y_test, y_pred))
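
To make the three metrics concrete, the sketch below computes them by hand on made-up numbers using only the standard library. It also shows that R² depends on which argument is treated as the ground truth, while RMSE and MAE do not, which is why the argument order passed to r2_score matters. The function names here are illustrative and are not part of scikit-learn.

```python
import math

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root of the mean squared difference."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean of the absolute differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up values, for illustration only.
truth = [3.0, 5.0, 7.0, 9.0]
preds = [2.5, 5.5, 7.0, 10.0]

print('r2 (truth first):', r2(truth, preds))
print('r2 (swapped)    :', r2(preds, truth))
print('rmse            :', rmse(truth, preds))
print('mae             :', mae(truth, preds))
```

The swapped call divides by the spread of the predictions instead of the spread of the true values, so it reports a different score for the same errors.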

In [None]:
test_pred = plt.scatter(y_test, y_pred, color='b')
test_test = plt.scatter(y_test, y_test, color='g')
plt.legend((test_pred, test_test), ('prediction', 'truth'), loc='upper left', fontsize=8)
plt.show()

## 5. Deploy the Model
Now that we have successfully trained the model, we are ready to deploy it.

1. Register the model in our workspace
2. Create a score script for the Web Service
3. Create a YAML file for the environment
4. Create a Container Image
5. Deploy as a Web Service

### Model Registration

In [None]:
# Write here the code to register the best model. There are two ways of doing it; choose whichever you prefer.


### Create a score script for Web Service

In [None]:
%%writefile score.py
# Scoring Script
import json
import numpy as np

# sklearn.externals.joblib was removed from newer scikit-learn releases; fall back for older ones
try:
    import joblib
except ImportError:
    from sklearn.externals import joblib

from azureml.core.model import Model

# makes the AutoML classes importable when the model is unpickled
import azureml.train.automl

def init():
    global model
    # retrieve the path to the model file using the registered model name
    # (replace this name with the one you used when registering your model)
    model_path = Model.get_model_path('AutoMLaa94670e5best')
    print(model_path)
    model = joblib.load(model_path)


def run(raw_data):
    # parse the JSON payload; a single flat row becomes a 1 x n_features array,
    # while a list of rows keeps one row per prediction
    data = np.atleast_2d(np.array(json.loads(raw_data)['data']))
    # make prediction
    y_hat = model.predict(data)
    return json.dumps(y_hat.tolist())
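
Before building an image, it can help to check the JSON contract of run() locally. The sketch below is a simplified stand-in, not the deployed script: MeanModel is a hypothetical dummy that replaces the registered AutoML model, and plain lists replace the NumPy array.

```python
import json

class MeanModel:
    """Hypothetical stand-in for the trained model: predicts the mean of each row."""
    def predict(self, rows):
        return [sum(row) / len(row) for row in rows]

model = MeanModel()

def run(raw_data):
    # Parse the request body and read the rows under the "data" key.
    rows = json.loads(raw_data)['data']
    # Accept a single flat row as well as a list of rows.
    if rows and not isinstance(rows[0], list):
        rows = [rows]
    # Return the predictions as a JSON list, one entry per row.
    return json.dumps(model.predict(rows))

single = run(json.dumps({'data': [1.0, 2.0, 3.0]}))
batch = run(json.dumps({'data': [[1.0, 2.0, 3.0], [4.0, 6.0, 8.0]]}))
print(single, batch)
```

The same two payload shapes, a flat row and a list of rows, are the ones worth trying against the real service once it is deployed.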

### Create a YAML file with the conda environment and dependencies

In [None]:
from azureml.core.conda_dependencies import CondaDependencies

myenv = # complete the code to create the conda environment with its conda and pip package dependencies
conda_env_file_name = 'my_conda_env.yml'
myenv.save_to_file('.',conda_env_file_name)

In [None]:
with open("my_conda_env.yml","r") as f:
    print(f.read())

### Create a Container Image

In [None]:
%%writefile docker_steps.dockerfile
RUN apt-get update && \
    apt-get upgrade -y && \
    apt-get install -y build-essential gcc g++ python-dev unixodbc unixodbc-dev

In [None]:
docker_file_name = "docker_steps.dockerfile"

In [None]:
from azureml.core.image import Image, ContainerImage
# create the container image, which will later be deployed as a service; store it in a variable named `image`

# specify the runtime, the execution script, the Docker file name, the conda env config file, and optional tags and description

image.wait_for_creation(show_output=True)

### Deploy as a Web Service

First, write the web service configuration. Then deploy the web service.

In [None]:
from azureml.core.webservice import Webservice
from azureml.core.webservice import AciWebservice

aciconfig = AciWebservice.deploy_configuration(cpu_cores=1,
                                               memory_gb=1,
                                               tags={'ml': 'priceprediction',
                                                     'type': 'automl'},
                                               description='house price prediction exercise'
                                               )



In [None]:
aci_service_name = 'write a name for your service'
aci_service = # write the code to deploy the service from the image

aci_service.wait_for_deployment(True)
print(aci_service.state)

In [None]:
print(aci_service.get_logs())

## Test the service

In [None]:
import requests
import json

# send the first row of the test set to score
input_data = json.dumps({'data': X_test.values[:1].tolist()})

headers = {'Content-Type': 'application/json'}

# for AKS deployment you'd need to add the service key to the header as well
# api_key = service.get_key()
# headers = {'Content-Type':'application/json',  'Authorization':('Bearer '+ api_key)}

resp = requests.post(aci_service.scoring_uri, input_data, headers=headers)

print("POST to url", aci_service.scoring_uri)
print("input data:", input_data)
# print the label of the same row that was sent
print("label:", y_test[:1])
print("prediction:", resp.text)
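
The request body is a JSON object with a single "data" key. The sketch below, using only the standard library and made-up rows, shows how json.dumps builds that body and why relying on str() of a Python list is fragile: it happens to work for plain numbers but not for values such as None.

```python
import json

rows = [[5850, 3, 1], [4000, 2, 1]]  # made-up feature rows

# json.dumps always emits valid JSON.
body = json.dumps({'data': rows})

# str() gives equivalent text for plain numbers...
concatenated = '{"data": ' + str(rows) + '}'
same_for_numbers = json.loads(concatenated) == json.loads(body)

# ...but Python's repr of None is not JSON, so this body cannot be parsed.
bad_body = '{"data": ' + str([[5850, None, 1]]) + '}'
try:
    json.loads(bad_body)
    parsed_bad_body = True
except json.JSONDecodeError:
    parsed_bad_body = False

print(same_for_numbers, parsed_bad_body)
```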


In [None]:
# Write here the code to remove the service you created.



### Cancel an experiment 

First, retrieve the experiment from your workspace.

In [None]:
_experiment = ws.experiments['<experiment name>']
run_id = '<check the Azure portal: your workspace / Experiments / running experiment, and copy the Run ID>'

exp_running = Run(experiment=_experiment, run_id=run_id)
exp_running.cancel()