Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

# Classifying Commercial Blocks with Microsoft Azure
_**Comparing Automated Machine Learning with three standard Scikit Learn Models.**_

---

## Contents
1. [Introduction](#Introduction)
1. [Setup](#Setup)
    1. [Importing Libraries](#Importing-Libraries)
    1. [Accessing the Azure Workspace](#Accessing-the-Azure-Workspace)
    1. [Creating an Experiment](#Creating-an-Experiment)
    1. [Managing Dependencies](#Managing-Dependencies)
1. [Data](#Data)
1. [Classifying with Scikit Learn](#Classifying-with-Scikit-Learn)
1. [Classifying with Automated Machine Learning](#Classifying-with-Automated-Machine-Learning)
1. [Results](#Results)
1. [Test](#Test)

---

## Introduction
This notebook trains classification models on a dataset of broadcast data to predict whether a given television segment is a commercial. It compares a locally run Scikit Learn experiment with Microsoft Azure Automated Machine Learning.

<img src="https://cdn-images-1.medium.com/max/1600/1*oPd4rTg2WV0Ph_dD8AwAAg.png" style="width:40%;">

---

## Setup

### Importing Libraries

In [12]:
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# IMPORT AZURE LIBRARIES
# Azure Notebook Libraries
import azureml.core
from azureml.core.experiment import Experiment
from azureml.core.workspace import Workspace
from azureml.train.automl import AutoMLConfig

### Accessing the Azure Workspace
The workspace configuration file (`config.json`) is stored one level above the public repository and is not available to anyone outside the workspace.

In [13]:
# Load workspace
from azureml.core import Workspace

ws = Workspace.from_config()

Found the config file in: C:\Users\house\Documents\GitHub\config.json
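
`Workspace.from_config()` reads the workspace details from a `config.json` file. As a rough sketch, the file is expected to hold three keys: `subscription_id`, `resource_group`, and `workspace_name`. The snippet below writes a placeholder file to a temporary folder and checks that those keys are present. The helper `missing_config_keys` and all values are hypothetical, and no workspace is contacted.

```python
import json
import os
import tempfile

# Keys that Workspace.from_config() expects to find in config.json.
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def missing_config_keys(path):
    """Return the required keys that are absent from the JSON file at `path`."""
    with open(path, "r") as f:
        config = json.load(f)
    return [key for key in REQUIRED_KEYS if key not in config]


# Placeholder values for illustration only; substitute real workspace details.
example_config = {
    "subscription_id": "<subscription-id>",
    "resource_group": "<resource-group>",
    "workspace_name": "<workspace-name>",
}

# Write the placeholder file to a temporary folder (hypothetical location).
config_path = os.path.join(tempfile.mkdtemp(), "config.json")
with open(config_path, "w") as f:
    json.dump(example_config, f, indent=4)

print(missing_config_keys(config_path))
```

A file stored somewhere else can be passed explicitly with `Workspace.from_config(path=...)`.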


### Creating an Experiment

In [15]:
# Choose a name for the experiment and specify the project folder.
from azureml.core.experiment import Experiment

experiment_name = 'simple_classification'
project_folder = './sample_projects/simple_classification'

experiment = Experiment(ws, experiment_name)


### Managing Dependencies

In [16]:
from azureml.core.runconfig import RunConfiguration

# Edit a run configuration property on the fly.
run_config_user_managed = RunConfiguration()

run_config_user_managed.environment.python.user_managed_dependencies = True
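
Setting `user_managed_dependencies = True` means the run uses the existing Python environment as-is, so every package the training script imports has to be installed already. A minimal sketch of a pre-flight check is shown below. The helper `missing_packages` and the package list are assumptions for illustration, not part of the Azure ML SDK.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Hypothetical list of top-level packages the training script relies on.
required = ["azureml", "sklearn", "numpy"]

print("Missing packages:", missing_packages(required))
```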

---

## Data

### Import Data from Local Machine

In [17]:
# Data Upload Functions
from sklearn.datasets import load_svmlight_file

def get_data(filepath):
    data = load_svmlight_file(filepath)
    return data[0], data[1]

# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# IMPORT DATA
print("\nImporting Data...")

X_train, y_train = get_data("Data/HW2.train.txt")
X_test, y_test = get_data("Data/HW2.test.txt")

X_train = X_train.toarray()  # convert sparse matrix to a dense array
X_test = X_test.toarray()
print("Data imported.")




Importing Data...
Data imported.
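
The data files are in svmlight format, where each line holds a label followed by `index:value` pairs and omitted features are zero. The sketch below loads two made-up records from memory instead of the real `Data/HW2.*.txt` files, assuming one-based feature indices, to show what `load_svmlight_file` returns and what `toarray()` does.

```python
import io

from sklearn.datasets import load_svmlight_file

# Two hypothetical records in svmlight format: "<label> <index>:<value> ...".
# Feature indices are one-based here; features that are omitted are zero.
sample = b"1 1:0.5 3:2.0\n-1 2:1.5\n"

X_sparse, y = load_svmlight_file(io.BytesIO(sample), zero_based=False)

# The loader returns a SciPy sparse matrix; toarray() makes it dense.
X_dense = X_sparse.toarray()

print(X_dense.shape)
print(y)
```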


### Import Data from Datastore -- IN PROGRESS

In [18]:
'''
datastores = ws.datastores
for name, ds in datastores.items():
    print(name, ds.datastore_type)
'''


In [19]:
'''
from azureml.core import Datastore

ds = ws.get_default_datastore()
'''


In [20]:
#print(ds)

---

## Classifying with Scikit Learn

### Read Local Training Script

In [21]:
# Print the training script so its contents are visible in the notebook.
with open('./train.py', 'r') as f:
    print(f.read())
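
The contents of `train.py` are not reproduced in this notebook. As a purely hypothetical stand-in, the sketch below shows how a script might compare three Scikit Learn classifiers by F1 score. The synthetic data, the choice of models, and every variable name are assumptions and are not taken from the actual script.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the broadcast data (the real files are not used here).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Three arbitrary Scikit Learn classifiers, chosen only for illustration.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "k_nearest_neighbors": KNeighborsClassifier(n_neighbors=5),
}

# Fit each model and record its F1 score on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    scores[name] = f1_score(y_te, model.predict(X_te))

for name, score in scores.items():
    print("%s f1 score: %.2f" % (name, score))
```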

### Run Experiment
Submit the local `train.py` script to the experiment as a run.

In [28]:
from azureml.core import ScriptRunConfig

src = ScriptRunConfig(source_directory='./', script='train.py', run_config=run_config_user_managed)
run = experiment.submit(src)

In [29]:
run

| Experiment | Id | Type | Status | Details Page | Docs Page |
|---|---|---|---|---|---|
| simple_classification | simple_classification_1552057879_eecb2a4b | azureml.scriptrun | Running | Link to Azure Portal | Link to Documentation |


---

## Classifying with Automated Machine Learning

### Configure Automated ML for Classification

In [30]:
import logging

# Configure the Automated ML run; the training data comes from the arrays loaded above.
automl_config = AutoMLConfig(task='classification',
                             debug_log='automl_errors.log',
                             primary_metric='AUC_weighted',
                             iteration_timeout_minutes=60,
                             iterations=5,
                             n_cross_validations=3,
                             verbosity=logging.INFO,
                             X=X_train,
                             y=y_train,
                             path=project_folder)
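
The configuration scores each candidate pipeline with three-fold cross-validation on an AUC metric. The sketch below illustrates that idea with plain Scikit Learn on synthetic two-class data. It is not how Automated ML computes `AUC_weighted` internally, and the model and data are assumptions chosen only for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic two-class data standing in for X_train and y_train.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Three folds, scored by ROC AUC, to mirror the n_cross_validations setting.
fold_scores = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, cv=3, scoring="roc_auc"
)

print("AUC per fold:", fold_scores)
print("Mean AUC: %.4f" % fold_scores.mean())
```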

### Run the Automated ML Experiment Locally

In [32]:
local_run = experiment.submit(automl_config, show_output = True)

Running on local machine
Parent Run ID: AutoML_684a7db4-db57-4c4e-95e2-8f0db30d1ec6
********************************************************************************************************************
ITERATION: The iteration being evaluated.
PIPELINE: A summary description of the pipeline being evaluated.
SAMPLING %: Percent of the training data to sample.
DURATION: Time taken for the current iteration.
METRIC: The result of computing score on the fitted pipeline.
BEST: The best observed score thus far.
********************************************************************************************************************

 ITERATION   PIPELINE                                       SAMPLING %  DURATION      METRIC      BEST
         0   StandardScalerWrapper LightGBM                 100.0000    0:00:26       0.9515    0.9515
         1   StandardScalerWrapper LightGBM                 100.0000    0:00:27       0.9580    0.9580
         2   SparseNormalizer LightGBM                      100

In [34]:
local_run

| Experiment | Id | Type | Status | Details Page | Docs Page |
|---|---|---|---|---|---|
| simple_classification | AutoML_684a7db4-db57-4c4e-95e2-8f0db30d1ec6 | automl | Completed | Link to Azure Portal | Link to Documentation |


## Results

In [35]:
from azureml.widgets import RunDetails
RunDetails(local_run).show()

_AutoMLWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', 's…

### Select Best Classification Model

In [36]:
best_run, fitted_model = local_run.get_output()
print(best_run)
print(fitted_model)

Run(Experiment: simple_classification,
Id: AutoML_684a7db4-db57-4c4e-95e2-8f0db30d1ec6_1,
Type: None,
Status: Completed)
Pipeline(memory=None,
     steps=[('StandardScalerWrapper', <automl.client.core.common.model_wrappers.StandardScalerWrapper object at 0x000001EA64676E80>), ('LightGBMClassifier', <automl.client.core.common.model_wrappers.LightGBMClassifier object at 0x000001EA6468EB38>)])
Y_transformer(['LabelEncoder', LabelEncoder()])
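
The printed model is a pipeline of named steps: a scaler followed by a classifier. Assuming the fitted model behaves like a regular Scikit Learn `Pipeline`, its steps can be listed by name and type. The sketch below uses a stand-in pipeline with hypothetical step names rather than the Automated ML result itself.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in pipeline; the Automated ML result wraps its own scaler and LightGBM.
pipeline = Pipeline(steps=[
    ("scaler", StandardScaler()),
    ("classifier", LogisticRegression()),
])

# Each step is a (name, estimator) pair, listed in execution order.
step_summary = [(name, type(step).__name__) for name, step in pipeline.steps]
print(step_summary)
```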


## Test

In [37]:
y_pred = fitted_model.predict(X_test)
y_pred

array([-1.,  1.,  1., ...,  1.,  1., -1.])

In [38]:
from sklearn.metrics import f1_score
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

print('f1 score: %.2f' % f1_score(y_test, y_pred))

f1 score: 0.89
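
By default `f1_score` treats label `1` as the positive class, so the score above is the harmonic mean of precision and recall for that class. The sketch below works through that calculation by hand on a few made-up labels. The helper `f1_binary` and the example labels are hypothetical and are not part of the test data.

```python
def f1_binary(y_true, y_pred, pos_label=1):
    """F1 score for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == pos_label and p == pos_label)
    fp = sum(1 for t, p in pairs if t != pos_label and p == pos_label)
    fn = sum(1 for t, p in pairs if t == pos_label and p != pos_label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels using the same -1 / 1 encoding as y_test.
y_true_example = [1, 1, -1, -1, 1]
y_pred_example = [1, -1, -1, 1, 1]

print("f1 score: %.2f" % f1_binary(y_true_example, y_pred_example))
```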


In [None]:

# IN PROGRESS
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Plot results with ROC curve
# The labels are binary (-1 / 1), so a single ROC curve is enough.
# Assumption: fitted_model exposes predict_proba and its second column
# holds the score for the positive class (label 1).
y_score = fitted_model.predict_proba(X_test)[:, 1]

# Compute ROC curve and ROC area
fpr, tpr, _ = roc_curve(y_test, y_score)
roc_auc = auc(fpr, tpr)

plt.plot(fpr, tpr, label='ROC curve (area = %.2f)' % roc_auc)
plt.plot([0, 1], [0, 1], linestyle='--')  # chance level
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.legend(loc='lower right')
plt.show()