# Objective:
Demonstrate Azure AutoML on a classification problem. AutoML searches over many combinations of data scalers and tree-based algorithms, and it also tries non-tree algorithms such as Naive Bayes.

## Business problem:
Predict coupon redemption. Full details of the hackathon entry are here: https://github.com/balawillgetyou/dy/blob/master/AmexAV20191006Annotated.ipynb

## Azure ML:
Azure containerization and REST API deployment details are here: 
https://github.com/balawillgetyou/dy/blob/master/AmexMarketing20191029.ipynb

In [45]:
import pandas as pd

# Read each CSV directly; pandas opens and closes the file itself,
# so no file handle is left open.
# The label files are read without a header row (header=None).
X_train = pd.read_csv("X_train.csv", encoding='latin-1')
y_train = pd.read_csv("y_train.csv", encoding='latin-1', header=None)
X_test = pd.read_csv("X_test.csv", encoding='latin-1')
y_test = pd.read_csv("y_test.csv", encoding='latin-1', header=None)

In [46]:
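# Drop leftover columns whose name starts with "Unnamed" (typically a saved index).
X_train = X_train.loc[:, ~X_train.columns.str.contains('^Unnamed')]
X_test = X_test.loc[:, ~X_test.columns.str.contains('^Unnamed')]
# Keep column 1 of the label files, which holds the labels.
y_train = y_train.iloc[:,1]
y_test = y_test.iloc[:,1]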

print(X_train.shape)
print(y_train.shape)
print(X_test.shape)
print(y_test.shape)

(44471, 19)
(44471,)
(7848, 19)
(7848,)
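
The `^Unnamed` filter in the cell above targets the kind of column that appears when a frame is saved with its index and read back. A minimal sketch with a made-up two-row frame (the data and the names `toy`, `buffer`, `reloaded`, and `cleaned` are illustrative, not taken from the hackathon files):

```python
import io

import pandas as pd

# Toy frame standing in for the real feature table (values are made up).
toy = pd.DataFrame({"quantity": [7, 1], "selling_price": [149.60, 63.76]})

# to_csv writes the index as an unlabeled first column by default.
buffer = io.StringIO()
toy.to_csv(buffer)
buffer.seek(0)

# read_csv names that unlabeled column "Unnamed: 0".
reloaded = pd.read_csv(buffer)
print(list(reloaded.columns))

# Same filter as the notebook: keep columns whose name does not start with "Unnamed".
cleaned = reloaded.loc[:, ~reloaded.columns.str.contains("^Unnamed")]
print(list(cleaned.columns))
```

Writing with `to_csv(..., index=False)` or reading with `index_col=0` avoids the extra column in the first place.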


In [47]:
print(y_train.head())
print(X_train.head())

0    0
1    0
2    0
3    0
4    0
Name: 1, dtype: int64
   customer_id  item_id  quantity  selling_price  other_discount  \
0         1463    14223         7         149.60          -47.37   
1          725    22021         1          63.76           -7.12   
2          716    21340         1          45.95            0.00   
3           42    13399         1         102.94          -21.37   
4           93    45712         1         330.91         -131.79   

   coupon_discount  campaign_id  coupon_id  campaign_type  couponValidityDays  \
0             0.00           30         21              0              133.00   
1             0.00            3        884              1               56.00   
2             0.00            8          9              0               77.00   
3             0.00           12          6              1               32.00   
4             0.00           30         23              0              133.00   

   age_range  marital_status  rented  family_si
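
The first five labels are all 0, and the AutoML guardrails later in the notebook flag the classes as imbalanced. A quick way to quantify the skew is `value_counts(normalize=True)`. The sketch below uses made-up labels, and the reading "1 = coupon redeemed" is an assumption; with the real data the call would be `y_train.value_counts(normalize=True)`.

```python
import pandas as pd

# Made-up labels (assumption: 1 = coupon redeemed, 0 = not redeemed).
toy_labels = pd.Series([0, 0, 0, 0, 0, 0, 0, 0, 0, 1])

# Fraction of rows in each class.
class_share = toy_labels.value_counts(normalize=True)
print(class_share)

# Share of the rarest class; a small value signals imbalance.
minority_share = class_share.min()
print(minority_share)
```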

In [48]:
# Load the workspace from the local config.json
from azureml.core.workspace import Workspace
ws = Workspace.from_config()

In [49]:
# Define the experiment parameters and model settings for training.
# They are collected in a dict so they can be passed to AutoMLConfig as keyword arguments with **.
import logging

automl_settings = {
    "iteration_timeout_minutes": 3,
    "experiment_timeout_minutes": 30,
    "enable_early_stopping": True,
    "primary_metric": 'AUC_weighted',
    "featurization": 'auto',
    "verbosity": logging.INFO,
    "n_cross_validations": 5
}
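
The settings dict relies on Python's `**` unpacking, which turns each key into a keyword argument. A minimal sketch of the mechanism, using a hypothetical `build_config` function as a stand-in for any callable that accepts keyword settings (it is not part of the Azure ML SDK):

```python
def build_config(task, **settings):
    """Hypothetical stand-in: collects arbitrary keyword settings into a dict."""
    return {"task": task, **settings}


toy_settings = {
    "iteration_timeout_minutes": 3,
    "primary_metric": "AUC_weighted",
}

# ** unpacks the dict so each key becomes a keyword argument.
config_a = build_config("classification", **toy_settings)

# Equivalent call with the arguments written out explicitly.
config_b = build_config(
    "classification",
    iteration_timeout_minutes=3,
    primary_metric="AUC_weighted",
)

print(config_a == config_b)
```

Keeping the settings in one dict makes it easy to reuse or tweak them across experiments without editing the constructor call.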

In [50]:
# Build the AutoML configuration
from azureml.train.automl import AutoMLConfig

automl_config = AutoMLConfig(task='classification',
                             debug_log='automated_ml_errors.log',
                             X=X_train.values,
                             y=y_train.values.flatten(),
                             **automl_settings)



In [51]:
from azureml.core.experiment import Experiment
experiment = Experiment(ws, "Amex-AutoML")
local_run = experiment.submit(automl_config, show_output=True)

Running on local machine
Parent Run ID: AutoML_2f161a84-4cf1-48b1-bd1d-880955f3d2bd

Current status: DatasetFeaturization. Beginning to featurize the dataset.
Current status: DatasetEvaluation. Gathering dataset statistics.
Current status: FeaturesGeneration. Generating features for the dataset.
Current status: DatasetFeaturizationCompleted. Completed featurizing the dataset.
Current status: DatasetCrossValidationSplit. Generating individually featurized CV splits.

****************************************************************************************************
DATA GUARDRAILS SUMMARY:
For more details, use API: run.get_guardrails()

TYPE:         Class balancing detection
STATUS:       ALERTED
DESCRIPTION:  Classes in the training data are imbalanced.

TYPE:         Missing values imputation
STATUS:       PASSED
DESCRIPTION:  There were no missing values found in the training data.

TYPE:         High cardinality feature detection
STATUS:       PASSED
DESCRIPTION:  Your inputs wer



MaxAbsScaler LightGBM                          0:02:46       0.9879    0.9879
        22   MaxAbsScaler LightGBM                          0:01:49       0.9869    0.9879
        23   MaxAbsScaler ExtremeRandomTrees                0:01:01       0.9456    0.9879
        24   StandardScalerWrapper LightGBM                 0:01:21       0.9919    0.9919
        25   StandardScalerWrapper LightGBM                 0:00:51       0.9756    0.9919
        26   RobustScaler LightGBM                          0:01:16       0.9810    0.9919
        27   RobustScaler LightGBM                          0:01:52       0.9756    0.9919
        28   StandardScalerWrapper LightGBM                 0:00:52       0.8698    0.9919
        29   StandardScalerWrapper LightGBM                 0:01:01       0.9831    0.9919
        30   PCA LightGBM                                   0:01:04       0.7011    0.9919
        31   VotingEnsemble                                 0:03:22       0.9925    0.9925
        32  
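
In the visible rows of the log, the last column behaves like a running maximum of the metric column, i.e. the best score seen so far. The sketch below checks that reading against the values copied from the rows above. Earlier rows are cut off in this export, so the check only covers what is shown.

```python
from itertools import accumulate

# Metric values copied from the visible log rows, top to bottom.
metrics = [0.9879, 0.9869, 0.9456, 0.9919, 0.9756, 0.9810,
           0.9756, 0.8698, 0.9831, 0.7011, 0.9925]

# Last-column values copied from the same rows.
logged_best = [0.9879, 0.9879, 0.9879, 0.9919, 0.9919, 0.9919,
               0.9919, 0.9919, 0.9919, 0.9919, 0.9925]

# Running maximum of the metric column.
running_best = list(accumulate(metrics, max))
print(running_best == logged_best)
```

The final jump to 0.9925 comes from the `VotingEnsemble` row, which edges out the best single pipeline (0.9919).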

In [52]:
#Explore the results
from azureml.widgets import RunDetails
RunDetails(local_run).show()

In [53]:
#Retrieve the best model
best_run, fitted_model = local_run.get_output()
print('*'*100)
print(best_run)
print('*'*100)
print(fitted_model)

****************************************************************************************************
Run(Experiment: Amex-AutoML,
Id: AutoML_2f161a84-4cf1-48b1-bd1d-880955f3d2bd_31,
Type: None,
Status: Completed)
****************************************************************************************************
Pipeline(memory=None,
     steps=[('datatransformer', DataTransformer(enable_dnn=None, enable_feature_sweeping=None,
        feature_sweeping_config=None, feature_sweeping_timeout=None,
        featurization_config=None, is_cross_validation=None,
        is_onnx_compatible=None, logger=None, observer=None, task=None)), ('pref...666666666667, 0.06666666666666667, 0.06666666666666667, 0.06666666666666667, 0.06666666666666667]))])
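
The best run's Id ends in `_31`, which matches the `VotingEnsemble` row in the training log, and the repeated `0.0666...` values at the end of the truncated repr look like equal member weights. Assuming the ensemble uses soft voting, meaning a weighted average of the members' predicted class probabilities (an assumption, since the repr is cut off before it confirms this), the idea can be sketched as follows. The function `soft_vote`, the three toy models, their probabilities, and the 0.5 threshold are all illustrative.

```python
def soft_vote(probabilities, weights):
    """Weighted average of per-model positive-class probabilities."""
    total = sum(weights)
    return sum(p * w for p, w in zip(probabilities, weights)) / total


# Three hypothetical base models with equal weights (made-up probabilities).
toy_probs = [0.20, 0.70, 0.90]
toy_weights = [1 / 3, 1 / 3, 1 / 3]

ensemble_prob = soft_vote(toy_probs, toy_weights)

# Illustrative decision rule: predict class 1 at or above 0.5.
predicted_label = int(ensemble_prob >= 0.5)
print(ensemble_prob, predicted_label)
```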
