# Customer Churn Modeling Using AutoML

This example uses the output of the Exploratory Data Analysis, Data Preparation, and Feature Engineering steps for a customer churn dataset in Synapse. At the end of the notebook, we train predictive models with Azure's AutoML.


![EDA](https://stretaildemodev.blob.core.windows.net/notebookimages/data_exploration.jpg?sp=r&st=2022-01-05T22:03:05Z&se=2024-01-06T06:03:05Z&spr=https&sv=2020-08-04&sr=b&sig=9krJNQEBJ8%2BrK99JxmwB%2Fg1A5ThefwoFz%2B8ZdYK3ANU%3D)

### Importing libraries

In [1]:
import azureml.core

from azureml.core import Experiment, Workspace, Dataset, Datastore
from azureml.train.automl import AutoMLConfig
# from notebookutils import mssparkutils
from azureml.data.dataset_factory import TabularDatasetFactory

In [2]:
from pyspark.sql import SparkSession
import matplotlib.pyplot as plt 
import seaborn as sns
import numpy as np
from azure.storage.blob import ContainerClient, BlobClient
import pandas as pd
from io import BytesIO
from copy import deepcopy
import GlobalVariables as gv

### Reading and exploring data

In [3]:
# Read the data by connecting to the Azure Blob Storage account
blob = BlobClient.from_connection_string(
    conn_str=gv.CustomerChurnCONNECTIONSTRING,
    container_name=gv.CustomerChurnCONTAINER_NAME,
    blob_name=gv.CustomerChurnBLOBNAME,
)
blob_data = blob.download_blob()
# Fetch the payload once, then parse it in memory
raw_bytes = blob_data.content_as_bytes()
df = pd.read_csv(BytesIO(raw_bytes))
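The parsing half of this cell can be tried without a storage account. The sketch below is a minimal illustration, assuming a small inline byte string in place of the downloaded blob; `parse_csv_bytes` and the sample rows are hypothetical and not part of the original notebook. It also shows how a CSV saved with an unnamed index column comes back with an `Unnamed: 0` column.

```python
from io import BytesIO

import pandas as pd


def parse_csv_bytes(raw: bytes) -> pd.DataFrame:
    """Parse CSV bytes that are already in memory, without a temporary file."""
    return pd.read_csv(BytesIO(raw))


# Stand-in for the bytes returned by the blob download (illustrative rows only).
# The header starts with an empty name, as happens when a DataFrame is written
# with its index; pandas labels that column "Unnamed: 0" on read.
sample_bytes = (
    b",Customer ID,Segment,Quantity,Price\n"
    b"0,13085.0,1,12,6.95\n"
    b"1,13085.0,1,12,6.75\n"
)
sample_df = parse_csv_bytes(sample_bytes)
print(sample_df.columns.tolist())
```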



In [4]:
# Drop the first column, which holds the row index saved with the CSV
df = df.iloc[:, 1:]
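Dropping by position removes another column each time the cell is re-run. A possible alternative, sketched here under the assumption that the leading column is named `Unnamed: 0`, is to drop it by name so that repeated runs are harmless. `drop_saved_index` and the demo frame are hypothetical.

```python
import pandas as pd


def drop_saved_index(frame: pd.DataFrame, name: str = "Unnamed: 0") -> pd.DataFrame:
    """Return a copy without the saved-index column; do nothing if it is absent."""
    return frame.drop(columns=[name], errors="ignore")


# Illustrative frame only; the column name "Unnamed: 0" is an assumption.
demo = pd.DataFrame(
    {"Unnamed: 0": [0, 1], "Customer ID": [13085.0, 13085.0], "Segment": [1, 1]}
)
once = drop_saved_index(demo)
twice = drop_saved_index(once)  # second call leaves the columns unchanged
print(list(twice.columns))
```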

In [5]:
df

| | Customer ID | Segment | Invoice | StockCode | Description | Quantity | InvoiceDate | Price | Country |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 13085.0 | 1 | 489434 | 85048 | 15CM CHRISTMAS GLASS BALL 20 LIGHTS | 12 | 2009-12-01 07:45:00 | 6.95 | 1 |
| 1 | 13085.0 | 1 | 489434 | 79323P | PINK CHERRY LIGHTS | 12 | 2009-12-01 07:45:00 | 6.75 | 1 |
| 2 | 13085.0 | 1 | 489434 | 79323W | WHITE CHERRY LIGHTS | 12 | 2009-12-01 07:45:00 | 6.75 | 1 |
| 3 | 13085.0 | 1 | 489434 | 22041 | RECORD FRAME 7" SINGLE SIZE | 48 | 2009-12-01 07:45:00 | 2.10 | 1 |
| 4 | 13085.0 | 1 | 489434 | 21232 | STRAWBERRY CERAMIC TRINKET BOX | 24 | 2009-12-01 07:45:00 | 1.25 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 370947 | 18102.0 | 0 | 537659 | 21623 | VINTAGE UNION JACK MEMOBOARD | 600 | 2010-12-07 16:43:00 | 6.38 | 1 |
| 370948 | 18102.0 | 0 | 537659 | 85064 | CREAM SWEETHEART LETTER RACK | 160 | 2010-12-07 16:43:00 | 3.88 | 1 |
| 370949 | 18102.0 | 0 | 537659 | 82484 | WOOD BLACK BOARD ANT WHITE FINISH | 600 | 2010-12-07 16:43:00 | 4.78 | 1 |
| 370950 | 18102.0 | 0 | 537659 | 22833 | HALL CABINET WITH 3 DRAWERS | 72 | 2010-12-07 16:43:00 | 32.69 | 1 |


In [6]:
# All columns in the data
df.columns

Index(['Customer ID', 'Segment', 'Invoice', 'StockCode', 'Description',
       'Quantity', 'InvoiceDate', 'Price', 'Country'],
      dtype='object')

In [7]:
# Selecting specific columns for our model
df = df[['Segment','Quantity','Price','StockCode']]
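Because `Segment` is the label and the model is scored on accuracy, it can help to look at the class balance first: if one class dominates, always predicting it already scores well. The sketch below is a minimal illustration on made-up rows. `class_shares` is a hypothetical helper, and the real proportions in the churn data are not shown in this notebook.

```python
import pandas as pd


def class_shares(frame: pd.DataFrame, label: str = "Segment") -> pd.Series:
    """Return the fraction of rows that fall in each class of the label column."""
    return frame[label].value_counts(normalize=True).sort_index()


# Illustrative rows only, not the real churn data.
demo = pd.DataFrame({"Segment": [1, 1, 1, 0], "Quantity": [12, 12, 48, 600]})
shares = class_shares(demo)

# Accuracy of a model that always predicts the most frequent class.
majority_baseline = shares.max()
print(shares.to_dict(), majority_baseline)
```

Any accuracy reported by a trained model is easier to interpret next to this baseline.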

### AutoML models

In [8]:
# Setting up experiment
experiment_name = "syndreamdemoretaildev-CustomerChurnData-20211231061227"
ws = Workspace.get(name=gv.WORKSPACE_NAME,subscription_id=gv.SUBSCRIPTION_ID, resource_group=gv.RESOURCE_GROUP)
experiment = Experiment(ws, experiment_name)
datastore = Datastore.get_default(ws)


In [9]:
dataset = TabularDatasetFactory.register_pandas_dataframe(df, datastore, name = experiment_name + "-dataset")

Validating arguments.
Arguments validated.
Successfully obtained datastore reference and path.
Uploading file to managed-dataset/a3ed3b16-530a-42a4-9e94-de268242c269/
Successfully uploaded file to datastore.
Creating and registering a new dataset.
Successfully created and registered a new dataset.


In [10]:
# Initializing AutoML Config
automl_config = AutoMLConfig(task = "classification",
                             training_data = df,
                             label_column_name = "Segment",
                             primary_metric = "accuracy",
                             experiment_timeout_hours = 0.25,
                             max_concurrent_iterations = 2,
                             enable_onnx_compatible_models = False)
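The configuration above passes the full DataFrame as training data, so no rows are set aside in this notebook for a final check of the chosen model. One option, sketched here as an assumption rather than as part of the original workflow, is to hold out a stratified sample before configuring AutoML. `stratified_holdout` and the demo frame are hypothetical, and the split relies on the DataFrame having a unique index.

```python
import pandas as pd


def stratified_holdout(frame, label="Segment", test_frac=0.2, seed=0):
    """Split into (train, test), sampling test_frac of the rows from each class."""
    test = frame.groupby(label).sample(frac=test_frac, random_state=seed)
    # Everything not sampled into the test set stays in the training set.
    train = frame.drop(index=test.index)
    return train, test


# Illustrative rows only: four rows of class 0 and six rows of class 1.
demo = pd.DataFrame({
    "Segment": [0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
    "Quantity": [600, 160, 600, 72, 12, 12, 12, 48, 24, 6],
})
train_df, test_df = stratified_holdout(demo, test_frac=0.5)
print(len(train_df), len(test_df))
```

With a split like this, `train_df` would take the place of `df` as the training data, and `test_df` would be kept aside to score the best model afterwards.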

In [11]:
from azureml.core import Experiment, Workspace, Dataset, Datastore
from azureml.train.automl import AutoMLConfig
from azureml.data.dataset_factory import TabularDatasetFactory

In [12]:
# Running AutoML
run = experiment.submit(automl_config)

2022-01-18:00:28:50,630 INFO     [modeling_bert.py:226] Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex .
2022-01-18:00:28:50,635 INFO     [modeling_xlnet.py:339] Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex .
2022-01-18:00:28:54,449 INFO     [utils.py:157] NumExpr defaulting to 4 threads.


| Experiment | Id | Type | Status | Details Page | Docs Page |
|---|---|---|---|---|---|
| syndreamdemoretaildev-CustomerChurnData-20211231061227 | AutoML_0e0498eb-7ac8-44e1-b7e7-8f6c19fb0809 | automl | Preparing | Link to Azure Machine Learning studio | Link to Documentation |


2022-01-18:00:45:30,556 INFO     [logging_handler.py:290] Sending 2667 bytes
2022-01-18:00:45:30,558 INFO     [logging_handler.py:304] Finish uploading in 0.515512 seconds.
2022-01-18:00:53:24,276 INFO     [explanation_client.py:332] Using default datastore for uploads


In [13]:
# Choosing best model
import mlflow

run.wait_for_completion()

# Get the best run and its fitted model from the AutoML run
best_run, non_onnx_model = run.get_output()

artifact_path = experiment_name + "_artifact"

mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())
mlflow.set_experiment(experiment_name)

# Use a separate name so the AutoML `run` object is not overwritten
with mlflow.start_run() as mlflow_run:
    # Log the model as an MLflow artifact of this run
    mlflow.sklearn.log_model(non_onnx_model, artifact_path)

    # Register the logged model in the AML model registry
    mlflow.register_model("runs:/" + mlflow_run.info.run_id + "/" + artifact_path, "synretailprod-AdobeAnalytics_AdobeAnalyticsWebsiteContacts-20220113085215-Best")

2022-01-18:00:53:55,416 INFO     [utils.py:117] Parsing artifact uri azureml://experiments/syndreamdemoretaildev-CustomerChurnData-20211231061227/runs/473723a5-9cea-4d9b-a64b-9861799fe54f/artifacts
2022-01-18:00:53:55,418 INFO     [utils.py:128] Artifact uri azureml://experiments/syndreamdemoretaildev-CustomerChurnData-20211231061227/runs/473723a5-9cea-4d9b-a64b-9861799fe54f/artifacts info: {'experiment': 'syndreamdemoretaildev-CustomerChurnData-20211231061227', 'runid': '473723a5-9cea-4d9b-a64b-9861799fe54f'}
Registered model 'synretailprod-AdobeAnalytics_AdobeAnalyticsWebsiteContacts-20220113085215-Best' already exists. Creating a new version of this model...
2022/01/18 00:54:00 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation.                     Model name: synretailprod-AdobeAnalytics_AdobeAnalyticsWebsiteContacts-20220113085215-Best, version 5
Created version '5' of model 'synretailprod-AdobeAnalytics_AdobeAnalyticsWebsit