## E-commerce Customer Churn Prediction
Customer churn refers to the loss of existing clients or customers. This solution identifies e-commerce customers who are more likely to churn, that is, to stop ordering from the platform. During the training stage, the solution automatically conducts feature interaction on the training data and selects a subset of features based on feature importance. It then trains multiple models and identifies the best-performing one, which is then used for prediction on new data.

### Contents

1. [Set up the environment](#Set-up-the-environment)
1. [Usage Instructions](#Usage-Instructions)
1. [Upload the data for training](#Upload-the-data-for-training)
1. [Run Training Job](#Run-Training-Job)
1. [Live Inference Endpoint](#Live-Inference-Endpoint)
1. [Batch Transform Job](#Batch-Transform-Job)
1. [Output Interpretation](#Output-Interpretation)



<img src="images/Flow_diagram.JPG">

### Prerequisite

To run this algorithm you need the following:
- Access to AWS SageMaker and the model package.
- An S3 bucket for input and output.
- An IAM role that allows AWS SageMaker to access input and output in S3.

### Input format
#### Input:
Name of the file: <b>train.csv</b><br>
This file contains historical churn data for e-commerce customers. It includes the following columns:<br><br>

<ul>
<li>Tenure: Tenure of the customer in the organization</li>
<li>PreferredLoginDevice: Preferred login device of the customer</li>
<li>CityTier: City tier</li>
<li>WarehouseToHome: Distance between the warehouse and the customer's home</li>
<li>PreferredPaymentMode: Preferred payment method of the customer</li>
<li>Gender: Gender of the customer</li>
<li>HourSpendOnApp: Number of hours spent on the mobile application or website</li>
<li>NumberOfDeviceRegistered: Total number of devices registered to the customer</li>
<li>PreferedOrderCat: Preferred order category of the customer in the last month</li>
<li>SatisfactionScore: Customer's satisfaction score for the service</li>
<li>MaritalStatus: Marital status of the customer</li>
<li>NumberOfAddress: Total number of addresses added for the customer</li>
<li>Complain: Whether a complaint was raised in the last month</li>
<li>OrderAmountHikeFromlastYear: Percentage increase in orders from last year</li>
<li>CouponUsed: Total number of coupons used in the last month</li>
<li>OrderCount: Total number of orders placed in the last month</li>
<li>DaySinceLastOrder: Days since the customer's last order</li>
</ul>
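
Before uploading a training file, it can help to confirm that its header carries the columns listed above. The following is a minimal sketch using only the standard library. `EXPECTED_COLUMNS` is copied from the list above, while `missing_columns` is a hypothetical helper and not part of the solution itself.

```python
import csv
import io

# Column names copied from the feature list above.
EXPECTED_COLUMNS = [
    "Tenure", "PreferredLoginDevice", "CityTier", "WarehouseToHome",
    "PreferredPaymentMode", "Gender", "HourSpendOnApp",
    "NumberOfDeviceRegistered", "PreferedOrderCat", "SatisfactionScore",
    "MaritalStatus", "NumberOfAddress", "Complain",
    "OrderAmountHikeFromlastYear", "CouponUsed", "OrderCount",
    "DaySinceLastOrder",
]


def missing_columns(csv_file):
    """Return the expected columns that are absent from the CSV header row."""
    header = next(csv.reader(csv_file), [])
    present = {name.strip() for name in header}
    return [name for name in EXPECTED_COLUMNS if name not in present]


# Usage with an in-memory file; a real check would pass
# open("train.csv", newline="") instead.
sample = io.StringIO("Tenure,CityTier,Gender\n4,3,Female\n")
print(missing_columns(sample))
```

An empty list means every listed column is present. Extra columns are ignored by this check.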
    






## Set up the environment
Here we specify a bucket to use and the role that will be used for working with SageMaker.

In [1]:
# S3 prefix used for this example's objects
prefix = 'churn-ecom'

import boto3
import re

import os
import numpy as np
import pandas as pd
from sagemaker import get_execution_role

# IAM role that SageMaker assumes to read input from and write output to S3
role = get_execution_role()

## Create the session
The session remembers our connection parameters to SageMaker. We'll use it to perform all of our SageMaker operations.

In [2]:
import sagemaker as sage
from time import gmtime, strftime

sess = sage.Session()

## Upload the data for training
When training large models with huge amounts of data, you'll typically use big data tools, like Amazon Athena, AWS Glue, or Amazon EMR, to create your data in S3. For the purposes of this example, we're using a classification dataset, which we have included.

We can use the tools provided by the SageMaker Python SDK to upload the data to a default bucket. In this example the training file is already in S3, so we only record its location.

In [3]:
data_location= 's3://mphasis-marketplace/churn-prediction-ecomm/input/train.csv'

## Create an estimator and fit the model
In order to use SageMaker to fit our algorithm, we'll create an Estimator that defines how to use the container to train. This includes the configuration we need to invoke SageMaker training:
- The container name. This is the ECR image URI, built from the account ID and region as in the code below.
- The role. As defined above.
- The instance count, which is the number of machines to use for training.
- The instance type, which is the type of machine to use for training.
- The output path, which determines where the model artifact will be written.
- The session, which is the SageMaker session object that we defined above.
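
The configuration listed above can also be gathered as keyword arguments, which makes each value's meaning explicit. This is a minimal sketch that assumes the SageMaker Python SDK v2 parameter names (`image_uri`, `instance_count`, `instance_type`). The helper `estimator_kwargs` and the account, role, and bucket values are hypothetical placeholders.

```python
def estimator_kwargs(image_uri, role, bucket, instance_count, instance_type):
    """Collect the training configuration under SDK v2 keyword names."""
    return {
        "image_uri": image_uri,
        "role": role,
        "instance_count": instance_count,
        "instance_type": instance_type,
        "output_path": "s3://{}/output".format(bucket),
    }


kwargs = estimator_kwargs(
    # Placeholder account ID, region, role, and bucket; not real resources.
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/telecom-pycaret-churn",
    role="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    bucket="example-bucket",
    instance_count=3,
    instance_type="ml.c4.2xlarge",
)
for name, value in kwargs.items():
    print("{} = {}".format(name, value))

# With AWS access, the dictionary would be passed on as:
# tree = sage.estimator.Estimator(sagemaker_session=sess, **kwargs)
```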

Then we use fit() on the estimator to train against the data that we uploaded above.

In [4]:
account = sess.boto_session.client('sts').get_caller_identity()['Account']
region = sess.boto_session.region_name
image = '{}.dkr.ecr.{}.amazonaws.com/telecom-pycaret-churn'.format(account, region)

tree = sage.estimator.Estimator(image,
                       role, 3, 'ml.c4.2xlarge',
                      output_path="s3://{}/output".format(sess.default_bucket()),
                       sagemaker_session=sess)

tree.fit(data_location)

2021-04-28 07:04:20 Starting - Starting the training job...
2021-04-28 07:04:43 Starting - Launching requested ML instances
ProfilerReport-1619593459: InProgress
......
2021-04-28 07:05:43 Starting - Preparing the instances for training......
2021-04-28 07:06:47 Downloading - Downloading input data......
2021-04-28 07:07:43 Training - Downloading the training image......
2021-04-28 07:08:44 Training - Training image download completed. Training in progress.
Starting the training.
(5619, 19)
IntProgress(value=0, description='Processing: ', max=3)
Initiated  . . . . . . . . . . . . . . . . . .              07:08:35
Status     . . . . . . . . . . . . . . . . . .  Loading Dependencies

## Hosting your model
You can use a trained model to get real-time predictions from an HTTP endpoint. The following steps walk through the process.


In [5]:
training_job_name = tree.latest_training_job.name
attached_tree = sage.estimator.Estimator.attach(training_job_name)



2021-04-28 07:10:44 Starting - Preparing the instances for training
2021-04-28 07:10:44 Downloading - Downloading input data
2021-04-28 07:10:44 Training - Training image download completed. Training in progress.
2021-04-28 07:10:44 Uploading - Uploading generated training model
2021-04-28 07:10:44 Completed - Training job completed



### Deploy the model
Deploying the model to SageMaker hosting just requires a deploy call on the fitted model. This call takes an instance count, instance type, and optionally serializer and deserializer functions. These are used when the resulting predictor is created on the endpoint.

In [6]:

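# csv_serializer is the SDK v1 name. In sagemaker>=2 it is a deprecated alias
# (the v2 equivalent is sagemaker.serializers.CSVSerializer()), which is why
# the prediction cell further down prints a rename warning.
from sagemaker.predictor import csv_serializer
predictor = attached_tree.deploy(4, 'ml.m4.xlarge', serializer=csv_serializer, endpoint_name='churn-ecomm-pycaret')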

-------------!
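
To make the serializer's role concrete, the sketch below shows the kind of CSV text an endpoint request body would contain. It is a standard-library illustration, not the SDK's serializer implementation. `rows_to_csv` is a hypothetical helper, and the sample rows are shortened to a few fields.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize an iterable of rows to CSV text, one line per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        # None stands for a missing value and becomes an empty field.
        writer.writerow(["" if value is None else value for value in row])
    return buffer.getvalue()


rows = [
    [4, "Mobile Phone", 3, 6, "Debit Card", "Female"],
    [None, "Phone", 1, 8, "UPI", "Male"],
]
print(rows_to_csv(rows))
```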

## Choose some data and use it for a prediction


In [7]:
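test_data = 's3://mphasis-marketplace/churn-prediction-ecomm/input/test.csv'

# header=None keeps the column names as the first data row, so they are
# sent to the endpoint along with the values
data = pd.read_csv(test_data, encoding='ISO-8859-1', header=None)
data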

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17
0,Tenure,PreferredLoginDevice,CityTier,WarehouseToHome,PreferredPaymentMode,Gender,HourSpendOnApp,NumberOfDeviceRegistered,PreferedOrderCat,SatisfactionScore,MaritalStatus,NumberOfAddress,Complain,OrderAmountHikeFromlastYear,CouponUsed,OrderCount,DaySinceLastOrder,CashbackAmount
1,4,Mobile Phone,3,6,Debit Card,Female,3,3,Laptop & Accessory,2,Single,9,1,11,1,1,5,160
2,,Phone,1,8,UPI,Male,3,4,Mobile,3,Single,7,1,15,0,1,0,121
3,,Phone,1,30,Debit Card,Male,2,4,Mobile,3,Single,6,1,14,0,1,3,120
4,0,Phone,3,15,Debit Card,Male,2,4,Laptop & Accessory,5,Single,8,0,23,0,1,3,134
5,0,Phone,1,12,CC,Male,,3,Mobile,5,Single,3,0,11,1,1,3,130
6,6,Phone,1,14,Debit Card,Male,,1,Mobile,5,Married,7,0,19,6,7,6,131
7,,Phone,1,6,Debit Card,Male,2,3,Mobile,1,Married,2,0,11,0,1,3,120
8,12,Mobile Phone,1,16,Credit Card,Male,3,3,Fashion,3,Single,8,0,21,2,,2,235
9,19,Computer,3,32,Debit Card,Male,3,4,Laptop & Accessory,4,Single,9,0,15,2,2,2,180


In [8]:
predictions = predictor.predict(data.values).decode('utf-8')



The csv_serializer has been renamed in sagemaker>=2.
See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.


In [9]:
print(predictions)

Tenure,PreferredLoginDevice,CityTier,WarehouseToHome,PreferredPaymentMode,Gender,HourSpendOnApp,NumberOfDeviceRegistered,PreferedOrderCat,SatisfactionScore,MaritalStatus,NumberOfAddress,Complain,OrderAmountHikeFromlastYear,CouponUsed,OrderCount,DaySinceLastOrder,CashbackAmount,Label,Score
4.0,Mobile Phone,3,6,Debit Card,Female,3.0,3,Laptop & Accessory,2,Single,9,1,11,1,1.0,5,160,1,0.6702
,Phone,1,8,UPI,Male,3.0,4,Mobile,3,Single,7,1,15,0,1.0,0,121,1,0.939
,Phone,1,30,Debit Card,Male,2.0,4,Mobile,3,Single,6,1,14,0,1.0,3,120,1,0.9791
0.0,Phone,3,15,Debit Card,Male,2.0,4,Laptop & Accessory,5,Single,8,0,23,0,1.0,3,134,1,0.8279
0.0,Phone,1,12,CC,Male,,3,Mobile,5,Single,3,0,11,1,1.0,3,130,1,0.6665
6.0,Phone,1,14,Debit Card,Male,,1,Mobile,5,Married,7,0,19,6,7.0,6,131,0,0.9916
,Phone,1,6,Debit Card,Male,2.0,3,Mobile,1,Married,2,0,11,0,1.0,3,120,0,0.9972
12.0,Mobile Phone,1,16,Credit Card,Male,3.0,3,Fashion,3,Single,8,0,21,2,,2,235,0,0.9962
19.0,Computer,3,32,Debit Card,Male,3.0,4,Laptop & Acce

### Output

The output contains two additional columns: Label, which holds the predicted class, and Score, the score associated with that prediction.
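
The `Label` and `Score` columns can be pulled out of the CSV response with the standard library. This is a minimal sketch: the header and the two rows are copied from the printed response above, and `labels_and_scores` is a hypothetical helper. Rows that are cut off before the last two columns are skipped.

```python
import csv
import io

# Header and first two rows copied from the printed response above.
predictions = (
    "Tenure,PreferredLoginDevice,CityTier,WarehouseToHome,PreferredPaymentMode,"
    "Gender,HourSpendOnApp,NumberOfDeviceRegistered,PreferedOrderCat,"
    "SatisfactionScore,MaritalStatus,NumberOfAddress,Complain,"
    "OrderAmountHikeFromlastYear,CouponUsed,OrderCount,DaySinceLastOrder,"
    "CashbackAmount,Label,Score\n"
    "4.0,Mobile Phone,3,6,Debit Card,Female,3.0,3,Laptop & Accessory,2,Single,"
    "9,1,11,1,1.0,5,160,1,0.6702\n"
    ",Phone,1,8,UPI,Male,3.0,4,Mobile,3,Single,7,1,15,0,1.0,0,121,1,0.939\n"
)


def labels_and_scores(csv_text):
    """Return (Label, Score) pairs, skipping rows that lack either field."""
    pairs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        label, score = row.get("Label"), row.get("Score")
        if label and score:
            pairs.append((label, float(score)))
    return pairs


print(labels_and_scores(predictions))
```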

## Batch Transform Job

In [40]:
transform_output_folder = "batch-transform-output"
output_path="s3://{}/{}".format(sess.default_bucket(), transform_output_folder)

transformer = tree.transformer(instance_count=1,
                               instance_type='ml.m4.xlarge',
                               output_path=output_path)

In [41]:
transformer.transform(test_data, content_type='text/csv')
transformer.wait()
print("Batch Transform output saved to " + transformer.output_path)

.............................
Starting the inference server with 4 workers.
[2021-04-20 15:39:52 +0000] [12] [INFO] Starting gunicorn 20.1.0
[2021-04-20 15:39:52 +0000] [12] [INFO] Listening at: unix:/tmp/gunicorn.sock (12)
[2021-04-20 15:39:52 +0000] [12] [INFO] Using worker: gevent
[2021-04-20 15:39:52 +0000] [16] [INFO] Booting worker with pid: 16
[2021-04-20 15:39:52 +0000] [17] [INFO] Booting worker with pid: 17
[2021-04-20 15:39:52 +0000] [18] [INFO] Booting worker with pid: 18
[2021-04-20 15:39:52 +0000] [19] [INFO] Booting worker with pid: 19
Transformation Pipeline and Model Successfully Loaded
169.254.255.130 - - [20/Apr/2021:15:40:00 +0000] "GET /ping HTTP/1.1" 200 1 "-" "Go-http-client/1.1"
169.254.255.130 - - [20/Apr/2021:15:40:00 +0000] "GET /execution-parameters HTTP/1.1" 404 2 "-" "Go-http-client/1.1"
Transformation Pipeline and Model Successfully Loaded
169.2

#### Inspect the Batch Transform Output in S3

In [42]:
from urllib.parse import urlparse

# Split the transformer's S3 output path into bucket and key prefix
parsed_url = urlparse(transformer.output_path)
bucket_name = parsed_url.netloc
# Batch Transform names each output object after its input file plus ".out"
file_key = '{}/{}.out'.format(parsed_url.path[1:], "test.csv")

s3_client = sess.boto_session.client('s3')

response = s3_client.get_object(Bucket=bucket_name, Key=file_key)
response_text = response['Body'].read().decode('utf-8')
print(response_text)

Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Label,Score
1869,7010-BRBUU,Male,0,Yes,Yes,72,Yes,Yes,No,No internet service,No internet service,No internet service,No internet service,No internet service,No internet service,Two year,No,Credit card (automatic),24.1,1734.65,No,0.9858
4528,9688-YGXVR,Female,0,No,No,44,Yes,No,Fiber optic,No,Yes,Yes,No,Yes,No,Month-to-month,Yes,Credit card (automatic),88.15,3973.2,No,0.7459
6344,9286-DOJGF,Female,1,Yes,No,38,Yes,Yes,Fiber optic,No,No,No,No,No,No,Month-to-month,Yes,Bank transfer (automatic),74.95,2869.85,No,0.5653
6739,6994-KERXL,Male,0,No,No,4,Yes,No,DSL,No,No,No,No,No,Yes,Month-to-month,Yes,Electronic check,55.9,238.5,Yes,0.6318
432,2181-UAESM,Male,0,No,No,2,Yes,No,DSL,Yes,No,Yes,No,No,No,Month-to-month,No,Electronic check,53
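
The key construction used above, where the output object is named after the input file with `.out` appended, can be factored into a small function. This is a sketch under that naming assumption. `transform_output_location` is a hypothetical helper and `example-bucket` is a placeholder bucket name.

```python
import posixpath
from urllib.parse import urlparse


def transform_output_location(output_path, input_uri):
    """Return (bucket, key) of the '.out' object for one input file."""
    parsed = urlparse(output_path)
    input_name = posixpath.basename(urlparse(input_uri).path)
    key = posixpath.join(parsed.path.lstrip("/"), input_name + ".out")
    return parsed.netloc, key


bucket, key = transform_output_location(
    "s3://example-bucket/batch-transform-output",  # placeholder bucket name
    "s3://mphasis-marketplace/churn-prediction-ecomm/input/test.csv",
)
print(bucket, key)
```

Deriving the key from the input URI avoids hard-coding the file name when more than one input file is transformed.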

### View Output
Let's read the results of the transform job above from S3 and print the output.

In [43]:
s3_client = sess.boto_session.client('s3')
# Download the transform output to a local temporary file
s3_client.download_file(sess.default_bucket(), "{}/test.csv.out".format(transform_output_folder), '/tmp/test.csv.out')
with open('/tmp/test.csv.out') as f:
    string_final = f.read()

print(string_final)

# Keep a local copy of the results
with open("Output.txt", "w") as text_file:
    text_file.write(string_final)

Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Label,Score
1869,7010-BRBUU,Male,0,Yes,Yes,72,Yes,Yes,No,No internet service,No internet service,No internet service,No internet service,No internet service,No internet service,Two year,No,Credit card (automatic),24.1,1734.65,No,0.9858
4528,9688-YGXVR,Female,0,No,No,44,Yes,No,Fiber optic,No,Yes,Yes,No,Yes,No,Month-to-month,Yes,Credit card (automatic),88.15,3973.2,No,0.7459
6344,9286-DOJGF,Female,1,Yes,No,38,Yes,Yes,Fiber optic,No,No,No,No,No,No,Month-to-month,Yes,Bank transfer (automatic),74.95,2869.85,No,0.5653
6739,6994-KERXL,Male,0,No,No,4,Yes,No,DSL,No,No,No,No,No,Yes,Month-to-month,Yes,Electronic check,55.9,238.5,Yes,0.6318
432,2181-UAESM,Male,0,No,No,2,Yes,No,DSL,Yes,No,Yes,No,No,No,Month-to-month,No,Electronic check,53