### Newspaper Customer Churn Prediction
Customer churn refers to the loss of existing clients or customers. This solution identifies newspaper customers who are likely to close their account and leave the newspaper service. During the training stage, the solution automatically conducts feature interaction on the training data and selects a subset of features based on feature importance. It then trains multiple models, identifies the best-performing one, and uses that model for prediction on new data.

### Contents

1. [Set up the environment](#Set-up-the-environment)
1. [Usage Instructions](#Usage-Instructions)
1. [Upload the data for training](#Upload-the-data-for-training)
1. [Run Training Job](#Run-Training-Job)
1. [Live Inference Endpoint](#Hosting-your-model)
1. [Batch Transform Job](#Batch-Transform-Job)
1. [Output Interpretation](#Output-Interpretation)



<img src="images/Flow_diagram.JPG">

### Prerequisite

To run this algorithm you need the following:
- Access to Amazon SageMaker and the model package.
- An S3 bucket for input and output data.
- An IAM role that allows SageMaker to read input from and write output to S3.

### Input format
#### Input:
Name of the file: <b>train.csv</b><br>
This file contains historical churn data for newspaper customers<br><br>

<ul>
<li>Any continuous or categorical variables, with any column names, can be used as input</li>
<li>The target variable should be named 'Churn'</li>
</ul>
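
As a rough illustration of the input rules above, the following sketch checks a training file's header before it is sent to SageMaker. It is not part of the solution itself: the function name `check_training_csv` and the duplicate-column check are assumptions made for this example, and the `Churn` value `NO` in the sample row is a placeholder.

```python
import csv
import io


def check_training_csv(text, target="Churn"):
    """Return the feature column names, or raise ValueError if the header is unusable."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if not header:
        raise ValueError("file is empty")
    if target not in header:
        raise ValueError("missing target column {!r}".format(target))
    if len(header) != len(set(header)):
        raise ValueError("duplicate column names in header")
    # Everything except the target is treated as a feature
    return [name for name in header if name != target]


# Tiny in-memory sample; the 'NO' target value is a placeholder for illustration
sample = 'HH Income,Home Ownership,Churn\n"$125,000 - $149,999",OWNER,NO\n'
print(check_training_csv(sample))
```

Running a check like this locally can catch a misnamed target column before a training job is started.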

    






## Set up the environment
Here we specify the S3 prefix to use and the IAM role that will be used for working with SageMaker.

In [14]:
# S3 prefix
prefix = 'churn-newspaper'

# Define IAM role
import boto3
import re

import os
import numpy as np
import pandas as pd
from sagemaker import get_execution_role

role = get_execution_role()

## Create the session
The session remembers our connection parameters to SageMaker. We'll use it to perform all of our SageMaker operations.

In [15]:
import sagemaker as sage
from time import gmtime, strftime

sess = sage.Session()

## Upload the data for training
When training large models with huge amounts of data, you'll typically use big data tools, like Amazon Athena, AWS Glue, or Amazon EMR, to create your data in S3. For the purposes of this example, we're using a classification dataset, which we have included.

We can use the tools provided by the SageMaker Python SDK to upload the data to a default bucket.

In [16]:
data_location = 's3://mphasis-marketplace/churn-prediction-newspaper/input/train.csv'
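
The cell above points at a file that is already in S3. If you want to train on your own file instead, a minimal sketch of the upload step could look like the following. It assumes `session` is a `sagemaker.Session` (such as `sess`), relies on its `upload_data` method, and uses a hypothetical helper name, `upload_training_file`; the `input` sub-folder is also just a choice made for this example.

```python
def upload_training_file(session, local_path, prefix="churn-newspaper"):
    """Upload local_path to the session's default bucket and return its S3 URI.

    `session` is assumed to be a sagemaker.Session; upload_data() returns the
    S3 URI of the uploaded object.
    """
    return session.upload_data(local_path, key_prefix="{}/input".format(prefix))


# Usage (needs AWS credentials, so it is left commented out):
# data_location = upload_training_file(sess, "train.csv", prefix)
```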

## Create an estimator and fit the model
In order to use SageMaker to fit our algorithm, we'll create an Estimator that defines how to use the container to train. This includes the configuration we need to invoke SageMaker training:
- The container name. This is built from the AWS account ID and region, as in the code below.
- The role. As defined above.
- The instance count, which is the number of machines to use for training.
- The instance type, which is the type of machine to use for training.
- The output path, which determines where the model artifact will be written.
- The session, which is the SageMaker session object that we defined above.

Then we use fit() on the estimator to train against the data that we uploaded above.

In [17]:
account = sess.boto_session.client('sts').get_caller_identity()['Account']
region = sess.boto_session.region_name
image = '{}.dkr.ecr.{}.amazonaws.com/telecom-pycaret-churn'.format(account, region)

tree = sage.estimator.Estimator(image_uri=image,
                                role=role,
                                instance_count=3,
                                instance_type='ml.c4.2xlarge',
                                output_path="s3://{}/output".format(sess.default_bucket()),
                                sagemaker_session=sess)

tree.fit(data_location)

2021-05-05 13:56:19 Starting - Starting the training job...
2021-05-05 13:56:42 Starting - Launching requested ML instances
ProfilerReport-1620222979: InProgress
......
2021-05-05 13:57:42 Starting - Preparing the instances for training......
2021-05-05 13:58:42 Downloading - Downloading input data
2021-05-05 13:58:42 Training - Downloading the training image......
2021-05-05 13:59:42 Training - Training image download completed. Training in progress.
Starting the training.
(15855, 17)
IntProgress(value=0, description='Processing: ', max=3)
Initiated  . . . . . . . . . . . . . . . . . .              13:59:38
Status     . . . . . . . . . . . . . . . . . .  Loading Dependencies

## Hosting your model
You can use a trained model to get real-time predictions from an HTTP endpoint. The following steps walk through the process.


In [18]:
training_job_name = tree.latest_training_job.name
attached_tree = sage.estimator.Estimator.attach(training_job_name)



2021-05-05 14:12:05 Starting - Preparing the instances for training
2021-05-05 14:12:05 Downloading - Downloading input data
2021-05-05 14:12:05 Training - Training image download completed. Training in progress.
2021-05-05 14:12:05 Uploading - Uploading generated training model
2021-05-05 14:12:05 Completed - Training job completed



### Deploy the model
Deploying the model to SageMaker hosting just requires a deploy call on the fitted model. This call takes an instance count, instance type, and optionally serializer and deserializer functions. These are used when the resulting predictor is created on the endpoint.

In [19]:

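# csv_serializer is the pre-2.0 name; sagemaker>=2 renames it to sagemaker.serializers.CSVSerializer
from sagemaker.predictor import csv_serializer
predictor = attached_tree.deploy(initial_instance_count=4, instance_type='ml.m4.xlarge',
                                 serializer=csv_serializer, endpoint_name='churn-newspaper-pycaret')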

---------------!
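
Once the endpoint is in service, it can also be called without the SageMaker SDK predictor. The sketch below assumes a boto3 `sagemaker-runtime` client and its `invoke_endpoint` call; the helper name `invoke_csv_endpoint` is hypothetical, and the payload in the usage comment is a placeholder rather than real input for this model.

```python
def invoke_csv_endpoint(runtime_client, endpoint_name, csv_payload):
    """Send a CSV payload to a SageMaker endpoint and return the response body as text."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=csv_payload.encode("utf-8"),
    )
    # The response body is a stream; read it fully and decode it
    return response["Body"].read().decode("utf-8")


# Usage (needs AWS credentials and a live endpoint, so it is left commented out):
# import boto3
# runtime = boto3.client("sagemaker-runtime")
# print(invoke_csv_endpoint(runtime, "churn-newspaper-pycaret", "col_a,col_b\n1,2\n"))
```

Passing the client in as an argument keeps the helper easy to exercise with a stub when no endpoint is available.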

## Choose some data and use it for a prediction


In [20]:
test_data = 's3://mphasis-marketplace/churn-prediction-newspaper/input/test.csv'

data = pd.read_csv(test_data, encoding='ISO-8859-1', header=None)
data

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
0,HH Income,Home Ownership,Ethnicity,dummy for Children,Year Of Residence,Age range,Language,State,City,County,Zip Code,weekly fee,Deliveryperiod,Nielsen Prizm,reward program,Source Channel
1,"$125,000 - $149,999",OWNER,English,N,23,70-74,English,CA,LAGUNA NIGUEL,ORANGE,92677,$9.00 - $9.99,7DayOL,MW,1,Internet
2,"$400,000 - $499,999",OWNER,English,Y,16,50-54,English,CA,BUENA PARK,ORANGE,90621,$2.00 - $2.99,Thu-Sun,FW,0,TeleIn
3,"$150,000 - $174,999",OWNER,English,N,11,65-69,English,CA,IRVINE,ORANGE,92603,$4.00 - $4.99,7Day,YW,0,CustCall
4,"$200,000 - $249,999",OWNER,unknown,N,1,25-29,,CA,LADERA RANCH,ORANGE,92694,$1.00 - $1.99,SunOnly,FW,0,TeleIn
5,"Under $20,000",RENTER,English,N,7,25-29,English,CA,IRVINE,ORANGE,92614,$2.00 - $2.99,7Day,YM,0,TeleIn
6,"$ 80,000 - $89,999",OWNER,Italian,N,7,60-64,English,CA,ALISO VIEJO,ORANGE,92656,$0.51 - $0.99,7Day,YM,0,RetenIn


In [21]:
predictions = predictor.predict(data.values).decode('utf-8')



The csv_serializer has been renamed in sagemaker>=2.
See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.
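
The rows are sent to the endpoint as CSV text, so values that contain commas, such as the income ranges, have to be quoted. The following sketch shows what such a payload looks like using only the standard library. It is an illustration, not the SDK serializer's actual implementation, and `rows_to_csv` is a hypothetical helper name.

```python
import csv
import io


def rows_to_csv(rows):
    """Render rows as CSV text, quoting only the fields that need it."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# Header plus one row, using three of the columns from the test data
rows = [
    ["HH Income", "Home Ownership", "Year Of Residence"],
    ["$125,000 - $149,999", "OWNER", 23],
]
print(rows_to_csv(rows))
```

The income range comes out wrapped in double quotes, while the other fields are written as-is.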


In [22]:
print(predictions)

HH Income,Home Ownership,Ethnicity,dummy for Children,Year Of Residence,Age range,Language,State,City,County,Zip Code,weekly fee,Deliveryperiod,Nielsen Prizm,reward program,Source Channel,Label,Score
"$125,000 - $149,999",OWNER,English,N,23,70-74,English,CA,LAGUNA NIGUEL,ORANGE,92677,$9.00 - $9.99,7DayOL,MW,1,Internet,NO,0.8317
"$400,000 - $499,999",OWNER,English,Y,16,50-54,English,CA,BUENA PARK,ORANGE,90621,$2.00 - $2.99,Thu-Sun,FW,0,TeleIn,NO,0.9177
"$150,000 - $174,999",OWNER,English,N,11,65-69,English,CA,IRVINE,ORANGE,92603,$4.00 - $4.99,7Day,YW,0,CustCall,NO,0.9275
"$200,000 - $249,999",OWNER,unknown,N,1,25-29,,CA,LADERA RANCH,ORANGE,92694,$1.00 - $1.99,SunOnly,FW,0,TeleIn,NO,0.9779
"Under $20,000",RENTER,English,N,7,25-29,English,CA,IRVINE,ORANGE,92614,$2.00 - $2.99,7Day,YM,0,TeleIn,NO,0.9821
"$  80,000 - $89,999",OWNER,Italian,N,7,60-64,English,CA,ALISO VIEJO,ORANGE,92656,$0.51 - $0.99,7Day,YM,0,RetenIn,NO,0.9513



### Output

The output contains the input columns plus two added columns: `Label`, which holds the predicted class, and `Score`, which holds the score the model reports alongside it.
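
A minimal sketch of reading the `Label` and `Score` columns that appear in the prediction output, using only the standard library. The helper name `parse_predictions` is an assumption for this example, and the sample text is abbreviated from the endpoint response (most feature columns are omitted).

```python
import csv
import io


def parse_predictions(text):
    """Parse CSV prediction text into a list of dicts, converting Score to float."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        row["Score"] = float(row["Score"])
        records.append(row)
    return records


# Two rows abbreviated from the prediction output; most feature columns omitted
response_text = (
    "HH Income,City,Label,Score\n"
    '"$125,000 - $149,999",LAGUNA NIGUEL,NO,0.8317\n'
    '"$400,000 - $499,999",BUENA PARK,NO,0.9177\n'
)
records = parse_predictions(response_text)
print([(r["City"], r["Label"], r["Score"]) for r in records])
```

Parsing with `csv.DictReader` rather than splitting on commas keeps quoted values such as the income ranges intact.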

In [23]:
transform_output_folder = "batch-transform-output"
output_path="s3://{}/{}".format(sess.default_bucket(), transform_output_folder)

transformer = tree.transformer(instance_count=1,
                               instance_type='ml.m4.xlarge',
                               output_path=output_path)

In [24]:
transformer.transform(test_data, content_type='text/csv')
transformer.wait()
print("Batch Transform output saved to " + transformer.output_path)

.................................
Starting the inference server with 4 workers.
[2021-05-05 14:27:23 +0000] [12] [INFO] Starting gunicorn 20.1.0
[2021-05-05 14:27:23 +0000] [12] [INFO] Listening at: unix:/tmp/gunicorn.sock (12)
[2021-05-05 14:27:23 +0000] [12] [INFO] Using worker: gevent
[2021-05-05 14:27:23 +0000] [16] [INFO] Booting worker with pid: 16
[2021-05-05 14:27:23 +0000] [17] [INFO] Booting worker with pid: 17
[2021-05-05 14:27:24 +0000] [22] [INFO] Booting worker with pid: 22
[2021-05-05 14:27:24 +0000] [24] [INFO] Booting worker with pid: 24
Transformation Pipeline and Model Successfully Loaded
169.254.255.130 - - [05/May/2021:14:27:33 +0000] "GET /ping HTTP/1.1" 200 1 "-" "Go-http-client/1.1"
169.254.255.130 - - [05/May/2021:14:27:33 +0000] "GET /execution-parameters HTTP/1.1" 404 2 "-" "Go-http-client/1.1"

#### Inspect the Batch Transform Output in S3
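
In this notebook the transform result for an input file is read back from the output path under the input file's name with `.out` appended. The sketch below derives that bucket and key from the two S3 URIs. It assumes a single input object, the helper name `transform_output_location` is hypothetical, and `example-bucket` is a placeholder.

```python
import posixpath
from urllib.parse import urlparse


def transform_output_location(output_path, input_uri):
    """Return (bucket, key) of the .out object for a single input file."""
    parsed = urlparse(output_path)
    # File name of the input object, e.g. "test.csv"
    input_name = posixpath.basename(urlparse(input_uri).path)
    key = posixpath.join(parsed.path.lstrip("/"), input_name + ".out")
    return parsed.netloc, key


bucket, key = transform_output_location(
    "s3://example-bucket/batch-transform-output",  # placeholder bucket
    "s3://mphasis-marketplace/churn-prediction-newspaper/input/test.csv",
)
print(bucket, key)
```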

In [25]:
from urllib.parse import urlparse

parsed_url = urlparse(transformer.output_path)
bucket_name = parsed_url.netloc
# The result for test.csv is stored under the output path as test.csv.out
file_key = '{}/{}.out'.format(parsed_url.path[1:], "test.csv")

s3_client = sess.boto_session.client('s3')

# Read from the bucket parsed out of the transform output path
response = s3_client.get_object(Bucket=bucket_name, Key=file_key)
response_text = response['Body'].read().decode('utf-8')
print(response_text)

HH Income,Home Ownership,Ethnicity,dummy for Children,Year Of Residence,Age range,Language,State,City,County,Zip Code,weekly fee,Deliveryperiod,Nielsen Prizm,reward program,Source Channel,Label,Score
"$125,000 - $149,999",OWNER,English,N,23,70-74,English,CA,LAGUNA NIGUEL,ORANGE,92677,$9.00 - $9.99,7DayOL,MW,1,Internet,NO,0.8317
"$400,000 - $499,999",OWNER,English,Y,16,50-54,English,CA,BUENA PARK,ORANGE,90621,$2.00 - $2.99,Thu-Sun,FW,0,TeleIn,NO,0.9177
"$150,000 - $174,999",OWNER,English,N,11,65-69,English,CA,IRVINE,ORANGE,92603,$4.00 - $4.99,7Day,YW,0,CustCall,NO,0.9275
"$200,000 - $249,999",OWNER,unknown,N,1,25-29,,CA,LADERA RANCH,ORANGE,92694,$1.00 - $1.99,SunOnly,FW,0,TeleIn,NO,0.9779
"Under $20,000",RENTER,English,N,7,25-29,English,CA,IRVINE,ORANGE,92614,$2.00 - $2.99,7Day,YM,0,TeleIn,NO,0.9821
"$  80,000 - $89,999",OWNER,Italian,N,7,60-64,English,CA,ALISO VIEJO,ORANGE,92656,$0.51 - $0.99,7Day,YM,0,RetenIn,NO,0.9513



### View Output
Let's download the results of the transform job above from S3 and print them.

In [26]:
s3_client = sess.boto_session.client('s3')
s3_client.download_file(sess.default_bucket(), "{}/test.csv.out".format(transform_output_folder), '/tmp/test.csv.out')

# Read the downloaded predictions
with open('/tmp/test.csv.out') as f:
    string_final = f.read()

print(string_final)

# Keep a local copy of the predictions
with open("Output.txt", "w") as text_file:
    text_file.write(string_final)

HH Income,Home Ownership,Ethnicity,dummy for Children,Year Of Residence,Age range,Language,State,City,County,Zip Code,weekly fee,Deliveryperiod,Nielsen Prizm,reward program,Source Channel,Label,Score
"$125,000 - $149,999",OWNER,English,N,23,70-74,English,CA,LAGUNA NIGUEL,ORANGE,92677,$9.00 - $9.99,7DayOL,MW,1,Internet,NO,0.8317
"$400,000 - $499,999",OWNER,English,Y,16,50-54,English,CA,BUENA PARK,ORANGE,90621,$2.00 - $2.99,Thu-Sun,FW,0,TeleIn,NO,0.9177
"$150,000 - $174,999",OWNER,English,N,11,65-69,English,CA,IRVINE,ORANGE,92603,$4.00 - $4.99,7Day,YW,0,CustCall,NO,0.9275
"$200,000 - $249,999",OWNER,unknown,N,1,25-29,,CA,LADERA RANCH,ORANGE,92694,$1.00 - $1.99,SunOnly,FW,0,TeleIn,NO,0.9779
"Under $20,000",RENTER,English,N,7,25-29,English,CA,IRVINE,ORANGE,92614,$2.00 - $2.99,7Day,YM,0,TeleIn,NO,0.9821
"$  80,000 - $89,999",OWNER,Italian,N,7,60-64,English,CA,ALISO VIEJO,ORANGE,92656,$0.51 - $0.99,7Day,YM,0,RetenIn,NO,0.9513

