# Amazon SageMaker Workshop
## _**Introduction**_

This workshop has been adapted from an [AWS blog post](https://aws.amazon.com/blogs/ai/predicting-customer-churn-with-amazon-machine-learning/). 

Losing customers is costly for any business. Identifying unhappy customers early on gives you a chance to offer them incentives to stay.  In this workshop we'll use machine learning (ML) for automated identification of unhappy customers, also known as customer churn prediction.

---
In this workshop we will use gradient boosted trees (XGBoost) to predict mobile customer departure (churn).
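The idea behind gradient boosted trees can be made concrete with a minimal pure-Python sketch. It shows plain gradient boosting on log-loss with one-split trees (decision stumps): each new stump is fitted to the current residuals `y - p`, which are the negative gradient of the log-loss. This is not XGBoost, which adds regularization, second-order gradient information and much more engineering. The toy features (`custserv_calls`, `intl_plan`) and labels are made up for illustration.

```python
import math


def fit_stump(xs, residuals):
    """Fit a one-split regression tree (stump) to the residuals by least squares."""
    best = None
    for j in range(len(xs[0])):
        for threshold in sorted({x[j] for x in xs}):
            left = [r for x, r in zip(xs, residuals) if x[j] <= threshold]
            right = [r for x, r in zip(xs, residuals) if x[j] > threshold]
            if not left or not right:
                continue  # a split must send samples to both sides
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = (sum((r - left_mean) ** 2 for r in left)
                   + sum((r - right_mean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, threshold, left_mean, right_mean)
    if best is None:  # every feature is constant: fall back to the mean residual
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, j, threshold, left_value, right_value = best
    return lambda x: left_value if x[j] <= threshold else right_value


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_boosted_stumps(xs, ys, n_rounds=20, learning_rate=0.5):
    """Gradient boosting on log-loss; ys must contain both 0 and 1."""
    positives = sum(ys)
    base_score = math.log(positives / (len(ys) - positives))  # initial log-odds
    stumps = []

    def raw_score(x):
        return base_score + learning_rate * sum(stump(x) for stump in stumps)

    for _ in range(n_rounds):
        # For log-loss, the negative gradient w.r.t. the raw score is y - p
        residuals = [y - sigmoid(raw_score(x)) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))

    return lambda x: sigmoid(raw_score(x))


# Made-up toy data: [custserv_calls, intl_plan] -> churned (1) or stayed (0)
xs = [[0, 0], [1, 0], [2, 0], [5, 1], [6, 0], [4, 1]]
ys = [0, 0, 0, 1, 1, 1]
predict = fit_boosted_stumps(xs, ys)
print(round(predict([0, 0]), 3), round(predict([6, 1]), 3))
```

Each round nudges the model's log-odds toward the labels; the learning rate keeps every individual stump's contribution small, which is the same trade-off XGBoost exposes through its `eta` hyperparameter.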

To build this model and put it into production, we will use the following SageMaker features:

* [Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio.html)
* [Amazon SageMaker Training Jobs](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-training.html)
* [Amazon SageMaker Experiments](https://docs.aws.amazon.com/sagemaker/latest/dg/experiments.html)
  * Manage multiple trials
  * Experiment with hyperparameters and charting
* [Amazon SageMaker Debugger](https://docs.aws.amazon.com/sagemaker/latest/dg/train-debugger.html)
  * Debug your model 
* [Amazon SageMaker Clarify](https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-fairness-and-explainability.html)
* [Model hosting](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-deployment.html)
  * Set up a persistent endpoint to get predictions from your model
* [Amazon SageMaker Model Monitor](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-model-monitor.html)
  * Monitor the quality of your model
  * Set alerts for when model quality deviates
* [Amazon SageMaker Pipelines](https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines.html)

---

## The format of this workshop

Although we recommend that you run the labs in order, _this workshop was built so that you can skip labs or do only those that interest you most_ (e.g. you can run only the last lab, labs 4 and 5, or labs 1 and 4). Running the labs in order helps you follow the natural flow of an ML project.

> This is possible because SageMaker's components (e.g. training jobs, hosting, processing) are independent of each other, so customers are free to use only those that best fit their use case.

This `0-Introduction` lab is the only one that is strictly required, because it sets up the basics the other labs depend on (creating an S3 bucket, installing packages, etc.).

---

## The Data

Mobile operators have historical records that tell them which customers ended up churning and which continued using the service. We can use this historical information to train an ML model that predicts customer churn. After training, we can pass the profile of any customer (with the same features we used to train the model) to the model and have it predict whether that customer will churn.
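Because the model only understands the features it was trained on, a new customer's profile has to be sent with exactly those features, in the same column order. A minimal sketch, assuming the deployed model takes one CSV row per customer without a header; `to_csv_row` and the column names are hypothetical, not the dataset's actual schema.

```python
def to_csv_row(customer, feature_columns):
    """Serialize one customer's profile in the exact column order used for training.

    Raises ValueError if any training feature is missing, instead of silently
    shifting the remaining values into the wrong columns.
    """
    missing = [c for c in feature_columns if c not in customer]
    if missing:
        raise ValueError(f"missing features: {missing}")
    return ",".join(str(customer[c]) for c in feature_columns)


# Hypothetical feature columns, for illustration only
FEATURE_COLUMNS = ["account_length", "day_mins", "custserv_calls"]

# Dict key order does not matter; the output follows FEATURE_COLUMNS
row = to_csv_row({"custserv_calls": 4, "account_length": 128, "day_mins": 265.1},
                 FEATURE_COLUMNS)
print(row)  # 128,265.1,4
```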

The dataset we use is publicly available and was mentioned in [Discovering Knowledge in Data](https://www.amazon.com/dp/0470908742/) by Daniel T. Larose. It is attributed by the author to the University of California Irvine Repository of Machine Learning Datasets. The churn data is the `churn.txt` file inside the `Data sets` folder of the book's data archive, which we download below.

The dataset can be [downloaded here](https://bcs.wiley.com/he-bcs/Books?action=resource&bcsId=11704&itemId=0470908742&resourceId=46577).

---

## Let's configure our environment

In [None]:
import sys
# Upgrade the SageMaker Python SDK and install the Experiments and XGBoost libraries
!{sys.executable} -m pip install sagemaker -U
#!{sys.executable} -m pip install sagemaker==2.42.0 -U
!{sys.executable} -m pip install sagemaker-experiments
!{sys.executable} -m pip install xgboost
#!{sys.executable} -m pip install xgboost==1.3.3
# Uncomment to check which versions were installed
#!pip freeze | grep sagemaker
#!pip freeze | grep xgboost
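To check versions from Python rather than with the commented-out `pip freeze | grep` lines, the standard library's `importlib.metadata` (Python 3.8+) can report what is installed in the current kernel. A small sketch; `installed_version` is a hypothetical helper:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


for pkg in ["sagemaker", "sagemaker-experiments", "xgboost"]:
    print(f"{pkg}: {installed_version(pkg) or 'not installed'}")
```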

In [None]:
import pandas as pd
import boto3
import sagemaker
from botocore.exceptions import ClientError

sess = boto3.Session()
region = sess.region_name
sm = sess.client('sagemaker')
role = sagemaker.get_execution_role()

account_id = sess.client('sts', region_name=region).get_caller_identity()["Account"]
bucket = 'sagemaker-studio-{}-{}'.format(region, account_id)
prefix = 'xgboost-churn'

# Create the bucket; us-east-1 must not receive a LocationConstraint
s3 = sess.client('s3')
try:
    if region == "us-east-1":
        s3.create_bucket(Bucket=bucket)
    else:
        s3.create_bucket(Bucket=bucket,
                         CreateBucketConfiguration={'LocationConstraint': region})
except ClientError as e:
    # Reuse the bucket if we already own it; re-raise any other error
    if e.response['Error']['Code'] == 'BucketAlreadyOwnedByYou':
        print("Looks like you already have a bucket of this name. That's good!")
    else:
        raise

# Container image for the SageMaker built-in XGBoost algorithm
framework_version = '1.3-1'
docker_image_name = sagemaker.image_uris.retrieve(framework='xgboost', region=region, version=framework_version)

# %store persists these variables so the other lab notebooks can restore them
print("Storing bucket, prefix, region, docker_image_name and framework_version for the other labs...")
%store bucket
%store prefix
%store region
%store docker_image_name
%store framework_version
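The bucket name and the region-dependent `create_bucket` arguments above can be isolated into two pure functions, which makes the us-east-1 special case (S3 does not accept `us-east-1` as an explicit `LocationConstraint`) testable without AWS credentials. A sketch; `bucket_name` and `create_bucket_kwargs` are hypothetical helpers, not part of boto3 or the SageMaker SDK, and the account ID below is a placeholder.

```python
def bucket_name(region, account_id):
    """Bucket naming scheme used in this notebook: sagemaker-studio-<region>-<account>."""
    return f"sagemaker-studio-{region}-{account_id}"


def create_bucket_kwargs(bucket, region):
    """Keyword arguments for boto3's S3 create_bucket call.

    us-east-1 is S3's default location, so the LocationConstraint is only
    added for other regions.
    """
    kwargs = {"Bucket": bucket}
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


# Placeholder account ID, for illustration only
print(create_bucket_kwargs(bucket_name("eu-west-1", "123456789012"), "eu-west-1"))
# With credentials, the result would be passed straight to boto3:
# boto3.client("s3").create_bucket(**create_bucket_kwargs(bucket, region))
```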

---
## Let's download the data and upload it to S3

In [None]:
# Download the book's data sets archive and keep only churn.txt
!wget https://higheredbcs.wiley.com/legacy/college/larose/0470908742/ds/data_sets.zip
!unzip -o data_sets.zip
!mv "Data sets"/churn.txt .
!rm -rf "Data sets" data_sets.zip
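Where `wget` or `unzip` is unavailable, the same download-and-extract step can be done with the standard library. A minimal sketch, assuming the archive stores the file as `Data sets/churn.txt` (the path the `mv` command above relies on); `extract_member` is a hypothetical helper.

```python
import shutil
import zipfile
from pathlib import Path


def extract_member(zip_path, member, dest_dir="."):
    """Extract one archive member (e.g. 'Data sets/churn.txt') into dest_dir.

    The member's folder prefix is dropped, mirroring the `mv` above.
    Returns the path of the extracted file.
    """
    dest = Path(dest_dir) / Path(member).name
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as src, open(dest, "wb") as out:
            shutil.copyfileobj(src, out)
    return dest


# Usage (requires network access):
# import urllib.request
# urllib.request.urlretrieve(
#     "https://higheredbcs.wiley.com/legacy/college/larose/0470908742/ds/data_sets.zip",
#     "data_sets.zip")
# extract_member("data_sets.zip", "Data sets/churn.txt")
```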

In [None]:
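from sagemaker.s3 import S3Uploader

local_raw_path = "churn.txt"
raw_dir = f"{prefix}/data/raw"
# Upload churn.txt under s3://<bucket>/xgboost-churn/data/raw/ and keep the returned URI
s3uri_raw = S3Uploader.upload(local_raw_path, f's3://{bucket}/{raw_dir}')
s3uri_raw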
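`S3Uploader.upload` returns the S3 URI of the uploaded object, which we expect to be the target prefix followed by the file name (`s3://<bucket>/xgboost-churn/data/raw/churn.txt`). A small sketch for building and splitting such URIs, handy when a later step needs the bucket and key separately; both helpers are hypothetical, not SDK functions, and `my-bucket` is a placeholder.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def expected_upload_uri(bucket, key_prefix, local_path):
    """URI a single-file upload under key_prefix is expected to land at."""
    return f"s3://{bucket}/{key_prefix.strip('/')}/{PurePosixPath(local_path).name}"


def parse_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")


uri = expected_upload_uri("my-bucket", "xgboost-churn/data/raw", "churn.txt")
print(uri)
print(parse_s3_uri(uri))
```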

Store the raw data S3 URI for later:

In [None]:
%store s3uri_raw

print("\n\nWe are ready to start the SageMaker Workshop!")
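`%store` saves variables in IPython's database so the other lab notebooks can restore them with `%store -r`. If you run a step as a plain Python script instead, a JSON file can play a similar role. A minimal sketch; the file name and the `store`/`restore` helpers are hypothetical and not part of IPython.

```python
import json
from pathlib import Path

CONFIG_PATH = Path("workshop_config.json")  # hypothetical file name


def store(path=CONFIG_PATH, **values):
    """Merge values into the JSON file, similar to running %store on each name."""
    data = json.loads(path.read_text()) if path.exists() else {}
    data.update(values)
    path.write_text(json.dumps(data, indent=2))


def restore(path=CONFIG_PATH):
    """Read every stored value back, similar to %store -r."""
    return json.loads(path.read_text())


# Usage:
# store(bucket=bucket, prefix=prefix, s3uri_raw=s3uri_raw)
# config = restore()
# bucket = config["bucket"]
```

Unlike `%store`, this only handles JSON-serializable values (strings, numbers, lists, dicts), which covers the bucket names, prefixes and URIs stored in this lab.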

---
# [You can now go to the first lab 1-DataPrep](../1-DataPrep/data_preparation.ipynb)