# Capstone Project: Create a Customer Segmentation Report for Arvato Financial Services

In this project, you will analyze demographics data for customers of a mail-order sales company in Germany, comparing it against demographics information for the general population. You'll use unsupervised learning techniques to perform customer segmentation, identifying the parts of the population that best describe the core customer base of the company. Then, you'll apply what you've learned to a third dataset with demographics information for targets of a marketing campaign for the company, and use a model to predict which individuals are most likely to become customers. The data that you will use has been provided by our partners at Bertelsmann Arvato Analytics, and represents a real-life data science task.

If you completed the first term of this program, you will be familiar with the first part of this project from the unsupervised learning project. The versions of those two datasets used in this project include many more features and have not been pre-cleaned. You are also free to choose whatever approach you'd like for analyzing the data rather than following predetermined steps. In your work on this project, make sure that you carefully document your steps and decisions, since your main deliverable will be a blog post reporting your findings.

In [16]:
# import libraries here; add more as necessary
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os
import shutil
import sagemaker
from sagemaker import get_execution_role
import subprocess
import json

# magic word for producing visualizations in notebook
%matplotlib inline

In [3]:
# Setup Sagemaker Session
sagemaker_session = sagemaker.Session()
bucket = sagemaker_session.default_bucket()
execution_role = sagemaker.session.get_execution_role()
region = sagemaker_session.boto_region_name

print(f'bucket: {bucket}')

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/ec2-user/.config/sagemaker/config.yaml
bucket: sagemaker-us-east-1-709677954768


In [4]:
# Define data location constants for downloading data to the notebook
local_data_dir = 'data'
s3_data_path = f's3://{bucket}/data'
s3_model_path = f's3://{bucket}/model'
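
The constants above only name the locations; the copy step itself is not shown. The following is a minimal sketch of one way to pull the S3 prefix into the notebook, assuming the AWS CLI is installed on the instance and the execution role can read the bucket. `build_sync_command` and `download_data` are hypothetical helper names, not part of the original notebook.

```python
import os
import subprocess


def build_sync_command(s3_uri, local_dir):
    """Return the AWS CLI command that mirrors an S3 prefix into a local directory."""
    return ["aws", "s3", "sync", s3_uri, local_dir]


def download_data(s3_uri, local_dir, dry_run=False):
    """Create local_dir if needed and sync the S3 prefix into it.

    With dry_run=True the command is only returned, never executed.
    """
    os.makedirs(local_dir, exist_ok=True)
    command = build_sync_command(s3_uri, local_dir)
    if not dry_run:
        # check=True raises CalledProcessError if the CLI exits with an error
        subprocess.run(command, check=True)
    return command
```

With the constants defined in this cell, a call would look like `download_data(s3_data_path, local_data_dir)`. Passing `dry_run=True` is a cheap way to inspect the command before touching the bucket.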

## Initial Model and Kaggle Submission

Below I will set up an initial AutoGluon run without any refinement of the data, then submit the results to Kaggle.

In [17]:
%%capture

!pip install -U pip
!pip install -U setuptools wheel
!pip install -U "mxnet<2.0.0" bokeh==2.0.1
!pip install autogluon --no-cache-dir
!pip install kaggle
!pip install python-dotenv
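
Once the packages are installed, an unrefined baseline run might look roughly like the sketch below. It assumes the training data is already loaded into a pandas DataFrame; the helper name `fit_baseline`, the label column `RESPONSE`, the `roc_auc` metric, and the time limit are all assumptions for illustration, not values taken from this notebook. The `predictor_cls` argument exists only so the function can be exercised without AutoGluon installed.

```python
def fit_baseline(train_data, label="RESPONSE", time_limit=600, predictor_cls=None):
    """Fit an AutoGluon tabular model on unrefined data and return the predictor.

    label, time_limit and the roc_auc metric are illustrative assumptions.
    """
    if predictor_cls is None:
        # Imported lazily so the function can be defined before AutoGluon is installed
        from autogluon.tabular import TabularPredictor
        predictor_cls = TabularPredictor

    predictor = predictor_cls(label=label, eval_metric="roc_auc")
    # time_limit is in seconds and caps the total training time
    predictor.fit(train_data, time_limit=time_limit)
    return predictor
```

The fitted predictor's `predict_proba` method can then score the campaign targets for a Kaggle submission file.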


### Setting up Kaggle Creds


In [12]:
!mkdir -p kaggle
%env KAGGLE_CONFIG_DIR=kaggle
!touch kaggle/kaggle.json
!chmod 600 kaggle/kaggle.json

env: KAGGLE_CONFIG_DIR=kaggle


In [21]:
from dotenv import dotenv_values

CONFIG = dotenv_values('env.txt')
kaggle_username = CONFIG['KAGGLE_USERNAME']
kaggle_key = CONFIG['KAGGLE_KEY']

# Save the API token to the kaggle.json file
with open("kaggle/kaggle.json", "w") as f:
    f.write(json.dumps({"username": kaggle_username, "key": kaggle_key}))
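
The lookups above raise a bare `KeyError` if `env.txt` is missing an entry, and the file permissions are set in a separate cell. A minimal sketch that combines validation, writing, and permissions in one step follows, assuming `env.txt` holds `KAGGLE_USERNAME` and `KAGGLE_KEY` entries. `write_kaggle_token` is a hypothetical helper name, not part of the Kaggle or python-dotenv APIs.

```python
import json
import os


def write_kaggle_token(config, config_dir="kaggle"):
    """Validate the credentials in config and write kaggle.json with mode 600."""
    missing = [k for k in ("KAGGLE_USERNAME", "KAGGLE_KEY") if not config.get(k)]
    if missing:
        raise ValueError(f"Missing Kaggle credentials: {', '.join(missing)}")

    os.makedirs(config_dir, exist_ok=True)
    token_path = os.path.join(config_dir, "kaggle.json")
    # Create the file with owner-only permissions
    fd = os.open(token_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        json.dump({"username": config["KAGGLE_USERNAME"], "key": config["KAGGLE_KEY"]}, f)
    # chmod covers the case where the file already existed with looser permissions
    os.chmod(token_path, 0o600)
    return token_path
```

It could be called as `write_kaggle_token(dotenv_values('env.txt'))`, which fails early with a readable message when a credential is absent.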