# Starbucks Capstone Challenge

### Introduction

This data set contains simulated data that mimics customer behavior on the Starbucks rewards mobile app. Once every few days, Starbucks sends out an offer to users of the mobile app. An offer can be merely an advertisement for a drink or an actual offer such as a discount or BOGO (buy one get one free). Some users might not receive any offer during certain weeks. 

Not all users receive the same offer, and that is the challenge to solve with this data set.

The task is to combine transaction, demographic and offer data to determine which customers respond best to which offer. This data set is a simplified version of the real Starbucks app because the underlying simulator only has one product, whereas Starbucks actually sells dozens of products.

Every offer has a validity period before the offer expires. As an example, a BOGO offer might be valid for only 5 days. You'll see in the data set that informational offers have a validity period even though these ads are merely providing information about a product; for example, if an informational offer has 7 days of validity, you can assume the customer is feeling the influence of the offer for 7 days after receiving the advertisement.

The transactional data shows customer purchases made on the app, including the timestamp of each purchase and the amount of money spent. It also has a record for each offer that a user receives, a record for when a user actually views the offer, and a record for when a user completes an offer.


### Example

To give an example, a user could receive a discount offer "buy 10 dollars, get 2 dollars off" on Monday. The offer is valid for 10 days from receipt. If the customer accumulates at least 10 dollars in purchases during the validity period, the customer completes the offer.

However, there are a few things to watch out for in this data set. Customers do not opt into the offers that they receive; in other words, a user can receive an offer, never actually view the offer, and still complete the offer. For example, a user might receive the "buy 10 dollars, get 2 dollars off" offer but never open it during the 10-day validity period. The customer spends 15 dollars during those ten days. There will be an offer completion record in the data set; however, the customer was not influenced by the offer because the customer never viewed the offer.




### Domain Background

A general article on advertisement targeting with machine learning algorithms, which describes how important personalized advertising is:
https://www.sciencedirect.com/science/article/pii/S2405959520301090

An article on the same data set, which uses unsupervised learning with clustering:
https://medium.com/@jeffrisandy/investigating-starbucks-customers-segmentation-using-unsupervised-machine-learning-10b2ac0cfd3b


Targeting offers to customers is a prime application of machine learning and data science.
Which offer should be made to which customers? Which one is a customer most likely to view and accept?
Can successful offers be used to develop other promising offers? These e-commerce questions apply to many business areas.


### Problem Statement

This capstone project addresses several questions:
* Can a combination of demographic and customer features predict the offer completed value? This is a supervised machine learning problem.
* Can the same features predict the offer viewed value?
* Some customers make up 80% of sales; let's call them power customers. If each customer is labeled as a power or non-power customer, can this label be predicted from the customer demographic features alone?

### Datasets and Inputs


The data is contained in three files:

* portfolio.json - containing offer ids and meta data about each offer (duration, type, etc.)
* profile.json - demographic data for each customer
* transcript.json - records for transactions, offers received, offers viewed, and offers completed

Here is the schema and explanation of each variable in the files:

**portfolio.json**
* id (string) - offer id
* offer_type (string) - type of offer ie BOGO, discount, informational
* difficulty (int) - minimum required spend to complete an offer
* reward (int) - reward given for completing an offer
* duration (int) - time for offer to be open, in days
* channels (list of strings)

**profile.json**
* age (int) - age of the customer 
* became_member_on (int) - date when customer created an app account
* gender (str) - gender of the customer (note some entries contain 'O' for other rather than M or F)
* id (str) - customer id
* income (float) - customer's income

**transcript.json**
* event (str) - record description (e.g. transaction, offer received, offer viewed)
* person (str) - customer id
* time (int) - time in hours since start of test. The data begins at time t=0
* value (dict of strings) - either an offer id or transaction amount depending on the record



### Solution Statement

To train a machine learning model, each received offer needs labels. The transcript file contains all received, viewed and completed offers, but it does not link the received, viewed and completed events of a person-offer combination. It is therefore necessary to build a dataset in which each received offer carries the information whether it was viewed and whether it was completed. The viewed and completed columns are the label columns for the machine learning models.
After labeling, an exploratory data analysis is performed to decide which customer-offer combinations will be part of the machine learning models.


### Benchmark Model

With around 25 input features in CSV format, the feature input vector for this dataset is relatively simple. I decided to use AWS XGBoost as the benchmark model and the AWS Linear Learner for comparison.


### Evaluation Metrics

The first evaluation metric is the accuracy score. In addition, the confusion matrix will be plotted, and precision and recall will be calculated. With the Linear Learner it is easy to optimize for these values.
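
A minimal sketch of how these metrics follow from the confusion matrix, written in plain Python so it runs without dependencies. The function names `confusion_counts` and `binary_scores` and the sample labels are illustrative only, and the sketch assumes labels encoded as 1 (positive) and 0 (negative).

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def binary_scores(y_true, y_pred):
    """Derive accuracy, precision and recall from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up labels, purely for illustration.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
scores = binary_scores(y_true, y_pred)
```

Accuracy alone can mislead when one class dominates, which is why precision and recall are reported alongside it.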



### Project Design

#### Data Cleaning
First, there is a lot of data cleaning to do: check for duplicate rows, convert the date-time columns to a proper format, and normalize the JSON columns in the transcript file. A number of customers have provided no information about their income, age or gender. I decided to keep these customers because they form a group of their own. After cleaning, the individual datasets can be merged into one full dataset.
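
The normalization of the `value` column could look roughly like the sketch below, written in plain Python on a few made-up records. The key names `offer id`, `offer_id` and `amount` inside the dict are assumptions, since the schema above does not spell them out, and `flatten_value` is a hypothetical helper name.

```python
def flatten_value(record):
    """Return a copy of one transcript record with the `value` dict spread into flat keys."""
    value = record.get("value", {})
    flat = {key: val for key, val in record.items() if key != "value"}
    # Assumed key names: offer events may carry "offer id" or "offer_id".
    flat["offer_id"] = value.get("offer id", value.get("offer_id"))
    # Assumed key name: transactions carry "amount".
    flat["amount"] = value.get("amount")
    return flat


# Made-up records, purely for illustration.
sample = [
    {"person": "p1", "event": "offer received", "time": 0, "value": {"offer id": "o1"}},
    {"person": "p1", "event": "transaction", "time": 6, "value": {"amount": 12.5}},
    {"person": "p1", "event": "offer completed", "time": 6, "value": {"offer_id": "o1"}},
]
flat_records = [flatten_value(record) for record in sample]
```

After this step every record has an `offer_id` and an `amount` field, one of which is `None` depending on the event type.
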

#### Labeling
As described above, the transcript does not state directly which received offers went unviewed or uncompleted. For each received offer, I therefore search for the corresponding viewed event and the corresponding completed event, each of which must fall within the offer's validity period. If no corresponding event exists, that label is set to -1 for the received offer.

```python
match = sub_match.query('offer_id == @offer_id and @time <= time <= @validity')
```
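
The same matching idea can be sketched end to end in plain Python. This is an illustration under assumptions, not the project's actual implementation: the function name `label_received_offers`, the flat record layout, and the choice of 1 as the label for a found event are all hypothetical. Only the -1 label for a missing event and the units (duration in days, time in hours) come from the description above.

```python
HOURS_PER_DAY = 24  # `duration` is given in days, `time` in hours


def label_received_offers(events, durations):
    """Label each received offer with viewed/completed = 1 if a matching event
    falls inside the validity window, otherwise -1."""
    labeled = []
    for received in events:
        if received["event"] != "offer received":
            continue
        start = received["time"]
        validity = start + durations[received["offer_id"]] * HOURS_PER_DAY
        row = {"person": received["person"], "offer_id": received["offer_id"],
               "time": start, "viewed": -1, "completed": -1}
        for label, event_name in (("viewed", "offer viewed"),
                                  ("completed", "offer completed")):
            found = any(
                e["event"] == event_name
                and e["person"] == received["person"]
                and e["offer_id"] == received["offer_id"]
                and start <= e["time"] <= validity
                for e in events
            )
            if found:
                row[label] = 1  # assumed positive label; the text only fixes -1
        labeled.append(row)
    return labeled


# Made-up events: p1 views and completes in time, p2 completes after expiry.
events = [
    {"person": "p1", "event": "offer received", "offer_id": "o1", "time": 0},
    {"person": "p1", "event": "offer viewed", "offer_id": "o1", "time": 12},
    {"person": "p1", "event": "offer completed", "offer_id": "o1", "time": 30},
    {"person": "p2", "event": "offer received", "offer_id": "o1", "time": 0},
    {"person": "p2", "event": "offer completed", "offer_id": "o1", "time": 200},
]
durations = {"o1": 5}  # 5 days, i.e. a 120-hour validity window
labels = label_received_offers(events, durations)
```

The sketch does not resolve the case where a customer receives the same offer twice with overlapping validity windows; a single viewed or completed event would then match both received records.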



#### Exploratory Data Analysis

After data cleaning, an exploratory data analysis follows. It starts with a univariate analysis of the profile and offer features, continues with a bivariate analysis of the features in the profile dataset, and ends with an analysis of the customers and their behavior toward offers. Are there customers who view or complete all offers, no offers, or some offers?
What is the offer view rate? Are some offers more popular than others, judging by the received, viewed and completed counts?
Another interesting question concerns power customers. In addition to the offer data, the data contains normal purchases with amount values. Grouping this data by customer and summing the amounts makes it possible to sort customers by total amount and to find, for example, all customers who make up 80% of the total amount, or the 20% of customers with the highest totals.
Depending on the results of the data analysis, additional machine learning models may be developed.
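
The power customer selection can be sketched as follows, assuming the per-customer totals have already been summed into a dict. The function name `power_customers` and the sample amounts are made up for illustration.

```python
def power_customers(spend_by_customer, share=0.8):
    """Return the top spenders, in descending order, whose combined spend
    first reaches `share` of the total amount."""
    total = sum(spend_by_customer.values())
    ranked = sorted(spend_by_customer.items(), key=lambda item: item[1], reverse=True)
    selected, running = [], 0.0
    for customer, amount in ranked:
        if running >= share * total:
            break  # the customers selected so far already cover the share
        selected.append(customer)
        running += amount
    return selected


# Made-up totals per customer, purely for illustration.
spend = {"a": 60.0, "b": 30.0, "c": 7.0, "d": 3.0}
power = power_customers(spend)
```

The resulting list can then be turned into a binary label per customer, for example 1 for customers in `power` and 0 for everyone else.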

#### Feature Engineering
To create data for the machine learning models, some additional feature engineering is needed. Categorical variables will be one-hot encoded. Continuous data will be normalized to a range from zero to one.
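
Both transformations are small enough to sketch in plain Python. The helper names `one_hot` and `min_max_scale`, the `is_` column prefix and the sample values are assumptions for illustration; a library implementation would normally be used in practice.

```python
def one_hot(values):
    """Encode each categorical value as a dict of 0/1 indicator columns."""
    categories = sorted(set(values))
    return [{f"is_{cat}": int(value == cat) for cat in categories} for value in values]


def min_max_scale(values):
    """Rescale numeric values linearly so the minimum maps to 0 and the maximum to 1."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]  # constant column: avoid division by zero
    return [(value - low) / (high - low) for value in values]


# Made-up sample values, purely for illustration.
genders = ["F", "M", "O", "M"]
incomes = [30000.0, 60000.0, 90000.0, 120000.0]
encoded = one_hot(genders)
scaled = min_max_scale(incomes)
```

Customers with missing income, age or gender need separate handling before scaling, for example an extra indicator column, since `min` and `max` cannot be computed over missing values.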

#### Machine Learning
**Which customer offer combinations will be viewed?**

The features of customers and received offers, together with the corresponding “offer viewed” label, will be used to train a model that predicts whether a customer will view a received offer.
As algorithms, I decided to use “xgboost” and the “Linear Learner”, both in their Amazon Web Services implementations.

**Which customer offer combinations will be completed?**

The features of customers and received offers, together with the corresponding “offer completed” label, will be used to train a model that predicts whether a customer will complete a received offer.
As algorithms, I decided to use “xgboost” and the “Linear Learner”, both in their Amazon Web Services implementations.




