# Starbucks Capstone Challenge

## Project Overview

In my final project, I aim to answer the following questions:

1. What are the main motivators that make an offer convert into a sale?
2. Is it possible to predict the acceptance of an offer by a customer with the data sets made available?

The data sets used in this project contain simulated data that mimics customer behavior in the Starbucks mobile rewards app. Every few days, Starbucks sends an offer to users of the mobile app. An offer can be purely informational (an ad for a drink) or an actual offer, such as a discount or BOGO (buy one, get one free). Some users may not receive any offer during certain weeks. Not all users receive the same offer, and that is the challenge to be solved with this data set.

To keep the analysis organized, I conduct the project using the CRISP-DM methodology.


## Business Understanding

Background: Starbucks generates offers and sends them to its customers through the mobile application. With these offers, the company aims to increase revenue by converting them into actual sales. As mentioned in the project overview, the data sets used in this project contain data that simulates customer behavior in the mobile application and include the following information:

- Offer portfolio, which consists of the attributes of each offer
- Demographic data for each customer
- Transactional records of events that occur in the application

Project objective: The objective of this project is to answer the following questions:

1. What are the main motivators that make an offer convert into a sale?
2. Is it possible to predict the acceptance of an offer by a customer with the data sets made available?

Using the data sets available for this project, I develop analyses and machine learning models to answer the above questions. The models help me identify the main motivators behind a successful offer and predict whether customers of the mobile application will accept an offer. Throughout the process, I explore the characteristics of customers who accept or reject offers, as well as how much money a customer might spend under the influence of an offer.

Success criterion: The project succeeds if the analyses produced can answer the questions defined in the objective.

## Data Sets - Details

The data is contained in three files:

* portfolio.json - offer ids and metadata about each offer (duration, type, etc.)
* profile.json - demographic data for each customer
* transcript.json - records for transactions, offers received, offers viewed, and offers completed

Here is the schema and explanation of each variable in the files:

**portfolio.json**
* id (string) - offer id
* offer_type (string) - type of offer, i.e. BOGO, discount, informational
* difficulty (int) - minimum required spend to complete an offer
* reward (int) - reward given for completing an offer
* duration (int) - time for offer to be open, in days
* channels (list of strings)
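
Because `channels` holds a list per offer, it usually has to be expanded before it can feed a model. The snippet below is a minimal sketch of one way to do that with one 0/1 column per channel. The sample rows, the channel names (`email`, `mobile`, `web`), the `channel_` prefix, and the helper `expand_channels` are illustrative assumptions, not values taken from `portfolio.json`.

```python
import pandas as pd

# Hypothetical sample rows; the real values come from portfolio.json.
sample_portfolio = pd.DataFrame({
    "id": ["offer_a", "offer_b"],
    "channels": [["email", "mobile"], ["web", "email"]],
})


def expand_channels(portfolio: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with one 0/1 column per channel instead of the list column."""
    # explode() yields one row per (offer, channel) pair and keeps the offer index.
    exploded = portfolio["channels"].explode()
    # One indicator column per channel, collapsed back to one row per offer.
    dummies = pd.get_dummies(exploded).groupby(level=0).max().astype(int)
    return portfolio.drop(columns="channels").join(dummies.add_prefix("channel_"))


expanded = expand_channels(sample_portfolio)
```

The original frame is left untouched, so the list column remains available for inspection.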

**profile.json**
* age (int) - age of the customer 
* became_member_on (int) - date when customer created an app account
* gender (str) - gender of the customer (note some entries contain 'O' for other rather than M or F)
* id (str) - customer id
* income (float) - customer's income
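
Since `became_member_on` is stored as an integer, it is convenient to convert it to a real date before computing things like membership length. The sketch below assumes the integer encodes the date as `YYYYMMDD`; that format, the sample rows, and the column name `member_since` are assumptions to verify against the actual file.

```python
import pandas as pd

# Hypothetical sample rows; the YYYYMMDD encoding is an assumption.
sample_profile = pd.DataFrame({
    "id": ["c1", "c2"],
    "became_member_on": [20170212, 20180726],
})

# Cast to string first so the explicit format can be applied to each value.
sample_profile["member_since"] = pd.to_datetime(
    sample_profile["became_member_on"].astype(str), format="%Y%m%d"
)

# Year of registration, a possible feature for later analysis.
sample_profile["member_year"] = sample_profile["member_since"].dt.year
```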

**transcript.json**
* event (str) - record description (i.e. transaction, offer received, offer viewed, etc.)
* person (str) - customer id
* time (int) - time in hours since start of test. The data begins at time t=0
* value (dict of strings) - either an offer id or transaction amount depending on the record
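
The `value` dictionary and the hour-based `time` column are easier to work with once flattened. The following is a minimal sketch under stated assumptions: the dictionary keys `offer_id` and `amount`, the sample rows, and the helper `flatten_value` are hypothetical, and the real key names should be checked in `transcript.json`.

```python
import pandas as pd

# Hypothetical sample rows; the key names inside "value" are assumptions.
sample_transcript = pd.DataFrame({
    "person": ["c1", "c1", "c2"],
    "event": ["offer received", "transaction", "offer viewed"],
    "time": [0, 30, 48],
    "value": [{"offer_id": "offer_a"}, {"amount": 12.5}, {"offer_id": "offer_a"}],
})


def flatten_value(transcript: pd.DataFrame) -> pd.DataFrame:
    """Spread the keys of the value dict into columns and add a day counter."""
    # One column per dict key; rows without a given key get NaN.
    values = pd.DataFrame(transcript["value"].tolist(), index=transcript.index)
    flat = transcript.drop(columns="value").join(values)
    # time is in hours since the start of the test, so integer-divide by 24.
    flat["day"] = flat["time"] // 24
    return flat


flat = flatten_value(sample_transcript)
```

Transaction rows end up with a missing offer id and offer rows with a missing amount, which makes the two kinds of record easy to separate.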

## Data Understanding

### Libraries

In [5]:
# Data handling
import pandas as pd
import numpy as np
import json
import math

# Feature preprocessing
from sklearn.preprocessing import PolynomialFeatures
from sklearn.preprocessing import StandardScaler

# Models
from sklearn.tree import DecisionTreeRegressor
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import Ridge
from sklearn.ensemble import RandomForestClassifier

# Model selection and data splitting
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split

# Evaluation metrics
from sklearn.metrics import mean_squared_error
from sklearn.metrics import classification_report

# Timing of training runs
from time import time

# Plotting (the magic below renders figures inline in the notebook)
import matplotlib.pyplot as plt
%matplotlib inline

### Load Data Sets

In [8]:
portfolio = pd.read_json('data/portfolio.json', orient='records', lines=True)
profile = pd.read_json('data/profile.json', orient='records', lines=True)
transcript = pd.read_json('data/transcript.json', orient='records', lines=True)
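
The `orient='records', lines=True` arguments tell pandas to expect one JSON object per line. As a small sketch of what that means, the example below reads an in-memory string instead of the project files and then counts missing values, which is a typical first check after loading. The sample records and the names `json_lines`, `sample`, and `missing` are illustrative assumptions.

```python
import io

import pandas as pd

# Two hypothetical records in JSON Lines form: one JSON object per line.
json_lines = io.StringIO(
    '{"id": "c1", "age": 35, "income": 52000.0}\n'
    '{"id": "c2", "age": 61, "income": null}\n'
)

# Same arguments as in the loading cell above, applied to the in-memory text.
sample = pd.read_json(json_lines, orient="records", lines=True)

# Missing values per column; JSON null becomes NaN.
missing = sample.isna().sum()
```

The same `isna().sum()` call can be applied to `portfolio`, `profile`, and `transcript` once they are loaded.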