# Starbucks Capstone Challenge

### Project Overview

This data set contains simulated data that mimics customer behavior on the Starbucks rewards mobile app. Once every few days, Starbucks sends out an offer to users of the mobile app. An offer can be merely an advertisement for a drink or an actual offer such as a discount or BOGO (buy one get one free). Some users might not receive any offer during certain weeks. 

Not all users receive the same offer, and that is the challenge to solve with this data set.

Your task is to combine transaction, demographic and offer data to determine which demographic groups respond best to which offer type. This data set is a simplified version of the real Starbucks app because the underlying simulator only has one product whereas Starbucks actually sells dozens of products.

Every offer has a validity period before the offer expires. As an example, a BOGO offer might be valid for only 5 days. You'll see in the data set that informational offers have a validity period even though these ads are merely providing information about a product; for example, if an informational offer has 7 days of validity, you can assume the customer is feeling the influence of the offer for 7 days after receiving the advertisement.
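The validity rule can be sketched in a few lines. This is only an illustration, assuming offer durations are expressed in days and event times in hours since the start of the test (as the dataset overviews later in this notebook state); the function name `within_validity` and the inclusive window end are hypothetical choices, not part of the dataset.

```python
def within_validity(received_hour, event_hour, duration_days):
    """Return True if an event falls inside an offer's validity window.

    Assumes the window starts when the offer is received and that its
    end (received time + duration) is inclusive.
    """
    window_end = received_hour + duration_days * 24  # convert days to hours
    return received_hour <= event_hour <= window_end


# A 7-day offer received at hour 0 stays valid for 168 hours.
print(within_validity(0, 150, 7))
print(within_validity(0, 200, 7))
```

A check like this could be used to decide whether a transaction should be attributed to an informational offer.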

You'll be given transactional data showing user purchases made on the app including the timestamp of purchase and the amount of money spent on a purchase. This transactional data also has a record for each offer that a user receives as well as a record for when a user actually views the offer. There are also records for when a user completes an offer. 

Keep in mind as well that someone using the app might make a purchase through the app without having received an offer or seen an offer.



# Problem Statement :
Predict which purchase offers are likely to draw a higher level of response, measured through user actions such as 'offer received', 'offer viewed', 'transaction' and 'offer completed', based on the demographic attributes of the customer and the attributes of the company's purchase offers.


## Importing Libraries & loading datasets :

In [3]:
import pandas as pd
import numpy as np
import math
import json
%matplotlib inline

# read in the json files
portfolio = pd.read_json('data/portfolio.json', orient='records', lines=True)
profile = pd.read_json('data/profile.json', orient='records', lines=True)
transcript = pd.read_json('data/transcript.json', orient='records', lines=True)

In [4]:
import matplotlib.pyplot as plt
import seaborn as sns

In [6]:
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
import tensorflow as tf
import keras

ModuleNotFoundError: No module named 'tensorflow'

In [None]:
portfolio.head(1) 

In [None]:
profile.head(1) 

In [None]:
transcript.head(1) 

## Data Accessing and Cleaning :

### 1. Portfolio Data

#### Dataset overview :

**portfolio.json**

- id (string) - offer id
- offer_type (string) - type of offer ie BOGO, discount, informational
- difficulty (int) - minimum required spend to complete an offer
- reward (int) - reward given for completing an offer
- duration (int) - time for offer to be open, in days
- channels (list of strings)

 There are three types of offers that can be sent: `buy-one-get-one (BOGO)`, `discount`, and `informational`.
- In a BOGO offer, a user needs to spend a certain amount to get a reward equal to that threshold amount. 
- In a discount, a user gains a reward equal to a fraction of the amount spent.
- In an informational offer, there is no reward, but neither is there a requisite amount that the user is expected to spend. 

Offers can be delivered via multiple channels.

In [None]:
portfolio.head()

In [None]:
portfolio.shape

In [None]:
portfolio.info()

In [None]:
portfolio.describe()

In [None]:
portfolio['channels']

### - Data Cleaning

- Create a copy of the original dataframe to work on.
- Convert the 'channels' column into 4 separate indicator columns, one per channel type.
- Rename the column 'id' to 'offer_id'.
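The channel step can be illustrated on a toy frame. This is a minimal sketch, assuming each row holds a list of channel names; the data and the names `toy` and `channel_flags` are made up, and `explode` is used here as one possible way to do the encoding.

```python
import pandas as pd

# Invented example rows; the real portfolio has more columns.
toy = pd.DataFrame({
    "id": ["a", "b"],
    "channels": [["email", "web"], ["email", "mobile", "social"]],
})

# explode() yields one row per (offer, channel) pair, get_dummies() one-hot
# encodes each of those rows, and grouping by the original index collapses
# them back into a single indicator row per offer.
channel_flags = (
    pd.get_dummies(toy["channels"].explode())
    .groupby(level=0)
    .sum()
)

toy_encoded = pd.concat([toy.drop(columns="channels"), channel_flags], axis=1)
toy_encoded = toy_encoded.rename(columns={"id": "offer_id"})
print(toy_encoded)
```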

In [None]:
df1 = portfolio.copy()

In [None]:
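# One indicator column per channel; sum(level=0) was removed in pandas 2.0, so group by the row index instead.
dummy = pd.get_dummies(df1.channels.apply(pd.Series).stack()).groupby(level=0).sum()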
df1 = pd.concat([df1, dummy], axis=1)
df1 = df1.drop(columns='channels')

In [None]:
df1 = df1.rename(columns={'id':'offer_id'})

In [None]:
df1

### 2. Profile Data
#### Dataset overview :

**profile.json**

- age (int) - age of the customer
- became_member_on (int) - date when customer created an app account
- gender (str) - gender of the customer (note some entries contain 'O' for other rather than M or F)
- id (str) - customer id
- income (float) - customer's income

In [None]:
profile.head()

In [None]:
profile.shape

In [None]:
profile.info()

In [None]:
profile.describe()

In [None]:
profile.duplicated().sum()

### - Data Cleaning
- Create a copy of the original dataframe to work on.
- Convert the 'became_member_on' column from integers to a proper datetime type.
- Rename the column 'id' to 'customer_id'.
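As a small illustration of the date conversion, the sketch below parses integers of the form YYYYMMDD. The sample values and the derived `member_year` column are invented for the example.

```python
import pandas as pd

# Made-up membership dates stored as integers, as in the profile data.
toy = pd.DataFrame({"became_member_on": [20170212, 20180726]})

# Casting to string first makes the '%Y%m%d' format apply unambiguously.
toy["became_member_on"] = pd.to_datetime(
    toy["became_member_on"].astype(str), format="%Y%m%d"
)

# Once the column is a datetime, components such as the year are easy to extract.
toy["member_year"] = toy["became_member_on"].dt.year
print(toy)
```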

In [None]:
df2 = profile.copy()

In [None]:
df2['became_member_on'] = pd.to_datetime(df2['became_member_on'], format='%Y%m%d')

In [None]:
df2 = df2.rename(columns={'id':'customer_id'})

In [None]:
df2.head(10)

In [None]:
type(df2.became_member_on[0])

### 3. Transcript Data
#### Dataset overview :

**transcript.json**

- event (str) - record description (ie transaction, offer received, offer viewed, etc.)
- person (str) - customer id
- time (int) - time in hours since start of test. The data begins at time t=0
- value - (dict of strings) - either an offer id or transaction amount depending on the record

In [None]:
transcript.head()

In [None]:
transcript.shape

In [None]:
transcript.info()

In [None]:
transcript.describe()

In [None]:
transcript['value']

In [None]:
transcript['value'].value_counts()   
# This raises an error because each row of the 'value' column contains a dictionary, which is unhashable.

In [None]:
transcript['event'].unique()

In [None]:
transcript['event'].value_counts()

### - Data Cleaning
- Create a copy of the original dataframe to work on.
- Rename the column 'person' to 'customer_id'.
- Convert the 'event' column into 4 indicator columns, one per event type.
- Split the 'value' column into 2 columns: 'offer_id' and 'amount'.
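The last step, unpacking the dictionaries in 'value', can be sketched with `dict.get`. The rows below are invented, and the helper name `get_offer_id` is hypothetical; the only assumption taken from this notebook is that the offer key appears as either 'offer id' or 'offer_id' and the purchase key as 'amount'.

```python
import pandas as pd

# Invented sample of the 'value' column.
toy = pd.DataFrame({
    "value": [
        {"offer id": "abc"},
        {"amount": 12.345},
        {"offer_id": "abc", "reward": 5},
    ]
})


def get_offer_id(value):
    """Return the offer id under either spelling of the key, else None."""
    return value.get("offer id", value.get("offer_id"))


toy["offer_id"] = toy["value"].apply(get_offer_id)
toy["amount"] = toy["value"].apply(lambda v: v.get("amount"))
print(toy[["offer_id", "amount"]])
```

Looking keys up by name avoids relying on the order of the keys inside each dictionary.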

In [None]:
df3 = transcript.copy()

In [None]:
df3 = df3.rename(columns={'person':'customer_id'})

In [None]:
df3['event'] = df3['event'].str.replace(' ', '-')


In [None]:
df3['event'].value_counts()

In [None]:
dummy = pd.get_dummies(df3['event'])
df3 = pd.concat([df3, dummy], axis=1 )

In [None]:
df3.head()

In [None]:
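# Take the offer id when the first key is 'offer id' / 'offer_id', otherwise leave it empty.
df3['offer_id'] = [[*i.values()][0] if [*i.keys()][0] in ['offer id', 'offer_id'] else None for i in df3.value]
# Take the rounded purchase amount when the first key is 'amount', otherwise leave it empty.
df3['amount'] = [np.round([*i.values()][0], decimals=2) if [*i.keys()][0] == 'amount' else None for i in df3.value]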

In [None]:
df3 = df3.drop(columns='value')

In [None]:
df3.head()

## - Data Cleaning :

- Merge all three datasets together.
- Encode the offer ids as integers.
- Encode the event names as integer event ids.
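These steps can be sketched on toy frames. The example below is an alternative illustration rather than the approach used in the cells that follow: it uses `pd.factorize` for the integer encoding, and all names and values are invented.

```python
import pandas as pd

# Invented events: the second row stands for a transaction with no offer.
events = pd.DataFrame({
    "customer_id": ["c1", "c1", "c2"],
    "offer_id": ["o1", None, "o2"],
})
offers = pd.DataFrame({
    "offer_id": ["o1", "o2"],
    "offer_type": ["bogo", "discount"],
})

# A left merge keeps every event, including those without a matching offer.
merged = events.merge(offers, on="offer_id", how="left")

# factorize() assigns integer codes in order of first appearance and
# marks missing values with -1.
codes, uniques = pd.factorize(merged["offer_id"])
merged["offer_code"] = codes
print(merged)
```

With this encoding, rows without an offer are easy to spot because their code is -1.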

In [None]:
t_p = pd.merge(df3, df2, on='customer_id')


In [None]:
t_p

In [None]:
df = pd.merge(t_p, df1, on='offer_id', how='left')
df

In [None]:
offer_id = df['offer_id'].unique()
offer_id

In [None]:
offer_dict = pd.Series(offer_id ).to_dict()
offer_dict

In [None]:
offer_dict = dict([(value, key) for key, value in offer_dict.items()]) 
offer_dict

In [None]:
df['offer_id'] = df['offer_id'].map(offer_dict)
df.head()

In [None]:
df['offer_id'] = df['offer_id'].replace(1, np.nan)

In [None]:
df.head()

In [None]:
df['offer_id'].unique()

In [None]:
event_ids = df['event'].unique()
event_ids

In [None]:
event_dict = pd.Series(event_ids).to_dict()
event_dict

In [None]:
event_dict = dict([(value, key) for key, value in event_dict.items()]) 
event_dict

In [None]:
#map event_ids to the encoded event ids
df['event_id'] = df['event'].map(event_dict)

In [None]:
df.head()

In [None]:
df.shape

In [None]:
df.columns

In [None]:
df.info()

In [None]:
df.to_csv('data/data.csv', index=False)

In [None]:
data = pd.read_csv('data/data.csv')

## Data Exploration and Data Visualization :

In [None]:
data.age.describe()

In [None]:
data.age.hist(bins = 30)
plt.xlabel('Age Group')
plt.ylabel('Count')
plt.title('Age Group Distribution');

### Observation :

- Outliers are present: a large number of records have an age above 115, which does not make sense.
- The typical user is middle-aged, i.e. around 50-62 years old.
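One way to handle such records is to measure how common they are and mask them before computing statistics. This is a sketch on invented values; the 115 threshold comes from the observation above, while treating these ages as missing is an assumption.

```python
import numpy as np
import pandas as pd

# Invented ages; 118 stands in for the implausible values seen in the histogram.
toy = pd.DataFrame({"age": [35, 118, 62, 118, 51]})

# Share of records with an implausible age.
outlier_share = (toy["age"] > 115).mean()

# Keep plausible ages and replace the rest with NaN (assumed to be missing data).
toy["age_clean"] = toy["age"].where(toy["age"] <= 115, np.nan)

print(outlier_share)
print(toy["age_clean"].describe())
```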

In [None]:
data.income.describe()

In [None]:
data.income.hist(bins = 30);
plt.xlabel('Income Range')
plt.ylabel('Count')
plt.title('Income Range Distribution');

### Observation :

- The typical user belongs to the middle income group, i.e. around 65000-70000.

In [None]:
data.gender.value_counts()

In [None]:
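# Look the counts up by label rather than by position, so the result does not depend on the sort order.
gender_counts = data.gender.value_counts()
male_proportion = gender_counts['M'] / data.shape[0] * 100
female_proportion = gender_counts['F'] / data.shape[0] * 100
others_proportion = gender_counts['O'] / data.shape[0] * 100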

male_proportion ,female_proportion ,others_proportion

In [None]:
ax = data.gender.value_counts()
ax.plot(kind='bar')
plt.ylabel('Number of People')
plt.xlabel('Gender')
plt.title('Gender Distribution');

### Observation :

- Males account for more than 50 percent of users.

In [None]:
offer_received = data[data['offer-received'] == 1].offer_type.value_counts()
offer_viewed = data[data['offer-viewed'] == 1].offer_type.value_counts()
offer_completed = data[data['offer-completed'] == 1].offer_type.value_counts()

offer_received , offer_viewed , offer_completed 

In [None]:
plt.subplot(131)
offer_received = data[data['offer-received'] == 1].offer_type.value_counts()
offer_received.plot(kind='bar', figsize=(15,5))
plt.ylabel('counts')
plt.xlabel('Offer Type')
plt.title('Offer received with Offer Type ');

plt.subplot(132)
offer_viewed = data[data['offer-viewed'] == 1].offer_type.value_counts()
offer_viewed.plot(kind='bar' , figsize=(15,5))
plt.ylabel('counts')
plt.xlabel('Offer Type')
plt.title('Offer viewed with Offer Type ');

plt.subplot(133)
offer_completed = data[data['offer-completed'] == 1].offer_type.value_counts()
offer_completed.plot(kind='bar' , figsize=(15,5))
plt.ylabel('counts')
plt.xlabel('Offer Type')
plt.title('Offer completed with Offer Type');


In [None]:
# For BOGO Offer :

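# Index by offer type label instead of position, which depends on the sort order of value_counts().
R = offer_received['bogo']
V = offer_viewed['bogo']
C = offer_completed['bogo']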

view_prop = V/R
com_prop = C/R
R , V , C , view_prop , com_prop

In [None]:
# For DISCOUNT Offer :

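# Index by offer type label instead of position, which depends on the sort order of value_counts().
R = offer_received['discount']
V = offer_viewed['discount']
C = offer_completed['discount']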

view_prop = V/R
com_prop = C/R
R , V , C , view_prop , com_prop

### Observation :

- BOGO offers are in high demand: 30499 BOGO offers were received, 25449 were viewed and 15669 were completed.
- 83 percent of received BOGO offers were viewed.
- 70 percent of received DISCOUNT offers were viewed.
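The view and completion proportions can also be computed for all offer types in one pass. A minimal sketch on invented rows, assuming the same indicator column names as the merged dataset:

```python
import pandas as pd

# Invented event rows with one indicator column per event type.
toy = pd.DataFrame({
    "offer_type": ["bogo", "bogo", "bogo", "discount", "discount"],
    "offer-received": [1, 0, 0, 1, 0],
    "offer-viewed": [0, 1, 0, 0, 1],
    "offer-completed": [0, 0, 1, 0, 0],
})

event_cols = ["offer-received", "offer-viewed", "offer-completed"]

# Summing the indicators gives the number of events of each kind per offer type.
counts = toy.groupby("offer_type")[event_cols].sum()

# Dividing each row by its 'offer-received' count turns counts into rates.
rates = counts.div(counts["offer-received"], axis=0)
print(rates)
```

Grouping by label keeps each offer type paired with its own counts, whatever order `value_counts()` would return them in.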

In [None]:
offer_received = data[data['offer-received'] == 1].offer_id.value_counts()
offer_viewed = data[data['offer-viewed'] == 1].offer_id.value_counts()
offer_completed = data[data['offer-completed'] == 1].offer_id.value_counts()

offer_received , offer_viewed , offer_completed 

In [None]:
plt.subplot(131)
offer_received = data[data['offer-received'] == 1].offer_id.value_counts()
offer_received.plot(kind='bar', figsize=(15,5))
plt.ylabel('counts')
plt.xlabel('Offer Id ')
plt.title('Offer received with Offer Id ');

plt.subplot(132)
offer_viewed = data[data['offer-viewed'] == 1].offer_id.value_counts()
offer_viewed.plot(kind='bar' , figsize=(15,5))
plt.ylabel('counts')
plt.xlabel('Offer Id')
plt.title('Offer viewed with Offer Id ');

plt.subplot(133)
offer_completed = data[data['offer-completed'] == 1].offer_id.value_counts()
offer_completed.plot(kind='bar' , figsize=(15,5))
plt.ylabel('counts')
plt.xlabel('Offer Id')
plt.title('Offer completed with Offer Id');

### Observation :

- Every offer_id was received a roughly equal number of times.
- The viewing ratio is lower for some offer_ids, such as 0, 6, 7 and 5.
- The offer completion ratio is quite decent.

In [None]:
data[data['offer_type']=='bogo'].groupby('customer_id')['offer-received'].count()

In [None]:
data[data['offer_type']=='bogo'].groupby('customer_id')['offer-received'].count().hist();
plt.title('BOGO Offer Received by User');

### Observation :

- The BOGO offer was received by a fairly large number of users.

In [None]:
data[data['offer_type']=='informational'].groupby('customer_id')['offer-viewed'].count()

In [None]:
data[data['offer_type']=='informational'].groupby('customer_id')['offer-viewed'].count().hist();
plt.title('Informational Offer Viewed by User');

### Observation :

- The number of customers with 2 to 4 informational offer views is very high.
- The gap between this group and the rest is extremely large.

In [None]:
data[data['offer_type']=='discount'].groupby('customer_id')['offer-completed'].count()

In [None]:
data[data['offer_type']=='discount'].groupby('customer_id')['offer-completed'].count().hist();
plt.title('Discount Offer Completed by User');

### Observation :

- A very high share of customers completed discount offers 2 to 4 times.

## Modeling and Predictions :

- Encode the 'gender' and 'offer_type' columns as integers (label encoding) as pre-model preparation.

In [None]:
genders = {'O': 0, 'M': 1, 'F': 2}
data['gender'] = data['gender'].map(genders)

In [None]:
data.offer_type.value_counts()

In [None]:
offers = {'bogo': 0, 'discount': 1, 'informational': 2}
data['offer_type'] = data['offer_type'].map(offers)

In [None]:
data.head()

In [None]:
data.columns

In [None]:
X = data.drop(['customer_id', 'event_id' , 'event' , 'became_member_on','offer-completed', 'offer-received',
       'offer-viewed', 'transaction'], axis=1)
Y = data['event_id']

In [None]:
X.head()

In [None]:
Y.head()

In [None]:
X.shape , Y.shape

### Feature Scaling : 

#### Standardization & Normalization

Normalization is a scaling technique in which values are shifted and rescaled so that they end up ranging between 0 and 1. It is also known as Min-Max scaling.

Standardization is another scaling technique where the values are centered around the mean with a unit standard deviation. This means that the mean of the attribute becomes zero and the resultant distribution has a unit standard deviation.
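Both definitions are short enough to write out directly. The sketch below applies them to a few invented income values with NumPy; it uses the population standard deviation, which is also what scikit-learn's `StandardScaler` uses.

```python
import numpy as np

# Invented income values.
income = np.array([30000.0, 50000.0, 70000.0, 90000.0])

# Normalization (min-max scaling): values end up in the range [0, 1].
normalized = (income - income.min()) / (income.max() - income.min())

# Standardization: subtract the mean and divide by the standard deviation,
# giving a column with mean 0 and unit standard deviation.
standardized = (income - income.mean()) / income.std()

print(normalized)
print(standardized)
```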

In [None]:
class_name = ['offer received', 'offer viewed', 'transaction', 'offer completed']

In [None]:
#split the dataset into test and train sets.
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=42)

In [None]:
X_train.shape, X_test.shape, y_train.shape, y_test.shape

In [None]:
std = StandardScaler()

In [None]:
# Fit the scaler on the training columns only; fitting both columns together keeps their statistics separate.
X_train = X_train.reset_index(drop=True)
X_train[['income', 'age']] = std.fit_transform(X_train[['income', 'age']])

In [None]:
# Reuse the training-set statistics on the test set (transform, not fit_transform).
X_test = X_test.reset_index(drop=True)
X_test[['income', 'age']] = std.transform(X_test[['income', 'age']])

In [None]:
X_train.shape, X_test.shape

- Convert the pandas dataframes into NumPy arrays.

In [None]:
X_train = X_train.values
X_test = X_test.values
y_train = y_train.values
y_test = y_test.values

#### Build a Model :

In [None]:
ann = keras.models.Sequential()

In [None]:
ann.add(keras.layers.Dense(6, activation='relu'))
ann.add(keras.layers.Dense(6, activation='relu'))
ann.add(keras.layers.Dense(4, activation = 'softmax'))

In [None]:
ann.compile(optimizer = 'adam', 
            loss = 'sparse_categorical_crossentropy', 
            metrics = ['accuracy'])

In [None]:
ann_history = ann.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=15, batch_size=100)

In [None]:
ann.summary()       
# Summary of our model 

In [None]:
ann.history.params

In [None]:
pd.DataFrame(ann.history.history).plot(figsize=(8,5))
plt.grid(True)
plt.gca().set_ylim(0,1)               # Y AXIS RANGE LIMIT 
plt.show()

### Evaluation Metrics :

- The model is evaluated on the accuracy it achieves and on the loss it produces.
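To make the two metrics concrete, the sketch below computes accuracy and sparse categorical cross-entropy by hand with NumPy. The labels and probabilities are invented; this is not output from the model in this notebook.

```python
import numpy as np

# Invented true classes and predicted probabilities for three samples, four classes.
y_true = np.array([0, 2, 1])
y_prob = np.array([
    [0.7, 0.1, 0.1, 0.1],
    [0.2, 0.2, 0.5, 0.1],
    [0.6, 0.2, 0.1, 0.1],
])

# Accuracy: share of samples whose most probable class matches the true class.
y_pred = y_prob.argmax(axis=1)
accuracy = (y_pred == y_true).mean()

# Sparse categorical cross-entropy: mean negative log of the probability
# assigned to the true class of each sample.
true_class_prob = y_prob[np.arange(len(y_true)), y_true]
loss = -np.log(true_class_prob).mean()

print(accuracy, loss)
```

Here two of the three samples are classified correctly, so the accuracy is 2/3, while the loss also penalises the low probability given to the true class of the third sample.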

In [None]:
ann.evaluate(X_test , y_test)

## Observation :

- The test accuracy is only 25%.
- The accuracy remains constant throughout training.
- The model needs correction and improvement to give better results.
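One thing worth checking, offered as a possibility rather than a confirmed diagnosis, is whether the feature matrix contains missing values: the cleaning step leaves 'amount' empty for every non-transaction event, and that column is still part of X in this first model. A minimal sketch of such a check on an invented frame:

```python
import numpy as np
import pandas as pd

# Invented feature rows with some missing entries.
toy_X = pd.DataFrame({
    "age": [35, 62, 51],
    "income": [72000.0, np.nan, 58000.0],
    "amount": [np.nan, np.nan, 9.5],
})

# Number of missing values in each feature column.
missing_per_column = toy_X.isna().sum()

# Number of rows that contain at least one missing value.
rows_with_missing = toy_X.isna().any(axis=1).sum()

print(missing_per_column)
print(rows_with_missing)
```

If the real X shows missing values, they would need to be imputed or the affected columns dropped before the data is passed to the network.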

## Refinement : Improving Prediction Model 

- Create a new X dataframe that keeps only the most relevant and most dependent features.
- Add more hidden layers.
- Add more hidden units.

In [None]:
data.columns

In [None]:
X = data.drop(['customer_id', 'event_id' ,  'amount','event' , 'became_member_on','offer-completed', 'offer-received',
       'offer-viewed','email', 'mobile', 'social', 'web', 'time','transaction', 'duration'], axis=1)
Y = data['event_id']

In [None]:
X.head()

In [None]:
X.shape , Y.shape

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=42)

In [None]:
X_train.shape, X_test.shape, y_train.shape, y_test.shape

In [None]:
# Fit the scaler on the training columns only; fitting both columns together keeps their statistics separate.
X_train = X_train.reset_index(drop=True)
X_train[['income', 'age']] = std.fit_transform(X_train[['income', 'age']])

In [None]:
# Reuse the training-set statistics on the test set (transform, not fit_transform).
X_test = X_test.reset_index(drop=True)
X_test[['income', 'age']] = std.transform(X_test[['income', 'age']])

In [None]:
# Convert the pandas dataframe into numpy array

X_train = X_train.values
X_test = X_test.values
y_train = y_train.values
y_test = y_test.values

#### Build a model :

In [None]:
ann = keras.models.Sequential()

In [None]:
ann.add(keras.layers.Dense(32, input_dim=7, kernel_initializer = 'normal' ,activation='relu'))
ann.add(keras.layers.Dense(15, kernel_initializer = 'normal' ,activation='relu'))
ann.add(keras.layers.Dense(10, kernel_initializer = 'normal' ,activation='relu'))
ann.add(keras.layers.Dense(6, kernel_initializer = 'normal' ,activation='relu'))
ann.add(keras.layers.Dense(4, kernel_initializer = 'normal' ,activation = 'softmax'))

In [None]:
ann.compile(optimizer = 'adam', 
            loss = 'sparse_categorical_crossentropy', 
            metrics = ['accuracy'])

In [None]:
ann_history = ann.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=15, batch_size=100 , verbose = 2)

In [None]:
ann.summary()

In [None]:
ann.history.params

In [None]:
pd.DataFrame(ann.history.history).plot(figsize=(8,5))
plt.grid(True)
plt.gca().set_ylim(0,1)               # Y AXIS RANGE LIMIT 
plt.show()

### Evaluation Metrics :

- The model is evaluated on the accuracy it achieves and on the loss it produces.

In [None]:
ann.evaluate(X_test , y_test)

In [None]:
# Check the model on new data.
# As we do not have a new data set, take a few rows from the test set.
# The next cells show how to predict class probabilities and classes for unseen data.


x_new = X_test[:3]

In [None]:
# PROBABILITY OF EACH SET

y_prob = ann.predict(x_new)
y_prob.round(2)                        # RESULT IN 2 DECIMAL PLACE 

In [None]:
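# Predicted class of each sample
# (predict_classes was removed in recent TensorFlow releases; take the argmax of the probabilities instead)

y_pred = np.argmax(ann.predict(x_new), axis=-1)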
y_pred

In [None]:
np.array(class_name)[y_pred]

## Observation :

- There is no difference between the first and second models.
- The events are wrongly predicted as 'offer received'.

# Conclusion :

- I found this project challenging, mainly due to the structure of the data in the transcript dataset.
- The majority class performs well but the minority classes do not, which points to an imbalanced dataset.
- Most events are wrongly predicted as 'offer received', which is the most frequently occurring event (class).
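A common way to counter class imbalance is to weight each class inversely to its frequency. This was not tried in this notebook; the sketch below only shows how such weights could be computed, using invented labels.

```python
import numpy as np

# Invented labels: class 0 dominates, classes 2 and 3 are rare.
y = np.array([0, 0, 0, 0, 1, 1, 2, 3])

# Number of samples in each of the four classes.
counts = np.bincount(y, minlength=4)

# "Balanced" weighting: n_samples / (n_classes * count), so rare classes weigh more.
weights = len(y) / (len(counts) * counts)

class_weight = {cls: float(w) for cls, w in enumerate(weights)}
print(class_weight)
```

A dictionary of this form can be passed to Keras through the `class_weight` argument of `fit`, so that mistakes on rare events contribute more to the loss.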

**Main challenges and potential improvement:**
 Analysing and building the deep learning models.
 
 - The main goal I chose was to build something practical that the company could use to make its choices more efficient.
 - However, the model's results are not good: the accuracy does not change and remains constant.

# References : 

- https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.replace.html
- https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.merge.html
- https://stackoverflow.com/questions/37600711/pandas-split-column-into-multiple-columns-by-comma
- https://www.researchgate.net/post/Is_there_a_universal_method_rule_to_choose_the_activation_function_for_a_MLP_neural_network#:~:text=For%20binary%20classification%20(i.e.%20problems,entropy%20as%20the%20cost%20function.
- https://stackoverflow.com/questions/55324762/the-added-layer-must-be-an-instance-of-class-layer-found-tensorflow-python-ke
- https://www.analyticsvidhya.com/blog/2020/04/feature-scaling-machine-learning-normalization-standardization/
- https://www.tensorflow.org/tutorials/keras/classification

In [None]:
from subprocess import call
call(['python', '-m', 'nbconvert', 'Starbucks_Capstone_notebook.ipynb'])