Starbucks-Capstone-Project

Overview and Motivation

In this capstone project, I worked with Starbucks datasets containing simulated data that mimics customer behavior on the Starbucks rewards mobile app. Once every few days, Starbucks sends out an offer to users of the mobile app. An offer can be merely an advertisement for a drink or an actual offer such as a discount or BOGO (buy one, get one free). The task was to combine transaction, demographic and offer data to determine which demographic groups respond best to offers overall. This project derived insights by merging the following 3 datasets:

Rewards program users (17000 users x 5 fields)
Offers sent during 30-day test period (10 offers x 6 fields)
Event log (306648 events x 4 fields)
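
A minimal sketch of how the three tables might be joined, using tiny in-memory stand-ins rather than the real files. The column names follow the field lists under Repository Contents; the sample rows, the `offer_id` column name and the choice of left joins are assumptions for illustration and may differ from what the notebook does.

```python
import pandas as pd

# Tiny stand-ins for the three datasets (all sample values are made up).
profile = pd.DataFrame({
    "id": ["u1", "u2"],
    "gender": ["F", "M"],
    "age": [55, 34],
    "income": [72000.0, 48000.0],
})
portfolio = pd.DataFrame({
    "id": ["o1", "o2"],
    "offer_type": ["bogo", "discount"],
    "difficulty": [10, 7],
    "reward": [10, 3],
})
# Assumes the offer id has already been pulled out of the `value` dictionary
# into a hypothetical `offer_id` column.
transcript = pd.DataFrame({
    "person": ["u1", "u1", "u2"],
    "event": ["offer received", "offer completed", "offer received"],
    "offer_id": ["o1", "o1", "o2"],
    "time": [0, 48, 0],
})

# Rename the two `id` columns so they match the event log and do not collide.
users = profile.rename(columns={"id": "person"})
offers = portfolio.rename(columns={"id": "offer_id"})

# Left joins keep every event, even if a user or offer record is missing.
merged = (
    transcript
    .merge(users, on="person", how="left")
    .merge(offers, on="offer_id", how="left")
)
print(merged[["person", "event", "offer_type", "gender", "income"]])
```

Each row of `merged` is one event carrying both the customer's demographics and the offer's attributes, which is the shape the demographic questions below rely on.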

Problem Statement

I used the CRISP-DM process to understand the datasets and derive insights, so that I could make recommendations to the Starbucks team. Through this capstone project, I wanted to answer the following questions:

How does offer response depend on gender, age and income?
What type of demographic is least affected by offers?
What are the top 2 offers for each customer segment?
If we're given demographic features about a customer, how can we determine if they'll respond to a BOGO or a discount offer?

Analysis Summary

Impact of Gender, Age and Income

Females have a higher offer completion ratio than males.
The offer completion % also increases as the income level increases.
We can’t conclude from this that a higher income causes a higher completion %, because the higher income levels received far fewer offers than the lower income levels.
This might be because there are fewer customers in the higher income brackets. Starbucks should ensure that the higher income brackets receive more offers per customer than they do currently.
Similarly, with respect to age and gender, females and males between ages 48–68 received the most offers, and their offer completion rate is also among the highest.
Offer completion ratios are highest for customers between 98–108 years of age, but I would caution against relying on this statistic because we have very little data for this age segment.
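
The caveat about small groups can be made concrete by reporting the number of offers next to each completion ratio. The sketch below assumes a hypothetical table with one row per offer received and a 0/1 `completed` flag; the sample rows and the income bracket edges are invented for illustration and are not the ones used in the notebook.

```python
import pandas as pd

# Hypothetical per-offer records: one row per offer received,
# with a flag for whether that offer was later completed.
offers = pd.DataFrame({
    "income": [35000, 42000, 48000, 55000, 61000, 66000, 75000, 95000],
    "completed": [0, 1, 0, 0, 1, 0, 1, 1],
})

# Bracket edges are illustrative only.
bins = [30000, 60000, 90000, 120000]
labels = ["30-60k", "60-90k", "90-120k"]
offers["income_bracket"] = pd.cut(offers["income"], bins=bins, labels=labels)

# Count and mean of the 0/1 flag give the sample size and the completion ratio.
summary = (
    offers.groupby("income_bracket", observed=False)["completed"]
    .agg(offers_received="count", completion_ratio="mean")
)
print(summary)
```

In this toy data the top bracket has the highest ratio but only a single offer behind it, which is the same pattern that makes the figures for the highest income levels and the 98–108 age group hard to trust.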

Demographic Least Affected by Offers

Customers with higher income (>$84,000) are mainly concentrated in the medium and low impact categories
Customers between 17 and 30 years of age are also mainly concentrated in the medium and low impact categories

Top 2 Offers for each customer segment

Please refer to the Jupyter notebook for the detailed analysis

Conclusion

Overall, this was a great, challenging capstone project that tested what I had learned over the last 5 months of the course. Looking back at where I started, I would never have imagined being at this point 5 months later.

One of the interesting and difficult aspects of this project was that it was open-ended: it was up to me how I analyzed the data and published my recommendations. It was a little daunting at first, but as I got into the problem it became easier to connect the dots and arrive at the results, and I was able to answer the questions I had at the beginning. It was also very relatable, because real-world problems are open-ended and it is up to the data scientist to figure out how to derive the insights and present them to the audience.

Repository Contents

Data Folder

a. profile.json
- Rewards program users (17000 users x 5 fields)
- gender: (categorical) M, F, O, or null
- age: (numeric) missing value encoded as 118
- id: (string/hash)
- became_member_on: (date) format YYYYMMDD
- income: (numeric)
b. portfolio.json
- Offers sent during 30-day test period (10 offers x 6 fields)
- reward: (numeric) money awarded for the amount spent
- channels: (list) web, email, mobile, social
- difficulty: (numeric) money required to be spent to receive reward
- duration: (numeric) time for offer to be open, in days
- offer_type: (string) bogo, discount, informational
- id: (string/hash)
c. transcript.json (not uploaded here due to its size)
- Event log (306648 events x 4 fields)
- person: (string/hash)
- event: (string) offer received, offer viewed, transaction, offer completed
- value: (dictionary) different values depending on event type
  - offer id: (string/hash) not associated with any "transaction"
  - amount: (numeric) money spent in "transaction"
  - reward: (numeric) money gained from "offer completed"
- time: (numeric) hours after start of test
Starbucks_Capstone_notebook.ipynb: Jupyter Notebook with the Analysis
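
The field descriptions above imply a few cleaning steps before any analysis: treating age 118 as missing, parsing `became_member_on`, and spreading the `value` dictionary into columns. The following is a minimal sketch of those steps on made-up rows. The helper `get_offer_id`, the `offer_id` column name and the fallback lookup for an `offer_id` key are assumptions for illustration, not taken from the notebook.

```python
import numpy as np
import pandas as pd

# Made-up rows in the shape described in the field lists above.
profile = pd.DataFrame({
    "id": ["u1", "u2"],
    "gender": ["F", None],
    "age": [55, 118],
    "became_member_on": [20170715, 20180212],
    "income": [72000.0, np.nan],
})
transcript = pd.DataFrame({
    "person": ["u1", "u1", "u1"],
    "event": ["offer received", "transaction", "offer completed"],
    "value": [{"offer id": "o1"}, {"amount": 12.5}, {"offer id": "o1", "reward": 5}],
    "time": [0, 30, 30],
})

# Age 118 is the placeholder for a missing value, so turn it into NaN.
profile["age"] = profile["age"].mask(profile["age"] == 118)

# became_member_on is stored in YYYYMMDD form; parse it into a real date.
profile["became_member_on"] = pd.to_datetime(
    profile["became_member_on"].astype(str), format="%Y%m%d"
)

def get_offer_id(value):
    """Return the offer id from a `value` dict, or None for transactions.

    The key is listed as "offer id"; "offer_id" is also accepted as a precaution.
    """
    return value.get("offer id", value.get("offer_id"))

# Spread the `value` dictionary into one column per possible key.
transcript["offer_id"] = transcript["value"].apply(get_offer_id)
transcript["amount"] = transcript["value"].apply(lambda v: v.get("amount"))
transcript["reward"] = transcript["value"].apply(lambda v: v.get("reward"))
transcript = transcript.drop(columns="value")

print(profile)
print(transcript)
```

Keeping missing ages as NaN rather than 118 stops the placeholder from being counted as a real age when customers are grouped into age segments.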

Blog Post

I've explained my approach and analysis through a blog on Medium

Libraries

pandas
numpy
math
json
matplotlib
seaborn
sklearn

Authors

Nihal Shah

Acknowledgement

Kaggle Udacity Starbucks
