An analysis of a dataset containing simulated data that mimics customer behavior on the Starbucks rewards mobile app. Different features in the dataset were analyzed, and classification models were built to predict how a user would respond to an offer. For a more descriptive analysis, check out this Medium article that I wrote.
This project was done as the capstone project of the Udacity Data Science Nanodegree.
This project uses the following libraries:
To get a local copy of the project, download the dataset from the data folder and place it in the same folder as the project.
The project has the following:
- Starbucks_Capstone_notebook.ipynb
- data folder
The Jupyter notebook contains all the code and data analysis.
Download the Jupyter notebook from this repository and have fun!
In this project, I analyze data coming out of the Starbucks rewards mobile app and look for trends and relations between user information and offer data. Finally, I build a machine learning model to predict whether a user will respond to an offer or not.
Each user on the application has an account that can include demographic information on the user. A user can make a purchase, receive an offer, view an offer or complete an offer. There are three types of offers that can be sent: buy-one-get-one (BOGO), discount, and informational.
- BOGO: a user needs to spend a certain amount to get a reward equal to that threshold amount.
- Discount: a user gains a reward equal to a fraction of the amount spent.
- Informational: merely an advertisement for a drink.
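To make the three offer types concrete, here is a minimal sketch of how the reward could be computed from the descriptions above. The function name `offer_reward` and its parameters `threshold` and `fraction` are hypothetical and are not taken from the notebook. Treating an informational offer as a zero reward is also an assumption.

```python
def offer_reward(offer_type, amount_spent, threshold=0.0, fraction=0.0):
    """Return the reward for one offer (illustrative only, not the notebook's code)."""
    if offer_type == "bogo":
        # BOGO: the reward equals the spending threshold, once it is reached.
        return threshold if amount_spent >= threshold else 0.0
    if offer_type == "discount":
        # Discount: the reward is a fraction of the amount spent.
        return fraction * amount_spent
    if offer_type == "informational":
        # Assumption: an advertisement carries no monetary reward.
        return 0.0
    raise ValueError(f"unknown offer type: {offer_type!r}")
```

For example, `offer_reward("bogo", 12.0, threshold=10.0)` returns the threshold amount, while the same call with `amount_spent=8.0` returns nothing because the threshold was not reached.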
Problem Statement:
The question we are trying to answer is how a customer responds when an offer is sent to them.
The strategy that we will be following is:
- Data preprocessing and cleaning: we will look deeper at the data and understand its content. The data will then be cleaned of anomalies, null values, and duplicates.
- Data analysis and visualization: data will be further analyzed and visualized to answer more detailed questions relating to our problem.
- Data modelling: we will try to build a machine learning model that predicts whether a user will complete an offer or not. The model is evaluated based on its F1 score.
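Since the model is evaluated on its F1 score, the sketch below shows how that metric is derived from precision and recall for a binary "completed / not completed" label. It is a plain-Python illustration, not the notebook's evaluation code. The function name `f1_score_binary` and the convention that `1` means "offer completed" are assumptions.

```python
def f1_score_binary(y_true, y_pred):
    """Compute the F1 score for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        # No correct positive predictions: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)
```

F1 balances precision and recall, so it can be more informative than plain accuracy when completed and uncompleted offers are not equally frequent.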
Datasets: There are three datasets available:
- portfolio.json - containing offer IDs and metadata about each offer (duration, type, etc.)
- profile.json - demographic data for each customer
- transcript.json - records for transactions, offers received, offers viewed, and offers completed
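As a rough sketch, the three files could be loaded with the standard library as shown below. It assumes the files sit in a `data` folder and contain either a JSON array or one JSON object per line. The helper names `load_records` and `load_datasets` are hypothetical, and the notebook may load the data differently.

```python
import json
from pathlib import Path


def load_records(path):
    """Load a JSON file as a list of dicts (JSON array or one object per line)."""
    text = Path(path).read_text(encoding="utf-8").strip()
    if text.startswith("["):
        # The whole file is a single JSON array.
        return json.loads(text)
    # Otherwise treat each non-empty line as its own JSON object.
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def load_datasets(data_dir="data"):
    """Return a dict mapping each dataset name to its list of records."""
    data_dir = Path(data_dir)
    names = ("portfolio", "profile", "transcript")
    return {name: load_records(data_dir / f"{name}.json") for name in names}
```

Calling `load_datasets()` from the project folder would then give access to `datasets["portfolio"]`, `datasets["profile"]`, and `datasets["transcript"]`.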
Further information on the dataset can be found in the Jupyter notebook.
Distributed under the MIT License. See LICENSE for more information.
Shahed - shahedmashni@gmail.com