Skip to content

Latest commit

 

History

History
112 lines (98 loc) · 6.45 KB

File metadata and controls

112 lines (98 loc) · 6.45 KB

Instacart-Market-Basket-Analysis 🛒

graph

Project Description

Open directly from kaggle Notebooks!

EDA and Interactive Dashboard by Power BI 📊

  • General Analysis.
  • Analyzing Behavour of users: users who always order same products.
  • How Time affects the purchasing behaviour of customers?
  • Analyzing products
  • Analyzing Organic Prodcuts.
  • Purchasing behaviour on Departments and Aisles.
powerBI_dashboard.mp4

Next Order Recommender/Predictor 🍞 🍟 🍩

A predictive analysis model , that predicts the products ordered in users' future order based on each purchasing history. Primary Key is the user-product pair to predict whether will be in the future order or not.

XGBoost Classifier was used.

Features with highest importance used by the model:

  • up_orders_since_last_order: measures how long the user hasn't considered buying a specific product.
  • up_order_rate_since_first_time: measures the degree a user like a product. It's the ratio by which a user will buy a product from the first moment he/she knew about it.
  • prod_reorder_ratio: measures how customers in general like a product.
  • user_reorder_ratio: measures how this user is likely to buy something new! Feature Importance

Association Rules 🍌 ➡️ 🍅

We have used the Apriori algorithm to extract assciation rules embedded in instacart's data. The following functions are implemented to later serve as an API calls in the deployed version:

Function Documentation
most_10_frequent_items Takes the cardinality of the itemset and return the most 10 frequent item-sets of that cardinality.
all_items_with_at_least_support_and_len Returns the itemset of a specific cardinality and satisfying a minimum support.
show_itemset_support Returns the support of a given item-set
rules_with_specific_threshold Return (Filter) rules satisfying a given threshold. Threshold can be on confidence, support, lift, leverage and conviction.
select_rules_with_antecedents_length Return rules with a specific antecedents cardinality.
select_rules_with_antecedents_names Return rules of a specific antecedent.
select_rules_with_consequents_names Return rules of a specific consequent.

Deploying a user use-case

deploying_example.mp4

Project Challenges 🙀

Next Order Recommender

  • Data is sparse, we have very large number of products and of course the customer will have very few in his/her next order. Data is very skewed to the negative class. Class distribution: 90% negative class, 10% positive class.
  • First, we've found that there's alot of false negatives, do We changed the threshold to maximize the recall, while keeping the precision above a certain threshold [0.3].
  • In ther words, we wanted to reduce, the false negatives, the number of products the model say user won't predict in the future while he/she will actually does. On the other side, it's okay to allow some false positives, when the model recommends a products the user will less likely buy in his/her next order.

The PR-Curve

Project Organization

./
├── EDA
|     ├── eda-on-instacart-data.ipynb 
|     └── Instacart Power Bi dashboard.pdf            
├── Model
|     ├── Association Rules Using Apriori.ipynb                                              
|     └── predictive-analysis-model.ipynb 
├── Business Insights 
|     ├── Business Questions-Solution.pdf                                              
|     └── Project-Data Description.pdf
├── Example of Deployment
|     ├── model.html
|     ├── index.html
|     ├── rules.html
|     ├── js
|     └── css
├── Presentation
|     └── DS - Presentation.pptx 
└── images

Todos

  • Applying dynamic thresholding on user's products according to his/her average basket size.
  • Continue developing the time-dependent features.
  • Apply oversampling and undersampling, and observe whether this will help the imbalanced class problem.
  • Deploy the Full Model.

Contributors

Toka Khaled
Toka Khaled
Noran Hany
Noran Hany