
For my capstone project at Juno College. Contains exploratory analysis and a recommender system that uses the K-Nearest Neighbors algorithm and cosine similarity to provide user-user collaborative filtering based product recommendations.

baileywolkoff/Olist-Recommender-System


OLIST Ecommerce - Analysis and Product Recommendation System

This repository is the capstone project for my Juno College Data Analytics Bootcamp. I used Excel, Python, and Tableau to investigate OLIST's ecommerce data and build a product recommendation system for their users. The purpose of this project is to test my investigative analysis and to challenge myself to learn how to use machine learning to build a recommendation system, using a variety of algorithms and techniques. This is a prototype, and I still have much to learn and implement.

  • The accompanying dashboard can be found Here, where I take an overview look at OLIST's KPIs.
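
As a rough illustration of user-user collaborative filtering with nearest neighbours and cosine similarity, here is a minimal sketch in plain NumPy. It is not the project's actual model: the `ratings` matrix, the use of review scores as values, and the `recommend` helper are all hypothetical and chosen only to show the idea.

```python
import numpy as np

# Hypothetical user-item matrix: rows are users, columns are products,
# values are review scores (0 means the user has not bought the product).
ratings = np.array([
    [5, 4, 0, 0],  # user 0
    [4, 5, 5, 0],  # user 1
    [0, 0, 4, 5],  # user 2
    [5, 5, 0, 4],  # user 3
], dtype=float)


def cosine_similarity_matrix(matrix):
    """Pairwise cosine similarity between the rows of `matrix`."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing by zero for users with no history
    unit_rows = matrix / norms
    return unit_rows @ unit_rows.T


def recommend(user, ratings, k=2, n=2):
    """Recommend up to `n` unseen products using the `k` most similar users."""
    k = min(k, len(ratings) - 1)  # cannot have more neighbours than other users
    sims = cosine_similarity_matrix(ratings)[user].copy()
    sims[user] = -np.inf  # a user is never their own neighbour
    neighbours = np.argsort(sims)[::-1][:k]
    # Score each product by the similarity-weighted sum of neighbour ratings.
    scores = sims[neighbours] @ ratings[neighbours]
    scores[ratings[user] > 0] = -np.inf  # hide products the user already has
    ranked = np.argsort(scores)[::-1][:n]
    return [int(i) for i in ranked if np.isfinite(scores[i]) and scores[i] > 0]


print(recommend(0, ratings))  # product indices, best first
```

User 0's closest neighbours here are users 3 and 1, so the products those two rated highly that user 0 has not bought are the ones suggested.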

The Data

The data was taken from Kaggle, where OLIST uploaded real ecommerce data. Product, seller, and customer IDs are used in place of names to protect the privacy of everyone involved.

The data came as a set of multiple CSV files, linked to one another by keys. A data schema accompanied the dataset to make these links easier to follow.

The files used are:

  1. customers_dataset.csv
  2. geolocation_dataset.csv
  3. order_item_dataset.csv
  4. order_payments_dataset.csv
  5. order_reviews.csv
  6. orders_dataset.csv
  7. products_dataset.csv
  8. sellers_dataset.csv
  9. product_category_name_translation.csv
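
To show how the files link together through their keys, here is a minimal pandas sketch. The small in-memory frames stand in for the CSV files, and the column names (`order_id`, `customer_id`, `product_id`, `product_category_name`) are assumptions based on the public Kaggle schema, so check them against the actual files.

```python
import pandas as pd

# Tiny stand-ins for the CSV files; in practice these would be loaded with pd.read_csv.
orders = pd.DataFrame({
    "order_id": ["o1", "o2"],
    "customer_id": ["c1", "c2"],
})
order_items = pd.DataFrame({
    "order_id": ["o1", "o1", "o2"],
    "order_item_id": [1, 2, 1],
    "product_id": ["p1", "p2", "p1"],
})
products = pd.DataFrame({
    "product_id": ["p1", "p2"],
    "product_category_name": ["perfumaria", "artes"],
})

# Follow the keys: order items -> orders (order_id), then -> products (product_id).
# validate="many_to_one" raises if a key is unexpectedly duplicated on the right side.
merged = (
    order_items
    .merge(orders, on="order_id", how="left", validate="many_to_one")
    .merge(products, on="product_id", how="left", validate="many_to_one")
)
print(merged)
```

Using left joins from the order items keeps one row per purchased item, which is a convenient grain for both the analysis and the recommender.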

Ethical Concerns

The data uses unidentifiable keys to protect customer and seller privacy. It should be noted, however, that a recommendation system raises its own concerns, as it may suggest a product that is inappropriate for a given user. This is currently unavoidable, as I have no information on the user or the actual product name. I am also missing a lot of information on customers and only have the rough location (city) from which they placed their order.

References

  • Big thank you to Yohan Jeong, whose article on item-based collaborative filtering I followed very closely for my preliminary model.
  • Krish Naik, whose YouTube videos helped me understand collaborative filtering.
  • Anna Barentz, whose dashboard inspired me.

Notes:

  • order_item_id is the sequence number of an item within its order (1, 2, 3, ...); it goes above 1 when a customer ordered more than one item, including more than one of the same product, in that order

    • to find how many items were ordered in each order: groupby('order_id')['order_item_id'].max()
  • want to look at the most purchased category by month

    • need to join with orders to find the purchase month
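
The notes above can be sketched in pandas roughly as follows. The toy frames and the column names `order_purchase_timestamp` and `product_category_name` are assumptions based on the public Kaggle schema, not code taken from this project.

```python
import pandas as pd

# Toy data standing in for the order items, orders, and products files.
order_items = pd.DataFrame({
    "order_id": ["o1", "o1", "o2", "o3"],
    "order_item_id": [1, 2, 1, 1],
    "product_id": ["p1", "p1", "p2", "p2"],
})
orders = pd.DataFrame({
    "order_id": ["o1", "o2", "o3"],
    "order_purchase_timestamp": [
        "2018-01-05 10:00:00", "2018-01-20 12:30:00", "2018-02-02 09:15:00",
    ],
})
products = pd.DataFrame({
    "product_id": ["p1", "p2"],
    "product_category_name": ["toys", "books"],
})

# Items per order: the highest order_item_id within each order.
items_per_order = order_items.groupby("order_id")["order_item_id"].max()

# Units of each product within an order: count rows per (order, product).
units_per_product = order_items.groupby(["order_id", "product_id"]).size()

# Join with orders to get the purchase month, and with products for the category.
orders["purchase_month"] = pd.to_datetime(
    orders["order_purchase_timestamp"]
).dt.to_period("M")
joined = order_items.merge(orders, on="order_id").merge(products, on="product_id")

# Count items per (month, category), then keep the largest category in each month.
counts = (
    joined.groupby(["purchase_month", "product_category_name"])
    .size()
    .reset_index(name="items_sold")
)
top_category = (
    counts.sort_values(["purchase_month", "items_sold"], ascending=[True, False])
    .drop_duplicates("purchase_month")
    .set_index("purchase_month")["product_category_name"]
)
print(top_category)
```

Ties within a month are broken arbitrarily by the sort here; a real analysis might want to report all tied categories instead.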
