Yelp Recommender System

Implementation to predict users' ratings on Yelp dataset based on various recommender system implementations.

This readme is a brief tutorial on how to run the project and re-produce the results in the report. In case of any question during running, please kindly contact chenqy@umich.edu and yilegu@umich.edu.

Pre-requisite:

Dataset: Yelp dataset (https://www.yelp.com/dataset), unzipped and put in the directory ./dataset. In ./dataset there should be five .json files. For example, the path to one of the .json file will be ./dataset/yelp_academic_dataset_business.json. To download the generated csv and json files, please visit our shared folder on Google Drive (https://drive.google.com/drive/folders/1xzkQkUKBEKK87HafFh0gs1etXGCUaE_7?usp=sharing).
Packages: Many of these are quite common packages and are likely to have been installed on users' platforms. In the case of missing a package, pip should install them fine. json numpy pandas collections matplotlib dataprocessing surprise lightfm sklearn plotly datapane

How to run the project:

Run "python3 filter_GA_restaurants_and_users.py" to get two files, "GA_restaurants.json" and "GA_users.json". This step filters our data of interest and limit the user and business data to a reasonable range as specified in the report. Note that the processed json files are included as part of the dataset.
Run "python3 GA_resturants_and_users_to_index.py" to create a mapping of users and businesses to their corresponding indices. You should have "GA_restaurants_indices.json" and "GA_users_indices.json" by then. Note that the processed json files are included as part of the dataset.
"python3 datapreprocessing.py" defines several useful functions for processing the filtered data. Run function "GA_processing()" to create the rating matrix R. It should generate a file called "GA.csv". Note that due to the size of generated csv file, it is not included in the zip file.
Run "python3 baseline_model.py" to use the baseline model and get corresponding metrics like RMSE. The results will be written to a file called "baseline_model_result.txt".
Run "python3 SVD_model.py" to search for hyperparameters and get the best set of hyperparameters for SVD model. Training results with RMSE metrics will be written to a file called "SVD_model_result.txt".
Run "python3 NMF_model.py" to search for hyperparameters and get the best set of hyperparameters for NMF model. Training results with RMSE metrics will be written to a file called "NMF_model_result.txt".
Run "python3 lightFM_model_FeatureBased.py" to train the feature-based lightFM model and get corrsponding metrics like AUC score. Additionally, the code will generate the most similar categories for "American (Traditional)", "Japanese" and "Cafes".
When needed, run "python3 prediction_recall_k.py" with the best set of hyperparamters to get metrics like AUC score for baseline, SVD and NMF models.
"python3 visualization.py" defines several functions to generate the plots in the report. Run function "generate_embeddings()" to generate the 2D embedding for SVD model.

MAKEFILE Usage

Do not run make commands on MAKEFILE since it is written in a bash style. Also note that some of the python files (specified in the previous section) only define useful functions for users to run. In order to run the project correctly, the user has to define the functions he or she wants to run in the corresponding python file.

Name		Name	Last commit message	Last commit date
Latest commit History 23 Commits
CODE		CODE
plots		plots
GA_restaurants.json		GA_restaurants.json
GA_restaurants_indices.json		GA_restaurants_indices.json
GA_resturants_and_users_to_index.py		GA_resturants_and_users_to_index.py
GA_users.json		GA_users.json
GA_users_indices.json		GA_users_indices.json
MAKEFILE		MAKEFILE
NMF_model.py		NMF_model.py
README.md		README.md
SVD_model.py		SVD_model.py
SVD_model_result.txt		SVD_model_result.txt
SVD_model_result2.txt		SVD_model_result2.txt
baseline.py		baseline.py
baseline_model.py		baseline_model.py
baseline_model_result.txt		baseline_model_result.txt
baseline_results.txt		baseline_results.txt
baseline_results_new.txt		baseline_results_new.txt
dataprocessing.py		dataprocessing.py
filter_GA_restaurants_and_users.py		filter_GA_restaurants_and_users.py
form_input_matrix.py		form_input_matrix.py
lightFM_model_CF.py		lightFM_model_CF.py
lightFM_model_FeatureBased.py		lightFM_model_FeatureBased.py
prediction_recall_k_solver.py		prediction_recall_k_solver.py
report.pdf		report.pdf
run.sh		run.sh
visualization.py		visualization.py

IKACE/Yelp-Recommender-System

Folders and files

Latest commit

History

Repository files navigation

Yelp Recommender System

Pre-requisite:

How to run the project:

MAKEFILE Usage

Results

Business Per State

2D Embedding of Restaurants

Scores

About

Resources

Stars

Watchers

Forks

Languages