This repository contains Python scripts that implement collaborative filtering using Matrix Factorization with Alternating Least Squares (MF-ALS) for hotels and restaurants, Restricted Boltzmann Machines (RBM) for attractions, and content-based filtering using cosine similarity for the "More Like This" feature. It also includes a user-profiling script that addresses the cold start problem and an itinerary-planning script that works from a starting address.
To run the scripts, you'll need:
- Python 3.x
- NumPy
- Pandas
- Scikit-learn
- TensorFlow
- Keras
- Geopy
You can install the required libraries using pip. To install all the libraries at once, run the following command:
pip install numpy pandas scikit-learn tensorflow keras geopy
Collaborative filtering is a popular method for building recommendation systems. It rests on the idea that users with similar preferences are likely to have similar opinions on items.
Matrix Factorization decomposes a matrix into lower-dimensional matrices. It is used to uncover latent factors from a given matrix of data, such as a user-item rating matrix.
The scripts will train a collaborative filtering model using MF-ALS, and then generate recommendations for each user based on their past ratings.
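As an illustration of the alternating step, here is a minimal NumPy sketch, not the repository's implementation. It assumes a dense toy rating matrix in which 0 means "not rated"; the names `als_factorize` and `recommend`, the regularization value, and the number of factors are all hypothetical choices made for this example.

```python
import numpy as np


def als_factorize(ratings, mask, n_factors=2, reg=0.1, n_iters=20, seed=0):
    """Approximate `ratings` (users x items) by U @ V.T using ALS.

    `mask` holds 1 where a rating is observed and 0 elsewhere.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = ratings.shape
    U = rng.normal(scale=0.1, size=(n_users, n_factors))
    V = rng.normal(scale=0.1, size=(n_items, n_factors))
    ridge = reg * np.eye(n_factors)

    for _ in range(n_iters):
        # Hold V fixed and solve a small ridge regression for each user.
        for u in range(n_users):
            obs = mask[u] > 0
            if obs.any():
                Vo = V[obs]
                U[u] = np.linalg.solve(Vo.T @ Vo + ridge, Vo.T @ ratings[u, obs])
        # Hold U fixed and solve the same kind of problem for each item.
        for i in range(n_items):
            obs = mask[:, i] > 0
            if obs.any():
                Uo = U[obs]
                V[i] = np.linalg.solve(Uo.T @ Uo + ridge, Uo.T @ ratings[obs, i])
    return U, V


def recommend(U, V, mask, user, top_n=2):
    """Return indices of the user's unrated items, best predicted score first."""
    scores = U[user] @ V.T
    unrated = np.flatnonzero(mask[user] == 0)
    return unrated[np.argsort(scores[unrated])[::-1]][:top_n]


# Toy data (made up): 4 users x 4 items, 0 = not rated.
R = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4],
              [0, 1, 4, 5]], dtype=float)
mask = (R > 0).astype(float)
U, V = als_factorize(R, mask, n_iters=50)
print(recommend(U, V, mask, user=0))
```

Each inner solve is an ordinary regularized least-squares problem, which is why the method alternates: with one factor matrix fixed, the other has a closed-form solution.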
Restricted Boltzmann Machines (RBMs) are a type of artificial neural network used for unsupervised learning.
The script will train an RBM model, and then generate recommendations for each user based on their past ratings.
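To make the training loop concrete, here is a small NumPy sketch of a binary RBM trained with one step of contrastive divergence (CD-1). It is an assumption-laden illustration rather than the repository's model: the class name `TinyRBM`, the binary "liked / not liked" encoding, and the hyperparameters are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class TinyRBM:
    """Binary RBM trained with CD-1 (illustrative sketch)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = self.rng.normal(scale=0.01, size=(n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)
        self.b_h = np.zeros(n_hidden)

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def train(self, data, epochs=200, lr=0.1):
        n = data.shape[0]
        for _ in range(epochs):
            # Positive phase: hidden activations driven by the data.
            h0 = self.hidden_probs(data)
            h_sample = (self.rng.random(h0.shape) < h0).astype(float)
            # Negative phase: one Gibbs step back to the visible layer.
            v1 = self.visible_probs(h_sample)
            h1 = self.hidden_probs(v1)
            # CD-1 update: data statistics minus reconstruction statistics.
            self.W += lr * (data.T @ h0 - v1.T @ h1) / n
            self.b_v += lr * (data - v1).mean(axis=0)
            self.b_h += lr * (h0 - h1).mean(axis=0)

    def reconstruct(self, v):
        return self.visible_probs(self.hidden_probs(v))


def recommend_attractions(rbm, user_vector, top_n=2):
    """Rank attractions the user has not interacted with by reconstructed probability."""
    probs = rbm.reconstruct(user_vector[None, :])[0]
    unseen = np.flatnonzero(user_vector == 0)
    return unseen[np.argsort(probs[unseen])[::-1]][:top_n]


# Toy data (made up): rows are users, columns are attractions, 1 = liked.
likes = np.array([[1, 1, 0, 0, 1],
                  [1, 1, 0, 0, 0],
                  [0, 0, 1, 1, 0],
                  [0, 0, 1, 1, 1]], dtype=float)
rbm = TinyRBM(n_visible=5, n_hidden=3)
rbm.train(likes)
print(recommend_attractions(rbm, likes[1]))
```

The visible layer holds one unit per attraction, so reconstructing a user's vector yields a probability for every attraction, including those the user has not rated.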
Content-based filtering is used in the system to recommend places similar to those the user selects. The results are shown in a section called "More Like This" below each place the user views.
Cosine Similarity is used in this system to determine relevant hotels, restaurants, and attractions based on similarities between:
- Keyword descriptions for attractions
- Amenities for hotels
- Cuisine types for restaurants
The script generates a list of similar places based on the features of the entered place.
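The idea can be sketched with plain NumPy: encode each place's features as a multi-hot vector, then rank the other places by cosine similarity to the selected one. The function names and the sample hotels below are hypothetical and used only for illustration; scikit-learn's `cosine_similarity` computes the same matrix.

```python
import numpy as np


def multi_hot(feature_lists):
    """Turn lists of feature strings into a (places x vocabulary) 0/1 matrix."""
    vocab = sorted({f for feats in feature_lists for f in feats})
    index = {f: j for j, f in enumerate(vocab)}
    X = np.zeros((len(feature_lists), len(vocab)))
    for i, feats in enumerate(feature_lists):
        for f in feats:
            X[i, index[f]] = 1.0
    return X


def cosine_similarity_matrix(X):
    """Pairwise cosine similarity between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing by zero for places with no features
    Xn = X / norms
    return Xn @ Xn.T


def more_like_this(names, feature_lists, query, top_n=2):
    """Return the names of the places most similar to `query`."""
    sims = cosine_similarity_matrix(multi_hot(feature_lists))
    q = names.index(query)
    order = [i for i in np.argsort(sims[q])[::-1] if i != q]
    return [names[i] for i in order[:top_n]]


# Made-up sample data: hotel names and their amenities.
hotels = ["Hotel A", "Hotel B", "Hotel C"]
amenities = [["pool", "wifi", "spa"], ["pool", "wifi"], ["parking"]]
print(more_like_this(hotels, amenities, "Hotel A", top_n=1))  # ['Hotel B']
```

The same functions apply to cuisine types for restaurants or keywords for attractions; only the feature lists change.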
A major obstacle for collaborative filtering is the cold start problem: the difficulty of making recommendations when users or items are new (new user or new item). It also occurs in a new system that has no existing users (new system). Two techniques were used to tackle this problem: user profiling and surveying. Surveying was done using Google Forms.
Each category has its own user profile. At the start, the user chooses a set of features they like (e.g. preferred amenities in hotels), and a user-item CSV is then created for the models to train on.
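One possible way to turn a profile into user-item rows is sketched below with pandas. The scoring rule (a rating scaled by the share of an item's features the user selected), the column names, and the function name are assumptions made for this example; the repository's actual rule may differ.

```python
import io

import pandas as pd


def build_user_item_rows(user_id, liked_features, item_features, max_rating=5):
    """Create pseudo-ratings for a new user from the features they selected.

    `item_features` maps an item id to the set of features that item has.
    Hypothetical rule: rating = 1 + overlap * (max_rating - 1), where overlap
    is the fraction of the item's features the user likes. Items with no
    overlap are skipped.
    """
    liked = set(liked_features)
    rows = []
    for item_id, feats in item_features.items():
        feats = set(feats)
        if not feats:
            continue
        overlap = len(liked & feats) / len(feats)
        if overlap > 0:
            rating = round(1 + overlap * (max_rating - 1), 1)
            rows.append({"user_id": user_id, "item_id": item_id, "rating": rating})
    return pd.DataFrame(rows, columns=["user_id", "item_id", "rating"])


# Made-up sample data: hotel ids and their amenities.
hotel_features = {
    "h1": {"pool", "wifi"},
    "h2": {"parking"},
    "h3": {"pool", "spa", "gym", "wifi"},
}
profile_df = build_user_item_rows("new_user_1", ["pool", "wifi"], hotel_features)

# Write to an in-memory buffer here; pass a file path to to_csv to save to disk.
buffer = io.StringIO()
profile_df.to_csv(buffer, index=False)
print(buffer.getvalue())
```

Rows produced this way have the same shape as ordinary rating records, so they can be appended to the training data before the models are fitted.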
The script prompts you to enter a starting address and uses the Geopy library to convert it to latitude and longitude coordinates. It then generates an itinerary based on your starting point, number of days, average start time, average end time, and the recommendations generated by the collaborative filtering and RBM models.
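The flow described above might look roughly like the following sketch. Geocoding uses Geopy's Nominatim geocoder, which needs network access; the planning step is a simple greedy nearest-neighbour heuristic chosen for illustration, not necessarily the repository's algorithm. The function names, the fixed visit duration, and the sample coordinates are hypothetical.

```python
import math


def geocode_address(address, user_agent="itinerary-sketch"):
    """Convert an address to (latitude, longitude) with Geopy (requires network)."""
    from geopy.geocoders import Nominatim  # imported lazily; only needed here

    location = Nominatim(user_agent=user_agent).geocode(address)
    if location is None:
        raise ValueError(f"Could not geocode address: {address!r}")
    return (location.latitude, location.longitude)


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def plan_itinerary(start, places, n_days, start_hour=9, end_hour=18, visit_hours=2.0):
    """Split recommended places into days, always visiting the nearest one next.

    `places` is a list of (name, (lat, lon)) tuples. Assumption: every visit
    takes `visit_hours`, and travel time is ignored.
    """
    per_day = max(1, int((end_hour - start_hour) // visit_hours))
    remaining = list(places)
    days = []
    for _ in range(n_days):
        current, day = start, []
        while remaining and len(day) < per_day:
            nearest = min(remaining, key=lambda p: haversine_km(current, p[1]))
            remaining.remove(nearest)
            day.append(nearest[0])
            current = nearest[1]
        days.append(day)
    return days


# Made-up coordinates; in practice `start` would come from geocode_address(...).
start = (0.0, 0.0)
places = [("A", (0.0, 0.1)), ("B", (0.0, 0.3)), ("C", (0.0, 0.2)), ("D", (0.0, 1.0))]
print(plan_itinerary(start, places, n_days=2, start_hour=9, end_hour=13))
# [['A', 'C'], ['B', 'D']]
```

Each day restarts from the starting address, which matches the idea of returning to the same base every evening.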
The data used for the collaborative filtering and RBM models is stored in CSV files. These include files with information about hotels, restaurants, and attractions, as well as user-ratings files.