Welcome to RecDB
RecDB is an open-source recommendation engine built entirely inside PostgreSQL 9.2. RecDB allows application developers to build recommendation applications in a heartbeat through a variety of built-in recommendation algorithms, such as user-user collaborative filtering, item-item collaborative filtering, and singular value decomposition. Applications powered by RecDB can produce online, flexible, personalized recommendations for end users.
How to Get Source Code
You can check out the code, as follows:
$ git clone https://github.com/Sarwat/recdb-postgresql.git
Recommended Machine Specifications
RecDB is designed to run on a Unix operating system. At least 1 GB of RAM is recommended for most queries; very large data sets may need more, especially when you are not working with a priori (materialized) recommenders.
Building and Installation
Once you've cloned the repository, the folder should contain the source code for PostgreSQL, as well as several Perl scripts in a directory named "./PostgreSQL/scripts/". If you are familiar with installing PostgreSQL from source, RecDB is installed in exactly the same way; for simplicity, however, we have included scripts that streamline the process. Note that the installation and remake-related scripts MUST be run from within the PostgreSQL folder in order to work correctly.
- Run the installation script install.pl.
perl scripts/install.pl [abs_path]
[abs_path] is the absolute path to the directory where you want PostgreSQL installed. The directory should exist before running this script. This will also create a folder "data" in the PostgreSQL folder; this is where the database will be located.
- Run the database server script pgbackend.pl.
perl scripts/pgbackend.pl
The install.pl script stores the install path in a separate file, so there shouldn't be any need to specify it.
- In a second terminal, run the database interaction script pgfrontend.pl.
perl scripts/pgfrontend.pl [db_name] [server_host]
[db_name] is the name of the database that you intend to use. [server_host] is the address of the host server running the PostgreSQL backend. If this option is not specified, the script assumes it to be "localhost".
If you need to rebuild PostgreSQL, there are two options.
If you have not modified the grammar, you can do a quick rebuild with remake.pl.
If you have modified the grammar, you will need to do a longer rebuild with remakefull.pl.
perl scripts/remakefull.pl [abs_path]
[abs_path] is the absolute path to the directory where you want PostgreSQL installed. The directory should exist before running this script.
If you ever want to eliminate the current database, use the clean.pl script.
perl scripts/clean.pl [db_name] [server_host]
How RecDB Works
We provide the MovieLens data to build a "Hello-World" movie recommendation application using RecDB. You can load the data using the SQL script "initmovielens1mdatabase.sql", stored in the "./PostgreSQL" directory. The dataset is provided in the "./PostgreSQL/moviedata/MovieLens1M/" directory. For instance, the ratings table (i.e., ml_ratings) may have a schema as follows:
+--------+--------+-----------+
| userid | itemid | ratingval |
+--------+--------+-----------+
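As a rough illustration of this three-column layout, the sketch below creates a table with the same column names in an in-memory SQLite database using Python's standard library. SQLite is only a stand-in for PostgreSQL here, and the column types and sample rows are assumptions made for the example; the actual definitions live in the SQL loading script.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; the column types are assumed,
# not taken from RecDB's loading script.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ml_ratings (userid INTEGER, itemid INTEGER, ratingval REAL)"
)

# Hypothetical sample ratings: (userid, itemid, ratingval).
sample_rows = [(1, 10, 5.0), (1, 20, 3.0), (2, 10, 4.0)]
conn.executemany("INSERT INTO ml_ratings VALUES (?, ?, ?)", sample_rows)

# PRAGMA table_info returns one row per column; index 1 holds the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(ml_ratings)")]
print(columns)
```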
Users may create recommenders a priori so that when a recommendation query is issued, it can be answered with less latency. The user needs to specify the ratings table in the ON clause and also specify which columns of that table hold the user, item, and rating values. Moreover, the user has to designate, in the USING clause, the recommendation algorithm used to predict item ratings.
CREATE RECOMMENDER MovieRec ON ml_ratings USERS FROM userid ITEMS FROM itemid EVENTS FROM ratingval USING ItemCosCF
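For applications that generate this statement programmatically, a small helper can make the clause order explicit. The following is a minimal sketch, not part of RecDB: the function name `create_recommender_sql` and its parameters are hypothetical, and the set of accepted algorithm names is copied from the list in this section.

```python
# Algorithm names taken from this README; the helper itself is hypothetical.
ALGORITHMS = {"ItemCosCF", "ItemPearCF", "UserCosCF", "UserPearCF", "SVD"}


def create_recommender_sql(name, table, user_col, item_col, rating_col, algorithm):
    """Assemble a CREATE RECOMMENDER statement in the clause order shown above."""
    if algorithm not in ALGORITHMS:
        raise ValueError(f"unknown algorithm: {algorithm}")
    # Identifiers are interpolated directly, so only pass trusted names.
    return (
        f"CREATE RECOMMENDER {name} ON {table} "
        f"USERS FROM {user_col} ITEMS FROM {item_col} "
        f"EVENTS FROM {rating_col} USING {algorithm}"
    )


statement = create_recommender_sql(
    "MovieRec", "ml_ratings", "userid", "itemid", "ratingval", "ItemCosCF"
)
print(statement)
```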
Currently, the available recommendation algorithms that could be passed to the USING clause are the following:
- ItemCosCF: Item-Item Collaborative Filtering using the Cosine Similarity measure.
- ItemPearCF: Item-Item Collaborative Filtering using the Pearson Correlation Similarity measure.
- UserCosCF: User-User Collaborative Filtering using the Cosine Similarity measure.
- UserPearCF: User-User Collaborative Filtering using the Pearson Correlation Similarity measure.
- SVD: Simon Funk Singular Value Decomposition.
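To give a feel for the two similarity measures named above, here is a minimal Python sketch of cosine similarity and Pearson correlation over sparse rating vectors. It shows the textbook definitions under stated assumptions (cosine treats missing ratings as zero, Pearson uses only co-rated entries); the exact formulas RecDB uses internally are not described here, and the sample ratings are made up.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two sparse rating vectors (dicts keyed by user id)."""
    dot = sum(a[u] * b[u] for u in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pearson(a, b):
    """Pearson correlation computed over the users who rated both items."""
    common = a.keys() & b.keys()
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[u] for u in common) / len(common)
    mean_b = sum(b[u] for u in common) / len(common)
    cov = sum((a[u] - mean_a) * (b[u] - mean_b) for u in common)
    var_a = sum((a[u] - mean_a) ** 2 for u in common)
    var_b = sum((b[u] - mean_b) ** 2 for u in common)
    return cov / sqrt(var_a * var_b) if var_a and var_b else 0.0


# Hypothetical ratings for two items, keyed by user id.
item_x = {1: 5.0, 2: 3.0, 3: 4.0}
item_y = {1: 4.0, 2: 2.0, 3: 3.0}

print(round(cosine(item_x, item_y), 4))
print(round(pearson(item_x, item_y), 4))
```

For the item-item variants the vectors are keyed by user, as here; for the user-user variants the same functions apply to per-user vectors keyed by item.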
Similarly, materialized recommenders can be removed with the following command:
DROP RECOMMENDER MovieRec
Note that if you query a materialized recommender, the three columns listed above will be the only ones returned, and attempting to reference any additional columns will result in an error.
In the recommendation query, the user needs to specify the ratings table and which of its columns hold the user, item, and rating values. Moreover, the user has to designate the recommendation algorithm to be used to predict item ratings. For example, if ml_ratings(userid,itemid,ratingval) represents the ratings table in a movie recommendation application, then to recommend the top 10 movies to user 1, based on ratings predicted by the Item-Item Collaborative Filtering algorithm (with the cosine similarity measure), the user writes the following SQL:
SELECT * FROM ml_ratings R RECOMMEND R.itemid TO R.userid ON R.ratingval USING ItemCosCF WHERE R.userid = 1 ORDER BY R.ratingval LIMIT 10
When you issue a query such as this, the only interesting data will come from the three columns specified in the RECOMMEND clause. Any other columns that exist in the specified ratings table will be set to 0.
Note that if you do not specify which user(s) you want recommendations for, it will generate recommendations for all users, which can take an extremely long time to finish.
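Conceptually, a top-k recommendation for a single user predicts a rating for each item the user has not rated, then keeps the k highest. The sketch below illustrates this with a similarity-weighted average over the user's own ratings, a common textbook form of item-item collaborative filtering. It is an assumption-laden illustration, not RecDB's implementation; the function `recommend` and the sample ratings are hypothetical.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two sparse vectors (dicts keyed by user id)."""
    dot = sum(a[u] * b[u] for u in a.keys() & b.keys())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def recommend(ratings, user, k=10):
    """Return up to k (itemid, predicted rating) pairs for items the user has not rated."""
    # Build one sparse vector per item: itemid -> {userid: ratingval}.
    items = {}
    for (u, i), r in ratings.items():
        items.setdefault(i, {})[u] = r
    rated = {i: r for (u, i), r in ratings.items() if u == user}
    predictions = []
    for candidate, vector in items.items():
        if candidate in rated:
            continue
        sims = {j: cosine(vector, items[j]) for j in rated}
        weight = sum(abs(s) for s in sims.values())
        if weight == 0:
            continue  # no overlap with anything the user rated
        score = sum(sims[j] * rated[j] for j in rated) / weight
        predictions.append((candidate, score))
    # Highest predicted rating first, then truncate to k.
    predictions.sort(key=lambda pair: pair[1], reverse=True)
    return predictions[:k]


# Hypothetical ratings keyed by (userid, itemid).
ratings = {
    (1, 10): 5.0, (1, 20): 1.0,
    (2, 10): 4.0, (2, 20): 2.0, (2, 30): 5.0,
    (3, 10): 1.0, (3, 20): 5.0, (3, 40): 4.0,
}
top = recommend(ratings, user=1, k=10)
print(top)
```

Restricting the computation to one user, as the WHERE clause does in the query above, avoids scoring every user-item pair.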
More Complex Queries
The main benefit of implementing the recommendation functionality inside a database engine (PostgreSQL) is that it integrates with traditional database operations, e.g., selection, projection, and join. For example, the following query recommends the top 10 Comedy movies to user 1. To do that, the query joins the recommendation with the Movies table and applies a filter on the movie genre column (genre LIKE '%Comedy%').
SELECT * FROM ml_ratings R, Movies M RECOMMEND R.itemid TO R.userid ON R.ratingval USING ItemCosCF WHERE R.userid = 1 AND M.movieid = R.itemid AND M.genre LIKE '%Comedy%' ORDER BY R.ratingval LIMIT 10
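The relational part of this query (join, genre filter, ordering, limit) can be tried out independently of RecDB. The sketch below uses Python's built-in sqlite3 module with a hypothetical `predictions` table standing in for the output of the RECOMMEND clause. The `name` column and all sample rows are assumptions; only movieid and genre come from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the recommender output: predicted ratings per user and item.
conn.execute("CREATE TABLE predictions (userid INTEGER, itemid INTEGER, ratingval REAL)")
# The name column is assumed for readability; movieid and genre appear in the README query.
conn.execute("CREATE TABLE movies (movieid INTEGER, name TEXT, genre TEXT)")
conn.executemany(
    "INSERT INTO predictions VALUES (?, ?, ?)",
    [(1, 10, 4.6), (1, 20, 3.9), (1, 30, 4.8), (2, 10, 2.5)],
)
conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?)",
    [(10, "Movie A", "Comedy|Romance"), (20, "Movie B", "Comedy"), (30, "Movie C", "Drama")],
)

# Join predictions with movie metadata, keep comedies for user 1, best first.
rows = conn.execute(
    """
    SELECT M.name, P.ratingval
    FROM predictions P JOIN movies M ON M.movieid = P.itemid
    WHERE P.userid = ? AND M.genre LIKE '%Comedy%'
    ORDER BY P.ratingval DESC
    LIMIT 10
    """,
    (1,),
).fetchall()
print(rows)
```

The highest-scoring item for user 1 (Movie C) is dropped by the genre filter, which is the kind of combination of prediction and selection that the in-database approach is meant to support.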
Mohamed Sarwat, James L. Avery, and Mohamed F. Mokbel. RecDB in Action: Recommendation Made Easy in Relational Databases. Proceedings of the VLDB Endowment (PVLDB) 6(12): 1242-1245, 2013.
Mohamed Sarwat, James L. Avery, and Mohamed F. Mokbel. Recathon: A Middleware for Context-Aware Recommendation in Database Systems. Proceedings of the IEEE International Conference on Mobile Data Management (MDM 2015), Pittsburgh, PA, USA, June 2015.
- Mohamed Sarwat [Twitter:https://twitter.com/MoSarwat]
- James Avery [Twitter:https://twitter.com/TheSoundDefense]
- Mohamed F. Mokbel [Twitter:https://twitter.com/MohamedMokbel]
Support or Contact
Having trouble with RecDB? Contact firstname.lastname@example.org and we’ll help you sort it out.
Follow @Rec_DB on Twitter for updates