Enrico Palumbo edited this page Nov 21, 2018 · 13 revisions

Workflow

An example of the workflow to run entity2rec (v1.0) on the Movielens dataset.

[Figure: entity2rec workflow]

Data preparation

Before starting:

mkdir datasets/movielens_1m

mkdir datasets/movielens_1m/graphs

Move your train and test files into datasets/movielens_1m:

mv train.dat datasets/movielens_1m/train.dat

mv test.dat datasets/movielens_1m/test.dat

(1): awk -F' ' '$3 >= 4' datasets/movielens_1m/train.dat | cut -f1,2 -d' ' > datasets/movielens_1m/graphs/feedback.edgelist

The training and test set must have the format:

user_id item_id rating timestamp

where user_id is an integer, optionally prefixed with the string user (e.g. 13 or user13), and the field separator is a single space.
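To make the expected format concrete, here is a minimal Python sketch that checks one line against it and keeps the positive feedback of step (1). The function names, the sample item URIs, and the sample ratings are made up for illustration; they are not part of entity2rec.

```python
import re

# Accepts "13" or "user13" as a user id, per the format described above.
USER_ID = re.compile(r"^(?:user)?\d+$")


def parse_rating_line(line):
    """Parse one 'user_id item_id rating timestamp' line (hypothetical helper)."""
    fields = line.rstrip("\n").split(" ")
    if len(fields) != 4:
        raise ValueError(f"expected 4 space-separated fields, got {len(fields)}")
    user, item, rating, timestamp = fields
    if USER_ID.match(user) is None:
        raise ValueError(f"unexpected user id: {user!r}")
    # The timestamp is kept as a string; only the rating is needed as a number.
    return user, item, float(rating), timestamp


def positive_feedback(lines, threshold=4):
    """Yield 'user item' edges for ratings at or above the threshold."""
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        user, item, rating, _ = parse_rating_line(line)
        if rating >= threshold:
            yield f"{user} {item}"


# Made-up sample data, not taken from the Movielens dataset.
sample = [
    "user13 http://dbpedia.org/resource/Toy_Story 5 978300760",
    "13 http://dbpedia.org/resource/Heat_(1995_film) 3 978302109",
]
edges = list(positive_feedback(sample))
print(edges)
```

Running a check like this on train.dat and test.dat before feature generation catches separator or user id problems early.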

(2): This step can be done manually if you use local data, or by querying a SPARQL endpoint with the --sparql argument. The query is needed only the first time, because the files are then stored locally. It uses the properties specified in the config.json file, or looks for all the properties existing in the knowledge graph. If the name of the dataset is not found in the config/properties.json file and the --sparql argument is not specified, entity2rec takes all the files found in datasets/movielens_1m/

Feature generation

(3) + (4): python src/entity2rec.py --dataset movielens_1m --train datasets/movielens_1m/train.dat --test datasets/movielens_1m/test.dat --run_all

entity2rec accepts all the params of entity2vec and, in addition:

| option | default | description |
| --- | --- | --- |
| train | null | (Required) Path of the train set |
| test | null | (Required) Path of the test set |
| run_all | false | If true, it runs entity2vec to compute the embeddings before the recommendation task (in this case, it is suggested to also add the related command line arguments (https://github.com/MultimediaSemantics/entity2vec)). Otherwise, it expects that the embeddings are in the emb/ folder. Note that this needs to be done only the first time. |
| implicit | false | If true, it expects that the ratings are binary values (0/1) instead of a range of scores |

As output, entity2rec generates a set of property-specific relatedness scores in the SVM format inside the folder features/my_dataset (features/movielens_1m in this example):

1 qid:1 1:0.186378 2:0.000000 3:0.329318 4:0.420169 5:0.000000 6:0.407551 7:0.000000 8:0.355113 9:0.198874 10:0.146273 11:0.354844 # http://dbpedia.org/resource/The_Secret_Garden_(1993_film)
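The layout of such a line can be illustrated with a small Python sketch that splits it into its parts. The function name is hypothetical, and the reading of the fields is an assumption based on the usual SVM ranking layout: a relevance label, a query id, index:value features, and a trailing comment naming the item.

```python
def parse_feature_line(line):
    """Split one feature line into (label, qid, features, comment)."""
    # Everything after the first '#' is a free-text comment (here, the item URI).
    body, _, comment = line.partition("#")
    tokens = body.split()
    label = float(tokens[0])
    qid = tokens[1].split(":", 1)[1]
    features = {}
    for token in tokens[2:]:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return label, qid, features, comment.strip()


line = (
    "1 qid:1 1:0.186378 2:0.000000 3:0.329318 4:0.420169 5:0.000000 "
    "6:0.407551 7:0.000000 8:0.355113 9:0.198874 10:0.146273 11:0.354844 "
    "# http://dbpedia.org/resource/The_Secret_Garden_(1993_film)"
)
label, qid, features, comment = parse_feature_line(line)
print(label, qid, len(features), comment)
```

In the sample line above there are 11 feature values, one per property-specific relatedness score.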

Learning to Rank

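(5): From the ranking/ folder: java -jar RankLib-2.1-patched.jar -train ../features/movielens_1m/p1_q1/train_p1_q1.svm -ranker 6 -metric2t P@10 -tvs 0.9 -test ../features/movielens_1m/p1_q1/test_p1_q1.svm -norm sum

This command runs RankLib to learn the global relatedness model. It trains LambdaMART (-ranker 6) to optimize P@10 (-metric2t), keeps 90% of the training data for training and the rest for validation (-tvs 0.9), and normalizes the features by their sum (-norm sum).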

Memory consumption

The ranking may require a large amount of memory to be allocated to the Java process to avoid out-of-memory errors.
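One common way to give the Java process more memory is the standard JVM option -Xmx, which sets the maximum heap size and must appear before -jar. The Python sketch below builds the RankLib command of step (5) with such a limit. The function name is hypothetical and the 8g value is only a placeholder assumption; pick a value that fits your machine and dataset.

```python
import shlex


def ranklib_command(train, test, max_heap="8g"):
    """Build the step (5) RankLib call with an explicit JVM heap limit."""
    return [
        "java", f"-Xmx{max_heap}",  # JVM options go before -jar
        "-jar", "RankLib-2.1-patched.jar",
        "-train", train,
        "-ranker", "6",
        "-metric2t", "P@10",
        "-tvs", "0.9",
        "-test", test,
        "-norm", "sum",
    ]


cmd = ranklib_command(
    "../features/movielens_1m/p1_q1/train_p1_q1.svm",
    "../features/movielens_1m/p1_q1/test_p1_q1.svm",
)
# Print the command as a single shell line for inspection.
print(shlex.join(cmd))
```

The list can be passed to subprocess.run(cmd, cwd="ranking", check=True) from the repository root, or the printed line can be pasted into a shell opened in the ranking/ folder.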
