Home
An example of the workflow to run entity2rec (v1.0) on the Movielens dataset.
Before starting:
mkdir datasets/movielens_1m
mkdir datasets/movielens_1m/graphs
Move your train and test files inside datasets/movielens_1m:
mv train.dat datasets/movielens_1m/train.dat
mv test.dat datasets/movielens_1m/test.dat
(1): awk -F' ' '$3 >= 4 {print $1, $2}' datasets/movielens_1m/train.dat > datasets/movielens_1m/graphs/feedback.edgelist
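The filtering in step (1) can also be sketched in Python. This is a minimal illustration, not part of entity2rec: it assumes space-separated lines of user, item, rating and timestamp, and treats ratings of 4 or more as positive feedback, mirroring the threshold in the command above. The function name `positive_feedback` and the sample lines are hypothetical.

```python
from typing import Iterable, Iterator


def positive_feedback(lines: Iterable[str], min_rating: float = 4.0) -> Iterator[str]:
    """Yield 'user item' pairs for every rating >= min_rating."""
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        user, item, rating = fields[0], fields[1], float(fields[2])
        if rating >= min_rating:
            yield f"{user} {item}"


# Hypothetical sample in the "user_id item_id rating timestamp" layout.
sample = [
    "user1 10 5 978300760",
    "user1 11 3 978302109",
    "2 10 4 978301968",
]
edges = list(positive_feedback(sample))
print(edges)
```

To write a real edgelist, the same generator could be fed an open file handle for train.dat and its output written line by line to graphs/feedback.edgelist.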
The training and test sets must have the format:
user_id item_id rating timestamp
where the user_id should be an integer, optionally preceded by the string user (e.g. 13 or user13), and the field separator is a space.
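A quick way to check a file against this format before running entity2rec is sketched below. It is only an illustration under stated assumptions: the helper `parse_rating_line` and the `Rating` type are hypothetical, the rating is assumed to be numeric, the timestamp is assumed to be an integer, and the item_id is kept as an opaque string.

```python
import re
from typing import NamedTuple

# An integer, optionally preceded by the literal string "user".
USER_RE = re.compile(r"^(?:user)?\d+$")


class Rating(NamedTuple):
    user_id: str
    item_id: str
    rating: float
    timestamp: int


def parse_rating_line(line: str) -> Rating:
    """Parse one 'user_id item_id rating timestamp' line, or raise ValueError."""
    fields = line.rstrip("\n").split(" ")  # the separator must be a single space
    if len(fields) != 4:
        raise ValueError(f"expected 4 space-separated fields, got {len(fields)}")
    user_id, item_id, rating, timestamp = fields
    if not USER_RE.match(user_id):
        raise ValueError(f"invalid user_id: {user_id!r}")
    return Rating(user_id, item_id, float(rating), int(timestamp))


print(parse_rating_line("user13 42 5 978300760"))
```

A tab-separated line fails the field-count check, which makes separator mistakes easy to spot.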
(2): This step can be done manually when using local data, or by querying a SPARQL endpoint with the --sparql argument. The query is needed only the first time, because the files are downloaded and stored locally. With --sparql, entity2rec uses the properties specified in the config/properties.json file or looks for all the properties existing in the knowledge graph. If the name of the dataset is not found in the config/properties.json file and the --sparql argument is not specified, it takes all the files found in datasets/movielens_1m/
(3) + (4): python src/entity2rec.py --dataset movielens_1m --train datasets/movielens_1m/train.dat --test datasets/movielens_1m/test.dat --run_all
entity2rec accepts all the params of entity2vec and, in addition:
option | default | description |
---|---|---|
train | null (Required) | Path of the train set |
test | null (Required) | Path of the test set |
run_all | false | If true, it runs entity2vec to compute the embeddings before the recommendation task. In this case, it is suggested to also add the related command line arguments (https://github.com/MultimediaSemantics/entity2vec). Otherwise, it expects that the embeddings are in the emb/ folder. Note that this needs to be done only the first time. |
implicit | false | If true, it expects that the ratings are binary values (0/1) instead of a range of scores |
As an output, entity2rec will generate a set of property-specific relatedness scores in the SVM format inside the folder features/my_dataset:
1 qid:1 1:0.186378 2:0.000000 3:0.329318 4:0.420169 5:0.000000 6:0.407551 7:0.000000 8:0.355113 9:0.198874 10:0.146273 11:0.354844 # http://dbpedia.org/resource/The_Secret_Garden_(1993_film)
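To inspect these files, a line in this layout can be split into its parts as sketched below. The reading of the fields follows the usual learning-to-rank convention (a relevance label, a query id, then index:value feature pairs, then a trailing comment). That reading and the names `FeatureRow` and `parse_svm_line` are assumptions for illustration, not part of entity2rec.

```python
from typing import Dict, NamedTuple


class FeatureRow(NamedTuple):
    label: float
    qid: str
    features: Dict[int, float]
    comment: str


def parse_svm_line(line: str) -> FeatureRow:
    """Split 'label qid:ID idx:val ... # comment' into its components."""
    body, _, comment = line.partition("#")  # everything after '#' is a comment
    tokens = body.split()
    label = float(tokens[0])
    qid = tokens[1].split(":", 1)[1]
    features: Dict[int, float] = {}
    for token in tokens[2:]:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return FeatureRow(label, qid, features, comment.strip())


# Shortened version of the example line above.
row = parse_svm_line(
    "1 qid:1 1:0.186378 2:0.000000 3:0.329318 "
    "# http://dbpedia.org/resource/The_Secret_Garden_(1993_film)"
)
print(row.qid, row.features[3], row.comment)
```

Here each feature index corresponds to one property-specific relatedness score, and the comment carries the item URI.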
(5): From the ranking/ folder: java -jar RankLib-2.1-patched.jar -train ../features/movielens_1m/p1_q1/train_p1_q1.svm -ranker 6 -metric2t P@10 -tvs 0.9 -test ../features/movielens_1m/p1_q1/test_p1_q1.svm -norm sum
This command runs RankLib to learn the global relatedness model.
Memory consumption: the ranking may require a large amount of memory to be allocated to the Java process to avoid out-of-memory errors.
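One way to give the Java process more memory is the standard JVM option -Xmx, which sets the maximum heap size. The sketch below builds the RankLib command from step (5) with that option added. The helper `ranklib_command` is hypothetical, and the 4g default is only an illustrative value, not a recommendation from the entity2rec authors.

```python
from typing import List


def ranklib_command(train: str, test: str, heap: str = "4g") -> List[str]:
    """Build the step (5) RankLib call with an explicit JVM heap limit."""
    return [
        "java", f"-Xmx{heap}",  # maximum heap size; the value is an assumption
        "-jar", "RankLib-2.1-patched.jar",
        "-train", train,
        "-ranker", "6",
        "-metric2t", "P@10",
        "-tvs", "0.9",
        "-test", test,
        "-norm", "sum",
    ]


cmd = ranklib_command(
    "../features/movielens_1m/p1_q1/train_p1_q1.svm",
    "../features/movielens_1m/p1_q1/test_p1_q1.svm",
    heap="8g",
)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, cwd="ranking", check=True)` from the repository root, or the printed command can be run directly from the ranking/ folder.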