Workflow

An example of the workflow to run entity2rec (v1.0) on the Movielens dataset.

entity2rec workflow

Data preparation

Before starting:

mkdir datasets/movielens_1m

mkdir datasets/movielens_1m/graphs

Move your train and test files inside datasets/movielens_1m:

mv train.dat datasets/movielens_1m/train.dat

mv test.dat datasets/movielens_1m/test.dat

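(1): awk -F' ' '$3 >= 4' datasets/movielens_1m/train.dat | cut -f1,2 -d' ' > datasets/movielens_1m/graphs/feedback.edgelist

The rating filter has to run before cut, because cut -f1,2 drops the rating column that awk tests. The command keeps only the ratings of 4 or higher and writes the corresponding user and item pairs to the feedback edge list.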
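For readers who prefer to do step (1) in Python, here is a minimal sketch of the filtering that step is meant to perform: keep the ratings of 4 or higher and emit the user and item columns. The function name `positive_feedback_edges`, the `min_rating` parameter, and the sample rows are illustrative assumptions, not part of entity2rec.

```python
def positive_feedback_edges(lines, min_rating=4.0):
    """Yield (user, item) pairs for ratings >= min_rating.

    Hypothetical helper, not part of entity2rec. Each line is expected
    to look like: user_id item_id rating timestamp
    """
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        user, item, rating = fields[0], fields[1], float(fields[2])
        if rating >= min_rating:
            yield user, item


# Made-up sample rows, for illustration only.
sample = [
    "user1 item10 5 978300760",
    "user1 item11 3 978302109",
    "user2 item10 4 978301968",
]

edges = list(positive_feedback_edges(sample))
for user, item in edges:
    print(user, item)  # one "user item" edge per line
```

To produce the actual file, the same generator could be fed the lines of datasets/movielens_1m/train.dat and its output written to datasets/movielens_1m/graphs/feedback.edgelist.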

The training and test sets must have the format:

user_id item_id rating timestamp

where user_id is an integer, optionally preceded by the string user (e.g. 13 or user13), and the field separator is a space.
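A quick check of the input files can catch formatting problems before any graph is built. The sketch below validates one line against the format described above; `parse_rating_line` and `Rating` are hypothetical names, and treating the timestamp as an integer is an assumption (it holds for the MovieLens Unix timestamps).

```python
import re
from typing import NamedTuple

# user_id: an integer, optionally preceded by the string "user".
USER_ID = re.compile(r"(?:user)?\d+")


class Rating(NamedTuple):
    user_id: str
    item_id: str
    rating: float
    timestamp: int  # assumption: integer timestamp


def parse_rating_line(line):
    """Parse 'user_id item_id rating timestamp' (single-space separated).

    Hypothetical helper, not part of entity2rec.
    """
    fields = line.rstrip("\n").split(" ")
    if len(fields) != 4:
        raise ValueError(f"expected 4 space-separated fields, got {len(fields)}")
    user_id, item_id, rating, timestamp = fields
    if not USER_ID.fullmatch(user_id):
        raise ValueError(f"invalid user_id: {user_id!r}")
    return Rating(user_id, item_id, float(rating), int(timestamp))


# Made-up sample row, for illustration only.
parsed = parse_rating_line("user13 item7 4 978300760")
print(parsed)
```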

(2): This step can be done manually when using local data, or by querying a SPARQL endpoint with the --sparql argument. The query is needed only the first time, because it downloads the files locally. It uses the properties specified in the config.json file or looks for all the properties that exist in the knowledge graph. If the name of the dataset is not found in the config/properties.json file and the --sparql argument is not specified, all the files found in datasets/movielens_1m/ are used.

Feature generation

(3) + (4): python src/entity2rec.py --dataset movielens_1m --train datasets/movielens_1m/train.dat --test datasets/movielens_1m/test.dat --run_all

entity2rec accepts all the params of entity2vec and, in addition:

| option | default | description |
| --- | --- | --- |
| `train` | null (Required) | Path of the train set |
| `test` | null (Required) | Path of the test set |
| `run_all` | false | If `true`, entity2vec is run to compute the embeddings before the recommendation task; in this case, it is recommended to also pass the related entity2vec command-line arguments (https://github.com/MultimediaSemantics/entity2vec). Otherwise, the embeddings are expected to be in the `emb/` folder. This needs to be done only the first time. |
| `implicit` | false | If `true`, the ratings are expected to be binary values (0/1) instead of a range of scores |

As output, entity2rec generates a set of property-specific relatedness scores in the SVM format inside the folder features/my_dataset:

1 qid:1 1:0.186378 2:0.000000 3:0.329318 4:0.420169 5:0.000000 6:0.407551 7:0.000000 8:0.355113 9:0.198874 10:0.146273 11:0.354844 # http://dbpedia.org/resource/The_Secret_Garden_(1993_film)
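To inspect the generated features, a line in this format can be split into its label, query id, feature values, and trailing comment. The following is a minimal sketch; `parse_feature_line` is a hypothetical helper, and the reading of the fields (label first, then `qid`, then `index:value` pairs, then a `#` comment) is inferred from the sample line above.

```python
def parse_feature_line(line):
    """Split one feature line into (label, qid, features, comment).

    Hypothetical helper, not part of entity2rec.
    """
    body, _, comment = line.partition("#")
    tokens = body.split()
    label = float(tokens[0])
    qid = tokens[1].split(":", 1)[1]  # "qid:1" -> "1"
    features = {}
    for token in tokens[2:]:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return label, qid, features, comment.strip()


line = (
    "1 qid:1 1:0.186378 2:0.000000 3:0.329318 4:0.420169 5:0.000000 "
    "6:0.407551 7:0.000000 8:0.355113 9:0.198874 10:0.146273 11:0.354844 "
    "# http://dbpedia.org/resource/The_Secret_Garden_(1993_film)"
)

label, qid, features, comment = parse_feature_line(line)
print(label, qid, len(features), comment)
```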

Learning to Rank

(5): From the ranking/ folder: java -jar RankLib-2.1-patched.jar -train ../features/movielens_1m/p1_q1/train_p1_q1.svm -ranker 6 -metric2t P@10 -tvs 0.9 -test ../features/movielens_1m/p1_q1/test_p1_q1.svm -norm sum

This command runs RankLib to learn the global relatedness model.

Memory consumption

The ranking step may require a large amount of memory to be allocated to the Java process to avoid out-of-memory errors. The maximum heap size of the Java process can be raised with the JVM -Xmx option.
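When the ranking step is launched from a script, the memory setting can be made explicit. The sketch below assembles the RankLib command from step (5) with the standard JVM `-Xmx` heap option added; `ranklib_command` is a hypothetical helper, and the `8g` default is an arbitrary placeholder to adjust to the machine, not a recommended value.

```python
import shlex


def ranklib_command(dataset, max_heap="8g", jar="RankLib-2.1-patched.jar"):
    """Build the step (5) command as an argument list.

    Hypothetical helper, not part of entity2rec. max_heap is passed to
    the JVM as -Xmx<max_heap>; "8g" is only a placeholder.
    """
    features = f"../features/{dataset}/p1_q1"
    return [
        "java", f"-Xmx{max_heap}", "-jar", jar,
        "-train", f"{features}/train_p1_q1.svm",
        "-ranker", "6",
        "-metric2t", "P@10",
        "-tvs", "0.9",
        "-test", f"{features}/test_p1_q1.svm",
        "-norm", "sum",
    ]


cmd = ranklib_command("movielens_1m")
print(shlex.join(cmd))  # shell-ready form of the command
```

The list can be passed to `subprocess.run(cmd, cwd="ranking", check=True)` so that the relative feature paths resolve as they do when the command is typed from the ranking/ folder.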
