Chih-Ming Chen edited this page May 25, 2018 · 5 revisions

Input

Training rating data [InputFile].train

9::userA::movieA::5
4::userA::movieB::10
5::userB::movieB::3
4::userC::movieA::8
8::userC::movieC::11
3::userD::movieA::2

Testing rating data [InputFile].test

4::userB::movieC::8
7::userD::movieC::11

Instructions

  • -task 'data2sparse': convert the data to sparse format
  • -infile [InputFile].train,[InputFile].test: input file names, separated by ','
  • -outfile [OutputFile].train,[OutputFile].test: output file names, separated by ','
  • -target 0: use column 0 as the prediction target
  • -cat 1,2: apply categorical encoding to columns 1 and 2
  • -num 3: apply numerical encoding to column 3
  • -sep '::': split each line on '::'
  • -header 0: no header row

Command

python DataEncode.py -task 'data2sparse' -infile [InputFile].train,[InputFile].test -outfile [OutputFile].train,[OutputFile].test -target 0 -cat 1,2 -num 3 -sep '::' -header 0

Output

Encoded training data [OutputFile].train

9 1:1 5:1 8:5
4 1:1 6:1 8:10
5 2:1 6:1 8:3
4 3:1 5:1 8:8
8 3:1 7:1 8:11
3 4:1 5:1 8:2

Encoded testing data [OutputFile].test

4 2:1 7:1 8:8
7 4:1 7:1 8:11
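
The mapping from the input rows to the encoded rows above can be reproduced with a short sketch. This is not the code in DataEncode.py. It is a minimal illustration that assumes feature indices start at 1, that each categorical column receives its indices in order of first appearance (column 1 first, then column 2), and that each numerical column takes the next free index. The helper names build_index and encode are hypothetical.

```python
# Minimal sketch of the sparse encoding shown above (not the DataEncode.py source).
# Assumption: indices are 1-based, categorical columns are indexed first in
# first-appearance order, and numerical columns take the following indices.

def build_index(rows, cat_cols, num_cols):
    """Map each (column, value) pair to a feature index."""
    index = {}
    next_id = 1
    for col in cat_cols:
        for row in rows:
            key = (col, row[col])
            if key not in index:
                index[key] = next_id
                next_id += 1
    for col in num_cols:
        # A numerical column uses a single index; the value is kept as-is.
        index[(col, None)] = next_id
        next_id += 1
    return index


def encode(row, index, target_col, cat_cols, num_cols):
    """Encode one split row as 'target idx:value idx:value ...'."""
    parts = [row[target_col]]
    for col in cat_cols:
        parts.append(f"{index[(col, row[col])]}:1")
    for col in num_cols:
        parts.append(f"{index[(col, None)]}:{row[col]}")
    return " ".join(parts)


train = [
    "9::userA::movieA::5",
    "4::userA::movieB::10",
    "5::userB::movieB::3",
    "4::userC::movieA::8",
    "8::userC::movieC::11",
    "3::userD::movieA::2",
]
test = [
    "4::userB::movieC::8",
    "7::userD::movieC::11",
]

sep = "::"
train_rows = [line.split(sep) for line in train]
test_rows = [line.split(sep) for line in test]

# Build one shared index so training and testing data use the same feature ids.
index = build_index(train_rows + test_rows, cat_cols=[1, 2], num_cols=[3])

for row in train_rows + test_rows:
    print(encode(row, index, target_col=0, cat_cols=[1, 2], num_cols=[3]))
```

Under these assumptions, userA to userD map to indices 1 to 4, movieA to movieC map to indices 5 to 7, and the numerical column 3 maps to index 8, which matches the encoded files listed above.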