(This project is not maintained.)
A simple tool for Data Splitting and Data Encoding.
For Data Processing:
python DataProcess.py -task [Task] -infile [InputFile] -outfile [Outputfile] [Options]
For Data Encoding:
python DataEncode.py -task [Task] -infile [InputFile] -outfile [Outputfile] [Options]
Because the tool uses no third-party packages, it also runs under PyPy for faster execution:
pypy DataProcess.py -task [Task] -infile [InputFile] -outfile [Outputfile] [Options]
pypy DataEncode.py -task [Task] -infile [InputFile] -outfile [Outputfile] [Options]
More parameter options are listed by --help
or on the wiki page (not finished yet).
python main.py --help
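As a rough illustration of how a standard-library-only command line with the `-task`, `-infile` and `-outfile` flags shown above could be parsed, here is a minimal sketch using `argparse`. It is not the project's actual parser: the function name `build_parser`, the help strings and the sample file names are assumptions made for this example.

```python
import argparse


def build_parser():
    """Build a parser for the three flags shown in the usage lines.

    Hypothetical sketch: the real scripts may parse arguments differently.
    """
    parser = argparse.ArgumentParser(description="Data tool (illustrative only)")
    # argparse accepts single-dash long options such as -task.
    parser.add_argument("-task", required=True, help="name of the task to run")
    parser.add_argument("-infile", required=True, help="path of the input file")
    parser.add_argument("-outfile", required=True, help="path of the output file")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is easy to try.
args = build_parser().parse_args(
    ["-task", "data2sparse", "-infile", "in.csv", "-outfile", "out.txt"]
)
print(args.task, args.infile, args.outfile)
```

Since only `argparse` from the standard library is involved, a script written this way would run unchanged under both python and pypy.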
DataProcess.py
DataEncode.py
- data2sparse -- convert general data into sparse data format
- data2rel -- convert general data into relational data format
- sparse2rel -- convert sparse data into relational data format
- data2vw -- convert general data into Vowpal Wabbit (VW) data format
- sparse2vw -- convert sparse data format into VW data format
- vw2sparse -- convert VW data format into sparse data format
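To give a feel for what conversions such as sparse2vw and vw2sparse involve, the sketch below assumes the sparse format is LIBSVM-style (`label index:value ...`) and uses VW's basic `label | name:value ...` text layout without namespaces. The format assumption and the helper names `sparse_line_to_vw` and `vw_line_to_sparse` are hypothetical and are not taken from DataEncode.py.

```python
def sparse_line_to_vw(line):
    """Convert one 'label idx:val ...' line to VW text format.

    Assumption: the sparse format is LIBSVM-like; the project does not
    document this, so treat it as an illustration only.
    """
    parts = line.split()
    if not parts:
        return ""
    label, features = parts[0], parts[1:]
    # VW separates the label from the features with a '|' character.
    return label + " | " + " ".join(features)


def vw_line_to_sparse(line):
    """Inverse sketch: 'label | idx:val ...' back to 'label idx:val ...'.

    Assumes a single '|' section with no namespace name after it.
    """
    label, _, features = line.partition("|")
    return " ".join([label.strip()] + features.split())


print(sparse_line_to_vw("1 3:0.5 7:1"))
print(vw_line_to_sparse("1 | 3:0.5 7:1"))
```

Under these assumptions the two helpers are inverses of each other, so a file converted with one and then the other should come back unchanged apart from whitespace.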
-cat -- like one-hot encoding, usually for categorical features (supports multi-labeled features)
-num -- use the value directly, usually for numerical data
-knn -- automatically get similar features as meta features
-wcat -- encode multi-labeled features with different weights
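The cat, num and wcat encodings can be sketched roughly as follows. This is a minimal illustration, not the tool's implementation: the function names, the `,` label separator, the `label:weight` syntax and the column-id dictionary are all assumptions made for the example. The knn encoding is left out because the description above does not say how similar features are found.

```python
def encode_cat(value, index, sep=","):
    """One-hot style: every distinct label gets its own column set to 1.

    A multi-labeled value such as 'red,blue' turns on several columns.
    The separator and the shared 'index' dict are assumptions.
    """
    out = {}
    for label in value.split(sep):
        label = label.strip()
        if label:
            # Give the label the next free column id the first time it is seen.
            col = index.setdefault(label, len(index))
            out[col] = 1.0
    return out


def encode_num(value, col):
    """Numerical: store the parsed value directly in a fixed column."""
    return {col: float(value)}


def encode_wcat(value, index, sep=",", wsep=":"):
    """Weighted multi-label: 'red:0.7,green:0.3' gives one weighted column per label.

    The 'label:weight' syntax is hypothetical; a missing weight defaults to 1.
    """
    out = {}
    for item in value.split(sep):
        label, _, weight = item.strip().partition(wsep)
        if label:
            col = index.setdefault(label, len(index))
            out[col] = float(weight) if weight else 1.0
    return out


index = {}  # shared label -> column id mapping
print(encode_cat("red,blue", index))
print(encode_wcat("red:0.7,green:0.3", index))
print(encode_num("3.5", col=10))
```

Each helper returns a `{column: value}` dictionary, which maps naturally onto a sparse `index:value` line once the columns are sorted.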