chihming/DataTransformer

(This project is not maintained.)

DataTransformer

A simple tool for Data Splitting and Data Encoding.

Usage

For Data Processing:

python DataProcess.py -task [Task] -infile [InputFile] -outfile [Outputfile] [Options]

For Data Encoding:

python DataEncode.py -task [Task] -infile [InputFile] -outfile [Outputfile] [Options]

Because this tool uses no third-party packages, it also runs under PyPy for faster execution.

pypy DataProcess.py -task [Task] -infile [InputFile] -outfile [Outputfile] [Options]
pypy DataEncode.py -task [Task] -infile [InputFile] -outfile [Outputfile] [Options]

More parameter options are listed by --help or on the wiki page (not finished yet).

python main.py --help

File Format

Supported Task

DataProcess.py

  • dsplit -- split data into training and test sets
  • djoin -- join relational features to data
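
To make the dsplit idea concrete, here is a minimal sketch of a shuffled train/test split in plain Python. It is not taken from DataProcess.py: the function name `split_lines`, the `test_ratio` and `seed` parameters, and the sample rows are all assumptions made for illustration.

```python
import random


def split_lines(lines, test_ratio=0.2, seed=0):
    """Shuffle a copy of `lines` and return (train, test).

    Hypothetical helper; the real dsplit task may split differently.
    """
    rng = random.Random(seed)      # local RNG, so the split is reproducible
    shuffled = list(lines)         # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


# Made-up records, one per line, as they might appear in an input file.
rows = ["u%d,i%d,1" % (k, k) for k in range(10)]
train, test = split_lines(rows, test_ratio=0.2, seed=42)
print(len(train), len(test))
```

Fixing the seed keeps the split stable across runs, which helps when comparing models on the same test set.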

DataEncode.py

  • data2sparse -- convert general data into sparse data format
  • data2rel -- convert general data into relational data format
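
The sparse format itself is not described here, so the following sketch assumes a libsvm-style line, `label index:value ...`, which is a common convention but may not be what data2sparse writes. The helper name `to_sparse_line` is hypothetical.

```python
def to_sparse_line(label, features):
    """Format one record as 'label index:value ...'.

    `features` maps an integer feature index to a numeric value.
    Zero values are skipped, which is what makes the line sparse.
    Assumed layout (libsvm-style), not confirmed by this project.
    """
    parts = [str(label)]
    for index in sorted(features):          # ascending index order
        value = features[index]
        if value != 0:
            parts.append("%d:%g" % (index, value))
    return " ".join(parts)


print(to_sparse_line(1, {3: 1.0, 0: 0.5, 7: 0.0}))  # 1 0:0.5 3:1
```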

TODO Task

  • sparse2rel -- convert sparse data into relational data format
  • data2vw -- convert general data into Vowpal Wabbit (VW) data format
  • sparse2vw -- convert sparse data format into VW format
  • vw2sparse -- convert VW data format into sparse format

Supported Encoding Method

  • -cat -- one-hot-style encoding, usually for categorical features (supports multi-labeled features)
  • -num -- use the value directly, usually for numerical data
  • -knn -- automatically find similar features and use them as meta features
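
The difference between -cat and -num can be sketched as follows. This is an illustration under assumptions, not the code in DataEncode.py: the `FeatureIndexer` class, the `encode_cat` and `encode_num` helpers, and the `|` separator for multi-labeled cells are all hypothetical.

```python
class FeatureIndexer(object):
    """Assign a dense integer index to each distinct feature key."""

    def __init__(self):
        self.index_of = {}

    def index(self, key):
        # A new key gets the next unused index; a known key keeps its index.
        if key not in self.index_of:
            self.index_of[key] = len(self.index_of)
        return self.index_of[key]


def encode_cat(indexer, column, cell, sep="|"):
    """One-hot style: every label in the cell gets its own index, value 1.0."""
    return {indexer.index((column, label)): 1.0
            for label in cell.split(sep) if label}


def encode_num(indexer, column, cell):
    """Numerical: one index per column, the cell's value is used directly."""
    return {indexer.index((column, None)): float(cell)}


indexer = FeatureIndexer()
row = {}
row.update(encode_cat(indexer, "genre", "action|comedy"))  # multi-labeled cell
row.update(encode_num(indexer, "age", "31"))
print(row)  # {0: 1.0, 1: 1.0, 2: 31.0}
```

A categorical column therefore expands into as many indices as it has distinct labels, while a numerical column always occupies exactly one.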

TODO Encoding Method

  • -wcat -- encode multi-labeled features with different weights

Demo
