
A set of simple Python scripts for pre-processing large files csv <--> libsvm <--> vowpal


directorscut82/phraug2

 
 


phraug2

A new version of phraug (pron. frog) with improved command-line argument parsing, thanks to jofusa.

This is a set of simple Python scripts for pre-processing large files, such as splitting and format conversion. The name phraug comes from a great book, Made to Stick, by Chip and Dan Heath.

See http://fastml.com/processing-large-files-line-by-line/ for the basic idea.

There's always at least one input file and usually one or more output files. An input file always stays unchanged.
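The line-by-line pattern can be illustrated with a minimal sketch of a CSV to libsvm conversion. This is not code from the repository: the function name `csv_to_libsvm`, the `label_index` parameter, and the choices to treat the first column as the label and to omit zero-valued features are assumptions made for the example.

```python
import csv
import io


def csv_to_libsvm(reader, writer, label_index=0):
    """Stream CSV rows from `reader` to `writer` in libsvm format.

    Hypothetical helper: only one row is held in memory at a time,
    so the size of the input does not matter.
    """
    for row in csv.reader(reader):
        if not row:
            continue  # skip blank lines
        label = row[label_index]
        features = [v for i, v in enumerate(row) if i != label_index]
        # libsvm feature indices are 1-based; zero values are omitted here (assumption)
        pairs = [f"{i}:{v}" for i, v in enumerate(features, start=1)
                 if float(v) != 0.0]
        writer.write(" ".join([label] + pairs) + "\n")


# In-memory streams stand in for real files; the source stream is only read.
source = io.StringIO("1,0.5,0,2\n0,0,3,0\n")
target = io.StringIO()
csv_to_libsvm(source, target)
print(target.getvalue(), end="")
```

With real data, `reader` and `writer` would be file objects opened for reading and writing respectively, which keeps the input file unchanged.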

For documentation, run a script with the -h option. Example:

>python split.py
usage: split.py [-h] [-p PROBABILITY] [-r RANDOM_SEED] [-s] [-c]
                input_file output_file1 output_file2
split.py: error: too few arguments

>python split.py -h
usage: split.py [-h] [-p PROBABILITY] [-r RANDOM_SEED] [-s] [-c]
                input_file output_file1 output_file2

split a file into two randomly, line by line.

positional arguments:
  input_file            path to an input file
  output_file1          path to the first output file
  output_file2          path to the second output file

optional arguments:
  -h, --help            show this help message and exit
  -p PROBABILITY, --probability PROBABILITY
                        probability of writing to the first file (default 0.9)
  -r RANDOM_SEED, --random_seed RANDOM_SEED
                        random seed
  -s, --skip_headers    skip the header line
  -c, --copy_headers    copy the header line to both output files
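
The behaviour described by the help text above can be sketched roughly as follows. This is not the actual source of split.py: the function name `split_lines` and its parameters are hypothetical and simply mirror the documented options (probability, random seed, skip headers, copy headers).

```python
import io
import random


def split_lines(lines, out1, out2, probability=0.9, seed=None,
                skip_headers=False, copy_headers=False):
    """Write each line to out1 with the given probability, otherwise to out2."""
    rng = random.Random(seed)  # a fixed seed makes the split reproducible
    lines = iter(lines)
    if skip_headers or copy_headers:
        header = next(lines, None)  # consume the first line
        if copy_headers and header is not None:
            out1.write(header)
            out2.write(header)
    for line in lines:
        target = out1 if rng.random() < probability else out2
        target.write(line)


# Example data: one header line followed by 1000 rows.
data = ["id,value\n"] + [f"{i},{i * i}\n" for i in range(1000)]
first, second = io.StringIO(), io.StringIO()
split_lines(data, first, second, probability=0.9, seed=42, copy_headers=True)
n_first = first.getvalue().count("\n")
n_second = second.getvalue().count("\n")
print(n_first, n_second)
```

Because each line is assigned independently, the first file receives roughly (not exactly) 90% of the lines at the default probability.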
