A simple utility to parse and split csv-files into manageable chunks


csv-split

csv-split is a simple utility that parses large CSV files and breaks them up into smaller pieces, with several options controlling how the split is done.


Compiling


Clone the repository, cd into the directory, and run the following:

make && make install 

Usage


csv-split [OPTIONS] FILE OUT-PATH

or

csv-split [OPTIONS] --stdin PREFIX OUT-PATH

  • -g, --group-col The zero-based column whose values must remain together. If specified, csv-split will not separate rows with the same value in this column across multiple files. This assumes the file is already sorted by this column, as csv-split doesn't sort the file.

  • -n, --num-rows The maximum number of rows to put in each file. If you're grouping column values (see above), a file can end up with slightly more rows than this limit.

  • --stdin If specified, csv-split will read data from STDIN instead of a provided file, and the file argument will be treated as a prefix to use when writing output chunks.

  • -t, --trigger Each time csv-split writes a file, it can be configured to run a command specified by this option. Two environment variables will be set prior to the execution of the command:

    CSV_PAYLOAD_FILE -- The filename that was written
    CSV_ROWCOUNT     -- How many rows are in this file
    
  • -d, --header If you pass the --header option, csv-split treats the first row of the input CSV file as a header and injects it into each split file. By default, the header row is not counted toward the number of rows written per file; pass 1 to this option to count it (e.g. -d1, --header=1).
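
To make the interaction between --group-col and --num-rows concrete, here is a minimal sketch in C of how a chunk boundary could be chosen. It is not taken from csv-split's source: the function name `chunk_end` and its parameters are hypothetical, and it assumes the group-column value of each row has already been extracted into an array, sorted so that equal values are adjacent.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of csv-split): returns the index one past
 * the last row of the chunk that begins at row `start`.
 *
 *   group_vals  group-column value of every row, or NULL for no grouping
 *   nrows       total number of data rows
 *   start       index of the first row of this chunk
 *   max_rows    the row limit, as given by -n / --num-rows (must be > 0)
 */
size_t chunk_end(const char *const *group_vals, size_t nrows,
                 size_t start, size_t max_rows)
{
    assert(max_rows > 0 && start <= nrows);

    /* Fewer rows remain than the limit allows: take them all. */
    if (max_rows >= nrows - start)
        return nrows;

    size_t end = start + max_rows;

    /* Without a group column, the limit is a hard cut. */
    if (group_vals == NULL)
        return end;

    /* With grouping, keep extending while the next row has the same
     * group value as the last row already placed in the chunk. */
    while (end < nrows && strcmp(group_vals[end], group_vals[end - 1]) == 0)
        end++;

    return end;
}
```

For example, with group values a, a, b, b, b, c, d and a limit of 3, the first chunk would cover rows 0 through 4 (five rows) so that the three b rows stay together. This is the "slightly more rows" case described under --num-rows.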