Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Clone in Desktop Download ZIP
Open a large CSV file line-by-line, applying headers to each row
JavaScript
Branch: master

Fetching latest commit…

Cannot retrieve the latest commit at this time

Failed to load latest commit information.
lib/readline
test
.gitignore
.travis.yml
LICENSE
README.md
index.js
package.json

README.md

streamCSV

v0.0.2

Read a CSV or TSV file line-by-line, applying headers to each row and guessing the correct type for each entry. Convenient for very large files.

Build Status

Installation

npm install node-stream-csv

How to use

streamCSV takes three arguments: A filename, a callback for each row, and a callback when it's done. The last is optional.

Examples

These are all examples from test/test.js

var streamCSV = require("node-stream-csv");

streamCSV("state_population.tsv", function(state) {
    console.log(state);
});

/* Input

SUMLEV  REGION  DIVISION    STATE   NAME    POPESTIMATE2013 POPEST18PLUS2013    PCNT_POPEST18PLUS
10  0   0   0   United States   316128839   242542967   76.7
40  3   6   1   Alabama 4833722 3722241 77
40  4   9   2   Alaska  735132  547000  74.4
*/

/* Output:
{ SUMLEV: 10,
  REGION: 0,
  DIVISION: 0,
  STATE: 0,
  NAME: 'United States',
  POPESTIMATE2013: 316128839,
  POPEST18PLUS2013: 242542967,
  PCNT_POPEST18PLUS: 76.7 }
{ SUMLEV: 40,
  REGION: 3,
  DIVISION: 6,
  STATE: 1,
  NAME: 'Alabama',
  POPESTIMATE2013: 4833722,
  POPEST18PLUS2013: 3722241,
  PCNT_POPEST18PLUS: 77 }
{ SUMLEV: 40,
  REGION: 4,
  DIVISION: 9,
  STATE: 2,
  NAME: 'Alaska',
  POPESTIMATE2013: 735132,
  POPEST18PLUS2013: 547000,
  PCNT_POPEST18PLUS: 74.4 }   
*/

Note that the module correctly interpreted the numbers as numbers. If you don't want this behavior, such as when you've got leading zeroes you want to preserve, pass an options object to the first argument with dontguess:

streamCSV(
    {
        filename: "county_population.csv",
        dontguess: true
    },
    function(datum, c) {
        process.stdout.write("Read " + c + " lines from county file.\r");
    },
    function() { // on close 
        cb();
    }       
);

/* Output 
{ SUMLEV: '050',
  REGION: '3',
  DIVISION: '6',
  STATE: '01',
  COUNTY: '001',
  STNAME: 'Alabama',
  CTYNAME: 'Autauga County',
  CENSUS2010POP: '54571'
  // etc etc
}

Options

  • dontguess: No type guessing, just import every data point as a string. You can also pass a comma-separated list of fields not to guess on

  • noheader: The first line is a regular line of data, not a set of headers. Just give me back arrays of each line.

  • sparse: If there is a header but the field value is blank, don't include it in the object for that row

  • mode: "csv" or "tsv". If unspecified, guesses from file extension.

Dependencies

  • This module uses d3's csv parser on each line.

  • It uses a fork of maleck13's readline module to convert the filestream into line-by-line events. The fork allows for the fact that most readline modules, including the core module, often miss the last line of the file. The fork is distributed in this repo without the git files.

Something went wrong with that request. Please try again.