CSV_INVALID_CLOSING_QUOTE in large datasets #265

@sehrgut

I'm using node-csv to pluck data from large CSV files. On extremely large datasets (>500MB), I'm getting the CSV_INVALID_CLOSING_QUOTE error shown below. I'm pretty sure it's related to incorrect handling of a buffer boundary, because the location of the error shifts if I change the preceding data. For instance, this particular occurrence disappeared when I shifted the 100,000-line window by one line (using head to pipe gunzip output to node).

It doesn't seem to be strictly size-based: I have been able to parse up to 250MB before seeing this error, yet it can be reproduced in as little as 30MB by searching for subsets that exhibit the behaviour. That's why I think it's likely due to a buffer boundary falling inside a quoted field.
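One way to test the buffer-boundary hypothesis on a small input is to split it at every byte offset and feed the two halves to the parser as separate chunks. The sketch below is only an illustration under assumptions: findFailingSplits and viaStream are hypothetical helper names, and makeParser stands for whatever factory creates the parser under test (for example a function returning a csv-parse stream). It ignores backpressure, so it is meant for inputs of a few kilobytes, not the full dataset.

```javascript
'use strict';

// Split `input` (a Buffer) into two chunks at every byte offset and pass
// both chunks to `parseChunks`, a caller-supplied async function that feeds
// them to a parser and resolves with the records (or rejects on error).
// Returns the offsets at which parsing failed.
async function findFailingSplits(input, parseChunks) {
  const failures = [];
  for (let offset = 1; offset < input.length; offset++) {
    const chunks = [input.subarray(0, offset), input.subarray(offset)];
    try {
      await parseChunks(chunks);
    } catch (err) {
      failures.push({ offset, code: err.code, message: err.message });
    }
  }
  return failures;
}

// Adapter for a Node.js Transform-style parser. `makeParser` is a
// hypothetical factory supplied by the caller; it must return a fresh
// stream each time, because a stream cannot be reused after it ends.
function viaStream(makeParser) {
  return (chunks) =>
    new Promise((resolve, reject) => {
      const parser = makeParser();
      const records = [];
      parser.on('data', (record) => records.push(record));
      parser.on('error', reject);
      parser.on('end', () => resolve(records));
      for (const chunk of chunks) parser.write(chunk);
      parser.end();
    });
}
```

If the input parses cleanly as a single chunk but findFailingSplits reports offsets inside a quoted field, that would support the boundary theory; an empty result on a given sample does not rule it out.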

All data in one of these subsets is parseable if divided into small batches, so I'm certain it's not a syntax error.

Unfortunately, the dataset consists of logs that can't be shared for security reasons, and I haven't had a chance to try constructing a random dataset exhibiting this behaviour yet.
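A shareable stand-in dataset could be built along the following lines. This is a rough sketch, not the real log schema: the columns id, level and log, the sample text fragments, and the helper names mulberry32, quoteField and makeCsv are all made up for illustration. A seeded generator is used so that any failing dataset can be regenerated from its seed alone.

```javascript
'use strict';

// Small deterministic PRNG (mulberry32) so a dataset can be rebuilt
// from its seed.
function mulberry32(seed) {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Quote a field in the usual CSV way: wrap it in double quotes and
// double any quote character inside it.
function quoteField(value) {
  return '"' + value.replace(/"/g, '""') + '"';
}

// Build `rows` records with the hypothetical columns id,level,log.
// The log field is always quoted and mixes spaces, commas, quotes
// and embedded newlines.
function makeCsv(rows, seed) {
  const rand = mulberry32(seed);
  const pieces = ['GET /index', 'user said "hi"', 'a, b, c', 'line one\nline two', ' '];
  const levels = ['info', 'warn', 'error'];
  const lines = ['id,level,log'];
  for (let i = 0; i < rows; i++) {
    const count = 1 + Math.floor(rand() * 4);
    let log = '';
    for (let j = 0; j < count; j++) {
      log += pieces[Math.floor(rand() * pieces.length)];
    }
    const level = levels[Math.floor(rand() * levels.length)];
    lines.push([i, level, quoteField(log)].join(','));
  }
  return lines.join('\n') + '\n';
}
```

makeCsv builds the whole string in memory, so it suits small reproductions; a dataset in the hundreds of megabytes would need to be written out incrementally instead.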

events.js:292
      throw er; // Unhandled 'error' event
      ^

CsvError: Invalid Closing Quote: got " " at line 90262 instead of delimiter, row delimiter, trimable character (if activated) or comment
    at Parser.__parse (/Users/USERNAME/Documents/src/tools/mobile-logspam/node_modules/csv-parse/lib/index.js:529:17)
    at Parser._transform (/Users/USERNAME/Documents/src/tools/mobile-logspam/node_modules/csv-parse/lib/index.js:403:22)
    at Parser.Transform._read (_stream_transform.js:191:10)
    at Parser.Transform._write (_stream_transform.js:179:12)
    at doWrite (_stream_writable.js:403:12)
    at writeOrBuffer (_stream_writable.js:387:5)
    at Parser.Writable.write (_stream_writable.js:318:11)
    at /Users/USERNAME/Documents/src/tools/mobile-logspam/node_modules/highland/lib/index.js:640:33
    at Stream.s._send (/Users/USERNAME/Documents/src/tools/mobile-logspam/node_modules/highland/lib/index.js:1560:9)
    at Stream.write (/Users/USERNAME/Documents/src/tools/mobile-logspam/node_modules/highland/lib/index.js:1661:18)
Emitted 'error' event on Stream instance at:
    at Stream._send (/Users/USERNAME/Documents/src/tools/mobile-logspam/node_modules/highland/lib/index.js:998:18)
    at push (/Users/USERNAME/Documents/src/tools/mobile-logspam/node_modules/highland/lib/index.js:1526:19)
    at /Users/USERNAME/Documents/src/tools/mobile-logspam/node_modules/highland/lib/index.js:2212:13
    at Stream.s._send (/Users/USERNAME/Documents/src/tools/mobile-logspam/node_modules/highland/lib/index.js:1560:9)
    at Stream.write (/Users/USERNAME/Documents/src/tools/mobile-logspam/node_modules/highland/lib/index.js:1658:18)
    at Stream._send (/Users/USERNAME/Documents/src/tools/mobile-logspam/node_modules/highland/lib/index.js:984:26)
    at push (/Users/USERNAME/Documents/src/tools/mobile-logspam/node_modules/highland/lib/index.js:1526:19)
    at /Users/USERNAME/Documents/src/tools/mobile-logspam/node_modules/highland/lib/index.js:2212:13
    at Stream.s._send (/Users/USERNAME/Documents/src/tools/mobile-logspam/node_modules/highland/lib/index.js:1560:9)
    at Stream.write (/Users/USERNAME/Documents/src/tools/mobile-logspam/node_modules/highland/lib/index.js:1658:18) {
  code: 'CSV_INVALID_CLOSING_QUOTE',
  column: 'log',
  empty_lines: 0,
  header: false,
  index: 6,
  invalid_field_length: 0,
  quoting: false,
  lines: 90262,
  records: 90260
}
