Support N-x configuration for To option to skip trailer rows #356
Comments
Could you provide a sample of what you expect? That would help me understand.
Sure. E.g. CSV: "Data","Data","Data","More Data","More and more data","Final data field" Instead of processing that trailer row, I want to skip it. So I would want to configure the parser with skip_trailing: 1. Does that make more sense?
Not really. Do you want this:
const data = "a,b,c\nRows:1"
const records = parse(data, {skip_trailing: 1})
records.should.eql([["a","b","c"]])
Yes! Thank you for reading my mind, since apparently my communication skills are weak!
If you think this is a viable idea, and you would like me to take a stab at it, I'm happy to submit a pull request later this week. If you don't think it's a good idea, or would rather do it yourself, I won't dive in.
Hmm, I don't think that is possible. csv-parse is designed to handle an unlimited number of records. This use case involves knowing in advance how many records there are in order to skip the last n records. This isn't scalable.
I'm already doing it outside your parser by buffering the N trailing rows to drop in a rotating queue and then dropping the queue on end-of-stream. I could implement that in the parser if you wanted the functionality.
Curious to see your code. I'm not sure that I want to make the parser more complex, but let me first look at how you are doing this.
Summary
In csv-parse, the 'to' and 'to_line' options do not adequately support stripping trailing records. I recommend adding functionality that would allow a syntax for these options, e.g. 'T.1', to indicate stopping at end-1.
Motivation
We process some files that include a trailing "summary" record for the file. These are difficult to deal with.
Alternative
We could pre-process the file to strip these trailing rows, but that can be tricky if they contain fields with embedded newlines.
Draft
This can be implemented by placing each parsed row into a holding queue: once the queue holds more than the configured "T-minus" number of rows, the oldest row is pushed along the pipe instead of the current one, and whatever remains in the queue is dropped at EOF.
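A minimal sketch of that holding queue, written as a separate object-mode stream rather than as a change to csv-parse itself. The names `createHoldingQueue` and `skipTrailing` are hypothetical and chosen only for illustration; nothing here is an existing csv-parse option. The sketch assumes records arrive one at a time and are never `undefined`.

```javascript
const { Transform } = require('node:stream');

// Hypothetical helper (not part of csv-parse): a holding queue that releases
// a record only once `n` newer records have arrived behind it.
function createHoldingQueue(n) {
  const queue = [];
  return {
    // Returns the record that is now safe to emit, or undefined while filling.
    offer(record) {
      queue.push(record);
      return queue.length > n ? queue.shift() : undefined;
    },
    // Called at end of input: the records still held are the trailer rows.
    drop() {
      return queue.splice(0, queue.length);
    },
  };
}

// Hypothetical object-mode stream intended to be piped after the parser.
function skipTrailing(n) {
  const held = createHoldingQueue(n);
  return new Transform({
    objectMode: true,
    transform(record, _encoding, callback) {
      const ready = held.offer(record);
      if (ready !== undefined) this.push(ready);
      callback();
    },
    flush(callback) {
      // EOF: discard the held trailer rows instead of emitting them.
      held.drop();
      callback();
    },
  });
}
```

Assuming the parser is used through its stream API, this could be attached as `parser.pipe(skipTrailing(1))`. The queue never holds more than n + 1 records, so memory use does not depend on the total number of records in the file.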