Parse csv content is not correct #70

Open
quangson91 opened this issue Feb 18, 2024 · 2 comments
quangson91 commented Feb 18, 2024

Hi,
Thank you for your hard work on this great library.

I found an issue with the following CSV content:

"A B", "C, D"

I think the result should be two fields: `A B` and `C, D`.
However, the library parses it as three fields: `A B`, `C`, and `D`.


Here is my sample code:

```dart
import 'package:csv/csv.dart';

void main() {
  final rows = const CsvToListConverter().convert('"A B", "C, D"\r\n');
  for (final row in rows) {
    print(row.join('_ '));
  }
}
```

Output:

A B_  C_  D

close2 added the bug label Feb 19, 2024

close2 commented Feb 19, 2024

Because of the space after the first `,`, the library does not recognize the following `"` as a quote character.

However, this also raises the question of whether we really want to simply ignore quote characters when they appear inside unquoted strings.

We definitely shouldn't ignore whitespace (and Papa Parse doesn't either: https://www.papaparse.com/demo).
I do intend to add the possibility to pre- and post-process data, which would take care of this.

Ignoring the quote characters is IMO a bug. Papa Parse doesn't ignore the quote character either.
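
Until pre-processing is available in the library, the idea can be sketched as a standalone step applied to the raw text before parsing. The helper below is hypothetical (`stripSpaceBeforeQuotes` is not part of `package:csv`) and assumes `,` as the field delimiter, `"` as the text delimiter, and `""` as the escape for a quote inside a quoted field. It drops spaces and tabs only when they sit between the start of a field and an opening quote, and leaves whitespace in unquoted fields untouched.

```dart
/// Hypothetical helper, not part of package:csv.
/// Removes spaces/tabs that appear at the start of a field when the first
/// non-blank character of that field is an opening quote.
String stripSpaceBeforeQuotes(String input,
    {String delimiter = ',', String quote = '"'}) {
  final out = StringBuffer();
  final pending = StringBuffer(); // blanks seen at the start of a field
  var inQuotes = false;
  var atFieldStart = true; // true until a non-blank character is seen

  for (var i = 0; i < input.length; i++) {
    final ch = input[i];

    if (inQuotes) {
      out.write(ch);
      if (ch == quote) {
        if (i + 1 < input.length && input[i + 1] == quote) {
          out.write(quote); // escaped quote ("") stays inside the field
          i++;
        } else {
          inQuotes = false; // closing quote
        }
      }
      continue;
    }

    if (atFieldStart && (ch == ' ' || ch == '\t')) {
      pending.write(ch); // decide later whether to keep these blanks
      continue;
    }

    if (atFieldStart && ch == quote) {
      pending.clear(); // blanks before an opening quote are dropped
      inQuotes = true;
      atFieldStart = false;
      out.write(ch);
      continue;
    }

    // Any other character: the blanks belong to an unquoted field, keep them.
    out.write(pending.toString());
    pending.clear();
    out.write(ch);
    atFieldStart = ch == delimiter || ch == '\n' || ch == '\r';
  }

  out.write(pending.toString());
  return out.toString();
}

void main() {
  print(stripSpaceBeforeQuotes('"A B", "C, D"')); // prints: "A B","C, D"
  print(stripSpaceBeforeQuotes('a, b')); // prints: a, b
}
```

The cleaned string could then be passed to `const CsvToListConverter().convert(...)` as in the sample code above. With the space gone, the second `"` directly follows the delimiter, which is the case the library is described as recognizing.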

quangson91 (Author) commented

@close2
Thanks for your fast response.

I also think it should be considered a bug.
After reading through the code, I think we can fix it simply by changing:
https://github.com/close2/csv/blob/master/lib/src/csv_parser.dart#L391

From

```dart
// If we are not yet inside a string, we are now
if (!_insideString) {
  _insideString = true;
  _insideQuotedString = true;
}
```

To

```dart
// We are now inside a quoted string.
_insideString = true;
_insideQuotedString = true;
```

What do you think?

close2 added a commit that referenced this issue Feb 26, 2024
Fixes issue #70.

If inside an unquoted string, text-delimiters are ignored instead of swallowed.
This (partially?) fixes issue #70.
Example: `"A B", "C, D"` will now produce `[["A B",' "C',' D"']]` instead of `[["A B",' C',' D']]`.
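
The three behaviors discussed in this thread can be compared with a small toy splitter. This is only a sketch under simplifying assumptions, not the code in `csv_parser.dart`: `splitLine` and `MidFieldQuote` are hypothetical names, a single line is handled, and escaped quotes (`""`) are not supported. The two flags mirror the roles of `_insideString` and `_insideQuotedString` from the snippet quoted above.

```dart
/// What to do with a quote character that appears after a field has
/// already started (hypothetical names, for illustration only).
enum MidFieldQuote {
  swallow, // drop the quote (the behavior reported in this issue)
  keep, // keep the quote as literal text (the behavior after the commit)
  startQuoted, // enter quoted mode anyway (the change proposed above)
}

List<String> splitLine(String line, MidFieldQuote policy) {
  final fields = <String>[];
  final current = StringBuffer();
  var insideString = false; // a character of the current field was seen
  var insideQuotedString = false; // delimiters count as literal text

  for (var i = 0; i < line.length; i++) {
    final ch = line[i];

    if (insideQuotedString) {
      if (ch == '"') {
        insideQuotedString = false; // closing quote
      } else {
        current.write(ch);
      }
    } else if (ch == ',') {
      fields.add(current.toString()); // field ends at the delimiter
      current.clear();
      insideString = false;
    } else if (ch == '"') {
      if (!insideString || policy == MidFieldQuote.startQuoted) {
        insideString = true;
        insideQuotedString = true;
      } else if (policy == MidFieldQuote.keep) {
        current.write(ch);
      }
      // MidFieldQuote.swallow: the quote is dropped silently.
    } else {
      current.write(ch);
      insideString = true;
    }
  }

  fields.add(current.toString());
  return fields;
}

void main() {
  const input = '"A B", "C, D"';
  // fields: ['A B', ' C', ' D']
  print(splitLine(input, MidFieldQuote.swallow));
  // fields: ['A B', ' "C', ' D"']
  print(splitLine(input, MidFieldQuote.keep));
  // fields: ['A B', ' C, D']
  print(splitLine(input, MidFieldQuote.startQuoted));
}
```

In this toy model the proposed change yields two fields, but the second one keeps its leading space, so the whitespace question raised earlier in the thread remains open either way.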