Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update rapidcsv.h for read CSV content. #28

Closed
wants to merge 2 commits into from

Conversation

akeyliu
Copy link

@akeyliu akeyliu commented Jun 7, 2019

I think the logic to read CSV content with quoted content can be improved:

@akeyliu akeyliu changed the title Update rapidcsv.h Update rapidcsv.h for read CSV content. Jun 7, 2019
@akeyliu
Copy link
Author

akeyliu commented Jun 7, 2019

I think the logic to read CSV field with quoted content can be improved:

  1. The quoted character of " should be paired in CSV Field;
  2. When in quoted state, unless read the paired quoted character or '\n', all read content should be in quoted string, no matter ';', '\r' etc.
  3. '\n' is another method to stop quoted state;

This update logic may not process single " well, but I think the separator ';' in quoted string is more often to occur, and mostly " is mostly used in pair.

Maybe we can define a more complex rules to process ", i.e. double " to ignore the quoted rule just means single original ", but I think it is not worth to define these rules which is not a normal understanding for the users.

@d99kris
Copy link
Owner

d99kris commented Jun 9, 2019

As can be seen below "All checks have failed" and in the "Details" links we can see that only 27% tests passed. So the proposed changes breaks a bit of existing functionality (not only parsing related to quoted data). For a patch / pull request to be considered for merge, all the tests need to pass. Changes may be allowed to some existing tests to make it 100% pass, but then the new parsing rule to be implemented need to be aligned on first.

Rapidcsv mostly follows the basic CSV parse rules listed at https://en.wikipedia.org/wiki/Comma-separated_values#Basic_rules although it does not currently support line-breaks in quoted strings.

Perhaps you could help let me know how your proposed rule is different from the rules in the link above? And if it's breaking any of the common rules - justify why it should be broken. Thanks!

@d99kris
Copy link
Owner

d99kris commented Jun 9, 2019

And by the way, if you are making changes and want to test them, the complete test suite can be run like this on Mac / Linux:

mkdir -p build && cd build && cmake .. && make && ctest -C unit --output-on-failure && ctest -C perf --verbose ; cd -

For Windows refer to https://github.com/d99kris/rapidcsv/blob/master/appveyor.yml

@d99kris
Copy link
Owner

d99kris commented Jun 9, 2019

One more thing - instead of describing the proposed parsing rules or updated code - perhaps it's easier if you provide an example of the CSV data you are trying to read (and that rapidcsv currently does not support). It will make it much easier for me to understand the issue.

@akeyliu
Copy link
Author

akeyliu commented Jun 9, 2019

Ok, I'm not good at Github, and will try it follow your suggestion.

@d99kris
Copy link
Owner

d99kris commented Jun 9, 2019

No worries, I understand you're just trying to contribute some improvement to the project. I do appreciate you taking the time to create a pull request for the suggested changes.

But as it turns out, the proposed changes break multiple tests, so that needs to be addressed before any new functionality can be merged.

If it's easier for you, perhaps you can share the CSV file you are unable to read, then we can work out a solution together on how that CSV file could be parsed by rapidcsv. Thanks!

@d99kris
Copy link
Owner

d99kris commented Jul 10, 2019

Closing this PR, feel free to re-open if you want to pursue this in the future.

@d99kris d99kris closed this Jul 10, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants