Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CSV to JSON is too fragile #127

Open
sunnywalker opened this issue Jul 21, 2020 · 5 comments
Open

CSV to JSON is too fragile #127

sunnywalker opened this issue Jul 21, 2020 · 5 comments
Labels
bug Something isn't working good first issue Good for newcomers help wanted Extra attention is needed

Comments

@sunnywalker
Copy link
Contributor

sunnywalker commented Jul 21, 2020

The CSV to JSON script is too fragile and breaks on real-world data such as double-quoted columns to contain commas and double-double-quotes for literal double-quotes.

I had a \r in some of my data which means the script isn't standardizing to \n line endings.

Test:

Screen Shot 2020-07-20 at 15 44 11

Result:

Screen Shot 2020-07-20 at 15 44 29

Expected:

Screen Shot 2020-07-20 at 15 47 24

The JSON to CSV gets similarly confused in that it uses \" for literal double-quotes instead of the CSV convention of double-double-quotes within a quoted column.

@IvanMathy
Copy link
Owner

Hey there! Thanks for the report, I'll try to either fix the current code or try to find a more solid library for conversion. I have to admit this isn't a script I use a lot personally so it's hard to see shortcomings... I appreciate the details and will look into the conventions for converting between the two formats.

@IvanMathy IvanMathy added bug Something isn't working good first issue Good for newcomers help wanted Extra attention is needed labels Jul 21, 2020
@Flare576
Copy link
Contributor

[Off-Topic, but context: I'm attempting to make a JSON->YAML converter script and noticed this Issue; @IvanMathy's comment about "more solid library" for conversions hits home with the problem I'm facing; external libraries.]

The best library for converting CSV to JSON I've found is https://www.papaparse.com/ . I've tried creating a script like this:

const Papa = require('papaparse');

/**
	{
		"api":1,
		"name":"(beta) Convert CSV To JSON",
		"description":"Changes CSV to JSON",
		"author":"flare576",
		"icon":"quote",
		"tags":"boop,state,script,csv,json,convert"
	}
**/

function main(state) {
  try {
    state.text = Papa.parse(state.text);
  }
  catch(error) {
    state.fullText = `${state.fullText}\n\n${error}`;
    state.postError("Something strange happened here...");
  }
}

But I get the error

TypeError: undefined is not an object (evaluating 'Papa.parse')

Which makes sense: it's not one of the included libraries laid out in your Modules docs here...

So, what's the suggested process here? Feel free to "RTFM" with a link :D

@Flare576
Copy link
Contributor

For what it's worth: I grabbed the minified version of Papa Parse and pasted it straight into my csv_to_json.js, and it did work. I did have to take out the require and fix the code to be

function main(state) {
  try {
    const { data } = Papa.parse(state.text);
    state.text = JSON.stringify(data);
  }
  catch(error) {
    state.postError("Something strange happened here...");
  }
}

If the pattern Boop wants to follow is to have all dependencies in the lib folder, then creating a new file in there for Papa (with its LICENSE, of course) is probably the cleanest way, but before I throw up a PR I wanted to double-check that this is the approach you want.

Flare576 added a commit to Flare576/Boop that referenced this issue Oct 29, 2020
Original implementation couldn't handle many common "Gotchas" in CSV, including quoted commas

IvanMathy#127
@Flare576
Copy link
Contributor

Put up a PR that (I THINK) is aligned with the Boop docs:

If you'd like some helper functions, consider extracting only the part you need from the library (and clealy mention where it comes from alongside the license) rather than pasting the full source in your script.

https://github.com/IvanMathy/Boop/blob/main/Boop/Documentation/CustomScripts.md#performance

The entire PapaParse library was 20k. Not sure how this compares to other "helpers", so I opted to start "Fast and easy". If this is too large, there are components (mostly around file read/write) that we don't need of the library, but removing them piecemeal isn't my idea of a great time (and probably won't save much space considering the maturity of PapaParse), so I left it for now.

@Flare576
Copy link
Contributor

Flare576 commented Oct 29, 2020

Sorry for the double-post, but I just realized that there's a combination of an assumption and a "Limitation" at play around the "header" stuff in this script.

First, we're assuming that this data has a header (both in the original and in my fix). There's a "Limitation" (and I use that term VERY loosely because I actually see this as a feature/simplicity) in Boop that you can't pass parameters to the scripts.

So... the solutions I see are:

  1. Add another script and define one as CSV to JSON (headers), the other CSV to JSON (headerless)
  2. 🤮 Add some sort of design where the first line of state.text could be settings for a script 🤮
  3. ❓ Auto-detect header rows ❓ (I have NO idea how to do this and am pretty sure it's a fool's errand)

If we went with # 1, I'd argue that the 2nd script would just be an optional script, and the default would be headers

softdevjs4 added a commit to softdevjs4/veu-Boop that referenced this issue Sep 6, 2022
Original implementation couldn't handle many common "Gotchas" in CSV, including quoted commas

IvanMathy/Boop#127
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working good first issue Good for newcomers help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

3 participants