unicode characters on downloaded CSV (utf-8 not supported) #78

Closed
elbaza1 opened this issue Jul 12, 2019 · 8 comments

Comments

@elbaza1 elbaza1 commented Jul 12, 2019

Link to the test case:
https://www.csvjson.com/json2csv/df61580582fea1929d2c1ba50f5cfb8e

French characters like 'é' are converted to 'Ã©', for example.
I suggest allowing UTF-8 on download, or writing the CSV files as UTF-8 in the API before downloading.
Thanks

@DrorHarari DrorHarari commented Jul 12, 2019

The download is in UTF-8. You see 'Ã©' because the program you use (likely Excel or Wordpad) does not automatically recognize that the data is UTF-8, so the bytes 0xC3, 0xA9 are interpreted as two characters rather than as the single character 'é'. Other programs, such as Notepad, do recognize the file as UTF-8 and display the data correctly.
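To make the byte-level explanation concrete, here is a minimal Python sketch. It assumes the program misreading the file falls back to Windows-1252 (`cp1252`); the thread does not say which legacy encoding is actually used, so treat that as an illustration only.

```python
# The two bytes that UTF-8 uses for 'é'.
utf8_bytes = "é".encode("utf-8")
print(utf8_bytes)  # b'\xc3\xa9'

# Assumption: the viewer decodes as Windows-1252, one character per byte.
misread = utf8_bytes.decode("cp1252")
print(misread)  # Ã©

# Decoding the same bytes as UTF-8 recovers the single character.
correct = utf8_bytes.decode("utf-8")
print(correct)  # é
```

The file on disk is identical in both cases; only the decoding choice made by the reading program differs.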

You might ask: why not automatically insert the BOM (0xEF, 0xBB, 0xBF) at the top of the file, so that Excel, Wordpad and many other programs recognize the file as UTF-8?

The answer is that many CSV file processors balk at the BOM sequence at the beginning of the file and include it in the data, where it looks like a garbage character. That is not universally the case, which is why it may be useful to have another checkbox, "Include BOM", to let the user ask for a BOM to be added.
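As a rough sketch of what an opt-in BOM could look like, the Python snippet below serializes rows to UTF-8 CSV bytes and prepends the three BOM bytes only on request. The function name `rows_to_csv_bytes` and the `include_bom` flag are hypothetical stand-ins for the proposed checkbox, not part of the site's actual code.

```python
import csv
import io

UTF8_BOM = b"\xef\xbb\xbf"  # the three bytes 0xEF, 0xBB, 0xBF


def rows_to_csv_bytes(rows, include_bom=False):
    """Serialize rows to UTF-8 CSV bytes, optionally prefixed with a BOM.

    `include_bom` is a hypothetical flag mirroring an "Include BOM" checkbox.
    """
    buffer = io.StringIO(newline="")  # keep the csv module's own line endings
    csv.writer(buffer).writerows(rows)
    payload = buffer.getvalue().encode("utf-8")
    return UTF8_BOM + payload if include_bom else payload


rows = [["name", "city"], ["Éloïse", "Montréal"]]
with_bom = rows_to_csv_bytes(rows, include_bom=True)   # friendlier to Excel
without_bom = rows_to_csv_bytes(rows)                  # friendlier to strict parsers
```

The payload is the same either way; the only difference is the three-byte prefix, which is what makes a simple on/off option feasible.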

martindrapeau added a commit that referenced this issue Jul 16, 2019
@martindrapeau martindrapeau commented Jul 16, 2019

I've gone ahead and added the BOM character at the beginning as I believe most people use this to work in Excel afterwards. If this becomes an issue, I can add the option.

@elbaza1 elbaza1 commented Jul 16, 2019

@jmappala jmappala commented Jul 17, 2019

Hi, I am getting an error now. It was working more than a week ago.

I saved an iNav SQL data extract as CSV, then used CSVJSON to convert it to JSON.

I then ran k6 to do a parallel run and got this: level=error msg="SyntaxError: invalid character 'ï' looking for beginning of value at parse (native)"
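The 'ï' in that message is consistent with the first BOM byte (0xEF) being read as a character ahead of the JSON value. The Python sketch below is only an illustration of that failure mode and one consumer-side workaround; it is not k6 code, and the `raw` bytes are a made-up stand-in for the converted file.

```python
import json

# Hypothetical file contents: a UTF-8 BOM followed by a JSON document.
raw = b"\xef\xbb\xbf" + b'[{"id": 1}]'

# Plain UTF-8 decoding keeps the BOM as U+FEFF, which the JSON parser rejects.
try:
    json.loads(raw.decode("utf-8"))
    strict_parse_failed = False
except json.JSONDecodeError:
    strict_parse_failed = True

# The "utf-8-sig" codec drops a leading BOM if one is present.
data = json.loads(raw.decode("utf-8-sig"))
print(strict_parse_failed, data)
```

Decoding with `utf-8-sig` also works on input that has no BOM, so a consumer can use it unconditionally.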

@jmappala jmappala commented Jul 17, 2019

FYI, I used http://www.convertcsv.com/csv-to-json.htm and it worked.

@martindrapeau martindrapeau commented Jul 17, 2019

Could you try again @jmappala? Should be fixed now.

@DrorHarari DrorHarari commented Jul 17, 2019

Adding the BOM automatically can break apps that do not expect it: the BOM would be seen as the first character of the first column name. If the app expects a specific column name, it would not find it (unless it expects the BOM). Hence, while adding the BOM by default suits the expected Excel audience, it may be safer to allow the user to request a CSV without it (via that checkbox). @martindrapeau, I understand you are waiting for that hypothetical person to stand up 😉, ok.
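The column-name problem can be reproduced with Python's `csv` module, as a small sketch. The sample bytes are invented for illustration; the point is only that a reader decoding as plain UTF-8 sees U+FEFF attached to the first header.

```python
import csv
import io

# Hypothetical CSV download that starts with a UTF-8 BOM.
raw = b"\xef\xbb\xbf" + "name,city\r\nÉloïse,Montréal\r\n".encode("utf-8")

# Plain UTF-8 decoding leaves U+FEFF glued to the first column name.
naive = next(csv.DictReader(io.StringIO(raw.decode("utf-8"), newline="")))
print("name" in naive)          # False
print("\ufeffname" in naive)    # True

# "utf-8-sig" strips the BOM when present and is harmless when it is absent.
tolerant = next(csv.DictReader(io.StringIO(raw.decode("utf-8-sig"), newline="")))
print(tolerant["name"])         # Éloïse
```

An app that looks up `"name"` fails in the first case and succeeds in the second, which is the breakage described above.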

@jmappala jmappala commented Jul 18, 2019

@martindrapeau, I tried again and it worked. Thanks.
