Squib.csv strange behavior with Unicode BOM #322

Open
vtbassmatt opened this issue Oct 22, 2020 · 2 comments

@vtbassmatt
Contributor

vtbassmatt commented Oct 22, 2020

(I'm not a Ruby person, so please forgive me if this is an expected behavior or otherwise widely known.)

Short version

Squib.csv exhibits really strange behavior in the face of a BOM (byte order mark) and/or CRLF (carriage return + linefeed) line endings in CSV data. This is relevant since Excel's default is to write CSVs with these characters. Rows of data appear or fail to appear depending on which DataFrame methods you call!

Longer version, or how I got here

I used the --advanced project layout and immediately wanted to switch from XLSX to CSV-based data. Using Excel for Mac, I saved the default XLSX as CSV using whatever Excel's default was -- UTF-8 I think. I didn't realize it was going to use CRLF line endings + a Unicode BOM. The generated deck.rb immediately started giving me errors like this: NoMethodError: undefined method `name' for #<Squib::DataFrame:0x00007fd64ebdbce0>

Poking around in irb was curious. Sometimes the DataFrame thought it contained data, while other times it didn't.

Here's a slightly cleaned-up version of my session:

irb(main):001:0> require 'squib'
=> true

# data will be our original Excel file, data2 is from the CSV
irb(main):002:0> data = Squib.xlsx file: 'data/game.xlsx', sheet: 0
=> #<Squib::DataFrame:0x00007fbcf6b514a8 @hash={"Name"=>["Elf", "Dwarf"], "...
irb(main):003:0> data2 = Squib.csv file: 'data/game.csv'
=> #<Squib::DataFrame:0x00007fbcf7a73b98 @hash={"Name"=>["Elf", "Dwarf"], ...

# Both have 2 rows of data
irb(main):004:0> data.nrows
=> 2
irb(main):005:0> data2.nrows
=> 2

# not shown - the Excel-based version happily responds to .name and ['Name']
irb(main):006:0> data2.name
Traceback (most recent call last):
        4: from /usr/local/opt/ruby/bin/irb:23:in `<main>'
        3: from /usr/local/opt/ruby/bin/irb:23:in `load'
        2: from /usr/local/Cellar/ruby/2.7.2/lib/ruby/gems/2.7.0/gems/irb-1.2.6/exe/irb:11:in `<top (required)>'
        1: from (irb):6
NoMethodError (undefined method `name' for #<Squib::DataFrame:0x00007fbcf7a73b98>)
Did you mean?  name
irb(main):007:0> data2['Name']
=> nil

# column doesn't exist?
irb(main):008:0> data2.col? 'name'
=> false

# but the data's in the JSON output...
irb(main):009:0> data2.to_json
=> "{\"Name\":[\"Elf\",\"Dwarf\"],\"ATK\":[3,2],\"DEF\":[2,3]}"

On a hunch, I replaced the CRLFs with LFs and removed the BOM. Everything worked after that.
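
For anyone who wants to script the same cleanup, here is a minimal sketch using only Ruby's standard library. The helper name `clean_csv_text` and the sample data are made up for illustration; this is not something Squib provides, and it assumes the text is already a UTF-8 string.

```ruby
require 'csv'

# Hypothetical helper (not part of Squib): drop a leading UTF-8 BOM
# and turn CRLF line endings into LF.
def clean_csv_text(text)
  text.sub(/\A\uFEFF/, '').gsub("\r\n", "\n")
end

# Sample data shaped like the Excel export described above: BOM + CRLF.
raw = "\uFEFFName,ATK,DEF\r\nElf,3,2\r\nDwarf,2,3\r\n"

table = CSV.parse(clean_csv_text(raw), headers: true)
puts table.headers.inspect
puts table['Name'].inspect
```

Writing the cleaned text back to disk (for example with `File.write`) and pointing `Squib.csv` at that file should be equivalent to the manual edit. As far as I know, `File.read(path, encoding: 'bom|utf-8')` also drops the BOM while reading, which may be simpler if only the BOM is the problem.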

@andymeneely
Owner

So... on a Mac, Excel saved it with CRLF? Interesting. And I'll look into how a BOM would get handled here.

@vtbassmatt
Contributor Author

Yep - I was surprised too. The CRLFs don't seem to matter, as it turns out. And if you have Excel read a file without a BOM, it doesn't appear to insert a BOM. Only when I converted an XLSX to CSV did it give me trouble.
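
A small stdlib-only check of that conclusion, as a sketch. It assumes Squib passes the file contents to Ruby's standard CSV parser without stripping the BOM, which I haven't confirmed in the Squib source. The helper names and sample strings are hypothetical.

```ruby
require 'csv'

# Header row as Ruby's standard CSV parser sees it.
def headers_of(text)
  CSV.parse(text, headers: true).headers
end

# Hex code point of a string's first character, to expose invisible ones.
def first_codepoint_hex(str)
  str.codepoints.first.to_s(16)
end

lf       = "Name,ATK,DEF\nElf,3,2\n"
crlf     = "Name,ATK,DEF\r\nElf,3,2\r\n"
with_bom = "\uFEFF" + lf

# CRLF vs LF: compare the parsed headers.
puts headers_of(lf) == headers_of(crlf)

# With a BOM: does the first key still equal "Name", and what does it start with?
first_key = headers_of(with_bom).first
puts first_key == 'Name'
puts first_codepoint_hex(first_key)
```

If the BOM survives parsing, the first key is really U+FEFF followed by "Name". It prints like "Name" but doesn't compare equal to it, which would explain why `data2['Name']` returned nil while `to_json` still appeared to show the column.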
