Headers as atoms #47

Closed
niahoo opened this issue Jun 28, 2016 · 4 comments

Comments

@niahoo

niahoo commented Jun 28, 2016

Hello,

Is there any way to read the headers from the first line but transform them into atoms before the rows are converted to maps?

What I want is to have atoms as keys in maps, not strings.

Thank you!
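For reference, the conversion asked about here can also be done after decoding, without any support from the library. A minimal sketch, assuming each row has already been decoded into a map with string keys; AtomKeys and atomize/1 are hypothetical names, not part of the CSV library:

```elixir
defmodule AtomKeys do
  # Hypothetical helper (not part of the CSV library): re-keys one decoded
  # row, turning its string header keys into atoms. String.to_atom/1 creates
  # new atoms, which are never garbage collected, so only use it on trusted headers.
  def atomize(row) when is_map(row) do
    Map.new(row, fn {key, value} -> {String.to_atom(key), value} end)
  end
end

# Example rows as they might look after decoding with string header keys.
rows = [%{"name" => "Ada", "age" => "36"}, %{"name" => "Alan", "age" => "41"}]

rows
|> Enum.map(&AtomKeys.atomize/1)
|> IO.inspect()
```

The keys go through String.to_atom/1 unchecked, which is exactly the kind of input the maintainer's reply is wary of.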

@beatrichartz
Owner

Erlang does not allow the full UTF-8 range to be represented as atoms. Since this is partly due to the BEAM, I do not see that changing anytime soon. This library is focused on decoding UTF-8 files, and headers are part of the file, so it would not make sense to allow decoding into atoms when that opens the door to a quite intricate error that will confuse a lot of users.
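To make the limitation concrete: on the Erlang/OTP releases current at the time (OTP 20 later added Unicode support for atoms), an atom could only contain Latin-1 characters, so a header such as a Japanese column name could not be converted. A minimal sketch of the check a caller would have needed before converting a header; HeaderCheck and latin1_only?/1 are hypothetical names:

```elixir
defmodule HeaderCheck do
  # Hypothetical helper: true when every codepoint of the header is in the
  # Latin-1 range (0..255), i.e. when the header could be stored as an atom
  # on Erlang/OTP releases before 20.
  def latin1_only?(header) when is_binary(header) do
    header
    |> String.to_charlist()
    |> Enum.all?(&(&1 <= 255))
  end
end

IO.inspect(HeaderCheck.latin1_only?("name"))   # prints true
IO.inspect(HeaderCheck.latin1_only?("café"))   # prints true (é is U+00E9)
IO.inspect(HeaderCheck.latin1_only?("名前"))   # prints false
```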

There is also no clear performance advantage: in fact, headers as atoms perform about 10% worse in terms of operations and only slightly better on memory.

I'm curious about your use case: is there any particular reason you'd like to have atoms as the map keys?

@niahoo
Author

niahoo commented Jul 4, 2016

Hi!
Thanks for answering. My use case is reading CSV files and building Amnesia models with the data. I'll see if I can skip the conversion and use strings.
I know about the UTF-8 issue, but I think it should be the developer's responsibility to turn this option on and to make sure the first row of the file is OK.
Thank you :)
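If the developer does take responsibility for the first row, one way to make that explicit is to accept only headers known in advance, so that no atom is ever created from file contents. A minimal sketch; KnownHeaders, to_atoms!/1 and the name/age columns are hypothetical:

```elixir
defmodule KnownHeaders do
  # Hypothetical mapping from the header strings the developer expects to
  # the atoms they want as map keys. Nothing outside this map becomes an atom.
  @mapping %{"name" => :name, "age" => :age}

  # Converts a decoded header row, raising on any unexpected column instead
  # of silently creating a new atom.
  def to_atoms!(headers) do
    Enum.map(headers, fn header ->
      case Map.fetch(@mapping, header) do
        {:ok, atom} -> atom
        :error -> raise ArgumentError, "unexpected CSV header: #{inspect(header)}"
      end
    end)
  end
end

IO.inspect(KnownHeaders.to_atoms!(["name", "age"]))   # prints [:name, :age]
```

A file with an unexpected or misspelled column then fails loudly at the header row rather than producing maps with surprising keys.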

@beatrichartz
Owner

Great, let me know if you are successful using strings. Out of interest: Mnesia itself works with tuples, so are you using a wrapper library that works with maps?

It's best to avoid confusion around an API. There are use cases where files are read in dynamically and headers cannot be known ahead of time, or where files are in languages with characters outside the Latin-1 range. I think it is a direct responsibility of this library to ensure that it reliably produces output given UTF-8 input, which includes headers. Allowing atoms would create bugs that are hard to detect and painful to fix.

So this won't make it into the library, I'm afraid. If you really prefer atoms, you can still read the first row of the CSV as a list, convert it to atoms, and feed it back in:

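```elixir
# inputstream must be re-enumerable (e.g. File.stream!(path)), since it is read twice.
# First pass: decode just the first row and turn its header strings into atoms.
headers = inputstream
          |> CSV.decode!
          |> Enum.take(1)
          |> List.first
          |> Enum.map(&String.to_atom/1)

# Second pass: skip the raw header line (assumes it contains no embedded
# newlines) and decode the remaining rows into maps keyed by those atoms.
inputstream
|> Stream.drop(1)
|> CSV.decode!(headers: headers)
```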

@niahoo
Author

niahoo commented Jul 7, 2016

OK, I understand your arguments :)

I had been doing something similar to your code, but I was worried about calling CSV.decode! twice. Does it read the entire file each time, or could using Stream.take instead of Enum.take make it read only the first line of the stream?
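On the Stream.take question: Enum.take(stream, 1) already stops pulling from a lazy stream after the first element, while Stream.take/2 only builds another lazy stream and reads nothing until it is enumerated. The sketch below traces which elements of a plain lazy stream are actually consumed. It does not use the CSV library; whether CSV.decode! itself reads further ahead internally depends on its implementation, so treat this as an illustration of Elixir streams in general. LazyTakeDemo is a hypothetical module:

```elixir
defmodule LazyTakeDemo do
  # Wraps `lines` in a lazy stream that reports every element it yields
  # by sending a message to the calling process.
  def traced(lines) do
    parent = self()

    Stream.map(lines, fn line ->
      send(parent, {:pulled, line})
      line
    end)
  end

  # Drains the mailbox and returns the elements that were actually pulled.
  def pulled do
    receive do
      {:pulled, line} -> [line | pulled()]
    after
      0 -> []
    end
  end
end

lines = ["name,age", "Ada,36", "Alan,41"]

first = lines |> LazyTakeDemo.traced() |> Enum.take(1)
IO.inspect(first)                   # prints ["name,age"]
IO.inspect(LazyTakeDemo.pulled())   # prints ["name,age"]: the other lines were never read
```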

I use Amnesia, an Elixir wrapper around Mnesia; it lets you work with maps but stores regular tuples in Mnesia.
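Since Mnesia ultimately stores tuples, map keys only matter up to the point where a row is turned into a record, and string keys work just as well there. A minimal sketch, assuming a hypothetical :person table whose record fields are name and age; RecordSketch and to_record/3 are hypothetical and do not call Mnesia or Amnesia:

```elixir
defmodule RecordSketch do
  # Builds a Mnesia-style record tuple {table, field1, field2, ...} from a
  # decoded row. `fields` lists the row keys in record order; they can be
  # strings, so no atoms are created from CSV data.
  def to_record(table, fields, row) when is_atom(table) and is_map(row) do
    values = Enum.map(fields, &Map.fetch!(row, &1))
    List.to_tuple([table | values])
  end
end

row = %{"name" => "Ada", "age" => "36"}
IO.inspect(RecordSketch.to_record(:person, ["name", "age"], row))
# prints {:person, "Ada", "36"}
```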

Thank you!

2 participants