To summarize, would it be in line with the parsing feature discussion here to include an option to automatically parse column names from the header row if it has the following formatting:
Currently, one workaround is to manually drop the header comment character (# in the above example) before reading. I believe that @fredrikekre suggested including a keyword to handle the parsing (e.g., header::Regex) that could be used to handle cases like this.
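For reference, the manual workaround could look roughly like the sketch below. It assumes the header is the first line and looks something like `# a,b,c` (the exact formatting is my assumption), and `strip_header_comment` is a hypothetical helper, not something CSV.jl provides.

```julia
# Hypothetical helper (not part of CSV.jl): copy the input into a buffer,
# removing a leading comment marker from the first line only.
function strip_header_comment(io::IO; comment::AbstractString = "#")
    header = readline(io)       # first line, without the trailing newline
    rest = read(io, String)     # remainder of the input, left unchanged
    if startswith(header, comment)
        # Drop the comment marker, then any whitespace that followed it.
        header = lstrip(chop(header; head = length(comment), tail = 0))
    end
    return IOBuffer(string(header, "\n", rest))
end

# Assumed example input: a header line prefixed with "# ".
buf = strip_header_comment(IOBuffer("# a,b,c\n1,2,3\n4,5,6\n"))
cleaned = read(buf, String)
```

Since `CSV.File` accepts an `IO` source, the returned buffer could presumably be passed to it directly instead of being read into a string.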
Would something like this be a reasonable feature to include, or are there alternative ways to accomplish this in CSV.File already? Sorry if this has already been discussed elsewhere!
Yeah, if you don't have commented lines in your data, you could just pass normalizenames=true and the # character would be "normalized" out.
We could maybe allow passing header::Regex; we would need to handle it here and here. But what exactly is the Regex expected to do? Just parse a single column name? Parse the whole line? And we would need some way to know how many characters were "consumed" by the Regex. I haven't played w/ Regex internals enough to know if that would be available somehow.
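On the "consumed" question, the public `match` API may already be enough: it returns a `RegexMatch` whose `offset` field is the start index of the match and whose `match` field is the matched substring, so the end of the consumed span can be computed from those two. A minimal sketch, assuming the Regex parses one column name at a time; `parse_header_names` and the pattern are hypothetical and say nothing about how CSV.jl would wire this in:

```julia
# Hypothetical helper (not CSV.jl code): repeatedly match one column name,
# using the match offset and byte length to find where to resume.
function parse_header_names(re::Regex, line::String)
    names = Symbol[]
    pos = 1
    while pos <= ncodeunits(line)
        m = match(re, line, pos)
        (m === nothing || isempty(m.match)) && break
        push!(names, Symbol(m.match))
        # First byte index after the consumed text.
        pos = m.offset + ncodeunits(m.match)
    end
    return names
end

# Assumed pattern: a name starts at the first character that is not a comma,
# '#', or whitespace, and runs up to the next comma.
header_names = parse_header_names(r"[^,#\s][^,]*", "# a,b,c")
```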
I've had the thought before that we could allow some kind of applyheader::Function keyword that would just be a function of the form f(x::Symbol) -> Symbol, so we'd parse each column name and then call applyheader on each one, and it could do any kind of transform it wanted. That might end up being more flexible and general.
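To make the idea concrete, here is a rough sketch of what such a transform could look like from the user's side. `applyheader` is only the keyword proposed above and does not exist here; `apply_header` is a hypothetical stand-in for the step the parser would perform after reading the raw names, and the `# a` name is an assumed example.

```julia
# A transform of the proposed form f(x::Symbol) -> Symbol:
# drop leading '#' characters, then surrounding whitespace.
strip_comment(name::Symbol) = Symbol(strip(lstrip(String(name), '#')))

# Hypothetical stand-in for the parser applying the transform to each
# already-parsed column name.
apply_header(f::Function, names::Vector{Symbol}) = Symbol[f(n) for n in names]

raw_names = [Symbol("# a"), :b, :c]
clean_names = apply_header(strip_comment, raw_names)
```

Because the transform only sees one name at a time, it stays independent of how the header line itself was split.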
Thanks for suggesting the normalizenames=true option, it really helps a lot! I see what you mean about the Regex option and think that going the function route would be a really welcome feature. Would it make sense to also be able to apply the transform to the already "normalized" column names if it is used, since it already does so much of the heavy lifting?
I think the right solution here is probably to go all in with something like what Lyndon and I described here. I might try and get that implemented before the next 0.9 release.
Thanks for providing such a wonderful package! I am opening this issue to follow-up on the Zulip discussion here: https://julialang.zulipchat.com/#narrow/stream/274208-helpdesk-.28published.29/topic/reading.20a.20file.20with.20header.20with.20CSV.2Ejl/near/240593595