Feature request: Additional header parsing control #840

icweaver · 2021-05-28T17:30:58Z

Thanks for providing such a wonderful package! I am opening this issue to follow up on the Zulip discussion here: https://julialang.zulipchat.com/#narrow/stream/274208-helpdesk-.28published.29/topic/reading.20a.20file.20with.20header.20with.20CSV.2Ejl/near/240593595

To summarize, would it be in line with the parsing feature discussion here to include an option to automatically parse column names from a header row formatted like this:

```
# column_name_1,column_name_2,...
x1,y1,...
x2,y2,...
...
```

Currently, one workaround is to manually drop the header comment character (# in the above example) before reading. I believe @fredrikekre suggested adding a keyword (e.g., header::Regex) to control header parsing in cases like this.
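The manual workaround can be sketched with Julia's standard library alone. The helper name `strip_header_comment` and its `commentchar` keyword are hypothetical (not part of CSV.jl): it reads the first line, removes a leading comment marker plus any following whitespace, and returns an `IOBuffer` containing the cleaned header followed by the untouched data rows.

```julia
# Hypothetical helper (not CSV.jl API): remove a leading comment marker from
# the header line so a CSV reader sees plain column names.
function strip_header_comment(io::IO; commentchar::AbstractString = "#")
    header = readline(io)                     # first line, e.g. "# a,b,c"
    if startswith(header, commentchar)
        # drop the marker (a complete prefix, so this index is valid) and
        # any whitespace that follows it
        header = lstrip(header[ncodeunits(commentchar)+1:end])
    end
    out = IOBuffer()
    println(out, header)                      # cleaned header line
    write(out, read(io))                      # remaining data rows, unchanged
    seekstart(out)                            # rewind so the buffer can be read
    return out
end

data = """
# column_name_1,column_name_2
1,2
3,4
"""
cleaned = read(strip_header_comment(IOBuffer(data)), String)
```

Since `CSV.File` accepts an `IO` source, the returned buffer should be usable in place of the original file handle; this has not been checked against every CSV.jl version.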

Would something like this be a reasonable feature to include, or are there alternative ways to accomplish this in CSV.File already? Sorry if this has already been discussed elsewhere!


quinnj · 2021-06-05T02:32:50Z

Yeah, if you don't have commented lines in your data, you could just pass normalizenames=true and the # character would be "normalized" out.

We could maybe allow passing header::Regex; we would need to handle it here and here. But what exactly is the Regex expected to do? Parse a single column name? Parse the whole line? We would also need some way to know how many characters were "consumed" by the Regex. I haven't played with Regex internals enough to know whether that would be available.
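On the "how much was consumed" question: Julia's `RegexMatch` already records where a match starts (`offset`) and the matched text (`match`), so the consumed byte count is recoverable without touching Regex internals. The sketch below assumes a hypothetical convention in which the user's regex matches the whole header line and its first capture group holds the comma-separated names; `match_header` is an illustrative name, not CSV.jl API.

```julia
# Hypothetical sketch of a `header::Regex` option: the regex matches the whole
# header line, and capture group 1 holds the comma-separated column names.
function match_header(re::Regex, buf::AbstractString)
    m = match(re, buf)
    m === nothing && return nothing                 # no header found
    names = split(m.captures[1], ',')               # captured text -> names
    consumed = m.offset + ncodeunits(m.match) - 1   # last byte covered by the match
    return (names = names, consumed = consumed)
end

buf = "# a,b,c\n1,2,3\n"
res = match_header(r"^#\s*([^\n]*)\n", buf)
# data parsing could then resume at byte index res.consumed + 1
```

Here the match covers the 8 bytes of `"# a,b,c\n"`, so data parsing would resume at byte 9.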

I've had the thought before that we could allow some kind of applyheader::Function keyword, a function of the form f(x::Symbol) -> Symbol: we'd parse each column name and then call applyheader on each one, letting it apply any transform it wanted. That might end up being more flexible and general.
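A minimal sketch of how an `applyheader`-style hook could behave, assuming a hypothetical `parse_header` function (neither name is existing CSV.jl API): split the header line on the delimiter, turn each name into a `Symbol`, then map the user's `f(::Symbol) -> Symbol` over the result.

```julia
# Hypothetical sketch of the proposed `applyheader::Function` keyword.
# `parse_header` is illustrative only, not existing CSV.jl API.
function parse_header(line::AbstractString; delim::Char = ',',
                      applyheader = identity)
    names = Symbol.(strip.(split(line, delim)))   # raw column names as Symbols
    return map(applyheader, names)                # user transform on each name
end

# Example transform: drop a leading comment marker and spaces.
dropcomment(s::Symbol) = Symbol(lstrip(String(s), ['#', ' ']))

cols = parse_header("# column_name_1,column_name_2"; applyheader = dropcomment)
```

Because the hook only sees one name at a time, it stays independent of how the header line was split, and several transforms can be combined with ordinary function composition (`f ∘ g`).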

icweaver · 2021-06-08T16:27:19Z

Thanks for suggesting the normalizenames=true option; it really helps a lot! I see what you mean about the Regex option, and I think the function route would be a really welcome feature. Would it make sense to also allow applying the transform to the already "normalized" column names when normalizenames=true is used, since normalization already does so much of the heavy lifting?

quinnj · 2021-08-20T05:12:35Z

I think the right solution here is probably to go all in on something like what Lyndon and I described here. I might try to get that implemented before the next 0.9 release.

quinnj added the new feature label Aug 20, 2021

quinnj added this to the 0.9 milestone Aug 20, 2021
