CsvSchema skipFirstDataRow & useHeader #176

happyhua · 2020-02-28T15:36:56Z

When both skipFirstDataRow and useHeader are set to true, the skipping takes place on the second line of the document. As documented:

"whether the first data line (either first line of the document, if useHeader=false, or second, if useHeader=true) should be completely ignored by parser."

Is it possible to override this behaviour, namely to skip the first line and use the second line as the header?

For example, this kind of document:
sep=,
header1,header2
value1,value2.

cowtowncoder · 2020-02-28T17:54:39Z

At this point, the assumption is that if there is a header line, it is the very first line; this is suggested by the term "data" in skipFirstDataRow, as data can only follow the header.
There is no way to change this behavior.

In theory we could add yet another feature for something like "always skip first line", but I suspect it would be easier to just create a Reader that skips the first line of input and pass that Reader to Jackson.
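
The Reader-based workaround might look roughly like the following. This is a minimal sketch, not anything provided by Jackson: the class name FirstLineSkipper and the method skipFirstLine are hypothetical, and it assumes the line to drop (such as "sep=,") always occupies exactly the first line of the input.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

// Hypothetical helper for illustration; not part of Jackson.
final class FirstLineSkipper {
    private FirstLineSkipper() {}

    /**
     * Consumes the first line of the given input and returns a Reader
     * positioned at the start of the second line.
     */
    static Reader skipFirstLine(Reader in) throws IOException {
        BufferedReader buffered = new BufferedReader(in);
        // Discards the first line (e.g. "sep=,"). On empty input readLine()
        // returns null, which is harmless here.
        buffered.readLine();
        return buffered;
    }
}
```

The returned Reader could then be handed to Jackson, for example via new CsvMapper().readerFor(Map.class).with(CsvSchema.emptySchema().withHeader()).readValues(reader), so that the second line of the file is treated as the header.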

edgarklerks · 2020-03-02T10:43:25Z

I think that makes sense: the property is called skipFirstDataRow and not skipFirstLine. However, sep=<separator> seems to occur specifically when exporting spreadsheets from Excel. Perhaps it is possible to support this from that point of view? E.g., let the user indicate that this is an Excel file?

cowtowncoder · 2020-03-03T00:16:49Z

@edgarklerks I am open to a configuration option, but my main concern is just whether the addition:

1. Can be described in a meaningful way (suitable name), and
2. Covers enough use cases (i.e. I do not want to add many different choices)

And I guess the only potentially confusing case would be "skip first line of file" + "no header" + "skip first data line", which should basically mean "skip first 2 lines of file".

edgarklerks · 2020-03-06T15:04:13Z

I understand your point, and adding an option that signals that this is an Excel file and should be treated differently is also not very elegant; that gets tedious quickly if you have a lot of vendors to support.

edgarklerks · 2020-03-06T15:05:21Z

Perhaps an option that signals that the CSV being read specifies its own separator? Like hasSeparatorHeader?

cowtowncoder · 2020-03-06T18:32:02Z

@edgarklerks I am not quite sure what hasSeparatorHeader would signify here. CSV headers do not really look any different from data rows as far as I can see.

cowtowncoder added 2.11 csv labels Mar 3, 2020

cowtowncoder added 2.12 and removed 2.11 labels Nov 22, 2020

cowtowncoder removed the 2.12 label Apr 22, 2023
