CsvSchema skipFirstDataRow & useHeader #176

Open
happyhua opened this issue Feb 28, 2020 · 6 comments

@happyhua

When both skipFirstDataRow and useHeader are set to true, the skipping takes place on the second line of the document. The documentation describes the setting as:

whether the first data line (either first line of the document, if useHeader=false, or second, if useHeader=true) should be completely ignored by parser.

Is it possible to override this behaviour? Namely, to skip the first line and use the second line as the header.

For example, this kind of document:
sep=,
header1,header2
value1,value2.

@cowtowncoder
Member

At this point, the assumption is that if there is a header line, it is the very first line; this is suggested by the term "data" in skipFirstDataRow, as data can only follow the header.
There is no way to change this behavior.

In theory we could add yet another feature for something like "always skip first line", but I suspect it would be easier to just create a Reader that skips the first line of input and pass that Reader to Jackson.
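
As an illustration of the Reader approach, here is a minimal sketch that uses only the JDK. The class name `SkipFirstLine` and the method `skipFirstLine` are hypothetical helpers made up for this example; they are not part of Jackson.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

final class SkipFirstLine {
    /**
     * Hypothetical helper: wraps the input and consumes everything up to and
     * including the first line terminator (for example a leading "sep=," line).
     */
    static BufferedReader skipFirstLine(Reader in) throws IOException {
        BufferedReader buffered = new BufferedReader(in);
        // readLine() returns null on empty input, which is harmless here.
        buffered.readLine();
        return buffered;
    }

    public static void main(String[] args) throws IOException {
        String doc = "sep=,\nheader1,header2\nvalue1,value2\n";
        BufferedReader reader = skipFirstLine(new StringReader(doc));
        // The reader now starts at the header line.
        System.out.println(reader.readLine());
        System.out.println(reader.readLine());
    }
}
```

The returned reader could then be handed to Jackson with a header-based schema, for example `new CsvMapper().readerFor(Map.class).with(CsvSchema.emptySchema().withHeader()).readValues(reader)`, so that the second line of the file is treated as the header.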

@edgarklerks

I think that makes sense; the property is called skipFirstDataRow, not skipFirstLine. However, sep=<separator> seems to occur specifically when exporting spreadsheets from Excel. Perhaps it could be supported from that point of view, e.g. by letting the user indicate that this is an Excel file?

@cowtowncoder
Member

@edgarklerks I am open to a configuration option, but my main concern is whether the addition

  1. Can be described in a meaningful way (suitable name), and
  2. Covers enough use cases (i.e. I do not want to add many different choices)

And I guess the only possibly confusing case would be "Skip first line of file" + "no header" + "skip first data line", which should mean "skip first 2 lines of file", basically.

@edgarklerks

I understand your point. Adding an option that signals that this is an Excel file and should be treated differently is also not very elegant; it gets tedious quickly if you have a lot of vendors to support.

@edgarklerks

Perhaps an option that signals that the CSV being read specifies its own separator? Something like hasSeparatorHeader?
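
To make the idea concrete, here is a rough sketch of what such detection could look like outside of Jackson, using only the JDK. The names `SepLine`, `consumeSeparatorLine` and `MAX_FIRST_LINE` are hypothetical, and the sketch assumes the declaration has exactly the form `sep=` followed by a single character.

```java
import java.io.BufferedReader;
import java.io.IOException;

final class SepLine {
    // Assumed upper bound on the first line's length; reset() may fail with an
    // IOException if the first line is longer than this.
    static final int MAX_FIRST_LINE = 8192;

    /**
     * Hypothetical helper: if the first line is "sep=X", consumes that line and
     * returns X. Otherwise rewinds the reader to the start and returns fallback.
     */
    static char consumeSeparatorLine(BufferedReader in, char fallback) throws IOException {
        in.mark(MAX_FIRST_LINE);
        String first = in.readLine();
        if (first != null && first.length() == 5 && first.startsWith("sep=")) {
            return first.charAt(4);
        }
        // Not a separator declaration: put the line back for the CSV parser.
        in.reset();
        return fallback;
    }
}
```

The detected character could then be applied to the schema, for example with `CsvSchema.withColumnSeparator(char)`, before parsing the rest of the input with useHeader enabled.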

@cowtowncoder
Member

@edgarklerks Not quite sure what hasSeparatorHeader would signify here. CSV headers do not really look any different from data rows, as far as I can see.

@cowtowncoder cowtowncoder added 2.12 and removed 2.11 labels Nov 22, 2020
@cowtowncoder cowtowncoder removed the 2.12 label Apr 22, 2023

3 participants