Support parsing CSV with header regardless of unknown columns #286

bjmi · 2021-08-26T13:38:20Z

When reading the following CSV with jackson-dataformat-csv 2.11.4:

name,weight,age
Roger,69,27
Chris,89,53

using the following snippet:

CsvMapper csvMapper = new CsvMapper();
CsvSchema csvSchema = CsvSchema.builder().setUseHeader(true)
        .addColumn("name").addColumn("age").build();
List<Person> persons = csvMapper
        .readerFor(Person.class)
        .with(csvSchema)
        .<Person> readValues(csv)
        .readAll();
...
class Person {
    public String name;
    public int age;
}

A CsvMappingException is thrown ("Too many entries: expected at most 2") because the column weight is not known to the CsvSchema.
Calling csvMapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false) still leads to the same CsvMappingException.
Therefore, please introduce a new CsvParser feature, e.g. IGNORE_UNKNOWN_COLUMNS (disabled by default), that allows reading CSV regardless of unknown columns.
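To make the requested behaviour concrete, here is a minimal stdlib sketch (not Jackson code) of what an IGNORE_UNKNOWN_COLUMNS-style read could do: each declared column is located by its name in the header line, and undeclared columns such as weight are skipped. The class IgnoreUnknownColumns and its read method are hypothetical, and CSV splitting is deliberately naive (no quoting or escaping).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the proposed IGNORE_UNKNOWN_COLUMNS semantics.
// Not Jackson's implementation: plain comma splitting, no quote handling.
class IgnoreUnknownColumns {

    // Maps each data row to {declared column -> raw value}, locating every
    // declared column by its header name and skipping undeclared columns.
    static List<Map<String, String>> read(String csv, List<String> declared) {
        String[] lines = csv.strip().split("\\R");
        String[] header = lines[0].split(",", -1);
        List<Map<String, String>> rows = new ArrayList<>();
        for (int i = 1; i < lines.length; i++) {
            String[] cells = lines[i].split(",", -1);
            Map<String, String> row = new LinkedHashMap<>();
            for (int col = 0; col < header.length && col < cells.length; col++) {
                if (declared.contains(header[col])) { // unknown columns are ignored
                    row.put(header[col], cells[col]);
                }
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        String csv = "name,weight,age\nRoger,69,27\nChris,89,53\n";
        System.out.println(read(csv, List.of("name", "age")));
        // [{name=Roger, age=27}, {name=Chris, age=53}]
    }
}
```

The weight values never reach the result, while age is still taken from the third column because binding is by header name rather than by position.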


kpankowski · 2021-09-15T20:26:45Z

Reorder the columns:

CsvSchema csvSchema = CsvSchema.builder()
        .setUseHeader(true)
        .setReorderColumns(true)
        .addColumn("name")
        .addColumn("age")
        .build();

Or skip adding columns explicitly when using setUseHeader(true):

CsvSchema csvSchema = CsvSchema.builder().setUseHeader(true).build();

bjmi · 2021-09-16T08:46:22Z

> Reorder the columns:
>
> CsvSchema csvSchema = CsvSchema.builder().setUseHeader(true).setReorderColumns(true).addColumn("name").addColumn("age").build();

But the use case expects the columns name and age in the given order, and parsing should fail otherwise.
At the moment, explicitly declaring header columns and the column-reordering feature are mutually exclusive because of this check:

jackson-dataformats-text/csv/src/main/java/com/fasterxml/jackson/dataformat/csv/CsvParser.java, line 787 (commit 8107723):

if (_schema.size() > 0 && !_schema.reordersColumns()) {

This can be considered a bug.

> or skip adding columns explicitly when using setUseHeader(true)
> CsvSchema csvSchema = CsvSchema.builder().setUseHeader(true).build();

But then the FAIL_ON_MISSING_COLUMNS feature can no longer be used, and name and age are no longer required columns.
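The combination asked for here (ignore unknown columns, but still require the declared columns and their relative order) can be sketched as a header check in plain Java. This is an illustrative assumption of how such a check might work, not Jackson's code; HeaderCheck and resolve are hypothetical names, and headers are split naively on commas.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: resolve declared columns against a header line.
// Every declared column must be present (cf. FAIL_ON_MISSING_COLUMNS) and the
// declared columns must appear in declared order; unknown columns may sit
// anywhere in between and are simply not referenced.
class HeaderCheck {

    // Returns, for each declared column, its index in the header, or throws.
    static int[] resolve(String headerLine, List<String> declared) {
        List<String> header = Arrays.asList(headerLine.split(",", -1));
        int[] positions = new int[declared.size()];
        int previous = -1;
        for (int i = 0; i < declared.size(); i++) {
            int pos = header.indexOf(declared.get(i));
            if (pos < 0) {
                throw new IllegalArgumentException("Missing column: " + declared.get(i));
            }
            if (pos < previous) {
                throw new IllegalArgumentException("Column out of order: " + declared.get(i));
            }
            positions[i] = pos;
            previous = pos;
        }
        return positions;
    }

    public static void main(String[] args) {
        List<String> declared = List.of("name", "age");
        System.out.println(Arrays.toString(resolve("name,weight,age", declared))); // [0, 2]
        try {
            resolve("age,name", declared);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Column out of order: age
        }
    }
}
```

The returned positions could then drive per-row binding, so weight is skipped while a missing or reordered name/age column still fails.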

ZijiePan1996 · 2023-04-12T03:04:19Z

I encountered the same issue with jackson-dataformat-csv 2.13.4 when trying to parse a CSV file (>100 columns) into a Java entity (10 attributes). I tried:

ObjectReader csvReader = csvMapper
        .disable(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES)
        .readerFor(BlackList.class)
        .with(csvSchema);

But I found that the values in the unknown columns were shifted into the next column, which messed up the data in the DB. The feature @bjmi proposed (IGNORE_UNKNOWN_COLUMNS) would likely solve my problem:

> a CsvMappingException is thrown (Too many entries: expected at most 2) because the column weight is not known to CsvSchema.
> csvMapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false); still leads to the same CsvMappingException.
> Thus please introduce a new CsvParser feature e.g. IGNORE_UNKNOWN_COLUMNS (disabled by default) that allows reading CSV regardless of unknown columns.
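The misalignment described in the comment above can be illustrated with a small stdlib sketch contrasting binding by position (roughly what happens when declared columns are matched to cells in order, without reordering) with binding by header name. This is an illustration under that assumption, not Jackson's implementation; PositionalVsHeader and its methods are hypothetical, and splitting is naive.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: why positional binding shifts values when the file has
// columns the schema does not declare. Plain comma splitting; not Jackson code.
class PositionalVsHeader {

    // The i-th declared column takes the i-th cell, whatever the header says.
    static Map<String, String> byPosition(String row, List<String> declared) {
        String[] cells = row.split(",", -1);
        Map<String, String> out = new LinkedHashMap<>();
        for (int i = 0; i < declared.size() && i < cells.length; i++) {
            out.put(declared.get(i), cells[i]);
        }
        return out;
    }

    // Each declared column takes the cell under its own header name.
    static Map<String, String> byHeader(String header, String row, List<String> declared) {
        List<String> names = List.of(header.split(",", -1));
        String[] cells = row.split(",", -1);
        Map<String, String> out = new LinkedHashMap<>();
        for (String column : declared) {
            int pos = names.indexOf(column);
            if (pos >= 0 && pos < cells.length) {
                out.put(column, cells[pos]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> declared = List.of("name", "age");
        System.out.println(byPosition("Roger,69,27", declared));                  // {name=Roger, age=69}
        System.out.println(byHeader("name,weight,age", "Roger,69,27", declared)); // {name=Roger, age=27}
    }
}
```

With positional binding, age receives Roger's weight (69); binding by header name keeps the correct value (27).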

redvasily · 2023-06-08T15:38:58Z

I can get it to work if, when reading, I use a schema with .withHeader() and .withColumnReordering(true).

FAIL_ON_UNKNOWN_PROPERTIES is disabled for me, but I didn't test whether that is necessary.

So in the end I use two different schemas: one without column reordering for writing, and one with column reordering for reading.

bjmi mentioned this issue on Jan 7, 2022 in:

Missing columns from header line (compare to CsvSchema) not detected when reordering columns (add CsvParser.Feature.FAIL_ON_MISSING_HEADER_COLUMNS) #285 (closed)
