Support parsing CSV with header regardless of unknown columns #286

Open
bjmi opened this issue Aug 26, 2021 · 4 comments

@bjmi

bjmi commented Aug 26, 2021

When reading the following CSV with jackson-dataformat-csv 2.11.4

name,weight,age
Roger,69,27
Chris,89,53

using the following snippet

CsvMapper csvMapper = new CsvMapper();
CsvSchema csvSchema = CsvSchema.builder().setUseHeader(true)
        .addColumn("name").addColumn("age").build();
List<Person> persons = csvMapper
        .readerFor(Person.class)
        .with(csvSchema)
        .<Person> readValues(csv)
        .readAll();
...
class Person {
    public String name;
    public int age;
}

a CsvMappingException is thrown (Too many entries: expected at most 2) because the column weight is not known to the CsvSchema.
Setting csvMapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false); still leads to the same CsvMappingException.
Please therefore introduce a new CsvParser feature, e.g. IGNORE_UNKNOWN_COLUMNS (disabled by default), that allows reading CSV regardless of unknown columns.
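
To make the requested semantics concrete, here is a minimal standard-library sketch of reading by header name while skipping unknown columns. It is not jackson-dataformat-csv code: the class HeaderProjection and its project method are hypothetical names invented for illustration, and the parsing assumes simple CSV without quoting or escapes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of jackson-dataformat-csv.
final class HeaderProjection {

    /**
     * Keeps only the wanted columns of a simple CSV document (no quoting, no escapes),
     * locating each one by its header name so that unknown columns are skipped.
     */
    static List<Map<String, String>> project(String csv, List<String> wanted) {
        String[] lines = csv.trim().split("\\R");
        List<String> header = Arrays.asList(lines[0].split(",", -1));

        // Resolve each wanted column to its position in the header line.
        int[] positions = new int[wanted.size()];
        for (int i = 0; i < wanted.size(); i++) {
            positions[i] = header.indexOf(wanted.get(i));
            if (positions[i] < 0) {
                throw new IllegalArgumentException("Missing column: " + wanted.get(i));
            }
        }

        List<Map<String, String>> rows = new ArrayList<>();
        for (int r = 1; r < lines.length; r++) {
            String[] cells = lines[r].split(",", -1);
            if (cells.length != header.size()) {
                throw new IllegalArgumentException("Row " + r + " does not match the header width");
            }
            // Copy only the wanted cells; everything else in the row is ignored.
            Map<String, String> row = new LinkedHashMap<>();
            for (int i = 0; i < positions.length; i++) {
                row.put(wanted.get(i), cells[positions[i]]);
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        String csv = "name,weight,age\nRoger,69,27\nChris,89,53\n";
        System.out.println(project(csv, Arrays.asList("name", "age")));
    }
}
```

With the CSV from this issue, each resulting row carries only name and age, and the weight values are dropped instead of shifting into the age position.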

@kpankowski

Reorder the columns:

CsvSchema csvSchema = CsvSchema.builder().setUseHeader(true).setReorderColumns(true).addColumn("name").addColumn("age").build();

Or skip adding columns explicitly when using setUseHeader(true):

CsvSchema csvSchema = CsvSchema.builder().setUseHeader(true).build();

@bjmi
Author

bjmi commented Sep 16, 2021

> Reorder the columns:
>
> CsvSchema csvSchema = CsvSchema.builder().setUseHeader(true).setReorderColumns(true).addColumn("name").addColumn("age").build();

But the use case expects the columns name and age in the given order and should fail otherwise.
At the moment, explicitly declaring header columns and the column-reordering feature are mutually exclusive because of this check:

if (_schema.size() > 0 && !_schema.reordersColumns()) {

This can be considered a bug.

> or skip adding columns explicitly when using setUseHeader(true)
>
> CsvSchema csvSchema = CsvSchema.builder().setUseHeader(true).build();

But then the FAIL_ON_MISSING_COLUMNS feature can no longer be used, and name and age are no longer required columns.
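
As a stopgap, the requirement described here (required columns present and in the declared order, extra columns tolerated) could be checked against the header line before handing the data to the mapper. The following is only a sketch under assumptions: HeaderOrderCheck and matches are hypothetical names, not jackson-dataformat-csv API, and "in the given order" is interpreted as relative order among the required columns.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical pre-check for illustration only; not part of jackson-dataformat-csv.
final class HeaderOrderCheck {

    /**
     * Returns true when every required column occurs in the header and the
     * required columns keep their declared relative order. Extra (unknown)
     * columns anywhere in the header are allowed.
     */
    static boolean matches(List<String> header, List<String> required) {
        int next = 0; // index of the next required column we are still waiting for
        for (String column : header) {
            if (next < required.size() && column.equals(required.get(next))) {
                next++;
            }
        }
        return next == required.size();
    }

    public static void main(String[] args) {
        List<String> required = Arrays.asList("name", "age");
        System.out.println(matches(Arrays.asList("name", "weight", "age"), required)); // true
        System.out.println(matches(Arrays.asList("age", "weight", "name"), required)); // false
        System.out.println(matches(Arrays.asList("name", "weight"), required));        // false
    }
}
```

A header that passes this check could then be read with the header-driven schema, while a failing header is rejected up front, which approximates required columns without relying on FAIL_ON_MISSING_COLUMNS.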

@ZijiePan1996

ZijiePan1996 commented Apr 12, 2023

The same issue occurs with jackson-dataformat-csv 2.13.4 when trying to parse a CSV file (>100 columns) into a Java entity (10 attributes). I tried using

ObjectReader csvReader = csvMapper.disable(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES).readerFor(BlackList.class).with(csvSchema);

However, I found that the values in the unknown columns are parsed into the next column, which corrupted the data in the DB. As @bjmi mentioned, a feature such as IGNORE_UNKNOWN_COLUMNS would likely solve my problem.

> a CsvMappingException is thrown (Too many entries: expected at most 2) because the column weight is not known to CsvSchema.
> csvMapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false); still leads to the same CsvMappingException.
> Thus please introduce a new CsvParser feature e.g. IGNORE_UNKNOWN_COLUMNS (disabled by default) that allows reading CSV regardless of unknown columns.

@redvasily

I can get it to work if, when reading, I use a schema with .withHeader() and .withColumnReordering().

FAIL_ON_UNKNOWN_PROPERTIES is disabled for me, but I didn't test whether that is necessary.

So in the end I am using two different schemas: one without column reordering for writing and one with column reordering for reading.
