This repository has been archived by the owner on Jan 22, 2019. It is now read-only.

Sparsely populated POJO ( ignore columns ) #82

Closed
andlaz opened this issue Jun 12, 2015 · 11 comments

Comments

@andlaz

andlaz commented Jun 12, 2015

Hello-

I was wondering whether this is an unsupported use case (and, if so, whether you think implementing it would be feasible/sensible), or a bug?

  • given a csv record of 4 columns
  • given
@JsonPropertyOrder( value = { "dont_care", "the_one", "not_important", "irrelevant" } ) 

annotation on the POJO

  • given a single field on the POJO
@JsonProperty(value="the_one") private Integer theOne;

Parsing a CSV record of

"one", "2", "three", "four"

yields a type cast exception, because the value from the first column is being populated into "theOne".

Is @JsonPropertyOrder ignored? I tried annotating the field

@JsonProperty(value="the_one", index=1)

which looked redundant and made no difference.

N.B.: If I add four fields of compatible types (String, Integer, String, String), all is fine. I can even shuffle them around in @JsonPropertyOrder and see the field values affected.

@cowtowncoder
Member

Note that @JsonPropertyOrder only defines the ordering of properties that exist (from Jackson's perspective). So you do need to define a CsvSchema that maps all columns to names. These names do not necessarily need to match POJO properties (you may want to use @JsonIgnoreProperties(ignoreUnknown=true), or disable DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES).

Depending on how you construct the CsvSchema, the mechanism differs a bit.

But anyway, what you want is definitely doable, we just need to figure out what is the best way in your case.
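
A minimal sketch of the approach described above, assuming jackson-dataformat-csv 2.6 or later is on the classpath. The class names `SparsePojoExample` and `Row` are hypothetical; the column names are the ones from the original report. The schema names all four columns by hand, and the POJO drops the ones it does not declare:

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

// Hypothetical example class; not part of the library.
public class SparsePojoExample {

    // Only one of the four CSV columns is declared as a property;
    // the other three are unknown to the POJO and get ignored.
    @JsonIgnoreProperties(ignoreUnknown = true)
    public static class Row {
        @JsonProperty("the_one")
        private Integer theOne;

        public Integer getTheOne() {
            return theOne;
        }
    }

    // Parses a single headerless CSV record into a Row.
    public static Row parse(String csv) {
        // Every column in the document gets a logical name, in order.
        CsvSchema schema = CsvSchema.builder()
                .addColumn("dont_care")
                .addColumn("the_one")
                .addColumn("not_important")
                .addColumn("irrelevant")
                .build();
        CsvMapper mapper = new CsvMapper();
        try {
            return mapper.readerFor(Row.class).with(schema).readValue(csv);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that the sample record is written without spaces after the commas here; whether leading spaces before quoted values are tolerated depends on parser settings.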

@andlaz
Author

andlaz commented Jun 18, 2015

Thanks. So, if I understand this correctly, I need to define a CsvSchema no matter what? Is there no way to do this entirely via annotations on the POJO?

To clarify: if I don't want to declare all columns of the CSV as fields on the POJO (via @JsonProperty), even if I have the indexes of the ones I am interested in and am willing to ignore the rest, will I have to create a CsvSchema object with all columns declared in it?

@cowtowncoder
Member

You can generate a CsvSchema from the POJO definition, and if so, @JsonPropertyOrder is taken into account. But the problem here is that the POJO only defines a single column, and thus the generated schema also has only one property. Inclusion of "non-existing" properties in @JsonPropertyOrder is not an error, but such entries do not cause the addition of virtual properties.

So the question is how you could add these placeholders. Currently the only way would be to manually build a CsvSchema to indicate their existence in the CSV document, and then use @JsonIgnoreProperties (or one of the alternatives) to drop the bogus values (nulls) that would be seen.

Now... perhaps there is a feature request here, which would actually add "virtual" properties for entries defined in @JsonPropertyOrder, but for which no actual POJO property exists?

But in the short term, you could try first generating a CsvSchema from the POJO, then modifying it by inserting columns manually. I don't know if that is much less work, at least for this case.
Note, too, that defining/modifying a CsvSchema really just means adding a column-to-name mapping, to give a logical name to the value at a specified column (index). So while it is additional work, it is not particularly complicated.

@andlaz
Author

andlaz commented Jun 25, 2015

My use case is this: I have 100+ columns, and I'm interested in 3. I would prefer to be able to just say: "these are the 3 properties I'm interested in and these are their indexes. Ignore the rest". Preferably all via annotations :-) It seems to me there is no way to do this with annotations or by building a CsvSchema object. Is that correct?

Would you consider this too much of an edge case?

@andlaz
Author

andlaz commented Jun 25, 2015

I should clarify: no way to do this without specifying all 100+ fields.

@cowtowncoder
Member

Right, there is currently no way to specify a mapping for just a subset.

But since you do know the indexes, you could generate a CsvSchema by adding auto-generated meaningless names ("skip1", "skip2", ...) for the columns you do not need, and then allow skipping of unknown properties for databinding.
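
The placeholder idea above can be sketched without touching the library at all: the only real work is producing one name per column index. The helper below is hypothetical (class `PlaceholderColumns`, method `columnNames`, and the `skipN` naming are assumptions for illustration) and uses only the standard library:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper; not part of Jackson.
public class PlaceholderColumns {

    // Returns one name per column: the real property name where an index
    // is listed in 'wanted' (0-based), and a generated "skipN" placeholder
    // everywhere else.
    public static List<String> columnNames(int totalColumns, Map<Integer, String> wanted) {
        List<String> names = new ArrayList<>(totalColumns);
        for (int i = 0; i < totalColumns; i++) {
            String name = wanted.get(i);
            names.add(name != null ? name : "skip" + i);
        }
        return names;
    }
}
```

Each resulting name could then be passed, in order, to `CsvSchema.builder().addColumn(name)` before calling `build()`, with unknown properties ignored on the POJO side as discussed earlier in the thread.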

@andlaz
Author

andlaz commented Jul 17, 2015

OK, thanks. I'll see if I can formulate a feature request around this, maybe.

@andlaz andlaz closed this as completed Jul 17, 2015
@m-rossini

I do not think this is an edge case. It seems pretty reasonable, and I think it is the normal and most common use case. It probably does not get raised much because in most cases people are dropping only a few columns, which can easily be handled with the technique above; otherwise it is a real pain. Given that the schema builder adds columns by name, a setting to ignore CSV columns not found on the POJO should be easy to add.

@cowtowncoder
Member

@m-rossini I am not 100% sure what you are asking here, so it might be best to actually file a new request with exact details.
But... Jackson 2.7 added CsvParser.Feature.IGNORE_TRAILING_UNMAPPABLE, which allows ignoring columns beyond the ones the user has defined, and might do what you want.
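
A minimal sketch of that feature, assuming jackson-dataformat-csv 2.7 or later. The class names `TrailingColumnsExample` and `Row` are hypothetical. The schema names only the first two columns; anything after them in a record is dropped rather than reported as an error. This only helps when the unwanted columns come after the wanted ones:

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvParser;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

// Hypothetical example class; not part of the library.
public class TrailingColumnsExample {

    // "dont_care" is in the schema but not on the POJO, so it is ignored.
    @JsonIgnoreProperties(ignoreUnknown = true)
    public static class Row {
        @JsonProperty("the_one")
        private Integer theOne;

        public Integer getTheOne() {
            return theOne;
        }
    }

    // Parses a single headerless CSV record, ignoring trailing columns.
    public static Row parse(String csv) {
        CsvMapper mapper = new CsvMapper();
        // Columns past the end of the schema are skipped instead of failing.
        mapper.enable(CsvParser.Feature.IGNORE_TRAILING_UNMAPPABLE);
        CsvSchema schema = CsvSchema.builder()
                .addColumn("dont_care")
                .addColumn("the_one")
                .build();
        try {
            return mapper.readerFor(Row.class).with(schema).readValue(csv);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```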

@Stephan202

CsvParser.Feature.IGNORE_TRAILING_UNMAPPABLE works well for a subset of the cases described here.

I currently have a use case where I'd like to extract two out of seventeen columns; the columns generally have a stable order but this is not something I'd want to rely on. (There is a header, so reordering shouldn't be a problem.) And in the future additional irrelevant columns may be introduced.

So just to confirm: parsing a CSV file with an arbitrary number of extraneous columns, in arbitrary order, into a POJO which describes a subset of those columns is not currently supported?

@cowtowncoder
Member

@Stephan202 Correct. Arbitrary extraction is not supported with name mapping.
The only way would be to use 'untyped' mapping into List<String> and handle it manually.
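
The manual handling could look roughly like the sketch below. It assumes the rows have already been read untyped, one `List<String>` per record with the header record first (for example via Jackson's CsvParser.Feature.WRAP_AS_ARRAY); how the rows are obtained is left out. The class `ManualColumnPicker` and its `pick` method are hypothetical and use only the standard library:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; not part of Jackson.
public class ManualColumnPicker {

    // rows.get(0) is the header record. Returns one map per data record,
    // containing only the wanted columns, looked up by header name so that
    // column order and extra columns do not matter.
    public static List<Map<String, String>> pick(List<List<String>> rows, List<String> wanted) {
        List<Map<String, String>> result = new ArrayList<>();
        if (rows.isEmpty()) {
            return result;
        }
        List<String> header = rows.get(0);

        // Resolve each wanted name to its column index once.
        int[] indexes = new int[wanted.size()];
        for (int i = 0; i < wanted.size(); i++) {
            indexes[i] = header.indexOf(wanted.get(i));
            if (indexes[i] < 0) {
                throw new IllegalArgumentException("Missing column: " + wanted.get(i));
            }
        }

        for (List<String> row : rows.subList(1, rows.size())) {
            Map<String, String> picked = new LinkedHashMap<>();
            for (int i = 0; i < indexes.length; i++) {
                int idx = indexes[i];
                // Short records yield null for columns they do not have.
                picked.put(wanted.get(i), idx < row.size() ? row.get(idx) : null);
            }
            result.add(picked);
        }
        return result;
    }
}
```

The resulting maps could then be converted to the target POJO by hand or with `ObjectMapper.convertValue`.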


4 participants