This repository was archived by the owner on Jun 15, 2023. It is now read-only.
Update the documentation for the OpenX JSON SerDe case.insensitive property #46
The current documentation for the `case.insensitive` property is misleading, or simply wrong. It implies that Athena requires the data to have lower case keys, which it does not. It also leaves out a very important part of how to use the property: without explicit mappings, keys that contain upper case characters will not be found.

What actually happens is that Athena lower-cases keys. Column names are always lower-cased when you create the table through Athena (I'm not sure whether the same is true when you create it through Glue). This means that when `case.insensitive` is false, Athena looks for a lower case key, but the SerDe has preserved the original casing of the keys, so you end up with `NULL` for every column whose underlying key has upper case characters.

Given this behaviour, the only reason to set this property is to get around duplicate key errors, and I've added guidance for that to the documentation. By default, if you have the keys "URL" and "Url" you get a duplicate key error (`HIVE_CURSOR_ERROR: Row is not a valid JSON Object - JSONException: Duplicate key "url"`), because both are lower-cased to the same string. Setting the property to false and providing explicit mappings gets around that problem.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
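To make the behaviour described above concrete, here is a simplified, hypothetical Python model of it. It is not the OpenX SerDe's actual implementation: `read_row`, `DuplicateKeyError` and the `mappings` dictionary are illustrative names, `None` stands in for SQL `NULL`, and `mappings` plays the role of the explicit column-to-key mappings. It assumes, as described in this PR, that with `case.insensitive` true every key is lower-cased while parsing (so colliding keys are rejected), and that column names always arrive lower-cased.

```python
import json


class DuplicateKeyError(ValueError):
    """Hypothetical stand-in for the SerDe's 'Duplicate key' JSONException."""


def _lowercase_keys(pairs):
    # Models case.insensitive=true: every key is lower-cased while parsing,
    # and two keys that collide (e.g. "URL" and "Url") are rejected.
    result = {}
    for key, value in pairs:
        lowered = key.lower()
        if lowered in result:
            raise DuplicateKeyError(f'Duplicate key "{lowered}"')
        result[lowered] = value
    return result


def read_row(line, column_names, case_insensitive=True, mappings=None):
    """Hypothetical model: return one value per column, None standing in for NULL."""
    mappings = mappings or {}
    if case_insensitive:
        obj = json.loads(line, object_pairs_hook=_lowercase_keys)
    else:
        obj = json.loads(line)  # key casing is preserved as-is
    row = []
    for column in column_names:  # column names are always lower case
        # An explicit mapping points a column at the original JSON key.
        key = mappings.get(column, column)
        if case_insensitive:
            key = key.lower()
        row.append(obj.get(key))
    return row


if __name__ == "__main__":
    mixed = '{"Id": 7, "URL": "https://a.example"}'
    # Default: keys are lower-cased, so the lower-case columns match.
    print(read_row(mixed, ["id", "url"]))  # [7, 'https://a.example']
    # case.insensitive=false without mappings: every mixed-case key is NULL.
    print(read_row(mixed, ["id", "url"], case_insensitive=False))  # [None, None]

    dupes = '{"URL": "https://a.example", "Url": "https://b.example"}'
    try:
        read_row(dupes, ["url"])
    except DuplicateKeyError as err:
        print(err)  # Duplicate key "url"
    # case.insensitive=false plus explicit mappings avoids the collision.
    print(read_row(dupes, ["url1", "url2"], case_insensitive=False,
                   mappings={"url1": "URL", "url2": "Url"}))
    # ['https://a.example', 'https://b.example']
```

In this model, turning `case.insensitive` off on its own only trades the duplicate key error for `NULL` columns; it is the explicit mappings that recover the values.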