This repository was archived by the owner on Jun 15, 2023. It is now read-only.

@iconara iconara commented May 13, 2020

The current documentation for the `case.insensitive` property is misleading or wrong. Athena does not require the data to have lower case keys, as the documentation implies. It also leaves out a very important part of how to use the property: without explicit mappings, the properties will not be found.

What happens is that Athena lower cases keys. Column names are always lower cased when you create the table through Athena (I'm not sure whether the same applies if you create it through Glue). This means that when `case.insensitive` is false, Athena looks for a lower case key, but the serde has preserved the casing of the keys, and you end up with NULL for every column where the underlying key has upper case characters.
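As a rough illustration of the lookup described above, here is a toy Python model. It is a sketch, not the real serde: `read_columns` and its parameters are hypothetical names, and the only behaviour it assumes is what this description states (column names are lower cased, and key casing is preserved when `case.insensitive` is false).

```python
def read_columns(row, columns, case_insensitive=True):
    """Toy model of the key lookup described above (not the real serde)."""
    if case_insensitive:
        # Default behaviour: keys are lower cased before the lookup.
        row = {key.lower(): value for key, value in row.items()}
    # Column names are always lower case, so the lookup key is too.
    # A missing key yields None, standing in for NULL.
    return {col.lower(): row.get(col.lower()) for col in columns}


row = {"URL": "https://example.com", "id": 1}
default_result = read_columns(row, ["url", "id"])
preserved_result = read_columns(row, ["url", "id"], case_insensitive=False)
```

In this model, `default_result` finds both values, while `preserved_result` has `None` for `url`, because the lower case column name never matches the preserved key `URL`.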

With the behaviour just described, the only reason to set this property is to get around duplicate key errors, and I've provided guidance for that in the documentation. By default, if you have the properties "URL" and "Url" you will get a duplicate key error (`HIVE_CURSOR_ERROR: Row is not a valid JSON Object - JSONException: Duplicate key "url"`), because both are lower cased to the same string. By setting the property to false and providing mappings, you can get around that problem.
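The duplicate key problem and the mapping workaround can be sketched the same way. This is again a toy model under stated assumptions: `lower_case_keys`, `read_with_mappings`, and the column names `url1` and `url2` are hypothetical, and the `mappings` dict only stands in for whatever explicit mappings the table defines.

```python
def lower_case_keys(row):
    """Lower case every key, failing when two keys collapse to one."""
    lowered_row = {}
    for key, value in row.items():
        lowered = key.lower()
        if lowered in lowered_row:
            # Mirrors the shape of the error quoted above.
            raise ValueError(f'Duplicate key "{lowered}"')
        lowered_row[lowered] = value
    return lowered_row


def read_with_mappings(row, mappings):
    """Look up each column by the exact JSON key it is mapped to.

    `mappings` is a hypothetical dict of column name -> JSON key.
    Key casing is preserved, as when case.insensitive is false.
    """
    return {column: row.get(json_key) for column, json_key in mappings.items()}


row = {"URL": "https://example.com/a", "Url": "https://example.com/b"}
mapped = read_with_mappings(row, {"url1": "URL", "url2": "Url"})
```

Calling `lower_case_keys(row)` on this row raises, because "URL" and "Url" both become "url". With casing preserved and one mapping per key, `mapped` holds both values under distinct column names.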

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@taammann

iconara,

Thanks for the clarification of that nuanced casing behavior and how to deal with it!
We'll get the change added in.

@taammann taammann merged commit 0662608 into awsdocs:master May 13, 2020