As a fallout of [PR 956|https://github.com//pull/956] we would like to understand how Avro behaves with case sensitive column names.
A couple of action items:
- Test with field names that differ only in case.
- AbstractRealtimeRecordReader is one of the classes where we are converting Avro Schema field names to lower case, to be able to verify them against column names from Hive. We can consider removing the lowercase conversion there if we verify it does not break anything.
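To make the first action item concrete, here is a minimal sketch of what can go wrong when two field names differ only in case and the lookup key is lower-cased. It uses plain strings in place of Avro fields, and the class and method names (`CaseCollisionDemo`, `groupByLowerCase`, `collidingKeys`) are hypothetical and not part of Hudi or Avro. It only illustrates the collision; it does not show how Avro itself handles such schemas, which is what the test should establish.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical demo class: plain strings stand in for Avro schema fields.
final class CaseCollisionDemo {

    // Groups the original field names under their lower-cased form.
    // Locale.ROOT avoids locale-specific case mapping; this is a choice made
    // for the sketch, not something taken from the Hudi code.
    static Map<String, List<String>> groupByLowerCase(List<String> fieldNames) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String name : fieldNames) {
            groups.computeIfAbsent(name.toLowerCase(Locale.ROOT), k -> new ArrayList<>()).add(name);
        }
        return groups;
    }

    // Returns the lower-cased keys that more than one original name maps to.
    // In a plain name-to-field map, each such key would keep only one field.
    static List<String> collidingKeys(List<String> fieldNames) {
        List<String> collisions = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : groupByLowerCase(fieldNames).entrySet()) {
            if (e.getValue().size() > 1) {
                collisions.add(e.getKey());
            }
        }
        return collisions;
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("userId", "userid", "ts");
        System.out.println(collidingKeys(fields)); // prints [userid]
    }
}
```

A check along these lines could be run against a writer schema before lower-casing, so that ambiguous names are reported instead of one field silently shadowing the other.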
JIRA info
Comments
23/May/20 22:07;shivnarayan;[~guoyihua]: this ticket is also related to case sensitivity. If you plan to take the other ticket, this should be on similar lines. ;;;
19/Oct/20 13:42;309637554;I do not think this should be changed, because Hive metastore column names are case insensitive. If we do not lowercase the Avro field names, the Avro schema will not match the Hive metastore schema. For example, the column list read from hive_metastoreConstants.META_TABLE_COLUMNS is case insensitive:
Map<String, Field> schemaFieldsMap = HoodieRealtimeRecordReaderUtils.getNameToFieldMap(writerSchema);
hiveSchema = constructHiveOrderedSchema(writerSchema, schemaFieldsMap);
// Get all column names of hive table
String hiveColumnString = jobConf.get(hive_metastoreConstants.META_TABLE_COLUMNS);
LOG.info("Hive Columns : " + hiveColumnString);
String[] hiveColumns = hiveColumnString.split(",");
List<Field> hiveSchemaFields = new ArrayList<>();
for (String columnName : hiveColumns) {
  // The lookup key is lower-cased so that it matches the Hive column name regardless of case
  Field field = schemaFieldsMap.get(columnName.toLowerCase());
  if (field != null) {
    hiveSchemaFields.add(new Schema.Field(field.name(), field.schema(), field.doc(), field.defaultVal()));
  } else {
    // Hive has some extra virtual columns like BLOCK__OFFSET__INSIDE__FILE which do not exist in table schema.
    // They will get skipped as they won't be found in the original schema.
    LOG.debug("Skipping Hive Column => " + columnName);
  }
};;;
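The point made in the comment above can be sketched in isolation. The example below is a simplified stand-in, not the Hudi implementation: plain strings replace Avro `Field` objects, the class and method names (`HiveOrderedLookupDemo`, `nameMap`, `hiveOrderedFields`) are hypothetical, and it assumes Hive reports column names in lower case while the Avro schema uses mixed case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical demo class: strings stand in for Avro fields and Hive columns.
final class HiveOrderedLookupDemo {

    // Maps a lookup key to the original Avro field name. With lowerCaseKeys
    // the key is the lower-cased name, otherwise the name is used as-is.
    static Map<String, String> nameMap(List<String> avroFieldNames, boolean lowerCaseKeys) {
        Map<String, String> map = new LinkedHashMap<>();
        for (String name : avroFieldNames) {
            map.put(lowerCaseKeys ? name.toLowerCase(Locale.ROOT) : name, name);
        }
        return map;
    }

    // Walks the comma-separated Hive column list in order and collects the
    // Avro field names that can be found; unmatched columns are skipped.
    static List<String> hiveOrderedFields(String hiveColumnString,
                                          Map<String, String> nameMap,
                                          boolean lowerCaseLookup) {
        List<String> matched = new ArrayList<>();
        for (String columnName : hiveColumnString.split(",")) {
            String key = lowerCaseLookup ? columnName.toLowerCase(Locale.ROOT) : columnName;
            String field = nameMap.get(key);
            if (field != null) {
                matched.add(field);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        List<String> avroFields = Arrays.asList("userId", "eventTs");
        // Assumed Hive column list: lower-cased names plus one virtual column.
        String hiveColumns = "userid,eventts,BLOCK__OFFSET__INSIDE__FILE";

        // Lower-casing on both sides: both fields match, the virtual column is skipped.
        System.out.println(hiveOrderedFields(hiveColumns, nameMap(avroFields, true), true));
        // prints [userId, eventTs]

        // No lower-casing on either side: nothing matches.
        System.out.println(hiveOrderedFields(hiveColumns, nameMap(avroFields, false), false));
        // prints []
    }
}
```

Under these assumptions, dropping the lower-casing without another case-insensitive matching step would leave the Hive-ordered schema empty for mixed-case Avro field names.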
19/Oct/20 13:45;309637554;[~uditme], [~vinoth] what do you think about this? :D;;;
19/Oct/20 23:58;vinoth;[~309637554] this task is about exploring all possibilities and making a call. IIUC you are making the case for retaining the lower casing. I think what you point out is why we lower cased this.
I can't decide for myself until we paint the full picture. :) ;;;