@vihangk1 vihangk1 commented Feb 22, 2018

This version of the patch moves TypeInfo and its subclasses to standalone-metastore. The motivation is that the metastore needs TypeInfo-like classes to store metadata about types; in Hive this is implemented by the TypeInfo classes. The metastore needs this information because tables such as Avro tables can define their schema externally, either through a URL to a file containing the schema or through a string value of the schema added as a table property. In such cases the metastore needs to parse this information and convert it into FieldSchema objects. Before this patch, the String -> FieldSchema conversion was done by the SerDes, using the ObjectInspectors and the TypeInfos obtained from them. This patch bypasses most of that to remove the dependency on the SerDes, so that the conversion becomes String -> TypeInfo -> FieldSchema.
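
To make the String -> TypeInfo -> FieldSchema flow concrete, here is a minimal, self-contained sketch. Every class in it (`SketchTypeInfo`, `SketchTypeParser`, `SketchFieldSchema`, and so on) is a hypothetical stand-in written for illustration, not the real Hive or metastore class, and the parser only understands `array<...>`, `map<...,...>` and bare primitive names.

```java
import java.util.Locale;

// Hypothetical stand-ins for illustration only; these are NOT the real Hive classes.
abstract class SketchTypeInfo {
  abstract String getTypeName();
}

final class SketchPrimitiveTypeInfo extends SketchTypeInfo {
  private final String name;
  SketchPrimitiveTypeInfo(String name) { this.name = name; }
  @Override String getTypeName() { return name; }
}

final class SketchListTypeInfo extends SketchTypeInfo {
  private final SketchTypeInfo element;
  SketchListTypeInfo(SketchTypeInfo element) { this.element = element; }
  @Override String getTypeName() { return "array<" + element.getTypeName() + ">"; }
}

final class SketchMapTypeInfo extends SketchTypeInfo {
  private final SketchTypeInfo key;
  private final SketchTypeInfo value;
  SketchMapTypeInfo(SketchTypeInfo key, SketchTypeInfo value) { this.key = key; this.value = value; }
  @Override String getTypeName() { return "map<" + key.getTypeName() + "," + value.getTypeName() + ">"; }
}

// Minimal stand-in for a metastore column description: a name plus a type string.
final class SketchFieldSchema {
  final String name;
  final String type;
  SketchFieldSchema(String name, String type) { this.name = name; this.type = type; }
}

// Tiny recursive-descent parser: String -> SketchTypeInfo, with no SerDe or ObjectInspector involved.
final class SketchTypeParser {
  private final String s;
  private int pos = 0;

  private SketchTypeParser(String s) { this.s = s; }

  static SketchTypeInfo parse(String typeString) {
    // Normalize: drop spaces and lower-case the type string before parsing.
    SketchTypeParser p = new SketchTypeParser(typeString.replace(" ", "").toLowerCase(Locale.ROOT));
    SketchTypeInfo result = p.parseType();
    if (p.pos != p.s.length()) {
      throw new IllegalArgumentException("Unexpected trailing input at position " + p.pos);
    }
    return result;
  }

  private SketchTypeInfo parseType() {
    int start = pos;
    while (pos < s.length() && Character.isLetterOrDigit(s.charAt(pos))) {
      pos++;
    }
    String name = s.substring(start, pos);
    if (name.isEmpty()) {
      throw new IllegalArgumentException("Expected a type name at position " + start);
    }
    if (name.equals("array")) {
      expect('<');
      SketchTypeInfo element = parseType();
      expect('>');
      return new SketchListTypeInfo(element);
    }
    if (name.equals("map")) {
      expect('<');
      SketchTypeInfo key = parseType();
      expect(',');
      SketchTypeInfo value = parseType();
      expect('>');
      return new SketchMapTypeInfo(key, value);
    }
    return new SketchPrimitiveTypeInfo(name);
  }

  private void expect(char c) {
    if (pos >= s.length() || s.charAt(pos) != c) {
      throw new IllegalArgumentException("Expected '" + c + "' at position " + pos);
    }
    pos++;
  }

  // String -> TypeInfo -> FieldSchema in one step.
  static SketchFieldSchema toFieldSchema(String columnName, String typeString) {
    return new SketchFieldSchema(columnName, parse(typeString).getTypeName());
  }
}
```

For example, `SketchTypeParser.toFieldSchema("tags", "MAP<string, ARRAY<int>>")` yields a field whose type string is normalized to `map<string,array<int>>`. Parameterized primitives such as `decimal(10,2)`, structs and unions are deliberately left out of this sketch.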

In order to achieve this, and also to reduce duplicate code and get a cleaner design, this patch moves TypeInfo and its subclasses (ListTypeInfo, MapTypeInfo, StructTypeInfo, UnionTypeInfo) and TypeInfoParser to standalone-metastore. In the case of PrimitiveTypeInfo, Hive code has added a lot more than just type metadata. Specifically, PrimitiveTypeEntry and PrimitiveCategory are type implementation details that cannot be moved to standalone-metastore, not to mention that bringing in PrimitiveTypeEntry brings in a whole lot of dependent code with it. To work around this issue, a new class called MetastorePrimitiveTypeInfo is introduced in standalone-metastore. This class contains only the information the metastore needs from PrimitiveTypeInfo, and PrimitiveTypeInfo extends MetastorePrimitiveTypeInfo. This way we reduce the scope of the changes greatly. PrimitiveTypeInfo now contains the implementation details of Hive's primitive types. Moving TypeInfo to standalone-metastore also requires the Category enum, which unfortunately was defined in ObjectInspector. To get around this, ObjectInspector is moved to storage-api so that standalone-metastore can access the Category enum from TypeInfo.
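
The base-class/subclass split for primitive types can be pictured roughly as follows. This is only a sketch: the class names carry a `Sketch` suffix because the fields and methods shown are assumptions, not the actual members of MetastorePrimitiveTypeInfo or PrimitiveTypeInfo, and `PrimitiveCategorySketch` is a three-value stand-in for Hive's much larger PrimitiveCategory.

```java
import java.util.Locale;

// Metastore-side view (hypothetical shape): only the type metadata, no Hive implementation details.
class MetastorePrimitiveTypeInfoSketch {
  private final String typeName;

  MetastorePrimitiveTypeInfoSketch(String typeName) {
    this.typeName = typeName;
  }

  String getTypeName() {
    return typeName;
  }
}

// Stand-in for a Hive-only implementation detail that stays out of the metastore module.
enum PrimitiveCategorySketch { INT, STRING, BOOLEAN }

// Hive-side view (hypothetical shape): extends the metastore class and adds Hive-specific behavior.
class PrimitiveTypeInfoSketch extends MetastorePrimitiveTypeInfoSketch {
  PrimitiveTypeInfoSketch(String typeName) {
    super(typeName);
  }

  PrimitiveCategorySketch getPrimitiveCategory() {
    // Assumption for this sketch: the category is derived from the type name.
    return PrimitiveCategorySketch.valueOf(getTypeName().toUpperCase(Locale.ROOT));
  }
}

// Metastore-side code depends only on the base class, so it never touches the Hive-only enum.
final class MetastoreColumnPrinterSketch {
  static String describe(String column, MetastorePrimitiveTypeInfoSketch type) {
    return column + ":" + type.getTypeName();
  }
}
```

Because `PrimitiveTypeInfoSketch` is a subtype of the metastore-side class, existing Hive callers can keep passing their richer objects to metastore code, for example `MetastoreColumnPrinterSketch.describe("id", new PrimitiveTypeInfoSketch("int"))`, without the metastore module depending on Hive's primitive type machinery.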

Moving TypeInfoFactory was also very disruptive, so an interface called ITypeInfoFactory is created in the metastore, and both the metastore and Hive implement it. The Avro storage schema reader can now use the TypeInfoToSchema and SchemaToTypeInfo util classes (also moved to the metastore) through the ITypeInfoFactory interface.
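
As a rough illustration of the factory-interface idea, the sketch below shows a schema reader that is written once against an interface and works with either a metastore-side or a Hive-side factory. The interface name `TypeInfoFactorySketch`, its single method, and all other classes here are assumptions made for the example; the real ITypeInfoFactory may expose different methods. The Avro-to-Hive name mapping is also simplified to a few primitive types.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical metastore-side type description.
class BaseTypeInfoSketch {
  final String typeName;
  BaseTypeInfoSketch(String typeName) { this.typeName = typeName; }
}

// Hypothetical Hive-side subclass that would carry Hive-specific details.
class HiveTypeInfoSketch extends BaseTypeInfoSketch {
  HiveTypeInfoSketch(String typeName) { super(typeName); }
}

// Stand-in for the factory interface; the method shown is an assumption.
interface TypeInfoFactorySketch {
  BaseTypeInfoSketch getPrimitiveTypeInfo(String typeName);
}

// Metastore implementation: creates and caches plain metastore-side objects.
final class MetastoreFactorySketch implements TypeInfoFactorySketch {
  private final Map<String, BaseTypeInfoSketch> cache = new ConcurrentHashMap<>();

  @Override
  public BaseTypeInfoSketch getPrimitiveTypeInfo(String typeName) {
    return cache.computeIfAbsent(typeName, BaseTypeInfoSketch::new);
  }
}

// Hive implementation: same contract, but returns the richer Hive-side subclass.
final class HiveFactorySketch implements TypeInfoFactorySketch {
  private final Map<String, BaseTypeInfoSketch> cache = new ConcurrentHashMap<>();

  @Override
  public BaseTypeInfoSketch getPrimitiveTypeInfo(String typeName) {
    return cache.computeIfAbsent(typeName, HiveTypeInfoSketch::new);
  }
}

// Schema-reading code is written once against the interface and reused by both modules.
final class SchemaReaderSketch {
  private final TypeInfoFactorySketch factory;

  SchemaReaderSketch(TypeInfoFactorySketch factory) {
    this.factory = factory;
  }

  BaseTypeInfoSketch toTypeInfo(String avroPrimitive) {
    return factory.getPrimitiveTypeInfo(hiveTypeName(avroPrimitive));
  }

  // Simplified mapping: Avro "long" and "bytes" are renamed; int, string, boolean,
  // float and double keep the same name in Hive.
  private static String hiveTypeName(String avroPrimitive) {
    switch (avroPrimitive) {
      case "long":
        return "bigint";
      case "bytes":
        return "binary";
      default:
        return avroPrimitive;
    }
  }
}
```

With this shape, `new SchemaReaderSketch(new HiveFactorySketch()).toTypeInfo("long")` returns a Hive-side object named `bigint`, while the same reader constructed with `MetastoreFactorySketch` returns a plain metastore-side object, so the shared util code never needs to know which module it is running in.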

/**
* Category.
*
*/
Contributor

Won't this change break anyone with their own ObjectInspector? For example, a quick perusal of the Parquet code looks like it uses this quite frequently.

Contributor Author

Yeah, I just checked the Parquet code and I think you are right. I didn't know that Parquet (and now possibly other codebases outside Hive) would be using this class. In that case, I think we will have to copy it (or map it to a new Category enum in standalone-metastore). It might be a little hacky, but I can't think of a better way. I will give it a shot and update the patch today. Thanks for catching this!

Contributor Author

@vihangk1 vihangk1 Feb 28, 2018

Hi Alan, I updated the PR with the latest changes. Basically, instead of moving the Category enum out of ObjectInspector, I moved the ObjectInspector interface along with the Category enum to storage-api, which can be accessed from standalone-metastore. This way the change becomes even smaller and also maintains backwards compatibility.

@vihangk1 vihangk1 force-pushed the vihangk1_HIVE-17580v4 branch from ddc538a to b9002fa Compare February 28, 2018 06:38