HIVE-24436: Fix Avro NULL_DEFAULT_VALUE compatibility issue #1715

Closed
wants to merge 2 commits into from

@wangyum wangyum commented Nov 27, 2020

What changes were proposed in this pull request?

This PR replaces null with JsonProperties.NULL_VALUE to fix two compatibility issues:

  1. java.lang.NoSuchMethodError: 'void org.apache.avro.Schema$Field.<init>(java.lang.String, org.apache.avro.Schema, java.lang.String, org.codehaus.jackson.JsonNode)'
    - create hive serde table with Catalog
    *** RUN ABORTED ***
      java.lang.NoSuchMethodError: 'void org.apache.avro.Schema$Field.<init>(java.lang.String, org.apache.avro.Schema, 
    java.lang.String, org.codehaus.jackson.JsonNode)'
      at org.apache.hadoop.hive.serde2.avro.TypeInfoToSchema.createAvroField(TypeInfoToSchema.java:76)
      at org.apache.hadoop.hive.serde2.avro.TypeInfoToSchema.convert(TypeInfoToSchema.java:61)
      at org.apache.hadoop.hive.serde2.avro.AvroSerDe.getSchemaFromCols(AvroSerDe.java:170)
      at org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:114)
      at org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:83)
      at org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:533)
      at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:450)
      at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:437)
      at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:281)
      at org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:263)
    
  2. org.apache.avro.AvroRuntimeException: Unknown datum class: class org.codehaus.jackson.node.NullNode
    - alter hive serde table add columns -- partitioned - AVRO *** FAILED ***
      org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: 
    org.apache.avro.AvroRuntimeException: Unknown datum class: class org.codehaus.jackson.node.NullNode;
      at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:112)
      at org.apache.spark.sql.hive.HiveExternalCatalog.createTable(HiveExternalCatalog.scala:245)
      at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.createTable(ExternalCatalogWithListener.scala:94)
      at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createTable(SessionCatalog.scala:346)
      at org.apache.spark.sql.execution.command.CreateTableCommand.run(tables.scala:166)
      at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
      at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
      at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
      at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:228)
      at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3680)
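
The first failure is a linkage problem: compiled bytecode refers to a constructor by its exact parameter types, so a call compiled against a JsonNode-typed overload cannot bind to an Object-typed one at run time. The sketch below imitates that lookup with reflection. It is only an analogy, not Avro or Hive code: `LegacyJsonNode`, `OldField`, and `NewField` are hypothetical stand-ins, and reflection reports the miss as a `NoSuchMethodException` where the JVM linker would throw `NoSuchMethodError`.

```java
public class LinkageSketch {
    /** Hypothetical stand-in for org.codehaus.jackson.JsonNode. */
    static class LegacyJsonNode {}

    /** Hypothetical stand-in for an older Schema.Field that still has a JsonNode overload. */
    static class OldField {
        OldField(String name, LegacyJsonNode defaultValue) {}
        OldField(String name, Object defaultValue) {}
    }

    /** Hypothetical stand-in for a newer Schema.Field with only the Object overload. */
    static class NewField {
        NewField(String name, Object defaultValue) {}
    }

    /** Returns true only if a constructor with exactly these parameter types is declared. */
    static boolean hasConstructor(Class<?> owner, Class<?>... parameterTypes) {
        try {
            owner.getDeclaredConstructor(parameterTypes);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A caller compiled against the JsonNode overload looks for this exact signature.
        System.out.println("old, JsonNode overload: "
            + hasConstructor(OldField.class, String.class, LegacyJsonNode.class));
        System.out.println("new, JsonNode overload: "
            + hasConstructor(NewField.class, String.class, LegacyJsonNode.class));
        // The Object overload is present in both stand-ins, so it is the portable target.
        System.out.println("old, Object overload: "
            + hasConstructor(OldField.class, String.class, Object.class));
        System.out.println("new, Object overload: "
            + hasConstructor(NewField.class, String.class, Object.class));
    }
}
```

Under these assumptions, only a call site that targets the Object-typed signature resolves against both stand-ins, which is the shape of the fix proposed here.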
    

Why are the changes needed?

For compatibility with Avro 1.9.x and Avro 1.10.0.
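
The second failure above can be pictured as a type check on the default value: once the default is passed as a plain Object, the library has to recognize its class, and a Jackson NullNode is not one of the classes it knows. The sketch below is a simplified illustration under that assumption, not Avro's actual conversion code. `NULL_VALUE`, `LegacyNullNode`, `toJson`, and `rejects` are hypothetical names, and string escaping is deliberately omitted.

```java
public class DatumSketch {
    /** Hypothetical stand-in for JsonProperties.NULL_VALUE: a sentinel meaning "the default is null". */
    static final Object NULL_VALUE = new Object();

    /** Hypothetical stand-in for org.codehaus.jackson.node.NullNode. */
    static class LegacyNullNode {}

    /** Converts a default-value datum to JSON text, rejecting classes it does not recognize. */
    static String toJson(Object datum) {
        if (datum == null) {
            throw new IllegalArgumentException("No default value supplied");
        }
        if (datum == NULL_VALUE) {
            return "null";
        }
        if (datum instanceof String) {
            return "\"" + datum + "\""; // escaping omitted for brevity
        }
        if (datum instanceof Number || datum instanceof Boolean) {
            return datum.toString();
        }
        throw new IllegalArgumentException("Unknown datum class: " + datum.getClass());
    }

    /** Returns true if toJson refuses the datum. */
    static boolean rejects(Object datum) {
        try {
            toJson(datum);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(toJson(NULL_VALUE));
        System.out.println(rejects(new LegacyNullNode()));
    }
}
```

In this picture, passing the library's own sentinel instead of a Jackson node is what lets the same call work regardless of which JSON implementation the library uses internally.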

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Build and run Spark test:

mvn -Dtest=none -DwildcardSuites=org.apache.spark.sql.hive.execution.HiveDDLSuite test -pl sql/hive

if (schemaField.schema().getType() == Schema.Type.RECORD) {
  for (Schema.Field field : schemaField.schema().getFields()) {
    fields.add(new Schema.Field(field.name(), field.schema(), field.doc(), nullDefault));
  }
} else {
  fields.add(new Schema.Field(schemaField.name(), schemaField.schema(), schemaField.doc(),
      nullDefault));
wangyum commented Nov 27, 2020

@sunchao sunchao left a comment

LGTM. Thanks @wangyum !

Comment on lines +77 to 78
return new Schema.Field(name, createAvroSchema(typeInfo), comment, JsonProperties.NULL_VALUE);
}
@dongjoon-hyun dongjoon-hyun left a comment

Thank you for the fix, @wangyum and all. The patch looks reasonable and safe to me too.

BTW, are all 25 failures unrelated to this change?

Existing failures - 25

sunchao commented Nov 30, 2020

Yes @dongjoon-hyun , these test failures have been there since 2.3.7 release. I do plan to take a look at them later.

@wangyum I believe the issue exists in the master branch as well? If so, can we open this PR against master and backport it to branch-2.3/branch-3.1 once it is merged?

wangyum commented Dec 1, 2020

This is for master branch: #1722

@sunchao sunchao closed this Dec 1, 2020
sunchao commented Dec 1, 2020

Closing this one since #1722 is merged and backported to branch-2.3

@wangyum wangyum deleted the HIVE-24436 branch December 2, 2020 00:17