[SPARK-17920][SPARK-19580][SPARK-19878][SQL] Backport PR 19779 to branch-2.2 - Support writing to Hive table which uses Avro schema url 'avro.schema.url' #19795

Closed

@vinodkc
Contributor

vinodkc commented Nov 22, 2017

What changes were proposed in this pull request?

Backport #19779 to branch-2.2

SPARK-19580 Support for avro.schema.url while writing to hive table
SPARK-19878 Add hive configuration when initialize hive serde in InsertIntoHiveTable.scala
SPARK-17920 HiveWriterContainer passes null configuration to serde.initialize, causing NullPointerException in AvroSerde when using avro.schema.url

Support writing to a Hive table that uses an Avro schema URL ('avro.schema.url').
For example:
create external table avro_in (a string) stored as avro location '/avro-in/' tblproperties ('avro.schema.url'='/avro-schema/avro.avsc');

create external table avro_out (a string) stored as avro location '/avro-out/' tblproperties ('avro.schema.url'='/avro-schema/avro.avsc');

insert overwrite table avro_out select * from avro_in; -- fails with java.lang.NullPointerException

WARN AvroSerDe: Encountered exception determining schema. Returning signal schema to indicate problem
java.lang.NullPointerException
at org.apache.hadoop.fs.FileSystem.getDefaultUri(FileSystem.java:182)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:174)

Changes proposed in this fix

Currently, a null value is passed to the serializer as its configuration, which causes an NPE during the insert operation. This fix passes the Hadoop configuration object instead.
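
As a rough illustration of why a null configuration fails, the sketch below models the call with stand-in classes. `SerdeConfSketch`, `Conf`, `initialize`, and the `hdfs://namenode:8020` address are hypothetical names invented for this example; they are not the Hadoop, Hive, or Spark APIs, and the real serde does considerably more. Only the `avro.schema.url` table property and the idea of passing a configuration object instead of null come from this PR.

```java
import java.util.Properties;

public class SerdeConfSketch {

    // Hypothetical stand-in for a Hadoop-style configuration object.
    static final class Conf {
        private final Properties values = new Properties();

        Conf set(String key, String value) {
            values.setProperty(key, value);
            return this;
        }

        String get(String key, String fallback) {
            return values.getProperty(key, fallback);
        }
    }

    // Hypothetical stand-in for a serde's initialize step: it needs the
    // configuration to work out which filesystem the schema URL lives on.
    static String initialize(Conf conf, Properties tableProps) {
        String schemaUrl = tableProps.getProperty("avro.schema.url");
        // conf is dereferenced here, so passing null throws NullPointerException.
        String defaultFs = conf.get("fs.defaultFS", "file://");
        return schemaUrl.contains("://") ? schemaUrl : defaultFs + schemaUrl;
    }

    public static void main(String[] args) {
        Properties tableProps = new Properties();
        tableProps.setProperty("avro.schema.url", "/avro-schema/avro.avsc");

        // Before the fix (modelled): null is passed and the call fails.
        try {
            initialize(null, tableProps);
        } catch (NullPointerException e) {
            System.out.println("null configuration: NullPointerException");
        }

        // After the fix (modelled): a configuration object is passed.
        // The namenode address is an assumed value for illustration only.
        Conf conf = new Conf().set("fs.defaultFS", "hdfs://namenode:8020");
        System.out.println(initialize(conf, tableProps));
    }
}
```

The point of the sketch is only that the schema location cannot be resolved without a configuration to consult, which matches the `FileSystem.getDefaultUri` frame at the top of the stack trace above.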

How was this patch tested?

Added a new test case in VersionsSuite.

@vinodkc

Contributor Author

vinodkc commented Nov 22, 2017

@cloud-fan

Contributor

cloud-fan commented Nov 22, 2017

LGTM

@SparkQA

SparkQA commented Nov 22, 2017

Test build #84107 has finished for PR 19795 at commit 63e40e8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
asfgit pushed a commit that referenced this pull request Nov 22, 2017
…nch-2.2 - Support writing to Hive table which uses Avro schema url 'avro.schema.url'

Author: vinodkc <vinod.kc.in@gmail.com>

Closes #19795 from vinodkc/br_Fix_SPARK-17920_branch-2.2.
@gatorsmile

Member

gatorsmile commented Nov 22, 2017

Thanks! Merged to 2.2

Could you please close this?

@vinodkc

Contributor Author

vinodkc commented Nov 23, 2017

Thank you

@vinodkc vinodkc closed this Nov 23, 2017
asfgit pushed a commit that referenced this pull request Nov 24, 2017
## What changes were proposed in this pull request?

A follow-up to #19795 that simplifies the file creation.

## How was this patch tested?

Only the test case is updated.

Author: vinodkc <vinod.kc.in@gmail.com>

Closes #19809 from vinodkc/br_FollowupSPARK-17920_branch-2.2.
MatthewRBruce added a commit to Shopify/spark that referenced this pull request Jul 31, 2018
…nch-2.2 - Support writing to Hive table which uses Avro schema url 'avro.schema.url'

Author: vinodkc <vinod.kc.in@gmail.com>

Closes apache#19795 from vinodkc/br_Fix_SPARK-17920_branch-2.2.
MatthewRBruce added a commit to Shopify/spark that referenced this pull request Jul 31, 2018
Closes apache#19809 from vinodkc/br_FollowupSPARK-17920_branch-2.2.