[Bug]: Accessing Hive using JdbcIO throws HivePreparedStatement.getMetaData, method not supported #26535

@dabaizzt

Description

What happened?

My issue:

I'm trying to access a Hive table using Beam's JdbcIO, but the following exception is thrown when connecting to Hive:

java.lang.RuntimeException: Error while determining columns from table: test_date_output
    at org.apache.beam.sdk.io.jdbc.JdbcIO$WriteVoid.getFilteredFields (JdbcIO.java:2117)
    at org.apache.beam.sdk.io.jdbc.JdbcIO$WriteVoid.expand (JdbcIO.java:2064)
    at org.apache.beam.sdk.io.jdbc.JdbcIO$Write.expand (JdbcIO.java:1698)
    at org.apache.beam.sdk.io.jdbc.JdbcIO$Write.expand (JdbcIO.java:1596)
    at org.apache.beam.sdk.Pipeline.applyInternal (Pipeline.java:548)
    at org.apache.beam.sdk.Pipeline.applyTransform (Pipeline.java:482)
    at org.apache.beam.sdk.values.PCollection.apply (PCollection.java:360)
    at com.bytedance.eprivacy.data_anonymization_pipeline.io.jdbc.JdbcWrite.WriteToJdbc (JdbcWrite.java:10)
    at com.bytedance.eprivacy.data_anonymization_pipeline.pipeline.PipelineFactory.setupOutput (PipelineFactory.java:73)
    at com.bytedance.eprivacy.data_anonymization_pipeline.pipeline.PipelineFactory.getPipeline (PipelineFactory.java:47)
    at com.bytedance.eprivacy.data_anonymization_pipeline.DataAnonymizationPipeline.main (DataAnonymizationPipeline.java:15)
    at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke (Method.java:498)
    at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:282)
    at java.lang.Thread.run (Thread.java:748)
Caused by: java.sql.SQLFeatureNotSupportedException: Method not supported
    at org.apache.hive.jdbc.HivePreparedStatement.getMetaData (HivePreparedStatement.java:201)
    at org.apache.commons.dbcp2.DelegatingPreparedStatement.getMetaData (DelegatingPreparedStatement.java:151)
    at org.apache.commons.dbcp2.DelegatingPreparedStatement.getMetaData (DelegatingPreparedStatement.java:151)
    at org.apache.beam.sdk.io.jdbc.JdbcIO$WriteVoid.getFilteredFields (JdbcIO.java:2114)
    at org.apache.beam.sdk.io.jdbc.JdbcIO$WriteVoid.expand (JdbcIO.java:2064)
    at org.apache.beam.sdk.io.jdbc.JdbcIO$Write.expand (JdbcIO.java:1698)
    at org.apache.beam.sdk.io.jdbc.JdbcIO$Write.expand (JdbcIO.java:1596)
    at org.apache.beam.sdk.Pipeline.applyInternal (Pipeline.java:548)
    at org.apache.beam.sdk.Pipeline.applyTransform (Pipeline.java:482)
    at org.apache.beam.sdk.values.PCollection.apply (PCollection.java:360)
    at com.bytedance.eprivacy.data_anonymization_pipeline.io.jdbc.JdbcWrite.WriteToJdbc (JdbcWrite.java:10)
    at com.bytedance.eprivacy.data_anonymization_pipeline.pipeline.PipelineFactory.setupOutput (PipelineFactory.java:73)
    at com.bytedance.eprivacy.data_anonymization_pipeline.pipeline.PipelineFactory.getPipeline (PipelineFactory.java:47)
    at com.bytedance.eprivacy.data_anonymization_pipeline.DataAnonymizationPipeline.main (DataAnonymizationPipeline.java:15)
    at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke (Method.java:498)
    at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:282)
    at java.lang.Thread.run (Thread.java:748)


The following code is trying to access Hive:

pipeline.apply(JdbcIO.<Row>readWithPartitions().withDataSourceConfiguration(
                    JdbcIO.DataSourceConfiguration.create("org.apache.hive.jdbc.HiveDriver", "jdbc:hive2://<ip>/mydb")
                            .withUsername("username").withPassword("password"))
                    .withTable("test_date_output").withPartitionColumn("id")
                    .withRowOutput());

Beam version:
<beam.version>2.43.0</beam.version>

Bug analysis:
It seems that the method getFilteredFields calls PreparedStatement.getMetaData() to determine the table's columns, but the Hive JDBC driver does not implement PreparedStatement.getMetaData() and throws SQLFeatureNotSupportedException instead.
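
The failure mode described above can be reproduced without a Hive server. The following is a minimal sketch, not Beam's code and not a proposed patch: `MetaDataProbe`, `tryGetMetaData`, and `unsupportedStatement` are hypothetical names, and the proxy only simulates a driver whose `getMetaData()` throws `SQLFeatureNotSupportedException`, as `HivePreparedStatement` does in the stack trace. It shows how a caller could detect the unsupported call instead of failing outright.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.sql.PreparedStatement;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.util.Optional;

public class MetaDataProbe {

  /**
   * Returns the statement's metadata, or an empty Optional when the driver
   * reports that PreparedStatement.getMetaData() is not supported.
   */
  public static Optional<ResultSetMetaData> tryGetMetaData(PreparedStatement ps)
      throws SQLException {
    try {
      // getMetaData() may legitimately return null, hence ofNullable.
      return Optional.ofNullable(ps.getMetaData());
    } catch (SQLFeatureNotSupportedException e) {
      // The driver does not implement the call; let the caller decide what to do.
      return Optional.empty();
    }
  }

  /**
   * Hypothetical stand-in for a driver statement: getMetaData() throws
   * SQLFeatureNotSupportedException, every other method is left unimplemented.
   */
  public static PreparedStatement unsupportedStatement() {
    InvocationHandler handler = (proxy, method, methodArgs) -> {
      if (method.getName().equals("getMetaData")) {
        throw new SQLFeatureNotSupportedException("Method not supported");
      }
      throw new UnsupportedOperationException(method.getName());
    };
    return (PreparedStatement) Proxy.newProxyInstance(
        MetaDataProbe.class.getClassLoader(),
        new Class<?>[] {PreparedStatement.class},
        handler);
  }

  public static void main(String[] args) throws SQLException {
    Optional<ResultSetMetaData> meta = tryGetMetaData(unsupportedStatement());
    System.out.println("metadata available: " + meta.isPresent());
  }
}
```

With a real connection, the empty result would be the point at which a caller falls back to another source of column information, for example a user-supplied column list; which fallback suits JdbcIO is left open here.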

Request:
Although accessing the Hive table through the Hive metastore works, we more urgently need to access it through HiveServer2. Thanks for checking; we look forward to your reply.

Issue Priority

Priority: 3 (minor)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
