[SUPPORT] query hudi table with Spark SQL on Hive return empty result #6659

@guanlisheng

Description

Tips before filing an issue

  • Have you gone through our FAQs?
    Yes

  • Join the mailing list to engage in conversations and get faster support at dev-subscribe@hudi.apache.org.

  • If you have triaged this as a bug, then file an issue directly.

Describe the problem you faced

  1. Hudi 0.9 on EMR-5.34.0
  2. Create and populate a table with DeltaStreamer from JsonKafkaSource
  3. Use Zeppelin and Spark SQL to query the table with the following key parameters:
     %spark.conf
     spark.app.name query_hudi_table
     spark.serializer org.apache.spark.serializer.KryoSerializer
     spark.jars.packages org.apache.hudi:hudi-spark-bundle_2.11:0.5.3,org.apache.spark:spark-avro_2.11:2.4.4
     spark.sql.hive.convertMetastoreParquet false
  4. Presto 0.265 and Hive both query the table correctly.
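One detail that may be worth ruling out: the `spark.jars.packages` value above pulls in `hudi-spark-bundle_2.11:0.5.3`, while the table is written with Hudi 0.9. Nothing in this report confirms that the mismatch causes the empty result, so treat the following only as a minimal sketch for spotting it. The helper names `parse_packages` and `hudi_bundle_mismatches` are hypothetical, and comparing only the major.minor part of the version is an assumption made for illustration.

```python
def parse_packages(value):
    """Split a spark.jars.packages value into (group, artifact, version) tuples."""
    coords = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        # Assumes plain group:artifact:version coordinates (no classifier).
        group, artifact, version = item.split(":")
        coords.append((group, artifact, version))
    return coords


def hudi_bundle_mismatches(packages_value, table_hudi_version):
    """Return Hudi coordinates whose major.minor differs from the writer's version."""
    expected = tuple(table_hudi_version.split(".")[:2])
    mismatches = []
    for group, artifact, version in parse_packages(packages_value):
        if group == "org.apache.hudi" and tuple(version.split(".")[:2]) != expected:
            mismatches.append(f"{group}:{artifact}:{version}")
    return mismatches


# Values copied from the report: the Zeppelin config and the stated Hudi version.
packages = (
    "org.apache.hudi:hudi-spark-bundle_2.11:0.5.3,"
    "org.apache.spark:spark-avro_2.11:2.4.4"
)
print(hudi_bundle_mismatches(packages, "0.9"))
```

If the list is non-empty, aligning the bundle version with the version that wrote the table is a cheap experiment to run before digging further.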

To Reproduce

Steps to reproduce the behavior: follow the numbered steps listed under "Describe the problem you faced".

Expected behavior

Spark SQL should return the table's data, just as Presto and Hive do.

Environment Description

  • Hudi version : 0.9

  • Spark version : 2.4.8

  • Hive version : 2.3.8

  • Hadoop version : Amazon 2.10.1

  • Storage (HDFS/S3/GCS..) : S3

  • Running on Docker? (yes/no) : no

Stacktrace
No errors are reported; the query just returns an empty result.

