Open
Labels
engine:spark (Spark integration), type:bug (Bug reports and fixes), type:community-support (Community-related)
Bug Description
What happened:
The column stats expression index example from the Spark Quick Start Guide does not produce the expected results. After following the guide's Scala example, the query that is supposed to be pruned by the idx_column_ts index returns no rows:
scala> // Query on ts column would prune the data using the idx_column_ts index
scala> spark.sql(s"SELECT * FROM hudi_indexed_table WHERE from_unixtime(ts, 'yyyy-MM-dd') = '2023-09-24'").show(false);
25/11/24 11:20:31 WARN CacheManager: Asked to cache already cached data.
25/11/24 11:20:32 WARN CacheManager: Asked to cache already cached data.
+-------------------+--------------------+------------------+----------------------+-----------------+---+----+-----+------+----+----+
|_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|_hoodie_file_name|ts |uuid|rider|driver|fare|city|
+-------------------+--------------------+------------------+----------------------+-----------------+---+----+-----+------+----+----+
+-------------------+--------------------+------------------+----------------------+-----------------+---+----+-----+------+----+----+
What you expected:
I expected the query to use the column stats expression index for data skipping (file pruning and faster filtering) and still return the rows matching the predicate, as documented in the Quick Start Guide.
Steps to reproduce:
- Follow the index example in the Spark Quick Start Guide (https://hudi.apache.org/docs/quick-start-guide#indexing).
- Run the query on hudi_indexed_table shown above. It returns an empty result.
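A possible way to narrow this down, sketched for the same spark-shell session: first check which dates the expression actually yields for the stored rows, then run the failing query with data skipping enabled and disabled. This is only a diagnostic sketch, not part of the guide. It assumes hudi_indexed_table and idx_column_ts already exist as created by the guide, and that the session-level `hoodie.enable.data.skipping` setting is picked up by the read path in this setup.

```scala
// Diagnostic sketch for spark-shell (uses the existing `spark` session).
// Assumption: hudi_indexed_table and idx_column_ts were created per the Quick Start Guide.
val query =
  "SELECT * FROM hudi_indexed_table " +
  "WHERE from_unixtime(ts, 'yyyy-MM-dd') = '2023-09-24'"

// 1. Which dates does the expression produce for the rows in the table?
//    If '2023-09-24' is missing here, the empty result is not caused by the index.
spark.sql(
  "SELECT from_unixtime(ts, 'yyyy-MM-dd') AS ts_date, count(*) AS cnt " +
  "FROM hudi_indexed_table " +
  "GROUP BY from_unixtime(ts, 'yyyy-MM-dd')"
).show(false)

// 2. Same query with data skipping on, then off.
spark.sql("SET hoodie.enable.data.skipping=true")
val withSkipping = spark.sql(query).count()

spark.sql("SET hoodie.enable.data.skipping=false")
val withoutSkipping = spark.sql(query).count()

println(s"rows with data skipping: $withSkipping, without: $withoutSkipping")
```

If the two counts differ, the expression index is pruning files that contain matching rows. If both are zero, the predicate itself matches nothing in the table.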
Environment
Hudi version: 1.1.0
Query engine: Spark
Relevant configs:
Logs and Stack Trace
No response