
[native_datafusion] [Spark SQL Tests] input_file_name() returns empty string #3312

@andygrove

Summary

Three Spark SQL tests fail because input_file_name() returns an empty string when the native_datafusion scan is used.

Failing Tests

  • UDFSuite: "SPARK-8005 input_file_name"
  • ColumnExpressionSuite: "input_file_name, input_file_block_start, input_file_block_length - FileScanRDD"
  • ExtractPythonUDFsSuite: "Python UDF should not break column pruning/filter pushdown -- Parquet V1"

Root Cause

The native_datafusion scan doesn't populate the per-file metadata behind input_file_name(), input_file_block_start(), and input_file_block_length(), which Spark's FileSourceScanExec provides. This is a functional gap.

Possible Fix

In CometScanRule.nativeDataFusionScan(), detect when the query references InputFileName, InputFileBlockStart, or InputFileBlockLength expressions and fall back to native_iceberg_compat.
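
To make the proposed check concrete, here is a minimal, self-contained sketch of the detection logic. It does not use Spark's or Comet's real classes: the Expr hierarchy, referencesInputFileMetadata, and chooseScanImpl are hypothetical stand-ins that only mirror the names of the Catalyst expressions mentioned above. In the real rule, these expressions may sit in operators above the scan, so the check would probably need to look at the surrounding plan rather than only the scan's own output.

```scala
// Toy stand-ins for Catalyst expressions. These are NOT Spark's classes;
// they only model "an expression tree with children" for illustration.
sealed trait Expr { def children: Seq[Expr] }
case object InputFileName extends Expr { val children: Seq[Expr] = Nil }
case object InputFileBlockStart extends Expr { val children: Seq[Expr] = Nil }
case object InputFileBlockLength extends Expr { val children: Seq[Expr] = Nil }
final case class Column(name: String) extends Expr { val children: Seq[Expr] = Nil }
final case class Alias(child: Expr, name: String) extends Expr {
  val children: Seq[Expr] = Seq(child)
}
final case class Concat(children: Seq[Expr]) extends Expr

object ScanFallbackSketch {
  // True if the expression tree contains any input-file metadata expression,
  // at any depth.
  def referencesInputFileMetadata(e: Expr): Boolean = e match {
    case InputFileName | InputFileBlockStart | InputFileBlockLength => true
    case other => other.children.exists(referencesInputFileMetadata)
  }

  // Hypothetical decision: fall back when any expression needs the metadata
  // that the native_datafusion scan does not populate.
  def chooseScanImpl(exprs: Seq[Expr]): String =
    if (exprs.exists(referencesInputFileMetadata)) "native_iceberg_compat"
    else "native_datafusion"
}
```

The recursive walk matters because the expression is often nested, for example inside an alias or a string concatenation, so a top-level match alone would miss it.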

Related

Discovered in CI for #3307 (enable native_datafusion in auto scan mode).
