Spark JMH IcebergSourceFlatParquetDataWriteBenchmark failed with AnalysisException  #5990

@dramaticlly

Description

Apache Iceberg version

0.14.1 (latest release)

Query engine

Spark

Please describe the bug 🐞

While running the Iceberg Spark 3.3 JMH benchmark, I ran into an issue with the date_add expression.

I executed:

./gradlew -DsparkVersions=3.3 :iceberg-spark:iceberg-spark-3.3_2.12:jmh \
    -PjmhIncludeRegex=IcebergSourceFlatParquetDataWriteBenchmark \
    -PjmhOutputPath=benchmark/iceberg-source-flat-parquet-data-write-benchmark-result.txt

and saw the following error:

# Run progress: 0.00% complete, ETA 00:00:00
# Fork: 1 of 1
# Warmup Iteration   1: <failure>

org.apache.spark.sql.AnalysisException: cannot resolve 'date_add(current_date(), (longCol % CAST(20 AS BIGINT)))' due to data type mismatch: argument 2 requires (int or smallint or tinyint) type, however, '(longCol % CAST(20 AS BIGINT))' is of bigint type.; line 1 pos 0;
'Project [longCol#2L, intCol#4, floatCol#7, doubleCol#11, decimalCol#16, date_add(current_date(Some(America/Los_Angeles)), (longCol#2L % cast(20 as bigint))) AS dateCol#22]
+- Project [longCol#2L, intCol#4, floatCol#7, doubleCol#11, cast(longCol#2L as decimal(20,5)) AS decimalCol#16]
   +- Project [longCol#2L, intCol#4, floatCol#7, cast(longCol#2L as double) AS doubleCol#11]
      +- Project [longCol#2L, intCol#4, cast(longCol#2L as float) AS floatCol#7]
         +- Project [longCol#2L, cast(longCol#2L as int) AS intCol#4]
            +- Project [id#0L AS longCol#2L]
               +- Range (0, 5000000, step=1, splits=Some(1))

I believe this is because DATE_ADD requires its second argument to be of int, smallint, or tinyint type, whereas (longCol % 20) is of bigint type because longCol is a bigint.
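
One possible workaround, as a minimal sketch: cast the day offset to INT before it reaches DATE_ADD. The failing expression below is reconstructed from the logical plan in the error, not copied from the benchmark source, and the class `DateAddFix` and helper `dateAddWithIntDays` are hypothetical names used only for illustration; they are not part of Iceberg or Spark.

```java
final class DateAddFix {
    // Hypothetical helper (not part of Iceberg or Spark): wraps the day-offset
    // expression in CAST(... AS INT) so that DATE_ADD receives an int argument
    // instead of a bigint.
    static String dateAddWithIntDays(String startExpr, String daysExpr) {
        return "DATE_ADD(" + startExpr + ", CAST(" + daysExpr + " AS INT))";
    }

    public static void main(String[] args) {
        // Expression reconstructed from the error message (an assumption).
        String fixed = dateAddWithIntDays("CURRENT_DATE()", "(longCol % 20)");
        System.out.println(fixed);
        // With Spark on the classpath, the string could then be used roughly as:
        //   df.withColumn("dateCol", org.apache.spark.sql.functions.expr(fixed))
    }
}
```

Since the plan already contains an intCol derived from longCol, using (intCol % 20) as the second argument would likely avoid the cast altogether. I have not verified either variant against the benchmark.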
