
Nondeterministic behavior when fetching from Iceberg table #13873

@liurenjie1024

Description


Apache Iceberg version

1.6.1

Query engine

Spark

Please describe the bug 🐞

  1. Run mkdir warehouse/default to create the directory.
  2. Download the attachment containing the table data (iceberg.tar.gz).
  3. Run tar -xzf iceberg.tar.gz && mv tmp/* /tmp/ to move the table data into /tmp/.
  4. Start the Spark SQL shell:
/home/ubuntu/Apps/spark-3.5.4-bin-hadoop3/bin/spark-sql \
    --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.9.2 \
    --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkCatalog \
    --conf spark.sql.catalog.spark_catalog.type=hadoop \
    --conf spark.sql.catalog.spark_catalog.warehouse=/tmp/spark-warehouse-28428
  5. Run select count(*); it reports 217 rows.
spark-sql (default)> select count(*) from tmp_table_gw0_754291685_0;
217
Time taken: 2.33 seconds, Fetched 1 row(s)
  6. Run select * from tmp_table_gw0_754291685_0; it fetches 327 rows instead of the 217 reported by count(*).
spark-sql (default)> select * from tmp_table_gw0_754291685_0;
...
Time taken: 0.213 seconds, Fetched 327 row(s)

The oddest part is that the discrepancy only occurs the first time select * is run after select count(*); every subsequent select * behaves normally.
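Because the mismatch only shows up on the first select * after count(*), it can help to run the whole sequence in one non-interactive session. The sketch below is a minimal, unofficial helper, assuming bash, the same Spark path and configuration as in step 4, and that `spark-sql -e` accepts several `;`-separated statements and prints the same `Time taken: ..., Fetched N row(s)` line as the interactive shell. The function names `repro` and `fetched_rows` and the `SPARK_SQL` override variable are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print N from every "Fetched N row(s)" line read on stdin.
fetched_rows() {
  sed -n 's/.*Fetched \([0-9][0-9]*\) row(s).*/\1/p'
}

# Hypothetical helper: run count(*) and then select * twice in a single
# spark-sql session, with the same packages and confs as in step 4.
# SPARK_SQL is an assumed override; it defaults to the path used above.
repro() {
  local spark_sql="${SPARK_SQL:-/home/ubuntu/Apps/spark-3.5.4-bin-hadoop3/bin/spark-sql}"
  local table=tmp_table_gw0_754291685_0
  "$spark_sql" \
    --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.9.2 \
    --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkCatalog \
    --conf spark.sql.catalog.spark_catalog.type=hadoop \
    --conf spark.sql.catalog.spark_catalog.warehouse=/tmp/spark-warehouse-28428 \
    -e "select count(*) from $table; select * from $table; select * from $table;"
}
```

Running `repro 2>&1 | fetched_rows` would print 1 for the count(*) query followed by the row counts of the two select * queries (the timing line may be written to stderr, hence `2>&1`). Comparing those two numbers with the count(*) value in the session output shows whether only the first select * is affected.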

I tested against 1.6.1, 1.7.2, and 1.9.2, and all of them failed.

iceberg.tar.gz

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time
