
ice --watch --force-no-copy creates a table that ClickHouse cannot read #66

@hodgesrm

Description

Problem Description

It seems to be possible to create an invalid Iceberg table using ice --watch --force-no-copy. Here is the sequence of commands that reproduces the problem.

Step 1: Set up S3 bucket and SQS queue as described in ice/examples/s3watch.
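
(For reference, a minimal sketch of that setup with the AWS CLI. The bucket name is taken from Step 3 below; the queue name, region, and account are placeholders, and the real example in ice/examples/s3watch also sets the queue policy that allows S3 to publish events.)

export CATALOG_BUCKET=rhodges-ice-rest-catalog-demo
aws s3 mb "s3://$CATALOG_BUCKET"
# Create the queue and capture its URL for the --watch flag in Step 2
CATALOG_SQS_QUEUE_URL=$(aws sqs create-queue --queue-name ice-watch \
  --query QueueUrl --output text)
export CATALOG_SQS_QUEUE_URL
# Route object-created events from the bucket to the queue
aws s3api put-bucket-notification-configuration --bucket "$CATALOG_BUCKET" \
  --notification-configuration '{"QueueConfigurations": [{"QueueArn": "arn:aws:sqs:REGION:ACCOUNT:ice-watch", "Events": ["s3:ObjectCreated:*"]}]}'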

Step 2: Start the ice process. Note: I use slightly different environment variables from the ice example.

ice insert blog.tripdata_watch -p --force-no-copy --skip-duplicates \
"s3://$CATALOG_BUCKET/WATCH/blog/tripdata_watch/*.parquet"  \
--watch="$CATALOG_SQS_QUEUE_URL"

Step 3: Write parquet data to file location from ClickHouse.

INSERT INTO FUNCTION s3('s3://rhodges-ice-rest-catalog-demo/WATCH/blog/tripdata_watch/{_partition_id}.parquet', 'Parquet')
PARTITION BY concat('month=', month)
SELECT
    *,
    toYYYYMM(pickup_date) AS month
FROM tripdata
WHERE month IN (201602, 201603)
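
A quick listing of the watched prefix confirms the partitioned files landed where ice is watching (same bucket and prefix as above):

aws s3 ls "s3://$CATALOG_BUCKET/WATCH/blog/tripdata_watch/"
# Expect one file per partition, e.g. month=201602.parquet and month=201603.parquet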

Step 4: Select from the table. An error message like the following results.

SELECT
    count(),
    avg(passenger_count),
    avg(trip_distance)
FROM ice.`blog.tripdata_watch`
SETTINGS input_format_parquet_use_native_reader_v3 = 1, object_storage_cluster = 'swarm'

Received exception from server (version 25.8.9):
Code: 499. DB::Exception: Received from localhost:9000. DB::Exception: Received from chi-swarm-example-1-0-0.chi-swarm-example-1-0.antalya.svc.cluster.local:9000. DB::Exception: Failed to get object info: No response body.. HTTP response code: 404: while reading blog/tripdata_watch/month=201603.parquet: While executing ReadFromObjectStorage. (S3_ERROR)
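
Note that the key in the error, blog/tripdata_watch/month=201603.parquet, lacks the WATCH/ prefix the files were written under in Step 3. It may simply be printed relative to some base location, but a first check is to head both keys and see which object actually exists (a sketch, assuming the AWS CLI and the bucket from Step 3):

# Key as printed in the error
aws s3api head-object --bucket rhodges-ice-rest-catalog-demo \
  --key blog/tripdata_watch/month=201603.parquet
# Key the data was actually written to
aws s3api head-object --bucket rhodges-ice-rest-catalog-demo \
  --key WATCH/blog/tripdata_watch/month=201603.parquet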

Notes and helpful information.

  • Using ice 0.8.1 and Antalya 25.8.9.20207.
  • The table exists. You can run ice scan blog.tripdata_watch and ice describe blog.tripdata_watch.
  • This problem does not occur if you create a different table using --force-no-copy but without --watch. The following command worked:
ice insert blog.tripdata_nocopy -p --thread-count=12 \
 --force-no-copy \
 --partition='[{"column":"month"}]' \
 "s3://$CATALOG_BUCKET/PARQUET/*.parquet"
  • Once this problem happens, the table gets into a strange state from which it cannot recover. I tried deleting the table manually with ice delete-table --purge, rewriting the data to S3, and creating it again with a command like the one above. Queries failed with the same error (see the check below).
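
Since delete-table --purge plus re-creation still fails, one thing worth checking is whether any data or metadata objects survive the purge (a sketch; same bucket assumption as above):

aws s3 ls --recursive "s3://$CATALOG_BUCKET/" | grep tripdata_watch
# Objects that survive delete-table --purge could explain why a recreated
# table inherits the broken state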

It's hard to tell if this is just a ClickHouse bug or if ice is somehow also involved. I logged it on the ice project for now.
