Problem Description
It seems to be possible to create an invalid Iceberg table using ice insert with --watch and --force-no-copy. Here's the sequence of commands.
Step 1: Set up S3 bucket and SQS queue as described in ice/examples/s3watch.
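For readers without that example at hand, the setup boils down to an SQS queue that receives ObjectCreated events from the bucket. A minimal sketch with the aws CLI (queue name, region, and account ID are placeholders, and the SQS access policy that allows S3 to publish to the queue is omitted; ice/examples/s3watch has the authoritative version):
# Create the queue the watcher will poll.
aws sqs create-queue --queue-name ice-watch-queue
# Point bucket notifications at the queue for new-object events.
aws s3api put-bucket-notification-configuration \
    --bucket "$CATALOG_BUCKET" \
    --notification-configuration '{
      "QueueConfigurations": [{
        "QueueArn": "arn:aws:sqs:us-east-1:123456789012:ice-watch-queue",
        "Events": ["s3:ObjectCreated:*"]
      }]
    }'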
Step 2: Start ice process. Note: I use slightly different environment variables from the ice example.
ice insert blog.tripdata_watch -p --force-no-copy --skip-duplicates \
"s3://$CATALOG_BUCKET/WATCH/blog/tripdata_watch/*.parquet" \
--watch="$CATALOG_SQS_QUEUE_URL"
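For context, the two environment variables above are assumed to be set along these lines (illustrative placeholders; only the bucket name matches the commands later in this report):
export CATALOG_BUCKET=rhodges-ice-rest-catalog-demo
export CATALOG_SQS_QUEUE_URL=https://sqs.us-east-1.amazonaws.com/123456789012/ice-watch-queue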
Step 3: Write parquet data to file location from ClickHouse.
INSERT INTO FUNCTION s3('s3://rhodges-ice-rest-catalog-demo/WATCH/blog/tripdata_watch/{_partition_id}.parquet', 'Parquet')
PARTITION BY concat('month=', month)
SELECT
*,
toYYYYMM(pickup_date) AS month
FROM tripdata
WHERE month IN (201602, 201603)
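Given the PARTITION BY expression, this INSERT should produce one object per month, i.e. month=201602.parquet and month=201603.parquet under the WATCH/blog/tripdata_watch/ prefix. A quick listing confirms what landed (sketch; the aws CLI is assumed to be configured for the same account):
aws s3 ls s3://rhodges-ice-rest-catalog-demo/WATCH/blog/tripdata_watch/
# Expected: month=201602.parquet and month=201603.parquet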
Step 4: Select from the table. An error message like the following results.
SELECT
count(),
avg(passenger_count),
avg(trip_distance)
FROM ice.`blog.tripdata_watch`
SETTINGS input_format_parquet_use_native_reader_v3 = 1, object_storage_cluster = 'swarm'
Received exception from server (version 25.8.9):
Code: 499. DB::Exception: Received from localhost:9000. DB::Exception: Received from chi-swarm-example-1-0-0.chi-swarm-example-1-0.antalya.svc.cluster.local:9000. DB::Exception: Failed to get object info: No response body.. HTTP response code: 404: while reading blog/tripdata_watch/month=201603.parquet: While executing ReadFromObjectStorage. (S3_ERROR)
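Note that the key in the error message, blog/tripdata_watch/month=201603.parquet, does not carry the WATCH/ prefix the files were written under in Step 3. A quick way to see which key actually exists is to head both candidates (a diagnostic sketch added here for clarity; bucket name as in Step 3):
# Key exactly as printed in the error message:
aws s3api head-object \
    --bucket rhodges-ice-rest-catalog-demo \
    --key 'blog/tripdata_watch/month=201603.parquet'
# Key the data was written to in Step 3:
aws s3api head-object \
    --bucket rhodges-ice-rest-catalog-demo \
    --key 'WATCH/blog/tripdata_watch/month=201603.parquet'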
Notes and helpful information.
- Using ice 0.8.1 and Antalya 25.8.9.20207.
- The table exists. You can run ice scan blog.tripdata_watch and ice describe blog.tripdata_watch.
- This problem does not occur if you just create a different table using --force-no-copy but without --watch. The following command worked:
ice insert blog.tripdata_nocopy -p --thread-count=12 \
--force-no-copy \
--partition='[{"column":"month"}]' \
"s3://$CATALOG_BUCKET/PARQUET/*.parquet"
- Once this problem happens, the table gets into a strange state and cannot recover. I tried deleting the table manually with ice delete-table --purge, rewriting the data to S3, and creating it again with a command like the one above. Queries failed with the same error (see the check sketched below).
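One check that might narrow down the stuck state (my addition, not part of the original report): after the purge, list everything left under the table's prefixes to see whether stale metadata or data files survive, which would explain a recreated table inheriting the broken state. Bucket name and prefixes are taken from the commands above.
# List anything that survived the purge.
aws s3 ls --recursive s3://rhodges-ice-rest-catalog-demo/ | grep tripdata_watch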
It's hard to tell if this is just a ClickHouse bug or if ice is somehow also involved. I logged it on the ice project for now.