Description
In #18103 @BlakeOrth implemented the ability to trace LIST object store requests.
Basically, you can do something like:
\object_store_profiling trace
CREATE EXTERNAL TABLE overture_partitioned
STORED AS PARQUET LOCATION 's3://overturemaps-us-west-2/release/2025-09-24.0/theme=addresses/';
select * from overture_partitioned limit 10;
And see:
Instrumented Object Store: instrument_mode: Trace, inner: AmazonS3(overturemaps-us-west-2)
2025-10-17T17:44:40.565961+00:00 operation=List path=release/2025-09-24.0/theme=addresses <--- NOTE NO DURATION ON THESE LINES
2025-10-17T17:44:40.893970+00:00 operation=List path=release/2025-09-24.0/theme=addresses
2025-10-17T17:44:41.003815+00:00 operation=List path=release/2025-09-24.0/theme=addresses
2025-10-17T17:44:41.110422+00:00 operation=Get duration=0.151821s size=8 range: bytes=1070778162-1070778169 path=release/2025-09-24.0/theme=addresses/type=address/part-00000-52872134-68de-44a6-822d-15fa29a0f606-c000.zstd.parquet
...
However, due to this code, there is no way to see the duration of a LIST operation:
duration: None, // list returns a stream, so the duration isn't meaningful
Yes, not being able to easily evaluate a meaningful duration from this is a pretty big bummer honestly. I think time to first response is probably the ideal measurement to take here. I briefly looked into what it would take to make that happen within this instrumented store and I think it ends up being quite complex. I'm pretty sure we'd have to write a custom future to wrap the elements within the stream since the duration is only meaningful once elements in the stream start reporting Poll::Ready. Hopefully there's an easier way, because that sounds pretty painful.
Originally posted by @BlakeOrth in #18103 (comment)
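To make the "time to first response" idea above concrete, here is a minimal sketch of a stream wrapper that records the elapsed time when the inner stream first returns Poll::Ready. It is not the instrumented store's code: `TimeToFirst`, `PendingThen`, `NoopWake`, and `drain` are hypothetical names, and the local `Stream` trait is only a stand-in for `futures::Stream` so the example needs nothing beyond the standard library. It also assumes the clock should start when the wrapper is created (that is, when `list` is called), which is one possible choice.

```rust
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::time::{Duration, Instant};

/// Local stand-in for `futures::Stream` (assumption: same `poll_next` shape).
trait Stream {
    type Item;
    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>>;
}

/// Hypothetical wrapper: remembers how long the inner stream took to
/// report its first `Poll::Ready`, measured from wrapper creation.
struct TimeToFirst<S> {
    inner: S,
    start: Instant,
    first_ready: Option<Duration>,
}

impl<S> TimeToFirst<S> {
    fn new(inner: S) -> Self {
        Self { inner, start: Instant::now(), first_ready: None }
    }

    /// `None` until the inner stream has reported `Poll::Ready` once.
    fn time_to_first(&self) -> Option<Duration> {
        self.first_ready
    }
}

impl<S: Stream + Unpin> Stream for TimeToFirst<S> {
    type Item = S::Item;

    fn poll_next(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>> {
        let this = &mut *self; // allowed because `S: Unpin` makes the wrapper `Unpin`
        let poll = Pin::new(&mut this.inner).poll_next(cx);
        // Record the duration only on the first Ready result; later polls leave it alone.
        if poll.is_ready() && this.first_ready.is_none() {
            this.first_ready = Some(this.start.elapsed());
        }
        poll
    }
}

/// Toy inner stream: returns `Pending` a fixed number of times, then yields items.
struct PendingThen {
    pending_left: usize,
    items: std::vec::IntoIter<&'static str>,
}

impl Stream for PendingThen {
    type Item = &'static str;

    fn poll_next(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>> {
        if self.pending_left > 0 {
            self.pending_left -= 1;
            cx.waker().wake_by_ref();
            return Poll::Pending;
        }
        Poll::Ready(self.items.next())
    }
}

/// Waker that does nothing; sufficient for the busy-polling driver below.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls a stream to completion on the current thread and collects its items.
fn drain<S: Stream + Unpin>(stream: &mut S) -> Vec<S::Item> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut out = Vec::new();
    loop {
        match Pin::new(&mut *stream).poll_next(&mut cx) {
            Poll::Ready(Some(item)) => out.push(item),
            Poll::Ready(None) => return out,
            Poll::Pending => std::thread::yield_now(),
        }
    }
}
```

Wrapping a stream with `TimeToFirst::new(...)` and draining it leaves `time_to_first()` set to the delay before the first ready result. Note the duration is only known after the stream has been polled, so in a real store the value would have to be reported after the fact (for example through shared state) rather than at the time `list` returns.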