[datafusion-cli] Implement average LIST duration for object store profiling #18138

@alamb

Description

In #18103 @BlakeOrth implemented the ability to trace LIST object store requests.

Basically you can do something like:

\object_store_profiling trace

CREATE EXTERNAL TABLE overture_partitioned
STORED AS PARQUET LOCATION 's3://overturemaps-us-west-2/release/2025-09-24.0/theme=addresses/';

select * from overture_partitioned limit 10;

And see output like:

Instrumented Object Store: instrument_mode: Trace, inner: AmazonS3(overturemaps-us-west-2)
2025-10-17T17:44:40.565961+00:00 operation=List path=release/2025-09-24.0/theme=addresses <--- NOTE NO DURATION ON THESE LINES
2025-10-17T17:44:40.893970+00:00 operation=List path=release/2025-09-24.0/theme=addresses
2025-10-17T17:44:41.003815+00:00 operation=List path=release/2025-09-24.0/theme=addresses
2025-10-17T17:44:41.110422+00:00 operation=Get duration=0.151821s size=8 range: bytes=1070778162-1070778169 path=release/2025-09-24.0/theme=addresses/type=address/part-00000-52872134-68de-44a6-822d-15fa29a0f606-c000.zstd.parquet
...

However, because of the following code, there is no way to see the duration of the LIST command:

duration: None, // list returns a stream, so the duration isn't meaningful

Yes, not being able to easily evaluate a meaningful duration from this is a pretty big bummer honestly. I think time to first response is probably the ideal measurement to take here. I briefly looked into what it would take to make that happen within this instrumented store and I think it ends up being quite complex. I'm pretty sure we'd have to write a custom future to wrap the elements within the stream since the duration is only meaningful once elements in the stream start reporting Poll::Ready. Hopefully there's an easier way, because that sounds pretty painful.

Originally posted by @BlakeOrth in #18103 (comment)
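
To make the "time to first response" idea concrete, here is a minimal sketch of a wrapper that records how long the inner stream takes to first report `Poll::Ready`. This is only an illustration under stated assumptions, not the code in the instrumented store: `TimeToFirstReady`, `PendingThenItems`, and `drain` are hypothetical names, and the local `Stream` trait is a stand-in for `futures::Stream` so the sketch compiles with the standard library alone. It wraps the stream itself rather than writing a custom future per element, which may or may not turn out to be simpler in practice.

```rust
use std::collections::VecDeque;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::time::{Duration, Instant};

/// Stand-in for `futures::Stream` (assumption: real code would use that trait).
pub trait Stream {
    type Item;
    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>>;
}

/// Hypothetical wrapper: remembers the elapsed time between construction and
/// the first `Poll::Ready` returned by the inner stream.
pub struct TimeToFirstReady<S> {
    inner: S,
    start: Instant,
    first_ready: Option<Duration>,
}

impl<S> TimeToFirstReady<S> {
    pub fn new(inner: S) -> Self {
        Self { inner, start: Instant::now(), first_ready: None }
    }

    /// `None` until the inner stream has produced its first response.
    pub fn first_ready(&self) -> Option<Duration> {
        self.first_ready
    }
}

impl<S: Stream + Unpin> Stream for TimeToFirstReady<S> {
    type Item = S::Item;

    fn poll_next(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>> {
        let this = &mut *self; // allowed because the wrapper is `Unpin` when `S` is
        let poll = Pin::new(&mut this.inner).poll_next(cx);
        // Record only the first Ready; an empty listing (`Ready(None)`) also counts.
        if poll.is_ready() && this.first_ready.is_none() {
            this.first_ready = Some(this.start.elapsed());
        }
        poll
    }
}

/// Test stream: reports `Pending` a fixed number of times, then yields its items.
pub struct PendingThenItems<T> {
    pending_left: usize,
    items: VecDeque<T>,
}

impl<T> PendingThenItems<T> {
    pub fn new(pending_left: usize, items: Vec<T>) -> Self {
        Self { pending_left, items: items.into() }
    }
}

impl<T: Unpin> Stream for PendingThenItems<T> {
    type Item = T;

    fn poll_next(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<T>> {
        if self.pending_left > 0 {
            self.pending_left -= 1;
            cx.waker().wake_by_ref(); // ask to be polled again
            return Poll::Pending;
        }
        Poll::Ready(self.items.pop_front())
    }
}

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Busy-polls a stream to completion; returns the items and the number of polls.
pub fn drain<S: Stream + Unpin>(stream: &mut S) -> (Vec<S::Item>, usize) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let (mut items, mut polls) = (Vec::new(), 0);
    loop {
        polls += 1;
        match Pin::new(&mut *stream).poll_next(&mut cx) {
            Poll::Ready(Some(item)) => items.push(item),
            Poll::Ready(None) => return (items, polls),
            Poll::Pending => std::thread::yield_now(),
        }
    }
}
```

In a real integration the recorded duration would presumably be reported once the first `Poll::Ready` is seen, instead of being read back through a getter as it is here.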
