Improve duration accounting in datafusion-cli's instrumented object store #18232

@BlakeOrth

Is your feature request related to a problem or challenge?

As noted in the comment chain here:

The duration statistic reported by some of the instrumented object store's methods, while technically accurate, can be misleading for users. For example, the duration reported for a put_multipart is the time spent initiating a multipart put session with the backing store, not the time actually spent pushing data to it. Users would likely expect the latter, since that is the portion of the process where actual "work" with the backing store is being done. Additionally, these duration-based caveats are not apparent without understanding both the instrumentation code in datafusion and how operations work in object_store.
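To make the distinction concrete, here is a minimal, std-only sketch of the two durations. It is not the datafusion-cli implementation and does not use the object_store API: `MockUpload`, `TimedUpload`, and `run_upload` are hypothetical names, and `sleep` calls stand in for network latency. It only illustrates the idea of accumulating time across the part writes and completion instead of timing just the session setup.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a multipart upload session (not object_store).
struct MockUpload {
    parts: usize,
}

impl MockUpload {
    /// Simulates the cheap "initiate session" round trip.
    fn start() -> Self {
        sleep(Duration::from_millis(1));
        MockUpload { parts: 0 }
    }

    /// Simulates pushing one part of data to the backing store.
    fn put_part(&mut self, _data: &[u8]) {
        sleep(Duration::from_millis(5));
        self.parts += 1;
    }

    /// Simulates finalizing the upload; returns the number of parts written.
    fn complete(self) -> usize {
        sleep(Duration::from_millis(1));
        self.parts
    }
}

/// Hypothetical wrapper that tracks initiation time and transfer time separately.
struct TimedUpload {
    inner: MockUpload,
    init: Duration,
    transfer: Duration,
}

impl TimedUpload {
    fn start() -> Self {
        let t = Instant::now();
        let inner = MockUpload::start();
        // This is the only span a naive wrapper around "start" would measure.
        TimedUpload { inner, init: t.elapsed(), transfer: Duration::ZERO }
    }

    fn put_part(&mut self, data: &[u8]) {
        let t = Instant::now();
        self.inner.put_part(data);
        // Accumulate the time spent actually moving data.
        self.transfer += t.elapsed();
    }

    /// Returns (parts written, initiation duration, transfer duration).
    fn complete(self) -> (usize, Duration, Duration) {
        let t = Instant::now();
        let parts = self.inner.complete();
        (parts, self.init, self.transfer + t.elapsed())
    }
}

/// Drives one simulated upload with `num_parts` parts.
fn run_upload(num_parts: usize) -> (usize, Duration, Duration) {
    let mut upload = TimedUpload::start();
    for _ in 0..num_parts {
        upload.put_part(&[0u8; 16]);
    }
    upload.complete()
}
```

Calling `run_upload(3)` returns the part count along with both durations. In this simulation the initiation duration covers only the session setup, while the transfer duration covers the three part writes plus completion, which is closer to what a user would expect to see reported.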

Considering the instrumented object store is currently mostly a development/debug utility, the above caveats are likely tolerable. However, scrutinizing and improving the accounting for the collected and reported durations would make the instrumented object store more useful for profiling work that is focused strictly on the runtime duration of operations.

Describe the solution you'd like

I would like additional logic added to the instrumented object store so that the duration statistics it collects and reports are in line with an end user's expectations.

Describe alternatives you've considered

If the goal is just to make sure the reported duration stats are not misleading, duration could be omitted from various operations (and that omission subsequently accounted for when computing summary statistics). This would keep the reported statistics from being misleading, but it would also reduce the granularity of reporting, which seems somewhat undesirable.
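As a rough illustration of this alternative, the sketch below makes the duration optional per request and skips untimed requests when summarizing. `RequestRecord`, `DurationSummary`, and `summarize` are hypothetical names chosen for this example and are not taken from the datafusion-cli code.

```rust
use std::time::Duration;

/// Hypothetical record of one object store request.
struct RequestRecord {
    op: &'static str,
    /// `None` when the measured duration would be misleading and is omitted.
    duration: Option<Duration>,
}

/// Hypothetical summary over the requests that carry a duration.
struct DurationSummary {
    count: usize,
    total: Duration,
    min: Duration,
    max: Duration,
}

impl DurationSummary {
    /// Average over timed requests only.
    fn avg(&self) -> Duration {
        self.total / self.count as u32
    }
}

/// Summarizes only records with a duration; returns `None` if there are none.
fn summarize(records: &[RequestRecord]) -> Option<DurationSummary> {
    let mut timed = records.iter().filter_map(|r| r.duration);
    let first = timed.next()?;
    let mut summary = DurationSummary { count: 1, total: first, min: first, max: first };
    for d in timed {
        summary.count += 1;
        summary.total += d;
        summary.min = summary.min.min(d);
        summary.max = summary.max.max(d);
    }
    Some(summary)
}
```

With this shape, an operation recorded with `duration: None` still appears in the request list but does not skew the count, average, minimum, or maximum, at the cost of reporting no timing for that operation at all.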

Additional context

No response
