Description
Is your feature request related to a problem or challenge?
As noted in the comment chain here:
The duration statistic reported by some of the instrumented object store's methods, while technically accurate, can be misleading for users. For example, the duration reported for a put_multipart is the time the backing object store spent initiating a multipart upload session, not the time actually spent pushing data to the backing store. Users would likely expect the latter, since that is the part of the process where the real "work" against the backing store happens. Additionally, these duration-based caveats are not readily apparent without understanding both the instrumentation code in datafusion and, to some degree, how operations work in object_store.
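The gap described above can be illustrated with a small synchronous sketch. It assumes a hypothetical `FakeUpload` handle standing in for a multipart upload (it is not the `object_store` API, which is async), where starting the upload is cheap and each part simulates a network transfer with a short sleep. Timing only the call that starts the upload, as the issue describes, records almost none of the actual transfer time.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a multipart upload handle (not the `object_store` API).
struct FakeUpload {
    parts: Vec<Vec<u8>>,
}

impl FakeUpload {
    /// Simulates pushing one part to the backing store.
    fn put_part(&mut self, data: Vec<u8>) {
        std::thread::sleep(Duration::from_millis(5)); // pretend network transfer
        self.parts.push(data);
    }

    /// Finishes the upload and returns how many parts were sent.
    fn complete(self) -> usize {
        self.parts.len()
    }
}

/// Starting the session is cheap: no data is transferred yet.
fn start_upload() -> FakeUpload {
    FakeUpload { parts: Vec::new() }
}

/// Times only the initiation call, mirroring the behavior described in the issue.
fn instrumented_start() -> (FakeUpload, Duration) {
    let start = Instant::now();
    let upload = start_upload();
    (upload, start.elapsed())
}
```

With three parts, the recorded "init" duration is a tiny fraction of the roughly 15 ms actually spent in `put_part`.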
Because the instrumented object store is currently mostly a development/debug utility, the above caveats are likely tolerable. However, improving and scrutinizing how the collected and reported durations are accounted for would make the instrumented object store more useful for profiling work that focuses strictly on the runtime duration of operations.
Describe the solution you'd like
I would like additional logic added to the instrumented object store so that the duration statistics it collects and reports are in line with an end user's expectations.
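One possible shape for that logic, sketched under assumptions: wrap the upload handle and accumulate the time spent in every call that talks to the backing store (initiation, each part, and completion), then report the total once the upload completes. The `Upload` trait and `InstrumentedUpload` type below are hypothetical and synchronous for brevity; a real version would wrap the async multipart upload type from object_store.

```rust
use std::time::{Duration, Instant};

/// Hypothetical minimal upload interface (synchronous, unlike object_store's async API).
trait Upload {
    fn put_part(&mut self, data: &[u8]);
    fn complete(&mut self);
}

/// Wraps an upload and accumulates time spent in every call to the backing store.
struct InstrumentedUpload<U: Upload> {
    inner: U,
    elapsed: Duration,
    bytes: usize,
}

impl<U: Upload> InstrumentedUpload<U> {
    /// `init_elapsed` is the time already spent initiating the upload session.
    fn new(inner: U, init_elapsed: Duration) -> Self {
        Self { inner, elapsed: init_elapsed, bytes: 0 }
    }

    /// Times one part upload and adds it to the running total.
    fn put_part(&mut self, data: &[u8]) {
        let start = Instant::now();
        self.inner.put_part(data);
        self.elapsed += start.elapsed();
        self.bytes += data.len();
    }

    /// Finishes the upload and returns (total duration, total bytes) to be recorded.
    fn complete(mut self) -> (Duration, usize) {
        let start = Instant::now();
        self.inner.complete();
        self.elapsed += start.elapsed();
        (self.elapsed, self.bytes)
    }
}
```

Recording the statistic at completion means the reported duration covers the whole upload lifecycle rather than just the session setup; the trade-off is that an upload that is never completed produces no entry.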
Describe alternatives you've considered
If the goal is only to ensure the reported duration stats are not misleading, the duration could be omitted for the affected operations (and those operations then excluded when computing summary statistics). This would keep the reported statistics from being misleading, but it would also reduce the granularity of reporting, which seems somewhat undesirable.
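The alternative could look roughly like the sketch below, which assumes a hypothetical per-request record (`RequestRecord`, not necessarily how datafusion stores requests) whose duration is an `Option<Duration>`. Operations whose timing is not meaningful record `None`, and the summary only aggregates requests that actually carry a duration.

```rust
use std::time::Duration;

/// Hypothetical per-request record; `duration` is `None` when timing is not meaningful.
struct RequestRecord {
    op: &'static str,
    duration: Option<Duration>,
}

/// Summary computed only over requests that report a duration.
#[derive(Debug, PartialEq)]
struct DurationSummary {
    count: usize,
    min: Duration,
    max: Duration,
    sum: Duration,
}

/// Returns `None` if no request for `op` carries a duration.
fn summarize(requests: &[RequestRecord], op: &str) -> Option<DurationSummary> {
    let mut durations = requests
        .iter()
        .filter(|r| r.op == op)
        .filter_map(|r| r.duration); // skip requests with omitted durations
    let first = durations.next()?;
    let mut s = DurationSummary { count: 1, min: first, max: first, sum: first };
    for d in durations {
        s.count += 1;
        s.min = s.min.min(d);
        s.max = s.max.max(d);
        s.sum += d;
    }
    Some(s)
}
```

This shows the granularity cost mentioned above: an operation like put_multipart with only `None` durations simply drops out of the duration summary.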
Additional context
No response