Description
The extent of the documentation for TableProvider::statistics in version 43.0.0 is:
Get statistics for this table, if available
This offers no explanation as to how the statistics will or will not be used.
A user with experience in analytical database engines who is writing a custom TableProvider implementation may suspect that TableProvider::statistics is used by the DataFusion query optimizer to determine join orders, perhaps among other things.
However, this conclusion is apparently incorrect, which I deduce from the following pieces of evidence:
- I am a user fitting that description and found that my custom TableProvider::statistics was not called in the presence of a join query
- cargo check --workspace --tests runs with no errors if I remove the fn statistics declaration from the trait TableProvider definition
- having found the source code for the rule which changes join orders, it is clear that it calls ExecutionPlan::statistics instead.
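To illustrate the evidence above, here is a minimal, self-contained sketch of the situation. It does not use the datafusion crate: Provider, Plan, Stats, MyTable, MyScan, should_swap and demo are all hypothetical stand-ins invented for this example, and the join rule is a toy, not DataFusion's actual rule. It only shows how a statistics method on a provider trait can be implemented and still never be consulted when the optimizer reads statistics from the plan node instead.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for a statistics struct (not DataFusion's `Statistics`).
#[derive(Debug, Clone, PartialEq)]
pub struct Stats {
    pub num_rows: Option<usize>,
}

/// Hypothetical stand-in for a table provider trait with a defaulted
/// `statistics` method. Implementors may override it.
pub trait Provider {
    fn statistics(&self) -> Option<Stats> {
        None
    }
}

/// Hypothetical stand-in for a physical plan node.
pub trait Plan {
    fn statistics(&self) -> Stats;
}

/// A custom table that overrides `statistics` and counts how often it is called.
pub struct MyTable {
    pub calls: Cell<usize>,
}

impl Provider for MyTable {
    fn statistics(&self) -> Option<Stats> {
        self.calls.set(self.calls.get() + 1);
        Some(Stats { num_rows: Some(1_000_000) })
    }
}

/// A scan node that carries its own row-count estimate.
pub struct MyScan {
    pub rows: Option<usize>,
}

impl Plan for MyScan {
    fn statistics(&self) -> Stats {
        Stats { num_rows: self.rows }
    }
}

/// Toy join-order rule: swap the inputs when the left side is known to be
/// larger than the right. It reads only the plan-level statistics.
pub fn should_swap(left: &dyn Plan, right: &dyn Plan) -> bool {
    match (left.statistics().num_rows, right.statistics().num_rows) {
        (Some(l), Some(r)) => l > r,
        _ => false, // unknown sizes: leave the order alone
    }
}

/// Runs the toy rule and reports (swap decision, provider statistics call count).
pub fn demo() -> (bool, usize) {
    let table = MyTable { calls: Cell::new(0) };
    let big = MyScan { rows: Some(1_000_000) };
    let small = MyScan { rows: Some(10) };
    let swapped = should_swap(&big, &small);
    (swapped, table.calls.get())
}
```

In this sketch the toy rule decides to swap the join inputs while the provider's call counter stays at zero, which mirrors the observation that a custom TableProvider::statistics was never invoked for a join query.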
Expectation
The documentation should set appropriate expectations for what TableProvider::statistics is used for, so that developers can make informed choices about whether or not to implement it.
Additional context
Based on the cargo check --workspace --tests result above, the apparent answer to what TableProvider::statistics is used for is "nothing", but removing the trait method would be a breaking change. Based on the Slack discussion prior to filing this issue, at least one user depends on TableProvider::statistics for their custom optimizer rules, and removing it would require them to find a workaround.
Short of deprecating or removing the trait method, I would personally be satisfied just with updates to the method documentation.