[Epic] Extract catalog functionality from the core to make it more modular #10782
Comments
If this seems like a reasonable idea to people, I will file tickets to break down the work. cc @andygrove @jayzhan211 @comphead @mustafasrepo for your thoughts.
I agree with this direction. But now this seems hard to achieve because
Thanks @alamb for starting this discussion. As @lewiszlw correctly mentioned, we have some coupling between the providers and the core. I'm just trying to understand the use case where providers are needed without the core.
Is it due to the complexity of
In my mind the real use case is to more easily use datafusion without having to bring in all the dependencies of ListingTable (like parquet, avro, json, etc). So the real use case is getting ListingTable out of the core. But since the catalog API is in the core now, there is no way to get ListingTable out of the core without first moving the catalog API as well.
I think it is both the complexity of ListingTable and its dependency tree (e.g. parquet-rs and avro and json and object_store and ...). For use cases like
I agree @lewiszlw -- well put. I made a first PR to start detangling things here: #10794 (it just splits Longer term we would have to figure out where Maybe we could look into splitting out
Hm... they probably strive to have their own readers/writers, perhaps other than the arrow-rs implementation; that makes sense to me. And yes, if DF stands for extensibility, we should make this happen. Not sure how difficult that would be, though. We probably need to start by replacing core abstractions with traits instead of implementations to decouple it.
Yes, something like this -- I think most of the traits already exist (e.g.
Is your feature request related to a problem or challenge?
When @goldmedal started trying to move the DynamicFileProvider so others could use it in #10745, it became clear that there is no good way to add additional catalog support in the core without everything being intertwined.
Thus I think we should try to extract the different catalog providers out of the datafusion core so that this is easier.
Describe the solution you'd like
I suggest the following final layout:
- `CatalogProvider`, `SchemaProvider`, etc in a new crate `datafusion-catalog` (since these traits rely on table provider, etc I think this can't be in `datafusion-common` or `datafusion-expr`)
- `Memory*` providers are in `datafusion-catalog`
- `InformationSchema` providers are in `datafusion-catalog`
- `DynamicFileCatalog` in `datafusion-catalog`
- `datafusion-catalog-listing`
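To make the dependency direction of this layout concrete, here is a minimal, self-contained sketch. It is not the real DataFusion API: the two modules merely stand in for the proposed `datafusion-catalog` and `datafusion-catalog-listing` crates, the traits are heavily simplified, and `describe`, `register_table`, `register_schema`, and `build_catalog` are hypothetical names invented for illustration. The point is only that the listing side depends on the catalog traits, never the other way around.

```rust
use std::sync::Arc;

/// Stand-in for the proposed `datafusion-catalog` crate: traits plus
/// lightweight in-memory providers, with no file-format dependencies.
mod catalog {
    use std::collections::HashMap;
    use std::sync::Arc;

    /// Simplified table abstraction (hypothetical, not the real trait).
    pub trait TableProvider {
        fn describe(&self) -> String;
    }

    pub trait SchemaProvider {
        fn table_names(&self) -> Vec<String>;
        fn table(&self, name: &str) -> Option<Arc<dyn TableProvider>>;
    }

    pub trait CatalogProvider {
        fn schema_names(&self) -> Vec<String>;
        fn schema(&self, name: &str) -> Option<Arc<dyn SchemaProvider>>;
    }

    #[derive(Default)]
    pub struct MemorySchemaProvider {
        tables: HashMap<String, Arc<dyn TableProvider>>,
    }

    impl MemorySchemaProvider {
        pub fn register_table(&mut self, name: &str, table: Arc<dyn TableProvider>) {
            self.tables.insert(name.to_string(), table);
        }
    }

    impl SchemaProvider for MemorySchemaProvider {
        fn table_names(&self) -> Vec<String> {
            let mut names: Vec<String> = self.tables.keys().cloned().collect();
            names.sort(); // HashMap iteration order is unspecified
            names
        }
        fn table(&self, name: &str) -> Option<Arc<dyn TableProvider>> {
            self.tables.get(name).cloned()
        }
    }

    #[derive(Default)]
    pub struct MemoryCatalogProvider {
        schemas: HashMap<String, Arc<dyn SchemaProvider>>,
    }

    impl MemoryCatalogProvider {
        pub fn register_schema(&mut self, name: &str, schema: Arc<dyn SchemaProvider>) {
            self.schemas.insert(name.to_string(), schema);
        }
    }

    impl CatalogProvider for MemoryCatalogProvider {
        fn schema_names(&self) -> Vec<String> {
            let mut names: Vec<String> = self.schemas.keys().cloned().collect();
            names.sort();
            names
        }
        fn schema(&self, name: &str) -> Option<Arc<dyn SchemaProvider>> {
            self.schemas.get(name).cloned()
        }
    }
}

/// Stand-in for the proposed `datafusion-catalog-listing` crate. It depends
/// on `catalog`; `catalog` knows nothing about it.
mod catalog_listing {
    use super::catalog::TableProvider;

    /// Toy placeholder for a file-backed table (assumption for illustration).
    pub struct ListingTable {
        pub path: String,
    }

    impl TableProvider for ListingTable {
        fn describe(&self) -> String {
            format!("listing:{}", self.path)
        }
    }
}

/// Application code wires the two sides together through the traits only.
fn build_catalog() -> Arc<dyn catalog::CatalogProvider> {
    let mut schema = catalog::MemorySchemaProvider::default();
    let table = catalog_listing::ListingTable { path: "data/events/".to_string() };
    schema.register_table("events", Arc::new(table));

    let mut cat = catalog::MemoryCatalogProvider::default();
    cat.register_schema("public", Arc::new(schema));
    Arc::new(cat)
}

fn main() {
    let cat = build_catalog();
    for schema_name in cat.schema_names() {
        let schema = cat.schema(&schema_name).expect("schema was just listed");
        for table_name in schema.table_names() {
            let table = schema.table(&table_name).expect("table was just listed");
            println!("{schema_name}.{table_name} -> {}", table.describe());
        }
    }
}
```

In this sketch a user who only needs in-memory catalogs would compile just the `catalog` module, which is the property the proposed crate split is after.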
Describe alternatives you've considered
No response
Additional context
No response