Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support registration of custom TableProviders through SQL #3310

Closed
avantgardnerio opened this issue Aug 31, 2022 · 7 comments · Fixed by #3311
Closed

Support registration of custom TableProviders through SQL #3310

avantgardnerio opened this issue Aug 31, 2022 · 7 comments · Fixed by #3311
Labels
enhancement New feature or request

Comments

@avantgardnerio
Copy link
Contributor

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

As a DataFusion / Ballista user, one awesome feature is that I can create custom table providers and register them on the context to support datasources not built into DataFusion itself (e.g. DeltaLake). However, if I make my own server (i.e. Ballista fork or Ballista with plugins) I cannot allow my users to register arbitrary tables at run time through SQL - I need to hard-code those into my app.

Describe the solution you'd like

I would like to extend the create external table syntax to allow registration of arbitrary table types.

Describe alternatives you've considered

  • Don't allow users to register custom tables at runtime
  • Use a different syntax than create external table
@milenkovicm
Copy link
Contributor

milenkovicm commented Nov 7, 2022

Sorry for commenting on closed thread, question for @avantgardnerio and/or @alamb

question, should datafusion::datasource::datasource::TableProviderFactory::create have Optional<DFSchemaRef> or just DFSchemaRef as a parameter to support queries where schema can be provided as part of crate statement CREATE EXTERNAL TABLE demo_table (c1 integer, c2 integer) STORED AS ... ?

@avantgardnerio
Copy link
Contributor Author

@milenkovicm I think that's not a bad idea, and some versions of the PR did exactly that... however I do think there's one choice to be made:

  1. do we add an optional parameter to the existing async method?
  2. or do we make a new synchronous method (perhaps with_schema) to do the same?

I would be fine either way, but I think the purpose of the async part was to allow providers to access the network to infer the schema, so it might not be needed if the schema is provided.

@milenkovicm
Copy link
Contributor

milenkovicm commented Nov 7, 2022

thanks @avantgardnerio ,

I believe first option may make more sense, might be easier api to digest.

Other small note, as I'm working on a branch where I've added this change Optional may be redundant as empty schema will be provided by context.rs

@avantgardnerio
Copy link
Contributor Author

empty schema will be provided by context.rs

FWIW, I would advocate for the presence of an empty schema to be represented differently than the absence of a schema.

@milenkovicm
Copy link
Contributor

@avantgardnerio do you mean changing CreateExternalTable::schema from DFSchemaRef to Optional<DFSchemaRef>?

@avantgardnerio
Copy link
Contributor Author

I meant as a concept in general - not using "" or 0 or [] to mean None. But specifically here I meant

pub trait TableProviderFactory: Sync + Send {
    /// Create a TableProvider given name and url
    fn create(&self, name: &str, url: &str, schema: Option<DfSchemaRef>) -> Arc<dyn TableProvider>;
}

vs

pub trait TableProviderFactory: Sync + Send {
    /// Create a TableProvider given name and url
    /// 
    /// Arguments:
    /// schema - the schema if known, otherwise an empty schema object to be populated later
    fn create(&self, name: &str, url: &str, schema: DfSchemaRef) -> Arc<dyn TableProvider>;
}

@milenkovicm
Copy link
Contributor

@avantgardnerio please have a look at #4142 should support what we've discussed. thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants