Allow listing tables to be created via TableFactories #4112
@andygrove and @alamb sorry for the immediate follow-on PR. I did not realize we could merge Ballista PRs pointed at git refs of DataFusion, or I would have done these two as a single PR.
datafusion-cli/src/main.rs
Outdated
Register tables on startup.
datafusion-cli/src/main.rs
Outdated
@alamb this is getting closer to something I think you asked for earlier: treating ListingTables and custom tables the same way and just registering factories for table types.
Needed to add SessionState to allow ListingTables to load their schema.
Delta tables are always folders, but .csv files should have their extension removed from the table name (unfortunately, the Path method that does this is marked unstable).
Which method? Maybe we can contribute something back upstream to object_store 🤔
https://doc.rust-lang.org/std/path/struct.PathBuf.html#method.file_prefix
For some reason the GitHub UI didn't let me respond in thread until now :/
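A minimal sketch of the naming rule discussed in this thread, using only stable `std::path` methods. The function name `table_name` and the `is_dir` flag are hypothetical and are not taken from the PR. Note that the stable `file_stem` strips only the final extension, whereas the unstable `file_prefix` linked above strips everything after the first dot.

```rust
use std::path::Path;

/// Hypothetical helper (not from the PR): derive a table name from a path.
/// Directories (for example Delta tables) keep their full name,
/// while files lose their final extension.
fn table_name(path: &Path, is_dir: bool) -> Option<String> {
    let name = if is_dir {
        // "data/events" -> "events"
        path.file_name()
    } else {
        // Stable alternative to `file_prefix`: "data/foo.csv" -> "foo",
        // but "archive.tar.gz" -> "archive.tar" (only the last extension goes).
        path.file_stem()
    };
    // Non-UTF-8 names yield None rather than a lossy conversion.
    name.and_then(|s| s.to_str()).map(|s| s.to_string())
}
```

With this sketch, `table_name(Path::new("data/foo.csv"), false)` yields `Some("foo")`, and a folder such as `data/events` keeps the name `events`.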
@andygrove I think you were requesting this feature ☝️
alamb
left a comment
The code looks good to me -- is there any way we can write a test for this functionality?
🤔 it would be really neat to somehow combine the logic in ListingTable and ListingTableFactory (or maybe datafusion-cli could just use the factory -- not sure)
combine the logic
I think this is the duplicated fragment. I would love to combine those two and only have register_table instead of create_listing_table() and create_custom_table().
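The idea of a single registration path driven by factories could be sketched roughly as follows. Everything here (`Table`, `TableFactory`, `NamedFactory`, `Registry`) is a hypothetical stand-in written for illustration; it does not reproduce DataFusion's actual traits or signatures.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a table provider (not DataFusion's type).
#[derive(Debug, PartialEq)]
struct Table {
    kind: String,
    location: String,
}

/// Hypothetical factory trait: one implementation per table type.
trait TableFactory {
    fn create(&self, location: &str) -> Result<Table, String>;
}

/// A trivial factory that tags the table with its format name.
struct NamedFactory {
    format: &'static str,
}

impl TableFactory for NamedFactory {
    fn create(&self, location: &str) -> Result<Table, String> {
        Ok(Table {
            kind: self.format.to_string(),
            location: location.to_string(),
        })
    }
}

/// One registry for built-in and custom table types alike.
struct Registry {
    factories: HashMap<String, Box<dyn TableFactory>>,
}

impl Registry {
    /// Pre-register factories for the default file formats.
    fn with_default_formats() -> Self {
        let mut reg = Registry { factories: HashMap::new() };
        for format in ["CSV", "PARQUET", "JSON", "AVRO"] {
            reg.register_factory(format, Box::new(NamedFactory { format }));
        }
        reg
    }

    /// Register (or replace) the factory for a file type, case-insensitively.
    fn register_factory(&mut self, file_type: &str, factory: Box<dyn TableFactory>) {
        self.factories.insert(file_type.to_uppercase(), factory);
    }

    /// Single creation path: look up the factory, then delegate to it.
    fn create_table(&self, file_type: &str, location: &str) -> Result<Table, String> {
        let factory = self
            .factories
            .get(&file_type.to_uppercase())
            .ok_or_else(|| format!("table factory for file type {} not found", file_type))?;
        factory.create(location)
    }
}
```

In this sketch, listing formats and custom formats differ only in which factory is registered, so the caller needs one `create_table` call instead of separate listing and custom code paths.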
I know tests are somewhat painful to write, but if we don't have them I worry about breaking the functionality at some future time. Maybe making a test as an example in datafusion-examples (https://github.com/apache/arrow-datafusion/tree/master/datafusion-examples/examples) would be a good way to
Done. I also filed this: #4114. It's my own personal fault, so I understand if we don't want to inflict that on everyone.
datafusion-cli/src/main.rs
Outdated
Add TableFactories for default formats.
datafusion-cli/src/object_storage.rs
Outdated
I refactored this test to have a better failure message because it was failing for me locally. The fact that CI passed makes me think it's not running there because datafusion-cli is excluded from the workspace - I don't know why this is, but would propose we include it.
Could you please file a ticket to do so? Thank you!
Moving this code to a common function on the context which we can use from datafusion-cli, tests, Ballista, etc.
makes sense to me
"not found" is a hard error message to Ctrl-F for. Adding the word "table" should make it easier to find.
The proof is in the pudding. Can't select from a table without registering it first, so this must be auto-registered.
alamb
left a comment
Looks great to me -- thank you @avantgardnerio
I also merged this PR to master locally and all the tests still pass for me
I left a few comments, but I don't think any of them are required to merge this PR
    Err(e) => format!("{}", e),
    Ok(_) => "".to_string()
};
assert_eq!("".to_string(), msg); // Fail with error message
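The diff fragment above turns a `Result` into a string so that a failing assertion prints the underlying error instead of a bare panic. A self-contained sketch of that pattern is shown below; the helper name `error_message` is hypothetical and does not appear in the PR.

```rust
use std::fmt::Display;

/// Hypothetical helper: map a Result to its error text, or "" on success.
/// Comparing the returned string against "" makes a failed assertion
/// show the actual error message in the test output.
fn error_message<T, E: Display>(result: Result<T, E>) -> String {
    match result {
        Err(e) => format!("{}", e),
        Ok(_) => "".to_string(),
    }
}
```

A test would then write `assert_eq!("".to_string(), error_message(result));`, which fails with the error text visible in the left/right comparison.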
/// Finds any ListSchemaProviders and instructs them to reload tables from "disk"
/// Invokes `ListingSchemaProvider::reload()` for all registered providers

}

/// Finds any ListSchemaProviders and instructs them to reload tables from "disk"
pub async fn refresh_catalogs(&self) -> Result<()> {
You might also consider using assert_batches_eq here in this test
Then if it's alright with you, I'd love to get it merged rather than wait for another round of CI checks. Thanks for thoroughly reviewing it!
Benchmark runs are scheduled for baseline = c1fc732 and contender = 7b5842b. 7b5842b is a master commit associated with this PR. Results will be available as each benchmark for each run completes.


Which issue does this PR close?
Closes #4111.
Rationale for this change
Described in issue.
What changes are included in this PR?
Described in issue.
Are there any user-facing changes?
Users can register CSV, Parquet, JSON, and Avro files from datafusion-cli.