Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add show external tables #3279

Merged
merged 4 commits into from
Sep 7, 2022
Merged

Add show external tables #3279

merged 4 commits into from
Sep 7, 2022

Conversation

psvri
Copy link
Contributor

@psvri psvri commented Aug 27, 2022

Which issue does this PR close?

Closes #2848.

Rationale for this change

Show create table works for external tables

What changes are included in this PR?

Show create table now works with external tables

Are there any user-facing changes?

Yes

@github-actions github-actions bot added core Core datafusion crate logical-expr Logical plan and expressions sql labels Aug 27, 2022
@codecov-commenter
Copy link

codecov-commenter commented Aug 27, 2022

Codecov Report

Merging #3279 (5cc896d) into master (92110dd) will decrease coverage by 0.42%.
The diff coverage is 80.72%.

@@            Coverage Diff             @@
##           master    #3279      +/-   ##
==========================================
- Coverage   85.89%   85.47%   -0.43%     
==========================================
  Files         294      294              
  Lines       53373    54099     +726     
==========================================
+ Hits        45845    46241     +396     
- Misses       7528     7858     +330     
Impacted Files Coverage Δ
datafusion/expr/src/logical_plan/plan.rs 77.37% <ø> (-1.36%) ⬇️
datafusion/proto/src/logical_plan.rs 17.12% <0.00%> (-0.24%) ⬇️
datafusion/core/src/execution/context.rs 78.87% <71.42%> (+0.63%) ⬆️
...fusion/optimizer/src/pre_cast_lit_in_comparison.rs 94.00% <85.71%> (+10.67%) ⬆️
datafusion/core/src/datasource/listing/table.rs 90.21% <100.00%> (+1.08%) ⬆️
datafusion/core/tests/sql/information_schema.rs 98.60% <100.00%> (+0.05%) ⬆️
datafusion/sql/src/parser.rs 84.38% <100.00%> (+0.41%) ⬆️
datafusion/sql/src/planner.rs 80.36% <100.00%> (-0.30%) ⬇️
datafusion/expr/src/expr_visitor.rs 53.68% <0.00%> (-10.07%) ⬇️
datafusion/expr/src/expr.rs 75.83% <0.00%> (-9.57%) ⬇️
... and 46 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@liukun4515
Copy link
Contributor

Thanks @psvri
Is there any SQL syntax for this function?
It's better to add doc for this feature and it's friendly for reviewers

@psvri
Copy link
Contributor Author

psvri commented Aug 27, 2022

This dosent add new syntax.
Previous create show table <external_table> was not showing the table definition.
This PR is intended to fix that.

Copy link
Contributor

@avantgardnerio avantgardnerio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason for the syntax being show create external table vs just show external table? Show Create has two verbs in one sentence and seems like a confusing choice.

Also, I'm wary of passing duplicated information (the raw SQL + the result of that raw SQL) - it seems better to store the information once (in the registered table), and regenerate the SQL from that in a normalized fashion.

@avantgardnerio
Copy link
Contributor

Is there a reason for the syntax being

Oh, interesting, I see this follows the spark precedent.

@liukun4515
Copy link
Contributor

This dosent add new syntax. Previous create show table <external_table> was not showing the table definition. This PR is intended to fix that.

Can you add some SQL level test for this fix? @psvri

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay in reviewing @psvri -- I have been away. We are working on increasing the review capacity for this project as well as those who can commit PRs

All in all, I really like the feature -- I have one style suggestion

"+---------------+--------------+------------+-------------------------------------------------------------------------------------------------+",
"| table_catalog | table_schema | table_name | definition |",
"+---------------+--------------+------------+-------------------------------------------------------------------------------------------------+",
"| datafusion | public | abc | CREATE EXTERNAL TABLE abc STORED AS CSV LOCATION ../../testing/data/csv/aggregate_test_100.csv |",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️ this is great

pub fn try_new(config: ListingTableConfig) -> Result<Self> {
pub fn try_new(
config: ListingTableConfig,
definition: Option<String>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only thing I would prefer to change in this PR. Specifically, rather than change the API of try_new I think we could add a function like

pub fn with_definition(mut self, defintion: Option<String>) -> Self {
...
}

This change I think would reduce the size of this PR significantly

Since I realize this PR has been waiting for a week for feedback I will propose this change via PR

@alamb
Copy link
Contributor

alamb commented Sep 3, 2022

Here is my proposed change (targeting this PR) : psvri#1

Use builder-style API to set ListingTable definition
@psvri
Copy link
Contributor Author

psvri commented Sep 3, 2022

NP @alamb

I too was on a break for a while. Thanks for PR. I will merge it.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @psvri

@alamb
Copy link
Contributor

alamb commented Sep 4, 2022

@avantgardnerio you had an outstanding comment on this PR, I believe -- can you please review again and see if this meets your expectations now?

@avantgardnerio
Copy link
Contributor

review again and see if this meets your expectations now?

@alamb and @psvri , my concern was regarding the implementation based upon retaining the original raw SQL string. The issue this is resolving had a suggested implementation:

This will probably require implementing Display for DFStatement to convert it back into SQL, or generating SQL in register_listing_table.

And that was more in line with my thoughts:

  • we parse it into fields, and store those fields
  • we also store the original parse string

This seems strange to me, in that it will carry over irrelevant syntax differences, and also stores redundant information. When I ask postgres to show a table, I don't get my raw SQL back, I get back generated SQL that matches the definition postgres has an internal representation for.

It bothers me, but I'm not sure if I'm being pedantic I'll leave it to you @alamb .

@alamb
Copy link
Contributor

alamb commented Sep 5, 2022

This seems strange to me, in that it will carry over irrelevant syntax differences, and also stores redundant information. When I ask postgres to show a table, I don't get my raw SQL back, I get back generated SQL that matches the definition postgres has an internal representation for.

I agree that most other systems store some sort of "canonicalized" view definition. The sqlparser-rs code offers that the output of Display can be parsed back into the same structure, so I think we could simply use that property (or potentially store the view as a serialized LogicalPlan).

It bothers me, but I'm not sure if I'm being pedantic I'll leave it to you @alamb .

I think the question of "how do we store a view definition" is somewhat orthogonal to the feature in this PR to show that view definition (though they are of course related!) and I think we can handle it in a follow on

Thus, I plan to file an issue where we can discus the "what to use / how to store a view definition" issue and then merge this PR.

This may also be related to the discussion we are having about view implementation on #3249 (comment)

@alamb
Copy link
Contributor

alamb commented Sep 5, 2022

🤔 so I actually tried out 'show create table' with a view and it already does canonicalize the SQL, it seems:

DataFusion CLI v11.0.0
❯ create table t as select * from (values (1), (2), (3)) as sq;
+---------+
| column1 |
+---------+
| 1       |
| 2       |
| 3       |
+---------+
3 rows in set. Query took 0.050 seconds.
❯ create view v as select column1 /*my random comment*/ from t where column1 != 1;
0 rows in set. Query took 0.001 seconds.
❯ show create table v;
+---------------+--------------+------------+-----------------------------------------------------------+
| table_catalog | table_schema | table_name | definition                                                |
+---------------+--------------+------------+-----------------------------------------------------------+
| datafusion    | public       | v          | CREATE VIEW v AS SELECT column1 FROM t WHERE column1 <> 1 |
+---------------+--------------+------------+-----------------------------------------------------------+
1 row in set. Query took 0.015 seconds.

(note the lack of /*my random comment*/ and the transformation of != to <>)

I believe this means that DataFusion is storing the canonicalized SQL string from sqlparser

@alamb
Copy link
Contributor

alamb commented Sep 5, 2022

Ok, so now I am really confused. When I tried to run show create table locally for an external table in the datafusion-cli that didn't work 🤔

❯ CREATE EXTERNAL TABLE abc STORED AS /*random comment*/ CSV LOCATION '../testing/data/csv/aggregate_test_100.csv';
0 rows in set. Query took 0.063 seconds.
❯ show create table abc;
+---------------+--------------+------------+------------+
| table_catalog | table_schema | table_name | definition |
+---------------+--------------+------------+------------+
| datafusion    | public       | abc        |            |
+---------------+--------------+------------+------------+

@@ -443,6 +443,7 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
&self,
statement: CreateExternalTable,
) -> Result<LogicalPlan> {
let definition = Some(statement.to_string());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This to_string() call converts the parsed statement into a string.

Copy link
Contributor

@avantgardnerio avantgardnerio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@andygrove
Copy link
Member

Thanks @psvri. Could you rebase or upmerge to fix the conflicts?

@andygrove andygrove merged commit 0084aeb into apache:master Sep 7, 2022
@ursabot
Copy link

ursabot commented Sep 7, 2022

Benchmark runs are scheduled for baseline = 7c04964 and contender = 0084aeb. 0084aeb is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core datafusion crate logical-expr Logical plan and expressions sql
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement "SHOW CREATE TABLE" for external tables
7 participants