Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SHOW CREATE TABLE with initial support for views #2830

Merged
merged 8 commits into from
Jul 7, 2022

Conversation

mrob95
Copy link
Contributor

@mrob95 mrob95 commented Jul 2, 2022

Which issue does this PR close?

Closes #2529 .

Rationale for this change

What changes are included in this PR?

  • Add the "SHOW CREATE TABLE" SQL command. It was suggested in Implement some way to show the sql used to create a view #2529 to use "SHOW CREATE VIEW" as MySQL does, but this is not supported by sqlparser-rs. "SHOW CREATE TABLE" is what spark uses for both tables and views and using the same command for both seems simpler than having separate ones.
  • Pass SQL queries through SqlToRel::statement_to_plan and LogicalPlan::CreateView to be stored on ViewTable, which is registered in the catalog.
  • Add a new create_statement method to the TableProvider trait that allows table providers to optionally expose the SQL query used to create them.
  • Add LogicalPlan::ShowCreateTable and ShowCreateTableExec to look the table up, get the TableProvider, get the create_statement and output it.
  • Add create_statement to information_schema.tables
  • Rewrite "SHOW CREATE TABLE" queries to select from information_schema.tables

This PR only supports views at the moment, but is designed to hopefully make it fairly easy to extend to tables and potentially other data sources. They would just need to implement create_statement.

Passing the SQL string down through SqlToRel::statement_to_plan alongside the parsed Statement seems a little awkward. An alternative would be to use format!("{}", statement) which would produce valid SQL, but not the exact SQL that was used to create the table/view. It doesn't seem like either MySQL or spark try to guarantee that the SQL return from SHOW.. is literally the same string as was used in CREATE. Would be good to get some opinions on this.

Would be happy to take feedback on the structure of this if there is a better way of doing it - I spent a while reading through CreateView, Explain, Analyze etc to figure out where exactly the execution of this should be handled but may have missed something :). I think the main question is whether to put the creation statement in information_schema or to have "SHOW CREATE TABLE" get it from the table provider directly. I like having it in information_schema because it makes the implementation of "SHOW CREATE TABLE" simple and because exposing info in tables allows the user to slice and dice them as they wish. The original commit of this PR did it the other way though (with new logical and physical plan nodes for ShowCreateTable), so could easily revert if that is better.

Are there any user-facing changes?

@github-actions github-actions bot added core Core DataFusion crate logical-expr Logical plan and expressions optimizer Optimizer rules sql SQL Planner labels Jul 2, 2022
@codecov-commenter
Copy link

codecov-commenter commented Jul 2, 2022

Codecov Report

Merging #2830 (40ba555) into master (88b88d4) will decrease coverage by 0.04%.
The diff coverage is 91.88%.

❗ Current head 40ba555 differs from pull request most recent head 8f1141d. Consider uploading reports for the commit 8f1141d to get more accurate results

@@            Coverage Diff             @@
##           master    #2830      +/-   ##
==========================================
- Coverage   85.26%   85.21%   -0.05%     
==========================================
  Files         275      275              
  Lines       48830    48907      +77     
==========================================
+ Hits        41633    41676      +43     
- Misses       7197     7231      +34     
Impacted Files Coverage Δ
datafusion/core/src/catalog/schema.rs 84.48% <ø> (-0.52%) ⬇️
...afusion/core/src/physical_plan/file_format/avro.rs 0.00% <0.00%> (ø)
datafusion/core/src/physical_plan/mod.rs 88.00% <ø> (ø)
datafusion/expr/src/logical_plan/plan.rs 74.50% <ø> (+0.39%) ⬆️
datafusion/proto/src/logical_plan.rs 17.46% <0.00%> (-0.18%) ⬇️
datafusion/core/src/datasource/file_format/avro.rs 61.53% <50.00%> (-8.03%) ⬇️
datafusion/core/src/datasource/file_format/json.rs 93.75% <64.28%> (-5.13%) ⬇️
datafusion/common/src/error.rs 80.00% <66.66%> (-2.28%) ⬇️
datafusion/core/src/datasource/listing/mod.rs 55.55% <66.66%> (+10.10%) ⬆️
datafusion/sql/src/planner.rs 81.38% <73.33%> (-0.11%) ⬇️
... and 50 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 88b88d4...8f1141d. Read the comment docs.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @mrob95 -- this looks like a great start.

I would be happy with merging this PR in and doing any refactors as follow on PRs (like adding support for information_schema.views and removing create_statement column from information_schema.tables).

Let me know if you want to make any changes to this PR otherwise I will merge this PR in and file a follow on ticket.

👍 again 🙏

Comment 1

I don't think information_schema.tables.create_statement is the standard location for the view information. I think it is actually in a new information schema table information_table.views. For example, in mysql:

mysql> select table_catalog, table_schema, table_name, view_definition from information_schema.views where table_name = 'ff';
+---------------+--------------+------------+------------------------------------------------+
| TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME | VIEW_DEFINITION                                |
+---------------+--------------+------------+------------------------------------------------+
| def           | foo          | ff         | select count(0) AS `count(*)` from `foo`.`foo` |
+---------------+--------------+------------+------------------------------------------------+
1 row in set (0.00 sec)

Comment 2

Passing the SQL string down through SqlToRel::statement_to_plan alongside the parsed Statement seems a little awkward. An alternative would be to use format!("{}", statement) which would produce valid SQL, but not the exact SQL that was used to create the table/view.

I think using format is a good idea (or perhaps sql.to_string()) -- in general sqlparser-rs has fairly good coverage for round trip parsing (many of the test cases are implemented like that)

@mrob95
Copy link
Contributor Author

mrob95 commented Jul 5, 2022

I think using format is a good idea (or perhaps sql.to_string()) -- in general sqlparser-rs has fairly good coverage for round trip parsing (many of the test cases are implemented like that)

I agree, have made this change now.

I think it is actually in a new information schema table information_table.views. For example, in mysql:

This should be fairly simple to do in a separate PR. I also think it would be useful to implement "SHOW CREATE TABLE" for external tables so you can get a summary of which tables point at which files using what options.

@mrob95
Copy link
Contributor Author

mrob95 commented Jul 5, 2022

Do you think "definition" would be a better name than "create_statement"?

@@ -515,11 +515,16 @@ impl AsLogicalPlan for LogicalPlanNode {
"Protobuf deserialization error, CreateViewNode has invalid LogicalPlan input.",
)))?
.try_into_logical_plan(ctx, extension_codec)?;
let create_statement = match create_view.create_statement.is_empty() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: matching on a bool could be simplified to if/else

@andygrove
Copy link
Member

Do you think "definition" would be a better name than "create_statement"?

I do find create_statement misleading because this method is not creating a statement. How about something like get_table_create_sql or maybe just get_table_sql ?

@mrob95
Copy link
Contributor Author

mrob95 commented Jul 6, 2022

I do find create_statement misleading because this method is not creating a statement. How about something like get_table_create_sql or maybe just get_table_sql ?

That makes sense for the TableProvider method, what about the name of the column in information_schema.tables? definition there to match MySQL?

@andygrove
Copy link
Member

I do find create_statement misleading because this method is not creating a statement. How about something like get_table_create_sql or maybe just get_table_sql ?

That makes sense for the TableProvider method, what about the name of the column in information_schema.tables? definition there to match MySQL?

Yes definition seems reasonable. Maybe the method name should use the same term ... get_table_definition ?

@mrob95
Copy link
Contributor Author

mrob95 commented Jul 6, 2022

Pushed an update

@alamb
Copy link
Contributor

alamb commented Jul 6, 2022

I think this looks good -- if @andygrove agrees I will merge this and file a follow on ticket to add information_schema.views and remove the definition column from tables

@alamb
Copy link
Contributor

alamb commented Jul 6, 2022

Thanks again @mrob95

@mrob95
Copy link
Contributor Author

mrob95 commented Jul 6, 2022

Thanks again @mrob95

My pleasure, thanks for the prompt and thorough reviews 👍

@andygrove
Copy link
Member

LGTM. Thanks @mrob95

@alamb
Copy link
Contributor

alamb commented Jul 8, 2022

Filed #2857 for information_schema.views

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate logical-expr Logical plan and expressions optimizer Optimizer rules sql SQL Planner
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement some way to show the sql used to create a view
4 participants