Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs:Add an example of CTAS with PARTITIONED BY (fix #3854) #4606

Closed
wants to merge 3 commits into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 10 additions & 0 deletions docs/spark/spark-ddl.md
Original file line number Diff line number Diff line change
Expand Up @@ -99,6 +99,16 @@ USING iceberg
AS SELECT ...
```

The newly created table won't inherit the partition spec and table properties from the source table in SELECT, you can use PARTITIONED BY and TBLPROPERTIES in CTAS to declare partition spec and table properties for the new table.

```sql
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move the paragraph below up here so it's before the sql snippet? Then I think this is good to merge 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@samredai sure, here is what it looks like now

image

CREATE TABLE prod.db.sample
USING iceberg
PARTITIONED BY (part)
TBLPROPERTIES ('key'='value')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documentation should start with simple examples and add more complexity if needed.

In this case, I think it is valid to add a PARTITIONED BY example, but not by modifying the first simple example. Please add the new paragraph and a separate example after the existing introduction to CTAS.

AS SELECT ...
Comment on lines +107 to +109
Copy link
Contributor

@kbendick kbendick Apr 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would make more sense to either:

  1. Update the existing CTAS statement above to use partitioning plus tblproperties.
  2. Add a comment in between the two blocks indicating what's different between the two.

I would prefer (1) given that the CTAS syntax is not iceberg specific, this is just covering more Spark DML.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi @kbendick, thx for the advice. I made some changes base on ur suggestion, here is what it looks like now
cc @samredai

image

```

## `REPLACE TABLE ... AS SELECT`

Iceberg supports RTAS as an atomic operation when using a [`SparkCatalog`](../spark-configuration#catalog-configuration). RTAS is supported, but is not atomic when using [`SparkSessionCatalog`](../spark-configuration#replacing-the-session-catalog).
Expand Down