Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-34011][SQL] Refresh cache in ALTER TABLE .. RENAME TO PARTITION #31044

Closed
wants to merge 3 commits into from

Conversation

MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Jan 5, 2021

What changes were proposed in this pull request?

  1. Invoke refreshTable() from AlterTableRenamePartitionCommand.run() after partitions renaming. In particular, this re-creates the cache associated with the modified table.
  2. Refresh the cache associated with tables from v2 table catalogs in the ALTER TABLE .. RENAME TO PARTITION command.

Why are the changes needed?

This fixes the issues portrayed by the example:

spark-sql> CREATE TABLE tbl1 (col0 int, part0 int) USING parquet PARTITIONED BY (part0);
spark-sql> INSERT INTO tbl1 PARTITION (part0=0) SELECT 0;
spark-sql> INSERT INTO tbl1 PARTITION (part0=1) SELECT 1;
spark-sql> CACHE TABLE tbl1;
spark-sql> SELECT * FROM tbl1;
0	0
1	1
spark-sql> ALTER TABLE tbl1 PARTITION (part0=0) RENAME TO PARTITION (part=2);
spark-sql> SELECT * FROM tbl1;
0	0
1	1

The last query must not return 0 2 since 0 0 was renamed by previous command.

Does this PR introduce any user-facing change?

Yes. After the changes for the example above:

...
spark-sql> ALTER TABLE tbl1 PARTITION (part=0) RENAME TO PARTITION (part=2);
spark-sql> SELECT * FROM tbl1;
0	2
1	1

How was this patch tested?

By running the affected test suite:

$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *AlterTableRenamePartitionSuite"

@MaxGekk
Copy link
Member Author

MaxGekk commented Jan 5, 2021

@dongjoon-hyun @sunchao @imback82 Could you review this PR, please.

@SparkQA
Copy link

SparkQA commented Jan 5, 2021

Test build #133680 has finished for PR 31044 at commit 1922776.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@sunchao sunchao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (non-binding). Thanks @MaxGekk !

@github-actions github-actions bot added the SQL label Jan 5, 2021
@HyukjinKwon
Copy link
Member

HyukjinKwon commented Jan 6, 2021

Merged to master.

@HyukjinKwon
Copy link
Member

@MaxGekk, it has a conflict. Do you mind opening a backport PR?

MaxGekk added a commit to MaxGekk/spark that referenced this pull request Jan 6, 2021
1. Invoke `refreshTable()` from `AlterTableRenamePartitionCommand.run()` after partitions renaming. In particular, this re-creates the cache associated with the modified table.
2. Refresh the cache associated with tables from v2 table catalogs in the `ALTER TABLE .. RENAME TO PARTITION` command.

This fixes the issues portrayed by the example:
```sql
spark-sql> CREATE TABLE tbl1 (col0 int, part0 int) USING parquet PARTITIONED BY (part0);
spark-sql> INSERT INTO tbl1 PARTITION (part0=0) SELECT 0;
spark-sql> INSERT INTO tbl1 PARTITION (part0=1) SELECT 1;
spark-sql> CACHE TABLE tbl1;
spark-sql> SELECT * FROM tbl1;
0	0
1	1
spark-sql> ALTER TABLE tbl1 PARTITION (part0=0) RENAME TO PARTITION (part=2);
spark-sql> SELECT * FROM tbl1;
0	0
1	1
```
The last query must not return `0	2` since `0  0` was renamed by previous command.

Yes. After the changes for the example above:
```sql
...
spark-sql> ALTER TABLE tbl1 PARTITION (part=0) RENAME TO PARTITION (part=2);
spark-sql> SELECT * FROM tbl1;
0	2
1	1
```

By running the affected test suite:
```
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *AlterTableRenamePartitionSuite"
```

Closes apache#31044 from MaxGekk/rename-partition-refresh-cache.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(cherry picked from commit b77d11d)
Signed-off-by: Max Gekk <max.gekk@gmail.com>
@MaxGekk
Copy link
Member Author

MaxGekk commented Jan 6, 2021

@HyukjinKwon Here is the backport to branch-3.1 and branch-3.0: #31060 . I ran new tests in 3.0 locally.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants