Skip to content

Comments

[SPARK-46370][SQL] Fix bug when querying from table after changing column defaults#44302

Closed
dtenedor wants to merge 3 commits intoapache:masterfrom
dtenedor:fix-default-bug
Closed

[SPARK-46370][SQL] Fix bug when querying from table after changing column defaults#44302
dtenedor wants to merge 3 commits intoapache:masterfrom
dtenedor:fix-default-bug

Conversation

@dtenedor
Copy link
Contributor

What changes were proposed in this pull request?

This PR fixes a bug when querying from table after changing defaults:

drop table if exists t;
create table t(i int, s string default 'def') using parquet;
insert into t select 1, default;
alter table t alter column s drop default;
insert into t select 2, default;
select * from t;  -- Removing this line changes the following results!
alter table t alter column s set default 'mno';
insert into t select 3, default;
select * from t;

The bug is related to the relation cache, and the fix involves adding a manual refresh to the cache to make sure we use the right table schema.

Why are the changes needed?

This PR fixes a correctness bug.

Does this PR introduce any user-facing change?

Yes, see above.

How was this patch tested?

This PR adds test coverage.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label Dec 11, 2023
@dtenedor
Copy link
Contributor Author

cc @cloud-fan

// TODO: support change column name/dataType/metadata/position.
override def run(sparkSession: SparkSession): Seq[Row] = {
val catalog = sparkSession.sessionState.catalog
catalog.refreshTable(tableName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's add some code comments to explain the reason: this command may change column default values and we need to refresh the table relation cache so that DML commands can resolve default values correctly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is done.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we call catalog.refreshTable(tableName) after check the default values changed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @beliefer thanks for your comment, I think we have to check it before calling catalog.getTableRawMetadata so that method performs a catalog lookup as expected rather than reading from the cache (which may be stale). There's only one call to catalog.getTableRawMetadata in this function so we should be OK here, but thanks for keeping this question in mind :)

@cloud-fan
Copy link
Contributor

Can you retrigger the test? The failed test seems unrelated.

Copy link
Contributor

@beliefer beliefer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except one comment.

// TODO: support change column name/dataType/metadata/position.
override def run(sparkSession: SparkSession): Seq[Row] = {
val catalog = sparkSession.sessionState.catalog
catalog.refreshTable(tableName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we call catalog.refreshTable(tableName) after check the default values changed?

@dtenedor
Copy link
Contributor Author

Can you retrigger the test? The failed test seems unrelated.

@cloud-fan I started this, it is in progress now:

image

@dtenedor
Copy link
Contributor Author

@cloud-fan all tests are passing now:
image

@cloud-fan
Copy link
Contributor

thanks, merging to master/3.5!

@cloud-fan cloud-fan closed this in b035bb1 Dec 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants