Skip to content

Conversation

@zhongyujiang
Copy link
Contributor

No description provided.

@github-actions github-actions bot added the docs label Jun 9, 2022
@rdblue
Copy link
Contributor

rdblue commented Jun 12, 2022

@zhongyujiang, it sounds like this is a regression, not a documentation problem do you know what the issue is?

@szehon-ho, any ideas?

@zhongyujiang
Copy link
Contributor Author

@rdblue I think it's just a doc bug, SET NOT NULL was not supported before either due to restrictions from Spark. There is a test for this actually:

public void testAlterColumnSetNotNull() {
// no-op changes are allowed
sql("ALTER TABLE %s ALTER COLUMN id SET NOT NULL", tableName);
Types.StructType expectedSchema = Types.StructType.of(
NestedField.required(1, "id", Types.LongType.get()),
NestedField.optional(2, "data", Types.StringType.get()));
Assert.assertEquals("Schema should match expected",
expectedSchema, validationCatalog.loadTable(tableIdent).schema().asStruct());
AssertHelpers.assertThrows("Should reject adding NOT NULL constraint to an optional column",
AnalysisException.class, "Cannot change nullable column to non-nullable: data",
() -> sql("ALTER TABLE %s ALTER COLUMN data SET NOT NULL", tableName));
}

@rdblue
Copy link
Contributor

rdblue commented Jun 13, 2022

I see. The problem is that there isn't a way to override the check from SQL. Iceberg does allow you to make incompatible changes if you decide to "force" the change. But the SQL API doesn't have a way to do that an will always fail.

Looks like Postgres supports this by immediately checking the table to see if there are any null values. We should be able to use a metadata check to do the same, assuming that the column's metadata collection is not set to none.

@szehon-ho
Copy link
Member

Yea, that sounds like a cool feature to have

@zhongyujiang
Copy link
Contributor Author

+1 Sounds like a cool feature!

@zhongyujiang
Copy link
Contributor Author

Hey @szehon-ho @rdblue, could you please help review this when you have time? Thanks. (I've seen users misled by this)

@zhongyujiang
Copy link
Contributor Author

zhongyujiang commented Jul 14, 2023

Looks like Postgres supports this by immediately checking the table to see if there are any null values. We should be able to use a metadata check to do the same, assuming that the column's metadata collection is not set to none.

This is cool and I'd like to give a try on this.
I did some investigation, I think we may need to check the metadata of all files (including deleted data file), because CDC read workload may read deleted files, we also need to check the fields in the equality delete file, because the equality fields are derived from the table schema. This ensures that not-null constraint updates are totally compatible. What do you think ? @szehon-ho @rdblue

@zhongyujiang
Copy link
Contributor Author

Close this as the doc has been fixed by another PR.

@zhongyujiang zhongyujiang deleted the doc-set-not-null branch November 24, 2023 15:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants