Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(python, rust): add column operation #2562

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

ion-elgreco
Copy link
Collaborator

@ion-elgreco ion-elgreco commented Jun 2, 2024

Description

The schema evolution code is quite well abstracted from the writer, so it seemed straight forward to expose this through an add column api. This would make it easier for users to add new columns or fields in structs.

At some point we can add type widening to the schema evolution as a separate path, which also than could be used to create an alter column operation.

@github-actions github-actions bot added binding/python Issues for the Python package binding/rust Issues for the Rust crate labels Jun 2, 2024
@ion-elgreco ion-elgreco changed the title feat(python, rust): add column operation feat(python, rust): add column operation Jun 2, 2024
@ion-elgreco ion-elgreco marked this pull request as ready for review June 2, 2024 17:08
Copy link
Member

@rtyler rtyler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is overall a good idea to add 👏

@@ -137,10 +137,10 @@ pub struct Protocol {

impl Protocol {
/// Create a new protocol action
pub fn new(min_reader_version: i32, min_wrriter_version: i32) -> Self {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😆

commit_properties: CommitProperties,
}

impl super::Operation<()> for AddColumnBuilder {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Comment on lines +1847 to +1854
dt = DeltaTable("test_table")
new_fields = [
Field("baz", StructType([Field("bar", PrimitiveType("integer"))])),
Field("bar", PrimitiveType("integer"))
]
dt.alter.add_column(
new_fields
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is taking a series of fields, I feel that it should be called add_columns to be more clear.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point!

Comment on lines +547 to +552
let new_fields = fields
.iter()
.map(|v| v.inner.clone())
.collect::<Vec<StructField>>();

cmd = cmd.with_fields(new_fields);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since fields is owned, can this just consume the value? e.g.

cmd = cmd.with_fields(fields.into_iter().map(|v| v.inner)))

(or some such nonsense 🤷 )

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I'll have to try, reason I cloned is because StructField didn't have the Copy Trait

}

/// Specify the fields to be added
pub fn with_fields(mut self, fields: Vec<StructField>) -> Self {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend making fields an impl IntoIter sort of thing for more flexibility by the caller

Comment on lines +78 to +97
if protocol.min_reader_version == 3 && protocol.min_writer_version == 7 {
let mut new_protocol = protocol.clone();
new_protocol = new_protocol
.with_reader_features(vec![ReaderFeatures::TimestampWithoutTimezone]);
new_protocol = new_protocol
.with_writer_features(vec![WriterFeatures::TimestampWithoutTimezone]);
Some(new_protocol)
} else {
let new_protocol = Protocol {
min_reader_version: 3,
min_writer_version: 7,
writer_features: Some(hashset! {WriterFeatures::TimestampWithoutTimezone}),
reader_features: Some(hashset! {ReaderFeatures::TimestampWithoutTimezone}),
};
// Convert existing properties to features since we advance the protocol to v3,7
Some(convert_properties_to_features(
new_protocol,
&metadata.configuration,
))
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code looks really familiar to me, don't we have something similar in update and merge code paths? Part of the middleware thinking I need to return to is to handle this sorts of protocol evolutions.

Would it be possible to at least consolidate this checking code into a shared function in operations/mod.rs?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this started to bother me as well, in multiple locations we need to check and evolve the schema. I'll do another round on Thursday and see if I can do some minor refactoring across our API

fields: fields.into_iter().collect_vec(),
};

metadata.schema_string = serde_json::to_string(&new_table_schema)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just thinking out loud here, not feedback on this pull request, but implementing TryInto for StructType would probably be handle for us

@nadyr-mg
Copy link

nadyr-mg commented Jul 9, 2024

Hi, it's really a good idea to add such operation. Is there any update on this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
binding/python Issues for the Python package binding/rust Issues for the Rust crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants