-
Notifications
You must be signed in to change notification settings - Fork 362
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(python, rust): add column
operation
#2562
base: main
Are you sure you want to change the base?
Conversation
add column
operationadd column
operation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is overall a good idea to add 👏
@@ -137,10 +137,10 @@ pub struct Protocol { | |||
|
|||
impl Protocol { | |||
/// Create a new protocol action | |||
pub fn new(min_reader_version: i32, min_wrriter_version: i32) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😆
commit_properties: CommitProperties, | ||
} | ||
|
||
impl super::Operation<()> for AddColumnBuilder {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
dt = DeltaTable("test_table") | ||
new_fields = [ | ||
Field("baz", StructType([Field("bar", PrimitiveType("integer"))])), | ||
Field("bar", PrimitiveType("integer")) | ||
] | ||
dt.alter.add_column( | ||
new_fields | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is taking a series of fields, I feel that it should be called add_columns
to be more clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point!
let new_fields = fields | ||
.iter() | ||
.map(|v| v.inner.clone()) | ||
.collect::<Vec<StructField>>(); | ||
|
||
cmd = cmd.with_fields(new_fields); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since fields
is owned, can this just consume the value? e.g.
cmd = cmd.with_fields(fields.into_iter().map(|v| v.inner)))
(or some such nonsense 🤷 )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I'll have to try, reason I cloned is because StructField didn't have the Copy Trait
} | ||
|
||
/// Specify the fields to be added | ||
pub fn with_fields(mut self, fields: Vec<StructField>) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend making fields an impl IntoIter
sort of thing for more flexibility by the caller
if protocol.min_reader_version == 3 && protocol.min_writer_version == 7 { | ||
let mut new_protocol = protocol.clone(); | ||
new_protocol = new_protocol | ||
.with_reader_features(vec![ReaderFeatures::TimestampWithoutTimezone]); | ||
new_protocol = new_protocol | ||
.with_writer_features(vec![WriterFeatures::TimestampWithoutTimezone]); | ||
Some(new_protocol) | ||
} else { | ||
let new_protocol = Protocol { | ||
min_reader_version: 3, | ||
min_writer_version: 7, | ||
writer_features: Some(hashset! {WriterFeatures::TimestampWithoutTimezone}), | ||
reader_features: Some(hashset! {ReaderFeatures::TimestampWithoutTimezone}), | ||
}; | ||
// Convert existing properties to features since we advance the protocol to v3,7 | ||
Some(convert_properties_to_features( | ||
new_protocol, | ||
&metadata.configuration, | ||
)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code looks really familiar to me, don't we have something similar in update and merge code paths? Part of the middleware thinking I need to return to is to handle this sorts of protocol evolutions.
Would it be possible to at least consolidate this checking code into a shared function in operations/mod.rs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this started to bother me as well, in multiple locations we need to check and evolve the schema. I'll do another round on Thursday and see if I can do some minor refactoring across our API
fields: fields.into_iter().collect_vec(), | ||
}; | ||
|
||
metadata.schema_string = serde_json::to_string(&new_table_schema)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just thinking out loud here, not feedback on this pull request, but implementing TryInto for StructType would probably be handle for us
Hi, it's really a good idea to add such operation. Is there any update on this? |
Description
The schema evolution code is quite well abstracted from the writer, so it seemed straight forward to expose this through an
add column
api. This would make it easier for users to add new columns or fields in structs.At some point we can add type widening to the schema evolution as a separate path, which also than could be used to create an
alter column
operation.