Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Implement project for Transform. #264 #269

Closed

Conversation

marvinlanhenke
Copy link
Contributor

@marvinlanhenke marvinlanhenke commented Mar 14, 2024

#264
I just wanted to share this as a starting point - and to get some feedback, thoughts about the approach here.

@liurenjie1024 @Xuanwo
Please take a look if this is a viable solution at all.
I'm quite unsure about the fn transform and the handling of the arrow_array - seems kinda clunky?

Anyway, let me know what you think. best regards.

@marvinlanhenke marvinlanhenke marked this pull request as draft March 14, 2024 19:50
@marvinlanhenke marvinlanhenke changed the title wip: prototype project_transform [WIP] Implement project for Transform. #264 Mar 14, 2024
@marvinlanhenke
Copy link
Contributor Author

@liurenjie1024 @Xuanwo
the main design concerns I want to discuss:

  1. exposing op and literals from expressions. For prototyping I did this directly on the expression - it might be better to implement this on BoundPredicate itself and expose op via pub(crate) from the struct?
  2. Once I receive a PrimitiveLiteral I need to convert it to an arrow_array in order to perform the transformation. The conversion to the arrow array could also be implemented on PrimitiveLiteral itself?

I would really appreciate your thoughts on this

PrimitiveLiteral::Long(v) => func
.transform(Arc::new(Int64Array::from_value(v, 1)))?
.as_any()
.downcast_ref::<Int32Array>()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be Int64Array?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sdd
if i understood correctly the fn transform implementation of Bucket always returns arrow_array::Int32Array.
https://github.com/apache/iceberg-rust/blob/main/crates/iceberg/src/transform/bucket.rs#L89

@marvinlanhenke
Copy link
Contributor Author

@liurenjie1024 @sdd @Xuanwo @ZENOTME
I went on and implemented fn project for Transform::Bucket with some design assumptions. PTAL and tell me what you think, before we can move on and implement the missing transforms.
Thank you so much for your effort

@ZENOTME
Copy link
Contributor

ZENOTME commented Mar 18, 2024

Thanks for this job!

I'm quite unsure about the fn transform and the handling of the arrow_array - seems kinda clunky?

I think the transform can provide an interface like transform_literal later.

pub trait TransformFunction: Send {
    /// transform will take an input array and transform it into a new array.
    /// The implementation of this function will need to check and downcast the input to specific
    /// type.
    fn transform(&self, input: ArrayRef) -> Result<ArrayRef>;
+ fn transform_literal(&self, literal:Literal) -> Result<Literal>; 
}

self.op
}

pub(crate) fn literals(&self) -> &FnvHashSet<Datum> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pub(crate) fn literals(&self) -> &FnvHashSet<Datum> {
pub(crate) fn literals(&self) -> impl Iterator<Item=Datum> {

How about returning iterator only? It's better not expose the inner data structure.

@liurenjie1024
Copy link
Collaborator

Hi, @marvinlanhenke This pr looks great to me. But I think @ZENOTME is right, we should implement #283 first. The reason we implement transform on arrow array could be found here

@ZENOTME
Copy link
Contributor

ZENOTME commented Mar 19, 2024

Hi, @marvinlanhenke This pr looks great to me. But I think @ZENOTME is right, we should implement #283 first. The reason we implement transform on arrow array could be found here

I will init the interface as soon as possible to avoid blocking this.

@marvinlanhenke
Copy link
Contributor Author

Hi, @marvinlanhenke This pr looks great to me. But I think @ZENOTME is right, we should implement #283 first. The reason we implement transform on arrow array could be found here

Thank you for providing the reason why we use arrow array in the first place.

@marvinlanhenke
Copy link
Contributor Author

Hi, @marvinlanhenke This pr looks great to me. But I think @ZENOTME is right, we should implement #283 first. The reason we implement transform on arrow array could be found here

I will init the interface as soon as possible to avoid blocking this.

Sounds good to me

@marvinlanhenke marvinlanhenke deleted the wip_project_transform branch April 23, 2024 04:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants