Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement project for Transform. #264

Closed
Tracked by #153
liurenjie1024 opened this issue Mar 14, 2024 · 8 comments · Fixed by #309
Closed
Tracked by #153

Implement project for Transform. #264

liurenjie1024 opened this issue Mar 14, 2024 · 8 comments · Fixed by #309
Milestone

Comments

@liurenjie1024
Copy link
Collaborator

liurenjie1024 commented Mar 14, 2024

See java implementation.

What's project used for? Say we have row filter a = 10, and we have a partition spec bucket(a, 37) as bs, if one row matches a=10, then its partition value should match bucket(10, 37) as bs, and we project a=10 to bs = bucket(10, 37).

Blocked by #283

@marvinlanhenke
Copy link
Contributor

@liurenjie1024
...just to make sure I'm on the right track - and to confirm the idea.

Here is what I would (high-level):

  • implement fn project on Transform
  • match Transform & orginal BoundPredicate
  • handle each Transform (e.g. Bucket) and Predicate according to Java Implementation
  • return Option<Predicate>

In order to implement the ProjectionEvaluator later, we can then get the PartitionSpec from the manifest, iterate over fields: Vec<PartitionField> and call fn project(...) on each field's transform...

some pseudo-implementation to illustrate:

impl Transform {
// ...
pub fn project(&self, name: String, pred: &BoundPredicate) -> Option<Predicate> {
        match self {
            Transform::Bucket(_) => match pred {
                BoundPredicate::Unary(expr) => Some(Predicate::Unary(UnaryExpression::new(
                    expr.op(),
                    Reference::new(name),
                ))),
                _ => unimplemented!(),
            },
            _ => unimplemented!(),
        }
    }
}
#[test]
fn test_bucket_project() {
    let trans = Transform::Bucket(8);

    let name = "projected_name".to_string();

    let field = NestedField::required(1, "a", Type::Primitive(PrimitiveType::Int));

    let pred = BoundPredicate::Unary(UnaryExpression::new(
        PredicateOperator::IsNull,
        BoundReference::new("original_name", Arc::new(field)),
    ));

    let result = trans.project(name, &pred).unwrap();

    assert_eq!(format!("{result}"), "projected_name IS NULL");
}

@liurenjie1024
Copy link
Collaborator Author

@marvinlanhenke Sorry for late reply. Yeah, this is exactly what I'm thinking about, thanks!

@marvinlanhenke
Copy link
Contributor

thank you, once #283 is ready, I will continue here.

@marvinlanhenke
Copy link
Contributor

@ZENOTME
are you still working on #287 and the proposed interface?
just looking for a quick update on this - and huge thanks for your effort on this.

@ZENOTME
Copy link
Contributor

ZENOTME commented Mar 25, 2024

@ZENOTME are you still working on #287 and the proposed interface? just looking for a quick update on this - and huge thanks for your effort on this.

Yes. I will update it later.

@marvinlanhenke
Copy link
Contributor

marvinlanhenke commented Mar 27, 2024

@liurenjie1024 @ZENOTME
...the fn project(...) is getting kinda extensive (lengthy). I think it'd be appropriate to introduce some helper functions here instead of writing a huge match statement - to make the code more readable and reduce duplication.

Any suggestions/ preferences where to put those?

  • mod.rs in iceberg/src/transform
  • create utils.rs in iceberg/src/spec
  • create fns on impl Transform itself

thanks for your thoughts on this

EDIT:
For now, I decided to go with option no. 3, which I like since the scope of those helper functions is restricted only to Transform - putting those into some utils.rs would not be justified, since nobody else woud need those fns...

@liurenjie1024
Copy link
Collaborator Author

@liurenjie1024 @ZENOTME ...the fn project(...) is getting kinda extensive (lengthy). I think it'd be appropriate to introduce some helper functions here instead of writing a huge match statement - to make the code more readable and reduce duplication.

Any suggestions/ preferences where to put those?

  • mod.rs in iceberg/src/transform
  • create utils.rs in iceberg/src/spec
  • create fns on impl Transform itself

thanks for your thoughts on this

EDIT: For now, I decided to go with option no. 3, which I like since the scope of those helper functions is restricted only to Transform - putting those into some utils.rs would not be justified, since nobody else woud need those fns...

Hi, @marvinlanhenke Thanks for raising this. I prefer option no. 3 too with similar reason: putting related codes together helps to organize things better.

@ZENOTME
Copy link
Contributor

ZENOTME commented Mar 28, 2024

@liurenjie1024 @ZENOTME ...the fn project(...) is getting kinda extensive (lengthy). I think it'd be appropriate to introduce some helper functions here instead of writing a huge match statement - to make the code more readable and reduce duplication.

Any suggestions/ preferences where to put those?

  • mod.rs in iceberg/src/transform
  • create utils.rs in iceberg/src/spec
  • create fns on impl Transform itself

thanks for your thoughts on this

EDIT: For now, I decided to go with option no. 3, which I like since the scope of those helper functions is restricted only to Transform - putting those into some utils.rs would not be justified, since nobody else woud need those fns...

+1 for no. 3 too

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

3 participants