Skip to content

Conversation

@jayzhan211
Copy link
Contributor

Which issue does this PR close?

Closes #10703.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
@github-actions github-actions bot added logical-expr Logical plan and expressions optimizer Optimizer rules core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels May 29, 2024
pub mod unwrap_cast_in_comparison;
pub mod utils;

#[cfg(test)]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not only integration test but also datafusion-example

}

#[cfg(test)]
mod tests {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to slt

@jayzhan211 jayzhan211 marked this pull request as ready for review May 29, 2024 15:39
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jayzhan211 -- this looks very cool. THank you for breaking the dependency

Rather than adding a new argument to all optimizer sites, which will cause API churn, what do you think about adding a new (backwards compatible) API to OptimizerConfig?

Perhaps something like

trait OptimizerConfig {
...
  /// return the function registry for looking up functions
  /// returns None by default.
  /// Used for optimizations like `distinct ON` --> first_value
  fn function_registry(&self) -> Option<&dyn FunctionRegistry> {
    None
  } 
}

The only downside I can see to providing the entire registry is that it then adds an implicit dependency on whatever is registered as "first_value" that should probably be documented

A way to make that more explicit might be something like

trait OptimizerConfig {
...
  /// return the function for first_value, if known. This function will be
  /// used to rewrite DISTINCT ON if present
  fn first_value(&self) -> Option<AggregateUDF> {
    None
  } 
}

🤔

let aggr_expr = select_expr
.into_iter()
.map(|e| first_value(vec![e], false, None, sort_expr.clone(), None));
let aggr_expr = select_expr.into_iter().map(|e| {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this change is the reason for this PR I think -- to avoid the hard coded dependency on first_value...

45 15673
-72 -11122

# test distinct on
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@jayzhan211
Copy link
Contributor Author

I would prefer the former to avoid adding too many function in optimize config trait

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
@jayzhan211 jayzhan211 force-pushed the mv-funcregistry-to-common branch from 1414b15 to 62d59a2 Compare May 31, 2024 01:35
@jayzhan211 jayzhan211 marked this pull request as draft May 31, 2024 02:30
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
@alamb
Copy link
Contributor

alamb commented May 31, 2024

Looks very nice 🆗

I think we can also remove the dependency from cargo now as well:

datafusion-functions-aggregate = { workspace = true }

@jayzhan211
Copy link
Contributor Author

Looks very nice 🆗

I think we can also remove the dependency from cargo now as well:

datafusion-functions-aggregate = { workspace = true }

I totally forgot my main purpose of this PR :(

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
chrono = { workspace = true }
datafusion-common = { workspace = true, default-features = true }
datafusion-expr = { workspace = true }
datafusion-functions-aggregate = { workspace = true }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
@jayzhan211 jayzhan211 marked this pull request as ready for review May 31, 2024 14:45
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jayzhan211

@jayzhan211
Copy link
Contributor Author

Thanks @alamb

@jayzhan211 jayzhan211 merged commit 7638a26 into apache:main Jun 1, 2024
findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
…pache#10714)

* mv function registry to expr

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* registry move to config trait

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* fix test

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* fix test

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* rm dependency

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* fix cli cargo lock

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

---------

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate logical-expr Logical plan and expressions optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Decouple builtin aggregate function first_value from optimize rule replace_distinct_aggregate

2 participants