Skip to content

Conversation

@2010YOUY01
Copy link
Contributor

Which issue does this PR close?

NA

Rationale for this change

I'm planning to add PostgreSQL linear regression aggregate functions. Their implementations are similar, so I'd like to implement regr_slope(y, x) first, and the remaining functions should be easy to extend.

What changes are included in this PR?

Adding regr_slope(y, x) aggregate function, mainly implementing Accumulator for RegrSlopeAccumulator
The linear regression is calculated using COVAR_POP(x,y) / VAR_POP(x)
One-pass and parallel algorithm for covar/var reference: https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Welford's_online_algorithm

Are these changes tested?

All end-to-end tests produced the same result compared to Postgres

Are there any user-facing changes?

No

@github-actions github-actions bot added logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Jul 30, 2023
@alamb
Copy link
Contributor

alamb commented Jul 30, 2023

Thank you @2010YOUY01 -- I'll check this out carefully over the next few days.

One thing I wonder about is how large the datafusion library is getting. I was thinking maybe it would be useful to start splitting some of the more advanced / special purpose functionality into their own crates, as partly described in #7110. Curious what you think about this

@alamb
Copy link
Contributor

alamb commented Jul 31, 2023

I am sorry I ran out of time today to review this. It is on my list for tomorrow

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @2010YOUY01 -- I found this PR easy to read and well tested and documented. I think it is a very nice contribution

Looks like it is implemented in several other databases:

https://www.vertica.com/docs/9.3.x/HTML/Content/Authoring/SQLReferenceManual/Functions/Aggregate/REGR_SLOPE.htm

https://docs.snowflake.com/en/sql-reference/functions/regr_slope

- **expression**: Expression to operate on.
Can be a constant, column, or function, and any combination of arithmetic operators.

### `regr_slope`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@alamb
Copy link
Contributor

alamb commented Aug 1, 2023

I don't really have any suggestions or comments for this PR so merging it in

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants