Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stddev missmatches postgres' #15638

Open
surister opened this issue Feb 29, 2024 · 1 comment
Open

stddev missmatches postgres' #15638

surister opened this issue Feb 29, 2024 · 1 comment

Comments

@surister
Copy link
Contributor

surister commented Feb 29, 2024

Context

In standard deviation calculation there is the Bessel's correction this mean that there are 'two ways' of calculating a deviation: population standard deviation and sample standard deviation, they are very similar but the formula and result varies a little bit.

population stddev

$\sqrt{Σxi - u}\over{N}$

sample stddev

$\sqrt{Σxi - u}\over{N-1}$

The problem.

Our stddev is a population stddev but postgres's is sample stddev so crate and postgres cannot equate stddev results.

you can quickly check this in here and running

crate:
select stddev(a) from unnest([1.0, 2.0, 3.0, 4.0, 5.0]) as a; -> 1.4142135623730951

postgres:
select stddev(a) from unnest(array[1.0, 2.0, 3.0, 4.0, 5.0]) as a; -> 1.5811388300841897

Possible Solutions

Change the implementation of stddev to match postgres' or keep it the same and add stddev_pop and stddev_samp? so we can at least polyfill/overload stddev calls.

Most databases seem to implement both functions in one way or another:
postgres -> stddev (default sample), stddev_pop and stddev_samp
mongodb -> stdDevPop and stdDevSamp
mysql -> sstdev (default pop), stddev_pop, stddev_samp
influxdb -> stddev(..., mode: 'sample' or 'population' = 'sample')

Considered Alternatives

No response

@surister surister changed the title stddev missmatch issue, stddev missmatches postgres' Feb 29, 2024
@surister
Copy link
Contributor Author

This is pretty much also the case with variance, the formula varies depending on sample vs population.

crate:
select variance(a) from unnest([1, 2, 3, 4, 5]) as a; -> 2

postgres:
select variance(a) from unnest(array[1.0, 2.0, 3.0, 4.0, 5.0]) as a; -> 2.5

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants