Implemented meanZTest #33354

achimbab · 2022-01-01T06:23:57Z

Changelog category (leave one):

New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Implemented meanZTest

Detailed description / Documentation draft:
I implemented the meanZTest aggregate function.

Usage: 
meanZTest(population_variance_x, population_variance_y, confidence_level)(sample_data, sample_index)

Returns:
(z_statistics, p_value, confidence_interval_low, confidence_interval_high)
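To make the parameters and the returned tuple concrete, here is a minimal Python sketch of a two-sample mean z-test with known population variances. It is not the ClickHouse implementation: the function name `mean_z_test`, the row layout, and the error handling for an empty group are assumptions made for illustration. Only the parameter order, the `sample_index` convention (0 means first population), and the shape of the result follow the description above.

```python
from math import sqrt
from statistics import NormalDist, fmean


def mean_z_test(rows, pop_var_x, pop_var_y, confidence_level):
    """Hypothetical reference for meanZTest; rows are (sample_data, sample_index) pairs.

    Assumes both population variances are known and positive.
    """
    rows = list(rows)
    xs = [value for value, index in rows if index == 0]  # first population
    ys = [value for value, index in rows if index != 0]  # second population
    if not xs or not ys:
        # Assumption: this sketch simply rejects an empty group.
        raise ValueError("both samples must contain at least one value")

    diff = fmean(xs) - fmean(ys)
    # Standard error of the difference of means with known variances.
    se = sqrt(pop_var_x / len(xs) + pop_var_y / len(ys))

    std_normal = NormalDist()  # mean 0, standard deviation 1
    z = diff / se
    p_value = 2.0 * std_normal.cdf(-abs(z))  # two-sided alternative
    # Critical value for the requested two-sided confidence level.
    crit = std_normal.inv_cdf(1.0 - (1.0 - confidence_level) / 2.0)
    return (z, p_value, diff - crit * se, diff + crit * se)


rows = [(1, 0), (2, 0), (3, 0), (2, 1), (3, 1), (4, 1)]
z, p_value, ci_low, ci_high = mean_z_test(rows, 1.0, 1.0, 0.95)
```

In this sketch the confidence interval is for the difference of the two population means, so an interval that contains 0 agrees with a p-value above `1 - confidence_level`.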

nikitamikhaylov · 2022-01-13T12:08:14Z

@achimbab May I ask you to write all formulas in LaTeX format inside the comments? And could you leave a link to some paper there?

achimbab · 2022-01-13T12:59:02Z

@nikitamikhaylov
OK, I will do that ASAP.

achimbab · 2022-01-18T12:59:21Z

@nikitamikhaylov

I wrote all formulas in LaTeX format inside the comments at #a8a20b6.

Formulas and Descriptions for Mean z-test
http://statkat.com/stat-tests/two-sample-z-test.php

Online calculator for Mean z-test
https://mathcracker.com/z-test-for-two-means

nikitamikhaylov · 2022-01-19T12:47:11Z

docs/en/sql-reference/aggregate-functions/reference/meanztest.md

+```
+
+Values of both samples are in the `sample_data` column. If `sample_index` equals to 0 then the value in that row belongs to the sample from the first population. Otherwise it belongs to the sample from the second population.
+The null hypothesis is that means of populations are equal. Normal distribution is assumed. Populations may have unequal variance and the variances are known.


Does this really mean that the populations are equal? I thought the null hypothesis is only that the means of the two distributions are equal.

What is the alternative? Two sided or one sided? E.g. mean1 != mean2 or mean1 >(<) mean2?

The alternative is two-sided.

I found the following null hypothesis for the z-test:

Null hypothesis: The two sample z test tests the following null hypothesis (H0): H0: μ1 = μ2 Here μ1 is the population mean for group 1, and μ2 is the population mean for group 2.

Could you please explain why you think so?

I somehow misread it (I don't know how). I thought it said that the null hypothesis is that the two distributions are equal. Sorry, my fault.

That's OK. I don't mind. Thank you.
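For reference, the hypotheses discussed in this thread can be written out as follows. This is the standard textbook form of the two-sample z-test with known variances, not a copy of the formulas in the source comments. The symbols $\bar{x}_i$, $\sigma_i^2$ and $n_i$ (sample mean, known population variance, and sample size of group $i$) and $\Phi$ (the standard normal CDF) are notation introduced here.

```latex
H_0 : \mu_1 = \mu_2
\qquad
H_1 : \mu_1 \neq \mu_2

z = \frac{\bar{x}_1 - \bar{x}_2}
         {\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}

p = 2\,\Phi\!\left(-\lvert z \rvert\right)
```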

nikitamikhaylov · 2022-01-19T12:52:52Z

Can we also add a Python test with SciPy? Please take a look at 01558_ttest_scipy.python.

nikitamikhaylov · 2022-01-19T12:53:17Z

src/AggregateFunctions/AggregateFunctionMeanZTest.cpp

+            return {std::numeric_limits<Float64>::quiet_NaN(), std::numeric_limits<Float64>::quiet_NaN()};
+        }
+
+        Float64 pvalue = 2.0 * boost::math::cdf(boost::math::normal(0.0, 1.0), -1.0 * std::abs(zstat));


Ok. https://www.omnicalculator.com/statistics/p-value#how-to-find-p-value-from-z-score
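The reviewed line computes a two-sided p-value as twice the standard normal CDF evaluated at `-|z|`. A small Python sketch of the same calculation, using only the standard library, is shown here for cross-checking. The function names are hypothetical, and the `erfc` variant is an equivalent closed form added for comparison rather than something taken from the pull request.

```python
from math import erfc, sqrt
from statistics import NormalDist


def two_sided_p_value(z):
    """Mirror of the C++ expression: 2 * cdf(N(0, 1), -|z|)."""
    return 2.0 * NormalDist(0.0, 1.0).cdf(-abs(z))


def two_sided_p_value_erfc(z):
    """Same quantity via the complementary error function: erfc(|z| / sqrt(2))."""
    return erfc(abs(z) / sqrt(2.0))


p_cdf = two_sided_p_value(1.96)
p_erfc = two_sided_p_value_erfc(1.96)
```

Both forms are symmetric in `z`, which is what makes the test two-sided: a z-statistic of 1.96 and one of -1.96 give the same p-value, close to 0.05.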

achimbab · 2022-01-20T03:36:41Z

@nikitamikhaylov
I added 02158_ztest_cmp.python.
I couldn't find a two-sample mean z-test in SciPy, so I implemented a simple twosample_mean_ztest() using unpooled variance in 02158_ztest_cmp.python. The test compares the results of twosample_mean_ztest() and meanZTest().

achimbab · 2022-01-20T04:13:46Z

https://mathcracker.com/z-test-for-two-means
It is an online calculator for two sample mean z-test using unpooled variance.
The results of meanZTest and the calculator are the same.
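As a short illustration of what "unpooled variance" means here: each group contributes its own variance to the standard error, instead of both groups sharing one combined estimate. The sketch below contrasts the two. The function names are hypothetical and the code is not taken from 02158_ztest_cmp.python. The pooled form is shown only for contrast and assumes sample variances with an `n - 1` denominator.

```python
from math import sqrt


def unpooled_standard_error(var_x, n_x, var_y, n_y):
    """Each group keeps its own variance: sqrt(var_x / n_x + var_y / n_y)."""
    return sqrt(var_x / n_x + var_y / n_y)


def pooled_standard_error(var_x, n_x, var_y, n_y):
    """For contrast only: one shared variance estimated from both groups."""
    pooled_var = ((n_x - 1) * var_x + (n_y - 1) * var_y) / (n_x + n_y - 2)
    return sqrt(pooled_var * (1.0 / n_x + 1.0 / n_y))


# Equal variances and equal sizes: both forms agree.
same_unpooled = unpooled_standard_error(4.0, 10, 4.0, 10)
same_pooled = pooled_standard_error(4.0, 10, 4.0, 10)

# Unequal variances and sizes: the two forms diverge.
diff_unpooled = unpooled_standard_error(1.0, 10, 9.0, 40)
diff_pooled = pooled_standard_error(1.0, 10, 9.0, 40)
```

The unpooled form matches the documentation's statement that the populations may have unequal variances.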

Implemented meanZTest

9adaf94

robot-clickhouse added the doc-alert and pr-feature (Pull request with new product feature) labels Jan 1, 2022

Fix meanZTest and add meanztest.md

5d95383

achimbab force-pushed the ztest_mean branch from d0209a8 to 5d95383 Compare January 1, 2022 07:15

achimbab added 4 commits January 1, 2022 19:44

Validate parameters

987299f

Remote virtual dispatch in constructor

9c6cefe

Minor fixes

eaaca88

Merge branch 'master' into ztest_mean

9f0f1ba

nikitamikhaylov self-assigned this Jan 13, 2022

Add formulars for the mean z-test in the format of latex.

a8a20b6

achimbab added 2 commits January 19, 2022 14:35

Merge remote-tracking branch 'origin' into ztest_mean

e18ed51

Merge remote-tracking branch 'origin' into ztest_mean

0edabfc

nikitamikhaylov reviewed Jan 19, 2022


Add testcases 02158_ztest_cmp.python

8fb8579

nikitamikhaylov approved these changes Jan 20, 2022


nikitamikhaylov merged commit 779538b into ClickHouse:master Jan 20, 2022

UnamedRus mentioned this pull request Jul 21, 2023

Correctess check for Math (Statistics) functions against state of art implementation (scipy) #51275

Closed
