Implemented meanZTest #33354

Merged
merged 10 commits into from Jan 20, 2022

Conversation

achimbab
Contributor

@achimbab achimbab commented Jan 1, 2022

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Implemented meanZTest

Detailed description / Documentation draft:
I implemented the meanZTest aggregate function.

Usage: 
meanZTest(population_variance_x, population_variance_y, confidence_level)(sample_data, sample_index)

Returns:
(z_statistics, p_value, confidence_interval_low, confidence_interval_high)

@robot-clickhouse robot-clickhouse added doc-alert pr-feature Pull request with new product feature labels Jan 1, 2022
@nikitamikhaylov nikitamikhaylov self-assigned this Jan 13, 2022
@nikitamikhaylov
Member

@achimbab May I ask you to write all formulas in LaTeX format inside the comments? And could you leave a link to a paper or reference there?

@achimbab
Contributor Author

@nikitamikhaylov
OK, I will do that ASAP.

@achimbab
Contributor Author

@nikitamikhaylov

I wrote all formulas in LaTeX format inside the comments in #a8a20b6.

Formulas and Descriptions for Mean z-test
http://statkat.com/stat-tests/two-sample-z-test.php

Online calculator for Mean z-test
https://mathcracker.com/z-test-for-two-means

Values of both samples are in the `sample_data` column. If `sample_index` equals 0, the value in that row belongs to the sample from the first population; otherwise it belongs to the sample from the second population.
The null hypothesis is that the population means are equal. A normal distribution is assumed. The populations may have unequal variances, and the variances are assumed to be known.
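
For reference, the textbook form of the two-sample z-test with known, possibly unequal variances is written out below. The symbols are assumptions introduced here for illustration: \bar{x}, \bar{y} are the sample means, n_x, n_y the sample sizes, \sigma_x^2, \sigma_y^2 the known population variances, and \Phi the standard normal CDF. The LaTeX comments in the PR's source remain the authority on what is actually implemented.

```latex
\begin{align*}
z &= \frac{\bar{x} - \bar{y}}{\sqrt{\sigma_x^2 / n_x + \sigma_y^2 / n_y}}, \\
p &= 2\,\Phi\bigl(-\lvert z \rvert\bigr), \\
\text{CI} &= (\bar{x} - \bar{y}) \pm z_{1-\alpha/2}\,\sqrt{\sigma_x^2 / n_x + \sigma_y^2 / n_y},
\qquad 1 - \alpha = \texttt{confidence\_level}.
\end{align*}
```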
Member

Does this really mean that the populations are equal? I thought the null hypothesis is only that the means of the two distributions are equal.

Member

What is the alternative hypothesis? Two-sided or one-sided? E.g. mean1 != mean2, or mean1 > mean2 (or mean1 < mean2)?

Contributor Author

The alternative is two-sided.

Contributor Author

I found that the null hypothesis for the z-test is as follows.

Null hypothesis:

  The two sample z test tests the following null hypothesis (H0):
    H0: μ1 = μ2
  Here μ1 is the population mean for group 1, and μ2 is the population mean for group 2.

Could you please explain why you think so?

Member

I somehow misread it (I don't know how). I thought it said that the null hypothesis is that the two distributions are equal. Sorry, my fault.

Contributor Author

That's OK. I don't mind. Thank you.

@nikitamikhaylov
Member

Can we also add a Python test with SciPy? Please take a look at 01558_ttest_scipy.python.

return {std::numeric_limits<Float64>::quiet_NaN(), std::numeric_limits<Float64>::quiet_NaN()};
}

Float64 pvalue = 2.0 * boost::math::cdf(boost::math::normal(0.0, 1.0), -1.0 * std::abs(zstat));
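
The p-value line above computes 2·Φ(−|z|) for a standard normal. A minimal Python sketch of the same quantity follows, using only the standard library; the function name `two_sided_p_value` is hypothetical and not part of the PR, and the NaN handling is an assumption that mirrors the NaN return visible in the diff without knowing its exact condition.

```python
import math


def two_sided_p_value(z_stat: float) -> float:
    """Two-sided p-value 2 * Phi(-|z|) for a standard normal statistic.

    Hypothetical helper for illustration; not the PR's code.
    """
    # Assumption: propagate NaN when the statistic itself is undefined.
    if math.isnan(z_stat):
        return math.nan
    # Phi(-a) = 0.5 * erfc(a / sqrt(2)), hence 2 * Phi(-a) = erfc(a / sqrt(2)).
    return math.erfc(abs(z_stat) / math.sqrt(2.0))
```

Because the statistic enters only through its absolute value, the result is symmetric in the sign of z, which matches the two-sided alternative discussed earlier in the thread.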

@achimbab
Contributor Author

achimbab commented Jan 20, 2022

@nikitamikhaylov
I added 02158_ztest_cmp.python.
However, I couldn't find a two-sample mean z-test in SciPy, so I implemented a simple twosample_mean_ztest() using unpooled variance in 02158_ztest_cmp.python. The test compares the results of twosample_mean_ztest() and meanZTest().
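
For readers without the test file at hand, a reference implementation of this kind might look roughly like the sketch below. It is not the contents of 02158_ztest_cmp.python: the function name is borrowed from the comment above, but the signature, the `(value, index)` row layout, the fallback to sample variances when no variance is passed, and the NaN handling are all assumptions made for illustration.

```python
import math
from statistics import NormalDist, fmean, variance


def twosample_mean_ztest(rows, var_x=None, var_y=None, confidence_level=0.95):
    """Two-sample mean z-test with unpooled variances (illustrative sketch).

    rows: iterable of (value, index); index 0 means first sample, anything
    else means second sample. Returns (z, p_value, ci_low, ci_high).
    """
    rows = list(rows)
    x = [value for value, index in rows if index == 0]
    y = [value for value, index in rows if index != 0]
    undefined = (math.nan,) * 4
    if not x or not y:
        return undefined
    # Assumption: when a variance is not supplied, estimate it from the sample.
    if var_x is None:
        if len(x) < 2:
            return undefined
        var_x = variance(x)
    if var_y is None:
        if len(y) < 2:
            return undefined
        var_y = variance(y)
    # Unpooled standard error of the difference of the two means.
    std_err = math.sqrt(var_x / len(x) + var_y / len(y))
    if std_err == 0.0:
        return undefined
    diff = fmean(x) - fmean(y)
    z = diff / std_err
    std_normal = NormalDist()
    p_value = 2.0 * std_normal.cdf(-abs(z))  # two-sided alternative
    z_crit = std_normal.inv_cdf(1.0 - (1.0 - confidence_level) / 2.0)
    return z, p_value, diff - z_crit * std_err, diff + z_crit * std_err
```

A call such as `twosample_mean_ztest([(1.0, 0), (2.0, 0), (3.0, 0), (2.0, 1), (3.0, 1), (4.0, 1)], 1.0, 1.0, 0.95)` returns a tuple in the same order as the one documented for meanZTest, so the two results can be compared element by element within a tolerance.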

@achimbab
Contributor Author

achimbab commented Jan 20, 2022

https://mathcracker.com/z-test-for-two-means
This is an online calculator for the two-sample mean z-test using unpooled variance.
meanZTest and the calculator give the same results.

Labels
pr-feature Pull request with new product feature

3 participants