### StatTea Comprehension ☕

Stat Tea Inc., a reputed tea company, is using statistical analysis to test its new brand of tea among its customers.

To understand which brand of tea works best, they surveyed 400 customers who sampled the old brand alongside competitor brands, and another 400 customers who sampled the new brand alongside competitor brands.

Here's how the control and treatment groups look:

- Control Group: 400 people, sampling the old brand
- Treatment Group: 400 people, sampling the new brand

The current adoption rate is 30%. They plan to use hypothesis testing to infer whether, on average, the adoption rate increases by 10 percentage points (from 30% to 40%) for customers who were given the new brand of tea.

Here are the results of the survey:

- Control Group: Out of 400 people, 120 preferred the old brand
- Treatment Group: Out of 400 people, 170 preferred the new brand

#### Importing Necessary libraries.

In [17]:
import statsmodels.stats.api as sms
import math

#### Given Values.

In [18]:
Pc = 0.3
Pt = 0.4
power = 0.8
alpha = 0.05

#### Question 1
The company took a sample of 400 customers for each of the control and treatment groups to test a change of 10 percentage points from the initial adoption rate of 30%. \
Is the sample size sufficient in this case?

In [19]:
?sms.proportion_effectsize

Signature: sms.proportion_effectsize(prop1, prop2, method='normal')
Docstring:
Effect size for a test comparing two proportions

for use in power function

Parameters
----------
prop1, prop2 : float or array_like
    The proportion value(s).

Returns
-------
es : float or ndarray
    effect size for (transformed) prop1 - prop2

Notes
-----
only method='normal' is implemented to match pwr.p2.test
see http://www.statmethods.net/stats/power.html

Effect size for `normal` is defined as ::

    2 * (arcsin(sqrt(prop1)) - arcsin(sqrt(prop2)))

I think other conversions to normality can be used, but I need to check.

Examples
--------
>>> import statsmodels.api as sm
>>> sm.stats.proportion_effectsize(0.5, 0.4)
0.20135792079033088
>>> sm.stats.proportion_effectsize([0.3, 0.4, 0.5], 0.4)
array([-0.21015893,  0.        ,  0.20135792

##### We will use the `proportion_effectsize` function from the statsmodels package.
The inputs will be the following:

- Pc (Proportion of the control group)
- Pt (Proportion of the treatment group)

In [20]:
p_effect_size = sms.proportion_effectsize(Pc, Pt)

In [21]:
p_effect_size

-0.2101589252771574
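As a sanity check, the value above can be reproduced by hand from the formula quoted in the docstring, `2 * (arcsin(sqrt(prop1)) - arcsin(sqrt(prop2)))`, often called Cohen's h. The sketch below uses only the standard library; `cohens_h` is a hypothetical helper name introduced for illustration and is not part of statsmodels.

```python
import math


def cohens_h(prop1: float, prop2: float) -> float:
    """Arcsine-transformed difference between two proportions (hypothetical helper)."""
    return 2 * (math.asin(math.sqrt(prop1)) - math.asin(math.sqrt(prop2)))


Pc, Pt = 0.3, 0.4  # control and treatment adoption rates from the problem
h = cohens_h(Pc, Pt)
print(round(h, 6))
```

The sign is negative only because the control proportion is passed first; for a two-sided test, the magnitude of the effect size is what drives the required sample size.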

In [22]:
?sms.NormalIndPower.solve_power

Signature:
sms.NormalIndPower.solve_power(
    self,
    effect_size=None,
    nobs1=None,
    alpha=None,
    power=None,
    ratio=1.0,
    alternative='two-sided',
)
Docstring:
solve for any one parameter of the power of a two sample z-test

for z-test the keywords are:
    effect_size, nobs1, alpha, power, ratio

exactly one needs to be ``None``, all others need numeric values

Parameters
----------
effect_size : float
    standardized effect size, difference between the two means divided
    by the standard deviation.
  

##### We will now use the `sms.NormalIndPower.solve_power` function from the statsmodels package.

The inputs will be the following:
- effect_size
- alpha
- power

In [23]:
required_n = sms.NormalIndPower().solve_power(effect_size=p_effect_size, alpha=alpha, power=power)

In [24]:
print(f"The required sample size computed is {math.ceil(required_n)}")

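The required sample size computed is 356

Each group has 400 customers, which exceeds the required 356 per group, so the sample size is sufficient to detect the change at `alpha = 0.05` with a power of 0.8.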
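The required sample size can be cross-checked with the usual closed-form normal approximation for two equal-sized groups, $n \approx 2\left(\frac{z_{1-\alpha/2} + z_{\text{power}}}{h}\right)^2$. This is a rough sketch under that assumption, not a reproduction of how statsmodels solves the problem internally; the variable names are illustrative. It also estimates the power achieved with 400 customers per group.

```python
import math
from statistics import NormalDist

alpha, power = 0.05, 0.8
# Effect size (Cohen's h) for proportions 0.3 and 0.4.
h = 2 * (math.asin(math.sqrt(0.3)) - math.asin(math.sqrt(0.4)))

std_normal = NormalDist()
z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
z_power = std_normal.inv_cdf(power)

# Normal approximation for equal group sizes; ignores the negligible far tail.
approx_n = 2 * ((z_alpha + z_power) / h) ** 2

# Approximate power of the two-sided test with 400 customers per group.
shift = abs(h) * math.sqrt(400 / 2)
power_at_400 = std_normal.cdf(shift - z_alpha) + std_normal.cdf(-shift - z_alpha)

print(math.ceil(approx_n), round(power_at_400, 3))
```

Under this approximation the requirement rounds up to 356 per group, and the power with 400 per group comes out above the 0.8 target.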


#### Question 2

Now perform the hypothesis test for testing the difference in proportions between the control group and the treatment group. \
After that make a decision on the basis of p-value, on whether to reject or fail to reject the null hypothesis.

In [25]:
from statsmodels.stats.proportion import confint_proportions_2indep

##### We will be using `confint_proportions_2indep` to get the confidence interval (CI) for the difference in proportions.

In [26]:
Cg = 400  # control group sample size
Cn = 120  # control group members who prefer the old brand
Tg = 400  # treatment group sample size
Tn = 170  # treatment group members who prefer the new brand

Pc = 0.3
Pt = 0.4

power = 0.8
alpha = 0.05

In [28]:
(lower, upper) = confint_proportions_2indep(Cn, Cg, Tn, Tg, compare='diff', alpha=0.05)

In [29]:
print(f"The confidence interval for the difference in proportions is: [{lower:.3f}, {upper:.3f}]")

The confidence interval for the difference in proportions is: [-0.190, -0.058]
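For intuition, a simple Wald interval for the difference (control minus treatment) can be computed by hand. This is only a sketch: the statsmodels function uses a different interval construction by default, so its endpoints can differ slightly from the ones computed here. All names below are illustrative.

```python
import math
from statistics import NormalDist

Cn, Cg = 120, 400  # control: successes, sample size
Tn, Tg = 170, 400  # treatment: successes, sample size
alpha = 0.05

p1, p2 = Cn / Cg, Tn / Tg
diff = p1 - p2  # control minus treatment

# Unpooled standard error of the difference in sample proportions.
se = math.sqrt(p1 * (1 - p1) / Cg + p2 * (1 - p2) / Tg)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

wald_lower, wald_upper = diff - z_crit * se, diff + z_crit * se
print(f"Wald interval: [{wald_lower:.3f}, {wald_upper:.3f}]")
```

Both endpoints are negative, so the interval excludes 0. This suggests that the treatment proportion is higher than the control proportion.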


### Hypothesis Test for Difference in Proportions



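Let $p_1$ and $p_2$ be the population proportions of customers who prefer the brand they sampled in the control and treatment groups, respectively.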

Our null and alternate hypotheses are as follows:

$$H_0: p_1-p_2 = 0$$
$$H_a: p_1-p_2 \neq 0$$

> This will be a two-sided test, with $\alpha$ = 0.05

We'll perform this test directly using the methods from the `statsmodels` library.

In [30]:
from statsmodels.stats.proportion import test_proportions_2indep

In [31]:
?test_proportions_2indep

Signature:
test_proportions_2indep(
    count1,
    nobs1,
    count2,
    nobs2,
    value=None,
    method=None,
    compare='diff',
    alternative='two-sided',
    correction=True,
    return_results=True,
)
Docstring:
Hypothesis test for comparing two independent proportions

This assumes that we have two independent binomial samples.

The Null and alternative hypothesis are

for compare = 'diff'

- H0: prop1 - prop2 - value = 0
- H1: prop1 - prop2 - value != 0  if alte

##### The inputs will be the following:

- Cn (Members of control group who preferred old brand)
- Cg (Sample size of control group)
- Tn (Members of treatment group who preferred new brand)
- Tg (Sample size of treatment group)
- alternative = 'two-sided' (We want to detect a difference in either direction)
- compare = 'diff' (We want to compare the difference in proportions)

In [34]:
z_stat, pval = test_proportions_2indep(Cn, Cg, Tn, Tg, compare='diff', alternative='two-sided')

In [35]:
z_stat

-3.6977151500196546

In [38]:
pval < alpha

True

In [39]:
pval

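0.0002175488118653293

Since the p-value (about 0.0002) is below $\alpha$ = 0.05, we reject the null hypothesis: the data indicate a statistically significant difference in adoption rates between the control and treatment groups.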
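To see what such a test does under the hood, here is a minimal sketch of the classic pooled two-proportion z-test using only the standard library. It is not the method statsmodels applies by default, so the statistic and p-value are close to, but not identical to, the values above. Variable names are illustrative.

```python
import math
from statistics import NormalDist

Cn, Cg = 120, 400  # control: successes, sample size
Tn, Tg = 170, 400  # treatment: successes, sample size
alpha = 0.05

p1, p2 = Cn / Cg, Tn / Tg
pooled = (Cn + Tn) / (Cg + Tg)  # common proportion assumed under H0

# Standard error of the difference under the null hypothesis.
se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / Cg + 1 / Tg))
z_manual = (p1 - p2) / se_pooled
p_manual = 2 * NormalDist().cdf(-abs(z_manual))  # two-sided p-value

print(f"z = {z_manual:.3f}, p-value = {p_manual:.5f}, reject H0: {p_manual < alpha}")
```

The decision matches the statsmodels result: the p-value falls well below 0.05, so the null hypothesis of equal proportions is rejected.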