# Hypothesis Testing

Importing libraries and the dataset

In [1]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
df = pd.read_feather('../datasets/stack_overflow.feather')
df.head()

| Unnamed: 0 | respondent | main_branch | hobbyist | age | age_1st_code | age_first_code_cut | comp_freq | comp_total | converted_comp | country | ... | survey_length | trans | undergrad_major | webframe_desire_next_year | webframe_worked_with | welcome_change | work_week_hrs | years_code | years_code_pro | age_cat |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 36.0 | I am not primarily a developer, but I write co... | Yes | 34.0 | 30.0 | adult | Yearly | 60000.0 | 77556.0 | United Kingdom | ... | Appropriate in length | No | Computer science, computer engineering, or sof... | Express;React.js | Express;React.js | Just as welcome now as I felt last year | 40.0 | 4.0 | 3.0 | At least 30 |
| 1 | 47.0 | I am a developer by profession | Yes | 53.0 | 10.0 | child | Yearly | 58000.0 | 74970.0 | United Kingdom | ... | Appropriate in length | No | A natural science (such as biology, chemistry,... | Flask;Spring | Flask;Spring | Just as welcome now as I felt last year | 40.0 | 43.0 | 28.0 | At least 30 |
| 2 | 69.0 | I am a developer by profession | Yes | 25.0 | 12.0 | child | Yearly | 550000.0 | 594539.0 | France | ... | Too short | No | Computer science, computer engineering, or sof... | Django;Flask | Django;Flask | Just as welcome now as I felt last year | 40.0 | 13.0 | 3.0 | Under 30 |
| 3 | 125.0 | I am not primarily a developer, but I write co... | Yes | 41.0 | 30.0 | adult | Monthly | 200000.0 | 2000000.0 | United States | ... | Appropriate in length | No |  |  |  | Just as welcome now as I felt last year | 40.0 | 11.0 | 11.0 | At least 30 |
| 4 | 147.0 | I am not primarily a developer, but I write co... | No | 28.0 | 15.0 | adult | Yearly | 50000.0 | 37816.0 | Canada | ... | Appropriate in length | No | Another engineering discipline (such as civil,... |  | Express;Flask | Just as welcome now as I felt last year | 40.0 | 5.0 | 3.0 | Under 30 |


# Hypothesis tests with one sample

A hypothesis is a statement about an unknown population parameter.

A hypothesis test is a test of two competing hypotheses:

* The null hypothesis (H0) is the existing idea.
* The alternative hypothesis (HA) is the new "challenger" idea of the researcher.

### Example 1

**Step 1**: Definition of the null and alternative hypotheses based on a population parameter (in this case, a proportion).
* $H_0$: The proportion of data scientists starting programming as children is 35%
* $H_A$: The proportion of data scientists starting programming as children is greater than 35%

**Step 2**: Calculation of the z-score

$$z = \frac{\text{value}-\text{mean}}{\text{standard deviation}} = \frac{\text{sample stat}-\text{hypoth. param. value}}{\text{standard error}}$$

Where:

$$\text{standard error} = s_{\text{bootstrap distribution}}$$

In [10]:
# Generating a bootstrap distribution
comp_bootstrap_distribution = []
for i in range(5000):
    sample = df.sample(frac=1, replace=True)
    sample_bool = sample["age_first_code_cut"] == "child"
    comp_bootstrap_distribution.append(np.mean(sample_bool))

# Standard error: standard deviation of the bootstrap distribution
std_error = np.std(comp_bootstrap_distribution, ddof=1)

# Sample proportion (the point estimate)
proportion_child_sample = (df["age_first_code_cut"] == "child").mean()

# Hypothesized value from the null hypothesis
proportion_child_hypothesis = 0.35

# Z-score
z_score = (proportion_child_sample - proportion_child_hypothesis) / std_error
print(z_score)

4.040952601350419


**Step 3:** Calculation of the p-value

In [6]:
from scipy.stats import norm

# Right-tailed test: use this when H_A states that the parameter is greater than the hypothesized value
p_value = 1 - norm.cdf(z_score, loc=0, scale=1)

# p_value = norm.cdf(z_score, loc=0, scale=1) # For left-tailed tests

print(p_value)

2.8421892364294266e-05
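
The choice of tail depends on the direction of the alternative hypothesis. The following is a minimal sketch of a helper that covers the right-tailed, left-tailed, and two-sided cases. The function name `z_to_p_value` and its `alternative` argument are illustrative assumptions, not part of the original notebook, and the z-score used is the rounded value printed above.

```python
from scipy.stats import norm

def z_to_p_value(z, alternative="greater"):
    """Convert a z-score to a p-value (hypothetical helper for illustration)."""
    if alternative == "greater":
        # Right-tailed: area above z. norm.sf(z) equals 1 - norm.cdf(z)
        # but keeps more precision in the far tail.
        return norm.sf(z)
    if alternative == "less":
        # Left-tailed: area below z.
        return norm.cdf(z)
    if alternative == "two-sided":
        # Two-sided: area in both tails beyond |z|.
        return 2 * norm.sf(abs(z))
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")

z_example = 4.04  # rounded z-score from Step 2
for alt in ("greater", "less", "two-sided"):
    print(alt, z_to_p_value(z_example, alt))
```

For the same z-score, the two-sided p-value is twice the smaller of the two one-sided p-values, so a two-sided test is more conservative.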


**Step 4:** Comparing the p-value against the significance level

The significance level of a hypothesis test ($\alpha$) is the threshold for "beyond a reasonable doubt": the largest p-value at which we still reject $H_0$.

* If $p\le \alpha$, then we reject $H_0$
* But, if $p> \alpha$, we fail to reject $H_0$ (although it doesn't mean that $H_0$ is true)
  
$\alpha$ should be set **prior** to conducting the hypothesis test, so that the choice of threshold cannot be influenced by the observed p-value.

In [7]:
alpha = 0.05
if p_value <= alpha:
    print("Null hypothesis rejected")
else:
    print("Failed to reject the null hypothesis")

Null hypothesis rejected


**Step 5:** Calculate the confidence interval for the proportion.

For a significance level of $\alpha$, it's common to choose a confidence interval level of $1-\alpha$.

* $\alpha = 0.05$ -> 95% confidence interval

In [9]:
lower = np.quantile(comp_bootstrap_distribution, 0.025)
upper = np.quantile(comp_bootstrap_distribution, 0.975)
print((lower, upper))

(0.3715170278637771, 0.41220698805838124)
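
The quantiles 0.025 and 0.975 follow from $\alpha$: a $1-\alpha$ interval leaves $\alpha/2$ in each tail. The sketch below derives the quantiles from $\alpha$ instead of hard-coding them. It runs on synthetic 0/1 data rather than the survey, and the names `quantile_ci` and `boot_props` are assumptions introduced for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic 0/1 sample (NOT the survey data): 1 = started coding as a child
sample = rng.binomial(n=1, p=0.39, size=2000)

# Bootstrap distribution of the sample proportion
boot_props = np.array([
    rng.choice(sample, size=sample.size, replace=True).mean()
    for _ in range(2000)
])

def quantile_ci(boot_stats, alpha):
    """Quantile-method confidence interval at level 1 - alpha."""
    lower, upper = np.quantile(boot_stats, [alpha / 2, 1 - alpha / 2])
    return lower, upper

ci_95 = quantile_ci(boot_props, alpha=0.05)  # 95% interval
ci_99 = quantile_ci(boot_props, alpha=0.01)  # 99% interval
print(ci_95)
print(ci_99)
```

A smaller $\alpha$ gives a wider interval, since more of the bootstrap distribution has to be covered.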


### Possible errors

* Type I error (false positive): we reject the null hypothesis when it is actually true.
* Type II error (false negative): we fail to reject the null hypothesis when it is actually false.
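
A short simulation can make the Type I error concrete: when $H_0$ is true, a test at significance level $\alpha$ should reject in roughly a fraction $\alpha$ of repeated experiments. The sketch below assumes a true proportion of exactly 0.35 and a hypothetical sample size of 1000. To keep it fast, it uses the analytic standard error of a proportion, $\sqrt{p_0(1-p_0)/n}$, in place of the bootstrap used in Example 1.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

alpha = 0.05
p0 = 0.35        # hypothesized proportion; here it is also the TRUE proportion
n = 1000         # hypothetical sample size per experiment
n_sim = 10_000   # number of simulated experiments

# Sample proportions from n_sim experiments in which H0 is true
sample_props = rng.binomial(n, p0, size=n_sim) / n

# Analytic standard error under H0 (assumption: swapped in for the bootstrap)
std_error = np.sqrt(p0 * (1 - p0) / n)

# Right-tailed z-test for every simulated experiment
z_scores = (sample_props - p0) / std_error
p_values = norm.sf(z_scores)

# Every rejection here is a false positive, because H0 is true by construction
type_i_rate = np.mean(p_values <= alpha)
print(type_i_rate)
```

The printed rate should land close to `alpha`, with small deviations due to simulation noise and the discreteness of the binomial counts.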

# Hypothesis tests with two samples

For two-sample tests we use the t-statistic and the t-distribution. They apply when the standard error is estimated from the sample standard deviations instead of from a bootstrap distribution.

### Example 2

**Step 1**: Definition of the null and alternative hypotheses.

* $H_0$: The mean compensation (in USD) is the same for those that coded first as a child and those that coded first as an adult.

  $$H_0: \mu_{child}=\mu_{adult}$$
  $$H_0: \mu_{child}-\mu_{adult}=0$$
      
* $H_A$: The mean compensation (in USD) is greater for those that coded first as a child compared to those that coded first as an adult.
  
  $$H_A: \mu_{child}>\mu_{adult}$$
  $$H_A: \mu_{child}-\mu_{adult}>0$$

**Step 2**: Calculating the t-statistic

* Sample mean estimates the population mean
* $\bar x$: a sample mean
* $\bar x_{child}$: sample mean compensation for coding first as a child
* $\bar x_{adult}$: sample mean compensation for coding first as an adult
* $\bar x_{child}-\bar x_{adult}$: a test statistic

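The z-score from Example 1 has the form:

$$z=\frac{\text{sample stat}-\text{population parameter}}{\text{standard error}}$$

The t-statistic has the same structure, applied to the difference between two groups. Under $H_0$ the difference in population means is zero, so that term drops out: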

$$t=\frac{\text{difference in sample stats}-\text{difference in population parameters}}{\text{standard error}}$$

$$t=\frac{(\bar x_{child}-\bar x_{adult})-(\mu _{child}-\mu_{adult})}{SE(\bar x_{child}-\bar x_{adult})}$$

$$t=\frac{(\bar x_{child}-\bar x_{adult})-(0)}{SE(\bar x_{child}-\bar x_{adult})}$$

$$SE(\bar x_{child}-\bar x_{adult})\approx \sqrt{\frac{s^2_{child}}{n_{child}}+\frac{s^2_{adult}}{n_{adult}}}$$

$$t=\frac{(\bar x_{child}-\bar x_{adult})}{\sqrt{\frac{s^2_{child}}{n_{child}}+\frac{s^2_{adult}}{n_{adult}}}}$$

In [16]:
means = df.groupby("age_first_code_cut")["converted_comp"].mean()
s = df.groupby("age_first_code_cut")["converted_comp"].std()
n = df.groupby("age_first_code_cut")["converted_comp"].count()

numerator = means.loc["child"] - means.loc["adult"]
denominator = np.sqrt(s.loc["child"]**2/n.loc["child"] + s.loc["adult"]**2/n.loc["adult"])
t_stat = numerator / denominator

print(t_stat)

1.8699313316221844


**Step 3:** Calculating the p-value

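Degrees of freedom: the maximum number of logically independent values in the data sample. Here that is the total number of observations in both groups minus 2, because the two group means have already been estimated from the data.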

In [17]:
# Subtract 2: each of the two estimated group means uses up one degree of freedom
degrees_of_freedom = n.loc["child"] + n.loc["adult"] - 2

from scipy.stats import t
p_value = 1 - t.cdf(t_stat, df=degrees_of_freedom)
print(p_value)

0.030811302165157595
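
As a cross-check, the manual calculation can be compared with `scipy.stats.ttest_ind`. The sketch below uses synthetic compensation data (not the survey), and the array names `child` and `adult` are illustrative. With `equal_var=False`, SciPy runs Welch's t-test, which uses the same unpooled standard error as the formula in Step 2 but the Welch-Satterthwaite degrees of freedom, so its p-value can differ slightly from one computed with $n_{child}+n_{adult}-2$.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)

# Synthetic, right-skewed "compensation" samples (assumption, for illustration only)
child = rng.lognormal(mean=11.0, sigma=0.8, size=400)
adult = rng.lognormal(mean=10.9, sigma=0.8, size=600)

# Manual t-statistic with the unpooled standard error from Step 2
numerator = child.mean() - adult.mean()
denominator = np.sqrt(child.var(ddof=1) / child.size + adult.var(ddof=1) / adult.size)
t_manual = numerator / denominator

# Welch's t-test, right-tailed: H_A is mean(child) > mean(adult)
result = ttest_ind(child, adult, equal_var=False, alternative="greater")

print(t_manual, result.statistic, result.pvalue)
```

The two t-statistics should agree to floating-point precision. The `alternative` argument requires a reasonably recent SciPy release.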


**Step 4:** Comparing the p-value against the significance level

In [None]:
alpha = 0.05
if p_value <= alpha:
    print("Null hypothesis rejected")
else:
    print("Failed to reject the null hypothesis")

# Paired t-tests