# <center>Hypothesis Testing</center>
## <center>T-Tests</center>

The null hypothesis for a two-sample t-test is that there is no difference between the population means of the two groups. In other words, __the null hypothesis assumes that the population means for the two groups are equal__. The (two-sided) alternative hypothesis is that the two population means differ. "Statistically significant" describes the outcome of the test rather than either hypothesis: a difference is called significant when the p-value falls below the chosen significance level.

The null hypothesis for a one-sample t-test is that the mean of the population from which the sample was drawn equals a known or hypothesized value.
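A one-sample test can be sketched with `scipy.stats.ttest_1samp`, which takes the sample and the hypothesized population mean (`popmean`). The sample and the hypothesized mean of 5.0 below are made-up illustrations; the manual line shows that the statistic is the distance between the sample mean and the hypothesized mean, measured in units of the standard error.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # arbitrary seed so the sketch is reproducible
sample = rng.normal(loc=5.3, scale=1, size=30)  # illustrative data
mu0 = 5.0  # hypothesized population mean (illustrative)

# One-sample t-test against mu0 (two-sided by default)
t_stat, p_value = stats.ttest_1samp(sample, popmean=mu0)

# Same statistic by hand: (sample mean - mu0) / (s / sqrt(n))
t_manual = (sample.mean() - mu0) / (sample.std(ddof=1) / np.sqrt(len(sample)))
print("t =", t_stat, "p =", p_value)
```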

It is important to note that the null hypothesis is not necessarily true, but is assumed to be true unless there is sufficient evidence to reject it. A t-test is used to evaluate whether the sample data provides sufficient evidence to reject the null hypothesis in favor of the alternative hypothesis.

```python
# Two-sample (independent) t-test
import numpy as np
from scipy.stats import ttest_ind

# Generate two random samples of data (no seed is set, so the values change on every run)
sample1 = np.random.normal(loc=0, scale=1, size=50)
sample2 = np.random.normal(loc=1, scale=1, size=50)

# Perform the t-test (pooled variance: equal_var=True is the default)
stat, p = ttest_ind(sample1, sample2)
print("t-test statistic:", stat)
print("p-value:", p)
```

```
t-test statistic: -5.118697228488121
p-value: 1.5349270535552999e-06
```
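The statistic printed by `ttest_ind` above can be reproduced by hand. This minimal sketch assumes SciPy's default `equal_var=True` (pooled variance) and uses its own seeded, illustrative samples: the statistic is the difference in sample means divided by its pooled standard error, and the two-sided p-value comes from the t distribution with n1 + n2 - 2 degrees of freedom.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility
a = rng.normal(loc=0, scale=1, size=50)
b = rng.normal(loc=1, scale=1, size=50)

n1, n2 = len(a), len(b)
# Pooled variance: sample variances (ddof=1) weighted by their degrees of freedom
sp2 = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
se = np.sqrt(sp2 * (1 / n1 + 1 / n2))  # standard error of the mean difference
t_manual = (a.mean() - b.mean()) / se
df = n1 + n2 - 2
# Two-sided p-value: probability of a |T| at least this large under H0
p_manual = 2 * stats.t.sf(abs(t_manual), df)

t_scipy, p_scipy = stats.ttest_ind(a, b)
print(t_manual, t_scipy)
print(p_manual, p_scipy)
```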


### Assumptions:

- Independence: The observations in each sample are independent of each other. That is, the values in one sample are not related to the values in the other sample.
- Normality: The populations from which the samples are drawn are normally distributed. In practice, this assumption is not always necessary if the sample sizes are sufficiently large, as the Central Limit Theorem suggests that the sampling distribution of the mean will approach normality regardless of the underlying distribution of the population.
- Homogeneity of variance: The variances of the populations from which the samples are drawn are equal. This assumption is important because it affects the standard error of the mean difference and, therefore, the t-test statistic.
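When the equal-variance assumption is doubtful, a common alternative is Welch's t-test, which estimates each group's variance separately and adjusts the degrees of freedom (Welch-Satterthwaite). In SciPy it is selected with `equal_var=False` in `ttest_ind`. A minimal sketch with illustrative samples of unequal spread and size:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)  # arbitrary seed
a = rng.normal(loc=0, scale=1, size=50)
b = rng.normal(loc=1, scale=3, size=30)  # larger spread, smaller group

# Welch's t-test: no pooled variance
t_welch, p_welch = stats.ttest_ind(a, b, equal_var=False)

# Same statistic by hand: each sample variance divided by its own sample size
va = a.var(ddof=1) / len(a)
vb = b.var(ddof=1) / len(b)
t_manual = (a.mean() - b.mean()) / np.sqrt(va + vb)
# Welch-Satterthwaite approximation to the degrees of freedom
df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
p_manual = 2 * stats.t.sf(abs(t_manual), df)
print("Welch t =", t_welch, "p =", p_welch, "df =", df)
```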

Independence is usually not tested statistically; it is justified by the study design, for example through random sampling or random assignment to groups.

Homogeneity of variance can be tested with a test of equal variances such as Levene's test. __The null hypothesis in Levene's test is that the variances of the groups being compared are equal__. In other words, the null hypothesis assumes that the population variances of all groups are equal.

Here's what that would look like:

```python
import numpy as np
from scipy.stats import levene

# Generate two random samples of data
sample1 = np.random.normal(loc=0, scale=1, size=50)
sample2 = np.random.normal(loc=1, scale=2, size=50)

# Perform Levene's test to compare the variances of the two groups
stat, p = levene(sample1, sample2)

# Print the test statistic and p-value
print("Levene's test statistic:", stat)
print("p-value:", p)
```

```
Levene's test statistic: 13.55213574231272
p-value: 0.0003797042402190668
```
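Levene's test is itself a one-way ANOVA, run on the absolute deviations of each observation from its group centre. SciPy's `levene` uses the group median as the centre by default (`center='median'`, the Brown-Forsythe variant), so this sketch, with illustrative data, reproduces its statistic with `f_oneway`:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)  # arbitrary seed
g1 = rng.normal(loc=0, scale=1, size=50)
g2 = rng.normal(loc=1, scale=2, size=50)

w, p = stats.levene(g1, g2)  # default center='median'

# Absolute deviations from each group's median, then a one-way ANOVA on them
z1 = np.abs(g1 - np.median(g1))
z2 = np.abs(g2 - np.median(g2))
f, p_f = stats.f_oneway(z1, z2)
print("Levene W =", w, "ANOVA F on deviations =", f)
```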


Normality is commonly tested with the Shapiro-Wilk test.

__The null hypothesis of the Shapiro-Wilk test is that the data are drawn from a normally distributed population__.

In other words, the test assumes that the sample data comes from a normally distributed population, and the test is used to determine whether the sample data provides enough evidence to reject this assumption. The alternative hypothesis, in this case, is that the data do not come from a normal distribution.

If the p-value from the Shapiro-Wilk test is less than the chosen significance level, there is significant evidence to reject the null hypothesis and conclude that the sample does not come from a normal distribution. If the p-value is greater than the significance level, there is not enough evidence to reject the null hypothesis. The data are then considered consistent with normality, which is usually taken as sufficient to proceed, although failing to reject does not prove that the population is normal.

It is important to note that the Shapiro-Wilk test has limitations: with small samples it has little power to detect departures from normality, and with large samples it can reject normality for deviations too small to matter in practice. Visual checks such as a Q-Q plot are a useful complement.

```python
import numpy as np
from scipy.stats import shapiro

# Generate two random samples of data
sample1 = np.random.normal(loc=0, scale=1, size=50)
sample2 = np.random.normal(loc=1, scale=1, size=50)

# Perform Shapiro-Wilk test on sample 1
stat, p = shapiro(sample1)
print("Shapiro-Wilk test statistic for sample 1:", stat)
print("p-value for sample 1:", p)

# Perform Shapiro-Wilk test on sample 2
stat, p = shapiro(sample2)
print("Shapiro-Wilk test statistic for sample 2:", stat)
print("p-value for sample 2:", p)
```

```
Shapiro-Wilk test statistic for sample 1: 0.9843613505363464
p-value for sample 1: 0.7438374161720276
Shapiro-Wilk test statistic for sample 2: 0.9683470129966736
p-value for sample 2: 0.198089137673378
```
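For contrast, a minimal sketch with a clearly skewed sample (an exponential distribution, chosen purely for illustration) shows the test rejecting normality. `scipy.stats.probplot` returns the data for a Q-Q plot together with the correlation `r` of the fitted line, which can complement the formal test; plotting is omitted to keep the sketch self-contained.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)  # arbitrary seed
skewed = rng.exponential(scale=1.0, size=200)  # strongly right-skewed data

stat, p = stats.shapiro(skewed)
alpha = 0.05
print("Shapiro-Wilk W =", stat, "p =", p)
print("Reject normality" if p < alpha else "No evidence against normality")

# Q-Q plot data: theoretical normal quantiles vs ordered sample values, plus a fitted line
(osm, osr), (slope, intercept, r) = stats.probplot(skewed, dist="norm")
print("Q-Q correlation r =", r)
```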


### Special cases of t-testing

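__Power t-test__: Power analysis for a t-test relates four quantities: sample size, effect size, significance level (alpha), and statistical power (the probability of rejecting the null hypothesis when the effect is real). Given any three, the fourth can be solved for. The most common use is calculating the minimum sample size required to achieve a desired level of statistical power, given a specific effect size and significance level.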

```python
import statsmodels.stats.power as power

# Define the effect size (standardized mean difference), alpha level, and desired power
effect_size = 0.5
alpha = 0.05
power_level = 0.8

# Calculate the required sample size per group (ratio=1 means equal group sizes)
sample_size = power.tt_ind_solve_power(
    effect_size=effect_size, alpha=alpha, power=power_level, ratio=1, alternative='two-sided')

# Print the results
print("Required sample size: ", sample_size)
```

```
Required sample size:  63.765611775409525
```
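The solver returns a fractional value, so in practice it is rounded up, giving 64 observations per group (with `ratio=1`, both groups need the same size). The same function solves for whichever argument is left as `None`. This sketch reuses the cell's effect size and alpha to check the power achieved with 64 per group, and computes Cohen's d, the standardized mean difference that `effect_size` expects, from hypothetical pilot data.

```python
import math
import numpy as np
import statsmodels.stats.power as power

n_per_group = math.ceil(
    power.tt_ind_solve_power(effect_size=0.5, alpha=0.05, power=0.8, ratio=1,
                             alternative='two-sided'))

# Leave `power` unspecified (None) to solve for it given the sample size
achieved = power.tt_ind_solve_power(effect_size=0.5, nobs1=n_per_group, alpha=0.05,
                                    ratio=1, alternative='two-sided')
print("Per-group sample size:", n_per_group)
print("Achieved power:", achieved)

# Cohen's d from hypothetical pilot data: mean difference / pooled SD
pilot_a = np.array([5.1, 4.8, 5.6, 5.0, 5.3])
pilot_b = np.array([5.6, 5.9, 5.2, 6.1, 5.7])
pooled_sd = np.sqrt((pilot_a.var(ddof=1) + pilot_b.var(ddof=1)) / 2)  # valid for equal n
d = (pilot_b.mean() - pilot_a.mean()) / pooled_sd
print("Cohen's d:", d)
```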


__Proportions z-test__: A proportions z-test compares the proportion of "successes" of a binary outcome between two groups (or between one group and a hypothesized proportion). Although it is often discussed alongside t-tests, it is a z-test: under the null hypothesis that the population proportions are equal, the difference between the observed proportions divided by its standard error approximately follows a standard normal distribution for large samples, and the test statistic and p-value are computed from that distribution.

```python
import statsmodels.stats.proportion as prop

# Define the number of successes and failures in two groups
successes1 = 30
failures1 = 70
successes2 = 50
failures2 = 50

# Conduct a proportions z-test
z_stat, p_val = prop.proportions_ztest(
    count=[successes1, successes2], nobs=[successes1+failures1, successes2+failures2], alternative='two-sided')

# Print the results
print("Proportions z-test:")
print("z-statistic = ", z_stat)
print("p-value = ", p_val)
```

```
Proportions z-test:
z-statistic =  -2.886751345948129
p-value =  0.003892417122778628
```
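The two-sample statistic reported by `proportions_ztest` can be reproduced by hand: under the null hypothesis the two groups share one proportion, estimated by pooling the successes, and the z-statistic is the difference in sample proportions divided by its standard error. A minimal sketch with the same counts as the cell above:

```python
import numpy as np
from scipy.stats import norm

successes = np.array([30, 50])
nobs = np.array([100, 100])

p_hat = successes / nobs                   # sample proportions: 0.3 and 0.5
p_pool = successes.sum() / nobs.sum()      # pooled proportion under H0: 0.4
se = np.sqrt(p_pool * (1 - p_pool) * (1 / nobs[0] + 1 / nobs[1]))
z = (p_hat[0] - p_hat[1]) / se
p_value = 2 * norm.sf(abs(z))              # two-sided p-value
print("z =", z, "p =", p_value)
```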


## <center>ANOVA</center>

ANOVA stands for Analysis of Variance, a statistical method used to compare the means of two or more groups. ANOVA tests whether the group means differ significantly by examining the variance between the groups and the variance within the groups.

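ANOVA is typically used when comparing the means of more than two groups; with exactly two groups, a one-way ANOVA is equivalent to a pooled two-sample t-test (F = t²). It is a parametric method that shares the t-test's assumptions of independence, normality, and homogeneity of variance; the Kruskal-Wallis test is a common non-parametric alternative. There are different types of ANOVA, such as one-way ANOVA, two-way ANOVA, and repeated measures ANOVA, each of which has its own specific application.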

The basic idea of ANOVA is to compare the amount of variance in the data that can be attributed to differences between groups (the between-group variance) with the amount of variance due to differences within groups (the within-group variance). The ratio of the two is the F statistic. If the between-group variance is much larger than the within-group variance, the F statistic is large and its p-value small, which suggests that the group means are not all equal, and the null hypothesis of equal means can be rejected.

__The null hypothesis in ANOVA is that there is no difference between the population means of the groups being compared__. Specifically, the null hypothesis for one-way ANOVA is that the population means of all groups are equal; the alternative is that at least one group mean differs from the others.

```python
import scipy.stats as stats
import pandas as pd

df = pd.DataFrame({'Group A': [85, 91, 76, 83, 89],
                   'Group B': [75, 80, 82, 78, 81],
                   'Group C': [90, 87, 92, 89, 94]})

# One-way ANOVA across the three groups
fvalue, pvalue = stats.f_oneway(df['Group A'], df['Group B'], df['Group C'])

print('F-value:', fvalue)
print('p-value:', pvalue)
```

```
F-value: 9.560975609756097
p-value: 0.00328614384744518
```
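The F-value above can be reproduced by hand as the ratio of the between-group mean square to the within-group mean square, with the p-value taken from the F distribution with (k - 1, N - k) degrees of freedom. A minimal sketch using the same three groups (the variable names are illustrative):

```python
import numpy as np
from scipy.stats import f as f_dist

groups = [np.array([85, 91, 76, 83, 89]),   # Group A
          np.array([75, 80, 82, 78, 81]),   # Group B
          np.array([90, 87, 92, 89, 94])]   # Group C

k = len(groups)                          # number of groups
n_total = sum(len(g) for g in groups)    # total number of observations
grand_mean = np.concatenate(groups).mean()

# Between-group sum of squares: spread of group means around the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: spread of observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

ms_between = ss_between / (k - 1)        # 313.6 / 2 = 156.8
ms_within = ss_within / (n_total - k)    # 196.8 / 12 = 16.4
f_value = ms_between / ms_within
p_value = f_dist.sf(f_value, k - 1, n_total - k)
print("F-value:", f_value)
print("p-value:", p_value)
```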


A significant ANOVA result only indicates that at least one group mean differs from the others. A post-hoc test such as Tukey's HSD (honestly significant difference) then compares every pair of groups while controlling the family-wise error rate, as in this example with toy data:

```python
import statsmodels.api as sm
from statsmodels.stats.multicomp import pairwise_tukeyhsd
import pandas as pd

# Create example data
data = pd.DataFrame({'Group': ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'D'],
                     'Values': [1, 2, 3, 4, 5, 6, 7, 8]})

# Perform ANOVA to test for differences between groups
model = sm.formula.ols('Values ~ Group', data=data).fit()
anova_table = sm.stats.anova_lm(model, typ=2)

# Perform Tukey test to compare all pairs of groups
tukey_results = pairwise_tukeyhsd(data['Values'], data['Group'])

print(anova_table)  # Print ANOVA table
print(tukey_results)  # Print Tukey results
```

```
          sum_sq   df          F    PR(>F)
Group       40.0  3.0  26.666667  0.004184
Residual     2.0  4.0        NaN       NaN
Multiple Comparison of Means - Tukey HSD, FWER=0.05
group1 group2 meandiff p-adj   lower  upper  reject
---------------------------------------------------
     A      B      2.0 0.1458 -0.8785 4.8785  False
     A      C      4.0 0.0164  1.1215 6.8785   True
     A      D      6.0 0.0037  3.1215 8.8785   True
     B      C      2.0 0.1458 -0.8785 4.8785  False
     B      D      4.0 0.0164  1.1215 6.8785   True
     C      D      2.0 0.1458 -0.8785 4.8785  False
---------------------------------------------------
```


## <center>Non-parametric testing</center>

Nonparametric testing is a family of statistical methods for making inferences from data that do not satisfy the assumptions of traditional parametric tests. Parametric tests assume that the data follow a specific probability distribution, usually the normal distribution, whose parameters are known or estimated from the data. Nonparametric tests, on the other hand, make fewer assumptions about the underlying distribution of the data (many of them work with ranks instead of raw values) and are therefore more robust to violations such as non-normality and outliers.

Nonparametric tests are often used when the data is ordinal or nominal, when the sample size is small, or when the data does not follow a normal distribution. Nonparametric tests are also useful when the research question focuses on differences in central tendency or variability rather than on specific numerical estimates of population parameters.

Here is a brief guide to some of the most common nonparametric tests:

- Wilcoxon signed-rank test: Used to compare two related samples (e.g., pre-test and post-test scores) to determine whether there is a significant difference between them.

- Mann-Whitney U test: Used to compare two independent samples to determine whether there is a significant difference between them.

- Kruskal-Wallis test: Used to compare three or more independent samples to determine whether there is a significant difference between them.

- Friedman test: Used to compare three or more related samples to determine whether there is a significant difference between them.

- Chi-square test: Used to test the association between two categorical variables.

- McNemar's test: Used to compare paired proportions, i.e. a binary outcome measured twice on the same subjects (e.g., before and after a treatment), to determine whether there is a significant change between the two measurements.
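The tests listed above can be run in Python roughly as follows. This is a minimal sketch, assuming SciPy for most of the tests and statsmodels for McNemar's test (`statsmodels.stats.contingency_tables.mcnemar`); all data are made up for illustration, and only the p-values are collected.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.contingency_tables import mcnemar

rng = np.random.default_rng(5)  # arbitrary seed
pre = rng.normal(50, 10, size=20)
post = pre + rng.normal(3, 5, size=20)        # related to `pre` (same subjects)
g1, g2, g3 = (rng.exponential(s, size=25) for s in (1.0, 1.5, 2.0))

results = {}
# Wilcoxon signed-rank: two related samples
results["wilcoxon"] = stats.wilcoxon(pre, post).pvalue
# Mann-Whitney U: two independent samples
results["mannwhitneyu"] = stats.mannwhitneyu(g1, g2, alternative="two-sided").pvalue
# Kruskal-Wallis: three or more independent samples
results["kruskal"] = stats.kruskal(g1, g2, g3).pvalue
# Friedman: three or more related samples (three measurements per subject here)
third = post + rng.normal(1, 5, size=20)
results["friedman"] = stats.friedmanchisquare(pre, post, third).pvalue
# Chi-square test of independence on a 2x2 contingency table of counts
table = np.array([[30, 20], [15, 35]])
chi2, p_chi2, dof, expected = stats.chi2_contingency(table)
results["chi2_contingency"] = p_chi2
# McNemar: paired binary outcomes; rows = before (yes/no), columns = after (yes/no)
paired = np.array([[40, 5], [15, 40]])
results["mcnemar"] = mcnemar(paired, exact=True).pvalue

for name, p in results.items():
    print(f"{name}: p = {p:.4f}")
```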

It's worth noting that this is not an exhaustive list, and there are many other nonparametric tests available depending on the research question and type of data. Additionally, it's important to consult the documentation of the specific test being used to determine the appropriate assumptions and test conditions.