# Statistical Hypothesis Testing

A statistical test is a procedure for deciding whether an assertion (e.g., a hypothesis) about a quantitative feature of a population is supported by the data. We test a hypothesis of this sort by drawing a random sample from the population in question and calculating an appropriate statistic on its items.

Statistical tests can be used to:
- determine whether a predictor variable has a statistically significant relationship with an outcome variable.
- estimate the difference between two or more groups.

Statistical tests assume a ```null hypothesis``` of no relationship or no difference between groups. Then they determine whether the observed data fall outside of the range of values predicted by the null hypothesis.

The null and alternative hypotheses are two competing claims that researchers weigh evidence for and against using a statistical test:
- **Null hypothesis (H0)**: There’s no effect in the population.
- **Alternative hypothesis (Ha or H1)**: There’s an effect in the population.

The effect is usually the effect of the [independent variable](https://www.scribbr.com/methodology/independent-and-dependent-variables/#independent) on the [dependent variable](https://www.scribbr.com/methodology/independent-and-dependent-variables/#dependent).

## Answering your research question with hypotheses

The null and alternative hypotheses offer competing answers to your research question. When the research question asks “Does the independent variable affect the dependent variable?”:

- The null hypothesis (H0) answers “No, there’s no effect in the population.”
- The alternative hypothesis (Ha) answers “Yes, there is an effect in the population.”

The null and alternative hypotheses are always claims about the population. That’s because the goal of hypothesis testing is to make inferences about a population based on a sample. Often, we infer whether there’s an effect in the population by looking at differences between groups or relationships between variables in the sample. Writing strong hypotheses is critical for your research.

You can use a statistical test to decide whether the evidence favors the null or alternative hypothesis. Each type of statistical test comes with a specific way of phrasing the null and alternative hypothesis. However, the hypotheses can also be phrased in a general way that applies to any test.

## What is a null hypothesis?

The null hypothesis is the claim that there’s no effect in the population.

If the sample provides enough evidence against the claim that there’s no effect in the population (p ≤ α), then we can [reject the null hypothesis](https://www.scribbr.com/statistics/hypothesis-testing/#step-4-decide-whether-to-reject-or-fail-to-reject-your-null-hypothesis). Otherwise, we fail to reject the null hypothesis.

Although “fail to reject” may sound awkward, it’s the only wording that statisticians accept. Be careful not to say you “prove” or “accept” the null hypothesis.

### P-value

In statistics, a p-value is a number that indicates how likely you are to obtain a result at least as extreme as the one actually observed if the null hypothesis is correct.

**KEY TAKEAWAYS**

- A p-value is a statistical measurement used to evaluate a hypothesis against observed data.
- A p-value measures the probability of obtaining results at least as extreme as those observed, assuming that the null hypothesis is true.
- The lower the p-value, the stronger the evidence against the null hypothesis.
- A p-value of 0.05 or lower is generally considered statistically significant.
- A p-value can serve as an alternative to, or in addition to, preselected confidence levels for hypothesis testing.
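
To make the definition concrete, here is a minimal sketch that computes a two-sided p-value using only the Python standard library. The observed statistic `z_observed`, the threshold `alpha`, and the choice of a standard normal null distribution are all assumptions made for illustration; they do not come from any dataset in this notebook.

```python
from statistics import NormalDist

# Hypothetical observed test statistic (illustrative value, not real data)
z_observed = 2.1
alpha = 0.05  # conventional significance level

# Assumption: under H0 the statistic follows a standard normal distribution
null_distribution = NormalDist(mu=0.0, sigma=1.0)

# Two-sided p-value: probability of a statistic at least as extreme as the
# observed one, in either direction, if H0 is true
p_value = 2 * (1 - null_distribution.cdf(abs(z_observed)))

print(f"p-value: {p_value:.4f}")
print("reject H0" if p_value <= alpha else "fail to reject H0")
```

A more extreme statistic (larger in absolute value) gives a smaller p-value, which is why small p-values count as evidence against the null hypothesis.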

To learn more about p-values, see this [document](https://www.investopedia.com/terms/p/p-value.asp#:~:text=A%20p%2Dvalue%20less%20than,null%20hypothesis%20is%20not%20rejected.)!

**Example: Population on trial**

Think of a statistical test as being like a legal trial. The population is accused of the “crime” of having an effect, and the sample is the criminal evidence. In the United States and many other countries, a person accused of a crime is assumed to be innocent until proven guilty. Similarly, we start by assuming the population is “innocent” of having an effect.

In other words, the null hypothesis (i.e., that there is no effect) is assumed to be true until the sample provides enough evidence to reject it.

Null hypotheses often include phrases such as “no effect,” “no difference,” or “no relationship.” When written in mathematical terms, they always include an equality (usually =, but sometimes ≥ or ≤).

You can never know with complete certainty whether there is an effect in the population. Some percentage of the time, your inference about the population will be incorrect. When you incorrectly reject the null hypothesis, it’s called a type I error. When you incorrectly fail to reject it, it’s a type II error.
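
The link between the significance level and type I errors can be illustrated with a small simulation. This is only a sketch under stated assumptions: the data are synthetic, both samples are drawn from the same normal population so that the null hypothesis is true by construction, and the sample size, seed, and number of repetitions are arbitrary choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
alpha = 0.05
n_experiments = 2000

false_rejections = 0
for _ in range(n_experiments):
    # Both samples come from the same population, so H0 is true by construction
    a = rng.normal(loc=0.0, scale=1.0, size=30)
    b = rng.normal(loc=0.0, scale=1.0, size=30)
    p_value = stats.ttest_ind(a, b).pvalue
    if p_value <= alpha:
        false_rejections += 1  # rejecting a true H0 is a type I error

type_i_rate = false_rejections / n_experiments
print(f"Estimated type I error rate: {type_i_rate:.3f} (alpha = {alpha})")
```

The estimated rate should land close to `alpha`: when the null hypothesis is true, the significance level is the long-run share of tests that reject it by mistake.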

## What is an alternative hypothesis?

The alternative hypothesis (Ha) is the other answer to your research question. It claims that there’s an effect in the population.

Often, your alternative hypothesis is the same as your research hypothesis. In other words, it’s the claim that you expect or hope will be true.

The alternative hypothesis is the complement to the null hypothesis. Null and alternative hypotheses are exhaustive, meaning that together they cover every possible outcome. They are also mutually exclusive, meaning that only one can be true at a time.

```Tip```

Be careful with your words when you report the results of a statistical test in a research paper or thesis. If you reject the null hypothesis, you can say that the alternative hypothesis is supported. On the other hand, if you fail to reject the null hypothesis, then you can say that the alternative hypothesis is not supported. Never say that you’ve proven or disproven a hypothesis.

Alternative hypotheses often include phrases such as “an effect,” “a difference,” or “a relationship.” When alternative hypotheses are written in mathematical terms, they always include an inequality (usually ≠, but sometimes < or >). As with null hypotheses, there are many acceptable ways to phrase an alternative hypothesis.
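
As an illustration, consider comparing the means of two groups. The symbols $\mu_1$ and $\mu_2$ for the two population means are notation assumed here for the example. The two-sided and one-sided versions of the hypotheses could then be written as:

$$
\begin{aligned}
\text{Two-sided:} \quad & H_0\colon \mu_1 = \mu_2 & \text{vs.} \quad & H_a\colon \mu_1 \neq \mu_2 \\
\text{One-sided:} \quad & H_0\colon \mu_1 \le \mu_2 & \text{vs.} \quad & H_a\colon \mu_1 > \mu_2
\end{aligned}
$$

In both cases the null hypothesis contains the equality and the alternative contains the strict inequality.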

See [here](https://www.scribbr.com/statistics/null-and-alternative-hypotheses/) for more details and examples!

If you already know what [types of variables](https://www.scribbr.com/methodology/types-of-variables/) you’re dealing with, you can use the flowchart to choose the right statistical test for your data.

## What does a statistical test do?

Statistical tests work by calculating a [test statistic](https://www.scribbr.com/statistics/test-statistic/), a number that describes how much the relationship between variables in your test differs from the null hypothesis of no relationship.

The test then calculates a [p-value](https://www.scribbr.com/statistics/p-value/) (probability value). The p-value estimates how likely it is that you would see a difference at least as large as the one described by the test statistic if the null hypothesis of no relationship were true.

If the value of the test statistic is more extreme than the critical value derived from the null hypothesis (equivalently, if the p-value is at or below your significance level α), then you can infer a [statistically significant](https://www.scribbr.com/statistics/statistical-significance/) relationship between the predictor and outcome variables.

If the value of the test statistic is less extreme than that critical value, then you cannot infer a statistically significant relationship between the predictor and outcome variables.
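
The two steps described above (compute a test statistic, then turn it into a p-value) can be sketched by hand for an independent two-sample t-test. The measurements below are made-up numbers for illustration, and the pooled-variance formula assumes that both groups have equal population variances. The final lines cross-check the manual result against `scipy.stats.ttest_ind`.

```python
import numpy as np
from scipy import stats

# Made-up measurements for two independent groups (illustrative only)
group_a = np.array([5.1, 4.9, 5.6, 5.8, 5.0, 5.4])
group_b = np.array([4.2, 4.8, 4.4, 4.9, 4.1, 4.6])

n_a, n_b = len(group_a), len(group_b)
dof = n_a + n_b - 2  # degrees of freedom

# Pooled variance (assumes equal population variances)
pooled_var = ((n_a - 1) * group_a.var(ddof=1) + (n_b - 1) * group_b.var(ddof=1)) / dof

# Test statistic: difference in sample means in units of its standard error
t_stat = (group_a.mean() - group_b.mean()) / np.sqrt(pooled_var * (1 / n_a + 1 / n_b))

# Two-sided p-value from the t distribution
p_value = 2 * stats.t.sf(abs(t_stat), dof)

# Cross-check against SciPy's implementation
reference = stats.ttest_ind(group_a, group_b)
print(t_stat, p_value)
print(reference.statistic, reference.pvalue)
```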

## When to perform a statistical test

- You can perform statistical tests on data that have been collected in a statistically valid manner – either through an experiment, or through observations made using probability sampling methods.

- For a statistical test to be valid, your sample size needs to be large enough to approximate the true distribution of the population being studied.

- To determine which statistical test to use, you need to know:

    - whether your data meets certain assumptions.
    - the types of variables that you’re dealing with.

To learn more about the different parametric tests that are available, please go through [this](https://www.scribbr.com/statistics/statistical-tests/) document!

**Let's move on to a hands-on session, where you will see how to use some of these tests and implement them in Python to get insights from the data!**

## Python libraries for statistical tests

The most widely used and best-supported Python libraries that implement the main statistical tests are:
- **statsmodels:** a Python module that provides classes and functions for the estimation of many different statistical models, as well as for conducting statistical tests and statistical data exploration.
- **Pingouin:** an open-source statistical package written in Python 3 and based mostly on Pandas and NumPy.
- **SciPy:** a Python library of open-source software for mathematics, science, and engineering; its ```scipy.stats``` module contains the statistical tests used below.

### Testing the assumptions

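The tests used below (the t-test and ANOVA) assume independence of observations, homogeneity of variances, and normality of the data. The independence assumption must be known a priori from how the data were collected; there is no way to infer it from the data. The other two assumptions can be checked with SciPy, using the Levene test for homogeneity of variances and the Shapiro-Wilk test for normality (the data can be downloaded [here](https://www.kaggle.com/datasets/webirlab/iris-data/data)):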

The [Levene test](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.levene.html) tests the null hypothesis that all input samples are from populations with equal variances. Three variations of Levene’s test are possible. The possibilities and their recommended usages are:

- ‘median’ : Recommended for skewed (non-normal) distributions. This is the default.

- ‘mean’ : Recommended for symmetric, moderate-tailed distributions.

- ‘trimmed’ : Recommended for heavy-tailed distributions.
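
The variant is chosen through the `center` argument of `scipy.stats.levene`. The sketch below runs all three variants on synthetic data; the two samples, their sizes, and the seed are assumptions made for illustration and are unrelated to the Iris data used next.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Synthetic samples with the same mean but different spread (illustrative only)
narrow = rng.normal(loc=0.0, scale=1.0, size=200)
wide = rng.normal(loc=0.0, scale=3.0, size=200)

results = {}
for center in ("median", "mean", "trimmed"):
    # `center` selects which variation of Levene's test is computed
    results[center] = stats.levene(narrow, wide, center=center)
    print(f"{center:>7}: statistic={results[center].statistic:.2f}, "
          f"p-value={results[center].pvalue:.3g}")
```

Because the two samples were generated with very different standard deviations, all three variants should reject the null hypothesis of equal variances here.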

```python
from scipy import stats
import pandas as pd

# Import the data
df = pd.read_csv("data/Iris_Data.csv")
setosa = df[df['species'] == 'Iris-setosa']
versicolor = df[df['species'] == 'Iris-versicolor']

# Levene test for homogeneity of variances
print(stats.levene(setosa['sepal_width'], versicolor['sepal_width']))

# Shapiro-Wilk test for normality
print(stats.shapiro(setosa['sepal_width']))
print(stats.shapiro(versicolor['sepal_width']))
```

```
LeveneResult(statistic=0.6635459332943233, pvalue=0.4172859681296204)
ShapiroResult(statistic=0.9686915278434753, pvalue=0.20464898645877838)
ShapiroResult(statistic=0.9741330742835999, pvalue=0.33798879384994507)
```


The Levene test is not significant (p ≈ 0.42), so we have no evidence against homogeneity of variances and can proceed.

Neither Shapiro-Wilk test was significant (p ≈ 0.20 and p ≈ 0.34), so we have no evidence that either variable violates the normality assumption. As for independence, we can assume it a priori, knowing the data. We can proceed as planned.

## T-test

The ```t-test``` is an inferential statistical test that determines whether there is a statistically significant difference between the means of two groups. The independent (unpaired) version compares two sample means from unrelated groups, which means that different people provide the scores for each group. The purpose of this test is to determine whether the two population means differ from each other. See [here](https://www.investopedia.com/terms/t/t-test.asp#:~:text=A%20t%2Dtest%20is%20an,flipping%20a%20coin%20100%20times.) to learn more about the t-test.

To conduct the independent ```t-test```, we can use the ```stats.ttest_ind()``` function:

```python
# Apply the test to the same dataset
stats.ttest_ind(setosa['sepal_width'], versicolor['sepal_width'])
```

```
Ttest_indResult(statistic=9.282772555558111, pvalue=4.362239016010214e-15)
```

The independent t-test result is significant (the p-value is far below 0.05). Therefore, we can reject the null hypothesis of equal means in favor of the alternative hypothesis.
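
By default, `stats.ttest_ind` assumes equal variances in the two groups. When that assumption is doubtful, passing `equal_var=False` runs Welch's t-test instead. The sketch below compares the two on synthetic data; the group means, spreads, sizes, and seed are assumptions chosen to make the variances clearly unequal, and the data are unrelated to the Iris measurements above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Synthetic groups with different sizes and clearly different variances
group_a = rng.normal(loc=10.0, scale=1.0, size=40)
group_b = rng.normal(loc=11.0, scale=4.0, size=15)

# Student's t-test: pools the two variances (assumes they are equal)
student = stats.ttest_ind(group_a, group_b)
# Welch's t-test: does not assume equal variances
welch = stats.ttest_ind(group_a, group_b, equal_var=False)

print(f"Student: t={student.statistic:.3f}, p={student.pvalue:.4f}")
print(f"Welch:   t={welch.statistic:.3f}, p={welch.pvalue:.4f}")
```

With unequal group sizes and variances the two versions can give noticeably different results, which is one reason to check homogeneity of variances before choosing between them.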

## ANOVA test

ANOVA, which stands for Analysis of Variance, is a statistical test used to analyze the difference between the means of more than two groups. ANOVA checks the impact of one or more factors by comparing the means of different samples.

Compared with the t-test, ANOVA can compare means among three or more groups, whereas a t-test compares the means of exactly two groups. ANOVA splits the total variation into between-group and within-group components and compares them with an F statistic. The t statistic is built on the same idea: it divides the difference between the two group means by a standard error estimated from the within-group variation.

To learn more about this test, read [here](https://www.scribbr.com/statistics/one-way-anova/#:~:text=ANOVA%20tells%20you%20if%20the,hours%20of%20sleep%20per%20night.).

To apply ANOVA, we rely on Pingouin. We use a dataset included in the library:

```python
# Install the pingouin package (run this in a notebook cell)
%pip install pingouin
```

```python
import pingouin as pg

# Read an example dataset
df = pg.read_dataset('mixed_anova')

# Run the ANOVA
aov = pg.anova(data=df, dv='Scores', between='Group', detailed=True)
print(aov)
```

```
   Source          SS   DF        MS         F   p-unc       np2
0   Group    5.459963    1  5.459963  5.243656  0.0232  0.028616
1  Within  185.342729  178  1.041251       NaN     NaN       NaN
```


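As we can see, the p-value (0.0232) is below the 0.05 threshold, so there is a significant difference between the groups. In this dataset ```Group``` has only two levels (DF = 1), so the result already tells us which groups differ. With more than two groups, however, a significant ANOVA does not tell you which of the groups differ from each other. To find out, you need to apply t-tests in pairs, ideally with a correction for multiple comparisons. This can be done with ```pingouin.pairwise_tests``` (named ```pingouin.pairwise_ttests``` in older Pingouin releases).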
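
To show the two-step procedure with more than two groups, here is a minimal sketch that uses SciPy instead of Pingouin: a one-way ANOVA with `scipy.stats.f_oneway`, followed by pairwise t-tests with a Bonferroni correction. The three groups, their names, and their means are synthetic assumptions for illustration, and Bonferroni is just one of several possible corrections.

```python
from itertools import combinations

import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Synthetic scores for three hypothetical groups; only "C" has a shifted mean
groups = {
    "A": rng.normal(loc=5.0, scale=1.0, size=50),
    "B": rng.normal(loc=5.0, scale=1.0, size=50),
    "C": rng.normal(loc=6.5, scale=1.0, size=50),
}

# One-way ANOVA: H0 is that all group means are equal
f_stat, p_value = stats.f_oneway(*groups.values())
print(f"ANOVA: F={f_stat:.2f}, p={p_value:.3g}")

# Post hoc pairwise t-tests with a Bonferroni correction:
# multiply each raw p-value by the number of comparisons, capped at 1
pairs = list(combinations(groups, 2))
adjusted = {}
for name_1, name_2 in pairs:
    p_raw = stats.ttest_ind(groups[name_1], groups[name_2]).pvalue
    adjusted[(name_1, name_2)] = min(1.0, p_raw * len(pairs))
    print(f"{name_1} vs {name_2}: adjusted p={adjusted[(name_1, name_2)]:.3g}")
```

The ANOVA only says that at least one mean differs; the pairwise comparisons are what point to group "C" as the one that stands apart.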

## Chi-Squared Test

The chi-squared test is a widely used statistical test for categorical data. It determines whether there is a significant difference between the observed frequencies and the expected frequencies in one or more categorical variables. It comes in two common forms: a goodness-of-fit test, which compares the observed frequencies of one categorical variable with the frequencies expected under a hypothesized distribution, and a test of independence, which checks whether two categorical variables are associated.

In summary, the t-test compares the means of two groups and ANOVA compares the means of several groups; both need a continuous dependent variable and a categorical independent variable. The chi-squared test, by contrast, works on counts and assesses the fit or association of categorical variables.

To learn more about the mathematics behind the chi-squared test, read through the following [document](https://www.simplilearn.com/tutorials/statistics-tutorial/chi-square-test).

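As an example, suppose 100 people were each asked to choose their favorite among five types of ice cream. If all five types were equally popular, we would expect 20 votes for each. Given the observed and expected frequencies, we can use the ```scipy.stats.chisquare``` function to perform a chi-squared goodness-of-fit test. The function takes the observed frequencies and, optionally, the expected frequencies (```f_exp```); if ```f_exp``` is omitted, all categories are assumed to be equally likely. Here is the code: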

```python
from scipy import stats

# Observed frequencies
observed_frequencies = [25, 30, 20, 15, 10]

# Expected frequencies
expected_frequencies = [20, 20, 20, 20, 20]

# Perform the chi-squared test
chi2, p = stats.chisquare(observed_frequencies, f_exp=expected_frequencies)

# Print the test statistic and p-value
print("Chi-squared test statistic:", chi2)
print("p-value:", p)
```

```
Chi-squared test statistic: 12.5
p-value: 0.013995792487650894
```


The function returns two values: the chi-squared test statistic and the p-value. The p-value is the probability that the test statistic would be as extreme as, or more extreme than, the one observed, assuming that the null hypothesis (that the observed frequencies follow the expected frequencies) is true. A small p-value (typically less than 0.05) indicates that the observed frequencies are significantly different from the expected frequencies, and that the null hypothesis can be rejected.

In this example, the p-value (about 0.014) is below 0.05, which means that we can reject the null hypothesis and conclude that there is a significant difference between the observed frequencies and the expected frequencies. This suggests that not all types of ice cream are equally popular among the individuals surveyed.
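
The example above is a goodness-of-fit test on a single categorical variable. For a test of independence between two categorical variables, SciPy provides `scipy.stats.chi2_contingency`, which takes a contingency table of counts. The table below is made up for illustration: the rows are two hypothetical age groups and the columns are three ice cream types.

```python
import numpy as np
from scipy import stats

# Hypothetical contingency table (made-up counts):
# rows = two age groups, columns = preferred ice cream type
observed = np.array([
    [30, 10, 20],
    [15, 25, 20],
])

# Returns the statistic, the p-value, the degrees of freedom, and the
# counts expected if the two variables were independent
chi2, p, dof, expected = stats.chi2_contingency(observed)

print("Chi-squared statistic:", chi2)
print("p-value:", p)
print("Degrees of freedom:", dof)
print("Expected counts under independence:")
print(expected)
```

Here the null hypothesis is that ice cream preference does not depend on age group. The expected counts are computed from the row and column totals, so they do not need to be supplied.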

**Overwhelming? I understand. You can go through each of the tests in detail later by following the links given, then try applying the tests to different data and see what you observe!**
- Each test has its own use cases, and you will remember them when such datasets come up. Just use this notebook as a reference then!
- At this point, you can go on with EDA, creating your own hypotheses and coming up with solutions and explanations, which gives you more freedom to explore the data.

There are many more statistical tests available. Here is a cheatsheet for you to read: https://machinelearningmastery.com/statistical-hypothesis-tests-in-python-cheat-sheet/