<h1 style="text-align:center;">Parametric Test Vs Non-Parametric Test</h1>


**<u>Author</u>** **:** [Younes Dahami](https://www.linkedin.com/in/dahami/)

# Introduction

You want to run a hypothesis test, but you are not sure what the difference is between **parametric** and **non-parametric** tests, or when to use each.

Before running a hypothesis test, you must first check its assumptions. One of the most common assumptions is that the data follow a **certain distribution**, usually the normal distribution.

Simply put :

* If your data is **normally distributed**, use **parametric tests**, e.g. the t-test, ANOVA, or Pearson correlation.

* If your data is **not normally distributed**, use **non-parametric tests**, e.g. the Mann-Whitney U test or Spearman correlation.

Of course, you still have to check whether the respective test has further assumptions. In general, though, non-parametric tests make fewer assumptions than parametric tests.

When their assumptions are met, parametric tests are generally more powerful than non-parametric tests.

## 1) Parametric Tests

**<u>Parametric tests</u>** **:** are statistical tests that make assumptions about the parameters of the population distribution from which the samples are drawn. Specifically, they assume that the data follows a certain distribution (usually a normal distribution) and often involve estimating parameters such as the mean and standard deviation.

Key Characteristics :

* **Assumption of Normality :** Data should be normally distributed or approximately normally distributed.
* **Homogeneity of Variances :** Assumes equal variances among the groups being compared.
* **Scale of Measurement :** Typically requires data that is at least interval scale.

**<u>Examples</u>** **:** t-tests (independent and paired), ANOVA (Analysis of Variance), and linear regression.

## 2) Non-Parametric Tests

**<u>Non-parametric tests</u>** **:** are statistical tests that do not assume a specific distribution for the data. They are used when the assumptions of parametric tests are not met, particularly when the data does not follow a normal distribution or when dealing with ordinal data or non-quantitative data.

Key Characteristics :

* **No Assumption of Distribution :** Does not require the data to follow a normal distribution.
* **Flexible with Data Types :** Can handle ordinal data, ranks, and non-quantitative data.

**<u>Examples</u>** **:** Mann-Whitney U test, Wilcoxon signed-rank test, Kruskal-Wallis test, and Spearman's rank correlation.
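
To illustrate why rank-based tests are less sensitive to the shape of the data, here is a minimal sketch. The two groups are made-up numbers chosen for this illustration, not data from this notebook. Because the Mann-Whitney U test only uses the ordering of the values, a strictly increasing transformation such as the logarithm leaves its result unchanged, whereas a t-test on the same data generally changes.

```python
import numpy as np
from scipy import stats

# Hypothetical positive measurements for two independent groups (no ties).
group_a = np.array([1.2, 3.4, 2.2, 8.9, 4.1, 5.6])
group_b = np.array([2.9, 7.5, 9.3, 6.8, 12.4, 10.1])

# Mann-Whitney U on the raw values.
raw = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

# Same test after a strictly increasing transformation (log).
# The values change, but their ordering, and hence their ranks, do not.
logged = stats.mannwhitneyu(np.log(group_a), np.log(group_b), alternative="two-sided")

print("Mann-Whitney raw    : U=%.1f, p=%.4f" % (raw.statistic, raw.pvalue))
print("Mann-Whitney logged : U=%.1f, p=%.4f" % (logged.statistic, logged.pvalue))

# The t-test works on the values themselves, so it reacts to the transformation.
t_raw = stats.ttest_ind(group_a, group_b)
t_log = stats.ttest_ind(np.log(group_a), np.log(group_b))
print("t-test raw    : t=%.3f, p=%.4f" % (t_raw.statistic, t_raw.pvalue))
print("t-test logged : t=%.3f, p=%.4f" % (t_log.statistic, t_log.pvalue))
```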

Parametric tests are more powerful when their assumptions are met because they use more information from the data, while non-parametric tests are more versatile and can be applied to a wider range of data types and distributions.
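
The power claim can be explored with a small simulation. This is only a sketch under assumptions I chose for illustration (normal data, equal variances, 20 observations per group, a true mean difference of 0.8 standard deviations, 1000 repetitions). It counts how often each test rejects $H_0$ when $H_0$ is in fact false. Under these conditions the t-test typically comes out slightly ahead, but the exact numbers depend on the seed and the settings.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
n_sims, n, shift, alpha = 1000, 20, 0.8, 0.05  # illustrative settings

reject_t = 0
reject_u = 0
for _ in range(n_sims):
    # Both groups are normal with equal variance; only the mean differs.
    x = rng.normal(loc=0.0, scale=1.0, size=n)
    y = rng.normal(loc=shift, scale=1.0, size=n)
    if stats.ttest_ind(x, y).pvalue < alpha:
        reject_t += 1
    if stats.mannwhitneyu(x, y, alternative="two-sided").pvalue < alpha:
        reject_u += 1

# Share of simulations in which each test detected the true difference.
power_t = reject_t / n_sims
power_u = reject_u / n_sims
print("Estimated power, t-test         : %.3f" % power_t)
print("Estimated power, Mann-Whitney U : %.3f" % power_u)
```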

|                                    | Parametric Tests        | Non-Parametric Tests         | 
| ---------------------------------- | ----------------------- | --------------------------   | 
| **One Sample**                       | Simple t-Test           | Wilcoxon Test for One Sample |   
| **Two dependent Samples**              | Paired Sample t-Test    | Wilcoxon Test                | 
| **Two independent Samples**            | Unpaired Sample t-Test  | Mann-Whitney U Test          |   
| **More than two independent Samples**  | One Factorial ANOVA     | Kruskal-Wallis Test          |   
| **More than two dependent Samples**    | Repeated Measures ANOVA | Friedman Test                |   
| **Correlation Between Two Variables**  | Pearson Correlation     | Spearman Correlation         |   
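
As a rough guide, the table above can be mapped to functions in `scipy.stats`. The dictionary keys and the helper `pick_test` are hypothetical names introduced for this sketch. SciPy has no repeated measures ANOVA, so that entry is left empty; `statsmodels` offers one as `AnovaRM`.

```python
from scipy import stats

# (design, parametric?) -> SciPy function.
# The string labels are made up for this sketch; the functions are real scipy.stats calls.
TESTS = {
    ("one sample", True): stats.ttest_1samp,
    ("one sample", False): stats.wilcoxon,       # pass (sample - hypothesised value)
    ("two dependent", True): stats.ttest_rel,
    ("two dependent", False): stats.wilcoxon,
    ("two independent", True): stats.ttest_ind,
    ("two independent", False): stats.mannwhitneyu,
    ("3+ independent", True): stats.f_oneway,
    ("3+ independent", False): stats.kruskal,
    ("3+ dependent", True): None,                # repeated measures ANOVA: not in scipy.stats
    ("3+ dependent", False): stats.friedmanchisquare,
    ("correlation", True): stats.pearsonr,
    ("correlation", False): stats.spearmanr,
}


def pick_test(design, parametric):
    """Look up the SciPy function for a design; raise if SciPy has none."""
    func = TESTS[(design, parametric)]
    if func is None:
        raise NotImplementedError("No scipy.stats function for this combination")
    return func


chosen = pick_test("two independent", parametric=False)
print(chosen.__name__)
```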



## 3) Hypothesis Testing 

Hypotheses are claims (assertions) that can be verified or refuted using statistical methods. Hypothesis testing organizes these assertions into a framework, allowing us to use statistical evidence to evaluate their validity. This process enables us to determine whether the claims hold true or not.

In this notebook, I will demonstrate hypothesis testing with Python through several step-by-step examples. But first, let me briefly explain the hypothesis testing process. If you prefer, you can skip directly to the questions.

### 3.1) Defining Hypotheses

First, we need to identify the scientific question we aim to answer and formulate it as the Null Hypothesis $(H_0)$ and the Alternative Hypothesis ($H_{1}$ or $H_{a}$). Remember that $(H_0)$ and $H_{1}$ must be mutually exclusive, and $H_{1}$ should not include equality :

$H_0 : \mu = x, \quad H_{1} : \mu \neq x$

$H_0 : \mu \leq x, \quad H_{1} : \mu > x$

$H_0 : \mu \geq x, \quad H_{1} : \mu < x$
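
In SciPy, these three forms correspond to the `alternative` argument of the test functions (available in SciPy 1.6 and later). The sketch below uses a made-up sample and a made-up reference value `x`, both chosen only for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical sample and reference value x (SciPy calls it popmean).
sample = np.array([5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.7, 5.2])
x = 5.0

# H1: mu != x
two_sided = stats.ttest_1samp(sample, popmean=x, alternative="two-sided")
# H1: mu > x
greater = stats.ttest_1samp(sample, popmean=x, alternative="greater")
# H1: mu < x
less = stats.ttest_1samp(sample, popmean=x, alternative="less")

print("two-sided p = %.4f" % two_sided.pvalue)
print("greater   p = %.4f" % greater.pvalue)
print("less      p = %.4f" % less.pvalue)
```

The two one-sided p-values add up to 1, and the two-sided p-value is twice the smaller of them.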


### 3.2) Assumption Check

To determine whether to use the **parametric** or **non-parametric** version of the test, we need to verify the following requirements :

* Observations in each sample are independent and identically distributed (iid).
* Observations in each sample are normally distributed.
* Observations in each sample have the same variance.
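
The last two requirements can be checked in code; independence cannot, since it depends on how the data were collected. Below is a minimal sketch, assuming the Shapiro-Wilk test for normality and Levene's test for equal variances (other tests could be used instead). The helper name `check_assumptions` and the simulated samples are hypothetical.

```python
import numpy as np
from scipy import stats


def check_assumptions(samples, alpha=0.05):
    """Run Shapiro-Wilk on each sample and Levene's test across all samples."""
    # Normality: no sample may have its normality hypothesis rejected.
    normal = True
    for s in samples:
        _, p_normal = stats.shapiro(s)
        if p_normal < alpha:
            normal = False
    # Homogeneity of variances across the groups.
    _, p_var = stats.levene(*samples)
    equal_var = bool(p_var >= alpha)
    return {"normal": normal, "equal_var": equal_var}


# Simulated data for illustration only.
rng = np.random.default_rng(42)
group_1 = rng.normal(loc=50, scale=5, size=30)
skewed = rng.exponential(scale=5, size=200)   # clearly non-normal

print(check_assumptions([group_1, skewed]))
```

If either entry is `False`, the non-parametric version of the test is the safer choice.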

### 3.3) Selecting the Proper Test

Next, we need to choose the appropriate test. It's crucial to consider the number of groups being compared and whether the data are paired. To determine if the data are paired, consider whether the measurements were collected from the same individuals. Based on these factors, you can select the appropriate test using the chart below.


![alt](tests_table.png)

### 3.4) Decision and Conclusion

After performing hypothesis testing, we obtain a **p-value** that indicates the significance of the test.

If the p-value is smaller than the alpha level (the significance level $\alpha$), there is enough evidence to reject $H_0$. Otherwise, we fail to reject $H_0$. Remember, rejecting $H_0$ supports $H_{1}$, but failing to reject $H_0$ does not confirm $H_0$'s validity nor does it indicate that $H_{1}$ is incorrect.
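
The decision rule itself is a one-line comparison. A minimal sketch, with a hypothetical helper name `decide`:

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 only when p < alpha."""
    if p_value < alpha:
        return "Reject H0"
    # Not rejecting H0 is not the same as proving it.
    return "Fail to reject H0"


print(decide(0.03))   # Reject H0
print(decide(0.20))   # Fail to reject H0
```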

![](pvalue.jpg)

## 4) Test Examples

### 4.1) Independent t-Test

![](1.png)

A university professor transitioned from face-to-face classes to online lectures due to COVID-19. Subsequently, he uploaded recorded lectures to the cloud for students who chose to follow the course asynchronously. Even so, he still believes that students who attend the live classes and actively participate are more successful. To investigate this, he recorded the average grades of students at the end of the semester. The data is presented below.

synchronous = [93.2, 85.3, 82.9, 68.5, 79.9, 80.6, 80.9, 77.2, 81.5, 79.1, 72.3, 88.1, 86.8, 94.1, 83.6, 78.7, 77.4, 70.6, 89.1, 75.6, 73.9, 81.2]

asynchronous = [76.9, 71.3, 91.4, 71.5, 75.3, 84.9, 68.1, 70.5, 75.7, 71.2, 65.9, 72.4, 71.1, 78.5]

Perform hypothesis testing to determine the statistical significance of the professor's belief, using a significance level of 0.05 to assess the null and alternative hypotheses. Before conducting hypothesis testing, verify the relevant assumptions. Provide commentary on the obtained results.

#### 1. Defining Hypotheses

Since the grades are obtained from different individuals, the data is unpaired.

* Null Hypothesis $(H_0) :$ The mean grade of students who attend live classes is less than or equal to the mean grade of students who watch recorded lectures.

$$H_0 : \mu_s \leq \mu_a$$

* Alternative Hypothesis $(H_1) :$ The mean grade of students who attend live classes is greater than the mean grade of students who watch recorded lectures.

$$H_1 : \mu_s > \mu_a$$


**Assumption Check :**

* Null Hypothesis $(H_0) :$ The data is normally distributed.
* Alternative Hypothesis $(H_1) :$ The data is not normally distributed.

Assuming $\alpha=0.05,$ if the p-value is $>0.05,$ we fail to reject $H_0$ and treat the data as approximately normally distributed; this is an absence of evidence against normality, not proof of it.
For assessing normality, I employed the **Shapiro-Wilk test,** which is typically favored for smaller samples. However, other options such as the Kolmogorov-Smirnov and D’Agostino and Pearson’s tests are available. Please refer to https://docs.scipy.org/doc/scipy/reference/stats.html for further details.
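
For comparison, here is a sketch of the two alternatives mentioned above, applied to the synchronous grades. One caveat: the Kolmogorov-Smirnov call below estimates the mean and standard deviation from the same sample, which makes its p-value conservative, so it should be read only as a rough check.

```python
import numpy as np
from scipy import stats

# Grades of the synchronous group, as listed above.
sync = np.array([93.2, 85.3, 82.9, 68.5, 79.9, 80.6, 80.9, 77.2, 81.5, 79.1, 72.3,
                 88.1, 86.8, 94.1, 83.6, 78.7, 77.4, 70.6, 89.1, 75.6, 73.9, 81.2])

# D'Agostino and Pearson's test, based on sample skewness and kurtosis.
stat_dp, p_dp = stats.normaltest(sync)

# Kolmogorov-Smirnov test against a normal distribution whose mean and
# standard deviation are estimated from the sample (see the caveat above).
stat_ks, p_ks = stats.kstest(sync, "norm", args=(sync.mean(), sync.std(ddof=1)))

print("D'Agostino-Pearson p value:%.4f" % p_dp)
print("Kolmogorov-Smirnov p value:%.4f" % p_ks)
```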

In [9]:
import numpy as np
from scipy import stats

In [10]:
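def normality_check(data):
    """Run the Shapiro-Wilk test and print whether normality is rejected at alpha = 0.05."""
    # stats.shapiro returns the test statistic and the p-value.
    test_stat_normality, p_value_normality = stats.shapiro(data)
    print("p value:%.4f" % p_value_normality)
    if p_value_normality < 0.05:
        print("Reject null hypothesis => The data is not normally distributed")
    else:
        # Failing to reject does not prove normality; it only finds no evidence against it.
        print("Fail to reject null hypothesis => The data is normally distributed")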

In [11]:
sync = np.array([93.2, 85.3, 82.9, 68.5, 79.9, 80.6, 80.9, 77.2, 81.5, 79.1, 72.3, 88.1, 86.8,
                 94.1, 83.6, 78.7, 77.4, 70.6, 89.1, 75.6, 73.9, 81.2])
asyncr = np.array([76.9, 71.3, 91.4, 71.5, 75.3, 84.9, 68.1, 70.5, 75.7, 71.2, 65.9, 72.4, 71.1, 78.5])
normality_check(sync)
normality_check(asyncr)

p value:0.9515
Fail to reject null hypothesis => The data is normally distributed
p value:0.0548
Fail to reject null hypothesis => The data is normally distributed
