# Paired T-Test

## What is the t-test for paired samples?
The dependent samples t-test (also called the paired samples t-test or dependent t-test) is a statistical test that determines whether there is a difference between two dependent groups or samples. More precisely, it tests whether the mean values of two dependent groups differ significantly from each other.

![Paired T-Test](images/paired_t-Test.png)

## Why do you need the dependent t-test?
You need the paired t-test whenever you survey the same group or sample at two points in time. For example, you might want to know whether a rehabilitation program improves physical fitness. Since you cannot test everyone who goes to rehab, you use a random sample. The paired t-test then lets you draw conclusions about the population from that sample.

![Dependent sample T-Test](images/dependent_sample_t-test.png)

## Dependent Samples t-test
### What are dependent or paired samples?
In dependent samples, the measured values come in pairs. The pairs result from repeated measurements of the same subjects or from matching, i.e. pairing similar subjects with each other (sometimes called parallelization). This is the case, for example, in longitudinal studies with several measurement points (time series analyses) or in intervention studies with experimental designs (before-after measurements).

An example of dependent sampling is measuring the weight of a group of people at two points in time. Each person can then be uniquely assigned a weight at the first and at the second measurement, and the difference between the two values can be calculated for each person. If there are more than two measurement points, a repeated-measures ANOVA is used instead.

### What is the advantage of a dependent t-test over an independent t-test?
Whether a dependent or an independent t-test is appropriate is determined by the study design; you cannot freely pick one test or the other for the same data. The real question is therefore which type of study makes more sense:

- A study with one group of participants who are each measured twice.
- A study with two separate groups of participants, each measured once.

The major advantage of a repeated-measures design analyzed with the paired t-test is that individual differences between participants are removed from the comparison, because each participant serves as their own control. As a result, the probability of detecting a statistically significant difference, if one exists, is usually higher with the paired t-test than with the independent t-test.
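
To make this advantage concrete, the sketch below computes both the paired t statistic and the pooled-variance (Student) independent t statistic on the same before/after scores, taken from the trainee example later in this article. Treating the two columns as independent groups is done here only for comparison; it would be the wrong test for this paired design. The function names `paired_t` and `independent_t` are illustrative, not part of any library.

```python
import math
from statistics import mean, variance

# Trainee aptitude scores from the worked example later in this article.
before = [75, 70, 46, 68, 68, 43, 55, 68, 77]
after = [70, 77, 57, 60, 79, 64, 55, 77, 76]


def paired_t(x, y):
    """t statistic of the paired t-test, computed from the differences y - x."""
    diffs = [b - a for a, b in zip(x, y)]
    return mean(diffs) / math.sqrt(variance(diffs) / len(diffs))


def independent_t(x, y):
    """Student's t statistic for two independent samples with pooled variance."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return (mean(y) - mean(x)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))


print(f"paired t:      {paired_t(before, after):.2f}")       # 1.63
print(f"independent t: {independent_t(before, after):.2f}")  # 0.96
```

Both statistics share the same numerator (a mean difference of 5 points), but the paired version divides by a much smaller standard error, because subtracting each trainee's own baseline removes the large spread in ability between trainees.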

![Dependent sample T-Test](<images/dependent t-test-vs-independent t-test.png>)

## Hypotheses
The hypotheses are derived from the research question. A hypothesis is a preliminary, not yet substantiated assumption that is to be tested. For the t-test for dependent samples, the hypotheses are:

- Null hypothesis H0: The mean values of the two dependent groups are equal, i.e. the mean of the paired differences is zero.
- Alternative hypothesis H1: The mean values of the two dependent groups differ, i.e. the mean of the paired differences is not zero.
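
Stated formally, let $ d_i = y_i - x_i $ be the difference for pair $ i $ (after minus before, as in the worked example below) and $ \mu_d $ the population mean of these differences. The two-sided hypotheses and the test statistic, which follows a t-distribution with $ n - 1 $ degrees of freedom under $ H_0 $, are then:

$ H_0: \mu_d = 0 \qquad H_1: \mu_d \neq 0 $

$ t = \frac{\bar{d}}{s_d / \sqrt{n}} $

Here $ \bar{d} $ and $ s_d $ are the mean and the sample standard deviation of the $ n $ observed differences.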

## Assumptions of the paired t-test
Before calculating the paired t-test, its assumptions must be checked. If assumption 2 or 3 (listed below) is not met, the Wilcoxon signed-rank test, the non-parametric counterpart of the paired t-test, should be used instead.

### 1. There are two dependent groups or samples
As the name paired t-test suggests, the groups must be dependent, i.e. each value in one group must be related to exactly one value in the other group.

- Fulfilled: The weight of one and the same person is measured before and after a diet.
- Not fulfilled: Researchers measure the weight of people who have been on a diet and of people who have not.

### 2. The variables are at least interval scaled
In the t-test for dependent samples, the difference between the two paired values is calculated first and then averaged. This only makes sense if the values are metric (at least interval scaled).

- Fulfilled: The salary of a person (in euros)
- Not fulfilled: The educational level of a person (an ordinal variable)

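### 3. The differences of the paired values are normally distributed
The differences between the paired values, not the raw values themselves, must be (approximately) normally distributed. With larger samples, the test is fairly robust to moderate departures from normality.

- Fulfilled: The difference between one person's weight at two points in time.
- Not fulfilled: The difference in points between two thrown dice, which can only take the discrete values -5 to 5.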
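
The Wilcoxon signed-rank test mentioned above works on the ranks of the paired differences instead of their raw values. The sketch below computes only its two rank sums, $ W^+ $ and $ W^- $, in plain Python, using the trainee scores from the problem statement below. Dropping zero differences and giving tied absolute differences their average rank are conventions assumed here, and the function name `signed_rank_sums` is hypothetical.

```python
def signed_rank_sums(x, y):
    """Return (W_plus, W_minus) of the Wilcoxon signed-rank test for pairs (x_i, y_i).

    Zero differences are dropped; tied absolute differences get the
    average of the ranks they span (conventions assumed in this sketch).
    """
    diffs = [b - a for a, b in zip(x, y) if b != a]
    # Indices of the differences, ordered by absolute size.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        # Extend j over the run of tied absolute values that starts at i.
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


before = [75, 70, 46, 68, 68, 43, 55, 68, 77]
after = [70, 77, 57, 60, 79, 64, 55, 77, 76]
print(signed_rank_sums(before, after))  # (29.0, 7.0)
```

Turning the smaller rank sum into a decision still requires a table of critical values or a statistics package; SciPy, for example, provides `scipy.stats.wilcoxon` for the complete test.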

## Problem Statement

The Human Resources Development (HRD) manager wishes to investigate whether there has been any change in the aptitude of trainees after undergoing a specific training program. To assess this, trainees take an aptitude test both before and after the training program. The HRD manager wants to determine if there is a significant difference in the test scores before and after the training program.

The data provided includes the test scores of nine trainees before and after the training program:

| Subject | Before (x) | After (y) |
|---------|------------|-----------|
| 1       | 75         | 70        |
| 2       | 70         | 77        |
| 3       | 46         | 57        |
| 4       | 68         | 60        |
| 5       | 68         | 79        |
| 6       | 43         | 64        |
| 7       | 55         | 55        |
| 8       | 68         | 77        |
| 9       | 77         | 76        |

The HRD manager wants to perform a hypothesis test at a 5% level of significance to determine if there is a significant difference in the test scores after the training program compared to before.

## Solution 

To determine whether the trainees' aptitude changed significantly after the training program, we use a paired samples t-test: each trainee is tested twice, so the before and after scores form pairs. The test assesses whether the mean difference between the scores after and before the training program is statistically significant.

Let's start by calculating the difference $ d = y - x $ (after minus before) for each subject:

| Subject | Before (x) | After (y) | Difference (y - x) |
|---------|------------|-----------|--------------------|
| 1       | 75         | 70        | -5                 |
| 2       | 70         | 77        | 7                  |
| 3       | 46         | 57        | 11                 |
| 4       | 68         | 60        | -8                 |
| 5       | 68         | 79        | 11                 |
| 6       | 43         | 64        | 21                 |
| 7       | 55         | 55        | 0                  |
| 8       | 68         | 77        | 9                  |
| 9       | 77         | 76        | -1                 |
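
The difference column can be reproduced with a few lines of Python; the list names are illustrative. The important detail is that the scores are paired by subject, so both lists must be in the same subject order.

```python
# Scores of the nine trainees from the table above (x = before, y = after).
before = [75, 70, 46, 68, 68, 43, 55, 68, 77]
after = [70, 77, 57, 60, 79, 64, 55, 77, 76]

# Pair the scores subject by subject and take d = y - x (after minus before).
# zip() relies on both lists listing the subjects in the same order.
differences = [y - x for x, y in zip(before, after)]
print(differences)  # [-5, 7, 11, -8, 11, 21, 0, 9, -1]
```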

First, let's compute $ \bar{d} $, the mean difference:

$ \bar{d} = \frac{{\sum (y - x)}}{n} $

$ \bar{d} = \frac{{(-5 + 7 + 11 - 8 + 11 + 21 + 0 + 9 - 1)}}{9} $

$ \bar{d} = \frac{{45}}{9} $

$ \bar{d} = 5 $

Next, let's calculate $ s_d $, the standard deviation of the differences:

$ s_d = \sqrt{\frac{{\sum (y - x - \bar{d})^2}}{n - 1}} $

$ s_d = \sqrt{\frac{{(-5 - 5)^2 + (7 - 5)^2 + (11 - 5)^2 + (-8 - 5)^2 + (11 - 5)^2 + (21 - 5)^2 + (0 - 5)^2 + (9 - 5)^2 + (-1 - 5)^2}}{9 - 1}} $

$ s_d = \sqrt{\frac{{100 + 4 + 36 + 169 + 36 + 256 + 25 + 16 + 36}}{8}} $

$ s_d = \sqrt{\frac{{678}}{8}} $

$ s_d = \sqrt{84.75} $

$ s_d \approx 9.21 $

Now, let's calculate the t-value using the formula:

$ t = \frac{{\bar{d}}}{{\frac{s_d}{\sqrt{n}}}} $

$ t = \frac{{5}}{{\frac{9.21}{\sqrt{9}}}} $

$ t = \frac{{5}}{{\frac{9.21}{3}}} $

$ t = \frac{{5}}{{3.07}} $

$ t \approx 1.63 $

Next, we need the critical value of the t-distribution with $ n - 1 = 8 $ degrees of freedom for a two-tailed test at the 5% level of significance (two-tailed, because the question asks about a change in either direction).

From the table, the critical value is 2.306.

![Table_t-distribution](images/Table_t-distribution.png)

Since the absolute value of the calculated t-value (1.63) is less than the critical value (2.306), we fail to reject the null hypothesis.

Conclusion: There is insufficient evidence to conclude that there is a significant difference in the test scores after the training program at a 5% level of significance.
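
The complete calculation can be checked with the Python standard library alone. In this sketch, the helper name `paired_t_test` is hypothetical, and the critical value 2.306 is hard-coded from the t-table rather than computed, since the standard library has no t-distribution quantile function.

```python
import math
from statistics import mean, stdev

# Trainee scores from the problem statement (x = before, y = after).
before = [75, 70, 46, 68, 68, 43, 55, 68, 77]
after = [70, 77, 57, 60, 79, 64, 55, 77, 76]

# Two-tailed critical value for df = 8 and alpha = 0.05, read from the t-table.
T_CRITICAL = 2.306


def paired_t_test(x, y):
    """Return (mean difference, SD of differences, t statistic) for d = y - x."""
    d = [b - a for a, b in zip(x, y)]
    d_bar = mean(d)
    s_d = stdev(d)  # sample standard deviation, divides by n - 1
    t = d_bar / (s_d / math.sqrt(len(d)))
    return d_bar, s_d, t


d_bar, s_d, t = paired_t_test(before, after)
print(f"d_bar = {d_bar:.2f}, s_d = {s_d:.2f}, t = {t:.2f}")  # d_bar = 5.00, s_d = 9.21, t = 1.63
if abs(t) > T_CRITICAL:
    print("Reject H0 at the 5% level")
else:
    print("Fail to reject H0 at the 5% level")
```

If SciPy is installed, `scipy.stats.ttest_rel(after, before)` should report the same t statistic together with a two-sided p-value, which avoids the table lookup.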