# Significance Tests

Parametric statistical methods assume that data samples follow a Gaussian distribution. This matters in applied machine learning whenever we compare data samples, for example to evaluate whether one technique performs better than another on one or more datasets. To answer that question and interpret the results, we use parametric hypothesis tests such as the Student's t-test and ANOVA (Analysis of Variance).

By the end of this tutorial, you'll have a clear understanding of:

- The Student's t-test: Used to quantify the difference between the means of two independent data samples.

- The paired Student's t-test: Applied when quantifying the difference between the means of two dependent data samples.

- ANOVA and repeated measures ANOVA: Useful for assessing the similarity or difference between the means of two or more data samples.

These significance tests are valuable tools for drawing meaningful conclusions from your data comparisons.


---



## Parametric Statistical Significance Tests

- Parametric statistical tests make assumptions about data samples, typically assuming they follow a Gaussian distribution.
  - Because many real-world datasets do approximate this distribution, parametric methods are widely used.
  - These tests help answer a common question: Do two or more data samples share the same underlying distribution?

- Parametric statistical significance tests start from the premise that the samples were drawn from the same Gaussian distribution, meaning the populations have the same mean and standard deviation, the two parameters that define this distribution.

- In general, each test computes a test statistic; interpreting it directly requires some knowledge of statistics and of the specific test being used.
  - These tests also provide a p-value, which is easier to interpret.
  - Think of the p-value as the probability of observing a difference between the samples at least as large as the one measured, under the assumption (null hypothesis) that they were drawn from populations with the same distribution.

- The p-value is assessed in the context of a chosen significance level, often referred to as alpha, commonly set at 5% (or 0.05). Here's how to interpret it:

  - If p-value ≤ alpha, it's a significant result, and you reject the null hypothesis (H0). This suggests that the data samples probably come from different distributions.

  - If p-value > alpha, it's not a significant result, and you fail to reject the null hypothesis (H0). The data are consistent with the samples being drawn from populations with the same distribution, although this does not prove that the distributions are identical.
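To make the role of alpha concrete, here is a minimal simulation sketch. It assumes both samples really do come from the same Gaussian distribution (so H0 is true in every trial) and uses SciPy's `ttest_ind()`, covered later in this tutorial. The number of trials and the variable names are illustrative choices, not part of the original example.

```python
from numpy.random import seed, randn
from scipy.stats import ttest_ind

# Seed the random number generator
seed(1)

alpha = 0.05
n_trials = 2000  # illustrative choice
rejections = 0

for _ in range(n_trials):
    # Both samples share the same mean (50) and standard deviation (5),
    # so the null hypothesis is true by construction
    sample_a = 5 * randn(100) + 50
    sample_b = 5 * randn(100) + 50
    stat, p = ttest_ind(sample_a, sample_b)
    if p <= alpha:
        rejections += 1

# Fraction of trials in which H0 was rejected even though it is true
false_positive_rate = rejections / n_trials
print('H0 rejected in %.1f%% of %d trials' % (100 * false_positive_rate, n_trials))
```

When H0 is true, the rejection rate should land close to alpha. In other words, the significance level is the false positive rate you agree to accept.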



---




## Test Data

Before we delve into specific parametric significance tests, let's start by defining a test dataset to illustrate each test.

- We'll create two samples, each drawn from a slightly different distribution.
- Both samples will be generated from Gaussian distributions using the NumPy function `randn()`.
- Each call to `randn(100)` returns 100 Gaussian random numbers with a mean of 0 and a standard deviation of 1.

- We'll modify these samples as follows:
  - Observations in the first sample will be adjusted to have a mean of 50 and a standard deviation of 5.
  - Observations in the second sample will be adjusted to have a mean of 51 and a standard deviation of 5.

- Our expectation is that the statistical tests will detect that these two samples were drawn from different distributions.
  - However, keep in mind that the means differ by only 1 while the standard deviation is 5, so with just 100 observations per sample, sampling noise may affect this decision.

- Here's the complete code example for generating and summarizing the test data:
  - Running this code will generate the data samples and print the mean and standard deviation of each, showing that the sample statistics differ:

```python
# Import necessary libraries
from numpy.random import seed
from numpy.random import randn
from numpy import mean, std

# Seed the random number generator
seed(1)

# Generate two sets of univariate observations
data1 = 5 * randn(100) + 50
data2 = 5 * randn(100) + 51

# Summarize the data
print('data1: mean=%.3f stdv=%.3f' % (mean(data1), std(data1)))
print('data2: mean=%.3f stdv=%.3f' % (mean(data2), std(data2)))
```

```
data1: mean=50.303 stdv=4.426
data2: mean=51.764 stdv=4.660
```


## Student’s t-Test

- The Student’s t-test is a statistical hypothesis test used to compare two independent data samples that are known to have a Gaussian distribution.
  - It's named after William Gosset, who used the pseudonym "Student."

- One of the most common uses of the t-test is the independent samples t-test.
  - It comes in handy when you want to compare the means of two independent samples for a specific variable.

- In this test, the null hypothesis (H0) assumes that the means of the two populations are equal.
  - If we reject this hypothesis, it means there's enough evidence to conclude that the means of the populations are different, and consequently, the distributions are not equal.

- Here's how to interpret the results of the t-test:

  - Fail to Reject H0: Indicates no significant difference between the sample means.
  - Reject H0: Suggests a significant difference between the sample means.

- You can perform the Student’s t-test in Python using the `ttest_ind()` function from SciPy.
  - This function takes two data samples as arguments and returns the calculated statistic and p-value.
  - The test assumes that both samples have the same variance. If they don't, you can use a corrected version of the test by setting the `equal_var` parameter to False.

- Here's a code example demonstrating the Student’s t-test on a test dataset, with the expectation that the test will reveal a difference in distribution between the two independent samples:

  - Running this code will calculate the Student’s t-test on the data samples and print the statistic and p-value.
  - In the example, the interpretation reveals that the sample means are different at the 5% significance level:


```python
# Student's t-test
from numpy.random import seed, randn
from scipy.stats import ttest_ind

# Seed the random number generator
seed(1)

# Generate two independent samples
data1 = 5 * randn(100) + 50
data2 = 5 * randn(100) + 51

# Compare the samples
stat, p = ttest_ind(data1, data2)
print('Statistics=%.3f, p=%.3f' % (stat, p))

# Interpret the result
alpha = 0.05
if p > alpha:
    print('Same distributions (fail to reject H0)')
else:
    print('Different distributions (reject H0)')
```
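To show what `ttest_ind()` computes, the following sketch rebuilds the statistic by hand from the pooled variance and compares it with SciPy's result. It also runs the corrected (Welch) version with `equal_var=False`. The names `pooled_var`, `t_manual`, and `p_manual` are illustrative, and the manual formula assumes the textbook equal-variance test.

```python
import numpy as np
from numpy.random import seed, randn
from scipy.stats import ttest_ind, t as t_dist

# Same test data as above
seed(1)
data1 = 5 * randn(100) + 50
data2 = 5 * randn(100) + 51

n1, n2 = len(data1), len(data2)
# Sample variances (ddof=1 divides by n - 1)
var1, var2 = data1.var(ddof=1), data2.var(ddof=1)

# Pooled variance: assumes both populations share one variance
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
t_manual = (data1.mean() - data2.mean()) / np.sqrt(pooled_var * (1 / n1 + 1 / n2))

# Two-sided p-value from the t distribution with n1 + n2 - 2 degrees of freedom
df = n1 + n2 - 2
p_manual = 2 * t_dist.sf(abs(t_manual), df)

# SciPy's equal-variance test and the corrected (Welch) version
stat, p = ttest_ind(data1, data2)
welch_stat, welch_p = ttest_ind(data1, data2, equal_var=False)

print('manual: t=%.3f, p=%.3f' % (t_manual, p_manual))
print('scipy:  t=%.3f, p=%.3f' % (stat, p))
print('welch:  t=%.3f, p=%.3f' % (welch_stat, welch_p))
```

Because both samples have the same size here, the Welch statistic equals the pooled statistic and only the degrees of freedom (and so the p-value) can differ slightly. With unequal sample sizes and variances the two versions can diverge more noticeably.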

## Paired Student’s t-Test

- Sometimes, we need to compare the means of two data samples that are somehow related.
  - For instance, these data samples may represent two evaluations of the same object, making them dependent or "paired" samples.
  - When dealing with such paired samples, we cannot use the standard Student’s t-test. Instead, we turn to a modified version of the test designed for dependent data, called the paired Student’s t-test.

- A dependent samples t-test is used to compare two means on a single dependent variable.
  - Unlike the standard independent samples test, a dependent samples t-test is specifically for comparing the means of a single sample measured twice or of two matched/paired samples.

- The test is simpler because it works on the differences within each pair, so it does not need to account for variation between subjects. Observations are typically made in pairs, for example before and after a treatment on the same subjects.
  - The null hypothesis (H0) of the test assumes there is no difference in means between the samples.
  - Rejecting this null hypothesis indicates that there is sufficient evidence that the sample means are different.

- Here's how to interpret the results:

  - Fail to Reject H0: Indicates that the paired sample distributions are equal.
  - Reject H0: Suggests that the paired sample distributions are not equal.

- You can perform the paired Student’s t-test in Python using the `ttest_rel()` function from SciPy.
  - This function takes two data samples as arguments and returns the calculated statistic and p-value.
  - Below is an example of the paired Student’s t-test on a test dataset. Although the samples are independent for this demonstration, we'll pretend they are paired to calculate the statistic:
    - Running this code will calculate the paired Student’s t-test on the data samples and print the statistic and p-value.
    - In this example, the interpretation suggests that the samples have different means and, therefore, different distributions:


```python
# Paired Student's t-test
from numpy.random import seed, randn
from scipy.stats import ttest_rel

# Seed the random number generator
seed(1)

# Generate two independent samples
data1 = 5 * randn(100) + 50
data2 = 5 * randn(100) + 51

# Compare the samples
stat, p = ttest_rel(data1, data2)
print('Statistics=%.3f, p=%.3f' % (stat, p))

# Interpret the result
alpha = 0.05
if p > alpha:
    print('Same distributions (fail to reject H0)')
else:
    print('Different distributions (reject H0)')
```

```
Statistics=-2.372, p=0.020
Different distributions (reject H0)
```
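One way to see why the paired test differs from the independent test is that it reduces to a one-sample t-test on the per-pair differences. The sketch below checks this equivalence with SciPy's `ttest_1samp()` and a hand-computed statistic. The names `diff` and `t_manual` are illustrative.

```python
import numpy as np
from numpy.random import seed, randn
from scipy.stats import ttest_rel, ttest_1samp

# Same test data as above, treated as paired observations
seed(1)
data1 = 5 * randn(100) + 50
data2 = 5 * randn(100) + 51

# Difference within each pair
diff = data1 - data2

# Paired test on the two samples
stat_rel, p_rel = ttest_rel(data1, data2)

# One-sample test: is the mean difference equal to zero?
stat_one, p_one = ttest_1samp(diff, 0.0)

# The same statistic by hand: mean difference divided by its standard error
t_manual = diff.mean() / (diff.std(ddof=1) / np.sqrt(len(diff)))

print('ttest_rel:   t=%.3f, p=%.3f' % (stat_rel, p_rel))
print('ttest_1samp: t=%.3f, p=%.3f' % (stat_one, p_one))
print('manual:      t=%.3f' % t_manual)
```

All three statistics agree, which shows that only the within-pair differences enter the test.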


## Analysis of Variance Test (ANOVA)

- In some cases, we may have multiple independent data samples and want to determine whether they all have the same distribution.
  - Rather than performing the Student’s t-test pairwise on each combination of data samples, we can use the analysis of variance test, known as ANOVA.
  - ANOVA is a statistical test that assumes the means across two or more groups are equal.
  - If the evidence suggests otherwise, the null hypothesis is rejected, indicating that at least one data sample has a different distribution.

- Here's how to interpret the results:

  - Fail to Reject H0: Indicates that all sample distributions are equal.
  - Reject H0: Suggests that one or more sample distributions are not equal.

- Importantly, ANOVA can only tell us whether all samples are the same or not; it cannot quantify which samples differ or by how much.

- The purpose of a one-way analysis of variance (one-way ANOVA) is to compare the means of two or more groups (the independent variable) on one dependent variable to see if the group means are significantly different from each other.

- The ANOVA test requires that the data samples follow a Gaussian distribution, that the samples are independent, and that all data samples have the same standard deviation.

- You can perform the ANOVA test in Python using the `f_oneway()` function from SciPy. This function takes two or more data samples as arguments and returns the test statistic (the F statistic) and the p-value.

- Here's a code example applying ANOVA to three independent samples, two with the same mean and a third with a slightly different mean:

  - Running this code calculates and prints the test statistic and the p-value.
  - In this example, the interpretation indicates that the null hypothesis is rejected, suggesting that one or more sample means are different:


```python
# Analysis of Variance Test (ANOVA)
from numpy.random import seed, randn
from scipy.stats import f_oneway

# Seed the random number generator
seed(1)

# Generate three independent samples
data1 = 5 * randn(100) + 50
data2 = 5 * randn(100) + 50
data3 = 5 * randn(100) + 52

# Compare the samples
stat, p = f_oneway(data1, data2, data3)
print('Statistics=%.3f, p=%.3f' % (stat, p))

# Interpret the result
alpha = 0.05
if p > alpha:
    print('Same distributions (fail to reject H0)')
else:
    print('Different distributions (reject H0)')
```

```
Statistics=3.655, p=0.027
Different distributions (reject H0)
```
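The F statistic reported by `f_oneway()` is the ratio of the variation between group means to the variation within groups. The following sketch recomputes it by hand for the same three samples. The variable names (`ss_between`, `ss_within`, `f_manual`, `p_manual`) are illustrative, and the formula is the standard one-way ANOVA decomposition.

```python
import numpy as np
from numpy.random import seed, randn
from scipy.stats import f_oneway, f as f_dist

# Same three samples as above
seed(1)
groups = [5 * randn(100) + 50, 5 * randn(100) + 50, 5 * randn(100) + 52]

k = len(groups)                       # number of groups
n_total = sum(len(g) for g in groups)  # total number of observations
grand_mean = np.concatenate(groups).mean()

# Variation of the group means around the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Variation of the observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

df_between = k - 1
df_within = n_total - k

# F is the ratio of the two mean squares
f_manual = (ss_between / df_between) / (ss_within / df_within)
p_manual = f_dist.sf(f_manual, df_between, df_within)

stat, p = f_oneway(*groups)
print('manual: F=%.3f, p=%.3f' % (f_manual, p_manual))
print('scipy:  F=%.3f, p=%.3f' % (stat, p))
```

A large F means the group means are spread out more than the within-group noise would explain, which is what leads to a small p-value.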


## Repeated Measures ANOVA Test

- In some cases, we may have multiple data samples that are related or dependent.
  - For example, we might collect measurements on the same subject at different time periods.
  - In such scenarios, the data samples are no longer independent; instead, they become paired or dependent samples.
  - Instead of repeatedly performing pairwise Student’s t-tests, we can use a single test to determine if all the samples have the same mean. This test is called the repeated measures ANOVA test.

- The default assumption or null hypothesis of this test is that all paired samples have the same mean, implying they share the same distribution.
  - If the data suggests otherwise, we reject the null hypothesis, indicating that one or more paired samples have different means.

- Here's how to interpret the results:

  - Fail to Reject H0: Indicates that all paired sample distributions are equal.
  - Reject H0: Suggests that one or more paired sample distributions are not equal.

- Repeated-measures ANOVA offers several advantages over paired t-tests. With this test, we can examine differences on a dependent variable measured at more than two time points, whereas a paired t-test only allows comparisons between two time points.

- Unfortunately, as of the time of writing, SciPy does not include a version of the repeated measures ANOVA test.
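In the absence of a SciPy function, a one-way repeated measures ANOVA can be sketched by hand with NumPy. The code below is a minimal illustration rather than a reference implementation: the function name `repeated_measures_anova()`, the simulated data (a per-subject baseline plus a shift of 2 in the third condition), and the noise levels are all assumptions made for this example, and no sphericity correction is applied. Other libraries, such as statsmodels with its `AnovaRM` class, offer a maintained implementation.

```python
import numpy as np
from numpy.random import seed, randn
from scipy.stats import f as f_dist, ttest_rel


def repeated_measures_anova(data):
    """One-way repeated measures ANOVA (hypothetical helper).

    data: 2-D array with one row per subject and one column per condition.
    Returns the F statistic and its p-value (no sphericity correction).
    """
    data = np.asarray(data, dtype=float)
    n_subjects, n_conditions = data.shape
    grand_mean = data.mean()

    # Variation explained by the conditions (the effect of interest)
    ss_conditions = n_subjects * ((data.mean(axis=0) - grand_mean) ** 2).sum()
    # Variation explained by stable differences between subjects
    ss_subjects = n_conditions * ((data.mean(axis=1) - grand_mean) ** 2).sum()
    # What remains after removing both is the error term
    ss_total = ((data - grand_mean) ** 2).sum()
    ss_error = ss_total - ss_conditions - ss_subjects

    df_conditions = n_conditions - 1
    df_error = (n_conditions - 1) * (n_subjects - 1)
    f_stat = (ss_conditions / df_conditions) / (ss_error / df_error)
    p_value = f_dist.sf(f_stat, df_conditions, df_error)
    return f_stat, p_value


# Simulated paired data: 100 subjects measured under 3 conditions
seed(1)
baseline = 5 * randn(100, 1) + 50   # each subject's own level
shift = np.array([0.0, 0.0, 2.0])   # third condition is shifted by 2
data = baseline + shift + 2 * randn(100, 3)

stat, p = repeated_measures_anova(data)
print('Statistics=%.3f, p=%.3f' % (stat, p))

# Sanity check: with only two conditions, F equals the squared paired t statistic
pair = data[:, [0, 2]]
f_pair, p_pair = repeated_measures_anova(pair)
t_pair, p_t = ttest_rel(pair[:, 0], pair[:, 1])
print('F=%.3f, t^2=%.3f' % (f_pair, t_pair ** 2))
```

Removing the between-subject variation from the error term is what distinguishes this test from the one-way ANOVA on independent samples, in the same way the paired t-test differs from the independent t-test.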

---



## Further Reading

### Books
- "Statistics in Plain English, Third Edition" (2010)

### API Documentation
- [scipy.stats.ttest_ind API](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html)
- [scipy.stats.ttest_rel API](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_rel.html)
- [scipy.stats.f_oneway API](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.f_oneway.html)

### Articles
- [Statistical Significance on Wikipedia](https://en.wikipedia.org/wiki/Statistical_significance)
- [Student’s t-test on Wikipedia](https://en.wikipedia.org/wiki/Student%27s_t-test)
- [Paired Difference Test on Wikipedia](https://en.wikipedia.org/wiki/Paired_difference_test)
- [Analysis of Variance on Wikipedia](https://en.wikipedia.org/wiki/Analysis_of_variance)
- [Repeated Measures Design on Wikipedia](https://en.wikipedia.org/wiki/Repeated_measures_design)

---
