In [31]:
from scipy.stats import ks_2samp
import pandas as pd
from scipy import stats  # the scipy.stats.stats submodule is deprecated
import numpy as np
from scipy.stats import shapiro

### For the results of a two sample t-test to be valid, the following assumptions should be met:
- The observations in one sample should be independent of the observations in the other sample. **Given**
- The data should be approximately normally distributed. **Will be tested**
- The two samples should have approximately the same variance. If this assumption is not met, you should instead perform Welch’s t-test. **Will be tested**
- The data in both samples was obtained using a random sampling method. **Given**

## Classification Results Real vs Synthetic

H0: The mean performance score on the real dataset is equal to the mean performance score on the synthetic dataset.

In [79]:
results_real = [47.1875,54.6875,51.40625,60.546875,52.65625,53.90625,56.015625,60.390625]
results_synthetic = [47.890625,51.25,49.84375,64.375,50.859375,53.984375,55.078125,62.5]

### Test for normality 
If the p-value of the Shapiro-Wilk test is larger than 0.05, we do not reject the null hypothesis of normality and treat the data as approximately normally distributed.

In [76]:
print(shapiro(results_real))
print(shapiro(results_synthetic))

ShapiroResult(statistic=0.95196932554245, pvalue=0.7310628294944763)
ShapiroResult(statistic=0.8799775838851929, pvalue=0.18823620676994324)
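The decision rule above can be written as a small helper. This is a minimal sketch: the function name `looks_normal` and the constant `ALPHA` are illustrative names, not part of the original analysis, and the threshold of 0.05 is the one assumed in the text.

```python
from scipy.stats import shapiro

results_real = [47.1875, 54.6875, 51.40625, 60.546875,
                52.65625, 53.90625, 56.015625, 60.390625]
results_synthetic = [47.890625, 51.25, 49.84375, 64.375,
                     50.859375, 53.984375, 55.078125, 62.5]

ALPHA = 0.05  # assumed significance level, same threshold as in the text


def looks_normal(sample, alpha=ALPHA):
    """Return True if Shapiro-Wilk fails to reject normality at level alpha."""
    _, p_value = shapiro(sample)
    return p_value > alpha


for name, sample in [("real", results_real), ("synthetic", results_synthetic)]:
    print(name, looks_normal(sample))
```

With only 8 observations per sample the test has little power, so a large p-value is weak evidence for normality rather than proof of it.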


### Test for same variance

In [80]:
print(np.var(results_real))
print(np.var(results_synthetic))

17.539119720458984
31.411361694335938


The variances differ noticeably (17.54 vs. 31.41), so equal variance is not assumed and Welch's t-test is performed.
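Comparing the two variances by eye can be complemented by a formal test. The sketch below uses Levene's test from `scipy.stats`, which is an addition and not part of the original analysis. It also prints the sample variances with `ddof=1`, because `np.var` defaults to `ddof=0` (population variance).

```python
import numpy as np
from scipy.stats import levene

results_real = [47.1875, 54.6875, 51.40625, 60.546875,
                52.65625, 53.90625, 56.015625, 60.390625]
results_synthetic = [47.890625, 51.25, 49.84375, 64.375,
                     50.859375, 53.984375, 55.078125, 62.5]

# Unbiased sample variances (divide by n - 1 instead of n)
var_real = np.var(results_real, ddof=1)
var_synthetic = np.var(results_synthetic, ddof=1)
print(var_real, var_synthetic)

# Levene's test: H0 is that both samples have equal variance.
# The default center='median' is the Brown-Forsythe variant.
levene_stat, levene_p = levene(results_real, results_synthetic)
print(levene_stat, levene_p)
```

Whatever this test reports, Welch's t-test remains a safe choice, since it does not rely on equal variances.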

In [78]:
stats.ttest_ind(results_real, results_synthetic, equal_var = False)

Ttest_indResult(statistic=0.04800803518032314, pvalue=0.9624418239297778)
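To make the result above easier to follow, Welch's statistic can be recomputed by hand: the difference of the means divided by the combined standard error, with the degrees of freedom from the Welch-Satterthwaite approximation. This is an illustrative sketch; the variable names are my own.

```python
import numpy as np
from scipy import stats

a = np.array([47.1875, 54.6875, 51.40625, 60.546875,
              52.65625, 53.90625, 56.015625, 60.390625])   # real
b = np.array([47.890625, 51.25, 49.84375, 64.375,
              50.859375, 53.984375, 55.078125, 62.5])      # synthetic

# Squared standard error of each sample mean (unbiased variance / n)
se2_a = a.var(ddof=1) / len(a)
se2_b = b.var(ddof=1) / len(b)

# Welch's t statistic
t_manual = (a.mean() - b.mean()) / np.sqrt(se2_a + se2_b)

# Welch-Satterthwaite degrees of freedom
df = (se2_a + se2_b) ** 2 / (
    se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
)

# Two-sided p-value from the t distribution
p_manual = 2 * stats.t.sf(abs(t_manual), df)

# Reference values from SciPy for comparison
t_scipy, p_scipy = stats.ttest_ind(a, b, equal_var=False)
print(t_manual, p_manual, df)
print(t_scipy, p_scipy)
```

The manual values should agree with `stats.ttest_ind(..., equal_var=False)` up to floating-point error.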

Typically, a significance level (e.g., 0.05) is chosen in advance. A p-value below this threshold suggests a statistically significant difference in mean performance between the two conditions or models.

Since the p-value (0.96) is greater than 0.05, we fail to reject H0: the data give no evidence that the mean performance differs between the real and the synthetic dataset.

## Classification Results Full vs Reduced Dataset

H0: The mean performance score on the full dataset is equal to the mean performance score on the reduced dataset.

In [61]:
full = [78.25,65.50,67.50,75.50,78.75,63,73,75,76.75,64.50,67,75]
reduced = [79.75,65,69.50,75,78.75,65,73,75,76.75,63,66.5,75]

### Test for normality 
If the p-value of the Shapiro-Wilk test is larger than 0.05, we do not reject the null hypothesis of normality and treat the data as approximately normally distributed.

In [77]:
print(shapiro(full))
print(shapiro(reduced))

ShapiroResult(statistic=0.8896287679672241, pvalue=0.11654853075742722)
ShapiroResult(statistic=0.9132462739944458, pvalue=0.23475410044193268)


### Test for same variance

In [66]:
print(np.var(full))
print(np.var(reduced))

30.14019097222223
30.952690972222225


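The variances are nearly equal (30.14 vs. 30.95). The scores are treated as matched pairs (the i-th entry of `full` with the i-th entry of `reduced`), so a paired t-test (`ttest_rel`) is performed.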

In [62]:
stats.ttest_rel(full, reduced)

Ttest_relResult(statistic=-0.670881385239266, pvalue=0.5161354818838517)
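`ttest_rel` is a paired test: it is equivalent to a one-sample t-test on the pairwise differences against a mean of zero. The sketch below illustrates this equivalence and also runs Shapiro-Wilk on the differences, since the paired test's normality assumption applies to them. The extra check is an addition, not part of the original analysis.

```python
import numpy as np
from scipy import stats

full = [78.25, 65.50, 67.50, 75.50, 78.75, 63, 73, 75, 76.75, 64.50, 67, 75]
reduced = [79.75, 65, 69.50, 75, 78.75, 65, 73, 75, 76.75, 63, 66.5, 75]

# Pairwise differences: the i-th full score minus the i-th reduced score
diffs = np.array(full) - np.array(reduced)

# Paired t-test on the two lists
t_rel, p_rel = stats.ttest_rel(full, reduced)

# One-sample t-test on the differences, H0: mean difference is 0
t_one, p_one = stats.ttest_1samp(diffs, 0.0)
print(t_rel, p_rel)
print(t_one, p_one)

# Normality check on the differences
_, p_norm_diffs = stats.shapiro(diffs)
print(p_norm_diffs)
```

Several differences here are exactly zero, so the normality check on the differences should be read with caution.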

Since the p-value (0.52) is greater than 0.05, we fail to reject H0: the data give no evidence that the mean performance differs between the full and the reduced dataset.