# Empirical Investigation: Statistical Test

In [1]:
import pandas as pd
from statsmodels.stats.weightstats import ttest_ind
from scipy import stats
import csv


In [23]:
with open('final_accs.csv', 'r', encoding='utf-8') as file:
    reader = csv.reader(file)
    header = next(reader)
    data = [row for row in reader]

df = pd.DataFrame(data, columns=header)

In [29]:
# csv.reader yields every field as a string, so convert all columns to float
df = df.astype(float)
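
The two cells above read the file with `csv.reader` and then convert the strings to floats. A minimal sketch of an alternative, assuming the file is a plain comma-separated table with a header row: `pd.read_csv` does both steps at once. The in-memory text below is a hypothetical two-row, two-column excerpt standing in for `final_accs.csv`, and `df_example` is an illustrative name.

```python
import io

import pandas as pd

# Hypothetical excerpt standing in for final_accs.csv (the real file has more columns and rows).
csv_text = """ff_train_acc,fb_train_acc
0.9487,0.99143
0.94768,0.99122
"""

# read_csv parses the header row and infers numeric dtypes in one step,
# so no separate astype(float) call is needed for purely numeric columns.
df_example = pd.read_csv(io.StringIO(csv_text))
print(df_example.dtypes)
```

With a real file, the `io.StringIO(csv_text)` argument would simply be replaced by the file path.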


In [25]:
df_runtime = df[['ff_train_time', 'ff_test_time', 'fb_train_time', 'fb_test_time']]
df_runtime

Unnamed: 0,ff_train_time,ff_test_time,fb_train_time,fb_test_time
0,106.953362,2.454056,4.020643,0.017794
1,106.817223,1.951378,4.012678,0.018571
2,106.863541,2.411181,4.040795,0.017637
3,107.069862,1.963288,4.019366,0.018249
4,107.060204,2.325379,4.039193,0.017631
5,106.90406,1.990357,4.055216,0.017784
6,107.00209,1.989709,4.028404,0.017763
7,106.97863,2.016014,4.020052,0.012162
8,106.884392,1.992291,4.040663,0.017647
9,106.964607,1.896359,4.013278,0.017724


In [31]:
# Compare with the output
mean_values = df_runtime.mean()
print(mean_values)

ff_train_time    106.957230
ff_test_time       2.075643
fb_train_time      4.028175
fb_test_time       0.017317
dtype: float64


In [27]:
df_accuracy = df[['ff_train_acc', 'ff_test_acc', 'fb_train_acc', 'fb_test_acc']]
df_accuracy

Unnamed: 0,ff_train_acc,ff_test_acc,fb_train_acc,fb_test_acc
0,0.9487,0.9494,0.99143,0.9736
1,0.94768,0.948,0.99122,0.9739
2,0.94838,0.9494,0.9906,0.9733
3,0.948,0.9495,0.99101,0.9714
4,0.9492,0.951,0.9906,0.9719
5,0.94848,0.9473,0.99067,0.9716
6,0.94622,0.9472,0.99155,0.9754
7,0.94736,0.9493,0.99164,0.9722
8,0.94832,0.9476,0.99157,0.9742
9,0.946,0.9447,0.9912,0.9723


In [32]:
mean_values_acc = df_accuracy.mean()
print(mean_values_acc)

ff_train_acc    0.947893
ff_test_acc     0.948491
fb_train_acc    0.991043
fb_test_acc     0.972882
dtype: float64


In [30]:
# Two-sample t-test on the training accuracies.
# ttest_ind returns a tuple: (t statistic, p-value, degrees of freedom)
result = ttest_ind(df_accuracy['ff_train_acc'], df_accuracy['fb_train_acc'])
print("p-value:", result)

p-value: (-125.75562264122729, 1.8218429834622805e-30, 20.0)


`ttest_ind` returns a tuple of three values rather than a single p-value: -125.75562264122729, 1.8218429834622805e-30, and 20.0. They can be interpreted as follows:

-125.75562264122729: This is the t statistic, the difference between the two sample means divided by its standard error. The sign gives the direction: a negative value means the first mean (ff_train_acc) is lower than the second (fb_train_acc), and a positive value means it is higher.

1.8218429834622805e-30: This is the two-sided p-value, the probability of obtaining a test statistic at least as extreme as the one observed, assuming the null hypothesis of equal means is true. A p-value this small means the observed difference would be extremely unlikely under the null hypothesis.

20.0: This is the degrees of freedom of the test. For the pooled-variance t-test that `ttest_ind` runs by default, it equals the total number of observations in both samples minus two (one estimated mean per sample). The degrees of freedom determine the shape of the t-distribution used to calculate the p-value.

The p-value of about 1.8e-30 is far below any conventional significance level, so the null hypothesis of equal mean training accuracy can be rejected: the difference between ff_train_acc (mean 0.947893) and fb_train_acc (mean 0.991043) is statistically significant.
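
To make the t statistic and the degrees of freedom concrete, here is a minimal sketch that computes both by hand with the pooled-variance formula, using only the standard library. The function name `pooled_t_statistic` is illustrative. It uses the ten training accuracies listed in the table above, so its degrees of freedom come out as 18 rather than the 20 reported, and the statistic will not match the reported value exactly.

```python
import math
import statistics

# Training accuracies of the ten runs listed in the accuracy table above.
ff_train_acc = [0.9487, 0.94768, 0.94838, 0.948, 0.9492,
                0.94848, 0.94622, 0.94736, 0.94832, 0.946]
fb_train_acc = [0.99143, 0.99122, 0.9906, 0.99101, 0.9906,
                0.99067, 0.99155, 0.99164, 0.99157, 0.9912]


def pooled_t_statistic(a, b):
    """Return (t statistic, degrees of freedom) for a pooled-variance t-test."""
    n_a, n_b = len(a), len(b)
    dof = n_a + n_b - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    pooled_var = ((n_a - 1) * statistics.variance(a)
                  + (n_b - 1) * statistics.variance(b)) / dof
    # Standard error of the difference between the two sample means.
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t_stat = (statistics.mean(a) - statistics.mean(b)) / std_err
    return t_stat, dof


t_stat, dof = pooled_t_statistic(ff_train_acc, fb_train_acc)
print("t statistic:", t_stat)
print("degrees of freedom:", dof)
```

The p-value is then the two-sided tail probability of this statistic under a t-distribution with `dof` degrees of freedom, which requires a statistics library such as the ones imported at the top of the notebook.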






In [28]:
# Two-sample t-test on the test accuracies.
# ttest_ind returns a tuple: (t statistic, p-value, degrees of freedom)
result = ttest_ind(df_accuracy['ff_test_acc'], df_accuracy['fb_test_acc'])
print("p-value:", result)

p-value: (-37.407938593481234, 5.472573092936269e-20, 20.0)


The result for the test data has the same structure as for the training data, a tuple of t statistic, p-value, and degrees of freedom: -37.407938593481234, 5.472573092936269e-20, and 20.0.

-37.407938593481234: The t statistic, the difference between the two sample means divided by its standard error. It is negative because the first mean (ff_test_acc) is lower than the second (fb_test_acc). Its magnitude is smaller than for the training data, which reflects the smaller gap between the two test accuracies.

5.472573092936269e-20: The two-sided p-value, the probability of obtaining a test statistic at least as extreme as the one observed, assuming the null hypothesis of equal means is true.

20.0: The degrees of freedom, the total number of observations in both samples minus two for the default pooled-variance test.

The p-value of about 5.5e-20 is again far below any conventional significance level, so the null hypothesis of equal mean test accuracy can be rejected: the difference between ff_test_acc (mean 0.948491) and fb_test_acc (mean 0.972882) is statistically significant.
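
The interpretation of both results can be sketched as a small helper that unpacks the three-element tuple and states the decision in words. This is only an illustration: `summarize` is a hypothetical helper, and the significance level of 0.05 is an assumption, since the notebook does not state one. The tuples are copied from the two outputs above.

```python
# (t statistic, p-value, degrees of freedom) as printed for the two comparisons.
train_result = (-125.75562264122729, 1.8218429834622805e-30, 20.0)
test_result = (-37.407938593481234, 5.472573092936269e-20, 20.0)

ALPHA = 0.05  # assumed significance level; the notebook does not state one


def summarize(name, result, alpha=ALPHA):
    """Unpack a (t, p, dof) tuple and describe the outcome in words."""
    t_stat, p_value, dof = result
    # The sign of the t statistic tells which sample mean is larger.
    direction = "lower" if t_stat < 0 else "higher"
    decision = "reject" if p_value < alpha else "fail to reject"
    return (f"{name}: t({dof:.0f}) = {t_stat:.2f}, p = {p_value:.2e}; "
            f"first mean is {direction}, {decision} H0 at alpha = {alpha}")


print(summarize("train", train_result))
print(summarize("test", test_result))
```

Unpacking the tuple into named variables also avoids labelling the whole tuple as the p-value when printing.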