In [107]:
# data handling
import pandas as pd

# statistic packages
from scipy import stats
from statistics import mean, stdev
from math import sqrt
from cliffs_delta import cliffs_delta

### Preparations

First, define some constants. This includes the significance level $\alpha = 0.05$: the null hypothesis of a test is rejected when the test yields a p-value below $\alpha$.

In [108]:
alpha = 0.05
treatments = ['MAUI', 'native']

task = "file"
metric = "CPU"
column = f'{task}_{metric}_high'

Next, load the data containing the observations.

In [109]:
data = {}
for t in treatments:
    data[t] = pd.read_excel(f'../data/observations/{t}.xlsx', sheet_name="output")

### Hypothesis testing

#### Test for Normality 

Now, perform the Shapiro-Wilk test to determine whether *both* samples are normally distributed. If $p \lt \alpha$, the null hypothesis of a normal distribution is rejected, i.e., the sample is treated as not normally distributed. Otherwise, the test provides no evidence against normality and the sample is treated as normally distributed.

In [110]:
p_normality = []

for t in treatments:
    _, p = stats.shapiro(data[t][column])
    print(f'The {t} data is {"*not* " if p<alpha else ""}normally distributed')

    p_normality.append(p)

The MAUI data is *not* normally distributed
The native data is normally distributed


Based on the results of the Shapiro-Wilk test, decide whether to use a parametric (unpaired t-test) or non-parametric (Mann-Whitney U test) hypothesis test. The parametric test is only used if *both* samples are normally distributed.

In [111]:
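# parametric only if normality was rejected for neither sample
# (same boundary as the check above: p < alpha means "not normal")
parametric_data: bool = all(p >= alpha for p in p_normality)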
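
The decision rule can also be written as a small helper that works for any number of samples. This is only a sketch: `choose_test` is a hypothetical name that does not appear elsewhere in this notebook, and it treats a p-value exactly equal to $\alpha$ as "normality not rejected", matching the message printed in the normality cell.

```python
from typing import Sequence


def choose_test(p_values: Sequence[float], alpha: float = 0.05) -> str:
    """Pick the hypothesis test from Shapiro-Wilk p-values (hypothetical helper)."""
    # Normality is rejected for a sample when its p-value is below alpha.
    all_normal = all(p >= alpha for p in p_values)
    return "Unpaired t-test" if all_normal else "Mann-Whitney U test"


# One rejected sample is enough to fall back to the non-parametric test.
print(choose_test([0.01, 0.40]))
print(choose_test([0.20, 0.40]))
```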

#### Null-hypothesis Test

Select the two samples for easier access.

In [112]:
s0 = data[treatments[0]][column]
s1 = data[treatments[1]][column]

Finally, perform the appropriate test and determine whether the two samples are significantly different.

In [113]:
p = 0

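if parametric_data:
    # unpaired (independent two-sample) t-test
    _, p = stats.ttest_ind(s0, s1)
else:
    # rank-based alternative that does not require normality
    _, p = stats.mannwhitneyu(s0, s1)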

print(f'According to the hypothesis test ({"Unpaired T-test" if parametric_data else "Mann-Whitney U test"}), there is {"a" if p<alpha else "no"} statistically significant difference between the {metric} usage in the \"{task}\" task (p-value: {p:.4}).')

According to the hypothesis test (Mann-Whitney U test), there is no statistically significant difference between the CPU usage in the "file" task (p-value: 0.353).
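
To illustrate what the Mann-Whitney U test measures, the following sketch computes the U statistic by comparing all pairs of observations. It is a naive, standard-library-only illustration on made-up numbers, not the routine SciPy uses internally, and it does not compute a p-value. The function name `mann_whitney_u` and the samples `a` and `b` are assumptions for this example.

```python
from typing import Sequence


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> float:
    """U statistic of sample x: number of pairs in which x wins, ties count half."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


# Made-up example samples (not taken from the observation data).
a = [1.0, 3.0, 5.0]
b = [2.0, 3.0, 4.0, 6.0]

u_a = mann_whitney_u(a, b)
u_b = mann_whitney_u(b, a)

# The two statistics always add up to the number of pairs, len(a) * len(b).
print(u_a, u_b, len(a) * len(b))
```

If both samples come from the same distribution, each U value is expected to be close to half the number of pairs. The test evaluates how unlikely the observed deviation from that value is.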


### Effect size

Additionally, calculate the effect size of the difference. For parametric data, calculate *Cohen's d*; otherwise, use *Cliff's delta*.
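
For reference, the two effect sizes can be written out as follows. The notation is an assumption made for this note and is not used elsewhere in the notebook: $x_1, \dots, x_n$ and $y_1, \dots, y_m$ are the observations of the first and second treatment, $\bar{x}$ and $\bar{y}$ their sample means, and $s_x$ and $s_y$ their sample standard deviations.

$$d = \frac{\bar{x} - \bar{y}}{\sqrt{\left(s_x^2 + s_y^2\right) / 2}}$$

$$\delta = \frac{\#\{(i, j) : x_i > y_j\} - \#\{(i, j) : x_i < y_j\}}{n \, m}$$

Averaging the two variances in the denominator of $d$ corresponds to the pooled standard deviation for samples of equal size. Cliff's delta lies between $-1$ and $1$, where $0$ means that neither sample tends to produce larger values than the other.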

In [114]:
delta = 0

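if parametric_data:
    # Cohen's d: mean difference divided by the pooled standard deviation
    # (simple average of both variances, which assumes equal sample sizes)
    delta = (mean(s0) - mean(s1)) / (sqrt((stdev(s0) ** 2 + stdev(s1) ** 2) / 2))
else:
    # Cliff's delta; the second return value (a size label) is not used here
    delta, _ = cliffs_delta(s0, s1)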

print(f'The difference has an effect size of {delta:.2} ({"Cohens d" if parametric_data else "Cliffs delta"}).')

The difference has an effect size of -0.17 (Cliffs delta).
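
To make the two effect sizes more tangible, the sketch below computes both with the standard library only. The functions `cliffs_delta_naive` and `cohens_d` and the samples `a` and `b` are made up for illustration. The quadratic pairwise loop follows the usual definition of Cliff's delta and is not necessarily how the `cliffs_delta` package computes it.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence


def cliffs_delta_naive(x: Sequence[float], y: Sequence[float]) -> float:
    """Share of pairs with x > y minus share of pairs with x < y."""
    greater = sum(1 for xi in x for yj in y if xi > yj)
    less = sum(1 for xi in x for yj in y if xi < yj)
    return (greater - less) / (len(x) * len(y))


def cohens_d(x: Sequence[float], y: Sequence[float]) -> float:
    """Mean difference divided by the averaged-variance pooled standard deviation."""
    pooled_sd = sqrt((stdev(x) ** 2 + stdev(y) ** 2) / 2)
    return (mean(x) - mean(y)) / pooled_sd


# Made-up example samples (not taken from the observation data).
a = [1.0, 3.0, 5.0]
b = [2.0, 3.0, 4.0, 6.0]

print(cliffs_delta_naive(a, b))  # 4 "greater" pairs, 7 "less" pairs, 12 pairs: -0.25
print(cohens_d(a, b))            # negative, since mean(a) < mean(b)
```

A negative value means that the first sample tends to contain smaller values than the second. Swapping the two samples flips the sign of both measures.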
