# Lab 5.04 - Two-sample t-test

In [3]:
# Package imports
import numpy as np                                  # "Scientific computing"
import scipy.stats as stats                         # Statistical tests

import pandas as pd                                 # Dataframe
import matplotlib.pyplot as plt                     # Basic visualisation
from statsmodels.graphics.mosaicplot import mosaic  # Mosaic plot
import seaborn as sns                               # Advanced dataviz

## Exercise 4 - Android Persistence libraries performance comparison

We analyzed the results of performance measurements for Android persistence libraries (Akin, 2016). Experiments were performed for different combinations of *DataSize* (Small, Medium, Large) and *PersistenceType* (GreenDAO, Realm, SharedPreferences, SQLite). For each data size, we were able to determine which persistence type yielded the best results.

Now we will verify whether the persistence type that looks best at first glance is also *significantly* better than the competition.

Specifically: using a two-sample t-test for each data size, verify that the mean of the best persistence type is significantly lower than the mean of both the second-best and the worst-scoring type.
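Written out, each comparison is a one-sided test. The symbols below are notation introduced here for illustration: $\mu_{\text{best}}$ is the population mean time of the best type, and $\mu_{\text{other}}$ that of the type it is compared with.

$$
H_0: \mu_{\text{best}} = \mu_{\text{other}} \qquad H_1: \mu_{\text{best}} < \mu_{\text{other}}
$$

Assuming the two groups may have unequal variances (Welch's variant, which is what `equal_var=False` selects in `stats.ttest_ind`), the test statistic is computed from the sample means $\bar{x}$, sample variances $s^2$ and sample sizes $n$:

$$
t = \frac{\bar{x}_{\text{best}} - \bar{x}_{\text{other}}}{\sqrt{\dfrac{s_{\text{best}}^2}{n_{\text{best}}} + \dfrac{s_{\text{other}}^2}{n_{\text{other}}}}}
$$

A strongly negative $t$ gives a small p-value and supports $H_1$.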

Can we maintain the conclusion that for a given data size, one persistence type is best, i.e. is significantly better than any other persistence type?

In [4]:
# 1. Load the data
df = pd.read_csv('https://raw.githubusercontent.com/HoGentTIN/dsai-labs/refs/heads/main/data/android_persistence_cpu.csv', sep=';')
df.head()

|   | Time | PersistenceType   | DataSize |
| :- | :--- | :---------------- | :------- |
| 0 | 1.81 | Sharedpreferences | Small    |
| 1 | 1.35 | Sharedpreferences | Small    |
| 2 | 1.84 | Sharedpreferences | Small    |
| 3 | 1.54 | Sharedpreferences | Small    |
| 4 | 1.81 | Sharedpreferences | Small    |


In [5]:


# 2. Function to find the best and second-best type per DataSize (lowest mean Time first)
def best_types(data, size):
    subset = data[data['DataSize'] == size]
    means = subset.groupby('PersistenceType')['Time'].mean().sort_values()
    best = means.index[0]
    second = means.index[1]
    return best, second

# 3. Comparisons and t-tests per DataSize: best vs. second best
import warnings
warnings.filterwarnings('ignore')  # Keep library warnings out of the notebook output
results = []
for size in ['Small', 'Medium', 'Large']:
    best, second = best_types(df, size)
    best_times = df[(df['DataSize'] == size) & (df['PersistenceType'] == best)]['Time']
    second_times = df[(df['DataSize'] == size) & (df['PersistenceType'] == second)]['Time']
    # One-sided test (H1: mean of best < mean of second); equal_var=False selects Welch's t-test
    t_stat, p_val = stats.ttest_ind(best_times, second_times, alternative="less", equal_var=False)
    print(f"{size}: {best} vs {second} -> p-value = {p_val:.7f}")
    results.append({'Data Size': size, 'Best': best, '2nd Best': second, 'p-value': round(p_val, 7)})

# 4. Optional: show the results as a table
pd.DataFrame(results)

Small: Realm vs Sharedpreferences -> p-value = 0.1699237
Medium: Realm vs GreenDAO -> p-value = 0.0002506
Large: Realm vs SQLLite -> p-value = 0.0016999


|   | Data Size | Best  | 2nd Best          | p-value  |
| :- | :-------- | :---- | :---------------- | :------- |
| 0 | Small     | Realm | Sharedpreferences | 0.169924 |
| 1 | Medium    | Realm | GreenDAO          | 0.000251 |
| 2 | Large     | Realm | SQLLite           | 0.0017   |


### Answers

The table below provides an overview of the best and second best persistence type for each data size (based on the sample mean).

| Data Size | Best  | 2nd Best          | p-value   |
| :-------- | :---- | :---------------- | :-------- |
| Small     | Realm | SharedPreferences | 0.1699237 |
| Medium    | Realm | GreenDAO          | 0.0002506 |
| Large     | Realm | SQLite            | 0.0016999 |

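The conclusion of Akin (2016) that Realm is the most efficient persistence type holds for the medium and large data sizes, where Realm is significantly faster than the second-best type. For the small data size, Realm still has the lowest sample mean, but the difference with SharedPreferences is not significant (p ≈ 0.17).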
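The exercise also asks for a comparison between the best and the worst-scoring type, which the code above does not run. A minimal sketch of how that could be done is given here, assuming the same column names as in the lab data. The helper `compare_best_with` and the `toy` data frame are hypothetical names introduced for illustration, and the numbers in `toy` are made up so that the block runs without downloading the CSV.

```python
import pandas as pd
from scipy import stats


def compare_best_with(data, size, rank):
    """One-sided Welch t-test of the best type against the type at
    position `rank` in the ranking by mean Time (1 = second best, -1 = worst)."""
    subset = data[data["DataSize"] == size]
    means = subset.groupby("PersistenceType")["Time"].mean().sort_values()
    best, other = means.index[0], means.index[rank]
    best_times = subset.loc[subset["PersistenceType"] == best, "Time"]
    other_times = subset.loc[subset["PersistenceType"] == other, "Time"]
    # H1: mean(best) < mean(other); equal_var=False selects Welch's variant
    result = stats.ttest_ind(best_times, other_times,
                             alternative="less", equal_var=False)
    return best, other, result.pvalue


# Made-up measurements, only to demonstrate the call (NOT the lab data)
toy = pd.DataFrame({
    "DataSize": ["Small"] * 9,
    "PersistenceType": ["A"] * 3 + ["B"] * 3 + ["C"] * 3,
    "Time": [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 5.0, 5.2, 5.4],
})

best, worst, p_val = compare_best_with(toy, "Small", rank=-1)
print(f"{best} vs {worst} -> p-value = {p_val:.7f}")
```

On the real data, calling `compare_best_with(df, size, rank=-1)` for each of `'Small'`, `'Medium'` and `'Large'` would give the best-versus-worst p-values. Those results are not shown here because they were not part of the original run.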

Note that we did not select a significance level in advance. However, the conclusions are the same for $\alpha$ = 0.1, 0.05 or even 0.01: the p-value for Small exceeds 0.1, while the p-values for Medium and Large are below 0.01.
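This robustness to the choice of $\alpha$ can be checked mechanically. The sketch below uses only the p-values printed in the cell output earlier; the helper name `decisions` is a hypothetical choice for illustration.

```python
# p-values copied from the printed output of the t-tests (best vs. second best)
p_values = {"Small": 0.1699237, "Medium": 0.0002506, "Large": 0.0016999}
alphas = (0.1, 0.05, 0.01)


def decisions(p_values, alphas):
    """For each data size, map every alpha to True if H0 is rejected (p < alpha)."""
    return {size: {a: p < a for a in alphas} for size, p in p_values.items()}


for size, per_alpha in decisions(p_values, alphas).items():
    print(size, per_alpha)
```

For each data size, the decision is identical across the three levels: the null hypothesis is never rejected for Small and always rejected for Medium and Large.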