# Lab 5.04 - Two-sample t-test

In [1]:
# Package imports
import numpy as np                                  # "Scientific computing"
import scipy.stats as stats                         # Statistical tests

import pandas as pd                                 # Dataframe
import matplotlib.pyplot as plt                     # Basic visualisation
from statsmodels.graphics.mosaicplot import mosaic  # Mosaic plot
import seaborn as sns                               # Advanced dataviz

## Exercise 4 - Android Persistence libraries performance comparison

We analyzed the results of performance measurements for Android persistence libraries (Akin, 2016). Experiments were performed for different combinations of *DataSize* (Small, Medium, Large) and *PersistenceType* (GreenDAO, Realm, SharedPreferences, SQLite). For each data size, we were able to determine which persistence type yielded the best results, i.e. the lowest mean *Time*.

Now we will verify whether the persistence type that looks best at first glance is also *significantly* better than the competition.

Specifically: for each data size, use a two-sample test to verify that the mean of the best persistence type is significantly lower than the mean of the second-best and of the worst-scoring type.

Can we maintain the conclusion that for a given data size, one persistence type is best, i.e. is significantly better than any other persistence type?
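For reference, here are the hypotheses and the test statistic involved. The analysis code passes `equal_var=False` to `ttest_ind`, which selects Welch's two-sample t-test (no assumption of equal variances). Take one data size. Write $B$ for the best type and $O$ for the type it is compared with; these labels are introduced here only for illustration. Then $\bar{x}$, $s^2$ and $n$ denote the sample mean, sample variance and sample size of each group:

$$H_0: \mu_B = \mu_O \qquad H_1: \mu_B < \mu_O$$

$$t = \frac{\bar{x}_B - \bar{x}_O}{\sqrt{\dfrac{s_B^2}{n_B} + \dfrac{s_O^2}{n_O}}} \qquad \nu \approx \frac{\left(\dfrac{s_B^2}{n_B} + \dfrac{s_O^2}{n_O}\right)^2}{\dfrac{\left(s_B^2/n_B\right)^2}{n_B - 1} + \dfrac{\left(s_O^2/n_O\right)^2}{n_O - 1}}$$

For this left-tailed alternative, the p-value is $P(T_\nu \le t)$, where $T_\nu$ follows a t distribution with $\nu$ degrees of freedom. $H_0$ is rejected when this p-value is below the chosen significance level $\alpha$.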

In [2]:
android = pd.read_csv(
    "https://raw.githubusercontent.com/HoGentTIN/dsai-labs/main/data/android_persistence_cpu.csv", 
    delimiter=';'
)
android.head()

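| Unnamed: 0 | Time | PersistenceType   | DataSize |
| :--------- | :--- | :---------------- | :------- |
| 0          | 1.81 | Sharedpreferences | Small    |
| 1          | 1.35 | Sharedpreferences | Small    |
| 2          | 1.84 | Sharedpreferences | Small    |
| 3          | 1.54 | Sharedpreferences | Small    |
| 4          | 1.81 | Sharedpreferences | Small    |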
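The `Unnamed: 0` column suggests that the CSV file stores its own row index as a first column with an empty header. That layout is an assumption, not something the lab states. If it holds, `index_col=0` tells pandas to use that column as the index instead of a data column. The sketch below checks this on two rows copied from the `head()` output.

```python
import io

import pandas as pd

# Two rows copied from the head() output above. The empty first header
# field is an assumption about how the real file is laid out.
csv_text = (
    ";Time;PersistenceType;DataSize\n"
    "0;1.81;Sharedpreferences;Small\n"
    "1;1.35;Sharedpreferences;Small\n"
)

# Default: the unnamed first column becomes a data column called 'Unnamed: 0'
without_index = pd.read_csv(io.StringIO(csv_text), delimiter=';')

# index_col=0: the first column is used as the row index instead
with_index = pd.read_csv(io.StringIO(csv_text), delimiter=';', index_col=0)

print(list(without_index.columns))
print(list(with_index.columns))
```

In the notebook, this would amount to adding `index_col=0` to the `pd.read_csv(...)` call above.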


In [3]:
print(android['DataSize'].unique())
print(android['PersistenceType'].unique())


['Small' 'Medium' 'Large']
['Sharedpreferences' 'GreenDAO' 'SQLLite' 'Realm']
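To see which type is best "at first glance" for each data size, the mean *Time* per combination can be laid out as a table. The sketch below does this with `groupby`/`unstack`. The helper names `mean_time_table` and `best_per_size` are hypothetical and not part of the lab.

```python
import pandas as pd


def mean_time_table(data: pd.DataFrame) -> pd.DataFrame:
    """Mean Time for every (DataSize, PersistenceType) combination.

    Rows are data sizes and columns are persistence types.
    """
    return (
        data.groupby(['DataSize', 'PersistenceType'])['Time']
        .mean()
        .unstack('PersistenceType')
    )


def best_per_size(data: pd.DataFrame) -> pd.Series:
    """Persistence type with the lowest mean Time for each data size."""
    return mean_time_table(data).idxmin(axis=1)
```

Usage: `mean_time_table(android)` shows all sample means, and `best_per_size(android)` returns the type with the lowest mean for each data size.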


In [None]:
from scipy.stats import ttest_ind
def compare_best_vs_second(data, size):
    subset = data[data['DataSize'] == size]

    # Mean time per persistence type, sorted from lowest (best) to highest
    means = subset.groupby('PersistenceType')['Time'].mean().sort_values()
    # `means` is a Series: the index holds the persistence type names and the
    # values hold the mean Time. If we only need the Time values, we can use:
    # means_values = means.values

    best = means.index[0]
    second_best = means.index[1]
    
    # Select the Time measurements of both types
    best_times = subset[subset['PersistenceType'] == best]['Time']
    second_times = subset[subset['PersistenceType'] == second_best]['Time']

    # Independent two-sample t-test (Welch's test, unequal variances);
    # by default, ttest_ind reports a two-sided p-value
    t_stat, p_value = ttest_ind(best_times, second_times, equal_var=False)
    
    return {
        'Data Size': size,
        'Best': best,
        '2nd Best': second_best,
        'p-value': round(p_value, 6)  # round for a cleaner table
    }

# Store the results
results = []
for size in ['Small', 'Medium', 'Large']:
    results.append(compare_best_vs_second(android, size))

# Convert to a DataFrame for a clear overview
result_df = pd.DataFrame(results)
print(result_df)

  Data Size   Best           2nd Best   p-value
0     Small  Realm  Sharedpreferences  0.339847
1    Medium  Realm           GreenDAO  0.000501
2     Large  Realm            SQLLite  0.003400
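The cell above compares the best type only with the second-best one, and it uses the default two-sided alternative. The exercise also asks for a comparison with the worst type and for a test of "significantly *lower*". The sketch below adds both by running one-sided Welch tests with `alternative='less'`. That argument exists in `scipy.stats.ttest_ind` from SciPy 1.6 onwards. The function name `compare_best_vs_others` and the result keys are hypothetical.

```python
import pandas as pd
from scipy.stats import ttest_ind


def compare_best_vs_others(data: pd.DataFrame, size: str) -> dict:
    """One-sided Welch tests: best type vs. second-best and vs. worst type.

    H1 for both tests: the mean Time of the best type is lower.
    """
    subset = data[data['DataSize'] == size]

    # Rank persistence types by mean Time (lowest = best)
    means = subset.groupby('PersistenceType')['Time'].mean().sort_values()
    best, second_best, worst = means.index[0], means.index[1], means.index[-1]

    def times(ptype: str) -> pd.Series:
        # All Time measurements for one persistence type
        return subset.loc[subset['PersistenceType'] == ptype, 'Time']

    # alternative='less' tests H1: mean(best) < mean(other)
    p_second = ttest_ind(times(best), times(second_best),
                         equal_var=False, alternative='less').pvalue
    p_worst = ttest_ind(times(best), times(worst),
                        equal_var=False, alternative='less').pvalue

    return {
        'Data Size': size,
        'Best': best,
        '2nd Best': second_best,
        'Worst': worst,
        'p (vs 2nd)': p_second,
        'p (vs worst)': p_worst,
    }
```

Usage: `pd.DataFrame([compare_best_vs_others(android, s) for s in ['Small', 'Medium', 'Large']])`. Because the best type is passed as the first sample, `alternative='less'` matches the direction stated in the exercise.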


### Answers

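The table below gives the best and second-best persistence type for each data size (based on the sample mean). It also gives the p-value of a one-sided test ($H_1$: the mean Time of the best type is lower than that of the second-best type). These p-values are half the two-sided p-values printed by the code above, because the best type always has the lower sample mean.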

| Data Size | Best  | 2nd Best          | p-value   |
| :-------- | :---- | :---------------- | :-------- |
| Small     | Realm | SharedPreferences | 0.1699    |
| Medium    | Realm | GreenDAO          | 0.0002506 |
| Large     | Realm | SQLite            | 0.0017    |
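The relation between the one-sided p-values in the table and the two-sided values printed earlier follows from the symmetry of the t distribution. If $t < 0$, the lower-tail p-value is half the two-sided one; otherwise, it is $1 - p/2$. The sketch below checks this numerically. The helper `one_sided_from_two_sided` and the random samples are hypothetical, used only for the check.

```python
import numpy as np
from scipy.stats import ttest_ind


def one_sided_from_two_sided(t_stat: float, p_two_sided: float) -> float:
    """p-value for H1: mean1 < mean2, derived from a two-sided t-test result."""
    return p_two_sided / 2 if t_stat < 0 else 1 - p_two_sided / 2


# Hypothetical samples, only to verify the relation numerically
rng = np.random.default_rng(42)
a = rng.normal(loc=1.0, scale=0.3, size=30)
b = rng.normal(loc=1.2, scale=0.3, size=30)

two_sided = ttest_ind(a, b, equal_var=False)
one_sided = ttest_ind(a, b, equal_var=False, alternative='less')

derived = one_sided_from_two_sided(two_sided.statistic, two_sided.pvalue)
print(derived, one_sided.pvalue)  # both values should agree
```

For the Small row, $t < 0$ because the best type is passed first and has the lower mean. So $0.339847 / 2 \approx 0.1699$, the value in the table.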

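The conclusion of Akin (2016), that Realm is the most efficient persistence type, holds for the medium and large data sets. There, Realm's mean Time is significantly lower than that of the second-best type. For the small data sets, however, the difference between Realm and SharedPreferences is not significant. So for that data size, we cannot conclude that one persistence type is significantly better than all others.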

Note that we have not explicitly selected a specific significance level in advance. However, for $\alpha$ = 0.1, 0.05 or even 0.01, the same conclusion can be drawn.
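The claim about the significance levels can be checked directly against the p-values in the answer table. The dictionary below copies those values. The helper name `significant_at` is hypothetical.

```python
# One-sided p-values (Realm vs. second-best type) from the answer table
p_values = {'Small': 0.1699, 'Medium': 0.0002506, 'Large': 0.0017}


def significant_at(alpha: float) -> dict:
    """For each data size: is Realm significantly better at level alpha?"""
    return {size: p < alpha for size, p in p_values.items()}


for alpha in (0.1, 0.05, 0.01):
    print(alpha, significant_at(alpha))
```

For all three levels, the outcome is the same: not significant for Small, significant for Medium and Large.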