# Lab 5.04 - Two-sample t-test

In [22]:
# Package imports
import numpy as np                                  # "Scientific computing"
import scipy.stats as stats                         # Statistical tests

import pandas as pd                                 # Dataframe
import matplotlib.pyplot as plt                     # Basic visualisation
from statsmodels.graphics.mosaicplot import mosaic  # Mosaic plot
import seaborn as sns                               # Advanced dataviz

## Exercise 4 - Android Persistence libraries performance comparison

We analyzed the results of performance measurements for Android persistence libraries (Akin, 2016). Experiments were performed for different combinations of *DataSize* (Small, Medium, Large) and *PersistenceType* (GreenDAO, Realm, SharedPreferences, SQLite). For each data size, we were able to determine which persistence type yielded the best results.
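Finding the best persistence type per data size can be sketched with a pivot table of mean times. This is only an illustration: the `demo` frame and its values are invented, and only the column names are taken from the dataset loaded further down.

```python
import pandas as pd

# Invented measurements, for illustration only (not the real dataset).
demo = pd.DataFrame({
    "Time": [1.8, 1.6, 1.5, 1.7, 7.4, 7.6, 5.8, 5.9],
    "PersistenceType": ["Sharedpreferences", "Sharedpreferences", "Realm", "Realm",
                        "GreenDAO", "GreenDAO", "Realm", "Realm"],
    "DataSize": ["Small"] * 4 + ["Medium"] * 4,
})

# Rows: data size, columns: persistence type, cells: mean time.
# Combinations without measurements show up as NaN.
mean_times = demo.pivot_table(index="DataSize", columns="PersistenceType",
                              values="Time", aggfunc="mean")

# Persistence type with the lowest mean time per data size (NaN is skipped).
fastest = mean_times.idxmin(axis=1)
print(mean_times)
print(fastest)
```

Because `idxmin` skips NaN by default, a persistence type that was not measured for a given data size does not interfere with the ranking.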

Now we will verify whether the persistence type that looks best at first glance is also *significantly* better than the competition.

Specifically: using a two-sample t-test for each data size, verify that the mean of the best persistence type is significantly lower than the means of both the second-best and the worst-scoring type.

Can we maintain the conclusion that for a given data size, one persistence type is best, i.e. is significantly better than any other persistence type?
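One possible way to organise this procedure is sketched below. The helper `compare_best` and the `demo` data are hypothetical and not part of the lab. The column names match the dataset used in this exercise, and `alternative="less"` tests the one-sided hypothesis that the mean of `a` is lower than the mean of `b`.

```python
import pandas as pd
from scipy import stats

def compare_best(df, data_size):
    """Test the best type against the second-best and the worst type.

    Hypothetical helper: returns the name of the type with the lowest
    sample mean and a dict {rival: one-sided p-value}.
    """
    subset = df[df["DataSize"] == data_size]
    # Types without measurements for this data size are simply absent here.
    means = subset.groupby("PersistenceType")["Time"].mean().sort_values()
    best, second, worst = means.index[0], means.index[1], means.index[-1]
    best_times = subset.loc[subset["PersistenceType"] == best, "Time"]

    p_values = {}
    for rival in (second, worst):
        rival_times = subset.loc[subset["PersistenceType"] == rival, "Time"]
        # H1: mean(best) < mean(rival)
        _, p = stats.ttest_ind(a=best_times, b=rival_times, alternative="less")
        p_values[rival] = p
    return best, p_values

# Invented data with clearly separated groups, for illustration only.
demo = pd.DataFrame({
    "Time": [1.0, 1.1, 0.9, 1.0, 2.0, 2.1, 1.9, 2.0, 3.0, 3.1, 2.9, 3.0],
    "PersistenceType": ["A"] * 4 + ["B"] * 4 + ["C"] * 4,
    "DataSize": ["Small"] * 12,
})
best, p_values = compare_best(demo, "Small")
print(best, p_values)
```

Putting the best type in `a` matters: swapping `a` and `b` tests the opposite hypothesis.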

In [23]:
androids = pd.read_csv('https://raw.githubusercontent.com/JelleLeus1996/data_science/main/data/android_persistence_cpu.csv', sep=';')
androids.head()

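| Unnamed: 0 | Time | PersistenceType   | DataSize |
| :--------- | :--- | :---------------- | :------- |
| 0          | 1.81 | Sharedpreferences | Small    |
| 1          | 1.35 | Sharedpreferences | Small    |
| 2          | 1.84 | Sharedpreferences | Small    |
| 3          | 1.54 | Sharedpreferences | Small    |
| 4          | 1.81 | Sharedpreferences | Small    |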


In [24]:
androids.PersistenceType.unique()

array(['Sharedpreferences', 'GreenDAO', 'SQLLite', 'Realm'], dtype=object)

In [31]:
def times(persistence_type, data_size):
    """Return the Time measurements for one persistence type and data size."""
    mask = (androids.PersistenceType == persistence_type) & (androids.DataSize == data_size)
    return androids[mask].Time

# --- Small data size: mean time per persistence type ---
shared_mean_small = times('Sharedpreferences', 'Small').mean()
greenDAO_mean_small = times('GreenDAO', 'Small').mean()
SQLLite_mean_small = times('SQLLite', 'Small').mean()
realm_mean_small = times('Realm', 'Small').mean()
print(f'mean of Sharedpreferences is : {shared_mean_small}')
print(f'mean of GreenDAO is : {greenDAO_mean_small}')
print(f'mean of SQLLite is : {SQLLite_mean_small}')
print(f'mean of Realm is : {realm_mean_small}')
small_mean = [shared_mean_small, greenDAO_mean_small, SQLLite_mean_small, realm_mean_small]
print(min(small_mean))
# alternative="less" tests H1: mean(a) < mean(b).
# Here a = Sharedpreferences (second best) and b = Realm (best), so this p-value
# belongs to "Sharedpreferences is faster than Realm", not the other way around.
test_statistiek, p_value = stats.ttest_ind(a=times('Sharedpreferences', 'Small'),
                                           b=times('Realm', 'Small'), alternative="less")
print(p_value)

# --- Medium data size ---
shared_mean_medium = times('Sharedpreferences', 'Medium').mean()
greenDAO_mean_medium = times('GreenDAO', 'Medium').mean()
SQLLite_mean_medium = times('SQLLite', 'Medium').mean()
realm_mean_medium = times('Realm', 'Medium').mean()
print(f'mean of Sharedpreferences is : {shared_mean_medium}')
print(f'mean of GreenDAO is : {greenDAO_mean_medium}')
print(f'mean of SQLLite is : {SQLLite_mean_medium}')
print(f'mean of Realm is : {realm_mean_medium}')
medium_mean = [shared_mean_medium, greenDAO_mean_medium, SQLLite_mean_medium, realm_mean_medium]
# Sharedpreferences has no Medium measurements, so its mean is NaN; drop NaN before min().
filtered_medium_list = [x for x in medium_mean if not pd.isna(x)]
print(min(filtered_medium_list))
# Same direction as above: a = GreenDAO (second best), b = Realm (best).
test_statistiek, p_value = stats.ttest_ind(a=times('GreenDAO', 'Medium'),
                                           b=times('Realm', 'Medium'), alternative="less")
print(p_value)

# --- Large data size: only the means are computed here, no t-test is run ---
shared_mean = times('Sharedpreferences', 'Large').mean()
greenDAO_mean = times('GreenDAO', 'Large').mean()
SQLLite_mean = times('SQLLite', 'Large').mean()
realm_mean = times('Realm', 'Large').mean()
print(f'mean of Sharedpreferences is : {shared_mean}')
print(f'mean of GreenDAO is : {greenDAO_mean}')
print(f'mean of SQLLite is : {SQLLite_mean}')
print(f'mean of Realm is : {realm_mean}')
large_mean = [shared_mean, greenDAO_mean, SQLLite_mean, realm_mean]
# Sharedpreferences has no Large measurements either; drop NaN before min().
filtered_large_list = [x for x in large_mean if not pd.isna(x)]
print(min(filtered_large_list))

mean of Sharedpreferences is : 1.6736666666666666
mean of GreenDAO is : 1.893666666666667
mean of SQLLite is : 1.799
mean of Realm is : 1.5989999999999998
1.5989999999999998
0.8300958070926483
mean of Sharedpreferences is : nan
mean of GreenDAO is : 7.4540000000000015
mean of SQLLite is : 7.794
mean of Realm is : 5.818000000000001
5.818000000000001
0.9997745724280098
mean of Sharedpreferences is : nan
mean of GreenDAO is : 12.110333333333333
mean of SQLLite is : 11.514999999999995
mean of Realm is : 10.651666666666667
10.651666666666667


### Answers

The table below provides an overview of the best and second best persistence type for each data size (based on the sample mean).

| Data Size | Best  | 2nd Best          | p-value   |
| :-------- | :---- | :---------------- | :-------- |
| Small     | Realm | SharedPreferences | 0.1699    |
| Medium    | Realm | GreenDAO          | 0.0002506 |
| Large     | Realm | SQLite            | 0.0017    |
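
The p-values printed by the code cell (about 0.8301 for Small and 0.9998 for Medium) were computed with the second-best type as `a` and Realm as `b`, so they belong to the opposite one-sided hypothesis. For a two-sample t-test the two one-sided p-values add up to 1, which gives 1 - 0.8301 ≈ 0.1699 for Small, in line with the table. For Medium the complement is roughly 0.00023, close to but not identical to the tabulated 0.0002506; the source does not state how the tabulated values were computed. The sketch below checks the complement relation on invented samples (the sample sizes, means and seed are arbitrary assumptions).

```python
import numpy as np
from scipy import stats

# Invented samples, for illustration only.
rng = np.random.default_rng(0)
sample_a = rng.normal(loc=1.6, scale=0.3, size=30)
sample_b = rng.normal(loc=1.7, scale=0.3, size=30)

# H1: mean(sample_a) < mean(sample_b)
_, p_a_less_b = stats.ttest_ind(a=sample_a, b=sample_b, alternative="less")
# H1: mean(sample_b) < mean(sample_a), i.e. the arguments swapped
_, p_b_less_a = stats.ttest_ind(a=sample_b, b=sample_a, alternative="less")

# The two one-sided p-values are complements of each other.
p_sum = p_a_less_b + p_b_less_a
print(p_a_less_b, p_b_less_a, p_sum)
```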

The conclusion of Akin (2016), which states that Realm is the most efficient persistence type, still holds for the medium and large data sizes. For the small data size Realm also has the lowest sample mean, but the difference with SharedPreferences is not significant.

Note that we did not select a significance level in advance. However, the same conclusion can be drawn for $\alpha = 0.1$, $0.05$ or even $0.01$.
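
This robustness to the choice of significance level can be checked directly against the p-values in the table. The snippet below is a small sketch; the variable names are illustrative.

```python
# p-values copied from the table above
p_values = {"Small": 0.1699, "Medium": 0.0002506, "Large": 0.0017}

# For each significance level, list the data sizes where H0 is rejected.
decisions = {}
for alpha in (0.1, 0.05, 0.01):
    decisions[alpha] = [size for size, p in p_values.items() if p < alpha]
    print(f"alpha = {alpha}: significant for {decisions[alpha]}")
```

At all three levels the difference is significant for Medium and Large but not for Small.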