### Description

This script loads and summarizes pre-processed reference datasets used in machine learning experiments.
It outputs grouped counts of class labels by **testing fold** and **device** for three different experiments:

* `"Bias"` - Models were trained on a biased dataset where species and imaging device were associated.

* `"TrainOlymp"` - Models were trained on Olympus microscope images and then tested on a test set and on phone images.

* `"TrainPhone"` - Models were trained on phone images and then tested on a test set and on microscope images.

### Input

* Pickle files named:
  `database_reference_MLREADY_<experiment>.pkl`
  (where `<experiment>` is one of the options above)

* Files should be located in: `utils/references/`

* Requires a `config.py` file in `utils/` that defines the `ROOT` project path variable.
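
As a rough illustration of the naming convention above, the sketch below builds the expected pickle path for a given experiment. The helper `reference_path` and its `root` parameter are hypothetical names introduced here for illustration; the notebook itself builds the path inline with `os.path.join`.

```python
import os

# Experiment names as listed in the description.
EXPERIMENTS = ("Bias", "TrainOlymp", "TrainPhone")


def reference_path(experiment, root="."):
    """Return the expected pickle path for one experiment (hypothetical helper)."""
    if experiment not in EXPERIMENTS:
        raise ValueError(f"Unknown experiment: {experiment!r}")
    filename = f"database_reference_MLREADY_{experiment}.pkl"
    return os.path.join(root, "utils", "references", filename)


path = reference_path("Bias")
# Report whether the file is present instead of failing later in read_pickle.
print(path, "(found)" if os.path.exists(path) else "(missing)")
```

Rejecting unknown names early gives a clearer error than a missing-file exception raised from inside pandas.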

### Output

* Prints a table of label counts grouped by:

  * `TESTING FOLD`
  * `Device`
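
To show what the grouped counts look like, here is a minimal sketch on made-up data. The column names `TESTING FOLD`, `Device`, and `LABEL` match the ones used in the notebook; the rows and the short device names are invented for illustration only.

```python
import pandas as pd

# Toy data standing in for the real reference table (values are made up).
toy = pd.DataFrame({
    "TESTING FOLD": [0, 0, 0, 1, 1, -1],
    "Device": ["phone", "phone", "microscope", "phone", "phone", "microscope"],
    "LABEL": ["aegypti", "aegypti", "koreicus", "japonicus", "aegypti", "koreicus"],
})

# Count labels within each (fold, device) group, as the notebook does.
counts = toy.groupby(["TESTING FOLD", "Device"])["LABEL"].value_counts()
print(counts)

# Optional: move the labels into columns for a wider, easier-to-scan table.
wide = counts.unstack(fill_value=0)
print(wide)
```

The `unstack` step is not part of the original notebook; it can make gaps easier to spot, since a label missing from a fold or device shows up as an explicit zero.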

### Usage Notes

* Change the `experiment` variable in the script to switch between datasets.
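
Instead of editing the variable by hand, the three summaries could also be produced in one loop. The sketch below is one possible way to do that, not the notebook's own approach: `summarize_experiment` and `ref_dir` are hypothetical names, and the demo writes made-up data to a temporary directory so that it runs without the real pickle files.

```python
import os
import tempfile

import pandas as pd

EXPERIMENTS = ("Bias", "TrainOlymp", "TrainPhone")


def summarize_experiment(experiment, ref_dir):
    """Load one reference pickle and count labels per fold and device."""
    path = os.path.join(ref_dir, f"database_reference_MLREADY_{experiment}.pkl")
    df_ref = pd.read_pickle(path)
    return df_ref.groupby(["TESTING FOLD", "Device"])["LABEL"].value_counts()


# Demo only: write identical toy tables under the expected file names.
with tempfile.TemporaryDirectory() as ref_dir:
    toy = pd.DataFrame({
        "TESTING FOLD": [-1, 0, 0],
        "Device": ["phone", "microscope", "microscope"],
        "LABEL": ["aegypti", "koreicus", "koreicus"],
    })
    for name in EXPERIMENTS:
        toy.to_pickle(os.path.join(ref_dir, f"database_reference_MLREADY_{name}.pkl"))

    # Compute all summaries while the temporary files still exist.
    summaries = {name: summarize_experiment(name, ref_dir) for name in EXPERIMENTS}

for name, counts in summaries.items():
    print(f"--- {name} ---")
    print(counts)
```

With the real data, `ref_dir` would point at `utils/references/` under the project root.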


In [2]:
import pandas as pd 
import os

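# Set the working directory to the project root so that relative paths
# such as utils/references/ resolve. The first chdir assumes the notebook
# is run from a folder one level below the root, where utils/ is importable.
os.chdir("..")
from utils.config import ROOT
os.chdir(ROOT)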

In [3]:
experiment = "Bias" # "Bias", "TrainOlymp", "TrainPhone"
df_ref = pd.read_pickle(os.path.join("utils", "references", "database_reference_MLREADY_{}.pkl".format(experiment)))
df_ref.groupby(["TESTING FOLD", "Device"])["LABEL"].value_counts()

TESTING FOLD  Device                       LABEL     
-1            macrolens + iphone se        aegypti       386
                                           koreicus      368
              olympus sz61 + olympus dp23  japonicus     380
                                           albopictus    373
 0            macrolens + iphone se        japonicus      79
                                           albopictus     76
              olympus sz61 + olympus dp23  aegypti        80
                                           koreicus       76
 1            macrolens + iphone se        japonicus      77
                                           albopictus     76
              olympus sz61 + olympus dp23  koreicus       77
                                           aegypti        77
 2            macrolens + iphone se        albopictus     75
                                           japonicus      75
              olympus sz61 + olympus dp23  aegypti        78
                               

In [4]:
experiment = "TrainOlymp" # "Bias", "TrainOlymp", "TrainPhone"
df_ref = pd.read_pickle(os.path.join("utils", "references", "database_reference_MLREADY_{}.pkl".format(experiment)))
df_ref.groupby(["TESTING FOLD", "Device"])["LABEL"].value_counts()

TESTING FOLD  Device                       LABEL     
-1            macrolens + iphone se        aegypti       386
                                           japonicus     378
                                           albopictus    374
                                           koreicus      368
 0            olympus sz61 + olympus dp23  aegypti        80
                                           japonicus      77
                                           albopictus     77
                                           koreicus       76
 1            olympus sz61 + olympus dp23  japonicus      78
                                           aegypti        77
                                           koreicus       77
                                           albopictus     76
 2            olympus sz61 + olympus dp23  aegypti        78
                                           albopictus     76
                                           japonicus      75
                               

In [5]:
experiment = "TrainPhone" # "Bias", "TrainOlymp", "TrainPhone"
df_ref = pd.read_pickle(os.path.join("utils", "references", "database_reference_MLREADY_{}.pkl".format(experiment)))
df_ref.groupby(["TESTING FOLD", "Device"])["LABEL"].value_counts()

TESTING FOLD  Device                       LABEL     
-1            olympus sz61 + olympus dp23  aegypti       394
                                           japonicus     380
                                           koreicus      374
                                           albopictus    373
 0            macrolens + iphone se        aegypti        79
                                           japonicus      79
                                           koreicus       77
                                           albopictus     76
 1            macrolens + iphone se        japonicus      77
                                           koreicus       77
                                           albopictus     76
                                           aegypti        75
 2            macrolens + iphone se        aegypti        77
                                           albopictus     75
                                           japonicus      75
                               