# Generate Testing Subsets

In [63]:
# TODO: find a good balance between the number of combinations and the generalization

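To test the performance of the proposed methods at different sizes of the device population, we adopt the following strategy: for each target population size $p = 2, \dots, P$, where $P$ is the number of devices in the original dataset, we produce $d = 10$ subsets by selecting $p$ devices at random. Duplicate subsets are then discarded, so fewer than $d$ subsets may remain for a given $p$, either because the same subset was drawn twice or because fewer than $d$ distinct subsets of size $p$ exist.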
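As a sketch of this construction (the symbols $\mathcal{D}$ and $\mathcal{S}_p$ are introduced here for illustration only), let $\mathcal{D}$ be the set of $P$ devices. For a target size $p$, the $d$ random draws form the family

$$
\mathcal{S}_p = \{ S_{p,1}, \dots, S_{p,d} \}, \qquad S_{p,i} \subseteq \mathcal{D}, \quad |S_{p,i}| = p .
$$

Because $\mathcal{S}_p$ is a set, repeated draws collapse into one element, and since only $\binom{P}{p}$ subsets of size $p$ exist,

$$
|\mathcal{S}_p| \le \min\left(d, \binom{P}{p}\right).
$$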

## Libraries and Configurations

Load the configuration file

In [64]:
from configparser import ConfigParser

config = ConfigParser()
config.read("../config.ini")
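The cells below read two keys from the `[DEFAULT]` section: `interim_path` and `reports_path`. The sketch below shows how `ConfigParser` resolves them, assuming a file shaped like `EXAMPLE_INI`; only the key names come from this notebook, while the values are hypothetical placeholders. It uses a separate `example_config` object so that running it does not overwrite `config`.

```python
from configparser import ConfigParser

# Hypothetical config.ini contents: the key names come from this notebook,
# the path values are placeholders.
EXAMPLE_INI = """
[DEFAULT]
interim_path = ../data/interim/
reports_path = ../reports
"""

example_config = ConfigParser()
example_config.read_string(EXAMPLE_INI)

# Keys in [DEFAULT] are accessed through example_config["DEFAULT"]
interim_path = example_config["DEFAULT"]["interim_path"]
reports_path = example_config["DEFAULT"]["reports_path"]

# The notebook concatenates strings, so interim_path needs a trailing slash,
# while reports_path is joined with an explicit "/".
print(interim_path + "balanced_df.csv")  # ../data/interim/balanced_df.csv
print(reports_path + "/CSV/subset_combinations/unique_combinations.csv")
```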

['../config.ini']

Import **data libraries**

In [65]:
import pandas as pd

Import **other libraries**

In [66]:
from rich.progress import Progress
from rich import traceback

traceback.install()


In [67]:
from itertools import combinations

import random

random.seed(42)
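`random.seed(42)` fixes the global generator, so the subsets are reproducible only if the label order and the number of earlier calls to `random` stay the same. A minimal sketch of an alternative, assuming a dedicated `random.Random` instance is acceptable: the draws no longer depend on other code that touches the global generator. The labels here are toy values, not the real device names.

```python
import random

toy_labels = ["A", "B", "C", "D", "E"]  # toy labels, not the real device names

# Two generators seeded with 42 produce identical draws, independently of
# any other use of the global random module.
rng_1 = random.Random(42)
rng_2 = random.Random(42)

draws_1 = [rng_1.sample(toy_labels, 3) for _ in range(5)]
draws_2 = [rng_2.sample(toy_labels, 3) for _ in range(5)]
print(draws_1 == draws_2)  # True
```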

Custom helper scripts

In [68]:
%cd ..
from scripts import plotHelper, encodingHelper
%cd data_exploration_cleaning
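The `%cd` calls temporarily move to the parent folder so that the `scripts` package can be imported. A sketch of an alternative that leaves the working directory unchanged, assuming (as the `%cd` calls suggest) that the notebook runs from `notebooks/data_exploration_cleaning` and that `scripts` lives in `notebooks`:

```python
import sys
from pathlib import Path

# Assumption: the helper package sits in the parent of the current directory.
project_dir = Path.cwd().parent

# Make the parent directory importable without changing the working directory.
if str(project_dir) not in sys.path:
    sys.path.insert(0, str(project_dir))

# from scripts import plotHelper, encodingHelper  # should now resolve
```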



## Import Data

In [69]:
# Combined dataframe
balanced_df_csv = config["DEFAULT"]["interim_path"] + "balanced_df.csv"

In [70]:
df = pd.read_csv(balanced_df_csv, index_col=0)
df["Timestamp"] = pd.to_datetime(df["Timestamp"])

We are only interested in the device labels, which are used to generate the subsets.

In [71]:
unique_labels = df["Label"].unique()
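Since only the labels are needed, `pd.read_csv` can load just that column through `usecols`. The sketch below uses a toy CSV in place of `balanced_df.csv`; the `Feature_1` column and its values are placeholders, not the real dataset schema. `unique()` returns the labels in order of first appearance, which is also the order `random.sample` later draws from.

```python
import io

import pandas as pd

# Toy CSV standing in for balanced_df.csv; Feature_1 is a placeholder column.
csv_text = """,Timestamp,Label,Feature_1
0,2021-01-01 10:00:00,iPhone7_F,0.5
1,2021-01-01 10:00:01,SamsungS7_I,0.7
2,2021-01-01 10:00:02,iPhone7_F,0.4
"""

# Read only the Label column; timestamps and features are not needed here.
labels_only = pd.read_csv(io.StringIO(csv_text), usecols=["Label"])
toy_unique_labels = labels_only["Label"].unique()
print(toy_unique_labels.tolist())  # ['iPhone7_F', 'SamsungS7_I']
```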

In [72]:
for label in unique_labels:
    print(label, " ", end="")

iPhone12Pro_C  S21Ultra_M  iPhone11_M  iPhoneXR_A  iPhone7_F  iPhone12_M  OppoFindX3Neo_A  iPhone11_F  iPhoneXR_L  iPhone11_B  iPhone11_C  OnePlusNord_O  HuaweiP10_Q  iPhoneXR_U  GooglePixel3A_V  XiaomiRedmiNote9S_T  SamsungJ6_K  iPhone7_X  XiaomiA2_E  SamsungM31_A  iPhone12_W  GooglePixel3A_L  SamsungS7_I  HuaweiL21_D  iPhoneXSMax_M  iPhone6_N  SamsungS6_H  HuaweiHonor9_R  HuaweiP20_G  SamsungS4_C  XiaomiRedmi4_B  XiaomiRedmi5_J  XiaomiRedmiNote7_S  

In [73]:
max_devices = len(unique_labels)
print("Number of devices in the dataset:", max_devices)

Number of devices in the dataset: 33
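With $P = 33$ devices, the number of distinct subsets of size $p$ is $\binom{33}{p}$, which is huge for mid-range $p$ but drops to 1 for $p = P$. The sketch below computes a few of these counts and, with a hypothetical helper `prob_all_distinct`, the probability that $d$ independent uniform draws of size $p$ are all different. Even for small $p$ this probability is below 1, which is why the deduplication step can leave fewer than $d$ subsets.

```python
from math import comb, prod

P = 33  # number of devices reported above
d = 10  # subsets drawn per cardinality

# Number of distinct subsets of size p for a few cardinalities
for p in (2, 16, 32, 33):
    print(p, comb(P, p))
# 2 528
# 16 1166803110
# 32 33
# 33 1


def prob_all_distinct(P: int, p: int, d: int) -> float:
    """Probability that d uniform draws of size-p subsets are pairwise distinct."""
    n = comb(P, p)
    if d > n:
        return 0.0  # pigeonhole: a repeat is unavoidable
    return prod((n - i) / n for i in range(d))


print(prob_all_distinct(P, 2, d))
```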


## Create Random Subsets

In [74]:
random_combinations = []

Number of subsets to draw for each cardinality (after deduplication, fewer may remain when not enough distinct subsets exist)

In [75]:
n_subsets = 10

In [76]:
labels = unique_labels.tolist()
for r in range(2, max_devices + 1):
    for _ in range(n_subsets):
        random_combinations.append(random.sample(labels, r))
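The loop above draws exactly `n_subsets` samples per cardinality, so deduplication can leave fewer than 10 subsets for some sizes. If exactly $\min(d, \binom{P}{p})$ distinct subsets are wanted, one option (an alternative, not what this notebook does) is rejection sampling: keep drawing and discard repeats until the target is reached. `sample_distinct_subsets` is a hypothetical helper, and its random stream differs from the loop above, so it would not reproduce the exported CSV.

```python
import random
from math import comb


def sample_distinct_subsets(labels, p, d, rng):
    """Draw min(d, C(len(labels), p)) distinct subsets of size p as sorted tuples."""
    target = min(d, comb(len(labels), p))
    seen = set()
    subsets = []
    while len(subsets) < target:
        candidate = tuple(sorted(rng.sample(labels, p)))
        if candidate not in seen:  # reject duplicates and draw again
            seen.add(candidate)
            subsets.append(candidate)
    return subsets


rng = random.Random(42)
toy_labels = [f"dev{i}" for i in range(6)]  # toy population of 6 devices
result = {p: sample_distinct_subsets(toy_labels, p, 10, rng) for p in range(2, 7)}
print({p: len(s) for p, s in result.items()})  # {2: 10, 3: 10, 4: 10, 5: 6, 6: 1}
```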

Remove duplicate subsets, i.e., subsets that contain the same devices in a different order

In [77]:
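unique_combinations = []
for subset in random_combinations:
    subset.sort()  # canonical order, so permutations of the same devices compare equal
    if subset not in unique_combinations:
        unique_combinations.append(subset)

# Order by cardinality (the sort is stable, so draw order is kept within each size)
unique_combinations.sort(key=len)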
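The membership test `subset not in unique_combinations` scans the whole list, which is fine for a few hundred subsets but quadratic in general. A sketch of an equivalent linear-time version on toy draws: sorted tuples are hashable, and `dict.fromkeys` keeps them in first-seen order, so the result matches the loop above.

```python
# Toy draws standing in for random_combinations
draws = [["b", "a"], ["a", "b"], ["c", "a", "b"], ["a", "c"], ["b", "c", "a"]]

# Sorted tuples are hashable; dict.fromkeys drops repeats, keeping first-seen order.
deduplicated = [list(t) for t in dict.fromkeys(tuple(sorted(s)) for s in draws)]

# Stable sort by cardinality, as in unique_combinations.sort(key=len)
deduplicated.sort(key=len)
print(deduplicated)  # [['a', 'b'], ['a', 'c'], ['a', 'b', 'c']]
```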

Number of unique combinations per cardinality

In [78]:
subset_counts = []
for i in range(2, max_devices + 1):
    count = len([x for x in unique_combinations if len(x) == i])
    subset_counts.append({"Cardinality": i, "Count": count})

df_subset_counts = pd.DataFrame(subset_counts)
df_subset_counts
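The same table can be built in one pass with `value_counts`. Unlike the loop, `value_counts` omits cardinalities with no subsets, so `reindex(..., fill_value=0)` restores them. The sketch uses toy subsets and a toy maximum cardinality of 4.

```python
import pandas as pd

# Toy stand-in for unique_combinations
combos = [["a", "b"], ["a", "c"], ["a", "b", "c"]]
max_cardinality = 4  # toy value; the notebook uses max_devices

toy_counts = (
    pd.Series([len(c) for c in combos])
    .value_counts()
    .reindex(range(2, max_cardinality + 1), fill_value=0)  # keep empty sizes
    .rename_axis("Cardinality")
    .reset_index(name="Count")
)
print(toy_counts)
```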

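|   | Cardinality | Count |
|---|-------------|-------|
| 0 | 2 | 10 |
| 1 | 3 | 10 |
| 2 | 4 | 10 |
| 3 | 5 | 10 |
| 4 | 6 | 10 |
| 5 | 7 | 10 |
| 6 | 8 | 10 |
| 7 | 9 | 10 |
| 8 | 10 | 10 |
| 9 | 11 | 10 |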


Export combinations to file

In [79]:
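# Export unique_combinations to CSV (shorter subsets are padded with empty cells)
from pathlib import Path

reports_path = config["DEFAULT"]["reports_path"]
output_csv = Path(reports_path + "/CSV/subset_combinations/unique_combinations.csv")
output_csv.parent.mkdir(parents=True, exist_ok=True)  # create the folder if missing

df_unique_combinations = pd.DataFrame(unique_combinations)
df_unique_combinations.to_csv(output_csv, index=False)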
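Because the subsets have different lengths, `pd.DataFrame` pads the shorter rows and the CSV stores the padding as empty cells. A sketch of how a downstream step might read the file back into lists of labels, using an in-memory buffer and toy subsets in place of the real file:

```python
import io

import pandas as pd

# Toy stand-in for unique_combinations: rows have different lengths
combos = [["a", "b"], ["a", "c"], ["a", "b", "c"]]

# Write as the notebook does; missing cells become empty fields
buffer = io.StringIO()
pd.DataFrame(combos).to_csv(buffer, index=False)
buffer.seek(0)

# Read back and drop the NaN padding to recover the original lists
restored = [
    [label for label in row if pd.notna(label)]
    for row in pd.read_csv(buffer).itertuples(index=False)
]
print(restored)  # [['a', 'b'], ['a', 'c'], ['a', 'b', 'c']]
```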