# Preliminary Analysis

This notebook contains a preliminary performance analysis of the two proposed pseudonymization solutions.

The performance data was generated by running `uc-runner.sh` for both solutions (`webid-webid` and `webid-didkey`)
with the following parameters:
- Number of iterations: `-n=100` per solution.
- Document loader caching: `-d=0`, which disables caching in the document loaders entirely. The caching configurations can be found in `src/profiling/config.ts`.


Imports.

In [1]:
import pandas as pd
from glob import glob
import os
import json
import matplotlib.pyplot as plt
import seaborn as sns

Helpers.

In [2]:
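def read_report(path: str) -> pd.DataFrame:
    """Load the `records` of one JSON report and tag each row with its file path."""
    # Use a context manager so the file handle is closed after reading
    with open(path, 'r') as f:
        data = json.load(f)
    df = pd.DataFrame(data['records'])
    df['path'] = path
    return df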

def add_filepath_information(df: pd.DataFrame) -> pd.DataFrame:
    df_out = df.copy()
    # basename
    df_out['basename'] = df_out.path.apply(os.path.basename)
    # parent dir
    df_out['parent_dir'] = df_out.path.apply(os.path.dirname).apply(os.path.basename)
    # grandparent dir
    df_out['grandparent_dir'] = df_out.path.apply(os.path.dirname).apply(os.path.dirname).apply(os.path.basename)
    return df_out
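
The two helpers can be tried on a throwaway file. The following is a minimal sketch, assuming a report only needs a top-level `records` list and that reports are stored as `<experiment>/<timestamp>/<file>.json`; the file name `example-report.json` is made up for illustration, and the single record reuses values from the data shown further down.

```python
import json
import os
import tempfile

import pandas as pd


def read_report(path: str) -> pd.DataFrame:
    with open(path, 'r') as f:
        data = json.load(f)
    df = pd.DataFrame(data['records'])
    df['path'] = path
    return df


def add_filepath_information(df: pd.DataFrame) -> pd.DataFrame:
    df_out = df.copy()
    df_out['basename'] = df_out.path.apply(os.path.basename)
    df_out['parent_dir'] = df_out.path.apply(os.path.dirname).apply(os.path.basename)
    df_out['grandparent_dir'] = (
        df_out.path.apply(os.path.dirname).apply(os.path.dirname).apply(os.path.basename)
    )
    return df_out


# Hypothetical report content: one record with a subset of the real fields.
report = {'records': [
    {'name': 'signDiplomaCredential', 'start': 1732405227837,
     'end': 1732405227880, 'delta': 43},
]}

with tempfile.TemporaryDirectory() as tmp:
    # Mirror the assumed <experiment>/<timestamp>/<file>.json layout.
    report_dir = os.path.join(tmp, 'n-100-dclo-0', '1732405225570')
    os.makedirs(report_dir)
    report_path = os.path.join(report_dir, 'example-report.json')
    with open(report_path, 'w') as f:
        json.dump(report, f)
    df_demo = add_filepath_information(read_report(report_path))

print(df_demo[['name', 'delta', 'basename', 'parent_dir', 'grandparent_dir']])
```

Here `parent_dir` holds the timestamp directory and `grandparent_dir` the experiment directory, independent of the path separator used by the operating system.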

Global variables.

In [3]:
dir_experiment = 'n-100-dclo-0'
solution_timestamps = {
    '1732405225570': 'webid-didkey',
    '1732405664773': 'webid-webid'
}
fpaths_reports = sorted(glob(os.path.join(dir_experiment, '*', '*.json')))
len(fpaths_reports)

200
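
A total of 200 files is what 100 iterations for each of the two solutions would produce, assuming one report file per iteration (an assumption, not stated by the runner). The sketch below shows one way to check the split per solution; `count_reports_per_solution` and the file names are hypothetical.

```python
from collections import Counter

solution_timestamps = {
    '1732405225570': 'webid-didkey',
    '1732405664773': 'webid-webid',
}


def count_reports_per_solution(paths, timestamps):
    """Count report paths per solution, keyed on the timestamp directory.

    Assumes '/'-separated paths of the form <experiment>/<timestamp>/<file>.json.
    """
    return Counter(timestamps[p.split('/')[1]] for p in paths)


# Hypothetical file names, used only to exercise the function.
example_paths = [
    'n-100-dclo-0/1732405225570/report-a.json',
    'n-100-dclo-0/1732405225570/report-b.json',
    'n-100-dclo-0/1732405664773/report-a.json',
    'n-100-dclo-0/1732405664773/report-b.json',
]

report_counts = count_reports_per_solution(example_paths, solution_timestamps)
print(dict(report_counts))
```

In the notebook, the same call with `fpaths_reports` would be expected to give 100 per solution if the one-file-per-iteration assumption holds.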

Creating a DataFrame from the performance reports.

In [4]:
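dfs = pd.concat(map(read_report, fpaths_reports))
# Report paths have the form <experiment>/<timestamp>/<report>.json
path_parts = dfs.path.str.split('/')
dfs['experiment_tag'] = path_parts.str[0]
# Map the timestamp directory to its solution name (KeyError on an unknown directory)
dfs['solution_tag'] = path_parts.str[1].apply(solution_timestamps.__getitem__)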

print(dfs.shape)
dfs.head(3)

(2400, 11)


| | index | name | start | end | delta | output | tag | className | path | experiment_tag | solution_tag |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | createDiplomaCredential | 1732405227837 | 1732405227837 | 0 | {'@context': ['https://www.w3.org/2018/credent... | university | SolidVCActor | n-100-dclo-0/1732405225570/multiactor-report-1... | n-100-dclo-0 | webid-didkey |
| 1 | 1 | signDiplomaCredential | 1732405227837 | 1732405227880 | 43 | {'@context': ['https://www.w3.org/2018/credent... | university | SolidVCActor | n-100-dclo-0/1732405225570/multiactor-report-1... | n-100-dclo-0 | webid-didkey |
| 2 | 2 | createIdentityCredential | 1732405227880 | 1732405227880 | 0 | {'@context': ['https://www.w3.org/2018/credent... | government | SolidVCActor | n-100-dclo-0/1732405225570/multiactor-report-1... | n-100-dclo-0 | webid-didkey |


Compute the number of records per solution, and ensure that the same number of records has been collected for both solutions.

In [5]:
n_records_per_solution = dfs.solution_tag.value_counts(dropna=False).to_dict()
print('Nr. of records per solution', n_records_per_solution)
# Ensure that the same number of records was registered for both solutions
assert n_records_per_solution['webid-webid'] == n_records_per_solution['webid-didkey']

Nr. of records per solution {'webid-didkey': 1200, 'webid-webid': 1200}
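
Equal totals do not rule out an imbalance within individual steps. As a complementary check, the sketch below cross-tabulates records per actor and step against the solution; it runs on a toy stand-in (`dfs_demo`, hypothetical) that has the same `tag`, `name`, and `solution_tag` columns as `dfs`.

```python
import pandas as pd

# Toy stand-in for `dfs`: one record per (solution, actor, step).
dfs_demo = pd.DataFrame({
    'solution_tag': ['webid-didkey', 'webid-didkey', 'webid-webid', 'webid-webid'],
    'tag': ['alice', 'recruiter', 'alice', 'recruiter'],
    'name': ['signPresentation01', 'verifyPresentation01',
             'signPresentation01', 'verifyPresentation01'],
})

# Rows: (actor, step); columns: solution; cells: number of records.
step_counts = pd.crosstab([dfs_demo['tag'], dfs_demo['name']], dfs_demo['solution_tag'])

# Balanced if every solution column equals the first column, row by row.
is_balanced = bool(step_counts.eq(step_counts.iloc[:, 0], axis=0).all().all())

print(step_counts)
print('Balanced per step:', is_balanced)
```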


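Compute descriptive statistics of `delta` per experiment (`experiment_tag`), solution (`solution_tag`), actor (`tag`), and step name (`name`).
In the records shown above, `delta` equals `end - start`; the 13-digit epoch timestamps suggest that the unit is milliseconds.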

In [6]:
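# Aggregate the step durations per experiment, solution, actor, and step
df_agg = dfs.groupby(['experiment_tag', 'solution_tag', 'tag', 'name'])['delta'].agg(['mean', 'std', 'var', 'count'])
# Move the solution to the outer column level so both solutions sit side by side
df_agg = df_agg.unstack('solution_tag').swaplevel(0, 1, axis=1).sort_index(axis=1)
df_agg = df_agg.round(2)
df_agg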

| experiment_tag | tag | name | webid-didkey count | webid-didkey mean | webid-didkey std | webid-didkey var | webid-webid count | webid-webid mean | webid-webid std | webid-webid var |
|---|---|---|---|---|---|---|---|---|---|---|
| n-100-dclo-0 | alice | createPresentation01 | 100 | 0.0 | 0.0 | 0.0 | 100 | 0.05 | 0.22 | 0.05 |
| n-100-dclo-0 | alice | createPresentation02 | 100 | 43.27 | 20.94 | 438.38 | 100 | 42.78 | 10.0 | 99.93 |
| n-100-dclo-0 | alice | deriveDiplomaCredential | 100 | 1863.46 | 103.97 | 10809.24 | 100 | 1865.78 | 97.32 | 9471.43 |
| n-100-dclo-0 | alice | deriveIdentityCredential | 100 | 1877.86 | 99.24 | 9849.35 | 100 | 1892.26 | 118.49 | 14040.48 |
| n-100-dclo-0 | alice | signPresentation01 | 100 | 33.28 | 8.11 | 65.76 | 100 | 33.09 | 9.42 | 88.65 |
| n-100-dclo-0 | alice | signPresentation02 | 100 | 42.24 | 10.13 | 102.57 | 100 | 40.93 | 3.56 | 12.69 |
| n-100-dclo-0 | government | createIdentityCredential | 100 | 0.0 | 0.0 | 0.0 | 100 | 0.01 | 0.1 | 0.01 |
| n-100-dclo-0 | government | signIdentityCredential | 100 | 17.49 | 2.54 | 6.45 | 100 | 17.15 | 1.7 | 2.88 |
| n-100-dclo-0 | recruiter | verifyPresentation01 | 100 | 44.35 | 5.11 | 26.11 | 100 | 91.15 | 13.86 | 192.03 |
| n-100-dclo-0 | recruiter | verifyPresentation02 | 100 | 122.37 | 42.89 | 1839.49 | 100 | 118.3 | 10.86 | 118.01 |


In [7]:
s1 = 'webid-webid'
s2 = 'webid-didkey'

actor = 'recruiter'
step_name = 'verifyPresentation01'
metric = 'mean'
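
These selectors identify a single cell of the aggregate table per solution. A minimal sketch of such a lookup follows; it uses a reduced copy of the table (`df_agg_demo`, hypothetical, filled with the `recruiter` values from the output above) that has the same row and column layout as `df_agg`.

```python
import pandas as pd

s1 = 'webid-webid'
s2 = 'webid-didkey'
actor = 'recruiter'
step_name = 'verifyPresentation01'
metric = 'mean'
experiment = 'n-100-dclo-0'

# Rows: (experiment_tag, tag, name); columns: (solution_tag, statistic).
rows = pd.MultiIndex.from_tuples(
    [(experiment, 'recruiter', 'verifyPresentation01'),
     (experiment, 'recruiter', 'verifyPresentation02')],
    names=['experiment_tag', 'tag', 'name'],
)
cols = pd.MultiIndex.from_product([[s2, s1], ['mean', 'std']],
                                  names=['solution_tag', None])
df_agg_demo = pd.DataFrame(
    [[44.35, 5.11, 91.15, 13.86],
     [122.37, 42.89, 118.30, 10.86]],
    index=rows, columns=cols,
)

# A full row tuple plus a full column tuple selects one scalar.
v1 = df_agg_demo.loc[(experiment, actor, step_name), (s1, metric)]
v2 = df_agg_demo.loc[(experiment, actor, step_name), (s2, metric)]
print(f'{actor}/{step_name} {metric}: {s1}={v1}, {s2}={v2}, ratio={v1 / v2:.2f}')
```

For `verifyPresentation01`, the `webid-webid` mean (91.15) is roughly twice the `webid-didkey` mean (44.35).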

In [9]:
# Keep only the 'mean' columns, then select the rows of this experiment
df_agg_mean = df_agg.loc(axis=1)[:, 'mean'].loc[dir_experiment]
df_agg_mean

| tag | name | webid-didkey mean | webid-webid mean |
|---|---|---|---|
| alice | createPresentation01 | 0.0 | 0.05 |
| alice | createPresentation02 | 43.27 | 42.78 |
| alice | deriveDiplomaCredential | 1863.46 | 1865.78 |
| alice | deriveIdentityCredential | 1877.86 | 1892.26 |
| alice | signPresentation01 | 33.28 | 33.09 |
| alice | signPresentation02 | 42.24 | 40.93 |
| government | createIdentityCredential | 0.0 | 0.01 |
| government | signIdentityCredential | 17.49 | 17.15 |
| recruiter | verifyPresentation01 | 44.35 | 91.15 |
| recruiter | verifyPresentation02 | 122.37 | 118.3 |
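
To see at a glance where the two solutions differ, the means can be compared step by step. The sketch below does this on a three-row excerpt of the values above (`df_mean_demo`, hypothetical) with single-level columns; the real `df_agg_mean` has a second column level, which could presumably be flattened first with `df_agg_mean.droplevel(1, axis=1)`.

```python
import pandas as pd

# Excerpt of the mean durations, one column per solution.
df_mean_demo = pd.DataFrame(
    {'webid-didkey': [33.28, 44.35, 122.37],
     'webid-webid': [33.09, 91.15, 118.30]},
    index=pd.MultiIndex.from_tuples(
        [('alice', 'signPresentation01'),
         ('recruiter', 'verifyPresentation01'),
         ('recruiter', 'verifyPresentation02')],
        names=['tag', 'name'],
    ),
)

# Absolute and relative difference of webid-webid with respect to webid-didkey.
df_mean_demo['diff'] = df_mean_demo['webid-webid'] - df_mean_demo['webid-didkey']
df_mean_demo['ratio'] = df_mean_demo['webid-webid'] / df_mean_demo['webid-didkey']

# Step with the largest absolute difference in mean duration.
largest_gap = df_mean_demo['diff'].abs().idxmax()

print(df_mean_demo.round(2))
print('Largest gap:', largest_gap)
```

Within this excerpt, the largest gap is at the recruiter's `verifyPresentation01` step. Whether the smaller differences are meaningful cannot be judged from the means alone, given the standard deviations reported in the aggregate table.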
