# LINDDUN

In the following, we discuss the main results of applying the [LINDDUN methodology](https://linddun.org/pro/) to conduct a privacy threat analysis of both solutions.

LINDDUN categorizes threats into seven threat types:  
- **Linking**: Associating data items or user actions to learn more about an individual or group.  
- **Identifying**: Learning the identity of an individual through leaks, deduction, or inference.  
- **Non-repudiation**: Being able to attribute a claim to an individual.  
- **Detecting**: Deducing the involvement of an individual through observation.  
- **Data disclosure**: Excessively collecting, storing, processing, or sharing personal data.  
- **Unawareness & Unintervenability**: Insufficiently informing, involving, or empowering individuals in the processing of their personal data.  
- **Non-compliance**: Deviating from security and data management best practices, standards, and legislation.  

In our analysis, we assume that the pseudonymizer:  
1. Sufficiently informs its users about how their personal data is being processed; and  
2. Is compliant with GDPR legislation.  

Therefore, we consider both solutions to pose no threats of the types **Unawareness & Unintervenability** or **Non-compliance**.  
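
The threat identifiers in the analysis sheet appear to encode the threat type as a prefix (for example `L.1.1`, `I.1.1`, `Nr.1`, `D.1`, `DD.3.1`). The following is a minimal sketch of mapping an identifier to its threat type and checking whether that type is in scope. The names `THREAT_TYPES`, `threat_type`, and `in_scope` are hypothetical, and the prefixes `U` and `Nc` are assumptions, since identifiers of those two types do not appear in the outputs shown here.

```python
# Prefix -> LINDDUN threat type.
# The prefixes "U" and "Nc" are assumed; they are not visible in the analysis outputs.
THREAT_TYPES = {
    "L": "Linking",
    "I": "Identifying",
    "Nr": "Non-repudiation",
    "D": "Detecting",
    "DD": "Data disclosure",
    "U": "Unawareness & Unintervenability",
    "Nc": "Non-compliance",
}

# Types excluded from the analysis by the two assumptions stated above.
EXCLUDED = {"Unawareness & Unintervenability", "Non-compliance"}


def threat_type(t_id: str) -> str:
    """Return the threat type for an identifier such as 'DD.3.1'."""
    prefix = t_id.split(".", 1)[0]
    return THREAT_TYPES[prefix]


def in_scope(t_id: str) -> bool:
    """Return True if the threat type is evaluated in this analysis."""
    return threat_type(t_id) not in EXCLUDED


print(threat_type("DD.3.1"), in_scope("DD.3.1"))
```

Splitting on the first dot keeps `D` (Detecting) and `DD` (Data disclosure) apart, which a simple `startswith` check would not.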


In [1]:
import pandas as pd
import os

In [2]:
fpath = './LINDDUN-analysis.xlsx'
xlf = pd.ExcelFile(fpath)
print(xlf.sheet_names)

['Flow', 'LINDDUN-GO-CARDS', 'LINDDUN-GO-HOTSPOTS', 'ANALYSIS-GO-v1', 'ANALYSIS-GO-v2', 'ANALYSIS-PRO', 'LINDDUN-Threat-trees']


In [3]:
sheet_name = 'ANALYSIS-PRO'
df = pd.read_excel(xlf, sheet_name=sheet_name)
# Keep only the rows of the 'Diploma verification' phase
df = df[df.Phase == 'Diploma verification'].drop(columns='Phase')
# Drop rows that are entirely empty
df = df.dropna(axis=0, how='all')
print(df.shape)
df

(52, 11)


|   | SOLUTION | S_ID | DF_ID | D_ID | T_ID | Analysis | Source | Destination | Threat risk? | Mitigation | Impact |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | webid-webid | P1.2 | - | - | L.1.1 | P1.2 leases pseudonymous WebIDs.\nHowever, the... |  |  | 1.0 |  |  |
| 1 | webid-didkey | P1.2 | - | - | L.1.1 | Key DIDs consist of the scheme-identifier (i.e... |  |  | 0.0 |  |  |
| 2 | webid-webid | P1.2 | - | - | L.1.2 | Attributes disclosed in the diploma verificati... |  |  | 0.0 |  |  |
| 3 | webid-didkey | P1.2 | - | - | L.1.2 | Attributes disclosed in the diploma verificati... |  |  | 0.0 |  |  |
| 4 | webid-webid | P1.2 | - | - | I.1.1 | The system processes diploma data, and contain... |  |  | 1.0 |  |  |
| 5 | webid-webid | - | - | - | I.1.2 | No additional metadata is being processed. |  |  | 0.0 |  |  |
| 6 | webid-didkey | - | - | - | I.1.1 | The system processes diploma data, and contain... |  |  | 1.0 |  |  |
| 7 | webid-didkey | - | - | - | I.1.2 | No additional metadata is being processed. |  |  | 0.0 |  |  |
| 8 | webid-webid | - | - | - | I.2.1.1 | The pseudonymizer allows identifying informati... |  |  | 0.0 |  |  |
| 9 | webid-didkey | - | - | - | I.2.1.1 | The pseudonymizer allows identifying informati... |  |  | 0.0 |  |  |


In [4]:
df.groupby(['SOLUTION']).size()

SOLUTION
webid-didkey    26
webid-webid     26
dtype: int64

In [5]:
df.groupby(['SOLUTION'])['Threat risk?'].sum()

SOLUTION
webid-didkey     3.0
webid-webid     10.0
Name: Threat risk?, dtype: float64

Threats can be considered from three different perspectives of the triple $(i, e_{i,j}, j)$, where $i$ and $j$ represent the source and destination entities and $e_{i,j}$ the edge between $i$ and $j$.
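
As a rough illustration of these three perspectives, the sketch below represents one interaction as a triple and lists the elements at which a threat could be evaluated. The class `Interaction`, the function `perspectives`, and the example entity names are hypothetical and not taken from the analysis sheet.

```python
from typing import NamedTuple


class Interaction(NamedTuple):
    """A triple (i, e_ij, j): source entity, edge, destination entity."""
    source: str
    edge: str
    destination: str


def perspectives(t: Interaction) -> dict:
    """Return the three elements at which a threat can be considered."""
    return {
        "source": t.source,
        "edge": t.edge,
        "destination": t.destination,
    }


# Hypothetical example: a holder sends a diploma to a verifier.
flow = Interaction("holder", "holder -> verifier", "verifier")
for name, element in perspectives(flow).items():
    print(f"{name}: {element}")
```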

In [6]:
g = df.groupby(['SOLUTION', 'T_ID'])
# count: the number of threat evaluations for a particular threat.
# sum: the number of those evaluations in which a threat was identified.
A = g['Threat risk?'].agg(['count', 'sum']).astype(int)
#A

In [7]:
# B: one column per (statistic, solution) pair
B = A.unstack('SOLUTION')
# Assert that both solutions have the same number of evaluations per threat
assert (B['count'].diff(axis=1).iloc[:, -1] == 0).all()
print('B:')
display(B)
# If so, we can focus on the number of identified threats per solution
# C: keep only the threats identified in at least one solution
print('C:')
C = B['sum']
C = C[C.sum(axis=1) > 0]
C

B:


| T_ID | count: webid-didkey | count: webid-webid | sum: webid-didkey | sum: webid-webid |
|---|---|---|---|---|
| D.1 | 1 | 1 | 0 | 1 |
| D.2 | 1 | 1 | 0 | 1 |
| D.3 | 1 | 1 | 0 | 1 |
| DD.1.1 | 1 | 1 | 0 | 0 |
| DD.1.2 | 1 | 1 | 0 | 0 |
| DD.1.3 | 1 | 1 | 0 | 0 |
| DD.2.1 | 1 | 1 | 0 | 0 |
| DD.2.2 | 1 | 1 | 0 | 0 |
| DD.2.3 | 1 | 1 | 0 | 0 |
| DD.3.1 | 2 | 2 | 0 | 1 |


C:


| T_ID | webid-didkey | webid-webid |
|---|---|---|
| D.1 | 0 | 1 |
| D.2 | 0 | 1 |
| D.3 | 0 | 1 |
| DD.3.1 | 0 | 1 |
| DD.3.3 | 0 | 1 |
| I.1.1 | 1 | 1 |
| I.2.2 | 1 | 1 |
| I.2.3 | 0 | 1 |
| L.1.1 | 0 | 1 |
| Nr.1 | 1 | 1 |
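
The aggregation that produces `B` and `C` can be traced on a small, made-up dataframe. The sketch below assumes the same column names as the analysis sheet (`SOLUTION`, `T_ID`, `Threat risk?`); the solution names `s-a` and `s-b`, the threat IDs `X.1` to `X.3`, and the variable names are invented for illustration.

```python
import pandas as pd

# Made-up evaluations: one row per (solution, threat) evaluation.
toy = pd.DataFrame({
    "SOLUTION": ["s-a", "s-b", "s-a", "s-b", "s-a", "s-b"],
    "T_ID": ["X.1", "X.1", "X.2", "X.2", "X.3", "X.3"],
    "Threat risk?": [1.0, 0.0, 1.0, 1.0, 0.0, 0.0],
})

# count = evaluations per threat, sum = evaluations that identified a threat.
agg = (
    toy.groupby(["SOLUTION", "T_ID"])["Threat risk?"]
    .agg(["count", "sum"])
    .astype(int)
)

# Move SOLUTION from the row index into the columns.
wide = agg.unstack("SOLUTION")

# Keep only the threats identified in at least one solution.
found = wide["sum"]
found = found[found.sum(axis=1) > 0]
print(found)
```

In this toy data `X.3` is dropped because neither solution was marked for it, which mirrors how threats such as `DD.1.1` disappear between `B` and `C`.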


In [8]:
nr_threat_ids = C.index.unique().size
print('Nr. of threats: ', nr_threat_ids)

nr_threats_per_solution = C.sum()
print('\nNr. of identified threats per solution')
print(nr_threats_per_solution)

Nr. of threats:  10

Nr. of identified threats per solution
SOLUTION
webid-didkey     3
webid-webid     10
dtype: int64


In [9]:
s1, s2 = df.SOLUTION.unique().tolist()
print(f's1: {s1}, s2: {s2}')

s1: webid-webid, s2: webid-didkey


In [10]:
# D: threats identified in both solutions
D_d = "Where both solutions pose a threat."
D = C[(C == 1).all(axis=1)]
print(D_d)
D

Where both solutions pose a threat.


| T_ID | webid-didkey | webid-webid |
|---|---|---|
| I.1.1 | 1 | 1 |
| I.2.2 | 1 | 1 |
| Nr.1 | 1 | 1 |


In [11]:
# E
E_d = f"Where {s1} has no threat, but {s2} does."
E = (C[s1] == 0) & (C[s2] == 1)
print(E_d)
print('Sum: ', E.sum())
if E.sum() > 0:
    print(E)

Where webid-webid has no threat, but webid-didkey does.
Sum:  0


In [12]:
# F
F_d = f"Where {s2} has no threat, but {s1} does."
F = (C[s2] == 0) & (C[s1] == 1)
print(F_d)
print('Sum: ', F.sum())
if F.sum() > 0:
    print(F)

Where webid-didkey has no threat, but webid-webid does.
Sum:  7
T_ID
D.1        True
D.2        True
D.3        True
DD.3.1     True
DD.3.3     True
I.1.1     False
I.2.2     False
I.2.3      True
L.1.1      True
Nr.1      False
dtype: bool
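
The three comparisons (`D`, `E`, and `F`) can also be computed in one pass. The following is a minimal sketch on a made-up table with the same shape as `C` (threats as rows, solutions as columns); the function name `compare` and the labels `s-a`, `s-b`, and `X.1` to `X.3` are hypothetical.

```python
import pandas as pd


def compare(found: pd.DataFrame, a: str, b: str) -> dict:
    """Split threat IDs by which of the two solutions they affect."""
    has_a = found[a] > 0
    has_b = found[b] > 0
    return {
        "both": found[has_a & has_b].index.tolist(),
        f"only {a}": found[has_a & ~has_b].index.tolist(),
        f"only {b}": found[~has_a & has_b].index.tolist(),
    }


# Made-up table in the same shape as C.
toy_found = pd.DataFrame(
    {"s-a": [1, 1, 0], "s-b": [0, 1, 1]},
    index=pd.Index(["X.1", "X.2", "X.3"], name="T_ID"),
)
print(compare(toy_found, "s-a", "s-b"))
```

The sketch tests for `> 0` rather than `== 1`, so a threat that was identified in more than one evaluation of the same solution would still be counted.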
