# TODO - ROC curves, minimal equal error rate plot, evaluation of ROC over time (overlay with average?)
# Implement gridsearch to optimise the model? (Use validation set of data)
Working on this problem: https://www.cs.cmu.edu/~keystroke/.
Supporting paper: http://www.cs.cmu.edu/~keystroke/KillourhyMaxion09.pdf

Data comes from 51 subjects, each typing ".tie5Roanl" 400 times across multiple sessions.

Our goal is to develop a model with a minimal equal error rate (EER): the error rate at the threshold where the false-alarm rate and the miss rate are equal.

(Diagram of the equal error rate: https://api.intechopen.com/media/chapter/66135/media/F2.png)
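
As a rough sketch of how the equal error rate could be computed from anomaly scores: the function name `equal_error_rate` and the toy scores below are made up for illustration, and this is a simple threshold sweep rather than the paper's exact procedure. It assumes a higher score means a more anomalous sample.

```python
import numpy as np

def equal_error_rate(genuine_scores, imposter_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    A sample is rejected when its anomaly score is above the threshold.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    imposter = np.asarray(imposter_scores, dtype=float)
    thresholds = np.sort(np.concatenate([genuine, imposter]))

    # False-alarm rate: fraction of genuine samples wrongly rejected
    false_alarm = np.array([(genuine > t).mean() for t in thresholds])
    # Miss rate: fraction of imposter samples wrongly accepted
    miss = np.array([(imposter <= t).mean() for t in thresholds])

    # Take the threshold where the two rates are closest and report their midpoint
    idx = np.argmin(np.abs(false_alarm - miss))
    return (false_alarm[idx] + miss[idx]) / 2, thresholds[idx]

# Toy scores (hypothetical, not taken from the dataset)
eer, threshold = equal_error_rate([1.0, 1.2, 1.5, 2.5], [2.0, 3.0, 3.5, 4.0])
print(eer, threshold)
```

With real detector output, `genuine_scores` would be the distances for the held-out samples of the genuine user and `imposter_scores` the distances for everyone else.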

Questions that immediately need answering:
- What type of problem is this (classification or regression)?
- Has anyone attempted this problem before?
    - If so, how did they approach it? 
        - Which detectors / feature sets / models did they use?
        - What was successful about their approach? 
        - What were their limitations?
- What do the features in the dataset represent?
- Which do we prioritise: false positives or false negatives (in this context, false-alarm rates and miss rates)?
    - From the literature (and common sense, to be honest), we should prioritise lowering miss rates (it's better to lock out a genuine user than to let a threat access the system).
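
To make the false-alarm / miss trade-off concrete, here is a minimal sketch. The helper `error_rates` and the toy scores are hypothetical, and it assumes a sample is rejected when its anomaly score is above the threshold.

```python
import numpy as np

def error_rates(genuine_scores, imposter_scores, threshold):
    """Return (false_alarm_rate, miss_rate) for a given rejection threshold."""
    genuine = np.asarray(genuine_scores, dtype=float)
    imposter = np.asarray(imposter_scores, dtype=float)
    # Genuine user rejected = false alarm
    false_alarm_rate = float(np.mean(genuine > threshold))
    # Imposter accepted = miss
    miss_rate = float(np.mean(imposter <= threshold))
    return false_alarm_rate, miss_rate

genuine = [1.0, 1.2, 1.5, 2.5]
imposter = [2.0, 3.0, 3.5, 4.0]

strict = error_rates(genuine, imposter, threshold=1.3)   # low threshold
lenient = error_rates(genuine, imposter, threshold=3.2)  # high threshold
print(strict, lenient)
```

A stricter (lower) threshold pushes the miss rate down at the cost of more false alarms, which is the direction we said we'd prefer to err in.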

These were largely answered through reading the aforementioned paper, and doing some background reading and research.

The aforementioned paper also detailed a method by which different detectors could be compared on the same dataset. So to evaluate how our 'new' model performs against its competitors, it makes sense to first implement a pre-existing model, then our new model, and compare performance under the same conditions.

Note: The paper implemented the techniques in R (which I've not used before). An implementation in Python _should_ behave the same, but there may be some underlying differences between the R and Python mathematics libraries.
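
One concrete difference I'm aware of is the default standard deviation: R's `sd()` divides by n-1, `np.std` divides by n unless `ddof=1` is passed, and pandas' `.std()` defaults to n-1. This only matters for detectors that use a standard deviation (a z-score style detector, for example), and the sample values below are made up for illustration.

```python
import numpy as np
import pandas as pd

samples = [0.10, 0.12, 0.15, 0.11]  # hypothetical hold times in seconds

population_sd = np.std(samples)       # NumPy default: ddof=0 (divide by n)
sample_sd = np.std(samples, ddof=1)   # divide by n-1, as R's sd() does
pandas_sd = pd.Series(samples).std()  # pandas default: ddof=1

print(population_sd, sample_sd, pandas_sd)
```

So when porting anything that uses a standard deviation, it's worth being explicit about `ddof` rather than relying on defaults.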

# Imports and file processing
Let's import some relevant modules and see what the file's contents are.

In [6]:
# First, imports
import sys  # needed for the sys.exit calls in the shape checks further down
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# for Manhattan detector, need cityblock distance
from scipy.spatial.distance import cityblock

In [7]:
# Read in csv file and check what's inside
df = pd.read_csv('DSL-StrongPasswordData.csv')
df.head()

Unnamed: 0,subject,sessionIndex,rep,H.period,DD.period.t,UD.period.t,H.t,DD.t.i,UD.t.i,H.i,...,H.a,DD.a.n,UD.a.n,H.n,DD.n.l,UD.n.l,H.l,DD.l.Return,UD.l.Return,H.Return
0,s002,1,1,0.1491,0.3979,0.2488,0.1069,0.1674,0.0605,0.1169,...,0.1349,0.1484,0.0135,0.0932,0.3515,0.2583,0.1338,0.3509,0.2171,0.0742
1,s002,1,2,0.1111,0.3451,0.234,0.0694,0.1283,0.0589,0.0908,...,0.1412,0.2558,0.1146,0.1146,0.2642,0.1496,0.0839,0.2756,0.1917,0.0747
2,s002,1,3,0.1328,0.2072,0.0744,0.0731,0.1291,0.056,0.0821,...,0.1621,0.2332,0.0711,0.1172,0.2705,0.1533,0.1085,0.2847,0.1762,0.0945
3,s002,1,4,0.1291,0.2515,0.1224,0.1059,0.2495,0.1436,0.104,...,0.1457,0.1629,0.0172,0.0866,0.2341,0.1475,0.0845,0.3232,0.2387,0.0813
4,s002,1,5,0.1249,0.2317,0.1068,0.0895,0.1676,0.0781,0.0903,...,0.1312,0.1582,0.027,0.0884,0.2517,0.1633,0.0903,0.2517,0.1614,0.0818


In [8]:
subjects = df["subject"].unique()
print(subjects) 
# Confirmation there are 51 unique subjects

['s002' 's003' 's004' 's005' 's007' 's008' 's010' 's011' 's012' 's013'
 's015' 's016' 's017' 's018' 's019' 's020' 's021' 's022' 's024' 's025'
 's026' 's027' 's028' 's029' 's030' 's031' 's032' 's033' 's034' 's035'
 's036' 's037' 's038' 's039' 's040' 's041' 's042' 's043' 's044' 's046'
 's047' 's048' 's049' 's050' 's051' 's052' 's053' 's054' 's055' 's056'
 's057']
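
Printing the array shows the subject IDs, but a count is easier to check at a glance. A small sketch of how that could look, using a hypothetical helper `summarise_subjects` and a toy frame in place of the real CSV:

```python
import pandas as pd

def summarise_subjects(df):
    """Return (number of unique subjects, rows-per-subject Series)."""
    counts = df.groupby("subject").size()
    return len(counts), counts

# Toy stand-in for the real data: 2 subjects with 3 repetitions each
toy = pd.DataFrame({"subject": ["s002"] * 3 + ["s003"] * 3,
                    "rep": [1, 2, 3] * 2})
n_subjects, counts = summarise_subjects(toy)
print(n_subjects)
print(counts)
```

On the real `df` we'd expect this to report 51 subjects with 400 rows each, going by the dataset description.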


# Model development

It is evident this is a classification problem rather than a regression problem: for each typing sample we need to decide whether it came from the genuine user or an imposter.

Firstly, let's approach this using standard anomaly detection practices: we will train a model to recognise a certain user's typing pattern, and then score both held-out samples from that user and samples from the remaining (i.e. imposter) users, giving an anomaly score for each sample.

For simplicity, let's implement the Manhattan detector first, and then later we can compare our model's performance to this.
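
As I understand it, the Manhattan detector just stores the mean of the training vectors and scores a test vector by its city-block (L1) distance from that mean. A minimal vectorised sketch of that idea, where the function name `manhattan_scores` and the toy timing values are made up for illustration:

```python
import numpy as np

def manhattan_scores(training, test):
    """Score each test row by its L1 distance to the mean training vector."""
    training = np.asarray(training, dtype=float)
    test = np.asarray(test, dtype=float)
    mean_vector = training.mean(axis=0)  # "training" is just the feature-wise mean
    # Sum of absolute differences per row = city-block distance
    return np.abs(test - mean_vector).sum(axis=1)

# Toy example with two timing features (hypothetical values)
train = np.array([[0.10, 0.20],
                  [0.12, 0.22]])
test = np.array([[0.11, 0.21],   # identical to the mean vector
                 [0.30, 0.50]])  # far from the mean vector
scores = manhattan_scores(train, test)
print(scores)
```

This should give the same per-row values as calling `cityblock` in a loop, just without the Python-level iteration.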

In [11]:
for subject in subjects:
    #print('Training new model for subject {}'.format(subject))
    real_user = df.loc[df.subject == subject]
    fake_user = df.loc[df.subject != subject]

    # We train our model using a genuine user's data
    training_data = real_user[:200].loc[:, 'H.period':'H.Return']
    
    # To test our model, we need both more data from the original user, and imposter user data
    genuine_user_data = real_user[200:].loc[:, 'H.period':'H.Return']
    imposter_user_data = fake_user[:].loc[:, 'H.period':'H.Return']
    
    # Let's check dimensions of our training and testing tuples are the same...just in case
    if training_data.shape != genuine_user_data.shape:
        sys.exit("training_data and genuine_user_data shapes don't match: {} | {}".format(training_data.shape, genuine_user_data.shape))
    elif imposter_user_data.shape[0] != genuine_user_data.shape[0]*100:
        sys.exit("imposter_user_data and genuine_user_data rows aren't 20000 and 200: {} | {}".format(imposter_user_data.shape[0], genuine_user_data.shape[0]))
    
    # Train
    mean_vector = training_data.mean().values # store mean vector in a numpy array to use with cityblock func below
    
    # Test - for each row (entry), compute cityblock distance between mean vector and test vector
    for i in range(genuine_user_data.shape[0]):
        dist = cityblock(genuine_user_data.iloc[i].values, mean_vector)
        print(dist)
        

1.6211135000000003
1.8518405000000004
2.8829555
1.9797515000000003
1.8415755000000005
1.6266745000000002
1.6115985000000004
1.8992815000000005
1.6201205
1.7123544999999998
1.8369985000000002
1.3861565000000005
1.9024335000000006
1.8579985000000008
1.2780105000000002
1.5196985
1.8767225000000005
1.2546085
1.4125055
2.8075784999999995
1.2402965000000004
1.8890245000000006
1.6748765000000003
1.9900785
2.3081655
2.2595295000000006
1.5969445
2.0451035
1.7627365000000004
1.7235935
1.6563475000000003
1.1922045000000003
1.7534305000000001
0.6759945000000005
1.2591755
1.2236165000000003
1.3535135
1.6620845000000002
1.5616074999999998
1.6722585
1.6794925000000003
1.6859035000000004
1.4048385000000003
1.6905014999999999
1.8820755
2.6318474999999992
1.7540245000000003
2.0542355
1.6459175000000001
1.8963185000000005
1.8644005000000001
1.8422655000000003
1.7329655000000004
1.8111675000000007
1.8544365
1.3553075000000006
1.7783995000000004
2.0703445
1.8092005000000004
1.8584675000000004
1.91454

1.7279599999999995
1.4188849999999986
1.937913999999999
1.6573559999999996
1.5766239999999994
1.3554999999999993
1.6116699999999988
1.4330369999999988
1.5452309999999985
1.6683709999999992
1.7280199999999992
1.676075999999999
1.7856649999999987
1.8987299999999983
1.989229999999999
1.9256039999999992
1.8442149999999988
1.9969819999999991
1.5310359999999998
1.5272699999999988
1.8467819999999986
1.7052219999999991
1.7348079999999986
1.840917999999999
2.045517999999999
2.107367999999999
2.2616289999999997
2.433089999999999
1.9674909999999994
1.3220629999999995
1.995444999999999
1.5698409999999992
1.5877699999999992
1.7469129999999993
1.8081519999999993
1.7388099999999989
1.6047689999999988
1.7935259999999993
2.78296
1.652342
1.8039479999999988
1.9278029999999988
1.8551019999999994
1.9591519999999996
1.374634
1.5670279999999994
1.393249999999999
2.1653539999999993
1.4858049999999994
1.769701999999999
2.5772919999999995
1.6747829999999992
1.8727589999999996
1.7134729999999996
1.5217969999999

0.885104
0.9556649999999998
0.9767729999999997
0.6643169999999997
0.7577349999999998
0.7073259999999996
0.7823269999999997
0.784415
0.843898
0.8619819999999999
0.7969090000000001
0.902164
0.9517829999999996
0.8383679999999999
0.7949629999999999
0.793819
0.8684960000000002
0.6922989999999996
0.9967310000000003
0.7210890000000001
0.7738619999999999
0.905305
1.262467
1.2373439999999996
1.2816029999999998
1.057573
1.1797659999999999
1.66725
1.0696880000000002
1.582168
1.4076359999999997
1.1951909999999994
1.0854899999999998
2.6568800000000006
1.3460659999999995
1.157211
1.1127839999999996
1.0661629999999997
0.7721640000000001
1.5472929999999998
1.0389159999999997
1.2428119999999998
1.2959040000000002
1.1972969999999998
1.9042960000000004
1.0462399999999994
1.147372
1.2214029999999998
1.4624660000000003
1.2467660000000005
1.1411029999999998
0.9859650000000002
1.1780249999999999
2.1659589999999995
1.306931
1.1847299999999998
1.094532
1.4630379999999998
1.5434409999999994
1.038873
1.041

0.8479814999999999
0.9753064999999997
1.0580374999999995
1.0458035
1.0017555
1.0211035
1.0501584999999998
1.0954054999999998
1.2569734999999997
1.1282335000000001
0.9872584999999999
1.0916165
0.8088264999999999
0.8528125
1.0532835
1.0652165000000002
1.0717255
1.6820985000000004
1.6566264999999996
1.9311405000000006
0.9518015000000003
0.6764324999999997
0.7678955
1.8335985000000001
1.7854885
2.6493315000000006
0.9626604999999997
0.9218835000000003
1.7192005
1.0363155000000002
0.8334615000000003
0.9133985
1.2296824999999998
0.9246754999999999
1.3545155000000004
1.4157095000000004
0.7854814999999999
0.7095895000000002
0.8588464999999998
1.1624895000000002
1.4021605000000001
0.7906335000000001
0.8697825000000001
0.9515785
0.7723424999999999
1.4320644999999999
0.8660374999999996
1.9212645000000002
0.8656575000000001
0.9256005000000002
0.9181315000000001
0.8839034999999997
0.8070885000000001
0.9379474999999999
0.8756484999999999
0.9941854999999998
0.9216635
0.9139264999999998
1.9164565000000

1.3159930000000002
1.2371200000000004
1.1726270000000005
1.3984350000000003
1.1904640000000004
1.3059710000000004
1.7772569999999996
1.0721309999999997
1.5360860000000003
1.140711
1.2801870000000002
1.2085940000000004
5.48884
1.4952140000000005
3.028011
1.4476450000000003
1.708591
1.576646
1.6584190000000003
1.252819
1.5971360000000006
1.7008950000000003
1.569046
1.5931660000000005
1.5294790000000003
1.4056070000000003
1.5227240000000002
1.587315
1.627107
1.916433
1.680351
1.5001430000000002
1.4658410000000002
1.6086649999999996
1.5766349999999998
1.2361700000000007
1.8085360000000004
1.516454
1.857415
1.7779040000000002
1.7004420000000002
4.603130999999999
1.7634839999999998
1.7283149999999998
1.2284590000000006
1.7575339999999997
1.5584820000000001
1.6724080000000001
1.839319
1.8355389999999998
1.9740729999999997
1.9552250000000002
1.482535
1.661342
1.5567299999999997
1.5609610000000007
2.7612820000000005
1.2862859999999996
2.527954000000001
1.0199100000000003
1.3283310000000004
1.36

1.385092
1.6410950000000002
1.2067010000000005
1.7269509999999995
1.5005950000000003
1.3727239999999992
5.067505
4.503924
1.9596039999999997
2.3805400000000003
4.886224999999999
7.250198999999999
2.469695
3.4022120000000013
7.783367
6.504018
3.751863
5.851026
3.039572
2.393075
2.450356
2.568324000000001
2.948648
3.1558050000000013
3.1884449999999993
2.481258
3.6038260000000015
5.553155000000001
2.6128980000000004
2.8711510000000002
3.266264
2.870521
3.2779590000000005
3.6890769999999997
1.9897310000000008
2.4172140000000004
3.811959
3.297351
3.1253399999999996
3.154717000000001
2.7193460000000003
2.695914000000001
1.5804809999999998
2.7989599999999992
3.5357810000000005
3.373266
3.622915
5.783671000000001
2.608124
3.6867830000000006
3.270574
3.560767000000001
3.637591
3.003068000000001
2.5798850000000013
2.8679
3.929344999999999
2.9258319999999998
3.236274999999999
2.92368
2.8120679999999996
2.613554000000001
3.129782000000001
3.005346
3.5607759999999984
3.332228
2.620971
2.46951

1.7337254999999994
1.9002864999999995
1.7353264999999998
1.6853844999999996
2.284890499999999
1.9402124999999995
1.9687024999999996
1.8542525
1.8230794999999995
5.3909045
1.9234654999999994
1.8562184999999995
0.8173875000000004
0.7301205000000001
1.0653095000000004
1.1188665000000004
0.7911665000000003
0.7878055
1.0728075000000001
2.0277855000000002
1.0537835000000002
1.0267675000000003
0.8443825000000003
0.6072904999999998
1.707749500000001
2.7179114999999996
1.0907065000000005
1.1776495000000005
1.1853635000000007
1.9035185000000008
0.8812715000000004
1.1295665000000004
1.2039145
1.6106295000000002
1.2554485
2.0030905000000003
2.2190995
1.1140345000000003
1.7572165000000006
0.8486135000000004
1.2124255000000002
0.9118824999999998
0.7242755000000007
1.3181945000000008
0.9385735000000007
1.3975165000000005
0.6167674999999998
0.7526615000000006
1.4731704999999997
0.9389075000000001
2.4055615
0.8139425000000005
0.8054745000000004
0.8211225000000002
0.6736945000000004
1.781808500000

1.5474394999999994
1.8961794999999995
2.0175745
1.2716674999999995
2.1795614999999993
1.5442724999999988
1.4822435000000003
1.5968884999999997
1.4039114999999995
1.8943904999999988
1.3619875000000004
1.7634864999999995
2.0316504999999996
1.8938964999999992
1.4610294999999993
5.5173915000000004
1.3423465000000014
4.6260055
1.4821364999999997
4.0702195
1.4523075000000005
2.9372225
1.7112195
1.569859500000001
1.3201795000000014
1.6415965000000008
1.6731715000000014
1.3899515
1.8124445000000005
1.1569525000000007
0.9669264999999998
1.5786845
1.5550515000000014
1.5716295000000005
1.4887715000000004
1.5822985000000003
1.3727485000000001
1.0358094999999996
2.0557345000000002
1.3531214999999992
1.4043714999999994
1.3834994999999997
2.9612024999999993
1.9078785000000003
1.7317884999999993
1.0733585000000003
2.7187704999999998
1.4477335000000005
1.3014555
2.3135785000000006
1.6004664999999996
1.2487514999999996
1.7653594999999995
3.2904585
1.2677674999999997
1.1858575
0.9923794999999995
1.346309

2.5405190000000006
3.574591
2.108653999999999
2.092461
1.7158129999999998
1.787226
1.670643
2.0256740000000004
1.5640000000000003
1.6095400000000002
2.74455
1.4201830000000002
1.6275610000000007
1.4494540000000002
2.28537
1.7729489999999999
2.370351
1.7325950000000006
2.7590129999999995
2.9148180000000004
3.7559920000000004
2.4371079999999994
2.4105649999999996
5.006814999999999
2.3630729999999986
3.8465160000000007
2.2273339999999995
5.868807
2.8359729999999996
3.5284069999999983
3.0095239999999994
4.371990000000001
2.063193000000002
2.8456090000000005
2.6122769999999993
4.209916
3.5804789999999995
3.215978
2.5541369999999994
3.9484199999999996
2.769396999999999
3.513881000000001
3.7985719999999996
3.6488189999999996
2.6795539999999995
3.1813569999999998
2.296299
2.754427000000001
2.249749
2.439215
8.260154000000002
3.8496619999999995
2.214344000000001
2.558758999999999
3.9008589999999996
5.537650999999999
3.134572999999999
2.080131999999999
2.3427279999999997
1.8392089999999992

1.6427904999999994
1.7960984999999996
1.7210234999999994
1.7000995
1.7728485
1.4078995
1.4590025
1.6731025000000002
1.7894975
1.8149764999999998
1.6934574999999994
1.7131115000000003
1.6641154999999996
2.3490844999999996
1.9264964999999998
1.7074605
2.376599499999999
1.7183644999999994
1.7392005000000004
1.8408604999999998
1.7373264999999998
2.4928135000000005
1.7536704999999997
1.7846315
1.7771434999999995
3.049833499999999
1.5093184999999998
1.9292325
1.5587605
1.9215604999999998
1.9723944999999996
1.6645954999999995
1.8332115
2.0003865
2.3812605
1.8684455000000002
2.6409685
2.0759604999999994
5.0956255000000015
1.7976345
2.4180565
1.6547334999999999
1.5881115
1.8171145
1.6256264999999999
1.3349955000000004
1.6496924999999998
1.4688605000000001
0.7770835000000003
1.9617134999999999
1.7855165
1.5871114999999998
1.7034364999999998
2.0691255
2.449521499999999
2.0497905000000003
1.8622794999999996
2.0957754999999993
1.8897694999999999
2.0160265
1.6892114999999999
1.9647175000000006
2.071

2.697179
3.2147650000000003
2.1098219999999994
3.3607209999999994
2.414488999999999
1.8567200000000008
3.133114
1.6228789999999997
1.9745440000000005
1.95513
2.0166829999999996
2.1790079999999996
3.1839110000000006
2.1117289999999995
3.001029
2.3780940000000004
2.3003080000000002
2.1353170000000006
1.8785299999999998
2.949858
1.8848509999999994
2.309144
2.2858659999999986
4.287137
2.7757500000000004
1.6545899999999996
2.0865700000000005
1.9886009999999994
2.007324000000001
1.2621140000000013
2.9331519999999993
1.9063380000000005
3.2067910000000017
1.5926259999999992
2.610172
3.838155
2.1310670000000003
3.749876
4.3677790000000005
1.9326329999999998
2.374243
3.3359599999999996
1.522555999999999
2.476682999999999
2.9050089999999993
3.4364060000000007
2.3902849999999995
2.3869140000000013
1.8592169999999997
2.533143000000001
2.915599
2.1064229999999995
2.9984720000000014
2.520708999999999
4.204293000000001
3.325447999999999
3.55646
2.290516
2.22551
2.949437000000001
3.2242960000000003
2.1

1.009521
1.0259429999999998
0.8062769999999999
1.088232
1.025638
1.256797
1.069839
1.0647590000000002
1.2284579999999996
0.9969379999999999
1.0548699999999998
1.1246530000000001
1.084853
1.0405849999999999
1.0542929999999997
1.1465219999999998
2.5595029999999994
0.9362250000000002
0.993175
0.9047740000000001
1.1037139999999999
0.9225039999999994
1.0406
1.1176619999999997
1.1237299999999997
1.2440779999999996
1.120315
1.938711
1.0618100000000001
1.1337560000000002
2.614099
2.4858850000000006
2.623719
1.8528500000000003
2.6686549999999998
2.2372779999999994
1.9872919999999994
2.5833679999999997
2.4804850000000003
1.967517
2.617184999999999
2.435112999999999
2.072672
2.7570249999999987
2.3506029999999996
2.4370409999999993
1.5273629999999996
2.000671
2.849767999999999
3.887737
2.097575
2.2107319999999997
2.279066999999999
1.6270179999999994
1.9984319999999989
1.8176079999999997
2.5529339999999996
1.85488
2.2109839999999994
3.423657
2.1605479999999995
2.1941109999999995
2.35570799999

2.9510850000000004
2.624698
2.7088149999999995
2.826334
2.535266
2.4695359999999997
4.574591999999999
2.3321200000000006
2.476955
2.666187
2.5932869999999997
2.675193
2.277687
2.740387
2.4618450000000003
2.136338000000001
2.1431600000000004
2.172334
2.734434
2.256776
2.4578339999999996
2.3825339999999997
1.7648940000000002
2.73925
2.619234
2.4877339999999997
2.479034
2.9674090000000004
2.583834
2.632434
2.142082
2.6578339999999994
2.427034
2.8309339999999996
1.8899869999999994
1.2053570000000002
1.8374609999999998
1.5363169999999995
1.8201379999999994
1.4724509999999997
1.5856849999999998
1.8655820000000003
1.6941519999999997
1.7090210000000003
1.685988
1.6096510000000002
1.7908999999999993
2.138941
2.502824
1.4823960000000007
1.3848400000000005
1.6542859999999997
1.7380049999999996
1.960597
1.566396
1.8673249999999997
1.7646309999999996
1.784474
1.7756099999999995
1.956096
2.0318539999999996
1.8145989999999999
1.9403060000000005
2.086794
1.760596
2.2099209999999996
2.1777539999999997


1.5927345000000002
1.3670305
1.2316485
1.4780424999999997
1.4822994999999999
1.3009374999999999
1.5893815
1.2616965
1.2031324999999995
1.1555285
1.1798594999999998
1.1515995000000003
0.9501214999999996
1.4333274999999992
1.2049474999999998
1.2886625
1.2868274999999998
1.1357885
1.2763725
1.7223254999999995
1.0089414999999997
1.2179825
1.5518755
1.5637824999999996
1.3660154999999998
1.1964065
1.3721655000000001
1.6195194999999996
1.3570244999999999
2.4438905
1.2757935
1.3940525
1.3784515000000004
1.1671684999999996
1.5067804999999999
1.4402824999999997
1.2953504999999994
1.3825434999999997
1.6223525000000005
1.4558794999999995
1.1485554999999996
1.3422574999999997
1.3215014999999997
1.2804684999999996
1.2534854999999998
1.5253135000000002
1.0536124999999998
1.2382304999999998
0.9708574999999994
1.1182715
1.4562115000000002
1.2794524999999999
0.9746074999999998
1.2314945
1.2805764999999996
1.3773094999999995
1.3441584999999994
1.1891895000000001
1.5238385
1.4471505
1.5000695
1.4540775
1.

1.1861899999999996
0.6287249999999991
0.7347079999999994
0.8584719999999995
0.8487619999999997
1.0057629999999993
0.8328839999999991
0.9784859999999997
0.7642179999999991
0.7523699999999999
0.8276719999999997
0.9764759999999997
1.0895559999999997
0.9549609999999993
0.9469840000000004
1.3187509999999998
2.0595789999999994
1.6120159999999988
1.7147019999999995
0.9293139999999993
0.7908509999999994
1.0373419999999995
1.2906870000000001
1.0177109999999998
0.9173729999999993
0.8058119999999996
0.7777929999999994
1.4319070000000003
0.8132149999999994
0.6790339999999996
0.7032239999999991
1.0540049999999994
0.9073799999999994
0.7860759999999996
0.7621049999999995
0.9213849999999996
0.7940449999999992
1.5331869999999994
0.8651669999999996
0.8185479999999996
1.1494269999999998
0.7191849999999992
0.6413409999999997
0.547258
0.7067219999999997
0.623416
0.9069140000000002
0.8324680000000001
0.9406709999999999
2.36089
1.0130050000000002
1.242159
0.9773630000000002
1.0649669999999998
1.191143


2.2679820000000004
1.2954530000000006
1.1620370000000009
1.6034640000000007
1.2084230000000007
1.4190120000000006
1.5147329999999999
1.5224220000000006
1.3887580000000006
1.4954130000000005
1.3999250000000005
1.2552820000000005
1.2940160000000005
1.2152900000000004
1.3420170000000005
1.2075170000000004
1.4214660000000001
1.482091000000001
2.679884000000001
1.262032000000001
1.4764890000000008
1.5175910000000008
1.3579770000000007
1.551663000000001
1.9102440000000003
1.2482529999999998
1.3111400000000006
1.1764150000000007
1.2566670000000006
1.2375660000000006
1.3311910000000002
1.2525470000000005
1.4321780000000008
2.6775150000000005
1.5443710000000008
1.482403
1.4550010000000007
1.9592240000000005
1.5136860000000008
1.4461790000000003
1.4431990000000003
1.354709000000001
1.434517000000001
1.4525320000000008
2.309771000000002
1.3628520000000002
1.2859280000000006
1.1636770000000007
1.3173320000000008
1.3447840000000009
1.2295010000000002
1.3425380000000005
1.2482250000000008
1.32858200

# Comments on Manhattan 