Just as every person has unique fingerprints, each of us also has distinctive behavioural patterns.

Behavioural biometrics is the field of study concerned with measuring the patterns in human activity that can uniquely identify a person.

In this notebook I present a sample of the data that can be collected for behavioural biometrics.

In [1]:
import pandas as pd
import numpy as np


In [2]:
# constructing data 
np.random.seed(42)

df = pd.DataFrame()

# add group labels
df['customer_id'] = [1]*5 + [2]*20 + [3]*10

df['ip_info'] = ['127.0.0.1'] * 5 + ['2.134.213.2'] * 20 + ['5.14.54.98'] * 10

# add timestamps
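# 5 timestamps spaced 45 minutes apart, repeated 7 times to fill all 35 rows
# ("45min" replaces the "45T" alias, which newer pandas versions deprecate)
df['time_login'] = np.tile(
    pd.date_range("2020-01-01 00:00:00", periods=5, freq="45min").values, reps=7)

# each logout falls 10 minutes after the matching login
df['time_logout'] = np.tile(
    pd.date_range("2020-01-01 00:10:00", periods=5, freq="45min").values, reps=7)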
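
The `ip_info` column can already yield simple features without any external lookup. The following is a minimal sketch using Python's standard `ipaddress` module; the helper name `ip_flags` and the choice of flags are illustrative assumptions, not part of the original notebook.

```python
import ipaddress

import pandas as pd

# Stand-in for the distinct values of the `ip_info` column built above.
ips = pd.Series(['127.0.0.1', '2.134.213.2', '5.14.54.98'], name='ip_info')


def ip_flags(ip_string):
    """Return coarse address-class flags for one IPv4/IPv6 string (hypothetical helper)."""
    ip = ipaddress.ip_address(ip_string)
    return {
        'ip_version': ip.version,
        'is_loopback': ip.is_loopback,  # e.g. 127.0.0.1
        'is_private': ip.is_private,    # not routable on the public internet
        'is_global': ip.is_global,      # publicly routable address
    }


# One row of flags per address, indexed by the address itself.
ip_features = pd.DataFrame([ip_flags(ip) for ip in ips], index=ips)
print(ip_features)
```

A loopback or private address in production traffic may deserve a closer look, since real customers normally connect from publicly routable addresses.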




## Web / IP Features


In [3]:
# android smartphone example
u1 = ("Mozilla/5.0 (Linux; Android 7.0; SM-G930VC Build/NRD90M; wv)"
      " AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/58.0.3029.83 Mobile Safari/537.36")

# apple pc example (a space was missing between the two concatenated parts)
u2 = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9"
      " (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9")

u3 = ("Safari/9.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9"
      " (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9")

# bot/spider example
u4 = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

In [4]:
df['user_agants'] = ([u1]*5 + [u2]*15 + [u4]*2 + [u3]*10+ [u4]*3)
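
Raw user-agent strings are hard to use directly, so one option is to reduce them to a few flags. The sketch below uses plain regular expressions; the patterns and the helper name `ua_flags` are assumptions chosen for illustration and are far less thorough than a dedicated user-agent parser.

```python
import re

import pandas as pd

# Three of the user-agent strings defined above (smartphone, desktop, bot).
agents = pd.Series([
    "Mozilla/5.0 (Linux; Android 7.0; SM-G930VC Build/NRD90M; wv)"
    " AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/58.0.3029.83 Mobile Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9"
    " (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
])

# Illustrative patterns only; real traffic needs a much longer list.
BOT_PATTERN = re.compile(r"bot|spider|crawler", re.IGNORECASE)
MOBILE_PATTERN = re.compile(r"Android|iPhone|Mobile", re.IGNORECASE)


def ua_flags(user_agent):
    """Return simple boolean flags for one user-agent string (hypothetical helper)."""
    return {
        'is_bot': bool(BOT_PATTERN.search(user_agent)),
        'is_mobile': bool(MOBILE_PATTERN.search(user_agent)),
    }


ua_features = pd.DataFrame([ua_flags(ua) for ua in agents])
print(ua_features)
```

A customer who normally logs in from a mobile device and suddenly appears with a bot-like agent is the kind of change these flags are meant to surface.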

## Geo Features

In [5]:
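from geopy.geocoders import Nominatim
# OpenStreetMap geocoder example
# geopy 2.x requires a custom user_agent; the name below is a placeholder
geolocator = Nominatim(user_agent="behav_biometrics_notebook")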

location = geolocator.geocode("Nairobi City Hall", language='en')
print(location.address, '\n')
print((location.latitude, location.longitude), '\n')
print(location.raw)

Nairobi City Hall, Mama Ngina Street, Ngara, Nairobi, P.O BOX 30551 – 00100 G.P.O. NAIROBI, Kenya 

(-1.2865715500000001, 36.82164399517996) 

{'place_id': 149023342, 'licence': 'Data © OpenStreetMap contributors, ODbL 1.0. https://osm.org/copyright', 'osm_type': 'way', 'osm_id': 241193508, 'boundingbox': ['-1.2871211', '-1.2861351', '36.8210974', '36.8223583'], 'lat': '-1.2865715500000001', 'lon': '36.82164399517996', 'display_name': 'Nairobi City Hall, Mama Ngina Street, Ngara, Nairobi, P.O BOX 30551 – 00100 G.P.O. NAIROBI, Kenya', 'class': 'amenity', 'type': 'townhall', 'importance': 0.31100000000000005}




In [6]:
g1 = [(-1.2865715500000001, 36.82164399517996)]
g2 = [(-1.2832998, 36.821811074845286)]
g3 = [(-1.283296098, 36.8218141845286)]
g4 = [(-1.27657155000001, 36.82164351996)]
g5 = [(-1.2657155000001, 36.82164351996)]

In [7]:
df['geo_loc'] = g1*4 + g2*10 + g1*3 +g4*3 +g3*5 + g2*2 + g5*5 + g1*3

In [8]:
df.head()

| | customer_id | ip_info | time_login | time_logout | user_agants | geo_loc |
|---|---|---|---|---|---|---|
| 0 | 1 | 127.0.0.1 | 2020-01-01 00:00:00 | 2020-01-01 00:10:00 | Mozilla/5.0 (Linux; Android 7.0; SM-G930VC Bui... | (-1.2865715500000001, 36.82164399517996) |
| 1 | 1 | 127.0.0.1 | 2020-01-01 00:45:00 | 2020-01-01 00:55:00 | Mozilla/5.0 (Linux; Android 7.0; SM-G930VC Bui... | (-1.2865715500000001, 36.82164399517996) |
| 2 | 1 | 127.0.0.1 | 2020-01-01 01:30:00 | 2020-01-01 01:40:00 | Mozilla/5.0 (Linux; Android 7.0; SM-G930VC Bui... | (-1.2865715500000001, 36.82164399517996) |
| 3 | 1 | 127.0.0.1 | 2020-01-01 02:15:00 | 2020-01-01 02:25:00 | Mozilla/5.0 (Linux; Android 7.0; SM-G930VC Bui... | (-1.2865715500000001, 36.82164399517996) |
| 4 | 1 | 127.0.0.1 | 2020-01-01 03:00:00 | 2020-01-01 03:10:00 | Mozilla/5.0 (Linux; Android 7.0; SM-G930VC Bui... | (-1.2832998, 36.821811074845286) |


In [9]:
test = geolocator.reverse((-1.2865715500000001, 36.82164399517996) , language='en')
print(test.address, '\n')
print((test.latitude, test.longitude), '\n')
print(test.raw)


Nairobi City Hall, Mama Ngina Street, Ngara, Nairobi, P.O BOX 30551 – 00100 G.P.O. NAIROBI, Kenya 

(-1.2865715500000001, 36.82164399517996) 

{'place_id': 149023342, 'licence': 'Data © OpenStreetMap contributors, ODbL 1.0. https://osm.org/copyright', 'osm_type': 'way', 'osm_id': 241193508, 'lat': '-1.2865715500000001', 'lon': '36.82164399517996', 'display_name': 'Nairobi City Hall, Mama Ngina Street, Ngara, Nairobi, P.O BOX 30551 – 00100 G.P.O. NAIROBI, Kenya', 'address': {'amenity': 'Nairobi City Hall', 'road': 'Mama Ngina Street', 'suburb': 'Ngara', 'city': 'Nairobi', 'state': 'Nairobi', 'region': 'Nairobi', 'postcode': 'P.O BOX 30551 – 00100 G.P.O. NAIROBI', 'country': 'Kenya', 'country_code': 'ke'}, 'boundingbox': ['-1.2871211', '-1.2861351', '36.8210974', '36.8223583']}
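
Coordinates become more useful once they are turned into distances, for example between a login location and a customer's usual location. The sketch below implements the haversine (great-circle) formula with the standard library only; the function name `haversine_km` and the use of a mean Earth radius of 6371 km are assumptions made for this illustration.

```python
import math


def haversine_km(point_a, point_b):
    """Great-circle distance in km between two (lat, lon) pairs given in degrees."""
    earth_radius_km = 6371.0  # mean Earth radius (approximation)
    lat1, lon1 = map(math.radians, point_a)
    lat2, lon2 = map(math.radians, point_b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # Haversine term: squared half-chord length between the two points.
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


city_hall = (-1.2865715500000001, 36.82164399517996)  # g1 above
far_point = (-1.2657155000001, 36.82164351996)        # g5 above

distance_km = haversine_km(city_hall, far_point)
print(round(distance_km, 2))
```

A large jump in distance between two logins that are close together in time is a common signal of abnormal behaviour.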


Given the information above, we can extract meta-information and engineer features to get more value from the data.

These features help establish a pattern of normal or abnormal behaviour for each customer, so that every session can be labelled (normal: 1, abnormal: 0).

The dataset is thereby prepared for ML algorithms (binary classification).
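
As a rough sketch of what such feature engineering might look like, the example below derives session length, login hour, and a "usual IP" flag from a small hand-made sample. The column names `session_minutes`, `login_hour`, `ip_is_usual` and the labelling rule are assumptions for illustration only; a real label would come from confirmed cases or a trained model.

```python
import pandas as pd

# Small hand-made sample with the same columns as the DataFrame above.
login = pd.to_datetime([
    "2020-01-01 00:00:00", "2020-01-01 00:45:00", "2020-01-01 01:30:00",
    "2020-01-01 02:15:00", "2020-01-01 03:00:00",
])
sessions = pd.DataFrame({
    'customer_id': [1, 1, 2, 2, 2],
    'ip_info': ['127.0.0.1', '127.0.0.1', '2.134.213.2', '2.134.213.2', '5.14.54.98'],
    'time_login': login,
    'time_logout': login + pd.Timedelta(minutes=10),
})

# Time-based features.
sessions['session_minutes'] = (
    (sessions['time_logout'] - sessions['time_login']).dt.total_seconds() / 60
)
sessions['login_hour'] = sessions['time_login'].dt.hour

# Most frequent IP per customer, then a flag for sessions that match it.
usual_ip = sessions.groupby('customer_id')['ip_info'].agg(lambda s: s.mode().iloc[0])
sessions['ip_is_usual'] = sessions['ip_info'] == sessions['customer_id'].map(usual_ip)

# Placeholder rule (assumption): a session from the usual IP is normal (1), otherwise abnormal (0).
sessions['label'] = sessions['ip_is_usual'].astype(int)
print(sessions[['customer_id', 'session_minutes', 'login_hour', 'ip_is_usual', 'label']])
```

The same per-customer comparison could be applied to user agents and locations before the table is passed to a classifier.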



In [10]:
from google.colab import files

df.to_csv('behav_biometrics.csv')

files.download('behav_biometrics.csv')
