damianknapik97/KeystrokeBiometricDataClassifier

About the project

This application trains and evaluates a number of different classifiers on the Keystroke Biometric dataset or on a custom dataset. Once the classifiers are prepared, they can be reused without limit to score their performance in any number of different scenarios. The program uses a simple YAML configuration for ease of use and prints its results to the console only.

Currently available classifiers:

  • K Nearest Neighbors (KNN)
  • Gaussian Naive Bayes (GNB)
  • Decision Tree (DT)
  • Random Forest (RF)

Running the project

Docker

Building docker image

docker build -t dknapik/keystroke-biometric-data-classifier .

Running container

Unix

docker run -it -v $(pwd)/data:/app/data -v $(pwd)/KeystrokeBiometricDataClassifier.yaml:/app/KeystrokeBiometricDataClassifier.yaml dknapik/keystroke-biometric-data-classifier

Windows

docker run -it -v $(PWD)/data:/app/data -v $(PWD)/KeystrokeBiometricDataClassifier.yaml:/app/KeystrokeBiometricDataClassifier.yaml dknapik/keystroke-biometric-data-classifier

Plain (Unix)

Python 3.8 and pip must be installed on your machine.

Installing dependencies

python3.8 -m pip install -r requirements.txt

Setting python path

export PYTHONPATH=$(pwd)

Running project

python3.8 ./python/keystroke_biometric_data_classifier.py

Configuration

All configurable parameters are located in the KeystrokeBiometricDataClassifier.yaml file. Currently, modifying this file is the only way to customize an application run.

Parameters:

  • classifier (object) - the set of all available classifiers, one entry per classifier
    • run (boolean) - determines whether the specific classifier should be used
    • model (string) - path to an already prepared classifier. If set, the training dataset is not used to create a new classifier for this algorithm.
  • testing_dataset (object) - data on which the classifiers will be scored.
  • training_dataset (object) - data on which the classifiers will be trained. Both dataset objects take the following fields:
    • file_path (string) - path to the dataset
    • starting_index (integer) - index of the first tuple to extract (indexes start from 0)
    • ending_index (integer) - index of the last tuple to extract (indexes start from 0)
    • class_label_column_index (integer) - index of the column used as the class label (indexes start from 0)
  • save_model (boolean) - determines whether a newly calculated classification model should be saved to a file (files are saved to the data/output directory)
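
To make the three dataset index parameters concrete, here is a minimal sketch of how a row range and a label column could be pulled out of a CSV file. It is not the project's actual code: the helper name extract_rows is hypothetical, and the sketch assumes that the file has no header row and that ending_index is inclusive.

```python
import csv
import io


def extract_rows(csv_text, starting_index, ending_index, class_label_column_index):
    """Hypothetical helper: split a row range into feature rows and class labels.

    Assumptions (not confirmed by the project): no header row, and
    ending_index is inclusive, matching "index of the last tuple".
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    # Python slices exclude the end, so add 1 to keep the last requested row.
    selected = rows[starting_index:ending_index + 1]
    # Every column except the label column is treated as a feature.
    features = [
        row[:class_label_column_index] + row[class_label_column_index + 1:]
        for row in selected
    ]
    labels = [row[class_label_column_index] for row in selected]
    return features, labels


# Tiny in-memory dataset: two feature columns followed by a class label.
sample = "0.1,0.2,alice\n0.3,0.4,bob\n0.5,0.6,alice\n0.7,0.8,bob\n"
features, labels = extract_rows(
    sample, starting_index=1, ending_index=2, class_label_column_index=2
)
print(features)
print(labels)
```

Under these assumptions, non-overlapping training and testing ranges can be taken from the same file by giving each dataset object a different index range.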

Example configuration:

classifier:
  k_nearest_neighbors:
    run: true
    model: "/data/output/KNN_2024-02-04_22-21-32.joblib"
  gaussian_naive_bayes:
    run: true
    model: "/data/output/GNB_2024-02-04_22-21-32.joblib"
  decision_tree:
    run: true
    model: "/data/output/DT_2024-02-04_22-21-32.joblib"
  random_forest:
    run: true
    model: "/data/output/RF_2024-02-04_22-21-32.joblib"
testing_dataset:
  file_path: "/data/custom_dataset/generated_2024-02-04_21-59-30.csv"
  starting_index: 51
  ending_index: 120
  class_label_column_index: 113
training_dataset:
  file_path: "/data/keystroke_biometric_data/user_1.csv"
  starting_index: 0
  ending_index: 50
  class_label_column_index: 113
save_model: false
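
The interplay between run and model described in the parameter list can be sketched as follows. This is only an illustration of the documented behaviour, not the project's implementation: the helper plan_actions is hypothetical, and the configuration is written as a plain Python dictionary so the sketch needs no YAML parser.

```python
def plan_actions(config):
    """Hypothetical helper: decide, per classifier, whether to load or train.

    Mirrors the documented rules: a classifier is skipped when run is false,
    loaded from disk when model is set, and trained otherwise.
    """
    actions = {}
    for name, options in config["classifier"].items():
        if not options.get("run", False):
            continue  # classifier disabled in the configuration
        model_path = options.get("model")
        if model_path:
            # A prepared model exists, so the training dataset is not needed.
            actions[name] = ("load", model_path)
        else:
            actions[name] = ("train", config["training_dataset"]["file_path"])
    return actions


# Dictionary form of a configuration; the None values are illustrative.
config = {
    "classifier": {
        "k_nearest_neighbors": {
            "run": True,
            "model": "/data/output/KNN_2024-02-04_22-21-32.joblib",
        },
        "decision_tree": {"run": True, "model": None},
        "random_forest": {"run": False, "model": None},
    },
    "training_dataset": {"file_path": "/data/keystroke_biometric_data/user_1.csv"},
}

for name, action in plan_actions(config).items():
    print(name, action)
```

In this sketch, leaving model empty is what triggers training. Whether the real application expects an empty string, a null value, or a missing key for that case is an assumption to verify against the code.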

More on Keystroke Dynamics:
