This application builds and evaluates a number of different classifiers on the Keystroke Biometric dataset or on a custom dataset that you provide. Once the classifiers are prepared, they can be reused any number of times to score them against different scenarios. The program uses a simple YAML configuration for ease of use and prints its results only to the console.
Currently available classifiers:
- K Nearest Neighbors (KNN)
- Gaussian Naive Bayes (GNB)
- Decision Tree (DT)
- Random Forest (RF)
docker build -t dknapik/keystroke-biometric-data-classifier .
Unix
docker run -it -v $(pwd)/data:/app/data -v $(pwd)/KeystrokeBiometricDataClassifier.yaml:/app/KeystrokeBiometricDataClassifier.yaml dknapik/keystroke-biometric-data-classifier
Windows
docker run -it -v $(PWD)/data:/app/data -v $(PWD)/KeystrokeBiometricDataClassifier.yaml:/app/KeystrokeBiometricDataClassifier.yaml dknapik/keystroke-biometric-data-classifier
Python 3.8 and pip must be installed on your machine.
python3.8 -m pip install -r requirements.txt
export PYTHONPATH=$(pwd)
python3.8 ./python/keystroke_biometric_data_classifier.py
All configurable parameters are located in the KeystrokeBiometricDataClassifier.yaml file. Currently, modifying this file is the only way to customize an application run.
- classifier (array of objects) - the set of all available classifiers
  - run (boolean) - determines whether the specific classifier should be used
  - model (string) - path to an already prepared classifier. If set, no training dataset is used to create a new classifier for this algorithm.
- testing_dataset (object) - data on which the classifiers will be scored.
- training_dataset (object) - data on which the classifiers will be trained. Both dataset objects take the following fields:
  - file_path (string) - path to the dataset
  - starting_index (integer) - index of the first tuple from which data will be extracted (indexes start from 0)
  - ending_index (integer) - index of the last tuple up to which data will be extracted (indexes start from 0)
  - class_label_column_index (integer) - index of the column that will be used as the class label (indexes start from 0)
- save_model (boolean) - determines whether a newly calculated classification model should be saved to a file (files are saved to the data/output directory)
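To make the index parameters concrete, here is a minimal sketch of how `starting_index`, `ending_index`, and `class_label_column_index` could be applied to rows read from a CSV file. It is not the application's actual code: the `slice_dataset` helper and the sample data are hypothetical, and it assumes that `ending_index` is inclusive and that the file has no header row.

```python
import csv
import io


def slice_dataset(rows, starting_index, ending_index, class_label_column_index):
    """Split the selected rows into feature vectors and class labels.

    Assumption: ending_index is inclusive, i.e. it is the index of the
    last tuple that is still part of the extracted data.
    """
    selected = rows[starting_index:ending_index + 1]
    # Every column except the class label column is treated as a feature.
    features = [
        [value for i, value in enumerate(row) if i != class_label_column_index]
        for row in selected
    ]
    labels = [row[class_label_column_index] for row in selected]
    return features, labels


# Tiny in-memory CSV standing in for a real dataset file (hypothetical data).
sample_csv = io.StringIO(
    "0.1,0.2,alice\n"
    "0.3,0.4,bob\n"
    "0.5,0.6,alice\n"
    "0.7,0.8,bob\n"
)
rows = list(csv.reader(sample_csv))

features, labels = slice_dataset(
    rows, starting_index=1, ending_index=2, class_label_column_index=2
)
print(features)  # [['0.3', '0.4'], ['0.5', '0.6']]
print(labels)    # ['bob', 'alice']
```

Under these assumptions, a `starting_index` of 0 and an `ending_index` of 50 would select 51 tuples.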
classifier:
  k_nearest_neighbors:
    run: true
    model: "/data/output/KNN_2024-02-04_22-21-32.joblib"
  gaussian_naive_bayes:
    run: true
    model: "/data/output/GNB_2024-02-04_22-21-32.joblib"
  decision_tree:
    run: true
    model: "/data/output/DT_2024-02-04_22-21-32.joblib"
  random_forest:
    run: true
    model: "/data/output/RF_2024-02-04_22-21-32.joblib"
testing_dataset:
  file_path: "/data/custom_dataset/generated_2024-02-04_21-59-30.csv"
  starting_index: 51
  ending_index: 120
  class_label_column_index: 113
training_dataset:
  file_path: "/data/keystroke_biometric_data/user_1.csv"
  starting_index: 0
  ending_index: 50
  class_label_column_index: 113
save_model: false
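The interaction between `run` and `model` described in the parameter list can be sketched as a small decision function. This is an illustration only, not the application's implementation: `plan_classifier_actions` is a hypothetical helper, and the configuration is written as a plain Python dict on the assumption that `classifier` maps each classifier name to its settings.

```python
def plan_classifier_actions(config):
    """Decide, per classifier, whether to skip it, load it, or train it.

    Assumed rules, following the parameter descriptions:
    - run is false      -> the classifier is skipped
    - model path is set -> the saved classifier is loaded, no training
    - otherwise         -> a new classifier is trained on training_dataset
    """
    actions = {}
    for name, settings in config["classifier"].items():
        if not settings.get("run", False):
            actions[name] = "skip"
        elif settings.get("model"):
            actions[name] = "load"
        else:
            actions[name] = "train"
    return actions


# Hypothetical configuration mixing all three cases.
config = {
    "classifier": {
        "k_nearest_neighbors": {
            "run": True,
            "model": "/data/output/KNN_2024-02-04_22-21-32.joblib",
        },
        "gaussian_naive_bayes": {"run": True, "model": None},
        "decision_tree": {"run": False, "model": None},
    },
    "save_model": False,
}

actions = plan_classifier_actions(config)
print(actions)
# {'k_nearest_neighbors': 'load', 'gaussian_naive_bayes': 'train', 'decision_tree': 'skip'}
```

In the example configuration every classifier has a `model` path, so under these rules all four would be loaded from disk and only `testing_dataset` would be read.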