## Benchmark

In this file we'll see the performance of the models shown in ``showcase_of_models.ipynb``.
We'll use several evaluation metrics to compare the performance of different models on the same datasets.

In [1]:
# imports
import pandas as pd

from ML import KNN_classifier, DecisionTree, RandomForest
from data_processing import preprocess_data, split_data

In [2]:
# load data
data = pd.read_csv('task_data.csv')
data = preprocess_data(data)

In [3]:
# create models
models = [KNN_classifier(k=3, name='KNN3'), 
          KNN_classifier(k=5, name='KNN5'), 
          DecisionTree(max_depth=5, name='DecisionTree'), 
          RandomForest(n_trees=100, max_depth=10, name='RandomForest')]

### LpO Cross-validation

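Or *Leave-p-out cross-validation*: in each iteration, ``p`` samples are held out as the test dataset and the rest form the train dataset. Exhaustive LpO would go over every possible choice of ``p`` samples; here we instead draw ``p`` samples at random in each of ``N`` iterations.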
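
To make the difference between the two schemes concrete, here is a minimal sketch that works on sample indices only. The helper names `leave_p_out_splits` and `random_p_out_splits` are hypothetical and are not part of the `ML` or `data_processing` modules; the benchmark below relies on `split_data` for its random splits.

```python
from itertools import combinations
from math import comb
import random


def leave_p_out_splits(n_samples, p):
    """Yield (train_idx, test_idx) for every possible test set of size p."""
    all_idx = set(range(n_samples))
    for test_idx in combinations(range(n_samples), p):
        train_idx = sorted(all_idx - set(test_idx))
        yield train_idx, list(test_idx)


def random_p_out_splits(n_samples, p, n_iter, seed=0):
    """Yield n_iter random (train_idx, test_idx) pairs with p test samples each."""
    rng = random.Random(seed)
    for _ in range(n_iter):
        test_idx = sorted(rng.sample(range(n_samples), p))
        held_out = set(test_idx)
        train_idx = [i for i in range(n_samples) if i not in held_out]
        yield train_idx, test_idx


# Exhaustive LpO on a toy set of 5 samples with p = 2
exhaustive = list(leave_p_out_splits(5, 2))
print(len(exhaustive), comb(5, 2))  # 10 10

# Random variant: a fixed number of iterations regardless of dataset size
sampled = list(random_p_out_splits(n_samples=20, p=8, n_iter=100))
print(len(sampled))  # 100
```

The number of exhaustive splits is `comb(n_samples, p)`, which grows very quickly with the dataset size, so sampling a fixed number of random splits keeps the running time bounded.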

In [None]:
p = 8  # number of samples to leave out in each iteration
N = 100  # number of iterations

TEST_SIZE = p / data.shape[0]

for _ in range(N):
    # split data into random train and test sets
    X_train, y_train, X_test, y_test = split_data(data, 'Cardiomegaly', test_size=TEST_SIZE)

    # fit models
    for model in models:
        model.fit(X_train, y_train)

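    # evaluate models
    # NOTE: eval_metrics is overwritten on every iteration, so only the last call's
    # return value is printed below. The printed KNN matrices sum to p * N = 800
    # predictions, which suggests that evaluate() accumulates counts across calls.
    eval_metrics = [model.evaluate(X_test, y_test) for model in models]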

    # clear models for next iteration
    for model in models:
        model.clear()

# print evaluation results
print("-" * 40)
for model_name, summary, _ in eval_metrics:
    print(f"Model: {model_name}")
    print(summary)
    print("-" * 40)

# print confusion matrices
for model_name, _, conf_matrix in eval_metrics:
    print(f"\n{model_name}:\n{conf_matrix}\n")

----------------------------------------
Model: KNN3
Accuracy=0.78 | Precision=0.84 | Recall=0.88 | F1 Score=0.86
----------------------------------------
Model: KNN5
Accuracy=0.77 | Precision=0.82 | Recall=0.90 | F1 Score=0.86
----------------------------------------
Model: DecisionTree
Accuracy=0.71 | Precision=0.82 | Recall=0.80 | F1 Score=0.81
----------------------------------------
Model: RandomForest
Accuracy=0.72 | Precision=0.82 | Recall=0.82 | F1 Score=0.82
----------------------------------------

KNN3:
                    Actual Positive  Actual Negative
Predicted Positive              542              103
Predicted Negative               76               79
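
As a cross-check, the KNN3 summary line can be recomputed from the counts in the matrix above. This is a sketch only: `summarize` is a hypothetical helper, not the `evaluate` method from the `ML` module, and it assumes the layout shown in the output (rows are predictions, columns are actual labels).

```python
def summarize(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


# KNN3 counts copied from the confusion matrix above
acc, prec, rec, f1 = summarize(tp=542, fp=103, fn=76, tn=79)
print(f"Accuracy={acc:.2f} | Precision={prec:.2f} | Recall={rec:.2f} | F1 Score={f1:.2f}")
# Accuracy=0.78 | Precision=0.84 | Recall=0.88 | F1 Score=0.86
```

The four counts add up to 800, which equals `p * N` for `p = 8` and `N = 100`.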


KNN5:
                    Actual Positive  Actual Negative
Predicted Positive              558              126
Predicted Negative               60               56


DecisionTree:
                    Actual Positive  Actual Negative
Predicted Positive              495              109
Predicted Negative              