## Benchmark

In this file we'll look at the performance of the models shown in ``showcase_of_models.ipynb``.
We'll use several evaluation metrics and compare the performance of different models on the same dataset.

In [1]:
# imports
import pandas as pd

from ML import KNN_classifier, DecisionTree, RandomForest
from data_processing import preprocess_data, split_data

In [2]:
# load data
data = pd.read_csv('task_data.csv')
data = preprocess_data(data)

In [3]:
# create models
models = [KNN_classifier(k=3, name='KNN3'), 
          KNN_classifier(k=5, name='KNN5'), 
          DecisionTree(max_depth=5, name='DecisionTree'), 
          RandomForest(n_trees=64, max_depth=20, n_features=3, name='RandomForest')]

### LpO Cross-validation

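*Leave-p-out cross-validation* (LpO) means that in each pass over the dataset we put ``p`` samples in the test set and train on the rest. Exhaustive LpO uses every possible subset of ``p`` samples as the test set once, and the number of subsets grows combinatorially. So here we draw the ``p`` test samples at random in each of ``N`` iterations instead, which is repeated random sub-sampling.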
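
To make the difference concrete, here is a minimal pure-Python sketch that works on sample indices only. ``leave_p_out_splits`` and ``random_p_out_splits`` are hypothetical helpers and are not part of ``data_processing``. ``split_data`` may choose its test rows differently. The dataset size of 50 in the last line is also only an illustrative assumption.

```python
import math
import random
from itertools import combinations


def leave_p_out_splits(n_samples, p):
    """Exhaustive LpO: yield every (train_idx, test_idx) pair with len(test_idx) == p."""
    indices = range(n_samples)
    for test in combinations(indices, p):
        test_set = set(test)
        train = [i for i in indices if i not in test_set]
        yield train, list(test)


def random_p_out_splits(n_samples, p, n_iter, seed=None):
    """Repeated random sub-sampling: n_iter independent random test sets of size p."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    for _ in range(n_iter):
        test = rng.sample(indices, p)  # p distinct indices, no replacement
        test_set = set(test)
        train = [i for i in indices if i not in test_set]
        yield train, test


# Number of splits exhaustive LpO needs for a hypothetical 50-sample dataset and p = 8
print(math.comb(50, 8))  # → 536878650
```

With ``p = 8``, even a 50-sample dataset would need over 500 million exhaustive splits. This is why the benchmark below samples ``N`` random splits instead.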

In [4]:
p = 8  # number of samples to leave out in each iteration
N = 100  # number of iterations

TEST_SIZE = p / data.shape[0]

for _ in range(N):
    # split data into random train and test sets
    X_train, y_train, X_test, y_test = split_data(data, 'Cardiomegaly', test_size=TEST_SIZE)

    # fit models
    for model in models:
        model.fit(X_train, y_train)

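    # evaluate models; the confusion matrices printed below sum to p * N = 800
    # samples, so evaluate() appears to accumulate results across iterations
    eval_metrics = [model.evaluate(X_test, y_test) for model in models]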

    # clear models for next iteration
    for model in models:
        model.clear()

# print evaluation results
for model_name, summary, _ in eval_metrics:
    print(f"Model: {model_name}")
    print(summary)
    print("-" * 70)

# print confusion matrices
for model_name, _, conf_matrix in eval_metrics:
    print(f"\nModel {model_name}:\n{conf_matrix}\n")

Model: KNN3
Accuracy=78.75% | Precision=82.98% | Recall=90.62% | F1 Score=86.64%
----------------------------------------------------------------------
Model: KNN5
Accuracy=78.12% | Precision=81.61% | Recall=91.94% | F1 Score=86.47%
----------------------------------------------------------------------
Model: DecisionTree
Accuracy=70.62% | Precision=79.56% | Recall=82.57% | F1 Score=81.03%
----------------------------------------------------------------------
Model: RandomForest
Accuracy=73.25% | Precision=81.27% | Recall=84.21% | F1 Score=82.71%
----------------------------------------------------------------------

Model KNN3:
                    Actual Positive  Actual Negative
Predicted Positive              551              113
Predicted Negative               57               79


Model KNN5:
                    Actual Positive  Actual Negative
Predicted Positive              559              126
Predicted Negative               49               66


Model DecisionTree:
         

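Each confusion matrix above sums to 800 = ``p`` × ``N`` samples, so it pools all 100 test splits. The reported metrics follow from these counts. The sketch below recomputes them for ``KNN3``. ``metrics_from_counts`` is a hypothetical helper, not the ``evaluate`` method from ``ML``. It reads the matrix with rows as predictions and columns as actual labels, as printed above.

```python
def metrics_from_counts(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)  # share of predicted positives that are actually positive
    recall = tp / (tp + fn)     # share of actual positives that were predicted positive
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


# KNN3 counts from the confusion matrix above (rows = predicted, columns = actual)
tp, fp = 551, 113  # predicted positive: actually positive / actually negative
fn, tn = 57, 79    # predicted negative: actually positive / actually negative

acc, prec, rec, f1 = metrics_from_counts(tp, fp, fn, tn)
print(f"Accuracy={acc:.2%} | Precision={prec:.2%} | Recall={rec:.2%} | F1 Score={f1:.2%}")
```

For example, accuracy is (551 + 79) / 800 = 78.75% and recall is 551 / (551 + 57) = 90.62%, matching the ``KNN3`` summary line.
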
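### Conclusion
All four models land in a similar range (70 to 79% accuracy, 81 to 87% F1 score). The two KNN models come out ahead of ``RandomForest`` and ``DecisionTree``. ``KNN3`` has the highest accuracy (78.75%), precision (82.98%) and F1 score (86.64%), while ``KNN5`` has the highest recall (91.94%).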