## Benchmark

In this file we'll see the performance of the models shown in ``showcase_of_models.ipynb``.
We'll use several evaluation metrics and compare the performance of different models on the same datasets.

In [1]:
# imports
import pandas as pd

from ML import KNN_classifier, DecisionTree, RandomForest
from data_processing import preprocess_data, split_data

In [2]:
# load data
data = pd.read_csv('task_data.csv')
data = preprocess_data(data)

In [3]:
# create models
models = [KNN_classifier(k=3, name='KNN3'), 
          KNN_classifier(k=5, name='KNN5'), 
          DecisionTree(max_depth=5, name='DecisionTree'), 
          RandomForest(n_trees=20, max_depth=20, n_features=3, name='RandomForest')]

### LpO Cross-validation

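*Leave-p-out cross-validation* (LpO) holds out ``p`` samples as the test set in each iteration and trains on the remaining samples. Here the ``p`` samples are chosen at random in each of ``N`` iterations, rather than enumerating every possible subset of size ``p``.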
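To make the splitting step concrete, here is a minimal sketch of one random leave-p-out split. The helper ``leave_p_out_split`` and the toy data are hypothetical and only illustrate the idea; the notebook itself uses ``split_data`` from ``data_processing``, whose implementation is not shown here and may differ.

```python
import numpy as np
import pandas as pd


def leave_p_out_split(df, target, p, rng):
    """Hold out p randomly chosen rows as the test set (hypothetical helper)."""
    # Pick p distinct row positions for the test set
    test_idx = rng.choice(len(df), size=p, replace=False)
    test_mask = np.zeros(len(df), dtype=bool)
    test_mask[test_idx] = True

    train, test = df[~test_mask], df[test_mask]
    return (train.drop(columns=target), train[target],
            test.drop(columns=target), test[target])


# Toy data standing in for the real dataset (assumption, not task_data.csv)
toy = pd.DataFrame({"feature": range(20), "label": [0, 1] * 10})
rng = np.random.default_rng(0)

X_train, y_train, X_test, y_test = leave_p_out_split(toy, "label", p=8, rng=rng)
print(len(X_train), len(X_test))
```

Repeating this split ``N`` times with fresh random draws gives a cheap approximation of exhaustive LpO, which would otherwise require one run per possible subset of size ``p``.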

In [4]:
p = 8  # number of samples to leave out in each iteration
N = 100  # number of iterations

TEST_SIZE = p / data.shape[0]

for _ in range(N):
    # split data into random train and test sets
    X_train, y_train, X_test, y_test = split_data(data, 'Cardiomegaly', test_size=TEST_SIZE)

    # fit models
    for model in models:
        model.fit(X_train, y_train)

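    # evaluate models
    # NOTE: eval_metrics is overwritten on every iteration; the printed totals
    # (800 = p * N predictions per model) suggest that evaluate() accumulates
    # its counts across calls, so the last result covers all iterations
    eval_metrics = [model.evaluate(X_test, y_test) for model in models]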

    # clear models for next iteration
    for model in models:
        model.clear()

# print evaluation results
for model_name, summary, _ in eval_metrics:
    print(f"Model: {model_name}")
    print(summary)
    print("-" * 70)

# print confusion matrices
for model_name, _, conf_matrix in eval_metrics:
    print(f"\nModel {model_name}:\n{conf_matrix}\n")

Model: KNN3
Accuracy=75.38% | Precision=81.52% | Recall=85.84% | F1 Score=83.62%
----------------------------------------------------------------------
Model: KNN5
Accuracy=74.00% | Precision=78.21% | Recall=89.42% | F1 Score=83.44%
----------------------------------------------------------------------
Model: DecisionTree
Accuracy=69.38% | Precision=78.85% | Recall=79.52% | F1 Score=79.18%
----------------------------------------------------------------------
Model: RandomForest
Accuracy=70.50% | Precision=78.23% | Recall=82.76% | F1 Score=80.43%
----------------------------------------------------------------------

Model KNN3:
                    Actual Positive  Actual Negative
Predicted Positive              503              114
Predicted Negative               83              100


Model KNN5:
                    Actual Positive  Actual Negative
Predicted Positive              524              146
Predicted Negative               62               68


Model DecisionTree:
         

### Conclusion 
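The results show that all models perform similarly: accuracy ranges from 69.38% to 75.38%. ``KNN3`` has the highest accuracy (75.38%), precision (81.52%), and F1 score (83.62%), while ``KNN5`` has the highest recall (89.42%).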
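The summary metrics can be checked against the confusion-matrix counts in the output. The sketch below recomputes them for ``KNN3`` using the standard definitions; ``metrics_from_confusion`` is a hypothetical helper written for this check, not the ``evaluate`` method of the ``ML`` module. Note that the four counts add up to 800, which equals ``p * N`` (8 samples over 100 iterations).

```python
def metrics_from_confusion(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)   # share of predicted positives that are correct
    recall = tp / (tp + fn)      # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


# KNN3 counts taken from the confusion matrix in the notebook output
acc, prec, rec, f1 = metrics_from_confusion(tp=503, fp=114, fn=83, tn=100)
print(f"Accuracy={acc:.4f} | Precision={prec:.4f} | Recall={rec:.4f} | F1 Score={f1:.4f}")
```

The same function can be applied to the ``KNN5`` counts (524, 146, 62, 68) to reproduce its row in the summary.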