### Model Validation and Deployment

Prepare the models for deployment: reload the saved (pickled) models and validate them on a 10% sample of the credit card dataset.
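The models used in this section are pickle files. As a rough sketch of how such a file could be written and read back, the helpers below use only the standard library. The names `save_model` and `load_model`, the temporary path, and the stand-in dictionary are illustrative assumptions, not part of the original notebook; a fitted estimator would be passed in place of the stand-in.

```python
import pickle
import tempfile
from pathlib import Path


def save_model(model, path):
    """Serialize a model object to `path` with pickle (hypothetical helper)."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    with open(path, "wb") as file:
        pickle.dump(model, file)


def load_model(path):
    """Load a pickled model. Only unpickle files from a trusted source."""
    with open(path, "rb") as file:
        return pickle.load(file)


# Stand-in object for illustration only; a fitted estimator goes here instead.
stand_in_model = {"name": "stand-in model"}

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "models" / "isolation_forest.pkl"
    save_model(stand_in_model, model_path)
    restored_model = load_model(model_path)

# The restored object should equal what was saved
print(restored_model == stand_in_model)
```

Pickle files are tied to the Python environment that wrote them, so it is generally safer to load a model with the same library versions used to save it.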

In [1]:
import pickle
import pandas as pd
from sklearn.metrics import accuracy_score, classification_report

# Load the models
with open('../models/isolation_forest.pkl', 'rb') as file:
    isolation_forest_model = pickle.load(file)

with open('../models/local_outlier_factor.pkl', 'rb') as file:
    local_outlier_factor_model = pickle.load(file)

with open('../models/support_vector_machine.pkl', 'rb') as file:
    one_class_svm_model = pickle.load(file)

# Load and prepare data
data = pd.read_csv('../data/raw/creditcard.csv', sep=',')
data1 = data.sample(frac=0.1, random_state=1)
columns = [c for c in data1.columns if c not in ["Class"]]
X = data1[columns]

# Predict and evaluate
for model_name, model in {
    "Isolation Forest": isolation_forest_model,
    "Local Outlier Factor": local_outlier_factor_model,
    "Support Vector Machine": one_class_svm_model
}.items():
    if model_name == "Local Outlier Factor":
        # fit_predict refits LOF on X (with the default novelty=False,
        # LOF has no separate predict method)
        y_pred = model.fit_predict(X)
    else:
        y_pred = model.predict(X)

    # Map scikit-learn's labels (1 = inlier, -1 = outlier)
    # to the dataset's labels (0 = valid, 1 = fraud)
    y_pred = (y_pred == -1).astype(int)

    # Print results
    print(f"{model_name}:")
    print(f"Number of Errors: {(y_pred != data1['Class']).sum()}")
    print(f"Accuracy Score: {accuracy_score(data1['Class'], y_pred)}")
    print(f"Classification Report:\n{classification_report(data1['Class'], y_pred)}")


Isolation Forest:
Number of Errors: 73
Accuracy Score: 0.9974368877497279
Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00     28432
           1       0.26      0.27      0.26        49

    accuracy                           1.00     28481
   macro avg       0.63      0.63      0.63     28481
weighted avg       1.00      1.00      1.00     28481

Local Outlier Factor:
Number of Errors: 97
Accuracy Score: 0.9965942207085425
Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00     28432
           1       0.02      0.02      0.02        49

    accuracy                           1.00     28481
   macro avg       0.51      0.51      0.51     28481
weighted avg       1.00      1.00      1.00     28481

Support Vector Machine:
Number of Errors: 8515
Accuracy Score: 0.7010287560127805
Classification Report:
              precision    recall  f1-score

#### Observations
- Isolation Forest misclassified 73 transactions, LOF 97, and SVM 8515.
- Isolation Forest reaches 99.74% accuracy, compared with 99.66% for LOF and 70.10% for SVM.
- Isolation Forest detects about 27% of fraud cases (recall), much better than LOF's 2% and SVM's 0%.
- Overall, Isolation Forest is the best of the three at identifying fraud, with a fraud-class precision of about 26%. Because only 49 of the 28,481 sampled transactions are fraudulent, accuracy alone says little about how well fraud is detected.
- Results might be improved by using a larger sample or deep learning models, at the cost of more computation.
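To put the accuracy figures in context, the sketch below recomputes them from the numbers printed in the Isolation Forest report and compares them with a trivial baseline that labels every transaction as valid. The true-positive count of 13 is not printed in the output; it is an assumption, chosen because it is the only count consistent with the reported recall (0.27), precision (0.26), and 73 errors.

```python
# Figures copied from the Isolation Forest output above
n_total = 28481   # rows in the 10% sample
n_fraud = 49      # support of class 1 (fraud)
n_errors = 73     # reported "Number of Errors"

# Accuracy of a trivial baseline that predicts 0 (valid) for every row
baseline_accuracy = (n_total - n_fraud) / n_total
# Accuracy implied by the reported error count
iforest_accuracy = (n_total - n_errors) / n_total

print(f"All-valid baseline accuracy: {baseline_accuracy:.4f}")
print(f"Isolation Forest accuracy:   {iforest_accuracy:.4f}")

# Assumption: 13 true positives (inferred from the report, not printed in it)
tp = 13
fn = n_fraud - tp     # fraud cases the model missed
fp = n_errors - fn    # valid transactions flagged as fraud

precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(f"Fraud precision: {precision:.2f}, fraud recall: {recall:.2f}")
```

The baseline scores slightly higher on accuracy than Isolation Forest while catching no fraud at all, which is why precision and recall on class 1 are the more useful figures for comparing these models.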