# Model Training

In this notebook, we will ask you a series of questions regarding model selection. Based on your responses, we will ask you to create the ML models that you've chosen. 

The bonus step is completely optional, but if you provide a sufficient third machine learning model in this project, we will add `1000` points to your Kahoot leaderboard score.

**Note**: Use the dataset that you've created in your previous data transformation step (not the original dataset).

## Questions
Is this a classification or regression task?  

Fraud detection in this dataset is a classification task.

Are you predicting for multiple classes or binary classes?  

It is a binary classification because it's predicting one of two classes.
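
One quick way to confirm that the target is binary, and to see how imbalanced it is, is to count the label values. The sketch below uses a small hypothetical label series (`is_fraud`) in place of the real `isFraud` column, so the proportions shown are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for transactions['isFraud']; the real column has far more rows.
is_fraud = pd.Series([0, 0, 0, 0, 0, 0, 0, 0, 0, 1], name="isFraud")

n_classes = is_fraud.nunique()                        # number of distinct labels
class_share = is_fraud.value_counts(normalize=True)   # fraction of rows per label

print("Number of classes:", n_classes)
print(class_share)
```

Two distinct labels confirm a binary task, and a small share for the positive class signals that accuracy alone will be a misleading metric.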

Given these observations, which 2 (or possibly 3) machine learning models will you choose?  

List your models here

Logistic Regression, SVM, and KNN


## First Model

Using the first model that you've chosen, implement the following steps.

In [1]:
import pandas as pd 
import numpy as np

import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

### 1) Create a train-test split

Use your cleaned and transformed dataset to divide your features and labels into training and testing sets. Make sure you’re only using numeric or properly encoded features.  

In [3]:
from sklearn.model_selection import train_test_split
transactions = pd.read_csv("../data/bank_transactions.csv")

numeric_features = transactions.select_dtypes(include=['int64', 'float64']).drop(columns='isFraud')

X = numeric_features
y = transactions['isFraud']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print("X_train shape:", X_train.shape)
print("y_train shape:", y_train.shape)
print("X_test shape:", X_test.shape)
print("y_test shape:", y_test.shape)

X_train shape: (800000, 6)
y_train shape: (800000,)
X_test shape: (200000, 6)
y_test shape: (200000,)
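
The `stratify=y` argument matters for fraud data because positives are rare. As a minimal sketch on synthetic data (the arrays `X_demo` and `y_demo` are made up for illustration and are not the transactions dataset), the check below shows that a stratified split keeps the positive rate the same in both partitions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic stand-in: 1,000 rows, 6 numeric features, exactly 5% positive labels.
X_demo = rng.normal(size=(1000, 6))
y_demo = np.array([1] * 50 + [0] * 950)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# With stratify, each split preserves the class proportions of y_demo.
print("Train positive rate:", y_tr.mean())
print("Test positive rate:", y_te.mean())
```

Without stratification, a random split of a highly imbalanced target can leave the test set with too few positives to evaluate recall reliably.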


### 2) Search for best hyperparameters
Use tools like GridSearchCV, RandomizedSearchCV, or model-specific tuning functions to find the best hyperparameters for your first model.

In [7]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# First model from the list above: logistic regression.
log_reg = LogisticRegression(max_iter=1000)

# Illustrative grid over the inverse regularization strength C (values are an assumption).
param_grid = {'C': [0.01, 0.1, 1, 10]}

grid_search = GridSearchCV(
    estimator=log_reg,      # must be a model object implementing fit(), not the DataFrame
    param_grid=param_grid,  # must be a dict of parameter names to candidate values
    cv=5,
    n_jobs=-1,
    scoring='f1',
    verbose=2
)
grid_search.fit(X_train, y_train)

print("Best parameters:", grid_search.best_params_)
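
Step (2) also names RandomizedSearchCV as an option. As a minimal sketch on synthetic data (not the transactions dataset), assuming logistic regression as the model and an illustrative sampling range for `C`, a randomized search might look like this:

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

# Synthetic, imbalanced stand-in for the training data (roughly 10% positives).
X_demo, y_demo = make_classification(
    n_samples=500, n_features=6, weights=[0.9, 0.1], random_state=42
)

random_search = RandomizedSearchCV(
    estimator=LogisticRegression(max_iter=1000),
    param_distributions={"C": loguniform(1e-3, 1e2)},  # illustrative range, an assumption
    n_iter=5,          # number of sampled parameter settings
    cv=3,
    scoring="f1",
    random_state=42,
)
random_search.fit(X_demo, y_demo)

print("Best parameters:", random_search.best_params_)
print("Best cross-validated F1:", random_search.best_score_)
```

Randomized search samples a fixed number of settings instead of trying every combination, which can be much cheaper than a full grid on a dataset with hundreds of thousands of rows.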

### 3) Train your model
Select the model with the best hyperparameters and generate predictions on your test set. Evaluate your model's accuracy, precision, and recall (also known as sensitivity).
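
Recall and sensitivity are the same quantity, TP / (TP + FN). The sketch below computes the requested metrics on a small set of hypothetical labels and predictions (`y_true` and `y_pred` are made up here and stand in for `y_test` and the tuned model's predictions).

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)

# Hypothetical labels and predictions, for illustration only.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]

accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)

# Unpack the 2x2 confusion matrix to compute sensitivity by hand.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)

print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("Sensitivity:", sensitivity)
```

With a fitted search object, the predictions would typically come from calling `predict` on the best estimator with the test features.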

## Second Model

Create a second machine learning object and rerun steps (2) & (3) on this model. Compare the evaluation metrics of the two models. Which handles the class imbalance more effectively?

Create as many code-blocks as needed.
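
To judge which model copes better with class imbalance, it helps to compare metrics on the positive class, such as recall and F1, because accuracy is dominated by the majority class. The following is a sketch on synthetic data, assuming logistic regression and an SVM as the two models; the data, pipeline setup, and default hyperparameters are illustrative choices, not results from the transactions dataset.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in data (roughly 10% positives).
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=6, weights=[0.9, 0.1], random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Both models are scale-sensitive, so each is wrapped with a StandardScaler.
models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "SVM": make_pipeline(StandardScaler(), SVC()),
}

results = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    y_pred = model.predict(X_te)
    # Positive-class metrics; zero_division=0 avoids warnings if no positives are predicted.
    results[name] = {
        "recall": recall_score(y_te, y_pred, zero_division=0),
        "f1": f1_score(y_te, y_pred, zero_division=0),
    }
    print(name, results[name])
```

Kernel SVMs scale poorly with the number of samples, so on a training set of this project's size a linear SVM or a subsample may be needed to keep training time reasonable.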

### (Bonus/Optional) Third Model

Create a third machine learning model and rerun steps (2) & (3) on this model. Which model has the best predictive capabilities? 

Create as many code-blocks as needed.
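
For a distance-based third model such as KNN, features usually need scaling, and tuning a model inside a pipeline requires prefixing each parameter with the step name. The sketch below assumes KNN as the third model and uses synthetic data; the candidate values for `n_neighbors` are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in data (roughly 10% positives).
X_demo, y_demo = make_classification(
    n_samples=500, n_features=6, weights=[0.9, 0.1], random_state=42
)

# make_pipeline names each step after its lowercased class name.
knn_pipeline = make_pipeline(StandardScaler(), KNeighborsClassifier())

knn_search = GridSearchCV(
    estimator=knn_pipeline,
    param_grid={"kneighborsclassifier__n_neighbors": [3, 5, 11]},  # illustrative values
    cv=3,
    scoring="f1",
)
knn_search.fit(X_demo, y_demo)

print("Best parameters:", knn_search.best_params_)
print("Best cross-validated F1:", knn_search.best_score_)
```

Placing the scaler inside the pipeline means it is refit on each training fold, which keeps information from the validation fold out of the scaling step.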