# How good is an ML model?

**Class Imbalance**

In [None]:
"""

Consider a model for predicting whether a bank transaction is fraudulent, where only 1% of transactions are actually fraudulent.
We could build a model that classifies every transaction as legitimate; this model would have an accuracy of 99%!

However, it never flags a single fraudulent transaction, so it fails at its original purpose.
The situation where one class is much more frequent than the other is called class imbalance.
Here, the class of legitimate transactions contains far more instances than the class of fraudulent transactions.
This is common in practice, and it calls for metrics other than accuracy when assessing the model's performance.

"""

In [None]:
"""

1. True Positives (TP) – Correctly Detecting Fraud
------------------------------------------------------
Meaning: The model correctly predicts a transaction as fraud when it actually is fraud.


2. True Negatives (TN) – Correctly Detecting Legitimate Transactions
-----------------------------------------------------------------------
Meaning: The model correctly predicts a transaction as legitimate when it actually is legitimate.


3. False Negatives (FN) – Missing a Fraudulent Transaction
-------------------------------------------------------------
Meaning: The model predicts a transaction is legitimate when it is actually fraud (missed fraud).


4. False Positives (FP) – Wrongly Flagging a Legitimate Transaction as Fraud
--------------------------------------------------------------------------------
Meaning: The model predicts a transaction is fraud when it is actually legitimate.


"""

**Precision**

In [None]:
"""

Precision = True Positives (TP) / (True Positives + False Positives)

Or in simple words:
--->  Precision tells us what fraction of the transactions labeled as "fraud" are actually fraud.



Suppose the model predicts 10 transactions as fraud:

7 are actually fraud (True Positives - TP)
3 are legitimate but wrongly flagged as fraud (False Positives - FP)
Precision is calculated as:
Precision = 7/10 = 70%



Why is Precision Important?
---------------------------------
A high precision means the model rarely makes mistakes when it predicts fraud.



Example of High Precision (95%) → If the model says a transaction is fraud, it's almost always correct.
Example of Low Precision (50%) → The model often falsely flags normal transactions as fraud, frustrating customers.



Connection to False Positives (FP)
--------------------------------------
High Precision = Fewer False Positives (Less inconvenience for customers)
Low Precision = More False Positives (Good transactions get blocked too often)



Real-Life Example of Precision Trade-Off
--------------------------------------------
If a bank wants high precision, it might only flag transactions when it’s really sure they’re fraud. This means:
--->   Fewer false alarms (False Positives go down)
--->   But it might miss some actual fraud cases (False Negatives go up)

So, precision is useful when false positives are costly, like in fraud detection or medical diagnosis (where we don't want to wrongly tell someone they have a disease).


"""

**Recall**

In [None]:
"""


Recall = True Positives (TP) / (True Positives + False Negatives)


----> Recall tells us what fraction of the actual fraud cases our model correctly identified.



A bank's fraud detection system is analyzing 100 transactions, and 20 of them are actually fraud.

Of these 20 actual fraud transactions:

15 are correctly detected as fraud (True Positives - TP)
5 are missed (False Negatives - FN)

Recall is calculated as:
Recall = 15/20 = 75%


Why is Recall Important?
-----------------------------
A high recall means the model detects most actual fraud cases.

Example of High Recall (95%) → The model catches almost every fraud case.
Example of Low Recall (50%) → The model misses many fraud cases, letting criminals go undetected.



Connection to False Negatives (FN)
-----------------------------------------
High Recall = Fewer Missed Fraud Cases (Low FN)
Low Recall = More Missed Fraud Cases (High FN)


Real-Life Example of Recall Trade-Off
--------------------------------------------
If a bank wants high recall, it might flag any slightly suspicious transaction as fraud, which means:
--->   Almost all fraud cases get caught (Low FN)
--->   But many real customers may have their transactions wrongly blocked (High FP)

So, recall is useful when missing fraud cases is costly, like in fraud detection or disease diagnosis (where we don't want to miss a serious illness).


"""

**F1 Score**

In [None]:
"""

The F1 Score is a single number that combines precision and recall into one metric. It helps us balance both precision and recall,
especially when we can’t decide which is more important.

F1 Score = 2 × (Precision × Recall) / (Precision + Recall)

Why Use the F1 Score?
---------------------------
If precision is high but recall is low, the model is too strict and misses many fraud cases.
If recall is high but precision is low, the model flags too many false fraud cases.
The F1 score finds a balance between the two.



In a fraud detection system:

Model 1 (High Precision, Low Recall)

Precision = 90% (few false fraud alarms)
Recall = 50% (misses half the fraud cases)
F1-score = 2 × (0.90 × 0.50) / (0.90 + 0.50) = 0.64 (or 64%)



Model 2 (High Recall, Low Precision)

Precision = 60% (more false fraud alarms)
Recall = 90% (catches most fraud cases)
F1-score = 2 × (0.60 × 0.90) / (0.60 + 0.90) = 0.72 (or 72%)



Model 3 (Balanced Precision & Recall)

Precision = 80%
Recall = 80%
F1-score = 2 × (0.80 × 0.80) / (0.80 + 0.80) = 0.80 (or 80%)

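NOTE: For a fixed average of precision and recall, the F1 score is highest when the two are equal. Because F1 is a harmonic mean, it is pulled toward the lower of the two values, so a lopsided model is penalized (Model 1 averages 70% but scores only 64%).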



When to Use the F1 Score?
--------------------------
When both false positives and false negatives are costly (e.g., fraud detection, medical diagnosis, spam filtering).
When we want a balanced model instead of optimizing only precision or only recall.


"""

In [None]:
"""

Import confusion_matrix and classification_report.
Fit the model to the training data.
Predict the labels of the test set, storing the results as y_pred.
Compute and print the confusion matrix and classification report for the test labels versus the predicted labels

"""

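# Import the classifier and the evaluation metrics
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.neighbors import KNeighborsClassifier

# X_train, X_test, y_train, y_test are assumed to come from an earlier train/test split
knn = KNeighborsClassifier(n_neighbors=6)

# Fit the model to the training data
knn.fit(X_train, y_train)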

# Predict the labels of the test data: y_pred
y_pred = knn.predict(X_test)

# Generate the confusion matrix and classification report
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

# Logistic regression and the ROC curve

**ROC Curve**

In [None]:
"""



What is the ROC Curve?
-----------------------------
The ROC (Receiver Operating Characteristic) curve helps us see how well a classification model distinguishes between two classes (e.g., fraud vs. not fraud)
at different thresholds.

It shows the trade-off between the True Positive Rate (TPR) and False Positive Rate (FPR) when we change the classification threshold.




Understanding Thresholds in Classification
----------------------------------------------
Many classification models can output probabilities instead of only predicting labels (0 or 1).

We convert probabilities into class labels using a threshold (default is usually 0.5).
If probability > threshold → Predict 1 (fraud).
If probability <= threshold → Predict 0 (not fraud).
The ROC curve shows what happens when we change this threshold.



How Threshold Affects the Model
-----------------------------------
1)  Threshold = 0 (Model predicts all 1s)

The model predicts 1 (fraud) for every case, including every non-fraud case.
True Positive Rate (TPR) = 100% (catches all fraud cases).
False Positive Rate (FPR) = 100% (marks every non-fraud case as fraud).
Bad for real-world use (too many false alarms).



2)  Threshold = 1 (Model predicts all 0s)

The model predicts 0 (not fraud) for every case.
True Positive Rate = 0% (misses all fraud cases).
False Positive Rate = 0% (correctly classifies all non-fraud cases).
Useless model because it never catches fraud!



3) Varying the Threshold

As we adjust the threshold, the trade-off between catching fraud and avoiding false alarms changes.
Some thresholds will be better than others, depending on how much risk we can tolerate.

What the ROC Curve Shows
----------------------------
The X-axis = False Positive Rate (FPR): the fraction of legitimate transactions wrongly flagged as fraud.
The Y-axis = True Positive Rate (TPR): the fraction of fraud cases correctly caught (the same as recall).
Each point on the curve corresponds to a different threshold.
--->   A good model's ROC curve lies above the diagonal line (random guessing).
--->   A perfect model's curve goes straight up the left edge to the top-left corner (FPR = 0, TPR = 1) and then across the top.



"""

**ROC AUC**

In [None]:
"""

What is ROC AUC?
------------------
ROC AUC (Receiver Operating Characteristic - Area Under the Curve) is a metric that quantifies how well a classification model distinguishes between two classes (e.g., fraud vs. not fraud).

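AUC (Area Under the Curve) is the area under the whole ROC curve, a number between 0 and 1. The diagonal line (random guessing) has an area of 0.5, so the further the curve bows above the diagonal, the higher the AUC.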

Understanding AUC Scores
-------------------------------
AUC = 1.0 (Perfect Model)
The model gives every fraud case a higher score than every non-fraud case, so the two classes are perfectly separated.
At a suitable threshold, True Positive Rate = 100% and False Positive Rate = 0%.


AUC = 0.5 (Random Guessing)
The model is no better than flipping a coin (50% chance).
The ROC curve is a straight diagonal line (TPR = FPR).


AUC < 0.5 (Worse than Random)
The model tends to score fraud cases lower than non-fraud cases, i.e. its ranking is systematically backwards; flipping its predictions would give an AUC above 0.5.


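AUC = 0.67
If we pick one fraud case and one non-fraud case at random, the model gives the fraud case the higher score about 67% of the time.
Measured against the gap between a random model (0.50) and a perfect one (1.00), it closes only 34% of that gap: (0.67 − 0.50) / (1.00 − 0.50) = 0.34.
It’s not great, but better than guessing.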


"""

In [None]:
"""

Import LogisticRegression.
Instantiate a logistic regression model, logreg.
Fit the model to the training data.
Predict the probabilities of each individual in the test set having a diabetes diagnosis, storing the array of positive probabilities as y_pred_probs.

"""

# Import LogisticRegression
from sklearn.linear_model import LogisticRegression

# Instantiate the model
logreg = LogisticRegression()

# Fit the model
logreg.fit(X_train, y_train)

# Predict probabilities
y_pred_probs = logreg.predict_proba(X_test)[:, 1]

print(y_pred_probs[:10])

In [None]:
"""

Import roc_curve.
Calculate the ROC curve values, using y_test and y_pred_probs, and unpacking the results into fpr, tpr, and thresholds.
Plot true positive rate against false positive rate.

"""


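# Import roc_curve and matplotlib
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve

# Generate ROC curve values: fpr, tpr, thresholds
fpr, tpr, thresholds = roc_curve(y_test, y_pred_probs)

# Dashed diagonal: the ROC curve of a random classifier
plt.plot([0, 1], [0, 1], 'k--')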

# Plot tpr against fpr
plt.plot(fpr, tpr)
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve for Diabetes Prediction')
plt.show()

In [None]:
"""

Now you will compute the area under the ROC curve, along with the other classification metrics you have used previously.

The confusion_matrix and classification_report functions have been preloaded for you, along with the logreg model you previously built, plus X_train, X_test, y_train, y_test.
Also, the model's predicted test set labels are stored as y_pred, and probabilities of test set observations belonging to the positive class stored as y_pred_probs.

A knn model has also been created and the performance metrics printed in the console, so you can compare the roc_auc_score, confusion_matrix, and classification_report between the two models.

"""



"""

Import roc_auc_score.
Calculate and print the ROC AUC score, passing the test labels and the predicted positive class probabilities.
Calculate and print the confusion matrix.
Call classification_report().

"""


# Import roc_auc_score
from sklearn.metrics import roc_auc_score

# Calculate roc_auc_score
print(roc_auc_score(y_test, y_pred_probs))

# Calculate the confusion matrix
print(confusion_matrix(y_test, y_pred))

# Calculate the classification report
print(classification_report(y_test, y_pred))




# 0.8002483443708608
#     [[121  30]
#      [ 30  50]]
#                   precision    recall  f1-score   support

#                0       0.80      0.80      0.80       151
#                1       0.62      0.62      0.62        80

#         accuracy                           0.74       231
#        macro avg       0.71      0.71      0.71       231
#     weighted avg       0.74      0.74      0.74       231

# Logistic regression performs better than the KNN model across all the metrics calculated.
# A ROC AUC score of 0.8002 closes about 60% of the gap between a chance model (0.5) and a perfect one (1.0): (0.8002 - 0.5) / 0.5 ~ 0.60.
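scikit-learn's `confusion_matrix` puts actual classes in rows and predicted classes in columns, so for labels 0 and 1 the matrix printed above reads `[[TN, FP], [FN, TP]]`. This sketch recomputes the per-class precision and recall and the overall accuracy from those four numbers; the results round to the values shown in the classification report.

```python
# Confusion matrix printed above, in scikit-learn's layout for labels [0, 1]:
# rows = actual class, columns = predicted class -> [[TN, FP], [FN, TP]]
cm = [[121, 30],
      [30, 50]]
(tn, fp), (fn, tp) = cm

precision_1 = tp / (tp + fp)   # 50 / 80: of predicted 1s, how many are really 1
recall_1 = tp / (tp + fn)      # 50 / 80: of actual 1s, how many were found
precision_0 = tn / (tn + fn)   # 121 / 151: of predicted 0s, how many are really 0
recall_0 = tn / (tn + fp)      # 121 / 151: of actual 0s, how many were found
accuracy = (tp + tn) / (tn + fp + fn + tp)  # 171 / 231

print(f"class 1: precision={precision_1:.2f} recall={recall_1:.2f}")
print(f"class 0: precision={precision_0:.2f} recall={recall_0:.2f}")
print(f"accuracy={accuracy:.2f}")
```

The "support" column is just the row totals: 121 + 30 = 151 actual 0s and 30 + 50 = 80 actual 1s.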


# Hyperparameter Tuning

In [None]:
"""

Import GridSearchCV.
Set up a parameter grid for "alpha", using np.linspace() to create 20 evenly spaced values ranging from 0.00001 to 1.
Call GridSearchCV(), passing lasso, the parameter grid, and setting cv equal to kf.
Fit the grid search object to the training data to perform a cross-validated grid search.

"""


# Import NumPy and GridSearchCV
import numpy as np
from sklearn.model_selection import GridSearchCV

# lasso (a Lasso model), kf (a KFold splitter), X_train, and y_train are assumed to be defined in earlier cells

# Set up the parameter grid: 20 evenly spaced alpha values from 0.00001 to 1
param_grid = {"alpha": np.linspace(0.00001, 1, 20)}

# Instantiate lasso_cv
lasso_cv = GridSearchCV(lasso, param_grid, cv=kf)

# Fit to the training data
lasso_cv.fit(X_train, y_train)
print("Tuned lasso parameters: {}".format(lasso_cv.best_params_))
print("Tuned lasso score: {}".format(lasso_cv.best_score_))

In [None]:
"""

Create params, adding "l1" and "l2" as penalty values, setting tol to 50 float values between 0.0001 and 1.0, C to a range of 50 float values between 0.1 and 1.0, and class_weight to either "balanced" or a dictionary containing 0:0.8, 1:0.2.
Create the RandomizedSearchCV object, passing the model and the parameters, and setting cv equal to kf.
Fit logreg_cv to the training data.
Print the model's best parameters and accuracy score.

"""



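import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

# The default "lbfgs" solver does not support the "l1" penalty; "liblinear" supports both "l1" and "l2"
logreg = LogisticRegression(solver="liblinear")

# Create the parameter space
params = {"penalty": ["l1", "l2"],
          "tol": np.linspace(0.0001, 1.0, 50),
          "C": np.linspace(0.1, 1, 50),
          "class_weight": ["balanced", {0: 0.8, 1: 0.2}]}

# Instantiate the RandomizedSearchCV object (kf, X_train, y_train come from earlier cells)
logreg_cv = RandomizedSearchCV(logreg, params, cv=kf)

# Fit the data to the model
logreg_cv.fit(X_train, y_train)

# Print the tuned parameters and score
print("Tuned Logistic Regression Parameters: {}".format(logreg_cv.best_params_))
print("Tuned Logistic Regression Best Accuracy Score: {}".format(logreg_cv.best_score_))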