### Core Concept

### Model Evaluation: Evaluation Metrics and Cross-Validation Methods

### Introduction
- Model evaluation is a critical step in the machine learning pipeline: it assesses the performance of a model and helps ensure that the model generalizes to unseen data.
- Choosing the right evaluation metric and cross-validation method depends on the type of problem being addressed.

### Agenda
- Exploring common evaluation metrics
- Cross-validation techniques
- Real-time scenarios for determining best practices in model evaluation

### Evaluation Metrics

### 1. Classification Metrics
- Accuracy
    - Definition: Ratio of correctly predicted instances to the total number of instances.
    - When to use: When the dataset is balanced, that is, each class has roughly the same number of samples. On an imbalanced (biased) dataset, accuracy can look high even when the minority class is predicted poorly.
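To see why accuracy calls for a balanced dataset, here is a minimal sketch on made-up labels (hypothetical data, not the notebook's dataset): 95 negatives, 5 positives, and a "model" that always predicts the majority class.

```python
# Hypothetical imbalanced labels: 95 negatives (0) and 5 positives (1)
y_true_demo = [0] * 95 + [1] * 5

# A "model" that ignores its input and always predicts the majority class
y_pred_demo = [0] * len(y_true_demo)

# Accuracy = correct predictions / total predictions
correct = sum(t == p for t, p in zip(y_true_demo, y_pred_demo))
accuracy_demo = correct / len(y_true_demo)

# Number of actual positives that the model identified
positives_found = sum(t == 1 and p == 1 for t, p in zip(y_true_demo, y_pred_demo))

print(f"Accuracy: {accuracy_demo:.2f}")
print(f"Positives found: {positives_found} of {sum(y_true_demo)}")
```

The accuracy comes out high even though not a single positive is detected, which is the situation precision and recall are meant to expose.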

In [4]:
from sklearn.metrics import accuracy_score
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

In [10]:
# Generating Synthetic Data
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

In [6]:
X

array([[ 0.96479937, -0.06644898,  0.98676805, ..., -1.2101605 ,
        -0.62807677,  1.22727382],
       [-0.91651053, -0.56639459, -1.00861409, ..., -0.98453405,
         0.36389642,  0.20947008],
       [-0.10948373, -0.43277388, -0.4576493 , ..., -0.2463834 ,
        -1.05814521, -0.29737608],
       ...,
       [ 1.67463306,  1.75493307,  1.58615382, ...,  0.69272276,
        -1.50384972,  0.22526412],
       [-0.77860873, -0.83568901, -0.19484228, ..., -0.49735437,
         2.47213818,  0.86718741],
       [ 0.24845351, -1.0034389 ,  0.36046013, ...,  0.77323999,
         0.1857344 ,  1.41641179]])

In [7]:
y

array([0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0,
       1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1,
       0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0,
       0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1,
       0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0,
       0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0,
       0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1,
       0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0,
       0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0,
       0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1,
       0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1,
       0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0,
       0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1,
       1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1,

In [9]:
# Dividing dataset into Training set (X_train, y_train) and Testing set (X_test, y_test)
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size = 0.2, random_state=42)

In [11]:
# Model training 
# Step-1: Create Classifier Instance
model = RandomForestClassifier(random_state=42)

# Step 2: Fit the model using X_train, y_train
model.fit(X_train, y_train)

#Step 3: Predict the output of the model using X_test and store into y_pred
y_pred = model.predict(X_test)

In [13]:
# Calculate Accuracy using Accuracy Metrics
accuracy = accuracy_score(y_test, y_pred)
#accuracy
print(f"Accuracy: {accuracy:.2f}")

Accuracy: 0.88


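In [None]:
# Terminology used by the metrics below
# Actual Positives: samples labelled positive in y_test
# Predicted Positives: samples the model predicts as positive
# True Positives (TP): predicted positive and positive in y_test
# False Positives (FP): predicted positive but negative in y_test
# True Negatives (TN): predicted negative and negative in y_test
# False Negatives (FN): predicted negative but positive in y_test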
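The four counts can be tallied directly from a pair of label lists. The sketch below is illustrative only: `confusion_counts` is a hypothetical helper and the labels are made up, not taken from `y_test` or `y_pred`.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, TN, FN) for binary labels."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1  # predicted positive, actually positive
        elif predicted == positive:
            fp += 1  # predicted positive, actually negative
        elif actual == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return tp, fp, tn, fn

# Hypothetical labels for illustration only
actual_labels = [1, 0, 1, 1, 0, 0, 1, 0]
predicted_labels = [1, 0, 0, 1, 1, 0, 1, 0]

tp, fp, tn, fn = confusion_counts(actual_labels, predicted_labels)
print(f"TP={tp}, FP={fp}, TN={tn}, FN={fn}")
```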

### Precision -> TP / (TP + FP)
- Definition: Proportion of true positives among all predicted positives

In [16]:
from sklearn.metrics import precision_score
#Calculate Precision Score
precision = precision_score(y_test, y_pred)
#precision
print(f"Precision: {precision:.2f}")

Precision: 0.91


### Recall -> TP / (TP + FN)
- Definition: Proportion of actual positives that the model correctly identifies as positive

In [18]:
from sklearn.metrics import recall_score
# Calculate Recall Score 
recall = recall_score(y_test, y_pred)
#recall
print(f"Recall: {recall:.2f}")

Recall: 0.86


### F1-Score -> 2 * (Precision * Recall) / (Precision + Recall)
- Definition: Harmonic mean of precision and recall
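A short sketch of why the harmonic mean is used rather than a plain average. The precision and recall values are made up, and `f1_from` is a hypothetical helper name.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical scores: every predicted positive is right, but most positives are missed
p_demo, r_demo = 1.0, 0.1

arithmetic_mean = (p_demo + r_demo) / 2
harmonic_mean = f1_from(p_demo, r_demo)

print(f"Arithmetic mean: {arithmetic_mean:.2f}")
print(f"F1 (harmonic mean): {harmonic_mean:.2f}")
```

The harmonic mean stays close to the weaker of the two values, so a model cannot reach a high F1 by excelling at only one of precision or recall.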

In [20]:
from sklearn.metrics import f1_score
f1 = f1_score(y_test, y_pred)
#f1
print(f"f1: {f1:.2f}")

f1: 0.89


### Confusion Matrix

![image.png](attachment:8fbe8b7b-8f77-4239-96a0-eb2f01fd6831.png)

Accuracy -> Ratio of correctly predicted instances to the total number of instances

Accuracy = (TP + TN) / (TP + FP + TN + FN)

Precision -> Proportion of true positives among all predicted positives

Precision = TP / (TP + FP)

Recall -> Proportion of true positives captured among all actual positives

Recall = TP / (TP + FN)

F1 Score -> Harmonic mean of precision and recall

F1 Score = 2 * (Precision * Recall) / (Precision + Recall)
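As a worked example, the four metrics can be computed from a single set of confusion-matrix counts. The counts below are hypothetical and chosen only to make the arithmetic easy to follow. Note that recall divides by TP + FN (all actual positives), while precision divides by TP + FP (all predicted positives).

```python
# Hypothetical confusion-matrix counts (not the notebook's results)
TP, FP, TN, FN = 40, 10, 45, 5

accuracy_cm = (TP + TN) / (TP + FP + TN + FN)   # correct / all
precision_cm = TP / (TP + FP)                   # correct positives / predicted positives
recall_cm = TP / (TP + FN)                      # correct positives / actual positives
f1_cm = 2 * (precision_cm * recall_cm) / (precision_cm + recall_cm)

print(f"Accuracy:  {accuracy_cm:.2f}")
print(f"Precision: {precision_cm:.2f}")
print(f"Recall:    {recall_cm:.2f}")
print(f"F1 Score:  {f1_cm:.2f}")
```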

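### ROC-AUC
- Definition: Area under the ROC curve, which plots the True Positive Rate (TPR) against the False Positive Rate (FPR) across classification thresholds. It summarizes the trade-off between the two in a single number.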
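One way to build intuition for this number: the area under the ROC curve equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes it that way on a tiny made-up example. `roc_auc_pairwise` is a hypothetical helper written for illustration, not a scikit-learn function.

```python
def roc_auc_pairwise(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct ranking.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical labels and predicted probabilities for the positive class
labels_demo = [0, 0, 1, 1]
scores_demo = [0.1, 0.4, 0.35, 0.8]

auc_demo = roc_auc_pairwise(labels_demo, scores_demo)
print(f"AUC: {auc_demo:.2f}")
```

Three of the four positive/negative pairs are ordered correctly here, so the result is 0.75. The pairwise loop is quadratic and only meant for small examples.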

In [21]:
from sklearn.metrics import roc_auc_score

In [25]:
# Probability prediction
y_prob = model.predict_proba(X_test)[:, 1]
roc_auc = roc_auc_score(y_test, y_prob)
print(f"ROC_AUC: {roc_auc:.2f}")

ROC_AUC: 0.95


### 2. Regression Metrics

### Mean Absolute Error (MAE) 
- Measures the average magnitude of errors without considering their direction.
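A minimal sketch of what "without considering their direction" means, using made-up values: signed errors can cancel each other out, while absolute errors cannot.

```python
# Hypothetical targets and predictions for illustration only
actual_values = [3.0, 5.0, 2.0, 7.0]
predicted_values = [2.0, 6.0, 3.0, 6.0]

errors = [p - a for a, p in zip(actual_values, predicted_values)]

# Mean of signed errors: over- and under-predictions cancel out
mean_error = sum(errors) / len(errors)

# Mean of absolute errors: direction is ignored, magnitudes add up
mae_demo = sum(abs(e) for e in errors) / len(errors)

print(f"Mean signed error: {mean_error:.2f}")
print(f"MAE: {mae_demo:.2f}")
```

Every prediction is off by one unit, yet the signed mean is zero. MAE reports the typical size of the miss instead.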


In [28]:
from sklearn.metrics import mean_absolute_error
from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression

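In [29]:
# Synthetic regression data with a continuous target.
# make_regression defaults to noise=0.0, so y is an exact linear function of X
# and a linear model can fit it perfectly.
X, y = make_regression(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [31]:
# Model Training
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

In [32]:
mae = mean_absolute_error(y_test, y_pred)

In [34]:
#mae
print(f"Mean Absolute Error: {mae:.2f}")

Mean Absolute Error: 0.00


### Mean Squared Error (MSE) and Root Mean Squared Error (RMSE)
- MSE is the average of the squared errors, so large errors are penalized more heavily. RMSE is its square root, expressed in the same units as the target.

In [35]:
from sklearn.metrics import mean_squared_error
import numpy as np

# MSE and RMSE
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)

print(f"Mean Squared Error: {mse:.2f}")
print(f"Root Mean Squared Error: {rmse:.2f}")

Mean Squared Error: 0.00
Root Mean Squared Error: 0.00


### R-squared (R2)
- Proportion of the variance in the target that the model explains. A value of 1.0 is a perfect fit.

In [36]:
from sklearn.metrics import r2_score
r2 = r2_score(y_test, y_pred)
print(f"r2: {r2:.2f}")

r2: 1.00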
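The regression metrics above can also be computed by hand, which makes their differences easier to see. The values below are made up so that a single large error dominates. All variable names are illustrative.

```python
import math

# Hypothetical targets and predictions: three exact hits and one large miss
y_actual = [2.0, 4.0, 6.0, 8.0]
y_hat = [2.0, 4.0, 6.0, 12.0]

n = len(y_actual)
residuals = [a - p for a, p in zip(y_actual, y_hat)]

mae_hand = sum(abs(r) for r in residuals) / n       # average absolute error
mse_hand = sum(r ** 2 for r in residuals) / n       # average squared error
rmse_hand = math.sqrt(mse_hand)                     # back in the target's units

# R2 = 1 - (residual sum of squares / total sum of squares around the mean)
mean_actual = sum(y_actual) / n
ss_res = sum(r ** 2 for r in residuals)
ss_tot = sum((a - mean_actual) ** 2 for a in y_actual)
r2_hand = 1 - ss_res / ss_tot

print(f"MAE:  {mae_hand:.2f}")
print(f"MSE:  {mse_hand:.2f}")
print(f"RMSE: {rmse_hand:.2f}")
print(f"R2:   {r2_hand:.2f}")
```

With one miss of 4 units, MAE is 1.00 while RMSE is 2.00: squaring the errors makes MSE and RMSE more sensitive to outliers than MAE.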


### Cross-Validation Methods

### K-Fold Cross-Validation

In [37]:
from sklearn.metrics import accuracy_score
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic Data 
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size = 0.2, random_state=42)

# Model training 
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Calculate Accuracy using Accuracy Metrics
accuracy = accuracy_score(y_test, y_pred)
#accuracy
print(f"Accuracy: {accuracy:.2f}")

Accuracy: 0.88


In [38]:
from sklearn.model_selection import cross_val_score

# K-Fold Cross-Validation
scores = cross_val_score(model, X, y, cv=5, scoring = 'accuracy')
print (f"K-Fold Accuracy Scores: {scores}")
print (f"Mean Accuracy: {scores.mean():.2f}")

K-Fold Accuracy Scores: [0.915 0.895 0.915 0.885 0.91 ]
Mean Accuracy: 0.90
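To show what K-Fold does under the hood, here is a minimal sketch that splits sample indices into k contiguous folds, with each fold serving once as the test set. `k_fold_indices` is a hypothetical helper written for illustration. It does not shuffle or stratify, both of which scikit-learn's own splitters support.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous, unshuffled folds."""
    # Spread any remainder over the first folds so fold sizes differ by at most 1
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, test_idx
        start = stop

# 10 samples, 5 folds: every sample appears in exactly one test fold
for fold, (train_idx, test_idx) in enumerate(k_fold_indices(10, 5), start=1):
    print(f"Fold {fold}: test={test_idx}, train size={len(train_idx)}")
```

Training on each `train_idx` and scoring on the matching `test_idx` gives k scores, whose mean is what `scores.mean()` reports above. When `cv` is an integer and the estimator is a classifier, `cross_val_score` uses stratified folds, so its splits will differ from this simplified version.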
