# <a id='toc1_'></a>[Model evaluation](#toc0_)

**Table of contents**<a id='toc0_'></a>    
- [Model evaluation](#toc1_)    
  - [Regression metrics](#toc1_1_)    
    - [(Root) Mean Squared Error](#toc1_1_1_)    
    - [Mean Absolute Error](#toc1_1_2_)    
    - [Mean Absolute Percentage Error (MAPE)](#toc1_1_3_)    
    - [R2 score](#toc1_1_4_)    
  - [Classification metrics](#toc1_2_)    
    - [Accuracy score](#toc1_2_1_)    
    - [Confusion matrix](#toc1_2_2_)    
    - [Precision / Positive Predictive Value](#toc1_2_3_)    
    - [Recall / Sensitivity](#toc1_2_4_)    
    - [F1 score, F-beta score](#toc1_2_5_)    
    - [ROC AUC score](#toc1_2_6_)    
    - [PR AUC score](#toc1_2_7_)    

<!-- vscode-jupyter-toc-config
	numbering=false
	anchor=true
	flat=false
	minLevel=1
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

In [51]:
from sklearn.datasets import fetch_california_housing, load_breast_cancer
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error, mean_absolute_percentage_error, root_mean_squared_error
from sklearn.metrics import accuracy_score, roc_auc_score, precision_score, recall_score, average_precision_score, f1_score, fbeta_score
from sklearn.metrics import confusion_matrix, roc_curve, precision_recall_curve, RocCurveDisplay, PrecisionRecallDisplay

## <a id='toc1_1_'></a>[Regression metrics](#toc0_)

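- **MSE / RMSE (Mean Squared Error / Root Mean Squared Error)**: Squares each error, so large deviations are penalized disproportionately; a good fit when big misses are particularly bad (e.g., stock price prediction). RMSE is the square root of MSE and is expressed in the target's units.
- **MAE (Mean Absolute Error)**: The average absolute error, in the target's units. More interpretable, but it penalizes large errors only linearly.
- **MAPE (Mean Absolute Percentage Error)**: Useful when errors should be interpreted relative to the magnitude of the true values; unstable when true values are close to zero.
- **R² Score**: The fraction of the target's variance that the model explains; 1 is a perfect fit and 0 matches always predicting the mean.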

In [None]:
california = fetch_california_housing()
print(california["DESCR"])

In [None]:
df_cali = pd.DataFrame(california["data"], columns = california["feature_names"])
df_cali["median_house_value"] = california["target"]

df_cali.head()

In [22]:
features = df_cali.drop(columns = ["median_house_value","AveOccup", "Population", "AveBedrms"])
target = df_cali["median_house_value"]

In [23]:
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size = 0.20, random_state=0)

In [24]:
## Training a Model
regressor = RandomForestRegressor(n_estimators=100, random_state=42)
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)

### <a id='toc1_1_1_'></a>[(Root) Mean Squared Error](#toc0_)

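MSE averages the squared errors, so a few large errors raise it far more than many small ones; use it when large errors are particularly undesirable. RMSE is its square root, which puts the metric back in the target's units.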
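
To make the penalty concrete, here is a minimal sketch on made-up numbers (the arrays and the `toy_mse` / `toy_mae` helpers are illustrative assumptions, not part of the housing data). Both prediction sets have the same mean absolute error, but the one that concentrates its error in a single large miss has a four times larger MSE.

```python
import numpy as np

# Hypothetical toy values, chosen only to illustrate the penalty
y_toy = np.array([2.0, 3.0, 4.0, 5.0])
pred_spread = np.array([2.5, 2.5, 4.5, 4.5])    # four errors of 0.5
pred_one_miss = np.array([2.0, 3.0, 4.0, 7.0])  # a single error of 2.0

def toy_mse(y_true, y_pred):
    # Mean of the squared residuals
    return float(np.mean((y_true - y_pred) ** 2))

def toy_mae(y_true, y_pred):
    # Mean of the absolute residuals
    return float(np.mean(np.abs(y_true - y_pred)))

mae_spread = toy_mae(y_toy, pred_spread)
mae_one_miss = toy_mae(y_toy, pred_one_miss)
mse_spread = toy_mse(y_toy, pred_spread)
mse_one_miss = toy_mse(y_toy, pred_one_miss)
rmse_one_miss = float(np.sqrt(mse_one_miss))  # back in the target's units

print(f"MAE: {mae_spread:.2f} vs {mae_one_miss:.2f}")
print(f"MSE: {mse_spread:.2f} vs {mse_one_miss:.2f}")
```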

In [None]:
mse = mean_squared_error(y_test, y_pred)
print(f"MSE: {mse:.4f}")

In [None]:
rmse = root_mean_squared_error(y_test, y_pred)
print(f"RMSE: {rmse:.4f}")

### <a id='toc1_1_2_'></a>[Mean Absolute Error](#toc0_)

MAE is the average absolute error, expressed in the target's units, which makes it easy to interpret. It weights every error linearly, so large errors are penalized less than under MSE.
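
Because MAE shares the target's units, it converts directly into a dollar figure. In scikit-learn's California housing data the target is expressed in units of $100,000, so the sketch below scales a toy MAE accordingly. The arrays and the `UNIT_DOLLARS` name are illustrative assumptions, not results from the model above.

```python
import numpy as np

UNIT_DOLLARS = 100_000  # the dataset's target is in units of $100,000

# Hypothetical true and predicted median house values, in target units
y_true_toy = np.array([1.5, 2.0, 3.0])
y_pred_toy = np.array([1.7, 1.9, 3.3])

toy_mae = float(np.mean(np.abs(y_true_toy - y_pred_toy)))
toy_mae_dollars = toy_mae * UNIT_DOLLARS

print(f"MAE: {toy_mae:.2f} target units, about ${toy_mae_dollars:,.0f}")
```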

In [None]:
mae = mean_absolute_error(y_test, y_pred)
print(f"MAE: {mae:.4f}")

### <a id='toc1_1_3_'></a>[Mean Absolute Percentage Error (MAPE)](#toc0_)

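MAPE expresses each error as a fraction of the true value, which is useful when errors need to be evaluated in relative terms. scikit-learn returns it as a fraction rather than a percentage (0.10 means 10%), and it becomes very large when true values are close to zero.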
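
The sensitivity to small true values is easy to see on toy numbers. The sketch below uses a hypothetical `toy_mape` helper that follows the textbook definition; it does not guard against true values of exactly zero.

```python
import numpy as np

def toy_mape(y_true, y_pred):
    # Mean of |error| / |true value|, returned as a fraction, not a percentage
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred) / np.abs(y_true)))

# Every prediction is off by 10% of its true value
mape_regular = toy_mape([100.0, 200.0, 50.0], [110.0, 180.0, 45.0])

# The last point has an absolute error of 10, but its true value is near zero
mape_near_zero = toy_mape([100.0, 200.0, 0.5], [110.0, 180.0, 10.5])

print(f"Regular targets:   {mape_regular:.4f}")
print(f"Near-zero target:  {mape_near_zero:.4f}")
```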

In [None]:
mape = mean_absolute_percentage_error(y_test, y_pred)
print(f"MAPE: {mape:.4f}")

### <a id='toc1_1_4_'></a>[R2 score](#toc0_)

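R2 indicates what fraction of the target's variance the model explains: 1 is a perfect fit, 0 is no better than always predicting the mean, and negative values are worse than that.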
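
A minimal sketch of the usual definition, R2 = 1 - SS_res / SS_tot, on made-up values (the `toy_r2` helper and the arrays are illustrative assumptions). It shows the three reference points: a perfect fit, a constant mean prediction, and a prediction worse than the mean.

```python
import numpy as np

def toy_r2(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
    return float(1.0 - ss_res / ss_tot)

y_toy = [1.0, 2.0, 3.0, 4.0]

r2_perfect = toy_r2(y_toy, y_toy)                    # predictions equal the truth
r2_mean = toy_r2(y_toy, [2.5, 2.5, 2.5, 2.5])         # always predict the mean
r2_reversed = toy_r2(y_toy, [4.0, 3.0, 2.0, 1.0])     # worse than the mean

print(r2_perfect, r2_mean, r2_reversed)
```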

In [None]:
r2 = r2_score(y_test, y_pred)
print(f"R2 Score: {r2:.4f}")

## <a id='toc1_2_'></a>[Classification metrics](#toc0_)

- **Accuracy**: Good for balanced datasets but misleading for imbalanced ones.
- **Precision**: Prioritize when false positives are costly (e.g., spam detection).
- **Recall (Sensitivity)**: Important when false negatives are costly (e.g., cancer diagnosis).
- **F1 Score**: The harmonic mean of precision and recall, weighting both equally.
- **F-beta Score**: A weighted harmonic mean of precision and recall; `beta` sets how many times more important recall is than precision (`beta` > 1 favors recall, `beta` < 1 favors precision).
- **ROC-AUC**: Summarizes how well the model ranks positives above negatives across all classification thresholds.
- **PR-AUC**: Summarizes the precision-recall trade-off across all thresholds; more informative than ROC-AUC when the positive class is rare.

In [35]:
cancer = load_breast_cancer()

In [36]:
# Extract dataset into pandas
features = pd.DataFrame(cancer['data'], columns = cancer['feature_names'])
labels = pd.Series(cancer['target'], name = 'labels')

In [None]:
# Display features & labels
display(features)
display(labels)

In [38]:
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(features, labels, random_state=2)

In [39]:
## Training a Model
classifier = RandomForestClassifier(n_estimators=100, random_state=42)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
y_prob = classifier.predict_proba(X_test)[:, 1]  # Predicted probability of class 1, used for ROC-AUC and PR-AUC

### <a id='toc1_2_1_'></a>[Accuracy score](#toc0_)

Accuracy is useful when the classes are balanced but can be misleading for imbalanced datasets.
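
A small sketch of why accuracy misleads on imbalanced data, using hypothetical labels (95 negatives, 5 positives) and a "model" that always predicts the majority class. All names and values here are illustrative assumptions.

```python
import numpy as np

# Hypothetical imbalanced labels: 95 negatives, 5 positives
y_imbalanced = np.array([0] * 95 + [1] * 5)

# A trivial "model" that always predicts the majority class
y_always_negative = np.zeros_like(y_imbalanced)

toy_accuracy = float(np.mean(y_imbalanced == y_always_negative))

# Share of actual positives that were found (recall)
positives_found = int(np.sum((y_imbalanced == 1) & (y_always_negative == 1)))
toy_recall = positives_found / int(np.sum(y_imbalanced == 1))

print(f"Accuracy: {toy_accuracy:.2f}, recall: {toy_recall:.2f}")
```

The accuracy looks strong even though not a single positive case is detected.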

In [None]:
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f}")

### <a id='toc1_2_2_'></a>[Confusion matrix](#toc0_)

In [None]:
conf_matrix = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(6,4))
# In this dataset label 0 is malignant and label 1 is benign, so "Positive" here means benign
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Negative', 'Positive'], yticklabels=['Negative', 'Positive'])
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.show()

### <a id='toc1_2_3_'></a>[Precision / Positive Predictive Value](#toc0_)

Precision is important in cases where false positives are costly, such as spam detection.
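
Precision comes straight from the confusion-matrix counts: TP / (TP + FP). A minimal sketch on made-up labels (the arrays and variable names are illustrative assumptions), with recall, TP / (TP + FN), computed alongside for contrast:

```python
import numpy as np

# Hypothetical labels and predictions (1 = positive class)
y_true_toy = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred_toy = np.array([1, 1, 1, 0, 1, 1, 0, 0, 0, 0])

tp = int(np.sum((y_true_toy == 1) & (y_pred_toy == 1)))  # true positives
fp = int(np.sum((y_true_toy == 0) & (y_pred_toy == 1)))  # false positives
fn = int(np.sum((y_true_toy == 1) & (y_pred_toy == 0)))  # false negatives

toy_precision = tp / (tp + fp)  # of the predicted positives, how many are correct
toy_recall = tp / (tp + fn)     # of the actual positives, how many were found

print(f"TP={tp}, FP={fp}, FN={fn}")
print(f"Precision: {toy_precision:.2f}, recall: {toy_recall:.2f}")
```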

In [None]:
precision = precision_score(y_test, y_pred)
print(f"Precision: {precision:.4f}")

### <a id='toc1_2_4_'></a>[Recall / Sensitivity](#toc0_)

Recall is critical when false negatives are costly, such as in medical diagnoses.
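
When a classifier outputs probabilities, recall can usually be raised by lowering the decision threshold, typically at some cost in precision. The sketch below illustrates this on made-up scores; the arrays, the `precision_recall_at` helper, and the two thresholds are illustrative assumptions.

```python
import numpy as np

# Hypothetical labels and predicted probabilities of the positive class
y_true_toy = np.array([0, 0, 1, 0, 1, 1])
scores_toy = np.array([0.1, 0.45, 0.4, 0.2, 0.7, 0.9])

def precision_recall_at(threshold):
    # Predict positive when the score reaches the threshold.
    # Assumes at least one predicted and one actual positive (no zero division guard).
    y_hat = (scores_toy >= threshold).astype(int)
    tp = np.sum((y_true_toy == 1) & (y_hat == 1))
    fp = np.sum((y_true_toy == 0) & (y_hat == 1))
    fn = np.sum((y_true_toy == 1) & (y_hat == 0))
    return float(tp / (tp + fp)), float(tp / (tp + fn))

p_default, r_default = precision_recall_at(0.5)
p_low, r_low = precision_recall_at(0.35)

print(f"threshold 0.50: precision={p_default:.2f}, recall={r_default:.2f}")
print(f"threshold 0.35: precision={p_low:.2f}, recall={r_low:.2f}")
```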

In [None]:
recall = recall_score(y_test, y_pred)
print(f"Recall: {recall:.4f}")

### <a id='toc1_2_5_'></a>[F1 score, F-beta score](#toc0_)

F1 Score balances precision and recall, making it useful when both are important.

In [None]:
f1 = f1_score(y_test, y_pred)
print(f"F1 Score: {f1:.4f}")

F-beta score is useful when you need to balance precision and recall but want to weigh one more than the other: `beta` > 1 favors recall, `beta` < 1 favors precision, and `beta` = 1 gives the F1 score.
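
A minimal sketch of the standard formula, F_beta = (1 + beta²) · P · R / (beta² · P + R), evaluated for a hypothetical precision of 0.8 and recall of 0.5 (the `toy_fbeta` helper and these values are illustrative assumptions). Since precision is the higher of the two here, the score rises as `beta` shifts weight toward precision.

```python
def toy_fbeta(precision, recall, beta):
    # Weighted harmonic mean of precision and recall
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

p_toy, r_toy = 0.8, 0.5  # hypothetical precision and recall

f_half = toy_fbeta(p_toy, r_toy, beta=0.5)  # favors precision
f_one = toy_fbeta(p_toy, r_toy, beta=1.0)   # F1: equal weight
f_two = toy_fbeta(p_toy, r_toy, beta=2.0)   # favors recall

print(f"F0.5={f_half:.3f}, F1={f_one:.3f}, F2={f_two:.3f}")
```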

In [None]:
# beta < 1 weights precision more heavily than recall
f_beta = fbeta_score(y_test, y_pred, beta=0.6)
print(f"F-beta Score: {f_beta:.4f}")

### <a id='toc1_2_6_'></a>[ROC AUC score](#toc0_)

ROC-AUC is useful for assessing overall model performance across different classification thresholds.
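
One way to read ROC-AUC is as the probability that a randomly chosen positive receives a higher score than a randomly chosen negative. The sketch below computes it that way by comparing every positive-negative pair on made-up scores; the arrays and variable names are illustrative assumptions.

```python
import numpy as np

# Hypothetical labels and scores
y_true_toy = np.array([0, 0, 1, 1])
scores_toy = np.array([0.1, 0.4, 0.35, 0.8])

pos_scores = scores_toy[y_true_toy == 1]
neg_scores = scores_toy[y_true_toy == 0]

# Score difference for every (positive, negative) pair
diff = pos_scores[:, None] - neg_scores[None, :]

# A pair counts 1 if the positive ranks higher, 0.5 on a tie, 0 otherwise
toy_auc = float(np.mean((diff > 0) + 0.5 * (diff == 0)))

print(f"Pairwise ROC-AUC: {toy_auc:.2f}")
```

Three of the four pairs are ranked correctly, so the score does not depend on any single threshold.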

In [None]:
RocCurveDisplay.from_predictions(y_test, y_prob)

In [None]:
roc_auc = roc_auc_score(y_test, y_prob)
print(f"ROC-AUC Score: {roc_auc:.4f}")

### <a id='toc1_2_7_'></a>[PR AUC score](#toc0_)

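PR-AUC is useful for assessing overall model performance across different classification thresholds for imbalanced data. It is estimated here with average precision, which summarizes the precision-recall curve as a mean of the precision at each threshold, weighted by the increase in recall.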
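
A minimal sketch of average precision computed by hand, AP = Σ (Rₙ − Rₙ₋₁) · Pₙ, on made-up scores. The arrays and variable names are illustrative assumptions, and the sketch assumes no tied scores.

```python
import numpy as np

# Hypothetical labels and scores (no ties)
y_true_toy = np.array([0, 0, 1, 1])
scores_toy = np.array([0.1, 0.4, 0.35, 0.8])

# Rank samples from highest to lowest score
order = np.argsort(-scores_toy)
y_sorted = y_true_toy[order]

# Precision and recall after including each successive sample
tp_cum = np.cumsum(y_sorted)
precision_at_k = tp_cum / np.arange(1, len(y_sorted) + 1)
recall_at_k = tp_cum / y_sorted.sum()

# Weight each precision by the recall gained at that step
recall_prev = np.concatenate(([0.0], recall_at_k[:-1]))
toy_ap = float(np.sum((recall_at_k - recall_prev) * precision_at_k))

print(f"Average precision: {toy_ap:.4f}")
```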

In [None]:
pr_auc = average_precision_score(y_test, y_prob)
print(f"PR-AUC Score: {pr_auc:.4f}")

In [None]:
PrecisionRecallDisplay.from_predictions(y_test, y_prob)