# ML Model Evaluation for Classification

Model evaluation in classification helps us understand how well a model predicts categories or classes (e.g., spam vs. not spam, disease vs. no disease).

```python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.datasets import load_breast_cancer
```

Let's use a real dataset: the Breast Cancer Wisconsin (Diagnostic) dataset that ships with scikit-learn (569 samples, 30 numeric features).

```python
data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target  # 0 = Malignant, 1 = Benign

print(df.head())  # Display first few rows
```

```
   mean radius  mean texture  mean perimeter  mean area  mean smoothness  \
0        17.99         10.38          122.80     1001.0          0.11840
1        20.57         17.77          132.90     1326.0          0.08474
2        19.69         21.25          130.00     1203.0          0.10960
3        11.42         20.38           77.58      386.1          0.14250
4        20.29         14.34          135.10     1297.0          0.10030

   mean compactness  mean concavity  mean concave points  mean symmetry  \
0           0.27760          0.3001              0.14710         0.2419
1           0.07864          0.0869              0.07017         0.1812
2           0.15990          0.1974              0.12790         0.2069
3           0.28390          0.2414              0.10520         0.2597
4           0.13280          0.1980              0.10430         0.1809

[remaining columns truncated]
```
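Before judging accuracy, it helps to know how balanced the classes are. The sketch below (an addition, not part of the original notebook) counts each class. The dataset has 357 benign and 212 malignant samples, so always guessing "benign" would already be right about 63% of the time.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target  # 0 = malignant, 1 = benign

# 30 feature columns plus the target column
print(df.shape)

# Replace numeric labels with their names, then count each class
label_names = dict(enumerate(data.target_names))
counts = df['target'].map(label_names).value_counts()
print(counts)

# Share of the most frequent class = accuracy of always guessing it
print(f"Majority-class share: {counts.max() / counts.sum():.2f}")
```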

### Split Data into Train & Test Sets

Splitting data ensures the model learns from one part (train) and is tested on unseen data (test).

```python
X = df.drop(columns=['target'])  # Features
y = df['target']  # Target variable (classification labels)

# Split into 80% training and 20% testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```
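To check what the split produced, the sketch below prints the set sizes and the share of benign samples in each set. The `stratify=y` variant is an optional change that is not used elsewhere in this notebook; it asks `train_test_split` to keep the class ratio the same in both sets.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target, name='target')  # 0 = malignant, 1 = benign

# Same 80/20 split as above: 569 rows -> 455 train, 114 test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Optional variant: stratify=y keeps the malignant/benign ratio equal in both sets
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(len(X_train), len(X_test))
# The mean of a 0/1 label column is the share of benign samples
print(f"Benign share, unstratified: train={y_train.mean():.3f}, test={y_test.mean():.3f}")
print(f"Benign share, stratified:   train={ys_train.mean():.3f}, test={ys_test.mean():.3f}")
```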


### Train a Classification Model

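We use a Random Forest because it captures non-linear relationships and, by averaging many decision trees trained on bootstrap samples, is less prone to overfitting than a single decision tree. It reduces overfitting; it does not rule it out.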

```python
# Train a Random Forest Classifier
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
```
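One way to see whether the forest overfits is to compare training accuracy with an estimate on data the trees did not see. The sketch below refits the same model with `oob_score=True` (an added option, not used in the original cell), which scores each training sample using only the trees whose bootstrap sample excluded it. A large gap between training accuracy and the OOB or test accuracy would point to overfitting.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# oob_score=True: each training sample is scored only by trees
# that did not see it in their bootstrap sample
model = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=42)
model.fit(X_train, y_train)

train_acc = model.score(X_train, y_train)  # usually near 1.0 for fully grown trees
test_acc = model.score(X_test, y_test)

print(f"Train accuracy: {train_acc:.3f}")
print(f"OOB accuracy:   {model.oob_score_:.3f}")
print(f"Test accuracy:  {test_acc:.3f}")
```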


### Make Predictions

The model uses learned patterns to classify new, unseen data.

```python
y_pred = model.predict(X_test)  # Predictions on test data
```
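`predict` returns hard labels. For a random forest these come from the class probabilities averaged over all trees, which `predict_proba` exposes. The sketch below, assuming the same split and model settings as above, shows that picking the most probable class reproduces `predict`.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)

y_pred = model.predict(X_test)

# Columns of predict_proba follow the order of model.classes_ (here [0, 1])
proba = model.predict_proba(X_test)
print(model.classes_)
print(proba[:5].round(2))  # per-class probabilities for the first 5 test rows

# predict() picks the class with the highest averaged probability
manual_pred = model.classes_[np.argmax(proba, axis=1)]
print((manual_pred == y_pred).all())
```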

### Evaluate Model Performance

**1. Accuracy Score**

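**📌 What it means:** the fraction of predictions that were correct, i.e. (correct predictions) / (total predictions).

- High accuracy (close to 1) → good model, provided it clearly beats the baseline of always predicting the most frequent class.
- Low accuracy (near that baseline or below) → needs improvement. The 0.5 reference point only applies to balanced binary problems; in this dataset about 63% of samples are benign, so a model that always predicts "benign" already scores about 0.63.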

```python
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')
```

```
Accuracy: 0.96
```
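Accuracy is the fraction of test labels the model got right. The sketch below computes it by hand and compares it with a majority-class baseline built with scikit-learn's `DummyClassifier` (an addition to the original workflow), which gives a concrete reference point instead of a fixed 0.5.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Accuracy = fraction of predictions that match the true labels
manual_acc = np.mean(y_pred == y_test)
print(f"Manual accuracy:   {manual_acc:.2f}")
print(f"accuracy_score:    {accuracy_score(y_test, y_pred):.2f}")

# Baseline: always predict the most frequent class seen in training
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
baseline_acc = baseline.score(X_test, y_test)
print(f"Majority baseline: {baseline_acc:.2f}")
```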


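**2. Confusion Matrix**

**📌 What it means:** a table of correct and incorrect classifications. Rows are the actual classes and columns the predicted classes, both in label order (0 = malignant, 1 = benign). The diagonal holds the correct predictions; in the result below, 3 malignant tumours were predicted as benign and 1 benign tumour was predicted as malignant.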

```python
cm = confusion_matrix(y_test, y_pred)
print('Confusion Matrix:\n', cm)
```

```
Confusion Matrix:
 [[40  3]
 [ 1 70]]
```
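For a binary problem, the four cells of the matrix can be unpacked with `ravel()`, which reads the 2×2 array row by row. With label 1 (benign) as the positive class, which is scikit-learn's default `pos_label` for binary metrics, the matrix printed above corresponds to TN = 40, FP = 3, FN = 1, TP = 70. The sketch repeats the earlier steps so it runs on its own.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)
y_pred = model.predict(X_test)

cm = confusion_matrix(y_test, y_pred)  # rows = actual, columns = predicted

# Row-by-row flattening of [[TN, FP], [FN, TP]] with 1 (benign) as positive
tn, fp, fn, tp = cm.ravel()
print(f"TN={tn}  FP={fp}  FN={fn}  TP={tp}")

# Accuracy recovered from the four cells: (TP + TN) / total
print(f"Accuracy from matrix: {(tp + tn) / cm.sum():.2f}")

# Off-diagonal cells are the errors
print(f"Malignant predicted as benign: {cm[0, 1]}")
print(f"Benign predicted as malignant: {cm[1, 0]}")
```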


**3. Classification Report**

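**📌 What it means:** precision, recall, and F1-score for each class. For a given class, precision is the share of predictions of that class that were correct, recall is the share of actual members of that class that the model found, and F1 is their harmonic mean. `support` is the number of test samples in each class; `macro avg` is the unweighted mean over classes and `weighted avg` weights each class by its support.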

```python
print(classification_report(y_test, y_pred))
```

```
              precision    recall  f1-score   support

           0       0.98      0.93      0.95        43
           1       0.96      0.99      0.97        71

    accuracy                           0.96       114
   macro avg       0.97      0.96      0.96       114
weighted avg       0.97      0.96      0.96       114
```
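The report's first row treats malignant (class 0) as the positive class. The sketch below rebuilds those numbers from the confusion matrix using precision = TP / (TP + FP), recall = TP / (TP + FN) and F1 = 2 · precision · recall / (precision + recall), then checks them against scikit-learn's metric functions with `pos_label=0`. With the matrix shown above, precision = 40/41 ≈ 0.98 and recall = 40/43 ≈ 0.93, matching the report.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)
y_pred = model.predict(X_test)

cm = confusion_matrix(y_test, y_pred)  # rows = actual, columns = predicted

# Class 0 (malignant) as the positive class, like the report's first row
tp = cm[0, 0]  # malignant predicted as malignant
fp = cm[1, 0]  # benign predicted as malignant
fn = cm[0, 1]  # malignant predicted as benign

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(f"By hand: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")

# Same metrics from scikit-learn, selecting class 0 via pos_label
p = precision_score(y_test, y_pred, pos_label=0)
r = recall_score(y_test, y_pred, pos_label=0)
f = f1_score(y_test, y_pred, pos_label=0)
print(f"sklearn: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```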



### Interpret Model Results

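- If accuracy is high (e.g., 0.95 or 95%) → the model is likely good, as long as it clearly beats the majority-class baseline and holds up on other splits of the data.
- If precision and recall are balanced across classes → the model is not systematically favouring one class. In a medical setting, recall on the malignant class (0.93 here) typically matters most, because each miss is a cancer predicted as benign.
- If accuracy is low → improve the model by:
  - Feature selection (remove irrelevant features).
  - Hyperparameter tuning (optimize settings such as the number of trees or the tree depth).
  - Trying different models (e.g., SVM, XGBoost).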
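The improvement steps above can be sketched with scikit-learn's cross-validation and grid-search tools. The sketch assumes 5-fold cross-validation on the training set only, uses a small parameter grid whose values are arbitrary, and substitutes scikit-learn's `SVC` for the "different model" step (XGBoost is a separate package and is not used here). The test set is scored once, at the end, so it is not used to make modelling choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 1) How stable is accuracy across 5 folds of the training data?
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf_scores = cross_val_score(rf, X_train, y_train, cv=5, scoring="accuracy")
print(f"RF  CV accuracy: {rf_scores.mean():.3f} +/- {rf_scores.std():.3f}")

# 2) A different model family: SVMs need scaled features, so the scaler
#    sits inside a pipeline and is refit on each training fold
svm = make_pipeline(StandardScaler(), SVC())
svm_scores = cross_val_score(svm, X_train, y_train, cv=5, scoring="accuracy")
print(f"SVM CV accuracy: {svm_scores.mean():.3f} +/- {svm_scores.std():.3f}")

# 3) Hyperparameter tuning over a small, illustrative grid
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}
search = GridSearchCV(
    RandomForestClassifier(random_state=42), param_grid, cv=5, scoring="accuracy"
)
search.fit(X_train, y_train)
print("Best params:", search.best_params_)
print(f"Best CV accuracy: {search.best_score_:.3f}")

# Score the chosen model once on the held-out test set
print(f"Test accuracy of tuned model: {search.score(X_test, y_test):.3f}")
```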

### Conclusion

Evaluating a classification model checks that its predictions are reliable before it is used in real applications (e.g., medical diagnosis, spam detection).