|                         | Predicted Positive (1) | Predicted Negative (0) |
| ----------------------- | ---------------------- | ---------------------- |
| **Actual Positive (1)** | True Positive (TP)     | False Negative (FN)    |
| **Actual Negative (0)** | False Positive (FP)    | True Negative (TN)     |
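
The four cells of the table can be counted directly from two lists of labels. A minimal sketch in plain Python; the helper name `confusion_counts` and the label values are made up for illustration and are not taken from the Titanic data:

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FN, FP, TN for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # actual 1, predicted 1
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # actual 1, predicted 0
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # actual 0, predicted 1
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # actual 0, predicted 0
    return tp, fn, fp, tn

# Illustrative labels only (not real passengers)
y_true_demo = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred_demo = [1, 0, 1, 0, 1, 0, 0, 1]

tp, fn, fp, tn = confusion_counts(y_true_demo, y_pred_demo)
print(tp, fn, fp, tn)  # 3 1 1 3
```

Note that `sklearn.metrics.confusion_matrix` orders rows and columns by label value, so for labels 0/1 it returns `[[TN, FP], [FN, TP]]`, which is flipped relative to the table above.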


### 1. Accuracy

- Formula:
- Accuracy = (TP + TN) / (TP + TN + FP + FN)
- Meaning: Fraction of passengers whose outcome was predicted correctly (both survived & not survived).
- Limitation: If data is imbalanced (e.g., most didn’t survive), accuracy can be misleading.
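
To see why accuracy can mislead on imbalanced data, here is a small sketch with invented labels: 95 negatives, 5 positives, and a "model" that always predicts the majority class. The numbers are an assumption chosen to make the effect obvious.

```python
# Illustrative imbalanced labels: 95 negatives (0) and 5 positives (1)
y_true_demo = [0] * 95 + [1] * 5
# A trivial "model" that always predicts the majority class
y_pred_demo = [0] * 100

correct = sum(1 for t, p in zip(y_true_demo, y_pred_demo) if t == p)
accuracy = correct / len(y_true_demo)

tp = sum(1 for t, p in zip(y_true_demo, y_pred_demo) if t == 1 and p == 1)
fn = sum(1 for t, p in zip(y_true_demo, y_pred_demo) if t == 1 and p == 0)
recall = tp / (tp + fn)

print(accuracy, recall)  # 0.95 0.0
```

The accuracy looks strong at 0.95, yet the model never finds a single positive, which is why the other metrics are worth checking alongside it.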

### 2. Precision (Positive Predictive Value)

- Formula:
- Precision = TP / (TP + FP)
- Meaning: Out of all passengers the model predicted as survived, how many actually survived?
- Useful when false positives are costly.

### 3. Recall (Sensitivity, True Positive Rate)

- Formula:
- Recall = TP / (TP + FN)
- Meaning: Out of all passengers who actually survived, how many did the model correctly identify?
- Useful when missing positives (FN) is costly.
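
Precision and recall share the same numerator and differ only in the denominator. A short worked sketch from raw counts; the helper `precision_recall` and the counts are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here, not the only option:

```python
def precision_recall(tp, fp, fn):
    """Compute precision and recall from confusion-matrix counts."""
    # Returning 0.0 on an empty denominator is an assumption made for this sketch
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Illustrative counts: 30 true positives, 10 false positives, 20 false negatives
precision, recall = precision_recall(tp=30, fp=10, fn=20)
print(precision, recall)  # 0.75 0.6
```

Here 40 passengers were predicted as survivors and 30 of them really were (precision 0.75), while 50 actually survived and 30 of them were found (recall 0.6).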

### 4. F1-Score

- Formula:
- F1 = 2 × (Precision × Recall) / (Precision + Recall)
- Meaning: Balance between precision & recall.
- Good when you want a trade-off (not too many false positives or false negatives).
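
F1 is the harmonic mean of precision and recall, so it is pulled toward the smaller of the two. A minimal sketch comparing it with the plain average; the helper name `f1_from` and the input values are illustrative assumptions:

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Illustrative case: very high precision, very low recall
p, r = 1.0, 0.1
f1 = f1_from(p, r)
arithmetic_mean = (p + r) / 2

print(round(f1, 3), round(arithmetic_mean, 3))  # 0.182 0.55
```

The plain average of 0.55 hides the poor recall, while F1 stays close to it at about 0.18.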

### 5. ROC-AUC (Receiver Operating Characteristic - Area Under Curve)

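- The ROC curve plots True Positive Rate (Recall) against False Positive Rate (FP / (FP + TN)) as the decision threshold varies.
- AUC (Area Under Curve) = probability that the model ranks a random positive higher than a random negative.
- Range: 0 to 1, where 0.5 corresponds to random guessing and 1.0 to a perfect ranking. Values below 0.5 mean the ranking is worse than random (the scores are inverted).
- Useful for comparing models, independent of threshold.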
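
The ranking interpretation of AUC can be checked by brute force: compare every positive with every negative and count how often the positive gets the higher score. A minimal sketch in plain Python; the helper `auc_by_pairs` and the scores are illustrative, and ties are counted as half a win, which is the usual convention:

```python
def auc_by_pairs(y_true, y_score):
    """Fraction of (positive, negative) pairs where the positive scores higher."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))

# Illustrative labels and predicted probabilities
y_true_demo = [0, 0, 1, 1]
y_score_demo = [0.1, 0.4, 0.35, 0.8]

auc = auc_by_pairs(y_true_demo, y_score_demo)
auc_inverted = auc_by_pairs(y_true_demo, [-s for s in y_score_demo])
print(auc, auc_inverted)  # 0.75 0.25
```

Three of the four positive/negative pairs are ranked correctly, giving 0.75. Flipping the scores gives 0.25, which shows that an AUC below 0.5 is possible and simply means the ranking is reversed.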

## What you should remember

- The intuition (what each metric tells you):

    - Accuracy = overall correctness
    - Precision = how many predicted positives are actually positive
    - Recall = how many actual positives were captured
    - F1 = balance between precision & recall
    - ROC-AUC = model’s ranking ability, independent of threshold

Basic fraction form of precision & recall:

    Precision → TP / (TP + FP)
    Recall → TP / (TP + FN)

<strong> These two are worth remembering because they come up a lot in interviews. </strong>

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score

In [2]:
df = pd.read_csv("data/titanic.csv")
df.info()  # info() prints its summary itself and returns None, so no print() is needed

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 418 entries, 0 to 417
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   PassengerId  418 non-null    int64  
 1   Survived     418 non-null    int64  
 2   Pclass       418 non-null    int64  
 3   Name         418 non-null    object 
 4   Sex          418 non-null    object 
 5   Age          332 non-null    float64
 6   SibSp        418 non-null    int64  
 7   Parch        418 non-null    int64  
 8   Ticket       418 non-null    object 
 9   Fare         417 non-null    float64
 10  Cabin        91 non-null     object 
 11  Embarked     418 non-null    object 
dtypes: float64(2), int64(5), object(5)
memory usage: 39.3+ KB


In [3]:
# Preprocessing (simple version: keep numeric + encode Sex)
df['Sex'] = df['Sex'].map({'male': 0, 'female': 1})
df = df.dropna(subset=['Age', 'Fare', 'Sex', 'Pclass'])  # drop rows with missing values

In [4]:
X = df[['Pclass', 'Sex', 'Age', 'Fare']]  # features
y = df['Survived']  # target

In [10]:
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=41)

In [11]:
# Train logistic regression
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

In [12]:
# Predictions
y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]  # needed for ROC-AUC

In [13]:
# Function to calculate all metrics
def evaluate_classification(y_true, y_pred, y_prob):
    metrics = {
        "Accuracy": accuracy_score(y_true, y_pred),
        "Precision": precision_score(y_true, y_pred),
        "Recall": recall_score(y_true, y_pred),
        "F1-Score": f1_score(y_true, y_pred),
        "ROC-AUC": roc_auc_score(y_true, y_prob)
    }
    return metrics

# Run evaluation
results = evaluate_classification(y_test, y_pred, y_prob)
print(results)

{'Accuracy': 1.0, 'Precision': 1.0, 'Recall': 1.0, 'F1-Score': 1.0, 'ROC-AUC': np.float64(1.0)}


## What it Means

- Accuracy = 1.0 → The model predicted every passenger’s survival status correctly.
- Precision = 1.0 → Of all the people predicted to survive, 100% actually survived.
- Recall = 1.0 → Of all the actual survivors, 100% were correctly predicted.
- F1 = 1.0 → Perfect balance of precision & recall.
- ROC-AUC = 1.0 → The model ranks every survivor above every non-survivor, so at least one threshold separates the two groups perfectly.

<strong>In short: your model is too perfect 🎯.</strong>

## ⚠️ Why This Is Suspicious

Getting 1.0 on every metric is very rare in real life. A few possible reasons:

- Data leakage 🔥
    - The model accidentally has access to the answer (target information) through the features.
    - Example: mistakenly including the Survived column in X.
    - Or a feature acts as a near-perfect proxy for the target in this particular file (e.g., a column that is missing mostly for non-survivors, as "Cabin" might be).

- Unrepresentative test set
    - `train_test_split` shuffles by default, so row ordering is unlikely to be the cause here, but with only 67 test rows the test set can still happen to contain rows that are “too easy”.

- Overfitting
    - Memorizing the training data usually lowers test scores, so overfitting alone does not explain a perfect test result. It could only do so if test rows also appear in the training set (duplicates or an improperly done split).
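
One quick way to look for a leaky or proxy feature is to ask how well each feature predicts the target on its own. The sketch below uses a tiny made-up DataFrame (not the Titanic file), and the helper name `single_feature_purity` is hypothetical. It is only meaningful for low-cardinality columns such as Sex or Pclass, since a column with mostly unique values (Age, Fare) would score high trivially.

```python
import pandas as pd

def single_feature_purity(df, feature, target):
    """Accuracy of predicting the majority target class within each feature value."""
    # For each feature value, count how many rows fall in its most common target class
    majority_counts = df.groupby(feature)[target].agg(lambda s: s.value_counts().max())
    return float(majority_counts.sum() / len(df))

# Illustrative data only: Sex determines Survived exactly, Pclass carries no signal
toy = pd.DataFrame({
    "Sex":      [0, 0, 0, 1, 1, 1],
    "Pclass":   [1, 2, 3, 1, 2, 3],
    "Survived": [0, 0, 0, 1, 1, 1],
})

for col in ["Sex", "Pclass"]:
    print(col, single_feature_purity(toy, col, "Survived"))
```

A purity of 1.0 means that column alone determines the target in the data at hand. On the real DataFrame, `pd.crosstab(df['Sex'], df['Survived'])` gives the same information as a table of counts.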

In [14]:
print(X_train.shape, X_test.shape)
print(y_train.shape, y_test.shape)

(264, 4) (67, 4)
(264,) (67,)
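
With only 67 test rows, a single split gives a noisy estimate. One way to check whether a score depends on a lucky split is k-fold cross-validation. The sketch below runs on synthetic data from `make_classification` (an assumption made so the block is self-contained, not the Titanic data); the variable names `X_demo` and `y_demo` are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data: 300 rows, 4 features (not the Titanic file)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=4, n_informative=2, n_redundant=0, random_state=0
)

model_demo = LogisticRegression(max_iter=200)

# 5-fold cross-validation: each row is used for testing exactly once
scores = cross_val_score(model_demo, X_demo, y_demo, cv=5, scoring="accuracy")
print(scores)
print(scores.mean(), scores.std())
```

If every fold on the real data still scores 1.0, the split is probably not the explanation, and leakage or a proxy feature becomes the more likely cause.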
