<a href="https://colab.research.google.com/github/dvtran63/ai-learning-notebooks/blob/main_b1/titanic_evaluation_crossval.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🚢 Titanic Dataset: Model Evaluation & Cross-Validation

In this notebook, you'll:
- Load and clean the Titanic dataset
- Train two classifiers: a decision tree and k-nearest neighbors (KNN)
- Evaluate them using cross-validation and classification metrics

## 📥 Load Titanic Dataset

In [2]:
import seaborn as sns
import pandas as pd

df = sns.load_dataset('titanic')
# Drop rows with missing values in these columns
df = df.dropna(subset=['age', 'embarked', 'fare', 'sex'])

# Encode the categorical columns as integers
df['sex'] = df['sex'].map({'male': 0, 'female': 1})
df['embarked'] = df['embarked'].map({'S': 0, 'C': 1, 'Q': 2})

features = ['pclass', 'sex', 'age', 'fare', 'embarked']
X = df[features]
y = df['survived']
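
The 0/1/2 codes for `embarked` imply an ordering (S < C < Q) that a distance-based model such as KNN will treat as meaningful. One alternative is one-hot encoding. The sketch below is only an illustration: `toy` is a hypothetical hand-made frame, not the Titanic data, and the notebook itself keeps the integer mapping.

```python
import pandas as pd

# Hypothetical toy rows standing in for the Titanic frame (not real data).
toy = pd.DataFrame({
    "pclass": [1, 3, 2, 3],
    "embarked": ["S", "C", "Q", "S"],
})

# One indicator column per port instead of a single 0/1/2 code.
encoded = pd.get_dummies(toy, columns=["embarked"], dtype=int)
print(encoded.columns.tolist())
```

With this encoding every pair of distinct ports is equally far apart, so no port is treated as "between" the other two.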

## 🔍 Train/Test Split and Scaling

In [3]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

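# Default split: 75% train, 25% test
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Fit the scaler on the training split only, then apply the same transform to the test split
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)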
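
To see what "fit on train, transform both" does in practice, here is a minimal sketch on synthetic data. The names `X_demo`, `y_demo`, and `demo_scaler` are made up for illustration and are not part of the notebook. The training split ends up with mean 0 and standard deviation 1 per feature, while the test split is only approximately standardized because it reuses the training statistics.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption: two numeric features, binary labels).
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=50.0, scale=10.0, size=(200, 2))
y_demo = rng.integers(0, 2, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=42)

# Learn mean and standard deviation from the training split only.
demo_scaler = StandardScaler().fit(X_tr)
X_tr_s = demo_scaler.transform(X_tr)
X_te_s = demo_scaler.transform(X_te)

print("train mean:", X_tr_s.mean(axis=0))  # numerically zero
print("test mean: ", X_te_s.mean(axis=0))  # near zero, but not exactly
```

Fitting the scaler on the full dataset instead would let test-set statistics leak into preprocessing.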

## 🌳 Decision Tree: Evaluation

In [4]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report, confusion_matrix

tree = DecisionTreeClassifier(max_depth=4, random_state=42)
tree.fit(X_train, y_train)
y_pred_tree = tree.predict(X_test)

print("Decision Tree Confusion Matrix:")
print(confusion_matrix(y_test, y_pred_tree))
print("\nClassification Report:")
print(classification_report(y_test, y_pred_tree))

Decision Tree Confusion Matrix:
[[86 13]
 [42 37]]

Classification Report:
              precision    recall  f1-score   support

           0       0.67      0.87      0.76        99
           1       0.74      0.47      0.57        79

    accuracy                           0.69       178
   macro avg       0.71      0.67      0.67       178
weighted avg       0.70      0.69      0.68       178
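
The class-1 row of the report can be reproduced by hand from the confusion matrix, which helps when reading these tables. The sketch below uses plain Python and the counts printed above. It assumes scikit-learn's layout, where rows are true classes and columns are predicted classes.

```python
# Confusion matrix from the output above: rows = true class, columns = predicted class.
cm = [[86, 13],
      [42, 37]]

tn, fp = cm[0]  # true class 0: correctly rejected, wrongly predicted as 1
fn, tp = cm[1]  # true class 1: missed, correctly predicted as 1

precision_1 = tp / (tp + fp)  # 37 / 50
recall_1 = tp / (tp + fn)     # 37 / 79
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)
accuracy = (tp + tn) / (tp + tn + fp + fn)  # 123 / 178

print(f"precision={precision_1:.2f} recall={recall_1:.2f} "
      f"f1={f1_1:.2f} accuracy={accuracy:.2f}")
# precision=0.74 recall=0.47 f1=0.57 accuracy=0.69
```

The low recall for class 1 comes from the 42 survivors the tree predicted as non-survivors.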



## 🌀 KNN: Evaluation

In [5]:
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
y_pred_knn = knn.predict(X_test)

print("KNN Confusion Matrix:")
print(confusion_matrix(y_test, y_pred_knn))
print("\nClassification Report:")
print(classification_report(y_test, y_pred_knn))

KNN Confusion Matrix:
[[86 13]
 [27 52]]

Classification Report:
              precision    recall  f1-score   support

           0       0.76      0.87      0.81        99
           1       0.80      0.66      0.72        79

    accuracy                           0.78       178
   macro avg       0.78      0.76      0.77       178
weighted avg       0.78      0.78      0.77       178



## 🔁 Cross-Validation on Entire Dataset

In [13]:
from sklearn.model_selection import cross_val_score

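# cross_val_score clones and refits each model on every fold.
# Note: X is the unscaled feature matrix (not the scaled X_train), so KNN runs on raw feature scales here.
tree_cv_scores = cross_val_score(tree, X, y, cv=5, scoring='accuracy')
knn_cv_scores = cross_val_score(knn, X, y, cv=5, scoring='accuracy')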

print("Tree CV scores:", tree_cv_scores)
print("Tree CV accuracy:", tree_cv_scores.mean())
print("\nKNN CV scores:", knn_cv_scores)
print("KNN CV accuracy:", knn_cv_scores.mean())

Tree CV scores: [0.74825175 0.79020979 0.80985915 0.78169014 0.81690141]
Tree CV accuracy: 0.789382448537378

KNN CV scores: [0.58741259 0.66433566 0.66901408 0.64788732 0.72535211]
KNN CV accuracy: 0.6588003545750024
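
KNN's cross-validated accuracy is well below its test-set accuracy from earlier. A likely reason is that the cross-validation call receives the unscaled `X`, while the earlier KNN was trained on standardized features. One way to scale inside each fold is to wrap the scaler and the model in a pipeline. The sketch below illustrates this on synthetic data. `X_demo`, `y_demo`, `raw_knn`, and `scaled_knn` are hypothetical names, and the data is constructed so that one uninformative feature has a much larger scale than the informative one.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): a small-scale informative feature
# plus a large-scale noise feature that dominates raw distances.
rng = np.random.default_rng(0)
n = 300
signal = rng.normal(size=n)
noise = rng.normal(scale=100.0, size=n)
X_demo = np.column_stack([signal, noise])
y_demo = (signal > 0).astype(int)

raw_knn = KNeighborsClassifier(n_neighbors=5)
# The pipeline refits the scaler on the training part of each fold.
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

raw_scores = cross_val_score(raw_knn, X_demo, y_demo, cv=5, scoring="accuracy")
scaled_scores = cross_val_score(scaled_knn, X_demo, y_demo, cv=5, scoring="accuracy")

print("KNN without scaling:", raw_scores.mean())
print("KNN with scaling:   ", scaled_scores.mean())
```

In this notebook, the same pipeline could be passed to `cross_val_score` with `X` and `y` in place of `knn`. The decision tree splits on thresholds, so it is largely insensitive to feature scale and needs no such wrapper.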
