 Scenario Question: Predicting Titanic Survival
 Researchers are studying the Titanic disaster and want to build models that predict whether a
 passenger survived, based on that passenger's information.
 - Features used:
   - Passenger class (pclass)
   - Gender (sex)
   - Age (age)
   - Number of siblings/spouses aboard (sibsp)
   - Number of parents/children aboard (parch)
   - Ticket fare (fare)
 - Label (survived):
   - 1 = Survived
   - 0 = Died
 The researchers train three different models:
 - Logistic Regression
 - K-Nearest Neighbors (KNN) with k=5
 - Decision Tree with max depth = 4
 They then evaluate each model using a classification report (precision, recall, F1-score, accuracy).
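
As a reminder of what those four metrics measure, here is a minimal sketch that computes them from the cells of a binary confusion matrix. The helper name `classification_metrics` and the counts passed to it are hypothetical, chosen only for illustration; they are not taken from the Titanic data.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall, F1 and accuracy for the positive class.

    tp/fp/fn/tn are the true-positive, false-positive, false-negative and
    true-negative counts of a binary confusion matrix.
    """
    # Precision: of everything predicted positive, how much was correct
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything actually positive, how much was found
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # Accuracy: share of all predictions that were correct
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Hypothetical counts, for illustration only
metrics = classification_metrics(tp=30, fp=10, fn=20, tn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Precision and recall are reported per class, so in the survival setting "positive" would mean label 1 (Survived).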

import pandas as pd
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report, accuracy_score

# Load dataset
df = sns.load_dataset('titanic')

# Select required columns
df = df[['pclass','sex','age','sibsp','parch','fare','survived']]

# Remove missing values
df = df.dropna()

# One-hot encode the categorical column (sex -> sex_male); drop_first avoids a redundant dummy
df = pd.get_dummies(df, drop_first=True)

# Separate X and y
X = df.drop('survived', axis=1)
y = df['survived']

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Standardize features for LR and KNN; the scaler is fit on the training split only
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Logistic Regression
lr = LogisticRegression()
lr.fit(X_train_scaled, y_train)
lr_pred = lr.predict(X_test_scaled)

# KNN
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train_scaled, y_train)
knn_pred = knn.predict(X_test_scaled)

# Decision Tree (trained on unscaled features; tree splits do not depend on feature scale)
dt = DecisionTreeClassifier(max_depth=4, random_state=42)
dt.fit(X_train, y_train)
dt_pred = dt.predict(X_test)

# Create comparison table (each classification report is computed once per model)
predictions = {
    "Logistic Regression": lr_pred,
    "KNN (k=5)": knn_pred,
    "Decision Tree (depth=4)": dt_pred,
}

rows = []
for name, pred in predictions.items():
    # Metrics for the positive class; the report keys labels as strings, so 1 -> '1'
    survived = classification_report(y_test, pred, output_dict=True)['1']
    rows.append({
        "Model": name,
        "Accuracy": accuracy_score(y_test, pred),
        "Precision (Survived)": survived['precision'],
        "Recall (Survived)": survived['recall'],
        "F1-Score (Survived)": survived['f1-score'],
    })

results = pd.DataFrame(rows)

results

   Model                    Accuracy  Precision (Survived)  Recall (Survived)  F1-Score (Survived)
0  Logistic Regression      0.748252  0.692308              0.642857           0.666667
1  KNN (k=5)                0.762238  0.683333              0.732143           0.706897
2  Decision Tree (depth=4)  0.755245  0.684211              0.696429           0.690265
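
As a quick consistency check on the reported numbers, the F1-score of each model can be recomputed from its precision and recall using F1 = 2PR / (P + R). The sketch below copies the values from the results output by hand; the names `reported`, `f1_from` and `recomputed` are introduced here for illustration, and the tolerance only allows for the six-decimal rounding of the printed values.

```python
# (precision, recall, F1) for the Survived class, copied from the results output
reported = {
    "Logistic Regression": (0.692308, 0.642857, 0.666667),
    "KNN (k=5)": (0.683333, 0.732143, 0.706897),
    "Decision Tree (depth=4)": (0.684211, 0.696429, 0.690265),
}


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Recompute F1 for every model from its reported precision and recall
recomputed = {name: f1_from(p, r) for name, (p, r, _) in reported.items()}

for name, (p, r, f1) in reported.items():
    # Allow for rounding of the printed six-decimal values
    matches = abs(recomputed[name] - f1) < 1e-5
    print(f"{name}: reported F1={f1:.6f}, recomputed F1={recomputed[name]:.6f}, match={matches}")
```

On this particular train/test split, KNN has the highest accuracy, recall, and F1-score for the Survived class, while Logistic Regression has the highest precision. The gaps are small, so a different split could plausibly change the ranking.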
