# Titanic – Machine Learning Model

This notebook loads the cleaned Titanic dataset, trains a logistic regression classifier to predict survival, and evaluates it with a classification report, a confusion matrix, and a ROC curve.

## 1. Import Libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    classification_report,
    confusion_matrix,
    ConfusionMatrixDisplay,
    roc_auc_score,
    roc_curve,
)

sns.set(style='whitegrid')

## 2. Load Cleaned Dataset

In [None]:
# Assumes titanic_cleaned.csv sits in the notebook's working directory;
# adjust the path if the file lives elsewhere.
df = pd.read_csv('titanic_cleaned.csv')
df.head()

## 3. Feature Selection

In [None]:
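# Features: every column except the target (the model needs them numeric, with no missing values)
X = df.drop(columns='Survived')
# Target: the Survived label
y = df['Survived']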
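
Because `X` keeps every remaining column, any leftover text column or missing value would make `LogisticRegression.fit` raise an error. The sketch below is one way to check for that before training. The helper `check_features` and the toy frame are hypothetical and only stand in for the cleaned data.

```python
import pandas as pd


def check_features(X: pd.DataFrame) -> dict:
    """Hypothetical helper: list columns that would break a scikit-learn fit."""
    # Columns that are neither numeric nor boolean (e.g. raw strings)
    non_numeric = X.select_dtypes(exclude=["number", "bool"]).columns.tolist()
    # Columns containing at least one missing value
    with_missing = X.columns[X.isna().any()].tolist()
    return {"non_numeric": non_numeric, "with_missing": with_missing}


# Toy frame standing in for the cleaned Titanic features (assumed column names)
toy = pd.DataFrame({
    "Pclass": [1, 3, 2],
    "Age": [22.0, None, 35.0],
    "Sex": ["male", "female", "female"],
})
report = check_features(toy)
print(report)
```

In the notebook, calling `check_features(X)` should return two empty lists if the cleaning step already encoded categoricals and filled missing values.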

## 4. Train/Test Split

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
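
The split above is purely random, so the share of survivors in the test set can drift from the share in the full data. A possible variant, sketched here on synthetic data with a made-up class balance, passes `stratify` to `train_test_split` so both parts keep roughly the same proportion of each class. The `_demo` names are illustrative only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 200 rows, 38% positives (hypothetical balance)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = np.array([1] * 76 + [0] * 124)

# stratify=y_demo keeps the class proportions similar in both parts
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print(f"overall positive rate: {y_demo.mean():.3f}")
print(f"train positive rate:   {y_tr.mean():.3f}")
print(f"test positive rate:    {y_te.mean():.3f}")
```

Applied to this notebook, the only change would be adding `stratify=y` to the existing call.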

## 5. Train Logistic Regression Model

In [None]:
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
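
A fitted logistic regression is fairly easy to inspect: exponentiating a coefficient gives the odds ratio for a one-unit increase in that feature. The following is a minimal sketch on synthetic data, with hypothetical feature names, of how the coefficients could be read.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Synthetic data: the label depends positively on feature_a, negatively on feature_b
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "feature_a": rng.normal(size=300),
    "feature_b": rng.normal(size=300),
})
logits = 2.0 * demo["feature_a"] - 1.0 * demo["feature_b"]
labels = (logits + rng.logistic(size=300) > 0).astype(int)

clf = LogisticRegression(max_iter=1000).fit(demo, labels)

# exp(coefficient) = multiplicative change in the odds per unit of the feature
odds_ratios = pd.Series(np.exp(clf.coef_[0]), index=demo.columns)
print(odds_ratios.sort_values(ascending=False))
```

For the notebook's model, the equivalent would be `pd.Series(np.exp(model.coef_[0]), index=X_train.columns)`. Values above 1 raise the predicted odds of survival and values below 1 lower them, though magnitudes are only comparable across features that share a scale.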

## 6. Evaluate Model

In [None]:
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))
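
The accuracy in the report is easier to judge next to a trivial baseline. One option, sketched below on made-up labels, is scikit-learn's `DummyClassifier`, which always predicts the most frequent class. The class balance here is an assumption for illustration, not a result from the Titanic data.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# Made-up labels: 62 negatives and 38 positives; features are ignored by the baseline
y_base = np.array([0] * 62 + [1] * 38)
X_base = np.zeros((100, 1))

baseline = DummyClassifier(strategy="most_frequent").fit(X_base, y_base)
baseline_acc = accuracy_score(y_base, baseline.predict(X_base))
print(f"majority-class accuracy: {baseline_acc:.2f}")
```

In the notebook, fitting the baseline on `X_train, y_train` and scoring it on `X_test, y_test` gives the number the logistic regression has to beat.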

## 7. Confusion Matrix

In [None]:
ConfusionMatrixDisplay.from_estimator(model, X_test, y_test)
plt.title('Confusion Matrix')
plt.show()
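
The precision and recall in the classification report come straight from the four cells of the confusion matrix. As a small worked example on invented labels, they can be recomputed by hand and compared with scikit-learn's own functions.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Invented labels and predictions, for illustration only
y_true_demo = [0, 0, 0, 0, 1, 1, 1, 0, 1, 1]
y_pred_demo = [0, 0, 1, 0, 1, 0, 1, 0, 1, 1]

# For binary labels, ravel() yields the cells in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

precision = tp / (tp + fp)  # share of predicted positives that are correct
recall = tp / (tp + fn)     # share of actual positives that were found

print(f"tn={tn} fp={fp} fn={fn} tp={tp}")
print(f"precision={precision:.2f} recall={recall:.2f}")
```

The same unpacking works on `confusion_matrix(y_test, y_pred)` in this notebook.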

## 8. ROC Curve & AUC

In [None]:
y_probs = model.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, y_probs)
auc_score = roc_auc_score(y_test, y_probs)

plt.figure(figsize=(8,6))
plt.plot(fpr, tpr, label=f'ROC Curve (AUC = {auc_score:.2f})')
plt.plot([0, 1], [0, 1], linestyle='--', label='Chance (AUC = 0.50)')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.legend()
plt.grid(True)  # a bare plt.grid() toggles the grid, which would switch off the 'whitegrid' lines
plt.show()
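
The `thresholds` array returned by `roc_curve` can also be used to pick an operating point other than the default 0.5 cut-off. One common heuristic, swapped in here purely as an illustration, is Youden's J statistic, which selects the threshold that maximizes `tpr - fpr`. The scores below are invented.

```python
import numpy as np
from sklearn.metrics import roc_curve

# Invented labels and predicted probabilities
y_true_roc = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1])
scores_roc = np.array([0.10, 0.20, 0.30, 0.65, 0.60, 0.70, 0.80, 0.90, 0.95])

fpr_roc, tpr_roc, thr_roc = roc_curve(y_true_roc, scores_roc)

# Youden's J: vertical distance between the ROC curve and the chance diagonal
j_stat = tpr_roc - fpr_roc
best_idx = int(np.argmax(j_stat))
best_threshold = thr_roc[best_idx]

print(f"best threshold: {best_threshold}")
print(f"TPR={tpr_roc[best_idx]:.2f} FPR={fpr_roc[best_idx]:.2f}")
```

With the notebook's variables this becomes `thresholds[np.argmax(tpr - fpr)]`, and predictions at that cut-off are `(y_probs >= best_threshold).astype(int)`. Whether this trade-off between false positives and false negatives is the right one depends on the use case.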

## 9. Conclusion

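We trained a logistic regression model to predict survival on the cleaned Titanic dataset and evaluated it on a held-out 20% test split. The classification report gives per-class precision, recall, and F1; the confusion matrix shows which kinds of errors the model makes; and the ROC curve with its AUC summarizes how well the predicted probabilities rank survivors above non-survivors. As next steps, we could explore more flexible models such as Random Forests or XGBoost and compare them against this baseline.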
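
As a rough sketch of that next step, the snippet below compares logistic regression with a Random Forest using 5-fold cross-validated AUC. It runs on synthetic data from `make_classification`, so the scores say nothing about the Titanic dataset, and the hyperparameters are arbitrary assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the feature matrix and target
X_cmp, y_cmp = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Mean ROC AUC over 5 folds for each candidate model
cv_auc = {
    name: cross_val_score(est, X_cmp, y_cmp, cv=5, scoring="roc_auc").mean()
    for name, est in candidates.items()
}

for name, score in cv_auc.items():
    print(f"{name}: mean AUC = {score:.3f}")
```

Replacing `X_cmp, y_cmp` with the notebook's `X, y` would give a like-for-like comparison on the real data.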