# 🚢 Titanic Survival Prediction Project
Predict survival on the Titanic using Python, Pandas, and Logistic Regression.
Dataset Source: [Kaggle Titanic Dataset](https://www.kaggle.com/competitions/titanic/data)

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Load dataset
df = pd.read_csv('train.csv')
df.head()

## 🔍 Step 1: Understand the Data

In [None]:
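from IPython.display import display

df.info()               # column dtypes and non-null counts (printed directly)
display(df.describe())  # summary statistics; without display() only the cell's last expression is shown
df.isnull().sum()       # missing values per column (rendered as the cell output)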
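Raw counts from `isnull().sum()` are easier to act on as percentages, since that is what justifies dropping a column versus imputing it. Below is a minimal sketch of a hypothetical `missing_report` helper; the small `toy` frame is made-up data standing in for `train.csv`.

```python
import numpy as np
import pandas as pd


def missing_report(frame: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage of missing values per column, largest first."""
    counts = frame.isnull().sum()
    percent = 100 * counts / len(frame)
    report = pd.DataFrame({"missing": counts, "percent": percent.round(1)})
    return report.sort_values("missing", ascending=False)


# Hypothetical toy frame standing in for train.csv
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 35.0],
    "Cabin": [np.nan, "C85", np.nan, np.nan],
    "Embarked": ["S", "C", "S", None],
})
print(missing_report(toy))
```

On the real data, running `missing_report(df)` before cleaning shows which columns are sparse enough to drop outright.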

## 🧹 Step 2: Clean the Data

In [None]:
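# Fill missing Age values with the median
# (assign the result back: fillna(inplace=True) on a column selection warns in recent
#  pandas and does not update df under Copy-on-Write)
df['Age'] = df['Age'].fillna(df['Age'].median())

# Fill missing Embarked values with the mode
df['Embarked'] = df['Embarked'].fillna(df['Embarked'].mode()[0])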

# Drop Cabin column due to too many missing values
df.drop(columns=['Cabin'], inplace=True)

# Drop irrelevant columns
df.drop(columns=['PassengerId', 'Name', 'Ticket'], inplace=True)

# Confirm no missing values
df.isnull().sum()
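The cleaning steps can also be wrapped in a function that returns a new DataFrame rather than mutating `df` in place, which makes the cell safe to re-run and easy to apply to Kaggle's `test.csv`. This is a sketch under the assumption that the same columns are present; `clean_titanic` and the `toy` data are hypothetical.

```python
import numpy as np
import pandas as pd


def clean_titanic(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy: impute Age and Embarked, drop sparse and ID-like columns."""
    out = frame.copy()
    # Assign filled Series back to the copy; the input frame is left untouched
    out["Age"] = out["Age"].fillna(out["Age"].median())
    out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode()[0])
    # errors="ignore" lets the function run on frames missing some of these columns
    return out.drop(columns=["Cabin", "PassengerId", "Name", "Ticket"], errors="ignore")


# Hypothetical toy rows
toy = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Name": ["A", "B", "C"],
    "Ticket": ["t1", "t2", "t3"],
    "Cabin": [np.nan, "C85", np.nan],
    "Age": [20.0, np.nan, 40.0],
    "Embarked": ["S", None, "S"],
})
cleaned = clean_titanic(toy)
print(cleaned)
```

Note that the median and mode here are computed from whatever frame is passed in; for a test set you would normally reuse the training-set values instead.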

## 📊 Step 3: Exploratory Data Analysis (EDA)

In [None]:
# Survival Count
sns.countplot(x='Survived', data=df)
plt.title('Survival Count')
plt.show()

# Survival by Sex
sns.countplot(x='Survived', hue='Sex', data=df)
plt.title('Survival by Sex')
plt.show()

# Survival by Pclass
sns.countplot(x='Survived', hue='Pclass', data=df)
plt.title('Survival by Passenger Class')
plt.show()
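Count plots show group sizes, but the survival *rate* per group is often the clearer comparison. Because `Survived` is 0/1, its group mean is the fraction who survived. A minimal sketch with a hypothetical `survival_rate` helper and made-up rows:

```python
import pandas as pd


def survival_rate(frame: pd.DataFrame, column: str) -> pd.Series:
    """Fraction of passengers who survived within each level of `column`."""
    return frame.groupby(column)["Survived"].mean()


# Hypothetical toy rows (Sex is still a string at this point in the notebook)
toy = pd.DataFrame({
    "Survived": [1, 0, 1, 0, 0, 1],
    "Sex": ["female", "male", "female", "male", "male", "male"],
    "Pclass": [1, 3, 2, 3, 1, 1],
})
print(survival_rate(toy, "Sex"))
print(survival_rate(toy, "Pclass"))
```

On the notebook's data the same call is `survival_rate(df, 'Sex')` or `survival_rate(df, 'Pclass')`.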

## 🛠️ Step 4: Feature Engineering

In [None]:
# Convert categorical to numerical
df['Sex'] = df['Sex'].map({'male': 0, 'female': 1})
df['Embarked'] = df['Embarked'].map({'S': 0, 'C': 1, 'Q': 2})

# Define features and label
X = df.drop('Survived', axis=1)
y = df['Survived']
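Mapping `Embarked` to 0/1/2 implies an ordering (S < C < Q) that the ports do not have, and logistic regression will treat it as one numeric slope. One-hot encoding is a common alternative; this sketch uses `pd.get_dummies`, with `encode_features` and the `toy` rows as hypothetical names. `drop_first=True` keeps the alphabetically first port (C) as the baseline category.

```python
import pandas as pd


def encode_features(frame: pd.DataFrame) -> pd.DataFrame:
    """Binary-encode Sex and one-hot encode Embarked on a copy of the frame."""
    out = frame.copy()
    # Sex has two levels, so a single 0/1 column loses no information
    out["Sex"] = out["Sex"].map({"male": 0, "female": 1})
    # One indicator column per port except the baseline (C)
    return pd.get_dummies(out, columns=["Embarked"], prefix="Embarked",
                          drop_first=True, dtype=int)


# Hypothetical toy rows
toy = pd.DataFrame({
    "Sex": ["male", "female", "male"],
    "Embarked": ["S", "C", "Q"],
    "Age": [22.0, 38.0, 26.0],
})
encoded = encode_features(toy)
print(encoded)
```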

## 🤖 Step 5: Model Building

In [None]:
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Logistic Regression
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# Predict
y_pred = model.predict(X_test)

# Evaluation
print('Accuracy:', accuracy_score(y_test, y_pred))
print('\nConfusion Matrix:\n', confusion_matrix(y_test, y_pred))
print('\nClassification Report:\n', classification_report(y_test, y_pred))
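In the notebook, the Age median and Embarked mode are computed on the full dataset before the split, so the test rows slightly influence training. A scikit-learn `Pipeline` fits imputation, scaling, and encoding on the training portion only, and `cross_val_score` gives a more stable estimate than a single 80/20 split. This is a sketch, not the notebook's method: the column lists are assumptions matching the cleaned features, and the `raw` data is synthetic so the block runs without `train.csv`.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric = ["Age", "Fare", "SibSp", "Parch", "Pclass"]
categorical = ["Sex", "Embarked"]

preprocess = ColumnTransformer([
    # Median imputation and scaling, learned from each training fold only
    ("num", make_pipeline(SimpleImputer(strategy="median"), StandardScaler()), numeric),
    # Most-frequent imputation, then one-hot; unseen categories encode as all zeros
    ("cat", make_pipeline(SimpleImputer(strategy="most_frequent"),
                          OneHotEncoder(handle_unknown="ignore")), categorical),
])
model = Pipeline([("prep", preprocess), ("clf", LogisticRegression(max_iter=1000))])

# Hypothetical synthetic passengers with the same column names as the cleaned data
rng = np.random.default_rng(0)
n = 200
raw = pd.DataFrame({
    "Age": rng.uniform(1, 80, n),
    "Fare": rng.uniform(5, 200, n),
    "SibSp": rng.integers(0, 4, n),
    "Parch": rng.integers(0, 3, n),
    "Pclass": rng.integers(1, 4, n),
    "Sex": rng.choice(["male", "female"], n),
    "Embarked": rng.choice(["S", "C", "Q"], n),
})
raw.loc[rng.choice(n, 20, replace=False), "Age"] = np.nan  # simulate missing ages
# Synthetic label tied to Sex, with about 20% of labels flipped as noise
y = ((raw["Sex"] == "female") ^ (rng.random(n) < 0.2)).astype(int)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, raw, y, cv=cv, scoring="accuracy")
print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

With the real data, `raw` would be the frame before manual imputation and mapping, since the pipeline performs those steps itself.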

## ✅ Conclusion
- You built a full ML pipeline using the Titanic dataset
- Performed data cleaning, visualization, and logistic regression
- Evaluated the classifier with accuracy, a confusion matrix, and a classification report
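An accuracy figure is easier to interpret next to a trivial baseline that always predicts the most common class. A minimal sketch using scikit-learn's `DummyClassifier`; the label arrays below are hypothetical, and with the notebook's split you would pass `X_train`, `y_train`, `X_test`, `y_test` directly.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# Hypothetical labels: 6 of 10 training passengers did not survive
y_train = np.array([0, 0, 0, 1, 1, 0, 0, 1, 0, 1])
y_test = np.array([0, 1, 0, 0, 1])
# The "most_frequent" strategy ignores features, so placeholder columns suffice
X_train = np.zeros((len(y_train), 1))
X_test = np.zeros((len(y_test), 1))

baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_train, y_train)
baseline_acc = accuracy_score(y_test, baseline.predict(X_test))
print("Majority-class baseline accuracy:", baseline_acc)
```

A logistic regression worth keeping should clearly beat this number on the same test split.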

Try improving the model by testing other classifiers or engineering new features, such as passenger titles extracted from the `Name` column (which must happen before `Name` is dropped in Step 2).
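Titanic names follow a "Surname, Title. Given names" pattern, so a title can be pulled out with a regular expression. This sketch assumes that format; `add_title`, the choice of "common" titles, and the example names are hypothetical.

```python
import pandas as pd


def add_title(frame: pd.DataFrame) -> pd.DataFrame:
    """Add a Title column parsed from names like 'Surname, Title. Given names'."""
    out = frame.copy()
    # Capture the text between the comma and the next period
    out["Title"] = out["Name"].str.extract(r",\s*([^.]+)\.", expand=False).str.strip()
    # Fold rare titles into one bucket so each category has enough examples
    common = {"Mr", "Mrs", "Miss", "Master"}
    out["Title"] = out["Title"].where(out["Title"].isin(common), "Rare")
    return out


# Hypothetical names in the dataset's format
toy = pd.DataFrame({"Name": [
    "Smith, Mr. John",
    "Doe, Mrs. Jane (Jane Roe)",
    "Brown, Miss. Alice",
    "Garcia, Dr. Luis",
]})
print(add_title(toy)["Title"].tolist())
```

The resulting `Title` column is categorical, so it would be one-hot encoded like `Embarked` before being passed to the model.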