# Titanic Survival Prediction

This notebook trains a Random Forest classifier on the Titanic dataset to predict passenger survival.

---

## 1. Import Libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import warnings
warnings.filterwarnings('ignore')

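If this cell raises `ModuleNotFoundError: No module named 'sklearn'`, install scikit-learn (for example, `pip install scikit-learn`) and re-run the cell.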

## 2. Load Dataset

Replace `'Titanic-Dataset.csv'` with your actual Titanic dataset filename if different.

In [None]:
# Load the Titanic dataset
df = pd.read_csv('Titanic-Dataset.csv')
df.head()

## 3. Exploratory Data Analysis (EDA)

In [None]:
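# Only the last expression in a cell is displayed automatically,
# so print the intermediate summaries explicitly.
df.info()
print(df.describe())
print(df.isnull().sum())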

In [None]:
# Visualize survival count
sns.countplot(x='Survived', data=df)
plt.title('Survival Count')
plt.show()
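
Beyond the overall count, the survival rate per group is often a useful next look. The following is a minimal sketch on a tiny made-up frame (`toy`, a hypothetical name) that reuses the dataset's `Survived`, `Sex`, and `Pclass` column names. The values are illustrative, not real passenger records.

```python
import pandas as pd

# Hypothetical toy data that mimics the Titanic column names; values are made up.
toy = pd.DataFrame({
    'Survived': [1, 0, 1, 0, 0, 1],
    'Sex': ['female', 'male', 'female', 'male', 'male', 'female'],
    'Pclass': [1, 3, 2, 3, 1, 3],
})

# The mean of a 0/1 column is the fraction of survivors in each group.
rate_by_sex = toy.groupby('Sex')['Survived'].mean()
rate_by_class = toy.groupby('Pclass')['Survived'].mean()
print(rate_by_sex)
print(rate_by_class)
```

On the real data, the same `groupby` calls would run on `df` while `Sex` and `Pclass` are still in their original, unencoded form.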

## 4. Data Preprocessing
- Handle missing values
- Encode categorical variables
- Feature selection

In [None]:
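# Fill missing Age with median, Embarked with mode.
# Assign the result back rather than calling fillna(inplace=True) on a column
# selection, which recent pandas versions warn about.
df['Age'] = df['Age'].fillna(df['Age'].median())
df['Embarked'] = df['Embarked'].fillna(df['Embarked'].mode()[0])
# Drop columns that are not used as features
df = df.drop(['Cabin', 'Ticket', 'Name', 'PassengerId'], axis=1)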

# Encode categorical variables
df = pd.get_dummies(df, columns=['Sex', 'Embarked'], drop_first=True)
df.head()
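
To make the effect of imputation and `drop_first=True` concrete, here is a small sketch on a made-up frame (`toy` and `encoded` are hypothetical names, and the values are invented for illustration). It applies the same three steps as the cell above.

```python
import pandas as pd

# Hypothetical toy data with one missing Age and one missing Embarked value.
toy = pd.DataFrame({
    'Age': [22.0, None, 38.0, 30.0],
    'Sex': ['male', 'female', 'female', 'male'],
    'Embarked': ['S', 'C', None, 'S'],
})

# Median of the observed ages (22, 30, 38) is 30.
toy['Age'] = toy['Age'].fillna(toy['Age'].median())
# Mode of the observed ports is 'S'.
toy['Embarked'] = toy['Embarked'].fillna(toy['Embarked'].mode()[0])

# drop_first=True drops the first level of each encoded column
# ('female' and 'C' here), which then acts as the baseline category.
encoded = pd.get_dummies(toy, columns=['Sex', 'Embarked'], drop_first=True)
print(encoded.columns.tolist())  # ['Age', 'Sex_male', 'Embarked_S']
```

On the full dataset, `Embarked` has three ports (C, Q, S), so the same call produces both `Embarked_Q` and `Embarked_S`.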

## 5. Model Training

In [None]:
# Split data
X = df.drop('Survived', axis=1)
y = df['Survived']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Random Forest
clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, y_train)

# Predict
y_pred = clf.predict(X_test)

# Evaluate
print('Accuracy:', accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
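# Rows of the confusion matrix are true labels, columns are predicted labels
sns.heatmap(confusion_matrix(y_test, y_pred), annot=True, fmt='d')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix')
plt.show()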
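
A single 80/20 split yields one accuracy figure, which can shift with the choice of split. One common way to get a steadier estimate is k-fold cross-validation. The sketch below assumes a synthetic stand-in for `X` and `y` generated with `make_classification` (the names `X_demo`, `y_demo`, and `clf_demo` are hypothetical), so its scores say nothing about the Titanic data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the notebook's X and y (an assumption for illustration).
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=42)

clf_demo = RandomForestClassifier(random_state=42)
# 5-fold cross-validation: each fold serves once as the held-out set.
scores = cross_val_score(clf_demo, X_demo, y_demo, cv=5, scoring='accuracy')
print('Fold accuracies:', scores)
print('Mean accuracy:', scores.mean())
```

In the notebook itself, passing `X` and `y` in place of the synthetic arrays would give the cross-validated accuracy for the Titanic features.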

## 6. Feature Importance

In [None]:
# Plot feature importances (sorted ascending so the largest bar is drawn at the top)
feat_importances = pd.Series(clf.feature_importances_, index=X.columns)
feat_importances.nlargest(10).sort_values().plot(kind='barh')
plt.xlabel('Importance')
plt.title('Feature Importances')
plt.show()

## 7. Next Steps
- Try other models (Logistic Regression, SVM, etc.)
- Tune hyperparameters
- Engineer new features
- Analyze misclassifications
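
As a starting point for the first two items, the sketch below fits a logistic regression baseline and runs a small grid search over Random Forest settings. It is illustrative only: the data is a synthetic stand-in from `make_classification`, the names (`X_demo`, `logreg`, `search`, and so on) are hypothetical, and the parameter grid is an arbitrary example rather than a recommended one.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the notebook's X and y (an assumption for illustration).
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Alternative model: logistic regression as a simple baseline.
logreg = LogisticRegression(max_iter=1000)
logreg.fit(X_tr, y_tr)
logreg_acc = logreg.score(X_te, y_te)
print('Logistic Regression accuracy:', logreg_acc)

# Hyperparameter tuning: exhaustive search over a small, arbitrary grid,
# scored with 3-fold cross-validation on the training split only.
param_grid = {'n_estimators': [50, 100], 'max_depth': [3, None]}
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=3,
    scoring='accuracy',
)
search.fit(X_tr, y_tr)
print('Best params:', search.best_params_)
print('Tuned Random Forest accuracy:', search.score(X_te, y_te))
```

Keeping the test split out of the search means the final accuracy is measured on data the tuning never saw.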

---

**Upload your Titanic dataset CSV to this folder before running the notebook. Name it `Titanic-Dataset.csv`, or update the filename in Section 2 to match.**