Explanation of the Code:
1. **Data Loading**: Load the Titanic dataset.
2. **Data Preprocessing**:
   - Fill missing values in the `Age` column with the median age.
   - Fill missing values in the `Embarked` column with the most common port, 'S' (Southampton).
   - Fill missing values in the `Fare` column with the median fare.
   - Drop the `Cabin` column due to too many missing values.
   - Encode categorical variables (`Sex` and `Embarked`) using label encoding.
3. **Feature Engineering**:
   - Extract the title from the `Name` column and map rare titles to 'Rare'.
   - Encode the `Title` feature using label encoding.
   - Drop unnecessary columns (`Name`, `Ticket`, `PassengerId`).
4. **Model Building**:
   - Define features (`X`) and target variable (`y`).
   - Split the data into training and testing sets.
   - Train a RandomForestClassifier model on the training set.
5. **Model Evaluation**:
   - Make predictions on the test set.
   - Evaluate the model's performance using accuracy, a confusion matrix, and a classification report.
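
The title extraction and label encoding steps listed above can be tried in isolation on a few names. The following is a minimal pure-Python sketch, not the notebook's pandas code: the passenger names are made up for illustration (they only follow the dataset's `Surname, Title. Given names` format), the helper names `extract_title`, `group_rare_titles`, and `label_encode` are hypothetical, and the rarity threshold is 2 rather than the 10 used in the notebook so that the effect shows up on a tiny sample. `label_encode` imitates the sorted-order integer codes that `LabelEncoder` assigns.

```python
from collections import Counter


def extract_title(name):
    """Return the text between the first comma and the following period."""
    return name.split(',')[1].split('.')[0].strip()


def group_rare_titles(titles, min_count):
    """Replace titles seen fewer than min_count times with 'Rare'."""
    counts = Counter(titles)
    return [t if counts[t] >= min_count else 'Rare' for t in titles]


def label_encode(values):
    """Assign integer codes to distinct values in sorted order."""
    classes = sorted(set(values))
    mapping = {value: code for code, value in enumerate(classes)}
    return [mapping[v] for v in values], mapping


# Hypothetical names in the dataset's format (not real passenger rows)
sample_names = [
    'Smith, Mr. John',
    'Jones, Mrs. Mary (Ann Lee)',
    'Brown, Mr. Peter',
    'Gray, Dr. Alan',
    'White, Mrs. Ellen',
]

titles = [extract_title(n) for n in sample_names]
grouped = group_rare_titles(titles, min_count=2)  # illustrative threshold
title_codes, title_mapping = label_encode(grouped)

print(titles)         # ['Mr', 'Mrs', 'Mr', 'Dr', 'Mrs']
print(grouped)        # ['Mr', 'Mrs', 'Mr', 'Rare', 'Mrs']
print(title_mapping)  # {'Mr': 0, 'Mrs': 1, 'Rare': 2}
print(title_codes)    # [0, 1, 0, 2, 1]
```

The integer codes carry no meaningful order: `Rare` gets 2 only because it sorts last alphabetically. Tree-based models such as a random forest usually tolerate this, but the codes should not be read as a ranking.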


In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.impute import SimpleImputer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Load the dataset
file_path = 'Titanic-Dataset.csv'
titanic_df = pd.read_csv(file_path)

# Data Preprocessing
# Fill missing Age values with the median age
imputer = SimpleImputer(strategy='median')
titanic_df['Age'] = imputer.fit_transform(titanic_df[['Age']])

# Fill missing Embarked values with the most common port 'S'
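# Assign the result back: fillna(inplace=True) on a selected column may not update the DataFrame in recent pandas
titanic_df['Embarked'] = titanic_df['Embarked'].fillna('S')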

# Fill missing Fare values with the median fare
titanic_df['Fare'] = imputer.fit_transform(titanic_df[['Fare']])

# Drop the Cabin column due to too many missing values
titanic_df.drop(columns=['Cabin'], inplace=True)

# Encode categorical variables
label_encoder = LabelEncoder()
titanic_df['Sex'] = label_encoder.fit_transform(titanic_df['Sex'])
titanic_df['Embarked'] = label_encoder.fit_transform(titanic_df['Embarked'])

# Feature Engineering: Create new features
# Extract Title from Name
titanic_df['Title'] = titanic_df['Name'].apply(lambda name: name.split(',')[1].split('.')[0].strip())
# Map rare titles to 'Rare'
title_counts = titanic_df['Title'].value_counts()
rare_titles = title_counts[title_counts < 10].index
titanic_df['Title'] = titanic_df['Title'].apply(lambda title: 'Rare' if title in rare_titles else title)
titanic_df['Title'] = label_encoder.fit_transform(titanic_df['Title'])

# Drop unnecessary columns
titanic_df.drop(columns=['Name', 'Ticket', 'PassengerId'], inplace=True)

# Define features and target variable
X = titanic_df.drop(columns=['Survived'])
y = titanic_df['Survived']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Build the model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)

print(f'Accuracy: {accuracy}')
print('Confusion Matrix:')
print(conf_matrix)
print('Classification Report:')
print(class_report)


Accuracy: 0.8324022346368715
Confusion Matrix:
[[91 14]
 [16 58]]
Classification Report:
              precision    recall  f1-score   support

           0       0.85      0.87      0.86       105
           1       0.81      0.78      0.79        74

    accuracy                           0.83       179
   macro avg       0.83      0.83      0.83       179
weighted avg       0.83      0.83      0.83       179
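
The numbers in the report can be recomputed by hand from the confusion matrix printed above. The sketch below assumes scikit-learn's layout for `confusion_matrix` (rows are actual classes, columns are predicted classes) and uses only the four counts from the output. The variable names are chosen here for illustration.

```python
# Counts copied from the confusion matrix in the output above.
# Rows = actual class, columns = predicted class.
tn, fp = 91, 14   # actual 0: predicted 0, predicted 1
fn, tp = 16, 58   # actual 1: predicted 0, predicted 1

total = tn + fp + fn + tp

# Accuracy: share of all predictions that are correct
accuracy = (tn + tp) / total

# Metrics for class 1 (survived)
precision_1 = tp / (tp + fp)
recall_1 = tp / (tp + fn)
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

# Metrics for class 0 (did not survive)
precision_0 = tn / (tn + fn)
recall_0 = tn / (tn + fp)

print(f'total={total}, accuracy={accuracy:.4f}')
print(f'class 1: precision={precision_1:.2f}, recall={recall_1:.2f}, f1={f1_1:.2f}')
print(f'class 0: precision={precision_0:.2f}, recall={recall_0:.2f}')
```

These values agree with the report: 149 of 179 test passengers are classified correctly, and the model misses 16 of the 74 survivors, which is why recall for class 1 (0.78) is lower than for class 0 (0.87).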

