# Titanic Survival Prediction

---

## Author Information

- **Author**: Rahul Kumar
- **Batch**: March - April
- **Domain**: Data Science

---

## Task Information

- **Task**: TITANIC SURVIVAL PREDICTION
- **Description**: Use the Titanic dataset to build a model that predicts whether a passenger on the Titanic survived or not.

---

## Introduction

The sinking of the RMS Titanic is one of the most infamous tragedies in maritime history. This project explores the Titanic dataset and builds a predictive model that estimates whether a passenger survived the disaster, based on features such as age, sex, ticket class, and port of embarkation.

---

## Data Exploration

### Loading and Overview of the Dataset

In [None]:
import pandas as pd

# Load the Titanic dataset
titanic_data = pd.read_csv("tested.csv")

# Display the first few rows of the dataset
titanic_data.head()

### Data Summary and Missing Values


In [None]:
# Summary of the dataset
titanic_data.info()

# Check for missing values
missing_values = titanic_data.isnull().sum()
missing_percentage = (missing_values / len(titanic_data)) * 100
missing_info = pd.DataFrame({'Missing Values': missing_values, 'Percentage': missing_percentage})
print(missing_info)


## Data Visualization


In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Count of survival
plt.figure(figsize=(8, 6))
sns.countplot(data=titanic_data, x='Survived')
plt.title('Survival Count')
plt.show()


## Data Preprocessing

### Handling Missing Values

In [None]:
# Impute missing values for 'Age' and 'Embarked'
titanic_data['Age'] = titanic_data['Age'].fillna(titanic_data['Age'].median())
titanic_data['Embarked'] = titanic_data['Embarked'].fillna(titanic_data['Embarked'].mode()[0])

# Drop unnecessary columns
titanic_data.drop(columns="Cabin", inplace=True)
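
The imputation step above can be checked on a small example. The sketch below uses a toy frame with made-up values (the `toy` name and its contents are assumptions for illustration, not rows from the Titanic data) to show that the median ignores missing entries and that `mode()[0]` picks the most frequent category.

```python
import numpy as np
import pandas as pd

# Toy frame with hypothetical values (not taken from the Titanic dataset)
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 38.0, 30.0],
    "Embarked": ["S", "C", None, "S"],
})

# median() skips NaN, so it is computed over 22, 30 and 38
age_median = toy["Age"].median()
# mode() returns a Series of the most frequent values; take the first one
embarked_mode = toy["Embarked"].mode()[0]

toy["Age"] = toy["Age"].fillna(age_median)
toy["Embarked"] = toy["Embarked"].fillna(embarked_mode)

print(toy)
```

The median is often preferred to the mean for a skewed column such as age because a few extreme values do not pull it as far.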


## Feature Engineering

### Encoding Categorical Variables

In [None]:
from sklearn.preprocessing import LabelEncoder

# Encode 'Sex' feature
labelencoder = LabelEncoder()
titanic_data['Sex'] = labelencoder.fit_transform(titanic_data['Sex'])
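
To read the encoded `Sex` column later, it helps to know which integer stands for which label. The sketch below runs `LabelEncoder` on a short hand-written list (the `encoder`, `codes` and `mapping` names are illustrative assumptions) and recovers the mapping from `classes_`, which holds the labels in sorted order.

```python
from sklearn.preprocessing import LabelEncoder

# Hand-written sample labels, used only to illustrate the encoding
encoder = LabelEncoder()
codes = encoder.fit_transform(["male", "female", "female", "male"])

# classes_ is sorted, and each label's index is its integer code
mapping = {str(label): code for code, label in enumerate(encoder.classes_)}

print(mapping)
print(codes.tolist())
```

Because the labels are sorted alphabetically, `female` is encoded as 0 and `male` as 1.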


## Feature Selection


In [None]:
# Drop irrelevant columns
titanic_data.drop(columns=["PassengerId", "Name", "SibSp", "Parch", "Ticket", "Fare"], inplace=True)


## Modeling

### Splitting Data

In [None]:
from sklearn.model_selection import train_test_split

# Splitting data into features and target variable
X = titanic_data.drop(columns=['Survived'])
y = titanic_data['Survived']

# Splitting data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
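
When the two classes are imbalanced, a purely random split can leave the test set with a different survival rate than the training set. One possible variation, not used in the cell above, is to pass `stratify` to `train_test_split`. The sketch below uses synthetic labels (`X_demo` and `y_demo` are made-up arrays) to show that the class proportion is preserved in both parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data: 100 rows, 40% of them in the positive class
X_demo = np.arange(100).reshape(-1, 1)
y_demo = np.array([0] * 60 + [1] * 40)

# stratify=y_demo keeps the 60/40 class ratio in both train and test
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print(y_tr.mean(), y_te.mean())
```

In the notebook this would amount to adding `stratify=y` to the existing call, assuming the same ratio in both sets is wanted.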


### Model Training and Evaluation

---


# Predicting Passenger Survival

## Introduction

In this section, we predict whether a passenger survived using a logistic regression model. We suppress warnings, one-hot encode the `Embarked` column, fit the model on the training data, evaluate it on the test set, and make a prediction for a single passenger.
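
The encoding and model-fitting steps can also be bundled into a single scikit-learn `Pipeline`, so the same preprocessing is applied at fit and predict time. The following is a minimal sketch of that alternative on a toy frame with hypothetical values. The names `toy_X`, `toy_y`, `preprocess` and `model` are assumptions for illustration, and this is not the approach the notebook itself uses.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy features with the same column names as the notebook (values are made up)
toy_X = pd.DataFrame({
    "Pclass": [1, 3, 2, 3, 1, 2],
    "Sex": [0, 1, 0, 1, 0, 1],
    "Age": [29.0, 35.0, 41.0, 22.0, 54.0, 30.0],
    "Embarked": ["S", "C", "Q", "S", "C", "S"],
})
toy_y = pd.Series([1, 0, 1, 0, 1, 0])

# One-hot encode 'Embarked' and pass the numeric columns through unchanged
preprocess = ColumnTransformer(
    transformers=[("encoder", OneHotEncoder(handle_unknown="ignore"), ["Embarked"])],
    remainder="passthrough",
)

# Chain preprocessing and the classifier into one estimator
model = Pipeline(steps=[
    ("preprocess", preprocess),
    ("classifier", LogisticRegression(max_iter=1000)),
])

model.fit(toy_X, toy_y)
toy_pred = model.predict(toy_X.iloc[:2])
print(toy_pred)
```

With a pipeline, raw feature rows can be passed straight to `predict` without calling the transformer separately.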

---


In [None]:
# Importing necessary libraries
import warnings
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix

# Ignore warnings to keep the output clean
warnings.filterwarnings("ignore")

# Define columns to be one-hot encoded
categorical_cols = ['Embarked']

# Create a column transformer
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), categorical_cols)], remainder='passthrough')

# Fit the transformer and transform the training and test data
X_train_encoded = ct.fit_transform(X_train)
X_test_encoded = ct.transform(X_test)

# Initialize and train logistic regression model
log_reg = LogisticRegression(random_state=42)
log_reg.fit(X_train_encoded, y_train)

# Make predictions
y_pred = log_reg.predict(X_test_encoded)

# Evaluate model performance
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
print(f"Accuracy: {accuracy}")
print(f"Confusion Matrix:\n {conf_matrix}")

# Predict the survival result for the first row of the encoded test data
predicted_result = log_reg.predict(X_test_encoded[:1])

# Display the prediction (predict returns an array, so read its first element)
if predicted_result[0] == 0:
    print("Sorry, the passenger did not survive.")
else:
    print("The passenger survived.")
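
Accuracy alone does not show which kind of error the model makes. As a small illustration, the sketch below unpacks a binary confusion matrix into its four cells and derives precision and recall from them. The label lists are hand-written examples, not output from the model above.

```python
from sklearn.metrics import confusion_matrix

# Hand-written example labels (not model output)
y_true_demo = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred_demo = [0, 1, 0, 1, 0, 1, 1, 0]

# For binary labels, ravel() yields the cells in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

precision = tp / (tp + fp)  # share of predicted survivors who actually survived
recall = tp / (tp + fn)     # share of actual survivors the model found

print(tn, fp, fn, tp)
print(precision, recall)
```

The same unpacking can be applied to `conf_matrix` from the cell above, since scikit-learn places true labels on the rows and predicted labels on the columns.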


## Saving Report

In [None]:
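# Assumption: the report stores the fitted model's predictions for every passenger
y_final = log_reg.predict(ct.transform(X))

final = pd.DataFrame()
final["Sex"] = X["Sex"]
final["survived"] = y_final

# Write the report without the index column
final.to_csv("FinalReport.csv", index=False)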

