68 changes: 68 additions & 0 deletions Machine Learning/Titanic_Survival_Prediction/README.md
@@ -0,0 +1,68 @@
# Titanic Survival Prediction

This project aims to predict the survival of passengers aboard the Titanic using a Logistic Regression model. The model is trained on a dataset of passenger information and can predict whether a passenger would survive based on user-provided input features.

## Project Structure

- `train.csv`: The dataset containing information about Titanic passengers.
- `titanic_survival_prediction.py`: The main Python script that preprocesses the data, trains the model, and predicts survival based on user input.

## Requirements

- Python 3.x
- numpy
- pandas
- scikit-learn

## Setup

1. Ensure you have Python 3.x installed on your system.
2. Install the necessary Python packages using pip:
   ```sh
   pip install numpy pandas scikit-learn
   ```

## Usage

1. Place the `train.csv` file in the same directory as `titanic_survival_prediction.py`.
2. Run the `titanic_survival_prediction.py` script:
   ```sh
   python titanic_survival_prediction.py
   ```
3. Follow the prompts to enter passenger details:
   - Passenger class (1, 2, or 3; entered as a number)
   - Gender (Male/Female)
   - Age
   - Number of siblings or spouses aboard
   - Number of parents or children aboard
   - Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton)
   - Fare

4. The model will predict whether the passenger would survive or not and display the model accuracy.

## Script Details

### Data Preprocessing

The following preprocessing steps are applied to the dataset:
- Drop the `Cabin` column due to a large number of missing values.
- Fill missing `Age` values with the mean age.
- Fill missing `Embarked` values with the mode.
- Fill missing `Fare` values with the mean fare.
- Convert categorical variables `Sex` and `Embarked` to numerical values.
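
The steps above can be sketched on a tiny made-up frame. This is a minimal illustration, not the script itself: the `sample` values and the `preprocess` helper are hypothetical, and only the column names and the encodings (`male`/`female` to 0/1, `S`/`C`/`Q` to 0/1/2) come from the project.

```python
import pandas as pd

# Made-up rows using train.csv column names; the values are illustrative only.
sample = pd.DataFrame({
    "Cabin": [None, "C85", None, None],
    "Age": [22.0, None, 26.0, 35.0],
    "Embarked": ["S", "C", None, "S"],
    "Fare": [7.25, 71.28, None, 8.05],
    "Sex": ["male", "female", "female", "male"],
})


def preprocess(df):
    """Hypothetical helper: return a cleaned copy and leave the input untouched."""
    df = df.drop(columns=["Cabin"])                                   # mostly missing
    df["Age"] = df["Age"].fillna(df["Age"].mean())                    # mean imputation
    df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])  # most frequent port
    df["Fare"] = df["Fare"].fillna(df["Fare"].mean())                 # mean imputation
    df["Sex"] = df["Sex"].map({"male": 0, "female": 1})
    df["Embarked"] = df["Embarked"].map({"S": 0, "C": 1, "Q": 2})
    return df


cleaned = preprocess(sample)
print(cleaned)
```

Assigning each filled column back to the frame, rather than calling `fillna(..., inplace=True)` on a selected column, avoids relying on in-place modification of a column selection, which recent pandas versions warn about.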

### Model Training

- The features are the columns that remain after dropping the identifier-like columns (`PassengerId`, `Name`, `Ticket`) and the target column (`Survived`).
- The target variable is `Survived`.
- The dataset is split into training and testing sets (80-20 split, with `random_state=42` so the split is reproducible).
- A Logistic Regression model is trained on the training set.
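
As a rough sketch of the split-and-fit step, the example below uses synthetic data in place of the preprocessed Titanic features. The names `X_demo`, `y_demo`, and `clf` are hypothetical, and `max_iter=1000` is an assumption added here to give the solver room to converge; it is not a setting documented by this project.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed features; this is not Titanic data.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
# The label depends on the first feature, so there is a signal to learn.
y_demo = (X_demo[:, 0] > 0).astype(int)

# 80-20 split with a fixed seed, mirroring the project's settings.
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# Accuracy is measured on rows the model did not see during training.
test_accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"train rows: {len(X_train)}, test rows: {len(X_test)}")
print(f"test accuracy: {test_accuracy:.2f}")
```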

### Prediction Function

- Prompts the user for passenger details.
- Converts user input into a format suitable for the model.
- Predicts survival based on user input.
- Displays whether the passenger is predicted to survive or not.
- Prints the model's accuracy on the test set.
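
The input conversion step could look roughly like the sketch below. The `encode_passenger` helper and the two lookup tables are hypothetical names; the encodings match the ones used during preprocessing. Rejecting unknown values with `ValueError` is a design choice made for this example and may differ from how the script handles unexpected input.

```python
# Same encodings that preprocessing applies to the training data.
SEX_CODES = {"male": 0, "female": 1}
EMBARKED_CODES = {"S": 0, "C": 1, "Q": 2}


def encode_passenger(pclass, sex, age, sibsp, parch, fare, embarked):
    """Hypothetical helper: turn raw text answers into a numeric feature dict."""
    sex_key = sex.strip().lower()
    port_key = embarked.strip().upper()
    pclass_value = int(pclass)
    if pclass_value not in (1, 2, 3):
        raise ValueError(f"passenger class must be 1, 2, or 3, got {pclass!r}")
    if sex_key not in SEX_CODES:
        raise ValueError(f"unknown gender: {sex!r}")
    if port_key not in EMBARKED_CODES:
        raise ValueError(f"unknown port of embarkation: {embarked!r}")
    return {
        "Pclass": pclass_value,
        "Sex": SEX_CODES[sex_key],
        "Age": float(age),
        "SibSp": int(sibsp),
        "Parch": int(parch),
        "Fare": float(fare),
        "Embarked": EMBARKED_CODES[port_key],
    }


features = encode_passenger("3", "Female", "29", "0", "1", "7.75", "q")
print(features)
```

Keeping the lookup tables in one place makes it harder for the prediction-time encoding to drift from the training-time encoding, which would silently feed the model values it never saw.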

@@ -0,0 +1,58 @@
# Import necessary libraries
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the dataset
dataset = pd.read_csv('train.csv')

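# Data preprocessing
dataset = dataset.drop(columns=['Cabin'])
# Assign the filled columns back: fillna(inplace=True) on a selected column
# is deprecated in recent pandas and may not modify the DataFrame
dataset['Age'] = dataset['Age'].fillna(dataset['Age'].mean())
dataset['Embarked'] = dataset['Embarked'].fillna(dataset['Embarked'].mode()[0])
dataset['Fare'] = dataset['Fare'].fillna(dataset['Fare'].mean())  # Handle missing Fare values
# Encode categorical columns as integers
dataset['Sex'] = dataset['Sex'].map({'male': 0, 'female': 1})
dataset['Embarked'] = dataset['Embarked'].map({'S': 0, 'C': 1, 'Q': 2})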

# Define features (X) and target (y)
X = dataset.drop(columns=['PassengerId', 'Name', 'Ticket', 'Survived'])
y = dataset['Survived']

# Splitting the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

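# Train the model (max_iter is raised because the default of 100 iterations
# may not be enough for the solver to converge on unscaled features)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)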

# Function to get user input and make predictions
def predict_survival():
    user_input = {}
    # Passenger class is parsed with int(), so it must be entered as 1, 2, or 3
    user_input['Pclass'] = int(input("Enter passenger class (1, 2, or 3): "))
    user_input['Sex'] = 1 if input("Enter passenger gender (Male/Female): ").strip().lower() == 'female' else 0
    user_input['Age'] = float(input("Enter passenger age: "))
    user_input['SibSp'] = int(input("Enter number of siblings or spouses aboard: "))
    user_input['Parch'] = int(input("Enter number of parents or children aboard: "))
    port = input("Enter port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton): ")
    # Use the same encoding as the training data (S=0, C=1, Q=2);
    # unrecognized input falls back to Southampton
    user_input['Embarked'] = {'S': 0, 'C': 1, 'Q': 2}.get(port.strip().upper(), 0)
    user_input['Fare'] = float(input("Enter passenger fare: "))

    # Build a one-row frame with columns in the same order as the training features
    user_df = pd.DataFrame([user_input], columns=X.columns)

    prediction = model.predict(user_df)

    if prediction[0] == 1:
        print("The passenger is predicted to survive.")
    else:
        print("The passenger is predicted not to survive.")

    # Report accuracy on the held-out test set
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print("Model Accuracy:", accuracy)

predict_survival()
