PrajaktaSathe · harshaparida · May 25, 2024
diff --git a/Machine Learning/Titanic_Survival_Prediction/README.md b/Machine Learning/Titanic_Survival_Prediction/README.md
@@ -0,0 +1,68 @@
+# Titanic Survival Prediction
+
+This project aims to predict the survival of passengers aboard the Titanic using a Logistic Regression model. The model is trained on a dataset of passenger information and can predict whether a passenger would survive based on user-provided input features.
+
+## Project Structure
+
+- `train.csv`: The dataset containing information about Titanic passengers.
+- `Titanic_Survival_Prediction.py`: The main Python script that preprocesses the data, trains the model, and predicts survival based on user input.
+
+## Requirements
+
+- Python 3.x
+- numpy
+- pandas
+- scikit-learn
+
+## Setup
+
+1. Ensure you have Python 3.x installed on your system.
+2. Install the necessary Python packages using pip:
+    ```sh
+    pip install numpy pandas scikit-learn
+    ```
+
+## Usage
+
+1. Place the `train.csv` file in the same directory as `Titanic_Survival_Prediction.py`.
+2. Run the `Titanic_Survival_Prediction.py` script:
+    ```sh
+    python Titanic_Survival_Prediction.py
+    ```
+3. Follow the prompts to enter passenger details:
+    - Passenger class (enter `1`, `2`, or `3`)
+    - Gender (Male/Female)
+    - Age
+    - Number of siblings or spouses aboard
+    - Number of parents or children aboard
+    - Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton)
+    - Fare
+
+4. The script prints whether the passenger is predicted to survive, followed by the model's accuracy on the held-out test set.
+
+## Script Details
+
+### Data Preprocessing
+
+The following preprocessing steps are applied to the dataset:
+- Drop the `Cabin` column due to a large number of missing values.
+- Fill missing `Age` values with the mean age.
+- Fill missing `Embarked` values with the mode.
+- Fill missing `Fare` values with the mean fare.
+- Convert the categorical variables `Sex` (`male` = 0, `female` = 1) and `Embarked` (`S` = 0, `C` = 1, `Q` = 2) to numerical values.
+
+### Model Training
+
+- The features are the columns that remain after dropping the identifier-like columns (`PassengerId`, `Name`, `Ticket`) and the target (`Survived`).
+- The target variable is `Survived`.
+- The dataset is split into training and testing sets (80-20 split).
+- A Logistic Regression model is trained on the training set.
+
+### Prediction Function
+
+- Prompts the user for passenger details.
+- Converts user input into a format suitable for the model.
+- Predicts survival based on user input.
+- Displays whether the passenger is predicted to survive or not.
+- Prints the model's accuracy on the test set.
+
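The preprocessing steps listed in the README can be sketched on a tiny in-memory frame. This is a minimal illustration, not the script itself: the `preprocess` helper, the `SEX_MAP` and `EMBARKED_MAP` names, and the three-row `sample` frame are hypothetical, and the sketch assigns the result of `fillna` back to each column rather than filling in place.

```python
import numpy as np
import pandas as pd

# Hypothetical names; the codes mirror the mapping described in the README.
SEX_MAP = {"male": 0, "female": 1}
EMBARKED_MAP = {"S": 0, "C": 1, "Q": 2}


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of a Titanic-style frame."""
    out = df.drop(columns=["Cabin"])  # mostly missing, so dropped
    out["Age"] = out["Age"].fillna(out["Age"].mean())
    out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode()[0])
    out["Fare"] = out["Fare"].fillna(out["Fare"].mean())
    out["Sex"] = out["Sex"].map(SEX_MAP)
    out["Embarked"] = out["Embarked"].map(EMBARKED_MAP)
    return out


# Made-up rows, only to exercise each step.
sample = pd.DataFrame({
    "Age": [22.0, np.nan, 30.0],
    "Sex": ["male", "female", "female"],
    "Embarked": ["S", None, "S"],
    "Fare": [7.25, np.nan, 8.75],
    "Cabin": [None, "C85", None],
})
cleaned = preprocess(sample)
print(cleaned)
```

Keeping the two mappings in named constants makes it easier to reuse exactly the same codes when encoding user input later.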
diff --git a/Machine Learning/Titanic_Survival_Prediction/Titanic_Survival_Prediction.py b/Machine Learning/Titanic_Survival_Prediction/Titanic_Survival_Prediction.py
@@ -0,0 +1,58 @@
+# Import necessary libraries
+import numpy as np
+import pandas as pd
+from sklearn.linear_model import LogisticRegression
+from sklearn.model_selection import train_test_split
+from sklearn.metrics import accuracy_score
+
+# Load the dataset
+dataset = pd.read_csv('train.csv')
+
+# Data preprocessing
+dataset = dataset.drop(columns='Cabin')  # mostly missing values
+dataset['Age'] = dataset['Age'].fillna(dataset['Age'].mean())
+dataset['Embarked'] = dataset['Embarked'].fillna(dataset['Embarked'].mode()[0])
+dataset['Fare'] = dataset['Fare'].fillna(dataset['Fare'].mean())
+dataset.replace({'Sex': {'male': 0, 'female': 1}, 'Embarked': {'S': 0, 'C': 1, 'Q': 2}}, inplace=True)
+
+# Define features (X) and target (y)
+X = dataset.drop(columns=['PassengerId', 'Name', 'Ticket', 'Survived'])
+y = dataset['Survived']
+
+# Splitting the dataset into training and testing sets
+X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
+
+# Train the model
+model = LogisticRegression(max_iter=1000)  # higher cap than the default 100, which may not converge on unscaled features
+model.fit(X_train, y_train)
+
+# Function to get user input and make predictions
+def predict_survival():
+    user_input = {}
+    user_input['Pclass'] = int(input("Enter passenger class (1, 2, or 3): "))
+    user_input['Sex'] = 1 if input("Enter passenger gender (Male/Female): ").lower() == 'female' else 0
+    user_input['Age'] = float(input("Enter passenger age: "))
+    user_input['SibSp'] = int(input("Enter number of siblings or spouses aboard: "))
+    user_input['Parch'] = int(input("Enter number of parents or children aboard: "))
+    user_input['Embarked'] = input("Enter port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton): ")
+    user_input['Embarked'] = {'S': 0, 'C': 1, 'Q': 2}.get(user_input['Embarked'].strip().upper(), 0)  # same codes as training; unknown input falls back to S
+    user_input['Fare'] = float(input("Enter passenger fare: "))
+
+    user_df = pd.DataFrame([user_input], columns=X.columns)
+
+
+    prediction = model.predict(user_df)
+
+
+    if prediction[0] == 1:
+        print("The passenger is predicted to survive.")
+    else:
+        print("The passenger is predicted not to survive.")
+
+
+    y_pred = model.predict(X_test)
+    accuracy = accuracy_score(y_test, y_pred)
+    print("Model Accuracy:", accuracy)
+
+predict_survival()
+
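The prediction step only makes sense if user input is encoded with the same codes the model saw during training. One way to keep the two in sync is to define the mappings once and validate input against them. The sketch below is hypothetical: `encode_passenger`, `SEX_MAP`, `EMBARKED_MAP`, and `FEATURE_ORDER` are illustrative names, and the column order assumes the standard Kaggle `train.csv` layout after the script's column drops.

```python
# Hypothetical shared constants; the codes match the training-time replace() call.
SEX_MAP = {"male": 0, "female": 1}
EMBARKED_MAP = {"S": 0, "C": 1, "Q": 2}
# Assumed feature order (standard Kaggle train.csv minus the dropped columns).
FEATURE_ORDER = ["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", "Embarked"]


def encode_passenger(pclass, sex, age, sibsp, parch, fare, embarked):
    """Validate raw inputs and return a feature dict using the training codes."""
    sex_key = sex.strip().lower()
    port_key = embarked.strip().upper()
    if pclass not in (1, 2, 3):
        raise ValueError(f"Pclass must be 1, 2, or 3, got {pclass!r}")
    if sex_key not in SEX_MAP:
        raise ValueError(f"Unknown sex: {sex!r}")
    if port_key not in EMBARKED_MAP:
        raise ValueError(f"Unknown port of embarkation: {embarked!r}")
    return {
        "Pclass": pclass,
        "Sex": SEX_MAP[sex_key],
        "Age": float(age),
        "SibSp": int(sibsp),
        "Parch": int(parch),
        "Fare": float(fare),
        "Embarked": EMBARKED_MAP[port_key],
    }


row = encode_passenger(3, "Female", 29.0, 0, 0, 7.9, "q")
print([row[name] for name in FEATURE_ORDER])
```

Raising on unrecognized input, instead of silently falling back to a default port, is a design choice in this sketch; the script itself substitutes a default.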