# Titanic Survival Prediction using Logistic Regression

## Objective
The goal of this project is to predict whether a passenger survived the Titanic disaster
based on basic demographic and ticket-related features.

## Dataset
- Source: Kaggle Titanic Dataset
- Rows: 891 passengers
- Target variable: `Survived` (0 = No, 1 = Yes)

## Model Used
- Logistic Regression (probabilistic classification model)


## Import Required Libraries

We use Pandas for data handling and Scikit-Learn for building and evaluating
the machine learning model.

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score





                   

## Load Dataset

The dataset is loaded directly from Kaggle’s input directory.

In [2]:
# Read the CSV from the Kaggle input directory and preview the first five rows
data = pd.read_csv('/kaggle/input/titanic-dataset/Titanic-Dataset.csv')

data.head()


|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | NaN | S |


## Data Preprocessing

Steps performed:
- Handle missing values in the `Age` column using the median
- Encode categorical variable `Sex` into numerical format
- Select relevant features for modeling


In [3]:
# Part 1 - Logistic Regression
# Fill missing ages with the median age
data.fillna({'Age': data['Age'].median()}, inplace=True)

# Encode sex as a number: male -> 0, female -> 1
data['Sex'] = data['Sex'].map({'male': 0, 'female': 1})
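
As a quick sanity check on the two preprocessing steps, the sketch below repeats them on a small made-up frame and counts the missing values that remain. The `toy` frame and its values are hypothetical and stand in for the real dataset. Note that `Series.map` with a dictionary returns `NaN` for any value that is not a key, so a check like this also catches unexpected category labels.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the Titanic data (values are made up).
toy = pd.DataFrame({
    'Age': [22.0, np.nan, 26.0, 35.0, np.nan],
    'Sex': ['male', 'female', 'female', 'male', 'female'],
})

# Same two steps as the notebook cell: median imputation, then binary encoding.
age_median = toy['Age'].median()  # median() skips NaN by default
toy['Age'] = toy['Age'].fillna(age_median)
toy['Sex'] = toy['Sex'].map({'male': 0, 'female': 1})

# Count the missing values left in each column; both counts should be zero.
missing_after = toy[['Age', 'Sex']].isna().sum()
print(missing_after)
```

On the real data, the same check would be `data[['Age', 'Sex']].isna().sum()`.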


## Feature Selection

The following features are used:
- `Pclass`: Passenger class
- `Sex`: Gender (encoded)
- `Age`: Passenger age
- `Fare`: Ticket fare

In [4]:
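# Select the feature columns and the target
X_log = data[['Pclass', 'Sex', 'Age', 'Fare']]
y = data['Survived']

# Hold out 20% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X_log, y, test_size=0.2, random_state=42)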
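
To make the effect of `test_size=0.2` concrete, here is a minimal sketch on synthetic data with the same number of rows as the dataset (891). The arrays `X_demo` and `y_demo` are random stand-ins, not Titanic data. The `stratify` argument is an optional addition that the notebook does not use; it keeps the class balance similar in both subsets.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the same row count as the dataset (891 rows, 4 features).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(891, 4))
y_demo = rng.integers(0, 2, size=891)

# Same split settings as the notebook, plus stratify (an assumption, not in the original).
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# With 891 rows, a 20% test split holds out 179 rows and keeps 712 for training.
print(len(X_tr), len(X_te))
```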




## Model Training

The data has been split into a training set (80%) and a test set (20%).
A Logistic Regression model is fitted on the training set and then used to predict
a survival label and a survival probability for each test passenger.
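
For reference, the standard logistic regression model with the four selected features can be written as below. The coefficient symbols $\beta_0, \dots, \beta_4$ are notation introduced here for illustration; they are not values taken from the notebook.

```latex
P(\text{Survived} = 1 \mid x) = \frac{1}{1 + e^{-z}}, \qquad
z = \beta_0 + \beta_1\,\text{Pclass} + \beta_2\,\text{Sex} + \beta_3\,\text{Age} + \beta_4\,\text{Fare}
```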


In [5]:
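# Fit a logistic regression model with default settings
log_model = LogisticRegression()
log_model.fit(X_train, y_train)

# Predict class labels and the probability of class 1 (survived) for the test set
y_pred_log = log_model.predict(X_test)
y_pred_proba = log_model.predict_proba(X_test)[:, 1]  # probability of survival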
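
The notebook imports `accuracy_score` but never calls it. The sketch below shows one way such an evaluation could look, using synthetic, easily separable data rather than the Titanic set, so the numbers it prints say nothing about the real model. All `demo`-style names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: the label depends linearly on the two features.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(400, 2))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

demo_model = LogisticRegression()
demo_model.fit(X_tr, y_tr)

# Fraction of test rows whose predicted label matches the true label.
demo_accuracy = accuracy_score(y_te, demo_model.predict(X_te))

# Accuracy of always predicting the majority class, as a point of comparison.
baseline_accuracy = max(y_te.mean(), 1 - y_te.mean())

print(f"model accuracy:    {demo_accuracy:.3f}")
print(f"majority baseline: {baseline_accuracy:.3f}")
```

In this notebook, the equivalent call would be `accuracy_score(y_test, y_pred_log)`.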



## Predictions

The model outputs:
- Binary survival prediction (0 or 1)
- Probability of survival

Below are sample predictions for test passengers.

In [6]:
# Combine PassengerId with predictions
predictions_log = pd.DataFrame({
    'PassengerId': data.loc[X_test.index, 'PassengerId'],
    'PredictedSurvival': y_pred_log,
    'SurvivalProbability': y_pred_proba
})

print(predictions_log.head(10))

     PassengerId  PredictedSurvival  SurvivalProbability
709          710                  0             0.102069
439          440                  0             0.221114
840          841                  0             0.121128
720          721                  1             0.865548
39            40                  1             0.654471
290          291                  1             0.917544
300          301                  1             0.570115
333          334                  0             0.133632
208          209                  1             0.642055
136          137                  1             0.925769
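
The two outputs are linked: for a binary problem, `predict` returns 1 when the predicted probability of class 1 is above 0.5, which matches the sample rows above (for example, 0.570115 maps to 1 and 0.221114 maps to 0). The sketch below checks this relationship on synthetic data; the data and the `demo` names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data with some label noise (not the Titanic dataset).
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 3))
noise = rng.normal(scale=0.5, size=200)
y_demo = (X_demo[:, 0] - X_demo[:, 2] + noise > 0).astype(int)

demo_model = LogisticRegression().fit(X_demo, y_demo)

proba_pos = demo_model.predict_proba(X_demo)[:, 1]  # P(class 1) per row
labels = demo_model.predict(X_demo)                 # hard 0/1 labels

# Thresholding the probability at 0.5 reproduces the hard labels.
thresholded = (proba_pos > 0.5).astype(int)
print(np.array_equal(labels, thresholded))
```

A different cut-off can be applied to the probabilities directly, which is useful when false positives and false negatives have different costs.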


## Conclusion

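- Logistic Regression produces both a survival label and a survival probability for each test passenger; test-set accuracy was not computed in this notebook
- The model is interpretable and suitable as a baseline classifier
- Sex and passenger class are expected to be strong predictors of survival, which can be checked by inspecting the fitted coefficients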
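
One way to examine which features drive the predictions is to look at the fitted coefficients. The sketch below does this on a made-up data frame whose outcome rule is chosen by hand (survival odds rise when `Sex == 1` and fall with `Pclass`), so its coefficients illustrate the technique only and are not results for the Titanic data. The `max_iter=1000` setting is an assumption added to give the solver room to converge on unscaled features.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical data with the same column names as the notebook's features.
rng = np.random.default_rng(0)
n = 500
demo = pd.DataFrame({
    'Pclass': rng.integers(1, 4, size=n),
    'Sex': rng.integers(0, 2, size=n),
    'Age': rng.uniform(1, 80, size=n),
    'Fare': rng.uniform(5, 100, size=n),
})

# Made-up outcome rule: Sex raises the log-odds, Pclass lowers them.
logit = -0.5 + 2.5 * demo['Sex'] - 0.8 * (demo['Pclass'] - 2)
y_demo = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)

demo_model = LogisticRegression(max_iter=1000).fit(demo, y_demo)

# One coefficient per feature; exp(coef) is the odds ratio per unit increase.
coefs = pd.Series(demo_model.coef_[0], index=demo.columns)
odds_ratios = np.exp(coefs)
print(pd.DataFrame({'coef': coefs, 'odds_ratio': odds_ratios}))
```

Coefficients are expressed per unit of each feature, so comparing magnitudes across features with different scales (such as `Sex` and `Fare`) is only meaningful after standardizing the inputs.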

This notebook demonstrates an end-to-end machine learning workflow:
data preprocessing → modeling → prediction.
