# **Task 1: Building a Model to Predict Whether a Passenger on the Titanic Survived**

### **Importing Libraries**

In [None]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score


### **Importing Titanic dataset**

In [None]:
data=pd.read_csv("/content/Titanic-Dataset.csv")

### **Data Preprocessing**

Dropping all unwanted columns from the dataset

In [None]:
df=data.drop(["PassengerId","Name","Ticket","SibSp","Parch","Embarked","Cabin"],axis=1)
df.head()

|   | Survived | Pclass | Sex | Age | Fare |
|---|---|---|---|---|---|
| 0 | 0 | 3 | male | 22.0 | 7.25 |
| 1 | 1 | 1 | female | 38.0 | 71.2833 |
| 2 | 1 | 3 | female | 26.0 | 7.925 |
| 3 | 1 | 1 | female | 35.0 | 53.1 |
| 4 | 0 | 3 | male | 35.0 | 8.05 |


In [None]:
missing_values = df.isnull().sum()
missing_values

| Column | Missing values |
|---|---|
| Survived | 0 |
| Pclass | 0 |
| Sex | 0 |
| Age | 177 |
| Fare | 0 |


*There are 177 missing values in the Age column.*

In [None]:
# Assign the result back to the column. Calling fillna(..., inplace=True) on a
# column selection triggers a pandas warning, and the behavior changes in pandas 3.0.
df["Age"] = df["Age"].fillna(df["Age"].median())
df['Sex'] = df['Sex'].map({'male': 0, 'female': 1})


*Filled the missing ages with the median age and mapped male to 0 and female to 1.*
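
The two preprocessing steps can be checked on a small frame before trusting them on the full data. The sketch below uses a hypothetical four-row frame named `toy` (made-up values, not rows from the Titanic file) and assigns the filled column back rather than relying on `inplace=True`.

```python
import pandas as pd

# Hypothetical toy data for illustration only; not taken from the Titanic file.
toy = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, None, 30.0, 40.0],
})

# Median of the known ages (22, 30, 40) is 30, so the missing age becomes 30.
toy["Age"] = toy["Age"].fillna(toy["Age"].median())

# Encode the two categories as integers, as in the notebook.
toy["Sex"] = toy["Sex"].map({"male": 0, "female": 1})

print(toy)
```

Note that `map` with a dictionary turns any value not listed in it (for example a differently capitalized label) into NaN, so rerunning `isnull().sum()` after the mapping is a cheap sanity check.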

In [None]:
df.head()

|   | Survived | Pclass | Sex | Age | Fare |
|---|---|---|---|---|---|
| 0 | 0 | 3 | 0 | 22.0 | 7.25 |
| 1 | 1 | 1 | 1 | 38.0 | 71.2833 |
| 2 | 1 | 3 | 1 | 26.0 | 7.925 |
| 3 | 1 | 1 | 1 | 35.0 | 53.1 |
| 4 | 0 | 3 | 0 | 35.0 | 8.05 |


In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 5 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Survived  891 non-null    int64  
 1   Pclass    891 non-null    int64  
 2   Sex       891 non-null    int64  
 3   Age       891 non-null    float64
 4   Fare      891 non-null    float64
dtypes: float64(2), int64(3)
memory usage: 34.9 KB


### **Exploratory Data Analysis**

In [None]:
df.describe()

|   | Survived | Pclass | Sex | Age | Fare |
|---|---|---|---|---|---|
| count | 891.0 | 891.0 | 891.0 | 891.0 | 891.0 |
| mean | 0.383838 | 2.308642 | 0.352413 | 29.361582 | 32.204208 |
| std | 0.486592 | 0.836071 | 0.47799 | 13.019697 | 49.693429 |
| min | 0.0 | 1.0 | 0.0 | 0.42 | 0.0 |
| 25% | 0.0 | 2.0 | 0.0 | 22.0 | 7.9104 |
| 50% | 0.0 | 3.0 | 0.0 | 28.0 | 14.4542 |
| 75% | 1.0 | 3.0 | 1.0 | 35.0 | 31.0 |
| max | 1.0 | 3.0 | 1.0 | 80.0 | 512.3292 |


*These are the summary statistics of the columns in the dataset.*

In [None]:
Survived = df['Survived'].value_counts()
Survived


| Survived | count |
|---|---|
| 0 | 549 |
| 1 | 342 |


*In this dataset, 0 and 1 represent passengers who did not survive and passengers who survived, respectively. Most passengers did not survive (549); only 342 survived.*
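
The class counts also give a useful reference point for the accuracy reported later. As a rough sketch, the snippet below turns the two counts from the output above into shares and computes the accuracy of a hypothetical baseline that always predicts the majority class; the variable names are illustrative.

```python
# Class counts taken from the value_counts() output above.
counts = {0: 549, 1: 342}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

# Hypothetical baseline: always predict the most frequent class.
majority_label = max(counts, key=counts.get)
baseline_accuracy = counts[majority_label] / total

print(f"Total passengers: {total}")
for label, share in shares.items():
    print(f"Class {label}: {share:.1%}")
print(f"Majority-class baseline accuracy: {baseline_accuracy:.1%}")
```

This baseline is computed on all 891 rows, so it is only an approximate yardstick for a score measured on the held-out test set.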

### **Creating a Model**

In [None]:
X = df.drop("Survived", axis=1)
y = df["Survived"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
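
To see how many rows end up in each part of the split, the arithmetic can be sketched without scikit-learn. This assumes that a fractional `test_size` is rounded up to a whole number of test rows and that the remainder goes to training, which is how `train_test_split` behaves as far as I am aware.

```python
import math

n_samples = 891   # rows reported by df.info()
test_size = 0.2   # fraction passed to train_test_split

# Assumption: the test count is rounded up; the remaining rows form the training set.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(f"train rows: {n_train}, test rows: {n_test}")
```

Comparing these numbers with `X_train.shape` and `X_test.shape` in the notebook confirms the split.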


In [None]:
model = LogisticRegression()
model.fit(X_train, y_train)

In [None]:
y_pred = model.predict(X_test)
print(f'Accuracy: {accuracy_score(y_test, y_pred)}')
print(f'Precision: {precision_score(y_test, y_pred)}')
print(f'Recall: {recall_score(y_test, y_pred)}')

Accuracy: 0.8044692737430168
Precision: 0.782608695652174
Recall: 0.7297297297297297


### **Interpretation of Results**

*1. An accuracy of 0.8045 means that the model correctly predicted whether a passenger survived or not about 80.45% of the time.*

*2. A precision of 0.7826 means that when the model predicts a passenger survived, it is correct 78.26% of the time.*

*3. A recall of 0.7297 means that the model correctly identified 72.97% of the actual survivors.*
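
The three definitions above can be written out directly from confusion-matrix counts. The notebook does not print a confusion matrix, so the counts below are an assumption: they are inferred as the values consistent with the reported scores on a 179-row test set, and should be checked against `sklearn.metrics.confusion_matrix(y_test, y_pred)`.

```python
# Assumed counts, inferred from the reported scores; not printed by the notebook.
tp = 54   # predicted survived, actually survived
fp = 15   # predicted survived, actually did not survive
fn = 20   # predicted did not survive, actually survived
tn = 90   # predicted did not survive, actually did not survive

accuracy = (tp + tn) / (tp + fp + fn + tn)   # share of all predictions that are correct
precision = tp / (tp + fp)                   # share of predicted survivors that are correct
recall = tp / (tp + fn)                      # share of actual survivors that are found

print(f"Accuracy: {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
```

Under these assumed counts, the model misses 20 of 74 survivors and raises 15 false alarms among 105 non-survivors.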

### **Conclusion**

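*The model achieved an accuracy of approximately 80.45%, a precision of 78.26%, and a recall of 72.97% on the test set. For comparison, always predicting "did not survive" would be correct for 549 of the 891 passengers (about 61.6%) in the full dataset, so the model performs clearly better than that simple baseline, although it still misses roughly 27% of the actual survivors.*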