# DATASET DESCRIPTION
The dataset contains demographic information about passengers who were on board the Titanic. It includes both qualitative and quantitative features, which are used for descriptive and diagnostic data analysis against appropriate **KPIs** to answer analytical questions, and for building a machine learning model to predict survival chances.
## Number of Features and Observations:
 - The dataset has 714 observations and 8 features, a mix of qualitative and quantitative variables.
## Qualitative Features: **Categorical Variables**
 - **name**: The name of each passenger on board.
 - **sex**: Whether a passenger is male or female.
## Quantitative Features: **Numerical Variables**
 - **survived**: Whether a passenger survived (1) or not (0). It is stored as an integer, but it is a binary category and serves as the prediction target.
 - **pclass**: The ticket class a passenger traveled in (1, 2, or 3). It is stored as an integer, but it is an ordinal category.
 - **age**: The passenger's age in years at the time of the voyage.
 - **fare**: The fare paid by each passenger.
 - **sibsp**: The number of siblings or spouses each passenger was traveling with.
 - **parch**: The number of parents or children each passenger was traveling with.
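
One way to check the qualitative/quantitative split against the data itself is to group columns by dtype. The sketch below is illustrative only: it rebuilds the first five rows displayed by `titanic.head()` as a small `sample` frame (an assumption made so the snippet runs on its own) and uses `select_dtypes` to separate numeric columns from the rest.

```python
import pandas as pd

# First five rows as displayed in this notebook; a stand-in for the full data.
sample = pd.DataFrame({
    "survived": [0, 1, 1, 1, 0],
    "pclass": [3, 1, 3, 1, 3],
    "name": [
        "Braund, Mr. Owen Harris",
        "Cumings, Mrs. John Bradley (Florence Briggs Th...",
        "Heikkinen, Miss. Laina",
        "Futrelle, Mrs. Jacques Heath (Lily May Peel)",
        "Allen, Mr. William Henry",
    ],
    "sex": ["male", "female", "female", "female", "male"],
    "age": [22.0, 38.0, 26.0, 35.0, 35.0],
    "fare": [7.25, 71.2833, 7.925, 53.1, 8.05],
    "sibsp": [1, 1, 0, 1, 0],
    "parch": [0, 0, 0, 0, 0],
})

# Numeric dtypes (int, float) versus everything else (here: text columns).
quantitative = sample.select_dtypes(include="number").columns.tolist()
qualitative = sample.select_dtypes(exclude="number").columns.tolist()

print("Quantitative:", quantitative)
print("Qualitative:", qualitative)
```

A dtype check only shows how a column is stored. `survived` and `pclass` come out as numeric even though they encode categories, so the split still needs a manual review.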
## 📌 Overview
This project explores survival trends on the Titanic based on class, family status, and gender, using feature engineering and visual data analysis in Python.
It also builds a machine learning model to predict survival chances.
## 🔍 Key Analyses
- Survival by passenger class
- Impact of traveling with family
- Gender and class correlations
- Outlier detection (age)
- Family size distribution
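
The class and gender analyses listed above reduce to grouped means of `survived`. The following is a minimal sketch, assuming the ten rows displayed by `titanic.head(10)` as a stand-in for the full dataset; rates computed on ten rows are not meaningful and only illustrate the calls.

```python
import pandas as pd

# First ten rows as displayed by titanic.head(10); a stand-in for the full data.
sample = pd.DataFrame({
    "survived": [0, 1, 1, 1, 0, 0, 0, 1, 1, 1],
    "pclass": [3, 1, 3, 1, 3, 1, 3, 3, 2, 3],
    "sex": ["male", "female", "female", "female", "male",
            "male", "male", "female", "female", "female"],
})

# The mean of a 0/1 column is the share of survivors in each group.
survival_by_class = sample.groupby("pclass")["survived"].mean()
survival_by_sex = sample.groupby("sex")["survived"].mean()

# Survival rate for each class/sex combination (rows: pclass, columns: sex).
# Combinations with no passengers in the sample show up as NaN.
survival_by_class_and_sex = sample.pivot_table(
    index="pclass", columns="sex", values="survived", aggfunc="mean"
)

print(survival_by_class)
print(survival_by_sex)
print(survival_by_class_and_sex)
```

Running the same three statements on the full `titanic` frame (before `sex` is encoded as integers) would give the per-group survival rates for all 714 passengers.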


In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import joblib
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

In [2]:
titanic = pd.read_csv('titanic2.csv')
titanic.head(10)

Unnamed: 0,survived,pclass,name,sex,age,fare,sibsp,parch
0,0,3,"Braund, Mr. Owen Harris",male,22.0,7.25,1,0
1,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,71.2833,1,0
2,1,3,"Heikkinen, Miss. Laina",female,26.0,7.925,0,0
3,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,53.1,1,0
4,0,3,"Allen, Mr. William Henry",male,35.0,8.05,0,0
5,0,1,"McCarthy, Mr. Timothy J",male,54.0,51.8625,0,0
6,0,3,"Palsson, Master. Gosta Leonard",male,2.0,21.075,3,1
7,1,3,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",female,27.0,11.1333,0,2
8,1,2,"Nasser, Mrs. Nicholas (Adele Achem)",female,14.0,30.0708,1,0
9,1,3,"Sandstrom, Miss. Marguerite Rut",female,4.0,16.7,1,1


In [5]:
titanic.shape

(714, 8)

In [6]:
titanic.describe()

Unnamed: 0,survived,pclass,age,fare,sibsp,parch
count,714.0,714.0,714.0,714.0,714.0,714.0
mean,0.406162,2.236695,29.699118,34.694514,0.512605,0.431373
std,0.49146,0.83825,14.526497,52.91893,0.929783,0.853289
min,0.0,1.0,0.42,0.0,0.0,0.0
25%,0.0,1.0,20.125,8.05,0.0,0.0
50%,0.0,2.0,28.0,15.7417,0.0,0.0
75%,1.0,3.0,38.0,33.375,1.0,1.0
max,1.0,3.0,80.0,512.3292,5.0,6.0
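
The age quartiles in this summary are enough to sketch the outlier check mentioned in the key analyses. The example below assumes the common 1.5 × IQR (Tukey) rule, which the notebook itself does not specify; `iqr_fences` is a hypothetical helper name.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) fences for the given quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Age quartiles and extremes copied from the describe() output above.
age_q1, age_q3 = 20.125, 38.0
age_min, age_max = 0.42, 80.0

lower, upper = iqr_fences(age_q1, age_q3)
print(lower, upper)  # -6.6875 64.8125

# Compare the observed extremes with the fences.
has_low_outliers = age_min < lower
has_high_outliers = age_max > upper
print(has_low_outliers, has_high_outliers)  # False True
```

Under this rule the youngest passenger (0.42) is inside the lower fence, while the oldest (80.0) lies above the upper fence. On the full frame, `titanic[(titanic["age"] < lower) | (titanic["age"] > upper)]` would list the flagged rows.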


In [7]:
titanic.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 714 entries, 0 to 713
Data columns (total 8 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   survived  714 non-null    int64  
 1   pclass    714 non-null    int64  
 2   name      714 non-null    object 
 3   sex       714 non-null    object 
 4   age       714 non-null    float64
 5   fare      714 non-null    float64
 6   sibsp     714 non-null    int64  
 7   parch     714 non-null    int64  
dtypes: float64(2), int64(4), object(2)
memory usage: 44.8+ KB


### Feature Engineering and Data Transformation


In [3]:
# Family size = siblings/spouses + parents/children + the passenger themselves
titanic['familysize'] = titanic['sibsp'] + titanic['parch'] + 1

In [4]:
titanic.head(5)

Unnamed: 0,survived,pclass,name,sex,age,fare,sibsp,parch,familysize
0,0,3,"Braund, Mr. Owen Harris",male,22.0,7.25,1,0,2
1,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,71.2833,1,0,2
2,1,3,"Heikkinen, Miss. Laina",female,26.0,7.925,0,0,1
3,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,53.1,1,0,2
4,0,3,"Allen, Mr. William Henry",male,35.0,8.05,0,0,1
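
With `familysize` in place, the family-related analyses can be sketched as a value count and a grouped mean. This is a minimal illustration on the ten rows displayed by `titanic.head(10)`; the `is_alone` flag is a hypothetical helper column that does not appear in the notebook.

```python
import pandas as pd

# survived, sibsp and parch for the first ten rows shown by titanic.head(10).
sample = pd.DataFrame({
    "survived": [0, 1, 1, 1, 0, 0, 0, 1, 1, 1],
    "sibsp": [1, 1, 0, 1, 0, 0, 3, 0, 1, 1],
    "parch": [0, 0, 0, 0, 0, 0, 1, 2, 0, 1],
})
sample["familysize"] = sample["sibsp"] + sample["parch"] + 1

# Family size distribution: number of passengers per family size.
familysize_counts = sample["familysize"].value_counts().sort_index()
print(familysize_counts)

# Hypothetical flag: True when the passenger traveled with no relatives.
sample["is_alone"] = sample["familysize"] == 1
survival_by_alone = sample.groupby("is_alone")["survived"].mean()
print(survival_by_alone)
```

Because `familysize` is an exact linear combination of `sibsp` and `parch`, it is mainly useful for grouping and plotting; a linear model that already has both inputs gains no new information from it.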


In [5]:
# Encode sex as an integer: male -> 1, female -> 0
titanic['sex'] = titanic['sex'].map({'male': 1, 'female': 0})

In [6]:
titanic.head(5)

Unnamed: 0,survived,pclass,name,sex,age,fare,sibsp,parch,familysize
0,0,3,"Braund, Mr. Owen Harris",1,22.0,7.25,1,0,2
1,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",0,38.0,71.2833,1,0,2
2,1,3,"Heikkinen, Miss. Laina",0,26.0,7.925,0,0,1
3,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",0,35.0,53.1,1,0,2
4,0,3,"Allen, Mr. William Henry",1,35.0,8.05,0,0,1
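
`Series.map` with a dict returns `NaN` for any value that is not a key, so a stray spelling such as `Female` would silently become a missing value. The check below is a minimal sketch on hypothetical values: the `raw` series is made up for illustration and is not taken from the dataset.

```python
import pandas as pd

mapping = {"male": 1, "female": 0}

# Hypothetical raw values, including a capitalized label and a missing entry.
raw = pd.Series(["male", "female", "Female", None])

# Values that are not keys of the mapping come back as NaN.
encoded = raw.map(mapping)
unmapped = raw[encoded.isna()]
print(unmapped.tolist())

# Normalizing case and whitespace first recovers the capitalized label;
# the genuinely missing entry stays missing.
cleaned = raw.str.strip().str.lower().map(mapping)
print(int(cleaned.isna().sum()))
```

Counting `NaN` values right after the mapping is a cheap guard before the column is passed to a model, since scikit-learn's `LinearRegression` raises an error when the inputs contain missing values.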


## Model Selection, Training, and Testing (Machine Learning)

In [7]:
# Features and target
features = ["pclass", "sex", "age", "fare", "sibsp", "parch", "familysize"]
target = "survived"

X = titanic[features]
y = titanic[target]

# Train/test split (80/20, fixed seed for reproducibility)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Linear regression model, used here as a linear probability model:
# its raw outputs are continuous and are not bounded to [0, 1]
model = LinearRegression()
model.fit(X_train, y_train)

# Predict continuous values
y_pred_continuous = model.predict(X_test)

# Convert continuous predictions to binary classes with a 0.5 threshold
y_pred = np.where(y_pred_continuous >= 0.5, 1, 0)

# Evaluation
print("Accuracy:", accuracy_score(y_test, y_pred))


Accuracy: 0.7552447552447552
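
This accuracy corresponds to 108 correct predictions out of the 143 test rows (20% of 714, rounded up). For context, the mean of `survived` in the full data is about 0.406, so always predicting "did not survive" would be right roughly 59% of the time overall. Accuracy alone does not show which kind of error the model makes; the imported `confusion_matrix` and `classification_report` can. The sketch below uses hypothetical labels and scores, not the notebook's actual predictions.

```python
import numpy as np
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Hypothetical labels and continuous model outputs, for illustration only.
y_true = np.array([0, 1, 1, 0, 1, 0, 0, 1])
y_score = np.array([0.2, 0.8, 0.4, 0.1, 0.9, 0.6, 0.3, 0.7])

# Same thresholding rule as in the training cell: scores >= 0.5 become class 1.
y_hat = np.where(y_score >= 0.5, 1, 0)

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_true, y_hat)
acc = accuracy_score(y_true, y_hat)

print(cm)
print("Accuracy:", acc)
print(classification_report(y_true, y_hat, target_names=["not survived", "survived"]))
```

In the notebook, the same calls would take `y_test` and `y_pred` in place of the hypothetical arrays.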


In [8]:
joblib.dump(model, "titanic_model.joblib")

['titanic_model.joblib']
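
A saved model can be reloaded with `joblib.load` and applied to new rows, provided the columns match the training features in name and order. The round trip below is a sketch under stated assumptions: it fits a small model on the ten rows displayed by `titanic.head(10)` (with `sex` already encoded), writes it to a temporary directory instead of the working directory, and scores one made-up passenger.

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

features = ["pclass", "sex", "age", "fare", "sibsp", "parch", "familysize"]

# First ten rows shown earlier, with sex encoded as male=1, female=0.
sample = pd.DataFrame({
    "survived": [0, 1, 1, 1, 0, 0, 0, 1, 1, 1],
    "pclass": [3, 1, 3, 1, 3, 1, 3, 3, 2, 3],
    "sex": [1, 0, 0, 0, 1, 1, 1, 0, 0, 0],
    "age": [22.0, 38.0, 26.0, 35.0, 35.0, 54.0, 2.0, 27.0, 14.0, 4.0],
    "fare": [7.25, 71.2833, 7.925, 53.1, 8.05, 51.8625, 21.075, 11.1333, 30.0708, 16.7],
    "sibsp": [1, 1, 0, 1, 0, 0, 3, 0, 1, 1],
    "parch": [0, 0, 0, 0, 0, 0, 1, 2, 0, 1],
})
sample["familysize"] = sample["sibsp"] + sample["parch"] + 1

model = LinearRegression().fit(sample[features], sample["survived"])

# Save and reload the model; a temporary directory keeps the example tidy.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "titanic_model.joblib")
    joblib.dump(model, path)
    loaded = joblib.load(path)

# Hypothetical new passenger, built with the same columns in the same order.
new_passenger = pd.DataFrame([{
    "pclass": 3, "sex": 1, "age": 30.0, "fare": 8.05,
    "sibsp": 0, "parch": 0, "familysize": 1,
}])[features]

score = loaded.predict(new_passenger)[0]
label = int(score >= 0.5)  # same 0.5 threshold as in the training cell
print(score, label)
```

The reloaded object returns the raw continuous output, so the 0.5 threshold has to be applied again after loading; it is not stored inside the saved model.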