In [3]:
import pandas as pd

In [4]:
titanic_data = pd.read_csv("Titanic-Dataset.csv")

# Basic Information

In [63]:
titanic_data.shape


(891, 12)

In [64]:
titanic_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   PassengerId  891 non-null    int64  
 1   Survived     891 non-null    int64  
 2   Pclass       891 non-null    int64  
 3   Name         891 non-null    object 
 4   Sex          891 non-null    object 
 5   Age          714 non-null    float64
 6   SibSp        891 non-null    int64  
 7   Parch        891 non-null    int64  
 8   Ticket       891 non-null    object 
 9   Fare         891 non-null    float64
 10  Cabin        204 non-null    object 
 11  Embarked     889 non-null    object 
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB


This DataFrame contains 891 passenger records with 12 columns covering demographic, travel, and survival information. Nine columns are complete; Age (177 missing, about 20%), Cabin (687 missing, about 77%), and Embarked (2 missing) have gaps, with Cabin too sparse to use without special handling. The mix of numerical and categorical variables makes the dataset suitable for exploratory analysis and survival prediction tasks.
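The missing-value counts can also be read off directly rather than inferred from `info()`. The following is a minimal sketch on a hypothetical four-row frame (`toy` and its values are invented for illustration); the same two calls should work on `titanic_data`.

```python
import pandas as pd

# Hypothetical four-row frame standing in for the Titanic data.
toy = pd.DataFrame({
    "Age": [22.0, None, 26.0, 35.0],
    "Cabin": [None, "C85", None, None],
    "Embarked": ["S", "C", None, "S"],
})

# isna() marks missing cells; sum() counts them, mean() gives the fraction.
missing_count = toy.isna().sum()
missing_pct = toy.isna().mean() * 100

summary = pd.DataFrame({"missing": missing_count, "percent": missing_pct})
print(summary)
```

Sorting `summary` by `percent` is a quick way to see which columns need imputation and which are candidates for dropping.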

In [65]:
titanic_data.head()

,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


# Special information


In [66]:
titanic_data[["Age","Fare"]].mean()

Age     29.699118
Fare    32.204208
dtype: float64

In [67]:
titanic_data[["Age","Fare"]].median()

Age     28.0000
Fare    14.4542
dtype: float64

In [68]:
titanic_data[["Age","Fare"]].mode()

    Age  Fare
0  24.0  8.05


In [69]:
titanic_data[["Age","Fare"]].describe()

              Age        Fare
count  714.000000  891.000000
mean    29.699118   32.204208
std     14.526497   49.693429
min      0.420000    0.000000
25%     20.125000    7.910400
50%     28.000000   14.454200
75%     38.000000   31.000000
max     80.000000  512.329200


Description:

The average age of passengers is approximately 29.7 years, and the middle 50% fall between about 20 and 38, so the passengers were relatively young overall.

The age distribution is roughly symmetric: the mean (29.7) is only slightly above the median (28.0), which suggests at most a mild right skew.

The minimum age is 0.42 years (about 5 months), and the oldest passenger is 80 years old.

The Age column has missing values (714 non-null entries vs. 891 for Fare), so 177 ages are unavailable and the Age statistics are computed over the remaining 714 passengers.

For Fare, the mean is 32.20 and the median is 14.45, a large gap indicating a right-skewed (positively skewed) distribution.

Fare values range from 0 (free tickets) to over 512, showing a wide spread and the presence of a few very high fares.

The high standard deviation (49.69), which exceeds the mean, also supports the presence of large variability in fare prices.
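The skew and outlier claims above can be checked numerically. This is a sketch under stated assumptions: `fares` is a small invented sample (not the real column), and the 1.5 × IQR fence is one common outlier convention, not something the notebook itself uses.

```python
import pandas as pd

# Hypothetical fares: mostly cheap tickets plus one very expensive one.
fares = pd.Series([7.25, 7.9, 8.05, 13.0, 14.45, 26.0, 31.0, 512.33])

mean_fare = fares.mean()
median_fare = fares.median()
skewness = fares.skew()  # positive values indicate a right-skewed distribution

# Flag values above the upper IQR fence as potential outliers.
q1, q3 = fares.quantile([0.25, 0.75])
upper_fence = q3 + 1.5 * (q3 - q1)
outliers = fares[fares > upper_fence]

print(f"mean={mean_fare:.2f}, median={median_fare:.2f}, skew={skewness:.2f}")
print(outliers)
```

Running the same calls on `titanic_data["Fare"]` would give a direct skewness figure to accompany the mean/median comparison.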

In [70]:
titanic_data.value_counts("Survived")

Survived
0    549
1    342
Name: count, dtype: int64

In [71]:
titanic_data['Embarked'].value_counts(dropna = False, normalize=True) *100

Embarked
S      72.278339
C      18.855219
Q       8.641975
NaN     0.224467
Name: proportion, dtype: float64

In [72]:
titanic_data.value_counts(["Sex","Survived"],normalize = True) *100

Sex     Survived
male    0           52.525253
female  1           26.150393
male    1           12.233446
female  0            9.090909
Name: proportion, dtype: float64

Note that these percentages are shares of all 891 passengers, not within-group rates: men who died make up 52.5% of all passengers, about twice the 26.2% share made up by women who survived.

In [73]:
titanic_data[titanic_data["Sex"]=="female"].value_counts("Survived",normalize=True) *100

Survived
1    74.203822
0    25.796178
Name: proportion, dtype: float64

In [9]:
# Survival counts (not rates) for male passengers who embarked at S
titanic_data[
    (titanic_data.Embarked=="S") & 
    (titanic_data["Sex"]=="male")
    ].value_counts("Survived")

Survived
0    364
1     77
Name: count, dtype: int64
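The output above gives counts, while the question in the cell asks for a rate. A minimal sketch of the conversion, reusing the two counts printed above (the `counts` Series is rebuilt by hand here so the snippet runs without the CSV):

```python
import pandas as pd

# Counts copied from the output above: 364 did not survive, 77 survived.
counts = pd.Series({0: 364, 1: 77}, name="count")
counts.index.name = "Survived"

# Divide by the group total to turn counts into percentages.
rates = counts / counts.sum() * 100
print(rates.round(2))
```

On the full DataFrame, passing `normalize=True` to `value_counts`, as the notebook does for female passengers, should yield the same proportions in one step.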

In [11]:
pd.crosstab(
    titanic_data.Sex,
    titanic_data.Survived,
    margins= True,
    margins_name= "Total"
        
)

Survived    0    1  Total
Sex
female     81  233    314
male      468  109    577
Total     549  342    891
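To compare survival between the sexes, the counts in this table need to be divided by each row total. The sketch below rebuilds a two-column frame from the four counts in the table (the `passengers` frame is a reconstruction for illustration, not the original data) and uses the `normalize="index"` option of `pd.crosstab`.

```python
import pandas as pd

# Counts taken from the Sex x Survived table above.
counts = {
    ("female", 0): 81,
    ("female", 1): 233,
    ("male", 0): 468,
    ("male", 1): 109,
}

# Expand the counts into one row per passenger.
rows = [
    {"Sex": sex, "Survived": survived}
    for (sex, survived), n in counts.items()
    for _ in range(n)
]
passengers = pd.DataFrame(rows)

# normalize="index" divides each row by its own total, giving within-sex rates.
rates = pd.crosstab(passengers["Sex"], passengers["Survived"], normalize="index") * 100
print(rates.round(1))
```

Read this way, about 74% of women survived while about 81% of men died, which is the within-group comparison the joint percentages do not show.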


In [16]:
pd.crosstab(
    [titanic_data.Embarked,titanic_data.Pclass],
    [titanic_data.Sex,titanic_data.Survived],
    margins = True
)

Sex              female female   male   male    All
Survived              0      1      0      1
Embarked Pclass
C        1.0          1     42     25     17     85
C        2.0          0      7      8      2     17
C        3.0          8     15     33     10     66
Q        1.0          0      1      1      0      2
Q        2.0          0      2      1      0      3
Q        3.0          9     24     36      3     72
S        1.0          2     46     51     28    127
S        2.0          6     61     82     15    164
S        3.0         55     33    231     34    353
All                  81    231    468    109    889
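The grand total here is 889 rather than 891, and female survivors number 231 rather than 233, because the two passengers with a missing Embarked value do not appear in the table. One way to keep such rows visible is to label the missing port before tabulating. This is a sketch on a hypothetical five-row frame; the `"Unknown"` label is an arbitrary choice.

```python
import pandas as pd

# Hypothetical five-row frame; one passenger has no recorded port.
toy = pd.DataFrame({
    "Embarked": ["S", "C", None, "S", "Q"],
    "Survived": [0, 1, 1, 0, 1],
})

# Replace the missing port with an explicit label so the row is counted.
port = toy["Embarked"].fillna("Unknown")

table = pd.crosstab(port, toy["Survived"], margins=True)
print(table)
```

With the label in place, the margin total matches the number of rows in the frame, which makes discrepancies like 889 vs. 891 easier to spot.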
