# Titanic Dataset Project


#### PassengerId --> Unique ID of the passenger

#### Survived --> Survived (1) or died (0)

#### Pclass --> Passenger's class (1st, 2nd, or 3rd)

#### Name --> Passenger's name

#### Sex --> Passenger's sex

#### Age --> Passenger's age

#### SibSp --> Number of siblings/spouses aboard the Titanic

#### Parch --> Number of parents/children aboard the Titanic

#### Ticket --> Ticket number

#### Fare --> Fare paid for ticket

#### Cabin --> Cabin number

#### Embarked --> Port where the passenger boarded the ship (C = Cherbourg, Q = Queenstown, S = Southampton)


### Import pandas as pd

In [1]:
#import pandas 

import pandas as pd

### Read titanic dataset.csv as a dataframe called titanic_df.

In [2]:
#load data

titanic_df = pd.read_csv('titanic dataset.csv')

### Check the head of the DataFrame.

In [3]:
titanic_df.head()

,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


### Explore The Data Using Pandas Methods

In [4]:
# Explore the data with pandas methods and attributes: info(), columns, dtypes, isna()
# info() prints its report directly and returns None, so it is called on its own line
titanic_df.info()
print("=================================")
print(f"DataFrame columns :\n{titanic_df.columns}\n=================================")
print(f"The type of each column :\n{titanic_df.dtypes}\n=================================")
print(f"Number of missing values in each column :\n{titanic_df.isna().sum()}\n=================================")

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   PassengerId  891 non-null    int64  
 1   Survived     891 non-null    int64  
 2   Pclass       891 non-null    int64  
 3   Name         891 non-null    object 
 4   Sex          891 non-null    object 
 5   Age          714 non-null    float64
 6   SibSp        891 non-null    int64  
 7   Parch        891 non-null    int64  
 8   Ticket       891 non-null    object 
 9   Fare         891 non-null    float64
 10  Cabin        204 non-null    object 
 11  Embarked     889 non-null    object 
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB
DataFrame columns :
Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
       'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
      dtype='object')
The type of each column :
PassengerId      int64
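
The missing-value counts are easier to compare as percentages. The sketch below is a minimal illustration on a hypothetical four-row frame (`demo` and its values are assumptions, not the Titanic data), followed by the same arithmetic applied to the non-null counts that `info()` printed above.

```python
import pandas as pd

# Hypothetical mini-frame (not the Titanic data) with some missing values
demo = pd.DataFrame({
    "Age": [22.0, None, 26.0, None],
    "Cabin": [None, "C85", None, None],
    "Fare": [7.25, 71.28, 7.93, 8.05],
})

# isna() marks missing cells as True; the mean of a boolean column
# is the share of True values, so * 100 gives a percentage
missing_pct = demo.isna().mean() * 100
print(missing_pct)

# Same arithmetic using the non-null counts reported by info() above
total_rows = 891
non_null = pd.Series({"Age": 714, "Cabin": 204, "Embarked": 889})
info_missing_pct = ((total_rows - non_null) / total_rows * 100).round(1)
print(info_missing_pct)
```

On the real DataFrame, the equivalent call would be `titanic_df.isna().mean() * 100`.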


### Describing Data

In [5]:
titanic_df.describe(percentiles = [.61, .62]) 
# Survived is still 0 at the 61st percentile and becomes 1 at the 62nd, so roughly 38% of passengers survived

,PassengerId,Survived,Pclass,Age,SibSp,Parch,Fare
count,891.0,891.0,891.0,714.0,891.0,891.0,891.0
mean,446.0,0.383838,2.308642,29.699118,0.523008,0.381594,32.204208
std,257.353842,0.486592,0.836071,14.526497,1.102743,0.806057,49.693429
min,1.0,0.0,1.0,0.42,0.0,0.0,0.0
50%,446.0,0.0,3.0,28.0,0.0,0.0,14.4542
61%,543.9,0.0,3.0,32.0,0.0,0.0,23.225
62%,552.8,1.0,3.0,32.0,0.0,0.0,24.15
max,891.0,1.0,3.0,80.0,8.0,6.0,512.3292


The data above shows that only about 38% of the passengers survived (the mean of `Survived` is 0.383838).
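
Because `Survived` is coded as 0 or 1, its mean is the fraction of survivors, and the percentile at which the value switches from 0 to 1 marks the share of non-survivors. The sketch below illustrates this on a reconstructed column. The 549 zeros and 342 ones are an assumption derived from the reported count (891) and mean (0.383838); they are not read from the actual file.

```python
import pandas as pd

# Assumption: 549 zeros and 342 ones, chosen to match count = 891 and mean = 0.383838
survived = pd.Series([0] * 549 + [1] * 342)

# Mean of a 0/1 column = share of ones
survival_rate = survived.mean()

# The value flips from 0 to 1 between these two percentiles
q61 = survived.quantile(0.61)
q62 = survived.quantile(0.62)

print(f"Survival rate: {survival_rate:.4f}")
print(f"61st percentile: {q61}, 62nd percentile: {q62}")
```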

***************************************************************************************************************

### What is the total number of passengers?

In [6]:
# Create a new column 'Person' in which every person under 16 is child.

titanic_df['Person'] = titanic_df.Sex
titanic_df.loc[titanic_df['Age'] < 16, 'Person'] = 'Child'

print(f"Total number of passengers : {titanic_df.Person.count()}\n=================================")

# Count survivors directly from the 0/1 Survived column instead of multiplying by a rounded 38%
print(f"Total number of survivors : {titanic_df.Survived.sum()}\n=================================")

Total number of passengers : 891
Total number of survivors : 342


According to the data above, 342 people survived.

***************************************************************************************************************

### What is the distribution of passengers according to sex (with children under 16 as a separate group)?

In [7]:
# Checking the distribution

print(f"Passengers categories :{titanic_df.Person.unique()}\n=================================")

print(f"Distribution of passengers :\n{titanic_df.Person.value_counts()}")

Passengers categories :['male' 'female' 'Child']
Distribution of passengers :
male      537
female    271
Child      83
Name: Person, dtype: int64
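
One detail of the `Person` column is worth noting: a comparison such as `Age < 16` is `False` when the age is missing, so passengers with an unknown age keep their `Sex` label. The sketch below shows this on a hypothetical four-row frame (the values are assumptions, not rows from the dataset).

```python
import pandas as pd

# Hypothetical mini-frame (not the Titanic data); one age is missing
demo = pd.DataFrame({
    "Sex": ["male", "female", "male", "female"],
    "Age": [8.0, 30.0, None, 15.0],
})

# Start from Sex, then overwrite the label wherever Age < 16
demo["Person"] = demo["Sex"]
demo.loc[demo["Age"] < 16, "Person"] = "Child"

# NaN < 16 evaluates to False, so the row with the missing age stays "male"
print(demo)
```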


***************************************************************************************************************

### What is the distribution of passengers according to class?

In [8]:
print(f"Pclass categories : {titanic_df.Pclass.unique()}\n=================================")

print(f"Distribution of Pclass :\n{titanic_df.Pclass.value_counts()}")

Pclass categories : [3 1 2]
Distribution of Pclass :
3    491
1    216
2    184
Name: Pclass, dtype: int64
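
The raw counts can also be expressed as shares of all passengers with `value_counts(normalize=True)`. The sketch below rebuilds a `Pclass` column from the counts printed above (491, 216, 184); the row order is arbitrary and the column is a stand-in for the real one.

```python
import pandas as pd

# Stand-in column rebuilt from the printed counts; row order is arbitrary
pclass = pd.Series([3] * 491 + [1] * 216 + [2] * 184, name="Pclass")

# normalize=True returns fractions of the total instead of counts
shares = pclass.value_counts(normalize=True).round(3)
print(shares)
```

On the real DataFrame, the equivalent call would be `titanic_df.Pclass.value_counts(normalize=True)`.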


***************************************************************************************************************

### What is the mean age of passengers?

In [9]:
print(f"Mean age : {titanic_df.Age.mean()}")

Mean age : 29.69911764705882
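
Note that `Age` has only 714 non-null values out of 891, and `mean()` skips missing values by default, so this average describes only the passengers whose age is known. A minimal sketch with hypothetical values:

```python
import pandas as pd

# Hypothetical ages (not from the dataset); one value is missing
ages = pd.Series([22.0, 38.0, None, 35.0])

mean_known = ages.mean()               # NaN skipped: (22 + 38 + 35) / 3
mean_strict = ages.mean(skipna=False)  # NaN propagates into the result
n_known = ages.count()                 # counts non-missing values only

print(mean_known, mean_strict, n_known)
```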


***************************************************************************************************************

### Which age group had the highest survival rate?

In [10]:
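# numeric_only=True skips text columns such as Name and Ticket; pandas 2.0 and later raise a TypeError without it
titanic_df[titanic_df['Age'] < 18].groupby(['Sex', 'Pclass']).mean(numeric_only=True)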

Sex,Pclass,PassengerId,Survived,Age,SibSp,Parch,Fare
female,1,525.375,0.875,14.125,0.5,0.875,104.083337
female,2,369.25,1.0,8.333333,0.583333,1.083333,26.241667
female,3,374.942857,0.542857,8.428571,1.571429,1.057143,18.727977
male,1,526.5,1.0,8.23,0.5,2.0,116.0729
male,2,527.818182,0.818182,4.757273,0.727273,1.0,25.659473
male,3,437.953488,0.232558,9.963256,2.069767,1.0,22.752523


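Among passengers under 18, survival rates range from 100% (2nd-class girls and 1st-class boys) down to about 23% (3rd-class boys). The table covers only under-18s, so on its own it does not compare children with adults.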
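
To compare age groups directly, one option is to bin ages with `pd.cut` and take the mean of `Survived` per bin. The sketch below uses a hypothetical six-row frame; the bin edges (18 and 60) and the labels are assumptions chosen for illustration, not values from the notebook.

```python
import pandas as pd

# Hypothetical mini-frame (not the Titanic data)
demo = pd.DataFrame({
    "Age": [5, 10, 17, 25, 40, 70],
    "Survived": [1, 1, 0, 0, 1, 0],
})

# right=False makes each bin include its left edge: [0, 18), [18, 60), [60, 120)
demo["AgeGroup"] = pd.cut(
    demo["Age"],
    bins=[0, 18, 60, 120],
    labels=["under 18", "18-59", "60+"],
    right=False,
)

# Mean of the 0/1 Survived column per bin = survival rate per age group
rates = demo.groupby("AgeGroup", observed=True)["Survived"].mean()
print(rates)
```

Rows with a missing age fall into no bin and are left out of the grouped result.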

***************************************************************************************************************

### What percentage of women and men survived?

In [11]:
titanic_df[["Sex", "Survived"]].groupby(['Sex'], as_index=False).mean().sort_values(by='Survived', ascending=False)

,Sex,Survived
0,female,0.742038
1,male,0.188908


The data above is consistent with a **"women and children first"** policy: about 74% of women survived, compared with about 19% of men.
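
A survival rate is easier to judge alongside the size of the group it comes from. The sketch below uses named aggregation to return both in one table; the seven-row frame and the column names `rate` and `n` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical mini-frame (not the Titanic data)
demo = pd.DataFrame({
    "Sex": ["female", "female", "female", "male", "male", "male", "male"],
    "Survived": [1, 1, 0, 0, 0, 1, 0],
})

# Named aggregation: survival rate and group size side by side
summary = demo.groupby("Sex").agg(
    rate=("Survived", "mean"),
    n=("Survived", "size"),
)
print(summary)
```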

***************************************************************************************************************

### What percentage of each passenger class survived?

In [12]:
titanic_df[['Pclass', 'Survived']].groupby(['Pclass'], as_index=False).mean().sort_values(by='Survived', ascending=False)

,Pclass,Survived
0,1,0.62963
1,2,0.472826
2,3,0.242363


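The data shows that the survival rate falls with class: about 63% of 1st-class, 47% of 2nd-class, and 24% of 3rd-class passengers survived, which suggests that wealthier passengers had priority in the rescue.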
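
An alternative view of the same question is a cross-tabulation, which shows the share who died and the share who survived in each class. The sketch below uses `pd.crosstab` with `normalize="index"` on a hypothetical eight-row frame (the values are assumptions, not the dataset).

```python
import pandas as pd

# Hypothetical mini-frame (not the Titanic data)
demo = pd.DataFrame({
    "Pclass": [1, 1, 2, 2, 3, 3, 3, 3],
    "Survived": [1, 1, 1, 0, 0, 0, 1, 0],
})

# normalize="index" makes each row (class) sum to 1
table = pd.crosstab(demo["Pclass"], demo["Survived"], normalize="index")
print(table)
```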

***************************************************************************************************************

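### Who was the captain of the ship?

The dataset lists passengers, not crew, so the ship's captain (Edward Smith) does not appear in it. The search below finds the one passenger whose name carries the title "Capt.".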

In [13]:
titanic_df[titanic_df['Name'].str.contains("Capt")]

,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Person
745,746,0,1,"Crosby, Capt. Edward Gifford",male,70.0,1,1,WE/P 5735,71.0,B22,S,male
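
`str.contains("Capt")` matches that substring anywhere in the name. A stricter sketch is to pull the title out of each name first and compare it exactly. The regular expression below assumes names follow the "Surname, Title. Given names" pattern seen in the rows above; the four names are copied from this notebook's output.

```python
import pandas as pd

names = pd.Series([
    "Braund, Mr. Owen Harris",
    "Heikkinen, Miss. Laina",
    "Futrelle, Mrs. Jacques Heath (Lily May Peel)",
    "Crosby, Capt. Edward Gifford",
])

# Capture the text between the comma and the first period (assumed title position)
titles = names.str.extract(r",\s*([^.]+)\.", expand=False)

# Exact comparison on the extracted title instead of a substring search
is_capt = titles == "Capt"
print(names[is_capt])
```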


***************************************************************************************************************

## Thank You 