# Description

The purpose of this work is to become familiar with the basic methods of the Pandas library for working with tabular data.

# Tasks

## Part 1. Titanic data

You can find the Titanic data at this [link](https://www.kaggle.com/c/titanic/data) (combine train and test datasets first).
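Kaggle ships the data as two separate files, so the combined table has to be built by hand. A minimal sketch of one way to do it with `pd.concat` follows; the tiny frames are hypothetical stand-ins for the two CSV files, and the exact steps used to produce `titanic.csv` for this notebook are an assumption.

```python
import pandas as pd

# Hypothetical stand-ins for Kaggle's train and test files.
# The test file has no Survived column.
train = pd.DataFrame({
    'PassengerId': [1, 2],
    'Survived': [0, 1],
    'Pclass': [3, 1],
})
test = pd.DataFrame({
    'PassengerId': [892, 893],
    'Pclass': [3, 3],
})

# Stack the rows; columns missing from one frame are filled with NaN.
combined = pd.concat([train, test], ignore_index=True)
combined = combined.set_index('PassengerId')

print(combined)
```

Because the rows from the test frame receive NaN in `Survived`, the column becomes floating point, which would be consistent with the 0.0/1.0 values in the preview below. Calling `combined.to_csv('titanic.csv')` would then write a file that can be read with `index_col='PassengerId'`.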

In [1]:
import pandas as pd

In [2]:
titanic_data = pd.read_csv('titanic.csv', index_col='PassengerId')
titanic_data.head()

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0.0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
2,1.0,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
3,1.0,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
4,1.0,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
5,0.0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


Task 1. Count the number of men and women on the ship.

In [3]:
titanic_data['Sex'].value_counts()

male      843
female    466
Name: Sex, dtype: int64

Task 2. Compute the percentage of surviving passengers.

In [4]:
round(len(titanic_data[titanic_data['Survived'] == 1.0])/len(titanic_data) * 100, 2)

26.13
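The figure above divides by every row of the combined table. If the rows taken from the test file carry no `Survived` value (an assumption, though it fits the floating-point 0.0/1.0 values in the preview), the share among passengers with a known outcome can be sketched with `mean()`, which skips missing values by default. The sample data below is hypothetical.

```python
import pandas as pd

# Hypothetical sample: NaN marks rows whose outcome is unknown.
nan = float('nan')
sample = pd.DataFrame({'Survived': [1.0, 0.0, 0.0, 1.0, nan, nan]})

# Share of survivors among all rows (same approach as the cell above).
share_all = round((sample['Survived'] == 1.0).sum() / len(sample) * 100, 2)

# Share among rows with a known outcome: mean() ignores NaN by default.
share_known = round(sample['Survived'].mean() * 100, 2)

print(share_all, share_known)
```

The two numbers differ whenever some outcomes are missing, so it is worth stating which denominator a reported percentage uses.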

Task 3. Compute the percentage of first-class passengers.

In [5]:
round(len(titanic_data[titanic_data['Pclass'] == 1])/len(titanic_data) * 100, 2)

24.68
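As an alternative sketch, `value_counts(normalize=True)` returns the share of every class in one call instead of filtering for a single value. The sample series below is hypothetical.

```python
import pandas as pd

# Hypothetical sample of passenger classes.
sample = pd.Series([1, 3, 3, 2, 3, 1, 3, 2], name='Pclass')

# normalize=True returns fractions; scale to percentages and round.
shares = (sample.value_counts(normalize=True) * 100).round(2)

print(shares)
```

Selecting `shares.loc[1]` then gives the first-class percentage.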

Task 4. Calculate the mean and median of the passengers' ages.

In [6]:
print('Mean: ', titanic_data['Age'].mean())
print('Median: ', titanic_data['Age'].median())

Mean:  29.881137667304014
Median:  28.0


Task 5. Calculate the Pearson correlation between the SibSp and Parch columns.

In [7]:
titanic_data[['SibSp', 'Parch']].corr(method='pearson')

,SibSp,Parch
SibSp,1.0,0.373587
Parch,0.373587,1.0
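For reference, the Pearson coefficient is the covariance of the two columns divided by the product of their standard deviations. The sketch below checks that definition against `corr` on a small hypothetical sample; it is an illustration, not a recomputation of the value above.

```python
import pandas as pd

# Hypothetical sample of the two columns.
sample = pd.DataFrame({
    'SibSp': [1, 1, 0, 1, 0, 3],
    'Parch': [0, 0, 0, 0, 0, 1],
})
x, y = sample['SibSp'], sample['Parch']

# Pearson r = cov(x, y) / (std(x) * std(y)); both use ddof=1 by default.
r_manual = x.cov(y) / (x.std() * y.std())

# The same coefficient computed by pandas.
r_pandas = x.corr(y, method='pearson')

print(r_manual, r_pandas)
```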


Task 6. Find the most popular female names on the ship.

In [8]:
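female_df = titanic_data[titanic_data['Sex'] == 'female']

name_list = []
for item in female_df['Name']:
    if 'Mrs' in item:
        try:
            # Married women: take the first word inside the parentheses,
            # e.g. "Futrelle, Mrs. Jacques Heath (Lily May Peel)" -> "Lily".
            name = item.split('(')[1].split(' ')[0].replace(')', '')
        except IndexError:
            # No parentheses: fall back to the first word after "Mrs."
            # (this may be the husband's first name).
            name = item.split('Mrs.')[1].split(' ')[1]
        name_list.append(name)
    elif 'Miss' in item:
        # Unmarried women: take the first word after "Miss. ".
        name = item.split('Miss. ')[1].split(' ')[0]
        name_list.append(name)

# Five most frequent first names.
pd.Series(name_list).value_counts()[:5]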

Mary         22
Anna         16
Elizabeth    16
Margaret     11
Ellen         9
dtype: int64
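The same idea can be sketched without an explicit loop by using `Series.str.extract`. This is an alternative, not the method used above, and it may give slightly different counts: a married woman listed without parentheses yields a missing value here instead of the word after "Mrs.". The last two names in the sample are hypothetical.

```python
import pandas as pd

names = pd.Series([
    'Futrelle, Mrs. Jacques Heath (Lily May Peel)',
    'Heikkinen, Miss. Laina',
    'Doe, Miss. Mary Ann',          # hypothetical
    'Roe, Mrs. John (Mary Jane)',   # hypothetical
])

# First word inside parentheses, if any.
in_parens = names.str.extract(r'\((\w+)', expand=False)

# First word after "Miss.", used where no parentheses are present.
after_miss = names.str.extract(r'Miss\.\s+(\w+)', expand=False)

first_names = in_parens.fillna(after_miss)
counts = first_names.value_counts()

print(counts)
```

Other titles (for example "Ms.") are ignored by both versions, so the counts cover only names tagged "Mrs." or "Miss.".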