# Titanic

**Titanic**, or **RMS Titanic** (*RMS* stands for *Royal Mail Ship*), was a luxury *Olympic*-class ocean liner owned by the *White Star Line*. It was built to carry passengers and mail between Europe and North America. The ship could take on board *2,453* to *2,603* passengers, their luggage, and a limited number of carriages or automobiles. A crew of *885* to *899* ran the ship and looked after the passengers' comfort.

![Titanic](https://upload.wikimedia.org/wikipedia/commons/thumb/f/fd/RMS_Titanic_3.jpg/1024px-RMS_Titanic_3.jpg)

However, the Titanic sank on her maiden voyage, during the night of *14* to *15 April* *1912*. About *1,450* passengers and crew members died.

In this lab we look at the disaster through numbers. The dataset is taken from [Kaggle](https://www.kaggle.com/competitions/titanic), where it serves as an introductory assignment to the world of machine learning.

## 1. Basic information

In this step we look at who actually travelled on the *Titanic* on her fateful, and in fact only, voyage.

### 1.1 Create the dataset

Create a dataset listing all passengers. Build it from the two files `test.csv` and `train.csv` stored in the `data/titanic/` folder. Use the `PassengerId` column as the index.

Store the resulting dataset in the variable `passengers`.

In [148]:
# solution
from pathlib import Path

import pandas as pd

path = Path('data/titanic/')

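# merge: test.csv has no Survived column, so take it from gender_submission.csv
# NOTE: that file is Kaggle's sample submission (it assumes that all and only
# female passengers survived), not the real outcome for these passengers
submission = pd.read_csv(path / 'gender_submission.csv', index_col='PassengerId')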
df_test = pd.read_csv(path / 'test.csv', index_col='PassengerId')

df_merged = df_test.merge(submission, left_index=True, right_index=True)

# concatenate tables
df_train = pd.read_csv(path / 'train.csv', index_col='PassengerId')

passengers = pd.concat([df_train, df_merged])

# save merged and concatenated table to file
passengers.to_csv(path / 'passengers.csv')
# passengers.to_excel(path / 'passengers.xlsx')
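
The two steps in the solution do different jobs: `merge` adds columns to rows that share an index label, while `concat` stacks rows. A minimal sketch on made-up toy tables (the names `train`, `test`, and `labels` and their contents are hypothetical stand-ins for the three CSV files), with the optional integrity checks switched on:

```python
import pandas as pd

# Hypothetical toy tables standing in for train.csv, test.csv and gender_submission.csv
train = pd.DataFrame(
    {'Survived': [0, 1], 'Name': ['A', 'B']},
    index=pd.Index([1, 2], name='PassengerId'),
)
test = pd.DataFrame(
    {'Name': ['C', 'D']},
    index=pd.Index([3, 4], name='PassengerId'),
)
labels = pd.DataFrame(
    {'Survived': [1, 0]},
    index=pd.Index([3, 4], name='PassengerId'),
)

# merge: add the Survived column to the test rows, matching on the index;
# validate= raises if an index label is duplicated on either side
test_labeled = test.merge(
    labels, left_index=True, right_index=True, validate='one_to_one'
)

# concat: stack the rows; verify_integrity= raises on duplicate PassengerId values
combined = pd.concat([train, test_labeled], verify_integrity=True)

print(combined)
```

Columns are aligned by name during `concat`, so it does not matter that `Survived` ends up as the last column of `test_labeled`.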

### 1.x Data Cleaning

In [149]:
# missing age values
passengers.info()
1309 - 1046  # all rows minus non-null Age values = 263 passengers without an age
# .isna() and .isnull() are equivalent
has_no_age = passengers['Age'].isna()
passengers.loc[has_no_age]

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1309 entries, 1 to 1309
Data columns (total 11 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Survived  1309 non-null   int64  
 1   Pclass    1309 non-null   int64  
 2   Name      1309 non-null   object 
 3   Sex       1309 non-null   object 
 4   Age       1046 non-null   float64
 5   SibSp     1309 non-null   int64  
 6   Parch     1309 non-null   int64  
 7   Ticket    1309 non-null   object 
 8   Fare      1308 non-null   float64
 9   Cabin     295 non-null    object 
 10  Embarked  1307 non-null   object 
dtypes: float64(2), int64(4), object(5)
memory usage: 122.7+ KB


PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q
18,1,2,"Williams, Mr. Charles Eugene",male,,0,0,244373,13.0000,,S
20,1,3,"Masselmani, Mrs. Fatima",female,,0,0,2649,7.2250,,C
27,0,3,"Emir, Mr. Farred Chehab",male,,0,0,2631,7.2250,,C
29,1,3,"O'Dwyer, Miss. Ellen ""Nellie""",female,,0,0,330959,7.8792,,Q
...,...,...,...,...,...,...,...,...,...,...,...
1300,1,3,"Riordan, Miss. Johanna Hannah""""",female,,0,0,334915,7.7208,,Q
1302,1,3,"Naughton, Miss. Hannah",female,,0,0,365237,7.7500,,Q
1305,0,3,"Spector, Mr. Woolf",male,,0,0,A.5. 3236,8.0500,,S
1308,0,3,"Ware, Mr. Frederick",male,,0,0,359309,8.0500,,S
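
Instead of subtracting the non-null counts from `info()` by hand, the missing values can be counted directly. A small sketch on a made-up frame (the data and the variable names are illustrative only):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; NaN / None mark missing values
df = pd.DataFrame({
    'Age': [22.0, np.nan, 35.0, np.nan],
    'Cabin': ['C85', None, None, None],
    'Fare': [7.25, 8.05, 53.1, 8.46],
})

# isna() gives a boolean frame; True counts as 1 when summed
missing_per_column = df.isna().sum()

# the mean of a boolean column is the fraction of True values
missing_share = df.isna().mean()

print(missing_per_column)
print(missing_share)
```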


In [150]:
# check whether the Age values are whole numbers
def is_int(value: float) -> bool:
    return value - int(value) == 0

# passengers.loc[~has_no_age, 'Age'].map(is_int)
passengers['Age'].map(is_int, na_action='ignore')

# cast to a smaller data type (downcasting)
passengers['Age'] = passengers['Age'].astype('float16')
passengers.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1309 entries, 1 to 1309
Data columns (total 11 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Survived  1309 non-null   int64  
 1   Pclass    1309 non-null   int64  
 2   Name      1309 non-null   object 
 3   Sex       1309 non-null   object 
 4   Age       1046 non-null   float16
 5   SibSp     1309 non-null   int64  
 6   Parch     1309 non-null   int64  
 7   Ticket    1309 non-null   object 
 8   Fare      1308 non-null   float64
 9   Cabin     295 non-null    object 
 10  Embarked  1307 non-null   object 
dtypes: float16(1), float64(1), int64(4), object(5)
memory usage: 115.0+ KB


In [151]:
passengers['SibSp'] = passengers['SibSp'].astype('uint8')
passengers.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1309 entries, 1 to 1309
Data columns (total 11 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Survived  1309 non-null   int64  
 1   Pclass    1309 non-null   int64  
 2   Name      1309 non-null   object 
 3   Sex       1309 non-null   object 
 4   Age       1046 non-null   float16
 5   SibSp     1309 non-null   uint8  
 6   Parch     1309 non-null   int64  
 7   Ticket    1309 non-null   object 
 8   Fare      1308 non-null   float64
 9   Cabin     295 non-null    object 
 10  Embarked  1307 non-null   object 
dtypes: float16(1), float64(1), int64(3), object(5), uint8(1)
memory usage: 106.1+ KB


In [152]:
passengers['Pclass'] = passengers['Pclass'].astype('uint8')
passengers.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1309 entries, 1 to 1309
Data columns (total 11 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Survived  1309 non-null   int64  
 1   Pclass    1309 non-null   uint8  
 2   Name      1309 non-null   object 
 3   Sex       1309 non-null   object 
 4   Age       1046 non-null   float16
 5   SibSp     1309 non-null   uint8  
 6   Parch     1309 non-null   int64  
 7   Ticket    1309 non-null   object 
 8   Fare      1308 non-null   float64
 9   Cabin     295 non-null    object 
 10  Embarked  1307 non-null   object 
dtypes: float16(1), float64(1), int64(2), object(5), uint8(2)
memory usage: 97.2+ KB


In [153]:
passengers['Parch'] = passengers['Parch'].astype('uint8')
passengers.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1309 entries, 1 to 1309
Data columns (total 11 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Survived  1309 non-null   int64  
 1   Pclass    1309 non-null   uint8  
 2   Name      1309 non-null   object 
 3   Sex       1309 non-null   object 
 4   Age       1046 non-null   float16
 5   SibSp     1309 non-null   uint8  
 6   Parch     1309 non-null   uint8  
 7   Ticket    1309 non-null   object 
 8   Fare      1308 non-null   float64
 9   Cabin     295 non-null    object 
 10  Embarked  1307 non-null   object 
dtypes: float16(1), float64(1), int64(1), object(5), uint8(3)
memory usage: 88.2+ KB


In [154]:
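# NOTE: float16 keeps only about three significant decimal digits, so fares are
# rounded (the Johnson family's fare shows up as 11.132812 further below)
passengers['Fare'] = passengers['Fare'].astype('float16')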
passengers.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1309 entries, 1 to 1309
Data columns (total 11 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Survived  1309 non-null   int64  
 1   Pclass    1309 non-null   uint8  
 2   Name      1309 non-null   object 
 3   Sex       1309 non-null   object 
 4   Age       1046 non-null   float16
 5   SibSp     1309 non-null   uint8  
 6   Parch     1309 non-null   uint8  
 7   Ticket    1309 non-null   object 
 8   Fare      1308 non-null   float16
 9   Cabin     295 non-null    object 
 10  Embarked  1307 non-null   object 
dtypes: float16(2), int64(1), object(5), uint8(3)
memory usage: 80.5+ KB
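
Downcasting to `float16` saves memory but costs precision. A minimal sketch of that trade-off on three illustrative fare values (the variable names are made up for this example):

```python
import pandas as pd

fares = pd.Series([11.1333, 7.775, 512.3292], dtype='float64')  # illustrative values
fares16 = fares.astype('float16')

# Memory: 8 bytes per float64 value vs. 2 bytes per float16 value
bytes64 = fares.memory_usage(index=False)
bytes16 = fares16.memory_usage(index=False)

# Precision: float16 has a 10-bit mantissa, so the stored values are rounded
stored = fares16.astype('float64')
rounding_error = (stored - fares).abs()

print(bytes64, bytes16)
print(stored.tolist())
print(rounding_error.tolist())
```

The larger the value, the coarser the rounding, so whether `float16` is acceptable depends on how exact the sums and means of the column need to be.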


In [155]:
passengers['Survived'] = passengers['Survived'].astype('bool')
passengers.info()
passengers['Survived']

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1309 entries, 1 to 1309
Data columns (total 11 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Survived  1309 non-null   bool   
 1   Pclass    1309 non-null   uint8  
 2   Name      1309 non-null   object 
 3   Sex       1309 non-null   object 
 4   Age       1046 non-null   float16
 5   SibSp     1309 non-null   uint8  
 6   Parch     1309 non-null   uint8  
 7   Ticket    1309 non-null   object 
 8   Fare      1308 non-null   float16
 9   Cabin     295 non-null    object 
 10  Embarked  1307 non-null   object 
dtypes: bool(1), float16(2), object(5), uint8(3)
memory usage: 71.6+ KB


PassengerId
1       False
2        True
3        True
4        True
5       False
        ...  
1305    False
1306     True
1307    False
1308    False
1309    False
Name: Survived, Length: 1309, dtype: bool

In [156]:
# string dtype instead of object
passengers['Name'] = passengers['Name'].astype('string')
passengers.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1309 entries, 1 to 1309
Data columns (total 11 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Survived  1309 non-null   bool   
 1   Pclass    1309 non-null   uint8  
 2   Name      1309 non-null   string 
 3   Sex       1309 non-null   object 
 4   Age       1046 non-null   float16
 5   SibSp     1309 non-null   uint8  
 6   Parch     1309 non-null   uint8  
 7   Ticket    1309 non-null   object 
 8   Fare      1308 non-null   float16
 9   Cabin     295 non-null    object 
 10  Embarked  1307 non-null   object 
dtypes: bool(1), float16(2), object(4), string(1), uint8(3)
memory usage: 71.6+ KB


In [157]:
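# create is_male / is_female columns
# passengers['is_male'] = passengers['Sex'] == 'male'
# passengers['is_female'] = passengers['Sex'] == 'female'
# passengers.drop('is_female', axis='columns', inplace=True)

# shorten the labels; assign the result back instead of using inplace=True on a
# column selection, which may not update the parent DataFrame
# NOTE: the solution cells from 1.3 onwards compare Sex against 'male'/'female'
passengers['Sex'] = passengers['Sex'].replace({'male': 'M', 'female': 'F'})
passengers['Sex'] = passengers['Sex'].astype('string')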
passengers.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1309 entries, 1 to 1309
Data columns (total 11 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Survived  1309 non-null   bool   
 1   Pclass    1309 non-null   uint8  
 2   Name      1309 non-null   string 
 3   Sex       1309 non-null   string 
 4   Age       1046 non-null   float16
 5   SibSp     1309 non-null   uint8  
 6   Parch     1309 non-null   uint8  
 7   Ticket    1309 non-null   object 
 8   Fare      1308 non-null   float16
 9   Cabin     295 non-null    object 
 10  Embarked  1307 non-null   object 
dtypes: bool(1), float16(2), object(3), string(2), uint8(3)
memory usage: 71.6+ KB


In [158]:
passengers['Embarked'] = passengers['Embarked'].astype('string')
passengers.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1309 entries, 1 to 1309
Data columns (total 11 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Survived  1309 non-null   bool   
 1   Pclass    1309 non-null   uint8  
 2   Name      1309 non-null   string 
 3   Sex       1309 non-null   string 
 4   Age       1046 non-null   float16
 5   SibSp     1309 non-null   uint8  
 6   Parch     1309 non-null   uint8  
 7   Ticket    1309 non-null   object 
 8   Fare      1308 non-null   float16
 9   Cabin     295 non-null    object 
 10  Embarked  1307 non-null   string 
dtypes: bool(1), float16(2), object(2), string(3), uint8(3)
memory usage: 71.6+ KB


In [159]:
has_no_cabin = passengers['Cabin'].isna()
# cabins whose label starts with 'F'
is_f_cabin = passengers['Cabin'].str.startswith('F')

passengers.loc[~has_no_cabin & is_f_cabin, ['Survived','Name','Cabin']]

PassengerId,Survived,Name,Cabin
67,True,"Nye, Mrs. (Elizabeth Ramell)",F33
76,False,"Moen, Mr. Sigurd Hansen",F G73
129,True,"Peter, Miss. Anna",F E69
149,False,"Navratil, Mr. Michel (""Louis M Hoffman"")",F2
184,True,"Becker, Master. Richard F",F4
194,True,"Navratil, Master. Michel M",F2
341,True,"Navratil, Master. Edmond Roger",F2
346,True,"Brown, Miss. Amelia ""Mildred""",F33
517,True,"Lemore, Mrs. (Amelia Milley)",F33
619,True,"Becker, Miss. Marion Louise",F4


In [160]:
# drop the Cabin and Ticket columns; they are not needed for our analysis
passengers.drop(['Cabin', 'Ticket'], axis='columns', inplace=True)

In [161]:
passengers.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1309 entries, 1 to 1309
Data columns (total 9 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Survived  1309 non-null   bool   
 1   Pclass    1309 non-null   uint8  
 2   Name      1309 non-null   string 
 3   Sex       1309 non-null   string 
 4   Age       1046 non-null   float16
 5   SibSp     1309 non-null   uint8  
 6   Parch     1309 non-null   uint8  
 7   Fare      1308 non-null   float16
 8   Embarked  1307 non-null   string 
dtypes: bool(1), float16(2), string(3), uint8(3)
memory usage: 51.1 KB


In [181]:
# split the Name column into surname and first name
def split_name(name: str) -> tuple[str, str]:
    # split only at the first ', ' so the result always has two parts
    surname, firstname = name.split(', ', maxsplit=1)
    return surname, firstname

# passengers['Name'].map(split_name)
# passengers.drop(['Surname', 'Firstname'], axis='columns', inplace=True)
passengers[['Surname', 'Firstname']] = passengers['Name'].str.split(', ', n=1, expand=True)
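
With `expand=True`, `str.split` produces one column per piece, so a name containing the separator twice would yield a third column and break the two-column assignment. A small sketch with made-up names showing how limiting the number of splits with `n=1` keeps the result at exactly two columns:

```python
import pandas as pd

names = pd.Series([
    'Johnson, Mr. Alfred',
    'Smith, Jr., Mr. John',   # hypothetical name with a second ", "
])

# n=1 splits only at the first separator, so there are always two pieces
parts = names.str.split(', ', n=1, expand=True)
parts.columns = ['Surname', 'Firstname']

print(parts)
```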

In [191]:
passengers.loc[passengers['Sex'] == 'F', ['Surname', 'Firstname', 'Name', 'Sex']]
passengers.loc[passengers['Surname'] == 'Johnson']

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Fare,Embarked,Surname,Firstname
9,True,3,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",F,27.0,0,2,11.132812,S,Johnson,Mrs. Oscar W (Elisabeth Vilhelmina Berg)
173,True,3,"Johnson, Miss. Eleanor Ileen",F,1.0,1,1,11.132812,S,Johnson,Miss. Eleanor Ileen
303,False,3,"Johnson, Mr. William Cahoone Jr",M,19.0,0,0,0.0,S,Johnson,Mr. William Cahoone Jr
598,False,3,"Johnson, Mr. Alfred",M,49.0,0,0,0.0,S,Johnson,Mr. Alfred
720,False,3,"Johnson, Mr. Malkolm Joackim",M,33.0,0,0,7.773438,S,Johnson,Mr. Malkolm Joackim
870,True,3,"Johnson, Master. Harold Theodor",M,4.0,1,1,11.132812,S,Johnson,Master. Harold Theodor


### 1.2 How many passengers were there?

Find out how many passengers took part in the voyage.

In [244]:
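# solution (several equivalent ways; only the last expression is displayed)
len(passengers)
passengers['Name'].count()  # counts the non-null names
len(passengers.index)
passengers.index.max()      # works only because PassengerId runs from 1 to n without gaps
passengers.shape[0]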

1309

### 1.3 How many women and how many men were on board?

Find out how many women and men were on board. Represent the result as a dictionary of the form:

```python
{
    'males': 123,
    'females': 456
}
```

In [245]:
# solution
is_woman = (passengers['Sex'] == 'female')
is_man = (passengers['Sex'] == 'male')

{
    'female': passengers.loc[ is_woman, 'Name' ].count(),
    'male': passengers.loc[ is_man, 'Name' ].count()
}

passengers.groupby('Sex')['Sex'].count().to_dict()

{'female': 466, 'male': 843}
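
The assignment asks for the keys `males` and `females`, while grouping by `Sex` yields the raw labels. One possible way to get the requested keys, sketched on toy data (the Series below is made up):

```python
import pandas as pd

sex = pd.Series(['male', 'female', 'male', 'male', 'female'])  # toy data

counts = (
    sex.value_counts()                                   # occurrences per label
       .rename({'male': 'males', 'female': 'females'})   # relabel the index
       .to_dict()
)

print(counts)
```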

### 1.4 How many children were on board?

We treat anyone under *15* years of age as a child.

In [246]:
# solution
is_kid = (passengers['Age'] < 15)

passengers.loc[is_kid, ['Age']].count()

Age    109
dtype: int64

### 1.5 The oldest passenger

Which passenger was the oldest? Besides the oldest passenger overall, also list the oldest woman and the oldest man. For each passenger, show the name, age, and sex.

In [247]:
# solution
fltr = passengers['Age'] == passengers['Age'].max()
passengers.loc[ fltr, ['Name', 'Age'] ]

is_female = (passengers['Sex'] == 'female')
is_male = (passengers['Sex'] == 'male')

pd.concat([
    # oldest female
    passengers.loc[ is_female, ['Name', 'Age', 'Sex']] \
        .nlargest(1, 'Age', keep='all') \
        .dropna()
,
    # oldest male
    passengers.loc[ is_male, ['Name', 'Age', 'Sex']] \
        .nlargest(1, 'Age', keep='all') \
        .dropna()
]) 

PassengerId,Name,Age,Sex
988,"Cavendish, Mrs. Tyrell William (Julia Florence...",76.0,female
631,"Barkworth, Mr. Algernon Henry Wilson",80.0,male
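
An alternative to filtering each sex separately is to let `groupby` find the row label of the maximum in every group. A sketch on a hypothetical four-row frame:

```python
import pandas as pd

people = pd.DataFrame(
    {
        'Name': ['Ann', 'Bob', 'Cid', 'Dee'],
        'Sex': ['female', 'male', 'male', 'female'],
        'Age': [76.0, 80.0, float('nan'), 30.0],
    },
    index=pd.Index([10, 11, 12, 13], name='PassengerId'),
)

# idxmax skips missing ages and returns the index label of the oldest row per group
oldest_ids = people.groupby('Sex')['Age'].idxmax()
oldest = people.loc[oldest_ids, ['Name', 'Age', 'Sex']]

print(oldest)
```

Unlike `nlargest(1, 'Age', keep='all')`, `idxmax` returns a single row per group even when several passengers share the maximum age.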


### 1.6 The youngest passenger

And conversely: which passenger was the youngest? Besides the youngest passenger overall, also list the youngest woman and the youngest man. Always show the name, age, and sex, and return the result as a `Series` object.

In [248]:
# solution
fltr = passengers['Age'] == passengers['Age'].min()
passengers.loc[ fltr, ['Name', 'Age'] ]

is_female = (passengers['Sex'] == 'female')
is_male = (passengers['Sex'] == 'male')

pd.concat([
    # youngest female
    passengers.loc[ is_female, ['Name', 'Age', 'Sex']] \
        .nsmallest(1, 'Age', keep='all') \
        .dropna()
,
    # youngest male
    passengers.loc[ is_male, ['Name', 'Age', 'Sex']] \
        .nsmallest(1, 'Age', keep='all') \
        .dropna()
])

PassengerId,Name,Age,Sex
1246,"Dean, Miss. Elizabeth Gladys Millvina""""",0.17,female
1093,"Danbom, Master. Gilbert Sigvard Emanuel",0.33,male


### 1.7 What was the average age of the passengers?

We already know who was the youngest and who was the oldest, so let us also find the average age of the passengers on board. Round the result to 2 decimal places.

In [249]:
# solution

passengers['Age'].mean().round(2)  # only pandas version >= 1.4
round(passengers['Age'].mean(), 2)

29.88

### 1.8 Revenue from ticket sales

How much money did the voyage bring in from fares?

In [250]:
# solution
passengers['Fare'].sum()

43550.4869

### 1.9 Average ticket price

Passengers in different classes paid different amounts for their tickets. Find the average ticket price for each passenger class.

In [251]:
passengers.groupby('Pclass')['Fare'].mean()

Pclass
1    87.508992
2    21.179196
3    13.302889
Name: Fare, dtype: float64

### 1.10 Jack and Rose

Since practically everyone knows Cameron's version of the Titanic's first and last voyage, let us see which passengers could have been the real-life counterparts of his heroes. Find out how many Jacks and how many Roses were on board the Titanic. Besides their names, we know that:

* Jack was a man
* Rose was a woman
* neither of them was older than 30
* Rose travelled in first class

In [252]:
# solution
is_jack = (passengers['Sex'] == 'male') \
    & (passengers['Age'] < 30) \
    & (passengers['Name'].str.contains('Jack'))

is_rose = (passengers['Sex'] == 'female') \
    & (passengers['Name'].str.contains('Rose')) \
    & (passengers['Age'] < 30) \
    & (passengers['Pclass'] == 1)

passengers.loc[is_rose | is_jack, ['Name', 'Age', 'Sex', 'Pclass']]

PassengerId,Name,Age,Sex,Pclass
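
Note that `str.contains('Rose')` matches substrings, so a surname that merely begins with "Rose" would also pass the name filter. A hedged sketch, using made-up names, of restricting the match to whole words with a regular-expression word boundary:

```python
import pandas as pd

names = pd.Series([
    'Rosenbaum, Miss. Edith',   # hypothetical: surname merely starts with "Rose"
    'Smith, Miss. Rose Mary',   # hypothetical: "Rose" as a given name
    'Brown, Mr. Jack',
])

# plain substring search
substring_match = names.str.contains('Rose')

# \b marks a word boundary, so "Rose" must stand alone as a word
whole_word_match = names.str.contains(r'\bRose\b', regex=True)

print(substring_match.tolist())
print(whole_word_match.tolist())
```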


## 2. Embarkation

### 2.1 Number of ports

From how many ports did passengers board? Print the number of ports.

In [253]:
# solution
passengers['Embarked'].nunique()  # number of distinct ports; missing values are ignored

3

### 2.2 Overview of embarked passengers

Create an overview of how many people boarded at each port. The resulting `Series` should be indexed by the names of the ports where passengers embarked.

In [254]:
# solution
passengers.groupby('Embarked')['Embarked'] \
    .count() \
    .rename({
        'S': 'Southampton',
        'Q': 'Queenstown',
        'C': 'Cherbourg'
    })

Embarked
Cherbourg      270
Queenstown     123
Southampton    914
Name: Embarked, dtype: int64
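
The three counts add up to 1307, not 1309, because two passengers have no recorded port and `groupby` silently drops them. A sketch on toy data of an alternative that maps the codes to names first and keeps the missing values visible (the `PORTS` dictionary and the Series are assumptions made for this example):

```python
import pandas as pd

# port codes and their names, as used in the solution above
PORTS = {'S': 'Southampton', 'Q': 'Queenstown', 'C': 'Cherbourg'}

embarked = pd.Series(['S', 'C', 'S', None, 'Q', 'S'])  # toy data

# map() translates the codes; dropna=False keeps a row for the missing values
per_port = embarked.map(PORTS).value_counts(dropna=False)

print(per_port)
```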

### 2.3 The busiest port

And at which port did the most passengers board? Print its name and the number of passengers.

In [255]:
# solution
passengers.groupby('Embarked')['Embarked'] \
    .count() \
    .rename({
        'S': 'Southampton',
        'Q': 'Queenstown',
        'C': 'Cherbourg'
    }) \
    .sort_values() \
    .tail(1)


passengers.groupby('Embarked')['Embarked'] \
    .count() \
    .rename({
        'S': 'Southampton',
        'Q': 'Queenstown',
        'C': 'Cherbourg'
    }) \
    .nlargest(1)

Embarked
Southampton    914
Name: Embarked, dtype: int64

### 2.4 Most children

At which port did the most children board? Print its name and the number of children.

In [256]:
# solution
is_kid = passengers['Age'] < 15

passengers \
    .loc[is_kid, :] \
    .groupby('Embarked')['Embarked'] \
    .count() \
    .rename({
        'S': 'Southampton',
        'Q': 'Queenstown',
        'C': 'Cherbourg'
    }) \
    .nlargest(1)

Embarked
Southampton    83
Name: Embarked, dtype: int64

## 3. Survivors

We also have records of who survived. We are interested in the absolute numbers, but also in what may have influenced whether a given person survived, for example whether passengers from a higher class or younger passengers had a better chance of survival.

In [257]:
# filters
survived = passengers['Survived'] == 1
is_male = passengers['Sex'] == 'male'
is_kid = passengers['Age'] < 15
is_rich = passengers['Pclass'] == 1

### 3.1 How many people survived the disaster?

Find the total number of people who survived the sinking of the *Titanic*. Besides the total count, also give it as a percentage.

In [258]:
# solution
total_passengers = len(passengers['Survived'].dropna())
total_survived = len(passengers.loc[survived])

print(f'{total_survived}/{total_passengers} ({total_survived/total_passengers*100:.4}%)')

494/1309 (37.74%)
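
Because `True` counts as 1, the sum of a boolean column is the number of survivors and its mean is their share. A minimal sketch on toy data (the Series is made up):

```python
import pandas as pd

survived = pd.Series([True, False, False, True, False])  # toy data

n_survived = int(survived.sum())   # number of True values
share = survived.mean()            # fraction of True values

# the % format type multiplies by 100 and appends the percent sign
summary = f'{n_survived}/{len(survived)} ({share:.2%})'
print(summary)
```

In an f-string, `:.4` without a type letter means four significant digits, whereas `:.2%` always prints two decimal places.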


### 3.2 Survivors and their sex

Did sex affect survival? How many men and how many women survived in the end? And what share was that of all men and all women?


In [259]:
# solution
total_males = len(passengers.loc[is_male])
total_females = len(passengers.loc[~is_male])

total_males, total_females, total_males+total_females

survived_males = len(passengers.loc[is_male & survived])
survived_females = len(passengers.loc[~is_male & survived])

print(f'males: {survived_males}/{total_males} ({survived_males/total_males*100:.4}%)')
print(f'females: {survived_females}/{total_females} ({survived_females/total_females*100:.4}%)')

males: 109/843 (12.93%)
females: 385/466 (82.62%)
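
The same breakdown can be produced in one step with a grouped aggregation. A sketch on a hypothetical five-row frame, using named aggregation so that the result columns get readable names:

```python
import pandas as pd

df = pd.DataFrame({
    'Sex': ['male', 'female', 'male', 'female', 'male'],
    'Survived': [False, True, False, False, True],
})

# one row per sex: survivors, group size, and survival rate
by_sex = df.groupby('Sex')['Survived'].agg(
    survived='sum', total='count', rate='mean'
)

print(by_sex)
```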


### 3.3 Survivors and their age

Did the passengers' age affect survival? How many children and how many adults survived in the end? And what share was that of all children and all adults?

In [260]:
# solution
total_kids = len(passengers.loc[is_kid])
# NOTE: ~is_kid also counts passengers with a missing age as adults (NaN < 15 is False)
total_adults = len(passengers.loc[~is_kid])

survived_kids = len(passengers.loc[is_kid & survived])
survived_adults = len(passengers.loc[~is_kid & survived])

print(f'kids: {survived_kids}/{total_kids} ({survived_kids/total_kids*100:.4}%)')
print(f'adults: {survived_adults}/{total_adults} ({survived_adults/total_adults*100:.4}%)')

kids: 57/109 (52.29%)
adults: 437/1200 (36.42%)
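
The adult total of 1200 includes every passenger whose age is unknown, because a comparison with a missing value evaluates to `False`. One way to keep those rows apart, sketched on toy data (the group labels and the frame are assumptions made for this example):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Age': [4.0, 30.0, np.nan, 14.0, np.nan, 50.0],
    'Survived': [True, False, True, False, False, True],
})

# start with 'adult', then overwrite the children and the rows without an age
age_group = pd.Series('adult', index=df.index)
age_group[df['Age'] < 15] = 'kid'
age_group[df['Age'].isna()] = 'unknown'

by_age = df.groupby(age_group)['Survived'].agg(survived='sum', total='count')

print(by_age)
```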


### 3.4 Survivors and their class

Did the passengers' social class affect survival? How many poor and how many wealthy passengers survived in the end? And what share of each group was that?

In [261]:
# solution
total_rich = len(passengers.loc[is_rich])
total_poor = len(passengers.loc[~is_rich])

survived_rich = len(passengers.loc[is_rich & survived])
survived_poor = len(passengers.loc[~is_rich & survived])

print(f'{survived_rich}/{total_rich} ({survived_rich/total_rich*100:.4}%)')
print(f'{survived_poor}/{total_poor} ({survived_poor/total_poor*100:.4}%)')

186/323 (57.59%)
308/986 (31.24%)
