Predicting Titanic Survivors and Others.
======================

Preamble
----------

In [2]:
import pandas as pd

Read the training and test data
-------------------------------

In [3]:
train_data = pd.read_csv('train.csv')
test_data = pd.read_csv('test.csv')

Examine a little bit...
--------------------------

In [132]:
print('Training data:')
print(train_data.head())
print('-------------------------')
print('Test data:')
print(test_data.head())

Training data:
   PassengerId  Survived  Pclass  \
0            1         0       3   
1            2         1       1   
2            3         1       3   
3            4         1       1   
4            5         0       3   

                                                Name     Sex   Age  SibSp  \
0                            Braund, Mr. Owen Harris    male  22.0      1   
1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   
2                             Heikkinen, Miss. Laina  female  26.0      0   
3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1   
4                           Allen, Mr. William Henry    male  35.0      0   

   Parch            Ticket     Fare Cabin Embarked  Child  
0      0         A/5 21171   7.2500   NaN        S      0  
1      0          PC 17599  71.2833   C85        C      0  
2      0  STON/O2. 3101282   7.9250   NaN        S      0  
3      0            113803  53.1000  C123        S      0  
4    

In [11]:
train_data.describe()

       PassengerId  Survived    Pclass        Age     SibSp     Parch       Fare
count        891.0     891.0     891.0      714.0     891.0     891.0      891.0
mean         446.0  0.383838  2.308642  29.699118  0.523008  0.381594  32.204208
std     257.353842  0.486592  0.836071  14.526497  1.102743  0.806057  49.693429
min            1.0       0.0       1.0       0.42       0.0       0.0        0.0
25%          223.5       0.0       2.0     20.125       0.0       0.0     7.9104
50%          446.0       0.0       3.0       28.0       0.0       0.0    14.4542
75%          668.5       1.0       3.0       38.0       1.0       0.0       31.0
max          891.0       1.0       3.0       80.0       8.0       6.0   512.3292


In [12]:
train_data.shape

(891, 12)

How many survived and how many passed away?
-------------------------------------------------

In [15]:
train_data['Survived'].value_counts()

0    549
1    342
Name: Survived, dtype: int64

In [16]:
train_data['Survived'].value_counts(normalize=True)

0    0.616162
1    0.383838
Name: Survived, dtype: float64

Let's see if gender is important...
----------------------------------------

In [62]:
train_data['Survived'][train_data.Sex == 'male'].value_counts(normalize=True)

0    0.811092
1    0.188908
Name: Survived, dtype: float64

In [64]:
train_data['Survived'][train_data.Sex == 'female'].value_counts(normalize=True)

1    0.742038
0    0.257962
Name: Survived, dtype: float64
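
The two filtered `value_counts` calls above can also be collapsed into a single `groupby`. The sketch below is only an illustration on a made-up toy frame (`toy` and its rows are hypothetical, not the real Titanic data); it relies on the fact that the mean of a 0/1 column equals the fraction of 1s.

```python
import pandas as pd

# Toy stand-in for train_data (made-up rows, NOT the real Titanic data).
toy = pd.DataFrame({
    'Sex':      ['male', 'male', 'male', 'female', 'female', 'female'],
    'Survived': [0,      0,      1,      1,        1,        0],
})

# The mean of a 0/1 column is the fraction of 1s, i.e. the survival rate.
survival_by_sex = toy.groupby('Sex')['Survived'].mean()
print(survival_by_sex)
```

On the real data, the same call on `train_data` should reproduce the survivor shares shown above (about 0.19 for men and 0.74 for women).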

**A little test: look at the two columns side by side.**

In [54]:
train_data.loc[:, ['Survived', 'Sex']]

   Survived     Sex
0         0    male
1         1  female
2         1  female
3         1  female
4         0    male
5         0    male
6         0    male
7         0    male
8         1  female
9         1  female


Maybe age is important too?
------------------------------------

Let's add a new column that marks who is a child (younger than 18) and who is not.

In [99]:
# Create a 'Child' column filled with 0s.
train_data['Child'] = 0

# Set 'Child' to 1 for passengers younger than 18.
# Note: a missing Age compares as False, so those rows stay 0 (counted as adults).
train_data.loc[train_data.Age < 18, 'Child'] = 1

# Uncomment to inspect the rows that were marked as children.
# print(train_data.loc[train_data.Child == 1, ('Child', 'Age')])

# Compare the survival rates of children and adults.
print('Children:')
print(train_data['Survived'][train_data.Child == 1].value_counts(normalize=True))
print('=============================')
print('Adult:')
print(train_data['Survived'][train_data.Child == 0].value_counts(normalize=True))

Children:
1    0.539823
0    0.460177
Name: Survived, dtype: float64
=============================
Adult:
0    0.638817
1    0.361183
Name: Survived, dtype: float64
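
One caveat about the comparison above: `train_data.Age < 18` is False wherever the age is missing, and the `describe()` output shows that only 714 of the 891 ages are recorded, so passengers of unknown age end up in the adult group. A minimal sketch of keeping them separate, on made-up rows; the `toy` frame and the `AgeGroup` column name are hypothetical and not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Toy stand-in for train_data (made-up rows, NOT the real Titanic data).
toy = pd.DataFrame({
    'Age':      [4.0, 17.0, 30.0, np.nan, 45.0, np.nan],
    'Survived': [1,   1,    0,    1,      0,    0],
})

# Start with 'unknown', then label only the rows whose age is recorded.
# Comparisons against NaN are False, so missing ages keep the default label.
toy['AgeGroup'] = 'unknown'
toy.loc[toy.Age < 18, 'AgeGroup'] = 'child'
toy.loc[toy.Age >= 18, 'AgeGroup'] = 'adult'

# Survival rate per group (mean of a 0/1 column).
rates = toy.groupby('AgeGroup')['Survived'].mean()
print(rates)
```

Whether a separate group for unknown ages is worth keeping is a judgment call; filling in missing ages with a typical value is another common option.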


Well, here is the first prediction.
----------------------------------------

Let's assume that every female passenger survived and every male passenger did not.

In [131]:
test = test_data.copy()

# Baseline rule: predict 1 (survived) for every woman and 0 for every man.
test['Survived'] = 0
test.loc[test.Sex == 'female', 'Survived'] = 1

# Save only the two submission columns. to_csv writes the
# 'PassengerId,Survived' header row itself, and index=False keeps the
# row index out of the file, so no manual header rewrite is needed.
test[['PassengerId', 'Survived']].to_csv('gender.csv', index=False)
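
The test file has no `Survived` column of its own (the code above has to create it), so the rule can only be checked locally against the training data. A minimal sketch of such a check on made-up rows; `toy_train`, `predicted`, and `accuracy` are hypothetical names used only for this illustration.

```python
import pandas as pd

# Toy stand-in for train_data (made-up rows, NOT the real Titanic data).
toy_train = pd.DataFrame({
    'Sex':      ['male', 'female', 'female', 'male', 'female'],
    'Survived': [0,      1,        0,        1,      1],
})

# Apply the same rule as above: predict 1 for women, 0 for men.
predicted = (toy_train.Sex == 'female').astype(int)

# Accuracy is the share of rows where the prediction matches the truth.
accuracy = (predicted == toy_train.Survived).mean()
print(accuracy)
```

Running the same two lines on `train_data` gives a rough idea of how well the gender rule fits the training set.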
