## Competition Description - Titanic - Predict Who Lives and Who Does Not! ##
The sinking of the RMS Titanic is one of the most infamous shipwrecks in history. On April 15, 1912, during her maiden voyage, the Titanic sank after colliding with an iceberg, killing 1502 out of 2224 passengers and crew. This sensational tragedy shocked the international community and led to better safety regulations for ships.

One of the reasons that the shipwreck led to such loss of life was that there were not enough lifeboats for the passengers and crew. Although there was some element of luck involved in surviving the sinking, some groups of people were more likely to survive than others, such as women, children, and the upper class.

In this challenge, we ask you to complete the analysis of what sorts of people were likely to survive. In particular, we ask you to apply the tools of machine learning to predict which passengers survived the tragedy.

<img src="files/RMS_Titanic.jpg">

## Notebook Contents - Purpose, Motivation, and Summary ##

The purpose of this notebook is to demonstrate a simple neural network and its ability to discover patterns in the provided data on its own. Thanks to large leaps in computing power and the availability of data, deep learning is achieving state-of-the-art performance on a variety of supervised and unsupervised learning problems. This is my first deep neural network built independently, without guides or published solutions. The notebook also explores advanced feature engineering tactics for filling in missing data, and the preprocessing steps needed to feed input data into a neural network built with Keras.

## CODE BEGINS HERE ##

### (1.) Load the Training & Testing Data ###

In [272]:
import pandas as pd
import numpy as np

train_data = pd.read_csv('train.csv')
test_data = pd.read_csv('test.csv')
nameId=test_data['PassengerId']

train_data


| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | | S |
| 5 | 6 | 0 | 3 | Moran, Mr. James | male | | 0 | 0 | 330877 | 8.4583 | | Q |
| 6 | 7 | 0 | 1 | McCarthy, Mr. Timothy J | male | 54.0 | 0 | 0 | 17463 | 51.8625 | E46 | S |
| 7 | 8 | 0 | 3 | Palsson, Master. Gosta Leonard | male | 2.0 | 3 | 1 | 349909 | 21.0750 | | S |
| 8 | 9 | 1 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | female | 27.0 | 0 | 2 | 347742 | 11.1333 | | S |
| 9 | 10 | 1 | 2 | Nasser, Mrs. Nicholas (Adele Achem) | female | 14.0 | 1 | 0 | 237736 | 30.0708 | | C |


### One-hot Encoding Categorical Data ###
Here, we identify which columns in our dataset are categorical and one-hot encode them, creating separate indicator columns for Pclass (1, 2, 3), Sex, and the port of embarkation. After creating the one-hot-encoded columns, we drop the original columns along with others, such as Ticket, Cabin, and PassengerId, that we judge to have little predictive value because of missing data or for other reasons.

In [273]:
#Convert categorical variables into one-hot (dummy) columns; unrelated or sparse variables are dropped below
dummy_fields = ['Sex','Embarked','Pclass']

for each in dummy_fields:
    #get_dummies creates one 0/1 column per category (e.g. Pclass_1, Pclass_2, Pclass_3);
    #dtype=int keeps 0/1 integers instead of the booleans newer pandas versions return
    dummies_train = pd.get_dummies(train_data[each], prefix=each, drop_first=False, dtype=int)
    dummies_test = pd.get_dummies(test_data[each], prefix=each, drop_first=False, dtype=int)

    train_data = pd.concat([train_data, dummies_train], axis=1)
    test_data = pd.concat([test_data, dummies_test], axis=1)

#Drop the original categorical columns plus Ticket, Cabin, and PassengerId on both datasets
fields_to_drop = ['Sex','Embarked','Ticket','Cabin','PassengerId','Pclass']

train_data = train_data.drop(columns=fields_to_drop)
test_data = test_data.drop(columns=fields_to_drop)
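Because `pd.get_dummies` is called on the training and test sets separately, the two frames only end up with identical dummy columns if every category appears in both splits, which happens to hold for Sex, Embarked, and Pclass here. A minimal sketch of a more defensive variant, assuming a hypothetical helper `one_hot_aligned` and toy feature frames without the `Survived` target, reindexes the test dummies to the training columns:

```python
import pandas as pd


def one_hot_aligned(train_features, test_features, columns):
    """One-hot encode `columns` in both frames (hypothetical helper).

    The test frame is reindexed to the training columns, so a category missing
    from the test split becomes an all-zero column, and a category that only
    appears in the test split is dropped.
    """
    train_enc = pd.get_dummies(train_features, columns=columns, dtype=int)
    test_enc = pd.get_dummies(test_features, columns=columns, dtype=int)
    test_enc = test_enc.reindex(columns=train_enc.columns, fill_value=0)
    return train_enc, test_enc


# Toy frames: the test split happens to contain only passengers who embarked at S
train_toy = pd.DataFrame({'Embarked': ['S', 'C', 'Q', 'S'], 'Fare': [10.0, 20.0, 30.0, 40.0]})
test_toy = pd.DataFrame({'Embarked': ['S', 'S'], 'Fare': [15.0, 25.0]})

train_enc, test_enc = one_hot_aligned(train_toy, test_toy, ['Embarked'])
print(list(test_enc.columns))  # ['Fare', 'Embarked_C', 'Embarked_Q', 'Embarked_S']
```

Reindexing against the training columns also fixes the column order, which matters because the network sees features by position, not by name.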

### (2.) Data Preprocessing Steps ###
#### Modify Age Column - for NaN, Feature Engineering ####

Age data is missing for approximately 20% of the passengers in the training set (177 of 891). Simply filling every gap with the average age could mislead the neural network as it optimizes its weights, degrading its estimate of whether a passenger survived. Previous submissions found patterns in the Name and Fare data that allow grouping passengers into families and/or identifying nannies. For example, if a mother is traveling with children whose ages are missing, we could assign those children a lower age estimate. Inspection of the data also shows a pattern for young boys: their Name contains the title Master, whereas adult males (Age > 15 years old) are titled Mr.

In [274]:
#Average training-set age, rounded to a whole number; the training set has 177 missing ages
avg_age_train = int(round(train_data['Age'].mean()))

#Check how many Age values are missing; approximately 20% of the training data
numNaNsAge = train_data['Age'].isnull().sum()

#Check for missing Fare values: none in the training set, but the test set has one,
#which is filled with the training median so the network never receives a NaN
numNaNsFare = train_data['Fare'].isnull().sum()
test_data['Fare'] = test_data['Fare'].fillna(train_data['Fare'].median())

### Getting An Approximation for Age ###
def impute_age(df):
    missing = df['Age'].isnull()
    #'Master' marks young boys, so they get an age of 8; every other missing age
    #(Mr., Mrs., Miss, or no recognised title) gets the training average
    conditions = [df['Name'].str.contains('Master', regex=False) & missing,
                  missing]
    choices = [8, avg_age_train]
    df['Age'] = np.select(conditions, choices, default=df['Age'])
    #Name is no longer needed once Age has been filled in
    return df.drop(columns='Name')

train_data = impute_age(train_data)
test_data = impute_age(test_data)

In [275]:
train_data['Age'].isnull().sum()
train_data.head()

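| | Survived | Age | SibSp | Parch | Fare | Sex_female | Sex_male | Embarked_C | Embarked_Q | Embarked_S | Pclass_1 | Pclass_2 | Pclass_3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 22.0 | 1 | 0 | 7.25 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 |
| 1 | 1 | 38.0 | 1 | 0 | 71.2833 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
| 2 | 1 | 26.0 | 0 | 0 | 7.925 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
| 3 | 1 | 35.0 | 1 | 0 | 53.1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
| 4 | 0 | 35.0 | 0 | 0 | 8.05 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 |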
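The imputation above gives every passenger titled Master an age of 8 and every other missing age the training mean. A common refinement, sketched here as an alternative rather than the notebook's method, parses the title out of Name and fills each missing age with the median age of training passengers who share that title. `impute_age_by_title` is a hypothetical helper, the names below are synthetic, titles unseen in training fall back to the overall training median, and it has to run on the raw frames before Name is dropped:

```python
import numpy as np
import pandas as pd

TITLE_PATTERN = r',\s*([^.]+)\.'  # the word(s) between the comma and the next period, e.g. "Master"


def impute_age_by_title(train, test):
    """Fill missing ages with the training-set median age of the same title (hypothetical helper)."""
    train_titles = train['Name'].str.extract(TITLE_PATTERN, expand=False)
    test_titles = test['Name'].str.extract(TITLE_PATTERN, expand=False)

    # Medians come from the training set only; median() ignores NaN ages
    title_medians = train.groupby(train_titles)['Age'].median()
    overall_median = train['Age'].median()

    def fill(df, titles):
        # Look up each passenger's title median, falling back to the overall median
        estimate = titles.map(title_medians).fillna(overall_median)
        return df['Age'].fillna(estimate)

    return fill(train, train_titles), fill(test, test_titles)


# Synthetic passengers in the "Surname, Title. Given" format used by the Name column
train_toy = pd.DataFrame({
    'Name': ['A, Master. X', 'B, Master. Y', 'C, Mr. Z', 'D, Mr. W', 'E, Miss. V'],
    'Age': [4.0, np.nan, 30.0, np.nan, 20.0],
})
test_toy = pd.DataFrame({
    'Name': ['F, Master. U', 'G, Rev. T'],
    'Age': [np.nan, np.nan],
})

train_ages, test_ages = impute_age_by_title(train_toy, test_toy)
print(train_ages.tolist())  # [4.0, 4.0, 30.0, 30.0, 20.0]
print(test_ages.tolist())   # [4.0, 20.0]
```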


### (3.) Prepare Continuous Input Variables Columns - Feature Scaling ###

After one-hot encoding the categorical data, we turn to rescaling the continuous features: Age, SibSp, Parch, and Fare. One option rescales each feature to lie between 0 and 1. Without rescaling, features with large numeric ranges (such as Fare) dominate the weighted sums and gradient updates during back-propagation, which slows training and can lead to poor predictions. The other method we present standardizes each feature: subtracting its mean and dividing by its standard deviation produces a new continuous variable with zero mean and a standard deviation of 1. (This shifts and scales the feature; it does not make it normally distributed.) The latter approach yielded higher accuracy on the training dataset.

In [276]:
norm_columns = ['Age','SibSp','Parch','Fare'] 
scaled_features_train = {} #store scaling values for conversion back to original values later on
scaled_features_test = {} #store scaling values for conversion back to original values later on

#Option 1: rescale continuous variables to the range [0, 1] (min-max scaling) # Yielded 90% test accuracy with 5,000 epochs #
#for each in norm_columns:
#    max_train, min_train = train_data[each].max(), train_data[each].min()
#    scaled_features_train[each] = [max_train, min_train]
#    train_data[each] = (train_data[each] - min_train)/(max_train - min_train)
#
#    max_test, min_test = test_data[each].max(), test_data[each].min()
#    scaled_features_test[each] = [max_test, min_test]
#    test_data[each] = (test_data[each] - min_test)/(max_test - min_test)

#Option 2 (used here): standardize continuous variables to zero mean and a standard deviation of 1 # Yielded 91% accuracy with 50 epochs
for each in norm_columns:
    mean_train, std_train = train_data[each].mean(), train_data[each].std()
    scaled_features_train[each] = [mean_train, std_train]
    #Assign the whole column so integer columns (SibSp, Parch) can become floats
    train_data[each] = (train_data[each] - mean_train)/std_train

    mean_test, std_test = test_data[each].mean(), test_data[each].std()
    scaled_features_test[each] = [mean_test, std_test]
    test_data[each] = (test_data[each] - mean_test)/std_test

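The cell above standardizes the test set with the test set's own mean and standard deviation, so the same raw Age or Fare can map to slightly different scaled values in the two splits. A common alternative is to compute the statistics on the training set only and reuse them for the test set. The following is a minimal sketch of that approach using hypothetical helpers (`fit_scaling`, `standardize`, `min_max`) and toy ages; it also spells out both formulas the section describes, z = (x - mean) / std and x' = (x - min) / (max - min):

```python
import pandas as pd


def fit_scaling(train, columns):
    """Record mean, std, min, and max for each column, using the training frame only."""
    return {c: {'mean': train[c].mean(), 'std': train[c].std(),
                'min': train[c].min(), 'max': train[c].max()} for c in columns}


def standardize(df, stats):
    """z = (x - mean) / std, with mean and std taken from the training set."""
    out = df.copy()
    for c, s in stats.items():
        out[c] = (out[c] - s['mean']) / s['std']
    return out


def min_max(df, stats):
    """x' = (x - min) / (max - min); test values outside the training range land outside [0, 1]."""
    out = df.copy()
    for c, s in stats.items():
        out[c] = (out[c] - s['min']) / (s['max'] - s['min'])
    return out


train_toy = pd.DataFrame({'Age': [10.0, 20.0, 30.0]})
test_toy = pd.DataFrame({'Age': [20.0, 40.0]})

stats = fit_scaling(train_toy, ['Age'])   # mean 20, sample std 10, min 10, max 30
print(standardize(test_toy, stats)['Age'].tolist())  # [0.0, 2.0]
print(min_max(test_toy, stats)['Age'].tolist())      # [0.5, 1.5]
```

Note that pandas' `std()` uses the sample standard deviation (ddof=1) by default, matching the notebook's own scaling code.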

### Display a Sample of Our Preprocessed Input Data ###

Here we can see a sample of the preprocessed data that will be sent into our neural network. It is a mix of continuous and categorical data.

In [277]:
train_data.head()
test_data.head()

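| | Age | SibSp | Parch | Fare | Sex_female | Sex_male | Embarked_C | Embarked_Q | Embarked_S | Pclass_1 | Pclass_2 | Pclass_3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.350568 | -0.498872 | -0.399769 | -0.497213 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 |
| 1 | 1.325664 | 0.616254 | -0.399769 | -0.512045 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
| 2 | 2.495779 | -0.498872 | -0.399769 | -0.463974 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
| 3 | -0.23449 | -0.498872 | -0.399769 | -0.482308 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 |
| 4 | -0.624528 | 0.616254 | 0.619154 | -0.417469 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |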


### (4.) Separate Inputs & Outputs for the Training Dataset ###

In [278]:
train_inputs= train_data.drop(columns=['Survived'], axis = 1)
train_targets = train_data['Survived']

In [279]:
train_inputs.head()

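| | Age | SibSp | Parch | Fare | Sex_female | Sex_male | Embarked_C | Embarked_Q | Embarked_S | Pclass_1 | Pclass_2 | Pclass_3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -0.585468 | 0.43255 | -0.473408 | -0.502163 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 |
| 1 | 0.637422 | 0.43255 | -0.473408 | 0.786404 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
| 2 | -0.279746 | -0.474279 | -0.473408 | -0.48858 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
| 3 | 0.40813 | 0.43255 | -0.473408 | 0.420494 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
| 4 | 0.40813 | -0.474279 | -0.473408 | -0.486064 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 |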


### (5.) Construct Simple Neural Network for Classification Task ###

#### Convert the pandas DataFrames into NumPy arrays, the form Keras expects. ####

In [280]:
import keras

# Separate the data and one-hot encode the output
# Note: we also turn the data into float NumPy arrays in order to train the model in Keras;
# the explicit dtype avoids object arrays if any dummy columns are boolean

features = train_inputs.to_numpy(dtype=float)
targets = keras.utils.to_categorical(train_targets, 2)

features_test = test_data.to_numpy(dtype=float)

print('These are our input features. They are now numpy arrays with 12 columns.')
print(features[:10])
print(" ")
print('These are our targets. Each row describes whether or not a passenger lives or dies.')
print(targets[:10])

These are our input features. They are now numpy arrays with 12 columns.
[[-0.58546824  0.43255043 -0.47340772 -0.50216314  0.          1.          0.
   0.          1.          0.          0.          1.        ]
 [ 0.63742222  0.43255043 -0.47340772  0.78640362  1.          0.          1.
   0.          0.          1.          0.          0.        ]
 [-0.27974563 -0.47427882 -0.47340772 -0.48857985  1.          0.          0.
   0.          1.          0.          0.          1.        ]
 [ 0.40813026  0.43255043 -0.47340772  0.42049407  1.          0.          0.
   0.          1.          1.          0.          0.        ]
 [ 0.40813026 -0.47427882 -0.47340772 -0.48606443  0.          1.          0.
   0.          1.          0.          0.          1.        ]
 [ 0.02597699 -0.47427882 -0.47340772 -0.47784805  0.          1.          0.
   1.          0.          0.          0.          1.        ]
 [ 1.86031268 -0.47427882 -0.47340772  0.39559138  0.          1.          0.
   

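`keras.utils.to_categorical` turns each integer label into a one-hot row, so `Survived = 1` becomes `[0, 1]`, which matches the two-unit softmax output layer. A minimal NumPy sketch of the same mapping (`one_hot_labels` is a hypothetical helper, not a Keras function) also shows why an explicit float cast is useful: a frame that mixes float columns with the boolean columns `pd.get_dummies` returns by default in pandas 2.0 and later converts to an `object` array otherwise.

```python
import numpy as np
import pandas as pd


def one_hot_labels(labels, num_classes):
    """Row i of the result has 1.0 in column labels[i] and 0.0 elsewhere."""
    labels = np.asarray(labels, dtype=int)
    return np.eye(num_classes, dtype='float32')[labels]


survived = pd.Series([0, 1, 1, 0])
targets_toy = one_hot_labels(survived, 2)
print(targets_toy.tolist())  # [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]

# A toy frame mixing a float column with a boolean dummy column
inputs_toy = pd.DataFrame({'Age': [-0.5, 0.5], 'Sex_female': [False, True]})
print(inputs_toy.to_numpy().dtype)                   # object
features_toy = inputs_toy.to_numpy(dtype='float32')  # booleans become 0.0 / 1.0
print(features_toy.dtype)                            # float32
```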
### (6.) Build and Compile the Neural Network Model Using Keras ###

In [281]:
# Import Necessary Libraries (keras.layers works across Keras versions; keras.layers.core and np_utils do not)
from keras.models import Sequential
from keras.layers import Dense, Dropout

# Building the model
model = Sequential()
model.add(Dense(128, activation='relu', input_shape=(12,)))
model.add(Dropout(.3))
model.add(Dense(64, activation='relu'))
model.add(Dropout(.2))
model.add(Dense(32, activation='relu'))
model.add(Dropout(.1))
model.add(Dense(16, activation='relu'))
model.add(Dropout(.05))
model.add(Dense(2, activation='softmax'))


#model = Sequential() 88% with 8 batch size and 50 epochs
#model.add(Dense(16, activation='relu', input_shape=(12,)))
#model.add(Dropout(.1))
#model.add(Dense(8, activation='relu'))
#model.add(Dropout(.05))
#model.add(Dense(4, activation='relu'))
#model.add(Dropout(.025))
#model.add(Dense(2, activation='softmax'))

# Compiling the model
model.compile(loss = 'categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_36 (Dense)             (None, 128)               1664      
_________________________________________________________________
dropout_29 (Dropout)         (None, 128)               0         
_________________________________________________________________
dense_37 (Dense)             (None, 64)                8256      
_________________________________________________________________
dropout_30 (Dropout)         (None, 64)                0         
_________________________________________________________________
dense_38 (Dense)             (None, 32)                2080      
_________________________________________________________________
dropout_31 (Dropout)         (None, 32)                0         
_________________________________________________________________
dense_39 (Dense)             (None, 16)                528       
__________
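The `Param #` column follows from a simple rule: a Dense layer with n_in inputs and n_out units stores n_in * n_out weights plus n_out biases, and Dropout layers have no parameters. A short check of that arithmetic for this architecture (12 input features, then 128, 64, 32, 16, and 2 units); the final Dense layer is cut off in the summary output above, so its count here comes from the rule rather than from the printed summary:

```python
def dense_params(n_in, n_out):
    """Weights (one per input-unit pair) plus one bias per unit."""
    return n_in * n_out + n_out


layer_sizes = [12, 128, 64, 32, 16, 2]  # input width, then each Dense layer's units
counts = [dense_params(n_in, n_out) for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
print(counts)       # [1664, 8256, 2080, 528, 34]
print(sum(counts))  # 12562
```

The first four values match the summary rows for dense_36 through dense_39, which is a quick way to confirm the model was built with the intended input width.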

### Notes for Building a Neural Network Architecture ###


### (7.) Train the Neural Network and Set Network Hyperparameters ###

In [282]:
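import os
from keras.callbacks import ModelCheckpoint
# features and targets are NumPy arrays, just like in the Scikit-Learn API.

#Set hyperparameters
epochs = 50
batch_size = 8

#save_best_only compares a monitored quantity across epochs; its default is val_loss,
#which does not exist without validation data, so monitor the training loss instead
os.makedirs('saved_models', exist_ok=True)
checkpointer = ModelCheckpoint(filepath='saved_models/weights.best.from_scratch.hdf5',
                               monitor='loss', verbose=1, save_best_only=True)

model.fit(features, targets,
          epochs=epochs, batch_size=batch_size, callbacks=[checkpointer], verbose=1)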

Epoch 1/50
...
Epoch 50/50



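With `batch_size = 8`, each epoch walks through the 891 training passengers in ceil(891 / 8) = 112 mini-batches, the last one holding the remaining 3 passengers, so 50 epochs amount to 5,600 weight updates. `model.fit` shuffles the training arrays before each epoch by default (`shuffle=True`). The sketch below only illustrates that batching arithmetic in NumPy; `minibatch_indices` is a hypothetical helper, not how Keras implements batching internally:

```python
import numpy as np


def minibatch_indices(n_samples, batch_size, rng):
    """Shuffle the sample indices once and cut them into consecutive batches."""
    order = rng.permutation(n_samples)
    return [order[start:start + batch_size] for start in range(0, n_samples, batch_size)]


batches = minibatch_indices(891, 8, np.random.default_rng(0))
print(len(batches))      # 112
print(len(batches[-1]))  # 3
```

Smaller batches mean more, noisier updates per epoch, which is one reason a batch size of 8 can reach a good fit in only 50 epochs on a dataset this small.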
### (8.) Score the Model ###

In [283]:
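# Evaluate the model on the training set (the Kaggle test set has no Survived labels)
score = model.evaluate(features, targets)
print("\n Training Accuracy:", score[1])

# Softmax probabilities for [died, survived] for each test passenger
prediction = model.predict(features_test)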


 Training Accuracy: 0.861952862622


#### Prepare Predictions for Export into the Kaggle Submission Format ####

In [284]:
pred=pd.DataFrame(prediction)
pred = pd.concat([nameId, pred],axis=1)
pred['Survived'] = np.where(pred[0] > pred[1], 0, 1)
drop = [0,1]
submission=pred.drop(columns=drop, axis =1)
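Comparing the two softmax columns with `np.where(pred[0] > pred[1], 0, 1)` picks the more probable class for each passenger. An equivalent, slightly more compact sketch takes the argmax of each probability row and builds the two-column submission directly; `to_submission` is a hypothetical helper and the probabilities are toy values. One small difference: on an exact tie, `argmax` returns class 0, while the `np.where` comparison returns 1.

```python
import numpy as np
import pandas as pd


def to_submission(passenger_ids, probabilities):
    """Pair each PassengerId with the index of its largest predicted probability."""
    labels = np.asarray(probabilities).argmax(axis=1)
    return pd.DataFrame({'PassengerId': np.asarray(passenger_ids), 'Survived': labels})


toy_probabilities = np.array([[0.9, 0.1], [0.3, 0.7], [0.55, 0.45]])
submission_toy = to_submission([1, 2, 3], toy_probabilities)
print(submission_toy['Survived'].tolist())  # [0, 1, 0]
```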

### (9.) Export Dataframe as a csv file ###

In [285]:
submission.to_csv('Adrian_Lievano_Titanic_Submission.csv', index=False)

## Discussion ##

Deep learning algorithms are powerful; they can find patterns that you and I would have trouble spotting. The most significant challenges are that these models typically require large amounts of data and tend to overfit the training data, so data quality is extremely important. Adding the simple modifier in which the title 'Master' indicates a young child improved our model's accuracy with only 50 epochs and a batch size of 8. Rescaling the continuous variables to zero mean and unit standard deviation also consistently increased test-set accuracy.
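Because the accuracy reported in step (8.) is measured on the same passengers the network was trained on, it can hide overfitting. One minimal way to estimate generalization, sketched here as an assumption rather than something this notebook does, is to hold out a stratified slice of the training rows and pass it to Keras through `model.fit(..., validation_data=(X_val, y_val))`. `stratified_holdout` is a hypothetical helper that samples the hold-out set separately within each class, so survivors make up roughly the same share of both parts:

```python
import numpy as np


def stratified_holdout(labels, holdout_fraction, rng):
    """Return (train_idx, val_idx), sampling the hold-out set separately within each class."""
    labels = np.asarray(labels)
    train_parts, val_parts = [], []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_val = int(round(len(idx) * holdout_fraction))
        val_parts.append(idx[:n_val])
        train_parts.append(idx[n_val:])
    return np.concatenate(train_parts), np.concatenate(val_parts)


# Toy labels: 60 passengers who died (0) and 40 who survived (1)
y_toy = np.array([0] * 60 + [1] * 40)
train_idx, val_idx = stratified_holdout(y_toy, 0.2, np.random.default_rng(42))
print(len(train_idx), len(val_idx))  # 80 20
print(y_toy[val_idx].mean())         # 0.4
```

With the notebook's arrays, `features[train_idx]` and `targets[train_idx]` would go to training, and `features[val_idx]` and `targets[val_idx]` to `validation_data`; a widening gap between training and validation accuracy over the epochs is the usual sign of overfitting.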