## Day 77 Lecture 1 Assignment

In this assignment, we will learn about activation functions. We will create a neural network and measure the model's performance using different activations.

In [1]:
import numpy as np
import pandas as pd

We will import the famous Titanic dataset below and build a neural network that predicts whether a passenger survived.

In [2]:
titanic = pd.read_csv('https://tf-assets-prod.s3.amazonaws.com/tf-curric/data-science/titanic.csv')

In [3]:
titanic.head()

| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | | S |


We'll perform some feature engineering.

Let's start by keeping only the columns we'd like to use for our analysis. Keep only the columns Survived, Pclass, Sex, SibSp, Parch, and Embarked.

In [4]:
# Answer below:
titanic = titanic[['Survived', 'Pclass', 'SibSp', 'Sex', 'Parch', 'Embarked']]
titanic.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 6 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   Survived  891 non-null    int64 
 1   Pclass    891 non-null    int64 
 2   SibSp     891 non-null    int64 
 3   Sex       891 non-null    object
 4   Parch     891 non-null    int64 
 5   Embarked  889 non-null    object
dtypes: int64(4), object(2)
memory usage: 41.9+ KB


Now examine how many rows contain missing data. Given how much missing data we have, should we remove the column with the most missing data, or remove all rows containing missing data? Do what you think is best.

In [5]:
# Answer below:
# 'Embarked' is the only column with missing values, and it has just 2
# (889 of 891 non-null), so we drop those rows rather than the whole column

titanic = titanic.dropna()
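
To back up the decision above with numbers, the missing values can be counted directly before dropping anything. The sketch below uses a tiny made-up frame called `toy` (an assumption for illustration, not the real Titanic data) so that it runs on its own; the same calls should work on `titanic`.

```python
import numpy as np
import pandas as pd

# Tiny stand-in frame with made-up values (not the real Titanic data)
toy = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Embarked": ["S", np.nan, "C", "S"],
})

# Number of missing values in each column
missing_per_column = toy.isna().sum()

# Number of rows that contain at least one missing value
rows_with_missing = int(toy.isna().any(axis=1).sum())

print(missing_per_column)
print("rows with missing data:", rows_with_missing)

# dropna() with default arguments removes every row containing a NaN
cleaned = toy.dropna()
print("rows kept:", len(cleaned))
```

When only a handful of rows are affected, as here, dropping rows loses far less information than dropping a whole column.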


Now we'll create a one-hot encoding of the variables Pclass, Sex, and Embarked.

In [6]:
# Answer below:
titanic = pd.get_dummies(titanic, columns=['Pclass', 'Sex', 'Embarked'], drop_first=True)
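
The `drop_first=True` argument used above removes the first category of each encoded variable, so a variable with k categories becomes k - 1 indicator columns. A minimal sketch on a made-up frame (the `toy` data is an assumption for illustration):

```python
import pandas as pd

# Made-up rows covering every category of each variable
toy = pd.DataFrame({
    "Pclass": [1, 3, 2],
    "Sex": ["male", "female", "female"],
})

# Without drop_first: one column per category
full = pd.get_dummies(toy, columns=["Pclass", "Sex"])

# With drop_first: the first category (in sorted order) of each variable is dropped
reduced = pd.get_dummies(toy, columns=["Pclass", "Sex"], drop_first=True)

print(list(full.columns))
print(list(reduced.columns))
```

The dropped category is still recoverable: a row with `Pclass_2 == 0` and `Pclass_3 == 0` must be first class.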


In [7]:
titanic.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 889 entries, 0 to 890
Data columns (total 8 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   Survived    889 non-null    int64
 1   SibSp       889 non-null    int64
 2   Parch       889 non-null    int64
 3   Pclass_2    889 non-null    uint8
 4   Pclass_3    889 non-null    uint8
 5   Sex_male    889 non-null    uint8
 6   Embarked_Q  889 non-null    uint8
 7   Embarked_S  889 non-null    uint8
dtypes: int64(3), uint8(5)
memory usage: 32.1 KB


Split the data into train and test. 20% of the data should be set aside for testing. Use Survived as your target variable.

In [8]:
# Answer below
from sklearn.model_selection import train_test_split as tts 
X = titanic.drop(columns=['Survived'])
y = titanic[['Survived']]

X_train, X_test, y_train, y_test = tts(X, y, test_size=0.2)
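
The split above is random, so the rows that land in the test set (and therefore the scores later on) change from run to run. The sketch below shows one way to make a split repeatable and class-balanced. Note that `random_state=42` and `stratify` are assumptions added here for illustration and are not used in the answer cell; the arrays `X_toy` and `y_toy` are made up.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 20 samples, 2 features, balanced binary labels
X_toy = np.arange(40).reshape(20, 2)
y_toy = np.array([0] * 10 + [1] * 10)

# random_state fixes the shuffle; stratify keeps the class ratio in both parts
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)

# Repeating the call with the same random_state gives the same split
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)

print(X_tr.shape, X_te.shape)
print("positives in test set:", int(y_te.sum()))
print("same split on repeat:", bool(np.array_equal(X_te, X_te2)))
```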

In [9]:
X_train.head()

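| | SibSp | Parch | Pclass_2 | Pclass_3 | Sex_male | Embarked_Q | Embarked_S |
|---|---|---|---|---|---|---|---|
| 443 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| 639 | 1 | 0 | 0 | 1 | 1 | 0 | 1 |
| 213 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| 323 | 1 | 1 | 1 | 0 | 0 | 0 | 1 |
| 386 | 5 | 2 | 0 | 1 | 1 | 0 | 1 |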


In [10]:
X_test.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 178 entries, 786 to 455
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   SibSp       178 non-null    int64
 1   Parch       178 non-null    int64
 2   Pclass_2    178 non-null    uint8
 3   Pclass_3    178 non-null    uint8
 4   Sex_male    178 non-null    uint8
 5   Embarked_Q  178 non-null    uint8
 6   Embarked_S  178 non-null    uint8
dtypes: int64(2), uint8(5)
memory usage: 5.0 KB


At this point, we are ready to create a model. Import `Sequential` and `Dense` from Keras.

In [11]:
# Answer below:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense


Create a model with 5 layers. The first layer should be a dense layer that receives the input, the last layer should be of size 1. You determine the remaining layer sizes.

Use a tanh activation for the output layer.

In [12]:
# Answer below
model1 = Sequential()

# first layer, input
model1.add(Dense(128, input_dim=X_train.shape[1], activation='relu'))

# second layer
model1.add(Dense(64, activation='relu'))

# third layer
model1.add(Dense(32, activation='relu'))

# fourth layer
model1.add(Dense(16, activation='relu'))

# fifth layer, output
model1.add(Dense(1, activation='tanh'))


In [13]:
model1.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 128)               1024      
_________________________________________________________________
dense_1 (Dense)              (None, 64)                8256      
_________________________________________________________________
dense_2 (Dense)              (None, 32)                2080      
_________________________________________________________________
dense_3 (Dense)              (None, 16)                528       
_________________________________________________________________
dense_4 (Dense)              (None, 1)                 17        
Total params: 11,905
Trainable params: 11,905
Non-trainable params: 0
_________________________________________________________________
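
The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit. The short sketch below redoes that arithmetic for the layer sizes used above, with 7 input features; `dense_params` is a helper written for this illustration, not a Keras function.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# 7 input features, then the five Dense layers from the model above
layer_sizes = [7, 128, 64, 32, 16, 1]

per_layer = [
    dense_params(n_in, n_out)
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
total = sum(per_layer)

print(per_layer)  # [1024, 8256, 2080, 528, 17]
print(total)      # 11905
```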


Compile the model using the adam optimizer, binary crossentropy loss, and the accuracy metric.

Fit the model using a batch size of 80 over 200 epochs.

In [14]:
# Answer below:
model1.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
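
Binary crossentropy treats the model output as the probability of the positive class and penalizes confident wrong answers heavily. The function below is a plain NumPy sketch of the formula, not Keras's own implementation; the name `binary_crossentropy_np`, the clipping constant `eps`, and the example predictions are assumptions for illustration.

```python
import numpy as np

def binary_crossentropy_np(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip so that log() never receives 0 or a value outside (0, 1)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    losses = -(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))
    return float(np.mean(losses))

y_true = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]   # mostly right and fairly sure
unsure = [0.5, 0.5, 0.5, 0.5]      # always predicts 50%

loss_confident = binary_crossentropy_np(y_true, confident)
loss_unsure = binary_crossentropy_np(y_true, unsure)

print("confident:", loss_confident)
print("unsure:   ", loss_unsure)   # equals ln(2), about 0.693
```

Because the formula takes logarithms of `p` and `1 - p`, it only makes sense for outputs between 0 and 1, which is worth keeping in mind when choosing the activation of the last layer.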

In [15]:
model1.fit(X_train, y_train, batch_size=80, epochs=200, verbose=2)

In [16]:
model1_score = model1.evaluate(X_test, y_test, verbose=2)

print('Test score: ', model1_score[0])
print('Test accuracy: ', model1_score[1])

6/6 - 0s - loss: 0.7716 - accuracy: 0.7697
Test score:  0.7715847492218018
Test accuracy:  0.7696629166603088


Redefine the model using a sigmoid activation for the last layer. What is the difference in accuracy?

In [17]:
# Answer below
model2 = Sequential()

# first layer, input
model2.add(Dense(128, input_dim=X_train.shape[1], activation='relu'))

# second layer
model2.add(Dense(64, activation='relu'))

# third layer
model2.add(Dense(32, activation='relu'))

# fourth layer
model2.add(Dense(16, activation='relu'))

# fifth layer, output
model2.add(Dense(1, activation='sigmoid'))



In [18]:
model2.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

In [19]:
model2.fit(X_train, y_train, batch_size=80, epochs=200, verbose=2)

In [20]:
model2_score = model2.evaluate(X_test, y_test, verbose=2)

print('Test score: ', model2_score[0])
print('Test accuracy: ', model2_score[1])

6/6 - 0s - loss: 0.5198 - accuracy: 0.7921
Test score:  0.5197842121124268
Test accuracy:  0.7921348214149475
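
In these runs the sigmoid model scored a test accuracy of about 0.792 against 0.770 for the tanh model, and a test loss of 0.520 against 0.772. Since neither the split nor the weight initialization was seeded, part of that gap may be run-to-run noise. One likely contributor is the output range: tanh produces values in (-1, 1), while sigmoid stays in (0, 1) and can be read as a probability. The NumPy sketch below compares the two on made-up pre-activation values `z`.

```python
import numpy as np

def sigmoid(x):
    """Logistic function, output strictly between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-x))

# Made-up pre-activation values for the output unit
z = np.linspace(-5, 5, 11)

tanh_out = np.tanh(z)
sigmoid_out = sigmoid(z)

print("tanh range:   ", tanh_out.min(), "to", tanh_out.max())
print("sigmoid range:", sigmoid_out.min(), "to", sigmoid_out.max())

# The two are related by a shift and rescale: sigmoid(z) = (tanh(z / 2) + 1) / 2
rescaled_tanh = (np.tanh(z / 2) + 1) / 2
print("identity holds:", bool(np.allclose(sigmoid_out, rescaled_tanh)))
```

Negative tanh outputs are not valid probabilities, so a loss that expects probabilities cannot score them meaningfully.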


In [21]:
import tensorflow
print(tensorflow.__version__)


2.3.0
