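# Building deep learning models with Keras
Keras is a high-level interface for building models on top of TensorFlow.
Building a model takes 4 steps:
- We define the architecture (number of layers, number of nodes per layer, which activation function, ...).
- We compile (specify which optimizer to use, its learning rate, and the loss function).
- We fit the model to the data.
- We use it to make predictions.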

This is what a basic architecture looks like:

In [1]:
import numpy as np
from keras.layers import Dense
from keras.models import Sequential
import pandas as pd

Using TensorFlow backend.


In [None]:
# Read the data
predictors = np.loadtxt('predictors_data.csv', delimiter=',')
# Number of input nodes = number of columns (features): n_cols
n_cols = predictors.shape[1]
# In a Sequential model, each layer is connected only to the layer directly before it
# (except the input layer, which has no previous layer).
# More complex, non-sequential architectures also exist.
model = Sequential()

# Now we add layers.
# Dense: every node in the previous layer connects to every node in the next layer.
# The first layer needs its input shape specified. (n_cols,) means each data point has
# n_cols values; any number of rows (data points) is allowed.
model.add(Dense(100, activation='relu', input_shape=(n_cols,)))

model.add(Dense(100, activation='relu'))

# Output layer: here a single node.
model.add(Dense(1))
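A Dense layer computes activation(inputs · W + b). To make that concrete, here is a minimal NumPy sketch of a forward pass through an architecture like the one above. The random weights, the 9-feature input, and the helper names `relu` and `dense_forward` are hypothetical, for illustration only; Keras creates and trains its own weights internally.

```python
import numpy as np

def relu(z):
    # ReLU activation: max(0, z), applied elementwise
    return np.maximum(0.0, z)

def dense_forward(x, weights, bias, activation=None):
    # Every input node feeds every output node: (rows, n_in) @ (n_in, n_out) + (n_out,)
    z = x @ weights + bias
    return activation(z) if activation is not None else z

rng = np.random.default_rng(0)
n_rows, n_cols = 4, 9                       # hypothetical data: 4 rows, 9 features
x = rng.normal(size=(n_rows, n_cols))

# Hypothetical weights shaped like Dense(100, relu) -> Dense(100, relu) -> Dense(1)
w1, b1 = rng.normal(size=(n_cols, 100)), np.zeros(100)
w2, b2 = rng.normal(size=(100, 100)), np.zeros(100)
w3, b3 = rng.normal(size=(100, 1)), np.zeros(1)

h1 = dense_forward(x, w1, b1, relu)
h2 = dense_forward(h1, w2, b2, relu)
output = dense_forward(h2, w3, b3)         # linear output layer with a single node

print(output.shape)                         # one prediction per row
```

Each Dense layer holds n_in × n_out weights plus n_out biases; for the first layer that is n_cols * 100 + 100 parameters, which is the kind of per-layer count `model.summary()` reports.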

### Specifying a model
Now you'll get to work with your first model in Keras, and will immediately be able to run more complex neural network models on larger datasets than in the first two chapters.

To start, you'll take the skeleton of a neural network and add a hidden layer and an output layer. You'll then fit that model and see Keras do the optimization so your model continually gets better.

As a start, you'll predict workers' wages based on characteristics like their industry, education, and level of experience. You can find the dataset in a pandas DataFrame called df. For convenience, everything in df except for the target has been converted to a NumPy matrix called predictors. The target, wage_per_hour, is available as a NumPy matrix called target.

For all exercises in this chapter, we've imported the Sequential model constructor, the Dense layer constructor, and pandas.

In [2]:
df = pd.read_csv('hourly_wages.csv.txt')

In [3]:
df.head()

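| | wage_per_hour | union | education_yrs | experience_yrs | age | female | marr | south | manufacturing | construction |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5.1 | 0 | 8 | 21 | 35 | 1 | 1 | 0 | 1 | 0 |
| 1 | 4.95 | 0 | 9 | 42 | 57 | 1 | 1 | 0 | 1 | 0 |
| 2 | 6.67 | 0 | 12 | 1 | 19 | 0 | 0 | 0 | 1 | 0 |
| 3 | 4.0 | 0 | 12 | 4 | 22 | 0 | 0 | 0 | 0 | 0 |
| 4 | 7.5 | 0 | 12 | 17 | 35 | 0 | 1 | 0 | 0 | 0 |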


In [7]:
predictors = df.drop(['wage_per_hour'], axis=1)
predictors.head()

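| | union | education_yrs | experience_yrs | age | female | marr | south | manufacturing | construction |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 8 | 21 | 35 | 1 | 1 | 0 | 1 | 0 |
| 1 | 0 | 9 | 42 | 57 | 1 | 1 | 0 | 1 | 0 |
| 2 | 0 | 12 | 1 | 19 | 0 | 0 | 0 | 1 | 0 |
| 3 | 0 | 12 | 4 | 22 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0 | 12 | 17 | 35 | 0 | 1 | 0 | 0 | 0 |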


In [13]:
target = np.array(df['wage_per_hour'])

In [15]:
target.shape

(534,)

In [18]:
# Import necessary modules
import keras
from keras.layers import Dense
from keras.models import Sequential

# Save the number of columns in predictors: n_cols
n_cols = predictors.shape[1]

# Set up the model: model
model = Sequential()

# Add the first layer
model.add(Dense(50, activation='relu', input_shape=(n_cols,)))

# Add the second layer
model.add(Dense(32, activation='relu'))

# Add the output layer
model.add(Dense(1))




## Compiling and fitting a model
We first need to specify the optimizer:
Keras documentation on optimizers: https://keras.io/optimizers/
- It controls the learning rate.
- There are many options, and things can get very complex. Some algorithms adapt the learning rate on their own. Most of the time, the best option is a versatile optimizer that we can tune later.
- 'adam' is usually a good choice: it adapts the learning rate for each weight as it performs gradient descent.
Paper that introduced Adam: https://arxiv.org/abs/1412.6980v8
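To show what "adapts the learning rate" means, here is a minimal scalar sketch of the Adam update rule from the paper linked above, applied to a toy one-parameter loss. The function name `adam_minimize` and the toy loss (w - 3)^2 are hypothetical; beta1=0.9, beta2=0.999 and eps=1e-8 follow the paper's suggested defaults, while lr=0.1 is simply chosen for this toy problem. Keras's own implementation differs in details.

```python
import numpy as np

def adam_minimize(grad, w0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    w = float(w0)
    m = 0.0  # running average of gradients (first moment)
    v = 0.0  # running average of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)  # bias correction: m and v start at zero
        v_hat = v / (1 - beta2 ** t)
        # The step is scaled by the history of gradient magnitudes for this parameter
        w -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

# Toy loss (w - 3)^2 has gradient 2 * (w - 3) and its minimum at w = 3
w_opt = adam_minimize(lambda w: 2 * (w - 3), w0=0.0)
print(round(w_opt, 3))
```

Because each step is divided by the square root of the running squared-gradient average, parameters with large, noisy gradients take smaller steps and parameters with small gradients take relatively larger ones.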

Then we have to specify the loss function:
- Mean squared error is the most common choice for regression problems.
- For classification, we'll use a different loss function (categorical cross-entropy, covered below).
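Mean squared error is the average of the squared differences between targets and predictions. A minimal NumPy sketch, using the first three wages from the data above as targets and made-up predictions (the helper name `mean_squared_error` here is our own, not the Keras function):

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    # Average of squared differences between targets and predictions
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

# First three wage_per_hour values vs. hypothetical predictions
result = mean_squared_error([5.1, 4.95, 6.67], [5.0, 5.0, 6.0])
print(result)
```

Squaring makes large errors count much more than small ones: the 0.67 miss on the third row dominates the total.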

In [19]:
# Compile the model defined above: Adam optimizer, mean squared error loss
model.compile(optimizer = 'adam', loss = 'mean_squared_error')

### Now we can fit the model
Fitting applies backpropagation and gradient descent to your data to update the weights.
Scaling the data before fitting (so that all features are on roughly the same scale) can ease optimization.
The usual way to do this is standardization: subtract each feature's mean and divide by its standard deviation.
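A minimal NumPy sketch of standardization, applied to the education_yrs, experience_yrs and age values from the first five rows of the wage data above (the helper name `standardize` is hypothetical):

```python
import numpy as np

def standardize(X):
    # Subtract each column's mean and divide by its standard deviation
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)

# Columns: education_yrs, experience_yrs, age (first five rows of the wage data)
X = np.array([[8, 21, 35],
              [9, 42, 57],
              [12, 1, 19],
              [12, 4, 22],
              [12, 17, 35]])
X_scaled = standardize(X)
print(X_scaled.mean(axis=0).round(6), X_scaled.std(axis=0).round(6))
```

After scaling, every column has mean 0 and standard deviation 1. In practice, compute the mean and standard deviation on the training data only and reuse them for new data; a constant column would cause a division by zero and needs special handling.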

In [20]:
model.fit(predictors, target)

Epoch 1/1


<keras.callbacks.History at 0x220b37c3f60>

## Classification models
Classification models predict outcomes that take one of a set of discrete values (classes).

We'll use the most common loss function for this kind of problem: categorical_crossentropy (log loss). Lower scores are better.

We'll also add metrics=['accuracy'] to the compile step for easy-to-understand diagnostics.
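To see why lower log loss is better and how accuracy differs from it, here is a minimal NumPy sketch of both quantities. The helper names, the one-hot targets and the predicted probabilities are hypothetical; clipping predictions with a small eps (1e-7 here, an assumption) is a common guard against log(0).

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # y_true: one-hot targets; y_pred: predicted probabilities whose rows sum to 1
    y_pred = np.clip(y_pred, eps, 1 - eps)  # guard against log(0)
    # Per row: minus the log of the probability given to the true class; then average
    return np.mean(-np.sum(y_true * np.log(y_pred), axis=1))

def accuracy(y_true, y_pred):
    # Fraction of rows where the most probable class is the true class
    return np.mean(np.argmax(y_true, axis=1) == np.argmax(y_pred, axis=1))

y_true = np.array([[1, 0], [0, 1], [0, 1]])            # hypothetical one-hot targets
good = np.array([[0.9, 0.1], [0.2, 0.8], [0.3, 0.7]])  # confident and correct
bad = np.array([[0.4, 0.6], [0.6, 0.4], [0.6, 0.4]])   # wrong on every row

loss_good = categorical_crossentropy(y_true, good)
loss_bad = categorical_crossentropy(y_true, bad)
print(loss_good, loss_bad)  # lower is better
print(accuracy(y_true, good), accuracy(y_true, bad))
```

Accuracy only checks whether the top class is right, while log loss also rewards confidence in the right answer, which is why it is the quantity the optimizer minimizes.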

The output layer has a separate node for each possible outcome and uses the 'softmax' activation function.
Softmax ensures that the predictions sum to 1, so they can be interpreted as probabilities.
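A minimal NumPy sketch of softmax, showing why its outputs can be read as probabilities. The raw output values (logits) are made up, and `softmax` here is our own helper, not the Keras activation itself:

```python
import numpy as np

def softmax(z):
    # Subtract the row max for numerical stability; this does not change the result
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical raw output-layer values for two passengers and two classes
logits = np.array([[2.0, 1.0],
                   [-1.0, 3.0]])
probs = softmax(logits)
print(probs)
print(probs.sum(axis=1))  # each row sums to 1
```

Every output is positive and each row sums to 1, and the largest raw value always gets the largest probability. Subtracting the row maximum avoids overflow in np.exp without changing the result, because softmax is unchanged when the same constant is added to every input in a row.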

Usually the target is a single column, and we have to convert it to a categorical (one-hot) format with a separate column for each possible outcome. Keras's to_categorical function takes care of this.
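To show the format to_categorical produces for integer class labels, here is a minimal NumPy re-implementation. It is only a sketch: the name `one_hot` is hypothetical, and the labels are the first five survived values in the Titanic data used below.

```python
import numpy as np

def one_hot(labels, num_classes=None):
    # Turn integer class labels into one column per class
    labels = np.asarray(labels, dtype=int)
    if num_classes is None:
        num_classes = labels.max() + 1
    encoded = np.zeros((labels.shape[0], num_classes), dtype='float32')
    encoded[np.arange(labels.shape[0]), labels] = 1.0  # set a 1 in each row's class column
    return encoded

# First five 'survived' values: 0 = did not survive, 1 = survived
survived = [0, 1, 1, 1, 0]
encoded = one_hot(survived)
print(encoded)
```

Column 0 is 1 for passengers who did not survive and column 1 is 1 for those who did, which is why predictions[:, 1] later gives the predicted probability of survival.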

In [23]:
# Classification example:
# use the Titanic passenger data to predict who survived
#https://www.kaggle.com/c/titanic

from keras.utils import to_categorical
df = pd.read_csv('titanic_all_numeric.csv.txt')

In [26]:
df.head()

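| | survived | pclass | age | sibsp | parch | fare | male | age_was_missing | embarked_from_cherbourg | embarked_from_queenstown | embarked_from_southampton |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | 22.0 | 1 | 0 | 7.25 | 1 | False | 0 | 0 | 1 |
| 1 | 1 | 1 | 38.0 | 1 | 0 | 71.2833 | 0 | False | 1 | 0 | 0 |
| 2 | 1 | 3 | 26.0 | 0 | 0 | 7.925 | 0 | False | 0 | 0 | 1 |
| 3 | 1 | 1 | 35.0 | 1 | 0 | 53.1 | 0 | False | 0 | 0 | 1 |
| 4 | 0 | 3 | 35.0 | 0 | 0 | 8.05 | 1 | False | 0 | 0 | 1 |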


In [28]:
df.describe()

| | survived | pclass | age | sibsp | parch | fare | male | embarked_from_cherbourg | embarked_from_queenstown | embarked_from_southampton |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 891.0 | 891.0 | 891.0 | 891.0 | 891.0 | 891.0 | 891.0 | 891.0 | 891.0 | 891.0 |
| mean | 0.383838 | 2.308642 | 29.699118 | 0.523008 | 0.381594 | 32.204208 | 0.647587 | 0.188552 | 0.08642 | 0.722783 |
| std | 0.486592 | 0.836071 | 13.002015 | 1.102743 | 0.806057 | 49.693429 | 0.47799 | 0.391372 | 0.281141 | 0.447876 |
| min | 0.0 | 1.0 | 0.42 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 0.0 | 2.0 | 22.0 | 0.0 | 0.0 | 7.9104 | 0.0 | 0.0 | 0.0 | 0.0 |
| 50% | 0.0 | 3.0 | 29.699118 | 0.0 | 0.0 | 14.4542 | 1.0 | 0.0 | 0.0 | 1.0 |
| 75% | 1.0 | 3.0 | 35.0 | 1.0 | 0.0 | 31.0 | 1.0 | 0.0 | 0.0 | 1.0 |
| max | 1.0 | 3.0 | 80.0 | 8.0 | 6.0 | 512.3292 | 1.0 | 1.0 | 1.0 | 1.0 |


In [37]:
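# DataFrame.as_matrix() is deprecated (removed in pandas 1.0); .values works instead.
# astype('float32') also turns the boolean age_was_missing column into 0/1.
predictors = df.drop(['survived'], axis=1).values.astype('float32')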
target = to_categorical(df.survived)
n_cols = predictors.shape[1]



In [40]:
model = Sequential()
model.add(Dense(32, activation='relu', input_shape=(n_cols,)))
model.add(Dense(2, activation='softmax'))
# 'sgd' = stochastic gradient descent
model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(predictors, target, epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x220b53a6b70>

## Using Models
- save model
- reload
- make predictions

In [None]:
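# Save, reload, and predict
from keras.models import load_model

# Save the trained model (architecture, weights, and optimizer state) to an HDF5 file
model.save('model_file.h5')

# Reload it from the same file
my_model = load_model('model_file.h5')

# data_to_predict_with: new data with the same columns as the training predictors
predictions = my_model.predict(data_to_predict_with)

# Column 1 is the predicted probability of class 1 (true); column 0 is the probability of class 0 (false)
probability_true = predictions[:, 1]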

Now the code to verify the structure of the reloaded model:

In [None]:
my_model.summary()

### Making predictions
The trained network from your previous coding exercise is now stored as model. New data to make predictions is stored in a NumPy array as pred_data. Use model to make predictions on your new data.

In this exercise, your predictions will be probabilities, which is the most common way for data scientists to communicate their predictions to colleagues.

In [None]:
# Specify, compile, and fit the model
model = Sequential()
model.add(Dense(32, activation='relu', input_shape = (n_cols,)))
model.add(Dense(2, activation='softmax'))
model.compile(optimizer='sgd', 
              loss='categorical_crossentropy', 
              metrics=['accuracy'])
model.fit(predictors, target)

# Calculate predictions: predictions
predictions = model.predict(pred_data)

# Calculate predicted probability of survival: predicted_prob_true
predicted_prob_true = predictions[:, 1]

# print predicted_prob_true
print(predicted_prob_true)