# Short intro to Keras

Keras is a high-level Python library that wraps existing neural network packages. It runs on top of either the `theano` or the `tensorflow` backend.

Keras implements two main approaches:

* the Sequential model
* the functional API

The main type of model is the **Sequential** model, a linear stack of layers. You will only need the functional API for more complex models.

### The Sequential model

In [31]:
from keras.models import Sequential

model = Sequential()

A `Sequential` model holds an ordered list of layers, and you can add them to the model one at a time. Here is a simple model:

In [32]:
from keras.layers import Dense, Activation

model.add(Dense(input_dim=100, units=64))
model.add(Activation("relu"))
model.add(Dense(units=10))
model.add(Activation("softmax"))


In [33]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_14 (Dense)             (None, 64)                6464      
_________________________________________________________________
activation_12 (Activation)   (None, 64)                0         
_________________________________________________________________
dense_15 (Dense)             (None, 10)                650       
_________________________________________________________________
activation_13 (Activation)   (None, 10)                0         
Total params: 7,114
Trainable params: 7,114
Non-trainable params: 0
_________________________________________________________________


For the first layer you need to specify the input dimensions; the remaining layers infer their input size from the layer before them.
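
The `Param #` column of the summary can be checked by hand: a `Dense` layer has one weight per input/output pair plus one bias per unit. The following is a minimal sketch in plain Python; the helper `dense_params` is a hypothetical name used only for this illustration and is not part of Keras.

```python
def dense_params(input_dim, units, use_bias=True):
    """Number of trainable values in a fully connected layer."""
    return input_dim * units + (units if use_bias else 0)

# Layer sizes of the model above: 100 -> 64 -> 10
first = dense_params(100, 64)   # 100*64 weights + 64 biases
second = dense_params(64, 10)   # the input size 64 comes from the previous layer
print(first, second, first + second)  # 6464 650 7114
```

The `Activation` layers add no parameters, which is why they show up with `0` in the summary.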

An alternative, equivalent formulation passes the activation as an argument to the layer:

In [34]:
model = Sequential()
model.add(Dense(input_dim=100, units=64, activation='relu'))
# model.add(Activation("relu"))
model.add(Dense(units=10, activation='softmax'))

Then you can compile the model; once compiled, it is ready to be trained with `fit`.

In [5]:
model.compile(loss='categorical_crossentropy', optimizer="sgd", metrics=['accuracy'])
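
To see what the `categorical_crossentropy` loss measures, here is a small sketch in plain Python rather than Keras. The two functions are hand-written stand-ins, and the scores `[2.0, 0.0]` are hypothetical values chosen for the example.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(one_hot, probs):
    """Negative log-probability that the model assigns to the true class."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs) if t > 0)

probs = softmax([2.0, 0.0])                           # hypothetical scores for two classes
loss_right = categorical_crossentropy([1, 0], probs)  # true class is the likely one
loss_wrong = categorical_crossentropy([0, 1], probs)  # true class is the unlikely one
print(probs, loss_right, loss_wrong)
```

The loss is small when the true class receives a high probability and grows as that probability shrinks, which is the quantity the optimizer tries to reduce during training.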

A more complex LSTM model is just as easily created:

In [35]:
from keras.layers import Embedding, LSTM, Dropout

num_labels=2
vocabulary_size=10000
model = Sequential()
model.add(Embedding(output_dim=128, input_dim=vocabulary_size, input_length=100))
model.add(LSTM(units=64, activation='tanh'))
model.add(Dropout(rate=0.2))
model.add(Dense(num_labels))
model.add(Activation('softmax'))
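
To get a feeling for the size of this model, the trainable parameters can be counted by hand. This is a rough sketch that assumes the default LSTM layout (four gate blocks, each with input weights, recurrent weights and a bias); the helper functions are hypothetical names, not Keras functions.

```python
def embedding_params(vocab_size, dim):
    """One dense vector of length `dim` per vocabulary entry."""
    return vocab_size * dim

def lstm_params(input_dim, units):
    """Four gate blocks, each with input weights, recurrent weights and biases."""
    return 4 * (input_dim * units + units * units + units)

def dense_params(input_dim, units):
    """One weight per input/output pair plus one bias per unit."""
    return input_dim * units + units

counts = {
    "embedding": embedding_params(10000, 128),  # vocabulary_size x output_dim
    "lstm": lstm_params(128, 64),
    "dense": dense_params(64, 2),               # num_labels = 2
}
print(counts)
```

Under these assumptions almost all of the parameters sit in the embedding table; `Dropout` and `Activation` contribute none.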

### Functional API

In the functional API each layer is a callable: you apply it to the output of another layer and get a new tensor back.

In [37]:
from keras.models import Model
from keras.layers import Input, Flatten

# input: a sequence of 5 integers, each representing a word (index between 0 and vocabulary_size).
main_input = Input(shape=(5,), dtype='int32', name='main_input')

# now the embedding layer will encode the input sequence
# into a sequence of dense 128-dimensional vectors.
embeds = Embedding(output_dim=128, input_dim=vocabulary_size, input_length=5)(main_input)
flatten = Flatten()(embeds) # we flatten it as Dense expects a 2D input
dense = Dense(64, activation='tanh')(flatten)

# finally the softmax (logistic) output layer
main_loss = Dense(num_labels, activation='softmax', name='main_output')(dense)
aux_loss = Dense(num_labels, activation='softmax', name='aux_loss')(embeds)


# the model is specified by connecting input and output
model = Model(inputs=[main_input], outputs=[main_loss, aux_loss])

In [38]:
model.summary()

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
main_input (InputLayer)         (None, 5)            0                                            
__________________________________________________________________________________________________
embedding_5 (Embedding)         (None, 5, 128)       1280000     main_input[0][0]                 
__________________________________________________________________________________________________
flatten_3 (Flatten)             (None, 640)          0           embedding_5[0][0]                
__________________________________________________________________________________________________
dense_20 (Dense)                (None, 64)           41024       flatten_3[0][0]                  
__________________________________________________________________________________________________
main_outpu

With the functional API you can quickly create a lot of fun models.
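
The shapes in the summary can be followed by hand. The sketch below tracks them in plain Python; `flatten_shape` and `dense_shape` are hypothetical helpers that mimic how the shapes change, on the assumption that `Dense` only acts on the last axis of its input.

```python
def flatten_shape(shape):
    """Collapse every axis after the batch axis into one."""
    size = 1
    for dim in shape[1:]:
        size *= dim
    return (shape[0], size)

def dense_shape(shape, units):
    """Dense replaces the last axis and leaves the others alone."""
    return shape[:-1] + (units,)

embeds = (None, 5, 128)              # (batch, sequence length, embedding size)
flatten = flatten_shape(embeds)      # (None, 640)
dense = dense_shape(flatten, 64)     # (None, 64)
main_output = dense_shape(dense, 2)  # (None, 2)
aux_output = dense_shape(embeds, 2)  # (None, 5, 2): one prediction per position

embedding_weights = 10000 * 128      # 1280000, as in the summary
dense_weights = 640 * 64 + 64        # 41024, as in the summary
print(flatten, dense, main_output, aux_output)
```

Note that the auxiliary output is attached to the unflattened embeddings, so under this assumption it keeps the sequence axis.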

# A Multilayer-Perceptron/Feedforward Neural Network

In [39]:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import LabelEncoder
import keras

max_features = 2000

# read in the training data
train_data = pd.read_csv('../data/sa_train.csv')
print(len(train_data), train_data['output'].unique())

# create TFIDF representations
vectorizer = TfidfVectorizer(ngram_range=(1,2), min_df=0.001, max_df=0.75, stop_words='english', max_features=max_features)

X_train = vectorizer.fit_transform(train_data.input)
print(X_train.shape)

# transform labels into numbers
labels2numbers = LabelEncoder()

y_train_org = labels2numbers.fit_transform(train_data['output'])
print(y_train_org[:10], len(y_train_org))



# read in test data
test_data = pd.read_csv('../../lectures/17/sa_test.csv')
print(len(test_data), test_data['output'].unique())

X_test = vectorizer.transform(test_data.input)
print(X_test.shape)

y_test_org = labels2numbers.transform(test_data['output'])
print(y_test_org[:10], len(y_test_org))

# get number of classes for transformation
num_classes = max(y_train_org) + 1

print('Convert class vector to binary 1-hot encoding matrix (for use with categorical_crossentropy)')
y_train = keras.utils.to_categorical(y_train_org, num_classes)
y_test = keras.utils.to_categorical(y_test_org, num_classes)

1800 ['neg' 'pos']
(1800, 2000)
[0 0 0 1 1 0 1 1 0 0] 1800
200 ['pos' 'neg']
(200, 2000)
[1 0 0 0 1 0 0 1 0 0] 200
Convert class vector to binary 1-hot encoding matrix (for use with categorical_crossentropy)


In [40]:
print('y_train shape:', y_train.shape)
print('y_test shape:', y_test.shape)

print(y_train[:10], y_train_org[:10])

y_train shape: (1800, 2)
y_test shape: (200, 2)
[[1. 0.]
 [1. 0.]
 [1. 0.]
 [0. 1.]
 [0. 1.]
 [1. 0.]
 [0. 1.]
 [0. 1.]
 [1. 0.]
 [1. 0.]] [0 0 0 1 1 0 1 1 0 0]
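
The conversion that `to_categorical` performs here can be written out in a few lines. This is a plain-Python sketch of the idea, not the Keras implementation; `to_one_hot` is a hypothetical helper name.

```python
def to_one_hot(labels, num_classes):
    """Build one row per label with a single 1.0 at the label's index."""
    rows = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0
        rows.append(row)
    return rows

y = [0, 0, 0, 1, 1, 0, 1, 1, 0, 0]  # the first ten training labels printed above
num_classes = max(y) + 1             # 2
one_hot = to_one_hot(y, num_classes)
print(one_hot[:4])  # [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
```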


In [57]:
batch_size = 8
epochs = 3

print('Building model...')
model = Sequential()
model.add(Dense(units=512, input_shape=(max_features,)))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(units=128))
model.add(Activation('relu'))
model.add(Dense(num_classes))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

history = model.fit(X_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_split=0.1)

loss, accuracy = model.evaluate(X_test, y_test,
                       batch_size=batch_size, verbose=1)
print('Test loss:', loss)
print('Test accuracy:', accuracy)

Building model...
Train on 1620 samples, validate on 180 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3
Test loss: 0.6236532393097878
Test accuracy: 0.8
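
The line `Train on 1620 samples, validate on 180 samples` follows from `validation_split=0.1`. As far as I know, Keras holds out the last fraction of the samples before shuffling; the arithmetic below is a sketch of that split and of the resulting number of batches per epoch, not the library's own code.

```python
import math

n_samples = 1800
validation_split = 0.1
batch_size = 8

# Everything before split_at is used for training, the rest for validation.
split_at = int(n_samples * (1 - validation_split))
n_train, n_val = split_at, n_samples - split_at

# The last batch of an epoch may be smaller than batch_size.
steps_per_epoch = math.ceil(n_train / batch_size)
print(n_train, n_val, steps_per_epoch)  # 1620 180 203
```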


In [58]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_29 (Dense)             (None, 512)               1024512   
_________________________________________________________________
activation_23 (Activation)   (None, 512)               0         
_________________________________________________________________
dropout_10 (Dropout)         (None, 512)               0         
_________________________________________________________________
dense_30 (Dense)             (None, 128)               65664     
_________________________________________________________________
activation_24 (Activation)   (None, 128)               0         
_________________________________________________________________
dense_31 (Dense)             (None, 2)                 258       
_________________________________________________________________
activation_25 (Activation)   (None, 2)                 0         
Total para

In [59]:
# output class probability matrix
print(model.predict(X_test[:10]))

[[6.2866777e-04 9.9937135e-01]
 [9.9993646e-01 6.3520070e-05]
 [5.1778119e-02 9.4822186e-01]
 [9.9949014e-01 5.0985441e-04]
 [5.9198932e-04 9.9940801e-01]
 [9.9157399e-01 8.4260479e-03]
 [1.6980013e-01 8.3019990e-01]
 [9.7704905e-01 2.2950966e-02]
 [9.9946707e-01 5.3293799e-04]
 [5.1475006e-01 4.8524991e-01]]


In [60]:
# output actual class predictions
model.predict_classes(X_test[:10])

array([1, 0, 1, 0, 1, 0, 1, 0, 0, 0])
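
The class predictions are simply the index of the largest probability in each row. The sketch below recomputes them in plain Python from the probability matrix printed above; the same idea also works in newer Keras releases, which as far as I know no longer provide `predict_classes`.

```python
# Probabilities copied from the output of model.predict above.
probs = [
    [6.2866777e-04, 9.9937135e-01],
    [9.9993646e-01, 6.3520070e-05],
    [5.1778119e-02, 9.4822186e-01],
    [9.9949014e-01, 5.0985441e-04],
    [5.9198932e-04, 9.9940801e-01],
    [9.9157399e-01, 8.4260479e-03],
    [1.6980013e-01, 8.3019990e-01],
    [9.7704905e-01, 2.2950966e-02],
    [9.9946707e-01, 5.3293799e-04],
    [5.1475006e-01, 4.8524991e-01],
]

def argmax(row):
    """Index of the largest value in a list."""
    return max(range(len(row)), key=row.__getitem__)

classes = [argmax(row) for row in probs]
print(classes)  # [1, 0, 1, 0, 1, 0, 1, 0, 0, 0]
```

The last row shows how close such a decision can be: 0.515 against 0.485 still yields class 0.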

In [61]:
from sklearn.metrics import f1_score, classification_report
print(classification_report(y_test_org, model.predict_classes(X_test)))

              precision    recall  f1-score   support

           0       0.84      0.79      0.81       111
           1       0.76      0.81      0.78        89

   micro avg       0.80      0.80      0.80       200
   macro avg       0.80      0.80      0.80       200
weighted avg       0.80      0.80      0.80       200
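
To make the columns of the report concrete, here is how precision, recall and F1 follow from a confusion matrix. The notebook does not print its confusion matrix, so the counts below are an assumption: they were picked because they reproduce the rounded values in the report, and the real counts may differ. `class_report` is a hypothetical helper name.

```python
# Rows: true class, columns: predicted class.
# Assumed counts, chosen to match the rounded report above.
confusion = [[88, 23],
             [17, 72]]

def class_report(confusion, k):
    """Precision, recall, F1 (rounded to 2 digits) and support for class k."""
    tp = confusion[k][k]
    predicted = sum(row[k] for row in confusion)  # everything predicted as k
    actual = sum(confusion[k])                    # everything that truly is k
    precision = tp / predicted
    recall = tp / actual
    f1 = 2 * precision * recall / (precision + recall)
    return round(precision, 2), round(recall, 2), round(f1, 2), actual

total = sum(sum(row) for row in confusion)
accuracy = (confusion[0][0] + confusion[1][1]) / total
print(class_report(confusion, 0))  # (0.84, 0.79, 0.81, 111)
print(class_report(confusion, 1))  # (0.76, 0.81, 0.78, 89)
print(accuracy)                    # 0.8
```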



## References

* [http://keras.io/#getting-started-30-seconds-to-keras](http://keras.io/#getting-started-30-seconds-to-keras)
* [http://keras.io/getting-started/sequential-model-guide/](http://keras.io/getting-started/sequential-model-guide/)
* [https://arxiv.org/pdf/1510.00726.pdf](https://arxiv.org/pdf/1510.00726.pdf)