# General 

Most of this notebook is taken directly from the [Keras Sequential model guide](https://keras.io/getting-started/sequential-model-guide/).

Image dimension-ordering conventions (__tf__ = TensorFlow, __th__ = Theano), per sample (the batch axis, when present, comes first):

* 2D data:
    * __tf__: (rows, cols, channels)
    * __th__: (channels, rows, cols)
* 3D data:
    * __tf__: (conv_dim1, conv_dim2, conv_dim3, channels)
    * __th__: (channels, conv_dim1, conv_dim2, conv_dim3)
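
Converting image data between the two orderings is a single axis permutation. A minimal NumPy sketch for a batch of 2D images (the array sizes are arbitrary, and the leading samples axis is added here for illustration):

```python
import numpy as np

# Dummy batch of 2 RGB images, 4 rows x 5 cols, in "tf" ordering:
# (samples, rows, cols, channels).
batch_tf = np.random.random((2, 4, 5, 3))

# Move the channel axis in front of rows/cols to get "th" ordering:
# (samples, channels, rows, cols).
batch_th = np.transpose(batch_tf, (0, 3, 1, 2))

# The inverse permutation restores the original "tf" layout.
batch_tf_again = np.transpose(batch_th, (0, 2, 3, 1))

print(batch_th.shape)  # (2, 3, 4, 5)
```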

# Preprocessing Text

In [None]:
import numpy as np
from keras.preprocessing.text import text_to_word_sequence, one_hot, Tokenizer
from keras.utils.data_utils import get_file
from nltk import word_tokenize, sent_tokenize
from pprint import pprint

# Download a sample corpus (Nietzsche's writings) and read it into a single string.
path = get_file('nietzsche.txt', origin="https://s3.amazonaws.com/text-datasets/nietzsche.txt")
with open(path) as f:
    text = f.read()

In [26]:
# Store what nltk does for comparison.
nltk_sentences = sent_tokenize(text)
# Sort sentences by length, then take 5 from somewhat below the median length,
# so they are neither trivially short nor very long.
nltk_sentences = sorted(nltk_sentences, key=len)
print("NLTK 5 reasonable-length tokenized sentences:")
start = len(nltk_sentences)//2 - 500
stop  = start + 5
pprint(nltk_sentences[start:stop], indent=4, width=300)

NLTK 5 reasonable-length tokenized sentences:
[   'Let us acknowledge\nunprejudicedly how every higher civilization hitherto has ORIGINATED!',
    'A "free spirit"--this refreshing term is grateful in any mood,\nit almost sets one aglow.',
    'Each surrenders to the other what the other wants and\nreceives in return its own desire.',
    'Mutual manifestations of pleasure inspire mutual\nsympathy, the sentiment of homogeneity.',
    'The whole\ncircle of his judgment and feeling is clouded and draped in religious\nshadows.']


In [52]:
# Lowercases the text, strips punctuation, and splits it into words.
word_seq = text_to_word_sequence(text)
print("Word sequence output by text_to_word_sequence (first 10 entries):")
print(word_seq[:10], end="\n\n")

# Specify the N most common words we are interested in working with.
vocab_size = 1000
# nb_words: Maximum number of [most common] words to work with. 
keras_tokenizer = Tokenizer(nb_words=vocab_size)
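# Fit on word_seq. Each entry is a single word, so every word is treated as a
# separate "document" (hence the large document_count printed below).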
keras_tokenizer.fit_on_texts(word_seq)

print("Example outputs of Tokenizer attributes/methods:")
print("Document count:", keras_tokenizer.document_count)
# word_counts maps each word to its count. sorted() is ascending by count,
# so the first entries are the *least* common words.
by_count = sorted(keras_tokenizer.word_counts, key=keras_tokenizer.word_counts.get)

print("\n10 least common words:\n", by_count[:10])
word_index = keras_tokenizer.word_index
index_word = {i: w for w, i in word_index.items()}

print("\nSome entries in word_index:\n", list(word_index.items())[:10])

# Get the text as integer ids. one_hot hashes each word to an id in [1, vocab_size),
# so these ids do NOT match keras_tokenizer.word_index, and distinct words can collide.
text_as_idx = one_hot(text, n=vocab_size)
print("\nFirst 10 hashed ids from one_hot:\n", text_as_idx[:10])

Word sequence output by text_to_word_sequence (first 10 entries):
['preface', 'supposing', 'that', 'truth', 'is', 'a', 'woman', 'what', 'then', 'is']

Example outputs of Tokenizer attributes/methods:
Document count: 101358

10 least common words:
 ['ghost', 'falsity', 'professes', 'petit', 'flew', 'eine', 'footing', 'brutes', 'dialogues', 'panorama']

Some entries in word_index:
 [('ghost', 5282), ('intercalary', 10203), ('dregs', 9938), ('falsity', 5283), ('professes', 5284), ('unbend', 9435), ('flew', 5286), ('persuaded', 2519), ('æsthetical', 8897), ('feastful', 5862)]

First 10 hashed ids from one_hot:
 [327, 429, 905, 69, 561, 920, 536, 617, 840, 561]
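
The cell above produces two different kinds of integer ids. As far as we can tell from the Keras source, the Tokenizer numbers words by descending frequency (index 1 is the most common word), whereas one_hot uses the hashing trick, where the id depends only on a hash of the word. A pure-Python sketch of both schemes on a toy corpus (the helpers `rank_index` and `hashed_id` are hypothetical, and md5 is swapped in for Python's built-in `hash` so the ids are reproducible across runs):

```python
import hashlib
from collections import Counter


def rank_index(words):
    """Hypothetical helper: map each word to its frequency rank (1 = most common).
    Counter.most_common sorts by count, descending; ties keep first-seen order."""
    counts = Counter(words)
    return {w: rank for rank, (w, _) in enumerate(counts.most_common(), start=1)}


def hashed_id(word, n):
    """Hypothetical helper: hashing-trick id in [1, n - 1]."""
    digest = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16)
    return digest % (n - 1) + 1


words = "the cat sat on the mat the end".split()
index = rank_index(words)

# Sorting by rank (ascending) lists the MOST common words first.
print(sorted(index, key=index.get)[:3])  # ['the', 'cat', 'sat']
# Hashed ids: unrelated to frequency, and different words may collide.
print([hashed_id(w, 1000) for w in words[:4]])
```

Applied to the notebook, `sorted(word_index, key=word_index.get)[:10]` or `sorted(keras_tokenizer.word_counts, key=keras_tokenizer.word_counts.get, reverse=True)[:10]` should give the ten most common words.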


In [62]:
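idseq = keras_tokenizer.texts_to_sequences(word_seq)
# The tokenizer was fit on single words, so idseq is a list of 1-element lists;
# words outside the most common vocab_size words map to an empty list [].
print(np.array(idseq, dtype=object))
# Flatten into one id sequence (out-of-vocabulary words are dropped).
flat_ids = [i for ids in idseq for i in ids]
#print([index_word[i] for i in flat_ids[:10]])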

[[] [592] [8] ..., [] [2] []]


# Dealing with Inputs

The first layer in a Sequential model (and only the first; later layers infer their shapes automatically) must be told its input shape via [one of the following](https://keras.io/getting-started/sequential-model-guide/) arguments. In all code snippets, assume the model has been initialized via ```model = Sequential()```. Each snippet specifies the same input for a Dense layer and for an LSTM layer, and the three snippets are _strictly equivalent_ ways of doing so.

* __input\_shape__: a shape tuple (a tuple of integers or None entries, where None indicates that any positive integer may be expected). The batch dimension is not included.
```python
model.add(Dense(32, input_shape=(784,))) # Dense example 
model.add(LSTM(32, input_shape=(10, 64))) # LSTM example
```
* __batch\_input\_shape__: like input_shape, but with the batch dimension included as the first entry. None leaves the batch size unspecified; an integer fixes it, which is needed, e.g., for stateful RNNs.
```python
model.add(Dense(32, batch_input_shape=(None, 784))) # Dense example
model.add(LSTM(32, batch_input_shape=(None, 10, 64))) # LSTM example
```
* __input\_dim__ (and __input\_length__): some 2D layers, such as Dense, accept their input shape via input_dim alone, and some 3D temporal layers, such as LSTM, accept input_dim together with input_length.
```python
model.add(Dense(32, input_dim=784)) # Dense example
model.add(LSTM(32, input_length=10, input_dim=64)) # LSTM example
```
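
The bookkeeping behind this equivalence can be spelled out with a small, hypothetical helper (not part of Keras) that turns each of the three forms into a single batch_input_shape tuple, assuming None for the batch size as in the snippets above:

```python
def to_batch_input_shape(input_shape=None, input_dim=None, input_length=None,
                         temporal=False, batch_size=None):
    """Hypothetical helper (not part of Keras): express any of the three ways of
    describing a first layer's input as a single batch_input_shape tuple.

    batch_size=None means "any batch size", as in batch_input_shape=(None, ...).
    temporal=True marks 3D sequence layers such as LSTM, whose per-sample
    input is (input_length, input_dim) rather than just (input_dim,).
    """
    if input_shape is not None:
        # input_shape omits the batch dimension, so prepend it.
        return (batch_size,) + tuple(input_shape)
    if input_dim is None:
        raise ValueError("specify input_shape or input_dim")
    if temporal:
        # input_length=None would mean variable-length sequences.
        return (batch_size, input_length, input_dim)
    return (batch_size, input_dim)


# The Dense snippets above:
print(to_batch_input_shape(input_shape=(784,)))  # (None, 784)
print(to_batch_input_shape(input_dim=784))       # (None, 784)
# The LSTM snippets above:
print(to_batch_input_shape(input_shape=(10, 64)))                          # (None, 10, 64)
print(to_batch_input_shape(input_dim=64, input_length=10, temporal=True))  # (None, 10, 64)
```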

# The Merge Layer

Suppose we want to build the following:
<img src="https://s3.amazonaws.com/keras.io/img/two_branches_sequential_model.png" width="250">

The cell below builds this model.

In [1]:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Merge
from keras.utils.np_utils import to_categorical

# Create the top-left and top-right branches.
left_branch = Sequential()
left_branch.add(Dense(32, input_dim=784))
right_branch = Sequential()
right_branch.add(Dense(32, input_dim=784))

# Combine the branches via concatenation.
# mode can be one of 'sum' (default), 'concat', 'mul', 'ave', 'dot', 'cos'.
merged = Merge([left_branch, right_branch], mode='concat')

# Feed the merged output to a softmax layer.
final_model = Sequential()
final_model.add(merged)
final_model.add(Dense(10, activation='softmax'))

# Dummy random data for illustration: one array per branch, one-hot targets.
input_data_1 = np.random.random((1000, 784))
input_data_2 = np.random.random((1000, 784))
targets = to_categorical(np.random.randint(10, size=(1000,)), 10)

# Train it: pass one data array per model input.
final_model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
final_model.fit([input_data_1, input_data_2], targets)

Using TensorFlow backend.
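
With mode='concat', the two 32-unit branch outputs are joined along the last axis, so the softmax layer sees 64 features per sample (with 'sum' it would see 32). A NumPy sketch of that data flow, using random untrained weights and a hypothetical `dense` helper rather than Keras layers:

```python
import numpy as np

rng = np.random.RandomState(0)
batch = 5
x1 = rng.random_sample((batch, 784))  # input to the left branch
x2 = rng.random_sample((batch, 784))  # input to the right branch


def dense(x, units):
    """Hypothetical linear Dense layer with freshly drawn random weights."""
    W = rng.normal(scale=0.01, size=(x.shape[1], units))
    b = np.zeros(units)
    return x @ W + b


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


left = dense(x1, 32)                             # (batch, 32)
right = dense(x2, 32)                            # (batch, 32)
merged = np.concatenate([left, right], axis=-1)  # mode='concat' -> (batch, 64)
probs = softmax(dense(merged, 10))               # (batch, 10)

print(merged.shape, probs.shape)  # (5, 64) (5, 10)
```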

# Embedding Layers

#### Useful Information

__Links__:
* [documentation link](https://keras.io/layers/embeddings/)
* [embeddings.py github source code link](https://github.com/fchollet/keras/blob/master/keras/layers/embeddings.py#L8)

--------------------------------------------------------------------------------

__General Information__:
* Description: "Turn positive integers (indexes) into dense vectors of fixed size. eg. [[4], [20]] -> [[0.25, 0.1], [0.6, -0.2]]. This layer can only be used as the first layer in a model."
* __Function Signature__: 
```python
Embedding(input_dim, output_dim, 
          init='uniform', input_length=None, 
          W_regularizer=None, activity_regularizer=None, 
          W_constraint=None, mask_zero=False, 
          weights=None, dropout=0.0)
```
* __Input shape__: 2D tensor with shape ```(nb_samples, sequence_length)```.
    * There is no third axis because each timestep of each sample is a single integer index (a scalar), not a feature vector. Beware the naming clash: for Embedding, ```input_dim``` is the vocabulary size (1 + the largest index in the data), whereas for layers such as Dense or LSTM, ```input_dim``` is the length of each input feature vector.
* __Output shape__: 3D tensor with shape ```(nb_samples, sequence_length, output_dim)```.
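
Conceptually, the layer is a trainable lookup table with one row per vocabulary index. A minimal NumPy sketch of the forward pass (random weights, no training), which also shows why the layer has input_dim * output_dim parameters:

```python
import numpy as np

vocab_size, output_dim = 1000, 64  # Embedding's input_dim and output_dim
nb_samples, sequence_length = 32, 10

# The layer's only weights: one output_dim-long row per vocabulary index.
W = np.random.uniform(-0.05, 0.05, size=(vocab_size, output_dim))

# Input: a 2D integer matrix of indices, shape (nb_samples, sequence_length).
indices = np.random.randint(vocab_size, size=(nb_samples, sequence_length))

# The forward pass is a row lookup; fancy indexing appends the output_dim axis.
output = W[indices]
print(output.shape)  # (32, 10, 64)
print(W.size)        # 64000
```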

------------------------------------------------------------------------------------

Particularly good parameter definitions to know/understand:
* __input\_length__: Length of input sequences, when it is constant. This argument is required if you are going to connect ```Flatten --> Dense``` layers after the Embedding (without it, the Dense layer's input size, and hence its weight shape, cannot be computed).
* __dropout__: float between 0 and 1. Fraction of the embeddings to drop.
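
The reason input_length matters for ```Flatten --> Dense``` is simple arithmetic: flattening a ```(sequence_length, output_dim)``` sample gives ```sequence_length * output_dim``` features, and the Dense weight matrix needs that number. A sketch with a hypothetical helper that counts the Dense layer's parameters:

```python
def flatten_dense_params(input_length, output_dim, units):
    """Hypothetical helper: parameter count of Dense(units) placed after
    Embedding(..., output_dim, input_length) -> Flatten()."""
    if input_length is None:
        # Variable-length input: the flattened width is unknown.
        raise ValueError("input_length is needed to size the Dense weight matrix")
    flat_width = input_length * output_dim
    return flat_width * units + units  # weight matrix + bias vector


print(flatten_dense_params(input_length=10, output_dim=64, units=1))  # 641
```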

#### An Example

In the following code cell:
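* __model input__: an integer matrix of shape (batch, input_length). Beware that the same argument names mean different things in different layers:
```python
# Embedding: input_dim is the vocabulary size; the input has shape (batch, SEQ_LEN).
Embedding(input_dim=VOCAB_SIZE, output_dim=64, input_length=SEQ_LEN)
# LSTM: the same names describe a 3D input of shape (batch, SEQ_LEN, VOCAB_SIZE),
# e.g. one-hot vectors. There, these three are equivalent:
LSTM(32, input_dim=VOCAB_SIZE, input_length=SEQ_LEN)
LSTM(32, batch_input_shape=(None, SEQ_LEN, VOCAB_SIZE))
LSTM(32, input_shape=(SEQ_LEN, VOCAB_SIZE))
```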
* __model output__: has shape (None, 10, 64), where None is the batch dimension.

In [7]:
import numpy as np
from keras.models import Sequential
from keras.layers import Embedding

VOCAB_SIZE = 1000
BATCH_SIZE = 32
SEQ_LEN    = 10

# Random integer indices drawn uniformly from [0, VOCAB_SIZE).
input_array = np.random.randint(VOCAB_SIZE, size=(BATCH_SIZE, SEQ_LEN))

model = Sequential()
# input_dim and output_dim are required (no defaults); passed positionally, they come first, in this order.
model.add(Embedding(input_dim=VOCAB_SIZE, output_dim=64, input_length=SEQ_LEN))
model.compile('rmsprop', 'mse')
model.summary()

embed_output_array = model.predict(input_array)
assert embed_output_array.shape == (BATCH_SIZE, SEQ_LEN, 64)

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
embedding_3 (Embedding)          (None, 10, 64)        64000       embedding_input_3[0][0]          
Total params: 64,000
Trainable params: 64,000
Non-trainable params: 0
____________________________________________________________________________________________________


# Code Examples

## LSTM Examples

The cell below should be run before each of the following examples, so that every example starts from the same imports and a fresh, empty model (otherwise the layers of successive examples are added to the same model).

In [3]:
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.layers import Embedding
from keras.layers import LSTM
import numpy as np

model = Sequential()

#### Sequence classification with LSTM

In [None]:
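# Illustrative sizes and random dummy data, standing in for a real dataset.
max_features = 20000  # vocabulary size
maxlen = 80           # length of every input sequence
X_train = np.random.randint(max_features, size=(1000, maxlen))
Y_train = np.random.randint(2, size=(1000, 1))
X_test = np.random.randint(max_features, size=(100, maxlen))
Y_test = np.random.randint(2, size=(100, 1))

model.add(Embedding(max_features, 256, input_length=maxlen))
model.add(LSTM(output_dim=128, activation='sigmoid', inner_activation='hard_sigmoid'))
model.add(Dropout(0.5))
model.add(Dense(1))
model.add(Activation('sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

model.fit(X_train, Y_train, batch_size=16, nb_epoch=10)
score = model.evaluate(X_test, Y_test, batch_size=16)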

#### Stacked LSTM for sequence classification


In this model, we stack 3 LSTM layers on top of each other, making the model capable of learning higher-level temporal representations.

The first two LSTMs return their full output sequences, but the last one only returns the last step in its output sequence, thus dropping the temporal dimension (i.e. converting the input sequence into a single vector).
<img src="https://keras.io/img/regular_stacked_lstm.png">
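
The difference between the two kinds of output is just which timesteps are kept. A NumPy sketch, where a random array stands in for the hidden states an LSTM would compute at every timestep (it is not a real LSTM):

```python
import numpy as np

batch_size, timesteps, units = 4, 8, 32
# Stand-in for the hidden state at every timestep (random, not computed by an LSTM).
all_states = np.random.random((batch_size, timesteps, units))

# return_sequences=True: every step is returned -> 3D, can feed another LSTM.
seq_output = all_states
# return_sequences=False (default): only the last step -> 2D, can feed a Dense layer.
last_output = all_states[:, -1, :]

print(seq_output.shape, last_output.shape)  # (4, 8, 32) (4, 32)
```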

In [None]:
data_dim = 16
timesteps = 8
nb_classes = 10

# expected input data shape: (batch_size, timesteps, data_dim)
model.add(LSTM(32, return_sequences=True,
               input_shape=(timesteps, data_dim)))  # returns a sequence of vectors of dimension 32
model.add(LSTM(32, return_sequences=True))  # returns a sequence of vectors of dimension 32
model.add(LSTM(32))  # returns a single vector of dimension 32
model.add(Dense(nb_classes, activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

# generate dummy training data
x_train = np.random.random((1000, timesteps, data_dim))
y_train = np.random.random((1000, nb_classes))

# generate dummy validation data
x_val = np.random.random((100, timesteps, data_dim))
y_val = np.random.random((100, nb_classes))

model.fit(x_train, y_train,
          batch_size=64, nb_epoch=5,
          validation_data=(x_val, y_val))

#### Same stacked LSTM model, rendered "stateful"

A stateful recurrent model is one for which the internal states (memories) obtained after processing a batch of samples are reused as initial states for the samples of the next batch. This allows the model to process longer sequences, split into consecutive chunks, while keeping computational complexity manageable.
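
The effect of carrying state over can be illustrated without Keras. The NumPy sketch below swaps in a plain tanh RNN for the LSTM to keep it short (random, untrained weights; `run_rnn` is a hypothetical helper). Splitting a long sequence into consecutive chunks and passing each chunk's final state into the next gives the same final state as processing the whole sequence at once, while resetting the state between chunks does not:

```python
import numpy as np

rng = np.random.RandomState(0)
data_dim, units = 16, 32
W_x = rng.normal(scale=0.2, size=(data_dim, units))
W_h = rng.normal(scale=0.2, size=(units, units))


def run_rnn(x, h0):
    """Simple tanh RNN (not an LSTM). x: (batch, timesteps, data_dim); h0: (batch, units).
    Returns the hidden state after the last timestep."""
    h = h0
    for t in range(x.shape[1]):
        h = np.tanh(x[:, t, :] @ W_x + h @ W_h)
    return h


batch = 3
long_seq = rng.random_sample((batch, 16, data_dim))  # 16 timesteps
zeros = np.zeros((batch, units))

h_full = run_rnn(long_seq, zeros)

# Stateful: two 8-step chunks, the first chunk's final state seeds the second.
h_chunk1 = run_rnn(long_seq[:, :8], zeros)
h_stateful = run_rnn(long_seq[:, 8:], h_chunk1)

# Stateless: the second chunk starts again from zeros.
h_stateless = run_rnn(long_seq[:, 8:], zeros)

print(np.allclose(h_full, h_stateful))   # True
print(np.allclose(h_full, h_stateless))  # False
```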

In [None]:
data_dim = 16
timesteps = 8
nb_classes = 10
batch_size = 32

# expected input batch shape: (batch_size, timesteps, data_dim)
# note that we have to provide the full batch_input_shape since the network is stateful.
# the sample of index i in batch k is the follow-up for the sample i in batch k-1.
model.add(LSTM(32, return_sequences=True, stateful=True,
               batch_input_shape=(batch_size, timesteps, data_dim)))
model.add(LSTM(32, return_sequences=True, stateful=True))
model.add(LSTM(32, stateful=True))
model.add(Dense(nb_classes, activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

# generate dummy training data
x_train = np.random.random((batch_size * 10, timesteps, data_dim))
y_train = np.random.random((batch_size * 10, nb_classes))

# generate dummy validation data
x_val = np.random.random((batch_size * 3, timesteps, data_dim))
y_val = np.random.random((batch_size * 3, nb_classes))

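# shuffle=False keeps the batches in order, so sample i of batch k really follows sample i of batch k-1.
model.fit(x_train, y_train,
          batch_size=batch_size, nb_epoch=5, shuffle=False,
          validation_data=(x_val, y_val))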