# CNN LSTM
In a CNN LSTM, a Convolutional Neural Network (CNN) learns features from spatial input, and a Long Short-Term Memory (LSTM) network handles a sequence of those inputs (e.g. a video, which is a sequence of images).
- Gentle introduction to CNN LSTM recurrent neural networks with example Python code.
- Book https://machinelearningmastery.com/cnn-long-short-term-memory-networks/
- Book https://machinelearningmastery.com/lstms-with-python/
- Paper https://arxiv.org/pdf/1411.4389v4.pdf
- https://stackoverflow.com/questions/44778439/keras-tf-time-distributed-cnnlstm-for-visual-recognition

Related Keras issues
- https://github.com/fchollet/keras/issues/5527
- https://github.com/fchollet/keras/issues/401
- https://github.com/fchollet/keras/issues/4172

<h3 style="background-color:#a2b5fb"> CNN LSTM Architecture </h3>

#### The CNN LSTM architecture involves 
- using **Convolutional Neural Network (CNN) layers** for **feature extraction on input data** 
- combined with **LSTMs** to **support sequence prediction**.

#### CNN LSTMs were developed for 
- **visual time series prediction problems** 
- **generating textual descriptions** from sequences of images (e.g. videos). 

#### Specifically, the problems of:
- **Activity Recognition**: Generating a textual description of an activity demonstrated in a sequence of images.
- **Image Description**: Generating a textual description of a single image.
- **Video Description**: Generating a textual description of a sequence of images.

#### This architecture was **originally referred to** as 
- a Long-term Recurrent Convolutional Network, or LRCN model;
- in this lesson, we use the more generic name “CNN LSTM” to refer to LSTMs that use a CNN as a front end.

#### This architecture is used for 
- the task of generating textual descriptions of images. 
- The key is a CNN that is pre-trained on a challenging image classification task and then re-purposed as a feature extractor for the caption-generation problem.
- In the words of the Show and Tell authors: “it is natural to use a CNN as an image ‘encoder’, by first pre-training it for an image classification task and using the last hidden layer as an input to the RNN decoder that generates sentences” (Show and Tell: A Neural Image Caption Generator, 2015).
- This architecture has **also been used** on **speech recognition** and **natural language processing** problems, where **CNNs are used as feature extractors for the LSTMs on audio and textual input data.**

#### This architecture is appropriate for problems that:
- Have **spatial structure in their input**, such as the **2D structure of pixels in an image** or the **1D structure of words in a sentence, paragraph, or document.**
- Have a **temporal structure in their input** such as the **order of images in a video or words in text**, or require **the generation of output with temporal structure such as words in a textual description.**
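Both properties show up directly in the shape of the input tensor. As a minimal sketch (NumPy only, with made-up sizes chosen purely for illustration), a batch of short grayscale videos for a CNN LSTM is a 5D array of shape (samples, time steps, rows, columns, channels): the last three axes carry the spatial structure the CNN consumes, and the time-step axis carries the temporal order the LSTM consumes.

```python
import numpy as np

# Hypothetical sizes, for illustration only
samples, time_steps, rows, cols, channels = 4, 8, 28, 28, 1

# A batch of random "videos": (samples, time steps, rows, cols, channels)
videos = np.random.rand(samples, time_steps, rows, cols, channels).astype("float32")

# Spatial structure: a single frame is a (rows, cols, channels) image for the CNN
first_frame = videos[0, 0]
print(first_frame.shape)  # (28, 28, 1)

# Temporal structure: all frames of one sample, in order, for the LSTM
one_clip = videos[0]
print(one_clip.shape)     # (8, 28, 28, 1)
```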

<h3 style="background-color:#a2b5fb"> Implement CNN LSTM in Keras </h3>
#### We can define a CNN LSTM model to be trained jointly in Keras.
- A CNN LSTM can be defined by adding CNN layers on the front end followed by LSTM layers with a Dense layer on the output.

#### It is helpful to think of this architecture as defining two sub-models: 
- the **CNN Model** for **feature extraction**. 
- the **LSTM Model** for **interpreting the features across time steps.**

<img style="float:left;height:250px;width:auto;margin-right:30px" src="CNN_LSTM.png">
#### define CNN model
<pre>
model = Sequential()
model.add(TimeDistributed(Conv2D(...)))
model.add(TimeDistributed(MaxPooling2D(...)))
model.add(TimeDistributed(Flatten()))
</pre>

#### define LSTM model
<pre>
model.add(LSTM(...))
model.add(Dense(...))
</pre>
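`TimeDistributed` applies the same wrapped layer, with the same weights, to every time step independently. The NumPy sketch below illustrates that idea; it is not Keras's actual implementation, and `time_distributed` and `toy_feature_extractor` are hypothetical names. It folds the time axis into the batch axis, applies the per-frame function once, and unfolds the result:

```python
import numpy as np

def toy_feature_extractor(frames):
    """Hypothetical stand-in for the CNN front end: maps each (rows, cols, channels)
    frame to a small feature vector (per-channel mean and max)."""
    means = frames.mean(axis=(1, 2))               # (n, channels)
    maxes = frames.max(axis=(1, 2))                # (n, channels)
    return np.concatenate([means, maxes], axis=1)  # (n, 2 * channels)

def time_distributed(fn, x):
    """Apply fn to every time step of x, where x has shape (batch, time, ...)."""
    batch, time = x.shape[:2]
    folded = x.reshape((batch * time,) + x.shape[2:])  # merge batch and time axes
    out = fn(folded)                                   # the same fn is used for every frame
    return out.reshape((batch, time) + out.shape[1:])  # split batch and time apart again

x = np.random.rand(3, 5, 28, 28, 1)  # (batch, time steps, rows, cols, channels)
features = time_distributed(toy_feature_extractor, x)
print(features.shape)  # (3, 5, 2): one feature vector per frame
```

The result has shape (batch, time steps, features), which is the layout an LSTM layer consumes; in the Keras pseudocode above, `TimeDistributed(Flatten())` is what produces one feature vector per frame.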

<h3 style="background-color:#a2b5fb"> Plotting helpers </h3>

In [1]:
import matplotlib.pyplot as plt

def plot_image(image):
    """Show a single grayscale image at 2x2 inches."""
    fig = plt.gcf()
    fig.set_size_inches(2,2)
    plt.imshow(image, cmap='binary')
    plt.show()
    
def plot_images_labels_prediction(images, labels, 
                                  prediction, idx, num=10): #(images, true labels, predictions, start index, number to show)
    fig = plt.gcf() #get the current figure
    fig.set_size_inches(12, 14) #figure size
    if num>25: num=25 #show at most 25 images (5x5 grid)
    for i in range(0, num):  
        ax = plt.subplot(5, 5, 1+i) #subplot in a 5-row x 5-column grid; positions start at 1
        ax.imshow(images[idx], cmap='binary') #draw the image in this subplot
        title= "label=" + str(labels[idx]) #subplot title
        if len(prediction)>0:
            title+=",prediction="+str(prediction[idx]) #append the prediction if one is given
        ax.set_title(title, fontsize=10)
        ax.set_xticks([])
        ax.set_yticks([])
        idx +=1
    plt.show()

def show_train_history(train_history, train, validation):
    # train_history: History object returned by model.fit(); train/validation: metric keys to plot
    plt.plot(train_history.history[train])
    plt.plot(train_history.history[validation])
    plt.title('Train History')
    plt.xlabel('Epoch')
    plt.ylabel(train)
    plt.legend(['train', 'validation'], loc='upper left')
    plt.show()

# CNN LSTM Demo
<h3 style="background-color:#a2b5fb"> Data preprocessing </h3>

In [2]:
# Import modules and the dataset
import numpy as np
import pandas as pd
from keras.datasets import mnist
from keras.utils import np_utils
import matplotlib.pyplot as plt
np.random.seed(10)
%matplotlib inline

Using TensorFlow backend.


In [3]:
# Load the data and split it into (x_Train, y_Train), (x_Test, y_Test)
(x_Train, y_Train), (x_Test, y_Test) = mnist.load_data()

In [4]:
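# Reshape features to 4D (samples, rows, cols, channels) and scale pixel values to [0, 1] to improve accuracy and convergence speed
x_Train4D=x_Train.reshape(x_Train.shape[0], 28, 28, 1).astype('float32') #(60000, 28, 28, 1)
x_Test4D=x_Test.reshape(x_Test.shape[0], 28, 28, 1).astype('float32') #(10000, 28, 28, 1)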
x_Train4D_normalize = x_Train4D/255
x_Test4D_normalize = x_Test4D/255

In [5]:
# Convert integer class labels to one-hot vectors
y_TrainOneHot=np_utils.to_categorical(y_Train)
y_TestOneHot=np_utils.to_categorical(y_Test)
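`np_utils.to_categorical` turns integer class labels into one-hot rows. For integer labels in the range 0 to 9, the same encoding can be sketched in plain NumPy by indexing rows of an identity matrix; `one_hot` is a hypothetical helper name, and the labels below are illustrative. Note that `to_categorical` called without `num_classes` infers the class count from the largest label, which for MNIST is also 10.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Plain-NumPy sketch of one-hot encoding for integer labels in [0, num_classes)."""
    labels = np.asarray(labels, dtype=int)
    return np.eye(num_classes, dtype="float32")[labels]  # row k of the identity is class k

y_example = np.array([5, 0, 4])  # illustrative labels
print(one_hot(y_example))
# [[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
#  [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
#  [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]]
```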

<h3 style="background-color:#a2b5fb"> Build the models </h3>

In [6]:
# Import modules
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D, TimeDistributed,Input
from keras.layers.embeddings import Embedding
from keras.layers.recurrent import LSTM

#### CNN ONLY

In [None]:
model_cnn = Sequential()
# Conv2D
model_cnn.add(Conv2D(filters=16, kernel_size=(5,5), padding='same', input_shape=(28,28,1), activation='relu'))
# MaxPooling2D 
model_cnn.add(MaxPooling2D(pool_size=(2,2)))
# Conv2D
model_cnn.add(Conv2D(filters=36, kernel_size=(5,5), padding='same', activation='relu'))
# MaxPooling2D 
model_cnn.add(MaxPooling2D(pool_size=(2,2)))
# DropOut
model_cnn.add(Dropout(0.25))
# Flatten
model_cnn.add(Flatten())
# Dense
model_cnn.add(Dense(units=128, activation='relu'))
# DropOut
model_cnn.add(Dropout(0.5))
# Dense
model_cnn.add(Dense(units=10, activation='softmax'))

model_cnn.summary()  # summary() prints the table itself (and returns None)

#### CNN LSTM 

In [None]:
# https://groups.google.com/forum/#!topic/keras-users/glng2f67Hfs
model_CL = Sequential()
model_CL.add(TimeDistributed(Conv2D(filters=16, kernel_size=(5,5), padding='same', activation='relu'),
                             input_shape = (2, 28, 28, 1)))
model_CL.add(TimeDistributed(MaxPooling2D(pool_size=(2,2))))
model_CL.add(TimeDistributed(Conv2D(filters=36, kernel_size=(5,5), padding='same', activation='relu')))
model_CL.add(TimeDistributed(MaxPooling2D(pool_size=(2,2))))
model_CL.add(TimeDistributed(Dropout(0.25)))
model_CL.add(TimeDistributed(Flatten()))
model_CL.add(LSTM(32, activation='sigmoid'))
model_CL.add(Dense(units=128, activation='relu'))
model_CL.add(Dropout(0.5))
model_CL.add(Dense(units=10, activation='softmax'))

model_CL.summary()  # summary() prints the table itself (and returns None)
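As a sanity check on the two model summaries, the trainable-parameter counts can be worked out by hand. The sketch below is plain Python with hypothetical helper names. It relies on `padding='same'` keeping each convolution output at the input size, so each 2×2 max pooling halves the side (28 → 14 → 7) and the flattened per-frame feature vector has 7·7·36 = 1764 entries. An LSTM with `u` units and input size `d` has 4·(u·(d + u) + u) parameters, because each of its four gates has input weights, recurrent weights, and a bias.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # One kernel_h x kernel_w x in_channels kernel plus one bias per filter
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(inputs, units):
    # Weight matrix plus one bias per unit
    return (inputs + 1) * units

def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights and a bias
    return 4 * (units * (input_dim + units) + units)

# 'same' conv padding keeps 28x28; each 2x2 pooling halves it: 28 -> 14 -> 7; 36 filters
flat = 7 * 7 * 36

conv_total = conv2d_params(5, 5, 1, 16) + conv2d_params(5, 5, 16, 36)

cnn_total = conv_total + dense_params(flat, 128) + dense_params(128, 10)
cl_total = conv_total + lstm_params(flat, 32) + dense_params(32, 128) + dense_params(128, 10)

print(cnn_total)  # 242062
print(cl_total)   # 250382
```

The number of time steps does not appear anywhere in these formulas: `TimeDistributed` reuses the same convolution weights for every frame, and the LSTM reuses its weights at every step.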

<h3 style="background-color:#a2b5fb"> Train the models </h3>

In [None]:
# Compile model_cnn
model_cnn.compile(loss='categorical_crossentropy',
              optimizer='adam', 
              metrics=['accuracy'])

# Training model_cnn
train_history_cnn = model_cnn.fit(x=x_Train4D_normalize, 
                                  y=y_TrainOneHot,
                                  validation_split=0.2,
                                  epochs=10, 
                                  batch_size=300,
                                  verbose=2)

In [None]:
# Compile model_CL
model_CL.compile(loss='categorical_crossentropy',
              optimizer='adam', 
              metrics=['accuracy'])

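# Training model_CL
# model_CL expects 5D input (samples, time steps, 28, 28, 1) because of input_shape=(2, 28, 28, 1),
# but x_Train4D_normalize is 4D (samples, 28, 28, 1), so passing it directly raises a shape error.
# Assumption for this demo: repeat each image over the 2 time steps so the shapes match.
x_Train5D_normalize = np.repeat(x_Train4D_normalize[:, np.newaxis], 2, axis=1)  # (60000, 2, 28, 28, 1)
train_history_CL = model_CL.fit(x=x_Train5D_normalize, 
                                y=y_TrainOneHot,
                                validation_split=0.2,
                                epochs=10, 
                                batch_size=300,
                                verbose=2)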