One of the most important types of sequential data is time series data: a series of data points listed in time order. Sequential data is key for applications such as speech recognition, sentiment analysis, and language translation.

The field of genomics works with what is arguably the most natural language of all: a sequence of nucleotides (A, G, C, and T). This makes it well suited to RNN applications such as predicting proteins from DNA sequences, predicting the binding domains of proteins, predicting interactions between enhancers and promoters, predicting structural motifs, predicting base calls from sequencing instruments, optimizing coding sequences for increased protein production, and predicting function.

In this chapter, you will learn what RNNs are, how they differ from FNNs and CNNs, and why they are better suited to sequential data. By the end of this chapter, you will understand why RNNs are important in DL, the different types of RNN architectures and when to use each, and the different RNN applications in genomics.

![](https://learning.oreilly.com/api/v2/epubs/urn:orm:book:9781804615447/files/image/B18958_06_001.jpg)

![](https://learning.oreilly.com/api/v2/epubs/urn:orm:book:9781804615447/files/image/B18958_06_002.jpg)

![](https://learning.oreilly.com/api/v2/epubs/urn:orm:book:9781804615447/files/image/B18958_06_003.jpg)

![](https://learning.oreilly.com/api/v2/epubs/urn:orm:book:9781804615447/files/image/B18958_06_004.jpg)

Another way to illustrate how RNNs work is with an example. Imagine feeding a DNA sequence (ATGCGAG) to a standard FNN one nucleotide at a time. By the time it reaches the last nucleotide (in this example, ‘G’), it retains nothing about the preceding nucleotides ‘A’, ‘T’, ‘G’, ‘C’, ‘G’, and ‘A’, so it cannot use them to predict which nucleotide comes next. An RNN, by contrast, carries a hidden state from one step to the next, so earlier nucleotides can influence later predictions. This memory matters for sequential data such as DNA sequences because the sequence has structure: what comes next depends on what came before.
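To make the contrast concrete, here is a minimal sketch of a vanilla (Elman) RNN cell in NumPy, using random untrained weights and an assumed A, C, G, T one-hot channel order. It is not the LSTM layer used later in this chapter; it only shows that the hidden state carries information from earlier nucleotides, while a feedforward layer that sees only the last nucleotide cannot tell two sequences apart.

```python
import numpy as np

ALPHABET = "ACGT"  # assumed one-hot channel order, for illustration only


def one_hot(seq: str) -> np.ndarray:
    """Return a (len(seq), 4) one-hot matrix for a DNA string."""
    idx = [ALPHABET.index(base) for base in seq]
    return np.eye(len(ALPHABET))[idx]


def rnn_forward(x, W_xh, W_hh, b_h):
    """Vanilla RNN: h_t = tanh(x_t @ W_xh + h_{t-1} @ W_hh + b_h).

    Returns the hidden state after every time step, shape (len(x), hidden).
    """
    h = np.zeros(W_hh.shape[0])  # h_0: no memory yet
    states = []
    for x_t in x:  # one nucleotide per time step
        h = np.tanh(x_t @ W_xh + h @ W_hh + b_h)
        states.append(h)
    return np.stack(states)


rng = np.random.default_rng(0)
hidden = 8
W_xh = rng.normal(scale=0.5, size=(4, hidden))       # input-to-hidden weights
W_hh = rng.normal(scale=0.5, size=(hidden, hidden))  # hidden-to-hidden (recurrent) weights
b_h = np.zeros(hidden)

seq_a, seq_b = "ATGCGAG", "CCCCCAG"  # different prefixes, same last nucleotide
h_a = rnn_forward(one_hot(seq_a), W_xh, W_hh, b_h)
h_b = rnn_forward(one_hot(seq_b), W_xh, W_hh, b_h)

# A feedforward layer applied to the last nucleotide alone has no memory.
ffn_a = np.tanh(one_hot(seq_a)[-1] @ W_xh + b_h)
ffn_b = np.tanh(one_hot(seq_b)[-1] @ W_xh + b_h)

print("FNN outputs identical:", np.array_equal(ffn_a, ffn_b))
print("RNN final states identical:", np.allclose(h_a[-1], h_b[-1]))
```

Because both sequences end in ‘G’, the feedforward outputs are identical; the RNN's final hidden states differ because each one summarizes the whole prefix.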

In [1]:
import numpy as np
from sklearn import metrics
import pandas as pd

In [2]:
X_train = np.load('../Chapter06/data/X_train.npy.zip')['X_train']
y_train = np.load('../Chapter06/data/y_train.npy.zip')['y_train']
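The `.npy.zip` files can be indexed by name because `np.load` detects the zip signature and returns a dict-like `NpzFile` rather than a single array. The sketch below reproduces that behavior with a small, made-up array written to an in-memory buffer. It assumes each file in `../Chapter06/data/` is a zip archive holding a member named after the array (for example `X_train.npy`); `X_demo` is a hypothetical stand-in.

```python
import io

import numpy as np

# Hypothetical stand-in for X_train: 2 sequences, 5 positions, 4 channels.
X_demo = np.zeros((2, 5, 4), dtype=np.float32)
X_demo[:, :, 0] = 1.0

# Write a compressed archive into memory; the keyword becomes the key.
buffer = io.BytesIO()
np.savez_compressed(buffer, X_train=X_demo)
buffer.seek(0)

# np.load recognizes the zip archive and returns an NpzFile,
# so the array is retrieved by name, as with ['X_train'] above.
with np.load(buffer) as archive:
    print(archive.files)  # ['X_train']
    X_loaded = archive["X_train"]

print(X_loaded.shape)  # (2, 5, 4)
```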

In [3]:
X_train.shape

(10000, 1000, 4)
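The shape `(10000, 1000, 4)` reads as 10,000 sequences, each 1,000 nucleotides long, with four values per position, which is consistent with one-hot encoding. A minimal encoder might look like this; the A, C, G, T channel order and the all-zero row for ambiguous bases such as `N` are assumptions, since the encoding used to build this dataset is not described here.

```python
import numpy as np

# Assumed channel order; the order used to build X_train is not stated here.
BASE_TO_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}


def one_hot_encode(seqs):
    """Encode equal-length DNA strings as an array of shape (n_seqs, seq_len, 4).

    Bases not in BASE_TO_INDEX (for example 'N') are left as all-zero rows.
    """
    seq_len = len(seqs[0])
    if any(len(s) != seq_len for s in seqs):
        raise ValueError("all sequences must have the same length")
    out = np.zeros((len(seqs), seq_len, 4), dtype=np.float32)
    for i, seq in enumerate(seqs):
        for j, base in enumerate(seq.upper()):
            k = BASE_TO_INDEX.get(base)
            if k is not None:
                out[i, j, k] = 1.0
    return out


batch = one_hot_encode(["ATGCGAG", "GGNCATA"])
print(batch.shape)           # (2, 7, 4)
print(batch[0, 0])           # [1. 0. 0. 0.]  ('A')
print(batch[1].sum(axis=1))  # [1. 1. 0. 1. 1. 1. 1.]  ('N' is all zeros)
```

With 1,000-nucleotide sequences, the same function would return an array of shape `(n, 1000, 4)`.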

In [4]:
y_train.shape

(10000, 690)

In [5]:
X_test = np.load('../Chapter06/data/X_test.npy.zip')['X_test']
y_test = np.load('../Chapter06/data/y_test.npy.zip')['y_test']

In [6]:
X_test.shape

(1000, 1000, 4)

In [7]:
y_test.shape

(1000, 690)
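The 690 columns of `y_train` and `y_test`, matched by the 690 sigmoid outputs of the model built below, appear to be separate binary labels, so predictions are usually scored label by label. The following sketch uses `metrics.roc_auc_score` from the `sklearn.metrics` module imported in the first cell, on small synthetic arrays; in practice `y_score` would come from something like `model.predict(X_test)`. Skipping labels that contain only one class is a choice made here because ROC AUC is undefined for them; it is not necessarily how this chapter evaluates the model.

```python
import numpy as np
from sklearn import metrics


def per_label_auc(y_true, y_score):
    """ROC AUC for each label column; NaN where a column contains only one class."""
    aucs = np.full(y_true.shape[1], np.nan)
    for j in range(y_true.shape[1]):
        column = y_true[:, j]
        if column.min() != column.max():  # AUC is undefined without both classes
            aucs[j] = metrics.roc_auc_score(column, y_score[:, j])
    return aucs


# Synthetic stand-ins: 4 sequences x 3 labels (the real arrays have 690 labels).
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [1, 1, 0],
                   [0, 0, 0]])
y_score = np.array([[0.9, 0.2, 0.1],
                    [0.3, 0.8, 0.4],
                    [0.7, 0.3, 0.2],
                    [0.1, 0.4, 0.3]])

aucs = per_label_auc(y_true, y_score)
print(aucs)              # label 0 -> 1.0, label 1 -> 0.75, label 2 -> nan
print(np.nanmean(aucs))  # 0.875, the mean over labels with a defined AUC
```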

In [8]:
from keras.models import Sequential
from keras.models import Model
from keras.layers import Dense, Dropout, Activation, Flatten, Layer, Input
from keras.layers import Conv1D, MaxPooling1D  # public path; keras.layers.convolutional is not available in Keras 3
from keras.layers import LSTM
from keras.layers import Bidirectional
from keras.callbacks import ModelCheckpoint, EarlyStopping


In [18]:
input_data = Input(shape=(1000,4))

In [19]:
output = Conv1D(320, kernel_size=26, activation='relu')(input_data)
output = MaxPooling1D()(output)
output = Dropout(0.2)(output)
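The output shapes in the model summary follow from simple arithmetic. Keras `Conv1D` defaults to `padding='valid'` and `strides=1`, and `MaxPooling1D` defaults to `pool_size=2` with strides equal to the pool size; `Dropout` leaves the shape unchanged. The helper functions below are illustrative, not part of Keras.

```python
def conv1d_output_length(length, kernel_size, stride=1):
    """Output length of a 1D convolution or pooling window with 'valid' padding."""
    return (length - kernel_size) // stride + 1


def conv1d_params(in_channels, filters, kernel_size):
    """Kernel weights (kernel_size x in_channels x filters) plus one bias per filter."""
    return kernel_size * in_channels * filters + filters


conv_len = conv1d_output_length(1000, kernel_size=26)               # Conv1D(320, kernel_size=26)
pool_len = conv1d_output_length(conv_len, kernel_size=2, stride=2)  # MaxPooling1D()
print(conv_len, pool_len)         # 975 487
print(conv1d_params(4, 320, 26))  # 33600
```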

In [20]:
output = Bidirectional(LSTM(320, return_sequences=True))(output)
output = Dropout(0.5)(output)
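The parameter count of the bidirectional layer can be checked the same way. A Keras LSTM has four gates (input, forget, cell, and output), each with input weights, recurrent weights, and one bias vector. `Bidirectional` wraps two independent LSTMs and, with its default `merge_mode='concat'`, concatenates their outputs, which is why 320 units become 640 features per step. The helper below is illustrative.

```python
def lstm_params(input_dim, units):
    """Keras-style LSTM: 4 gates, each with input weights, recurrent weights and one bias."""
    return 4 * (input_dim * units + units * units + units)


forward = lstm_params(input_dim=320, units=320)
bidirectional = 2 * forward  # separate forward and backward LSTMs
print(forward, bidirectional)  # 820480 1640960
print(2 * 320)                 # 640 features per step after concatenation
```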

In [21]:
flat_output = Flatten()(output)

In [22]:
FC_output = Dense(695)(flat_output)
FC_output = Activation('relu')(FC_output)
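Flattening the BiLSTM output produces a very wide vector, and `Dense(695)` connects every element of it to every unit. A quick count with an illustrative helper shows that this single layer holds far more parameters than the convolutional and recurrent layers combined (33,600 + 1,640,960).

```python
def dense_params(in_features, units):
    """Fully connected layer: one weight per input-unit pair plus one bias per unit."""
    return in_features * units + units


flat_features = 487 * 640                 # timesteps x BiLSTM features per step
hidden_fc = dense_params(flat_features, 695)
output_fc = dense_params(695, 690)
conv_and_lstm = 33_600 + 1_640_960        # Conv1D and Bidirectional rows of the summary

print(flat_features)               # 311680
print(hidden_fc)                   # 216618295
print(output_fc)                   # 480240
print(hidden_fc // conv_and_lstm)  # 129
```

These hand-computed figures should agree with the corresponding rows of `model.summary()`.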

In [23]:
output = Dense(690)(FC_output)
output = Activation('sigmoid')(output)
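The final layer uses a sigmoid rather than a softmax because a sequence can carry several of the 690 labels at once. The NumPy sketch below, with made-up logits for three labels, contrasts the two: sigmoid scores each label independently, while softmax makes the labels compete for a total probability of 1. Sigmoid outputs like these are typically paired with a binary cross-entropy loss.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


logits = np.array([2.0, 1.5, -3.0])  # made-up scores for 3 of the 690 labels

p_sigmoid = sigmoid(logits)  # independent per-label probabilities
p_softmax = softmax(logits)  # probabilities that compete and sum to 1

print(p_sigmoid.round(3))  # two labels are likely at the same time
print(p_softmax.round(3))  # probability mass is split between them
```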

In [24]:
model = Model(inputs=input_data, outputs=output)

In [25]:
model.summary()

Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_2 (InputLayer)        [(None, 1000, 4)]         0         
                                                                 
 conv1d_3 (Conv1D)           (None, 975, 320)          33600     
                                                                 
 max_pooling1d_2 (MaxPooling  (None, 487, 320)         0         
 1D)                                                             
                                                                 
 dropout_2 (Dropout)         (None, 487, 320)          0         
                                                                 
 bidirectional_1 (Bidirectio  (None, 487, 640)         1640960   
 nal)                                                            
                                                                 
 dropout_3 (Dropout)         (None, 487, 640)          0     