# 1. LSTM
LSTM (Long Short-Term Memory) is a type of recurrent neural network (RNN) architecture that is particularly well-suited for modeling sequences and time series data. LSTMs are designed to overcome the limitations of traditional RNNs, especially the vanishing gradient problem, which makes it hard to learn dependencies that span many time steps; exploding gradients are usually handled separately, e.g. by gradient clipping.

## 1.1. Components of LSTM
1. **Cell State**: A memory that carries information along the sequence; the gates decide how it is modified at each step.
2. **Forget Gate**: Decides what portion of the previous cell state to keep or discard.
3. **Input Gate**: Decides how much of the new candidate information to write into the cell state.
4. **Output Gate**: Controls how much of the cell state is exposed as the hidden state, i.e. the LSTM's output at each step.
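The four components above interact within a single time step. The following is a minimal NumPy sketch of the standard LSTM cell equations (no peephole connections); the function name `lstm_step` and the weight layout (one matrix for all four gates, in the order forget, input, candidate, output) are assumptions made for illustration and do not match the internal layout of any particular library:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step (sketch).

    x: input vector of size d; h_prev, c_prev: previous hidden and cell state of size n.
    W: weights of shape (4n, d + n) applied to [x, h_prev]; b: bias of size 4n.
    Gate order inside W and b (forget, input, candidate, output) is an assumption of this sketch.
    """
    n = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    f = sigmoid(z[0:n])           # forget gate: fraction of c_prev to keep
    i = sigmoid(z[n:2 * n])       # input gate: fraction of the candidate to write
    g = np.tanh(z[2 * n:3 * n])   # candidate values for the cell state
    o = sigmoid(z[3 * n:4 * n])   # output gate: fraction of the cell state to expose
    c = f * c_prev + i * g        # new cell state
    h = o * np.tanh(c)            # new hidden state (the output at this step)
    return h, c

# Run a cell with 3 units over a random sequence of five 2-dimensional inputs
rng = np.random.default_rng(0)
d, n = 2, 3
W = rng.normal(scale=0.1, size=(4 * n, d + n))
b = np.zeros(4 * n)
h, c = np.zeros(n), np.zeros(n)
for x in rng.normal(size=(5, d)):
    h, c = lstm_step(x, h, c, W, b)
```

When the forget gate saturates near 1 and the input gate near 0, `c` passes through a step almost unchanged, which is how the cell can keep information over many steps.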

## 1.2. LSTM Use Case Study: Text Generation
One popular and illustrative use case of LSTMs is text generation. In this task, LSTMs are trained on a large corpus of text, and once trained, they can generate new text that mimics the style and structure of the original text. 

### 1.2.1. Steps in the Use Case:
1. **Data Preparation**:
   - Collect a large text corpus.
   - Preprocess the text (e.g., tokenization, creating sequences of fixed length).


2. **Model Architecture**:
   - An LSTM model with an embedding layer, one or more LSTM layers, and a dense output layer.
   - Example architecture in Keras:
     ```python
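     from keras.models import Sequential
     from keras.layers import Input, LSTM, Embedding, Dense

     # Placeholder values: in practice they come from the data-preparation step
     vocab_size = 5000           # number of token ids, including 0 for padding
     embedding_dim = 100
     max_sequence_length = 20    # padded sequence length; the last token is the target

     model = Sequential()
     model.add(Input(shape=(max_sequence_length - 1,)))        # all tokens except the target
     model.add(Embedding(input_dim=vocab_size, output_dim=embedding_dim))
     model.add(LSTM(units=128, return_sequences=True))         # passes the whole sequence to the next LSTM
     model.add(LSTM(units=128))                                # keeps only the last hidden state
     model.add(Dense(units=vocab_size, activation='softmax'))  # probability of each next token

     # categorical_crossentropy expects one-hot targets (sparse_categorical_crossentropy takes integer ids)
     model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])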
     ```


3. **Training**:
   - Use sequences of text as input and the next character/word as the target.
   - Train the model using categorical cross-entropy loss and a suitable optimizer like Adam.


4. **Text Generation**:
   - Generate text by feeding an initial seed sequence into the trained model.
   - Predict the next character/word iteratively to generate new text.
   - Example:
     ```python
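     import numpy as np
     from keras.utils import pad_sequences

     def generate_text(model, tokenizer, seed_text, next_words, max_sequence_length):
         # tokenizer: a fitted tokenizer exposing texts_to_sequences() and word_index
         for _ in range(next_words):
             # Encode the current text and left-pad it to the model's input length
             token_list = tokenizer.texts_to_sequences([seed_text])[0]
             token_list = pad_sequences([token_list], maxlen=max_sequence_length - 1, padding='pre')
             # predict_classes() was removed from Keras; take the most likely token (greedy decoding)
             predicted = int(np.argmax(model.predict(token_list, verbose=0), axis=-1)[0])
             # Map the predicted index back to its word
             output_word = ""
             for word, index in tokenizer.word_index.items():
                 if index == predicted:
                     output_word = word
                     break
             seed_text += " " + output_word
         return seed_text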
     ```
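The data-preparation step of this use case (tokenization, then fixed-length sequences whose last token is the target) can be sketched in plain Python and NumPy. This is a minimal sketch under simplifying assumptions: whitespace tokenization, ids starting at 1 with 0 reserved for padding, and the hypothetical helper names `build_vocab` and `make_training_data`. It yields inputs of length `max_sequence_length - 1`, the length to which `generate_text` pads the seed:

```python
import numpy as np

def build_vocab(lines):
    """Map each lower-cased word to an integer id starting at 1 (0 is reserved for padding)."""
    vocab = {}
    for line in lines:
        for word in line.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab) + 1
    return vocab

def make_training_data(lines, vocab):
    """Build every prefix (n-gram) of each line, left-pad with zeros, and split off the last token."""
    sequences = []
    for line in lines:
        ids = [vocab[w] for w in line.lower().split()]
        for i in range(2, len(ids) + 1):
            sequences.append(ids[:i])                         # prefixes of length 2 .. len(ids)
    max_sequence_length = max(len(s) for s in sequences)
    padded = np.zeros((len(sequences), max_sequence_length), dtype=np.int64)
    for row, seq in enumerate(sequences):
        padded[row, max_sequence_length - len(seq):] = seq   # 'pre' padding with zeros
    X, y = padded[:, :-1], padded[:, -1]                      # inputs have length max_sequence_length - 1
    return X, y, max_sequence_length

corpus = ["the cat sat", "the cat ran away"]
vocab = build_vocab(corpus)
X, y, max_sequence_length = make_training_data(corpus, vocab)
print(X.shape, y.tolist())   # (5, 3) [2, 3, 2, 4, 5]
```

Here `y` holds integer ids, so it can be used directly with `loss='sparse_categorical_crossentropy'`; for the `categorical_crossentropy` loss used above it would first be one-hot encoded, e.g. with `keras.utils.to_categorical(y, num_classes=len(vocab) + 1)`.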


# 2. Carry Over in Arithmetic Sequences
To illustrate the LSTM's ability to learn from sequences, let's consider an example project where the goal is to predict the result of adding two numbers digit by digit, propagating the carry.

## 2.1. Steps
1. **Data Preparation**:
   - Generate pairs of binary digit sequences representing addition problems, together with their digit-by-digit sum.
   - The carry is not given explicitly: it only shows up in the target digits.


2. **Model Architecture**:
   - An LSTM model that reads the two digit sequences position by position and predicts the sum digit at each position.
   - The carry is not an input feature; the model has to track it in its internal state.


3. **Training**:
   - Train the model on a dataset of generated addition problems.


4. **Prediction**:
   - Use the trained model to predict the result of new addition problems.


The key idea is to showcase the LSTM's ability to learn and remember dependencies across time steps, which is exactly what carry propagation requires: the carry produced at one digit position affects the digits that follow.

We train the model to compute the sum of two binary numbers digit by digit, leaving it to the model to learn how the carry propagates.
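The rule the model has to learn is ordinary schoolbook addition. As a plain-Python reference (not part of the original notebook; `add_lsb_first` is a hypothetical helper), with digits stored least significant first:

```python
def add_lsb_first(x, y, carry_in=0):
    """Add two equal-length binary numbers given least significant digit first.

    Returns the sum digits (same length as the inputs) and the final carry-out.
    """
    digits = []
    carry = carry_in
    for xt, yt in zip(x, y):
        s = xt + yt + carry
        digits.append(s % 2)   # digit written at this position
        carry = s // 2         # carry passed to the next, more significant position
    return digits, carry

# 0b1111 + 0b0001: the carry created at position 0 travels through every position
digits, carry_out = add_lsb_first([1, 1, 1, 1], [1, 0, 0, 0])
print(digits, carry_out)  # [0, 0, 0, 0] 1   (15 + 1 = 16)
```

The example shows why this is a long-range dependency: the last output digit depends on the inputs at the very first position.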

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, LSTM, Conv1D, Dense, Layer
from tensorflow.keras.models import Model
```



Here is our generator. Each call yields a batch `(a, res)`: `a` has shape `(batchsize, seqlen, 2)`, where `a[:, :, 0]` and `a[:, :, 1]` are the two operands, each a sequence of `seqlen` binary digits, and `res` has shape `(batchsize, seqlen)` and holds their sum. Digits are stored in positional order with the least significant digit first (reading left to right, position `t` has weight 2^t).

The initial carry of the generator is 0; at each subsequent invocation it reuses the final carry of the previous sum instead of storing it in `res`. Note that the model processes every batch independently, so it cannot see this incoming carry.

```python
def generator(batchsize, seqlen):
    # The carry starts at 0 and is handed over from one batch to the next
    carry = np.zeros(batchsize)
    while True:
        # Two random binary operands per sample: a[:, t, 0] and a[:, t, 1]
        a = np.random.randint(2, size=(batchsize, seqlen, 2))
        res = np.zeros((batchsize, seqlen))
        # Add digit by digit, least significant digit first
        for t in range(seqlen):
            digit_sum = a[:, t, 0] + a[:, t, 1] + carry
            res[:, t] = digit_sum % 2
            carry = digit_sum // 2
        yield (a, res)
```
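Because `generator` hands the final carry of one batch over to the next, the first digits of a batch may depend on a value that is not among the model's inputs. A hedged alternative is sketched below: the hypothetical `generator_zero_carry` resets the carry to 0 for every batch and uses a seeded NumPy `Generator` for reproducibility, together with a check against integer addition modulo `2**seqlen`. The training run in this notebook uses the original `generator`, not this variant.

```python
import numpy as np

def generator_zero_carry(batchsize, seqlen, rng=None):
    """Hypothetical variant of `generator`: every batch starts from carry 0.

    Each target sequence then depends only on the two input sequences,
    and the final carry-out of every sum is discarded.
    """
    rng = np.random.default_rng() if rng is None else rng
    while True:
        a = rng.integers(0, 2, size=(batchsize, seqlen, 2))
        res = np.zeros((batchsize, seqlen))
        carry = np.zeros(batchsize)            # reset for every batch
        for t in range(seqlen):
            digit_sum = a[:, t, 0] + a[:, t, 1] + carry
            res[:, t] = digit_sum % 2
            carry = digit_sum // 2
        yield (a, res)

def to_int(bits):
    """Decode least-significant-first binary digits into an integer."""
    return sum(int(b) << t for t, b in enumerate(bits))

# Sanity check: every target equals the integer sum of the operands modulo 2**seqlen
a_batch, res_batch = next(generator_zero_carry(4, 6, rng=np.random.default_rng(0)))
for x, y, r in zip(a_batch[:, :, 0], a_batch[:, :, 1], res_batch):
    assert to_int(r) == (to_int(x) + to_int(y)) % 2**6
```

It yields batches with the same shapes as `generator`, so it could be passed to `fit` in its place.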

Let's create an instance of the generator.

```python
gen = generator(1, 2)
```

Now let's look at a sample.

```python
a, res = next(gen)
print("a1 = {}, a2={}, res = {}".format(a[0,:,0],a[0,:,1],res[0]))
```

```
a1 = [1 1], a2=[0 1], res = [1. 0.]
```


We can now define the model. It takes as input a pair of binary sequences of unspecified length, stacked along the last axis (input shape `(None, 2)`); as usual, the batch dimension is implicit.

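```python
# Custom layer that drops the trailing axis of size 1: (batch, seqlen, 1) -> (batch, seqlen)
class SqueezeLayer(Layer):
    def call(self, inputs):
        return tf.squeeze(inputs, axis=-1)

class CustomModel(Model):
    def __init__(self):
        super().__init__()
        # Kernel size 1: both convolutions act on each position independently
        self.conv1 = Conv1D(8, 1, activation='relu')
        self.conv2 = Conv1D(4, 1, activation='relu')
        # The LSTM passes information (the carry) from one position to the next;
        # activation=None makes the candidate and output activations linear instead of tanh
        self.lstm = LSTM(4, activation=None, return_sequences=True)
        # One sigmoid output per position: the predicted sum digit
        self.dense = Dense(1, activation='sigmoid')
        self.squeeze = SqueezeLayer()

    def call(self, inputs):
        x = self.conv1(inputs)
        x = self.conv2(x)
        x = self.lstm(x)
        x = self.dense(x)
        return self.squeeze(x)

def gen_model():
    # Input layer: sequences of any length, two binary digits per position
    xa = Input(shape=(None, 2))

    # Instantiate and call the custom model
    model = CustomModel()
    out = model(xa)

    # Wrap it in a functional Model
    comp = Model(inputs=xa, outputs=out)

    return comp
```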

```python
mymodel = gen_model()
mymodel.summary()
```

```python
mymodel.compile(optimizer='adam', loss='mse')
```

```python
batchsize = 100
seqlen = 10
```

```python
mymodel.fit(generator(batchsize, seqlen), steps_per_epoch=100, epochs=100)
```

Epoch 1/100
[1m100/100[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 5ms/step - loss: 0.2500
Epoch 2/100
[1m100/100[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 5ms/step - loss: 0.2499
Epoch 3/100
[1m100/100[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 5ms/step - loss: 0.2489
Epoch 4/100
[1m100/100[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 5ms/step - loss: 0.2186
Epoch 5/100
[1m100/100[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 5ms/step - loss: 0.1144
Epoch 6/100
[1m100/100[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 5ms/step - loss: 0.0640
Epoch 7/100
[1m100/100[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 5ms/step - loss: 0.0532
Epoch 8/100
[1m100/100[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 5ms/step - loss: 0.0526
Epoch 9/100
[1m100/100[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 5ms/step - loss: 0.0518
Epoch 10/100
[1m100/100[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 5ms


```python
example, res = next(generator(1, 10))
predicted = np.array([int(np.rint(x)) for x in mymodel.predict(example)[0]])

print("a1        = {}".format(example[0][:,0].astype(int)))
print("a2        = {}".format(example[0][:,1].astype(int)))
print("expected  = {}".format(res[0].astype(int)))
print("predicted = {}".format(predicted))
```

```
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 188ms/step
a1        = [1 1 0 0 0 0 1 1 0 0]
a2        = [0 1 0 1 0 0 0 0 0 1]
expected  = [1 0 1 1 0 0 1 1 0 1]
predicted = [1 1 1 1 0 0 1 1 0 1]
```
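Since the digits are stored least significant first, the printed arrays are easier to check once decoded into integers. A small sketch: the helper name `lsb_digits_to_int` is hypothetical, and the arrays are copied from the run shown above (a new run will print different ones):

```python
def lsb_digits_to_int(digits):
    """Interpret binary digits, least significant first, as an integer."""
    return sum(int(d) << t for t, d in enumerate(digits))

# Digit arrays copied from the printed output of this run
a1_bits        = [1, 1, 0, 0, 0, 0, 1, 1, 0, 0]
a2_bits        = [0, 1, 0, 1, 0, 0, 0, 0, 0, 1]
expected_bits  = [1, 0, 1, 1, 0, 0, 1, 1, 0, 1]
predicted_bits = [1, 1, 1, 1, 0, 0, 1, 1, 0, 1]

print(lsb_digits_to_int(a1_bits), lsb_digits_to_int(a2_bits))  # 195 522
print(lsb_digits_to_int(expected_bits))                        # 717
print(lsb_digits_to_int(predicted_bits))                       # 719
```

So 195 + 522 = 717 matches the expected digits, while the prediction is wrong only in the digit of weight 2 (position 1) and decodes to 719.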


There is a comprehensive theory behind the LSTM. Theory always comes first in deep learning, since implementing a DL model is often just a matter of a few method calls.