

Introduction to Deep Learning (2023) &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;| &nbsp;
-------|-------------------
**Assignment 2 - Recurrent Neural Networks** | <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/b/b0/UniversiteitLeidenLogo.svg/1280px-UniversiteitLeidenLogo.svg.png" width="300">



# Introduction


The goal of this assignment is to learn how to use encoder-decoder recurrent neural networks (RNNs). Specifically, we will be dealing with a sequence-to-sequence problem and try to build recurrent models that can learn the principles behind simple arithmetic operations (**integer addition, subtraction and multiplication**).

<img src="https://i.ibb.co/5Ky5pbk/Screenshot-2023-11-10-at-07-51-21.png" alt="Screenshot-2023-11-10-at-07-51-21" border="0" width="500">

In this assignment you will be working with three different kinds of models, based on input/output data modalities:
1. **Text-to-text**: given a text query containing two integers and an operator between them (+ or -), the model's output should be a sequence of characters that matches the actual arithmetic result of this operation.
2. **Image-to-text**: same as above, except the query is specified as a sequence of images containing individual digits and an operator.
3. **Text-to-image**: the query is specified in text format as in the text-to-text model; however, the model's output should be a sequence of images corresponding to the correct result.


### Description
Let us suppose that we want to develop a neural network that learns how to add or subtract two integers that are at most two digits long. For example, given input strings of 5 characters, '81+24' or '41-89', each consisting of two two-digit integers and an operator between them, the network should return a sequence of 3 characters, '105' or '-48', that represents the result of the respective query. Additionally, we want to build a model that generalizes well: if the network can extract the underlying principles behind the '+' and '-' operators and their associated operations, it should not need too many training examples to generate valid answers to unseen queries. To represent such queries we need 13 unique characters: 10 for the digits (0-9), 2 for the '+' and '-' operators, and one for the whitespace ' ' used as padding.

The example above describes a text-to-text sequence mapping scenario. However, we can also use different modalities of data to represent our queries or answers. For that purpose, the MNIST handwritten digit dataset is going to be used again, although in a slightly different format. The functions below will be used to create our datasets.
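
To make the character budget concrete, the sketch below builds the 13-character vocabulary and pads an answer to a fixed width. The names `VOCABULARY`, `evaluate` and `pad_answer` are hypothetical and only serve this illustration; the notebook defines its own helpers further down.

```python
# Illustrative sketch; VOCABULARY, evaluate and pad_answer are hypothetical names.
VOCABULARY = "0123456789+- "  # 10 digits, 2 operators, 1 padding character


def evaluate(query):
    """Evaluate 'a+b' or 'a-b' for non-negative integers a and b."""
    for operator in "+-":
        if operator in query:
            left, right = query.split(operator)
            if operator == "+":
                return int(left) + int(right)
            return int(left) - int(right)
    raise ValueError(f"no operator found in {query!r}")


def pad_answer(value, width=3):
    """Right-pad the decimal representation of value with spaces up to width."""
    return str(value).ljust(width)


for query in ["81+24", "41-89", "7+2"]:
    answer = pad_answer(evaluate(query))
    # Every character of the query and the answer must come from the vocabulary.
    assert set(query) <= set(VOCABULARY) and set(answer) <= set(VOCABULARY)
    print(repr(query), "->", repr(answer))
```

Short answers such as `9` are padded with spaces so that every target sequence has the same length.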

---

*To work on this notebook you should create a copy of it.*


# Function definitions for creating the datasets

First we need to create our datasets that are going to be used for training our models.

In order to create image queries of simple arithmetic operations such as '15+13' or '42-10', we need to create images of the '+' and '-' signs using the ***OpenCV*** library. We will use these operator signs together with the MNIST dataset, which provides the digits.

In [None]:
import tensorflow as tf
import matplotlib.pyplot as plt
import cv2
import numpy as np
import random
from sklearn.model_selection import train_test_split

from tensorflow.keras.layers import Dense, RNN, LSTM, Flatten, TimeDistributed, LSTMCell
from tensorflow.keras.layers import (
    RepeatVector,
    Conv2D,
    SimpleRNN,
    GRU,
    Reshape,
    ConvLSTM2D,
    Conv2DTranspose,
)

In [None]:
from scipy.ndimage import rotate


# Create plus/minus operand signs
def generate_images(number_of_images=50, sign="-"):
    blank_images = np.zeros(
        [number_of_images, 28, 28]
    )  # Dimensionality matches the size of MNIST images (28x28)
    x = np.random.randint(12, 16, (number_of_images, 2))  # Randomized x coordinates
    y1 = np.random.randint(6, 10, number_of_images)  # Randomized y coordinates
    y2 = np.random.randint(18, 22, number_of_images)  # -||-

    for i in range(number_of_images):  # Generate n different images
        cv2.line(
            blank_images[i],
            (y1[i], x[i, 0]),
            (y2[i], x[i, 1]),
            (255, 0, 0),
            2,
            cv2.LINE_AA,
        )  # Draw lines with randomized coordinates
        if sign == "+":
            cv2.line(
                blank_images[i],
                (x[i, 0], y1[i]),
                (x[i, 1], y2[i]),
                (255, 0, 0),
                2,
                cv2.LINE_AA,
            )  # Draw lines with randomized coordinates
        if sign == "*":
            cv2.line(
                blank_images[i],
                (x[i, 0], y1[i]),
                (x[i, 1], y2[i]),
                (255, 0, 0),
                2,
                cv2.LINE_AA,
            )
            # Rotate by -50 degrees (the angle passed to scipy.ndimage.rotate below)
            blank_images[i] = rotate(blank_images[i], -50, reshape=False)
            cv2.line(
                blank_images[i],
                (x[i, 0], y1[i]),
                (x[i, 1], y2[i]),
                (255, 0, 0),
                2,
                cv2.LINE_AA,
            )
            blank_images[i] = rotate(blank_images[i], -50, reshape=False)
            cv2.line(
                blank_images[i],
                (x[i, 0], y1[i]),
                (x[i, 1], y2[i]),
                (255, 0, 0),
                2,
                cv2.LINE_AA,
            )

    return blank_images


def show_generated(images, n=5):
    plt.figure(figsize=(2, 2))
    for i in range(n**2):
        plt.subplot(n, n, i + 1)
        plt.axis("off")
        plt.imshow(images[i])
    plt.show()


# show_generated(generate_images())
# show_generated(generate_images(sign="+"))

In [None]:
def create_data(highest_integer, num_addends=2, operands=["+", "-"]):
    """
    Creates the following data for all pairs of integers [0:highest_integer][+/-][0:highest_integer]:

    @return:
    X_text: '51+21' -> text query of an arithmetic operation (5)
    X_img : Stack of MNIST images corresponding to the query (5 x 28 x 28) -> sequence of 5 images of size 28x28
    y_text: '72' -> answer of the arithmetic text query
    y_img :  Stack of MNIST images corresponding to the answer (3 x 28 x 28)

    Images for digits are picked randomly from the whole MNIST dataset.
    """

    num_indices = [np.where(MNIST_labels == x) for x in range(10)]
    num_data = [MNIST_data[inds] for inds in num_indices]
    image_mapping = dict(zip(unique_characters[:10], num_data))
    image_mapping["-"] = generate_images()
    image_mapping["+"] = generate_images(sign="+")
    image_mapping["*"] = generate_images(sign="*")
    image_mapping[" "] = np.zeros([1, 28, 28])

    X_text, X_img, y_text, y_img = [], [], [], []

    for i in range(highest_integer + 1):  # First addend
        for j in range(highest_integer + 1):  # Second addend
            for sign in operands:  # Create all possible combinations of operands
                query_string = to_padded_chars(
                    str(i) + sign + str(j), max_len=max_query_length, pad_right=True
                )
                query_image = []
                for n, char in enumerate(query_string):
                    image_set = image_mapping[char]
                    index = np.random.randint(0, len(image_set), 1)
                    query_image.append(image_set[index].squeeze())

                result = eval(query_string)
                result_string = to_padded_chars(
                    result, max_len=max_answer_length, pad_right=True
                )
                result_image = []
                for n, char in enumerate(result_string):
                    image_set = image_mapping[char]
                    index = np.random.randint(0, len(image_set), 1)
                    result_image.append(image_set[index].squeeze())

                X_text.append(query_string)
                X_img.append(np.stack(query_image))
                y_text.append(result_string)
                y_img.append(np.stack(result_image))

    return (
        np.stack(X_text),
        np.stack(X_img) / 255.0,
        np.stack(y_text),
        np.stack(y_img) / 255.0,
    )


def to_padded_chars(integer, max_len=3, pad_right=False):
    """
    Returns a string of len()=max_len, containing the integer padded with ' ' on either right or left side
    """
    length = len(str(integer))
    padding = (max_len - length) * " "
    if pad_right:
        return str(integer) + padding
    else:
        return padding + str(integer)

# Creating our data

The dataset consists of 20,000 samples (additions and subtractions between all pairs of integers from 0 to 99), and it has two kinds of input and label modalities:

  **X_text**: strings containing queries of length 5: ['1+1  ', '11-18', ...]

  **X_img**: a stack of images representing a single query, dimensions: [5, 28, 28]

  **y_text**: strings containing answers of length 3: ['2  ', '156']

  **y_img**: a stack of images that represents the answer to a query, dimensions: [3, 28, 28]
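
The sample count follows directly from the enumeration: 100 first integers, 100 second integers and 2 operators. A minimal sketch that rebuilds only the text part of the dataset (no images) and checks the stated lengths; the variable names are chosen for this illustration, and right-padding is assumed, as in the `create_data` function of this notebook:

```python
from itertools import product

HIGHEST_INTEGER = 99   # same bound as in the notebook
QUERY_LENGTH = 5       # two 2-digit integers plus one operator
ANSWER_LENGTH = 3      # e.g. '198' or '-99'

queries, answers = [], []
for a, b, op in product(range(HIGHEST_INTEGER + 1), range(HIGHEST_INTEGER + 1), "+-"):
    result = a + b if op == "+" else a - b
    # Right-pad with spaces so that all strings share the same length.
    queries.append(f"{a}{op}{b}".ljust(QUERY_LENGTH))
    answers.append(str(result).ljust(ANSWER_LENGTH))

print(len(queries))                         # number of samples
print(repr(queries[0]), repr(answers[0]))   # first query/answer pair
```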

In [None]:
# Illustrate the generated query/answer pairs

unique_characters = "0123456789+- "  # All unique characters that are used in the queries (13 in total: digits 0-9, 2 operators [+, -], and a space character ' ')
highest_integer = 99  # Highest value of integers contained in the queries

max_int_length = len(str(highest_integer))  # Maximum number of characters in an integer
max_query_length = (
    max_int_length * 2 + 1
)  # Maximum length of the query string (consists of two integers and an operand [e.g. '22+10'])
max_answer_length = 3  # Maximum length of the answer string (the longest resulting query string is ' 1-99'='-98')

# Create the data (might take around a minute)
(MNIST_data, MNIST_labels), _ = tf.keras.datasets.mnist.load_data()
X_text, X_img, y_text, y_img = create_data(highest_integer)
print(X_text.shape, X_img.shape, y_text.shape, y_img.shape)


## Display the samples that were created
def display_sample(n):
    labels = ["X_img:", "y_img:"]
    for i, data in enumerate([X_img, y_img]):
        plt.subplot(1, 2, i + 1)
        # plt.set_figheight(15)
        plt.axis("off")
        plt.title(labels[i])
        plt.imshow(np.hstack(data[n]), cmap="gray")
    print("=" * 50, f'\nQuery #{n}\n\nX_text: "{X_text[n]}" = y_text: "{y_text[n]}"')
    plt.show()


"""for _ in range(10):
    display_sample(np.random.randint(0, 10000, 1)[0])"""

## Helper functions

The functions below will help with input/output of the data.
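
As a small illustration of what one-hot encoding means here, the sketch below turns one padded string into a matrix with one row per character and one column per vocabulary entry, and then recovers the string with `argmax`. It is a stand-alone example using hypothetical names (`one_hot`, `from_one_hot`); the notebook's own helpers follow in the next cell.

```python
import numpy as np

VOCABULARY = "0123456789+- "  # same 13 characters as in the notebook


def one_hot(text):
    """Return a (len(text), len(VOCABULARY)) matrix with a single 1 per row."""
    encoded = np.zeros((len(text), len(VOCABULARY)))
    for row, char in enumerate(text):
        encoded[row, VOCABULARY.index(char)] = 1
    return encoded


def from_one_hot(matrix):
    """Pick the highest-scoring character in each row and join them into a string."""
    return "".join(VOCABULARY[i] for i in np.argmax(matrix, axis=1))


encoded = one_hot("-48")
print(encoded.shape)
print(from_one_hot(encoded))
```

The same decoding step also works on softmax outputs of a network, since `argmax` simply selects the most probable character at each position.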

In [None]:
# One-hot encoding/decoding the text queries/answers so that they can be processed using RNNs
# You should use these functions to convert your strings and read out the output of your networks


def encode_labels(labels, max_len=3):
    n = len(labels)
    length = len(labels[0])
    char_map = dict(zip(unique_characters, range(len(unique_characters))))
    one_hot = np.zeros([n, length, len(unique_characters)])
    for i, label in enumerate(labels):
        m = np.zeros([length, len(unique_characters)])
        for j, char in enumerate(label):
            m[j, char_map[char]] = 1
        one_hot[i] = m

    return one_hot


def decode_labels(labels):
    pred = np.argmax(labels, axis=1)
    predicted = "".join([unique_characters[i] for i in pred])

    return predicted


X_text_onehot = encode_labels(X_text)
y_text_onehot = encode_labels(y_text)

print(X_text_onehot.shape, y_text_onehot.shape)

---
---

## I. Text-to-text RNN model

The following code showcases how Recurrent Neural Networks (RNNs) are built using Keras. Several new layers are going to be used:

1. LSTM
2. TimeDistributed
3. RepeatVector

The code cell below explains each of these new components.

<img src="https://i.ibb.co/NY7FFTc/Screenshot-2023-11-10-at-09-27-25.png" alt="Screenshot-2023-11-10-at-09-27-25" border="0" width="500">
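
To see how the tensor shapes change between the layers, the following NumPy sketch mimics only the shape behaviour of `RepeatVector` and `TimeDistributed(Dense(...))`. It is not the Keras implementation: the random arrays stand in for the encoder and decoder LSTM outputs, and the weight matrix `W` is a hypothetical placeholder for the dense layer's weights.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, hidden, answer_len, vocab = 4, 256, 3, 13

# Stand-in for the encoder LSTM: one summary vector per query.
encoder_output = rng.normal(size=(batch, hidden))               # (4, 256)

# RepeatVector(answer_len): copy the summary vector once per output step.
decoder_input = np.repeat(encoder_output[:, None, :], answer_len, axis=1)  # (4, 3, 256)

# Stand-in for the decoder LSTM with return_sequences=True: one vector per step.
decoder_output = np.tanh(decoder_input)                         # (4, 3, 256)

# TimeDistributed(Dense(vocab, softmax)): the same weights at every time step.
W = rng.normal(size=(hidden, vocab)) * 0.01
b = np.zeros(vocab)
logits = decoder_output @ W + b                                 # (4, 3, 13)
exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
probabilities = exp / exp.sum(axis=-1, keepdims=True)

print(decoder_input.shape, probabilities.shape)
```

Each of the 3 output steps ends up with its own probability distribution over the 13 characters, which is what the categorical cross-entropy loss compares against the one-hot targets.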


In [None]:
def build_text2text_model():
    # We start by initializing a sequential model
    text2text = tf.keras.Sequential()

    # "Encode" the input sequence using an RNN, producing an output of size 256.
    # In this case the size of our input vectors is [5, 13] as we have queries of length 5 and 13 unique characters. Each of these 5 elements in the query will be fed to the network one by one,
    # as shown in the image above (except with 5 elements).
    # Hint: In other applications, where your input sequences have a variable length (e.g. sentences), you would use input_shape=(None, len(unique_characters)).
    text2text.add(LSTM(256, input_shape=(None, len(unique_characters))))

    # As the decoder RNN's input, repeatedly provide the last output of the encoder RNN at each time step. Repeat 3 times, as that is the maximum length of the output (e.g. ' 1-99' = '-98')
    # when using 2-digit integers in queries. In other words, the RNN will always produce 3 characters as its output.
    text2text.add(RepeatVector(max_answer_length))

    # By setting return_sequences to True, return not only the last output but all the outputs so far, in the form of (num_samples, timesteps, output_dim). This is necessary as the TimeDistributed layer below expects
    # the dimension at index 1 (right after the batch dimension) to be the timesteps.
    text2text.add(LSTM(256, return_sequences=True))

    # Apply a dense layer to every temporal slice of the input. For each step of the output sequence, decide which character should be chosen.
    text2text.add(TimeDistributed(Dense(len(unique_characters), activation="softmax")))

    # Next we compile the model using categorical crossentropy as our loss function.
    text2text.compile(
        loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"]
    )
    text2text.summary()

    return text2text


---
---

## II. Image-to-text RNN model

Hint: There are two ways of building the encoder for such a model: either by again using regular LSTM cells (with flattened images as vectors) or by using recurrent convolutional layers, [ConvLSTM2D](https://keras.io/api/layers/recurrent_layers/conv_lstm2d/).

The goal here is to use **X_img** as inputs and **y_text** as outputs.
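
The two encoder options expect differently shaped inputs. A minimal NumPy sketch, using a zero-filled placeholder array instead of the real `X_img`: the flattened-image route turns every 28x28 image into a 784-dimensional vector per time step, while `ConvLSTM2D` keeps the image layout and, with the default `channels_last` data format, needs an explicit channel axis.

```python
import numpy as np

# Placeholder batch with the same layout as X_img: (samples, time steps, height, width).
images = np.zeros((8, 5, 28, 28), dtype=np.float32)

# Option 1: flatten each image so that a regular LSTM sees (samples, time steps, features).
flat_sequences = images.reshape(images.shape[0], images.shape[1], -1)

# Option 2: add a trailing channel axis for a convolutional recurrent layer,
# giving (samples, time steps, height, width, channels).
conv_sequences = images[..., np.newaxis]

print(flat_sequences.shape)
print(conv_sequences.shape)
```

In the models below, the first option is realised inside the network with `TimeDistributed(Flatten())`, so the raw `X_img` array can be passed in unchanged.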

In [None]:
## Your code


def build_image_to_text_model():
    # Define the input shape
    input_shape = (5, 28, 28)  # 5 images of size 28x28 each

    # Define the model
    img2text = tf.keras.Sequential()

    # Flatten the input and feed it into the LSTM encoder
    img2text.add(TimeDistributed(Flatten(), input_shape=input_shape))
    img2text.add(LSTM(384))

    # Repeat the encoder output
    img2text.add(RepeatVector(max_answer_length))

    # Decoder to generate the output sequence
    img2text.add(LSTM(384, return_sequences=True))

    # Apply a dense layer to each time step
    img2text.add(TimeDistributed(Dense(len(unique_characters), activation="softmax")))

    # Compile the model
    img2text.compile(
        loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"]
    )

    img2text.summary()

    return img2text


# Variant with one additional LSTM layer in the encoder (two encoder layers in total)
def build_image_to_text_model_v2():
    # Define the input shape
    input_shape = (5, 28, 28)  # 5 images of size 28x28 each

    # Define the model
    img2text = tf.keras.Sequential()

    # Flatten the input and feed it into the LSTM encoder
    img2text.add(TimeDistributed(Flatten(), input_shape=input_shape))
    img2text.add(LSTM(384, return_sequences=True))
    img2text.add(LSTM(384))

    # Repeat the encoder output
    img2text.add(RepeatVector(max_answer_length))

    # Decoder to generate the output sequence
    img2text.add(LSTM(384, return_sequences=True))

    # Apply a dense layer to each time step
    img2text.add(TimeDistributed(Dense(len(unique_characters), activation="softmax")))

    # Compile the model
    img2text.compile(
        loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"]
    )

    img2text.summary()

    return img2text


# Variant with two additional LSTM layers in the encoder (three encoder layers in total)
def build_image_to_text_model_v3():
    # Define the input shape
    input_shape = (5, 28, 28)  # 5 images of size 28x28 each

    # Define the model
    img2text = tf.keras.Sequential()

    # Flatten the input and feed it into the LSTM encoder
    img2text.add(TimeDistributed(Flatten(), input_shape=input_shape))
    img2text.add(LSTM(384, return_sequences=True))
    img2text.add(LSTM(384, return_sequences=True))
    img2text.add(LSTM(384))

    # Repeat the encoder output
    img2text.add(RepeatVector(max_answer_length))

    # Decoder to generate the output sequence
    img2text.add(LSTM(384, return_sequences=True))

    # Apply a dense layer to each time step
    img2text.add(TimeDistributed(Dense(len(unique_characters), activation="softmax")))

    # Compile the model
    img2text.compile(
        loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"]
    )

    img2text.summary()

    return img2text


# Variant with three additional LSTM layers in the encoder (four encoder layers; five LSTM layers counting the decoder)
def build_image_to_text_model_v5():
    # Define the input shape
    input_shape = (5, 28, 28)  # 5 images of size 28x28 each

    # Define the model
    img2text = tf.keras.Sequential()

    # Flatten the input and feed it into the LSTM encoder
    img2text.add(TimeDistributed(Flatten(), input_shape=input_shape))
    img2text.add(LSTM(384, return_sequences=True))
    img2text.add(LSTM(384, return_sequences=True))
    img2text.add(LSTM(384, return_sequences=True))
    img2text.add(LSTM(384))

    # Repeat the encoder output
    img2text.add(RepeatVector(max_answer_length))

    # Decoder to generate the output sequence
    img2text.add(LSTM(384, return_sequences=True))

    # Apply a dense layer to each time step
    img2text.add(TimeDistributed(Dense(len(unique_characters), activation="softmax")))

    # Compile the model
    img2text.compile(
        loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"]
    )

    img2text.summary()

    return img2text

In [None]:
def train_and_evaluate(split_ratio, X_data, y_data, model, epochs=20):
    # Splitting the dataset
    X_train, X_test, y_train, y_test = train_test_split(
        X_data, y_data, test_size=split_ratio, random_state=42
    )

    # Training the model
    model.fit(
        X_train,
        y_train,
        epochs=epochs,
        batch_size=32,
        validation_data=(X_test, y_test),
    )

    # Evaluating the model
    loss, accuracy = model.evaluate(X_test, y_test)
    print(f"Test Loss: {loss}, Test Accuracy: {accuracy}")

    # Predicting and comparing with true labels
    predictions = model.predict(X_test)

    true_labels = []
    predicted_labels = []

    for i in range(len(X_test)):
        true_label = decode_labels(y_test[i])
        predicted_label = decode_labels(predictions[i])

        true_labels.append(true_label)
        predicted_labels.append(predicted_label)

    # Return the accuracy and the labels for visualization
    return accuracy, true_labels, predicted_labels


# Modified training and evaluation for different splits
splits = [0.5, 0.75, 0.9]  # Train-test splits, the number is the ratio of test data
results_p1_2_v1 = {}

for split in splits:
    print(f"Training with {1-split} Train - {split} Test split")
    accuracy, true_labels, predicted_labels = train_and_evaluate(
        split, X_img, y_text_onehot, build_image_to_text_model()
    )
    results_p1_2_v1[split] = {
        "accuracy": accuracy,
        "true_labels": true_labels,
        "predicted_labels": predicted_labels,
    }

In [None]:
# The output from the model with two LSTM layers in the encoder
results_p1_2_v2 = {}

for split in splits:
    print(f"Training with {1-split} Train - {split} Test split")
    accuracy, true_labels, predicted_labels = train_and_evaluate(
        split, X_img, y_text_onehot, build_image_to_text_model_v2()
    )
    results_p1_2_v2[split] = {
        "accuracy": accuracy,
        "true_labels": true_labels,
        "predicted_labels": predicted_labels,
    }

In [None]:
# The output from the model with three LSTM layers in the encoder
results_p1_2_v3 = {}

for split in splits:
    print(f"Training with {1-split} Train - {split} Test split")
    accuracy, true_labels, predicted_labels = train_and_evaluate(
        split, X_img, y_text_onehot, build_image_to_text_model_v3()
    )
    results_p1_2_v3[split] = {
        "accuracy": accuracy,
        "true_labels": true_labels,
        "predicted_labels": predicted_labels,
    }

In [None]:
# convert the labels to int for better visualization and comparison
def convert_to_int(results):
    def string_to_float(s):
        try:
            return int(s)
        except ValueError:
            return 0

    def process_labels(labels):
        return np.array([string_to_float(label) for label in labels])

    # Process the labels for each split
    for split in splits:
        results[split]["true_labels"] = process_labels(results[split]["true_labels"])
        results[split]["predicted_labels"] = process_labels(
            results[split]["predicted_labels"]
        )
    return results


results_p1_2_v1_int = convert_to_int(results_p1_2_v1)
results_p1_2_v2_int = convert_to_int(results_p1_2_v2)
results_p1_2_v3_int = convert_to_int(results_p1_2_v3)
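# Now the results contain the processed labels as NumPy integer arrays, with 0 substituted for entries that cannot be parsed as integers.
# Note: convert_to_int modifies the dictionaries in place, so each *_int name refers to the same object as the original.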

In [None]:
# print the accuracy for different splits
for split in splits:
    print(f"Accuracy for {1-split} Train - {split} Test split:")
    print(f"Model with 1 LSTM layer: {results_p1_2_v1[split]['accuracy']}")
    print(f"Model with 2 LSTM layers: {results_p1_2_v2[split]['accuracy']}")
    print(f"Model with 3 LSTM layers: {results_p1_2_v3[split]['accuracy']}")
    print()

### Visualize the results
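
The histograms produced in this section mark the central 95% of the prediction errors with dashed lines. A minimal pure-Python sketch of that idea on made-up error values (the data and the name `central_interval` are illustrative assumptions, not results from the models):

```python
def central_interval(values, coverage=0.95):
    """Return the values that bound the central `coverage` share of the sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    tail = (1 - coverage) / 2                          # share cut off on each side
    lower = ordered[int(n * tail)]
    upper = ordered[min(int(n * (1 - tail)), n - 1)]   # clamp to the last valid index
    return lower, upper


# Made-up errors (predicted minus true): mostly exact, a few off by 10 either way.
errors = [-10] * 5 + [0] * 90 + [10] * 5
print(central_interval(errors))
```

Note that this is an empirical percentile range of the errors rather than a confidence interval for an estimated quantity such as the mean.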

In [None]:
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix
import seaborn as sns


def find_confidence_interval(data, confidence=0.95):
    """
    Calculate the central empirical interval of a dataset, i.e. the range between the percentiles that cut off (1 - confidence) / 2 of the samples on each side.

    Parameters:
    data (np.array): The dataset as a NumPy array.
    confidence (float): The confidence level, default is 0.95 for a 95% confidence interval.

    Returns:
    tuple: A tuple containing the lower and upper bounds of the confidence interval.
    """
    # Sort the data
    sorted_data = np.sort(data)

    # Calculate the number of data points
    n = len(sorted_data)

    # Calculate indexes for the lower and upper bounds
    lower_index = int(n * ((1 - confidence) / 2))
    upper_index = int(n * (1 - (1 - confidence) / 2))

    # Extract values at these indexes
    lower_bound = sorted_data[lower_index]
    upper_bound = sorted_data[upper_index]

    return (lower_bound, upper_bound)


def plot_and_save_distributions(results, num_LSTM, splits, num_bins=200):
    plt.figure(figsize=(12, 4))  # Adjust the overall figure size as needed

    for index, split in enumerate(splits, 1):
        ax = plt.subplot(1, len(splits), index)  # Create a subplot for each split

        delta_labels = (
            results[split]["predicted_labels"] - results[split]["true_labels"]
        )
        plt.hist(delta_labels, bins=num_bins)
        num_correct_predicted = np.where(delta_labels == 0)[0].shape[0]
        confidence_interval = find_confidence_interval(delta_labels)

        ymax = plt.ylim()[1]
        plt.vlines(confidence_interval[0], 0, ymax, colors="r", linestyles="dashed")
        plt.vlines(confidence_interval[1], 0, ymax, colors="r", linestyles="dashed")
        plt.text(25, ymax * 0.8, f"95% CI: {confidence_interval}")
        plt.text(
            25, ymax * 0.7, f"Correct: {num_correct_predicted}/{delta_labels.shape[0]}"
        )

        plt.xlabel("Diff between true & predicted labels")
        plt.ylabel("Number of samples")
        plt.xlim(-150, 150)
        plt.title(f"{split} split")

        # Reduce the font size for better fit
        for item in (
            [ax.title, ax.xaxis.label, ax.yaxis.label]
            + ax.get_xticklabels()
            + ax.get_yticklabels()
        ):
            item.set_fontsize(12)

    plt.suptitle(
        f"Distributions with {num_LSTM} LSTM Layers", fontsize=12
    )  # Overall title for the figure
    plt.subplots_adjust(
        left=0.05, right=0.95, bottom=0.1, top=0.85, wspace=0.3, hspace=0.4
    )  # Adjust subplot spacing
    plt.savefig(f"+-dist_{num_LSTM}.png")  # Save the combined plot
    plt.show()
    plt.close()  # Close the figure to free memory


plot_and_save_distributions(results_p1_2_v1_int, 1, splits)
plot_and_save_distributions(results_p1_2_v2_int, 2, splits)
plot_and_save_distributions(results_p1_2_v3_int, 3, splits, num_bins=100)

In [None]:
def plot_digit_confusion_matrices(results_dict, LSTM_layers, operation):
    """This function plots the confusion matrices for each digit place for each split in the results dictionary, based on the operation."""

    def extract_digits(labels):
        # Assumes labels are integers
        signs, thousands, hundreds, tens, ones = [], [], [], [], []
        for label in labels:
            sign = -1 if label < 0 else 1
            abs_label = abs(label)
            thousand, remainder = divmod(abs_label, 1000)
            hundred, remainder = divmod(remainder, 100)
            ten, one = divmod(remainder, 10)
            signs.append(sign)
            thousands.append(thousand)
            hundreds.append(hundred)
            tens.append(ten)
            ones.append(one)
        return signs, thousands, hundreds, tens, ones

    def plot_confusion_matrix(y_true, y_pred, title, ax):
        cm = confusion_matrix(y_true, y_pred)
        sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", ax=ax)
        ax.set_title(title)
        ax.set_xlabel("Predicted")
        ax.set_ylabel("True")

    for split, split_results in results_dict.items():
        true_labels = split_results["true_labels"]
        predicted_labels = split_results["predicted_labels"]

        (
            true_signs,
            true_thousands,
            true_hundreds,
            true_tens,
            true_ones,
        ) = extract_digits(true_labels)
        (
            pred_signs,
            pred_thousands,
            pred_hundreds,
            pred_tens,
            pred_ones,
        ) = extract_digits(predicted_labels)

        # Determine the number of subplots based on the operation
        if operation == "*":
            fig, axs = plt.subplots(1, 3, figsize=(15, 5))  # Adjust for 3 plots
            digit_places = [
                (true_thousands, pred_thousands, "Thousands"),
                (true_hundreds, pred_hundreds, "Hundreds"),
                (true_tens, pred_tens, "Tens"),
            ]
        else:  # For "+/-" operation
            fig, axs = plt.subplots(1, 4, figsize=(20, 5))
            digit_places = [
                (true_signs, pred_signs, "Signs"),
                (true_hundreds, pred_hundreds, "Hundreds"),
                (true_tens, pred_tens, "Tens"),
                (true_ones, pred_ones, "Ones"),
            ]

        # Plot the confusion matrices for each digit place
        for idx, (y_true, y_pred, title) in enumerate(digit_places):
            plot_confusion_matrix(y_true, y_pred, title, axs[idx])

        # Adjust the layout and show the plot
        fig.suptitle(
            f"Confusion Matrices for {split} Test with {LSTM_layers} LSTM Layers",
            fontsize=16,
        )
        plt.tight_layout()
        plt.subplots_adjust(top=0.88)
        plt.savefig(f"{operation}_confusion_matrices_{split}_{LSTM_layers}.png")
        plt.show()


plot_digit_confusion_matrices(results_p1_2_v1_int, LSTM_layers=1, operation="+-")
plot_digit_confusion_matrices(results_p1_2_v2_int, LSTM_layers=2, operation="+-")
plot_digit_confusion_matrices(results_p1_2_v3_int, LSTM_layers=3, operation="+-")


---
---
---

# Part 2: Multiplication
The cell below will create the multiplication dataset used in this part of the assignment.
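
Before generating the data, it helps to check how the problem size changes for multiplication. The sketch below, using illustrative variable names, counts the samples, the characters needed, and the length of the longest answer when both factors range from 0 to 99:

```python
HIGHEST_INTEGER = 99
VOCABULARY = "0123456789* "  # 10 digits, the '*' operator, and the padding space

products = [
    a * b
    for a in range(HIGHEST_INTEGER + 1)
    for b in range(HIGHEST_INTEGER + 1)
]

num_samples = len(products)                          # one sample per ordered pair (a, b)
longest_answer = max(len(str(p)) for p in products)  # characters in the largest product

print(num_samples, len(VOCABULARY), longest_answer, max(products))
```

An answer length of 5, as set in the cell below, therefore always leaves at least one padding character.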

In [None]:
# Illustrate the generated query/answer pairs

unique_characters = "0123456789* "  # All unique characters that are used in the queries (12 in total: digits 0-9, 1 operator [*], and a space character ' ')
highest_integer = 99  # Highest value of integers contained in the queries

max_int_length = len(str(highest_integer))  # Maximum number of characters in an integer
max_query_length = (
    max_int_length * 2 + 1
)  # Maximum length of the query string (consists of two integers and an operator [e.g. '22*10'])
max_answer_length = 5  # Maximum length of the answer string (the longest answer, '99*99'='9801', has 4 characters, so 5 leaves room for padding)

# Create the data (might take around a minute)
(MNIST_data, MNIST_labels), _ = tf.keras.datasets.mnist.load_data()
X_text2, X_img2, y_text2, y_img2 = create_data(highest_integer, operands=["*"])
print(X_text2.shape, X_img2.shape, y_text2.shape, y_img2.shape)


## Display the samples that were created
def display_sample(n):
    labels = ["X_img:", "y_img:"]
    for i, data in enumerate([X_img2, y_img2]):
        plt.subplot(1, 2, i + 1)
        # plt.set_figheight(15)
        plt.axis("off")
        plt.title(labels[i])
        plt.imshow(np.hstack(data[n]), cmap="gray")
    print("=" * 50, f'\nQuery #{n}\n\nX_text: "{X_text2[n]}" = y_text: "{y_text2[n]}"')
    plt.show()


for _ in range(10):
    display_sample(np.random.randint(0, 10000, 1)[0])

### Image-to-Text Model

In [None]:
## Your code

X_text2_onehot = encode_labels(X_text2)
y_text2_onehot = encode_labels(y_text2)

# Modified training and evaluation for different splits
splits = [0.5, 0.75, 0.9]  # Train-test splits, the number is the ratio of test data
# The output from the model with one LSTM layer in the encoder
results_p2_2_v1 = {}

for split in splits:
    print(f"Training with {1-split} Train - {split} Test split")
    accuracy, true_labels, predicted_labels = train_and_evaluate(
        split, X_img2, y_text2_onehot, build_image_to_text_model()
    )
    results_p2_2_v1[split] = {
        "accuracy": accuracy,
        "true_labels": true_labels,
        "predicted_labels": predicted_labels,
    }

In [None]:
# The output from the model with two LSTM layers in the encoder
results_p2_2_v2 = {}

for split in splits:
    print(f"Training with {1-split} Train - {split} Test split")
    accuracy, true_labels, predicted_labels = train_and_evaluate(
        split, X_img2, y_text2_onehot, build_image_to_text_model_v2()
    )
    results_p2_2_v2[split] = {
        "accuracy": accuracy,
        "true_labels": true_labels,
        "predicted_labels": predicted_labels,
    }

In [None]:
# The output from the model with three LSTM layers in the encoder
results_p2_2_v3 = {}

for split in splits:
    print(f"Training with {1-split} Train - {split} Test split")
    accuracy, true_labels, predicted_labels = train_and_evaluate(
        split, X_img2, y_text2_onehot, build_image_to_text_model_v3()
    )
    results_p2_2_v3[split] = {
        "accuracy": accuracy,
        "true_labels": true_labels,
        "predicted_labels": predicted_labels,
    }

In [None]:
results_p2_2_v1_int = convert_to_int(results_p2_2_v1)
results_p2_2_v2_int = convert_to_int(results_p2_2_v2)
results_p2_2_v3_int = convert_to_int(results_p2_2_v3)

In [None]:
plot_digit_confusion_matrices(results_p2_2_v1_int, LSTM_layers=1, operation="*")

In [None]:
print("accuracies for multiplication")
for split in splits:
    print(f"Accuracy for {1-split} Train - {split} Test split:")
    print(f"Model with 1 LSTM layer: {results_p2_2_v1[split]['accuracy']}")
    print(f"Model with 2 LSTM layers: {results_p2_2_v2[split]['accuracy']}")
    print(f"Model with 3 LSTM layers: {results_p2_2_v3[split]['accuracy']}")
    print()

### Find Best Performance Image-to-Text

In [None]:
# The output from the model with four LSTM layers in the encoder (five LSTM layers counting the decoder)
results_p2_2_v5 = {}


accuracy, true_labels, predicted_labels = train_and_evaluate(
    0.5, X_img2, y_text2_onehot, build_image_to_text_model_v5(), epochs=50
)
results_p2_2_v5[0.5] = {
    "accuracy": accuracy,
    "true_labels": true_labels,
    "predicted_labels": predicted_labels,
}

In [None]:
print(f"Model with 5 LSTM layers: {results_p2_2_v5[0.5]['accuracy']}")


# convert the labels to int for better visualization and comparison
def convert_to_int_single_split(results, split):
    def string_to_float(s):
        try:
            return int(s)
        except ValueError:
            return 0

    def process_labels(labels):
        return np.array([string_to_float(label) for label in labels])

    results[split]["true_labels"] = process_labels(results[split]["true_labels"])
    results[split]["predicted_labels"] = process_labels(
        results[split]["predicted_labels"]
    )
    return results


results_p2_2_v5_int = convert_to_int_single_split(results_p2_2_v5, split=0.5)

In [None]:
plot_digit_confusion_matrices(results_p2_2_v5_int, LSTM_layers=5, operation="*")