<a href="https://colab.research.google.com/github/aadi350/data-science-syllabus/blob/main/Protocol_Buffers%2C_Neural_Networks_and_Python_Generators.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

TLDR: I was working on my thesis and wanted to implement a particular paper that I would be able to iterate upon. Long story short: [this paper](https://ieeexplore.ieee.org/document/8451652) presented a Fully-Convolutional Siamese Neural Network for Change Detection. And I, being me, was not satisfied with simply cloning their model [from GitHub](https://github.com/rcdaudt/fully_convolutional_change_detection) and using it as-is. I had to implement it in TensorFlow (instead of PyTorch) so that I could *really* experience the intricacies of their model. (So I did, and you can find it below in a hidden cell, but that's beside the point of this post.)

In [1]:
#@title Siamese FCNN Code
import tensorflow as tf
from tensorflow.keras.layers import (
    Conv2D, Conv2DTranspose, Input, MaxPooling2D, UpSampling2D,
    concatenate, subtract)

INPUT_SHAPE = (256, 256, 3)

def build_in_channel(name):
    inputs = Input(shape=INPUT_SHAPE, name=name)
    out_0 = Conv2D(16, (2, 2), padding='same')(inputs)
    out_0 = Conv2D(16, (2, 2), padding='same')(out_0)

    # shape = 256, 256, 16

    out_1 = MaxPooling2D((2, 2))(out_0)
    # shape = 128, 128, 16
    # 2x2 MaxPool HALVES w and h

    out_1 = Conv2D(16, (2, 2), padding='same')(out_1)
    out_1 = Conv2D(32, (2, 2), padding='same')(out_1)
    out_1 = Conv2D(32, (2, 2), padding='same')(out_1)
    # shape = 128, 128, 32

    out_2 = MaxPooling2D((2, 2))(out_1)
    # shape = 64, 64, 32

    out_2 = Conv2D(32, (2, 2), padding='same')(out_2)
    out_2 = Conv2D(64, (2, 2), padding='same')(out_2)
    out_2 = Conv2D(64, (2, 2), padding='same')(out_2)
    out_2 = Conv2D(64, (2, 2), padding='same')(out_2)
    # shape = 64, 64, 64

    out_3 = MaxPooling2D((2, 2))(out_2)
    # shape = 32, 32, 64

    out_3 = Conv2D(64, (2, 2),   padding='same')(out_3)
    out_3 = Conv2D(128, (2, 2),  padding='same')(out_3)
    out_3 = Conv2D(128, (2, 2),  padding='same')(out_3)
    out_3 = Conv2D(128, (2, 2),  padding='same')(out_3)
    # shape = 32, 32, 128

    return inputs, (out_0, out_1, out_2, out_3)


def build_siamese_autoencoder():

    left_in, (l_out_0, l_out_1, l_out_2, l_out_3) = build_in_channel('left')
    right_in, (r_out_0, r_out_1, r_out_2, r_out_3) = build_in_channel('right')

    output = subtract([l_out_3, r_out_3])
    # shape = 32, 32, 128

    l_out_3 = MaxPooling2D((2, 2), padding='same')(l_out_3)
    # shape = 16, 16, 128
    l_out_3 = Conv2DTranspose(128, 2, padding='same')(l_out_3)
    # shape = 16, 16, 128
    l_out_3 = Conv2DTranspose(128, 2, padding='same')(l_out_3)
    # shape = 16, 16, 128
    l_out_3 = UpSampling2D((2, 2))(l_out_3)
    # shape = 32, 32, 128

    diff_3 = subtract([l_out_3, r_out_3])
    output = concatenate([output, diff_3])
    # shape = 32, 32, 256

    output = Conv2DTranspose(256, 2, padding='same')(output)
    # shape = 32, 32, 256
    output = Conv2DTranspose(128, 2, padding='same')(output)
    # shape = 32, 32, 128
    output = Conv2DTranspose(128, 2, padding='same')(output)
    # shape = 32, 32, 128
    output = Conv2DTranspose(64, 2, padding='same')(output)
    # shape = 32, 32, 64
    output = Conv2DTranspose(64, 2, padding='same')(output)
    # shape = 32, 32, 64
    output = Conv2DTranspose(64, 2, padding='same')(output)
    # shape = 32, 32, 64
    output = UpSampling2D((2, 2))(output)
    # shape = 64, 64, 64

    diff_2 = subtract([l_out_2, r_out_2])
    output = concatenate([output, diff_2])
    # shape = 64, 64, 128

    output = Conv2DTranspose(128, (2, 2), padding='same')(output)
    output = Conv2DTranspose(64, (2, 2), padding='same')(output)
    output = Conv2DTranspose(64, (2, 2), padding='same')(output)
    output = Conv2DTranspose(32, (2, 2), padding='same')(output)
    output = Conv2DTranspose(32, (2, 2), padding='same')(output)
    output = Conv2DTranspose(32, (2, 2), padding='same')(output)
    # shape = 64, 64, 32

    output = UpSampling2D((2, 2))(output)
    # shape = 128, 128, 32

    diff_1 = subtract([l_out_1, r_out_1])
    output = concatenate([output, diff_1])
    # shape = 128, 128, 64

    output = Conv2DTranspose(64, (2, 2), padding='same')(output)
    output = Conv2DTranspose(32, (2, 2), padding='same')(output)
    output = Conv2DTranspose(16, (2, 2), padding='same')(output)
    output = Conv2DTranspose(16, (2, 2), padding='same')(output)
    output = Conv2DTranspose(16, (2, 2), padding='same')(output)
    # shape = 128, 128, 16

    output = UpSampling2D((2, 2))(output)
    # shape = 256, 256, 16

    diff_0 = subtract([l_out_0, r_out_0])
    output = concatenate([output, diff_0])
    # shape = 256, 256, 32
    output = Conv2DTranspose(32, (2, 2), padding='same')(output)
    output = Conv2DTranspose(16, (2, 2), padding='same')(output)
    output = Conv2DTranspose(1, (2, 2), padding='same')(output)
    # shape =  256, 256, 1

    model = tf.keras.models.Model(inputs=[left_in, right_in], outputs=output)

    return model


Two days and 12 hours later, I was ready to train my model. A [recent 2022 paper](https://ieeexplore.ieee.org/document/9467555) released a dataset of 20,000 image pairs with painstakingly labelled masks, for the purpose of training the very type of network I had written. So there I was, ready with data, my training loop [written from scratch](https://www.tensorflow.org/guide/keras/writing_a_training_loop_from_scratch) and a freshly brewed cup of coffee, ready to type the oh-so-crucial command:
```bash
python src/train.py
```

But then, after about 15 seconds or so, the stack trace in my terminal gave me the sense that all was not right...
Garbled, nearly unintelligible collections of words, all hinting that I was running out of memory (somehow 64GB of system RAM and an 8GB GPU weren't enough?!). Then the magic error message brought my model training to a screeching halt, indicating that something about my "protos" did not allow for such large graph nodes (or something along those lines).

A quick side-quest: TensorFlow 2.x's default mode of operation is _eager_ mode. When I hit run, each operation executes immediately, as written, and at a low level does not care about the operations that came before or after it. However, by using special decorators (such as `tf.function`), there is the possibility of a performance boost from *Graph* execution, where a really smart piece of code optimally chooses how to execute my hand-written code as an execution graph. To get a better understanding of this, see the [documentation](https://www.tensorflow.org/guide/intro_to_graphs).
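
As a minimal sketch of the difference (the function `scale_and_sum` is a made-up example, not part of my training code), the same Python function can be run eagerly or wrapped with `tf.function` so that TensorFlow traces it into a graph:

```python
import tensorflow as tf

def scale_and_sum(x):
    # Ordinary Python function built from TensorFlow ops.
    return tf.reduce_sum(x * 2.0)

# Same function, wrapped so TensorFlow traces it into a graph on first call.
scale_and_sum_graph = tf.function(scale_and_sum)

x = tf.constant([1.0, 2.0, 3.0])
eager_result = scale_and_sum(x)        # runs op by op (eager)
graph_result = scale_and_sum_graph(x)  # runs the traced graph

# Both paths compute the same value; only the execution mechanism differs.
print(float(eager_result), float(graph_result))  # 12.0 12.0
```

Keras training utilities typically wrap the model in a graph for you, which is why graph-related errors can show up even when you never write `tf.function` yourself.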

# "proto"?
Now that you have an idea of what Graph execution is, and a general idea of the error I was facing, there remains one vital gap in information: what the hell is a "proto"?! According to [this Stack Overflow post](https://stackoverflow.com/questions/34128872/google-protobuf-maximum-size), Protobuf has a hard limit of 2GB, since the arithmetic used is typically 32-bit signed. As [this Medium post explained](https://medium.com/@ouwenhuang/tensorflow-graphs-are-just-protobufs-9df51fc7d08d), TF graphs are simply protobufs. Operations in TensorFlow are symbolic handles for graph-based operations, which are stored as [Protocol Buffers](https://developers.google.com/protocol-buffers). A Protocol Buffer (proto for short) is Google's language-neutral, extensible mechanism for serializing structured data. Specially generated code is then used to easily read and write the structured data (in this case a TensorFlow graph), regardless of data stream and programming language.
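
To make "graphs are protobufs" a bit more concrete, here is a small sketch (the two functions and the 1 MiB array are hypothetical stand-ins, not my real pipeline). It traces two functions and compares the size of their serialized graphs: one embeds data as a constant, the other receives it as an argument.

```python
import numpy as np
import tensorflow as tf

# 1 MiB of float32 data standing in for "the dataset".
data = np.ones((256, 256, 4), dtype=np.float32)

@tf.function
def with_embedded_constant():
    # Inside a traced function, tf.constant becomes a Const node whose
    # value is stored in the graph itself.
    return tf.reduce_sum(tf.constant(data))

@tf.function
def with_argument(x):
    # Here the data is passed in at call time and is not part of the graph.
    return tf.reduce_sum(x)

# as_graph_def() returns the graph as a GraphDef protocol buffer.
big = with_embedded_constant.get_concrete_function().graph.as_graph_def()
small = with_argument.get_concrete_function(
    tf.TensorSpec(shape=data.shape, dtype=tf.float32)).graph.as_graph_def()

print(big.ByteSize() > data.nbytes)    # True: the array lives inside the proto
print(small.ByteSize() < data.nbytes)  # True: only the ops are stored
```

Scale that embedded array up to a few gigabytes and the serialized graph runs into the 2GB ceiling.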

To the best of my understanding, my gigantic dataset was causing individual operations in the execution graph to exceed the proto hard limit of 2GB, since I was using the `tf.data` API and the `from_tensor_slices` function to keep my entire dataset in memory and perform operations from there. (The TensorFlow documentation for `from_tensor_slices` warns that, when eager execution is not enabled, NumPy array values are embedded in the graph as `tf.constant` operations, which for large datasets can run into the byte limits of graph serialization.) Now, the dataset is about 8GB, wayyyyy smaller than my 64GB of RAM; however, performing multiple layers of convolutions (not to mention, in _parallel_) quickly caused the entire training pipeline to shut down.
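
A back-of-the-envelope sketch of the numbers involved. It assumes every image is 256×256×3 (the `INPUT_SHAPE` of my model) and that each set of images is held as one big tensor; the variable names are only for illustration.

```python
# Rough size check against the 2 GB protobuf limit.
# All sizes are assumptions based on the numbers quoted in this post.
PROTO_LIMIT_BYTES = 2**31 - 1      # largest signed 32-bit integer

NUM_PAIRS = 20_000                 # image pairs in the dataset
HEIGHT, WIDTH, CHANNELS = 256, 256, 3
BYTES_PER_UINT8 = 1
BYTES_PER_FLOAT32 = 4

pixels_per_image = HEIGHT * WIDTH * CHANNELS

# One stack of images (e.g. all time-1 images), raw uint8 vs. normalized float32.
stack_uint8 = NUM_PAIRS * pixels_per_image * BYTES_PER_UINT8
stack_float32 = NUM_PAIRS * pixels_per_image * BYTES_PER_FLOAT32

for name, size in [("uint8 stack", stack_uint8), ("float32 stack", stack_float32)]:
    print(f"{name}: {size / 2**30:.2f} GiB, over limit: {size > PROTO_LIMIT_BYTES}")
# uint8 stack: 3.66 GiB, over limit: True
# float32 stack: 14.65 GiB, over limit: True
```

Under these assumptions, even a single stack of images is already past the limit before any convolution runs.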

So I needed to somehow use this large dataset without having to keep all the images in memory, and for this, we now move to Python generators.

# `yield`
A *generator* function allows you to declare a function that behaves like an iterator. For example, in order to read the lines of a text file, I could do the following, which loads the entire file first, then returns it as a list. The downside of this is that the entire file must be kept in memory:



In [None]:
def csv_reader(file_name):
    lines = []
    # the with-block closes the file once every row has been read
    with open(file_name, "r") as f:
        for row in f:
            lines.append(row)

    return lines

If, instead, I do the following:

In [None]:
def csv_reader(file_name):
    # the file stays open only while rows are still being requested
    with open(file_name, "r") as f:
        for row in f:
            yield row

I could then use the `csv_reader` function as if it were an *iterator*: calling it returns a generator object, the next row is loaded *only when the next value is requested*, and the previous output (possibly already processed) is discarded.
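
A small, self-contained sketch of that behaviour, using a throwaway temporary file (the file contents are made up for the example):

```python
import os
import tempfile

def csv_reader(file_name):
    # Generator version: yields one row at a time instead of building a list.
    with open(file_name, "r") as f:
        for row in f:
            yield row

# Write a tiny throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("a,b\n1,2\n3,4\n")
    tmp_path = tmp.name

reader = csv_reader(tmp_path)   # nothing has been read yet
first = next(reader)            # reads only the first row
rest = list(reader)             # consumes the remaining rows

print(repr(first))  # 'a,b\n'
print(rest)         # ['1,2\n', '3,4\n']

os.remove(tmp_path)
```

Creating `reader` does no work at all; rows are only pulled from disk when `next` (or a `for` loop) asks for them.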

# Generators and `tf.data`

TensorFlow's `tf.data` API is extremely powerful, and the ability to define a Dataset _from a generator_ makes it all the more so. So this is how I solved my issue from above. First, I defined a generator for both the train and validation sets.

(The preprocessing functions section simply loads each image from its file path; the RGB images are converted to floats and normalized to [0, 1], while the label masks are left as `uint8`.)

In [None]:
#@title Preprocessing Functions
def _normalize(img):
    img = tf.cast(img, tf.float32) / 255.0
    return img


def decode_grey(img):
    img = tf.io.decode_png(img, channels=1)
    return img


def decode_rgb(img):
    img = tf.io.decode_png(img, channels=3)
    return img


def process_path_rgb(fp):
    img = tf.io.read_file(fp)
    img = decode_rgb(img)
    img = _normalize(img)
    return img


def process_path_grey(fp):
    img = tf.io.read_file(fp)
    img = decode_grey(img)
    return img


In [None]:
import os


def _pair_gen(split, data_path='data/'):
    # Shared logic: yields ((time1, time2), label) for one split
    path = os.path.join(data_path, split)
    # Sorting each listing keeps the three directories aligned
    t1_names = sorted(os.listdir(os.path.join(path, 'time1')))
    t2_names = sorted(os.listdir(os.path.join(path, 'time2')))
    label_names = sorted(os.listdir(os.path.join(path, 'label')))

    for t1_name, t2_name, label_name in zip(t1_names, t2_names, label_names):
        # get full paths (built from data_path) and load the images
        t1 = process_path_rgb(os.path.join(path, 'time1', t1_name))
        t2 = process_path_rgb(os.path.join(path, 'time2', t2_name))
        label = process_path_grey(os.path.join(path, 'label', label_name))

        yield (t1, t2), label


def train_gen(split='train', data_path='data/'):
    yield from _pair_gen(split, data_path)


def val_gen(split='val', data_path='data/'):
    yield from _pair_gen(split, data_path)
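
One caveat with the generators above: they pair files purely by sort order, and `zip` silently stops at the shortest list, so a missing or misnamed file would shift every pair after it. Here is a minimal sanity check, assuming (my assumption, not something the dataset guarantees) that matching images share the same file name across `time1`, `time2` and `label`. `check_aligned` is a hypothetical helper.

```python
import os
import tempfile

def check_aligned(split_dir, subdirs=("time1", "time2", "label")):
    """Return the sorted file names if all subdirs contain the same names.

    Hypothetical helper: assumes matching images share a file name.
    """
    listings = [sorted(os.listdir(os.path.join(split_dir, d))) for d in subdirs]
    reference = listings[0]
    for subdir, names in zip(subdirs[1:], listings[1:]):
        if names != reference:
            mismatched = sorted(set(names) ^ set(reference))
            raise ValueError(
                f"{subdir} does not match {subdirs[0]}: {mismatched[:5]}")
    return reference

# Self-contained demo on a throwaway directory tree.
with tempfile.TemporaryDirectory() as root:
    for d in ("time1", "time2", "label"):
        os.makedirs(os.path.join(root, d))
        for name in ("0001.png", "0002.png"):
            open(os.path.join(root, d, name), "w").close()
    names = check_aligned(root)
    print(names)  # ['0001.png', '0002.png']
```

Running a check like this once per split, before training, is cheap compared to debugging misaligned masks later.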

Note that since my model is a _Siamese_ neural network, it has two *heads* and therefore requires **two** inputs (`t1` and `t2` above refer to time-1 and time-2, or before-and-after, and the value yielded alongside them is the label mask indicating the areas that actually underwent change). Finally, I passed these generators to the `tf.data` API calls as follows:

In [None]:
train_ds = tf.data.Dataset.from_generator(
    train_gen, output_types=((tf.float32, tf.float32), tf.uint8))
val_ds = tf.data.Dataset.from_generator(
    val_gen, output_types=((tf.float32, tf.float32), tf.uint8))
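
As a side note, the TensorFlow docs mark `output_types` as deprecated in favour of `output_signature`, which also records shapes. A minimal sketch with a toy generator (`toy_gen` and the 8×8 image size are stand-ins I made up so the example runs without the dataset):

```python
import numpy as np
import tensorflow as tf

def toy_gen(n=4, size=8):
    # Stand-in for train_gen: yields ((t1, t2), label) with random data.
    rng = np.random.default_rng(0)
    for _ in range(n):
        t1 = rng.random((size, size, 3), dtype=np.float32)
        t2 = rng.random((size, size, 3), dtype=np.float32)
        label = rng.integers(0, 2, size=(size, size, 1), dtype=np.uint8)
        yield (t1, t2), label

toy_ds = tf.data.Dataset.from_generator(
    toy_gen,
    output_signature=(
        (tf.TensorSpec(shape=(8, 8, 3), dtype=tf.float32),
         tf.TensorSpec(shape=(8, 8, 3), dtype=tf.float32)),
        tf.TensorSpec(shape=(8, 8, 1), dtype=tf.uint8),
    ),
)

# Elements are pulled from the generator only as batches are requested.
for (t1, t2), label in toy_ds.batch(2).take(1):
    print(t1.shape, t2.shape, label.shape)  # (2, 8, 8, 3) (2, 8, 8, 3) (2, 8, 8, 1)
```

For the real data, the shapes would presumably be `(256, 256, 3)` for the images and `(256, 256, 1)` for the masks, but that depends on the files themselves.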

The following section is more about performance and batching, which controls how much data is fed to the model at any given point in time. The `from_generator` call achieves exactly what I wanted: data is loaded on an as-needed basis, and (thus far) I have avoided my headache with Protocol Buffers.

In [None]:
#@title Batching Data
buffer_size = 1000
batch_size = 200

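train_batches = (
    train_ds
    .cache()               # no filename given, so elements are cached in memory after the first pass
    .shuffle(buffer_size)  # shuffle within a buffer of buffer_size elements
    .batch(batch_size)
    .repeat()              # repeat the dataset indefinitely
    .prefetch(buffer_size=tf.data.AUTOTUNE))  # prepare upcoming batches in the background

val_batches = (
    val_ds
    .cache()               # same in-memory caching as the training set
    .shuffle(buffer_size)
    .batch(batch_size)
    .repeat()
    .prefetch(buffer_size=tf.data.AUTOTUNE))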

This is a very, **very** problem-specific post; however, it does cover some key aspects of dealing with large sets of image data, TensorFlow and Python generators. I hope that you learnt something!

For any changes, suggestions or overall comments, feel free to reach out to me [on LinkedIn](https://www.linkedin.com/in/aadidev-sooknanan/) or on Twitter [@__aadiDev](https://twitter.com/__aadiDev__)