## Fine-Tuning ResNet with Aqueduct

In this notebook, we'll take the pre-built ResNet-50 model that ships with TensorFlow's Keras API and fine-tune it to tell camouflage clothes apart from regular clothes. This example is inspired by [this blog post](https://pyimagesearch.com/2020/04/27/fine-tuning-resnet-with-keras-tensorflow-and-deep-learning/) from `pyimagesearch`.

Note that this notebook makes two assumptions: 
1. You have your Aqueduct server connected to a Kubernetes cluster with a GPU node group enabled. The easiest way to set this up is to use a hosted Kubernetes offering like AWS EKS or GKE. See [our documentation](https://docs.aqueducthq.com/integrations/compute-systems/kubernetes) for more details on connecting Aqueduct to Kubernetes.
2. You have an object store (e.g., AWS S3) connected to Aqueduct, with the dataset from the blog post above stored in it.

We'll start by creating an Aqueduct client.

In [None]:
import aqueduct as aq
from aqueduct import op, metric

K8S_RESOURCE_NAME = 'eks-us-east-2' # REPLACE ME!

client = aq.Client()

# This line sets Aqueduct to run in lazy mode, since some of our compute can be expensive,
# and it sets all functions to run on the EKS cluster we've connected to.
aq.global_config({"lazy": True, "engine": K8S_RESOURCE_NAME})
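In lazy mode, calling an operator records the work instead of running it, and nothing executes until a result is explicitly requested. The toy model below illustrates that idea in plain Python. `LazyArtifact` and `lazy_op` are hypothetical names invented for this sketch; they are not Aqueduct's classes, and Aqueduct's real engine also handles scheduling, remote execution, and storage.

```python
from typing import Any, Callable


class LazyArtifact:
    """Toy stand-in for a lazily computed result (not Aqueduct's real class)."""

    def __init__(self, compute: Callable[[], Any]):
        self._compute = compute
        self._done = False
        self._value = None

    def get(self) -> Any:
        # Run the computation only the first time a value is requested.
        if not self._done:
            self._value = self._compute()
            self._done = True
        return self._value


def lazy_op(fn: Callable[..., Any]) -> Callable[..., LazyArtifact]:
    """Toy decorator: calling the function records work instead of doing it."""

    def wrapper(*args: Any) -> LazyArtifact:
        def compute() -> Any:
            # Resolve upstream lazy inputs before calling the wrapped function.
            resolved = [a.get() if isinstance(a, LazyArtifact) else a for a in args]
            return fn(*resolved)

        return LazyArtifact(compute)

    return wrapper


calls = []


@lazy_op
def expensive_square(x):
    calls.append(x)  # record when the "expensive" work actually happens
    return x * x


@lazy_op
def add_one(x):
    return x + 1


result = add_one(expensive_square(4))
print(calls)         # [] -- nothing has run yet
print(result.get())  # 17
print(calls)         # [4]
```

The design point is that building the graph is cheap, so you can wire up expensive steps like GPU fine-tuning and only pay for them when you actually ask for a value.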

Next, we'll load a connection to our S3 bucket that has our datasets and use Aqueduct's API to retrieve a pointer to our dataset in that S3 bucket.

In [None]:
DATASET_BUCKET_NAME = 'datasets' # REPLACE ME!
DATASET_PATH = 'resnet-data/resnet.zip' # REPLACE ME!

datasets = client.resource(DATASET_BUCKET_NAME)

# Fetching one large object from S3 is more efficient than fetching many small ones
# (each object is a separate request), so we store and load the dataset as a single zip file.
dataset = datasets.file(filepaths=DATASET_PATH, artifact_type='bytes')

# Our dataset has two classes, for camouflage clothes and regular clothes.
CLASSES = ["camouflage_clothes", "normal_clothes"]
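Because the dataset is loaded with `artifact_type='bytes'`, operators receive the raw bytes of the zip archive. The sketch below shows one way to unpack those bytes and count the images per class in plain Python. `extract_zip_bytes` and `count_images` are hypothetical helpers, and the `resnet-data/<split>/<class>/` layout is assumed from the paths used in the operators below. It reads the archive straight from memory with `io.BytesIO`; the extracted images still go to disk, since Keras's `flow_from_directory` reads from a directory.

```python
import io
import os
import tempfile
import zipfile
from typing import Dict, List


def extract_zip_bytes(data: bytes, dest: str) -> None:
    # Read the archive straight from memory instead of writing a .zip file first;
    # the extracted images still land on disk, where flow_from_directory expects them.
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        archive.extractall(dest)


def count_images(split_dir: str, classes: List[str]) -> Dict[str, int]:
    # Number of files in each class folder of one split (training, validation, or testing).
    return {c: len(os.listdir(os.path.join(split_dir, c))) for c in classes}


CLASSES = ["camouflage_clothes", "normal_clothes"]

# Build a tiny stand-in archive with the layout the operators assume.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("resnet-data/training/camouflage_clothes/a.jpg", b"not a real image")
    zf.writestr("resnet-data/training/normal_clothes/b.jpg", b"not a real image")
    zf.writestr("resnet-data/training/normal_clothes/c.jpg", b"not a real image")

with tempfile.TemporaryDirectory() as tmp:
    extract_zip_bytes(buf.getvalue(), tmp)
    counts = count_images(os.path.join(tmp, "resnet-data", "training"), CLASSES)

print(counts)  # {'camouflage_clothes': 1, 'normal_clothes': 2}
```

Iterating over `CLASSES` instead of hard-coding each folder name keeps the counts correct if more classes are added later.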

Now that we have our data, we can define an Aqueduct operator that is going to fine-tune ResNet. This code is mostly adapted from the blog post linked above. 

Roughly, this function: 
1. Loads in the dataset and prepares it for iteration. 
2. Loads in the ResNet model and adds layers for fine-tuning.
3. Fine-tunes the model for a configurable number of epochs. 
4. Returns the model. 
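Step 2 stacks a small classification head (`headModel` in the code below) on top of ResNet-50's final feature map. Assuming the standard 224x224 input, ResNet-50 without its top layers produces a 7x7x2048 map. The arithmetic below traces the head's shapes and counts its parameters. Because the base layers are frozen, these are the only weights that fine-tuning updates. `pooled_size` and `dense_params` are illustrative helpers, not Keras functions.

```python
def pooled_size(size: int, pool: int, stride: int) -> int:
    # Output length of a 'valid' pooling window along one axis.
    return (size - pool) // stride + 1


def dense_params(n_in: int, n_out: int) -> int:
    # One weight per input-output pair plus one bias per output unit.
    return n_in * n_out + n_out


# Assumption: a 224x224x3 input gives a 7x7x2048 feature map from ResNet-50 (include_top=False).
h, w, channels = 7, 7, 2048
ph, pw = pooled_size(h, 7, 7), pooled_size(w, 7, 7)  # AveragePooling2D((7, 7)); stride defaults to pool size
flat = ph * pw * channels                             # Flatten
hidden = dense_params(flat, 256)                      # Dense(256, relu); Dropout adds no weights
output = dense_params(256, 2)                         # Dense(len(CLASSES), softmax)
print(ph, pw, flat)      # 1 1 2048
print(hidden, output)    # 524544 514
print(hidden + output)   # 525058
```

Roughly half a million trainable weights is small next to the frozen backbone, which is why a single epoch of fine-tuning can already give a usable classifier.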

The `@op` decorator tells Aqueduct to run this function on our Kubernetes cluster with 15GB of RAM and a GPU, and to install `tensorflow` as a requirement. Inside the function, we use TensorFlow to run the fine-tuning algorithm on that GPU.

You'll notice that, in addition to the dataset, `fine_tune_resnet` takes in several other parameters: `batch_size`, `num_epochs`, and `init_lr`. Aqueduct automatically turns plain values passed to an operator into parameters, as we'll see when we call `fine_tune_resnet` below. It also lets us create named parameters that are shared across functions.

In this case, we want to use the same `batch_size` for fine-tuning and evaluation, so we'll first define a parameter for `batch_size` with a default value of 32.

In [None]:
batch_size = client.create_param('batch_size', default=32)

@op(
    engine=K8S_RESOURCE_NAME,
    resources={
        'memory': '15GB',
        'gpu_resource_name': 'nvidia.com/gpu',
    },
    requirements=['tensorflow'],
)
def fine_tune_resnet(dataset, batch_size, num_epochs, init_lr):
    import tensorflow as tf
    from tensorflow.keras.preprocessing.image import ImageDataGenerator
    from tensorflow.keras.layers import AveragePooling2D, Dropout, Flatten, Dense, Input
    from tensorflow.keras.models import Model
    from tensorflow.keras.optimizers.legacy import Adam # Note this model uses a legacy version of the Adam optimizer.
    from tensorflow.keras.applications import ResNet50
    
    import zipfile
    import os
    import numpy as np
    
    # Write the zipfile to disk and unzip it. The ImageDataGenerator package from TensorFlow only 
    # recognizes files on disk.
    with open('resnet-data-test.zip', 'wb') as f:
        f.write(dataset)
    
    with zipfile.ZipFile('resnet-data-test.zip', 'r') as zip_ref:
        zip_ref.extractall('./')

    train_path = 'resnet-data/training'
    val_path = 'resnet-data/validation'
    
    total_train = len(
        os.listdir(os.path.join(train_path, 'normal_clothes'))
    ) + len(
        os.listdir(os.path.join(train_path, 'camouflage_clothes'))
    )
    
    total_val = len(
        os.listdir(os.path.join(val_path, 'normal_clothes'))
    ) + len(
        os.listdir(os.path.join(val_path, 'camouflage_clothes'))
    )

    # Run the following code on a GPU.
    with tf.device('/GPU:0'):
        # Initialize the training data augmentation object with some 
        # pre-defined parameters.
        trainAug = ImageDataGenerator(
            rotation_range=25,
            zoom_range=0.1,
            width_shift_range=0.1,
            height_shift_range=0.1,
            shear_range=0.2,
            horizontal_flip=True,
            fill_mode="nearest"
        )

        # Initialize the validation data augmentation object, which
        # we'll add mean subtraction to below.
        valAug = ImageDataGenerator()

        # Define the ImageNet mean subtraction (in RGB order) and set the
        # mean subtraction value for each of the data augmentation
        # objects.
        mean = np.array([123.68, 116.779, 103.939], dtype="float32")
        trainAug.mean = mean
        valAug.mean = mean

        # Use our augmentation objects to create the data generators that will 
        # allow us to iterate over our training and validation data.
        trainGen = trainAug.flow_from_directory(
            train_path,
            class_mode="categorical",
            target_size=(224, 224),
            color_mode="rgb",
            shuffle=True,
            batch_size=batch_size
        )
        valGen = valAug.flow_from_directory(
            val_path,
            class_mode="categorical",
            target_size=(224, 224),
            color_mode="rgb",
            shuffle=False,
            batch_size=batch_size
        )

        # Load the base model from the Keras ResNet50 implementation.
        baseModel = ResNet50(
            weights="imagenet", 
            include_top=False,
            input_tensor=Input(shape=(224, 224, 3))
        )

        # Construct the head of the model that will be placed on top of the
        # base model.
        headModel = baseModel.output
        headModel = AveragePooling2D(pool_size=(7, 7))(headModel)
        headModel = Flatten(name="flatten")(headModel)
        headModel = Dense(256, activation="relu")(headModel)
        headModel = Dropout(0.5)(headModel)
        headModel = Dense(len(CLASSES), activation="softmax")(headModel)

        # Place the head FC model on top of the base model. This will become
        # the actual model we will train.
        model = Model(inputs=baseModel.input, outputs=headModel)

        # Loop over all layers in the base model and freeze them so they will
        # *not* be updated during the training process.
        for layer in baseModel.layers:
            layer.trainable = False

        # Initialize our optimizer, compile the model, and train the model.
        # `learning_rate` replaces the deprecated `lr` argument.
        opt = Adam(learning_rate=init_lr, decay=init_lr / num_epochs)
        model.compile(
            loss="binary_crossentropy", 
            optimizer=opt,
            metrics=["accuracy"]
        )

        # `fit` accepts data generators directly; `fit_generator` is deprecated.
        H = model.fit(
            trainGen,
            steps_per_epoch=total_train // batch_size,
            validation_data=valGen,
            validation_steps=total_val // batch_size,
            epochs=num_epochs
        )

    return model
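The step counts above use floor division, so each epoch leaves out the last partial batch. Covering every image exactly once, as you would when predicting, requires rounding up instead. The helpers below are illustrative names, and the split sizes are hypothetical, chosen only to show the rounding behaviour.

```python
def train_steps(total: int, batch_size: int) -> int:
    # Floor division, as in steps_per_epoch=total_train // batch_size:
    # the last partial batch is left out of each epoch.
    return total // batch_size


def full_pass_steps(total: int, batch_size: int) -> int:
    # Ceiling division without floats: every image is covered once,
    # and the final batch may be smaller than batch_size.
    return (total + batch_size - 1) // batch_size


# Hypothetical split sizes, chosen only to show the rounding behaviour.
for total in (1000, 960):
    steps = train_steps(total, 32)
    print(total, steps, steps * 32, full_pass_steps(total, 32))
```

With 1,000 images and a batch size of 32, training runs 31 steps and sees 992 images per epoch, while a full pass needs 32 steps. When the count divides evenly (960), both approaches agree on 30 steps.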

Now that we've defined our fine-tuning function, we can call it. Let's look at the arguments to this function:
1. `dataset` is defined above as a pointer to the dataset that lives in S3. 
2. `batch_size` is the parameter that we defined above that will be shared between this function and the next.
3. `num_epochs` is the number of epochs to train for; we can simply pass in 1, and Aqueduct will convert it into a parameter for us.
4. `init_lr` is our initial learning rate; again, Aqueduct automatically turns it into a parameter for us.
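Inside `fine_tune_resnet`, `init_lr` also sets the legacy Adam optimizer's `decay` (`init_lr / num_epochs`). The legacy Keras optimizers apply time-based decay, dividing the rate by `1 + decay * step`, where `step` counts optimizer updates (batches), not epochs. The sketch below computes the effective rate for the values passed in this notebook. `decayed_lr` is an illustrative helper, not a Keras function.

```python
def decayed_lr(init_lr: float, decay: float, step: int) -> float:
    # Time-based decay used by the legacy Keras optimizers: the rate is
    # divided by (1 + decay * step), where step counts batches, not epochs.
    return init_lr / (1.0 + decay * step)


init_lr, num_epochs = 1e-4, 1
decay = init_lr / num_epochs  # same expression passed to Adam in fine_tune_resnet
for step in (0, 100, 1000, 10000):
    print(f"step {step:>5}: lr = {decayed_lr(init_lr, decay, step):.3e}")
```

With these values the rate only halves after 10,000 batches, so unless an epoch contains thousands of batches it stays close to `init_lr` for the whole run.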

In [None]:
model = fine_tune_resnet(dataset, batch_size, 1, 1e-4)

Finally, we'll define a function that calculates the accuracy of our fine-tuned model on the testing split of the original dataset. This function takes in the model artifact returned by `fine_tune_resnet`, the dataset, and our shared `batch_size` parameter, and it returns the model's accuracy score.

In [None]:
@op(
    resources={
        'memory': '15GB',
        'gpu_resource_name': 'nvidia.com/gpu',
    },
    requirements=['tensorflow', 'scikit-learn']  # scikit-learn provides classification_report
)
def calculate_accuracy(model, dataset, batch_size): 
    import tensorflow as tf
    from tensorflow.keras.preprocessing.image import ImageDataGenerator
    from sklearn.metrics import classification_report
    
    import zipfile
    import os
    import numpy as np

    # Write the zipfile to disk and unzip it. The ImageDataGenerator package from TensorFlow only 
    # recognizes files on disk or in a DataFrame.
    with open('resnet-data-test.zip', 'wb') as f:
        f.write(dataset)
    
    with zipfile.ZipFile('resnet-data-test.zip', 'r') as zip_ref:
        zip_ref.extractall('./')

    # For this run, we only load the testing data.
    test_path = 'resnet-data/testing'
    total_test = len(
        os.listdir(os.path.join(test_path, 'normal_clothes'))
    ) + len(
        os.listdir(os.path.join(test_path, 'camouflage_clothes'))
    )
    
    with tf.device('/GPU:0'):
        # Initialize the testing data augmentation object, which
        # we'll add mean subtraction to.
        valAug = ImageDataGenerator()

        # Define the ImageNet mean subtraction (in RGB order) and set it
        # as the mean subtraction value for the data augmentation
        # object.
        valAug.mean = np.array([123.68, 116.779, 103.939], dtype="float32")

        testGen = valAug.flow_from_directory(
            test_path,
            class_mode="categorical",
            target_size=(224, 224),
            color_mode="rgb",
            shuffle=False,
            batch_size=batch_size
        )

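        # Generate predictions for each of the images in our testing set.
        # Round up so the final, partial batch is included without
        # requesting an extra batch when total_test divides evenly.
        predIdxs = model.predict(
            testGen,
            steps=(total_test + batch_size - 1) // batch_size
        )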

        # For each image in the testing set we need to find the index of the
        # label with corresponding largest predicted probability.
        predIdxs = np.argmax(predIdxs, axis=1)

    # Use SKLearn to generate a classification report and return its accuracy score.
    res = classification_report(
        testGen.classes, 
        predIdxs,
        target_names=testGen.class_indices.keys(),
        output_dict=True
    )
    
    return res['accuracy']
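For a single-label problem like this one, the `accuracy` entry of `classification_report(..., output_dict=True)` is the fraction of test images whose argmax prediction matches the true label. That comparison with `testGen.classes` is only valid because the test generator uses `shuffle=False`, so predictions come back in the same order as the labels. The pure-Python sketch below reproduces the calculation. `argmax_rows` and `accuracy` are illustrative helpers, and the probabilities are made up.

```python
from typing import List, Sequence


def argmax_rows(probs: Sequence[Sequence[float]]) -> List[int]:
    # Index of the largest value in each row, like np.argmax(probs, axis=1);
    # ties resolve to the first index, as NumPy does.
    return [max(range(len(row)), key=row.__getitem__) for row in probs]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    # Fraction of predictions equal to the true label.
    if len(y_true) != len(y_pred):
        raise ValueError("expected exactly one prediction per test image")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up softmax outputs for four images. Class indices follow the
# alphabetical folder order that flow_from_directory assigns
# (0 = camouflage_clothes, 1 = normal_clothes).
probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]]
labels = [0, 1, 1, 1]
preds = argmax_rows(probs)
print(preds, accuracy(labels, preds))  # [0, 1, 0, 1] 0.75
```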

We can calculate our accuracy score and see the result right here in our notebook. Because we're in lazy mode, calling `.get()` is what actually triggers the fine-tuning and evaluation, so this step takes about 7-10 minutes.

In [None]:
accuracy = calculate_accuracy(model, dataset, batch_size)

accuracy.get()

Finally, we can publish this workflow to Aqueduct. We first save the fine-tuned model back to our S3 bucket; then all we need to do is give the workflow a name and a description and send it off to the Aqueduct server:

In [None]:
from textwrap import dedent

datasets.save(model, "resnet-finetune.model")

client.publish_flow(
    "Fine-Tune ImageNet",
    dedent("""
    Fine-tune TensorFlow's ResNet-50 model to differentiate camouflage clothes from regular clothes.
    """),
    artifacts=[model, accuracy],
)