This notebook presents part of my code for the Plant Pathology 2021 - FGVC8 Kaggle challenge. 

It covers inference on the test data.  

This version of the notebook differs slightly from the code I submitted for the competition, because the rules required internet access to be disabled. It was therefore impossible to install Python packages or to connect to the Google Cloud Storage (GCS) bucket holding the test data.  
There are ways around both restrictions:
- Read the data from the local file system instead of the GCS bucket, at the cost of not being able to use the TPUs in a straightforward way (GPUs can do the job instead). 
- Avoid installing the extra packages by loading the models with the `tf.keras.models.load_model` function. This required some quick and dirty adaptations to work properly.  
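
As a rough illustration of the first workaround, the test images could be listed from the local file system with the standard library alone. This is only a sketch, not the submitted code: the directory is an assumption inferred from the `train.csv` location used later in this notebook, and `list_local_test_images` is a hypothetical helper name.

```python
import glob
import os

# Assumed local location of the test images (hypothetical; adjust to the actual dataset mount)
LOCAL_TEST_DIR = "../input/plant-pathology-2021-fgvc8/test_images"


def list_local_test_images(directory=LOCAL_TEST_DIR, pattern="*.jpg"):
    """Return the sorted list of image paths matching `pattern` in a local directory."""
    return sorted(glob.glob(os.path.join(directory, pattern)))
```

The resulting list could then stand in for the file list obtained from the GCS bucket, since the rest of the pipeline only needs file paths.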

For the sake of clarity and simplicity, this notebook avoids those technical complications and shows an implementation that uses internet access and TPUs.

Another thing to consider is that **only 3 images from the test set were accessible**. The remaining ~2700 images were hidden and only available at submission runtime. The code below is therefore applied to only 3 examples; the actual submissions needed much more time and processing power.

Other parts of the code can be found here :
- [**ResNet50 model training**](https://github.com/antonindurieux/Plant_Pathology_2021-FGVC8_Kaggle_challenge/blob/master/1_plant-pathology-2021-fgvc8-resnet50-training.ipynb) ;
- [**EfficientNetB7 model training**]() ;
- [**Vision Transformer model training**]().

An article about this project can be found on my website [**here**](https://antonindurieux.github.io/portfolio/1_Kaggle_Plant_Pathology_2021_competition/).

## 1. Import and configuration

As Vision Transformers were not yet implemented in Keras and TensorFlow at the time of writing, I used this helpful [Python package](https://pypi.org/project/vit-keras/), which seemed to work well enough.  
I used [this implementation](https://github.com/qubvel/efficientnet) for the EfficientNet so I could properly load the Noisy Student weights.

In [24]:
!pip install --quiet vit-keras
!pip install --quiet efficientnet

In [25]:
# Imports
import os
import pandas as pd
import numpy as np
import seaborn as sns
from tqdm import tqdm
from matplotlib import pyplot as plt
from kaggle_datasets import KaggleDatasets
from sklearn.preprocessing import MultiLabelBinarizer
import tensorflow as tf
from tensorflow.keras.layers import Flatten, Dense
import tensorflow_addons as tfa
from efficientnet.tfkeras import EfficientNetB7
from vit_keras import vit

sns.set()

We configure the TPUs, the batch size and the image resolutions corresponding to the different models :

In [26]:
# TPU configuration
try:
    # TPU detection
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
    print("Running on TPU ", tpu.cluster_spec().as_dict()["worker"])
    # Connection to TPU
    tf.config.experimental_connect_to_cluster(tpu)
    # Initialization of the TPU devices
    tf.tpu.experimental.initialize_tpu_system(tpu)
    # Create a state & distribution policy on the TPU devices
    strategy = tf.distribute.experimental.TPUStrategy(tpu)

except ValueError:
    print("Not connected to a TPU runtime. Using CPU/GPU strategy")
    strategy = tf.distribute.MirroredStrategy()

Running on TPU  ['10.0.0.2:8470']


In [27]:
# Small batch size due to memory constraints during submission
BATCH_SIZE = 32

In [28]:
# Image resolution for the different models
IMG_RES_RESNET = 400
IMG_RES_EFFNET = 600
IMG_RES_VIT = 608

## 2. Data imports and datasets generation

First, we will get the list of labels corresponding to the different pathologies (in the same order as for the training process so as not to mix everything up).  
Then we will get the test files list.  
Finally, we will generate a test dataset for each model.  

Using [Test Time Augmentation (TTA)](https://towardsdatascience.com/test-time-augmentation-tta-and-how-to-perform-it-with-keras-4ac19b67fb4d) significantly improved the results. The TTA transformations are the same augmentations as those used during training, and they are implemented in the dataset creation pipeline.
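
To make the geometric part of this pipeline concrete, here is a minimal NumPy sketch of a central square crop followed by a random crop and random flips. It is an illustration under simplifying assumptions, not the TensorFlow code used in this notebook: the resize step and the brightness, contrast and saturation jitter are left out, and the helper names `crop_center_np` and `random_crop_flip_np` are hypothetical.

```python
import numpy as np


def crop_center_np(image):
    """Crop an (H, W, C) array to its central square."""
    h, w = image.shape[:2]
    side = min(h, w)
    top, left = (h - side) // 2, (w - side) // 2
    return image[top:top + side, left:left + side]


def random_crop_flip_np(image, crop_size, rng):
    """Take a random crop_size x crop_size window, then randomly flip it."""
    h, w = image.shape[:2]
    top = rng.integers(0, h - crop_size + 1)
    left = rng.integers(0, w - crop_size + 1)
    patch = image[top:top + crop_size, left:left + crop_size]
    if rng.random() < 0.5:
        patch = patch[:, ::-1]  # horizontal flip
    if rng.random() < 0.5:
        patch = patch[::-1, :]  # vertical flip
    return patch


rng = np.random.default_rng(0)
image = rng.random((450, 600, 3))              # toy image, values in [0, 1)
square = crop_center_np(image)                 # shape (450, 450, 3)
patch = random_crop_flip_np(square, 400, rng)  # shape (400, 400, 3)
```

Cropping a window slightly smaller than the resized image (for instance 400 out of 450) is what lets each TTA pass see a different view of the same leaf.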

In [29]:
# Get the pathology labels 
train_label_csv = "../input/plant-pathology-2021-fgvc8/train.csv"
train_label_df = pd.read_csv(train_label_csv)
train_label_df['labels_list'] = train_label_df.labels.apply(lambda x: x.split(' '))

mlb = MultiLabelBinarizer()
mlb.fit(train_label_df.labels_list)
pathologies = mlb.classes_
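
The label order matters because each column of a prediction matrix corresponds to one entry of `mlb.classes_`. When `MultiLabelBinarizer` is fitted without an explicit `classes` argument, it stores the unique labels in sorted order, so refitting on the same `train.csv` reproduces the column order used during training. The pure-Python sketch below mimics that behaviour on toy label strings; `label_vocabulary` and `multi_hot` are hypothetical helpers for illustration only.

```python
def label_vocabulary(label_strings):
    """Sorted unique labels taken from space-separated label strings."""
    return sorted({label for s in label_strings for label in s.split(" ")})


def multi_hot(label_string, vocabulary):
    """Encode one label string as a 0/1 list aligned with `vocabulary`."""
    present = set(label_string.split(" "))
    return [1 if label in present else 0 for label in vocabulary]


# Toy label strings, not the real training data
vocab = label_vocabulary(["scab", "healthy", "scab frog_eye_leaf_spot"])
# vocab == ["frog_eye_leaf_spot", "healthy", "scab"]
encoded = multi_hot("scab frog_eye_leaf_spot", vocab)
# encoded == [1, 0, 1]
```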

In [30]:
# Get GCS bucket path
gcs_ds_path = KaggleDatasets().get_gcs_path("plant-pathology-2021-fgvc8")

# Get the images paths
test_images_path = gcs_ds_path + "/test_images/"
test_files_ls = tf.io.gfile.glob(test_images_path + '*.jpg')

In [31]:
def crop_center(image):
    """
    Crop an image to its central square
    """
    h, w = tf.shape(image)[-3], tf.shape(image)[-2]
    if h > w:
        cropped_image = tf.image.crop_to_bounding_box(image, (h - w) // 2, 0, w, w)
    else:
        cropped_image = tf.image.crop_to_bounding_box(image, 0, (w - h) // 2, h, h)
    return cropped_image

def test_time_augmentation(image, img_crop_resolution):
    """
    Apply Test Time Augmentation to images
    """
    image = tf.image.random_brightness(image, 0.3)
    image = tf.image.random_contrast(image, 1, 3)
    image = tf.image.random_saturation(image, 1, 1.3) 
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_flip_up_down(image)
    image = tf.image.random_crop(image, [img_crop_resolution, img_crop_resolution, 3])
    return image

def process_test_img(filepath, img_resize_resolution):
    """
    Read an image from its filepath, crop it to its central square and resize it
    """
    image = tf.io.read_file(filepath)
    image = tf.image.decode_jpeg(image, channels=3)
    image = tf.image.convert_image_dtype(image, tf.float32) 
    image = crop_center(image)
    image = tf.image.resize(image, [img_resize_resolution, img_resize_resolution])
    return image

def get_test_dataset(filenames, img_resize_resolution, tta, img_crop_resolution=None):
    """
    Create the test dataset
    """
    dataset = tf.data.Dataset.from_tensor_slices(filenames)
    dataset = dataset.map(lambda x: process_test_img(x, img_resize_resolution))
    if tta:
        dataset = dataset.map(lambda x: test_time_augmentation(x, img_crop_resolution))
    dataset = dataset.batch(BATCH_SIZE)
    return dataset

In [32]:
ds_test_resnet = get_test_dataset(test_files_ls, img_resize_resolution=450, tta=True, img_crop_resolution=IMG_RES_RESNET)
ds_test_effnet = get_test_dataset(test_files_ls, img_resize_resolution=700, tta=True, img_crop_resolution=IMG_RES_EFFNET)
# No TTA for the Vision Transformer model
ds_test_vit = get_test_dataset(test_files_ls, img_resize_resolution=IMG_RES_VIT, tta=False)

## 3. Inference

Now we will load the 3 different models, and proceed to the inference process.  

Due to time limitations at submission runtime, I was limited in the number of TTA steps I could apply. The best compromise between runtime duration and performance was obtained with :
- ResNet50 : 2 TTA steps
- EfficientNetB7 : 2 TTA steps
- Vision Transformer : no TTA.  

For each model, the prediction matrices obtained from the TTA steps are averaged.

This process took ~2 hours on the full test set at runtime.
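
The per-model TTA averaging described above can be sketched independently of any model. In the snippet below, `fake_predict` is a hypothetical stand-in for a call to `model.predict` on a randomly augmented dataset (each pass over such a dataset re-applies the random transformations), and `predict_with_tta` is an illustrative helper, not a function from this notebook.

```python
import numpy as np


def predict_with_tta(predict_fn, n_steps):
    """Average the probability matrices returned by n_steps stochastic passes."""
    runs = [predict_fn() for _ in range(n_steps)]
    return np.mean(runs, axis=0)


rng = np.random.default_rng(0)


def fake_predict():
    # Hypothetical stand-in for model.predict: 3 images, 6 labels
    return rng.random((3, 6))


tta_probs = predict_with_tta(fake_predict, n_steps=2)  # shape (3, 6)
```

More TTA steps reduce the variance of the averaged probabilities, but each step costs one full pass over the test set, which is why the step count had to stay low.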

### 3.1 ResNet50

In [33]:
resnet_model_path = "../input/resnet-tpu-v2/resnet_tpu_v2.h5"

In [34]:
with strategy.scope():
    resnet_model = tf.keras.models.load_model(resnet_model_path, compile=False)
    resnet_model.compile()

In [35]:
resnet_tta_steps = 2
predictions = []

for i in tqdm(range(resnet_tta_steps)):
    resnet_preds = resnet_model.predict(ds_test_resnet, batch_size=BATCH_SIZE, verbose=1)
    predictions.append(resnet_preds)

resnet_preds_tta = np.mean(predictions, axis=0)

100%|██████████| 2/2 [00:19<00:00,  9.52s/it]


### 3.2 EfficientNetB7

In [36]:
effnet_model_path = "../input/effnetb7/effnetB7.h5"

In [37]:
with strategy.scope():
    effnet_model = tf.keras.models.load_model(effnet_model_path, compile=False)
    effnet_model.compile()

In [38]:
effnet_tta_steps = 2
predictions = []

for i in tqdm(range(effnet_tta_steps)):
    effnet_preds = effnet_model.predict(ds_test_effnet, batch_size=BATCH_SIZE, verbose=1)
    predictions.append(effnet_preds)

effnet_preds_tta = np.mean(predictions, axis=0)

100%|██████████| 2/2 [00:35<00:00, 17.66s/it]


### 3.3 Vision Transformer

In [39]:
vit_model_path = "../input/vit-model-600x600/vit_model_600x600.h5"

In [40]:
# The Vision Transformer model has to be built again so that the weights loading works properly
n_labels = len(pathologies)
inputs = tf.keras.Input(shape=(IMG_RES_VIT, IMG_RES_VIT) + (3,))

with strategy.scope():
    vit_model = vit.vit_b16(
        image_size = IMG_RES_VIT,
        activation = 'sigmoid',
        pretrained = False,
        include_top = False,
        pretrained_top = False,
        classes = len(pathologies))
    
    # training=False keeps the dropout layers inactive at inference time
    x = vit_model(inputs, training=False)
    x = Flatten()(x)
    outputs = Dense(n_labels, activation = 'sigmoid')(x)

    vit_model = tf.keras.Model(inputs, outputs)

In [41]:
vit_model.load_weights(vit_model_path)

In [42]:
vit_preds = vit_model.predict(ds_test_vit, batch_size=BATCH_SIZE, verbose=1)



## 4. Outputs processing

Now we will average the 3 prediction probability matrices.  
The resulting matrix is then processed as follows:
- Thresholds are applied to turn probabilities into labels. I simply used a threshold of 0.5 for every label in my final solution.
- If all the probabilities are below the thresholds for a particular image, the label with the maximum probability is selected.  

A submission CSV file can then be generated, and we are done!
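
The two post-processing rules above can also be written in a vectorized way. The sketch below is an alternative illustration in NumPy, not the function used in this notebook; `probabilities_to_multi_hot` is a hypothetical name and the probabilities are toy values.

```python
import numpy as np


def probabilities_to_multi_hot(probs, thresholds):
    """Threshold each column; rows left without any label get their argmax label."""
    probs = np.asarray(probs, dtype=float)
    hot = (probs >= np.asarray(thresholds)).astype(int)
    # Rows where every probability is below its threshold
    rows = np.flatnonzero(hot.sum(axis=1) == 0)
    hot[rows, probs[rows].argmax(axis=1)] = 1
    return hot


probs = np.array([[0.7, 0.2, 0.6],
                  [0.1, 0.4, 0.3]])
hot = probabilities_to_multi_hot(probs, [0.5, 0.5, 0.5])
# → array([[1, 0, 1], [0, 1, 0]])
```

The second row has no probability above 0.5, so it falls back to its most probable label instead of producing an empty prediction.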

In [43]:
# Averaging the predictions of the 3 models
mean_predictions = np.mean([effnet_preds_tta, resnet_preds_tta, vit_preds], axis=0)

In [44]:
def format_predictions(preds, files, thresholds, fill_no_label=False, labels=pathologies, oh_labels=True):
    """
    Format predictions to get a DataFrame from the prediction matrix

    Args:
        preds (float32 numpy array): predictions matrix (N_IMAGES, N_LABELS)
        files (list): list of image files
        thresholds (list): list of prediction thresholds, one per label
        fill_no_label (boolean): whether or not to fill empty predictions with argmax
        labels (list): list of label names
        oh_labels (boolean): whether or not to include one boolean column per label in the output DataFrame

    Returns:
        predictions_df (DataFrame): predictions DataFrame
    """

    preds_copy = preds.copy()

    # Handling no label cases
    if fill_no_label:
        for i in range(preds_copy.shape[0]):
            if np.all(preds_copy[i, :] < thresholds):
                preds_copy[i, np.argmax(preds_copy[i, :])] = 1

    # Apply thresholds to get boolean values
    for j in range(preds_copy.shape[1]):
        preds_copy[:, j] = np.where(preds_copy[:, j] < thresholds[j], 0, 1)

    # Reverse MultiLabelBinarizer
    mlb_predictions = mlb.inverse_transform(preds_copy)
    mlb_predictions = [' '.join(x) for x in mlb_predictions]

    # Create the output DataFrame
    predictions_series = pd.Series(mlb_predictions, name="labels")
    oh_predictions_df = pd.DataFrame(data=preds_copy, columns=labels)
    file_names = [x.split('/')[-1] for x in files]
    file_names_series = pd.Series(file_names, name="file_name")
    predictions_df = pd.concat([file_names_series, predictions_series], axis=1)

    # Get one-hot-labels in the output DataFrame
    if oh_labels:
        predictions_df = pd.concat([predictions_df, oh_predictions_df], axis=1)

    return predictions_df

In [45]:
test_img_names = [file_path.split('/')[-1] for file_path in test_files_ls]

# Thresholds will be 0.5 for each label
thresholds = [0.5] * n_labels

# Final predictions DataFrame
predictions_df = format_predictions(mean_predictions, test_img_names, thresholds, fill_no_label=True, labels=pathologies, oh_labels=False)
predictions_df

| | file_name | labels |
|---|---|---|
| 0 | 85f8cb619c66b863.jpg | scab |
| 1 | ad8770db05586b59.jpg | frog_eye_leaf_spot |
| 2 | c7b03e718489f3ca.jpg | frog_eye_leaf_spot |


In [46]:
predictions_df.to_csv('submission.csv', index=False)