![](https://github.com/SauravMaheshkar/Herbarium2021/blob/main/assets/Banner.png?raw=true)

The goal of the Herbarium 2021: Half-Earth Challenge is to identify vascular plant specimens provided by the New York Botanical Garden (NY), Bishop Museum (BPBM), Naturalis Biodiversity Center (NL), Queensland Herbarium (BRI), and Auckland War Memorial Museum (AK).

The Herbarium 2021: Half-Earth Challenge dataset includes more than 2.5M images representing nearly 65,000 species from the Americas and Oceania that have been aligned to a standardized plant list (LCVP v1.0.2).

This kernel shows how to train an **EfficientNet** on a TFRecord dataset. The notebook is intended to run on a TPU.

<a id = 'basic'></a>
# Packages 📦 and Basic Setup

In [1]:
%%capture

# Install Weights and Biases 
!pip3 install wandb --upgrade >> /dev/null

# Packages
import os
import time
import logging
import re, math
import numpy as np
import pandas as pd
import tensorflow as tf
import matplotlib.pyplot as plt
import tensorflow.keras.backend as K
from sklearn.model_selection import KFold
from kaggle_datasets import KaggleDatasets

# Configure Logging Level
logger = tf.get_logger()
logger.setLevel(logging.ERROR)


# Weights and Biases Setup
import wandb
from wandb.keras import WandbCallback
from kaggle_secrets import UserSecretsClient
user_secrets = UserSecretsClient()
api_key = user_secrets.get_secret("WANDB_API_KEY")
wandb.login(key=api_key);

## Basic Hyperparameters 🪡

In [2]:
DEVICE = "TPU" 

GCS_PATH = KaggleDatasets().get_gcs_path('herb2021-256')

IMG_SIZES = 256

IMAGE_SIZE = [IMG_SIZES, IMG_SIZES]

BATCH_SIZE_SINGLE = 64

EPOCHS = 40

FOLDS = 10

N_CLASSES = 64500

## Device Configuration 🔌

In [3]:
if DEVICE == "TPU":
    print("connecting to TPU...")
    try:
        tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
        print('Running on TPU ', tpu.master())
    except ValueError:
        print("Could not connect to TPU")
        tpu = None

    if tpu:
        try:
            print("initializing  TPU ...")
            tf.config.experimental_connect_to_cluster(tpu)
            tf.tpu.experimental.initialize_tpu_system(tpu)
            
            strategy = tf.distribute.experimental.TPUStrategy(tpu)
            print("TPU initialized")
        except Exception as e:
            # Fall back to GPU/CPU so that `strategy` is always defined
            print("failed to initialize TPU:", e)
            DEVICE = "GPU"
    else:
        DEVICE = "GPU"

if DEVICE == "GPU":
    n_gpu = len(tf.config.experimental.list_physical_devices('GPU'))
    print("Num GPUs Available: ", n_gpu)
    
    if n_gpu > 1:
        print("Using strategy for multiple GPU")
        strategy = tf.distribute.MirroredStrategy()
    else:
        print('Standard strategy for GPU...')
        strategy = tf.distribute.get_strategy()

AUTO     = tf.data.experimental.AUTOTUNE
REPLICAS = strategy.num_replicas_in_sync

print(f'REPLICAS: {REPLICAS}')

BATCH_SIZE = BATCH_SIZE_SINGLE * REPLICAS
print(f'BATCH_SIZE: {BATCH_SIZE}')

connecting to TPU...
Running on TPU  grpc://10.0.0.2:8470
initializing  TPU ...
TPU initialized
REPLICAS: 8
BATCH_SIZE: 512


<a id = 'data'></a>
# 💿 TensorFlow Dataset from TFRecords

In [4]:
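# Random-rotation augmentation (defined here but not used by the pipeline below;
# the model applies its own random flips inside build_model)
data_augmentation = tf.keras.Sequential([
  tf.keras.layers.experimental.preprocessing.RandomRotation(0.2, seed=12345),
])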

def read_labeled_tfrecord(example):
    LABELED_TFREC_FORMAT = {
        "image": tf.io.FixedLenFeature([], tf.string),
        "image_idx": tf.io.FixedLenFeature([], tf.string),
        'label' : tf.io.FixedLenFeature([], tf.int64)
    }
    example = tf.io.parse_single_example(example, LABELED_TFREC_FORMAT)
    image = decode_image(example['image'])
    label = example['label']
    return image, label 

def read_labeled_tfrecord_for_test(example):
    LABELED_TFREC_FORMAT = {
        "image": tf.io.FixedLenFeature([], tf.string),
        "image_idx": tf.io.FixedLenFeature([], tf.string),
        'label' : tf.io.FixedLenFeature([], tf.int64)
    }
    example = tf.io.parse_single_example(example, LABELED_TFREC_FORMAT)
    image = decode_image(example['image'])
    label = example['label']
        
    return image, label 

def decode_image(image_data):
    image = tf.image.decode_jpeg(image_data, channels=3)
    image = tf.cast(image, tf.float32)    
    image = tf.reshape(image, [*IMAGE_SIZE, 3])
    return image

def count_data_items(filenames):
    n = [int(re.compile(r"-([0-9]*)\.").search(filename).group(1)) 
         for filename in filenames]
    return np.sum(n)

def load_dataset(filenames, labeled=True, ordered=False, isTest=False):
    # `labeled` is currently unused: both parsers return (image, label)
    ignore_order = tf.data.Options()
    
    if not ordered:
        # Let records arrive in any order for higher read throughput
        ignore_order.experimental_deterministic = False

    dataset = tf.data.TFRecordDataset(filenames, num_parallel_reads=AUTO)
    dataset = dataset.with_options(ignore_order)
    
    # Decode records in parallel
    parse_fn = read_labeled_tfrecord_for_test if isTest else read_labeled_tfrecord
    dataset = dataset.map(parse_fn, num_parallel_calls=AUTO)
    
    return dataset

def get_training_dataset(filenames):
    dataset = load_dataset(filenames, labeled=True, isTest = False)
    dataset = dataset.repeat()
    dataset = dataset.shuffle(2048)
    dataset = dataset.batch(BATCH_SIZE)
    dataset = dataset.prefetch(AUTO)
    return dataset

def get_valid_dataset(filenames):
    dataset = load_dataset(filenames, labeled=True, isTest = True)
    dataset = dataset.batch(BATCH_SIZE)
    dataset = dataset.prefetch(AUTO)
    return dataset

def get_test_dataset(filenames):
    dataset = load_dataset(filenames, labeled=True, isTest = True, ordered=True)
    dataset = dataset.batch(BATCH_SIZE)
    dataset = dataset.prefetch(AUTO)
    return dataset
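`read_labeled_tfrecord` expects each record to hold a JPEG-encoded image, a string `image_idx` and an `int64` label. To see that schema end to end without the competition data, the sketch below writes one synthetic record to a temporary file and parses it back the same way the notebook does. The random image, the `image_idx` value, the label and the file name are all made up for illustration, and `make_example` and `parse` are hypothetical helpers, assuming TensorFlow 2.x.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

IMAGE_SIZE = [256, 256]

# Same feature spec as read_labeled_tfrecord
FEATURES = {
    "image": tf.io.FixedLenFeature([], tf.string),
    "image_idx": tf.io.FixedLenFeature([], tf.string),
    "label": tf.io.FixedLenFeature([], tf.int64),
}

def make_example(image, image_idx, label):
    """Serialize one (uint8 image, id string, int label) triple as a tf.train.Example."""
    jpeg = tf.io.encode_jpeg(image).numpy()
    feature = {
        "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[jpeg])),
        "image_idx": tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_idx.encode()])),
        "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature)).SerializeToString()

def parse(serialized):
    """Parse and decode one record, mirroring read_labeled_tfrecord + decode_image."""
    ex = tf.io.parse_single_example(serialized, FEATURES)
    image = tf.image.decode_jpeg(ex["image"], channels=3)
    image = tf.reshape(tf.cast(image, tf.float32), [*IMAGE_SIZE, 3])
    return image, ex["label"]

# Hypothetical shard name in a throwaway directory
path = os.path.join(tempfile.mkdtemp(), "train000-1.tfrec")
rng = np.random.default_rng(0)
with tf.io.TFRecordWriter(path) as writer:
    fake_image = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)
    writer.write(make_example(fake_image, "12345", 42))

ds = tf.data.TFRecordDataset([path]).map(parse, num_parallel_calls=tf.data.AUTOTUNE)
image, label = next(iter(ds))
print(image.shape, label.numpy())
```

JPEG is lossy, so the decoded pixels will not exactly equal `fake_image`; only the shape, dtype and label are guaranteed to round-trip.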

# The Model 👷‍♀️

## Transfer Learning

The main aim of transfer learning (TL) is to build a model quickly: instead of training a deep neural network (DNN) from scratch, we start from a network that has already learned useful features on a different dataset (here, ImageNet) and transfer those features to the new task. This process is also known as **knowledge transfer**.
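The notebook fine-tunes the whole EfficientNetB3 backbone from the first epoch (see `build_model()` below). A common alternative recipe is to freeze the pretrained backbone and train only a new head first, then unfreeze it and fine-tune with a much smaller learning rate. A minimal sketch of that two-phase approach, assuming TensorFlow 2.x; `build_transfer_model` and `unfreeze_for_fine_tuning` are hypothetical helpers, not part of this notebook:

```python
import tensorflow as tf

def build_transfer_model(n_classes, image_size=(256, 256), weights="imagenet"):
    """Phase 1: frozen pretrained EfficientNetB3 features + a new trainable head."""
    base = tf.keras.applications.EfficientNetB3(
        include_top=False, weights=weights,
        input_shape=(*image_size, 3), pooling="avg")
    base.trainable = False  # reuse the pretrained features as-is

    inputs = tf.keras.Input(shape=(*image_size, 3))
    # training=False keeps the backbone's BatchNorm layers in inference mode
    x = base(inputs, training=False)
    outputs = tf.keras.layers.Dense(n_classes, activation="softmax")(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
                  loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    return model, base

def unfreeze_for_fine_tuning(model, base, lr=1e-5):
    """Phase 2: unfreeze the backbone and recompile with a much smaller learning rate."""
    base.trainable = True
    # Recompiling is required for the change in `trainable` to take effect
    model.compile(optimizer=tf.keras.optimizers.Adam(lr),
                  loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    return model
```

In phase 1 only the new `Dense` kernel and bias are trainable, which makes the first epochs cheap and protects the pretrained features from large, noisy gradients coming from a randomly initialized head.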

---

## EfficientNetB3

![](https://github.com/SauravMaheshkar/X-Ray-Image-Classification/blob/main/assets/effnet.png?raw=true)

> Excerpt from Google AI Blog

**Convolutional neural networks (CNNs)** are commonly developed at a fixed resource cost, and then scaled up in order to achieve better accuracy when more resources are made available. For example, ResNet can be scaled up from ResNet-18 to ResNet-200 by increasing the number of layers. The conventional practice for model scaling is to arbitrarily increase the CNN depth or width, or to use larger input image resolution for training and evaluation. While these methods do improve accuracy, they usually require tedious manual tuning, and still often yield suboptimal performance. Instead, the authors of [**"EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks (ICML 2019)"**](https://arxiv.org/abs/1905.11946) found a more principled method to scale up a CNN to obtain better accuracy and efficiency.

They proposed a novel model scaling method that uses a simple yet highly effective **compound coefficient** to scale up CNNs in a more structured manner. Unlike conventional approaches that arbitrarily scale network dimensions such as width, depth and resolution, their method uniformly scales each dimension with a fixed set of scaling coefficients. The resulting models, named **EfficientNets**, surpassed state-of-the-art accuracy with up to **10x** better efficiency (**smaller and faster**).
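In the paper, the compound coefficient φ sets depth = α^φ, width = β^φ and resolution = γ^φ, under the constraint α·β²·γ² ≈ 2 (with α, β, γ ≥ 1), so total FLOPS grow by roughly 2^φ. The sketch below assumes the paper's grid-search values for the B0 baseline (α = 1.2, β = 1.1, γ = 1.15) and B0's 224-px input; `compound_scale` is a hypothetical helper. The released B1 to B7 models use rounded, hand-picked coefficients (Keras builds EfficientNetB3 with a width coefficient of 1.2, a depth coefficient of 1.4 and a default 300-px input), so the printed numbers are illustrative rather than the exact B3 configuration.

```python
# Compound scaling from the EfficientNet paper: depth = alpha**phi,
# width = beta**phi, resolution = gamma**phi, with alpha * beta**2 * gamma**2 ~= 2
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # grid-searched on the B0 baseline in the paper
BASE_RESOLUTION = 224                # B0 input size

def compound_scale(phi):
    """Return (depth multiplier, width multiplier, input resolution, FLOPS multiplier)."""
    depth_mult = ALPHA ** phi
    width_mult = BETA ** phi
    resolution = round(BASE_RESOLUTION * GAMMA ** phi)
    # Convolution FLOPS grow linearly with depth and quadratically with width and resolution
    flops_mult = (ALPHA * BETA ** 2 * GAMMA ** 2) ** phi
    return depth_mult, width_mult, resolution, flops_mult

for phi in range(4):
    d, w, r, f = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, {r}px input, ~{f:.2f}x FLOPS")
```

Each unit increase of φ roughly doubles the compute budget, and that budget is split across all three dimensions instead of being spent on depth or width alone.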

In this project we'll use **`EfficientNetB3`** to train our classifier. The model can easily be instantiated with the **`tf.keras.applications`** module, which provides canned architectures with pre-trained weights. For more details, see the [`tf.keras.applications` documentation](https://www.tensorflow.org/api_docs/python/tf/keras/applications). The cell below defines the `build_model()` function.

In [5]:
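# Random flips applied inside the model, so augmentation runs on the accelerator;
# Keras preprocessing layers only transform inputs when training=True
tpu_data_augmentation = tf.keras.Sequential([
  tf.keras.layers.experimental.preprocessing.RandomFlip("horizontal_and_vertical", seed=12345),
])

def build_model(dim = IMG_SIZES, ef = 0):
    # `dim` and `ef` are unused: the input size comes from IMAGE_SIZE
    # and the backbone is fixed to EfficientNetB3
    inp = tf.keras.layers.Input(shape=(*IMAGE_SIZE, 3))
    
    # ImageNet-pretrained backbone without its top classifier; pooling='avg' turns
    # the final feature map into a 1536-d vector. Keras EfficientNets rescale
    # internally, so they expect raw 0-255 pixel values (as produced by decode_image)
    base = tf.keras.applications.EfficientNetB3(include_top=False, weights='imagenet', 
                          input_shape=(*IMAGE_SIZE, 3), pooling='avg')
    
    x = tpu_data_augmentation(inp)
    x = base(x)
    
    # New classification head: BN -> Dense(512) -> BN -> ReLU -> softmax over all classes
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.Dense(512)(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.Activation('relu')(x)
    x = tf.keras.layers.Dense(N_CLASSES, activation='softmax')(x)
    
    model = tf.keras.Model(inputs = inp,outputs = x)
    
    opt = tf.keras.optimizers.Adam(learning_rate = 0.001)
    
    # Integer labels, so sparse categorical cross-entropy on the softmax output
    fn_loss = tf.keras.losses.SparseCategoricalCrossentropy() 

    model.compile(optimizer = opt, loss = [fn_loss], metrics=['accuracy'])
    
    return model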

In [6]:
display_model = build_model(dim=IMG_SIZES)

display_model.summary()

Downloading data from https://storage.googleapis.com/keras-applications/efficientnetb3_notop.h5
Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None, 256, 256, 3)]     0         
_________________________________________________________________
sequential_1 (Sequential)    (None, 256, 256, 3)       0         
_________________________________________________________________
efficientnetb3 (Functional)  (None, 1536)              10783535  
_________________________________________________________________
batch_normalization (BatchNo (None, 1536)              6144      
_________________________________________________________________
dense (Dense)                (None, 512)               786944    
_________________________________________________________________
batch_normalization_1 (Batch (None, 512)               2048      
_______________________________
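The summary above is cut off before the final layer. The head's parameter counts are easy to work out by hand: a `Dense` layer has `n_in * n_out` weights plus `n_out` biases, and a `BatchNormalization` layer stores four values per feature (gamma, beta, moving mean, moving variance). The sketch below uses the 1536-d pooled feature size from the summary and `N_CLASSES = 64500`; the dictionary labels are descriptive names chosen here, not the Keras layer names.

```python
def dense_params(n_in, n_out):
    return n_in * n_out + n_out  # weight matrix + bias vector

def batchnorm_params(n_features):
    return 4 * n_features  # gamma, beta, moving mean, moving variance

FEATURES = 1536    # EfficientNetB3 pooled feature size (from the summary)
HIDDEN = 512
N_CLASSES = 64500

head = {
    "BatchNorm (1536)": batchnorm_params(FEATURES),
    "Dense (512)": dense_params(FEATURES, HIDDEN),
    "BatchNorm (512)": batchnorm_params(HIDDEN),
    "final Dense (softmax)": dense_params(HIDDEN, N_CLASSES),
}
for name, n in head.items():
    print(f"{name:24s}{n:>12,}")
print(f"{'head total':24s}{sum(head.values()):>12,}")
```

The first three counts agree with the printed summary (6,144, 786,944 and 2,048). The final softmax layer alone has about 33.1M parameters, roughly three times the 10.8M in the EfficientNetB3 backbone, so with 64,500 classes most of the model's size sits in the classifier.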

<a id = 'train'></a>
# Training 💪🏻

## Learning Rate Scheduler

> From a [TowardsDataScience article](https://towardsdatascience.com/learning-rate-scheduler-d8a55747dd90)

In training deep networks, it is helpful to reduce the learning rate as the number of training epochs increases. This is **based on the intuition** that with a high learning rate, the deep learning model possesses high kinetic energy. As a result, its parameter vector bounces around chaotically and is unable to settle down into deeper and narrower parts of the loss function (local minima). If, on the other hand, the learning rate is very small, the system has low kinetic energy, so it settles down into shallow and narrower parts of the loss function (false minima).

<center> <img src = "https://miro.medium.com/max/668/1*iYWyu8hemMyaBlK6V-2vqg.png"> </center>

The figure above shows that a high learning rate leads to random back-and-forth movement of the parameter vector around local minima, while a small learning rate results in getting stuck in false minima. Knowing when to decay the learning rate can therefore be difficult.

Decreasing the learning rate during training can lead to improved accuracy and (most perplexingly) reduced overfitting of the model. A piecewise decrease of the learning rate whenever progress has plateaued is effective in practice. Essentially this ensures that we converge efficiently to a suitable solution and only then reduce the inherent variance of the parameters by reducing the learning rate.
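A minimal sketch of the "piecewise decrease whenever progress has plateaued" idea, assuming per-epoch validation losses are available. `plateau_schedule` is a hypothetical helper, similar in spirit to Keras's built-in `tf.keras.callbacks.ReduceLROnPlateau` (which additionally supports a cooldown period and a minimum-improvement threshold). It is not what this notebook uses; the notebook's schedule is fixed in advance.

```python
def plateau_schedule(val_losses, lr0=1e-3, factor=0.5, patience=2, min_lr=1e-5):
    """Return the learning rate used in each epoch.

    The rate is multiplied by `factor` once the validation loss has failed to
    improve for `patience` consecutive epochs, and never drops below `min_lr`.
    """
    lrs = []
    lr = lr0
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for loss in val_losses:
        lrs.append(lr)  # rate in effect while this epoch was trained
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                lr = max(lr * factor, min_lr)  # piecewise drop
                wait = 0
    return lrs

# Hypothetical validation curve that stalls twice
print(plateau_schedule([1.0, 0.8, 0.8, 0.8, 0.7, 0.7, 0.7]))
# [0.001, 0.001, 0.001, 0.001, 0.0005, 0.0005, 0.0005]
```

The rate stays high while the loss keeps improving and is only cut after a stall, which matches the intuition of converging first and reducing parameter variance afterwards.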

Here, we use a custom schedule that warms the learning rate up linearly, holds it at its peak, and then decays it exponentially, applied through a **`LearningRateScheduler`** callback. The cell below defines this callback.

In [7]:
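def get_lr_callback(batch_size=8):
    # `batch_size` is currently unused; the schedule depends only on the epoch
    lr_start   = 0.0002
    lr_max     = 0.0002 * 10   # peak rate after warm-up
    lr_min     = lr_start/2    # floor approached during decay
    lr_ramp_ep = 6             # epochs of linear warm-up
    lr_sus_ep  = 10            # epochs held at lr_max
    lr_decay   = 0.8           # per-epoch exponential decay factor
   
    def lrfn(epoch):
        # Keras passes a 0-indexed epoch
        if epoch < lr_ramp_ep:
            # Linear warm-up from lr_start towards lr_max
            lr = (lr_max - lr_start) / lr_ramp_ep * epoch + lr_start
            
        elif epoch < lr_ramp_ep + lr_sus_ep:
            # Sustain at the peak
            lr = lr_max
            
        else:
            # Exponential decay towards lr_min
            lr = (lr_max - lr_min) * lr_decay**(epoch - lr_ramp_ep - lr_sus_ep) + lr_min
            
        return lr

    lr_callback = tf.keras.callbacks.LearningRateScheduler(lrfn, verbose=True)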
    
    return lr_callback
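The schedule can be checked without a TPU by evaluating the same function in plain Python. The sketch below copies the constants from `get_lr_callback`; the upper-case module-level names are introduced here for illustration. Keras passes a 0-indexed `epoch` to the schedule, so the log line "Epoch 00001" corresponds to `lrfn(0)`.

```python
# Constants copied from get_lr_callback
LR_START = 0.0002
LR_MAX = 0.0002 * 10
LR_MIN = LR_START / 2
LR_RAMP_EP = 6
LR_SUS_EP = 10
LR_DECAY = 0.8

def lrfn(epoch):
    """Warm-up / sustain / exponential-decay schedule for a 0-indexed epoch."""
    if epoch < LR_RAMP_EP:
        return (LR_MAX - LR_START) / LR_RAMP_EP * epoch + LR_START
    if epoch < LR_RAMP_EP + LR_SUS_EP:
        return LR_MAX
    return (LR_MAX - LR_MIN) * LR_DECAY ** (epoch - LR_RAMP_EP - LR_SUS_EP) + LR_MIN

for epoch in range(40):
    print(f"Epoch {epoch + 1:2d}: lr = {lrfn(epoch):.6f}")
```

With these constants the rate climbs by 0.0003 per epoch over the first six epochs (0.0002 to 0.0017), holds at 0.002 for ten epochs, and then decays geometrically toward `lr_min = 0.0001`. By the 40th epoch it is about 0.00011, which matches the `lr` value in the W&B run summary at the end of this notebook.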

In [8]:
all_files = tf.io.gfile.glob(GCS_PATH + '/train*.tfrec')

num_total_files = len(all_files)

n_images = count_data_items(all_files)

print('Total number of files for train-validation:', num_total_files)
print('Total number of images for train-validation:', n_images)

Total number of files for train-validation: 226
Total number of images for train-validation: 2257759
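`count_data_items` never opens a file: it parses the image count out of each shard's name. It therefore assumes names such as `train000-10000.tfrec`, with the count between a `-` and the following `.`. A standalone sketch with hypothetical file names (the bucket path is a placeholder):

```python
import re

# Same pattern as the notebook's count_data_items: the digits between a "-"
# and the following ".", e.g. "train000-10000.tfrec" -> 10000
COUNT_RE = re.compile(r"-([0-9]*)\.")

def count_data_items(filenames):
    """Sum the per-shard image counts encoded in the file names."""
    return sum(int(COUNT_RE.search(name).group(1)) for name in filenames)

# Hypothetical shard names following the "train<index>-<count>.tfrec" pattern
files = [
    "gs://example_bucket/train000-10000.tfrec",
    "gs://example_bucket/train001-9500.tfrec",
]
print(count_data_items(files))  # 19500
```

Because the count comes from the name alone, a renamed or truncated shard would silently produce a wrong total, and with it a wrong `steps_per_epoch`.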


In [9]:
def train_one_fold(fold, files_train, files_valid):
    VERBOSE = 1
    tStart = time.time()
    
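    # Re-initialize the TPU to free memory held by a previous fold
    if DEVICE=='TPU':
        tf.tpu.experimental.initialize_tpu_system(tpu)
    
    # Build the model inside the distribution strategy scope
    K.clear_session()
    with strategy.scope():
        print('Building model...')
        model = build_model(dim=IMG_SIZES)
    
    # Callback that keeps the weights with the lowest validation loss
    sv = tf.keras.callbacks.ModelCheckpoint('fold-%i.h5'%fold, monitor='val_loss', verbose=1, save_best_only=True,
                                            save_weights_only=True, mode='min', save_freq='epoch')
    
    # BATCH_SIZE is already the global batch (BATCH_SIZE_SINGLE * REPLICAS), so the extra
    # `// REPLICAS` makes each Keras "epoch" cover roughly 1/REPLICAS of the training images
    # and evaluate only about 1/REPLICAS of the validation images.
    # Use count_data_items(files) // BATCH_SIZE instead for full passes.
    steps_train = int(count_data_items(files_train) / BATCH_SIZE // REPLICAS)
    steps_valid = int(count_data_items(files_valid) / BATCH_SIZE // REPLICAS)
    
    # Train for One Fold
    history = model.fit(get_training_dataset(files_train), 
                        epochs=EPOCHS, 
                        callbacks = [sv, get_lr_callback(BATCH_SIZE), WandbCallback()], 
                        steps_per_epoch = steps_train,
                        validation_data = get_valid_dataset(files_valid), 
                        validation_steps = steps_valid,
                        verbose=VERBOSE)
    
    # Save the final (last-epoch) model; the best weights are in fold-<n>.h5
    model.save('b3-aug.h5')
    
    # Log the saved model as a W&B artifact
    artifact =  wandb.Artifact(name="b3-aug", type="weights")
    artifact.add_file('b3-aug.h5')
    wandb.log_artifact(artifact)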
    
    # Record the Time Spent
    tElapsed = round(time.time() - tStart, 1)
    
    print(' ')
    print('Time (sec) elapsed: ', tElapsed)
    print('...')
    print('...')
    
    return history
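The cross-validation loop in the next cell splits *files*, not images: `KFold` is applied to the shard indices, and each fold's shards are then selected with a zero-padded glob pattern. A small sketch of that split in isolation, assuming scikit-learn and a placeholder bucket path (`gs://example_bucket` is hypothetical). With 226 files and 10 folds, the first 6 folds get 23 validation files and the remaining 4 get 22.

```python
import numpy as np
from sklearn.model_selection import KFold

NUM_FILES = 226                    # number of train*.tfrec shards reported above
FOLDS = 10
GCS_PATH = "gs://example_bucket"   # hypothetical placeholder for the Kaggle GCS path

skf = KFold(n_splits=FOLDS, shuffle=True, random_state=54321)
splits = list(skf.split(np.arange(NUM_FILES)))

idxT, idxV = splits[0]  # fold 1
# '%.3i' zero-pads the shard index to three digits, e.g. 7 -> 'train007*.tfrec'
train_patterns = [GCS_PATH + '/train%.3i*.tfrec' % x for x in idxT]
valid_patterns = [GCS_PATH + '/train%.3i*.tfrec' % x for x in idxV]

print(len(train_patterns), "training shards,", len(valid_patterns), "validation shards")
for fold, (_, v) in enumerate(splits, start=1):
    print(f"fold {fold}: {len(v)} validation shards")
```

Splitting at file level keeps the data pipeline simple (each fold is just a list of files) at the cost of folds whose image counts differ slightly when shards are not all the same size.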

In [10]:
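SHOW_FILES = True

# Train folds 0..STOP_FOLDS only (0 means just the first fold)
STOP_FOLDS = 0

# KFold over file indices: whole TFRecord files go to either training or validation
skf = KFold(n_splits = FOLDS, shuffle = True, random_state=54321)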

histories = []

for fold,(idxT,idxV) in enumerate(skf.split(np.arange(num_total_files))):
    print('')
    print('#'*60) 
    print('#### FOLD', fold+1)
    print('#### Epochs: %i' %(EPOCHS))
    print('#'*60)
    
    train_files = tf.io.gfile.glob([GCS_PATH + '/train%.3i*.tfrec'%x for x in idxT])
    valid_files = tf.io.gfile.glob([GCS_PATH + '/train%.3i*.tfrec'%x for x in idxV])
    
    if SHOW_FILES:
        print('Number of training images', count_data_items(train_files))
        print('Number of validation images', count_data_items(valid_files))
        
    run = wandb.init(project='Herbarium 2021', entity='sauravmaheshkar', reinit=True)
    
    history = train_one_fold(fold+1, train_files, valid_files)
    
    run.finish()
    
    histories.append(history)

    if fold >= STOP_FOLDS:
        break


############################################################
#### FOLD 1
#### Epochs: 40
############################################################


[34m[1mwandb[0m: Currently logged in as: [33msauravmaheshkar[0m (use `wandb login --relogin` to force relogin)


Number of training images 2027759
Number of validation images 230000


Building model...
Epoch 1/40

Epoch 00001: LearningRateScheduler reducing learning rate to 0.0002.

Epoch 00001: val_loss improved from inf to 7.64593, saving model to fold-1.h5
Epoch 2/40

Epoch 00002: LearningRateScheduler reducing learning rate to 0.0005.

Epoch 00002: val_loss improved from 7.64593 to 4.84418, saving model to fold-1.h5
Epoch 3/40

Epoch 00003: LearningRateScheduler reducing learning rate to 0.0007999999999999999.

Epoch 00003: val_loss improved from 4.84418 to 3.85706, saving model to fold-1.h5
Epoch 4/40

Epoch 00004: LearningRateScheduler reducing learning rate to 0.0011.

Epoch 00004: val_loss improved from 3.85706 to 3.55707, saving model to fold-1.h5
Epoch 5/40

Epoch 00005: LearningRateScheduler reducing learning rate to 0.0014.

Epoch 00005: val_loss improved from 3.55707 to 2.86509, saving model to fold-1.h5
Epoch 6/40

Epoch 00006: LearningRateScheduler reducing learning rate to 0.0017.

Epoch 00006: val_loss improved from 2.86509 to 2.55408, saving model 


Run summary:

| Metric | Value |
|---|---|
| epoch | 39.0 |
| loss | 0.24459 |
| accuracy | 0.93037 |
| val_loss | 1.12466 |
| val_accuracy | 0.78669 |
| lr | 0.00011 |
| _runtime | 9920.0 |
| _timestamp | 1620581607.0 |
| _step | 39.0 |
| best_val_loss | 1.11209 |

0,1
epoch,▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
loss,█▆▄▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
accuracy,▁▂▃▄▄▅▅▅▅▆▆▆▆▆▆▆▆▇▇▇▇▇▇█████████████████
val_loss,█▅▄▄▃▃▂▂▂▂▃▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
val_accuracy,▁▃▄▄▅▅▅▆▅▅▅▆▆▆▆▆▆▇▆▇▇▇▇█████████████████
lr,▁▂▄▅▆▇███████████▇▅▅▄▃▃▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁
_runtime,▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▆▇▇▇▇▇▇███
_timestamp,▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▆▇▇▇▇▇▇███
_step,▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
