# ILSVRC2012 Animal Subset Hierarchical Classification (Tensorflow 2.2.0)

    Author: Lukas Friedrichsen (friedrichsen.luk@googlemail.com)
    License: Apache License, Version 2.0

Description: In this notebook, different experiments are conducted on various image recognition scenarios to assess whether modularization is a viable technique for counteracting specific inherent shortcomings of neural networks. In the first experiment, the performance of a hierarchically composed network is evaluated and compared to a monolithic benchmark model, analogous to the first experiment on the CIFAR100 dataset. Instead of CIFAR100, however, the considerably more complex ILSVRC2012 animal subset serves as the reference dataset. The goal of this experiment is to assess the general applicability of the proposed approach, i.e. to evaluate whether the results on the CIFAR100 dataset can be qualitatively reproduced on a completely different dataset with a different reference network architecture. Furthermore, since the ImageNet animal subset comprises a sufficient number of hierarchical levels, a second experiment examines how modularization errors propagate through the composed network.

## Table of contents

1. [Imports](#imports)
2. [Configuration](#config)
3. [Loading the dataset](#load)
   1. [ILSVRC2012 dataset (complete)](#load_ilsvrc2012_complete)
   2. [ILSVRC2012 dataset (animal subset)](#load_ilsvrc2012_animal_subset)
4. [Mapping synset relationships](#synset_mapping)
5. [Processing and augmentation](#processing_augmentation)
6. [Model template (VGG)](#template)
7. [Composed Network (CompNet)](#compnet)
   1. [Model](#compnet_model)
   2. [Training](#compnet_train)
   3. [Testing](#compnet_test)
8. [Benchmark: VGG-19 (Simonyan et al., 2015)](#benchmark)
   1. [Model](#benchmark_model)
   2. [Training](#benchmark_train)
   3. [Testing](#benchmark_test)

---
## Imports
<a id ='imports'></a>

In [None]:
import os
import time

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
import matplotlib as mpl

# Matplotlib configuration

# General settings
mpl.rcParams['axes.grid'] = True
mpl.rcParams['grid.alpha'] = 0.5
mpl.rcParams['grid.linestyle'] = '--'
mpl.rcParams['legend.framealpha'] = 1.0
mpl.rcParams['savefig.bbox'] = 'tight'

# Font sizes
mpl.rcParams['axes.labelsize'] = 15
mpl.rcParams['axes.titlesize'] = 15
mpl.rcParams['figure.titlesize'] = 20
mpl.rcParams['legend.fontsize'] = 15
mpl.rcParams['xtick.labelsize'] = 15 
mpl.rcParams['ytick.labelsize'] = 15

In [None]:
import tensorflow as tf
import tensorflow_datasets as tfds
print('Tensorflow: v{}'.format(tf.__version__))
print('Tensorflow Datasets: v{}'.format(tfds.__version__))

In [None]:
# Configure the storage and lookup directory for the WordNet corpus
# import nltk
# nltk.data.path = [os.path.expanduser('~/.pyenv/versions/3.6.9/share/nltk_data/')]

# Download the WordNet corpus and extract it to the specified directory (make sure the directory
# exists prior to execution)
# nltk.download('wordnet', download_dir=nltk.data.path[0])

from nltk.corpus import wordnet as wn

In [None]:
# Set random seed for reproducibility
tf.random.set_seed(42)

---
## Configuration
<a id ='config'></a>

In [None]:
# Download and storage locations for Tensorflow Datasets
TFDS_DATA_DIR = '/Volumes/Data/tensorflow_datasets/'
TFDS_DOWNLOAD_DIR = '/Volumes/Data/tensorflow_datasets/downloads/'

In [None]:
# Storage locations for model checkpoints and training logs
CKPT_DIR = '.model_checkpoints/ilsvrc2012/'
LOG_DIR = 'logs/ilsvrc2012/'

In [None]:
# Storage location for data from experiments
RESULTS_DIR = 'results/'

In [None]:
# Dataset specific configuration

# Storage locations for data from experiments on the composed and benchmark networks
ILSVRC2012_RESULTS_DIR_COMPNET = RESULTS_DIR + 'ilsvrc2012/compnet/'
ILSVRC2012_RESULTS_DIR_BENCHMARK = RESULTS_DIR + 'ilsvrc2012/benchmark/'

# Storage location of the ILSVRC2012 animal subset
ILSVRC2012_ANIMAL_SUBSET_TRAIN_FILE = TFDS_DATA_DIR + 'imagenet2012/animal_subset/train.tfrecord'
ILSVRC2012_ANIMAL_SUBSET_VAL_FILE = TFDS_DATA_DIR + 'imagenet2012/animal_subset/val.tfrecord'
ILSVRC2012_ANIMAL_SUBSET_TEST_FILE = TFDS_DATA_DIR + 'imagenet2012/animal_subset/test.tfrecord'

# File containing the WordNet synset IDs corresponding to the numeric labels of the ILSVRC2012 dataset
# Label 'i' in the dataset corresponds to the i-th entry of the file
ILSVRC2012_LABELS_TO_WNIDS_FILE = TFDS_DATA_DIR + 'imagenet2012/2.0.1/label.labels.txt'

# Keys of the dataset fields containing images and labels
ILSVRC2012_IMG_KEY = 'image'
ILSVRC2012_LABEL_KEY = 'label'

# Synset level of the different taxonomic label categories (indicating the hierarchical relation between
# the different categories)
ILSVRC2012_PHYLUM_LABEL_LEVEL = 0
ILSVRC2012_CLASS_LABEL_LEVEL = 1
ILSVRC2012_ORDER_LABEL_LEVEL = 2
ILSVRC2012_SPECIES_LABEL_LEVEL = 3

In [None]:
# Preprocessing configuration

# Quality of the JPEG compression (from 0 to 100, with higher numbers indicating better quality at the
# cost of higher processing time and a larger storage footprint)
JPEG_COMP_QUALITY = 80

# Desired size (in pixels) of the smallest side of the images
MIN_IMG_SIZE = 256

# Cropping dimensions
CROP_SIZE_H = 224
CROP_SIZE_W = 224

# RGB channel value range
RGB_MIN_VAL = 0
RGB_MAX_VAL = 255
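
The resize and crop settings above imply a simple geometry: each image is first scaled so that its smaller side measures `MIN_IMG_SIZE` pixels, then a `CROP_SIZE_H` x `CROP_SIZE_W` window is cut out. A minimal pure-Python sketch of that arithmetic follows; the helper names are hypothetical, and the rounding used by the notebook's actual `aspect_preserving_resize` (defined in 'Processing and augmentation') may differ.

```python
def aspect_preserving_shape(height, width, min_size):
    """Return (new_height, new_width) so that the smaller side equals `min_size`.

    Hypothetical helper: both sides are scaled by the same factor to preserve the aspect ratio.
    """
    scale = min_size / min(height, width)
    return round(height * scale), round(width * scale)


def center_crop_offsets(height, width, crop_h, crop_w):
    """Return the (top, left) offsets of a centered crop of size (crop_h, crop_w)."""
    if crop_h > height or crop_w > width:
        raise ValueError('Crop size exceeds image size')
    return (height - crop_h) // 2, (width - crop_w) // 2


# Example: a 375 x 500 (height x width) image with the configuration values above
new_h, new_w = aspect_preserving_shape(375, 500, min_size=256)
print(new_h, new_w)                                 # 256 341
print(center_crop_offsets(new_h, new_w, 224, 224))  # (16, 58)
```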

---
## Loading the dataset
<a id='load'></a>

### ILSVRC2012 dataset (complete)
<a id='load_ilsvrc2012_complete'></a>

In [None]:
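# Since the labels of the official ILSVRC2012 test set are not publicly available, 80% of the training
# split is used for training, the remaining 20% for validation and the official validation split for
# testing (using the TFDS slicing API instead of the deprecated `subsplit` API)
[ilsvrc2012_train_raw, ilsvrc2012_val_raw, ilsvrc2012_test_raw], ilsvrc2012_info = tfds.load(
    'imagenet2012',
    split=['train[:80%]', 'train[80%:]', 'validation'],
    data_dir=TFDS_DATA_DIR,
    download_and_prepare_kwargs={'download_dir': TFDS_DOWNLOAD_DIR},
    with_info=True)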
print(ilsvrc2012_info)

### ILSVRC2012 dataset (animal subset)
<a id='load_ilsvrc2012_animal_subset'></a>

Extracting the animal subset from the (raw) ILSVRC2012 dataset

In [None]:
def ilsvrc2012_serialize_record(image, label):
    '''Serializes a given record in the form (`image`, `label`) utilizing `tf.train.Example`
    
    Args:
        image: 3-D `uint8` image `Tensor`
        label: String `Tensor` containing the label associated with `image`
    
    Returns:
        serialized_record (bytes): The serialized `tf.train.Example` proto
    '''
    
    # JPEG-encode the image to reduce its storage footprint
    image = tf.io.encode_jpeg(image, quality=JPEG_COMP_QUALITY)

    # Create a dictionary mapping the feature names to the respective tf.train.Example-compatible
    # data type
    record = {
        ILSVRC2012_IMG_KEY: tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[image.numpy()])),
        ILSVRC2012_LABEL_KEY: tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[label.numpy()]))
    }
    
    return tf.train.Example(features=tf.train.Features(feature=record)).SerializeToString()

# TensorFlow wrapper that allows this function to be applied to symbolic tensors, e.g. in
# `tf.data.Dataset.map` functions
def tf_ilsvrc2012_serialize_record(image, label):
    return tf.py_function(ilsvrc2012_serialize_record, inp=(image, label), Tout=tf.dtypes.string)

In [None]:
# Extract and prepare the animal subset from the (raw) ILSVRC2012 dataset
# Annotation: This cell only needs to be run once; afterwards, the animal subset can be loaded from
# storage as demonstrated in the subsequent cell. Since the extraction and preparation use functions
# from the 'Mapping synset relationships' and 'Processing and augmentation' sections, those sections
# have to be run before executing this cell. We nevertheless place this cell here, as this is where it
# belongs contextually.

datasets = [ilsvrc2012_train_raw, ilsvrc2012_val_raw, ilsvrc2012_test_raw]
file_paths = [ILSVRC2012_ANIMAL_SUBSET_TRAIN_FILE,
              ILSVRC2012_ANIMAL_SUBSET_VAL_FILE,
              ILSVRC2012_ANIMAL_SUBSET_TEST_FILE]

# List of all synsets to be included in the ILSVRC2012 animal subset
animal_synsets = ILSVRC2012_SYNSET_MAP.get_all_synsets_of_level(ILSVRC2012_SPECIES_LABEL_LEVEL)

for dataset, file_path in zip(datasets, file_paths):
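    # Manually filter out the non-animal concepts `cardigan` (label 474, the sweater) and `crane`
    # (label 517, the machine), whose decoded names collide with those of animal classes
    dataset = dataset.filter(
        lambda record: tf.math.logical_not(tf.math.reduce_any(
            [tf.math.equal(record[ILSVRC2012_LABEL_KEY], label) for label in [474, 517]])))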
    
    # Decode numeric labels
    dataset = dataset.map(
        lambda record: {ILSVRC2012_IMG_KEY: record[ILSVRC2012_IMG_KEY],
                        ILSVRC2012_LABEL_KEY: tf_ilsvrc2012_decode_native(record[ILSVRC2012_LABEL_KEY], use_wnid=False)},
        num_parallel_calls=tf.data.experimental.AUTOTUNE)
    
    # Filter out all entries that do not belong to the animal subset (as defined in `ILSVRC2012_SYNSET_MAP`)
    dataset = dataset.filter(
        lambda record: tf.math.reduce_any(
            [tf.math.equal(record[ILSVRC2012_LABEL_KEY], synset) for synset in animal_synsets]))
    
    # Resize images so that the smallest side is `MIN_IMG_SIZE` pixels afterwards
    dataset = dataset.map(
        lambda record: {ILSVRC2012_IMG_KEY: aspect_preserving_resize(record[ILSVRC2012_IMG_KEY], MIN_IMG_SIZE),
                        ILSVRC2012_LABEL_KEY: record[ILSVRC2012_LABEL_KEY]},
        num_parallel_calls=tf.data.experimental.AUTOTUNE)
    
    # Serialize the filtered and processed records
    dataset = dataset.map(
        lambda record: tf_ilsvrc2012_serialize_record(record[ILSVRC2012_IMG_KEY], record[ILSVRC2012_LABEL_KEY]),
        num_parallel_calls=tf.data.experimental.AUTOTUNE)

    # Write the serialized records to their respective storage location
    writer = tf.data.experimental.TFRecordWriter(file_path, compression_type='GZIP')
    writer.write(dataset)
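
The label filter in the extraction cell is needed because the labels are decoded to bare synset names (`use_wnid=False`), and ILSVRC2012 contains non-animal classes whose names coincide with animal ones. The following pure-Python sketch detects such collisions. The label-to-name excerpt is assumed for illustration: 474 and 517 come from the cell above, while 134 and 264 are assumed to be the animal classes 'crane' (the bird) and 'Cardigan' (the Welsh corgi).

```python
from collections import defaultdict

# Assumed excerpt of native label -> synset name, with the '.n.XX' suffix already stripped
label_names = {0: 'tench', 134: 'crane', 264: 'cardigan', 474: 'cardigan', 517: 'crane'}


def ambiguous_names(label_names):
    """Return every name shared by more than one native label, with the labels in ascending order."""
    labels_by_name = defaultdict(list)
    for label, name in sorted(label_names.items()):
        labels_by_name[name].append(label)
    return {name: labels for name, labels in labels_by_name.items() if len(labels) > 1}


print(ambiguous_names(label_names))  # {'crane': [134, 517], 'cardigan': [264, 474]}

# Dropping the two non-animal labels, as the filter above does, removes the collisions
filtered = {label: name for label, name in label_names.items() if label not in (474, 517)}
print(ambiguous_names(filtered))     # {}
```

Once every remaining name is unique, string labels can be mapped back to numeric classes without ambiguity.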

Loading the ILSVRC2012 animal subset

In [None]:
# Load and parse the animal subset from the (raw) ILSVRC2012 dataset
# Annotation: From here on, the ILSVRC2012 animal subset is used instead of the raw dataset.

# Template of the structure of the individual records to apply when parsing the serialized dataset
record_template = {
    ILSVRC2012_IMG_KEY: tf.io.FixedLenFeature([], tf.dtypes.string, default_value=''),
    ILSVRC2012_LABEL_KEY: tf.io.FixedLenFeature([], tf.dtypes.string, default_value='')
}

datasets = []
file_paths = [ILSVRC2012_ANIMAL_SUBSET_TRAIN_FILE,
              ILSVRC2012_ANIMAL_SUBSET_VAL_FILE,
              ILSVRC2012_ANIMAL_SUBSET_TEST_FILE]

for file_path in file_paths:
    dataset = tf.data.TFRecordDataset(file_path, compression_type='GZIP')
    
    # Parse the serialized `tf.train.Example` protos
    dataset = dataset.map(
        lambda record: tf.io.parse_single_example(record, record_template),
        num_parallel_calls=tf.data.experimental.AUTOTUNE)
    
    # JPEG-decode the images (enforcing three RGB channels, which also fixes the static channel dimension)
    dataset = dataset.map(
        lambda record: {ILSVRC2012_IMG_KEY: tf.io.decode_jpeg(record[ILSVRC2012_IMG_KEY], channels=3),
                        ILSVRC2012_LABEL_KEY: record[ILSVRC2012_LABEL_KEY]},
        num_parallel_calls=tf.data.experimental.AUTOTUNE)
    
    datasets.append(dataset)
    
[ilsvrc2012_train_raw, ilsvrc2012_val_raw, ilsvrc2012_test_raw] = datasets

In [None]:
# Plot a sample image from each split to make sure loading worked correctly

fig, ax = plt.subplots(1, 3, figsize=(20, 5))
titles = ['Train', 'Validation', 'Test']

for idx, dataset in enumerate([ilsvrc2012_train_raw, ilsvrc2012_val_raw, ilsvrc2012_test_raw]):
    for record in dataset.take(1):
        ax[idx].imshow(record[ILSVRC2012_IMG_KEY])
        ax[idx].set_title(titles[idx])
        ax[idx].axis('off')
    
fig.show()

In [None]:
# Plot the label distributions of the training, validation and test datasets to verify that the former
# is representative of the latter two (requires `ILSVRC2012_SYNSET_MAP` to be initialized prior to
# execution, cf. below)

fig, ax = plt.subplots(1, 3, figsize=(20, 5))
titles = ['Train', 'Validation', 'Test']

for idx, dataset in enumerate([ilsvrc2012_train_raw, ilsvrc2012_val_raw, ilsvrc2012_test_raw]):
    # Number of distinct labels (called 'depth' in `tf.one_hot`)
    label_depth = ILSVRC2012_NUM_LABELS_SPECIES_LAYER
    
    # Initialize the label distribution
    dist = [0] * label_depth
    
    # Get the label distribution for the current dataset
    for record in dataset:
        dist[ilsvrc2012_encode(decode_string(record[ILSVRC2012_LABEL_KEY]))[0]] += 1
        
    # Normalize the distribution (computing the total once instead of once per entry)
    total = sum(dist)
    dist = [count / total for count in dist]
                
    # Plot the distribution
    ax[idx].bar(range(0, len(dist)), dist, width=1)
    ax[idx].set_title(titles[idx])
    ax[idx].set_ylim([0, 0.005])
    ax[idx].set_xlabel('Labels')
    ax[idx].set_ylabel('Share')
    
fig.show()
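
The cell above compares the splits visually. A single-pass variant of the normalization, plus a scalar summary of how far two splits' distributions differ, could look like the pure-Python sketch below. The total variation distance is a swapped-in metric that is not used elsewhere in this notebook, and the helper names are hypothetical.

```python
from collections import Counter


def label_distribution(labels, num_labels):
    """Share of each label in range(num_labels), computed in a single pass over `labels`."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [counts[label] / total for label in range(num_labels)]


def total_variation_distance(p, q):
    """Half the L1 distance between two distributions (0.0 = identical, 1.0 = disjoint)."""
    return 0.5 * sum(abs(p_i - q_i) for p_i, q_i in zip(p, q))


train = label_distribution([0, 1, 1, 3], num_labels=4)
test = label_distribution([0, 1, 2, 3], num_labels=4)
print(train)                                  # [0.25, 0.5, 0.0, 0.25]
print(total_variation_distance(train, test))  # 0.25
```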

---
## Mapping synset relationships
<a id='synset_mapping'></a>

In this section, we create a mapping between hypernyms and hyponyms (i.e. coarse and fine labels), analogous to the WordNet corpus underlying ImageNet. This mapping models the relations between the different hierarchy levels of labels and serves as the basis for the structure of the composed network.
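
To make the representation concrete: the mapping is a nested dict whose keys are synsets and whose values hold their hyponyms, spanning the four levels phylum, class, order and species (cf. the `ILSVRC2012_*_LABEL_LEVEL` constants). The sketch below uses a hypothetical toy taxonomy, not the actual `ILSVRC2012_SYNSET_MAP`, and derives every synset's hypernym path in a single depth-first pass. The `synset_map` class below stores the same information in `_hypernym_paths`, although it builds it differently.

```python
def hypernym_paths(tree):
    """Map every synset of a nested-dict taxonomy to its path from the root layer down to itself."""
    paths = {}
    # Stack of (subtree, path from the root layer to the subtree's parent)
    stack = [(tree, [])]
    while stack:
        subtree, parent_path = stack.pop()
        for synset, hyponyms in subtree.items():
            path = parent_path + [synset]
            paths[synset] = path
            stack.append((hyponyms, path))
    return paths


# Hypothetical toy taxonomy: phylum -> class -> order -> species
toy_map = {
    'chordate': {
        'mammal': {
            'carnivore': {'tabby': {}, 'beagle': {}},
            'primate': {'gorilla': {}},
        },
        'bird': {
            'passerine': {'goldfinch': {}},
        },
    },
}

paths = hypernym_paths(toy_map)
print(paths['beagle'])         # ['chordate', 'mammal', 'carnivore', 'beagle']
print(len(paths['bird']) - 1)  # 1, i.e. the class level (0 = root layer)
```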

In [None]:
# Annotation: We'll eventually move this class into a dedicated module. Hence, the necessary imports
# etc. are included here instead of at the top of the notebook, to keep all resources in one place.

from copy import deepcopy

class synset_map(object):
    '''Representational model for hierarchical syntactical structures
    
    This class implements a representational model for (injective) hierarchical syntactical structures.
    It provides the functionality to create a mapping between multilayered hyper- and hyponym
    compositions, to trace the inherent relations, and to measure the semantic distance between
    different synsets.
    
    Args:
        synset_map (optional): Representational model for a hierarchical syntactical structure (e.g.
            manually constructed; takes highest priority if provided together with `dataset` and
            `keys`)
        dataset (optional): Dataset-like structure that can be accessed via `keys` and that contains
            the hyper- and hyponyms whose relation is to be mapped (assuming an unambiguous, injective
            structure of synsets)
        keys (optional): List of keys that can be used to access the fields of `dataset` that contain
            the synset specifiers
    
    Attributes:
        synset_map (dict): Nested structure of dicts that serves to store the hierarchical relationships
            between the different synsets
    '''
    
    def __init__(self, synset_map=None, dataset=None, keys=None):
        if synset_map is not None:
            self.synset_map = synset_map
        elif (dataset is not None) and (keys is not None):
            self.synset_map_from_dataset(dataset, keys)
        else:
            self.synset_map = {}
    
    
    @property
    def synset_map(self):
        return deepcopy(self._synset_map)
    
    
    @synset_map.setter
    def synset_map(self, synset_map):
        elements = [synset_map]
        for element in elements:
            if not isinstance(element, dict):
                raise TypeError(
                    'All entries of `synset_map` have to be of type `dict`.\n'
                )

            if element:
                for value in element.values():
                    elements.append(value)
        
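        self._synset_map = synset_map
        
        # Build the lookup tables only for a non-empty map; the construction functions raise on an
        # empty map, which would otherwise break `synset_map()` called without arguments
        if synset_map:
            self.construct_hyponym_map()
            self.construct_hypernym_map()
        else:
            self._hyponyms = {}
            self._hypernym_paths = {}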
        
    
    def construct_hyponym_map(self):
        '''Constructs a dictionary containing the hyponyms for every synset in `synset_map`
        
        Constructs a dictionary containing the hyponyms for every synset in `synset_map`. This
        function is called exactly once every time `synset_map` is set, thus reducing the complexity
        of subsequent lookup operations to O(1). Has to be called manually if changes to an existing
        synset map are made.
        
        Raises:
            UserWarning: If `synset_map` has not been initialized at call time
        '''
                
        if not self._synset_map:
            raise UserWarning(
                'Please initialize `synset_map` before calling this function.\n'
            )
                    
        self._hyponyms = {}
        
        synsets = [self._synset_map]
        for synset in synsets:
            for hypernym in synset.keys():                
                self._hyponyms[hypernym] = list(synset[hypernym].keys())
                synsets.append(synset[hypernym])
            
        
    def construct_hypernym_map(self):
        '''Constructs a dictionary containing the complete hypernym path for every synset in `synset_map`
        
        Constructs a dictionary containing the complete hypernym path for every synset in `synset_map`.
        This function is called exactly once every time `synset_map` is set, thus reducing the
        complexity of subsequent lookup operations to O(1). Has to be called manually if changes to
        an existing synset map are made.
        
        Raises:
            UserWarning: If `synset_map` has not been initialized at call time
        '''
        
        if not self._synset_map:
            raise UserWarning(
                'Please initialize `synset_map` before calling this function.\n'
            )
        
        self._hypernym_paths = {}
        
        synsets = [self._synset_map]
        for synset in synsets:
            for hypernym, hyponyms in synset.items():
                hypernym_path = [hypernym]

                while True:
                    # Check if the current `hypernym` is part of the root layer (meaning it has no
                    # further hypernyms)
                    if hypernym in self._synset_map.keys():
                        break

                    elements = [self._synset_map]
                    for element in elements:
                        for key in element.keys():
                            if hypernym in element[key].keys():
                                hypernym = key
                                break
                            else:
                                elements.append(element[key])

                    hypernym_path.append(hypernym)

                self._hypernym_paths[hypernym_path[0]] = hypernym_path[::-1]
                
                synsets.append(hyponyms)
    
    
    def synset_map_from_dataset(self, dataset, keys):
        '''Creates a mapping between hyper- and hyponyms from a given dataset
        
        Args:
            dataset: Dataset-like structure that can be accessed via `keys` and that contains the
                hyper- and hyponyms whose relation is to be mapped (assuming an unambiguous, injective
                structure of synsets)
            keys: List of keys that can be used to access the fields of `dataset` that contain the
                synset specifiers
            
        Raises:
            TypeError: If one of the input arguments is of the wrong type
            ValueError: If invalid values are specified for one or more input arguments
        '''
        
        if not isinstance(dataset, list):
            raise TypeError(
                '`dataset` is expected to be of type `list`. Is of type {}.\n'.format(type(dataset))
            )
        if not dataset:
            raise ValueError(
                '`dataset` may not be empty.\n'
            )
        if not isinstance(keys, list):
            raise TypeError(
                '`keys` is expected to be of type `list`. Is of type {}.\n'.format(type(keys))
            )
        if not keys:
            raise ValueError(
                '`keys` may not be empty.\n'
            )
            
        # Determine the hierarchical relationship between the given keys (i.e. which fields in the
        # dataset contain the highest-level synsets, which ones contain the second highest and so on)
        
        # Copy `keys` to avoid reordering the caller's list in place
        ordered_keys = list(keys)
        len_ordered_keys = len(ordered_keys)
        if len_ordered_keys > 1:
            # Assuming an injective structure of the syntactic relationships, determine the hierarchical
            # relationship between two values of `keys` by iterating over the dataset until a record is
            # found in which the field referenced by one key differs from the first record while the
            # field referenced by the other key does not, indicating that the former is a hyponym of the
            # latter. The keys can then be sorted using e.g. Bubble Sort, as is done here.
            for i in range(len_ordered_keys):
                for j in range(0, len_ordered_keys - i - 1):
                    key_1 = ordered_keys[j]
                    key_2 = ordered_keys[j + 1]
                    
                    val_key_1 = dataset[0][key_1]
                    val_key_2 = dataset[0][key_2]
                        
                    for record in dataset:
                        if (val_key_1 != record[key_1]) and (val_key_2 == record[key_2]):
                            ordered_keys[j], ordered_keys[j + 1] = ordered_keys[j + 1], ordered_keys[j]
                            break
        
        # Construct the mapping between hierarchically related synsets from the dataset
        
        synset_map = {}
        for record in dataset:
            synset = synset_map
            
            for key in ordered_keys:
                # Descend one level, creating the node for `record[key]` if it does not exist yet
                synset = synset.setdefault(record[key], {})
        
        self.synset_map = synset_map
        
    
    def hyponyms(self, synset):
        '''Returns the hyponyms for a given synset
        
        Args:
            synset: Key / label of the synset whose hyponyms are to be returned
        
        Returns:
            hyponyms: List containing the hyponyms of the given synset (empty if `synset` is not in
                `synset_map` or if `synset` is a leaf node)
            
        Raises:
            UserWarning: If `synset_map` has not been initialized at call time
        '''
        
        if not self._hyponyms:
            raise UserWarning(
                'Please initialize `synset_map` or call `construct_hyponym_map` before calling this function.\n'
            )
            
        hyponyms = []
                
        if synset in self._hyponyms.keys():
            hyponyms = self._hyponyms[synset]

        return hyponyms
    
    
    def hypernym(self, synset):
        '''Returns the hypernym for a given synset
        
        Args:
            synset: Key / label of the synset whose hypernym is to be returned
        
        Returns:
            hypernym: Hypernym of the given synset (`None` if `synset` is not in `synset_map` or if
                `synset` is part of the root layer)
            
        Raises:
            UserWarning: If `synset_map` has not been initialized at call time
        '''
        
        if not self._hypernym_paths:
            raise UserWarning(
                'Please initialize `synset_map` or call `construct_hypernym_map` before calling this function.\n'
            )
        
        hypernym = None
           
        if synset in self._hypernym_paths.keys():
            # Check if `synset` is part of the root layer
            if len(self._hypernym_paths[synset]) > 1:
                hypernym = self._hypernym_paths[synset][-2]
        
        return hypernym
    
    
    def hypernym_path(self, synset):
        '''Returns the hypernym path from a given synset to the root layer
        
        Args:
            synset: Key / label of the synset whose hypernym path is to be returned
        
        Returns:
            hypernym_path: List containing the hypernym path in descending order from the root layer
                down to `synset` (empty if `synset` is not in `synset_map`)
            
        Raises:
            UserWarning: If `synset_map` has not been initialized at call time
        '''
        
        if not self._hypernym_paths:
            raise UserWarning(
                'Please initialize `synset_map` or call `construct_hypernym_map` before calling this function.\n'
            )
            
        hypernym_path = []
        
        if synset in self._hypernym_paths.keys():
            hypernym_path = self._hypernym_paths[synset]
                        
        return hypernym_path
    
    
    def is_a(self, hyponym, hypernym):
        '''Checks whether a synset is a hyponym of another synset
        
        Args:
            hyponym: Key / label of the hierarchically subordinate synset in question
            hypernym: Key / label of the hierarchically superordinate synset in question
        
        Returns:
            is_hyponym: Indicator whether `hyponym` is a hyponym of `hypernym`
            
        Raises:
            UserWarning: If `synset_map` has not been initialized at call time
        '''
        
        if not self._synset_map:
            raise UserWarning(
                'Please initialize `synset_map` before calling this function.\n'
            )
        
        return (hypernym in self.hypernym_path(hyponym))
    
    
    def semantic_distance(self, synset_1, synset_2):
        '''Calculates the semantic distance between two synsets in accordance with (Fergus et al., 2010)
        
        Args:
            synset_1: Key / label of the first synset
            synset_2: Key / label of the second synset
            
        Returns:
            semantic_dist (float): Semantic distance between the two given synsets, ranging from 0.0
                to 1.0 with larger values indicating a closer relation (0.0 if one of the synsets is
                not in `synset_map`)
            
        Raises:
            UserWarning: If `synset_map` has not been initialized at call time
        '''
        
        if not self._synset_map:
            raise UserWarning(
                'Please initialize `synset_map` before calling this function.\n'
            )
            
        semantic_dist = 0.0
            
        hypernym_path_1 = self.hypernym_path(synset_1)
        hypernym_path_2 = self.hypernym_path(synset_2)
        
        if hypernym_path_1 and hypernym_path_2:
            # Semantic distance is defined by (Fergus et al., 2010) as follows:
            # S(i, j) = intersect(path(i), path(j)) / max(length(path(i)), length(path(j)))
            num_shared = len([hypernym for hypernym in hypernym_path_1 if hypernym in hypernym_path_2])
            semantic_dist = num_shared / max(len(hypernym_path_1), len(hypernym_path_2))
        
        return semantic_dist
    
    
    def synset_level(self, synset):
        '''Returns the level of the given synset in the syntactical structure
        
        Args:
            synset: Key / label of the synset in question
            
        Returns:
            level: Level of the given synset in the hierarchical structure (zero indicating root
                level, one the first level below root level, etc.; `None` if `synset` is not in
                `synset_map`)
            
        Raises:
            UserWarning: If `synset_map` has not been initialized at call time
        '''
        
        if not self._synset_map:
            raise UserWarning(
                'Please initialize `synset_map` before calling this function.\n'
            )
        
        hypernym_path = self.hypernym_path(synset)
        
        if not hypernym_path:
            return None
        
        return len(hypernym_path) - 1
    
    
    def get_all_synsets_of_level(self, level):
        '''Returns all synsets of the specified level in the hierarchical syntactic structure
        
        Args:
            level (int): Non-negative integer specifying the level of `synset_map` from which the
                synsets are to be returned
        
        Returns:
            synset: List of all synsets on the specified level in `synset_map` (empty if `level` is
                greater than the maximum depth of `synset_map`)
                
        Raises:
            TypeError: If one of the input arguments is of the wrong type
            ValueError: If invalid values are specified for one or more input arguments
            UserWarning: If `synset_map` has not been initialized at call time
        '''
        
        if not isinstance(level, int):
            raise TypeError(
                '`level` is expected to be of type `int`. Is of type {}.\n'.format(type(level))
            )
        if level < 0:
            raise ValueError(
                '`level` has to be a non-negative value.\n'
            )
        if not self._hypernym_paths:
            raise UserWarning(
                'Please initialize `synset_map` or call `construct_hypernym_map` before calling this function.\n'
            )
            
        synset = []
        
        for hypernym_path in self._hypernym_paths.values():
            if len(hypernym_path) - 1 >= level:
                if hypernym_path[level] not in synset:
                    synset.append(hypernym_path[level])
                
        return synset
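
As a worked example of the measure implemented in `semantic_distance`, consider hypothetical species-level hypernym paths from a toy taxonomy. The function below re-implements the formula standalone for illustration; it is not a replacement for the class method.

```python
def semantic_distance(path_1, path_2):
    """Shared path entries divided by the longer path length, mirroring `synset_map.semantic_distance`."""
    if not path_1 or not path_2:
        return 0.0
    num_shared = len([synset for synset in path_1 if synset in path_2])
    return num_shared / max(len(path_1), len(path_2))


# Hypothetical hypernym paths (root layer first)
tabby = ['chordate', 'mammal', 'carnivore', 'tabby']
beagle = ['chordate', 'mammal', 'carnivore', 'beagle']
goldfinch = ['chordate', 'bird', 'passerine', 'goldfinch']

print(semantic_distance(tabby, beagle))     # 0.75 (phylum, class and order shared)
print(semantic_distance(tabby, goldfinch))  # 0.25 (only the phylum shared)
print(semantic_distance(tabby, tabby))      # 1.0
```

Despite its name, the measure grows with semantic closeness: synsets that share more of their hypernym path score higher.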

In [None]:
def decode_string(string, encoding='utf-8'):
    '''Decodes a given string
    
    Args:
        string (bytes or tf.Tensor): Encoded string
        encoding (str, optional): Codec to use for decoding
        
    Returns:
        decoded_string (str): Decoded string
    '''
        
    # `tf.Tensor` compatibility
    if isinstance(string, tf.Tensor):
        string = string.numpy()
                        
    return string.decode(encoding)

In [None]:
# Mappings from (encoded) numeric labels to (decoded) WordNet synset IDs and to the corresponding synset
# names (both lists are indexed by the native numeric label)
ILSVRC2012_LABELS_TO_WNIDS = [decode_string(wnid) for wnid in tf.data.TextLineDataset(ILSVRC2012_LABELS_TO_WNIDS_FILE)]
ILSVRC2012_WNIDS_TO_NAMES = [wn.synset_from_pos_and_offset(wnid[0], int(wnid[1:])).name().split('.', 1)[0] for wnid in ILSVRC2012_LABELS_TO_WNIDS]
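
The list comprehension above relies on the structure of ImageNet WordNet IDs: a part-of-speech letter followed by the 8-digit synset offset (e.g. `n01440764`, the first ILSVRC2012 class, tench). The resulting NLTK synset name (e.g. `tench.n.01`) is then cut at the first dot. A pure-Python sketch of these two string operations, without the NLTK lookup in between; the helper names are hypothetical.

```python
def split_wnid(wnid):
    """Split a WordNet ID such as 'n01440764' into its part of speech and integer offset."""
    return wnid[0], int(wnid[1:])


def lemma_from_synset_name(name):
    """Strip the '.<pos>.<sense>' suffix from an NLTK synset name such as 'tench.n.01'."""
    return name.split('.', 1)[0]


print(split_wnid('n01440764'))               # ('n', 1440764)
print(lemma_from_synset_name('tench.n.01'))  # tench
```

Because the sense number is discarded, distinct synsets sharing a lemma end up with the same name, which is why the ambiguous `cardigan` and `crane` classes are filtered out during extraction.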

In [None]:
# Annotation: The decoding uses the native numeric labels of the dataset, not the relative position
# of the respective synset in the synset mapping. To resolve the latter, use `ilsvrc2012_decode` and
# `ilsvrc2012_encode`.

def ilsvrc2012_decode_native(label, use_wnid=False):
    '''Returns the literal associated with the given native encoded label
    
    Args:
        label (int): ILSVRC2012 encoded (numeric) label
        use_wnid (bool, optional): Indicator whether to return the unique WordNet ID or the name
            (default) of the synset
    
    Returns:
        decoded_label (str): String literal belonging to the given native numeric label
    '''
    
    # `tf.Tensor` compatibility
    if isinstance(label, tf.Tensor):
        label = label.numpy()
    
    if use_wnid:
        return ILSVRC2012_LABELS_TO_WNIDS[label]
    
    return ILSVRC2012_WNIDS_TO_NAMES[label]

# TensorFlow wrapper that allows this function to be applied to symbolic tensors, e.g. in
# `tf.data.Dataset.map` functions
def tf_ilsvrc2012_decode_native(label, use_wnid):
    return tf.py_function(ilsvrc2012_decode_native, inp=(label, use_wnid), Tout=tf.dtypes.string)

In [None]:
# Annotation: The encoding uses the native numeric labels of the dataset, not the relative position
# of the respective synset in the synset mapping. To resolve the latter, use `ilsvrc2012_decode` and
# `ilsvrc2012_encode`.

def ilsvrc2012_encode_native(label):
    '''Returns the native numeric label associated with the given string literal
    
    Args:
        label (str): ILSVRC2012 decoded label (WordNet synset ID or synset name)
    
    Returns:
        encoded_label (int): Native numeric label belonging to the given string literal
    '''
    
    # `tf.Tensor` compatibility
    if isinstance(label, tf.Tensor):
        label = label.numpy()
    
    # TensorFlow represents strings as bytes, which would never match the `str` entries below
    if isinstance(label, bytes):
        label = decode_string(label)
    
    if label in ILSVRC2012_LABELS_TO_WNIDS:
        return ILSVRC2012_LABELS_TO_WNIDS.index(label)
    
    return ILSVRC2012_WNIDS_TO_NAMES.index(label)

# TensorFlow wrapper to be able to apply this function to symbolic tensors, thus being able to
# employ it in `tf.data.Dataset.map` functions (`inp` has to be a list of tensors)
def tf_ilsvrc2012_encode_native(label):
    return tf.py_function(ilsvrc2012_encode_native, inp=[label], Tout=tf.dtypes.int32)

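A minimal, framework-free sketch of how the native encoding and decoding relate to the two lookup lists. The three-entry lists `TOY_LABELS_TO_WNIDS` and `TOY_WNIDS_TO_NAMES` and the helpers `parse_wnid`, `decode_native` and `encode_native` are hypothetical stand-ins for the notebook's objects; the WNID split mirrors the `wn.synset_from_pos_and_offset(wnid[0], int(wnid[1:]))` call above, and `bytes` input is decoded first since TensorFlow passes strings to `tf.py_function` as bytes.

```python
# Hypothetical toy stand-ins for the notebook's lookup lists (index = native numeric label)
TOY_LABELS_TO_WNIDS = ['n01440764', 'n01443537', 'n01484850']
TOY_WNIDS_TO_NAMES = ['tench', 'goldfish', 'great_white_shark']


def parse_wnid(wnid):
    '''Splits a WordNet ID such as 'n01440764' into part of speech and numeric offset.'''
    return wnid[0], int(wnid[1:])


def decode_native(label, use_wnid=False):
    '''Native numeric label -> WordNet ID or synset name.'''
    return TOY_LABELS_TO_WNIDS[label] if use_wnid else TOY_WNIDS_TO_NAMES[label]


def encode_native(label):
    '''WordNet ID or synset name (str or bytes) -> native numeric label.'''
    if isinstance(label, bytes):
        label = label.decode('utf-8')
    if label in TOY_LABELS_TO_WNIDS:
        return TOY_LABELS_TO_WNIDS.index(label)
    # `list.index` returns the first match, so duplicate names resolve to the lowest label
    return TOY_WNIDS_TO_NAMES.index(label)


print(parse_wnid('n01440764'))      # ('n', 1440764)
print(decode_native(1))             # goldfish
print(encode_native(b'n01484850'))  # 2
```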
In [None]:
# Annotation: The decoding uses the relative position of the respective synset in the synset
# mapping, not the native numeric labels of the dataset. To resolve the latter, use
# `ilsvrc2012_decode_native` and `ilsvrc2012_encode_native`.

def ilsvrc2012_decode(label, level=ILSVRC2012_SPECIES_LABEL_LEVEL):
    '''Returns the literal associated with the given combination of encoded label and hierarchy level
    
    Args:
        label (int): Encoded label
        level (int, optional): Level of the given label in the hierarchical structure (defaults to
            the species level)
    
    Returns:
        decoded_label (str): String literal belonging to the given label / level combination
    '''
    
    # Clip invalid values for `level` (has to be between `0` and `3`)
    level = max(0, min(ILSVRC2012_SPECIES_LABEL_LEVEL, level))
    
    return ILSVRC2012_SYNSET_MAP.get_all_synsets_of_level(level)[label]

In [None]:
# Annotation: The encoding uses the relative position of the respective synset in the synset
# mapping, not the native numeric labels of the dataset. To resolve the latter, use
# `ilsvrc2012_decode_native` and `ilsvrc2012_encode_native`.

def ilsvrc2012_encode(label):
    '''Returns the encoded (numeric) label associated with the given string literal
    
    Args:
        label (str): Decoded label
    
    Returns:
        encoded_label (int): Numeric label belonging to the given string literal
        level (int): Level of the `encoded_label` in the hierarchical structure
    '''
    
    level = ILSVRC2012_SYNSET_MAP.synset_level(label)
    
    return ILSVRC2012_SYNSET_MAP.get_all_synsets_of_level(level).index(label), level

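The non-native encoding identifies a synset by its position within the list of synsets of its hierarchy level. The sketch below illustrates this with `ToySynsetMap`, a hypothetical minimal stand-in for the notebook's `synset_map` class (which is defined elsewhere); it assumes that the synsets of a level are listed in breadth-first, dictionary-insertion order, which may differ from the real implementation. `encode` and `decode` are likewise hypothetical.

```python
from collections import deque


class ToySynsetMap:
    '''Hypothetical minimal stand-in for `synset_map` (breadth-first, insertion-ordered levels).'''

    def __init__(self, tree):
        self._levels = []    # one list of synset names per hierarchy level
        self._level_of = {}  # synset name -> hierarchy level
        queue = deque((name, children, 0) for name, children in tree.items())
        while queue:
            name, children, level = queue.popleft()
            if len(self._levels) <= level:
                self._levels.append([])
            self._levels[level].append(name)
            self._level_of[name] = level
            queue.extend((child, grandchildren, level + 1)
                         for child, grandchildren in children.items())

    def get_all_synsets_of_level(self, level):
        return self._levels[level]

    def synset_level(self, name):
        return self._level_of[name]


def encode(synset_map, name):
    '''Synset name -> (position within its level, level).'''
    level = synset_map.synset_level(name)
    return synset_map.get_all_synsets_of_level(level).index(name), level


def decode(synset_map, label, level):
    '''(Position within level, level) -> synset name.'''
    return synset_map.get_all_synsets_of_level(level)[label]


toy = ToySynsetMap({
    'vertebrate': {'bird': {'parrot': {}}, 'fish': {'bony_fish': {}}},
    'invertebrate': {'arthropod': {'insect': {}}},
})
print(encode(toy, 'fish'))  # (1, 1)
print(decode(toy, 2, 2))    # insect
```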
In [None]:
# Annotation: This function serves to aggregate and encapsulate all native (non-tensorized) Python
# functionalities that require eager execution for direct data access during the preprocessing of
# the dataset to minimize the overhead resulting from converting TensorFlow data structures to Python
# objects and back at runtime.

def ilsvrc2012_resolve_hypernym(label, level, encoded=False):
    '''Returns the hypernym of the given (species level) label on the specified hierarchy level

    Given a label on the finest supported categorical resolution (i.e. species level for the
    ILSVRC2012 dataset), returns the related hypernym on the specified hierarchy level

    Args:
        label: Label of the synset on the finest supported categorical resolution whose hypernym is to
            be returned in numerical or string literal format
        level (int): Hierarchy level on which to resolve the hypernym
        encoded (bool, optional): Indicator whether to return the hypernym in encoded (numerical) or
            decoded (string literal) format

    Returns:
        hypernym: Hypernym of the given synset on the hierarchy level specified by `level` in numerical
            or string literal format depending on `encoded`
    '''

    # `tf.Tensor` compatibility
    if isinstance(label, tf.Tensor):
        label = label.numpy()

    # Decode the label if it was given in encoded format
    if not isinstance(label, str):
        if isinstance(label, bytes):
            # TensorFlow encodes strings as bytes by default
            label = decode_string(label)
        else:
            # If `label` is given neither as string nor bytes we assume that it is given in numerical
            # encoded format
            label = ilsvrc2012_decode(label, ILSVRC2012_SPECIES_LABEL_LEVEL)

    hypernym = ILSVRC2012_SYNSET_MAP.hypernym_path(label)[level]
    
    if encoded:
        hypernym, _ = ilsvrc2012_encode(hypernym)

    return hypernym

# TensorFlow wrappers to be able to apply this function to symbolic tensors, thus being able to
# employ it as part of the preprocessing pipeline
def tf_ilsvrc2012_resolve_hypernym_decoded(label, level):
    return tf.py_function(ilsvrc2012_resolve_hypernym, inp=(label, level, False), Tout=tf.dtypes.string)
def tf_ilsvrc2012_resolve_hypernym_encoded(label, level):
    return tf.py_function(ilsvrc2012_resolve_hypernym, inp=(label, level, True), Tout=tf.dtypes.int64)

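Resolving a hypernym amounts to walking the root-to-label path and picking the entry at the requested level. A minimal sketch, assuming (as the indexing `hypernym_path(label)[level]` above suggests) that the path starts at the root level `0` and ends at the label itself; `hypernym_path`, `resolve_hypernym` and `TOY_TREE` are hypothetical and only mimic the idea on a nested dictionary.

```python
def hypernym_path(tree, target, path=()):
    '''Returns the root-to-`target` list of synset names in a nested dict, or None if absent.'''
    for name, children in tree.items():
        current = path + (name,)
        if name == target:
            return list(current)
        found = hypernym_path(children, target, current)
        if found is not None:
            return found
    return None


def resolve_hypernym(tree, species, level):
    '''Hypernym of `species` on hierarchy `level` (0 = root level).'''
    return hypernym_path(tree, species)[level]


TOY_TREE = {
    'vertebrate': {'bird': {'parrot': {'macaw': {}, 'lorikeet': {}}}},
    'invertebrate': {'arthropod': {'arachnid': {'tarantula': {}}}},
}
print(hypernym_path(TOY_TREE, 'macaw'))            # ['vertebrate', 'bird', 'parrot', 'macaw']
print(resolve_hypernym(TOY_TREE, 'tarantula', 1))  # arthropod
```

Resolving on the species level itself simply returns the label, which is why the same function can produce labels for every level of the hierarchy.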
In [None]:
# Create a mapping between hyper- and hyponyms of the ILSVRC2012 dataset

# Annotation: We tried to adhere to the underlying WordNet structure as far as possible during the
# creation of the synset map. Adjustments to the native hierarchy were made at the following places
# to balance out the distribution of the dataset so as not to unfairly put the composed approach at
# a disadvantage in comparison to the (monolithic) benchmark architecture:
# - Removed synsets with at most one associated label from the total set of hyponyms of `animal`
#       (e.g. the hyponym `trilobite` of synset `arthropod` has exactly one associated label, thus
#       permitting no choice on the second hierarchy level, in contrast to e.g. `arachnid` with 9
#       labels; this affected a total of 20 of 398 labels (~5%))
# - Split up the native category `mammal` into the smaller subcategories `dog`, `rodenticidal`,
#       `large_omnivore_herbivore`, `ursine`, `primatenic` and `other_mammal`
# - Aggregated the native categories `coelenterate`, `echinoderm`, `mollusk` and `worm` into the
#       supercategory `molluscan`
# - Aggregated the native categories `amphibian` and `reptile` into the supercategory `herpetologic`
# These changes ensure that a) each non-leaf node of the synset map has more than one direct child
# node and thus permits the training of a model for branch classification and b) the maximum
# spread in the number of data points per concept between the category with the most and the one
# with the fewest associated records on any hierarchy level is limited to a factor of about 15
# (~5.5 on the root layer, ~8.5 on the first layer and ~15 on the second layer). With regard to the
# latter, it was additionally ensured that outlier categories with especially many data points are
# inherently coherent in their hyponyms and can thus also be trained with fewer examples.

SYNSET_MAP = {
    'vertebrate': {
        'bird': {
            'aquatic_bird': {}, 'bird_of_prey': {}, 'coraciiform_bird': {}, 'gallinaceous_bird': {}, 'piciform_bird': {}, 'parrot': {}, 'passerine': {}
        },
        'fish': {
            'bony_fish': {}, 'cartilaginous_fish': {}
        },
        'herpetologic': {
            'crocodilian_reptile': {}, 'frog': {}, 'salamander': {}, 'saurian': {}, 'snake': {}, 'turtle': {}
        },
        'dog': {
            'fox': {}, 'wild_dog': {}, 'wolf': {}, 'corgi': {}, 'poodle': {}, 'spitz': {}, 'toy_dog': {}, 'terrier': {}, 'sporting_dog': {}, 'hound': {}, 'working_dog': {}
        },
        'rodenticidal': {
            'musteline_mammal': {}, 'rodent': {}, 'viverrine': {}
        },
        'large_omnivore_herbivore': {
            'pachyderm': {}, 'ungulate': {}
        },
        'ursine': {
            'bear': {}, 'marsupial': {}, 'procyonid': {}
        },
        'primatenic': {
            'edentate': {}, 'primate': {}
        },
        'other_mammal': {
            'aquatic_mammal': {}, 'feline': {}, 'lagomorph': {}, 'monotreme': {}
        },
    },
    'invertebrate': {
        'arthropod': {
            'arachnid': {}, 'crustacean': {}, 'insect': {}
        },
        'molluscan': {
            'coelenterate': {}, 'echinoderm': {}, 'mollusk': {}, 'worm': {}
        }
    }
}

# Associate the species level labels of the ILSVRC2012 dataset to their respective hypernyms

number_labels = len(ILSVRC2012_LABELS_TO_WNIDS)
label_associated = [False] * number_labels

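# Manually exclude the ambiguous non-animal concepts `cardigan` (the sweater, label 474) and `crane`
# (the machine, label 517) by marking them as already associated; their names coincide with those
# of the animal labels `cardigan` (Welsh corgi) and `crane` (bird), so the name-based lookup via
# `wn.synsets` below would otherwise also match them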
label_associated[474] = True
label_associated[517] = True

synsets = [SYNSET_MAP]
for synset in synsets:
    for hypernym in synset.keys():
        # Check whether the current synset is a leaf node (we don't want to associate labels to
        # higher-level concepts like e.g. `bird` but rather only to the finest predefined resolution,
        # i.e. e.g. `aquatic_bird`)
        if not synset[hypernym]:
            # Iterate over all labels of the ILSVRC2012 dataset, check which ones are hyponyms of
            # the current synset and add those as child nodes to the synset map
            for idx in range(number_labels):
                wnid = ILSVRC2012_LABELS_TO_WNIDS[idx]
                pos = wnid[0]
                offset = int(wnid[1:])

                for path in wn.synset_from_pos_and_offset(pos, offset).hypernym_paths():
                    for variant in wn.synsets(hypernym):
                        if variant in path:
                            # Check whether the current label is already associated to a different
                            # hypernym to ensure unambiguity
                            if label_associated[idx] is False:
                                synset[hypernym][ILSVRC2012_WNIDS_TO_NAMES[idx]] = {}
                                label_associated[idx] = True
                            
        synsets.append(synset[hypernym])

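Criterion b) of the annotation above, the spread between the largest and the smallest category on each hierarchy level, can be checked by aggregating record counts bottom-up. A minimal sketch on a hypothetical toy hierarchy with made-up record counts; `records_per_node`, `spread_per_level`, `TOY_TREE` and `TOY_RECORDS` are illustrative names, not part of the notebook.

```python
def records_per_node(tree, records_per_leaf):
    '''Sums the records of all leaves below each node; returns {level: {node: count}}.'''
    per_level = {}

    def visit(name, children, level):
        if children:
            count = sum(visit(child, grandchildren, level + 1)
                        for child, grandchildren in children.items())
        else:
            count = records_per_leaf[name]
        per_level.setdefault(level, {})[name] = count
        return count

    for name, children in tree.items():
        visit(name, children, 0)
    return per_level


def spread_per_level(per_level):
    '''Ratio between the largest and the smallest category on each level.'''
    return {level: max(counts.values()) / min(counts.values())
            for level, counts in per_level.items()}


TOY_TREE = {
    'vertebrate': {'bird': {'parrot': {}, 'passerine': {}}, 'fish': {'bony_fish': {}}},
    'invertebrate': {'arthropod': {'arachnid': {}}},
}
# Made-up record counts per leaf category
TOY_RECORDS = {'parrot': 5200, 'passerine': 13000, 'bony_fish': 7800, 'arachnid': 11700}

spread = spread_per_level(records_per_node(TOY_TREE, TOY_RECORDS))
for level in sorted(spread):
    print(level, round(spread[level], 2))
# 0 2.22
# 1 2.33
# 2 2.5
```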
In [None]:
# Create a `synset_map` object from the previously prepared mapping between hyper- and hyponyms
ILSVRC2012_SYNSET_MAP = synset_map(SYNSET_MAP)

# Constants holding the number of labels on the different taxonomic levels of the synset mapping
ILSVRC2012_NUM_LABELS_PHYLUM_LAYER = len(ILSVRC2012_SYNSET_MAP.get_all_synsets_of_level(ILSVRC2012_PHYLUM_LABEL_LEVEL))
ILSVRC2012_NUM_LABELS_CLASS_LAYER = len(ILSVRC2012_SYNSET_MAP.get_all_synsets_of_level(ILSVRC2012_CLASS_LABEL_LEVEL))
ILSVRC2012_NUM_LABELS_ORDER_LAYER = len(ILSVRC2012_SYNSET_MAP.get_all_synsets_of_level(ILSVRC2012_ORDER_LABEL_LEVEL))
ILSVRC2012_NUM_LABELS_SPECIES_LAYER = len(ILSVRC2012_SYNSET_MAP.get_all_synsets_of_level(ILSVRC2012_SPECIES_LABEL_LEVEL))

---
## Processing and augmentation
<a id='processing_augmentation'></a>


This section provides functions for preprocessing the dataset prior to training and inference, adhering to the standard 10-crop procedure described in (Krizhevsky et al., 2012) and (Sermanet et al., 2014) (based on https://github.com/tensorflow/models/blob/master/official/vision/image_classification/imagenet_preprocessing.py).

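For reference, the crop positions of the 10-crop scheme (four corners and the center, each with and without a horizontal flip) can be enumerated as plain offsets. A minimal sketch: `ten_crop_boxes` is a hypothetical helper, the order of the five positions follows the `crop_mode` convention of `test_preprocessing` (clockwise from the upper left corner, then the center), and the 256 x 341 image with 224 x 224 crops is an assumed example size.

```python
def ten_crop_boxes(height, width, crop_h, crop_w):
    '''Returns (offset_y, offset_x, flipped) for 4 corners + center, each unflipped and flipped.'''
    if crop_h > height or crop_w > width:
        raise ValueError('crop larger than image')
    positions = [
        (0, 0),                                           # upper left
        (0, width - crop_w),                              # upper right
        (height - crop_h, width - crop_w),                # lower right
        (height - crop_h, 0),                             # lower left
        ((height - crop_h) // 2, (width - crop_w) // 2),  # center
    ]
    return [(y, x, flipped) for flipped in (False, True) for (y, x) in positions]


for box in ten_crop_boxes(256, 341, 224, 224):
    print(box)
```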
Rescale the shorter side of an image to a custom size

In [None]:
def resize_image(image, height, width):
    '''Simple, type-preserving wrapper around `tf.image.resize`
    
    Args:
        image: 3-D image `Tensor`
        height (int): Target height for the resized image
        width (int): Target width for the resized image
        
    Returns:
        resized_image: 3-D tensor of shape [height, width, channels] containing the resized image
    '''

    return tf.cast(
        tf.image.resize(
            image, [height, width],
            method=tf.image.ResizeMethod.BILINEAR),
        image.dtype)

In [None]:
def calc_aspect_preserving_shape(height, width, desired_min_size):
    '''Computes a new shape with the shorter side equal to `desired_min_size`
    
    Computes a new shape with the shorter side equal to `desired_min_size` while preserving the
    original aspect ratio.
    
    Args:
        height (int): Current height
        width (int): Current width
        desired_min_size (int): Desired size of the shorter side after the resize
        
    Returns:
        new_height (int): New height
        new_width (int): New width
    '''

    # Convert to floats to make subsequent calculations go smoothly
    desired_min_size = tf.cast(desired_min_size, tf.dtypes.float32)
    height, width = tf.cast(height, tf.float32), tf.cast(width, tf.dtypes.float32)

    min_dim = tf.math.minimum(height, width)
    scale_ratio = desired_min_size / min_dim

    # Convert back to ints as required for the target size of `tf.image.resize`
    new_height = tf.cast(height * scale_ratio, tf.dtypes.int32)
    new_width = tf.cast(width * scale_ratio, tf.dtypes.int32)

    return new_height, new_width

In [None]:
def aspect_preserving_resize(image, desired_min_size):
    '''Resizes an image while preserving its original aspect ratio
    
    Resizes the image such that its shorter side equals `desired_min_size` while preserving the
    original aspect ratio.
    
    Args:
        image: 3-D image `Tensor`
        desired_min_size (int): Desired size of the shorter side after the resize
        
    Returns:
        resized_image: 3-D tensor containing the resized image
    '''

    shape = tf.shape(input=image)
    height, width = shape[0], shape[1]

    new_height, new_width = calc_aspect_preserving_shape(height, width, desired_min_size)

    return resize_image(image, new_height, new_width)

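A worked example of the aspect-preserving shape computation. This hypothetical integer-only variant (`aspect_preserving_shape`) scales both sides by `desired_min_size / min_dim` and rounds down, so the shorter side comes out as exactly `desired_min_size`; in the float32 version above, truncating `height * scale_ratio` can in principle land one pixel short of `desired_min_size` due to rounding, which the integer form avoids.

```python
def aspect_preserving_shape(height, width, desired_min_size):
    '''Integer-only variant: shorter side -> exactly `desired_min_size`, longer side rounded down.'''
    min_dim = min(height, width)
    return height * desired_min_size // min_dim, width * desired_min_size // min_dim


print(aspect_preserving_shape(375, 500, 256))  # (256, 341)
print(aspect_preserving_shape(800, 600, 256))  # (341, 256)
```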
Crop out a patch of an image

In [None]:
def crop_image(image, offset_height, offset_width, target_height, target_width):
    '''Simple, type-preserving wrapper around `tf.image.crop_to_bounding_box`
    
    Args:
        image: 3-D image `Tensor`
        offset_height (int): Vertical coordinate of the top-left corner of the crop
        offset_width (int): Horizontal coordinate of the top-left corner of the crop
        target_height (int): Height of the crop
        target_width (int): Width of the crop
        
    Returns:
        cropped_image: 3-D tensor containing the cropped image
    '''
    
    return tf.cast(
        tf.image.crop_to_bounding_box(
            image,
            offset_height, offset_width,
            target_height, target_width),
        image.dtype)

In [None]:
def crop_central(image, target_height, target_width):
    '''Crops the central patch of an image
    
    Args:
        image: 3-D image `Tensor`
        target_height (int): Height of the crop
        target_width (int): Width of the crop
        
    Returns:
        cropped_image: 3-D tensor containing the central crop of the image
    '''
    
    shape = tf.shape(input=image)
    height, width = shape[0], shape[1]

    offset_height = (height - target_height) // 2
    offset_width = (width - target_width) // 2
    
    return crop_image(
        image,
        offset_height, offset_width,
        target_height, target_width)

In [None]:
def crop_corner_upper_left(image, target_height, target_width):
    '''Crops the upper left corner patch of an image
    
    Args:
        image: 3-D image `Tensor`
        target_height (int): Height of the crop
        target_width (int): Width of the crop
        
    Returns:
        cropped_image: 3-D tensor containing the crop at the upper left corner position of the image
    '''
    
    shape = tf.shape(input=image)
    height, width = shape[0], shape[1]

    offset_height = 0
    offset_width = 0

    return crop_image(
        image,
        offset_height, offset_width,
        target_height, target_width)

In [None]:
def crop_corner_upper_right(image, target_height, target_width):
    '''Crops the upper right corner patch of an image
    
    Args:
        image: 3-D image `Tensor`
        target_height (int): Height of the crop
        target_width (int): Width of the crop
        
    Returns:
        cropped_image: 3-D tensor containing the crop at the upper right corner position of the image
    '''
    
    shape = tf.shape(input=image)
    height, width = shape[0], shape[1]

    offset_height = 0
    offset_width = (width - target_width)

    return crop_image(
        image,
        offset_height, offset_width,
        target_height, target_width)

In [None]:
def crop_corner_lower_right(image, target_height, target_width):
    '''Crops the lower right corner patch of an image
    
    Args:
        image: 3-D image `Tensor`
        target_height (int): Height of the crop
        target_width (int): Width of the crop
        
    Returns:
        cropped_image: 3-D tensor containing the crop at the lower right corner position of the image
    '''
    
    shape = tf.shape(input=image)
    height, width = shape[0], shape[1]

    offset_height = (height - target_height)
    offset_width = (width - target_width)

    return crop_image(
        image,
        offset_height, offset_width,
        target_height, target_width)

In [None]:
def crop_corner_lower_left(image, target_height, target_width):
    '''Crops the lower left corner patch of an image
    
    Args:
        image: 3-D image `Tensor`
        target_height (int): Height of the crop
        target_width (int): Width of the crop
        
    Returns:
        cropped_image: 3-D tensor containing the crop at the lower left corner position of the image
    '''
    
    shape = tf.shape(input=image)
    height, width = shape[0], shape[1]

    offset_height = (height - target_height)
    offset_width = 0

    return crop_image(
        image,
        offset_height, offset_width,
        target_height, target_width)

In [None]:
def crop_random(image, target_height, target_width):
    '''Crops a random patch of an image
    
    Args:
        image: 3-D image `Tensor`
        target_height (int): Height of the crop
        target_width (int): Width of the crop
        
    Returns:
        cropped_image: 3-D tensor containing a crop at a random position of the image
    '''
    
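    shape = tf.shape(input=image)
    height, width = shape[0], shape[1]

    # `maxval` is exclusive for integer dtypes, hence the `+ 1` to include the last valid offset
    # (and to avoid an empty sampling range if an image side equals the crop size)
    offset_height = tf.random.uniform([], maxval=(height - target_height + 1), dtype=tf.dtypes.int32)
    offset_width = tf.random.uniform([], maxval=(width - target_width + 1), dtype=tf.dtypes.int32)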
    
    return crop_image(
        image,
        offset_height, offset_width,
        target_height, target_width)

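Valid top-left offsets for a random crop range from `0` to `height - target_height` inclusive. A stdlib sketch of uniform sampling over that range (`random_crop_offsets` is a hypothetical helper): since the upper bound of `random.randrange`, like `maxval` of `tf.random.uniform` for integer dtypes, is exclusive, `+ 1` is needed both to reach the last valid offset and to keep the range non-empty when an image side equals the crop size.

```python
import random


def random_crop_offsets(height, width, crop_h, crop_w, rng=random):
    '''Samples a top-left offset uniformly such that the crop lies fully inside the image.'''
    if crop_h > height or crop_w > width:
        raise ValueError('crop larger than image')
    # randrange(n) yields 0 .. n - 1, so n = side - crop + 1 covers every valid offset
    return rng.randrange(height - crop_h + 1), rng.randrange(width - crop_w + 1)


rng = random.Random(0)
offsets = {random_crop_offsets(226, 225, 224, 224, rng) for _ in range(1000)}
print(sorted(offsets))
```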
Flip a given image

In [None]:
def flip_horizontally(image):
    '''Simple wrapper for `tf.image.flip_left_right`
    
    Args:
        image: 3-D image `Tensor`
        
    Returns:
        flipped_image: 3-D tensor containing the horizontally flipped (left-right mirrored) image
    '''
    
    return tf.image.flip_left_right(image)

In [None]:
def flip_vertically(image):
    '''Simple wrapper for `tf.image.flip_up_down`
    
    Args:
        image: 3-D image `Tensor`
        
    Returns:
        flipped_image: 3-D tensor containing the vertically flipped (upside-down) image
    '''
    
    return tf.image.flip_up_down(image)

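In TensorFlow's terminology, `tf.image.flip_left_right` flips an image horizontally (mirroring it) and `tf.image.flip_up_down` flips it vertically; the horizontal reflections used by (Krizhevsky et al., 2012) correspond to the former. A plain-Python sketch on a nested list of pixel rows, with hypothetical helper names:

```python
def flip_left_right(image):
    '''Mirrors each row (horizontal flip, like `tf.image.flip_left_right`).'''
    return [row[::-1] for row in image]


def flip_up_down(image):
    '''Reverses the row order (vertical flip, like `tf.image.flip_up_down`).'''
    return image[::-1]


image = [[1, 2, 3],
         [4, 5, 6]]
print(flip_left_right(image))  # [[3, 2, 1], [6, 5, 4]]
print(flip_up_down(image))     # [[4, 5, 6], [1, 2, 3]]
```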
Alter the intensities of the RGB channels of a given image

In [None]:
# Approximate the principal components of the RGB channels from the mean pixel values of a set of
# training images using Singular Value Decomposition (SVD)

# Number of images to use for the approximation of the covariance matrix of the RGB channels
NUM_SAMPLES = 10000

# Construct a matrix of dimensions `num_images` x `num_channels` containing the mean pixel value
# (per channel) of each of `NUM_SAMPLES` images randomly drawn from the training dataset; the
# per-image means are collected in a list and stacked once, which avoids repeatedly concatenating
# ever larger tensors inside the loop
mean_rgb_values = []
for record in ilsvrc2012_train_raw.shuffle(buffer_size=1000).take(NUM_SAMPLES):
    image = tf.dtypes.cast(record[ILSVRC2012_IMG_KEY], tf.dtypes.float32)
    mean_rgb_values.append(tf.math.reduce_mean(image, axis=[0, 1]))

rgb_values = tf.stack(mean_rgb_values, axis=0)

# Offset / bias correction
rgb_values = rgb_values - tf.reduce_mean(rgb_values, axis=0)

# Standard deviation normalization
rgb_values = rgb_values / tf.math.sqrt(tf.reduce_mean(rgb_values ** 2, axis=0))

# Calculate the covariance matrix of `rgb_values`
rgb_cov = tf.linalg.matmul(rgb_values, rgb_values, transpose_a=True) / NUM_SAMPLES

# Singular Value Decomposition (SVD)
s, u, v = tf.linalg.svd(rgb_cov)

# Let s, u, v be the singular values, left singular vectors and right singular vectors of a matrix
# M, i.e. M = u * s * vt. Then the following holds true:
#     Mt * M = (u * s * vt)t * u * s * vt    (I)
#            = v * st * ut * u * s * vt      (II)
#            = v * (s ** 2) * vt             (III)
# Thus v contains the eigenvectors and s ** 2 the eigenvalues of Mt * M. Here, however, the SVD is
# not applied to the zero-mean centered, normalized M itself but to its covariance matrix
# C = Mt * M / N. Since C is symmetric and positive semi-definite, its SVD coincides with its
# eigendecomposition (C = v * s * vt, i.e. u and v coincide for nonzero singular values): the right
# singular vectors (the columns of v) are the eigenvectors of C and the singular values s are
# directly the corresponding eigenvalues (not their square roots). (Generally, the right singular
# vectors can be interpreted as an orthonormal basis along the main variance axes in the domain
# space, while the left singular vectors form the corresponding orthonormal basis in the codomain
# space.)
ILSVRC2012_TRAIN_RGB_EIGENVALUES = s
ILSVRC2012_TRAIN_RGB_EIGENVECTORS = v

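The relationship between the SVD of the centered, normalized data matrix $M \in \mathbb{R}^{N \times 3}$ (with $M = U S V^\top$) and that of its covariance matrix $C$ can be written out as follows; it shows why the singular values of $C$ already are its eigenvalues $\Lambda$, whereas those of $M$ would have to be squared and divided by $N$:

$$
\begin{aligned}
C &= \tfrac{1}{N} M^\top M = \tfrac{1}{N} V S^\top U^\top U S V^\top = V \left(\tfrac{1}{N} S^2\right) V^\top = V \Lambda V^\top \\
C &= U_C \, \Sigma_C \, V_C^\top \quad \text{with} \quad U_C = V_C = V, \quad \Sigma_C = \Lambda = \tfrac{1}{N} S^2
\end{aligned}
$$

The columns $p_i$ of $V$ are the principal components; a color shift in the sense of (Krizhevsky et al., 2012) then adds $\sum_i \alpha_i \lambda_i p_i$ with $\alpha_i \sim \mathcal{N}(0, 0.1^2)$ to every pixel.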
In [None]:
def shift_rgb(image, axis, stddev=0.1):
    '''Shifts the intensities of the RGB channels of a given image by a random value along given axes
    
    Args:
        image: 3-D image `Tensor`
        axis: Axis along which to perform the shift of the RGB values (e.g. if a value of
            [1., 1., 1.] is provided, the shift is applied to all channels homogeneously); has to be
            of shape (3,) or (num_axes, 3); if multiple axes are provided (one per row), a shift is
            performed along each axis respectively
        stddev (float, optional): Standard deviation of the Gaussian to draw the weights from
        
    Returns:
        rgb_corrected_image: 3-D tensor containing the RGB-shifted image
    '''
    
    image_dtype = image.dtype
    
    image = tf.dtypes.cast(image, tf.dtypes.float32)
    axis = tf.dtypes.cast(axis, tf.dtypes.float32)
    
    if len(axis.shape) == 1:
        axis = tf.expand_dims(axis, 0)
            
    # Sample one weight per axis (i.e. per row of `axis`) from a Gaussian with mean zero and
    # standard deviation `stddev`
    # Annotation: The weights for each principal component are drawn exactly once per image.
    weights = tf.random.normal(shape=tf.shape(axis)[:1], mean=0.0, stddev=stddev)
    
    # Weighted sum of the axes, i.e. `sum_i weights[i] * axis[i, :]` (shape (3,)), scaled to the
    # range of valid channel values and broadcast over all pixels
    rgb_shift = tf.linalg.matvec(axis, weights, transpose_a=True)
    rgb_corrected_image = image + rgb_shift * (RGB_MAX_VAL - RGB_MIN_VAL) / 2
        
    # Clip invalid channel values
    rgb_corrected_image = tf.math.maximum(tf.math.minimum(rgb_corrected_image, RGB_MAX_VAL), RGB_MIN_VAL)
    
    return tf.dtypes.cast(rgb_corrected_image, image_dtype)

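A NumPy sketch of the PCA-based color shift on synthetic data, assuming NumPy is available. It uses `np.linalg.eigh` on the covariance matrix instead of an SVD (for a symmetric positive semi-definite matrix both yield the same eigenvalues) and keeps one eigenvector per column. `pca_color_shift` and the synthetic "mean RGB" sample are hypothetical, and the notebook's additional scaling by `(RGB_MAX_VAL - RGB_MIN_VAL) / 2` and clipping are omitted.

```python
import numpy as np


def pca_color_shift(image, eigenvalues, eigenvectors, alphas):
    '''Adds sum_i alphas[i] * eigenvalues[i] * eigenvectors[:, i] to every pixel.'''
    shift = eigenvectors @ (alphas * eigenvalues)  # shape (3,), eigenvectors are columns
    return image + shift                           # broadcast over height and width


rng = np.random.default_rng(0)
# Synthetic, correlated stand-in for per-image mean RGB values (rows = images)
mixing = np.array([[1.0, 0.5, 0.2], [0.0, 1.0, 0.3], [0.0, 0.0, 1.0]])
rgb = rng.normal(size=(1000, 3)) @ mixing
rgb = (rgb - rgb.mean(axis=0)) / rgb.std(axis=0)  # center and normalize each channel
cov = rgb.T @ rgb / len(rgb)

eigenvalues, eigenvectors = np.linalg.eigh(cov)   # ascending eigenvalues, eigenvectors as columns
image = rng.uniform(0.0, 255.0, size=(4, 4, 3))
shifted = pca_color_shift(image, eigenvalues, eigenvectors, alphas=rng.normal(0.0, 0.1, size=3))
print(shifted[0, 0] - image[0, 0])  # the same RGB offset is applied to every pixel
```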
General preprocessing applicable to all images

In [None]:
def general_preprocessing(record, flip=False):
    '''Preprocessing steps applicable to all images
    
    Preprocessing steps applicable to all images; includes horizontal flipping
    
    Args:
        record: (`image`, `label`) tuple
        flip (optional): Indicator whether to flip the image
        
    Returns:
        processed_image: 3-D tensor containing the processed image
        label: Label belonging to `processed_image`
    '''
    
    image = record[0]
    label = record[1]
    
    if flip:
        processed_image = flip_horizontally(image)
    else:
        processed_image = image
    
    return (processed_image, label)

Preprocessing dependent on whether the images are used for training or evaluation

In [None]:
def train_preprocessing(record):
    '''Preprocessing steps applicable only to training images
    
    Preprocessing steps applicable only to training images; includes cropping of random image patches
    and shifts of the intensities of the RGB channels along the (approximated) principal components of
    the set of mean pixel values per image in the training dataset, each scaled by its corresponding
    eigenvalue times a random variable drawn from a Gaussian with mean zero and standard deviation 0.1
    
    Args:
        record: (`image`, `label`) tuple
        
    Returns:
        processed_image: 3-D tensor containing the processed image
        label: Label belonging to `processed_image`
    '''
            
    image = record[0]
    label = record[1]
    
    # RGB shift
    # Annotation: The eigenvectors are the columns of `ILSVRC2012_TRAIN_RGB_EIGENVECTORS`, whereas
    # `shift_rgb` expects one axis per row; hence the transposition. Each row (eigenvector) is then
    # scaled by its corresponding eigenvalue.
    processed_image = shift_rgb(
            image,
            tf.math.multiply(
                tf.transpose(ILSVRC2012_TRAIN_RGB_EIGENVECTORS),
                tf.expand_dims(ILSVRC2012_TRAIN_RGB_EIGENVALUES, axis=1)
            )
    )
    
    # Random cropping
    processed_image = crop_random(processed_image, CROP_SIZE_H, CROP_SIZE_W)
    
    return (processed_image, label)

In [None]:
def test_preprocessing(record, crop_mode):
    '''Preprocessing steps applicable only to test images
    
    Preprocessing steps applicable only to test images; includes cropping of center and corner
    image patches
    
    Args:
        record: (`image`, `label`) tuple
        crop_mode (int): Position of the crop; `0` to `3` select the corners (clockwise, starting at
            the upper left corner), any other value (i.e. `4`) selects the image center
            
    Returns:
        processed_image: 3-D tensor containing the processed image
        label: Label belonging to `processed_image`
    '''

    image = record[0]
    label = record[1]
    
    if crop_mode == 0:
        processed_image = crop_corner_upper_left(image, CROP_SIZE_H, CROP_SIZE_W)
    elif crop_mode == 1:
        processed_image = crop_corner_upper_right(image, CROP_SIZE_H, CROP_SIZE_W)
    elif crop_mode == 2:
        processed_image = crop_corner_lower_right(image, CROP_SIZE_H, CROP_SIZE_W)
    elif crop_mode == 3:
        processed_image = crop_corner_lower_left(image, CROP_SIZE_H, CROP_SIZE_W)
    else:
        processed_image = crop_central(image, CROP_SIZE_H, CROP_SIZE_W)
    
    return (processed_image, label)

Given a raw dataset, construct an iterator over the preprocessed and augmented records

In [None]:
def process_and_augment(dataset,
                        batch_size,
                        synset_level=ILSVRC2012_SPECIES_LABEL_LEVEL,
                        hypernym=None,
                        is_train=False,
                        num_rnd_crops=5,
                        shuffle_buffer_size=10000,
                        num_epochs=1):
    '''Given a raw dataset, constructs an iterator over the preprocessed and augmented records
    
    Args:
        dataset: Dataset containing raw records
        batch_size (int): Number of samples per batch
        synset_level (int, optional): Specifier of the synset level (i.e. fineness of label
            resolution) to use (defaults to the finest possible resolution, i.e. the species level
            for the ILSVRC2012 dataset)
        hypernym (str, optional): Identifier of the parental (hypernym) category whose subordinate
            (hyponym) labels are to be included in the dataset (only evaluated below the root level,
            i.e. if `synset_level` is greater than `0`; `None` (default) indicates that all
            categories are to be included; all records with other labels are discarded)
        is_train (bool, optional): Indicator whether the input is for training
        num_rnd_crops (int, optional): Number of random crops to perform per image (each image is
            used once as-is and once flipped horizontally, and each of the two variants is cropped
            `num_rnd_crops` times, resulting in a total augmentation of 2 * `num_rnd_crops` per
            image; only evaluated if `is_train` is set to True)
        shuffle_buffer_size (int, optional): Size of the buffer to use when shuffling records (only
            used if `is_train` is set to True)
        num_epochs (int, optional): Number of epochs to repeat the dataset (only used if
            `is_train` is set to True)
            
    Returns:
        dataset: Dataset of (`image`, `label`) pairs ready for iteration
    '''
    
    # Cut off invalid values for `synset_level`
    synset_level = max(0, min(ILSVRC2012_SPECIES_LABEL_LEVEL, synset_level))
    
    # Get the number of categories on the given synset level (the `depth` of the one-hot encoding below)
    label_depth = len(ILSVRC2012_SYNSET_MAP.get_all_synsets_of_level(synset_level))

    # Convert each record to the form (image, label) and encode the latter at the same time
    dataset = dataset.map(
        lambda record: (record[ILSVRC2012_IMG_KEY],
                        tf_ilsvrc2012_resolve_hypernym_encoded(record[ILSVRC2012_LABEL_KEY], synset_level)),
        num_parallel_calls=tf.data.experimental.AUTOTUNE)
    
    if (synset_level != 0) and (hypernym is not None):
        # Check whether `hypernym` is compatible with `synset_level`
        if (ILSVRC2012_SYNSET_MAP.synset_level(hypernym) == synset_level - 1):
            hyponyms = ILSVRC2012_SYNSET_MAP.hyponyms(hypernym)

            if hyponyms:
                # Apply a filter with regard to the parental (coarse) category (i.e. filter out all
                # images that don't belong to the given parent category)
                dataset = dataset.filter(
                    lambda _, label: tf.math.reduce_any(
                        [tf.math.equal(label, ilsvrc2012_encode(hyponym)[0]) for hyponym in hyponyms]))
    
    # One-hot encode the remaining entries
    dataset = dataset.map(
        lambda image, label: (image, tf.one_hot(label, label_depth)),
        num_parallel_calls=tf.data.experimental.AUTOTUNE)

    # Parse the raw records into images and labels and augment the dataset as described in
    # (Krizhevsky et al., 2012) and (Sermanet et al., 2014)
    
    # Repeat each image twice (once per flip)
    dataset = dataset.enumerate().interleave(
        lambda _, record: tf.data.Dataset.from_tensors(record).repeat(2),
        cycle_length=1,
        num_parallel_calls=tf.data.experimental.AUTOTUNE)
    
    # General preprocessing applicable to all images
    dataset = dataset.enumerate().map(
        lambda idx, record: general_preprocessing(record, flip=tf.dtypes.cast(idx % 2, tf.dtypes.bool)),
        num_parallel_calls=tf.data.experimental.AUTOTUNE)
        
    # Mode dependent augmentation
    if is_train:
        # Repeat each image `num_rnd_crops` times (once per crop)
        dataset = dataset.enumerate().interleave(
            lambda _, record: tf.data.Dataset.from_tensors(record).repeat(num_rnd_crops),
            cycle_length=1,
            num_parallel_calls=tf.data.experimental.AUTOTUNE)
    
        dataset = dataset.enumerate().map(
            lambda _, record: train_preprocessing(record),
            num_parallel_calls=tf.data.experimental.AUTOTUNE)
    else:
        # Repeat each image five times (once per crop)
        dataset = dataset.enumerate().interleave(
            lambda _, record: tf.data.Dataset.from_tensors(record).repeat(5),
            cycle_length=1,
            num_parallel_calls=tf.data.experimental.AUTOTUNE)
        
        dataset = dataset.enumerate().map(
            lambda idx, record: test_preprocessing(record, crop_mode=idx%5),
            num_parallel_calls=tf.data.experimental.AUTOTUNE)
    
    if is_train:
        # Shuffle records before repeating to respect epoch boundaries
        dataset = dataset.shuffle(buffer_size=shuffle_buffer_size)
        # Repeats the dataset for the number of epochs to train
        dataset = dataset.repeat(num_epochs)
    
    # Batching
    dataset = dataset.batch(batch_size)

    # Operations between the final prefetch and the get_next call to the iterator will happen
    # synchronously during run time. Manual prefetching at this point backgrounds all of the above
    # processing work and helps keep it out of the critical training path.
    dataset = dataset.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
    
    return dataset

In [None]:
# Benchmark the performance of the preprocessing pipeline

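# Number of samples to use for the benchmark (higher numbers reduce the bias introduced by
# one-time operations, such as filling the shuffle buffer, but increase the computation time).
# Note: In training mode, the processed pipeline emits `num_rnd_crops` augmented records per raw
# image, so the processed timing below is per emitted record rather than per raw image.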
NUM_SAMPLES = 1000

ilsvrc2012_train = process_and_augment(ilsvrc2012_train_raw,
                                       batch_size=1,
                                       synset_level=ILSVRC2012_SPECIES_LABEL_LEVEL,
                                       hypernym='aquatic_bird',
                                       is_train=True,
                                       num_rnd_crops=5,
                                       shuffle_buffer_size=10000,
                                       num_epochs=1)

start_time_raw = time.perf_counter()
for record in ilsvrc2012_train_raw.take(NUM_SAMPLES):
    continue
end_time_raw = time.perf_counter()

mean_time_raw = (end_time_raw - start_time_raw) / NUM_SAMPLES
print('Mean time per sample (raw): {} s'.format(mean_time_raw))

start_time_processed = time.perf_counter()
for record in ilsvrc2012_train.take(NUM_SAMPLES):
    continue
end_time_processed = time.perf_counter()

mean_time_processed = (end_time_processed - start_time_processed) / NUM_SAMPLES
print('Mean time per sample (preprocessed): {} s'.format(mean_time_processed))

mean_time_diff = mean_time_processed - mean_time_raw
print('Time difference per sample: {} s'.format(mean_time_diff))
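
The measurement above can be wrapped in a small reusable helper. The following sketch (the name `mean_time_per_element` and its `warmup` parameter are hypothetical, not part of the notebook) first consumes a configurable number of warm-up elements, so that one-time costs such as filling a shuffle buffer are excluded from the average. It then times the next `num_samples` elements with `time.perf_counter`. It accepts any Python iterable, which includes a `tf.data.Dataset` in eager mode.

```python
import itertools
import time


def mean_time_per_element(iterable, num_samples, warmup=0):
    """Returns the mean wall-clock time (in seconds) needed to fetch one element.

    The first `warmup` elements are consumed but not timed, which excludes
    one-time setup costs from the average.
    """
    if num_samples <= 0:
        raise ValueError('`num_samples` must be greater than zero.')
    iterator = iter(iterable)
    # Consume (and discard) the warm-up elements
    for _ in itertools.islice(iterator, warmup):
        pass
    start = time.perf_counter()
    count = 0
    for _ in itertools.islice(iterator, num_samples):
        count += 1
    end = time.perf_counter()
    if count == 0:
        raise ValueError('The iterable yielded no elements after the warm-up.')
    # Divide by the number of elements actually fetched (the iterable may be shorter)
    return (end - start) / count


# Usage with a slow generator standing in for a data pipeline
def slow_source(n, delay):
    for i in range(n):
        time.sleep(delay)
        yield i


print(mean_time_per_element(slow_source(20, 0.001), num_samples=10, warmup=2))
```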

---
## Model template (VGG)
<a id='template'></a>

The VGG architecture (Simonyan et al., 2015) serves as the basic network architecture for both the [composed model](#compnet) and the [benchmark](#benchmark). This section provides a template for generating instances of this architecture with customizable hyperparameters, such as the number of hidden layers or the number of neurons per layer.

The basic network architecture was selected according to the following criteria:
- a) Simplicity / leanness (i.e. no prior knowledge built into the architecture, for the same [reasons](#compnet) given when discussing the composed model's architecture)
- b) Availability of reference values in the literature
- c) Scalability (so that the same architecture can be used for both the composed model and the benchmark, ensuring comparability)
- d) Competitive contemporary performance

In [None]:
def vgg_model_from_template(input_shape,
                            num_classes,
                            num_conv_layers,
                            num_conv_channels,
                            num_fc_layers,
                            num_fc_neurons):
    '''Constructs a VGG network as described in (Simonyan et al., 2015) from template
    
    Constructs a VGG network as described in (Simonyan et al., 2015) with customizable hyperparameters
    from a template model
    
    Args:
        input_shape (list): Dimensions of the input to the first layer (e.g. [224, 224, 3])
        num_classes (int): Number of classes (translates to dimensions of the output)
        num_conv_layers (int): Number of convolutional layers
        num_conv_channels: Number of channels per convolutional layer; can be specified either as a
            list with one entry for every convolutional layer or as a single int (in case of the
            latter, the same number of channels is used for all convolutional layers)
        num_fc_layers (int): Number of fully connected layers
        num_fc_neurons: Number of neurons per fully connected layer; can be specified either as a
            list with one entry for every fully connected layer or as a single int (in case of the
            latter, the same number of neurons is used for all fully connected layers)
    
    Returns:
        model: VGG network with the specified hyperparameters
        
    Raises:
        TypeError: If one of the input arguments is of the wrong type
        ValueError: If invalid values are specified for one or more input arguments
    '''
    
    if not isinstance(input_shape, list):
        raise TypeError(
            '`input_shape` is expected to be of type `list`. Is of type {}.\n'.format(type(input_shape))
        )
    if not input_shape:
        raise ValueError(
            '`input_shape` may not be empty.\n'
        )
    if any([dim <= 0 for dim in input_shape]):
        raise ValueError(
            'All entries of `input_shape` must be greater than zero.\n'
        )
    if not isinstance(num_classes, int):
        raise TypeError(
            '`num_classes` is expected to be of type `int`. Is of type {}.\n'.format(type(num_classes))
        )
    if num_classes <= 0:
        raise ValueError(
            '`num_classes` must be greater than zero.\n'
        )
    if not isinstance(num_conv_layers, int):
        raise TypeError(
            '`num_conv_layers` is expected to be of type `int`. Is of type {}.\n'.format(type(num_conv_layers))
        )
    if num_conv_layers <= 0:
        raise ValueError(
            '`num_conv_layers` must be greater than zero.\n'
        )
    if not (isinstance(num_conv_channels, int) or isinstance(num_conv_channels, list)):
        raise TypeError(
            '`num_conv_channels` is expected to be either of type `int` or `list`. '
            'Is of type {}.\n'.format(type(num_conv_channels))
        )
    if isinstance(num_conv_channels, int):
        if num_conv_channels <= 0:
            raise ValueError(
                '`num_conv_channels` must be greater than zero.\n'
            )
    if isinstance(num_conv_channels, list):
        if len(num_conv_channels) != num_conv_layers:
            raise ValueError(
                'The number of entries of `num_conv_channels` must match `num_conv_layers`.\n'
            )
        if not all([isinstance(num_channels, int) for num_channels in num_conv_channels]):
            raise TypeError(
                'All entries of `num_conv_channels` must be of type `int`.\n'
            )
        if any([num_channels <= 0 for num_channels in num_conv_channels]):
            raise ValueError(
                'All entries of `num_conv_channels` must be greater than zero.\n'
            )
    if not isinstance(num_fc_layers, int):
        raise TypeError(
            '`num_fc_layers` is expected to be of type `int`. Is of type {}.\n'.format(type(num_fc_layers))
        )
    if num_fc_layers <= 0:
        raise ValueError(
            '`num_fc_layers` must be greater than zero.\n'
        )
    if not (isinstance(num_fc_neurons, int) or isinstance(num_fc_neurons, list)):
        raise TypeError(
            '`num_fc_neurons` is expected to be either of type `int` or `list`. '
            'Is of type {}.\n'.format(type(num_fc_neurons))
        )
    if isinstance(num_fc_neurons, int):
        if num_fc_neurons <= 0:
            raise ValueError(
                '`num_fc_neurons` must be greater than zero.\n'
            )
    if isinstance(num_fc_neurons, list):
        if len(num_fc_neurons) != num_fc_layers:
            raise ValueError(
                'The number of entries of `num_fc_neurons` must match `num_fc_layers`.\n'
            )
        if not all([isinstance(num_neurons, int) for num_neurons in num_fc_neurons]):
            raise TypeError(
                'All entries of `num_fc_neurons` must be of type `int`.\n'
            )
        if any([num_neurons <= 0 for num_neurons in num_fc_neurons]):
            raise ValueError(
                'All entries of `num_fc_neurons` must be greater than zero.\n'
            )
            
    # Convert `num_conv_channels` and / or `num_fc_neurons` to a list if provided as a single int
    if isinstance(num_conv_channels, int):
        num_conv_channels = [num_conv_channels] * num_conv_layers
    if isinstance(num_fc_neurons, int):
        num_fc_neurons = [num_fc_neurons] * num_fc_layers
        
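    # Define the indices of the convolutional layers before which MaxPool layers are inserted
    # (a final MaxPool layer always follows the last convolutional layer) based on `num_conv_layers`,
    # in accordance with configurations A, B, D and E of (Simonyan et al., 2015)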
    if num_conv_layers <= 8:
        MAXPOOL_LAYERS = [1, 2, 4, 6]
    elif num_conv_layers <= 10:
        MAXPOOL_LAYERS = [2, 4, 6, 8]
    elif num_conv_layers <= 13:
        MAXPOOL_LAYERS = [2, 4, 7, 10]
    else:
        MAXPOOL_LAYERS = [2, 4, 8, 12]
    
    # Construct the network with the specified hyperparameters from a model template
    # Note: For weight regularization, Dropout is combined with an L2 penalty (factor 5e-4) on all
    # kernels and biases. (Srivastava et al., 2014) report Dropout in combination with a MaxNorm
    # constraint to be most effective; note that no MaxNorm constraint is applied in the layers
    # below. Furthermore, we also apply Dropout to the convolutional layers to facilitate the
    # discovery of informative features as described in (Park et al., 2016).

    # Input specification
    inputs = tf.keras.Input(shape=input_shape)
    
    # Convolutional layer
    x = tf.keras.layers.Conv2D(filters=num_conv_channels[0],
                               kernel_size=(3, 3),
                               strides=1,
                               padding='same',
                               activation='relu',
                               use_bias=True,
                               kernel_regularizer=tf.keras.regularizers.l2(l=5e-4),
                               bias_regularizer=tf.keras.regularizers.l2(l=5e-4))(inputs)
    
    for layer in range(1, num_conv_layers):
        # Insert MaxPool layers in accordance with (Simonyan et al., 2015), each followed by Dropout
        if layer in MAXPOOL_LAYERS:
            # MaxPool layer
            x = tf.keras.layers.MaxPool2D(pool_size=(2, 2),
                                          strides=(2, 2),
                                          padding='same')(x)

            # Dropout layer
            x = tf.keras.layers.Dropout(rate=0.1)(x)
        
        # Convolutional layer
        x = tf.keras.layers.Conv2D(filters=num_conv_channels[layer],
                                   kernel_size=(3, 3),
                                   strides=1,
                                   padding='same',
                                   activation='relu',
                                   use_bias=True,
                                   kernel_regularizer=tf.keras.regularizers.l2(l=5e-4),
                                   bias_regularizer=tf.keras.regularizers.l2(l=5e-4))(x)

    # MaxPool layer
    x = tf.keras.layers.MaxPool2D(pool_size=(2, 2),
                                  strides=(2, 2),
                                  padding='same')(x)
    
    # Transition between convolutional and fully connected layers
    x = tf.keras.layers.Flatten()(x)

    # Dropout layer
    x = tf.keras.layers.Dropout(rate=0.5)(x)
    
    for layer in range(num_fc_layers):
        # Fully connected layer
        x = tf.keras.layers.Dense(units=num_fc_neurons[layer],
                                  activation='relu',
                                  use_bias=True,
                                  kernel_regularizer=tf.keras.regularizers.l2(l=5e-4),
                                  bias_regularizer=tf.keras.regularizers.l2(l=5e-4))(x)

        # Dropout layer
        x = tf.keras.layers.Dropout(rate=0.5)(x)

    # Softmax output layer
    outputs = tf.keras.layers.Dense(units=num_classes,
                                    activation='softmax',
                                    use_bias=True,
                                    kernel_regularizer=tf.keras.regularizers.l2(l=5e-4),
                                    bias_regularizer=tf.keras.regularizers.l2(l=5e-4))(x)
    
    return tf.keras.Model(inputs=inputs, outputs=outputs)
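
To make the pooling placement easier to verify, the sketch below reproduces, in plain Python and without building a Keras model, the layer sequence that `vgg_model_from_template` generates for the convolutional part: the same `MAXPOOL_LAYERS` rule, the expansion of a single int into a per-layer list, and the final MaxPool layer. The helper names `maxpool_indices` and `vgg_conv_plan` are hypothetical. For the 8-layer configuration used by all CompNet sub-modules, it yields five MaxPool layers, so a 224×224 input is reduced to 7×7 before flattening.

```python
def maxpool_indices(num_conv_layers):
    """Indices of the conv layers *before* which a MaxPool layer is inserted
    (mirrors the `MAXPOOL_LAYERS` selection in `vgg_model_from_template`)."""
    if num_conv_layers <= 8:
        return [1, 2, 4, 6]
    elif num_conv_layers <= 10:
        return [2, 4, 6, 8]
    elif num_conv_layers <= 13:
        return [2, 4, 7, 10]
    return [2, 4, 8, 12]


def vgg_conv_plan(num_conv_layers, num_conv_channels, input_size=224):
    """Returns the conv/pool sequence and the spatial size after the last pool."""
    # Expand a single int to one entry per convolutional layer
    if isinstance(num_conv_channels, int):
        num_conv_channels = [num_conv_channels] * num_conv_layers
    pools = maxpool_indices(num_conv_layers)
    plan, size = [], input_size
    for layer in range(num_conv_layers):
        if layer in pools:
            plan.append('pool')
            size = -(-size // 2)  # 'same' padding with stride 2 -> ceil(size / 2)
        plan.append('conv{}'.format(num_conv_channels[layer]))
    # A final MaxPool layer always follows the last convolutional layer
    plan.append('pool')
    size = -(-size // 2)
    return plan, size


plan, size = vgg_conv_plan(8, [32, 64, 128, 128, 256, 256, 256, 256])
print(' -> '.join(plan))
print('Spatial size before Flatten: {0}x{0}'.format(size))
```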

---
## Composed Network (CompNet)
<a id='compnet'></a>

In this section, we examine the performance of a composed model on the ILSVRC2012 animal subset. 

The structure of the model mirrors the inherent hierarchy of the ILSVRC2012 animal subset: four interconnected "layers" of classifiers (phylum, class, order and species), where the predictions of the first layer are used to route the inputs to the corresponding sub-modules of the second, more specialized layer, and so on. For simplicity, we keep the basic architecture and hyperparameters the same for all sub-modules (i.e. we don't perform individual hyperparameter tuning). Furthermore, for comparability with results reported in the literature as well as with our own benchmark (cf. below), we employ the basic VGG architecture as described by (Simonyan et al., 2015) in a downscaled form, so that the total number of trainable parameters across all sub-modules roughly matches that of the benchmark.
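
Whether the parameter budgets of the composed model and the benchmark actually match can be checked with `model.count_params()` once the models are built. As a rough analytical alternative, the following sketch counts the trainable parameters of one template instance by hand: a 3×3 `Conv2D` layer has 3·3·c_in·c_out weights plus c_out biases, and a `Dense` layer has n_in·n_out weights plus n_out biases (Dropout, Flatten and MaxPool layers have none). The function `count_vgg_params` is hypothetical. It assumes a 224×224×3 input and the pooling rule of `vgg_model_from_template`, and the value `num_classes=10` in the usage line is purely illustrative.

```python
def count_vgg_params(num_classes, num_conv_channels, num_fc_neurons,
                     input_shape=(224, 224, 3)):
    """Analytically counts the trainable parameters of a VGG template instance.

    Assumes 3x3 convolutions with biases, 'same' padding, 2x2/stride-2 MaxPool
    layers placed as in `vgg_model_from_template`, and Dense layers with biases.
    `num_conv_channels` and `num_fc_neurons` are given as per-layer lists.
    """
    num_conv_layers = len(num_conv_channels)
    if num_conv_layers <= 8:
        pools = [1, 2, 4, 6]
    elif num_conv_layers <= 10:
        pools = [2, 4, 6, 8]
    elif num_conv_layers <= 13:
        pools = [2, 4, 7, 10]
    else:
        pools = [2, 4, 8, 12]

    height, width, c_in = input_shape
    total = 0
    for layer, c_out in enumerate(num_conv_channels):
        if layer in pools:
            height, width = -(-height // 2), -(-width // 2)
        total += 3 * 3 * c_in * c_out + c_out  # kernel weights + biases
        c_in = c_out
    # Final MaxPool layer after the last convolution
    height, width = -(-height // 2), -(-width // 2)

    n_in = height * width * c_in  # size of the flattened feature map
    for n_out in list(num_fc_neurons) + [num_classes]:
        total += n_in * n_out + n_out  # weights + biases
        n_in = n_out
    return total


# One CompNet sub-module as configured in this notebook (two FC layers with 128 neurons)
print(count_vgg_params(num_classes=10,
                       num_conv_channels=[32, 64, 128, 128, 256, 256, 256, 256],
                       num_fc_neurons=[128, 128]))  # -> 3929802
```

Most of these parameters sit in the convolutional stack and the first fully connected layer (7·7·256·128 weights), so the count of a sub-module depends only weakly on its number of output classes.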

It should be noted that several promising approaches for improving the resilience of hierarchically composed models can be found in the literature (cf. e.g. (Fergus et al., 2010), (Deng et al., 2014) or (Roy et al., 2019)). We deliberately refrained from incorporating these ideas, since the goal of this project is to examine whether the modularization of networks is a feasible approach in general, and we want to keep our results as unbiased as possible in this regard (i.e. not "unfairly" improving our approach relative to the benchmark). In doing so, however, we leave considerable room for improvement.

### Model
<a id='compnet_model'></a>

Each layer contains as many specialist modules as there are categories in the preceding layer. The downstream models are stored in a list ordered by their respective hypernym label (i.e. 0..n-1, where n is the number of parent categories), so that the prediction of the preceding layer can be used directly as an index for routing between the layers.
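
The routing scheme described above can be sketched in plain Python. Here, each model is replaced by a hypothetical callable that maps an input to a list of class scores. At each level, the arg max of the current module's scores selects the sub-module of the next level from the corresponding list. With the actual Keras models, the scores would come from the models' predictions and the lists would be `class_models`, `order_models` and `species_models`. This only illustrates the indexing idea and is not the notebook's routing implementation.

```python
def argmax(scores):
    """Index of the highest score (the first one in case of ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def route(x, root_model, downstream_layers):
    """Routes `x` through a hierarchy of classifiers.

    `root_model` is a callable returning class scores; `downstream_layers` is a
    list of layers, each a list of callables indexed by the label predicted at
    the preceding layer. Returns the predicted label at every level.
    """
    label = argmax(root_model(x))
    path = [label]
    for layer in downstream_layers:
        # The parent's prediction directly indexes the specialist sub-module
        label = argmax(layer[label](x))
        path.append(label)
    return path


# Hypothetical toy hierarchy: two phyla, each with its own class-level specialist
def toy_phylum(x):
    return [0.2, 0.8] if x > 0 else [0.9, 0.1]


def toy_specialist_0(x):
    return [0.7, 0.3]


def toy_specialist_1(x):
    return [0.1, 0.9]


# Ordered by parent label, so that the phylum prediction can be used as an index
toy_class_layer = [toy_specialist_0, toy_specialist_1]

print(route(1.0, toy_phylum, [toy_class_layer]))   # -> [1, 1]
print(route(-1.0, toy_phylum, [toy_class_layer]))  # -> [0, 0]
```

A consequence of this design is that an error at any level routes the input to the wrong specialist, which can no longer recover the correct label further down.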

#### Phylum layer

In [None]:
phylum_model = vgg_model_from_template(input_shape=[224, 224, 3],
                                       num_classes=ILSVRC2012_NUM_LABELS_PHYLUM_LAYER,
                                       num_conv_layers=8,
                                       num_conv_channels=[32, 64, 128, 128,
                                                          256, 256, 256, 256],
                                       num_fc_layers=2,
                                       num_fc_neurons=128)
phylum_model.summary()

In [None]:
# Use Categorical Cross Entropy as loss function
loss = tf.keras.losses.CategoricalCrossentropy()

# Define metrics to watch during training
metrics = [tf.keras.metrics.CategoricalAccuracy(),
           tf.keras.metrics.TopKCategoricalAccuracy(k=5),
           tf.keras.metrics.CategoricalCrossentropy(),
           tf.keras.metrics.AUC()]

# Use Adam (Kingma et al., 2017) as optimizer during training
# Annotation: We don't set `learning_rate` here as this is automatically handled by the
# LearningRateScheduler (cf. callback section below).
optimizer = tf.keras.optimizers.Adam(beta_1=0.9,
                                     beta_2=0.999,
                                     epsilon=1e-07)

phylum_model.compile(loss=loss, metrics=metrics, optimizer=optimizer)

In [None]:
# Alternatively: Restore model from checkpoint
# phylum_model = tf.keras.models.load_model(CKPT_DIR + 'compnet_phylum')
# phylum_model.summary()

#### Class layer

In [None]:
class_models = []

for idx in range(ILSVRC2012_NUM_LABELS_PHYLUM_LAYER):
    model = vgg_model_from_template(input_shape=[224, 224, 3],
                                    num_classes=ILSVRC2012_NUM_LABELS_CLASS_LAYER,
                                    num_conv_layers=8,
                                    num_conv_channels=[32, 64, 128, 128,
                                                       256, 256, 256, 256],
                                    num_fc_layers=2,
                                    num_fc_neurons=128)
    class_models.append(model)
    
class_models[0].summary()

In [None]:
for idx in range(ILSVRC2012_NUM_LABELS_PHYLUM_LAYER):
    # Use Categorical Cross Entropy as loss function
    loss = tf.keras.losses.CategoricalCrossentropy()

    # Define metrics to watch during training
    metrics = [tf.keras.metrics.CategoricalAccuracy(),
               tf.keras.metrics.TopKCategoricalAccuracy(k=5),
               tf.keras.metrics.CategoricalCrossentropy(),
               tf.keras.metrics.AUC()]

    # Use Adam (Kingma et al., 2017) as optimizer during training
    # Annotation: We don't set `learning_rate` here as this is automatically handled by the
    # LearningRateScheduler (cf. callback section below).
    optimizer = tf.keras.optimizers.Adam(beta_1=0.9,
                                         beta_2=0.999,
                                         epsilon=1e-07)
    
    # Compile the respective sub-module
    class_models[idx].compile(loss=loss, metrics=metrics, optimizer=optimizer)

In [None]:
# Alternatively: Restore models from checkpoint
# class_models = []
# for idx in range(ILSVRC2012_NUM_LABELS_PHYLUM_LAYER):
#     model = tf.keras.models.load_model(CKPT_DIR + 'compnet_class_' + str(idx))
#     class_models.append(model)
#
# class_models[0].summary()

#### Order layer

In [None]:
order_models = []

for idx in range(ILSVRC2012_NUM_LABELS_CLASS_LAYER):
    model = vgg_model_from_template(input_shape=[224, 224, 3],
                                    num_classes=ILSVRC2012_NUM_LABELS_ORDER_LAYER,
                                    num_conv_layers=8,
                                    num_conv_channels=[32, 64, 128, 128,
                                                       256, 256, 256, 256],
                                    num_fc_layers=2,
                                    num_fc_neurons=128)
    order_models.append(model)

order_models[0].summary()

In [None]:
for idx in range(ILSVRC2012_NUM_LABELS_CLASS_LAYER):
    # Use Categorical Cross Entropy as loss function
    loss = tf.keras.losses.CategoricalCrossentropy()

    # Define metrics to watch during training
    metrics = [tf.keras.metrics.CategoricalAccuracy(),
               tf.keras.metrics.TopKCategoricalAccuracy(k=5),
               tf.keras.metrics.CategoricalCrossentropy(),
               tf.keras.metrics.AUC()]

    # Use Adam (Kingma et al., 2017) as optimizer during training
    # Annotation: We don't set `learning_rate` here as this is automatically handled by the
    # LearningRateScheduler (cf. callback section below).
    optimizer = tf.keras.optimizers.Adam(beta_1=0.9,
                                         beta_2=0.999,
                                         epsilon=1e-07)
    
    # Compile the respective sub-module
    order_models[idx].compile(loss=loss, metrics=metrics, optimizer=optimizer)

In [None]:
# Alternatively: Restore models from checkpoint
# order_models = []
# for idx in range(ILSVRC2012_NUM_LABELS_CLASS_LAYER):
#     model = tf.keras.models.load_model(CKPT_DIR + 'compnet_order_' + str(idx))
#     order_models.append(model)
#
# order_models[0].summary()

#### Species layer

In [None]:
species_models = []

for idx in range(ILSVRC2012_NUM_LABELS_ORDER_LAYER):
    model = vgg_model_from_template(input_shape=[224, 224, 3],
                                    num_classes=ILSVRC2012_NUM_LABELS_SPECIES_LAYER,
                                    num_conv_layers=8,
                                    num_conv_channels=[32, 64, 128, 128,
                                                       256, 256, 256, 256],
                                    num_fc_layers=2,
                                    num_fc_neurons=128)
    species_models.append(model)
    
species_models[0].summary()

In [None]:
for idx in range(ILSVRC2012_NUM_LABELS_ORDER_LAYER):
    # Use Categorical Cross Entropy as loss function
    loss = tf.keras.losses.CategoricalCrossentropy()

    # Define metrics to watch during training
    metrics = [tf.keras.metrics.CategoricalAccuracy(),
               tf.keras.metrics.TopKCategoricalAccuracy(k=5),
               tf.keras.metrics.CategoricalCrossentropy(),
               tf.keras.metrics.AUC()]

    # Use Adam (Kingma et al., 2017) as optimizer during training
    # Annotation: We don't set `learning_rate` here as this is automatically handled by the
    # LearningRateScheduler (cf. callback section below).
    optimizer = tf.keras.optimizers.Adam(beta_1=0.9,
                                         beta_2=0.999,
                                         epsilon=1e-07)
    
    # Compile the respective sub-module
    species_models[idx].compile(loss=loss, metrics=metrics, optimizer=optimizer)

In [None]:
# Alternatively: Restore models from checkpoint
# species_models = []
# for idx in range(ILSVRC2012_NUM_LABELS_ORDER_LAYER):
#     model = tf.keras.models.load_model(CKPT_DIR + 'compnet_species_' + str(idx))
#     species_models.append(model)
#
# species_models[0].summary()

### Training
<a id='compnet_train'></a>

#### Phylum layer

In [None]:
ilsvrc2012_train_phylum = process_and_augment(ilsvrc2012_train_raw, batch_size=128, synset_level=ILSVRC2012_PHYLUM_LABEL_LEVEL, hypernym=None, is_train=True, num_rnd_crops=5, shuffle_buffer_size=10000, num_epochs=1)
ilsvrc2012_val_phylum = process_and_augment(ilsvrc2012_val_raw, batch_size=128, synset_level=ILSVRC2012_PHYLUM_LABEL_LEVEL, hypernym=None, is_train=True, num_rnd_crops=5, shuffle_buffer_size=10000, num_epochs=1)

In [None]:
# EarlyStopping: Stop training early if no significant improvement in the monitored quantity is
#     observed for at least `patience` epochs
# LearningRateScheduler: Dynamically adapt the learning rate depending on the training epoch to
#     facilitate accelerated learning during the first few epochs
# ModelCheckpoint: Save the model after each epoch (if `save_best_only` is set to True, only keep
#     the best model with regard to the monitored quantity)
# TensorBoard: Enable TensorBoard visualization
callbacks = [tf.keras.callbacks.EarlyStopping(monitor='val_loss',
                                              min_delta=0.01,
                                              patience=3),
             tf.keras.callbacks.LearningRateScheduler(lambda epoch: 1e-03 if epoch < 2 else (1e-04 if epoch < 4 else 1e-05)),
             tf.keras.callbacks.ModelCheckpoint(filepath=CKPT_DIR + 'compnet_phylum',
                                                monitor='val_loss',
                                                verbose=False,
                                                save_best_only=True),
             tf.keras.callbacks.TensorBoard(log_dir=LOG_DIR,
                                            histogram_freq=1)]

In [None]:
phylum_model.fit(x=ilsvrc2012_train_phylum,
                 epochs=3,
                 verbose=True,
                 callbacks=callbacks,
                 validation_data=ilsvrc2012_val_phylum,
                 shuffle=True,
                 validation_freq=1)

#### Class layer

In [None]:
ilsvrc2012_train_class = [process_and_augment(ilsvrc2012_train_raw, batch_size=128, synset_level=ILSVRC2012_CLASS_LABEL_LEVEL, hypernym=ilsvrc2012_decode(category, ILSVRC2012_PHYLUM_LABEL_LEVEL), is_train=True, num_rnd_crops=5, shuffle_buffer_size=10000, num_epochs=1) for category in range(ILSVRC2012_NUM_LABELS_PHYLUM_LAYER)]
ilsvrc2012_val_class = [process_and_augment(ilsvrc2012_val_raw, batch_size=128, synset_level=ILSVRC2012_CLASS_LABEL_LEVEL, hypernym=ilsvrc2012_decode(category, ILSVRC2012_PHYLUM_LABEL_LEVEL), is_train=True, num_rnd_crops=5, shuffle_buffer_size=10000, num_epochs=1) for category in range(ILSVRC2012_NUM_LABELS_PHYLUM_LAYER)]
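
Each sub-module of the class layer should only be trained on the images belonging to its parent phylum, which is the purpose of passing `hypernym=ilsvrc2012_decode(category, ILSVRC2012_PHYLUM_LABEL_LEVEL)` to `process_and_augment`. Assuming that `process_and_augment` filters the records by hypernym (its filtering code is not part of this section), the effect on the data can be sketched in plain Python with a toy label hierarchy. The record layout, the `TOY_PARENT_OF` mapping and the helper `split_by_hypernym` below are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy hierarchy: class label -> phylum label
TOY_PARENT_OF = {'bird': 'chordate', 'fish': 'chordate', 'insect': 'arthropod'}

# Hypothetical records: (image id, class label)
toy_records = [('img0', 'bird'), ('img1', 'insect'), ('img2', 'fish'), ('img3', 'bird')]


def split_by_hypernym(records, parent_of):
    """Groups records by the hypernym of their label, i.e. one training set
    per specialist sub-module of the next layer."""
    subsets = defaultdict(list)
    for image_id, label in records:
        subsets[parent_of[label]].append((image_id, label))
    return dict(subsets)


toy_subsets = split_by_hypernym(toy_records, TOY_PARENT_OF)
for hypernym, subset in sorted(toy_subsets.items()):
    print(hypernym, subset)
```

By construction, each specialist sees only a fraction of the training images available to the monolithic benchmark.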

In [None]:
for idx in range(ILSVRC2012_NUM_LABELS_PHYLUM_LAYER):
    # EarlyStopping: Stop training early if no significant improvement in the monitored quantity is
    #     observed for at least `patience` epochs
    # LearningRateScheduler: Dynamically adapt the learning rate depending on the training epoch to
    #     facilitate accelerated learning during the first few epochs
    # ModelCheckpoint: Save the model after each epoch (if `save_best_only` is set to True, only keep
    #     the best model with regard to the monitored quantity)
    # TensorBoard: Enable TensorBoard visualization
    callbacks = [tf.keras.callbacks.EarlyStopping(monitor='val_loss',
                                                  min_delta=0.01,
                                                  patience=3),
                 tf.keras.callbacks.LearningRateScheduler(lambda epoch: 1e-03 if epoch < 2 else (1e-04 if epoch < 4 else 1e-05)),
                 tf.keras.callbacks.ModelCheckpoint(filepath=CKPT_DIR + 'compnet_class_' + str(idx),
                                                    monitor='val_loss',
                                                    verbose=False,
                                                    save_best_only=True),
                 tf.keras.callbacks.TensorBoard(log_dir=LOG_DIR,
                                                histogram_freq=1)]
    
    class_models[idx].fit(x=ilsvrc2012_train_class[idx],
                          epochs=3,
                          verbose=True,
                          callbacks=callbacks,
                          validation_data=ilsvrc2012_val_class[idx],
                          shuffle=True,
                          validation_freq=1)

#### Order layer

In [None]:
ilsvrc2012_train_order = [process_and_augment(ilsvrc2012_train_raw, batch_size=128, synset_level=ILSVRC2012_ORDER_LABEL_LEVEL, hypernym=ilsvrc2012_decode(category, ILSVRC2012_CLASS_LABEL_LEVEL), is_train=True, num_rnd_crops=5, shuffle_buffer_size=10000, num_epochs=1) for category in range(ILSVRC2012_NUM_LABELS_CLASS_LAYER)]
ilsvrc2012_val_order = [process_and_augment(ilsvrc2012_val_raw, batch_size=128, synset_level=ILSVRC2012_ORDER_LABEL_LEVEL, hypernym=ilsvrc2012_decode(category, ILSVRC2012_CLASS_LABEL_LEVEL), is_train=True, num_rnd_crops=5, shuffle_buffer_size=10000, num_epochs=1) for category in range(ILSVRC2012_NUM_LABELS_CLASS_LAYER)]

In [None]:
for idx in range(ILSVRC2012_NUM_LABELS_CLASS_LAYER):  
    # EarlyStopping: Stop training early if no significant improvement in the monitored quantity is
    #     observed for at least `patience` epochs
    # LearningRateScheduler: Dynamically adapt the learning rate depending on the training epoch to
    #     facilitate accelerated learning during the first few epochs
    # ModelCheckpoint: Save the model after each epoch (if `save_best_only` is set to True, only keep
    #     the best model with regard to the monitored quantity)
    # TensorBoard: Enable TensorBoard visualization
    callbacks = [tf.keras.callbacks.EarlyStopping(monitor='val_loss',
                                                  min_delta=0.01,
                                                  patience=3),
                 tf.keras.callbacks.LearningRateScheduler(lambda epoch: 1e-03 if epoch < 2 else (1e-04 if epoch < 4 else 1e-05)),
                 tf.keras.callbacks.ModelCheckpoint(filepath=CKPT_DIR + 'compnet_order_' + str(idx),
                                                    monitor='val_loss',
                                                    verbose=False,
                                                    save_best_only=True),
                 tf.keras.callbacks.TensorBoard(log_dir=LOG_DIR,
                                                histogram_freq=1)]
    
    order_models[idx].fit(x=ilsvrc2012_train_order[idx],
                          epochs=3,
                          verbose=True,
                          callbacks=callbacks,
                          validation_data=ilsvrc2012_val_order[idx],
                          shuffle=True,
                          validation_freq=1)

#### Species layer

In [None]:
ilsvrc2012_train_species = [process_and_augment(ilsvrc2012_train_raw, batch_size=128, synset_level=ILSVRC2012_SPECIES_LABEL_LEVEL, hypernym=ilsvrc2012_decode(category, ILSVRC2012_ORDER_LABEL_LEVEL), is_train=True, num_rnd_crops=5, shuffle_buffer_size=10000, num_epochs=1) for category in range(ILSVRC2012_NUM_LABELS_ORDER_LAYER)]
ilsvrc2012_val_species = [process_and_augment(ilsvrc2012_val_raw, batch_size=128, synset_level=ILSVRC2012_SPECIES_LABEL_LEVEL, hypernym=ilsvrc2012_decode(category, ILSVRC2012_ORDER_LABEL_LEVEL), is_train=True, num_rnd_crops=5, shuffle_buffer_size=10000, num_epochs=1) for category in range(ILSVRC2012_NUM_LABELS_ORDER_LAYER)]

In [None]:
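# Note: Starting at index 16 and passing `initial_epoch=3` / `epochs=5` below appears to resume a
# previously interrupted run (with the sub-modules restored via the checkpoint cell above). To
# train all species sub-modules from scratch, iterate over `range(ILSVRC2012_NUM_LABELS_ORDER_LAYER)`
# and drop `initial_epoch` (so that training starts at epoch 0).
for idx in range(16, ILSVRC2012_NUM_LABELS_ORDER_LAYER):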
    # EarlyStopping: Stop training early if no significant improvement in the monitored quantity is
    #     observed for at least `patience` epochs
    # LearningRateScheduler: Dynamically adapt the learning rate depending on the training epoch to
    #     facilitate accelerated learning during the first few epochs
    # ModelCheckpoint: Save the model after each epoch (if `save_best_only` is set to True, only keep
    #     the best model with regard to the monitored quantity)
    # TensorBoard: Enable TensorBoard visualization
    callbacks = [tf.keras.callbacks.EarlyStopping(monitor='val_loss',
                                                  min_delta=0.01,
                                                  patience=3),
                 tf.keras.callbacks.LearningRateScheduler(lambda epoch: 1e-03 if epoch < 2 else (1e-04 if epoch < 4 else 1e-05)),
                 tf.keras.callbacks.ModelCheckpoint(filepath=CKPT_DIR + 'compnet_species_' + str(idx),
                                                    monitor='val_loss',
                                                    verbose=False,
                                                    save_best_only=True),
                 tf.keras.callbacks.TensorBoard(log_dir=LOG_DIR,
                                                histogram_freq=1)]
    
    species_models[idx].fit(x=ilsvrc2012_train_species[idx],
                            epochs=5,
                            initial_epoch=3,
                            verbose=True,
                            callbacks=callbacks,
                            validation_data=ilsvrc2012_val_species[idx],
                            shuffle=True,
                            validation_freq=1)

### Testing
<a id='compnet_test'></a>

#### Phylum layer

In [None]:
ilsvrc2012_test_phylum = process_and_augment(ilsvrc2012_test_raw, batch_size=10, synset_level=ILSVRC2012_PHYLUM_LABEL_LEVEL, hypernym=None, is_train=False)

In [None]:
# Define metrics to watch during the evaluation of the model on the test data set

# Use the same metrics as for the training
# test_metrics = metrics

# Use different metrics than during the training
test_metrics = [tf.keras.metrics.CategoricalAccuracy(name='CategoricalAccuracy'),
                tf.keras.metrics.TopKCategoricalAccuracy(k=5, name='TopKCategoricalAccuracy'),
                tf.keras.metrics.CategoricalCrossentropy(name='CategoricalCrossentropy'),
                tf.keras.metrics.AUC(name='AUC')]

In [None]:
# Reset all metrics before starting the evaluation
for metric in test_metrics:
    metric.reset_states()

for (imgs, ground_truths) in ilsvrc2012_test_phylum:
    # Generate predictions for each image in the current batch
    batch_scores = phylum_model.predict(imgs)
    
    # Since the test data set is constructed such that all images in one batch are augmented
    # versions of the same base image, we can simply average the individual scores to obtain the
    # final prediction for the respective base image, following (Goodfellow et al., 2013) and
    # (Krizhevsky et al., 2012).
    prediction = tf.math.reduce_mean(batch_scores, axis=0)
    
    # Update the metrics with the result for the current base image; as all images in one batch
    # originate from the same base image (cf. above), their ground truths are identical as well.
    # Note: `tf.expand_dims` is a workaround for compatibility with
    # `tf.keras.metrics.TopKCategoricalAccuracy`, since the latter does not accept one-dimensional
    # inputs as of TensorFlow version v2.2.0-rc3.
    for metric in test_metrics:
        metric.update_state(tf.expand_dims(ground_truths[0], 0), tf.expand_dims(prediction, 0))

print('Phylum Model')
print()
print('==================================================')
print()
print('Final results:')
for metric in test_metrics:
    print('{}: {}'.format(metric.name, metric.result().numpy()))
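
The evaluation loop above relies on every batch containing only augmented views of a single base image, so that `tf.math.reduce_mean(batch_scores, axis=0)` yields one averaged score vector per base image. This only holds if `batch_size` equals the number of views generated per base image. The following framework-free sketch illustrates the averaging step and the resulting top-1 / top-k decision. The helper names `average_scores` and `top_k` are hypothetical, and the score values are made up for illustration.

```python
def average_scores(view_scores):
    """Averages the class-score vectors of several augmented views of one image
    (equivalent to a mean over axis 0)."""
    num_views = len(view_scores)
    return [sum(column) / num_views for column in zip(*view_scores)]


def top_k(scores, k):
    """Indices of the k highest scores, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


# Three views of one image: the first view alone would predict class 0, but the
# averaged scores favour class 1.
toy_views = [[0.5, 0.4, 0.1],
             [0.2, 0.7, 0.1],
             [0.3, 0.6, 0.1]]
mean_scores = average_scores(toy_views)
print(mean_scores)
print(top_k(mean_scores, 1), top_k(mean_scores, 2))
```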

#### Class layer

In [None]:
ilsvrc2012_test_class = [process_and_augment(ilsvrc2012_test_raw, batch_size=10, synset_level=ILSVRC2012_CLASS_LABEL_LEVEL, hypernym=ilsvrc2012_decode(category, ILSVRC2012_PHYLUM_LABEL_LEVEL), is_train=False) for category in range(ILSVRC2012_NUM_LABELS_PHYLUM_LAYER)]

In [None]:
for idx in range(ILSVRC2012_NUM_LABELS_PHYLUM_LAYER):
    # Define metrics to watch during the evaluation of the model on the test data set

    # Use the same metrics as for the training
    # test_metrics = metrics

    # Use different metrics than during the training
    test_metrics = [tf.keras.metrics.CategoricalAccuracy(name='CategoricalAccuracy'),
                    tf.keras.metrics.TopKCategoricalAccuracy(k=5, name='TopKCategoricalAccuracy'),
                    tf.keras.metrics.CategoricalCrossentropy(name='CategoricalCrossentropy'),
                    tf.keras.metrics.AUC(name='AUC')]

    # Reset all metrics before starting the evaluation
    for metric in test_metrics:
        metric.reset_states()

    for (imgs, ground_truths) in ilsvrc2012_test_class[idx]:
        # Generate predictions for each image in the current batch
        batch_scores = class_models[idx].predict(imgs)

        # Since each batch of the test data set consists of augmented versions of the same base
        # image, the individual scores can simply be averaged to obtain the final prediction for
        # that base image, following (Goodfellow et al., 2013) and (Krizhevsky et al., 2012).
        prediction = tf.math.reduce_mean(batch_scores, axis=0)

        # Update the metrics with the result for the current base image; as all images in a batch
        # originate from the same base image (cf. above), their ground truths are identical as well.
        # Note: `tf.expand_dims` adds a batch dimension as a workaround for
        # `tf.keras.metrics.TopKCategoricalAccuracy`, which does not accept one-dimensional
        # inputs as of TensorFlow v2.2.0-rc3.
        for metric in test_metrics:
            metric.update_state(tf.expand_dims(ground_truths[0], 0), tf.expand_dims(prediction, 0))

    print('Class Model #{}'.format(ilsvrc2012_decode(idx, ILSVRC2012_PHYLUM_LABEL_LEVEL)))
    print()
    print('==================================================')
    print()
    print('Final results:')
    for metric in test_metrics:
        print('{}: {}'.format(metric.name, metric.result().numpy()))
    print()
    print('====================================================================================================')
    print()
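Each class-layer sub-module is evaluated only on the test images belonging to its phylum, selected through the `hypernym` argument of `process_and_augment`. That function is not part of this section, so the following is only a sketch of the assumed filtering step on plain Python data; the toy taxonomy `CLASS_TO_PHYLUM` and the helper `filter_by_hypernym` are hypothetical. Labels are assumed to keep their layer-wide index, which is consistent with the composed model below using the argmax of a class sub-module's output to index `order_models`.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical toy taxonomy: class label -> phylum label (layer-wide indices)
CLASS_TO_PHYLUM: Dict[int, int] = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1}


def filter_by_hypernym(samples: Iterable[Tuple[str, int]],
                       child_to_parent: Dict[int, int],
                       hypernym: int) -> List[Tuple[str, int]]:
    """Keep only the samples whose label descends from `hypernym`.

    Labels keep their layer-wide index, so a sub-module's output can be used
    directly to select the next sub-module in the hierarchy.
    """
    return [(x, y) for (x, y) in samples if child_to_parent[y] == hypernym]


samples = [("img_a", 0), ("img_b", 3), ("img_c", 1), ("img_d", 4)]
per_phylum = [filter_by_hypernym(samples, CLASS_TO_PHYLUM, p) for p in range(2)]
print(per_phylum[0])  # [('img_a', 0), ('img_c', 1)]
print(per_phylum[1])  # [('img_b', 3), ('img_d', 4)]
```

Because of this filtering, the per-sub-module scores above measure each module in isolation, assuming its routing input was correct; the composed evaluation further below drops that assumption.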

#### Order layer

In [None]:
ilsvrc2012_test_order = [process_and_augment(ilsvrc2012_test_raw, batch_size=10, synset_level=ILSVRC2012_ORDER_LABEL_LEVEL, hypernym=ilsvrc2012_decode(category, ILSVRC2012_CLASS_LABEL_LEVEL), is_train=False) for category in range(ILSVRC2012_NUM_LABELS_CLASS_LAYER)]

In [None]:
for idx in range(ILSVRC2012_NUM_LABELS_CLASS_LAYER):
    # Define metrics to watch during the evaluation of the model on the test data set

    # Use the same metrics as for the training
    # test_metrics = metrics

    # Use different metrics than during the training
    test_metrics = [tf.keras.metrics.CategoricalAccuracy(name='CategoricalAccuracy'),
                    tf.keras.metrics.TopKCategoricalAccuracy(k=5, name='TopKCategoricalAccuracy'),
                    tf.keras.metrics.CategoricalCrossentropy(name='CategoricalCrossentropy'),
                    tf.keras.metrics.AUC(name='AUC')]

    # Reset all metrics before starting the evaluation
    for metric in test_metrics:
        metric.reset_states()

    for (imgs, ground_truths) in ilsvrc2012_test_order[idx]:
        # Generate predictions for each image in the current batch
        batch_scores = order_models[idx].predict(imgs)

        # Since each batch of the test data set consists of augmented versions of the same base
        # image, the individual scores can simply be averaged to obtain the final prediction for
        # that base image, following (Goodfellow et al., 2013) and (Krizhevsky et al., 2012).
        prediction = tf.math.reduce_mean(batch_scores, axis=0)

        # Update the metrics with the result for the current base image; as all images in a batch
        # originate from the same base image (cf. above), their ground truths are identical as well.
        # Note: `tf.expand_dims` adds a batch dimension as a workaround for
        # `tf.keras.metrics.TopKCategoricalAccuracy`, which does not accept one-dimensional
        # inputs as of TensorFlow v2.2.0-rc3.
        for metric in test_metrics:
            metric.update_state(tf.expand_dims(ground_truths[0], 0), tf.expand_dims(prediction, 0))

    print('Order Model #{}'.format(ilsvrc2012_decode(idx, ILSVRC2012_CLASS_LABEL_LEVEL)))
    print()
    print('==================================================')
    print()
    print('Final results:')
    for metric in test_metrics:
        print('{}: {}'.format(metric.name, metric.result().numpy()))
    print()
    print('====================================================================================================')
    print()

#### Species layer

In [None]:
ilsvrc2012_test_species = [process_and_augment(ilsvrc2012_test_raw, batch_size=10, synset_level=ILSVRC2012_SPECIES_LABEL_LEVEL, hypernym=ilsvrc2012_decode(category, ILSVRC2012_ORDER_LABEL_LEVEL), is_train=False) for category in range(ILSVRC2012_NUM_LABELS_ORDER_LAYER)]

In [None]:
for idx in range(ILSVRC2012_NUM_LABELS_ORDER_LAYER):
    # Define metrics to watch during the evaluation of the model on the test data set

    # Use the same metrics as for the training
    # test_metrics = metrics

    # Use different metrics than during the training
    test_metrics = [tf.keras.metrics.CategoricalAccuracy(name='CategoricalAccuracy'),
                    tf.keras.metrics.TopKCategoricalAccuracy(k=5, name='TopKCategoricalAccuracy'),
                    tf.keras.metrics.CategoricalCrossentropy(name='CategoricalCrossentropy'),
                    tf.keras.metrics.AUC(name='AUC')]

    # Reset all metrics before starting the evaluation
    for metric in test_metrics:
        metric.reset_states()

    for (imgs, ground_truths) in ilsvrc2012_test_species[idx]:
        # Generate predictions for each image in the current batch
        batch_scores = species_models[idx].predict(imgs)

        # Since each batch of the test data set consists of augmented versions of the same base
        # image, the individual scores can simply be averaged to obtain the final prediction for
        # that base image, following (Goodfellow et al., 2013) and (Krizhevsky et al., 2012).
        prediction = tf.math.reduce_mean(batch_scores, axis=0)

        # Update the metrics with the result for the current base image; as all images in a batch
        # originate from the same base image (cf. above), their ground truths are identical as well.
        # Note: `tf.expand_dims` adds a batch dimension as a workaround for
        # `tf.keras.metrics.TopKCategoricalAccuracy`, which does not accept one-dimensional
        # inputs as of TensorFlow v2.2.0-rc3.
        for metric in test_metrics:
            metric.update_state(tf.expand_dims(ground_truths[0], 0), tf.expand_dims(prediction, 0))

    print('Species Model #{}'.format(ilsvrc2012_decode(idx, ILSVRC2012_ORDER_LABEL_LEVEL)))
    print()
    print('==================================================')
    print()
    print('Final results:')
    for metric in test_metrics:
        print('{}: {}'.format(metric.name, metric.result().numpy()))
    print()
    print('====================================================================================================')
    print()

#### Composed model

In [None]:
ilsvrc2012_test_composed = process_and_augment(ilsvrc2012_test_raw, batch_size=10, synset_level=ILSVRC2012_SPECIES_LABEL_LEVEL, hypernym=None, is_train=False)

In [None]:
# Define metrics to watch during the evaluation of the model on the test data set

# Use the same metrics as for the training
# test_metrics = metrics

# Use different metrics than during the training
test_metrics = [tf.keras.metrics.CategoricalAccuracy(name='CategoricalAccuracy'),
                tf.keras.metrics.TopKCategoricalAccuracy(k=5, name='TopKCategoricalAccuracy'),
                tf.keras.metrics.CategoricalCrossentropy(name='CategoricalCrossentropy'),
                tf.keras.metrics.AUC(name='AUC')]

In [None]:
# Reset all metrics before starting the evaluation
for metric in test_metrics:
    metric.reset_states()
    
# Initialize additional custom metrics to watch during the evaluation

# Overall label distribution (predicted and ground truth)
dist_ground_truth = [0] * ILSVRC2012_NUM_LABELS_SPECIES_LAYER
dist_predicted = [0] * ILSVRC2012_NUM_LABELS_SPECIES_LAYER

# Confusion matrices (rows: ground truth label, columns: predicted label) for each taxonomic layer
dist_predicted_phylum = [[0] * ILSVRC2012_NUM_LABELS_PHYLUM_LAYER for _ in range(ILSVRC2012_NUM_LABELS_PHYLUM_LAYER)]
dist_predicted_class = [[0] * ILSVRC2012_NUM_LABELS_CLASS_LAYER for _ in range(ILSVRC2012_NUM_LABELS_CLASS_LAYER)]
dist_predicted_order = [[0] * ILSVRC2012_NUM_LABELS_ORDER_LAYER for _ in range(ILSVRC2012_NUM_LABELS_ORDER_LAYER)]
dist_predicted_species = [[0] * ILSVRC2012_NUM_LABELS_SPECIES_LAYER for _ in range(ILSVRC2012_NUM_LABELS_SPECIES_LAYER)]

# Fine label distribution for each coarse category (predicted and ground truth)
dist_ground_truth_phylum_class = [[0] * ILSVRC2012_NUM_LABELS_CLASS_LAYER for _ in range(ILSVRC2012_NUM_LABELS_PHYLUM_LAYER)]
dist_predicted_phylum_class = [[0] * ILSVRC2012_NUM_LABELS_CLASS_LAYER for _ in range(ILSVRC2012_NUM_LABELS_PHYLUM_LAYER)]
dist_ground_truth_phylum_order = [[0] * ILSVRC2012_NUM_LABELS_ORDER_LAYER for _ in range(ILSVRC2012_NUM_LABELS_PHYLUM_LAYER)]
dist_predicted_phylum_order = [[0] * ILSVRC2012_NUM_LABELS_ORDER_LAYER for _ in range(ILSVRC2012_NUM_LABELS_PHYLUM_LAYER)]
dist_ground_truth_phylum_species = [[0] * ILSVRC2012_NUM_LABELS_SPECIES_LAYER for _ in range(ILSVRC2012_NUM_LABELS_PHYLUM_LAYER)]
dist_predicted_phylum_species = [[0] * ILSVRC2012_NUM_LABELS_SPECIES_LAYER for _ in range(ILSVRC2012_NUM_LABELS_PHYLUM_LAYER)]
dist_ground_truth_class_order = [[0] * ILSVRC2012_NUM_LABELS_ORDER_LAYER for _ in range(ILSVRC2012_NUM_LABELS_CLASS_LAYER)]
dist_predicted_class_order = [[0] * ILSVRC2012_NUM_LABELS_ORDER_LAYER for _ in range(ILSVRC2012_NUM_LABELS_CLASS_LAYER)]
dist_ground_truth_class_species = [[0] * ILSVRC2012_NUM_LABELS_SPECIES_LAYER for _ in range(ILSVRC2012_NUM_LABELS_CLASS_LAYER)]
dist_predicted_class_species = [[0] * ILSVRC2012_NUM_LABELS_SPECIES_LAYER for _ in range(ILSVRC2012_NUM_LABELS_CLASS_LAYER)]
dist_ground_truth_order_species = [[0] * ILSVRC2012_NUM_LABELS_SPECIES_LAYER for _ in range(ILSVRC2012_NUM_LABELS_ORDER_LAYER)]
dist_predicted_order_species = [[0] * ILSVRC2012_NUM_LABELS_SPECIES_LAYER for _ in range(ILSVRC2012_NUM_LABELS_ORDER_LAYER)]

# Cumulative semantic distance between the predicted category and the ground truth, following
# (Fergus et al., 2010)
semantic_distance = 0.0

# Per-image uncertainty metrics following (Ovadia et al., 2019)
confidence = []
cat_acc = []
neg_log_likelihood = []
brier_score = []
pred_entropy = []
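The custom metrics initialized above can be sketched framework-free for a single base image. The semantic distance is computed by `ILSVRC2012_SYNSET_MAP.semantic_distance`, whose definition is not shown in this section; the sketch *assumes* the number of edges between the two synsets through their lowest common ancestor in the taxonomy tree, a common tree-based choice that is not necessarily what that method or (Fergus et al., 2010) uses. The uncertainty metrics mirror the per-image formulas in the evaluation loop below, including the Brier score as an unnormalized sum of squared errors. The only deviation is that zero probabilities are skipped in the entropy. The toy taxonomy and all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def path_to_root(node: str, parent: Dict[str, str]) -> List[str]:
    """Return [node, parent, grandparent, ..., root] for an acyclic tree."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def tree_distance(a: str, b: str, parent: Dict[str, str]) -> int:
    """Number of edges between a and b via their lowest common ancestor."""
    path_a = path_to_root(a, parent)
    depth_in_b = {n: i for i, n in enumerate(path_to_root(b, parent))}
    for i, n in enumerate(path_a):
        if n in depth_in_b:  # first shared node is the lowest common ancestor
            return i + depth_in_b[n]
    raise ValueError("nodes do not share a root")


def uncertainty_metrics(prediction: Sequence[float], true_label: int) -> Dict[str, float]:
    """Per-image metrics for a probability vector and an integer ground truth."""
    predicted = max(range(len(prediction)), key=prediction.__getitem__)
    one_hot = [1.0 if c == true_label else 0.0 for c in range(len(prediction))]
    return {
        "confidence": prediction[predicted],
        "cat_acc": float(predicted == true_label),
        "neg_log_likelihood": -math.log(prediction[true_label]),
        "brier_score": sum((p - t) ** 2 for p, t in zip(prediction, one_hot)),
        # 0 * log(0) is treated as 0, the usual convention for entropy
        "pred_entropy": -sum(p * math.log(p) for p in prediction if p > 0),
    }


taxonomy = {
    "cat": "carnivora", "dog": "carnivora", "horse": "perissodactyla",
    "carnivora": "mammalia", "perissodactyla": "mammalia",
}
print(tree_distance("cat", "dog", taxonomy))    # 2
print(tree_distance("cat", "horse", taxonomy))  # 4
print(uncertainty_metrics([0.7, 0.2, 0.1], true_label=1))
```

Under this assumed definition, a mistake within the same order (cat vs. dog) costs less than one across orders (cat vs. horse), which is what makes the accumulated distance informative beyond plain accuracy.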

In [None]:
for (imgs, ground_truths) in ilsvrc2012_test_composed:
    # Generate predictions for each image in the current batch
    batch_scores_phylum = phylum_model.predict(imgs)
    
    # Since each batch of the test data set consists of augmented versions of the same base
    # image, the individual scores can simply be averaged to obtain the final prediction for
    # that base image, following (Goodfellow et al., 2013) and (Krizhevsky et al., 2012).
    prediction_phylum = tf.math.reduce_mean(batch_scores_phylum, axis=0)
    
    # Route the current batch to the sub-module predicted by the phylum model and generate the
    # predictions for the respective class category labels
    batch_scores_class = class_models[int(tf.math.argmax(prediction_phylum))].predict(imgs)
    
    # Average the scores for the individual images in the batch as described above
    prediction_class = tf.math.reduce_mean(batch_scores_class, axis=0)
    
    # Route the current batch to the sub-module predicted by the class model and generate the
    # predictions for the respective order category labels
    batch_scores_order = order_models[int(tf.math.argmax(prediction_class))].predict(imgs)
    
    # Average the scores for the individual images in the batch as described above
    prediction_order = tf.math.reduce_mean(batch_scores_order, axis=0)
    
    # Route the current batch to the sub-module predicted by the order model and generate the
    # predictions for the respective species category labels
    batch_scores_species = species_models[int(tf.math.argmax(prediction_order))].predict(imgs)
    
    # Average the scores for the individual images in the batch as described above
    prediction_species = tf.math.reduce_mean(batch_scores_species, axis=0)
    
    # Update the metrics with the result for the current base image; as all images in a batch
    # originate from the same base image (cf. above), their ground truths are identical as well.
    # Note: `tf.expand_dims` adds a batch dimension as a workaround for
    # `tf.keras.metrics.TopKCategoricalAccuracy`, which does not accept one-dimensional
    # inputs as of TensorFlow v2.2.0-rc3.
    for metric in test_metrics:
        metric.update_state(tf.expand_dims(ground_truths[0], 0), tf.expand_dims(prediction_species, 0))
        
    # Update the custom metrics with the result for the current base image; cf. above concerning
    # the ground truth for each batch
    
    ground_truth = ground_truths[0]
    ground_truth_species = tf.math.argmax(ground_truth).numpy()
    ground_truth_species_decoded = ilsvrc2012_decode(ground_truth_species, ILSVRC2012_SPECIES_LABEL_LEVEL)
    
    ground_truth_phylum = ilsvrc2012_resolve_hypernym(ground_truth_species, ILSVRC2012_PHYLUM_LABEL_LEVEL, encoded=True)
    ground_truth_class = ilsvrc2012_resolve_hypernym(ground_truth_species, ILSVRC2012_CLASS_LABEL_LEVEL, encoded=True)
    ground_truth_order = ilsvrc2012_resolve_hypernym(ground_truth_species, ILSVRC2012_ORDER_LABEL_LEVEL, encoded=True)

    prediction = prediction_species
    prediction_species = tf.math.argmax(prediction).numpy()
    prediction_species_decoded = ilsvrc2012_decode(prediction_species, ILSVRC2012_SPECIES_LABEL_LEVEL)
    
    prediction_phylum = tf.math.argmax(prediction_phylum).numpy()
    prediction_class = tf.math.argmax(prediction_class).numpy()
    prediction_order = tf.math.argmax(prediction_order).numpy()
    
    dist_ground_truth[ground_truth_species] += 1
    dist_predicted[prediction_species] += 1

    dist_predicted_phylum[ground_truth_phylum][prediction_phylum] += 1
    dist_predicted_class[ground_truth_class][prediction_class] += 1
    dist_predicted_order[ground_truth_order][prediction_order] += 1
    dist_predicted_species[ground_truth_species][prediction_species] += 1

    dist_ground_truth_phylum_class[ground_truth_phylum][ground_truth_class] += 1
    dist_predicted_phylum_class[ground_truth_phylum][prediction_class] += 1
    dist_ground_truth_phylum_order[ground_truth_phylum][ground_truth_order] += 1
    dist_predicted_phylum_order[ground_truth_phylum][prediction_order] += 1
    dist_ground_truth_phylum_species[ground_truth_phylum][ground_truth_species] += 1
    dist_predicted_phylum_species[ground_truth_phylum][prediction_species] += 1
    dist_ground_truth_class_order[ground_truth_class][ground_truth_order] += 1
    dist_predicted_class_order[ground_truth_class][prediction_order] += 1
    dist_ground_truth_class_species[ground_truth_class][ground_truth_species] += 1
    dist_predicted_class_species[ground_truth_class][prediction_species] += 1
    dist_ground_truth_order_species[ground_truth_order][ground_truth_species] += 1
    dist_predicted_order_species[ground_truth_order][prediction_species] += 1
    
    semantic_distance += ILSVRC2012_SYNSET_MAP.semantic_distance(
        ground_truth_species_decoded,
        prediction_species_decoded
    )

    confidence.append(
        prediction[prediction_species])
        
    cat_acc.append(
        tf.dtypes.cast(tf.math.equal(ground_truth_species, prediction_species), tf.dtypes.float32))

    neg_log_likelihood.append(
        -tf.math.log(prediction[ground_truth_species]))

    brier_score.append(
        tf.math.reduce_sum((prediction - ground_truth) ** 2))

    # `tf.math.xlogy` returns 0 for zero probabilities instead of NaN (0 * log(0))
    pred_entropy.append(
        -tf.math.reduce_sum(tf.math.xlogy(prediction, prediction)))

print('Composed Model')
print()
print('==================================================')
print()
print('Final results:')
for metric in test_metrics:
    print('{}: {}'.format(metric.name, metric.result().numpy()))
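The composed model evaluated above descends the hierarchy greedily: the argmax of each layer's averaged scores selects the single sub-module queried at the next layer. A framework-free sketch of that control flow follows. It handles one input rather than a batch of augmented copies, and hypothetical toy scorers stand in for the trained Keras models. `route` and the scorer names are illustrative, not part of this notebook.

```python
from typing import Callable, Dict, List, Sequence

Scorer = Callable[[str], List[float]]


def argmax(scores: Sequence[float]) -> int:
    return max(range(len(scores)), key=scores.__getitem__)


def route(image: str, root: Scorer, children: Sequence[Dict[int, Scorer]]) -> List[int]:
    """Greedy top-down routing through a hierarchy of classifiers.

    `root` scores the top layer; `children[level][label]` is the sub-module
    responsible for the categories below `label`. Returns the chosen label
    at every layer. A wrong decision at one layer cannot be corrected later,
    because only the selected sub-module is ever queried.
    """
    labels = [argmax(root(image))]
    for level in children:
        sub_module = level[labels[-1]]
        labels.append(argmax(sub_module(image)))
    return labels


def phylum_scores(image: str) -> List[float]:
    return [0.9, 0.1] if image == "fish" else [0.4, 0.6]


def class_scores_phylum_0(image: str) -> List[float]:
    return [0.8, 0.2, 0.0, 0.0]  # only classes 0 and 1 belong to phylum 0


def class_scores_phylum_1(image: str) -> List[float]:
    return [0.0, 0.0, 0.3, 0.7]  # only classes 2 and 3 belong to phylum 1


class_modules = {0: class_scores_phylum_0, 1: class_scores_phylum_1}
print(route("fish", phylum_scores, [class_modules]))  # [0, 0]
print(route("bird", phylum_scores, [class_modules]))  # [1, 3]
```

If the true phylum of `"bird"` were 0, no later decision could recover from the first mistake, since `class_scores_phylum_0` is never queried for it. This propagation of modularization errors is what the cross-layer distributions tallied in the loop above are meant to expose.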

In [None]:
# Store the results

tf.io.write_file(
    ILSVRC2012_RESULTS_DIR_COMPNET + 'dist_ground_truth.data',
    tf.io.serialize_tensor(dist_ground_truth))

tf.io.write_file(
    ILSVRC2012_RESULTS_DIR_COMPNET + 'dist_predicted.data',
    tf.io.serialize_tensor(dist_predicted))

tf.io.write_file(
    ILSVRC2012_RESULTS_DIR_COMPNET + 'dist_predicted_phylum.data',
    tf.io.serialize_tensor(dist_predicted_phylum))

tf.io.write_file(
    ILSVRC2012_RESULTS_DIR_COMPNET + 'dist_predicted_class.data',
    tf.io.serialize_tensor(dist_predicted_class))

tf.io.write_file(
    ILSVRC2012_RESULTS_DIR_COMPNET + 'dist_predicted_order.data',
    tf.io.serialize_tensor(dist_predicted_order))

tf.io.write_file(
    ILSVRC2012_RESULTS_DIR_COMPNET + 'dist_predicted_species.data',
    tf.io.serialize_tensor(dist_predicted_species))

tf.io.write_file(
    ILSVRC2012_RESULTS_DIR_COMPNET + 'dist_ground_truth_phylum_class.data',
    tf.io.serialize_tensor(dist_ground_truth_phylum_class))

tf.io.write_file(
    ILSVRC2012_RESULTS_DIR_COMPNET + 'dist_predicted_phylum_class.data',
    tf.io.serialize_tensor(dist_predicted_phylum_class))

tf.io.write_file(
    ILSVRC2012_RESULTS_DIR_COMPNET + 'dist_ground_truth_phylum_order.data',
    tf.io.serialize_tensor(dist_ground_truth_phylum_order))

tf.io.write_file(
    ILSVRC2012_RESULTS_DIR_COMPNET + 'dist_predicted_phylum_order.data',
    tf.io.serialize_tensor(dist_predicted_phylum_order))

tf.io.write_file(
    ILSVRC2012_RESULTS_DIR_COMPNET + 'dist_ground_truth_phylum_species.data',
    tf.io.serialize_tensor(dist_ground_truth_phylum_species))

tf.io.write_file(
    ILSVRC2012_RESULTS_DIR_COMPNET + 'dist_predicted_phylum_species.data',
    tf.io.serialize_tensor(dist_predicted_phylum_species))

tf.io.write_file(
    ILSVRC2012_RESULTS_DIR_COMPNET + 'dist_ground_truth_class_order.data',
    tf.io.serialize_tensor(dist_ground_truth_class_order))

tf.io.write_file(
    ILSVRC2012_RESULTS_DIR_COMPNET + 'dist_predicted_class_order.data',
    tf.io.serialize_tensor(dist_predicted_class_order))

tf.io.write_file(
    ILSVRC2012_RESULTS_DIR_COMPNET + 'dist_ground_truth_class_species.data',
    tf.io.serialize_tensor(dist_ground_truth_class_species))

tf.io.write_file(
    ILSVRC2012_RESULTS_DIR_COMPNET + 'dist_predicted_class_species.data',
    tf.io.serialize_tensor(dist_predicted_class_species))

tf.io.write_file(
    ILSVRC2012_RESULTS_DIR_COMPNET + 'dist_ground_truth_order_species.data',
    tf.io.serialize_tensor(dist_ground_truth_order_species))

tf.io.write_file(
    ILSVRC2012_RESULTS_DIR_COMPNET + 'dist_predicted_order_species.data',
    tf.io.serialize_tensor(dist_predicted_order_species))

tf.io.write_file(
    ILSVRC2012_RESULTS_DIR_COMPNET + 'semantic_distance.data',
    tf.io.serialize_tensor(semantic_distance))

tf.io.write_file(
    ILSVRC2012_RESULTS_DIR_COMPNET + 'confidence.data',
    tf.io.serialize_tensor(confidence))

tf.io.write_file(
    ILSVRC2012_RESULTS_DIR_COMPNET + 'cat_acc.data',
    tf.io.serialize_tensor(cat_acc))

tf.io.write_file(
    ILSVRC2012_RESULTS_DIR_COMPNET + 'neg_log_likelihood.data',
    tf.io.serialize_tensor(neg_log_likelihood))

tf.io.write_file(
    ILSVRC2012_RESULTS_DIR_COMPNET + 'brier_score.data',
    tf.io.serialize_tensor(brier_score))

tf.io.write_file(
    ILSVRC2012_RESULTS_DIR_COMPNET + 'pred_entropy.data',
    tf.io.serialize_tensor(pred_entropy))

In [None]:
# Load the results

dist_ground_truth = tf.io.parse_tensor(
    tf.io.read_file(
        ILSVRC2012_RESULTS_DIR_COMPNET + 'dist_ground_truth.data'),
    tf.dtypes.int32)

dist_predicted = tf.io.parse_tensor(
    tf.io.read_file(
        ILSVRC2012_RESULTS_DIR_COMPNET + 'dist_predicted.data'),
    tf.dtypes.int32)

dist_predicted_phylum = tf.io.parse_tensor(
    tf.io.read_file(
        ILSVRC2012_RESULTS_DIR_COMPNET + 'dist_predicted_phylum.data'),
    tf.dtypes.int32)

dist_predicted_class = tf.io.parse_tensor(
    tf.io.read_file(
        ILSVRC2012_RESULTS_DIR_COMPNET + 'dist_predicted_class.data'),
    tf.dtypes.int32)

dist_predicted_order = tf.io.parse_tensor(
    tf.io.read_file(
        ILSVRC2012_RESULTS_DIR_COMPNET + 'dist_predicted_order.data'),
    tf.dtypes.int32)

dist_predicted_species = tf.io.parse_tensor(
    tf.io.read_file(
        ILSVRC2012_RESULTS_DIR_COMPNET + 'dist_predicted_species.data'),
    tf.dtypes.int32)

dist_ground_truth_phylum_class = tf.io.parse_tensor(
    tf.io.read_file(
        ILSVRC2012_RESULTS_DIR_COMPNET + 'dist_ground_truth_phylum_class.data'),
    tf.dtypes.int32)

dist_predicted_phylum_class = tf.io.parse_tensor(
    tf.io.read_file(
        ILSVRC2012_RESULTS_DIR_COMPNET + 'dist_predicted_phylum_class.data'),
    tf.dtypes.int32)

dist_ground_truth_phylum_order = tf.io.parse_tensor(
    tf.io.read_file(
        ILSVRC2012_RESULTS_DIR_COMPNET + 'dist_ground_truth_phylum_order.data'),
    tf.dtypes.int32)

dist_predicted_phylum_order = tf.io.parse_tensor(
    tf.io.read_file(
        ILSVRC2012_RESULTS_DIR_COMPNET + 'dist_predicted_phylum_order.data'),
    tf.dtypes.int32)

dist_ground_truth_phylum_species = tf.io.parse_tensor(
    tf.io.read_file(
        ILSVRC2012_RESULTS_DIR_COMPNET + 'dist_ground_truth_phylum_species.data'),
    tf.dtypes.int32)

dist_predicted_phylum_species = tf.io.parse_tensor(
    tf.io.read_file(
        ILSVRC2012_RESULTS_DIR_COMPNET + 'dist_predicted_phylum_species.data'),
    tf.dtypes.int32)

dist_ground_truth_class_order = tf.io.parse_tensor(
    tf.io.read_file(
        ILSVRC2012_RESULTS_DIR_COMPNET + 'dist_ground_truth_class_order.data'),
    tf.dtypes.int32)

dist_predicted_class_order = tf.io.parse_tensor(
    tf.io.read_file(
        ILSVRC2012_RESULTS_DIR_COMPNET + 'dist_predicted_class_order.data'),
    tf.dtypes.int32)

dist_ground_truth_class_species = tf.io.parse_tensor(
    tf.io.read_file(
        ILSVRC2012_RESULTS_DIR_COMPNET + 'dist_ground_truth_class_species.data'),
    tf.dtypes.int32)

dist_predicted_class_species = tf.io.parse_tensor(
    tf.io.read_file(
        ILSVRC2012_RESULTS_DIR_COMPNET + 'dist_predicted_class_species.data'),
    tf.dtypes.int32)

dist_ground_truth_order_species = tf.io.parse_tensor(
    tf.io.read_file(
        ILSVRC2012_RESULTS_DIR_COMPNET + 'dist_ground_truth_order_species.data'),
    tf.dtypes.int32)

dist_predicted_order_species = tf.io.parse_tensor(
    tf.io.read_file(
        ILSVRC2012_RESULTS_DIR_COMPNET + 'dist_predicted_order_species.data'),
    tf.dtypes.int32)

semantic_distance = tf.io.parse_tensor(
    tf.io.read_file(
        ILSVRC2012_RESULTS_DIR_COMPNET + 'semantic_distance.data'),
    tf.dtypes.float32)

confidence = tf.io.parse_tensor(
    tf.io.read_file(
        ILSVRC2012_RESULTS_DIR_COMPNET + 'confidence.data'),
    tf.dtypes.float32)

cat_acc = tf.io.parse_tensor(
    tf.io.read_file(
        ILSVRC2012_RESULTS_DIR_COMPNET + 'cat_acc.data'),
    tf.dtypes.float32)

neg_log_likelihood = tf.io.parse_tensor(
    tf.io.read_file(
        ILSVRC2012_RESULTS_DIR_COMPNET + 'neg_log_likelihood.data'),
    tf.dtypes.float32)

brier_score = tf.io.parse_tensor(
    tf.io.read_file(
        ILSVRC2012_RESULTS_DIR_COMPNET + 'brier_score.data'),
    tf.dtypes.float32)

pred_entropy = tf.io.parse_tensor(
    tf.io.read_file(
        ILSVRC2012_RESULTS_DIR_COMPNET + 'pred_entropy.data'),
    tf.dtypes.float32)

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(20, 5))
titles = ['Predicted', 'Ground Truth']

for idx, dist in enumerate([dist_predicted, dist_ground_truth]):
    # Normalize the distribution
    dist = tf.cast(dist, tf.float32)
    dist = dist / tf.math.reduce_sum(dist)

    # Plot the distribution (bars at the 0-based encoded label indices)
    ax[idx].bar(range(len(dist)), dist, width=1)
    ax[idx].set_title(titles[idx])
    ax[idx].set_xticks(range(0, ILSVRC2012_NUM_LABELS_SPECIES_LAYER, 25))
    ax[idx].set_ylim([0, 0.033])
    ax[idx].set_xlabel('Labels')
    ax[idx].set_ylabel('Share')

fig.show()

In [None]:
fig, ax = plt.subplots(2, 1, figsize=(20, 15))

for label in range(ILSVRC2012_NUM_LABELS_PHYLUM_LAYER):
    # Normalize the distribution
    counts = tf.cast(dist_predicted_phylum[label], tf.float32)
    dist = counts / tf.math.reduce_sum(counts)

    # Plot the distribution
    ax[label].bar(range(0, len(dist)), dist, width=1)
    ax[label].set_title(
        'Predicted Distribution for #{}'.format(ilsvrc2012_decode(label, ILSVRC2012_PHYLUM_LABEL_LEVEL))
    )
    ax[label].set_xticks(range(0, ILSVRC2012_NUM_LABELS_PHYLUM_LAYER, 1))
    ax[label].set_xlabel('Labels')
    ax[label].set_ylabel('Share')

fig.show()
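The per-label plots normalize each row of a confusion matrix into shares. A row stays empty when a label never occurs as ground truth. Dividing by its zero total then raises `ZeroDivisionError` for plain Python integers, while TensorFlow's float division yields NaN. The helper below is a hypothetical, framework-free sketch of a guarded normalization. It is not used by the cells in this notebook.

```python
from typing import List, Sequence


def normalize_rows(counts: Sequence[Sequence[int]]) -> List[List[float]]:
    """Turn each row of a count matrix into shares that sum to 1.

    Rows without any samples are returned as all zeros instead of
    raising an error or producing NaN.
    """
    result = []
    for row in counts:
        total = sum(row)  # computed once per row, not once per entry
        result.append([c / total if total else 0.0 for c in row])
    return result


confusion = [[8, 2], [0, 0], [1, 3]]
print(normalize_rows(confusion))  # [[0.8, 0.2], [0.0, 0.0], [0.25, 0.75]]
```

Computing the row total once also keeps the cost linear in the number of labels, which matters for the species layer with several hundred labels per row.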

In [None]:
fig, ax = plt.subplots(11, 1, figsize=(20, 165))

for label in range(ILSVRC2012_NUM_LABELS_CLASS_LAYER):
    # Normalize the distribution
    counts = tf.cast(dist_predicted_class[label], tf.float32)
    dist = counts / tf.math.reduce_sum(counts)

    # Plot the distribution
    ax[label].bar(range(0, len(dist)), dist, width=1)
    ax[label].set_title(
        'Predicted Distribution for #{}'.format(ilsvrc2012_decode(label, ILSVRC2012_CLASS_LABEL_LEVEL))
    )
    ax[label].set_xticks(range(0, ILSVRC2012_NUM_LABELS_CLASS_LAYER, 1))
    ax[label].set_xlabel('Labels')
    ax[label].set_ylabel('Share')

fig.show()

In [None]:
fig, ax = plt.subplots(12, 4, figsize=(20, 180))

for label in range(ILSVRC2012_NUM_LABELS_ORDER_LAYER):
    row, col = divmod(label, 4)
    
    # Normalize the distribution
    counts = tf.cast(dist_predicted_order[label], tf.float32)
    dist = counts / tf.math.reduce_sum(counts)

    # Plot the distribution
    ax[row][col].bar(range(0, len(dist)), dist, width=1)
    ax[row][col].set_title(
        'Predicted Distribution for #{}'.format(ilsvrc2012_decode(label, ILSVRC2012_ORDER_LABEL_LEVEL))
    )
    ax[row][col].set_xticks(range(0, ILSVRC2012_NUM_LABELS_ORDER_LAYER, 4))
    ax[row][col].set_xlabel('Labels')
    ax[row][col].set_ylabel('Share')

fig.show()

In [None]:
fig, ax = plt.subplots(95, 4, figsize=(20, 1425))

for label in range(ILSVRC2012_NUM_LABELS_SPECIES_LAYER):
    row, col = divmod(label, 4)
    
    # Normalize the distribution
    counts = tf.cast(dist_predicted_species[label], tf.float32)
    dist = counts / tf.math.reduce_sum(counts)

    # Plot the distribution
    ax[row][col].bar(range(0, len(dist)), dist, width=1)
    ax[row][col].set_title(
        'Predicted Distribution for #{}'.format(ilsvrc2012_decode(label, ILSVRC2012_SPECIES_LABEL_LEVEL))
    )
    ax[row][col].set_xticks(range(0, ILSVRC2012_NUM_LABELS_SPECIES_LAYER, 20))
    ax[row][col].set_xlabel('Labels')
    ax[row][col].set_ylabel('Share')

fig.show()

In [None]:
fig, ax = plt.subplots(2, 2, figsize=(20, 30))

for label in range(ILSVRC2012_NUM_LABELS_PHYLUM_LAYER):
    titles = ['Predicted Distribution for #{}'.format(ilsvrc2012_decode(label, ILSVRC2012_PHYLUM_LABEL_LEVEL)),
          'Ground Truth Distribution for #{}'.format(ilsvrc2012_decode(label, ILSVRC2012_PHYLUM_LABEL_LEVEL))]

    for idx, dist in enumerate([dist_predicted_phylum_class, dist_ground_truth_phylum_class]):
        # Normalize the distribution
        counts = tf.cast(dist[label], tf.float32)
        dist = counts / tf.math.reduce_sum(counts)

        # Plot the distribution
        ax[label][idx].bar(range(len(dist)), dist, width=1)
        ax[label][idx].set_title(titles[idx])
        ax[label][idx].set_xticks(range(0, ILSVRC2012_NUM_LABELS_CLASS_LAYER, 1))
        ax[label][idx].set_ylim([0, 0.33])
        ax[label][idx].set_xlabel('Labels')
        ax[label][idx].set_ylabel('Share')

fig.show()

In [None]:
fig, ax = plt.subplots(2, 2, figsize=(20, 30))

for label in range(ILSVRC2012_NUM_LABELS_PHYLUM_LAYER):
    titles = ['Predicted Distribution for #{}'.format(ilsvrc2012_decode(label, ILSVRC2012_PHYLUM_LABEL_LEVEL)),
          'Ground Truth Distribution for #{}'.format(ilsvrc2012_decode(label, ILSVRC2012_PHYLUM_LABEL_LEVEL))]

    for idx, dist in enumerate([dist_predicted_phylum_order, dist_ground_truth_phylum_order]):
        # Normalize the distribution
        counts = tf.cast(dist[label], tf.float32)
        dist = counts / tf.math.reduce_sum(counts)

        # Plot the distribution
        ax[label][idx].bar(range(len(dist)), dist, width=1)
        ax[label][idx].set_title(titles[idx])
        ax[label][idx].set_xticks(range(0, ILSVRC2012_NUM_LABELS_ORDER_LAYER, 4))
        ax[label][idx].set_ylim([0, 0.01])
        ax[label][idx].set_xlabel('Labels')
        ax[label][idx].set_ylabel('Share')

fig.show()

In [None]:
fig, ax = plt.subplots(2, 2, figsize=(20, 30))

for label in range(ILSVRC2012_NUM_LABELS_PHYLUM_LAYER):
    titles = ['Predicted Distribution for #{}'.format(ilsvrc2012_decode(label, ILSVRC2012_PHYLUM_LABEL_LEVEL)),
          'Ground Truth Distribution for #{}'.format(ilsvrc2012_decode(label, ILSVRC2012_PHYLUM_LABEL_LEVEL))]

    for idx, dist in enumerate([dist_predicted_phylum_species, dist_ground_truth_phylum_species]):
        # Normalize the distribution
        counts = tf.cast(dist[label], tf.float32)
        dist = counts / tf.math.reduce_sum(counts)

        # Plot the distribution
        ax[label][idx].bar(range(len(dist)), dist, width=1)
        ax[label][idx].set_title(titles[idx])
        ax[label][idx].set_xticks(range(0, ILSVRC2012_NUM_LABELS_SPECIES_LAYER, 20))
        ax[label][idx].set_ylim([0, 0.033])
        ax[label][idx].set_xlabel('Labels')
        ax[label][idx].set_ylabel('Share')

fig.show()

In [None]:
fig, ax = plt.subplots(11, 2, figsize=(20, 165))

for label in range(ILSVRC2012_NUM_LABELS_CLASS_LAYER):
    titles = ['Predicted Distribution for #{}'.format(ilsvrc2012_decode(label, ILSVRC2012_CLASS_LABEL_LEVEL)),
          'Ground Truth Distribution for #{}'.format(ilsvrc2012_decode(label, ILSVRC2012_CLASS_LABEL_LEVEL))]

    for idx, dist in enumerate([dist_predicted_class_order, dist_ground_truth_class_order]):
        # Normalize the distribution
        counts = tf.cast(dist[label], tf.float32)
        dist = counts / tf.math.reduce_sum(counts)

        # Plot the distribution
        ax[label][idx].bar(range(len(dist)), dist, width=1)
        ax[label][idx].set_title(titles[idx])
        ax[label][idx].set_xticks(range(0, ILSVRC2012_NUM_LABELS_ORDER_LAYER, 4))
        ax[label][idx].set_ylim([0, 0.01])
        ax[label][idx].set_xlabel('Labels')
        ax[label][idx].set_ylabel('Share')

fig.show()

In [None]:
fig, ax = plt.subplots(11, 2, figsize=(20, 165))

for label in range(ILSVRC2012_NUM_LABELS_CLASS_LAYER):
    titles = ['Predicted Distribution for #{}'.format(ilsvrc2012_decode(label, ILSVRC2012_CLASS_LABEL_LEVEL)),
          'Ground Truth Distribution for #{}'.format(ilsvrc2012_decode(label, ILSVRC2012_CLASS_LABEL_LEVEL))]

    for idx, dist in enumerate([dist_predicted_class_species, dist_ground_truth_class_species]):
        # Normalize the counts to shares (categories without any test examples stay at zero)
        total = sum(dist[label])
        dist = [entry / total if total else 0.0 for entry in dist[label]]
                    
        # Plot the distribution
        ax[label][idx].bar(range(1, len(dist) + 1), dist, width=1)
        ax[label][idx].set_title(titles[idx])
        ax[label][idx].set_xticks(range(0, ILSVRC2012_NUM_LABELS_SPECIES_LAYER, 20))
        ax[label][idx].set_ylim([0, 0.033])
        ax[label][idx].set_xlabel('Labels')
        ax[label][idx].set_ylabel('Share')

fig.show()

In [None]:
fig, ax = plt.subplots(47, 2, figsize=(20, 40))

for label in range(ILSVRC2012_NUM_LABELS_ORDER_LAYER):
    titles = ['Predicted Distribution for #{}'.format(ilsvrc2012_decode(label, ILSVRC2012_ORDER_LABEL_LEVEL)),
          'Ground Truth Distribution for #{}'.format(ilsvrc2012_decode(label, ILSVRC2012_ORDER_LABEL_LEVEL))]

    for idx, dist in enumerate([dist_predicted_order_species, dist_ground_truth_order_species]):
        # Normalize the counts to shares (categories without any test examples stay at zero)
        total = sum(dist[label])
        dist = [entry / total if total else 0.0 for entry in dist[label]]
                    
        # Plot the distribution
        ax[label][idx].bar(range(1, len(dist) + 1), dist, width=1)
        ax[label][idx].set_title(titles[idx])
        ax[label][idx].set_xticks(range(0, ILSVRC2012_NUM_LABELS_SPECIES_LAYER, 20))
        ax[label][idx].set_ylim([0, 0.033])
        ax[label][idx].set_xlabel('Labels')
        ax[label][idx].set_ylabel('Share')

fig.show()

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(20, 10))

ax.boxplot(cat_acc, sym='')

ax.set_title('Categorical accuracy over varying corruption severities')
ax.set_ylabel('Accuracy')

ax.grid(linestyle='--')

fig.show()

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(20, 10))

ax.boxplot(brier_score, sym='')

ax.set_title('Brier Score over varying corruption severities')
ax.set_ylabel('Brier Score')

ax.grid(linestyle='--')

fig.show()

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(20, 10))

ax.hist(confidence[idx], bins=10000, cumulative=-1, histtype='step')
ax.set_title('Confidence')
ax.set_xlabel('Confidence ' + r'$\tau$')
ax.set_ylabel('Number of examples ' + r'$P(x) > \tau$')
#ax.set_xlim([0.002, 1.0])

ax.grid(linestyle='--')

fig.show()
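With `cumulative=-1`, `ax.hist` accumulates the bin counts from right to left. The step curve therefore reads approximately as the number of examples whose confidence exceeds τ, up to the bin width. The sketch below computes the exact count in pure Python. `count_above` is a hypothetical helper and is not part of the notebook:

```python
import bisect


def count_above(values, thresholds):
    """For every threshold tau, count how many values are strictly greater than tau."""
    ordered = sorted(values)
    n = len(ordered)
    # bisect_right returns the index of the first element > tau in the sorted list
    return [n - bisect.bisect_right(ordered, tau) for tau in thresholds]


print(count_above([0.1, 0.4, 0.4, 0.9], [0.0, 0.4, 0.5, 1.0]))  # [4, 1, 1, 0]
```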

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(20, 10))

ax.hist(neg_log_likelihood[idx], bins=10000, cumulative=-1, histtype='step')
ax.set_title('Negative log likelihood')
ax.set_xlabel('Negative log likelihood ' + r'$l$')
ax.set_ylabel('Number of examples ' + r'$L(x) > l$')
#ax.set_xlim([0.202, 1.0])

ax.grid(linestyle='--')

fig.show()

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(20, 10))

ax.hist(pred_entropy[idx], bins=10000, cumulative=-1, histtype='step')
ax.set_title('Predictive entropy')
ax.set_xlabel('Entropy (nats) ' + r'$h$')
ax.set_ylabel('Number of examples ' + r'$H(x) > h$')
#ax.set_xlim([0.0, 2.5])

ax.grid(linestyle='--')

fig.show()

In [None]:
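import numpy as np

fig, ax = plt.subplots(1, 1, figsize=(20, 10))

# Sort the examples by confidence and compute, for every threshold tau, the accuracy on all
# examples whose confidence is at least tau (a reverse cumulative mean of the 0/1 accuracies)
conf_values = np.asarray(confidence[idx], dtype=np.float32)
acc_values = np.asarray(cat_acc[idx], dtype=np.float32)
order = np.argsort(conf_values)
acc_above_tau = np.cumsum(acc_values[order][::-1])[::-1] / np.arange(len(order), 0, -1)

ax.plot(conf_values[order], acc_above_tau)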
ax.set_title('Confidence vs accuracy')
ax.set_xlabel('Confidence ' + r'$\tau$')
ax.set_ylabel('Categorical accuracy on examples ' + r'$P(x) > \tau$')
ax.set_xlim([0.0, 1.0])

ax.grid(linestyle='--')

fig.show()

#### Error propagation

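Modeling error

This experiment measures how errors made by individual modules propagate through the hierarchy. The unaltered test set is routed through the composed network, and the categorical crossentropy of the averaged prediction is recorded at each taxonomic layer (phylum, class, order, species).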
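The routing in the evaluation loop below reduces to a few lines. At each level, the argmax of the current module's scores selects the module for the next level. A wrong decision high up therefore makes every label outside the chosen branch unreachable.

The sketch below uses toy score functions in place of the Keras sub-models. `route`, `argmax`, and the `children` dictionary keyed by label paths are hypothetical; the notebook itself indexes `class_models`, `order_models`, and `species_models` directly:

```python
def argmax(scores):
    """Index of the largest score (first one on ties)."""
    return max(range(len(scores)), key=scores.__getitem__)


def route(x, root, children):
    """Route input x top-down through a hierarchy of classifiers.

    root:     callable returning the scores of the top level.
    children: dict mapping the tuple of labels chosen so far to the callable
              that scores the next level; routing stops at a missing key.
    Returns the tuple of labels chosen at every level.
    """
    path = (argmax(root(x)),)
    while path in children:
        path += (argmax(children[path](x)),)
    return path


# Toy two-level hierarchy: branch 0 is confident in its label 1, but the root may prefer branch 1
children = {(0,): lambda x: [0.1, 0.9], (1,): lambda x: [0.7, 0.3]}
print(route(None, lambda x: [0.4, 0.6], children))  # (1, 0)
print(route(None, lambda x: [0.9, 0.1], children))  # (0, 1)
```

Suppose the true path is (0, 1). The first call then shows a propagated modeling error: the fine-grained module that would have answered correctly is never consulted.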

In [None]:
ilsvrc2012_test_modeling_error = process_and_augment(ilsvrc2012_test_raw, batch_size=10, synset_level=ILSVRC2012_SPECIES_LABEL_LEVEL, hypernym=None, is_train=False)

In [None]:
# Initialize custom metrics to watch during the evaluation

# Categorical crossentropies for each taxonomic category
cat_crossentropy_phylum = []
cat_crossentropy_class = []
cat_crossentropy_order = []
cat_crossentropy_species = []

In [None]:
for (imgs, ground_truths) in ilsvrc2012_test_modeling_error:
    # Generate predictions for each image in the current batch
    batch_scores_phylum = phylum_model.predict(imgs)
    
    # Since we constructed our test data set in a way that each image in one batch is an augmented
    # version of the same base image, we can simply average the individual scores to get the final
    # prediction for the respective base image in adherence to (Goodfellow et al., 2013) and
    # (Krizhevsky et al., 2012).
    prediction_phylum = tf.math.reduce_mean(batch_scores_phylum, axis=0)
    
    # Route the current batch to the sub-module predicted by the phylum model and generate the
    # predictions for the respective class category labels
    batch_scores_class = class_models[tf.math.argmax(prediction_phylum)].predict(imgs)
    
    # Average the scores for the individual images in the batch as described above
    prediction_class = tf.math.reduce_mean(batch_scores_class, axis=0)
    
    # Route the current batch to the sub-module predicted by the class model and generate the
    # predictions for the respective order category labels
    batch_scores_order = order_models[tf.math.argmax(prediction_class)].predict(imgs)
    
    # Average the scores for the individual images in the batch as described above
    prediction_order = tf.math.reduce_mean(batch_scores_order, axis=0)
    
    # Route the current batch to the sub-module predicted by the order model and generate the
    # predictions for the respective species category labels
    batch_scores_species = species_models[tf.math.argmax(prediction_order)].predict(imgs)
    
    # Average the scores for the individual images in the batch as described above
    prediction_species = tf.math.reduce_mean(batch_scores_species, axis=0)
        
    # Update custom metrics w/ the result for the current base image; as all images in one batch
    # originate from the same base image (cf. above), the ground truth is hence identical as well.
    
    ground_truth_species = tf.math.argmax(ground_truths[0]).numpy()
    ground_truth_phylum = ilsvrc2012_resolve_hypernym(ground_truth_species, ILSVRC2012_PHYLUM_LABEL_LEVEL, encoded=True)
    ground_truth_class = ilsvrc2012_resolve_hypernym(ground_truth_species, ILSVRC2012_CLASS_LABEL_LEVEL, encoded=True)
    ground_truth_order = ilsvrc2012_resolve_hypernym(ground_truth_species, ILSVRC2012_ORDER_LABEL_LEVEL, encoded=True)

    # Clip the probabilities before taking the logarithm; otherwise a probability of exactly 0
    # for any label yields 0 * log(0) = NaN in the one-hot product
    cat_crossentropy_phylum.append(
        -tf.math.reduce_sum(
            tf.one_hot(ground_truth_phylum, ILSVRC2012_NUM_LABELS_PHYLUM_LAYER)
            * tf.math.log(tf.clip_by_value(prediction_phylum, 1e-7, 1.0))))
    cat_crossentropy_class.append(
        -tf.math.reduce_sum(
            tf.one_hot(ground_truth_class, ILSVRC2012_NUM_LABELS_CLASS_LAYER)
            * tf.math.log(tf.clip_by_value(prediction_class, 1e-7, 1.0))))
    cat_crossentropy_order.append(
        -tf.math.reduce_sum(
            tf.one_hot(ground_truth_order, ILSVRC2012_NUM_LABELS_ORDER_LAYER)
            * tf.math.log(tf.clip_by_value(prediction_order, 1e-7, 1.0))))
    cat_crossentropy_species.append(
        -tf.math.reduce_sum(
            tf.one_hot(ground_truth_species, ILSVRC2012_NUM_LABELS_SPECIES_LAYER)
            * tf.math.log(tf.clip_by_value(prediction_species, 1e-7, 1.0))))
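With a one-hot target, the categorical crossentropy computed in the loop above collapses to the negative log probability of the true label, −log p_y. The pure-Python check below confirms that identity. It clips probabilities to [1e-7, 1], where 1e-7 is the default Keras fuzz factor. Both helper names are hypothetical:

```python
import math


def crossentropy_one_hot(probs, true_idx, eps=1e-7):
    """-sum_k y_k * log(p_k) with a one-hot target y, evaluated term by term."""
    total = 0.0
    for k, p in enumerate(probs):
        y = 1.0 if k == true_idx else 0.0
        total -= y * math.log(min(max(p, eps), 1.0))  # clip to avoid log(0)
    return total


def crossentropy_true_label(probs, true_idx, eps=1e-7):
    """Equivalent shortcut: only the term of the true label survives."""
    return -math.log(min(max(probs[true_idx], eps), 1.0))


probs = [0.7, 0.2, 0.1, 0.0]
print(crossentropy_one_hot(probs, 1), crossentropy_true_label(probs, 1))  # both equal -log(0.2)
```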

In [None]:
# Store the results

tf.io.write_file(
    ILSVRC2012_RESULTS_DIR_COMPNET + 'modeling_error_cat_crossentropy_phylum.data',
    tf.io.serialize_tensor(cat_crossentropy_phylum))

tf.io.write_file(
    ILSVRC2012_RESULTS_DIR_COMPNET + 'modeling_error_cat_crossentropy_class.data',
    tf.io.serialize_tensor(cat_crossentropy_class))

tf.io.write_file(
    ILSVRC2012_RESULTS_DIR_COMPNET + 'modeling_error_cat_crossentropy_order.data',
    tf.io.serialize_tensor(cat_crossentropy_order))

tf.io.write_file(
    ILSVRC2012_RESULTS_DIR_COMPNET + 'modeling_error_cat_crossentropy_species.data',
    tf.io.serialize_tensor(cat_crossentropy_species))

In [None]:
# Load the results

cat_crossentropy_phylum = tf.io.parse_tensor(
    tf.io.read_file(
        ILSVRC2012_RESULTS_DIR_COMPNET + 'modeling_error_cat_crossentropy_phylum.data'),
        tf.dtypes.float32)

cat_crossentropy_class = tf.io.parse_tensor(
    tf.io.read_file(
        ILSVRC2012_RESULTS_DIR_COMPNET + 'modeling_error_cat_crossentropy_class.data'),
        tf.dtypes.float32)

cat_crossentropy_order = tf.io.parse_tensor(
    tf.io.read_file(
        ILSVRC2012_RESULTS_DIR_COMPNET + 'modeling_error_cat_crossentropy_order.data'),
        tf.dtypes.float32)

cat_crossentropy_species = tf.io.parse_tensor(
    tf.io.read_file(
        ILSVRC2012_RESULTS_DIR_COMPNET + 'modeling_error_cat_crossentropy_species.data'),
        tf.dtypes.float32)

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(20, 10))

ax.boxplot(
    [cat_crossentropy_phylum,
     cat_crossentropy_class,
     cat_crossentropy_order,
     cat_crossentropy_species],
    sym='')

ax.set_title('Development of categorical crossentropy across layers')
ax.set_xlabel('Layer')
ax.set_ylabel('Categorical crossentropy')
ax.set_xticklabels(['phylum', 'class', 'order', 'species'])

ax.grid(linestyle='--')

fig.show()

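Data error

This experiment measures how data errors propagate. The same evaluation is repeated on test images corrupted with additive Gaussian noise.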
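The corruption in the next cell works per pixel. It adds zero-mean Gaussian noise with a standard deviation of one twentieth of the RGB value range, clips the result to the valid range, and casts back to 8-bit integers.

The sketch below reproduces that operation in pure Python. `add_gaussian_noise` is hypothetical. The range 0 to 255 is an assumption, because the values of `RGB_MIN_VAL` and `RGB_MAX_VAL` are not shown in this section:

```python
import random


def add_gaussian_noise(pixels, rgb_min=0, rgb_max=255, rel_stddev=1 / 20, seed=None):
    """Add N(0, ((rgb_max - rgb_min) * rel_stddev)^2) noise to each pixel value.

    Assumes an 8-bit range [0, 255]; the notebook uses RGB_MIN_VAL / RGB_MAX_VAL instead.
    """
    rng = random.Random(seed)
    stddev = (rgb_max - rgb_min) * rel_stddev
    noisy = []
    for value in pixels:
        value = value + rng.gauss(0.0, stddev)
        # Clip to the valid range; int() then truncates like a float -> uint8 cast
        noisy.append(int(min(max(value, rgb_min), rgb_max)))
    return noisy


print(add_gaussian_noise([0, 128, 255], seed=0))
```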

In [None]:
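# To evaluate how the composed model propagates data errors across layers, we model erroneous
# records by adding Gaussian white noise (standard deviation: 1/20 of the RGB value range) to
# every test crop and clipping the result back to the valid pixel range.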
ilsvrc2012_test_data_error = process_and_augment(ilsvrc2012_test_raw, batch_size=10, synset_level=ILSVRC2012_SPECIES_LABEL_LEVEL, hypernym=None, is_train=False)
ilsvrc2012_test_data_error = ilsvrc2012_test_data_error.unbatch().map(
    lambda image, label: (
        tf.dtypes.cast(
            tf.math.maximum(
                tf.math.minimum(
                    tf.dtypes.cast(image, tf.dtypes.float32) + tf.random.normal((CROP_SIZE_H, CROP_SIZE_W, 3), mean=0, stddev=(RGB_MAX_VAL - RGB_MIN_VAL) / 20),
                    RGB_MAX_VAL),
                RGB_MIN_VAL),
            tf.dtypes.uint8),
        label),
    num_parallel_calls=tf.data.experimental.AUTOTUNE).batch(batch_size=10)

In [None]:
# Initialize custom metrics to watch during the evaluation

# Categorical crossentropies for each taxonomic category
cat_crossentropy_phylum = []
cat_crossentropy_class = []
cat_crossentropy_order = []
cat_crossentropy_species = []

In [None]:
for (imgs, ground_truths) in ilsvrc2012_test_data_error:
    # Generate predictions for each image in the current batch
    batch_scores_phylum = phylum_model.predict(imgs)
    
    # Since we constructed our test data set in a way that each image in one batch is an augmented
    # version of the same base image, we can simply average the individual scores to get the final
    # prediction for the respective base image in adherence to (Goodfellow et al., 2013) and
    # (Krizhevsky et al., 2012).
    prediction_phylum = tf.math.reduce_mean(batch_scores_phylum, axis=0)
    
    # Route the current batch to the sub-module predicted by the phylum model and generate the
    # predictions for the respective class category labels
    batch_scores_class = class_models[tf.math.argmax(prediction_phylum)].predict(imgs)
    
    # Average the scores for the individual images in the batch as described above
    prediction_class = tf.math.reduce_mean(batch_scores_class, axis=0)
    
    # Route the current batch to the sub-module predicted by the class model and generate the
    # predictions for the respective order category labels
    batch_scores_order = order_models[tf.math.argmax(prediction_class)].predict(imgs)
    
    # Average the scores for the individual images in the batch as described above
    prediction_order = tf.math.reduce_mean(batch_scores_order, axis=0)
    
    # Route the current batch to the sub-module predicted by the order model and generate the
    # predictions for the respective species category labels
    batch_scores_species = species_models[tf.math.argmax(prediction_order)].predict(imgs)
    
    # Average the scores for the individual images in the batch as described above
    prediction_species = tf.math.reduce_mean(batch_scores_species, axis=0)
        
    # Update custom metrics w/ the result for the current base image; as all images in one batch
    # originate from the same base image (cf. above), the ground truth is hence identical as well.
    
    ground_truth_species = tf.math.argmax(ground_truths[0]).numpy()
    ground_truth_phylum = ilsvrc2012_resolve_hypernym(ground_truth_species, ILSVRC2012_PHYLUM_LABEL_LEVEL, encoded=True)
    ground_truth_class = ilsvrc2012_resolve_hypernym(ground_truth_species, ILSVRC2012_CLASS_LABEL_LEVEL, encoded=True)
    ground_truth_order = ilsvrc2012_resolve_hypernym(ground_truth_species, ILSVRC2012_ORDER_LABEL_LEVEL, encoded=True)

    # Clip the probabilities before taking the logarithm; otherwise a probability of exactly 0
    # for any label yields 0 * log(0) = NaN in the one-hot product
    cat_crossentropy_phylum.append(
        -tf.math.reduce_sum(
            tf.one_hot(ground_truth_phylum, ILSVRC2012_NUM_LABELS_PHYLUM_LAYER)
            * tf.math.log(tf.clip_by_value(prediction_phylum, 1e-7, 1.0))))
    cat_crossentropy_class.append(
        -tf.math.reduce_sum(
            tf.one_hot(ground_truth_class, ILSVRC2012_NUM_LABELS_CLASS_LAYER)
            * tf.math.log(tf.clip_by_value(prediction_class, 1e-7, 1.0))))
    cat_crossentropy_order.append(
        -tf.math.reduce_sum(
            tf.one_hot(ground_truth_order, ILSVRC2012_NUM_LABELS_ORDER_LAYER)
            * tf.math.log(tf.clip_by_value(prediction_order, 1e-7, 1.0))))
    cat_crossentropy_species.append(
        -tf.math.reduce_sum(
            tf.one_hot(ground_truth_species, ILSVRC2012_NUM_LABELS_SPECIES_LAYER)
            * tf.math.log(tf.clip_by_value(prediction_species, 1e-7, 1.0))))

In [None]:
# Store the results

tf.io.write_file(
    ILSVRC2012_RESULTS_DIR_COMPNET + 'data_error_cat_crossentropy_phylum.data',
    tf.io.serialize_tensor(cat_crossentropy_phylum))

tf.io.write_file(
    ILSVRC2012_RESULTS_DIR_COMPNET + 'data_error_cat_crossentropy_class.data',
    tf.io.serialize_tensor(cat_crossentropy_class))

tf.io.write_file(
    ILSVRC2012_RESULTS_DIR_COMPNET + 'data_error_cat_crossentropy_order.data',
    tf.io.serialize_tensor(cat_crossentropy_order))

tf.io.write_file(
    ILSVRC2012_RESULTS_DIR_COMPNET + 'data_error_cat_crossentropy_species.data',
    tf.io.serialize_tensor(cat_crossentropy_species))

In [None]:
# Load the results

cat_crossentropy_phylum = tf.io.parse_tensor(
    tf.io.read_file(
        ILSVRC2012_RESULTS_DIR_COMPNET + 'data_error_cat_crossentropy_phylum.data'),
        tf.dtypes.float32)

cat_crossentropy_class = tf.io.parse_tensor(
    tf.io.read_file(
        ILSVRC2012_RESULTS_DIR_COMPNET + 'data_error_cat_crossentropy_class.data'),
        tf.dtypes.float32)

cat_crossentropy_order = tf.io.parse_tensor(
    tf.io.read_file(
        ILSVRC2012_RESULTS_DIR_COMPNET + 'data_error_cat_crossentropy_order.data'),
        tf.dtypes.float32)

cat_crossentropy_species = tf.io.parse_tensor(
    tf.io.read_file(
        ILSVRC2012_RESULTS_DIR_COMPNET + 'data_error_cat_crossentropy_species.data'),
        tf.dtypes.float32)

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(20, 10))

ax.boxplot(
    [cat_crossentropy_phylum,
     cat_crossentropy_class,
     cat_crossentropy_order,
     cat_crossentropy_species],
    sym='')

ax.set_title('Development of categorical crossentropy across layers')
ax.set_xlabel('Layer')
ax.set_ylabel('Categorical crossentropy')
ax.set_xticklabels(['phylum', 'class', 'order', 'species'])

ax.grid(linestyle='--')

fig.show()

---
## Benchmark: VGG-19 (Simonyan et al., 2015)
<a id='benchmark'></a>

As a comparative benchmark, a monolithic VGG-19 network as described in (Simonyan et al., 2015) was trained on the same preprocessed dataset. Hyperparameters were set to the values reported in the original publication wherever possible; the remaining ones were tuned experimentally.
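VGG-19 owes its name to its 19 weight layers:

- 16 convolutional layers with 3×3 kernels
- two hidden fully connected layers with 4,096 neurons each
- the softmax output layer

The sketch below counts the trainable parameters of such a configuration. It rests on three assumptions:

- convolutions use 'same' padding
- 2×2 max pooling follows the 2nd, 4th, 8th, 12th, and 16th convolution, as in the original paper
- an output layer sits on top of the hidden fully connected layers

Whether `vgg_model_from_template` places its pooling and output layers this way is not shown here. `vgg_param_count` is a hypothetical helper:

```python
def vgg_param_count(conv_channels, pool_after, input_hw=224, in_channels=3,
                    num_fc_layers=2, fc_neurons=4096, num_classes=1000):
    """Count weights and biases of a VGG-style network with 3x3 convolutions."""
    params = 0
    c_in, hw = in_channels, input_hw
    for i, c_out in enumerate(conv_channels):
        params += 3 * 3 * c_in * c_out + c_out  # kernel weights + biases
        c_in = c_out
        if i in pool_after:
            hw //= 2  # 2x2 max pooling with stride 2 halves the spatial size
    fan_in = hw * hw * c_in  # flattened feature map fed to the first dense layer
    for _ in range(num_fc_layers):
        params += fan_in * fc_neurons + fc_neurons
        fan_in = fc_neurons
    return params + fan_in * num_classes + num_classes  # softmax output layer


VGG19_CHANNELS = [64, 64, 128, 128, 256, 256, 256, 256] + [512] * 8
VGG19_POOL_AFTER = {1, 3, 7, 11, 15}  # zero-based indices of the convolutions

print(vgg_param_count(VGG19_CHANNELS, VGG19_POOL_AFTER))  # 143667240 (1,000 ImageNet classes)
```

For the animal subset, `num_classes` would be `ILSVRC2012_NUM_LABELS_SPECIES_LAYER`. That only changes the output-layer term. Most parameters sit in the first dense layer (about 102.8 million).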

### Model
<a id='benchmark_model'></a>

In [None]:
model = vgg_model_from_template(input_shape=[224, 224, 3],
                                num_classes=ILSVRC2012_NUM_LABELS_SPECIES_LAYER,
                                num_conv_layers=16,
                                num_conv_channels=[64, 64, 128, 128,
                                                   256, 256, 256, 256,
                                                   512, 512, 512, 512,
                                                   512, 512, 512, 512],
                                num_fc_layers=2,
                                num_fc_neurons=4096)
model.summary()

In [None]:
# Use Categorical Cross Entropy as loss function
loss = tf.keras.losses.CategoricalCrossentropy()

# Define metrics to watch during training
metrics = [tf.keras.metrics.CategoricalAccuracy(),
           tf.keras.metrics.TopKCategoricalAccuracy(k=5),
           tf.keras.metrics.CategoricalCrossentropy(),
           tf.keras.metrics.AUC()]

# Use Adam (Kingma et al., 2017) as optimizer during training
# Annotation: We don't set `learning_rate` here as this is automatically handled by the
# LearningRateScheduler (cf. callback section below).
optimizer = tf.keras.optimizers.Adam(beta_1=0.9,
                                     beta_2=0.999,
                                     epsilon=1e-07)
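The roles of `beta_1`, `beta_2`, and `epsilon` in the update rule of Kingma and Ba can be traced in a scalar sketch of a single Adam step. The sketch follows the textbook formulation. The Keras implementation places `epsilon` slightly differently (its documentation calls it "epsilon hat"), so treat the numbers as illustrative only. `adam_step` is hypothetical:

```python
import math


def adam_step(param, grad, m, v, t, learning_rate, beta_1=0.9, beta_2=0.999, epsilon=1e-07):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta_1 * m + (1 - beta_1) * grad          # moving average of the gradient
    v = beta_2 * v + (1 - beta_2) * grad ** 2     # moving average of the squared gradient
    m_hat = m / (1 - beta_1 ** t)                 # bias correction for the zero initialization
    v_hat = v / (1 - beta_2 ** t)
    param = param - learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return param, m, v


# On the first step the update is close to learning_rate * sign(grad), regardless of |grad|
print(adam_step(1.0, 2.0, m=0.0, v=0.0, t=1, learning_rate=1e-02))
```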

In [None]:
model.compile(loss=loss, metrics=metrics, optimizer=optimizer)

In [None]:
# Alternatively: Restore model from checkpoint
# model = tf.keras.models.load_model(CKPT_DIR + 'vgg19')
# model.summary()

### Training
<a id='benchmark_train'></a>

In [None]:
ilsvrc2012_train = process_and_augment(ilsvrc2012_train_raw, batch_size=32, synset_level=ILSVRC2012_SPECIES_LABEL_LEVEL, hypernym=None, is_train=True, num_rnd_crops=5, shuffle_buffer_size=10000, num_epochs=1)
ilsvrc2012_val = process_and_augment(ilsvrc2012_val_raw, batch_size=32, synset_level=ILSVRC2012_SPECIES_LABEL_LEVEL, hypernym=None, is_train=True, num_rnd_crops=5, shuffle_buffer_size=10000, num_epochs=1)

In [None]:
# EarlyStopping: Stop training early if no significant improvement in the monitored quantity is
#     observed for at least `patience` epochs
# LearningRateScheduler: Dynamically adapt the learning rate depending on the training epoch to
#     facilitate accelerated learning during the first few epochs
# ModelCheckpoint: Save the model after each epoch (if `save_best_only` is set to True, only keep
#     the best model with regard to the monitored quantity)
# TensorBoard: Enable TensorBoard visualization
callbacks = [tf.keras.callbacks.EarlyStopping(monitor='val_loss',
                                              min_delta=0.01,
                                              patience=3),
             tf.keras.callbacks.LearningRateScheduler(lambda epoch: 1e-02 if epoch < 3 else 1e-03),
             tf.keras.callbacks.ModelCheckpoint(filepath=CKPT_DIR + 'vgg19',
                                                monitor='val_loss',
                                                verbose=False,
                                                save_best_only=True),
             tf.keras.callbacks.TensorBoard(log_dir=LOG_DIR,
                                            histogram_freq=1)]
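Two of the callbacks above can be emulated in plain Python to see when they act:

- The step schedule returns 1e-2 for epochs 0 to 2 and 1e-3 afterwards.
- Early stopping ends training once `val_loss` has failed to beat its best value by more than `min_delta` for `patience` consecutive epochs.

The sketch follows our reading of the `EarlyStopping` logic in TensorFlow 2.2, without `baseline` or `restore_best_weights`. It is an approximation for inspection, not a replacement for the callback. Both helper names are hypothetical:

```python
def lr_schedule(epoch):
    """Same step schedule as passed to LearningRateScheduler above."""
    return 1e-02 if epoch < 3 else 1e-03


def early_stopping_epoch(val_losses, min_delta=0.01, patience=3):
    """Return the (0-based) epoch after which training stops, or None if it never does.

    An epoch counts as an improvement only if the loss drops below the best value
    seen so far by more than min_delta ('min' mode).
    """
    best = float('inf')
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


print([lr_schedule(epoch) for epoch in range(5)])            # [0.01, 0.01, 0.01, 0.001, 0.001]
print(early_stopping_epoch([2.0, 1.5, 1.495, 1.492, 1.6]))   # 4
```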

In [None]:
model.fit(x=ilsvrc2012_train,
          epochs=100,
          verbose=True,
          callbacks=callbacks,
          validation_data=ilsvrc2012_val,
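          # `shuffle` is ignored for tf.data datasets; shuffling is handled by the input
          # pipeline (cf. `shuffle_buffer_size` in `process_and_augment`)
          shuffle=True,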
          validation_freq=1)

### Testing
<a id='benchmark_test'></a>

We evaluate the model with the standard 10-crop procedure described in (Goodfellow et al., 2013) and (Krizhevsky et al., 2012). Each test image is cropped ten times, and the class scores of the ten crops are averaged into a single prediction.
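Krizhevsky et al. (2012) describe the 10-crop scheme as follows. The four corner patches and the center patch of a test image are taken together with their horizontal reflections, and the network's scores are averaged over the ten patches.

The NumPy sketch below implements that scheme. How `process_and_augment` builds its batches of ten with `is_train=False` is not shown in this section, so this cropping is an assumption. The helpers `ten_crops` and `ten_crop_predict` are hypothetical:

```python
import numpy as np


def ten_crops(image, crop_h, crop_w):
    """Four corner crops + center crop, plus their horizontal mirrors -> [10, crop_h, crop_w, C]."""
    h, w = image.shape[:2]
    top, left = (h - crop_h) // 2, (w - crop_w) // 2
    offsets = [(0, 0), (0, w - crop_w), (h - crop_h, 0), (h - crop_h, w - crop_w), (top, left)]
    crops = [image[y:y + crop_h, x:x + crop_w] for y, x in offsets]
    crops += [crop[:, ::-1] for crop in crops]  # mirror along the width axis
    return np.stack(crops)


def ten_crop_predict(predict_fn, image, crop_h, crop_w):
    """Average the class scores of all ten crops into one prediction for the base image."""
    return predict_fn(ten_crops(image, crop_h, crop_w)).mean(axis=0)
```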

In [None]:
ilsvrc2012_test = process_and_augment(ilsvrc2012_test_raw, batch_size=10, synset_level=ILSVRC2012_SPECIES_LABEL_LEVEL, hypernym=None, is_train=False)

In [None]:
# Define metrics to watch during the evaluation of the model on the test data set

# Use the same metrics as for the training
# test_metrics = metrics

# Use different metrics than during the training
test_metrics = [tf.keras.metrics.CategoricalAccuracy(name='CategoricalAccuracy'),
                tf.keras.metrics.TopKCategoricalAccuracy(k=5, name='TopKCategoricalAccuracy'),
                tf.keras.metrics.CategoricalCrossentropy(name='CategoricalCrossentropy'),
                tf.keras.metrics.AUC(name='AUC')]

In [None]:
# Reset all metrics before starting the evaluation
for metric in test_metrics:
    metric.reset_states()
    
# Initialize additional custom metrics to watch during the evaluation

# Overall label distribution (predicted and ground truth)
dist_ground_truth = [0] * ILSVRC2012_NUM_LABELS_SPECIES_LAYER
dist_predicted = [0] * ILSVRC2012_NUM_LABELS_SPECIES_LAYER

# Confusion matrices (rows: ground truth label, columns: predicted label) for each taxonomic category
dist_predicted_phylum = [[0] * ILSVRC2012_NUM_LABELS_PHYLUM_LAYER for _ in range(ILSVRC2012_NUM_LABELS_PHYLUM_LAYER)]
dist_predicted_class = [[0] * ILSVRC2012_NUM_LABELS_CLASS_LAYER for _ in range(ILSVRC2012_NUM_LABELS_CLASS_LAYER)]
dist_predicted_order = [[0] * ILSVRC2012_NUM_LABELS_ORDER_LAYER for _ in range(ILSVRC2012_NUM_LABELS_ORDER_LAYER)]
dist_predicted_species = [[0] * ILSVRC2012_NUM_LABELS_SPECIES_LAYER for _ in range(ILSVRC2012_NUM_LABELS_SPECIES_LAYER)]

# Fine label distribution for each coarse category (predicted and ground truth)
dist_ground_truth_phylum_class = [[0] * ILSVRC2012_NUM_LABELS_CLASS_LAYER for _ in range(ILSVRC2012_NUM_LABELS_PHYLUM_LAYER)]
dist_predicted_phylum_class = [[0] * ILSVRC2012_NUM_LABELS_CLASS_LAYER for _ in range(ILSVRC2012_NUM_LABELS_PHYLUM_LAYER)]
dist_ground_truth_phylum_order = [[0] * ILSVRC2012_NUM_LABELS_ORDER_LAYER for _ in range(ILSVRC2012_NUM_LABELS_PHYLUM_LAYER)]
dist_predicted_phylum_order = [[0] * ILSVRC2012_NUM_LABELS_ORDER_LAYER for _ in range(ILSVRC2012_NUM_LABELS_PHYLUM_LAYER)]
dist_ground_truth_phylum_species = [[0] * ILSVRC2012_NUM_LABELS_SPECIES_LAYER for _ in range(ILSVRC2012_NUM_LABELS_PHYLUM_LAYER)]
dist_predicted_phylum_species = [[0] * ILSVRC2012_NUM_LABELS_SPECIES_LAYER for _ in range(ILSVRC2012_NUM_LABELS_PHYLUM_LAYER)]
dist_ground_truth_class_order = [[0] * ILSVRC2012_NUM_LABELS_ORDER_LAYER for _ in range(ILSVRC2012_NUM_LABELS_CLASS_LAYER)]
dist_predicted_class_order = [[0] * ILSVRC2012_NUM_LABELS_ORDER_LAYER for _ in range(ILSVRC2012_NUM_LABELS_CLASS_LAYER)]
dist_ground_truth_class_species = [[0] * ILSVRC2012_NUM_LABELS_SPECIES_LAYER for _ in range(ILSVRC2012_NUM_LABELS_CLASS_LAYER)]
dist_predicted_class_species = [[0] * ILSVRC2012_NUM_LABELS_SPECIES_LAYER for _ in range(ILSVRC2012_NUM_LABELS_CLASS_LAYER)]
dist_ground_truth_order_species = [[0] * ILSVRC2012_NUM_LABELS_SPECIES_LAYER for _ in range(ILSVRC2012_NUM_LABELS_ORDER_LAYER)]
dist_predicted_order_species = [[0] * ILSVRC2012_NUM_LABELS_SPECIES_LAYER for _ in range(ILSVRC2012_NUM_LABELS_ORDER_LAYER)]

# Semantic distance between the predicted category and the ground truth in accordance with
# (Fergus et al., 2010)
semantic_distance = 0.0

# Uncertainty metrics in accordance with (Ovadia et al., 2019)
confidence = []
cat_acc = []
neg_log_likelihood = []
brier_score = []
pred_entropy = []
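The implementation of `ILSVRC2012_SYNSET_MAP.semantic_distance` is not part of this section. One simple tree-based notion of semantic distance counts the edges between two labels via their lowest common ancestor in the hypernym hierarchy. It serves here purely as an illustration and does not necessarily match that method or (Fergus et al., 2010).

The sketch below uses a toy child-to-parent map. All names are hypothetical:

```python
def path_to_root(label, parent):
    """List of nodes from label up to the root of the hierarchy."""
    path = [label]
    while label in parent:
        label = parent[label]
        path.append(label)
    return path


def tree_distance(a, b, parent):
    """Number of edges between labels a and b via their lowest common ancestor."""
    depth_in_a = {node: i for i, node in enumerate(path_to_root(a, parent))}
    for j, node in enumerate(path_to_root(b, parent)):
        if node in depth_in_a:
            return depth_in_a[node] + j
    raise ValueError('labels do not share a common ancestor')


# Toy hierarchy (not taken from WordNet)
parent = {'tabby': 'felid', 'lion': 'felid', 'felid': 'carnivore',
          'wolf': 'canid', 'canid': 'carnivore'}
print(tree_distance('tabby', 'lion', parent), tree_distance('tabby', 'wolf', parent))  # 2 4
```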

In [None]:
for (imgs, ground_truths) in ilsvrc2012_test:
    # Generate predictions for each image in the current batch
    batch_scores = model.predict(imgs)
    
    # Since we constructed our test data set in a way that each image in one batch is an augmented
    # version of the same base image, we can simply average the individual scores to get the final
    # prediction for the respective base image in adherence to (Goodfellow et al., 2013) and
    # (Krizhevsky et al., 2012).
    prediction = tf.math.reduce_mean(batch_scores, axis=0)
    
    # Update the metrics w/ the result for the current base image; as all images in one batch
    # originate from the same base image (cf. above), the ground truth is hence identical as well.
    # Annotation: The `tf.expand_dims` is a workaround for compatibility with
    # `tf.keras.metrics.TopKCategoricalAccuracy` since the latter does not accept one-dimensional
    # inputs as of TensorFlow version v2.2.0-rc3.
    for metric in test_metrics:
        metric.update_state(tf.expand_dims(ground_truths[0], 0), tf.expand_dims(prediction, 0))
        
    # Update custom metrics w/ the result for the current base image; cf. above concerning the
    # ground truth for each batch
    
    ground_truth = ground_truths[0]
    ground_truth_species = tf.math.argmax(ground_truth).numpy()
    ground_truth_species_decoded = ilsvrc2012_decode(ground_truth_species, ILSVRC2012_SPECIES_LABEL_LEVEL)
    
    ground_truth_phylum = ilsvrc2012_resolve_hypernym(ground_truth_species, ILSVRC2012_PHYLUM_LABEL_LEVEL, encoded=True)
    ground_truth_class = ilsvrc2012_resolve_hypernym(ground_truth_species, ILSVRC2012_CLASS_LABEL_LEVEL, encoded=True)
    ground_truth_order = ilsvrc2012_resolve_hypernym(ground_truth_species, ILSVRC2012_ORDER_LABEL_LEVEL, encoded=True)

    prediction_species = tf.math.argmax(prediction).numpy()
    prediction_species_decoded = ilsvrc2012_decode(prediction_species, ILSVRC2012_SPECIES_LABEL_LEVEL)
    
    prediction_phylum = ilsvrc2012_resolve_hypernym(prediction_species, ILSVRC2012_PHYLUM_LABEL_LEVEL, encoded=True)
    prediction_class = ilsvrc2012_resolve_hypernym(prediction_species, ILSVRC2012_CLASS_LABEL_LEVEL, encoded=True)
    prediction_order = ilsvrc2012_resolve_hypernym(prediction_species, ILSVRC2012_ORDER_LABEL_LEVEL, encoded=True)
    
    dist_ground_truth[ground_truth_species] += 1
    dist_predicted[prediction_species] += 1

    dist_predicted_phylum[ground_truth_phylum][prediction_phylum] += 1
    dist_predicted_class[ground_truth_class][prediction_class] += 1
    dist_predicted_order[ground_truth_order][prediction_order] += 1
    dist_predicted_species[ground_truth_species][prediction_species] += 1

    # Ground truth: distribution of the true fine labels within each true coarse category;
    # predicted: distribution of the predicted fine labels for examples of that coarse category
    dist_ground_truth_phylum_class[ground_truth_phylum][ground_truth_class] += 1
    dist_predicted_phylum_class[ground_truth_phylum][prediction_class] += 1
    dist_ground_truth_phylum_order[ground_truth_phylum][ground_truth_order] += 1
    dist_predicted_phylum_order[ground_truth_phylum][prediction_order] += 1
    dist_ground_truth_phylum_species[ground_truth_phylum][ground_truth_species] += 1
    dist_predicted_phylum_species[ground_truth_phylum][prediction_species] += 1
    dist_ground_truth_class_order[ground_truth_class][ground_truth_order] += 1
    dist_predicted_class_order[ground_truth_class][prediction_order] += 1
    dist_ground_truth_class_species[ground_truth_class][ground_truth_species] += 1
    dist_predicted_class_species[ground_truth_class][prediction_species] += 1
    dist_ground_truth_order_species[ground_truth_order][ground_truth_species] += 1
    dist_predicted_order_species[ground_truth_order][prediction_species] += 1
    
    semantic_distance += ILSVRC2012_SYNSET_MAP.semantic_distance(
        ground_truth_species_decoded,
        prediction_species_decoded
    )

    confidence.append(
        prediction[prediction_species])
        
    cat_acc.append(
        tf.dtypes.cast(tf.math.equal(ground_truth_species, prediction_species), tf.dtypes.float32))

    neg_log_likelihood.append(
        -tf.math.log(prediction[ground_truth_species]))

    brier_score.append(
        tf.math.reduce_sum((prediction - ground_truth) ** 2))

    # `multiply_no_nan` treats 0 * log(0) as 0 (the usual entropy convention) instead of NaN
    pred_entropy.append(
        -tf.math.reduce_sum(tf.math.multiply_no_nan(tf.math.log(prediction), prediction)))

print('Benchmark')
print()
print('==================================================')
print()
print('Final results:')
for metric in test_metrics:
    print('{}: {}'.format(metric.name, metric.result().numpy()))
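For each base image, the loop above records five quantities:

- the confidence (highest predicted probability)
- the 0/1 accuracy
- the negative log likelihood of the true label
- the Brier score (here the sum of squared differences to the one-hot target)
- the predictive entropy in nats

The pure-Python sketch below computes them for a single probability vector, which can help sanity-check the TensorFlow code. `uncertainty_metrics` is hypothetical. Like the notebook, it yields an infinite (here: erroring) log likelihood if the true label has probability 0:

```python
import math


def uncertainty_metrics(probs, true_idx):
    """Confidence, accuracy, NLL, Brier score and entropy for one predicted distribution."""
    predicted = max(range(len(probs)), key=probs.__getitem__)
    return {
        'confidence': probs[predicted],
        'accuracy': float(predicted == true_idx),
        'nll': -math.log(probs[true_idx]),
        'brier': sum((p - (1.0 if k == true_idx else 0.0)) ** 2 for k, p in enumerate(probs)),
        # Terms with p == 0 are skipped, i.e. 0 * log(0) is taken as 0
        'entropy': -sum(p * math.log(p) for p in probs if p > 0),
    }


print(uncertainty_metrics([0.5, 0.5, 0.0], 0))
```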

In [None]:
# Store the results

tf.io.write_file(
    ILSVRC2012_RESULTS_DIR_BENCHMARK + 'dist_ground_truth.data',
    tf.io.serialize_tensor(dist_ground_truth))

tf.io.write_file(
    ILSVRC2012_RESULTS_DIR_BENCHMARK + 'dist_predicted.data',
    tf.io.serialize_tensor(dist_predicted))

tf.io.write_file(
    ILSVRC2012_RESULTS_DIR_BENCHMARK + 'dist_predicted_phylum.data',
    tf.io.serialize_tensor(dist_predicted_phylum))

tf.io.write_file(
    ILSVRC2012_RESULTS_DIR_BENCHMARK + 'dist_predicted_class.data',
    tf.io.serialize_tensor(dist_predicted_class))

tf.io.write_file(
    ILSVRC2012_RESULTS_DIR_BENCHMARK + 'dist_predicted_order.data',
    tf.io.serialize_tensor(dist_predicted_order))

tf.io.write_file(
    ILSVRC2012_RESULTS_DIR_BENCHMARK + 'dist_predicted_species.data',
    tf.io.serialize_tensor(dist_predicted_species))

tf.io.write_file(
    ILSVRC2012_RESULTS_DIR_BENCHMARK + 'dist_ground_truth_phylum_class.data',
    tf.io.serialize_tensor(dist_ground_truth_phylum_class))

tf.io.write_file(
    ILSVRC2012_RESULTS_DIR_BENCHMARK + 'dist_predicted_phylum_class.data',
    tf.io.serialize_tensor(dist_predicted_phylum_class))

tf.io.write_file(
    ILSVRC2012_RESULTS_DIR_BENCHMARK + 'dist_ground_truth_phylum_order.data',
    tf.io.serialize_tensor(dist_ground_truth_phylum_order))

tf.io.write_file(
    ILSVRC2012_RESULTS_DIR_BENCHMARK + 'dist_predicted_phylum_order.data',
    tf.io.serialize_tensor(dist_predicted_phylum_order))

tf.io.write_file(
    ILSVRC2012_RESULTS_DIR_BENCHMARK + 'dist_ground_truth_phylum_species.data',
    tf.io.serialize_tensor(dist_ground_truth_phylum_species))

tf.io.write_file(
    ILSVRC2012_RESULTS_DIR_BENCHMARK + 'dist_predicted_phylum_species.data',
    tf.io.serialize_tensor(dist_predicted_phylum_species))

tf.io.write_file(
    ILSVRC2012_RESULTS_DIR_BENCHMARK + 'dist_ground_truth_class_order.data',
    tf.io.serialize_tensor(dist_ground_truth_class_order))

tf.io.write_file(
    ILSVRC2012_RESULTS_DIR_BENCHMARK + 'dist_predicted_class_order.data',
    tf.io.serialize_tensor(dist_predicted_class_order))

tf.io.write_file(
    ILSVRC2012_RESULTS_DIR_BENCHMARK + 'dist_ground_truth_class_species.data',
    tf.io.serialize_tensor(dist_ground_truth_class_species))

tf.io.write_file(
    ILSVRC2012_RESULTS_DIR_BENCHMARK + 'dist_predicted_class_species.data',
    tf.io.serialize_tensor(dist_predicted_class_species))

tf.io.write_file(
    ILSVRC2012_RESULTS_DIR_BENCHMARK + 'dist_ground_truth_order_species.data',
    tf.io.serialize_tensor(dist_ground_truth_order_species))

tf.io.write_file(
    ILSVRC2012_RESULTS_DIR_BENCHMARK + 'dist_predicted_order_species.data',
    tf.io.serialize_tensor(dist_predicted_order_species))

tf.io.write_file(
    ILSVRC2012_RESULTS_DIR_BENCHMARK + 'semantic_distance.data',
    tf.io.serialize_tensor(semantic_distance))

tf.io.write_file(
    ILSVRC2012_RESULTS_DIR_BENCHMARK + 'confidence.data',
    tf.io.serialize_tensor(confidence))

tf.io.write_file(
    ILSVRC2012_RESULTS_DIR_BENCHMARK + 'cat_acc.data',
    tf.io.serialize_tensor(cat_acc))

tf.io.write_file(
    ILSVRC2012_RESULTS_DIR_BENCHMARK + 'neg_log_likelihood.data',
    tf.io.serialize_tensor(neg_log_likelihood))

tf.io.write_file(
    ILSVRC2012_RESULTS_DIR_BENCHMARK + 'brier_score.data',
    tf.io.serialize_tensor(brier_score))

tf.io.write_file(
    ILSVRC2012_RESULTS_DIR_BENCHMARK + 'pred_entropy.data',
    tf.io.serialize_tensor(pred_entropy))

In [None]:
# Load the results

def load_benchmark_result(name, dtype):
    """Read a serialized tensor from ILSVRC2012_RESULTS_DIR_BENCHMARK and parse it as `dtype`."""
    return tf.io.parse_tensor(
        tf.io.read_file(ILSVRC2012_RESULTS_DIR_BENCHMARK + name + '.data'),
        dtype)

# Label distributions (counts, int32)
dist_ground_truth = load_benchmark_result('dist_ground_truth', tf.dtypes.int32)
dist_predicted = load_benchmark_result('dist_predicted', tf.dtypes.int32)

dist_predicted_phylum = load_benchmark_result('dist_predicted_phylum', tf.dtypes.int32)
dist_predicted_class = load_benchmark_result('dist_predicted_class', tf.dtypes.int32)
dist_predicted_order = load_benchmark_result('dist_predicted_order', tf.dtypes.int32)
dist_predicted_species = load_benchmark_result('dist_predicted_species', tf.dtypes.int32)

dist_ground_truth_phylum_class = load_benchmark_result('dist_ground_truth_phylum_class', tf.dtypes.int32)
dist_predicted_phylum_class = load_benchmark_result('dist_predicted_phylum_class', tf.dtypes.int32)
dist_ground_truth_phylum_order = load_benchmark_result('dist_ground_truth_phylum_order', tf.dtypes.int32)
dist_predicted_phylum_order = load_benchmark_result('dist_predicted_phylum_order', tf.dtypes.int32)
dist_ground_truth_phylum_species = load_benchmark_result('dist_ground_truth_phylum_species', tf.dtypes.int32)
dist_predicted_phylum_species = load_benchmark_result('dist_predicted_phylum_species', tf.dtypes.int32)
dist_ground_truth_class_order = load_benchmark_result('dist_ground_truth_class_order', tf.dtypes.int32)
dist_predicted_class_order = load_benchmark_result('dist_predicted_class_order', tf.dtypes.int32)
dist_ground_truth_class_species = load_benchmark_result('dist_ground_truth_class_species', tf.dtypes.int32)
dist_predicted_class_species = load_benchmark_result('dist_predicted_class_species', tf.dtypes.int32)
dist_ground_truth_order_species = load_benchmark_result('dist_ground_truth_order_species', tf.dtypes.int32)
dist_predicted_order_species = load_benchmark_result('dist_predicted_order_species', tf.dtypes.int32)

# Evaluation metrics (float32)
semantic_distance = load_benchmark_result('semantic_distance', tf.dtypes.float32)
confidence = load_benchmark_result('confidence', tf.dtypes.float32)
cat_acc = load_benchmark_result('cat_acc', tf.dtypes.float32)
neg_log_likelihood = load_benchmark_result('neg_log_likelihood', tf.dtypes.float32)
brier_score = load_benchmark_result('brier_score', tf.dtypes.float32)
pred_entropy = load_benchmark_result('pred_entropy', tf.dtypes.float32)

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(20, 5))
titles = ['Predicted', 'Ground Truth']

for idx, dist in enumerate([dist_predicted, dist_ground_truth]):
    # Normalize the label counts into shares (total computed once)
    shares = (dist / tf.reduce_sum(dist)).numpy()

    # Plot the distribution (bars at the 0-based label indices used for the ticks)
    ax[idx].bar(range(len(shares)), shares, width=1)
    ax[idx].set_title(titles[idx])
    ax[idx].set_xticks(range(0, ILSVRC2012_NUM_LABELS_SPECIES_LAYER, 25))
    ax[idx].set_ylim([0, 0.033])
    ax[idx].set_xlabel('Labels')
    ax[idx].set_ylabel('Share')

fig.show()
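The cells in this section turn raw per-label counts into shares by dividing each count by the total. A minimal pure-Python sketch of that normalization, assuming the counts are a flat sequence of non-negative integers; `normalize_counts` is a hypothetical helper, not part of the notebook. Computing the total once keeps the pass linear, and an all-zero row is mapped to zeros rather than NaN (a design choice of this sketch).

```python
from typing import List, Sequence


def normalize_counts(counts: Sequence[int]) -> List[float]:
    """Convert non-negative label counts into shares that sum to 1.

    Hypothetical helper for illustration; an all-zero input yields all zeros.
    """
    total = sum(counts)  # computed once, so the whole pass is O(n)
    if total == 0:
        return [0.0 for _ in counts]
    return [count / total for count in counts]


shares = normalize_counts([3, 1, 0, 4])
print(shares)  # [0.375, 0.125, 0.0, 0.5]
```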

In [None]:
fig, ax = plt.subplots(2, 1, figsize=(20, 15))

for label in range(ILSVRC2012_NUM_LABELS_PHYLUM_LAYER):
    # Normalize the prediction counts for this label into shares (total computed once)
    counts = dist_predicted_phylum[label]
    dist = (counts / tf.reduce_sum(counts)).numpy()

    # Plot the distribution
    ax[label].bar(range(0, len(dist)), dist, width=1)
    ax[label].set_title(
        'Predicted Distribution for #{}'.format(ilsvrc2012_decode(label, ILSVRC2012_PHYLUM_LABEL_LEVEL))
    )
    ax[label].set_xticks(range(0, ILSVRC2012_NUM_LABELS_PHYLUM_LAYER, 1))
    ax[label].set_xlabel('Labels')
    ax[label].set_ylabel('Share')

fig.show()
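Each panel shows, for one label, how the predictions are spread across the labels of the same layer. Assuming `dist_predicted_<layer>[label]` holds the prediction counts for all test examples whose ground truth is `label` (i.e. one row of a confusion matrix), the structure can be sketched in plain Python as follows; `confusion_counts` and the toy labels are hypothetical.

```python
from typing import List, Sequence


def confusion_counts(
    ground_truth: Sequence[int], predicted: Sequence[int], num_labels: int
) -> List[List[int]]:
    """Count predictions per ground-truth label (rows) and predicted label (columns).

    Hypothetical helper for illustration.
    """
    matrix = [[0] * num_labels for _ in range(num_labels)]
    for true_label, pred_label in zip(ground_truth, predicted):
        matrix[true_label][pred_label] += 1
    return matrix


# Toy example with two labels (e.g. two phyla)
matrix = confusion_counts([0, 0, 0, 1, 1], [0, 1, 0, 1, 1], num_labels=2)
print(matrix)  # [[2, 1], [0, 2]]

# Row-normalizing one row gives the shares plotted for that label
row = matrix[0]
print([count / sum(row) for count in row])
```

A perfectly accurate model would put all of each row's mass on the diagonal entry, so off-diagonal bars in the panels correspond to confusions with other labels of the same layer.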

In [None]:
fig, ax = plt.subplots(11, 1, figsize=(20, 165))

for label in range(ILSVRC2012_NUM_LABELS_CLASS_LAYER):
    # Normalize the prediction counts for this label into shares (total computed once)
    counts = dist_predicted_class[label]
    dist = (counts / tf.reduce_sum(counts)).numpy()

    # Plot the distribution
    ax[label].bar(range(0, len(dist)), dist, width=1)
    ax[label].set_title(
        'Predicted Distribution for #{}'.format(ilsvrc2012_decode(label, ILSVRC2012_CLASS_LABEL_LEVEL))
    )
    ax[label].set_xticks(range(0, ILSVRC2012_NUM_LABELS_CLASS_LAYER, 1))
    ax[label].set_xlabel('Labels')
    ax[label].set_ylabel('Share')

fig.show()

In [None]:
fig, ax = plt.subplots(12, 4, figsize=(20, 180))

for label in range(ILSVRC2012_NUM_LABELS_ORDER_LAYER):
    row = int(label / 4)
    col = int(label % 4)
    
    # Normalize the prediction counts for this label into shares (total computed once)
    counts = dist_predicted_order[label]
    dist = (counts / tf.reduce_sum(counts)).numpy()

    # Plot the distribution
    ax[row][col].bar(range(0, len(dist)), dist, width=1)
    ax[row][col].set_title(
        'Predicted Distribution for #{}'.format(ilsvrc2012_decode(label, ILSVRC2012_ORDER_LABEL_LEVEL))
    )
    ax[row][col].set_xticks(range(0, ILSVRC2012_NUM_LABELS_ORDER_LAYER, 4))
    ax[row][col].set_xlabel('Labels')
    ax[row][col].set_ylabel('Share')

fig.show()

In [None]:
fig, ax = plt.subplots(95, 4, figsize=(20, 1425))

for label in range(ILSVRC2012_NUM_LABELS_SPECIES_LAYER):
    row = int(label / 4)
    col = int(label % 4)
    
    # Normalize the prediction counts for this label into shares (total computed once)
    counts = dist_predicted_species[label]
    dist = (counts / tf.reduce_sum(counts)).numpy()

    # Plot the distribution
    ax[row][col].bar(range(0, len(dist)), dist, width=1)
    ax[row][col].set_title(
        'Predicted Distribution for #{}'.format(ilsvrc2012_decode(label, ILSVRC2012_SPECIES_LABEL_LEVEL))
    )
    ax[row][col].set_xticks(range(0, ILSVRC2012_NUM_LABELS_SPECIES_LAYER, 20))
    ax[row][col].set_xlabel('Labels')
    ax[row][col].set_ylabel('Share')

fig.show()
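The order and species cells lay the panels out in a four-column grid (`row = int(label / 4)`, `col = int(label % 4)`). The number of rows passed to `plt.subplots` must therefore be at least ceil(N / 4) for N labels, otherwise the last labels index past the end of `ax`. A small sketch of that arithmetic; `grid_shape` and `grid_position` are hypothetical helpers, and the 15-inch row height only mirrors the figure sizes used in these cells.

```python
from typing import Tuple


def grid_shape(num_panels: int, num_cols: int = 4) -> Tuple[int, int]:
    """Smallest (rows, cols) grid with num_cols columns that fits num_panels panels."""
    num_rows = -(-num_panels // num_cols)  # ceiling division without floats
    return num_rows, num_cols


def grid_position(index: int, num_cols: int = 4) -> Tuple[int, int]:
    """Row and column of the panel for a 0-based label index."""
    return divmod(index, num_cols)


rows, cols = grid_shape(47)
print(rows, cols)  # 12 4
print(grid_position(46))  # (11, 2)
height_inches = 15 * rows  # per-row height mirroring the cells above
```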

In [None]:
fig, ax = plt.subplots(2, 2, figsize=(20, 30))

for label in range(ILSVRC2012_NUM_LABELS_PHYLUM_LAYER):
    titles = ['Predicted Distribution for #{}'.format(ilsvrc2012_decode(label, ILSVRC2012_PHYLUM_LABEL_LEVEL)),
          'Ground Truth Distribution for #{}'.format(ilsvrc2012_decode(label, ILSVRC2012_PHYLUM_LABEL_LEVEL))]

    for idx, dist in enumerate([dist_predicted_phylum_class, dist_ground_truth_phylum_class]):
        # Normalize the counts for this label into shares (total computed once)
        counts = dist[label]
        shares = (counts / tf.reduce_sum(counts)).numpy()

        # Plot the distribution (bars at the 0-based label indices used for the ticks)
        ax[label][idx].bar(range(len(shares)), shares, width=1)
        ax[label][idx].set_title(titles[idx])
        ax[label][idx].set_xticks(range(0, ILSVRC2012_NUM_LABELS_CLASS_LAYER, 1))
        ax[label][idx].set_ylim([0, 0.33])
        ax[label][idx].set_xlabel('Labels')
        ax[label][idx].set_ylabel('Share')

fig.show()
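The cross-layer cells compare, for each coarse label (e.g. a phylum), how predictions and ground truth are distributed over the labels of a finer layer (e.g. classes). Assuming the stored tensors are count matrices indexed as `[coarse ground-truth label][fine label]`, they could be accumulated as sketched below. The `fine_to_coarse` mapping and the toy labels are hypothetical stand-ins for the synset hierarchy mapped earlier in the notebook, and this indexing convention is an assumption.

```python
from typing import List, Sequence


def cross_layer_counts(
    true_fine: Sequence[int],
    observed_fine: Sequence[int],
    fine_to_coarse: Sequence[int],
    num_coarse: int,
    num_fine: int,
) -> List[List[int]]:
    """Count fine labels per coarse ground-truth label (hypothetical helper).

    Rows are coarse labels derived from the true fine label; columns are the
    observed fine labels (ground truth or predictions, depending on the input).
    """
    counts = [[0] * num_fine for _ in range(num_coarse)]
    for true_label, observed_label in zip(true_fine, observed_fine):
        counts[fine_to_coarse[true_label]][observed_label] += 1
    return counts


# Hypothetical hierarchy: fine labels 0 and 1 belong to coarse label 0, fine label 2 to coarse label 1
fine_to_coarse = [0, 0, 1]
true_fine = [0, 1, 1, 2]
predicted_fine = [0, 2, 1, 2]

print(cross_layer_counts(true_fine, true_fine, fine_to_coarse, 2, 3))       # [[1, 2, 0], [0, 0, 1]]
print(cross_layer_counts(true_fine, predicted_fine, fine_to_coarse, 2, 3))  # [[1, 1, 1], [0, 0, 1]]
```

In the predicted row for coarse label 0, the count at fine label 2 falls outside that coarse label's branch of the hierarchy; mass of this kind is what distinguishes the predicted panels from the ground-truth panels.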

In [None]:
fig, ax = plt.subplots(2, 2, figsize=(20, 30))

for label in range(ILSVRC2012_NUM_LABELS_PHYLUM_LAYER):
    titles = ['Predicted Distribution for #{}'.format(ilsvrc2012_decode(label, ILSVRC2012_PHYLUM_LABEL_LEVEL)),
          'Ground Truth Distribution for #{}'.format(ilsvrc2012_decode(label, ILSVRC2012_PHYLUM_LABEL_LEVEL))]

    for idx, dist in enumerate([dist_predicted_phylum_order, dist_ground_truth_phylum_order]):
        # Normalize the counts for this label into shares (total computed once)
        counts = dist[label]
        shares = (counts / tf.reduce_sum(counts)).numpy()

        # Plot the distribution (bars at the 0-based label indices used for the ticks)
        ax[label][idx].bar(range(len(shares)), shares, width=1)
        ax[label][idx].set_title(titles[idx])
        ax[label][idx].set_xticks(range(0, ILSVRC2012_NUM_LABELS_ORDER_LAYER, 4))
        ax[label][idx].set_ylim([0, 0.01])
        ax[label][idx].set_xlabel('Labels')
        ax[label][idx].set_ylabel('Share')

fig.show()

In [None]:
fig, ax = plt.subplots(2, 2, figsize=(20, 30))

for label in range(ILSVRC2012_NUM_LABELS_PHYLUM_LAYER):
    titles = ['Predicted Distribution for #{}'.format(ilsvrc2012_decode(label, ILSVRC2012_PHYLUM_LABEL_LEVEL)),
          'Ground Truth Distribution for #{}'.format(ilsvrc2012_decode(label, ILSVRC2012_PHYLUM_LABEL_LEVEL))]

    for idx, dist in enumerate([dist_predicted_phylum_species, dist_ground_truth_phylum_species]):
        # Normalize the counts for this label into shares (total computed once)
        counts = dist[label]
        shares = (counts / tf.reduce_sum(counts)).numpy()

        # Plot the distribution (bars at the 0-based label indices used for the ticks)
        ax[label][idx].bar(range(len(shares)), shares, width=1)
        ax[label][idx].set_title(titles[idx])
        ax[label][idx].set_xticks(range(0, ILSVRC2012_NUM_LABELS_SPECIES_LAYER, 20))
        ax[label][idx].set_ylim([0, 0.033])
        ax[label][idx].set_xlabel('Labels')
        ax[label][idx].set_ylabel('Share')

fig.show()

In [None]:
fig, ax = plt.subplots(11, 2, figsize=(20, 165))

for label in range(ILSVRC2012_NUM_LABELS_CLASS_LAYER):
    titles = ['Predicted Distribution for #{}'.format(ilsvrc2012_decode(label, ILSVRC2012_CLASS_LABEL_LEVEL)),
          'Ground Truth Distribution for #{}'.format(ilsvrc2012_decode(label, ILSVRC2012_CLASS_LABEL_LEVEL))]

    for idx, dist in enumerate([dist_predicted_class_order, dist_ground_truth_class_order]):
        # Normalize the counts for this label into shares (total computed once)
        counts = dist[label]
        shares = (counts / tf.reduce_sum(counts)).numpy()

        # Plot the distribution (bars at the 0-based label indices used for the ticks)
        ax[label][idx].bar(range(len(shares)), shares, width=1)
        ax[label][idx].set_title(titles[idx])
        ax[label][idx].set_xticks(range(0, ILSVRC2012_NUM_LABELS_ORDER_LAYER, 4))
        ax[label][idx].set_ylim([0, 0.01])
        ax[label][idx].set_xlabel('Labels')
        ax[label][idx].set_ylabel('Share')

fig.show()

In [None]:
fig, ax = plt.subplots(11, 2, figsize=(20, 165))

for label in range(ILSVRC2012_NUM_LABELS_CLASS_LAYER):
    titles = ['Predicted Distribution for #{}'.format(ilsvrc2012_decode(label, ILSVRC2012_CLASS_LABEL_LEVEL)),
          'Ground Truth Distribution for #{}'.format(ilsvrc2012_decode(label, ILSVRC2012_CLASS_LABEL_LEVEL))]

    for idx, dist in enumerate([dist_predicted_class_species, dist_ground_truth_class_species]):
        # Normalize the counts for this label into shares (total computed once)
        counts = dist[label]
        shares = (counts / tf.reduce_sum(counts)).numpy()

        # Plot the distribution (bars at the 0-based label indices used for the ticks)
        ax[label][idx].bar(range(len(shares)), shares, width=1)
        ax[label][idx].set_title(titles[idx])
        ax[label][idx].set_xticks(range(0, ILSVRC2012_NUM_LABELS_SPECIES_LAYER, 20))
        ax[label][idx].set_ylim([0, 0.033])
        ax[label][idx].set_xlabel('Labels')
        ax[label][idx].set_ylabel('Share')

fig.show()

In [None]:
fig, ax = plt.subplots(47, 2, figsize=(20, 705))

for label in range(ILSVRC2012_NUM_LABELS_ORDER_LAYER):
    titles = ['Predicted Distribution for #{}'.format(ilsvrc2012_decode(label, ILSVRC2012_ORDER_LABEL_LEVEL)),
          'Ground Truth Distribution for #{}'.format(ilsvrc2012_decode(label, ILSVRC2012_ORDER_LABEL_LEVEL))]

    for idx, dist in enumerate([dist_predicted_order_species, dist_ground_truth_order_species]):
        # Normalize the counts for this label into shares (total computed once)
        counts = dist[label]
        shares = (counts / tf.reduce_sum(counts)).numpy()

        # Plot the distribution (bars at the 0-based label indices used for the ticks)
        ax[label][idx].bar(range(len(shares)), shares, width=1)
        ax[label][idx].set_title(titles[idx])
        ax[label][idx].set_xticks(range(0, ILSVRC2012_NUM_LABELS_SPECIES_LAYER, 20))
        ax[label][idx].set_ylim([0, 0.033])
        ax[label][idx].set_xlabel('Labels')
        ax[label][idx].set_ylabel('Share')

fig.show()

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(20, 10))

ax.boxplot(cat_acc, sym='')

ax.set_title('Categorical accuracy over varying corruption severities')
ax.set_ylabel('Accuracy')

ax.grid(linestyle='--')

fig.show()
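Categorical accuracy counts an example as correct when the arg-max of the predicted distribution equals the ground-truth label. A plain-Python sketch of that definition, per example and averaged, with hypothetical inputs; how `cat_acc` is aggregated across evaluation settings is not shown in this section, and `per_example_accuracy` is a hypothetical helper.

```python
from typing import List, Sequence


def per_example_accuracy(probs: Sequence[Sequence[float]], labels: Sequence[int]) -> List[float]:
    """1.0 where the arg-max prediction matches the label, else 0.0 (hypothetical helper)."""
    result = []
    for p, label in zip(probs, labels):
        predicted = max(range(len(p)), key=p.__getitem__)  # arg-max, first index on ties
        result.append(1.0 if predicted == label else 0.0)
    return result


probs = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.4, 0.5, 0.1]]
labels = [0, 2, 0]
correct = per_example_accuracy(probs, labels)
print(correct)  # [1.0, 1.0, 0.0]
print(sum(correct) / len(correct))  # mean categorical accuracy
```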

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(20, 10))

ax.boxplot(brier_score, sym='')

ax.set_title('Brier Score over varying corruption severities')
ax.set_ylabel('Brier Score')

ax.grid(linestyle='--')

fig.show()
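The Brier score measures the squared distance between the predicted probability vector and the one-hot encoding of the true label; lower is better. A minimal sketch for a single example, summing over classes. Some references additionally divide by the number of classes, so absolute values are only comparable under the same convention; which convention the stored `brier_score` uses is not shown here, and `brier_score_single` is a hypothetical helper.

```python
from typing import Sequence


def brier_score_single(probs: Sequence[float], label: int) -> float:
    """Sum of squared differences between probs and the one-hot vector for label."""
    return sum((p - (1.0 if k == label else 0.0)) ** 2 for k, p in enumerate(probs))


print(brier_score_single([1.0, 0.0, 0.0], 0))  # 0.0 (confident and correct)
print(brier_score_single([0.0, 1.0, 0.0], 0))  # 2.0 (confident and wrong)
print(brier_score_single([0.5, 0.5], 0))       # 0.5
```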

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(20, 10))

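# Index along the first axis of the stored results to plot (assumption: one entry per evaluation setting)
idx = 0

ax.hist(confidence[idx], bins=10000, cumulative=-1, histtype='step')
ax.set_title('Confidence')
ax.set_xlabel('Confidence ' + r'$\tau$')
ax.set_ylabel('Number of examples ' + r'$P(x) > \tau$')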
#ax.set_xlim([0.002, 1.0])

ax.grid(linestyle='--')

fig.show()
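Here the confidence of a prediction is taken to be its largest predicted probability (an assumption; this section does not show how `confidence` was computed). With `cumulative=-1`, `ax.hist` accumulates the bin counts from the right, so each step shows roughly how many examples have a confidence at or above the threshold. The same count, computed directly; `confidences` and `count_at_least` are hypothetical helpers.

```python
from bisect import bisect_left
from typing import List, Sequence


def confidences(probs: Sequence[Sequence[float]]) -> List[float]:
    """Confidence of each prediction, taken as its maximum class probability."""
    return [max(p) for p in probs]


def count_at_least(values: Sequence[float], tau: float) -> int:
    """Number of values >= tau, via binary search on a sorted copy."""
    ordered = sorted(values)
    return len(ordered) - bisect_left(ordered, tau)


conf = confidences([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.4, 0.5, 0.1], [0.9, 0.05, 0.05]])
print(conf)  # [0.7, 0.6, 0.5, 0.9]
print([count_at_least(conf, tau) for tau in (0.0, 0.6, 0.95)])  # [4, 3, 0]
```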

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(20, 10))

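# Index along the first axis of the stored results to plot (assumption: one entry per evaluation setting)
idx = 0

# Reverse cumulative histogram, matching the "L(x) > l" axis label
ax.hist(neg_log_likelihood[idx], bins=10000, cumulative=-1, histtype='step')
ax.set_title('Negative log likelihood')
ax.set_xlabel('Negative log likelihood ' + r'$l$')
ax.set_ylabel('Number of examples ' + r'$L(x) > l$')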
#ax.set_xlim([0.202, 1.0])

ax.grid(linestyle='--')

fig.show()
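The negative log likelihood of an example is -log p(y|x), the negative natural logarithm of the probability assigned to the true label, so confident mistakes are penalized heavily. A minimal sketch; `negative_log_likelihood` is a hypothetical helper, and clipping the probability at a small `eps` to avoid `log(0)` is an assumption of this sketch, not something this section shows.

```python
import math
from typing import Sequence


def negative_log_likelihood(probs: Sequence[float], label: int, eps: float = 1e-12) -> float:
    """-log of the probability assigned to the true label (natural log, in nats)."""
    # log(1 / p) equals -log(p) and returns +0.0 (not -0.0) when p == 1
    return math.log(1.0 / max(probs[label], eps))


print(negative_log_likelihood([0.5, 0.5], 0))  # log(2), about 0.693
print(negative_log_likelihood([1.0, 0.0], 0))  # 0.0 for a certain, correct prediction
```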

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(20, 10))

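# Index along the first axis of the stored results to plot (assumption: one entry per evaluation setting)
idx = 0

ax.hist(pred_entropy[idx], bins=10000, cumulative=-1, histtype='step')
ax.set_title('Predictive entropy')
ax.set_xlabel('Entropy (nats) ' + r'$h$')
ax.set_ylabel('Number of examples ' + r'$H(x) > h$')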
#ax.set_xlim([0.0, 2.5])

ax.grid(linestyle='--')

fig.show()
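Predictive entropy H = -sum_k p_k log p_k, measured in nats when the natural logarithm is used, is 0 for a one-hot prediction and reaches its maximum log K for a uniform distribution over K classes. A minimal sketch that treats 0 log 0 as 0; `predictive_entropy` is a hypothetical helper.

```python
import math
from typing import Sequence


def predictive_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    # p * log(1 / p) equals -p * log(p) and avoids a negative zero for one-hot inputs
    return sum(p * math.log(1.0 / p) for p in probs if p > 0.0)


print(predictive_entropy([1.0, 0.0, 0.0]))  # 0.0
print(predictive_entropy([0.25] * 4), math.log(4))  # both about 1.386
```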

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(20, 10))

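# Index along the first axis of the stored results to plot (assumption: one entry per evaluation setting)
idx = 0

# Sort the examples by confidence (ascending)
order = tf.argsort(confidence[idx])
conf_sorted = tf.gather(confidence[idx], order)
acc_sorted = tf.gather(tf.cast(cat_acc[idx], tf.float32), order)

# Accuracy over all examples whose confidence is at least the current one:
# reverse cumulative sum of correct predictions divided by the number of such examples
num_at_least = tf.cast(tf.range(tf.size(acc_sorted), 0, -1), tf.float32)
acc_at_least = tf.math.cumsum(acc_sorted, reverse=True) / num_at_least

ax.plot(conf_sorted.numpy(), acc_at_least.numpy())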
ax.set_title('Confidence vs accuracy')
ax.set_xlabel('Confidence ' + r'$\tau$')
ax.set_ylabel('Categorical accuracy on examples ' + r'$P(x) > \tau$')
ax.set_xlim([0.0, 1.0])

ax.grid(linestyle='--')

fig.show()
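The y-axis label of the confidence-vs-accuracy plot describes the accuracy over all examples whose confidence exceeds the threshold. Sorting the examples by confidence and taking a reverse running mean of per-example correctness yields that curve in a single pass. A plain-Python sketch, assuming correctness is stored as 0/1 per example; `accuracy_above_threshold` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple


def accuracy_above_threshold(
    confidence: Sequence[float], correct: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Return (thresholds, accuracy on all examples with confidence >= threshold).

    Thresholds are the sorted confidences. With duplicate confidences, only the
    first position of each tie equals the exact accuracy for that threshold.
    """
    pairs = sorted(zip(confidence, correct))
    thresholds = [c for c, _ in pairs]
    accuracies = [0.0] * len(pairs)
    running_correct = 0.0
    # Walk from the most confident example downwards, accumulating a running mean
    for i in range(len(pairs) - 1, -1, -1):
        running_correct += pairs[i][1]
        accuracies[i] = running_correct / (len(pairs) - i)
    return thresholds, accuracies


taus, accs = accuracy_above_threshold([0.9, 0.5, 0.7, 0.6], [1.0, 0.0, 1.0, 0.0])
print(taus)  # [0.5, 0.6, 0.7, 0.9]
print(accs)  # [0.5, 0.6666666666666666, 1.0, 1.0]
```

For a well-calibrated model the curve rises with the threshold, since restricting to more confident examples should increase accuracy.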