# EnsNet Ensemble Learning with Majority Voting on TPU
----------

## This notebook contains a third-party TensorFlow implementation of EnsNet
EnsNet is a CNN architecture that ranks among the state-of-the-art models for MNIST. It can also be tested on Fashion MNIST and CIFAR-10.

[Ensemble Learning in CNN Augmented with Fully Connected Subnetworks](https://arxiv.org/pdf/2003.08562v3.pdf)

EnsNet is designed to improve image recognition by combining a base CNN with multiple fully connected subnetworks (FCSNs). Each network makes its own prediction, and the final class is chosen by majority vote across all of them.

[paperswithcode/image-classification-on-mnist](https://paperswithcode.com/sota/image-classification-on-mnist)

### Let's make sure we install the dependencies

In [1]:
!pip install cloud-tpu-client
!pip install dropconnect-tensorflow
!pip install tensorflow-addons

Collecting dropconnect-tensorflow
  Downloading dropconnect-tensorflow-0.1.1.tar.gz (4.2 kB)
  Preparing metadata (setup.py) ... done
Collecting keras<2.16,>=2.15.0
  Downloading keras-2.15.0-py3-none-any.whl (1.7 MB)
     1.7/1.7 MB 8.2 MB/s eta 0:00:00
Building wheels for collected packages: dropconnect-tensorflow
  Building wheel for dropconnect-tensorflow (setup.py) ... done
  Created wheel for dropconnect-tensorflow: filename=dropconnect_tensorflow-0.1.1-py3-none-any.whl size=4640 sha256=4813c40afd6bc49f1327568af742d1f9ca1e55a418414e6ffd4bad56d41c2117
  Stored in directory: /root/.cache/pip/wheels/7e/4a/e5/266cd645d

In [None]:
import math
import numpy as np
import tensorflow as tf
from PIL import Image
from tensorflow.keras.datasets import mnist
from tensorflow.keras.layers import BatchNormalization, Conv2D, Dense, Dropout, Flatten, Input, Lambda, MaxPooling2D, Reshape
from tensorflow.keras.models import Model, load_model
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint, LearningRateScheduler
from dropconnect_tensorflow import DropConnectDense
import tensorflow.keras.backend as K
import tensorflow_addons as tfa
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')


# Leveraging TPUs for Enhanced Performance in Deep Learning

### Unlike GPUs, TPUs are designed specifically for deep learning
- They excel at the high-volume matrix and tensor calculations that dominate deep learning workloads.
- This specialization reduces computational bottlenecks in both training and inference.

## Setting Up the TPU Environment
- TPU Cluster Resolver: `TPUClusterResolver` locates the TPU cluster. The code below calls it without arguments, so the TPU address comes from the runtime environment.
- Initializing TPU System: Once connected, we initialize the TPU system, making it ready to execute operations.
- Distributing with TPUStrategy: TensorFlow's TPUStrategy defines how the model is replicated and executed across the TPU cores. It handles the distribution of computations and data, optimizing for parallel execution.


In [None]:
def setup_distribution_strategy():
    """
    Sets up the TensorFlow distribution strategy based on available hardware, prioritizing TPUs if available.

    Returns:
    - strategy: The resolved TensorFlow distribution strategy.
    - AUTO: The tf.data AUTOTUNE constant, used to tune dataset parallelism automatically.
    """
    # Suppress warnings to clean up output
    warnings.filterwarnings('ignore')

    # Let tf.data choose parallelism and prefetch buffer sizes at runtime
    AUTO = tf.data.AUTOTUNE

    try:
        # Attempt to detect and initialize a TPU
        tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
        print('Running on TPU ', tpu.master())
    except ValueError:
        # No TPU could be resolved in this environment
        tpu = None

    if tpu:
        # If a TPU is found, initialize the TPU system and set up the TPU strategy
        tf.config.experimental_connect_to_cluster(tpu)
        tf.tpu.experimental.initialize_tpu_system(tpu)
        strategy = tf.distribute.TPUStrategy(tpu)
    else:
        # If no TPU is found, fall back to the default strategy (CPU or GPU)
        strategy = tf.distribute.get_strategy()

    return strategy, AUTO

## Pre-Processing Data

## Loading Data

We start by loading the MNIST dataset, scaling the pixel values to [0, 1] by dividing by 255, reshaping the images to 28x28x1, and one-hot encoding the labels for the 10 classes.

## We will create the pre-processing functions ourselves

### But... isn't it easier with the Keras `ImageDataGenerator`?

Yes, but to avoid an input bottleneck it is necessary to use the tf.data API, which provides more flexibility and efficiency for data loading and preprocessing!

To fully leverage the power of the TPU, we have to <b>parallelize</b> our data pipeline.

That means writing the data augmentation functions by hand. They include:

- <b>Rotation</b>: This is like spinning a picture around a point in the middle. Imagine pinning a photo to a wall and then twisting it left or right without moving its center.
- <b>Shear</b>: Think of it as pulling the top edge of an image to one side without moving the bottom edge, making the picture look slanted. It's like stretching or compressing one side of the image more than the other.
- <b>Zoom</b>: This involves making everything in the image bigger (zooming in) or smaller (zooming out). It's like moving a camera lens closer to or further from a scene to change how much of it you see.
- <b>Shift</b>: This means moving the whole image up, down, left, or right. Picture sliding a photograph across a table without rotating it; every part of the image moves the same distance in the same direction.
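
The four transforms listed above can each be written as a 3x3 matrix in homogeneous coordinates and then multiplied into a single matrix. The following is a minimal NumPy sketch of that idea, not the notebook's TensorFlow code; the helper name `affine_matrices` and the sample values are illustrative assumptions, and angles are assumed to be given in degrees.

```python
import math
import numpy as np

def affine_matrices(rotation_deg, shear_deg, height_zoom, width_zoom, height_shift, width_shift):
    """Return the rotation, shear, zoom and shift matrices (hypothetical helper)."""
    r = math.radians(rotation_deg)
    s = math.radians(shear_deg)
    rotation = np.array([[math.cos(r), math.sin(r), 0.0],
                         [-math.sin(r), math.cos(r), 0.0],
                         [0.0, 0.0, 1.0]])
    shear = np.array([[1.0, math.sin(s), 0.0],
                      [0.0, math.cos(s), 0.0],
                      [0.0, 0.0, 1.0]])
    zoom = np.array([[1.0 / height_zoom, 0.0, 0.0],
                     [0.0, 1.0 / width_zoom, 0.0],
                     [0.0, 0.0, 1.0]])
    shift = np.array([[1.0, 0.0, height_shift],
                      [0.0, 1.0, width_shift],
                      [0.0, 0.0, 1.0]])
    return rotation, shear, zoom, shift

# Rotate by 90 degrees after shifting by (2, -1); shear and zoom are left neutral.
rotation, shear, zoom, shift = affine_matrices(90.0, 0.0, 1.0, 1.0, 2.0, -1.0)
combined = rotation @ shear @ zoom @ shift

# A point in homogeneous coordinates: (row offset, column offset, 1).
point = np.array([1.0, 0.0, 1.0])
moved = combined @ point
print(np.round(moved, 6))
```

Because the matrices are multiplied right to left, the shift is applied to the point first and the rotation last. Changing the multiplication order changes the result.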

In [None]:
def load_and_preprocess_data():
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    num_classes = 10
    x_train = x_train.reshape(-1, 28, 28, 1).astype('float32') / 255.0
    x_test = x_test.reshape(-1, 28, 28, 1).astype('float32') / 255.0
    y_train = to_categorical(y_train, num_classes)
    y_test = to_categorical(y_test, num_classes)
    return x_train, y_train, x_test, y_test

def get_mat(rotation, shear, height_zoom, width_zoom, height_shift, width_shift):
    # Returns a 3x3 transform matrix that maps destination pixel indices to source indices

    # Convert degrees to radians
    rotation = math.pi * rotation / 180.
    shear = math.pi * shear / 180.

    one = tf.constant([1], dtype='float32')
    zero = tf.constant([0], dtype='float32')

    # Rotation Matrix
    c1 = tf.math.cos(rotation)
    s1 = tf.math.sin(rotation)
    rotation_matrix = tf.reshape(tf.concat([c1,s1,zero, -s1,c1,zero, zero,zero,one],axis=0),[3,3])

    # Shear Matrix
    c2 = tf.math.cos(shear)
    s2 = tf.math.sin(shear)
    shear_matrix = tf.reshape(tf.concat([one,s2,zero, zero,c2,zero, zero,zero,one],axis=0),[3,3])

    # Zoom Matrix
    zoom_matrix = tf.reshape(tf.concat([one/height_zoom,zero,zero, zero,one/width_zoom,zero, zero,zero,one],axis=0),[3,3])

    # Shift Matrix
    shift_matrix = tf.reshape(tf.concat([one,zero,height_shift, zero,one,width_shift, zero,zero,one],axis=0),[3,3])

    # Compose rotation, shear, zoom and shift into a single matrix
    return tf.matmul(tf.matmul(rotation_matrix, shear_matrix), tf.matmul(zoom_matrix, shift_matrix))

def transform(image, label):
    # input image - a single image of size [dim,dim,1], not a batch of [b,dim,dim,1]
    # output - image randomly rotated, sheared, zoomed and shifted
    DIM = image.shape[0]
    XDIM = DIM % 2  # offset correction for odd image sizes
    rot = 10. * tf.random.normal([1],dtype='float32')
    shr = 3. * tf.random.normal([1],dtype='float32')
    # Assumption: zoom factors are drawn around 1.0 (no zoom); a mean of .08 would
    # scale the sampling grid by roughly 12x and blank out the digit.
    h_zoom = 1.0 + tf.random.normal([1],dtype='float32')/10.
    w_zoom = 1.0 + tf.random.normal([1],dtype='float32')/10.

    # Random height and width shifts, in pixels
    h_shift = 8. * tf.random.normal([1],dtype='float32')
    w_shift = 8. * tf.random.normal([1],dtype='float32')

    # Get transformation matrix
    m = get_mat(rot, shr, h_zoom, w_zoom, h_shift, w_shift)

    # List destination pixel indices
    x = tf.repeat( tf.range(DIM//2,-DIM//2,-1), DIM )
    y = tf.tile( tf.range(-DIM//2,DIM//2),[DIM] )
    z = tf.ones([DIM*DIM],dtype='int32')
    idx = tf.stack( [x,y,z] )

    # Map destination pixels onto origin pixels
    idx2 = tf.matmul(m, tf.cast(idx, dtype='float32'))
    idx2 = tf.cast(idx2, dtype='int32')
    idx2 = tf.clip_by_value(idx2, -DIM//2+XDIM+1, DIM//2)

    # Find origin pixel values
    idx3 = tf.stack( [DIM//2-idx2[0,], DIM//2-1+idx2[1,]] )
    d = tf.gather_nd(image,tf.transpose(idx3))

    return tf.reshape(d,[DIM,DIM,1]),label
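
The `transform` function above works by inverse mapping: for every destination pixel it computes which source pixel to copy. The NumPy sketch below illustrates the same idea on a tiny image. It is a simplification rather than a port of the notebook's code: it rounds to the nearest pixel instead of casting to int, uses a plain row/column grid, and the names `warp_nearest` and `shift_down` are hypothetical.

```python
import numpy as np

def warp_nearest(image, matrix):
    """Warp a square 2-D image by inverse mapping with nearest-neighbour sampling."""
    dim = image.shape[0]
    centre = (dim - 1) / 2.0
    rows, cols = np.meshgrid(np.arange(dim), np.arange(dim), indexing="ij")
    # Destination coordinates, centred on the image, in homogeneous form: shape (3, dim*dim).
    dest = np.stack([rows.ravel() - centre, cols.ravel() - centre, np.ones(dim * dim)])
    # The matrix tells us where each destination pixel should read from.
    src = matrix @ dest
    # Move back to array indices and clamp to the image border.
    src_rows = np.clip(np.rint(src[0] + centre), 0, dim - 1).astype(int)
    src_cols = np.clip(np.rint(src[1] + centre), 0, dim - 1).astype(int)
    return image[src_rows, src_cols].reshape(dim, dim)

image = np.arange(16, dtype=np.float32).reshape(4, 4)

# The identity matrix leaves the image unchanged.
same = warp_nearest(image, np.eye(3))

# Reading from one row higher moves the content down by one row;
# the top row is repeated because out-of-range indices are clamped.
shift_down = np.array([[1.0, 0.0, -1.0],
                       [0.0, 1.0, 0.0],
                       [0.0, 0.0, 1.0]])
shifted = warp_nearest(image, shift_down)
print(shifted)
```

Clamping the indices, as the notebook also does with its clip step, smears border pixels into the area that would otherwise be empty. For MNIST this is harmless because the border is background.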

In [None]:
def prepare_datasets(x_train, y_train, x_test, y_test, batch_size, AUTO=tf.data.experimental.AUTOTUNE, augment=True):
    """
    Prepares training, validation, and test datasets. Optionally applies data augmentation to the training dataset.

    Parameters:
    - x_train, y_train: Training data and labels.
    - x_test, y_test: Test data and labels.
    - batch_size: The size of the batches to use.
    - AUTO: TensorFlow data experimental AUTOTUNE setting.
    - augment: Whether to apply augmentation to the training dataset.

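    Returns:
    - train_dataset_augmented: Training dataset, augmented when `augment` is True.
    - train_dataset: Training dataset without augmentation.
    - val_dataset: Validation dataset (built from the test data).
    - test_dataset: Test dataset.
    - batch_size: The batch size that was passed in, returned unchanged.
    """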
    # Convert the inputs to TensorFlow datasets
    train_dataset_base = tf.data.Dataset.from_tensor_slices((x_train.astype(np.float32), y_train.astype(np.float32)))
    test_dataset_base = tf.data.Dataset.from_tensor_slices((x_test.astype(np.float32), y_test.astype(np.float32)))

    # Apply augmentation if enabled
    if augment:
        train_dataset_augmented = train_dataset_base.map(transform, num_parallel_calls=AUTO)
    else:
        train_dataset_augmented = train_dataset_base

    train_dataset_augmented = train_dataset_augmented.repeat().shuffle(2048).batch(batch_size).prefetch(AUTO)
    train_dataset = train_dataset_base.repeat().shuffle(2048).batch(batch_size).prefetch(AUTO)

    val_dataset = test_dataset_base.batch(batch_size).cache().prefetch(AUTO)
    test_dataset = test_dataset_base.batch(batch_size).prefetch(AUTO)

    return train_dataset_augmented, train_dataset, val_dataset, test_dataset,batch_size

In [None]:
def build_model(strategy):
    with strategy.scope():
        inputs = Input(shape=(28, 28, 1))
        x = Conv2D(64, kernel_size=(3, 3), activation='relu', padding='valid')(inputs)
        x = BatchNormalization()(x)
        x = Dropout(0.35)(x)
        x = Conv2D(128, (3, 3), activation='relu')(x)
        x = BatchNormalization()(x)
        x = Dropout(0.35)(x)
        x = Conv2D(256, kernel_size=(3, 3), activation='relu', padding='valid')(x)
        x = BatchNormalization()(x)
        x = MaxPooling2D(pool_size=(2, 2))(x)
        x = Dropout(0.35)(x)
        x = Conv2D(512, (3, 3), activation='relu', padding='valid')(x)
        x = BatchNormalization()(x)
        x = Dropout(0.35)(x)
        x = Conv2D(1024, (3, 3), activation='relu')(x)
        x = BatchNormalization()(x)
        x = Dropout(0.35)(x)
        x = Conv2D(2000, (3, 3), activation='relu', padding='valid')(x)
        x = BatchNormalization()(x)
        x = MaxPooling2D(pool_size=(2, 2))(x)
        x = Dropout(0.35)(x)

        cnn_output = Flatten()(x)
        cnn_output = Dense(512, activation='relu')(cnn_output)
        cnn_output = BatchNormalization()(cnn_output)
        cnn_output = Dropout(0.5)(cnn_output)
        cnn_output = DropConnectDense(512, activation='relu', prob=0.5)(cnn_output)
        cnn_output = Dense(10, activation='softmax')(cnn_output)

        base_output = x
        shape = K.int_shape(base_output)
        num_feature_maps = shape[-1]  # 2000 for the architecture above
        subnet_feature_maps = num_feature_maps // 10  # 200 maps per subnetwork; assumes an even split
        subnet_outputs = []

        for i in range(10):
            # Bind the slice bounds as default arguments so each Lambda keeps its own
            # channel range instead of looking up the loop variable later
            start, end = i * subnet_feature_maps, (i + 1) * subnet_feature_maps
            subnet_input = Lambda(lambda z, start=start, end=end: z[:, :, :, start:end])(base_output)
            subnet_input = Reshape((shape[1] * shape[2] * subnet_feature_maps,))(subnet_input)
            fc = Dense(512, activation='relu')(subnet_input)
            fc = BatchNormalization()(fc)
            fc = Dropout(0.5)(fc)
            fc = DropConnectDense(512, activation='relu', prob=0.5)(fc)
            subnet_output = Dense(10, activation='softmax')(fc)
            subnet_outputs.append(subnet_output)
        subnet_outputs.append(cnn_output)
        full_model = Model(inputs=inputs, outputs=subnet_outputs)
        
    return full_model
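
To see where the subnetwork input size in `build_model` comes from, the spatial dimensions can be traced by hand. The sketch below assumes what the layer definitions above imply: every convolution uses a 3x3 kernel with 'valid' padding (the Keras default), and every pooling layer halves the size, rounding down. The helper names are illustrative only.

```python
def conv_valid(size, kernel=3):
    # A 'valid' convolution shrinks each spatial dimension by kernel - 1.
    return size - kernel + 1

def max_pool(size, pool=2):
    # 2x2 max pooling halves the size, rounding down.
    return size // pool

size = 28
for _ in range(3):
    size = conv_valid(size)      # 26, 24, 22
size = max_pool(size)            # 11
for _ in range(3):
    size = conv_valid(size)      # 9, 7, 5
size = max_pool(size)            # 2

num_feature_maps = 2000
num_subnets = 10
maps_per_subnet = num_feature_maps // num_subnets       # 200
subnet_input_size = size * size * maps_per_subnet       # 2 * 2 * 200
base_head_input_size = size * size * num_feature_maps   # 2 * 2 * 2000

# Channel range handed to each subnetwork.
channel_slices = [(i * maps_per_subnet, (i + 1) * maps_per_subnet)
                  for i in range(num_subnets)]

print(size, subnet_input_size, base_head_input_size)
print(channel_slices[0], channel_slices[-1])
```

Each subnetwork therefore sees a flattened vector of 800 values, while the base CNN head sees all 8000.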



In [None]:
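def build_cnn_model(base_output):
    # Classification head of the base CNN. `base_output` holds the feature maps
    # produced by the convolutional trunk (the tensor `x` in build_model).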
    cnn_output = Flatten()(base_output)
    cnn_output = Dense(512, activation='relu')(cnn_output)
    cnn_output = BatchNormalization()(cnn_output)
    cnn_output = Dropout(0.5)(cnn_output)
    cnn_output = DropConnectDense(512, activation='relu', prob=0.5)(cnn_output)
    cnn_output = Dense(10, activation='softmax')(cnn_output)
    return cnn_output


In [None]:
def assemble_full_model(strategy):
    # Alternative entry point: the convolutional trunk, the base CNN head and the
    # ten subnetwork heads are all defined in build_model(), so this delegates to it.
    return build_model(strategy)

In [None]:
def get_adamw_optimizer(learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-8, weight_decay=0.0):
    """
    Initialize and return the AdamW optimizer.

    Parameters:
    - learning_rate: float, learning rate for the optimizer.
    - beta_1: float, the exponential decay rate for the 1st moment estimates.
    - beta_2: float, the exponential decay rate for the 2nd moment estimates.
    - epsilon: float, a small constant for numerical stability.
    - weight_decay: float, weight decay rate to apply to weights.

    Returns:
    - adamw_optimizer: tf.keras.optimizers.Optimizer, the AdamW optimizer configured with the specified parameters.
    """
    adamw_optimizer = tfa.optimizers.AdamW(
        learning_rate=learning_rate,
        beta_1=beta_1,
        beta_2=beta_2,
        epsilon=epsilon,
        weight_decay=weight_decay
    )
    return adamw_optimizer
    
def compile_and_train_model(model, train_dataset, train_dataset_augment, val_dataset, x_train_split, batch_size, epochs):
    """
    Compile and train the given model, utilizing both augmented and non-augmented training data.

    Parameters:
    - model: tf.keras.Model, the model to compile and train.
    - train_dataset: tf.data.Dataset, the non-augmented dataset for training the model.
    - train_dataset_augment: tf.data.Dataset, the augmented dataset for training the model.
    - val_dataset: tf.data.Dataset, the dataset for validating the model during training.
    - x_train_split: numpy array, the training data, used to calculate steps per epoch.
    - batch_size: int, the batch size used for training.
    - epochs: int, the number of epochs to train the model.

    Returns:
    - history: History, the history object containing training and validation loss and accuracy metrics.
    """
    adamw_optimizer = get_adamw_optimizer()

    # Compile the model with the AdamW optimizer and accuracy metrics
    model.compile(loss='categorical_crossentropy', optimizer=adamw_optimizer, metrics=['accuracy'])

    # Combine augmented and non-augmented training datasets, sampling from each with equal probability
    train_datasets_combined = tf.data.Dataset.sample_from_datasets(
        [train_dataset_augment, train_dataset], weights=[0.5, 0.5]
    )

    # Callbacks for learning rate scheduling and model checkpointing
    def scheduler(epoch, lr):
        if epoch < 15:
            return lr
        elif 15 <= epoch < 30:
            return lr * math.exp(-0.1)
        else:
            return lr * math.exp(-0.2)

    lr_scheduler = LearningRateScheduler(scheduler)
    # The model has 11 outputs, so Keras logs one accuracy per output and no single
    # 'val_accuracy'; the total validation loss is monitored instead.
    checkpoint = ModelCheckpoint('ensnet_best_model.h5', monitor='val_loss', save_best_only=True, verbose=1)

    # Train the model
    history = model.fit(
        train_datasets_combined,
        steps_per_epoch=len(x_train_split) // batch_size,
        epochs=epochs,
        validation_data=val_dataset,
        callbacks=[lr_scheduler, checkpoint],
    )
    return history


def evaluate_model(model, val_dataset):
    # Evaluate on the validation data, then save the trained model for later inference
    score = model.evaluate(val_dataset)
    model.save("ensnet.h5")
    print("Model saved.")
    return score
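
The `scheduler` callback in `compile_and_train_model` receives the current learning rate at the start of each epoch and returns the new one, so its decay compounds from epoch to epoch. The plain-Python sketch below replays that behaviour without Keras, assuming the optimizer's default starting rate of 0.001 and an illustrative run of 40 epochs.

```python
import math

def scheduler(epoch, lr):
    # Same rule as the training callback: hold, then decay, then decay faster.
    if epoch < 15:
        return lr
    elif epoch < 30:
        return lr * math.exp(-0.1)
    return lr * math.exp(-0.2)

lr = 0.001
lr_by_epoch = []
for epoch in range(40):
    # Each epoch starts from the rate left by the previous one.
    lr = scheduler(epoch, lr)
    lr_by_epoch.append(lr)

for epoch in (0, 14, 15, 29, 30, 39):
    print(epoch, f"{lr_by_epoch[epoch]:.6f}")
```

With the `epochs=10` used later in this notebook, training ends before epoch 15, so the learning rate stays at its initial value for the whole run.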


In [None]:
def load_custom_model(model_path, custom_objects=None):
    """
    Load a Keras model with custom objects.

    Parameters:
    - model_path: str, path to the saved model.
    - custom_objects: dict, mapping names (str) to custom classes or functions to be considered during load.

    Returns:
    - Loaded Keras model.
    """
    return load_model(model_path, custom_objects=custom_objects)

def predict_with_model(model, x_test):
    """
    Make predictions with a given model.

    Parameters:
    - model: Loaded Keras model.
    - x_test: np.array, test dataset.

    Returns:
    - predictions: np.array, model predictions.
    """
    predictions = model.predict(x_test)
    if isinstance(predictions, list):
        predictions = np.stack(predictions, axis=0)
    return predictions

def majority_vote(predictions):
    """
    Apply a manual majority vote on model predictions.

    Parameters:
    - predictions: np.array, stacked model outputs with shape (num_outputs, num_samples, num_classes).

    Returns:
    - final_predictions: np.array, final class predictions after majority voting.
    """
    # Reorder to (num_samples, num_outputs, num_classes) so each row holds one sample's votes
    predictions = np.transpose(predictions, (1, 0, 2))
    votes = np.argmax(predictions, axis=-1)
    final_predictions = np.array([np.bincount(votes[i]).argmax() for i in range(votes.shape[0])])
    return final_predictions

def calculate_accuracy(y_true, y_pred):
    """
    Calculate the accuracy of predictions.

    Parameters:
    - y_true: np.array, true labels.
    - y_pred: np.array, predicted labels.

    Returns:
    - accuracy: float, accuracy of the predictions.
    """
    return np.mean(y_true == y_pred)
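
To make the voting step in `majority_vote` concrete, here is a small worked example with made-up softmax outputs from three heads, two samples, and four classes. The numbers are hypothetical and chosen only so the votes are easy to follow; the array layout matches what `predict_with_model` returns, with the heads on the first axis.

```python
import numpy as np

# Hypothetical outputs with shape (num_heads, num_samples, num_classes).
predictions = np.array([
    [[0.1, 0.7, 0.1, 0.1], [0.6, 0.2, 0.1, 0.1]],   # head 0 votes for classes 1 and 0
    [[0.2, 0.5, 0.2, 0.1], [0.1, 0.1, 0.7, 0.1]],   # head 1 votes for classes 1 and 2
    [[0.1, 0.1, 0.1, 0.7], [0.1, 0.2, 0.6, 0.1]],   # head 2 votes for classes 3 and 2
])

# Each head votes for its most probable class; transpose so rows are samples.
votes = np.argmax(predictions, axis=-1).T            # shape (num_samples, num_heads)

# Count the votes per class for every sample and keep the most frequent class.
num_classes = predictions.shape[-1]
final = np.array([np.bincount(v, minlength=num_classes).argmax() for v in votes])
print(votes.tolist(), final.tolist())
```

When two classes receive the same number of votes, `argmax` returns the lower class index, so ties are broken deterministically but arbitrarily. With 11 voters (ten subnetworks plus the base CNN), ties between two classes are less likely than with an even number of voters, though they can still occur when votes are spread over three or more classes.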

In [None]:
strategy, AUTO = setup_distribution_strategy()

In [None]:
x_train, y_train, x_test, y_test = load_and_preprocess_data()
batch_size = 100 * strategy.num_replicas_in_sync  # Adjust based on your hardware capabilities
train_dataset_augmented, train_dataset, val_dataset, test_dataset, batch_size = prepare_datasets(
    x_train, y_train, x_test, y_test, 
    batch_size=batch_size, 
    augment=True  # Change to False if you don't want augmentation
)
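
The batch size above scales with the number of replicas, which in turn determines how many steps make up one epoch. The short calculation below illustrates this, assuming an 8-core TPU (the replica count is an assumption; on a CPU or a single GPU `strategy.num_replicas_in_sync` is 1).

```python
num_train_samples = 60_000   # size of the MNIST training split
per_replica_batch = 100      # per-replica value used in the cell above
num_replicas = 8             # assumption: an 8-core TPU

# Each training step consumes one global batch, split evenly across replicas.
global_batch = per_replica_batch * num_replicas
steps_per_epoch = num_train_samples // global_batch

print(global_batch, steps_per_epoch)
```

Because the training datasets repeat indefinitely, `steps_per_epoch` is what tells Keras where one epoch ends.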

In [None]:
with strategy.scope():
    full_model = build_model(strategy)
    history = compile_and_train_model(
        full_model,
        train_dataset,            # Non-augmented training dataset
        train_dataset_augmented,  # Augmented training dataset; sampled 50/50 with the one above
        val_dataset,
        x_train,  # Only its length is used, to compute steps_per_epoch
        batch_size,
        epochs=10  # Adjust epochs as needed
    )

In [None]:
evaluate_model(full_model,val_dataset)    
model_for_inference = load_model("ensnet.h5", custom_objects={"DropConnectDense": DropConnectDense})
predictions = predict_with_model(model_for_inference, x_test)
final_predictions = majority_vote(predictions)

In [None]:
# Convert one-hot encoded y_test to class indices for comparison
y_test_indices = np.argmax(y_test, axis=1)

# Calculate and print accuracy
accuracy = calculate_accuracy(y_test_indices, final_predictions)
print(f"Test Accuracy after Majority Voting: {accuracy * 100:.2f}%")