# Tutorial: Using SLDA to simulate Continual Learning Scenarios
## With a frozen pretrained feature extractor
Streaming Linear Discriminant Analysis (SLDA), is a type of generative model that learns a linear classifier over precomputed features from a frozen feature extractor.

SLDA learns a per-class Gaussian distribution with covariance matrix that is shared across all classes. 

### Imports

In [1]:
import sys
sys.path.append('/home/sunilach/openfl/fl_cl_ebm')

In [2]:
import tensorflow as tf
import tensorflow_datasets as tfds

# Config/Options
from config import Decoders
from config import IMG_AUGMENT_LAYERS

# Model/Loss definitions
from models.slda import SLDA
from models import losses
from models.utils import extract_features

# Dataset handling (synthesize/build/query)
from lib.dataset.repository import DatasetRepository
from lib.dataset.utils import as_tuple, decode_example, get_label_distribution
from lib.dataset.synthesizer import synthesize_by_sharding_over_labels

  from .autonotebook import tqdm as notebook_tqdm
2022-11-28 16:35:49.741741: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-11-28 16:35:50.300638: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 17830 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:18:00.0, compute capability: 8.6


### Experiment Options

In [3]:
DATASET = 'cifar10' #' caltech_birds2011'   # If loading a public TensorFlow dataset
# DATASET = '/tmp/repository/vege'  # If loading a local TFRecord dataset

IMG_SIZE = (32, 32)
BATCH_SIZE = 32
SHUFFLE_BUFFER = 16384


### Load the *entire* Dataset
We deal with `tf.data.Dataset` APIs for all our simulations.

The additional argument to note here, is the `decoders`. We supply our custom `Decoders.SIMPLE_DECODER` that partially decodes the data for two main reasons:
1. It only parses `image` and `label` keys from the dataset (we're only dealing with classification problems here).
2. It 'skips' decoding the images to tensors (hence you see it as `tf.string` type). This is for performance reasons. As you'll see, we decode it when we build our data pipeline for training/testing on-the-fly.

In [4]:
"""Load the dataset: Public or Local"""
if tf.io.gfile.isdir(DATASET):
    repo = DatasetRepository(data_dir=DATASET)
    builder = repo.get_builder()  # Builds all versions by default
    ds_info = builder.info
    (raw_train_ds, raw_test_ds) = builder.as_dataset(split=['train', 'test'],
                                                     decoders=Decoders.SIMPLE_DECODER)
else:
    # Load TFDS dataset by name (publicly-hosted on TF)
    (raw_train_ds, raw_test_ds), ds_info = tfds.load(DATASET,
                                                     split=['train', 'test'],
                                                     with_info=True,
                                                     decoders=Decoders.SIMPLE_DECODER)
print('About: ', ds_info)
print('Element Spec: ', raw_train_ds.element_spec)
print('Training samples: ', len(raw_train_ds))
print('Testing samples: ', len(raw_test_ds))

About:  tfds.core.DatasetInfo(
    name='cifar10',
    full_name='cifar10/3.0.2',
    description="""
    The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.
    """,
    homepage='https://www.cs.toronto.edu/~kriz/cifar.html',
    data_path='/home/sunilach/tensorflow_datasets/cifar10/3.0.2',
    download_size=162.17 MiB,
    dataset_size=132.40 MiB,
    features=FeaturesDict({
        'id': Text(shape=(), dtype=tf.string),
        'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
        'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
    }),
    supervised_keys=('image', 'label'),
    disable_shuffling=False,
    splits={
        'test': <SplitInfo num_examples=10000, num_shards=1>,
        'train': <SplitInfo num_examples=50000, num_shards=1>,
    },
    citation="""@TECHREPORT{Krizhevsky09learningmultiple,
        author = {Alex Krizhevsky},
        title = {Learn

### Feature Extraction
Let's choose a pretrained backbone to extract features. Since in this experiment we keep the backbone frozen and finetune only a few additional layers, it is much faster to iterate if we compute all features of all images at once.

In [5]:
"""Choose Model backbone to extract features"""
backbone = tf.keras.applications.EfficientNetV2B0(
    include_top=False,
    weights='imagenet',
    input_shape=(*IMG_SIZE, 3),
    pooling='avg'
)
backbone.trainable = False

"""Add augmentation/input layers"""
feature_extractor = tf.keras.Sequential([
    tf.keras.layers.InputLayer(backbone.input_shape[1:]),
    IMG_AUGMENT_LAYERS,
    backbone,
], name='feature_extractor')

feature_extractor.summary()

Model: "feature_extractor"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 augment_layers (Sequential)  (None, 32, 32, 3)        0         
                                                                 
 efficientnetv2-b0 (Function  (None, 1280)             5919312   
 al)                                                             
                                                                 
Total params: 5,919,312
Trainable params: 0
Non-trainable params: 5,919,312
_________________________________________________________________


In [6]:
"""Extract train/test feature embeddings"""
print(f'Extracting train set features')
train_features = extract_features(dataset=(raw_train_ds
                                        .map(decode_example(IMG_SIZE))
                                        .map(as_tuple(x='image', y='label'))
                                        .batch(BATCH_SIZE)
                                        .prefetch(tf.data.AUTOTUNE)), model=feature_extractor)
print(f'Extracting test set features')
test_features = extract_features(dataset=(raw_test_ds
                                        .map(decode_example(IMG_SIZE))
                                        .map(as_tuple(x='image', y='label'))
                                        .batch(BATCH_SIZE)
                                        .prefetch(tf.data.AUTOTUNE)), model=feature_extractor)
print('Features Dataset spec: ', train_features.element_spec)

Extracting train set features


2022-11-28 16:35:54.612019: I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8100


Extracting test set features
Features Dataset spec:  {'image': TensorSpec(shape=(1280,), dtype=tf.float32, name=None), 'label': TensorSpec(shape=(), dtype=tf.int64, name=None)}


### Creating a Continual Learning Dataset
Now that we have the extracted features, we would like to partition this entire training set into `n` parts, to train our model sequentially, without access to older data.

Each partition holds data from only a selected few classes. In literature, this is known as the 'Class Incremental Learning' setting.

In [7]:
N_PARTITIONS = 5

# This returns a dictionary of partitioned datasets, keyed by partition_id, an integer
partitioned_dataset = synthesize_by_sharding_over_labels(train_features, 
                                                         num_partitions=N_PARTITIONS, 
                                                         shuffle_labels=True)
# Check the label counts of each partition
print('Partitions:', len(partitioned_dataset))
for partition_id in partitioned_dataset:
    dist = get_label_distribution(partitioned_dataset[partition_id])
    print(f'Partition {partition_id}: {dist}')

Partitions: 5
Partition 0: {4: 5000, 6: 5000}
Partition 1: {2: 5000, 7: 5000}
Partition 2: {3: 5000, 5: 5000}
Partition 3: {0: 5000, 9: 5000}
Partition 4: {1: 5000, 8: 5000}


### Define an SLDA Model

In [15]:
# SLDA takes a feature vector, linearly maps it to the output class
model = SLDA(n_components=feature_extractor.output_shape[-1],
             num_classes=ds_info.features['label'].num_classes)

# Compile. No loss/optimizer since it is a gradient-free algorithm
model.compile(metrics=['accuracy'])

model.get_weights()

[array([[0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]], dtype=float32),
 array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32),
 array([[0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]], dtype=float32),
 array([[10000.,     0.,     0., ...,     0.,     0.,     0.],
        [    0., 10000.,     0., ...,     0.,     0.,     0.],
        [    0.,     0., 10000., ...,     0.,     0.,     0.],
        ...,
        [    0.,     0.,     0., ..., 10000.,     0.,     0.],
        [    0.,     0.,     0., ...,     0., 10000.,     0.],
        [    0.,     0.,     0., ...,     0.,     0., 10000.]],
       dtype=fl

In [9]:
import cloudpickle
import dill

In [10]:
import copy

In [11]:
# model = tf.keras.Sequential([
#     tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
#     tf.keras.layers.MaxPooling2D((2, 2)),
#     tf.keras.layers.BatchNormalization(),
#     tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
#     tf.keras.layers.MaxPooling2D((2, 2)),
#     tf.keras.layers.BatchNormalization(),
#     tf.keras.layers.Flatten(),
#     tf.keras.layers.Dense(10, activation=None),
# ], name='simplecnn')
# model.summary()

In [12]:
# model1 = tf.keras.Sequential()
# model1.add(tf.keras.layers.Dense(8, input_shape=(8,)))
# model1.add(tf.keras.layers.Dense(1))
# # model1.compile(optimizer='sgd', loss='mse')
# # model1 = copy.deepcopy(model1)
# # model1.summary()
# model1.compile()

In [13]:
# filename = 'cp_keras_model1'
# with open(filename, 'wb') as f:
#     cloudpickle.dump(model1, f)

### Train SLDA Model sequentially over each Task

In [16]:
# Build test dataset pipeline
test_ds = (test_features
            .cache()
            .map(as_tuple(x='image', y='label'))
            .batch(BATCH_SIZE)
            .prefetch(tf.data.AUTOTUNE))

# Incrementally train on each partition
# for partition_id in partitioned_dataset:

#     print(f'Training [{partition_id+1}/{len(partitioned_dataset)}]')


    # Build Train Dataset pipeline
train_ds = (partitioned_dataset[partition_id]
            .cache()
            .shuffle(SHUFFLE_BUFFER)
            .map(as_tuple(x='image', y='label'))
            .batch(1)  # SLDA learns 1-sample at a time. Inference can be done on batch.
            .prefetch(tf.data.AUTOTUNE))

# SLDA performs well even on a single pass over the dataset
model.fit(train_ds, epochs=1)
print("Train Acc: ", model.evaluate(train_ds))

Train Acc:  0.9305999875068665


In [None]:
val = model.evaluate(test_ds)

In [None]:
val

In [None]:
print(hist)

In [None]:
model.evaluate(test_ds)

### Summary

Try testing various partition sizes for SLDA. You'll observe the drop in accuracy isn't significant despite multiple tasks.
This is due to the generative nature of LDA.

By learning per-class Gaussians, class-incremental learning problem becomes task-incremental, making it agnostic of the order of classes during training.