# Online augmentation

Online augmentation is a novel approach that aims to improve the model's ability to generalize. Instead of adding new samples to the training set, online augmentation augments each sample from the test set during inference. The N augmented samples and the original one are then classified by the model, and majority voting determines the final label of the sample. This approach allows the model to 'see' more variations of a sample, which increases the probability of assigning it the correct label.

A more detailed description of online augmentation can be found in our [paper](https://arxiv.org/abs/1903.05580).
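The voting step described above can be illustrated with a minimal NumPy sketch. It is not the `OnlineAugmenter` implementation: the names `predict_with_voting`, `predict_fn` and `augment_fns` are hypothetical, and `predict_fn` is assumed to map an array of shape `(n, bands)` to class scores of shape `(n, classes)`.

```python
import numpy as np


def predict_with_voting(predict_fn, sample, augment_fns):
    """Classify a sample together with its augmented copies by majority vote.

    Hypothetical helper for illustration only, not part of python_research.
    """
    # The original sample always takes part in the vote.
    candidates = [sample] + [augment(sample) for augment in augment_fns]
    # One row of class scores per candidate.
    scores = predict_fn(np.stack(candidates))
    votes = np.argmax(scores, axis=1)
    # Count the votes per class; ties go to the lowest class index.
    return int(np.bincount(votes).argmax())
```

With N augmentation functions the model casts N + 1 votes, so an odd total (an even N) reduces the chance of ties in a two-class setting.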

In [None]:
import os
from keras.models import load_model
from keras.callbacks import EarlyStopping, ModelCheckpoint

from python_research.keras_models import build_1d_model
from python_research.dataset_structures import HyperspectralDataset, BalancedSubset
from python_research.augmentation.transformations import PCATransformation, \
    StdDevNoiseTransformation
from python_research.augmentation.online_augmenter import OnlineAugmenter

DATA_DIR = os.path.join('..', '..', 'hypernet-data')
RESULTS_DIR = os.path.join('..', '..', 'hypernet-data', 'results', 'online_augmentation')
DATASET_PATH = os.path.join(DATA_DIR, '')
GT_PATH = os.path.join(DATA_DIR, '')

os.makedirs(RESULTS_DIR, exist_ok=True)


# Prepare the data

Extract the training, validation and test sets. The training set will be balanced (each class will have an equal number of samples).
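To make the idea of a balanced set concrete, here is a small sketch of drawing the same number of samples from every class. It only illustrates the concept and is not how `BalancedSubset` is implemented; `balanced_indices` and its `seed` parameter are hypothetical.

```python
import numpy as np


def balanced_indices(labels, samples_per_class, seed=0):
    """Return indices that pick `samples_per_class` samples from each class.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    chosen = []
    for cls in np.unique(labels):
        cls_idx = np.flatnonzero(labels == cls)
        # Sampling without replacement raises ValueError when a class
        # holds fewer samples than requested.
        chosen.append(rng.choice(cls_idx, size=samples_per_class, replace=False))
    return np.concatenate(chosen)
```

The returned indices can be used to select the training samples, while the remaining indices stay available for testing.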

In [None]:
# Number of samples to be extracted from each class as training samples
SAMPLES_PER_CLASS = 300 
# Percentage of the training set to be extracted as validation set 
VAL_PART = 0.1

# Load dataset
test_data = HyperspectralDataset(DATASET_PATH, GT_PATH)

test_data.normalize_labels()
test_data.expand_dims(axis=-1)

# Extract training and validation sets
train_data = BalancedSubset(test_data, SAMPLES_PER_CLASS)
val_data = BalancedSubset(train_data, VAL_PART)


# Data normalization

Data is normalized using min-max feature scaling. The min and max values are extracted from the training and validation sets.
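As a reminder of what min-max scaling computes, the following sketch applies the formula `(x - min) / (max - min)` to toy arrays. The function `min_max_scale` and the arrays are made up for illustration; the notebook itself uses the `normalize_min_max` method of the dataset objects.

```python
import numpy as np


def min_max_scale(data, min_, max_):
    """Map values from the range [min_, max_] to [0, 1]."""
    return (data - min_) / (max_ - min_)


# Toy data standing in for the training and validation sets.
train = np.array([[2.0, 4.0], [6.0, 10.0]])
val = np.array([[0.0, 8.0]])

# A single min and max shared by both sets.
min_ = min(train.min(), val.min())
max_ = max(train.max(), val.max())

train_scaled = min_max_scale(train, min_, max_)
val_scaled = min_max_scale(val, min_, max_)
```

Because the test set is scaled with the same min and max, test values outside that range will fall outside [0, 1].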

In [None]:
# Normalize data
max_ = max(train_data.max, val_data.max)
min_ = min(train_data.min, val_data.min)
train_data.normalize_min_max(min_=min_, max_=max_)
val_data.normalize_min_max(min_=min_, max_=max_)
test_data.normalize_min_max(min_=min_, max_=max_)


# Build the model

In [None]:
# Number of kernels in the first convolutional layer
KERNELS = 200 
# Size of the kernel in the first convolutional layer
KERNEL_SIZE = 5 
# Number of classes in the dataset
CLASSES_COUNT = 16 
# Number of epochs without improvement on validation set after which the 
# training will be terminated 
PATIENCE = 15 

# Build 1d model
model = build_1d_model((test_data.shape[1:]), KERNELS,
                       KERNEL_SIZE, CLASSES_COUNT)

# Keras Callbacks
early = EarlyStopping(patience=PATIENCE)
checkpoint = ModelCheckpoint(
    os.path.join(RESULTS_DIR, "online_augmentation") + "_model",
    save_best_only=True)

# summary() prints the architecture itself and returns None
model.summary()


# Train the model

In [None]:
# Number of training epochs
EPOCHS = 200 

# Train model
history = model.fit(x=train_data.get_data(),
                    y=train_data.get_one_hot_labels(CLASSES_COUNT),
                    batch_size=64,
                    epochs=EPOCHS,
                    verbose=2,
                    callbacks=[early, checkpoint],
                    validation_data=(val_data.get_data(),
                                     val_data.get_one_hot_labels(CLASSES_COUNT)))
# Load best model
model = load_model(os.path.join(RESULTS_DIR, "online_augmentation") + "_model")

# Calculate test set score without augmentation
test_score = model.evaluate(x=test_data.get_data(),
                            y=test_data.get_one_hot_labels(CLASSES_COUNT))
print("Test set score without online augmentation: {}".format(test_score[1]))


# Test score evaluation

There are three different types of online augmentation implemented:
* __Noise injection__ - For each band of a given pixel, a random value is drawn from a normal distribution with mean = 0 and standard deviation equal to the standard deviation of that band within the pixel's class. The value is multiplied by a scaling factor (a = 0.25) and added to the original value.
* __PCA-based augmentation__ - In the first step, principal components are calculated on the training set. Then the pixel under consideration is transformed using these principal components, the first value of the resulting vector is multiplied by a random value from a given range (0.9 - 1.1 by default), and the inverse transformation is applied to that vector, resulting in an augmented sample.
* __Highlighting/dimming__ - For each band of a given sample, a percentage (10% by default) of the average value of that band (across all samples in the training set) is added (highlighting) or subtracted (dimming).
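The transformations listed above can be sketched in plain NumPy as follows. This is an illustration under stated assumptions, not the code of the `Transformation` classes: all function names are hypothetical, the PCA is computed via SVD on mean-centered data, and the training set is assumed to contain at least as many samples as bands so that the inverse transformation loses no information.

```python
import numpy as np

rng = np.random.default_rng(0)


def inject_noise(pixel, class_band_std, alpha=0.25):
    """Add zero-mean Gaussian noise, scaled by alpha, to every band.

    `class_band_std` holds one standard deviation per band for the pixel's class.
    """
    return pixel + alpha * rng.normal(0.0, class_band_std)


def fit_pca(train):
    """Return the band means and the principal axes (as rows) of `train`."""
    mean = train.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt


def pca_augment(pixel, mean, components, low=0.9, high=1.1):
    """Scale the first principal component by a random factor and invert."""
    projected = components @ (pixel - mean)
    projected[0] *= rng.uniform(low, high)
    # The axes are orthonormal, so the transpose undoes the projection.
    return components.T @ projected + mean


def shift_brightness(pixel, band_means, fraction=0.1, highlight=True):
    """Add (highlight) or subtract (dim) a fraction of each band's mean."""
    sign = 1.0 if highlight else -1.0
    return pixel + sign * fraction * band_means
```

In this sketch, `fit_pca` plays the role that fitting a transformation on the training set plays in the notebook: it collects the statistics that the augmentation needs before any test sample is processed.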

The augmentation is performed using the **`OnlineAugmenter`** class, which accepts an object of type **`Transformation`** that encapsulates the augmentation logic. The `fit` method of a **`Transformation`** object must be called before the object is used for augmentation, so that it can collect all necessary information about the set. The `evaluate` method performs the test set score evaluation, returning the overall accuracy and a list with the accuracy for each class separately.
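For reference, overall and per-class accuracy can be computed from true and predicted labels as in the sketch below. The helper `accuracy_report` is hypothetical and only shows what the two returned values mean; how `evaluate` computes them internally is not shown in this notebook.

```python
import numpy as np


def accuracy_report(true_labels, predicted_labels, classes_count):
    """Return the overall accuracy and a list of per-class accuracies."""
    true_labels = np.asarray(true_labels)
    predicted_labels = np.asarray(predicted_labels)
    correct = true_labels == predicted_labels
    overall = float(correct.mean())
    per_class = []
    for cls in range(classes_count):
        mask = true_labels == cls
        # Classes absent from the test set get NaN instead of a misleading 0.
        per_class.append(float(correct[mask].mean()) if mask.any() else float("nan"))
    return overall, per_class
```

Per-class accuracy is worth inspecting alongside the overall score, because the test set is not balanced and frequent classes dominate the overall value.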


In [None]:

# Example of PCA online augmentation

# Remove the last (singleton) dimension so that each sample is a flat vector of bands
train_data.data = train_data.get_data()[:, :, 0]
test_data.data = test_data.get_data()[:, :, 0]
# Initialize a transformation and fit it to the training data.
# In the case of the PCA transformation, it is important to set
# `n_components` equal to the number of bands in the dataset,
# so that the inverse PCA operation does not lose information.
pca_transformation = PCATransformation(n_components=train_data.shape[-1])
pca_transformation.fit(train_data.get_data())
augmenter = OnlineAugmenter()
test_score, class_accuracy = augmenter.evaluate(model, test_data,
                                                pca_transformation)
print("Test set score with PCA online augmentation: {}".format(test_score))


In [None]:

# Example of noise augmentation

# Each value in `alphas` is a scaling factor for the injected noise;
# the number of values determines how many times each sample is augmented.
noise_transformation = StdDevNoiseTransformation(alphas=[0.1, 0.2])
noise_transformation.fit(train_data.get_data())
test_score, class_accuracy = augmenter.evaluate(model, test_data,
                                                noise_transformation)
print("Test set score with noise injection online augmentation: {}".format(test_score))
