Copyright 2022 The TensorFlow Similarity Authors.

In [None]:
# @title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# TensorFlow Similarity ArcFace Loss Example

### Notebook goal

This notebook demonstrates how to use the TensorFlow Similarity `ArcFaceLoss()` implementation, both standalone and to train a `SimilarityModel()` on a fraction of the MNIST classes.

You are going to learn about the main features offered by `ArcFaceLoss()`, covering:

 1. Standalone usage of ArcFaceLoss

 2. Usage with `model.compile()`

 3. 3D visualization of ArcFaceLoss

### Things to try 

Along the way you can try the following things to improve the model performance:
- Add more "seen" classes at training time.
- Use a larger embedding by increasing the size of the output.
- Add data augmentation pre-processing layers to the model.
- Include more examples in the index to give the model more points to choose from.
- Try a more challenging dataset, such as Fashion MNIST.

In [None]:
import gc
import os

import numpy as np
from matplotlib import pyplot as plt
from tabulate import tabulate

# INFO messages are not printed.
# This must be run before loading other modules.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "1"

In [None]:
import tensorflow as tf

In [None]:
import tensorflow_similarity as tfsim  # main package


In [None]:
tfsim.utils.tf_cap_memory()

In [None]:
# Clear out any old model state.
gc.collect()
tf.keras.backend.clear_session()

In [None]:
print("TensorFlow:", tf.__version__)
print("TensorFlow Similarity", tfsim.__version__)

# Standalone Usage of ArcFaceLoss

`ArcFaceLoss()` can be used on its own, outside of a model, when you want to compute the additive angular margin loss for an existing set of labels and embeddings.

### Initialize Loss function as ArcFaceLoss

In [None]:
loss_fn = tfsim.losses.ArcFaceLoss(num_classes=8, embedding_size=10)

### Create a simple random dataset

In [None]:
labels = tf.Variable([0, 1, 2, 3, 4, 5, 6, 7])
embeddings = tf.Variable(tf.random.uniform(shape=[8, 10]))

In [None]:
print(embeddings)

### Calculate loss

In [None]:
loss = loss_fn(labels, embeddings)
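
To make the term "additive angular margin" concrete, here is a minimal NumPy sketch of the computation as described in the ArcFace paper. It is not the TensorFlow Similarity implementation: the function name `arcface_loss_sketch`, the random `class_weights` matrix, and the `margin` and `scale` values are assumptions chosen for illustration.

```python
import numpy as np


def arcface_loss_sketch(labels, embeddings, class_weights, margin=0.5, scale=64.0):
    """Mean additive angular margin loss for a batch (illustrative only)."""
    # L2-normalize the embeddings (rows) and the class weight vectors (columns).
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = class_weights / np.linalg.norm(class_weights, axis=0, keepdims=True)

    # Cosine of the angle between each embedding and each class center.
    cos_theta = np.clip(emb @ w, -1.0, 1.0)
    theta = np.arccos(cos_theta)

    # Add the margin to the angle of the true class only, then rescale.
    one_hot = np.eye(w.shape[1])[labels]
    logits = scale * np.cos(theta + margin * one_hot)

    # Numerically stable softmax cross-entropy over the scaled logits.
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-(log_probs * one_hot).sum(axis=1).mean())


rng = np.random.default_rng(0)
labels = np.arange(8)
embeddings = rng.uniform(size=(8, 10))
class_weights = rng.normal(size=(10, 8))  # hypothetical class centers

with_margin = arcface_loss_sketch(labels, embeddings, class_weights)
without_margin = arcface_loss_sketch(labels, embeddings, class_weights, margin=0.0)
print(with_margin, without_margin)
```

Setting `margin=0.0` reduces the sketch to a scaled, normalized softmax loss. The margin penalizes the true-class angle, which is what pushes embeddings of the same class closer together on the unit hypersphere.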

## Data preparation

We are going to load the MNIST dataset to showcase how the model is able to find similar examples from classes unseen during training. The model's ability to generalize the matching to unseen classes, without retraining, is one of the main reasons you would want to use metric learning.


**WARNING**: TensorFlow Similarity expects `y_train` to be an IntTensor containing the class id of each example, instead of the standard categorical (one-hot) encoding traditionally used for multi-class classification.
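
The MNIST labels returned by `load_data()` are already integer class ids, so no conversion is needed in this notebook. If your own dataset stores labels in categorical form, a conversion along the following lines should work. The `y_one_hot` array is a made-up example, not data from this notebook.

```python
import numpy as np

# Hypothetical one-hot (categorical) labels: four examples, three classes.
y_one_hot = np.array([[1, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 1]])

# Recover the integer class id of each example.
y_ids = np.argmax(y_one_hot, axis=1).astype("int32")
print(y_ids)  # [0 2 1 2]
```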

In [None]:
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

## Model setup

### Model definition

`SimilarityModel()` models extend `tensorflow.keras.models.Model` with additional features and functionality that allow you to index and search for similar-looking examples.

As visible in the model definition below, similarity models output a 64-dimensional float embedding using the `MetricEmbedding()` layer. This layer is a Dense layer with L2 normalization. Thanks to the loss, the model learns to minimize the distance between similar examples and maximize the distance between dissimilar examples. As a result, the distance between examples in the embedding space is meaningful: the smaller the distance, the more similar the examples are.
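
As a rough illustration of what L2 normalization buys you, the NumPy sketch below normalizes a few random vectors and computes their pairwise cosine distances. The `raw` array is a stand-in for un-normalized Dense outputs and is an assumption made for this example. It does not call `MetricEmbedding()` itself.

```python
import numpy as np

rng = np.random.default_rng(0)
raw = rng.normal(size=(3, 64))  # stand-in for un-normalized Dense outputs

# L2-normalize each row so that every embedding has unit length.
emb = raw / np.linalg.norm(raw, axis=1, keepdims=True)

# For unit vectors, cosine similarity is a plain dot product,
# so cosine distance is simply 1 minus that dot product.
cosine_distance = 1.0 - emb @ emb.T
print(np.round(cosine_distance, 3))
```

Each embedding is at distance 0 from itself, and all distances fall in the range 0 to 2.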

Being able to use distance as a meaningful proxy for how similar two examples are is what enables fast ANN (approximate nearest neighbor) search. Using a sub-linear ANN search instead of an exhaustive NN search (linear per query, quadratic when every item is matched against every other) is what allows deep similarity search to scale to millions of items. The built-in memory index used in this notebook scales to a million indexed examples very easily... if you have enough RAM :)
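
For contrast, here is a minimal sketch of the exhaustive search that ANN methods avoid. Every query is compared against every indexed vector, so the cost per query grows linearly with the size of the index. The helper `brute_force_knn` is hypothetical and is not how the TensorFlow Similarity indexer works.

```python
import numpy as np


def brute_force_knn(queries, index, k=3):
    """Return the indices of the k nearest indexed vectors per query (cosine distance)."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    x = index / np.linalg.norm(index, axis=1, keepdims=True)
    distances = 1.0 - q @ x.T  # shape: (num_queries, num_indexed)
    return np.argsort(distances, axis=1)[:, :k]


rng = np.random.default_rng(0)
index = rng.normal(size=(1000, 64))

# Query with the first five indexed vectors: each one's nearest neighbor is itself.
neighbors = brute_force_knn(index[:5], index, k=3)
print(neighbors[:, 0])  # [0 1 2 3 4]
```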

In [None]:
def get_model():
    inputs = tf.keras.layers.Input(shape=(28, 28, 1))
    x = tf.keras.layers.Rescaling(1 / 255)(inputs)
    x = tf.keras.layers.Conv2D(32, 3, activation="relu")(x)
    x = tf.keras.layers.Conv2D(32, 3, activation="relu")(x)
    x = tf.keras.layers.MaxPool2D()(x)
    x = tf.keras.layers.Conv2D(64, 3, activation="relu")(x)
    x = tf.keras.layers.Conv2D(64, 3, activation="relu")(x)
    x = tf.keras.layers.Flatten()(x)
    # smaller embeddings will have faster lookup times while a larger embedding will improve the accuracy up to a point.
    outputs = tfsim.layers.MetricEmbedding(64)(x)
    return tfsim.models.SimilarityModel(inputs, outputs)

In [None]:
model = get_model()
model.summary()

### ArcFace Loss definition

Overall, what makes metric losses different from traditional losses is that:
- **They expect different inputs.** Instead of having the prediction equal the true values, they expect embeddings as `y_pred` and the id (as an int32) of the class as `y_true`.
- **They require a distance.** You need to specify which `distance` function to use to compute the distance between embeddings. `cosine` is usually a great starting point and the default.

`ArcFaceLoss()` takes two arguments: `num_classes`, the number of distinct classes in the labels, and `embedding_size`, which must match the output size of the model's `MetricEmbedding()` layer.

In [None]:
num_classes = np.unique(y_train).size
embedding_size = model.get_layer('metric_embedding').output.shape[1]

In [None]:
loss = tfsim.losses.ArcFaceLoss(num_classes=num_classes, embedding_size=embedding_size, name="ArcFaceLoss")

### Compilation

TensorFlow Similarity uses an extended `compile()` method that allows you to optionally specify `distance_metrics` (metrics that are computed over the distance between the embeddings), and the distance to use for the indexer.

By default the `compile()` method tries to infer what type of distance you are using by looking at the first loss specified. If you use multiple losses, and the distance loss is not the first one, then you need to specify the distance function used as the `distance=` parameter in the compile function.

In [None]:
model.compile(optimizer="sgd", loss=loss, distance="cosine")

## Training

Similarity models are trained like normal models. 

In [None]:
EPOCHS = 10  # @param {type:"integer"}
history = model.fit(x_train, y_train, epochs=EPOCHS, validation_data=(x_test, y_test))

In [None]:
plt.plot(history.history["loss"])
plt.plot(history.history["val_loss"])
plt.legend(["loss", "val_loss"])
plt.title(f"Loss: {loss.name}")
plt.show()

## Prediction

Let's predict some features and visualize them.

In [None]:
embedded_features = model.predict(x_test, verbose=1)
embedded_features /= np.linalg.norm(embedded_features, axis=1, keepdims=True)

### 3D-Visualization of ArcFace Loss

In [None]:
fig = plt.figure()
ax = fig.add_subplot(projection="3d")
# Plot only the first three of the 64 embedding dimensions, one color per class.
for c in np.unique(y_test):
    mask = y_test == c
    ax.plot(embedded_features[mask, 0], embedded_features[mask, 1], embedded_features[mask, 2], ".", alpha=0.1)
ax.set_title("ArcFace")
plt.show()