# Hyperbolic Learning in Action: Practice

In this notebook, we are going to train, evaluate, and compare three Convolutional Neural Networks (CNNs):

1. an ordinary, fully Euclidean one;
2. one with the last layer in hyperbolic space;
3. a fully hyperbolic network.

We will use:

- the CIFAR-10 and CIFAR-100 datasets: the first is chosen for its simplicity, the second because it exhibits *hierarchical* structure;
- the hyperbolic learning library `HypLL` for the hyperbolic layers, due to its ease of use.

We will also visualize the learned data representations in both Euclidean and hyperbolic space.

## Setup

**If you are on Colab or Kaggle, enable GPU acceleration and make sure GitHub can be reached:**
- Colab:
    1. Click on the dropdown arrow on the right of the menu bar above the notebook, next to "Connect".
    2. Select "Change runtime type".
    3. Choose "T4 GPU" under "Hardware accelerator".
- Kaggle:
    1. Expand the section "Session options" on the right menu sidebar.
    2. Select "GPU P100" under "Accelerator".
    3. Just below, toggle the option "Internet on".

### Environment

Check if the notebook is already in the code repository.

In [1]:
import os

path_parts = os.getcwd().split(os.sep)
repository_path = ""
try:
    repository_index = path_parts.index("hyperbolic-learning-tutorial-code")
    repository_path = os.sep.join(path_parts[: repository_index + 1])
except ValueError:
    pass

Get the repository if needed.

In [2]:
if repository_path == "":
    !git clone https://github.com/Digital-Dermatology/hyperbolic-learning-tutorial-code.git
    %cd hyperbolic-learning-tutorial-code
    repository_path = "hyperbolic-learning-tutorial-code"
else:
    %cd {repository_path}

Cloning into 'hyperbolic-learning-tutorial-code'...
remote: Enumerating objects: 165, done.
remote: Counting objects: 100% (165/165), done.
remote: Compressing objects: 100% (75/75), done.
remote: Total 165 (delta 70), reused 148 (delta 57), pack-reused 0 (from 0)
Receiving objects: 100% (165/165), 31.65 KiB | 2.88 MiB/s, done.
Resolving deltas: 100% (70/70), done.
/content/hyperbolic-learning-tutorial-code


Install requirements.

In [3]:
!pip install --upgrade pip && pip install -r requirements.txt

Collecting pip
  Downloading pip-25.0.1-py3-none-any.whl.metadata (3.7 kB)
Downloading pip-25.0.1-py3-none-any.whl (1.8 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.8/1.8 MB 27.3 MB/s eta 0:00:00
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 24.1.2
    Uninstalling pip-24.1.2:
      Successfully uninstalled pip-24.1.2
Successfully installed pip-25.0.1
Collecting hypll==0.1.1 (from -r requirements.txt (line 1))
  Downloading hypll-0.1.1-py3-none-any.whl.metadata (664 bytes)
Collecting torchinfo==1.8.0 (from -r requirements.txt (line 8))
  Downloading torchinfo-1.8.0-py3-none-any.whl.metadata (21 kB)
Collecting torchmetrics==1.6.1 (from -r requirements.txt (line 9))
  Downloading torchmetrics-1.6.1-py3-none-any.whl.metadata (21 kB)
Collecting umap-learn==0.5.7 (from -r requirements.txt (line 12))
  Downloading umap_learn-0.5.7-py3-none-any.whl.metadata (21 kB)
Collecting nvidia-cuda-n

Add the repository's `src` directory to the Python path so that the custom functions can be imported.

In [4]:
import sys

sys.path.append(os.path.join(repository_path, "src"))

Set the `torch` device and seeds for reproducibility.

In [5]:
import torch
from src.utils.torch_utils import get_available_device, set_seeds

device = torch.device(get_available_device())
set_seeds(42)

### Data

Get the datasets.

Since this is a demonstration without hyperparameter tuning, a single training split and a single evaluation (test) split are sufficient.

In [6]:
import torchvision

transform = torchvision.transforms.Compose(
    [
        torchvision.transforms.ToTensor(),
        torchvision.transforms.Normalize(
            mean=(0.5, 0.5, 0.5),
            std=(0.5, 0.5, 0.5),
        ),
    ]
)
train_dataset = torchvision.datasets.CIFAR10(
    root="data", train=True, download=True, transform=transform
)
test_dataset = torchvision.datasets.CIFAR10(
    root="data", train=False, download=True, transform=transform
)

classes = train_dataset.classes
assert test_dataset.classes == classes
num_classes = len(classes)
print(f"Classes in the dataset: {classes}")

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to data/cifar-10-python.tar.gz


100%|██████████| 170M/170M [00:05<00:00, 32.9MB/s]


Extracting data/cifar-10-python.tar.gz to data
Files already downloaded and verified
Classes in the dataset: ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
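
With a mean and standard deviation of 0.5 per channel, `Normalize` maps the pixel values that `ToTensor` produces in [0, 1] to the range [-1, 1]. The arithmetic can be checked with a minimal pure-Python sketch; the helper `normalize` is hypothetical and only mirrors the formula `(x - mean) / std`.

```python
def normalize(value: float, mean: float = 0.5, std: float = 0.5) -> float:
    """Apply the per-channel normalization formula: (x - mean) / std."""
    return (value - mean) / std


# Darkest, mid-gray, and brightest pixel values after ToTensor.
darkest, mid_gray, brightest = normalize(0.0), normalize(0.5), normalize(1.0)
print(darkest, mid_gray, brightest)
```

Inputs centered around zero are a common choice for networks trained from scratch, as is the case here.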


Prepare the data loaders.

The batch size and the number of workers may be adjusted as needed.

In [7]:
batch_size = 128
num_workers = 0

train_dataloader = torch.utils.data.DataLoader(
    train_dataset, batch_size=batch_size, shuffle=True, num_workers=num_workers
)
test_dataloader = torch.utils.data.DataLoader(
    test_dataset, batch_size=batch_size, shuffle=False, num_workers=num_workers
)
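
As a quick sanity check, the number of batches per epoch follows from the split sizes and the batch size. The sketch below assumes the standard CIFAR-10 split of 50,000 training and 10,000 test images and the `DataLoader` default of keeping the last, smaller batch.

```python
import math

train_size, test_size = 50_000, 10_000  # standard CIFAR-10 split sizes
batch_size = 128

# The last incomplete batch is kept, hence the ceiling.
train_batches = math.ceil(train_size / batch_size)
test_batches = math.ceil(test_size / batch_size)
# Size of the final, smaller training batch.
last_train_batch = train_size - (train_batches - 1) * batch_size
print(train_batches, test_batches, last_train_batch)
```

The 391 training batches match the length of the progress bars shown during training.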

## Euclidean Network

### Architecture

Start with a simple Euclidean convolutional network.

To keep the comparison with hyperbolic networks simple:

- it has neither batch normalization nor skip connections;
- fully connected layers are used at the end instead of e.g. global pooling;
- no transfer learning is used.

Right before classification, reduce the feature dimension to 2 so that the embeddings can be plotted directly.
This bottleneck will hurt performance in Euclidean space, but it should be less of a problem for hyperbolic networks.
The constraint can be relaxed if a dimensionality reduction method such as PCA, t-SNE, or UMAP is used for visualization instead.

In [8]:
from torch.nn import Conv2d, Flatten, Linear, MaxPool2d, ReLU, Sequential

last_channels = 3
conv_channels = (32, 64, 128)
fc_channels = (128, 32, 2)
image_size = (32, 32)
pool_kernel_size = 2
pool_stride = 2
conv_kernel_size = 3

pool = MaxPool2d(kernel_size=pool_kernel_size, stride=pool_stride)
activation = ReLU()
current_image_size = torch.tensor(image_size)
layers = []
for channels in conv_channels:
    layers.append(
        Conv2d(in_channels=last_channels, out_channels=channels, kernel_size=conv_kernel_size)
    )
    current_image_size -= conv_kernel_size - 1
    layers.append(activation)
    layers.append(pool)
    current_image_size //= pool_stride
    last_channels = channels
layers.append(Flatten())
last_channels *= int(current_image_size.prod())  # keep a plain int for Linear
for channels in fc_channels:
    layers.append(
        Linear(in_features=last_channels, out_features=channels)
    )
    layers.append(activation)
    last_channels = channels
layers = layers[:-1]  # remove the last activation
layers.append(Linear(in_features=last_channels, out_features=len(classes)))
euclidean_network = Sequential(*layers)
euclidean_network = euclidean_network.to(device)
euclidean_network

Sequential(
  (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1))
  (1): ReLU()
  (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (3): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1))
  (4): ReLU()
  (5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (6): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1))
  (7): ReLU()
  (8): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (9): Flatten(start_dim=1, end_dim=-1)
  (10): Linear(in_features=512, out_features=128, bias=True)
  (11): ReLU()
  (12): Linear(in_features=128, out_features=32, bias=True)
  (13): ReLU()
  (14): Linear(in_features=32, out_features=2, bias=True)
  (15): Linear(in_features=2, out_features=10, bias=True)
)
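
The `in_features=512` of the first fully connected layer follows from the spatial size of the feature maps. The sketch below retraces the bookkeeping done in the construction loop, using plain integers instead of tensors: each unpadded 3x3 convolution removes 2 pixels per side length, and each pooling step halves it (rounding down).

```python
image_size = 32
conv_kernel_size, pool_stride = 3, 2
conv_channels = (32, 64, 128)

size = image_size
for channels in conv_channels:
    size -= conv_kernel_size - 1  # unpadded convolution with stride 1
    size //= pool_stride          # 2x2 max pooling with stride 2
    print(f"{channels} channels, {size}x{size} feature map")

# Flatten turns the last (channels, size, size) block into a vector.
flattened_features = conv_channels[-1] * size * size
print(flattened_features)
```

The side length goes 32 → 30 → 15 → 13 → 6 → 4 → 2, so the flattened vector has 128 · 2 · 2 = 512 entries.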

In [9]:
with torch.no_grad():
    for data, labels in test_dataloader:
        outputs = euclidean_network(data.to(device))
        break

In [10]:
outputs.shape

torch.Size([128, 10])

In [11]:
from torchinfo import summary
summary(euclidean_network)

Layer (type:depth-idx)                   Param #
Sequential                               --
├─Conv2d: 1-1                            896
├─ReLU: 1-2                              --
├─MaxPool2d: 1-3                         --
├─Conv2d: 1-4                            18,496
├─ReLU: 1-5                              --
├─MaxPool2d: 1-6                         --
├─Conv2d: 1-7                            73,856
├─ReLU: 1-8                              --
├─MaxPool2d: 1-9                         --
├─Flatten: 1-10                          --
├─Linear: 1-11                           65,664
├─ReLU: 1-12                             --
├─Linear: 1-13                           4,128
├─ReLU: 1-14                             --
├─Linear: 1-15                           66
├─Linear: 1-16                           30
Total params: 163,136
Trainable params: 163,136
Non-trainable params: 0
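
The parameter counts in the summary can be reproduced by hand: a convolution has one weight per kernel position and input channel plus one bias for each output channel, and a linear layer has one weight per input feature plus one bias for each output feature. The helper functions below are hypothetical and only encode this arithmetic.

```python
def conv2d_params(in_channels: int, out_channels: int, kernel_size: int) -> int:
    """Weights (k * k * in) plus one bias, for each output channel."""
    return (kernel_size * kernel_size * in_channels + 1) * out_channels


def linear_params(in_features: int, out_features: int) -> int:
    """Weights (in) plus one bias, for each output feature."""
    return (in_features + 1) * out_features


counts = [
    conv2d_params(3, 32, 3),
    conv2d_params(32, 64, 3),
    conv2d_params(64, 128, 3),
    linear_params(512, 128),
    linear_params(128, 32),
    linear_params(32, 2),
    linear_params(2, 10),
]
total = sum(counts)
print(counts, total)
```

Note that the 2-dimensional bottleneck and the classifier together account for fewer than 100 parameters.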

### Evaluation

Define the metrics for evaluation.

In [12]:
from torchmetrics import MetricCollection
from torchmetrics.classification import MulticlassAccuracy, MulticlassMatthewsCorrCoef

metrics = MetricCollection(
    [
        MulticlassAccuracy(num_classes=num_classes),
        MulticlassMatthewsCorrCoef(num_classes=num_classes),
    ]
)
metrics = metrics.to(device)
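
To make the second metric less opaque, here is a minimal pure-Python sketch of the multiclass Matthews correlation coefficient computed from a confusion matrix. The function `multiclass_mcc` is hypothetical and written for illustration; it is not the `torchmetrics` implementation, and returning 0 when the denominator vanishes is a convention assumed here.

```python
import math


def multiclass_mcc(labels, predictions, num_classes):
    """Multiclass Matthews correlation coefficient from integer class indices."""
    # confusion[i][j]: number of samples with true class i predicted as class j
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for true, predicted in zip(labels, predictions):
        confusion[true][predicted] += 1
    samples = len(labels)
    correct = sum(confusion[k][k] for k in range(num_classes))
    true_counts = [sum(row) for row in confusion]
    predicted_counts = [
        sum(confusion[i][k] for i in range(num_classes)) for k in range(num_classes)
    ]
    numerator = correct * samples - sum(
        p * t for p, t in zip(predicted_counts, true_counts)
    )
    denominator = math.sqrt(
        (samples**2 - sum(p * p for p in predicted_counts))
        * (samples**2 - sum(t * t for t in true_counts))
    )
    # Assumed convention: a degenerate (e.g. constant) prediction scores 0.
    return numerator / denominator if denominator > 0 else 0.0


labels = [0, 1, 2, 0, 1, 2]
perfect_score = multiclass_mcc(labels, labels, num_classes=3)
constant_score = multiclass_mcc(labels, [0] * 6, num_classes=3)
print(perfect_score, constant_score)
```

Unlike accuracy, which rewards a constant guess with the frequency of the guessed class, this score is 0 for a classifier that always predicts the same class.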

Evaluate before training.

In [13]:
def print_metrics(metrics: MetricCollection, prefix: str = "") -> None:
    print(
        prefix,
        {k.replace("Multiclass", ""): v.item() for k, v in metrics.compute().items()},
    )

In [14]:
metrics.reset()
with torch.no_grad():
    for data, labels in test_dataloader:
        outputs = euclidean_network(data.to(device))
        metrics(outputs, labels.to(device))
print_metrics(metrics, "Metrics before training:")

Metrics before training: {'Accuracy': 0.10000000149011612, 'MatthewsCorrCoef': 0}


### Training

In [15]:
from torch.optim import Adam
from tqdm import tqdm
criterion = torch.nn.CrossEntropyLoss()
criterion.to(device)
optimizer = Adam(euclidean_network.parameters(), lr=1e-3)
num_epochs = 10

for epoch in range(num_epochs):
    print(f"Epoch {epoch + 1} of {num_epochs}")
    metrics.reset()
    for data, labels in tqdm(train_dataloader):
        optimizer.zero_grad()
        outputs = euclidean_network(data.to(device))
        labels = labels.to(device)
        metrics(outputs, labels)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
    print_metrics(metrics, "Train: ")
    metrics.reset()
    with torch.no_grad():
        for data, labels in test_dataloader:
            outputs = euclidean_network(data.to(device))
            metrics(outputs, labels.to(device))
    print_metrics(metrics, "Test: ")

Epoch 1 of 10


100%|██████████| 391/391 [00:16<00:00, 24.25it/s]


Train:  {'Accuracy': 0.2123199999332428, 'MatthewsCorrCoef': 0.13437457382678986}
Test:  {'Accuracy': 0.3272000253200531, 'MatthewsCorrCoef': 0.26243168115615845}
Epoch 2 of 10


100%|██████████| 391/391 [00:13<00:00, 28.10it/s]


Train:  {'Accuracy': 0.3617199957370758, 'MatthewsCorrCoef': 0.29730042815208435}
Test:  {'Accuracy': 0.412200003862381, 'MatthewsCorrCoef': 0.3570978045463562}
Epoch 3 of 10


100%|██████████| 391/391 [00:14<00:00, 27.48it/s]


Train:  {'Accuracy': 0.4330199956893921, 'MatthewsCorrCoef': 0.37413567304611206}
Test:  {'Accuracy': 0.46380001306533813, 'MatthewsCorrCoef': 0.40758007764816284}
Epoch 4 of 10


100%|██████████| 391/391 [00:14<00:00, 27.50it/s]


Train:  {'Accuracy': 0.48256000876426697, 'MatthewsCorrCoef': 0.4276089668273926}
Test:  {'Accuracy': 0.5026000142097473, 'MatthewsCorrCoef': 0.4497363567352295}
Epoch 5 of 10


100%|██████████| 391/391 [00:13<00:00, 28.26it/s]


Train:  {'Accuracy': 0.5313800573348999, 'MatthewsCorrCoef': 0.48130208253860474}
Test:  {'Accuracy': 0.5347999930381775, 'MatthewsCorrCoef': 0.4850894510746002}
Epoch 6 of 10


100%|██████████| 391/391 [00:13<00:00, 28.26it/s]


Train:  {'Accuracy': 0.5748200416564941, 'MatthewsCorrCoef': 0.5291279554367065}
Test:  {'Accuracy': 0.5730000138282776, 'MatthewsCorrCoef': 0.5274108052253723}
Epoch 7 of 10


100%|██████████| 391/391 [00:13<00:00, 28.33it/s]


Train:  {'Accuracy': 0.6035999655723572, 'MatthewsCorrCoef': 0.5606430768966675}
Test:  {'Accuracy': 0.58160001039505, 'MatthewsCorrCoef': 0.538537859916687}
Epoch 8 of 10


100%|██████████| 391/391 [00:13<00:00, 28.22it/s]


Train:  {'Accuracy': 0.6241199970245361, 'MatthewsCorrCoef': 0.5835833549499512}
Test:  {'Accuracy': 0.6003000140190125, 'MatthewsCorrCoef': 0.557213544845581}
Epoch 9 of 10


100%|██████████| 391/391 [00:13<00:00, 28.76it/s]


Train:  {'Accuracy': 0.6444999575614929, 'MatthewsCorrCoef': 0.606185257434845}
Test:  {'Accuracy': 0.5855000019073486, 'MatthewsCorrCoef': 0.545138418674469}
Epoch 10 of 10


100%|██████████| 391/391 [00:13<00:00, 28.93it/s]


Train:  {'Accuracy': 0.6613399982452393, 'MatthewsCorrCoef': 0.6251767873764038}
Test:  {'Accuracy': 0.6172999739646912, 'MatthewsCorrCoef': 0.5763481855392456}


### Visualization

Get the embeddings to visualize them.

In [16]:
embeddings, predictions, labels = [], [], []
with torch.no_grad():
    for data, labels_batch in test_dataloader:
        embeddings_batch = euclidean_network[:-1](data.to(device))
        predictions_batch = euclidean_network[-1](embeddings_batch).argmax(dim=-1)
        embeddings.append(embeddings_batch)
        predictions.append(predictions_batch)
        labels.append(labels_batch)
embeddings = torch.cat(embeddings, dim=0)
predictions = torch.cat(predictions, dim=0)
labels = torch.cat(labels, dim=0)

Save the plot to HTML to avoid overloading the notebook.

In [17]:
import pandas as pd
import plotly.express as px
df = pd.DataFrame(embeddings.cpu().numpy())
df["prediction"] = predictions.cpu().numpy()
df["label"] = [classes[i] for i in labels.cpu().numpy()]
fig = px.scatter(data_frame=df, x=0, y=1, color="label")
fig.write_html("euclidean.html")

## Last Hyperbolic Layer

Exercise 1:

Now it is time to roll up your sleeves. Enjoy hacking!

1. Define the hyperbolic manifold using [`hypll.manifolds.poincare_ball.PoincareBall`](https://hyperbolic-learning-library.readthedocs.io/en/latest/_autosummary/hypll.manifolds.poincare_ball.manifold.html) with a trainable curvature parameter [`hypll.manifolds.poincare_ball.Curvature`](https://hyperbolic-learning-library.readthedocs.io/en/latest/_autosummary/hypll.manifolds.poincare_ball.curvature.html).

In [20]:
from hypll.manifolds.poincare_ball import PoincareBall, Curvature

curvature = Curvature(value=1.0, requires_grad=True)
manifold = PoincareBall(c=curvature)
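
For intuition about the geometry, here is a minimal pure-Python sketch of the geodesic distance on the Poincaré ball. It assumes the usual convention in which `c > 0` and the curvature is `-c`, so that valid points satisfy `c * ||x||^2 < 1`. The function `poincare_distance` is hypothetical and does not use `HypLL`.

```python
import math


def poincare_distance(x, y, c=1.0):
    """Geodesic distance between two points inside the Poincaré ball."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    sq_norm_x = sum(a * a for a in x)
    sq_norm_y = sum(b * b for b in y)
    argument = 1 + 2 * c * sq_dist / ((1 - c * sq_norm_x) * (1 - c * sq_norm_y))
    return math.acosh(argument) / math.sqrt(c)


# The same Euclidean gap of 0.1, measured at the center and near the boundary.
center_gap = poincare_distance((0.0, 0.0), (0.1, 0.0))
boundary_gap = poincare_distance((0.85, 0.0), (0.95, 0.0))
print(center_gap, boundary_gap)
```

Distances grow without bound towards the edge of the ball, which leaves room for many well-separated points near the boundary even in two dimensions.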

2. Starting with the Euclidean network, just before classification, lift the representation to hyperbolic space by constructing a [`hypll.tensors.TangentTensor`](https://hyperbolic-learning-library.readthedocs.io/en/latest/_autosummary/hypll.tensors.tangent_tensor.html) and using `PoincareBall`'s exponential map.
    - Hint: you may also use the convenience layer [`src.layers.to_manifold.ToManifold`](https://github.com/Digital-Dermatology/hyperbolic-learning-tutorial-code/blob/main/src/layers/to_manifold.py) or take inspiration from it.
3. Obtain the logits by replacing the linear classification layer of the Euclidean network with the calculation of the distances from (learned) hyperbolic hyperplanes.
    - This operation, known as Hyperbolic Multinomial Logistic Regression, is implemented in [`src.layers.hmlr.HMLR`](https://github.com/Digital-Dermatology/hyperbolic-learning-tutorial-code/blob/main/src/layers/hmlr.py); feel free to use it directly or as a guide.

In [None]:
last_hyperbolic_network = ...
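
As a hint for step 2, the exponential map at the origin of the Poincaré ball has a simple closed form: it keeps the direction of the tangent vector and squashes its norm with a `tanh`. The sketch below is pure Python and independent of `HypLL`; the function `expmap0` is hypothetical, and the convention of a positive `c` with curvature `-c` is an assumption.

```python
import math


def expmap0(v, c=1.0):
    """Map a tangent vector at the origin onto the Poincaré ball."""
    norm = math.sqrt(sum(a * a for a in v))
    if norm == 0.0:
        return [0.0 for _ in v]  # the zero vector maps to the origin
    sqrt_c = math.sqrt(c)
    scale = math.tanh(sqrt_c * norm) / (sqrt_c * norm)
    return [scale * a for a in v]


# A Euclidean embedding with norm 5 lands just inside the unit ball.
point = expmap0([3.0, 4.0])
radius = math.hypot(*point)
print(point, radius)
```

Every Euclidean embedding, however large, is mapped strictly inside the ball in exact arithmetic, while small vectors are left almost unchanged.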

4. Replace the Adam optimizer with Riemannian Adam from [`hypll.optim.RiemannianAdam`](https://hyperbolic-learning-library.readthedocs.io/en/latest/_autosummary/hypll.optim.adam.html).

In [None]:
riemannian_optimizer = ...

5. Train the network for 10 epochs.

In [None]:
for epoch in range(num_epochs):
    ...

6. Visualize the embeddings with their labels.

In [None]:
embeddings, predictions, labels = [], [], []
with torch.no_grad():
    for data, labels_batch in test_dataloader:
        embeddings_batch = ...
        predictions_batch = ...

In [None]:
df = ...

7. Compare the training time, final performance, and representations with the Euclidean ones!

## Fully Hyperbolic Network

Exercise 2:

1. Define the hyperbolic manifold as in Exercise 1.
2. Immediately after getting data from the `DataLoader`, lift it to the `PoincareBall` as in the previous exercise.
3. Build a fully hyperbolic backbone using the layers `HLinear`, `HConvolution2d`, `HMaxPool2d`, and `HReLU` from `hypll.nn`.
4. Add the classification layer at the end using `src.layers.hmlr.HMLR`.

## Optional: CIFAR-100

If you got this far, well done!

Repeat the exercises with CIFAR-100, whose classes have a more hierarchical structure, to see where hyperbolic learning is expected to pay off.