# Gesture Classification

Welcome to the IMU gesture classification training script! This was designed to run in Google Colab, but it should (in theory) run in any Jupyter Notebook or Jupyter Lab environment with PyTorch installed.

In Google Colab, select **File > Open notebook** then select the **Upload** tab. Select this file to open it in Colab.

Press **Shift + Enter** to execute each cell in order. Make sure you stop and read each text section, as there are some manual steps you will need to perform (e.g. uploading the dataset).

In [None]:
# Install specific versions of the packages
!python3 -m pip install \
    matplotlib=='3.10.0' \
    numpy=='2.0.2' \
    onnxscript=='0.5.7' \
    pandas=='2.2.2' \
    torch=='2.9.0+cpu'

In [None]:
# Import standard libraries
import os
from pathlib import Path
import random
import zipfile

# Import third-party libraries
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader

In [None]:
# Print out the versions of the libraries
print(f"Matplotlib version: {plt.matplotlib.__version__}")
print(f"NumPy version: {np.__version__}")
print(f"Pandas version: {pd.__version__}")
print(f"PyTorch version: {torch.__version__}")

In [None]:
# General settings
SEED = 42
GESTURES_ZIP_PATH = Path("/content/01_imu_gestures.zip")
GESTURE_DATASET_PATH = Path("/content/imu_gestures")
COL_NAME_TIMESTAMP = "timestamp"

# Data settings
VAL_SPLIT = 0.2
TEST_SPLIT = 0.2

# Model settings
HIDDEN_1_SIZE = 64
HIDDEN_2_SIZE = 32
DROPOUT = 0.3
BATCH_SIZE = 32
LEARNING_RATE = 0.001
NUM_EPOCHS = 50

# ONNX export settings
ONNX_OPSET_VERSION = 18
ONNX_PATH = Path("/content/model.onnx")

# Calibration data settings
NUM_CALIB_SAMPLES = 100
CALIB_NPZ_PATH = Path("/content/calibration_data.npz")

In [None]:
# Set random seeds for reproducibility
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
if torch.cuda.is_available():
    torch.cuda.manual_seed(SEED)

In [None]:
# Define the target compute device (GPU or CPU)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

## Prepare Dataset

You will need to manually upload the gesture dataset you created. This should include at least 100 samples (each sample in its own CSV file) for each of the 5 classes (_idle, _other, alpha, beta, gamma).

First, zip your dataset as follows:

```
01_imu_gestures.zip
├─ _idle/
│  ├─ 000000.CSV
│  ├─ 000001.CSV
│  └─ ...
├─ _other/
│  ├─ 000000.CSV
│  ├─ 000001.CSV
│  └─ ...
├─ alpha/
│  ├─ 000000.CSV
│  ├─ 000001.CSV
│  └─ ...
├─ beta/
│  ├─ 000000.CSV
│  ├─ 000001.CSV
│  └─ ...
└─ gamma/
   ├─ 000000.CSV
   ├─ 000001.CSV
   └─ ...
```

On the left side, click the **Folder** icon to expand the file browser tab. Click the **Upload** icon. Select the **01_imu_gestures.zip** file to upload it to this Colab instance.

Once you have uploaded the .zip file, run the following cells to unzip and prepare the dataset.
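
Before extracting, it can help to confirm that the archive matches the layout shown above. The following is a minimal sketch and not part of the original workflow: `summarize_gesture_zip` is a hypothetical helper name, and it assumes the class folders sit at the top level of the archive and that files end in `.CSV` (upper case, matching the check used by the loading cell later in this notebook).

```python
from collections import Counter
from pathlib import PurePosixPath
import zipfile


def summarize_gesture_zip(zip_path):
    """Count .CSV files per top-level folder in a zip archive."""
    counts = Counter()
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        for name in zip_ref.namelist():
            parts = PurePosixPath(name).parts

            # Only count entries shaped like <class_folder>/<file>.CSV
            if len(parts) == 2 and parts[1].endswith(".CSV"):
                counts[parts[0]] += 1
    return dict(counts)
```

Calling `summarize_gesture_zip(GESTURES_ZIP_PATH)` should list all 5 class folders, each with at least 100 files. If a folder is missing or the counts look wrong, fix the archive before continuing.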

In [None]:
# Extract the zip file
with zipfile.ZipFile(GESTURES_ZIP_PATH, 'r') as zip_ref:
    zip_ref.extractall(GESTURE_DATASET_PATH)

In [None]:
# Discover class names based on folder names
class_names = sorted([d for d in os.listdir(GESTURE_DATASET_PATH)
                      if os.path.isdir(os.path.join(GESTURE_DATASET_PATH, d))])
print(f"Class names: {class_names}")

In [None]:
# Map an index number to each class name
class_to_idx = {class_name: idx for idx, class_name in enumerate(class_names)}
print(class_to_idx)

In [None]:
# Store data, labels, and channel (column) names
all_data = []
all_labels = []
channel_names = None

# Go through each directory
first_file = True
for class_name in class_names:

    # Construct path to data in a given class
    class_folder = GESTURE_DATASET_PATH / class_name
    csv_files = sorted(os.listdir(class_folder))

    # Read all CSV files
    for csv_file in csv_files:
        if csv_file.endswith('.CSV'):
            csv_path = class_folder / csv_file

            # Read CSV file
            df = pd.read_csv(csv_path)

            # Extract channel (column) names from the first file only,
            # leaving out the timestamp column
            if first_file:
                channel_names = [c for c in df.columns if c != COL_NAME_TIMESTAMP]
                first_file = False

            # Extract the 6 IMU values (3-axis accel and gyro)
            sensor_data = df[channel_names].values

            # Append sample and class index to lists
            all_data.append(sensor_data)
            all_labels.append(class_to_idx[class_name])

# Get the total number of samples loaded
num_samples = len(all_data)
print(f"Loaded {num_samples} total samples")
print(f"Channels: {channel_names}")

In [None]:
# Pair each sample with its associated label
data_label_pairs = list(zip(all_data, all_labels))

# Shuffle the pairs randomly
random.shuffle(data_label_pairs)

# Unzip back into separate lists
shuffled_data, shuffled_labels = zip(*data_label_pairs)
shuffled_data = list(shuffled_data)
shuffled_labels = list(shuffled_labels)

# Calculate split indices
test_end_idx = int(TEST_SPLIT * num_samples)
val_end_idx = int((VAL_SPLIT + TEST_SPLIT) * num_samples)

print(f"Test end index: {test_end_idx}")
print(f"Validation end index: {val_end_idx}")

In [None]:
# The first section of shuffled samples becomes the test set
test_data = shuffled_data[:test_end_idx]
test_labels = shuffled_labels[:test_end_idx]

# The second section of shuffled samples becomes the validation set
val_data = shuffled_data[test_end_idx:val_end_idx]
val_labels = shuffled_labels[test_end_idx:val_end_idx]

# The third section of shuffled samples becomes the training set
train_data = shuffled_data[val_end_idx:]
train_labels = shuffled_labels[val_end_idx:]

print(f"Training set size: {len(train_data)}")
print(f"Validation set size: {len(val_data)}")
print(f"Test set size: {len(test_data)}")

## Examine Data

In most machine learning projects, you will want to examine your data. Look for trends or other interesting features that can help you build a model. Sometimes, you might find that you don't need machine learning after all!

For our case, we're going to look at the basic statistics of the training set and plot one of the samples (so you can see what it looks like).

You'll also want to look for bad or missing data. Sometimes, you'll have to remove bad samples or supplement the data. Our ML model will expect a very specific number of values as input and will break if our data does not match that matrix shape!
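
As a rough illustration of what "looking for bad data" can mean in practice, here is a small sketch that flags samples with an unexpected shape or with missing (NaN/infinite) values. The helper name `find_bad_samples` and the toy values are made up for this example; in this notebook you would pass in something like `train_data` and the shape you expect every sample to have.

```python
import numpy as np


def find_bad_samples(samples, expected_shape):
    """Return indices of samples with the wrong shape or non-finite values."""
    bad_indices = []
    for i, sample in enumerate(samples):
        sample = np.asarray(sample, dtype=float)
        wrong_shape = sample.shape != tuple(expected_shape)
        has_non_finite = not np.isfinite(sample).all()
        if wrong_shape or has_non_finite:
            bad_indices.append(i)
    return bad_indices


# Toy data: 3 timesteps x 2 channels per sample (made-up values)
toy_samples = [
    np.ones((3, 2)),
    np.ones((2, 2)),                                    # Too short
    np.array([[1.0, np.nan], [1.0, 1.0], [1.0, 1.0]]),  # Missing value
]
print(find_bad_samples(toy_samples, expected_shape=(3, 2)))  # [1, 2]
```

What you do with the flagged samples (drop them, re-record them, pad or trim them) depends on your dataset, so treat this as a starting point only.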

In [None]:
# Ensure all the samples have the same length
sample_lengths = [data.shape[0] for data in train_data]
min_len = min(sample_lengths)
max_len = max(sample_lengths)

# Check the sample lengths
print(f"Minimum sample length: {min_len}")
print(f"Maximum sample length: {max_len}")
if min_len != max_len:
    print("Warning: Samples have different lengths!")

In [None]:
# Concatenate all training samples
train_data_concat = np.concatenate(train_data, axis=0)

# Store mean and standard deviation for each sensor channel
means = []
stds = []

# Calculate statistics
for i, channel in enumerate(channel_names):
    mean = np.mean(train_data_concat[:, i])
    std = np.std(train_data_concat[:, i])
    means.append(mean)
    stds.append(std)

# Print statistics
for i, channel in enumerate(channel_names):
    print(f"Channel: {channel}")
    print(f"  Mean: {means[i]:.2f}")
    print(f"  Std dev: {stds[i]:.2f}")
    print()

In [None]:
# Plot a sample
def plot_sample(data):
    """
    Plot a sample
    """
    # Create a figure with 2 subplots (one for accel, one for gyro)
    fig, axes = plt.subplots(2, 1, figsize=(12, 8))

    # Plot accelerometer data
    axes[0].plot(data[:, 0], label="accel_x", color='red')
    axes[0].plot(data[:, 1], label="accel_y", color='green')
    axes[0].plot(data[:, 2], label="accel_z", color='blue')
    axes[0].set_xlabel("Timestep")
    axes[0].set_ylabel("Acceleration (raw)")
    axes[0].set_title("3-Axis Accelerometer Data")
    axes[0].legend(loc='upper right')
    axes[0].grid(True, alpha=0.3)

    # Plot gyroscope data
    axes[1].plot(data[:, 3], label="gyro_x", color='red')
    axes[1].plot(data[:, 4], label="gyro_y", color='green')
    axes[1].plot(data[:, 5], label="gyro_z", color='blue')
    axes[1].set_xlabel("Timestep")
    axes[1].set_ylabel("Angular Velocity (raw)")
    axes[1].set_title("3-Axis Gyroscope Data")
    axes[1].legend(loc='upper right')
    axes[1].grid(True, alpha=0.3)

    # Show plots
    plt.tight_layout()
    plt.show()

In [None]:
# Choose a sample (by index) in the training set
idx = 0

# Plot sample
label = train_labels[idx]
print(f"Label: {label} (Class: {class_names[label]})")
plot_sample(train_data[idx])

In [None]:
# Count the number of times each label appears in the training set
train_label_counts = np.bincount(train_labels)

plt.figure(figsize=(10, 6))
bars = plt.bar(class_names, train_label_counts, color='lightblue')
plt.xlabel("Gesture Class")
plt.ylabel("Number of Samples")
plt.title("Training Set Class Distribution")
plt.grid(axis='y', alpha=0.3)

# Show plot
plt.tight_layout()
plt.show()

## Create a Custom Dataset

PyTorch has a particular way it likes to retrieve samples during training and testing a model. It wants to get items from a custom subclass of the `Dataset` class.

We will accomplish this in two distinct steps:

 1. Normalize all data
 2. Wrap that normalized data in a custom `IMUGestureDataset` class

Neural networks learn better (i.e. converge faster and more reliably) when all the input features are on similar scales. Without normalization/standardization, sensors with large raw values (e.g. our accelerometer readings of ~2000) would dominate the learning process over sensors with smaller values (e.g. our gyroscope readings of ~50). This type of scaling ensures every sensor contributes fairly to the model's predictions.

Some definitions:
 * **Normalization** - Scale all data to a fixed range (usually between 0 and 1)
 * **Standardization** - Scale all data to have a mean of 0 and a standard deviation of 1 (we will use this approach)

We normalize or standardize the data once before creating the `Dataset` (rather than inside it) so that we only perform this operation once instead of repeating it every time we retrieve a sample during training.
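
To make the two definitions concrete, here is a tiny sketch applying both to a single channel. The readings are made-up numbers, not values from the real dataset.

```python
import numpy as np

# Made-up readings for one sensor channel
readings = np.array([1900.0, 2000.0, 2100.0, 2000.0])

# Normalization (min-max): rescale to the range [0, 1]
normalized = (readings - readings.min()) / (readings.max() - readings.min())

# Standardization (z-score): shift to mean 0, scale to std dev 1
standardized = (readings - readings.mean()) / readings.std()

print(normalized)
print(standardized.mean(), standardized.std())
```

The normalized values land between 0 and 1, while the standardized values end up centered on 0 with a standard deviation of 1.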

In [None]:
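def standardize_data(data_list, mean, std):
    """
    Standardize data using: (x - mean) / std
    """
    standardized = []

    # Convert to arrays so the zero check below is done per channel
    # (comparing a plain Python list to 0.0 would not be element-wise)
    mean = np.asarray(mean, dtype=np.float64)
    std = np.asarray(std, dtype=np.float64)

    # Ensure we have no divide-by-zero errors (just center the data)
    std_safe = np.where(std == 0.0, 1.0, std)

    # Apply standardization: (data - mean) / std for all channels
    for data in data_list:
        scaled_data = (data - mean) / std_safe
        standardized.append(scaled_data)

    return standardized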

In [None]:
# Standardize all datasets using training statistics
train_data_standardized = standardize_data(train_data, means, stds)
val_data_standardized = standardize_data(val_data, means, stds)
test_data_standardized = standardize_data(test_data, means, stds)

In [None]:
# Choose a sample (by index) in the training set
idx = 0

# Plot standardized sample (notice the DC offset is removed!)
label = train_labels[idx]
print(f"Label: {label} (Class: {class_names[label]})")
plot_sample(train_data_standardized[idx])

In [None]:
class IMUGestureDataset(Dataset):
    """
    Custom PyTorch Dataset for IMU gesture data.
    """
    def __init__(self, data_list, labels_list):
        """
        Initialize the dataset
        """
        self.data = data_list
        self.labels = labels_list

    def __len__(self):
        """
        Returns the length of the dataset
        """
        return len(self.data)

    def __getitem__(self, idx):
        """
        Returns one sample at index idx (as PyTorch tensors)
        """
        # Get the sample and flatten it from (timesteps, sensors) to 1D vector
        sample = self.data[idx]  # Shape: (timesteps, num_sensors)
        sample_flat = sample.flatten()  # Shape: (timesteps * num_sensors,)

        # Convert to PyTorch tensors (float32 features, int64 class index)
        x = torch.tensor(sample_flat, dtype=torch.float32)
        y = torch.tensor(self.labels[idx], dtype=torch.long)

        return x, y

In [None]:
# Wrap each standardized dataset in a custom Dataset object
train_dataset = IMUGestureDataset(train_data_standardized, train_labels)
val_dataset = IMUGestureDataset(val_data_standardized, val_labels)
test_dataset = IMUGestureDataset(test_data_standardized, test_labels)

In [None]:
# Get a single sample from the training dataset
idx = 0
x, y = train_dataset[idx]

# Print some info
print(f"Original shape of data: {train_data[idx].shape}")
print(f"New shape of data: {x.shape}")
print(f"Label: {y} (Class: {class_names[y]})")


## Create DataLoaders

A `DataLoader` takes individual samples from a `Dataset` and groups them into batches (e.g. 32 samples at a time), which allows the model to process multiple samples simultaneously for efficient training. We especially see speedup gains when we move to training on GPUs and accelerators, which are highly optimized for parallel computation.

The `DataLoader` also handles shuffling the training data between epochs (to prevent the model from learning patterns based on sample order) and automatically manages edge cases like partial batches at the end of the dataset.
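
Here is a small, self-contained sketch of the partial-batch behavior using toy tensors. The sizes (70 samples, 6 features) are arbitrary and only chosen so that the last batch comes out smaller than the rest.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset: 70 samples with 6 features each (all zeros)
features = torch.zeros(70, 6)
labels = torch.zeros(70, dtype=torch.long)
toy_dataset = TensorDataset(features, labels)

# With the default drop_last=False, the final partial batch is kept
toy_loader = DataLoader(toy_dataset, batch_size=32, shuffle=True)
batch_sizes = [batch_x.shape[0] for batch_x, _ in toy_loader]
print(batch_sizes)  # [32, 32, 6]
```

This is why the training and validation code later weights each batch's loss by its actual size rather than assuming every batch holds `BATCH_SIZE` samples.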

In [None]:
# Create a DataLoader for each of our splits
train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=BATCH_SIZE, shuffle=False)
test_loader = DataLoader(test_dataset, batch_size=BATCH_SIZE, shuffle=False)

In [None]:
# Demonstrate how to get one batch from the training DataLoader
sample_batch_x, sample_batch_y = next(iter(train_loader))

# Print the shapes
print(f"Shape of batch: {sample_batch_x.shape}")
print(f"Shape of labels: {sample_batch_y.shape}")

## Build Machine Learning Model

We're going to build a simple 2-layer dense neural network (fully connected layers).

Understanding how to build a model architecture takes time and experience. In many ways, it's the "art" side of machine learning. I settled on 2 layers because (after some tinkering) I discovered that 1 layer did not learn the gestures well and 3 layers started overfitting.

You'll see many applications of the "funnel" or "encoder" pattern where deeper layers have fewer nodes than previous layers. Earlier layers derive low-level features from the data (e.g. acceleration spikes, smooth rotations, oscillations) whereas later layers combine these features into high-level concepts (e.g. "flick" or "circle").

The final layer has the same number of nodes as we have classes. The outputs of these nodes are raw prediction values, each a score (known as a "logit") of how strongly the model thinks the input data belongs to a particular class.

We can feed these logits into a softmax function to scale them between 0 and 1 as well as ensure that they sum to 1. This gives us something like a "probability score" for how closely the model believes the input data belongs to a particular class. Note that the `CrossEntropyLoss` function handles the softmax for us, so we don't explicitly add it to the model.
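
The sketch below illustrates both points with made-up logits: softmax produces rows that sum to 1, and `CrossEntropyLoss` gives the same result as manually applying log-softmax followed by the negative log-likelihood loss. The values are arbitrary and only for illustration.

```python
import torch
import torch.nn.functional as F

# Made-up logits for a batch of 2 samples and 5 classes
logits = torch.tensor([[2.0, 0.5, -1.0, 0.0, 1.0],
                       [0.1, 0.2, 3.0, -0.5, 0.0]])
targets = torch.tensor([0, 2])

# Softmax turns each row of logits into values in (0, 1) that sum to 1
probabilities = torch.softmax(logits, dim=1)
print(probabilities.sum(dim=1))

# CrossEntropyLoss applies log-softmax internally, so it takes raw logits
loss_from_logits = torch.nn.CrossEntropyLoss()(logits, targets)
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)
print(loss_from_logits.item(), loss_manual.item())
```

Because the loss function already includes the softmax step, adding a softmax layer to the model as well would apply it twice during training.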

In [None]:
class SimpleDNN(nn.Module):
    """
    Simple 2-layer Deep Neural Network
    """
    def __init__(
        self,
        input_size,
        hidden_1_size,
        hidden_2_size,
        num_classes,
        dropout
    ):
        """
        Constructor that defines the NN layers.
        """
        super().__init__()

        # Hidden layer 1
        self.fc1 = nn.Linear(input_size, hidden_1_size)
        self.relu1 = nn.ReLU()

        # Randomly disable some outputs of fc1 neurons
        # This can help with overfitting
        self.dropout1 = nn.Dropout(dropout)

        # Hidden layer 2
        self.fc2 = nn.Linear(hidden_1_size, hidden_2_size)
        self.relu2 = nn.ReLU()

        # Randomly disable some outputs of fc2 neurons
        self.dropout2 = nn.Dropout(dropout)

        # Output Layer
        self.fc3 = nn.Linear(hidden_2_size, num_classes)

    def forward(self, x):
        """
        Defines how data flows through the model
        """
        x = self.fc1(x)
        x = self.relu1(x)
        x = self.dropout1(x)
        x = self.fc2(x)
        x = self.relu2(x)
        x = self.dropout2(x)
        x = self.fc3(x)

        return x

In [None]:
# Get the input shape of a single sample
x, _ = train_dataset[0]
input_size = x.shape[0]

# Initialize the model
model = SimpleDNN(
    input_size=input_size,
    hidden_1_size=HIDDEN_1_SIZE,
    hidden_2_size=HIDDEN_2_SIZE,
    num_classes=len(class_names),
    dropout=DROPOUT
)

# Print model details
total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters()
                        if p.requires_grad)
print(model)
print(f"Total parameters: {total_params}")
print(f"Trainable parameters: {trainable_params}")

In [None]:
# Ensure model and data are on same device (CPU or GPU)
model = model.to(device)

In [None]:
# Loss function (measure how wrong the model's predictions are)
loss_fn = nn.CrossEntropyLoss()

# Optimizer (how to adjust the model's weights to reduce loss)
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)

## Model Training

Training is split into two sections. In the first part, we run a forward pass using one batch of data, calculate the loss, perform a backward pass to calculate the gradients, and then update the model's weights. We repeat this for all the data in the training dataset (one epoch). In the second part, we use the validation data to see how well the model is performing for that epoch.

We repeat these training and validation processes for as many epochs as we defined. We then plot the training and validation loss over time to see how well the model performed at the task of predicting the correct classes.

In [None]:
def train_one_epoch(
    model,
    dataloader,
    loss_fn,
    optimizer,
    device
):
    """
    Train the model for one epoch
    """
    total_loss = 0.0
    num_correct = 0
    total_samples = 0

    # Enable training-specific behaviors (e.g. dropout)
    model.train()

    # Do one full training cycle on a batch of training data
    for batch_x, batch_y in dataloader:
        # Move data to the same device as the model
        batch_x = batch_x.to(device)
        batch_y = batch_y.to(device)

        # Zero the gradients
        optimizer.zero_grad()

        # Forward pass
        outputs = model(batch_x)

        # Calculate loss
        loss = loss_fn(outputs, batch_y)

        # Backward pass
        loss.backward()

        # Update weights
        optimizer.step()

        # Get total loss for the batch (loss.item() is the average)
        total_loss += loss.item() * batch_x.size(0)
        total_samples += batch_y.size(0)

        # Count how many predictions matched the true labels
        _, predicted = torch.max(outputs, 1)
        num_correct += (predicted == batch_y).sum().item()

    # Calculate averages
    avg_loss = total_loss / total_samples
    accuracy = num_correct / total_samples

    return avg_loss, accuracy

In [None]:
def validate(model, dataloader, loss_fn, device):
    """
    Compute the loss and accuracy on a given dataset
    """
    total_loss = 0.0
    num_correct = 0
    total_samples = 0

    # Disable training-specific behaviors (e.g. dropout)
    model.eval()

    # Do not track gradients during validation
    with torch.no_grad():
        for batch_x, batch_y in dataloader:
            # Move data to the same device as the model
            batch_x = batch_x.to(device)
            batch_y = batch_y.to(device)

            # Forward pass
            outputs = model(batch_x)

            # Calculate loss
            loss = loss_fn(outputs, batch_y)

            # Get total loss for the batch (loss.item() is the average)
            total_loss += loss.item() * batch_x.size(0)
            total_samples += batch_y.size(0)

            # Count how many predictions matched the true labels
            _, predicted = torch.max(outputs, 1)
            num_correct += (predicted == batch_y).sum().item()

    # Calculate averages
    avg_loss = total_loss / total_samples
    accuracy = num_correct / total_samples

    return avg_loss, accuracy

In [None]:
# Lists to store metrics for plotting later
train_losses = []
val_losses = []
train_accuracies = []
val_accuracies = []

# Training loop
for epoch in range(NUM_EPOCHS):

    # Train for one epoch
    train_loss, train_acc = train_one_epoch(
        model,
        train_loader,
        loss_fn,
        optimizer,
        device
    )

    # Validate
    val_loss, val_acc = validate(
        model,
        val_loader,
        loss_fn,
        device
    )

    # Store metrics
    train_losses.append(train_loss)
    val_losses.append(val_loss)
    train_accuracies.append(train_acc)
    val_accuracies.append(val_acc)

    # Print progress (accuracies are fractions, so convert to percentages)
    print(f"Epoch [{epoch+1:3d}/{NUM_EPOCHS}] | "
          f"Train Loss: {train_loss:.4f} | Train Acc: {train_acc * 100:6.2f}% | "
          f"Val Loss: {val_loss:.4f} | Val Acc: {val_acc * 100:6.2f}%")

In [None]:
# Create plots
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Plot losses
ax1.plot(train_losses, label="Training Loss")
ax1.plot(val_losses, label="Validation Loss")
ax1.set_xlabel("Epoch")
ax1.set_ylabel("Loss")
ax1.set_title("Training and Validation Loss")
ax1.legend()
ax1.grid(True, alpha=0.3)

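# Plot accuracies (convert fractions to percentages to match the axis label)
ax2.plot([acc * 100 for acc in train_accuracies], label="Training Accuracy")
ax2.plot([acc * 100 for acc in val_accuracies], label="Validation Accuracy")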
ax2.set_xlabel("Epoch")
ax2.set_ylabel("Accuracy (%)")
ax2.set_title("Training and Validation Accuracy")
ax2.legend()
ax2.grid(True, alpha=0.3)

# Show plots
plt.tight_layout()
plt.show()

## Evaluate the Model

Once we're happy with the performance on the training and validation sets, it's time to test! We'll use our holdout (test) dataset to see how well the model performs. The good news is that we already have a `validate()` function; we just give it our trained model and test set.

In [None]:
# Evaluate the model on our test set
test_loss, test_accuracy = validate(model, test_loader, loss_fn, device)

# Print out the results (accuracy is a fraction, so convert to a percentage)
print(f"Test Loss: {test_loss:.3f}")
print(f"Test Accuracy: {test_accuracy * 100:.2f}%")

In [None]:
# Get all predictions and true labels from test set
all_predictions = []
all_labels = []

# Put model into evaluation mode
model.eval()

# Do not track gradients
with torch.no_grad():

    # Get batches of data from the test dataset loader
    for batch_x, batch_y in test_loader:
        batch_x = batch_x.to(device)
        batch_y = batch_y.to(device)

        # Get predicted classes (max value of outputs)
        outputs = model(batch_x)
        _, predicted = torch.max(outputs, 1)

        # Move predictions and labels to CPU and store them as plain ints
        all_predictions.extend(predicted.cpu().tolist())
        all_labels.extend(batch_y.cpu().tolist())

# Convert predictions and labels to NumPy arrays
all_predictions = np.array(all_predictions)
all_labels = np.array(all_labels)

In [None]:
# Create an empty array for our confusion matrix
num_classes = len(class_names)
conf_matrix = np.zeros((num_classes, num_classes), dtype=int)

# Add the stats to the confusion matrix
for true_label, pred_label in zip(all_labels, all_predictions):
    conf_matrix[true_label, pred_label] += 1

# Print header (a backslash inside f-string braces fails before Python 3.12)
print("Confusion Matrix:")
header_label = "True\\Pred"
print(f"{header_label:<12}", end="")
for class_name in class_names:
    print(f"{class_name:>12}", end="")
print()

# Print matrix with row labels
for i, class_name in enumerate(class_names):
    print(f"{class_name:<12}", end="")
    for j in range(num_classes):
        print(f"{conf_matrix[i, j]:>12}", end="")
    print()

In [None]:
# Calculate per-class accuracy
print("\nPer-Class Performance:")
print(f"  {'Class':<15} {'Correct':<10} {'Total':<10} {'Accuracy':<10}")
print(f"  {'-'*45}")

# Print per-class accuracies
for i, class_name in enumerate(class_names):
    correct = conf_matrix[i, i]  # Diagonal elements
    total = conf_matrix[i, :].sum()  # Sum of row
    accuracy = 100 * correct / total if total > 0 else 0
    print(f"  {class_name:<15} {correct:<10} {total:<10} {accuracy:>6.2f}%")

In [None]:
# Print header
print("Metrics per Class:")
print(f"  {'Class':<15} {'Precision':<12} {'Recall':<12} {'F1-Score':<12}")
print(f"  {'-'*51}")

# Calculate per-class metrics
precisions = []
recalls = []
f1_scores = []
for i, class_name in enumerate(class_names):
    # True Positives
    tp = conf_matrix[i, i]

    # False Positives
    fp = conf_matrix[:, i].sum() - tp

    # False Negatives
    fn = conf_matrix[i, :].sum() - tp

    # Precision: TP / (TP + FP)
    if (tp + fp) > 0:
        precision = tp / (tp + fp)
    else:
        precision = 0

    # Recall: TP / (TP + FN)
    if (tp + fn) > 0:
        recall = tp / (tp + fn)
    else:
        recall = 0

    # F1 score: single metric that combines precision (accuracy of
    # positive predictions) and recall (completeness)
    if (precision + recall) > 0:
        f1 = 2 * (precision * recall) / (precision + recall)
    else:
        f1 = 0

    # Store metrics
    precisions.append(precision)
    recalls.append(recall)
    f1_scores.append(f1)

    # Print class metrics
    print(f"  {class_name:<15} {precision:>10.4f}  {recall:>10.4f}  {f1:>10.4f}")

# Calculate simple macro average across all classes
macro_precision = np.mean(precisions)
macro_recall = np.mean(recalls)
macro_f1 = np.mean(f1_scores)

# Print overall (average) metrics
print(f"  {'-'*51}")
print(f"  {'Macro Avg':<15} {macro_precision:>10.4f}  {macro_recall:>10.4f}  {macro_f1:>10.4f}")

# Calculate overall accuracy
overall_accuracy = np.trace(conf_matrix) / conf_matrix.sum() * 100
print(f"\n  {'Overall Accuracy':<15} {overall_accuracy:>6.2f}%")

## Single Sample Inference

At this point, we have a fully trained and evaluated model. If you were writing a paper on some new model architecture or application, you could probably stop here and write up your findings. But when it comes to deploying our model to an embedded system, we're not done!

The next step is to demonstrate how the model performs inference on a single sample, which mimics how it will operate on an embedded device. During deployment, the model will receive one gesture at a time (not batches), so it's important to understand this single-sample inference process. This is also a good time to manually inspect how well the model performs on individual test samples and see the probability scores it assigns to each class.

Note that we'll demonstrate simple inference without softmax (using raw logits) and with softmax. Softmax is computationally expensive on embedded devices, so you can often leave it out for simple classification tasks. Just perform `argmax(logits)` to get the predicted class! Only add softmax if you need probability-like scores (e.g. selecting a class based on a confidence threshold).
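
The following sketch shows why skipping softmax is safe for picking a class: softmax preserves the ordering of the logits, so `argmax` returns the same index either way. The logits and the confidence threshold are made-up values for illustration.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax for a 1D array of logits."""
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / np.sum(exps)


# Made-up logits for the 5 classes
logits = np.array([1.2, -0.3, 4.1, 0.0, 2.2])
probs = softmax(logits)

# Softmax is monotonic, so the winning class is the same either way
print(np.argmax(logits), np.argmax(probs))

# A confidence threshold only makes sense on the softmax output
CONFIDENCE_THRESHOLD = 0.6  # Assumed value for illustration
is_confident = probs.max() >= CONFIDENCE_THRESHOLD
print(is_confident)
```

On the embedded side, this means a simple loop that finds the largest logit is enough unless you also need to reject low-confidence predictions.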

In [None]:
# Choose an index (in the test set)
idx = 0

# Get a sample
x, y = test_dataset[idx]

# Disable training features
model.eval()

# Add a batch dimension (the model expects it even if batch=1)
x_batch = x.unsqueeze(0)
print(f"Sample shape: {x_batch.shape}")

# Move data to the same device as the model
x_batch = x_batch.to(device)

# Run inference (no gradient tracking)
with torch.no_grad():
    output = model(x_batch)

# Show inference results
print(f"Ground truth label: {y} (Class: {class_names[y]})")
print(f"Raw output (logits): {output[0].cpu().numpy()}")
print(f"Predicted class: {torch.argmax(output[0]).item()}")

In [None]:
# Apply softmax to convert logits to probabilities
probabilities = torch.softmax(output, dim=1)

# Remove batch dimension and convert to NumPy array on CPU
probabilities = probabilities[0].cpu().numpy()

print("Class probabilities:")
for i, class_name in enumerate(class_names):
    prob = probabilities[i] * 100
    print(f"  {class_name:12s}: {prob:5.2f}%")

## Export Model

At this point, we're ready to export our model to deploy to the target device. We'll use the Open Neural Network Exchange (ONNX) format, as it is commonly accepted on many frameworks and platforms.

> Note: If you would like to perform inference using the ONNX model directly on a CPU or GPU-based system (e.g. laptop, server, smartphone), check out [ONNX Runtime](https://onnxruntime.ai/).

PyTorch has a built-in exporter. It requires you to feed some dummy input into the model (this can be random values with the expected input shape), and it will trace the data as it flows through the network in order to construct the ONNX model.

We will also need to provide the ONNX *opset version*. This defines a versioned set of operators that the model can use during a forward pass. The opset you export with must be supported by both the ONNX specification and your target compiler/runtime (e.g., RUHMI/Ethos-U). You can get a list of operators and which versions support them [here](https://github.com/microsoft/onnxruntime/blob/main/docs/OperatorKernels.md).

Note that in most cases, the export process will produce 2 separate files:
* **.onnx** - Model architecture and metadata (with references to external weight data)
* **.onnx.data** - Model weights (external data file)

The two files must be kept together in the same directory whenever you download them, load them in tools like Netron, or deploy them to a runtime environment.
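
Since it is easy to forget the weights file, here is a small sketch that collects the files you need to download together. `find_onnx_files` is a hypothetical helper, and it assumes the external data file follows the `<name>.onnx.data` naming described above.

```python
from pathlib import Path


def find_onnx_files(onnx_path):
    """Return the model file plus its external-data file, if one exists."""
    onnx_path = Path(onnx_path)
    if not onnx_path.is_file():
        raise FileNotFoundError(f"Missing model file: {onnx_path}")

    files = [onnx_path]

    # Assumed naming convention: <name>.onnx.data next to <name>.onnx
    data_path = onnx_path.with_name(onnx_path.name + ".data")
    if data_path.is_file():
        files.append(data_path)

    return files
```

After running the export cell, `find_onnx_files(ONNX_PATH)` would return one or two paths. If only one comes back, the weights were likely stored inside the *.onnx* file itself.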

In [None]:
# Put the model into evaluation mode
model.eval()

# Create a dummy input tensor with the same shape as one sample (batch=1)
dummy_input = torch.randn(1, input_size).to(device)

# Export to ONNX
torch.onnx.export(
    model,                              # Model to export
    dummy_input,                        # Example input (for tracing)
    ONNX_PATH,                          # Output file path
    export_params=True,                 # Export with trained weights
    opset_version=ONNX_OPSET_VERSION,   # Which operations are supported
    do_constant_folding=True,           # Optimize constant operations
    input_names=['input'],              # Name for input layer
    output_names=['output']             # Name for output layer
)
print("Done!")

## Export Calibration Data

RUHMI needs calibration data to determine the optimal scale and zero-point for converting your float32 weights and activations to int8. By observing real data flowing through the network, it learns which value ranges actually matter and allocates the limited 8-bit precision accordingly. This prevents accuracy loss that would occur from blind quantization without understanding your data distribution.

After executing this cell, download the **.npz** file.

In [None]:
# Don't exceed the total number of available samples
num_samples = min(NUM_CALIB_SAMPLES, len(val_dataset))

# Randomly choose from validation set, even if it's already randomized
indices = random.sample(range(len(val_dataset)), num_samples)

# Get the randomly chosen samples (ignore the labels) as NumPy arrays
calib_samples = []
for i in indices:
    x, _ = val_dataset[i]
    calib_samples.append(x.numpy())

# Stack into a single array: shape (num_samples, input_size)
calib_data = np.stack(calib_samples, axis=0)

# Save samples as NPZ
np.savez(CALIB_NPZ_PATH, input=calib_data)
print(f"Calibration data shape: {calib_data.shape}")
print(f"Saved calibration data to: {CALIB_NPZ_PATH}")

## Generate Helper Code

We're going to generate some C code that we can copy directly into our embedded program to help with the data preprocessing step. Remember how we standardized our training data? Well, we need to do the exact same thing to any new data we capture before feeding it to the model. That includes using the same per-channel values for mean and standard deviation that we calculated from the training set!

Note that this is a simple example, as we'll write our preprocessing functions manually in C. You can get very creative here if you'd like: for example, having Python generate all the required preprocessing functions for you.
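
As one example of taking this further, the sketch below has Python emit a C function that standardizes a flattened sample in place. The function name `standardize_sample` is hypothetical, and the code assumes the sample is laid out as `[timestep][channel]` (the same order produced by `flatten()` in our `Dataset`) and that the `STANDARDIZATION_MEANS` and `STANDARDIZATION_STD_DEVS` arrays generated in this section are available in the same C file.

```python
def generate_standardize_function(num_channels):
    """Return C source for a function that standardizes one sample in place."""
    return f"""\
// Standardize a flattened sample laid out as [timestep][channel]
void standardize_sample(float *sample, int num_timesteps) {{
    for (int t = 0; t < num_timesteps; t++) {{
        for (int c = 0; c < {num_channels}; c++) {{
            float std_dev = STANDARDIZATION_STD_DEVS[c];
            if (std_dev == 0.0f) {{
                std_dev = 1.0f;  // Match the Python divide-by-zero guard
            }}
            int i = t * {num_channels} + c;
            sample[i] = (sample[i] - STANDARDIZATION_MEANS[c]) / std_dev;
        }}
    }}
}}
"""


print(generate_standardize_function(num_channels=6))
```

Treat the generated C as a starting point: it has not been compiled or tested on a target here, so review it against your own buffer layout before using it.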

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
c_code = f"""\
// Sensor channels: {', '.join(channel_names)}
#define NUM_CHANNELS {len(channel_names)}

// Mean for each sensor channel
const float STANDARDIZATION_MEANS[NUM_CHANNELS] = {{
    {', '.join([f'{m:.6f}f' for m in means])}
}};

// Standard deviation for each sensor channel
const float STANDARDIZATION_STD_DEVS[NUM_CHANNELS] = {{
    {', '.join([f'{s:.6f}f' for s in stds])}
}};
"""
print(c_code)


## Deploy!

At this point, you have everything you need to deploy your model to your embedded device. Download the *model.onnx* and *model.onnx.data* files. Use your vendor's toolset (e.g. Renesas RUHMI, LiteRT) to quantize, compress, and compile the model for your target device.

If you'd like to visualize your model, I recommend opening the *model.onnx* file in the [netron.app](https://netron.app/) viewer. This is a great way to see how the layers are connected and how data flows through the model.

You will also want to copy the generated C code (standardization values) to your program so that you can preprocess your data. Remember: anything you did to transform or modify your training data prior to feeding it to your model will have to be done prior to inference! This includes standardizing each sensor channel using the exact mean and standard deviation values from the training set.