# Curve Similarity Measures

## Introduction
Suppose we have a curve representation method, for example a parameterization scheme such as a
spline. We also have a particular target shape (say the Stanford Bunny) and we want our
parameterization to represent that shape as closely as possible. One way to do this is to tune the
parameters of the representation scheme iteratively using gradient descent. This requires an
objective function to optimize. An appropriate objective for this task is some measure of
similarity (or dissimilarity) between the target curve and the one traced out by our
parameterization, which can then be maximized (or minimized) to fit the parameters.

The goal of this tutorial is to study **curve similarity measures**. We discuss different kinds of
measures and look at their implementations.

<p style="text-align:center;"><img src="a_basic_spline.svg" alt="a basic spline" width="25%"></p>
<p style="text-align:center;"><img src="stanford_bunny.svg" alt="stanford bunny" width="15%"></p>

## Concrete Problem Setup
First we define the different objects that we deal with:
- **_Shape parameterization_**: This is the parameterization scheme that we use to represent our
shapes. We have a set of parameters $\phi$ that represent our shape. By changing $\phi$ we trace out
different curves in the plane. We will think of $\phi$ as a column vector
$[\phi_1, \phi_2, \ldots, \phi_n]^{T}$.

- **_Parameterized Curve_**: This is the curve that is traced out by the parameterization scheme. We
denote it by $C_p$. It is obtained by sampling the scheme at different points along the actual
curve and is specified in the form of an $N_p$-length sequence of $(x, y)$ points. These points are
ordered along the curve. We will specify the points in a matrix in $\mathbb{R}^{N_p \times 2}$ where
each row corresponds to a point $(x, y)$. We denote the matrix as $X_p$.

- **_Target Curve_**: This is the curve that we want our parameterization scheme to represent. We
denote it by $C_t$ and it is specified in the form of an $N_t$-length sequence of $(x, y)$ points.
These points are ordered along the curve. We will specify the points in a matrix in
$\mathbb{R}^{N_t \times 2}$ as with the parameterized curve. We denote the matrix as $X_t$.

- **_Loss function_**: A function denoted as $\mathcal{L}(X_t, X_p)$ that measures the degree of
dissimilarity between the target curve and the parameterized curve. It should be differentiable to
allow us to find gradients $\frac{d\mathcal{L}}{d\phi}$ that can then be used to run gradient
descent.

**_Goal_**: To tune $\phi$ such that our representation scheme traces out the target curve.
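
The setup above can be sketched end to end as follows. This is only an illustration under stated assumptions: the ellipse parameterization, the helper name `trace_curve`, the learning rate, and the number of steps are all hypothetical choices made for the example, and the squared-error loss stands in for any differentiable $\mathcal{L}$.

```python
import math
import torch

def trace_curve(phi, num_points):
    # Hypothetical parameterization: phi = [a, b] are the semi-axes of an
    # axis-aligned ellipse centred at the origin.
    t = torch.linspace(0.0, 2.0 * math.pi, num_points + 1)[:-1]
    x = phi[0] * torch.cos(t)
    y = phi[1] * torch.sin(t)
    return torch.stack([x, y], dim=1)  # X_p, shape (num_points, 2)

N = 64
X_t = trace_curve(torch.tensor([2.0, 0.5]), N)       # target curve C_t
phi = torch.tensor([1.0, 1.0], requires_grad=True)   # initial parameters
optimizer = torch.optim.SGD([phi], lr=0.1)

for step in range(200):
    optimizer.zero_grad()
    X_p = trace_curve(phi, N)                          # parameterized curve C_p
    loss = torch.mean(torch.sum((X_p - X_t) ** 2, dim=1))
    loss.backward()                                    # fills phi.grad
    optimizer.step()
```

After the loop, `phi` should be close to the target semi-axes `[2.0, 0.5]`. Any of the losses discussed below could replace the squared-error line, provided it is differentiable with respect to `X_p`.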

## Similarity Measures
We now discuss the different curve similarity measures. For each measure we describe the exact
mathematical definition, practical considerations, modifications to make it differentiable, and
an implementation in PyTorch.

### Mean Squared Error (MSE)

#### Description
**_Assumption_**: $N_p = N_t = N$. That is, we sample the parameterized curve at exactly $N_t$
points.

The mean squared error loss function computes the average of the squared distance between the
corresponding points on the two curves. Mathematically,
$$
\mathcal{L} = \frac{1}{N} \sum_{i=1}^{N} \left( d(X_{p}^{i}, X_{t}^{i}) \right)^2
$$
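where $d$ is a distance function (the Euclidean distance in the implementation below) and $X^{i}$
denotes the $i$-th row (point) of $X$.

Though the measure is naive and not very robust (it assumes that the $i$-th points of the two curves
correspond, so it is sensitive to how each curve is sampled), it is simple and quick to implement
and is differentiable without any modifications.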

<p style="text-align:center;"><img src="mse_visualization.svg" alt="mse visualization" width="35%"></p>

#### Implementation

```python
import torch

def mse_loss(X_p, X_t):
    # Calculate the squared Euclidean distances between corresponding rows
    squared_distances = torch.sum((X_p - X_t) ** 2, dim = 1)

    # Calculate mean of the squared distances to get the loss
    loss = torch.mean(squared_distances)

    return loss
```
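
One concrete way in which this measure is fragile is its dependence on point ordering. The sketch below compares a unit circle with a copy of itself whose sample sequence starts a quarter turn later. The sample count and the shift are arbitrary choices for illustration, and `mse_loss` is restated so the block is self-contained.

```python
import math
import torch

def mse_loss(X_p, X_t):
    # Mean of squared Euclidean distances between corresponding rows
    return torch.mean(torch.sum((X_p - X_t) ** 2, dim=1))

N = 100
t = torch.linspace(0.0, 2.0 * math.pi, N + 1)[:-1]
circle = torch.stack([torch.cos(t), torch.sin(t)], dim=1)

# Same geometric curve, but the sequence of samples starts a quarter turn later
shifted = torch.roll(circle, shifts=N // 4, dims=0)

same_order = mse_loss(circle, circle)        # identical indexing
different_start = mse_loss(circle, shifted)  # same shape, different indexing
```

Here `same_order` is zero, while `different_start` is about 2 because every point is paired with a point a quarter turn away, even though both matrices describe the same circle.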

### Fourier Descriptor Matching

#### Description
The idea behind Fourier descriptor matching is to compute the Fourier coefficients of both the
target and the parameterized curve and then use the difference between them as the loss function.

Concretely, given a curve we can approximate it using a complex Fourier series as follows:
$$
X(t) = \sum_{n = -\infty}^{\infty} c_n e^{n 2 \pi i t} \quad t \in [0, 1)
$$

In Fourier descriptor matching we use the FFT algorithm to compute a finite number of coefficients
$c_n$ for each of the curves, which have themselves been sampled (from $X(t)$) at a finite number of
points given by $X_p$ and $X_t$. Each point $(x, y)$ is treated as the complex number $x + iy$. Let
the coefficients be collected in vectors $F_p$ and $F_t$. The loss is then computed as the mean
squared error of the two coefficient vectors. If $k$ is the number of Fourier coefficients kept from
each FFT, then the loss is given by:
$$
\mathcal{L} = \frac{1}{k} \sum_{i=1}^{k} \left( d(F_{p}^{i}, F_{t}^{i}) \right)^2
$$

**_Note 1_**: Fourier Descriptor Matching works only for **closed curves**.

**_Note 2_**: For the loss to be differentiable we require that the FFT be computed in a way that
allows automatic differentiation to work.

#### Implementation

```python
import torch
import torch.fft

def fourier_descriptor_matching_loss(X_p, X_t, num_descriptors):
    # Compute Fourier transforms (using FFT)
    fft1 = torch.fft.fft(torch.complex(X_p[..., 0], X_p[..., 1]), dim=0)
    fft2 = torch.fft.fft(torch.complex(X_t[..., 0], X_t[..., 1]), dim=0)

    # Keep the first num_descriptors FFT bins (frequencies 0 .. num_descriptors - 1)
    descriptors1 = fft1[:num_descriptors]
    descriptors2 = fft2[:num_descriptors]

    # MSE on the complex coefficients: mean squared magnitude of their difference
    loss = torch.mean(torch.abs(descriptors1 - descriptors2)**2)
    return loss
```

The loss is differentiable because `torch.fft.fft` supports automatic differentiation in PyTorch.
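
Slicing the first `num_descriptors` FFT bins keeps only the frequencies $0$ through `num_descriptors - 1`. The negative frequencies sit at the end of the FFT output and are dropped. The sketch below is one possible variant, not the method used above: it keeps the low frequencies on both sides and divides by the number of samples so that curves sampled at different resolutions remain comparable. The function names and the `num_harmonics` parameter are assumptions introduced for this example.

```python
import math
import torch

def low_frequency_descriptors(X, num_harmonics):
    # Treat each (x, y) point as the complex number x + iy
    z = torch.complex(X[:, 0], X[:, 1])
    n = z.shape[0]
    # Divide by n so the coefficients do not grow with the number of samples
    coeffs = torch.fft.fft(z) / n
    # Integer frequency of each FFT bin: 0, 1, 2, ..., -2, -1
    freqs = torch.round(torch.fft.fftfreq(n) * n)
    keep = freqs.abs() <= num_harmonics
    return coeffs[keep]  # 2 * num_harmonics + 1 coefficients

def two_sided_fourier_loss(X_p, X_t, num_harmonics):
    F_p = low_frequency_descriptors(X_p, num_harmonics)
    F_t = low_frequency_descriptors(X_t, num_harmonics)
    return torch.mean(torch.abs(F_p - F_t) ** 2)

def circle(radius, n):
    t = torch.linspace(0.0, 2.0 * math.pi, n + 1)[:-1]
    return torch.stack([radius * torch.cos(t), radius * torch.sin(t)], dim=1)

# Two circles sampled at different resolutions
X_t = circle(1.0, 100)
X_p = circle(1.5, 64).requires_grad_(True)
loss = two_sided_fourier_loss(X_p, X_t, num_harmonics=3)
loss.backward()  # gradients flow back through the FFT to X_p
```

For these two circles only the frequency-1 coefficient differs (by 0.5), so the loss is $0.25 / 7$. This variant assumes each curve has more than `2 * num_harmonics` samples.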

### Hausdorff Distance

#### Description
**_Note_**: Most of the information is taken from the Wikipedia page
[Hausdorff Distance](https://en.wikipedia.org/wiki/Hausdorff_distance).

The Hausdorff distance measures how far two subsets of a metric space are from each other.
Informally, two sets are close in the Hausdorff distance if every point of either set is close to
some point of the other set. The Hausdorff distance is the longest distance someone can be forced to
travel by an adversary who chooses a point in one of the two sets, from where they then must travel
to the other set. In other words, it is the greatest of all the distances from a point in one set to
the closest point in the other set.

Let $(M, d)$ be a metric space. For each pair of non-empty subsets $X \subset M$ and $Y \subset M$,
the Hausdorff distance between $X$ and $Y$ is defined as
$$
d_{\mathrm H}(X,Y) := \max\left\{\,\sup_{x \in X} d(x,Y),\ \sup_{y \in Y} d(y,X) \,\right\}
$$

where $\sup$ represents the supremum operator, $\inf$ the infimum operator, and where
$d(a, B) := \inf_{b \in B} d(a,b)$ quantifies the distance from a point $a \in M$ to the subset
$B \subseteq M$.

<p style="text-align:center;"><img src="Hausdorff_distance.svg" alt="hausdorff distance" width="25%">
</p>
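
For finite point sets the supremum and infimum reduce to a maximum and a minimum, so the definition can be evaluated directly. The following is a minimal sketch, assuming the Euclidean distance. The function name `hausdorff_distance` is introduced here for illustration.

```python
import torch

def hausdorff_distance(P1, P2):
    # Pairwise Euclidean distances, shape (N1, N2)
    dist_matrix = torch.cdist(P1, P2, p=2)
    # d(x, P2) for every x in P1, and d(y, P1) for every y in P2
    d_p1_to_p2 = dist_matrix.min(dim=1).values
    d_p2_to_p1 = dist_matrix.min(dim=0).values
    # Largest nearest-neighbour distance over both directions
    return torch.max(d_p1_to_p2.max(), d_p2_to_p1.max())

P1 = torch.tensor([[0.0, 0.0], [1.0, 0.0], [0.5, 0.5]])
P2 = torch.tensor([[0.0, 1.0], [1.0, 1.0]])
distance = hausdorff_distance(P1, P2)
```

In this example the points $(0, 0)$ and $(1, 0)$ of `P1` are each at distance 1 from their nearest point in `P2`, and no point of `P2` is farther than that from `P1`, so the result is 1.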

#### Differentiability
The Hausdorff distance is not differentiable everywhere on its own, since computing it involves
minimum and maximum operations. Therefore we work with smooth approximations to it, such as:
- **_Soft Hausdorff_**: Compute the approximate Hausdorff distance by smoothing the minimum
operation to ensure differentiability. In this approach we use a smooth minimum function instead of
the minimum function directly.
  1. Let $P_1$ and $P_2$ be the sets of points on the two curves.
  2. Use a soft-minimum function to approximate the minimum distance between points, such as:
$$
d_{\text{soft}}(p, P_2) = -\log \left( \sum_{q \in P_2} \exp \left( -\frac{\|p - q\|_2^2}{\tau} \right) \right)
$$
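where $\tau$, the temperature parameter, controls the sharpness of the approximation. Because the
expression uses squared distances and is not rescaled by $\tau$, it is
$\tau \, d_{\text{soft}}(p, P_2)$ that approaches the true minimum squared distance
$\min_{q \in P_2} \|p - q\|_2^2$ as $\tau$ approaches 0.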
  3. Compute the smoothed Hausdorff distance, which here averages the soft-minimum distances in each
     direction instead of taking their maximum:
$$
\text{Hausdorff}_{\text{soft}}(P_1, P_2) = \frac{1}{|P_1|} \sum_{p \in P_1} d_{\text{soft}}(p, P_2) + \frac{1}{|P_2|} \sum_{q \in P_2} d_{\text{soft}}(q, P_1)
$$
- **_Relaxed Hausdorff_**: Another approach is to consider the average distance to the $k$ nearest
neighbors instead of just the single nearest neighbor. This provides some smoothing.

The LogSumExp (LSE) function is a smooth maximum – a smooth approximation to the maximum function.
It is defined as the logarithm of the sum of the exponentials of the arguments:
$$
\mathrm{LSE}(x_1, \ldots, x_n) = \log\left( \exp(x_1) + \cdots + \exp(x_n) \right)
$$

Writing $\mathbf{x} = (x_1, \ldots, x_n)$ the partial derivatives are:
$$
\frac{\partial}{\partial x_i}{\mathrm{LSE}(\mathbf{x})} = 
\frac{\exp x_i}{\sum_j \exp {x_j}}
$$
which means the gradient of LogSumExp is the softmax function.

Also, note that $\min \left( x, y \right) = -\max \left(-x, -y \right)$. Applying this to the
LogSumExp gives a smooth minimum, $-\mathrm{LSE}(-x_1, \ldots, -x_n)$, which is the form used in
$d_{\text{soft}}$ above.
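
The relationship between LogSumExp and the minimum can be checked numerically. The sketch below uses a temperature-scaled smooth minimum, $-\tau \, \mathrm{LSE}(-\mathbf{x} / \tau)$. The outer factor $\tau$ and the function name `smooth_min` are assumptions of this example. The factor is what makes the value converge to the true minimum as $\tau$ shrinks.

```python
import torch

def smooth_min(x, tau):
    # min(x) = -max(-x), with max replaced by a temperature-scaled LogSumExp
    return -tau * torch.logsumexp(-x / tau, dim=-1)

x = torch.tensor([3.0, 1.0, 2.0], requires_grad=True)

loose = smooth_min(x, tau=1.0)    # noticeably below the true minimum
tight = smooth_min(x, tau=0.01)   # very close to the true minimum, 1.0

# The gradient is softmax(-x / tau): nearly one-hot at the smallest entry
tight.backward()
weights = x.grad
```

The smooth minimum never exceeds the true minimum, and with a small $\tau$ almost all of the gradient flows to the smallest entry.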

#### Implementation


```python
import torch

def smoothed_hausdorff_distance(P1, P2, sigma=1.0):
    """
    Compute the smoothed Hausdorff distance between two sets of points P1 and P2.

    Args:
        P1 (torch.Tensor): A tensor of shape (N1, D), where N1 is the number of points in P1.
        P2 (torch.Tensor): A tensor of shape (N2, D), where N2 is the number of points in P2.
        sigma (float): Controls the sharpness of the soft-minimum operation
            (the temperature tau in the formula above).

    Returns:
        torch.Tensor: A scalar tensor holding the smoothed Hausdorff distance.
    """
    # Compute pairwise squared distances
    dist_matrix = torch.cdist(P1, P2, p=2) ** 2  # Shape: (N1, N2)

    # Compute soft-minimum distances
    d_soft_p1_to_p2 = -torch.logsumexp(-dist_matrix / sigma, dim=1)  # Shape: (N1,)
    d_soft_p2_to_p1 = -torch.logsumexp(-dist_matrix.t() / sigma, dim=1)  # Shape: (N2,)

    # Average soft-min distances
    hausdorff_p1_to_p2 = d_soft_p1_to_p2.mean()
    hausdorff_p2_to_p1 = d_soft_p2_to_p1.mean()

    # Combine the two directions
    hausdorff_soft = hausdorff_p1_to_p2 + hausdorff_p2_to_p1

    # Return the tensor (not .item()) so that gradients can flow through the loss
    return hausdorff_soft

# Example usage
P1 = torch.tensor([[0.0, 0.0], [1.0, 0.0], [0.5, 0.5]])
P2 = torch.tensor([[0.0, 1.0], [1.0, 1.0]])
sigma = 0.1

distance = smoothed_hausdorff_distance(P1, P2, sigma)
print(f"Smoothed Hausdorff Distance: {distance.item()}")
```

```
Smoothed Hausdorff Distance: 13.095537185668945
```


```python
import torch

def relaxed_hausdorff(set1, set2, k=3):
    """Computes a relaxed Hausdorff distance."""
    distances = torch.cdist(set1, set2) # pairwise distances
    min_distances, _ = torch.topk(distances, k, dim=1, largest=False)
    relaxed_dist1 = torch.mean(min_distances)

    distances = torch.cdist(set2, set1)
    min_distances, _ = torch.topk(distances, k, dim=1, largest=False)
    relaxed_dist2 = torch.mean(min_distances)
    return torch.max(relaxed_dist1, relaxed_dist2)

# Example usage:
set1 = torch.randn(100, 2) # 100 points in 2D
set2 = torch.randn(150, 2)
loss = relaxed_hausdorff(set1, set2)
```
