# Learning a quadratic pseudo-metric from distance measurements

Recall that a pseudo-metric is a generalization of a metric in which the distance between two distinct points can be zero.
We are given a set of $N$ pairs of points in $\mathbf{R}^n$, $x_1, \ldots, x_N$, and $y_1, \ldots, y_N$, together with a set of distances $d_1, \ldots, d_N > 0$.
  The goal is to find (or estimate or learn) a quadratic pseudo-metric $d$
  $$d(x,y) =  \left( (x-y)^T P(x-y) \right)^{1/2},$$
  $P\in \mathbf{S}^n_{+}$, which approximates the given distances, i.e., $d(x_i, y_i) \approx d_i$. (The pseudo-metric $d$ is a metric only when $P \succ 0$; when $P\succeq 0$ is singular, it is a pseudo-metric.)
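
To make the definition concrete, here is a minimal NumPy sketch of the quadratic pseudo-metric. The helper name `quad_pseudo_metric` and the example matrices are illustrative assumptions, not part of the exercise. It shows that a singular $P$ can assign zero distance to two distinct points.

```python
import numpy as np


def quad_pseudo_metric(x, y, P):
    """Return sqrt((x - y)^T P (x - y)) for a symmetric PSD matrix P."""
    z = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    # Clip tiny negative values that floating-point round-off can produce.
    return float(np.sqrt(max(z @ P @ z, 0.0)))


# Illustrative singular PSD matrix: it ignores the second coordinate.
P_singular = np.array([[1.0, 0.0],
                       [0.0, 0.0]])

x = np.array([0.0, 0.0])
y = np.array([0.0, 3.0])   # distinct from x, differs only in coordinate 2
w = np.array([4.0, 3.0])

dist_xy = quad_pseudo_metric(x, y, P_singular)           # zero, although x != y
dist_xw = quad_pseudo_metric(x, w, P_singular)           # only coordinate 1 counts
dist_xy_identity = quad_pseudo_metric(x, y, np.eye(2))   # ordinary Euclidean distance
```

With $P = I$ the function reduces to the Euclidean distance, which is a metric because $I \succ 0$.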
  
  To do this, we will choose $P\in \mathbf{S}^n_+$ that minimizes the mean squared error objective
  
  $$f(P)=\frac{1}{N}\sum_{i=1}^N (d_i - d(x_i,y_i))^2.$$
  
  ### Theoretical part.
  1. Show that the objective function $f$ is convex. (Hint: expand the square and see what happens.)
  2. Show that the convex program $\text{minimize }f(P)$, $P\succeq 0$ can be expressed as an equivalent conic program with a linear objective and a number of conic constraints using the cones $R^n_+$ (nonnegative orthant), $Q^n$ (second order cone), $Q_r^n$ (rotated second order cone), and $S^n_+$ (positive semidefinite cone).
  
  ### Programming Part
  1. Solve the program $\text{minimize }f(P)$, $P\succeq 0$, preferably using a modelling package like ``cvxpy``. Note that "under the hood" your modelling package translates the program to the conic form from point 2 of the theoretical part.
  2. Use the obtained $P$ to measure the mean square error for the test data ``X_test``, ``Y_test``, ``d_test``.
  
---- 
*This exercise originates from the "Additional Exercises" collection for the Convex Optimization textbook by S. Boyd and L. Vandenberghe. Used with permission.*

## Solving the Quadratic Pseudo-Metric Problem: Two Approaches

### Problem Statement

Given:
- \( N \) pairs of points \( (x_i, y_i) \in \mathbb{R}^n \)
- Distances \( d_i > 0 \) for \( i = 1, \ldots, N \)

The goal is to find a positive semidefinite matrix \( P \in \mathbf{S}^n_+ \) that minimizes the sum of the squared errors (the mean squared error scaled by the constant \( N \), which does not change the minimizer):
$$ \text{minimize} \quad \sum_{i=1}^N \left( d_i - \sqrt{(x_i - y_i)^T P (x_i - y_i)} \right)^2 $$
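
For reference, the convexity of this objective can be checked term by term, following the hint to expand the square. Writing \( z_i = x_i - y_i \) (a shorthand used only for this derivation):

$$
\left( d_i - \sqrt{z_i^T P z_i} \right)^2 = d_i^2 \;-\; 2 d_i \sqrt{z_i^T P z_i} \;+\; z_i^T P z_i
$$

The first term is constant and the last term is linear in \( P \). The map \( P \mapsto \sqrt{z_i^T P z_i} \) is the concave, nondecreasing square root composed with a linear function that is nonnegative on \( \mathbf{S}^n_+ \), so it is concave there. Multiplying it by \( -2 d_i < 0 \) gives a convex term, and a sum of convex functions is convex.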

### Reformulated Problem

#### Introducing Slack Variables

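Define \( z_i = x_i - y_i \) and let \( t_i \) stand for the predicted distance:
$$ t_i = \sqrt{z_i^T P z_i} $$

Expanding the square gives \( (d_i - t_i)^2 = d_i^2 - 2 d_i t_i + z_i^T P z_i \), so introduce a second slack variable \( u_i \) for the \( i \)-th squared error:
$$ u_i = d_i^2 - 2 d_i t_i + z_i^T P z_i $$

Both equalities can be relaxed to the convex inequalities \( t_i^2 \leq z_i^T P z_i \) and \( u_i \geq d_i^2 - 2 d_i t_i + z_i^T P z_i \) without changing the optimal value. Minimizing \( u_i \) makes the second inequality tight, and because \( d_i > 0 \) the term \( -2 d_i t_i \) pushes \( t_i \) up to \( \sqrt{z_i^T P z_i} \).

The objective function becomes:
$$ \text{minimize} \quad 1^T \mathbf{u} $$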

### Approach 1: Using a Rotated Second-Order Cone (RSOC) Constraint

#### Objective
$$ \text{minimize} \quad 1^T \mathbf{u} $$

#### Constraints

Throughout, \( z_i = x_i - y_i \), and \( z_i^T P z_i \) is a linear function of \( P \).

1. **Rotated Second-Order Cone (RSOC) Constraint for \( t_i \)**:
   $$ t_i^2 \leq z_i^T P z_i, \quad \forall i $$
   With the convention \( Q_r^3 = \{ (a, b, c) : 2ab \geq c^2,\ a \geq 0,\ b \geq 0 \} \), this can be written as:
   $$
   \left( z_i^T P z_i,\ \tfrac{1}{2},\ t_i \right) \in Q_r^3
   $$

2. **Linear (Nonnegative Orthant) Constraint for \( u_i \)**:
   To express \( d_i^2 - 2 d_i t_i + z_i^T P z_i \leq u_i \), which coincides with \( (d_i - t_i)^2 \leq u_i \) when constraint 1 is tight:
   $$
   u_i - d_i^2 + 2 d_i t_i - z_i^T P z_i \geq 0, \quad \forall i
   $$

3. **Positive Semidefinite Cone (PSD) Constraint**:
   $$ P \succeq 0 $$

4. **Non-Negativity Constraint** (implied by constraints 1 and 2 at the optimum, so harmless to include):
   $$ u_i \geq 0, \quad t_i \geq 0, \quad \forall i $$

#### Final Formulation for Approach 1

The final conic program is:

$$
\text{minimize} \quad 1^T \mathbf{u}
$$

subject to:

$$
\left( z_i^T P z_i,\ \tfrac{1}{2},\ t_i \right) \in Q_r^3, \quad \forall i \quad \text{(RSOC)}
$$

$$
u_i - d_i^2 + 2 d_i t_i - z_i^T P z_i \geq 0, \quad \forall i \quad \text{(Linear)}
$$

$$
P \succeq 0 \quad \text{(PSD)}
$$

$$
u_i \geq 0, \quad t_i \geq 0, \quad \forall i \quad \text{(Non-Negativity)}
$$

### Approach 2: Using a Standard Second-Order Cone (SOC) Constraint

This variant solves the same program but represents the constraint \( t_i^2 \leq z_i^T P z_i \), with \( z_i = x_i - y_i \), in the standard second-order cone. It may be convenient when a modelling tool exposes only the standard cone.

#### Objective
$$ \text{minimize} \quad 1^T \mathbf{u} $$

#### Constraints

1. **Second-Order Cone (SOC) Constraint for \( t_i \)**:
   $$ t_i^2 \leq z_i^T P z_i, \quad \forall i $$
   This can be written as:
   $$
   \left\| \begin{pmatrix} 2 t_i \\ z_i^T P z_i - 1 \end{pmatrix} \right\|_2 \leq z_i^T P z_i + 1
   $$
   Squaring both sides and cancelling common terms leaves \( 4 t_i^2 \leq 4 z_i^T P z_i \).

2. **Linear (Nonnegative Orthant) Constraint for \( u_i \)**:
   To express \( d_i^2 - 2 d_i t_i + z_i^T P z_i \leq u_i \), which coincides with \( (d_i - t_i)^2 \leq u_i \) when constraint 1 is tight:
   $$
   u_i - d_i^2 + 2 d_i t_i - z_i^T P z_i \geq 0, \quad \forall i
   $$

3. **Positive Semidefinite Cone (PSD) Constraint**:
   $$ P \succeq 0 $$

4. **Non-Negativity Constraint** (implied by constraints 1 and 2 at the optimum, so harmless to include):
   $$ u_i \geq 0, \quad t_i \geq 0, \quad \forall i $$

#### Final Formulation for Approach 2

The final conic program is:

$$
\text{minimize} \quad 1^T \mathbf{u}
$$

subject to:

$$
\left\| \begin{pmatrix} 2 t_i \\ z_i^T P z_i - 1 \end{pmatrix} \right\|_2 \leq z_i^T P z_i + 1, \quad \forall i \quad \text{(SOC)}
$$

$$
u_i - d_i^2 + 2 d_i t_i - z_i^T P z_i \geq 0, \quad \forall i \quad \text{(Linear)}
$$

$$
P \succeq 0 \quad \text{(PSD)}
$$

$$
u_i \geq 0, \quad t_i \geq 0, \quad \forall i \quad \text{(Non-Negativity)}
$$

### Key Annotations for Constraints:
- **RSOC or SOC (constraint \( t_i^2 \leq z_i^T P z_i \), with \( z_i = x_i - y_i \))**: $$\left( z_i^T P z_i,\ \tfrac{1}{2},\ t_i \right) \in Q_r^3 \quad \Longleftrightarrow \quad \left\| \begin{pmatrix} 2 t_i \\ z_i^T P z_i - 1 \end{pmatrix} \right\|_2 \leq z_i^T P z_i + 1$$
- **Linear (Nonnegative Orthant Constraint)**: $$u_i - d_i^2 + 2 d_i t_i - z_i^T P z_i \geq 0$$
- **PSD (Positive Semidefinite Constraint)**: $$P \succeq 0$$
- **Non-Negativity Constraint**: $$u_i \geq 0, \quad t_i \geq 0, \quad \forall i$$

With each constraint assigned to its cone, both approaches transform the original optimization problem into a conic optimization problem with a linear objective.


In [1]:
import cvxpy as cp
import numpy as np
from scipy import linalg as la

In [2]:
# In this box we generate the input data

np.random.seed(5680)

n = 5 # Dimension
N = 100 # Number of samples

P = np.random.randn(n,n)
P = P.dot(P.T) + np.identity(n)
sqrtP = la.sqrtm(P)

x = np.random.randn(N,n)
y = np.random.randn(N,n)

d = np.linalg.norm(sqrtP.dot((x-y).T),axis=0)    # distances according to metric P
d = np.maximum(d+np.random.randn(N),0)           # add random noise

N_test = 10 # Samples for test set
X_test = np.random.randn(N_test,n)
Y_test = np.random.randn(N_test,n)
d_test = np.linalg.norm(sqrtP.dot((X_test-Y_test).T),axis=0)  # distances according to metric P
d_test = np.maximum(d_test+np.random.randn(N_test),0)         # add random noise
