# Formalizing the Distance Matrix Between Two Matrices

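Let's formalize the concept of the **distance matrix** between two matrices \( A \) and \( B \).
Structurally it resembles an outer product: every row of \( A \) is paired with every row of \( B \), but each pair is combined with a distance rather than a multiplication.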

### Given:

$$
\begin{aligned}
\text{1.} \quad & \textbf{Matrix } A: \text{ An } m \times d \text{ matrix} \\
& A = \begin{bmatrix}
\mathbf{a}_1 \\
\mathbf{a}_2 \\
\vdots \\
\mathbf{a}_m
\end{bmatrix} \quad \text{where each } \mathbf{a}_i \text{ is a row vector in } \mathbb{R}^d \\
\text{2.} \quad & \textbf{Matrix } B: \text{ An } n \times d \text{ matrix} \\
& B = \begin{bmatrix}
\mathbf{b}_1 \\
\mathbf{b}_2 \\
\vdots \\
\mathbf{b}_n
\end{bmatrix} \quad \text{where each row vector } \mathbf{b}_j \text{ is in } \mathbb{R}^d
\end{aligned}
$$

### Distance Matrix \( D \):

The **distance matrix** $D$ is an $m \times n$ matrix where each element $D_{i,j}$ represents the distance between the $i$-th row vector of \( A \) and the \( j \)-th row vector of \( B \).

$$
D = \begin{bmatrix}
D_{1,1} & D_{1,2} & \cdots & D_{1,n} \\
D_{2,1} & D_{2,2} & \cdots & D_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
D_{m,1} & D_{m,2} & \cdots & D_{m,n}
\end{bmatrix}
$$

### Mathematical Definition of Each Element $D_{i,j}$:

Each element $D_{i,j}$ is defined as the **Euclidean distance** between \( \mathbf{a}_i \) and \( \mathbf{b}_j \):

$$
\begin{aligned}
\text{1.} \quad & D_{i,j} = \| \mathbf{a}_i - \mathbf{b}_j \|_2 \\
& = \sqrt{(a_{i,1} - b_{j,1})^2 + (a_{i,2} - b_{j,2})^2 + \cdots + (a_{i,d} - b_{j,d})^2}
\end{aligned}
$$

Where:
$$
\begin{aligned}
\text{1.} \quad & \mathbf{a}_i = [a_{i,1}, a_{i,2}, \dots, a_{i,d}] \text{ is the } i\text{-th row of matrix } A. \\
\text{2.} \quad & \mathbf{b}_j = [b_{j,1}, b_{j,2}, \dots, b_{j,d}] \text{ is the } j\text{-th row of matrix } B. \\
\text{3.} \quad & \| \cdot \|_2 \text{ denotes the Euclidean (L2) norm.}
\end{aligned}
$$
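
The definition above translates almost literally into two nested loops over the rows, with the sum over the $d$ coordinates inside. The following is a minimal pure-Python sketch for illustration only; the function name `pairwise_euclidean` and the small input are hypothetical and not part of the original notebook.

```python
import math

def pairwise_euclidean(A, B):
    """Return the m x n matrix (list of lists) of Euclidean distances
    between the rows of A and the rows of B."""
    D = []
    for a in A:                      # i-th row of A
        row = []
        for b in B:                  # j-th row of B
            # Sum of squared component differences over the d coordinates.
            sq = sum((a_k - b_k) ** 2 for a_k, b_k in zip(a, b))
            row.append(math.sqrt(sq))
        D.append(row)
    return D

# Hypothetical input: 2 rows in A, 3 rows in B, d = 2, so D is 2 x 3.
D_small = pairwise_euclidean([[0, 0], [3, 4]], [[0, 0], [6, 8], [3, 0]])
print(D_small)  # [[0.0, 10.0, 3.0], [5.0, 5.0, 4.0]]
```

The loops do $m \cdot n \cdot d$ elementary operations, which is fine for small inputs but slow in pure Python for large ones.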

### Applying to Your Example:

Given your matrices \( A \) and \( B \):

$$
A = \begin{bmatrix}
1 & 2 \\
3 & 4 \\
5 & 6 \\
6 & 7
\end{bmatrix} \quad (4 \times 2)
$$

$$
B = \begin{bmatrix}
7 & 8 \\
9 & 10 \\
0 & 1
\end{bmatrix} \quad (3 \times 2)
$$

The distance matrix \( D \) will be a \( 4 \times 3 \) matrix where each element \( D_{i,j} \) is calculated as:

$$
D_{i,j} = \sqrt{(A_{i,1} - B_{j,1})^2 + (A_{i,2} - B_{j,2})^2}
$$

For example:

$$
\begin{aligned}
\text{1.} \quad & \text{Distance between } \mathbf{a}_1 = [1, 2] \text{ and } \mathbf{b}_1 = [7, 8] \\
& D_{1,1} = \sqrt{(1-7)^2 + (2-8)^2} = \sqrt{36 + 36} = \sqrt{72} \approx 8.485 \\
\text{2.} \quad & \text{Distance between } \mathbf{a}_2 = [3, 4] \text{ and } \mathbf{b}_3 = [0, 1] \\
& D_{2,3} = \sqrt{(3-0)^2 + (4-1)^2} = \sqrt{9 + 9} = \sqrt{18} \approx 4.243 \\
\end{aligned}
$$

And so on for each element of \( D \).
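
Carrying the same calculation through all twelve pairs gives the full matrix. Because every row in this example has the form $[x, x+1]$, the two coordinate differences are always equal, so each entry is an integer multiple of $\sqrt{2}$:

$$
D = \begin{bmatrix}
6\sqrt{2} & 8\sqrt{2} & \sqrt{2} \\
4\sqrt{2} & 6\sqrt{2} & 3\sqrt{2} \\
2\sqrt{2} & 4\sqrt{2} & 5\sqrt{2} \\
\sqrt{2} & 3\sqrt{2} & 6\sqrt{2}
\end{bmatrix}
\approx
\begin{bmatrix}
8.485 & 11.314 & 1.414 \\
5.657 & 8.485 & 4.243 \\
2.828 & 5.657 & 7.071 \\
1.414 & 4.243 & 8.485
\end{bmatrix}
$$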

### Summary:

$$
\begin{aligned}
\text{1.} \quad & \text{Rows of } D \text{ correspond to the rows of matrix } A. \\
\text{2.} \quad & \text{Columns of } D \text{ correspond to the rows of matrix } B. \\
\text{3.} \quad & \text{Each element } D_{i,j} \text{ represents the Euclidean distance between the } i\text{-th row of } A \text{ and the } j\text{-th row of } B.
\end{aligned}
$$

This formalization allows you to understand precisely how each entry in the distance matrix relates to the original data matrices \( A \) and \( B \).


# Question: Difference Between Pairwise Distance and Loss Functions

If you have some basic ML experience, you have probably met loss functions and may wonder how a distance differs from a loss. **Pairwise distances** and **loss functions** are distinct concepts in data analysis and machine learning, although they sometimes intersect or are used together. Understanding how they differ and how they relate clarifies their roles in various applications.

## 1. Definitions

### a. Pairwise Distance

**Pairwise distance** refers to the computation of distances between all possible pairs of points in a dataset. It is a fundamental concept in areas like clustering, nearest neighbor searches, and similarity measurements.

- **Purpose:** To quantify the similarity or dissimilarity between data points.
- **Applications:** Clustering algorithms (e.g., K-Means, Hierarchical Clustering), nearest neighbor algorithms (e.g., K-Nearest Neighbors), and visualization techniques (e.g., Multidimensional Scaling).

### b. Loss Function

A **loss function** (also known as a cost function) measures the discrepancy between the values predicted by a model and the actual target values. It is a crucial component of training machine learning models, guiding the optimization process to improve model performance.

- **Purpose:** To quantify how well or poorly a model's predictions align with the actual data.
- **Applications:** Training machine learning models (e.g., Linear Regression, Neural Networks, Support Vector Machines), evaluating model performance.

## 2. Key Differences

| Aspect                 | Pairwise Distance                             | Loss Function                                       |
|------------------------|-----------------------------------------------|-----------------------------------------------------|
| **Primary Purpose**    | Measure similarity/dissimilarity between data points | Measure prediction error to guide model training     |
| **Typical Use Cases**  | Clustering, similarity searches, dimensionality reduction | Model optimization, training, evaluation           |
| **Nature**             | Often used as a feature or metric in algorithms | Used as an objective to minimize or maximize during training |
| **Examples**           | Euclidean distance, Manhattan distance, Cosine distance (1 - cosine similarity) | Mean Squared Error (MSE), Cross-Entropy Loss, Hinge Loss |
| **Data Shape**         | Typically a matrix, e.g. $m \times n$ for $m$ points compared against $n$ points | Usually a single scalar value, with some exceptions |


## 3. Relationship and Overlaps

While pairwise distances and loss functions serve different primary purposes, there are scenarios where they intersect:

### a. Distance-Based Loss Functions

Some loss functions are inherently based on distance metrics. In such cases, the concept of pairwise distance is integral to how the loss is computed.

- **Examples:**
  - **Mean Squared Error (MSE):** Measures the average squared Euclidean distance between predicted and actual values.
    
    $$
    \text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
    $$
  
  - **Cosine Similarity Loss:** Uses the cosine of the angle between predicted and actual vectors.
  
    $$
    \text{Loss} = 1 - \frac{\mathbf{A} \cdot \mathbf{B}}{\|\mathbf{A}\| \|\mathbf{B}\|}
    $$
  
  - **Contrastive Loss:** Utilizes pairwise distances between pairs of data points to learn embeddings where similar points are closer and dissimilar points are farther apart.
  
    $$
    \text{Loss} = y \cdot D^2 + (1 - y) \cdot \max(0, \text{margin} - D)^2
    $$
    
    where $D$ is the distance between a pair of points, and $y = 1$ for a similar pair and $y = 0$ for a dissimilar pair.
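
The three formulas above can be sketched in NumPy as follows. This is an illustrative sketch rather than a reference implementation: the helper names `mse`, `cosine_loss`, and `contrastive_loss`, the default `margin=1.0`, and the sample inputs are all assumptions made for the example. Each function returns a single scalar, in contrast to the matrix produced by a pairwise distance computation.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: the average of the squared differences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def cosine_loss(a, b):
    """1 minus the cosine similarity of two non-zero vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def contrastive_loss(dist, y, margin=1.0):
    """Contrastive loss for one pair; y = 1 for similar, y = 0 for dissimilar.
    The margin default is an assumption for this example."""
    return float(y * dist ** 2 + (1 - y) * max(0.0, margin - dist) ** 2)

print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))   # (0 + 0 + 4) / 3
print(cosine_loss([1.0, 0.0], [0.0, 1.0]))     # 1.0 (orthogonal vectors)
print(contrastive_loss(0.5, y=1))              # 0.25 (similar pair is penalized for being apart)
print(contrastive_loss(0.5, y=0))              # 0.25 (dissimilar pair inside the margin)
print(contrastive_loss(2.0, y=0))              # 0.0  (dissimilar pair beyond the margin)
```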

### b. Pairwise Distances in Model Evaluation

Pairwise distances can also be used to evaluate certain aspects of model performance, especially in tasks involving similarity or ranking.

- **Examples:**
  - **k-Nearest Neighbors (k-NN) Classification:** Relies on pairwise distances to classify data points based on their neighbors.
  - **Clustering Validation Metrics:** Use pairwise distances to assess the quality of clusters (e.g., Silhouette Score).
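
To make the k-NN case concrete, here is a minimal 1-nearest-neighbor sketch built directly on a pairwise distance matrix. The data, the labels, and the variable names (`train_X`, `train_y`, `query_X`) are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical labelled reference points and their classes.
train_X = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [6.0, 5.0]])
train_y = np.array(["low", "low", "high", "high"])

# Hypothetical unlabelled query points to classify.
query_X = np.array([[0.2, 0.1], [5.8, 5.1]])

# Pairwise Euclidean distances via broadcasting: shape (n_query, n_train).
D = np.linalg.norm(query_X[:, None, :] - train_X[None, :, :], axis=-1)

# 1-NN: each query takes the label of its closest reference point.
nearest = np.argmin(D, axis=1)
pred = train_y[nearest]
print(pred)  # ['low' 'high']
```

For $k > 1$, one option is to take the $k$ smallest entries in each row of `D` (for example with `np.argsort`) and vote over their labels.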



# When to Use Each

a. Use **Pairwise Distances** When:

- You need to measure similarity or dissimilarity between data points.
- You are performing clustering or nearest neighbor searches.
- You are visualizing data relationships in reduced dimensions.

b. Use **Loss Functions** When:

- You are training machine learning models to optimize their predictive performance.
- You are quantifying the error between predictions and actual outcomes.
- You are guiding an optimization algorithm (e.g., gradient descent) during model training.

In [1]:
import numpy as np
from scipy.spatial import distance_matrix

In [4]:
A = np.array(
    [
        [1, 2],
        [3, 4],
        [5, 6],
        [6, 7]
    ]
)
B = np.array(
    [
        [7, 8], [9, 10], [0, 1]
    ]
)
# A: (4, 2) and B (3, 2) ---> D (4, 3)
D = distance_matrix(A, B) # pairwise distance between matrix A and matrix B
D

# A naive way to compute the pairwise distances is a triple loop: O(m * n * d), which is slow in pure Python.
# for row_a in A:              # m rows
#     for row_b in B:          # n rows
#         dist(row_a, row_b)   # iterates over the d components of row_a and row_b

# see here for related leetcode: https://leetcode.com/problems/dot-product-of-two-sparse-vectors/description/

array([[ 8.48528137, 11.3137085 ,  1.41421356],
       [ 5.65685425,  8.48528137,  4.24264069],
       [ 2.82842712,  5.65685425,  7.07106781],
       [ 1.41421356,  4.24264069,  8.48528137]])

In [9]:
# For each row of A, the index of the nearest row of B (argmin along each row of D)
np.argmin(D, axis=1)

array([2, 2, 0, 0])

In [None]:
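# See these references on using NumPy broadcasting and vectorization for pairwise distances:
# 1. https://jbencook.com/pairwise-distance-in-numpy/
# 2. https://sparrow.dev/pairwise-distance-in-numpy/
# 3. https://github.com/eth-cscs/PythonHPC/blob/master/numpy/02-broadcasting.ipynb
# A[:, None, :] has shape (4, 1, 2) and B[None, :, :] has shape (1, 3, 2); their difference
# broadcasts to (4, 3, 2), and taking the norm over the last axis reduces it to (4, 3).
np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)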

array([[ 8.48528137, 11.3137085 ,  1.41421356],
       [ 5.65685425,  8.48528137,  4.24264069],
       [ 2.82842712,  5.65685425,  7.07106781],
       [ 1.41421356,  4.24264069,  8.48528137]])
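
The broadcasting expression above materializes an intermediate array of shape $(m, n, d)$. A common alternative, which is not in the original notebook and is shown here only as a hedged sketch, expands the squared norm as $\|\mathbf{a} - \mathbf{b}\|^2 = \|\mathbf{a}\|^2 + \|\mathbf{b}\|^2 - 2\,\mathbf{a} \cdot \mathbf{b}$, so the pairwise part becomes a single matrix product. The names `a_sq`, `b_sq`, `D_sq`, `D_alt`, and `D_ref` are illustrative.

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6], [6, 7]], dtype=float)
B = np.array([[7, 8], [9, 10], [0, 1]], dtype=float)

# Squared norms of each row, shaped (4, 1) and (1, 3) so they broadcast to (4, 3).
a_sq = np.sum(A ** 2, axis=1)[:, None]
b_sq = np.sum(B ** 2, axis=1)[None, :]

# ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b ; A @ B.T holds all m x n dot products.
D_sq = a_sq + b_sq - 2.0 * (A @ B.T)

# Floating-point rounding can leave tiny negative values, so clip before the square root.
D_alt = np.sqrt(np.maximum(D_sq, 0.0))

# Compare against the broadcasting version used in the notebook.
D_ref = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
print(np.allclose(D_alt, D_ref))  # True
```

This form avoids the $(m, n, d)$ intermediate, at the cost of some numerical precision when points are very close together relative to their norms.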