# Quantization of embeddings
In similarity search, one often needs to store and process thousands or even millions of high-dimensional embedding vectors. These vectors consume substantial memory, and as the dataset grows, search operations become increasingly slow. To address these challenges, we seek alternative representations of embedding vectors that reduce storage requirements and enhance computational efficiency.

## Motivation
Quantization is a technique that helps to:
- Reduce memory usage by storing embeddings in a compressed form.
- Improve computational efficiency by enabling faster similarity calculations using lower-precision representations.
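To make the memory argument concrete, here is a small NumPy sketch comparing the storage of the same batch of embeddings as 32-bit floats and as 8-bit integer codes. The batch size of 1000 and the dimension of 384 are arbitrary, hypothetical choices; only the per-component sizes (4 bytes vs. 1 byte) matter.

```python
import numpy as np

n, dim = 1000, 384  # hypothetical number of embeddings and embedding dimension

# float32 embeddings: 4 bytes per component
embeddings_f32 = np.zeros((n, dim), dtype=np.float32)
# int8 codes: 1 byte per component
embeddings_i8 = np.zeros((n, dim), dtype=np.int8)

print(embeddings_f32.nbytes)  # 1536000
print(embeddings_i8.nbytes)   # 384000
print(embeddings_f32.nbytes // embeddings_i8.nbytes)  # 4
```

Storing int8 codes instead of float32 values therefore reduces the memory footprint by a factor of four, independently of `n` and `dim`.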

## Mathematical background
Let $\mathrm{min}, \mathrm{max}\in\mathbb{R}$ with $\mathrm{min} < \mathrm{max}$ and $a, b\in\mathbb{Z}$ with $a < b$. Suppose we have a value $x$ in the range $[\mathrm{min}, \mathrm{max}]$ and want to map it to an integer $\overline{x}$ in the discrete range $[a, b]\cap\mathbb{Z}$. To do so, we first define the $\textbf{quantization step size}$ $\Delta = \frac{\mathrm{max} - \mathrm{min}}{b - a}$ and set

$$\tilde{x} = a + \frac{x - \mathrm{min}}{\Delta}\in [a,b].$$

Then we define $\overline{x}$ by applying one of the rounding operators round, ceil, or floor to $\tilde{x}$, e.g.
$$\overline{x} = \operatorname{round}(\tilde{x}) = a  + \operatorname{round}\left(\frac{x-\mathrm{min}}{\Delta}\right).$$
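As a worked example, take the same parameters as in the code cell below, $\mathrm{min} = -1$, $\mathrm{max} = 1$ and the int8 range $a = -128$, $b = 127$, together with the arbitrarily chosen value $x = 0.5$:

$$
\Delta = \frac{1 - (-1)}{127 - (-128)} = \frac{2}{255} \approx 0.00784, \qquad
\tilde{x} = -128 + \frac{0.5 - (-1)}{2/255} = -128 + 191.25 = 63.25.
$$

Rounding and flooring both give $\overline{x} = 63$, while ceiling gives $\overline{x} = 64$.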

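```python
from collections.abc import Callable
import math

# Input range [min, max] and target integer range [a, b] (here: the int8 range)
min_v = -1
max_v = 1
a = -128
b = 127

# Quantization step size Δ = (max - min) / (b - a)
delta = (max_v - min_v) / (b - a)

def forward_transformation(x: float, transform: Callable[[float], int] = math.floor) -> int:
    """Map x in [min_v, max_v] to an integer in [a, b]; transform is math.floor, math.ceil or round."""
    # x_tilde = a + (x - min) / Δ
    x_tilde = a + (b - a) / (max_v - min_v) * (x - min_v)
    return transform(x_tilde)
```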
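In practice, whole arrays of embeddings are quantized at once rather than scalar by scalar. The following is a minimal vectorized sketch with NumPy using the same parameter values. The function name `quantize`, the constant names, the clipping of out-of-range inputs, and the choice of rounding to nearest (instead of the flooring default above) are our own illustrative assumptions.

```python
import numpy as np

MIN_V, MAX_V = -1.0, 1.0           # input range [min, max]
A, B = -128, 127                   # target integer range (int8)
DELTA = (MAX_V - MIN_V) / (B - A)  # quantization step size

def quantize(x: np.ndarray) -> np.ndarray:
    """Hypothetical vectorized forward transformation using rounding to nearest."""
    # Clip so that values outside [MIN_V, MAX_V] cannot leave the int8 range
    x = np.clip(x, MIN_V, MAX_V)
    x_tilde = A + (x - MIN_V) / DELTA
    # np.rint rounds half to even, like Python's built-in round
    return np.rint(x_tilde).astype(np.int8)

codes = quantize(np.array([-1.0, -0.5, 0.5, 1.0]))
print(codes.tolist())  # [-128, -64, 63, 127]
```

Storing the result with `dtype=np.int8` is what actually realizes the memory savings; a Python list of `int` objects would not.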

The backward transformation is defined as $\hat{x} = (\overline{x} - a) \cdot \Delta + \mathrm{min}$.

```python
def backward_transformation(x: int) -> float:
    """Map an integer in [a, b] back to [min_v, max_v]: x_hat = (x - a) * Δ + min."""
    return (x - a) * (max_v - min_v) / (b - a) + min_v
```
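A vectorized counterpart of the backward transformation can be sketched with NumPy as follows. The names `dequantize`, `MIN_V`, `MAX_V`, `A`, `B` and `DELTA` are our own, and the forward step here uses rounding to nearest; the two sample values lie inside $[\mathrm{min}, \mathrm{max}]$, so no clipping is needed.

```python
import numpy as np

MIN_V, MAX_V = -1.0, 1.0
A, B = -128, 127
DELTA = (MAX_V - MIN_V) / (B - A)

def dequantize(codes: np.ndarray) -> np.ndarray:
    """Hypothetical vectorized backward transformation: x_hat = (x_bar - a) * Δ + min."""
    # Widen to int32 first: for int8 codes, codes - A (e.g. 63 + 128) would overflow
    return (codes.astype(np.int32) - A) * DELTA + MIN_V

x = np.array([-0.5, 0.5])
codes = np.rint(A + (x - MIN_V) / DELTA).astype(np.int8)  # forward transformation with rounding
x_hat = dequantize(codes)

print(codes.tolist())             # [-64, 63]
print(np.abs(x - x_hat) / DELTA)  # reconstruction errors in multiples of Δ
```

Both values are reconstructed with an error of a quarter step, i.e. within the bound for rounding discussed next.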

## Estimation of the quantization error
Because of the rounding step, the backward transformation does not recover the original value in general. The difference $x - \hat{x}$ is called the $\textbf{quantization error}$. For the flooring and ceiling operations we can show that $\vert x - \hat{x}\vert < \Delta$. More precisely, with flooring the reconstructed value never exceeds the original one ($\hat{x} \leq x$), whereas with ceiling it is never smaller ($\hat{x} \geq x$).

For the rounding operation the bound improves to $\vert x - \hat{x}\vert \leq \frac{\Delta}{2}$.

<details open>
<summary>Proof</summary>

<b>Flooring</b>\
From the definition of $\overline{x}=a + \lfloor\frac{x - \mathrm{min}}{\Delta}\rfloor$ it follows that
$$
\overline{x} \leq a + \frac{x - \mathrm{min}}{\Delta} < \overline{x} + 1.
$$
Solving for $x$ we see that
$$
\mathrm{min} + (\overline{x} - a)\cdot\Delta \leq x < \mathrm{min} + (\overline{x} - a + 1)\cdot\Delta
$$
and hence, by the definition of the backward transformation, it follows that
$$
\hat{x} \leq x < \hat{x} + \Delta.
$$
Consequently, $0 \leq x - \hat{x} < \Delta$, and in particular $\vert x - \hat{x}\vert < \Delta$.

<b>Ceiling</b>\
Analogously, we see from the definition $\overline{x} = a + \lceil \frac{x - \mathrm{min}}{\Delta}\rceil$ that
$$
\overline{x} - 1 < a + \frac{x - \mathrm{min}}{\Delta} \leq \overline{x}
$$
and therefore
$$
\mathrm{min} + (\overline{x} - a - 1)\cdot\Delta < x \leq \mathrm{min} + (\overline{x} - a)\cdot\Delta
$$
and hence
$$
\hat{x} - \Delta < x \leq \hat{x}.
$$
In other words, $0 \leq \hat{x} - x < \Delta$, and hence $\vert x - \hat{x}\vert < \Delta$.

<b>Rounding</b>\
Similarly, from the definition $\overline{x} = a + \operatorname{round}(\frac{x - \mathrm{min}}{\Delta})$ we see that
$$
\overline{x} - \frac{1}{2} \leq a + \frac{x - \mathrm{min}}{\Delta} \leq \overline{x} + \frac{1}{2}.
$$
Again, it follows that
$$
\mathrm{min} + \left(\overline{x} - a - \frac{1}{2}\right)\cdot\Delta \leq x \leq \mathrm{min} + \left(\overline{x} - a + \frac{1}{2}\right)\cdot\Delta
$$
and hence
$$
\hat{x} - \frac{\Delta}{2} \leq x \leq \hat{x} + \frac{\Delta}{2}.
$$
Therefore, it holds $\vert x - \hat{x}\vert \leq \frac{\Delta}{2}$. $\square$
</details>
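The bounds can also be checked empirically. The stand-alone sketch below quantizes uniformly random values in $[\mathrm{min}, \mathrm{max}]$ with each rounding operator and records the largest observed error in multiples of $\Delta$. The helper names `forward` and `backward`, the sample size of 10,000 and the seed are arbitrary choices of ours.

```python
from collections.abc import Callable
import math
import random

MIN_V, MAX_V = -1.0, 1.0
A, B = -128, 127
DELTA = (MAX_V - MIN_V) / (B - A)

def forward(x: float, op: Callable[[float], int]) -> int:
    """Hypothetical forward transformation with a configurable rounding operator."""
    return op(A + (x - MIN_V) / DELTA)

def backward(q: int) -> float:
    """Hypothetical backward transformation x_hat = (q - a) * Δ + min."""
    return (q - A) * DELTA + MIN_V

random.seed(0)
samples = [random.uniform(MIN_V, MAX_V) for _ in range(10_000)]

max_errors = {}
for name, op in [("floor", math.floor), ("ceil", math.ceil), ("round", round)]:
    # Largest observed |x - x_hat|, in multiples of Δ
    max_errors[name] = max(abs(x - backward(forward(x, op))) for x in samples) / DELTA
    print(f"{name}: {max_errors[name]:.4f}")
```

With this many samples one should see maxima close to, but below, 1 for floor and ceil, and close to 0.5 for rounding.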

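Next, we want to bound how the quantization error propagates to the dot product, that is, to estimate $\vert \langle x, y\rangle - \langle\hat{x}, \hat{y}\rangle\vert$. The maximal error estimates are still to be summarized in a table.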
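A sketch of how $\vert \langle x, y\rangle - \langle\hat{x}, \hat{y}\rangle\vert$ can be bounded, assuming flooring or ceiling, so that the componentwise errors $e_x = x - \hat{x}$ and $e_y = y - \hat{y}$ satisfy $\vert e_{x,i}\vert, \vert e_{y,i}\vert < \Delta$. Writing $x = \hat{x} + e_x$ and $y = \hat{y} + e_y$ and expanding gives

$$
\langle x, y\rangle - \langle \hat{x}, \hat{y}\rangle = \langle \hat{x}, e_y\rangle + \langle e_x, \hat{y}\rangle + \langle e_x, e_y\rangle,
$$

and hence, with $\Vert\cdot\Vert_1$ denoting the 1-norm,

$$
\vert \langle x, y\rangle - \langle \hat{x}, \hat{y}\rangle\vert < \Delta\left(\Vert\hat{x}\Vert_1 + \Vert\hat{y}\Vert_1\right) + d\,\Delta^2.
$$

Expanding around the original vectors instead, $\langle x, y\rangle - \langle \hat{x}, \hat{y}\rangle = \langle x, e_y\rangle + \langle e_x, y\rangle - \langle e_x, e_y\rangle$, and using $\Vert x\Vert_1 \leq \sqrt{d}\,\Vert x\Vert_2$, we obtain for unit vectors $x, y$

$$
\vert \langle x, y\rangle - \langle \hat{x}, \hat{y}\rangle\vert < 2\Delta\sqrt{d} + d\,\Delta^2.
$$

These are the two quantities printed as `Error estimation` and `Error estimation 2` in the code below. For rounding, the same argument with $\vert e_{x,i}\vert, \vert e_{y,i}\vert \leq \frac{\Delta}{2}$ yields both bounds with $\Delta$ replaced by $\frac{\Delta}{2}$ and $<$ replaced by $\leq$.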

```python
import random
import numpy as np

d = 5  # embedding dimension

def pnorm(x: list[float], p: int = 2) -> float:
    """p-norm of a vector."""
    return math.pow(sum(np.abs(x_i) ** p for x_i in x), 1 / p)

def normalize(x: list[float]) -> list[float]:
    """Scale x to unit Euclidean norm."""
    norm = pnorm(x, 2)
    return [x_i / norm for x_i in x]

# Two random vectors with non-negative entries ...
x = [random.random() for _ in range(d)]
y = [random.random() for _ in range(d)]

# ... scaled to unit length
x = normalize(x)
y = normalize(y)

# Quantize (flooring by default) ...
x_bar = [forward_transformation(x_i) for x_i in x]
y_bar = [forward_transformation(y_i) for y_i in y]

# ... and reconstruct
x_hat = [backward_transformation(x_i) for x_i in x_bar]
y_hat = [backward_transformation(y_i) for y_i in y_bar]
```

```python
# Integer dot product <x_bar, y_bar>
integer_dot_product = sum(x_i * y_i for (x_i, y_i) in zip(x_bar, y_bar))

# Terms that depend on only one of the two vectors
dot_product_x_part = min_v * delta * (sum(x_bar) - d * a) - a * delta ** 2 * sum(x_bar)
dot_product_y_part = min_v * delta * (sum(y_bar) - d * a) - a * delta ** 2 * sum(y_bar)
```
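The split in the previous cell follows from inserting the backward transformation $\hat{x}_i = (\overline{x}_i - a)\cdot\Delta + \mathrm{min}$ into the dot product and expanding:

$$
\begin{aligned}
\langle\hat{x},\hat{y}\rangle
&= \sum_{i=1}^{d}\big((\overline{x}_i - a)\Delta + \mathrm{min}\big)\big((\overline{y}_i - a)\Delta + \mathrm{min}\big)\\
&= \Delta^2\langle\overline{x},\overline{y}\rangle\\
&\quad + \mathrm{min}\,\Delta\Big(\sum_{i}\overline{x}_i - d\,a\Big) - a\,\Delta^2\sum_{i}\overline{x}_i\\
&\quad + \mathrm{min}\,\Delta\Big(\sum_{i}\overline{y}_i - d\,a\Big) - a\,\Delta^2\sum_{i}\overline{y}_i\\
&\quad + d\,a^2\Delta^2 + d\,\mathrm{min}^2.
\end{aligned}
$$

The second and third lines of the result are `dot_product_x_part` and `dot_product_y_part`. Only $\langle\overline{x},\overline{y}\rangle$ depends on both vectors; all other terms can be computed once per vector and stored alongside its integer codes.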

```python
dot_product = np.dot(x, y)              # exact dot product
dot_product_hat = np.dot(x_hat, y_hat)  # dot product of the reconstructed vectors
# Same value as dot_product_hat, computed from the integer dot product
dot_product_hat_improved = (delta ** 2 * integer_dot_product + dot_product_x_part + dot_product_y_part
                            + d * a ** 2 * delta ** 2 + d * min_v ** 2)

print("<x,y> =", dot_product)
print("<x_hat,y_hat> = ", dot_product_hat)
print("Improved calculation: ", dot_product_hat_improved)

print("Difference = ", np.abs(dot_product - dot_product_hat))
# Bound Δ(||x_hat||_1 + ||y_hat||_1) + dΔ², valid for flooring and ceiling
print("Error estimation = ", delta * (pnorm(x_hat, 1) + pnorm(y_hat, 1)) + d * delta ** 2)
# Bound 2Δ√d + dΔ², valid for unit vectors x, y
print("Error estimation 2 = ", 2 * delta * math.sqrt(d) + d * delta ** 2)
```

```text
<x,y> = 0.6603766820913078
<x_hat,y_hat> =  0.6397078046905038
Improved calculation:  0.6397078046905023
Difference =  0.02066887740080403
Error estimation =  0.030265282583621683
Error estimation 2 =  0.035383150127639915
```
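A single random pair says little about how the estimates behave in general. The following stand-alone NumPy sketch is our own vectorized variant of the cells above (the number of trials, the seed and the constant names are arbitrary choices): it repeats the experiment with flooring for many random unit vectors, checks that the integer-based computation reproduces $\langle\hat{x},\hat{y}\rangle$, and checks that both estimates bound the observed error. The integer dot product is accumulated in `int32`, since products and sums of `int8` codes do not fit into 8 bits.

```python
import numpy as np

MIN_V, MAX_V = -1.0, 1.0
A, B = -128, 127
DELTA = (MAX_V - MIN_V) / (B - A)

rng = np.random.default_rng(0)
d, trials = 5, 10_000

# Random unit vectors with non-negative entries, one pair per row
x = rng.random((trials, d))
y = rng.random((trials, d))
x /= np.linalg.norm(x, axis=1, keepdims=True)
y /= np.linalg.norm(y, axis=1, keepdims=True)

# Forward transformation with flooring, stored as int8 codes
x_bar = np.floor(A + (x - MIN_V) / DELTA).astype(np.int8)
y_bar = np.floor(A + (y - MIN_V) / DELTA).astype(np.int8)

# Backward transformation; widen to int32 first so that x_bar - A cannot overflow int8
x_hat = (x_bar.astype(np.int32) - A) * DELTA + MIN_V
y_hat = (y_bar.astype(np.int32) - A) * DELTA + MIN_V

dot = np.sum(x * y, axis=1)
dot_hat = np.sum(x_hat * y_hat, axis=1)

# Same values via the integer dot product, accumulated in int32
int_dot = np.sum(x_bar.astype(np.int32) * y_bar.astype(np.int32), axis=1)
sx = x_bar.sum(axis=1, dtype=np.int32)
sy = y_bar.sum(axis=1, dtype=np.int32)
dot_hat_int = (DELTA**2 * int_dot
               + MIN_V * DELTA * (sx - d * A) - A * DELTA**2 * sx
               + MIN_V * DELTA * (sy - d * A) - A * DELTA**2 * sy
               + d * A**2 * DELTA**2 + d * MIN_V**2)

# Observed error and the two estimates printed in the cell above
error = np.abs(dot - dot_hat)
bound_1 = DELTA * (np.abs(x_hat).sum(axis=1) + np.abs(y_hat).sum(axis=1)) + d * DELTA**2
bound_2 = 2 * DELTA * np.sqrt(d) + d * DELTA**2

print("max error:", error.max())
print("integer path matches:", bool(np.allclose(dot_hat_int, dot_hat)))
print("bound 1 holds:", bool(np.all(error < bound_1)))
print("bound 2 holds:", bool(np.all(error < bound_2)))
```

Comparing `error.max()` with `bound_2` gives a rough idea of how pessimistic the worst-case estimate is for this setting.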


## Efficient calculation of the dot product