In this notebook, we provide some calculations that may help clarify the precision of the z-normalized case.

Let us assume $d$ is the precise z-normalized distance between two subsequences of length $m$, and $\hat{d}$ is the z-normalized distance with imprecision.


$$
\begin{align}
d - \hat{d} ={}& \Delta{d}
\\
\sqrt{2m(1-\rho)} - \sqrt{2m(1 - \hat{\rho})} ={}& \Delta{d}
\\
\sqrt{1-\rho} - \sqrt{1 - \hat{\rho}} ={}& \frac{\Delta{d}}{\sqrt{2m}}
\end{align}
$$
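The relation $d = \sqrt{2m(1-\rho)}$ used above can be checked numerically. The following is a minimal sketch, assuming NumPy is available and that z-normalization uses the population standard deviation; the names `z_norm`, `Q`, and `T` are illustrative and not part of STUMPY.

```python
import numpy as np


def z_norm(a):
    # Z-normalize with the population standard deviation (ddof=0)
    return (a - a.mean()) / a.std()


rng = np.random.default_rng(0)
m = 50  # subsequence length (illustrative)
Q = rng.random(m)
T = rng.random(m)

# Z-normalized Euclidean distance computed directly
d_direct = np.linalg.norm(z_norm(Q) - z_norm(T))

# The same distance obtained from the Pearson correlation
rho = np.corrcoef(Q, T)[0, 1]
d_from_rho = np.sqrt(2 * m * (1 - rho))

print(d_direct, d_from_rho)
```

The two printed values should agree up to floating-point rounding.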

Also, recall that $\rho = \frac{cov}{stddev}$, where `cov` is the covariance between the two subsequences and `stddev` is the product of the standard deviations of the two subsequences. Hereafter, this product is denoted as `s`. We also assume that `s` is exact and has no imprecision.

$\rho = \frac{cov}{s}$

$\hat{\rho} = \frac{\hat{cov}}{s}$


Hence, defining $\Delta{cov} \triangleq cov - \hat{cov}$:

$\rho - \hat{\rho} = \frac{\Delta{cov}}{s}$
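To make the roles of `cov` and `s` concrete, here is a small sketch that computes both from two random subsequences and then perturbs `cov` by a fixed amount. It assumes population (ddof=0) statistics, and the perturbation size `delta_cov = 1e-10` is only an illustrative value.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 50  # subsequence length (illustrative)
Q = rng.random(m)
T = rng.random(m)

# Population covariance and the product of the standard deviations
cov = np.mean((Q - Q.mean()) * (T - T.mean()))
s = Q.std() * T.std()
rho = cov / s

# Perturb the covariance while keeping s exact
delta_cov = 1e-10  # cov - cov_hat (assumed value)
cov_hat = cov - delta_cov
rho_hat = cov_hat / s

print(rho - rho_hat, delta_cov / s)
```

Because the same `delta_cov` is divided by `s`, the error in the correlation grows as `s` shrinks.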

Now, let us solve the following equations:


$$
\begin{align}
   \begin{cases}
      \sqrt{1-\rho} - \sqrt{1 - \hat{\rho}} ={}& \frac{\Delta{d}}{\sqrt{2m}}
      \\
      \rho - \hat{\rho} ={}& \frac{\Delta{cov}}{s}
    \end{cases}
\end{align}
$$

We define new variables:

$\rho' \triangleq 1 - \rho$; note that $\rho'$ lies in the range `[0, 2]` because $\rho$ lies in `[-1, 1]`

$\hat{\rho}' \triangleq 1 - \hat{\rho}$; note that $\hat{\rho}'$ lies in the range `[0, 2]`

Hence, we now have:


$$
\begin{align}
   \begin{cases}
      \sqrt{\rho'} - \sqrt{\hat{\rho}'} ={}& \frac{\Delta{d}}{\sqrt{2m}}
      \\
      \hat{\rho}' - \rho' ={}& \frac{\Delta{cov}}{s}
    \end{cases}
\end{align}
$$

Next, we define two more variables:

$x \triangleq \sqrt{\rho'}$; note that $x$ lies in the range $[0, \sqrt{2}]$

$y \triangleq \sqrt{\hat{\rho}'}$; note that $y$ lies in the range $[0, \sqrt{2}]$

Hence:


$$
\begin{align}
   \begin{cases}
      x - y ={}& \frac{\Delta{d}}{\sqrt{2m}}
      \\
      y^{2} - x^{2} ={}& \frac{\Delta{cov}}{s}
    \end{cases}
\end{align}
$$
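The two equations in `x` and `y` can be sanity-checked with plain floating-point arithmetic. This is a rough sketch; the values chosen for `m`, `s`, `rho`, and `delta_cov` are assumptions made only for illustration.

```python
import math

m = 50             # subsequence length (illustrative)
s = 0.05           # product of the standard deviations, assumed exact
rho = 0.3          # exact correlation (illustrative)
delta_cov = 1e-10  # cov - cov_hat (illustrative)

# Imprecise correlation, from rho - rho_hat = delta_cov / s
rho_hat = rho - delta_cov / s

# Exact and imprecise distances, and their difference
d = math.sqrt(2 * m * (1 - rho))
d_hat = math.sqrt(2 * m * (1 - rho_hat))
delta_d = d - d_hat

# Substituted variables
x = math.sqrt(1 - rho)
y = math.sqrt(1 - rho_hat)

# Left-hand and right-hand sides of the two equations
lhs1, rhs1 = x - y, delta_d / math.sqrt(2 * m)
lhs2, rhs2 = y**2 - x**2, delta_cov / s

print(lhs1, rhs1)
print(lhs2, rhs2)
```

Each printed pair should match to several significant digits; the small residual comes from subtracting nearly equal floating-point numbers.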


Thus,


$$
\begin{align}
   \begin{cases}
      x - y ={}& \frac{\Delta{d}}{\sqrt{2m}}
      \\
      (y - x)(y + x) ={}& \frac{\Delta{cov}}{s}
    \end{cases}
\end{align}
$$


By dividing the second equation by the first equation, and noting that $y - x = -(x - y)$, we can find the equation for `y + x`. Hence:


$$
\begin{align}
   \begin{cases}
      x - y ={}& \frac{\Delta{d}}{\sqrt{2m}}
      \\
      y + x ={}& -\frac{\Delta{cov}\sqrt{2m}}{s\Delta{d}}
    \end{cases}
\end{align}
$$


Now, we can solve this system of equations for `x` and `y`. Hence:


$$
\begin{align}
   \begin{cases}
      x  ={}& \frac{1}{2}\left(
      \frac{\Delta{d}}{\sqrt{2m}}
      -
      \frac{\Delta{cov}\sqrt{2m}}{s\Delta{d}}
      \right)
      \\
      y  ={}& \frac{1}{2}\left(
      -\frac{\Delta{cov}\sqrt{2m}}{s\Delta{d}}
      -
      \frac{\Delta{d}}{\sqrt{2m}}
      \right)
    \end{cases}
\end{align}
$$


Now, recall that `x` and `y` are in the range $[0, \sqrt{2}]$. Let us work with `x`:


$$
\begin{align}
0 \le x \le \sqrt{2}
\end{align}
$$


$$
\begin{align}
0 \le 
\frac{1}{2}\left(
      \frac{\Delta{d}}{\sqrt{2m}}
      -
      \frac{\Delta{cov}\sqrt{2m}}{s\Delta{d}}
      \right)
\le \sqrt{2}
\end{align}
$$

In STUMPY, we would like to have $|\Delta{d}| \le \epsilon_{d}$, where $\epsilon_{d}$ is set to `1e-5` (because `stumpy.config.STUMPY_TEST_PRECISION = 5`).

For now, let us assume $\Delta{d}$ is positive. Then $\Delta{cov} \le 0$ (a larger distance corresponds to a smaller correlation), so $-\Delta{cov} = |\Delta{cov}|$. By multiplying the inequality above by $\Delta{d}$, we will have:


$$
\begin{align}
0 \le 
\frac{1}{2}\left(
      \frac{(\Delta{d})^{2}}{\sqrt{2m}}
      + 
      \frac{|\Delta{cov}|\sqrt{2m}}{s}
      \right)
\le \sqrt{2}\Delta{d}
\end{align}
$$


Also, let us ignore $\frac{(\Delta{d})^{2}}{\sqrt{2m}}$ as it will be a very small value. Hence:


$$
\begin{align}
0 \le 
\frac{1}{2}\left(
      \frac{|\Delta{cov}|\sqrt{2m}}{s}
      \right)
\le \sqrt{2}\Delta{d}
\end{align}
$$


Let us consider the right inequality (also recall that we would like to have $\Delta{d} \le \epsilon_{d}$):


$$
\begin{align}
\frac{1}{2}\left(
      \frac{|\Delta{cov}|\sqrt{2m}}{s}
      \right)
\le \sqrt{2}\Delta{d} \le \sqrt{2}\epsilon_{d}
\end{align}
$$


Solving for `s`, we can see:


$$
\begin{align}
s \ge \frac{|\Delta{cov}|\sqrt{m}}{2\epsilon_{d}}
\end{align}
$$
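The lower bound on `s` is easy to tabulate for different covariance errors. Below is a minimal sketch; the helper name `min_std_product` is hypothetical, and taking the magnitude of `delta_cov` is an assumption made so that the helper works regardless of the sign of the error.

```python
import math


def min_std_product(delta_cov, m, eps_d=1e-5):
    # Lower bound on s: |delta_cov| * sqrt(m) / (2 * eps_d)
    return abs(delta_cov) * math.sqrt(m) / (2 * eps_d)


# How the bound scales with the covariance error for m = 100
for delta_cov in (1e-12, 1e-10, 1e-8):
    print(delta_cov, min_std_product(delta_cov, m=100))
```

The bound grows linearly with the covariance error and with the square root of the subsequence length `m`.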



Note that if $\Delta{cov}$ becomes 0, then we have $s \ge 0$, which is obvious because `s` is the product of two (non-negative) standard deviations. We are interested in maximizing the right-hand side so that we can modify our code accordingly and make sure it has no imprecision. It should be reasonable* to take `1e-10` as the largest error magnitude for $\Delta{cov}$.

*Note: this is based on what I observed while exploring `cov` values.

Therefore:


$$
\begin{align}
s \ge \frac{10^{-10}\sqrt{m}}{2(10^{-5})} 
\\
s \ge 10^{-5}\sqrt{\frac{m}{4}}
\end{align}
$$


So, if the imprecision in `cov` is `1e-10` or lower (i.e., better precision), then, by the bound above, we would be fine as long as $s \ge s^{*}$, where $s^{*} = 10^{-5}\sqrt{\frac{m}{4}}$. 
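As a rough illustration of how this threshold could be applied, the sketch below compares the product of two standard deviations against $s^{*}$. The helper names `s_star` and `meets_threshold` are hypothetical (they are not STUMPY functions), and population standard deviations are assumed.

```python
import numpy as np


def s_star(m):
    # Threshold from the text: s* = 1e-5 * sqrt(m / 4)
    return 1e-5 * np.sqrt(m / 4)


def meets_threshold(Q, T):
    # s is the product of the two (population) standard deviations
    s = np.std(Q) * np.std(T)
    return bool(s >= s_star(len(Q)))


rng = np.random.default_rng(0)
m = 64
Q = rng.random(m)
T = rng.random(m)
# A nearly constant subsequence with a very small standard deviation
nearly_flat = 1.0 + 1e-6 * rng.random(m)

print(meets_threshold(Q, T))
print(meets_threshold(Q, nearly_flat))
```

Pairs that involve a nearly constant subsequence fall below the threshold, and those are the cases this analysis flags for extra care.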