In [1]:
import julia; julia.install(quiet=True)
from julia import Main

import holoviews as hv
hv.extension('bokeh', logo=False)

import numpy as np

In [2]:
%load_ext julia.magic

Initializing Julia interpreter. This may take some time...


In [3]:
%%julia
using Statistics, LinearAlgebra

In [4]:
def spikes(data, y_base=0, dims=["Time", "x"], label="Signal", curve=True):
    '''Plot vertical stems from y_base to each sample and, optionally, a curve through the samples.'''
    if isinstance(data, tuple):
        t,s=data
    else:
        t=np.arange(0,len(data), 1)
        s=data

    vlines = [ np.array( [[t[i], y_base], [t[i], s[i]]]) for i in range(len(s)) ]

    hs = hv.Path( vlines, kdims=dims, label=label ).opts( show_legend=True, muted_alpha=0., color='black')

    if curve: hs = hs * hv.Curve((t,s), dims[0], dims[1], label=label).opts(line_width=0.8)
    return hs

**Remarks:**
* This notebook uses the Python holoviews library, but performs all computations in Julia
* The exposition closely follows the [MIT 18.06 notebook](https://github.com/mitmath/1806/blob/master/notes/Statistics-and-PCA.ipynb)

<div style="float:center;width:100%;text-align: center;"><strong style="height:60px;color:darkred;font-size:40px;">Mean and Standard Deviation</strong><br><strong style="height:100px;color:darkred;font-size:25px;">formulated with Projections</strong></div>

# 1. Mean and Deviation from the Mean

## 1.1 The Mean of a Set of Samples

Let $o = \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix}\quad$ be a vector in $\mathbb{R}^n$ with all entries equal to one.

Then $\;o^t o = n \quad$ and $\quad P_o = \frac{ o\; o^t }{ o^t o } = \frac{1}{n} o\;o^t\quad$ is the **orthogonal projection** matrix onto the line spanned by $o$.

We can express the mean of a set of values in terms of the vector $o$:

Let $x$ be a vector. The **mean** of the entries in $x$ is $\mu = \frac{ o^t x }{ o^t o }$.

We note that $\mu$ is **the coefficient of the orthogonal projection**
of the vector $x$ onto $\operatorname{span} \{ o \}$:<br>
$\qquad\qquad\qquad P_o x = \frac{ o\ o^t }{ o^t o } x  = \mu o $.

**Remarks:**
* The projection matrix is given by $\quad P_o = \frac{1}{n} \begin{pmatrix} 1 & 1 & \dots & 1 \\
                                                                 1 & 1 & \dots & 1 \\
                                                            \vdots & \vdots & \ddots & \vdots \\
                                                                 1 & 1 & \dots & 1 \end{pmatrix}$

* The projected vector $P_o x$ has all entries equal to $\mu$.
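
The two remarks above can be checked on a tiny example. The following is a minimal sketch in plain Python (standard library only, so it also runs outside this notebook); the sample values and the helper `dot` are illustrative assumptions, not part of the original computation.

```python
# Project a small sample vector x onto the all-ones vector o.
x = [2.0, 4.0, 6.0, 8.0]                  # illustrative sample values (assumed)
n = len(x)
o = [1.0] * n

def dot(u, v):
    """Inner product u^t v (hypothetical helper)."""
    return sum(ui * vi for ui, vi in zip(u, v))

mu = dot(o, x) / dot(o, o)                # coefficient of the projection = mean
P_o = [[1.0 / n] * n for _ in range(n)]   # P_o = (1/n) o o^t
P_o_x = [dot(row, x) for row in P_o]      # matrix-vector product P_o x

print(mu)       # 5.0
print(P_o_x)    # [5.0, 5.0, 5.0, 5.0]
```

Every entry of `P_o_x` equals `mu`, i.e., $P_o x = \mu\, o$.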

In [5]:
%%julia

function μ(x)
    o = ones( length(x) )
    o'x / o'o, o
end

n = 20
x = rand(n)
ave,o = μ(x)
ave, mean(x)  # mean via projection onto o and via builtin function

(0.6010983908520345, 0.6010983908520344)

In [6]:
h=\
spikes( Main.x, y_base=Main.ave, curve=False, dims=["index", "x"] )*\
hv.Scatter( Main.x ).opts(size=5,color='black')*\
hv.HLine(Main.ave).opts(line_width=1)

h.opts(width=500, title='Projection onto Vector of Ones', show_legend=False)

## 1.2 The Deviation from the Mean

The **deviation from the mean** $x - \mu\ o = \left( I - P_o \right) x$ is therefore the orthogonal projection of $x$ onto the hyperplane orthogonal to the vector $o$:<br>
it projects the vector $x$ onto the subspace of vectors with zero mean.

In the following, we will set the demeaning projection $P_{perp} =  \left( I - P_o \right)$.

In [7]:
%%julia

P_perp = 1I(n) - 1//n * ones(n,n)

x_mu = P_perp * x;

In [8]:
h=\
spikes( Main.x_mu, curve=False, dims=["index", "x_mu"] )*\
hv.Scatter( Main.x_mu ).opts(size=5,color='black')*\
hv.HLine(0).opts(line_width=1)

h.opts(width=500, title='Deviation from the Mean', show_legend=False)

# 2. Standard Deviation, Covariance and Correlation

## 2.1 Standard Deviation

The **sample variance** is the mean-square deviation from the mean:

$\qquad
\operatorname{Var}(x) = \frac{1}{n-1}\sum_{k=1}^n (x_k - \mu)^2 = \frac{1}{n-1} \lVert P_{perp}\ x \rVert^2
$

where the denominator $n-1$ is [Bessel's correction](https://en.wikipedia.org/wiki/Bessel%27s_correction).

**Remark:** **Bessel's correction** is often explained with a degrees-of-freedom argument.
Note that the hyperplane orthogonal to $o$ has dimension $n-1$<br>
(it is the orthogonal complement of the line spanned by $o$).

The orthonormal eigendecomposition of the projection operator $P_{perp} = Q \Lambda Q^t$ has
an eigenvalue $\lambda_1 = 0$ (with eigenvector $o / \sqrt{n}$) and $n-1$ eigenvalues $\lambda_i = 1, \; i=2,3,\dots, n$.

Using this eigenvector basis for $x$, i.e., setting $x = Q \tilde{x}$, we have

$\qquad
\lVert P_{perp} x \rVert^2 = \lVert Q \Lambda Q^t Q \tilde{x} \rVert^2 = \lVert Q \Lambda \tilde{x} \rVert^2
=  \lVert \Lambda \tilde{x} \rVert^2 = \tilde{x}_2^2 + \tilde{x}_3^2 + \dots + \tilde{x}_n^2
$

The squared norm is therefore a sum of only $n-1$ squared terms, which motivates dividing by $n-1$ rather than $n$. To complete the argument, it suffices to note that the linear combination $\tilde{x} = Q^t x$ of a random variable $x$ is again a random variable.
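
As a small worked case of the argument above, take $n = 2$. The matrix $Q$ below is one valid orthonormal eigenbasis, chosen here for illustration:

$\qquad
P_{perp} = \frac{1}{2}\begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}, \quad
Q = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \quad
\Lambda = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}
$

$\qquad
\tilde{x} = Q^t x = \frac{1}{\sqrt{2}}\begin{pmatrix} x_1 + x_2 \\ x_1 - x_2 \end{pmatrix}, \quad
\lVert P_{perp} x \rVert^2 = \tilde{x}_2^2 = \frac{(x_1 - x_2)^2}{2}
$

so that $\operatorname{Var}(x) = \frac{1}{n-1} \tilde{x}_2^2 = \frac{(x_1 - x_2)^2}{2}$: a single squared term, divided by $n - 1 = 1$.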

In [9]:
%%julia
# Compute the variance of x by projection, and compare to the builtin function

norm(P_perp * x)^2 / (n-1), var(x)

(0.12673863453019224, 0.1267386345301923)

**Remark:** As always in linear algebra, concise algebraic formulae often do not lend themselves to efficient computation:<br>
$\qquad$ Rather than forming the $n \times n$ projection matrix, rewrite the formula as

$\qquad
P_{perp}\ x = \left(I - \frac{o\ o^t}{o^t o}\right) x = x - o\frac{o^t x}{o^t o} = x - \frac{ o \cdot x }{n} o
$
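
The rewritten formula can be sketched as follows, in plain Python with only the standard library. The sample values are illustrative assumptions; `statistics.variance` is used purely as an independent check of the $n-1$ normalization.

```python
import math
import statistics

x = [2.0, 4.0, 6.0, 8.0]           # illustrative sample values (assumed)
n = len(x)

# P_perp x = x - (o . x / n) o, computed without forming the n x n matrix
mu = sum(x) / n                    # o . x / n
x_mu = [xi - mu for xi in x]       # deviation from the mean

# Sample variance: ||P_perp x||^2 / (n - 1)
var_proj = sum(d * d for d in x_mu) / (n - 1)

print(x_mu)                        # [-3.0, -1.0, 1.0, 3.0]
print(sum(x_mu))                   # 0.0: the deviation is orthogonal to o
print(math.isclose(var_proj, statistics.variance(x)))  # True
```

This costs $O(n)$ operations and storage, compared with $O(n^2)$ when the projection matrix is built explicitly.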

## 2.2 Covariance and Correlation

The **covariance** measures whether $y$ tends to be greater than its mean $\mu_y$ whenever $x$ is greater than its mean $\mu_x$.

$\qquad
\operatorname{Covar}(x,y) = \frac{1}{n-1}\sum_{k=1}^n (x_k - \mu_x) (y_k - \mu_y) = \frac{(P_{perp} x)^t (P_{perp} y)}{n-1} = \frac{x^t P_{perp} y}{n-1}
$

since $P_{perp}^t = P_{perp}$ and $P_{perp}^2 = P_{perp}$, so that $(P_{perp} x)^t (P_{perp} y) = x^t P_{perp}^t P_{perp}\ y = x^t P_{perp}\ y$.

In [10]:
%%julia

y = sin.( range(1,2π,length=n) ) .+ 0.2 * rand(n)

# We need to project only one of the sample vectors!
(P_perp * x)' * (P_perp * y ) / (n-1), (P_perp * x)' * y  / (n-1), cov(x,y)

(0.020700636047945094, 0.020700636047945097, 0.020700636047945097)

One problem with the covariance is that it is affected by the scaling of the variables and by the vector lengths.<br>
Another problem is that its units are the product of the units of $x$ and $y$ (the square of the units of $x$ when $y = x$).

To address these issues, it is frequently desirable to work with the **correlation**

$\qquad
\operatorname{Cor}(x,y) = \frac{\operatorname{Covar}(x,y)}{\sqrt{\operatorname{Var}(x)  \operatorname{Var}(y)}} = \frac{(P_{perp} x)^t (P_{perp} y)}{\Vert P_{perp} x \Vert \; \Vert P_{perp} y \Vert}
$
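
To illustrate the scale dependence noted above, here is a minimal sketch in plain Python (standard library only). The helper names `demean`, `dot`, `covar` and `cor` and the sample values are assumptions made for this example; they simply transcribe the two formulas.

```python
import math

def demean(v):
    """P_perp v: subtract the mean from every entry."""
    m = sum(v) / len(v)
    return [vi - m for vi in v]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def covar(x, y):
    return dot(demean(x), demean(y)) / (len(x) - 1)

def cor(x, y):
    xd, yd = demean(x), demean(y)
    return dot(xd, yd) / math.sqrt(dot(xd, xd) * dot(yd, yd))

x = [1.0, 2.0, 3.0, 4.0]              # illustrative values (assumed)
y = [2.0, 1.0, 4.0, 3.0]
x_scaled = [100.0 * xi for xi in x]   # same data in different units

print(covar(x, y), covar(x_scaled, y))   # 1.0 100.0
print(cor(x, y), cor(x_scaled, y))       # 0.6 0.6
```

Rescaling $x$ by a factor of 100 multiplies the covariance by 100 but leaves the correlation unchanged.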

In [11]:
%%julia

cor_xy = cor(x,y)
x̃ = P_perp*x
ỹ = P_perp*y
( x̃ / norm(x̃) )' * (ỹ / norm(ỹ)), cor_xy

(0.07728450481928514, 0.07728450481928509)

In [12]:
h=\
hv.Scatter( Main.x, "index", "value")*hv.Scatter( Main.y )
h.opts("Scatter", size=4).opts( "Overlay", width=500, title=f"Two Data Sets, correlation = {Main.round(Main.cor_xy, digits=3)}")

The correlation of the data is hard to judge in the plot above. A better idea is to plot $y$ versus $x$.<br>
Let's produce three data sets and compare them.

In [13]:
%%julia

n = 400
t = range(-7.,7.,length=n)
x =     sin.( .2*t ) + .4*rand(n) .+ 3
y = 0.8*cos.( .3*t ) + .6*rand(n)
z =-1.5*sin.( .4*t ) + .8*rand(n)
cor_xy=cor(x,y)
cor_xz=cor(x,z)
cor_yz=cor(y,z);

In [14]:
h=\
hv.Scatter(Main.x,"index","x").opts(title=f"Sample Vector x")+\
hv.Scatter(Main.y,"index","y").opts(title=f"Sample Vector y")+\
hv.Scatter(Main.z,"index","z").opts(title=f"Sample Vector z")
h #.opts(axiswise=True)

In [15]:
h=\
hv.Scatter((Main.x,Main.y),"x","y").opts(title=f"y versus x,  cor = {np.round(Main.cor_xy,3)}")+\
hv.Scatter((Main.x,Main.z),"x","z").opts(title=f"z versus x,  cor = {np.round(Main.cor_xz,3)}")+\
hv.Scatter((Main.y,Main.z),"y","z").opts(title=f"z versus y,  cor = {np.round(Main.cor_yz,3)}")
h.opts(axiswise=True)

**Remark:** $\operatorname{Cor}(x,y) = \cos \left( \angle \left( P_{perp}\ x, P_{perp}\ y \right) \right)$

For highly correlated data $x, y$, the points $\left( x_i, y_i \right)$ will lie close to a line, i.e.,<br>
$\qquad\qquad
P_{perp}\ y \approx \alpha\ P_{perp}\ x \quad \text{ for some value } \alpha.
$
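
The limiting cases of this remark can be sketched in plain Python (standard library only). The data and the helper names `demean`, `dot` and `cos_angle` are illustrative assumptions: when $y$ is an exact affine function of $x$, the demeaned vectors are parallel and the cosine is $\pm 1$; when they are orthogonal, it is $0$.

```python
import math

def demean(v):
    """P_perp v: subtract the mean from every entry."""
    m = sum(v) / len(v)
    return [vi - m for vi in v]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def cos_angle(x, y):
    """Cosine of the angle between P_perp x and P_perp y, i.e. Cor(x, y)."""
    xd, yd = demean(x), demean(y)
    return dot(xd, yd) / math.sqrt(dot(xd, xd) * dot(yd, yd))

x = [1.0, 2.0, 3.0, 4.0]                      # illustrative values (assumed)
y_line = [2.0 * xi + 1.0 for xi in x]         # P_perp y =  2.0 * P_perp x
y_anti = [-0.5 * xi + 3.0 for xi in x]        # P_perp y = -0.5 * P_perp x
y_orth = [1.0, -1.0, -1.0, 1.0]               # P_perp y orthogonal to P_perp x

print(round(cos_angle(x, y_line), 12))   # 1.0
print(round(cos_angle(x, y_anti), 12))   # -1.0
print(round(cos_angle(x, y_orth), 12))   # 0.0
```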

## 2.3 Covariance and Correlation Matrices

In the example above, we had several data sets for which we computed the covariances and correlations.<br>
The computations can be conveniently combined in matrix form.

Consider the matrix $X$ with columns formed from sample vectors. E.g., the previous example would have $X = \left( x \mid y \mid z \right)$.<br>
$X$ has size $n \times m$, where $n$ is the size of a sample vector, and $m$ is the number of sample vectors.

* Demeaning the columns of $X$ is accomplished by computing $A = P_{perp}\ X$
* The covariance of all pairs of columns of $X$ is given by the **Covariance matrix** $S = \frac{1}{n-1} A^t A = \frac{1}{n-1} X^t P_{perp}\ X$<br>
This matrix is symmetric. Entry $(i,j)$ for $i \ne j$ is the **covariance** of the sample vectors number $i$ and $j$.<br>
The diagonal entries $i = j$ are the **variances** of the sample vector number $i$.
* The **Correlation matrix** $C = \hat{A}^t \hat{A}$, where $\hat{A}$ is simply the matrix $A$ scaled so that each column has unit length,<br>  i.e., $\hat{A} = AD$, where $D$ is a diagonal matrix whose entries are the inverses of the lengths of the columns of $A$.<br>
Thus, the diagonal entries of $D$ are the inverse square roots of the diagonal entries of $A^t A$.<br>
The correlation matrix $C$ is related to the covariance matrix $S$ by $\; C = (n-1) D^t S D$.<br><br>
The factor $(n-1)$ can be absorbed into $D$ so that we can use the covariance matrix $S$ instead:<br>
$\;C = \mathscr{D}^t S \mathscr{D},\;$ where $\mathscr{D}$ is the diagonal matrix with $\mathscr{D}_{i,i} = \frac{1}{\sqrt{S_{i i}}}$. 
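
The steps in the list above can be sketched in plain Python (standard library only) on a small illustrative data set. The three sample vectors and the names `cols`, `A`, `S`, `d` and `C` are assumptions for this example; each vector is stored as a list, so `A[i]` plays the role of column $i$ of $A$.

```python
import math

# Columns of X: three sample vectors of length n = 4 (illustrative, assumed)
cols = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 1.0, 4.0, 3.0],
    [4.0, 3.0, 2.0, 1.0],
]
n, m = len(cols[0]), len(cols)

def demean(v):
    mean = sum(v) / len(v)
    return [vi - mean for vi in v]

A = [demean(c) for c in cols]                       # columns of A = P_perp X

# Covariance matrix S = A^t A / (n - 1)
S = [[sum(a * b for a, b in zip(A[i], A[j])) / (n - 1) for j in range(m)]
     for i in range(m)]

# Correlation matrix C = D S D with D_ii = 1 / sqrt(S_ii)
d = [1.0 / math.sqrt(S[i][i]) for i in range(m)]
C = [[d[i] * S[i][j] * d[j] for j in range(m)] for i in range(m)]

for row in C:
    print([round(c, 3) for c in row])
# [1.0, 0.6, -1.0]
# [0.6, 1.0, -0.6]
# [-1.0, -0.6, 1.0]
```

The third vector is the first one reversed, i.e., an exact decreasing affine function of it, which is why their correlation is $-1$.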

In [16]:
%%julia
X = [ x y z ]
A = [ (x .- mean(x)) (y .- mean(y)) (z .- mean(z)) ]  # numerically more efficient than A = P_perp * X
S = 1/(length(x)-1) * A'A
println("The covariance matrix S =")
round.(S, digits=3)

The covariance matrix S =

array([[ 0.463,  0.002, -0.697],
       [ 0.002,  0.182, -0.002],
       [-0.697, -0.002,  1.324]])

In [17]:
%%julia
println("Using the builtin function cov(X), we get the same result:")
round.(cov(X), digits=3)


Using the builtin function cov(X), we get the same result:

array([[ 0.463,  0.002, -0.697],
       [ 0.002,  0.182, -0.002],
       [-0.697, -0.002,  1.324]])

In [18]:
%%julia
# Check
@show S ≈ cov(X);


S ≈ cov(X) = true


In [19]:
%%julia
D = Diagonal( [ 1/norm(A[:,i]) for i in 1:size(A,2)] )
Â = A * D
println("The correlation matrix is C=")
C = Â' * Â
round.(C, digits= 3)

The correlation matrix is C=

array([[ 1.   ,  0.008, -0.89 ],
       [ 0.008,  1.   , -0.004],
       [-0.89 , -0.004,  1.   ]])

In [20]:
%%julia
println("Using the builtin function cor(X), we get the same result:")
round.( cor(X), digits=3)


Using the builtin function cor(X), we get the same result:

array([[ 1.   ,  0.008, -0.89 ],
       [ 0.008,  1.   , -0.004],
       [-0.89 , -0.004,  1.   ]])

In [21]:
%%julia
println( "Compute the correlation matrix from the covariance matrix:")

Ds = Diagonal(1 ./ sqrt.(diag(S)))
round.( Ds'*S*Ds, digits = 3)


Compute the correlation matrix from the covariance matrix:

array([[ 1.   ,  0.008, -0.89 ],
       [ 0.008,  1.   , -0.004],
       [-0.89 , -0.004,  1.   ]])

In [22]:
%%julia
# Check
@show (Ds'*S*Ds) ≈ cor(X);


Ds' * S * Ds ≈ cor(X) = true