### Current solution

* Measurements $(x_i,y_i,z_i)$ for $i=1:m$ where $m$ is the number of arrays.
* $M$ is the d top eigenvectors from PCA
* $s$ is the shift vector that is 3m dimensional. Mean of the measurements across time.

$$\vec{r} = M (\vec{U} - \vec{s})$$

* Some coordinates of U are ?. We plug in the mean $s$

* Suppose we have a source that does not move, but some of arrays do not detect some of the time. For detecting arrays we have the measured values (x, y and z) and for the rest, we have the mean imputed in the place of (x, y and z).

* Imputed values are not good enough.

* Two sets of unknowns: $\vec{r}$ and the unkown coordinates of $\vec{U}$

* Set unknowns to be the imputed value to obtain $\vec{r}$. 
* Then, keeping $\vec{r}$ fixed, we want to find the missing values of $\vec{U}$.

* After we have an estimate for $\vec{r}$, we want to find the unknown coordinates of $\vec{U}$. that would give you this $\vec{r}$. (Many solutions). We use sum of the squares of the unknown coordinates of U as a regularizer.

How different is r from the current gu
Say we have already subtracted s.  

$\vec{V} = \vec{U} - \vec{s}$



$\|\vec{r} - M \vec{V}\|_{2}^{2} + \|V\|_{2}^{2}$  



* Least mean squared / Ridge regression / Quadratic minimization problem

### Setup

* Measurements $(x_i,y_i,z_i)$ for $i=1:m$ where $m$ is the number of microphones. For some of the microphone there might not be measurements, which we indicate by $x_i=y_1=z_i=?$
* Redundancy: $x_i^2+y_i^2+z_i^2 = 1$ thus there are only two degrees of freedom. A simple reduction to two is $ x \to x/z$, $y \to y/z$
* Over-determined system: We choose $d=2,3,4$ the number of PCA coordinates to use. 

We want to solve for two sets of variables: the coefficients in front of the $d$ vectors, and the values corresponding to unknown entries in $x$. Assuming we reduced from 3 to 2 coordintes per array, lets indicate the missing coordinates by $u_i,v_i$ and indicate the combined vector by $\vec{U} = \{(x_1,y_1),\ldots,(x_k,y_k),(u_{k+1},v_{k+1}),\ldots,(u_m,v_m)\}$

We want an affine transformation which can be described by a matrix and a shift vector.

$$\vec{r} = \vec{s} + M \vec{U}$$

Here $M$ and $\vec{s}$ are known. The unknowns are the last $m-k$ coordinates of $U$ and the vector $\vec{r}$

### Solving
As this system is not linear, we can't solve in a single step, instead, we solve using two-step iterations.

0. **init** set the unknown coordinates $(u_i,v_i)$ to their mean value.
1. **update $\vec{r}$** using the fully defined $\vec{U}$
2. **update unknown measurements** update the unknown coordinates of $\vec{U}$ according to the current value of $\vec{R}.

Steps 1,2 are repeated until convergence to some tolerance $\epsilon$.


U: 12 D vector with x and y.
M: d x 12
s: mean of vectors in PCA space
r: 

U is 12x1.
So M should have 12 columns and 2 (d) rows.


In [1]:
import numpy as np
import pickle
import sys
from numpy import linalg as LA
from sklearn.cluster import KMeans
sys.path.append('/home/ardelalegre/SoundMapping/Analysis/Util')
from get_time_interval_matrix_data import get_time_interval_matrix_data
from PCA import get_cdata
from PCA import get_eigen_vectors
from PCA import project_to_eigen_vectors
from Plot import plot_data
import matplotlib.pyplot as plt

In [2]:
training_data = get_time_interval_matrix_data('Sep 29 2020 11:00AM', 'Sep 30 2020 05:00PM')

In [None]:
+

In [3]:
data = training_data[:,1:]

In [39]:
data[0,0]

nan

### Removing redundancies is not easy. In this notebook, we proceed without doing it.

In [40]:
data_mean = np.nanmean(data,axis = 0,keepdims = True)

tmp = data - data_mean

cdata=np.nan_to_num(tmp)

dimensions = cdata.shape[1]
n=cdata.shape[0]
block_size=10000

# calculate covariance matrix
outters = np.zeros((dimensions, dimensions))
for j in range(n):
    outters += np.outer(cdata[j,:], cdata[j,:])

_cov = outters/n

#eigen values
eigen_values, eigen_vectors = LA.eig(_cov)

  """Entry point for launching an IPython kernel.


### Aug 19 eigvecs

In [42]:
data_0819 = pickle.load(open('/home/ardelalegre/CSE4223-ODAS/preprocessing/python/aug 19/exp_08_19_better_data.p', 'rb'))
ind = [18]+[i for i in range(18)]
data_0819=data_0819[:,ind] 
cdata_0819 = get_cdata(data_0819)
eigen_values_0819, eigen_vectors_0819 = get_eigen_vectors(data_0819) 

1. Get EigVecs corresponding to largest Eigenvalues (18 D vectors)

In [43]:
eig_val_sorted_indices = np.argsort(eigen_values)
eig_val_sorted_indices = eig_val_sorted_indices[-1::-1]
sorted_eigvec = eigen_vectors[:,eig_val_sorted_indices]

In [44]:
sorted_eigvec.shape

(18, 18)

In [49]:
first_two_eig_vecs = sorted_eigvec[:2,:]

In [50]:
first_two_eig_vecs

array([[ 0.0440695 , -0.12640601, -0.06115034, -0.04309325,  0.06564219,
        -0.01143408,  0.06461866,  0.06313246,  0.00161783, -0.47736868,
         0.05929697,  0.01161443, -0.1473491 , -0.13689741,  0.83180222,
         0.        ,  0.        ,  0.        ],
       [ 0.18387933,  0.19523205,  0.34953642,  0.48597118, -0.11975702,
         0.07844055, -0.17280307, -0.1894296 ,  0.07939067,  0.53058225,
        -0.11940499, -0.01470404, -0.00466373, -0.15745007,  0.39545361,
         0.        ,  0.        ,  0.        ]])

In [53]:
M = first_two_eig_vecs
print(M.shape, ": shape of M")

(2, 18) : shape of M


### SVDImpute

SVDImpute is a method for learning the matrix $M$ from a large set of vectors $\bf{U}$, which contains measurement vector where some of the coordinates might be missing.
Here we iteration between:

1. Estimating $\vec{s}$ and $\vec{M}$
2. Estimating $\vec{r}$ and the unknown components of the vectors in $\vec{U}$