### Setup

* Measurements $(x_i,y_i,z_i)$ for $i=1,\ldots,m$, where $m$ is the number of microphones. Some microphones may have no measurement, which we indicate by $x_i=y_i=z_i=?$
* Redundancy: $x_i^2+y_i^2+z_i^2 = 1$, so there are only two degrees of freedom. A simple reduction to two coordinates is $x \to x/z$, $y \to y/z$.
* Over-determined system: we choose $d \in \{2,3,4\}$, the number of PCA coordinates to use.
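A minimal sketch of the two-coordinate reduction from the second bullet, together with its inverse. The inverse assumes $z > 0$ (the map $x/z, y/z$ loses the sign of $z$ and is undefined at $z = 0$), and missing measurements ("?") are assumed to be encoded as NaN so they simply propagate. The function names are hypothetical.

```python
import numpy as np


def reduce_direction(xyz):
    """Map unit vectors (x, y, z) to the two coordinates (x/z, y/z)."""
    xyz = np.asarray(xyz, dtype=float)
    # Divide the first two components by z; NaN inputs stay NaN.
    return xyz[..., :2] / xyz[..., 2:3]


def expand_direction(ab):
    """Inverse map, assuming z > 0: (a, b) -> (a, b, 1) / sqrt(a^2 + b^2 + 1)."""
    ab = np.asarray(ab, dtype=float)
    # Append z = 1 and renormalise to restore x^2 + y^2 + z^2 = 1.
    v = np.concatenate([ab, np.ones(ab.shape[:-1] + (1,))], axis=-1)
    return v / np.linalg.norm(v, axis=-1, keepdims=True)
```

For example, `reduce_direction([[0.6, 0.0, 0.8]])` gives `[[0.75, 0.0]]`, and `expand_direction` maps it back to the original unit vector.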

We want to solve for two sets of variables: the coefficients in front of the $d$ PCA vectors, and the values of the unknown measurement entries. Assuming we have reduced from 3 to 2 coordinates per array, and ordering the microphones so that the first $k$ have measurements, let's denote the missing coordinates by $u_i,v_i$ and the combined vector by $\vec{U} = \{(x_1,y_1),\ldots,(x_k,y_k),(u_{k+1},v_{k+1}),\ldots,(u_m,v_m)\}$.

We want an affine transformation which can be described by a matrix and a shift vector.

$$\vec{r} = \vec{s} + M \vec{U}$$

Here $M$ and $\vec{s}$ are known. The unknowns are the last $m-k$ coordinate pairs $(u_i,v_i)$ of $\vec{U}$ and the vector $\vec{r}$.

### Solving
Because this system is not linear, we cannot solve it in a single step; instead, we use a two-step iteration.

0. **init**: set the unknown coordinates $(u_i,v_i)$ to their mean values.
1. **update $\vec{r}$**: compute $\vec{r}$ from the fully defined $\vec{U}$.
2. **update unknown measurements**: update the unknown coordinates of $\vec{U}$ according to the current value of $\vec{r}$.

Steps 1 and 2 are repeated until convergence to within a tolerance $\epsilon$.
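A minimal sketch of the iteration above for a single measurement vector. It rests on assumptions not stated in the text: $M$ (shape $d \times 2m$) has orthonormal rows, such as the $d$ leading PCA eigenvectors, and $\vec{s} = -M\mu$ for a mean vector $\mu$, so that $\vec{r} = \vec{s} + M\vec{U} = M(\vec{U}-\mu)$ and step 2 can reconstruct $\vec{U} \approx \mu + M^T\vec{r}$. Missing coordinates are encoded as NaN; `impute_vector`, `eps` and `max_iter` are hypothetical names.

```python
import numpy as np


def impute_vector(U, M, mu, eps=1e-10, max_iter=1000):
    """Alternate between r and the unknown coordinates of one measurement vector U.

    Hypothetical sketch: assumes M (d x 2m) has orthonormal rows and that
    s = -M @ mu, so r = s + M @ U = M @ (U - mu). Missing entries are NaN.
    """
    U = np.array(U, dtype=float)
    missing = np.isnan(U)
    s = -M @ mu
    # Step 0: initialise the unknown coordinates to their mean values.
    U[missing] = mu[missing]
    for _ in range(max_iter):
        # Step 1: update r from the fully defined U.
        r = s + M @ U
        # Step 2: reconstruct U from r and overwrite only the unknown entries.
        U_hat = mu + M.T @ r
        change = np.max(np.abs(U_hat[missing] - U[missing]), initial=0.0)
        U[missing] = U_hat[missing]
        # Stop once the unknown coordinates move by less than eps.
        if change < eps:
            break
    return U, s + M @ U
```

With the six arrays used below, `U` would have length 12 and `M` shape `(d, 12)`. Only the unknown entries are ever overwritten, so the measured coordinates are kept exactly.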


In [21]:
import pickle
import sys
import numpy as np
from numpy import linalg as LA
from sklearn.cluster import KMeans
sys.path.append('/home/ardelalegre/SoundMapping/Analysis/Util')
from get_time_interval_matrix_data import get_time_interval_matrix_data
from PCA import get_cdata
from PCA import get_eigen_vectors
from PCA import project_to_eigen_vectors
from Plot import plot_data
import matplotlib.pyplot as plt
%pylab inline

Populating the interactive namespace from numpy and matplotlib


In [2]:
training_data = get_time_interval_matrix_data('Sep 29 2020 11:00AM', 'Sep 30 2020 05:00PM')

## Calculate M

In [18]:
# Reduce each array's (x, y, z) triple to (x/z, y/z).
# Array k occupies columns 3k+1, 3k+2, 3k+3 of training_data (column 0 is not used).
n_arrays = 6
data = np.zeros((training_data.shape[0], 2 * n_arrays))
for k in range(n_arrays):
    x = training_data[:, 3 * k + 1]
    y = training_data[:, 3 * k + 2]
    z = training_data[:, 3 * k + 3]
    data[:, 2 * k] = x / z
    data[:, 2 * k + 1] = y / z

  


In [50]:
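# Centre each coordinate on its mean over the available (non-NaN) samples.
data_mean = np.nanmean(data, axis=0, keepdims=True)
tmp = data - data_mean
# Replace NaN with 0 (the column mean after centring). Note that nan_to_num
# also replaces +-inf with very large finite numbers.
cdata = np.nan_to_num(tmp)

dimensions = cdata.shape[1]
n = cdata.shape[0]

# Covariance matrix as the average outer product of the centred samples.
outters = np.zeros((dimensions, dimensions))
for j in range(n):
    outters += np.outer(cdata[j, :], cdata[j, :])

_cov = outters / n

# Eigenvalues and eigenvectors of the covariance (the PCA directions).
eigen_values, eigen_vectors = LA.eig(_cov)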



LinAlgError: Array must not contain infs or NaNs

In [55]:
# array_sum = np.sum(cdata)
# Locate any NaN entries remaining in the centred data.
bool_nan = np.isnan(cdata)
a = np.where(bool_nan)

In [56]:
a

(array([], dtype=int64), array([], dtype=int64))

In [60]:
outters

array([[ 4.32747677e+05, -4.53661036e+05,             nan,
                    nan,             nan,             nan,
         1.52492743e+04, -7.44254614e+04,  0.00000000e+00,
         0.00000000e+00,  2.81964397e+04, -4.79030134e+04],
       [-4.53661036e+05,  1.85085710e+06,             nan,
                    nan,             nan,             nan,
         3.19715149e+05,  9.68439051e+04,  0.00000000e+00,
         0.00000000e+00,  1.51773211e+05,  6.34280597e+04],
       [            nan,             nan,             inf,
                    inf,            -inf,            -inf,
                    nan,             nan,  0.00000000e+00,
         0.00000000e+00,             nan,             nan],
       [            nan,             nan,             inf,
                    inf,            -inf,            -inf,
                    nan,             nan,  0.00000000e+00,
         0.00000000e+00,             nan,             nan],
       [            nan,             nan,           

In [58]:
_cov

array([[ 1.31096887e-01, -1.37432395e-01,             nan,
                    nan,             nan,             nan,
         4.61962592e-03, -2.25465018e-02,  0.00000000e+00,
         0.00000000e+00,  8.54184935e-03, -1.45117727e-02],
       [-1.37432395e-01,  5.60699956e-01,             nan,
                    nan,             nan,             nan,
         9.68547329e-02,  2.93379610e-02,  0.00000000e+00,
         0.00000000e+00,  4.59782837e-02,  1.92149412e-02],
       [            nan,             nan,             inf,
                    inf,            -inf,            -inf,
                    nan,             nan,  0.00000000e+00,
         0.00000000e+00,             nan,             nan],
       [            nan,             nan,             inf,
                    inf,            -inf,            -inf,
                    nan,             nan,  0.00000000e+00,
         0.00000000e+00,             nan,             nan],
       [            nan,             nan,           
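The outputs above show that `cdata` contains no NaNs while `outters` contains `inf` and `nan`. A likely explanation: a zero in one of the $z$ columns makes $x/z$ infinite; `np.nanmean` skips NaN but not `inf`, and `np.nan_to_num` replaces $\pm$`inf` with the largest finite float64 (about 1.8e308), whose outer products overflow. A minimal sketch of a fix, assuming non-finite ratios should be treated as missing measurements; `pca_with_missing` is a hypothetical helper, and it uses `np.linalg.eigh` because the covariance is symmetric.

```python
import numpy as np


def pca_with_missing(data, d):
    """Centre data (NaN = missing), zero-fill the gaps, return the top-d PCA basis."""
    data = np.array(data, dtype=float)
    # Ratios such as x/z become +-inf when z == 0; treat them as missing too.
    data[~np.isfinite(data)] = np.nan
    data_mean = np.nanmean(data, axis=0, keepdims=True)
    # Zero-filling after centring is equivalent to filling gaps with the column mean.
    cdata = np.nan_to_num(data - data_mean)
    # cdata.T @ cdata is the sum of the per-sample outer products.
    cov = cdata.T @ cdata / cdata.shape[0]
    # eigh is meant for symmetric matrices; it returns eigenvalues in ascending order.
    eigen_values, eigen_vectors = np.linalg.eigh(cov)
    order = np.argsort(eigen_values)[::-1][:d]
    return data_mean, eigen_values[order], eigen_vectors[:, order]
```

Applied to this notebook, `data_mean, eigen_values, eigen_vectors = pca_with_missing(data, d)` would give the $d$ leading PCA vectors as the columns of `eigen_vectors`.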

### SVDImpute

SVDImpute is a method for learning the matrix $M$ from a large set of vectors $\bf{U}$, each a measurement vector in which some coordinates may be missing.
Here we iterate between:

1. Estimating $\vec{s}$ and $M$
2. Estimating $\vec{r}$ and the unknown components of the vectors in $\bf{U}$
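A minimal sketch of this alternation on a matrix whose rows are the vectors $\vec{U}$. It is a simplified variant (refit a rank-$d$ PCA, then refill the gaps), not necessarily identical to the original SVDImpute algorithm. It assumes missing coordinates are NaN, every column has at least one observation, and the convention $\vec{r} = M(\vec{U}-\mu)$, so the shift is $\vec{s} = -M\mu$. `svd_impute`, `n_iter` and `tol` are hypothetical names.

```python
import numpy as np


def svd_impute(X, d, n_iter=500, tol=1e-9):
    """Fill NaNs in X (n samples x p coordinates) using a rank-d PCA model.

    Returns the completed matrix, the mean vector mu and M (d x p, orthonormal rows).
    """
    X = np.array(X, dtype=float)
    missing = np.isnan(X)
    # Initialise each missing entry with the mean of its column's observed values.
    X = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        # Step 1: estimate the mean (hence s = -M @ mu) and M from the current fill.
        mu = X.mean(axis=0)
        _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
        M = Vt[:d]
        # Step 2: compute r for every row, reconstruct, and refill only the gaps.
        R = (X - mu) @ M.T
        X_hat = mu + R @ M
        change = np.max(np.abs(X_hat[missing] - X[missing]), initial=0.0)
        X[missing] = X_hat[missing]
        if change < tol:
            break
    return X, mu, M
```

The observed entries are never modified, so only the gaps change between iterations; the returned `M` can then be used with the per-vector iteration from the Solving section.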