In [None]:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm

from global_variables import DIM
from plotting_tools import add_plot_decoration, generate_labels, plot_noise
from simulation import read_results, read_params

%matplotlib inline
%reload_ext autoreload
%autoreload 2

# Noise analysis

In order to analyse the algorithms' robustness to noise, we have to know the scale of the measurements.
The plot below shows that, in the current setup, the distances have a multimodal distribution that looks like a mixture of Gaussians (with the difference that it is only positive, or that the bigger Gaussian is skewed). The distances vary between $0.1$ and $25$.

For analysis of real distances, see below.

In [None]:
key = 'get_distances'
save_figures = True

resultfolder = 'results/{}/'.format(key)
results = read_results(resultfolder + 'result_')
parameters = read_params(resultfolder + 'parameters.json')

In [None]:
from pylab import rcParams
rcParams['figure.figsize'] = 10, 5

# Histogram on a linear scale; keep the bin edges to build matching log-spaced bins below.
counts, bins, _ = plt.hist(np.sqrt(results['distances']), bins=100)
plt.title("simulated distance distribution")
if save_figures:
    plt.savefig(resultfolder + "simulated_distances.pdf", bbox_inches="tight")
plt.show()
logbins = np.logspace(np.log10(bins[0]), np.log10(bins[-1]), len(bins))
plt.hist(np.sqrt(results['distances']), bins=logbins)
plt.xscale('log')
plt.title("simulated distance distribution (log scale)")
if save_figures:
    plt.savefig(resultfolder + "simulated_distances_log.pdf", bbox_inches="tight")
plt.show()

## Real distance distribution

In [None]:
from evaluate_dataset import read_anchors_df, read_dataset

names = ['circle2_double.csv',
         'circle3_triple.csv',
         'clover.csv',
         'eight2_double.csv',
         'rounds.csv',
         'straight1.csv',
         'straight2.csv',
         'straight3.csv',
         'straight4.csv',
         'straight5.csv',
         'straight6.csv',
         'triangle_double.csv']

anchorsfile = 'experiments/anchors.csv'
anchors_df = read_anchors_df(anchorsfile)
anchors = anchors_df[["px", "py", "pz"]].values.T

noisy_distances = []

for name in names:
    datafile = 'experiments/robot_test/' + name
    data_df = read_dataset(datafile, anchors_df)
    noisy_distances.extend(data_df[data_df.system_id=="RTT"]["distance"].values.tolist())
np.savetxt(resultfolder + 'noisy_distances.csv', np.array(noisy_distances))

In [None]:
import pickle
from trajectory import Trajectory
from measurements import get_D_topright

MM = 0.001  # conversion factor from millimetres to metres

with open('controls/robot_trajectores.pkl', 'rb') as f:
    trajectories = pickle.load(f)

distances = []
n_samples = 400
    
for traj in trajectories:
    trajectory = Trajectory(**traj['parameters'])
    # Don't stretch trajectories that have hand-designed coefficients
    trajectory.scale_bounding_box(traj['box'] * MM, keep_aspect_ratio='coeffs' in traj['parameters'])
    trajectory.center()
    basis = trajectory.get_basis(n_samples=n_samples)
    positions = trajectory.get_sampling_points(basis)
    positions = np.concatenate([positions, np.zeros((1, n_samples))])
    distances_squared = get_D_topright(anchors=anchors, samples=positions)
    distances.extend(np.sqrt(distances_squared.flatten()).tolist())

In [None]:
noisy_distances = np.loadtxt(resultfolder + "noisy_distances.csv")
bins = np.linspace(0.5, 20, 100)
plt.hist(distances, bins=bins, label="true distances", alpha=0.7)
plt.hist(noisy_distances, bins=bins, label="noisy distances", alpha=0.7)
plt.title("distance distribution")
plt.legend()
if save_figures:
    plt.savefig(resultfolder + "distances.pdf", bbox_inches="tight")
plt.show()
logbins = np.logspace(np.log10(bins[0]), np.log10(bins[-1]), len(bins))
plt.hist(distances, bins=logbins, label="true distances", alpha=0.7)
plt.hist(noisy_distances, bins=logbins, label="noisy distances", alpha=0.7)
plt.xscale('log')
plt.title("distance distribution (log scale)")
plt.legend()
if save_figures:
    plt.savefig(resultfolder + "distances_log.pdf", bbox_inches="tight")
plt.show()

Above we can see the real distance distribution. Note that the true distances are not really the true distances, but a simulation where the trajectory is in the middle of the room. I think that is enough for our purposes. It might be good to calibrate the noisy distances before plotting them here (as that is what we will be using for reconstruction).

We can see in the plots above that the noisy distances (without calibration) range between $1$ and $20$ meters, where $20$ meters is clearly an outlier. The shape of the distribution seems to be unimodal (which is good, in general), and matches the shape of the bigger of the simulated modes. The true distances fit nicely between $1$ and $8$, and are (more or less) unimodal. The log plot looks shifted, which would suggest that the noisy distances are multiplied by some factor (again, maybe I should calibrate the distances before plotting).

It seems to me that the simulated distribution is more difficult to work with than the real one (which suggests that if the algorithms below work for simulations, they should also work for real data). The only issue is that we don't know the noise distribution of the real data.
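
As a rough illustration of why a shift on the log axis points to a multiplicative factor: multiplying distances by a constant $s$ adds $\log s$ to their logarithms. The sketch below uses purely synthetic data; the scale `1.3`, the uniform range and the helper `estimate_scale` are my assumptions for illustration, not values or functions from the experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumptions): "true" distances and a scaled, slightly noisy copy.
true_scale = 1.3
true_d = rng.uniform(1.0, 8.0, size=5000)
noisy_d = true_scale * true_d * np.exp(rng.normal(0.0, 0.05, size=true_d.size))


def estimate_scale(reference, measured):
    """Estimate a multiplicative factor from the shift of the medians in log space."""
    shift = np.median(np.log(measured)) - np.median(np.log(reference))
    return float(np.exp(shift))


estimated_scale = estimate_scale(true_d, noisy_d)
print(estimated_scale)
```

This only compares two distributions, so it can hint at a global factor but is not a substitute for a proper calibration.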

## Noise added to distances, basic right inverse
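In this case, the noise is added to the measurements, which leads to adding a term proportional to `distance x noise` to the squared distances (assuming that we can neglect $\sigma^2$). If we use a version of OLS (right inverse), we implicitly assume the noise on the squared distances to be Gaussian with the same variance for every measurement, which is not the case here. This might lead to bigger errors.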
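
To make the approximation above explicit, here is a short expansion. The symbols $d$ (true distance) and $n$ (zero-mean noise with standard deviation $\sigma$) are my own notation, not taken from the simulation code.

$$(d + n)^2 = d^2 + 2dn + n^2$$

The cross term $2dn$ has standard deviation $2d\sigma$, while $n^2$ has mean $\sigma^2$. For $\sigma \ll d$ the cross term dominates, so the error on the squared distance is roughly proportional to the distance (with a factor of $2$), and its variance differs from one measurement to the next.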

Below, we can see reconstruction errors for different numbers of available measurements and different noise $\sigma$s. For large noise, $\sigma = 100$, the error for a small number of measurements is larger than the noise, but with oversampling it drops below $10$. It seems that $10\times$ more samples lead to a $10$-fold decrease in error.

Notably, the relative error and absolute error have similar values, with the relative error having higher variance. The reported errors are on the **coefficients**, not the distances. **TODO distance errors?**
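
For reference, one common way to define absolute and relative errors on a coefficient matrix is sketched below. These definitions (Frobenius norm of the difference, optionally divided by the norm of the true coefficients) are an assumption on my part; `plot_noise` may compute them differently.

```python
import numpy as np


def absolute_error(coeffs_true, coeffs_est):
    """Frobenius norm of the difference between estimated and true coefficients."""
    return float(np.linalg.norm(coeffs_est - coeffs_true))


def relative_error(coeffs_true, coeffs_est):
    """Absolute error divided by the Frobenius norm of the true coefficients."""
    return absolute_error(coeffs_true, coeffs_est) / float(np.linalg.norm(coeffs_true))


# Toy example with made-up numbers.
coeffs_true = np.array([[1.0, 2.0], [3.0, 4.0]])
coeffs_est = coeffs_true + 0.1
abs_err = absolute_error(coeffs_true, coeffs_est)
rel_err = relative_error(coeffs_true, coeffs_est)
print(abs_err, rel_err)
```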

In [None]:
rcParams['figure.figsize'] = 10, 5
plot_noise('noise_right_inverse', save_figures=True, noise_index=7)

## Noise added to distances, weighted right inverse
In this case, the noise is added to the measurements, which leads to adding a term proportional to `distance x noise` to the squared distances (assuming we can neglect $\sigma^2$). We can try to normalize the error by dividing by the (noisy) distance. I divide by the noisy distance $+10^{-3}$, to avoid dividing by really small distances.

Below, we can see reconstruction errors for different numbers of available measurements and different noise $\sigma$s. For large noise, $\sigma = 100$, the error for a small number of measurements is larger than the noise $10^{3}$, but with oversampling it drops below $1$. It seems that $10\times$ more samples lead to an almost $100$-fold decrease in error, **better than OLS**.
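
A minimal sketch of this weighting on a generic linear model, not the actual right-inverse code used for the plots: each row of a hypothetical system `A x = b` is divided by the noisy distance plus $10^{-3}$ before solving in the least-squares sense. The matrix `A`, the sizes, the distance range and the noise level are all made up for illustration.

```python
import numpy as np


def weighted_lstsq(A, b, noisy_distances, eps=1e-3):
    """Least squares with each row scaled by 1 / (noisy distance + eps)."""
    w = 1.0 / (np.asarray(noisy_distances) + eps)
    x, *_ = np.linalg.lstsq(A * w[:, None], b * w, rcond=None)
    return x


rng = np.random.default_rng(1)
n_meas, n_coeffs = 2000, 5               # hypothetical problem size
A = rng.normal(size=(n_meas, n_coeffs))  # hypothetical linear model
x_true = rng.normal(size=n_coeffs)

d = rng.uniform(1.0, 25.0, size=n_meas)     # assumed distance range
noise = rng.normal(0.0, 0.05, size=n_meas)  # assumed noise on the distances
b = A @ x_true + 2 * d * noise              # error term proportional to distance
d_noisy = d + noise

x_ols, *_ = np.linalg.lstsq(A, b, rcond=None)
x_wls = weighted_lstsq(A, b, d_noisy)

err_ols = float(np.linalg.norm(x_ols - x_true))
err_wls = float(np.linalg.norm(x_wls - x_true))
print(err_ols, err_wls)
```

After the division, the error on each row is roughly $2 \times$ noise regardless of the distance, which is closer to what ordinary least squares assumes.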

In [None]:
plot_noise('noise_right_inverse_weighted', save_figures=True, noise_index=7)

## Noise added to squared distances
In this case the model matches the OLS assumptions, so there is no need to use WLS. Here we see, however, strange behaviour for a small number of measurements and large noise. Note that additive noise with $\sigma = 100$ is rather large, so we don't need to worry about it. For the other noise variances, the error decreases roughly linearly with the number of measurements, as expected.
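
To illustrate the expected scaling on a toy problem (again a generic linear model with made-up sizes, not the simulation used for the plot), the sketch below adds noise of constant variance to the right-hand side and reports the squared coefficient error averaged over several trials. Under these assumptions the mean squared error should fall roughly in proportion to the number of measurements.

```python
import numpy as np


def mean_squared_coeff_error(n_meas, sigma, n_coeffs=5, n_trials=50, seed=0):
    """Average squared OLS coefficient error over random linear problems."""
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(n_trials):
        A = rng.normal(size=(n_meas, n_coeffs))                # hypothetical model
        x_true = rng.normal(size=n_coeffs)
        b = A @ x_true + rng.normal(0.0, sigma, size=n_meas)   # constant-variance noise
        x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
        errors.append(np.sum((x_hat - x_true) ** 2))
    return float(np.mean(errors))


n_values = [100, 1000, 10000]
mse = [mean_squared_coeff_error(n, sigma=1.0) for n in n_values]
for n, e in zip(n_values, mse):
    print(n, e)
```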

In [None]:
plot_noise('noise_to_square_right_inverse', save_figures=True, noise_index=6)