### The goal is to approximate a 2D interpolation function by using 1D interpolation functions

In [60]:
import sys
sys.path.insert(0,'..')
from collections import namedtuple
from typing import List

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d
from algorithms import create_interp_1d_funcs, create_interp_2d_funcs, sigma_f1d, sigma_f2d
from data_utils import create_1d_data, create_2d_data
%matplotlib notebook


## Test approximate vs. 2d interpolation

In [231]:
def f(x):
    return np.array([datum[0]*np.sin(datum[1]) + datum[1] + datum[2] for datum in x]).reshape(-1, 1)

# ----creating training data----
primary_cut_center = [0, 0, 0]
x_1d = create_1d_data(x_range=(-8, 8, 1), cut_center=primary_cut_center)
x_2d = create_2d_data(x1_range=(-8, 8, 1), x2_range=(-8, 8, 1), cut_center=primary_cut_center)
y_1d = f(x_1d)
y_2d = f(x_2d)

# ----creating the interpolation functions----
f0 = f(np.array([primary_cut_center]))
f_1d = create_interp_1d_funcs(x_1d, y_1d, primary_cut_center)
f_2d = create_interp_2d_funcs(x_2d, y_2d, primary_cut_center)

# ----Testing----
test_datum = [1.45, 2.2823, 0.2782]

In [232]:
# ----the negative part of sigma_2d for f01----
neg_part = -(f_1d[0](test_datum[0]) - f0) - (f_1d[1](test_datum[1]) - f0) - f0
print(f'neg_part of sigma_2d for f01: {neg_part}')

neg_part of sigma_2d for f01: [[-2.2823]]


In [233]:
# ----testing the first 2d interpolated function --> f01----

f01_interpolated = f_2d[0][0](test_datum[0], test_datum[1])
print(f'f01_interpolated: {f01_interpolated}')

f01_interpolated: [3.37773063]


In [236]:
# ----approximating f01----
secondary_cut_center = [1, 1, primary_cut_center[2]]  # or a different cut center altogether? TESTED: z has to be the same as in the primary cut center
secondary_x_1d = create_1d_data(x_range=(-8, 8, 1), cut_center=secondary_cut_center)
secondary_y_1d = f(secondary_x_1d)
secondary_f_1d = create_interp_1d_funcs(secondary_x_1d, secondary_y_1d, secondary_cut_center)
secondary_f0 = f(np.array([secondary_cut_center]))

f01_approximated = secondary_f0 + (secondary_f_1d[0](test_datum[0]) - secondary_f0) + (secondary_f_1d[1](test_datum[1]) - secondary_f0) 
print(f'f01_approximated: {f01_approximated}')

f01_approximated: [[3.42484694]]


Surprisingly, changing the **x and y** coordinates of the **secondary_cut_center** does **NOT** change **f01_approximated**. This is the same as saying:

$$F_{01}(x, y, \hat{z}) = F(\hat{x}, \hat{y}, \hat{z}) + [F(x, \hat{y}, \hat{z}) - F(\hat{x}, \hat{y}, \hat{z})] +  [F(\hat{x}, y, \hat{z}) - F(\hat{x}, \hat{y}, \hat{z})]$$ 
is the same as:
$$F_{01}(x, y, \hat{z}) = F(\bar{x}, \bar{y}, \hat{z}) + [F(x, \bar{y}, \hat{z}) - F(\bar{x}, \bar{y}, \hat{z})] +  [F(\bar{x}, y, \hat{z}) - F(\bar{x}, \bar{y}, \hat{z})]$$

where the primary cut center is $(\hat{x}, \hat{y}, \hat{z})$ and the secondary cut center is $(\bar{x}, \bar{y}, \hat{z})$.

**It turns out this is only the case if the x and y variables are decoupled in F, i.e. F is a sum of a term that depends only on x and a term that depends only on y (for example, F linear in both).**
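A quick numerical check of this statement, as a minimal sketch. It evaluates the cut expansion with exact function values instead of interpolants, and `separable` is a hypothetical function introduced only for comparison; `coupled` is the test function used in the cells above.

```python
import numpy as np

def first_order_cut(F, point, center):
    """F(c) + [F(x, c_y, c_z) - F(c)] + [F(c_x, y, c_z) - F(c)] on the plane z = c_z."""
    x, y, _ = point
    cx, cy, cz = center
    f0 = F(cx, cy, cz)
    return f0 + (F(x, cy, cz) - f0) + (F(cx, y, cz) - f0)

def coupled(x, y, z):
    # Test function from the notebook: x and y are coupled through x*sin(y).
    return x * np.sin(y) + y + z

def separable(x, y, z):
    # Hypothetical additive function: g(x) + h(y) + z, no cross term.
    return np.sin(x) + y**2 + z

point = (1.45, 2.2823, 0.0)
centers = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (-2.0, 3.0, 0.0)]

coupled_vals = [first_order_cut(coupled, point, c) for c in centers]
separable_vals = [first_order_cut(separable, point, c) for c in centers]

print('coupled:  ', coupled_vals)    # values differ from center to center
print('separable:', separable_vals)  # same value for every center
```

For the additive function the cut-center terms cancel exactly, so every center gives back `separable(x, y, z_hat)`. For the coupled function the result depends on where the cut center sits.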

# Automation
Now we try to automate the approximation part

The most important task seems to be the organization of the data that is used to create the 1D interpolation functions.
I am going to think of this task as follows:
1. There are two cut centers, CC1 and CC2
2. Each CC has n dimensions
3. We need data that lies on each of the axes of the two cut centers (imagine each cut center as the origin of a coordinate system)
4. Hence we need 2*n sets of points
5. For an original 1D interpolation function such as $f(x,\hat{y},\hat{z})$ (e.g. n=3, CC1=$(\hat{x},\hat{y},\hat{z})$) we need a set of three-dimensional points whose first dimension varies around $\hat{x}$, whose second dimension is set to $\hat{y}$ and whose third dimension is set to $\hat{z}$
6. And for an approximation 1D interpolation function such as $f(x,\bar{y},\hat{z})$ (e.g. n=3, CC1=$(\hat{x},\hat{y},\hat{z})$ and CC2=$(\bar{x},\bar{y},\bar{z})$) we need a set of three-dimensional points whose first dimension varies around $\bar{x}$, whose second dimension is fixed at $\bar{y}$ and whose third dimension is set to $\hat{z}$
    - I was wondering whether in this case the first dimension should vary around $\hat{x}$ or $\bar{x}$. I believe it should not matter; in fact the two ranges overlap if the range is large enough and $\hat{x}$ and $\bar{x}$ are close to each other. One could use whichever is closer to the test data, but we should not be thinking about test points at training time (otherwise we will overfit), so discard this thought.
    
**Note:**
So far we have thought of the points without their labels or values: we have only been talking about how to construct (x,y,z) by mixing and matching different dimensions. But how about F(x,y,z)?
In practice we would need $k*(n + 2*\binom{n}{2})$ pairs of (x,y,z) and F(x,y,z) for training (k is the number of data points along each varying dimension per CC), and each 1D interpolation function uses only one set of k points.
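The count above can be checked by enumerating the bins. This is a minimal sketch under one reading of the formula: n bins that vary a single axis around the primary cut center, plus two bins per unordered pair of axes (vary one axis while the other is held at the secondary cut center, and vice versa). The helper `enumerate_bins` is hypothetical.

```python
from itertools import permutations
from math import comb

def enumerate_bins(n):
    """Return (varying_index, fixed_index) descriptors; fixed_index is None for primary bins."""
    primary = [(i, None) for i in range(n)]
    # Ordered pairs (i, j): vary axis i, hold axis j at the secondary cut center.
    # Each unordered pair {i, j} therefore contributes two bins.
    secondary = list(permutations(range(n), 2))
    return primary + secondary

n, k = 3, 10
bins = enumerate_bins(n)
assert len(bins) == n + 2 * comb(n, 2)

total_points = k * len(bins)
print(len(bins), total_points)  # 9 90
```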

So how should we do this? 
- Should we create all data-value pairs beforehand (which is similar to the real world)?
- Or should we take in the (x,y,z) points and calculate their values on demand by passing in F (which is more of a simulation, but easier)?

It seems more robust to create the training data beforehand (this way we can also compare it with other standard ML algorithms). So the question becomes: how do we organize the data so that we can access it when training each of the 1D interpolation functions? Basically we need $n + 2*\binom{n}{2}$ bins of coordinate-value pairs which can be accessed by a key made of n (or so) variables, e.g. "I need all point-value pairs for the 1D interpolation function related to $(\bar{x}\ \text{free}, \bar{y}\ \text{fixed}, \hat{z}\ \text{fixed})$".

How can I organize this? Use a dictionary whose key is a string made from i, j, CC1 and CC2.
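As a rough sketch of such a dictionary, the example below stores one bin of precomputed point-value pairs. It uses a tuple key instead of a string, which is an alternative chosen here for illustration (tuples are hashable and avoid string formatting); `bin_key` and the vectorized `F` are hypothetical helpers, and the cut centers and offsets are example values.

```python
import numpy as np

def F(points):
    # The notebook's test function, written in vectorized form.
    return points[:, 0] * np.sin(points[:, 1]) + points[:, 1] + points[:, 2]

def bin_key(varying_index, fixed_index, primary_cc, secondary_cc=None):
    """Build a hashable key; cut centers are converted from lists to tuples."""
    sec = None if secondary_cc is None else tuple(secondary_cc)
    return (varying_index, fixed_index, tuple(primary_cc), sec)

cc1, cc2 = [0.0, 0.0, 0.0], [1.0, 1.0, 0.0]
offsets = np.arange(-2.0, 3.0, 1.0)

# Points for f(x, y_bar, z_hat): x varies around x_bar, y is fixed at y_bar,
# and z keeps the primary value z_hat.
points = np.tile(np.array(cc1), (len(offsets), 1))
points[:, 1] = cc2[1]
points[:, 0] = offsets + cc2[0]

bins = {}
bins[bin_key(0, 1, cc1, cc2)] = (points, F(points))

# Later, when training the matching 1D interpolation function:
stored_points, stored_values = bins[bin_key(0, 1, cc1, cc2)]
print(stored_points[:, 0], stored_values)
```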

In [29]:
def make_key(indexes, fixed_secondary_index, cut_centers):
    """
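    Args:
        indexes: a python list of axis indexes, [i, j] or [i]
        fixed_secondary_index: index of the axis that is fixed at the secondary cut center
        cut_centers: a python list or tuple of cut centers, (CC1, CC2) or (CC1)
    
    Returns:
        a string key
    """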
    key = f'{indexes}_{fixed_secondary_index}_{cut_centers}'.strip(' ')
    return key

make_key([0,1], 1, [[0,0,0], [1,2,3]])

'[0, 1]_1_[[0, 0, 0], [1, 2, 3]]'

In [4]:
data_set = np.array([-5,5, 0.5])
CC1 = [0,0,0,0]
n = len(CC1)

In [28]:
def make_1d_data(cut_center, axis_index, data_range):
    """
    Args:
        cut_center: python list
        axis_index: which axis to vary
        data_range: python list [start, end, step]
    """
    single_axis_data = np.arange(*data_range)
    dataset = np.repeat([cut_center], len(single_axis_data), axis=0)
    dataset[:,axis_index] = single_axis_data + cut_center[axis_index]
    return dataset
    
make_1d_data([3,4,-1], 2, [-5,5,1])

array([[ 3,  4, -6],
       [ 3,  4, -5],
       [ 3,  4, -4],
       [ 3,  4, -3],
       [ 3,  4, -2],
       [ 3,  4, -1],
       [ 3,  4,  0],
       [ 3,  4,  1],
       [ 3,  4,  2],
       [ 3,  4,  3]])

In [33]:
def make_2d_data(primary_cut_center, secondary_cut_center, varying_axis_index, fixed_axis_index, data_range):
    """
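    Args:
        primary_cut_center: python list, supplies the coordinates of all untouched axes
        secondary_cut_center: python list, supplies the fixed value and the center of the varying axis
        varying_axis_index: int, axis that varies around the secondary cut center
        fixed_axis_index: int, axis that is set to the secondary cut center's coordinate
        data_range: python list [start, end, step]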
    """
    single_axis_data = np.arange(*data_range)
    dataset = np.repeat([primary_cut_center], len(single_axis_data), axis=0)
    dataset[:, fixed_axis_index] = secondary_cut_center[fixed_axis_index]
    dataset[:, varying_axis_index] = single_axis_data + secondary_cut_center[varying_axis_index]
    return dataset

make_2d_data([0,0,0], [1,1,1], 0, 1, [-5,5,1])

array([[-4,  1,  0],
       [-3,  1,  0],
       [-2,  1,  0],
       [-1,  1,  0],
       [ 0,  1,  0],
       [ 1,  1,  0],
       [ 2,  1,  0],
       [ 3,  1,  0],
       [ 4,  1,  0],
       [ 5,  1,  0]])

In [37]:
from collections import namedtuple
CutIndexPair = namedtuple('CutIndexPair', ['cut_center', 'index', 'primary_flag'])
DatasetInfo = namedtuple('DatasetInfo', ['data_range', 'cut_index_pairs'])

cut_index_1 = CutIndexPair(cut_center=[0,0,0], index=0, primary_flag=True)
cut_index_2 = CutIndexPair(cut_center=[1,1,1], index=1, primary_flag=False)
data_range = [-5, 5, 1]
dataset_info = DatasetInfo(data_range=data_range, cut_index_pairs=[cut_index_1, cut_index_2])

In [76]:
# The last two fields (secondary_cut_center, fixed_index) default to None.
DatasetInfo = namedtuple(
    'DatasetInfo',
    ['data_range', 'primary_cut_center', 'varying_index', 'secondary_cut_center', 'fixed_index'],
    defaults=(None,)*2,
)


def make_interpolation_data(dataset_info: DatasetInfo) -> np.ndarray:
    assert dataset_info.varying_index != dataset_info.fixed_index, 'varying index cannot be the same as the fixed index'
    single_axis_data = np.arange(*dataset_info.data_range)
    dataset = np.repeat([dataset_info.primary_cut_center], len(single_axis_data), axis=0)
    if dataset_info.secondary_cut_center is not None:  # explicit check also works for numpy arrays
        dataset[:, dataset_info.fixed_index] = dataset_info.secondary_cut_center[dataset_info.fixed_index]
        dataset[:, dataset_info.varying_index] = single_axis_data + dataset_info.secondary_cut_center[dataset_info.varying_index]
    else:
        dataset[:, dataset_info.varying_index] = single_axis_data + dataset_info.primary_cut_center[dataset_info.varying_index]
    return dataset

In [94]:
dsetinfo = DatasetInfo(data_range=[-5,5,1],
                       primary_cut_center=[1.1, 1.2, 1.3],
                       varying_index=0,
                       secondary_cut_center=[2.1, 2.2, 2.3],
                       fixed_index=2
                      )

make_interpolation_data(dsetinfo)

array([[-2.9,  1.2,  2.3],
       [-1.9,  1.2,  2.3],
       [-0.9,  1.2,  2.3],
       [ 0.1,  1.2,  2.3],
       [ 1.1,  1.2,  2.3],
       [ 2.1,  1.2,  2.3],
       [ 3.1,  1.2,  2.3],
       [ 4.1,  1.2,  2.3],
       [ 5.1,  1.2,  2.3],
       [ 6.1,  1.2,  2.3]])
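To close the loop, here is a minimal sketch of turning one such dataset into a 1D interpolation function. It uses `np.interp` (piecewise-linear) as a stand-in, since the internals of `create_interp_1d_funcs` are not shown here; the points are rebuilt inline to match the array above, and `f` is a vectorized form of the notebook's test function.

```python
import numpy as np

def f(points):
    # The notebook's test function, vectorized over rows of (x, y, z).
    points = np.asarray(points, dtype=float)
    return points[:, 0] * np.sin(points[:, 1]) + points[:, 1] + points[:, 2]

# Same layout as the array above: axis 0 varies around 2.1,
# axis 1 stays at 1.2 (primary), axis 2 is fixed at 2.3 (secondary).
xs = np.arange(-5, 5, 1) + 2.1
dataset = np.column_stack([xs, np.full_like(xs, 1.2), np.full_like(xs, 2.3)])
values = f(dataset)

def interp_1d(x):
    """Piecewise-linear interpolant along the varying axis."""
    return np.interp(x, dataset[:, 0], values)

x_test = 1.45
approx = interp_1d(x_test)
exact = f([[x_test, 1.2, 2.3]])[0]
print(approx, exact)
```

Along this particular line the test function is linear in x, so the piecewise-linear interpolant reproduces it up to rounding. Along a line where y varies, the two values would generally differ.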