# Interpolation Parameter Stability
Interpolating the simulation intensity data appropriately seems to improve inverse model performance. One criterion any interpolation strategy should satisfy is stability of the fitting parameters. We define stability as follows: a small change in the intensity data (for example, one caused by a small change in the input TMPs used to generate that data) should produce only a small change in the fitting parameters. In other words, small changes in the input data should not create wild fluctuations in the fitting parameter space.
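
To make the definition concrete, here is a minimal sketch on synthetic data. It is not the notebook's fitting code: `fit_log_linear`, `synthetic_intensity`, and the decay values are assumptions chosen only for illustration. It fits a single log-linear curve, perturbs the input decay rate by 1%, and reports the relative change in each fitted parameter.

```python
import numpy as np

# Hypothetical helper: fit log(I) = a0 + a1 * SDD with ordinary least squares.
def fit_log_linear(sdd, intensity):
    slope, intercept = np.polyfit(sdd, np.log(intensity), deg=1)
    return np.array([intercept, slope])

# Synthetic detector distances (assumed values, 20 detectors).
sdd = np.linspace(10.0, 95.0, 20)

# Hypothetical stand-in for a simulation: a pure exponential decay.
def synthetic_intensity(mu):
    return np.exp(-1.0 - mu * sdd)

base_params = fit_log_linear(sdd, synthetic_intensity(0.20))
perturbed_params = fit_log_linear(sdd, synthetic_intensity(0.20 * 1.01))

# Relative change of each fitted parameter caused by the 1% input change.
relative_change = (perturbed_params - base_params) / base_params
print(relative_change)
```

A stable fit keeps every entry of `relative_change` on the same order as the perturbation itself.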

# How to Test
To test this, we fit our simulation dataset and pick row combinations. In each case, we pick a group of rows that share the same values for all labels (TMPs) except for one pre-specified label (TMP). We then generate all possible combinations of rows within each group and use them to create new rows.
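
The grouping and pairing idea can be sketched with plain pandas and `itertools`. This is an illustration under assumptions, not the implementation of `create_row_combos`: the helper `pair_rows`, its output column names, and the toy values are all hypothetical.

```python
from itertools import combinations

import pandas as pd

# Hypothetical helper: pair up rows that agree on every fixed label.
def pair_rows(df, feature_cols, fixed_cols, dynamic_col):
    records = []
    for _, group in df.groupby(fixed_cols):
        # Every unordered pair of rows inside one group.
        for (_, row_a), (_, row_b) in combinations(group.iterrows(), 2):
            record = {col: row_a[col] for col in fixed_cols}
            for col in feature_cols:
                record[f"{col}_a"] = row_a[col]
                record[f"{col}_b"] = row_b[col]
            record[f"{dynamic_col}_a"] = row_a[dynamic_col]
            record[f"{dynamic_col}_b"] = row_b[dynamic_col]
            records.append(record)
    return pd.DataFrame(records)

# Toy data (made-up values): one fixed label, one dynamic label, one feature.
toy = pd.DataFrame({
    "thickness": [6.0, 6.0, 6.0, 8.0, 8.0],
    "saturation": [0.90, 0.95, 1.00, 0.90, 0.95],
    "alpha0": [-7.1, -7.0, -6.9, -8.0, -7.9],
})
pairs = pair_rows(toy, ["alpha0"], ["thickness"], "saturation")
print(pairs)
```

A group with n rows contributes n(n-1)/2 pairs, so the three-row group above yields three pairs and the two-row group yields one.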

In [1]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from inverse_modelling_tfo.data import equidistance_detector_normalization
from inverse_modelling_tfo.data.intensity_interpolation import interpolate_exp, get_interpolate_fit_params
from inverse_modelling_tfo.data.interpolation_function_zoo import *
from inverse_modelling_tfo.models import RandomSplit, ValidationMethod, HoldOneOut, CVSplit
from inverse_modelling_tfo.models.custom_models import SplitChannelCNN, PerceptronReLU, PerceptronBN, PerceptronDO, PerceptronBD
from inverse_modelling_tfo.features.build_features import create_spatial_intensity, create_row_combos

In [4]:
RAW_DATA_PATH_NEW = r'/home/rraiyan/personal_projects/tfo_inverse_modelling/data/intensity/s_based_intensity_low_conc3.pkl'
data = pd.read_pickle(RAW_DATA_PATH_NEW)
equidistance_detector_normalization(data)

DETECTOR_COUNT = 20
break_indices = [0, 4, 12, 20]
piece_count = len(break_indices) - 1
feature_count = piece_count * 2

# data = get_interpolate_fit_params(data, (1.0, 0.8), DETECTOR_COUNT, exp_piecewise_affine, break_indices=break_indices)
data.head()

| | SDD | Intensity | Wave Int | Uterus Thickness | Maternal Wall Thickness | Maternal Hb Concentration | Maternal Saturation | Fetal Hb Concentration | Fetal Saturation |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 10 | 9.918276e-06 | 1.0 | 5.0 | 6.0 | 12.0 | 0.9 | 0.11 | 0.1 |
| 1 | 14 | 6.663165e-08 | 1.0 | 5.0 | 6.0 | 12.0 | 0.9 | 0.11 | 0.1 |
| 2 | 19 | 5.222098e-10 | 1.0 | 5.0 | 6.0 | 12.0 | 0.9 | 0.11 | 0.1 |
| 3 | 23 | 7.187244e-11 | 1.0 | 5.0 | 6.0 | 12.0 | 0.9 | 0.11 | 0.1 |
| 4 | 28 | 2.637992e-11 | 1.0 | 5.0 | 6.0 | 12.0 | 0.9 | 0.11 | 0.1 |


In [10]:
data['SDD'].unique()


array([10, 14, 19, 23, 28, 32, 37, 41, 46, 50, 55, 59, 64, 68, 73, 77, 82,
       86, 91, 95])

In [3]:
feature_columns = [f'alpha{n}' for n in range(feature_count)]
all_labels = ['Wave Int', 'Maternal Wall Thickness', 'Maternal Hb Concentration', 'Maternal Saturation', 'Fetal Hb Concentration', 'Fetal Saturation']
dynamic_label = ['Maternal Saturation']
fixed_labels = [x for x in all_labels if x not in dynamic_label]
combined_data, _, __ = create_row_combos(data, feature_columns, fixed_labels, dynamic_label, 'comb', 2)

In [4]:
# Create difference columns & remove older columns
for i in range(feature_count):
    # combined_data[f'alpha{i}_diff'] = combined_data[f'x_{i}'] - combined_data[f'x_{i + feature_count}']
    combined_data[f'relative alpha{i}_diff'] = (combined_data[f'x_{i}'] - combined_data[f'x_{i + feature_count}']) / combined_data[f'x_{i + feature_count}']
    # combined_data.drop(columns = [f'x_{i}', f'x_{i + feature_count}'],inplace=True)
combined_data['label_diff'] = combined_data[f'{dynamic_label[0]} 0'] - combined_data[f'{dynamic_label[0]} 1']
# combined_data.drop(columns=[f'{dynamic_label[0]} 0', f'{dynamic_label[0]} 1'], inplace=True)
combined_data.head()

| | x_0 | x_1 | x_2 | x_3 | x_4 | x_5 | x_6 | x_7 | x_8 | x_9 | ... | Fetal Saturation | Maternal Saturation 0 | Maternal Saturation 1 | relative alpha0_diff | relative alpha1_diff | relative alpha2_diff | relative alpha3_diff | relative alpha4_diff | relative alpha5_diff | label_diff |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -7.111556 | -0.41652 | -12.518989 | -0.163022 | -14.57913 | -0.123006 | -7.05591 | -0.41362 | -12.409437 | -0.163074 | ... | 0.1 | 0.9 | 0.925 | 0.007887 | 0.007012 | 0.008828 | -0.000321 | 0.007285 | 0.000132 | -0.025 |
| 1 | -7.111556 | -0.41652 | -12.518989 | -0.163022 | -14.57913 | -0.123006 | -6.997527 | -0.410749 | -12.297721 | -0.16313 | ... | 0.1 | 0.9 | 0.95 | 0.016296 | 0.014051 | 0.017993 | -0.000662 | 0.01482 | 0.000259 | -0.05 |
| 2 | -7.111556 | -0.41652 | -12.518989 | -0.163022 | -14.57913 | -0.123006 | -6.936344 | -0.407902 | -12.183683 | -0.163189 | ... | 0.1 | 0.9 | 0.975 | 0.02526 | 0.021128 | 0.027521 | -0.001025 | 0.022627 | 0.000379 | -0.075 |
| 3 | -7.111556 | -0.41652 | -12.518989 | -0.163022 | -14.57913 | -0.123006 | -6.872297 | -0.405074 | -12.067149 | -0.163253 | ... | 0.1 | 0.9 | 1.0 | 0.034815 | 0.028256 | 0.037444 | -0.001412 | 0.030726 | 0.000491 | -0.1 |
| 4 | -7.05591 | -0.41362 | -12.409437 | -0.163074 | -14.473694 | -0.122989 | -6.997527 | -0.410749 | -12.297721 | -0.16313 | ... | 0.1 | 0.925 | 0.95 | 0.008343 | 0.00699 | 0.009084 | -0.000341 | 0.007481 | 0.000127 | -0.025 |


In [10]:
alpha_columns = list(filter(lambda X: 'alpha' in X, combined_data.columns))
combined_data[alpha_columns + ['label_diff']].describe()

| | relative alpha0_diff | relative alpha1_diff | relative alpha2_diff | relative alpha3_diff | relative alpha4_diff | relative alpha5_diff | label_diff |
|---|---|---|---|---|---|---|---|
| count | 80000.0 | 80000.0 | 80000.0 | 80000.0 | 80000.0 | 80000.0 | 80000.0 |
| mean | -0.24772 | 0.020717 | 0.01603 | 0.004397 | 0.013348 | 0.007399 | -0.05 |
| std | 22.119747 | 0.032158 | 0.026077 | 0.016046 | 0.021431 | 0.02498 | 0.025 |
| min | -3769.557438 | -0.01087 | -0.051034 | -0.014682 | -0.013985 | -0.032181 | -0.1 |
| 25% | -0.05958 | -0.00482 | -0.003546 | -0.00128 | -0.002736 | -0.003648 | -0.075 |
| 50% | -0.015922 | 0.002854 | 0.005567 | 4.1e-05 | 0.002858 | -0.000519 | -0.05 |
| 75% | 0.007413 | 0.043536 | 0.030505 | 0.001209 | 0.024417 | 0.006774 | -0.025 |
| max | 2117.159528 | 0.113763 | 0.098895 | 0.15768 | 0.128842 | 0.24257 | -0.025 |


In [7]:
all_correlations = []
for i in range(feature_count):
    all_correlations.append(combined_data[f'relative alpha{i}_diff'].corr(combined_data['label_diff'], 'pearson'))
print(all_correlations)

[0.003663235753908606, -0.3313615194433914, -0.3147066118701059, -0.1405327162485311, -0.3175079764756656, -0.15359964664798223]
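
When reading Pearson coefficients like these, it helps to remember that Pearson correlation is sensitive to extreme values. The sketch below uses made-up numbers (not the notebook's data) to show how a single outlier can pull the coefficient of an otherwise perfectly linear relationship toward zero, while a rank-based coefficient barely moves. It makes no claim about what actually drives the alpha0 value above.

```python
import numpy as np
import pandas as pd

# Made-up label differences and a perfectly linear parameter response.
label_diff = pd.Series(np.linspace(-0.1, -0.025, 50))
param_diff = -0.3 * label_diff

pearson_clean = param_diff.corr(label_diff, method="pearson")

# Inject one extreme value into the parameter differences.
param_with_outlier = param_diff.copy()
param_with_outlier.iloc[25] = 2000.0

pearson_outlier = param_with_outlier.corr(label_diff, method="pearson")

# Rank-based (Spearman-style) coefficient: Pearson correlation of the ranks.
rank_outlier = param_with_outlier.rank().corr(label_diff.rank(), method="pearson")

print(pearson_clean, pearson_outlier, rank_outlier)
```

Comparing a rank-based coefficient against the Pearson value is one cheap way to check whether a near-zero correlation reflects a weak relationship or a few extreme rows.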


# Remarks
Looking at the correlations, alpha0 is essentially uncorrelated with the change in the label (a coefficient of about 0.004). This makes sense, since the initial bias should stay more or less close to 1 (the data is normalized, so the intercept of the first line should be unity, i.e. I(SDD=0) = 1). Its relative differences are also very large in both the positive and negative directions (minimum about -3770, maximum about 2117). Ignoring alpha0, the relative differences of the other fitting parameters are small: across all 80,000 row pairs they stay between about -0.05 and 0.24.
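
One general property of the relative difference, `(a - b) / b`, is worth keeping in mind when interpreting very large values: the same absolute gap produces a much larger relative difference when the reference value `b` is close to zero. The sketch below uses made-up baselines and a hypothetical helper `relative_diff`; whether near-zero reference values explain the alpha0 extremes in this dataset is not checked here.

```python
import numpy as np

# Hypothetical helper mirroring the relative difference used in this notebook.
def relative_diff(a, b):
    return (a - b) / b

# The same absolute gap applied to reference values of shrinking magnitude.
absolute_gap = 0.05
baselines = np.array([-7.0, -1.0, -0.1, -0.001])
rel = relative_diff(baselines + absolute_gap, baselines)

for b, r in zip(baselines, rel):
    print(f"baseline={b:>8.3f}  relative diff={r:>10.4f}")
```

Inspecting the absolute differences alongside the relative ones is a simple way to tell the two situations apart.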