<a href="https://colab.research.google.com/github/comet-toolkit/comet_training/blob/main/interpolation_example.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Examples of comet_maths interpolation
=======================================

Normal 1D interpolation
---------------------------

We first install and import our comet_maths package (and punpy, numpy and matplotlib), and define some example measurement functions:

In [None]:
!pip install "comet_maths>=0.22.0"
!pip install "punpy>=0.44.0"
!pip install "obsarray>=1.0.0"

In [None]:
import comet_maths as cm
import punpy

import numpy as np
import matplotlib.pyplot as plt

import warnings
warnings.filterwarnings('ignore')


# your measurement function
def function1(x):
    return 15*x-20

def function2(x):
    """The function to predict."""
    return x * np.sin(x*10)

Next we define some example data, and do a simple interpolation without uncertainties.

In [None]:
xi=np.arange(0,3.,0.2)
yi=function2(xi)
u_yi=np.abs(0.05*yi)

#add noise
yi = cm.generate_sample(1, yi, u_yi, corr_x="rand")

x=np.arange(0,2.5,0.02)

#It is possible to do interpolation without uncertainties
y=cm.interpolate_1d(xi,yi,x,method="quadratic")

Next we do an example with uncertainties. First we do not specify any input uncertainties, so we only get model uncertainties. Here (for analytical methods) the model uncertainties are calculated by taking the standard deviation between the results of various interpolation methods. For "cubic" as interpolation method, typically the results from "linear", "quadratic" and "cubic" are compared. Alternatively, the methods to be compared can be set using the unc_methods keyword. We then also provide some examples with measurement uncertainties.
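
The comparison between methods described above can be sketched with SciPy alone. This is only an illustration of the idea, not the comet_maths implementation: the helper name `model_uncertainty` is hypothetical, and the use of the population standard deviation (ddof=0) is an assumption.

```python
import numpy as np
from scipy.interpolate import interp1d


def model_uncertainty(xi, yi, x, kinds=("linear", "quadratic", "cubic")):
    """Spread between interpolation methods, used as a rough model uncertainty."""
    # One row of interpolated values per method.
    results = np.array([interp1d(xi, yi, kind=kind)(x) for kind in kinds])
    # Standard deviation across the methods at each target point (ddof=0 is an assumption).
    return results.std(axis=0)


# Noise-free example data on the same grids as used in this notebook.
xi = np.arange(0, 3.0, 0.2)
yi = xi * np.sin(10 * xi)
x = np.arange(0, 2.5, 0.02)

u_model = model_uncertainty(xi, yi, x)
```

All methods pass through the data points, so this spread vanishes at the nodes and grows in between them.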

In [None]:
#Interpolation with uncertainties
y2,u_y2,corr_y2=cm.interpolate_1d(xi,yi,x,method="cubic",unc_methods=["linear", "quadratic","cubic"],return_uncertainties=True,return_corr=True)

In [None]:
#This time with measurement uncertainties; this is more time consuming as it needs to run MC
y3,u_y3,corr_y3=cm.interpolate_1d(xi,yi,x,u_y_i=u_yi,method="cubic",return_uncertainties=True,return_corr=True)

When Monte Carlo is necessary to propagate the uncertainties, the interpolation takes longer. Next we use the gpr method. Note that for this method, there is a scale parameter for which a minimum value needs to be set (if not, there is no constraint on how much variation is allowed between the data points).
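
To see why the scale matters, and why random measurement uncertainties are cheap to include in a Gaussian process, here is a minimal NumPy sketch. It assumes a zero-mean process with a squared-exponential kernel and fixed hyperparameters; the kernel and fitting used inside comet_maths are not shown in this notebook, so `rbf_kernel`, `gp_predict`, `scale` and `amplitude` are illustrative names only.

```python
import numpy as np


def rbf_kernel(xa, xb, scale, amplitude=1.0):
    """Squared-exponential kernel between two 1D coordinate arrays."""
    d = xa[:, None] - xb[None, :]
    return amplitude**2 * np.exp(-0.5 * (d / scale) ** 2)


def gp_predict(xi, yi, u_yi, x, scale, amplitude=1.0):
    """Posterior mean and standard deviation of a zero-mean GP (fixed hyperparameters)."""
    # Random (uncorrelated) measurement uncertainties enter as a diagonal noise term,
    # so no Monte Carlo run is needed to propagate them.
    K = rbf_kernel(xi, xi, scale, amplitude) + np.diag(u_yi**2)
    Ks = rbf_kernel(x, xi, scale, amplitude)
    mean = Ks @ np.linalg.solve(K, yi)
    # Diagonal of Ks K^-1 Ks^T, subtracted from the prior variance.
    var = amplitude**2 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mean, np.sqrt(np.clip(var, 0.0, None))


xi = np.arange(0, 3.0, 0.2)
yi = xi * np.sin(10 * xi)
u_yi = np.full_like(yi, 0.05)      # constant random uncertainty (assumption for this sketch)
x_mid = np.arange(0.1, 2.5, 0.2)   # points roughly midway between the data points

mean_long, u_long = gp_predict(xi, yi, u_yi, x_mid, scale=0.3)
mean_short, u_short = gp_predict(xi, yi, u_yi, x_mid, scale=0.02)
```

With the very short scale the data points are effectively uncorrelated with their neighbours, so the predicted uncertainty between them returns to the prior amplitude. A lower bound on the scale prevents this.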

In [None]:
#While using gpr, random measurement uncertainties can also be propagated quickly (implemented inherently in the gpr algorithm)
y4,u_y4 = cm.interpolate_1d(xi,yi,x,method="gpr",u_y_i=u_yi,min_scale=0.3,return_uncertainties=True)

In [None]:
#However, when the uncertainties are not random (as defined in corr_y_i keyword), the propagation is slower as MC needs to be used 
y5,u_y5,corr_y5 = cm.interpolate_1d(xi,yi,x,method="gpr",u_y_i=u_yi, corr_y_i = "syst",min_scale=0.3,return_uncertainties=True,return_corr=True)

We can also use extrapolation. For gpr, the extrapolation (as well as its uncertainties) is built into the algorithm. For the analytical methods, the model uncertainty for extrapolation is determined by comparing extrapolation using the "nearest" method (the extrapolated values are equal to the bound values) and using the "extrapolate" method (the extrapolated values are calculated using the same method as the interpolation method; e.g. linear extrapolation is used when selecting linear interpolation). When determining model uncertainties, the interpolation method is varied in order to quantify the uncertainties (see above). This variation in the interpolation method will also affect the extrapolation, as different methods will be used when selecting the "extrapolate" option.
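
The difference between the two extrapolation options can be sketched with SciPy as follows. This is an illustration under assumptions rather than the comet_maths code: `extrapolation_spread` is a hypothetical helper, only a single interpolation method is used, and the spread is taken as the standard deviation between the two results.

```python
import numpy as np
from scipy.interpolate import interp1d


def extrapolation_spread(xi, yi, x, kind="linear"):
    """Compare 'extrapolate' and 'nearest' behaviour outside the data range."""
    # "extrapolate": continue the interpolant beyond the data range.
    f = interp1d(xi, yi, kind=kind, fill_value="extrapolate")
    y_extrap = f(x)
    # "nearest": hold the boundary values outside the data range (xi must be sorted).
    y_nearest = f(np.clip(x, xi[0], xi[-1]))
    # Spread between the two options; zero wherever x lies inside the data range.
    spread = np.std([y_extrap, y_nearest], axis=0)
    return y_extrap, y_nearest, spread


xi = np.arange(0, 3.0, 0.2)
yi = xi * np.sin(10 * xi)
x2 = np.arange(0, 3.5, 0.02)

y_extrap, y_nearest, spread = extrapolation_spread(xi, yi, x2)
```

Inside the data range both options agree, so the spread only contributes beyond the last data point and grows with the distance from it.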

In [None]:
x2=np.arange(0,3.5,0.02)
y6,u_y6= cm.interpolate_1d(xi,yi,x2,method="gpr",u_y_i=u_yi,min_scale=0.3,return_uncertainties=True,return_corr=False)
y7,u_y7=cm.interpolate_1d(xi,yi,x2,method="cubic",extrapolate="extrapolate",return_uncertainties=True,return_corr=False)

Next, we plot the results:

In [None]:
fig=plt.figure(figsize=(10,5))
ax = fig.add_subplot(1, 1, 1)
ax.plot(x, function2(x), "k", label="true values")
ax.errorbar(xi, yi, yerr=u_yi, fmt="ko", ls=None, label="observed values")
ax.plot(x, y, "b:", label="quadratic interpolation")
ax.plot(x, y2, "r:", label="cubic interpolation (model error only)")
ax.fill_between(
x,
y2 - 1.9600 * u_y2,
(y2 + 1.9600 * u_y2),
alpha=0.25,
fc="r",
ec="None",
label="95% confidence interval",
lw=0,
)
ax.plot(x, y3, "m", label="cubic interpolation")
ax.fill_between(
x,
y3 - 1.9600 * u_y3,
(y3 + 1.9600 * u_y3),
alpha=0.25,
fc="m",
ec="None",
label="95% confidence interval",
lw=0,
)
ax.plot(x, y4, "g", label="gpr interpolation")
ax.fill_between(
x,
y4 - 1.9600 * u_y4,
(y4 + 1.9600 * u_y4),
alpha=0.25,
fc="g",
ec="None",
label="95% confidence interval",
lw=0,
)
ax.plot(x, y5, "c", label="gpr interpolation with systematic measurement error")
ax.fill_between(
x,
y5 - 1.9600 * u_y5,
(y5 + 1.9600 * u_y5),
alpha=0.25,
fc="c",
ec="None",
label="95% confidence interval",
lw=0,
)
ax.plot(x2, y6, "g--", label="gpr interpolation with extrapolation")
ax.fill_between(
x2,
y6 - 1.9600 * u_y6,
(y6 + 1.9600 * u_y6),
alpha=0.15,
fc="g",
ec="None",
lw=0,
)
ax.plot(x2, y7, "m--", label="cubic interpolation with extrapolation")
ax.fill_between(
x2,
y7 - 1.9600 * u_y7,
(y7 + 1.9600 * u_y7),
alpha=0.15,
fc="m",
ec="None",
lw=0,
)
ax.set_ylim(-5,5)
ax.legend(ncol=2)
fig.show()

fig2=plt.figure(figsize=(10,5))
ax = fig2.add_subplot(1, 3, 1)
ax2 = fig2.add_subplot(1, 3, 2)
ax3 = fig2.add_subplot(1, 3, 3)
p1=ax.imshow(corr_y2, vmin=-1, vmax=1, cmap="bwr")
ax.set_title("cubic interpolation (model error only)")
p2=ax2.imshow(corr_y3, vmin=-1, vmax=1, cmap="bwr")
ax2.set_title("cubic interpolation")
p3=ax3.imshow(corr_y5, vmin=-1, vmax=1, cmap="bwr")
ax3.set_title("gpr interpolation with systematic measurement error")
fig2.colorbar(p2)
fig2.show()

1D interpolation along high-resolution example
------------------------------------------------

We again start by defining some example data, and do an interpolation along a high-resolution example without uncertainties, followed by an example with uncertainties.
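
The notebook does not spell out how the high-resolution example is used, so the following is a sketch under an assumption: the low-resolution data are compared with the example at the low-resolution points, the resulting residuals (or ratios when `relative=True`) are interpolated to the target points, and the example is then added back (or multiplied back). The helper `interpolate_along_example` below is hypothetical and may differ from what comet_maths does internally.

```python
import numpy as np
from scipy.interpolate import interp1d


def function2(x):
    return x * np.sin(x * 10)


def interpolate_along_example(xi, yi, x_hr, y_hr, x, relative=False, kind="cubic"):
    """Interpolate (xi, yi) to x while following the shape of (x_hr, y_hr)."""
    hr = interp1d(x_hr, y_hr, kind=kind)
    if relative:
        # Ratio to the example; only sensible where the example is not close to zero.
        ratio = interp1d(xi, yi / hr(xi), kind=kind)
        return ratio(x) * hr(x)
    # Difference to the example, interpolated and added back at the target points.
    resid = interp1d(xi, yi - hr(xi), kind=kind)
    return resid(x) + hr(x)


xi = np.arange(0, 2.8, 0.25)
yi = function2(xi)
x_hr = np.arange(-0.5, 4.0, 0.09)
y_hr = function2(x_hr) + 0.9       # example with a systematic offset (illustrative)
xx = np.arange(0.1, 2.5, 0.02)

y_along = interpolate_along_example(xi, yi, x_hr, y_hr, xx)
y_plain = interp1d(xi, yi, kind="cubic")(xx)

rms_along = np.sqrt(np.mean((y_along - function2(xx)) ** 2))
rms_plain = np.sqrt(np.mean((y_plain - function2(xx)) ** 2))
```

Because the offset of the example is absorbed by the residuals, only its shape matters here, and `rms_along` comes out well below `rms_plain` for this coarsely sampled function.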

In [None]:
xi = np.arange(0, 2.8, 0.25)
yi = function2(xi)
u_yi = 0.03 * np.ones_like(yi)
yi = cm.generate_sample(1, yi, u_yi, corr_x="rand").squeeze()
x_HR = np.arange(-0.5, 4., 0.09)
y_HR = function2(x_HR)
u_y_HR_syst = 0.9 * np.ones_like(y_HR)
u_y_HR_rand = 0.02 * y_HR
cov_y_HR = cm.convert_corr_to_cov(
    np.ones((len(y_HR), len(y_HR))), u_y_HR_syst
) + cm.convert_corr_to_cov(np.eye(len(y_HR)), u_y_HR_rand)
corr_y_HR = cm.correlation_from_covariance(cov_y_HR)
u_y_HR = cm.uncertainty_from_covariance(cov_y_HR)

y_HR = cm.generate_sample(1, y_HR, u_y_HR, corr_x=corr_y_HR)

xx = np.arange(0.1, 2.5, 0.02)

y_hr_cubic = cm.interpolate_1d_along_example(
    xi,
    yi,
    x_HR,
    y_HR,
    xx,
    relative=False,
    method="cubic",
    method_hr="cubic",
)

y_hr_cubic2, u_y_hr_cubic2 = cm.interpolate_1d_along_example(
    xi,
    yi,
    x_HR,
    y_HR,
    xx,
    relative=False,
    method="cubic",
    method_hr="cubic",
    u_y_i=u_yi,
    corr_y_i="rand",
    u_y_hr=u_y_HR,
    corr_y_hr="syst",
    return_uncertainties=True,
    plot_residuals=True,
    return_corr=False,
)

Then, for comparison, we calculate the interpolated data points and uncertainties using the gpr method without a high-resolution example. Here min_scale needs to be set again. Next, we also use the gpr method together with a high-resolution example, without and with uncertainties.

In [None]:
y_gpr, u_y_gpr = cm.interpolate_1d(
    xi,
    yi,
    xx,
    method="gpr",
    u_y_i=u_yi,
    min_scale=0.3,
    return_uncertainties=True,
)

y_hr_gpr = cm.interpolate_1d_along_example(
    xi,
    yi,
    x_HR,
    y_HR,
    xx,
    relative=False,
    method="gpr",
    method_hr="gpr",
    min_scale=0.3,
)
y_hr_gpr2, u_y_hr_gpr2= cm.interpolate_1d_along_example(
    xi,
    yi,
    x_HR,
    y_HR,
    xx,
    relative=False,
    method="gpr",
    method_hr="gpr",
    u_y_i=u_yi,
    u_y_hr=u_y_HR,
    corr_y_i="rand",
    corr_y_hr=corr_y_HR,
    min_scale=0.3,
    return_uncertainties=True,
    plot_residuals=False,
    return_corr=False
)

As a sanity check, we can propagate the uncertainties externally to the comet_maths tool, using the punpy tool. This is effectively what happens internally when propagating the measurement uncertainties through the tool. To do this, we use the Interpolator class, since it has predefined functions that take the right input quantities as arguments, with all other optional parameters set in the class initialiser. Here, we set add_model_error to True, so that in each MC iteration a model error is added to account for the uncertainty in the interpolation method.
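
For readers unfamiliar with Monte Carlo propagation, the principle can be sketched in plain NumPy and SciPy. This is a simplified illustration and not punpy's implementation: the helper `mc_propagate` is hypothetical, only random uncertainties on the low-resolution data are perturbed, a plain cubic interpolation is used, and no model error is added.

```python
import numpy as np
from scipy.interpolate import interp1d


def mc_propagate(xi, yi, u_yi, x, n_draws=200, kind="cubic", seed=0):
    """Propagate random uncertainties on yi through an interpolation by Monte Carlo."""
    rng = np.random.default_rng(seed)
    # Perturb each measurement independently (random, uncorrelated uncertainties).
    draws = yi + u_yi * rng.standard_normal((n_draws, yi.size))
    # Run the interpolation once per perturbed data set.
    outputs = np.array([interp1d(xi, d, kind=kind)(x) for d in draws])
    # The spread of the outputs gives the uncertainty, their correlation the error correlation.
    u_y = outputs.std(axis=0, ddof=1)
    corr_y = np.corrcoef(outputs, rowvar=False)
    return outputs.mean(axis=0), u_y, corr_y


xi = np.arange(0, 2.8, 0.25)
yi = xi * np.sin(10 * xi)
u_yi = 0.03 * np.ones_like(yi)
xx = np.arange(0.1, 2.5, 0.02)

y_mc, u_y_mc, corr_y_mc = mc_propagate(xi, yi, u_yi, xx)
```

The number of draws controls the trade-off between run time and the noise on the estimated uncertainties, which is why the Monte Carlo based examples in this notebook take longer to run.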

In [None]:
mcprop = punpy.MCPropagation(100, parallel_cores=4)

inp2 = cm.Interpolator(
    relative=False, method="gpr", method_hr="gpr", min_scale=0.3,add_model_error=True
)
u_y_hr, corr2 = mcprop.propagate_random(
    inp2.interpolate_1d_along_example,
    [xi, yi, x_HR, y_HR, xx],
    [None, u_yi, None, u_y_HR, None],
    corr_x=[None, "rand", None, corr_y_HR, None],return_corr=True
)

We also again give an example with extrapolation:

In [None]:
xx2= np.arange(0.1, 3.5, 0.02)
y_hr_gpr3, u_y_hr_gpr3= cm.interpolate_1d_along_example(
    xi,
    yi,
    x_HR,
    y_HR,
    xx2,
    relative=False,
    method="gpr",
    method_hr="gpr",
    u_y_i=u_yi,
    u_y_hr=u_y_HR,
    corr_y_i="rand",
    corr_y_hr=corr_y_HR,
    min_scale=0.3,
    extrapolate="nearest",
    return_uncertainties=True,
    plot_residuals=False,
    return_corr=False
)

Finally, we again make some plots:

In [None]:
fig3=plt.figure(figsize=(10,5))
ax = fig3.add_subplot(1, 1, 1)
ax.plot(xx, function2(xx), "b", label="True line")
ax.plot(xi, yi, "ro", label="low-res data")
ax.plot(x_HR, y_HR, "go", label="high-res data")
ax.plot(
    xx,
    cm.interpolate_1d(xi, yi, xx, method="cubic"),
    "r:",
    label="cubic spline interpolation",
)
ax.plot(xx, y_gpr, "c:", label="GPR interpolation")
ax.plot(xx, y_hr_gpr, "g", label="GPR interpolation with HR example")
ax.fill_between(xx,y_hr_gpr2-1.9600*u_y_hr_gpr2,(y_hr_gpr2+1.9600*u_y_hr_gpr2),alpha=0.25,fc="g",ec="None",
                    label="95% confidence interval",lw=0)
ax.plot(xx2, y_hr_gpr3, "g--", label="GPR interpolation with HR example and extrapolation")
ax.fill_between(xx2,y_hr_gpr3-1.9600*u_y_hr_gpr3,(y_hr_gpr3+1.9600*u_y_hr_gpr3),alpha=0.15,fc="g",ec="None",
                    lw=0)
ax.plot(
    xx, y_hr_cubic, "m-.", label="cubic spline interpolation with HR example"
)
ax.fill_between(
    xx,
    y_hr_cubic2 - 1.9600 * u_y_hr_cubic2,
    (y_hr_cubic2 + 1.9600 * u_y_hr_cubic2),
    alpha=0.25,
    fc="m",
    ec="None",
    label="95% confidence interval",
    lw=0,
)
ax.legend(ncol=2, prop={"size": 6})
ax.set_ylim(-5,5)