# __Puffins__

## NB 05 - Model stability as a function of N and N / p

As we saw in Notebooks 01 and 02, the Feature-Weighted Least-Squares (FWLS) regression gives an apparently stable solution when we compare a model trained on 15,000 data points with one trained on only 1,500 data points.

In this notebook, we're going to systematically investigate how stable our model is for a range of N values and for a range of N / p values, where N is the number of data points and p is the number of features (columns in our design matrix).

In [2]:
%matplotlib widget
import numpy as np
import matplotlib.pyplot as plt

from puffins.tuner import Tuner
from puffins.weight_functions import matern32
from puffins.data import TimeSeries, LinearModel
from puffins.uncertainties import UncertaintyEstimator
from puffins.basis_functions import basis_constant, basis_linear

np.random.seed(42)  # to enforce reproducibility
RCOND = 1e-14  # numerical rounding for matrix conditioning
plotnum = 1

You know the routine: unpack the data, simulate some _more or less_ realistic uncertainties, and tune our model's hyperparameters.

In [3]:
time, flux = np.loadtxt('../data/ugru.dat').T
flux_err = np.random.normal(0, 0.005, len(flux))
period = 1.88045
n_harmonics = 100
feature_weighting_width = 0.5

In [4]:
data = TimeSeries(time, flux)
fwls = LinearModel('fw', basis_functions=[basis_constant], feature_embedding='fourier',
                   feature_weighting_function=matern32, feature_weighting_width=feature_weighting_width,
                   period=period, n_harmonics=n_harmonics, W=None)

In [5]:
joint_ = {'feature_weighting_width': [0.001, 1, 'uniform'], 'period': [1.87, 1.89, 'uniform']}
joint_tuner = Tuner(fwls, hyperpars=joint_, n_trials=200, direction='minimize')
joint_tuner.run_tune(data.predictors, data.targets)

[I 2025-02-06 08:35:11,759] A new study created in memory with name: no-name-16f820c3-f81c-4eda-ba28-d9fdbbe4e5cc
[I 2025-02-06 08:35:11,829] Trial 0 finished with value: 0.002861038966726475 and parameters: {'feature_weighting_width': 0.10567331740867354, 'period': 1.8738878049197811}. Best is trial 0 with value: 0.002861038966726475.
[I 2025-02-06 08:35:11,891] Trial 1 finished with value: 0.0010258268249119023 and parameters: {'feature_weighting_width': 0.394663755252485, 'period': 1.8797627825874723}. Best is trial 1 with value: 0.0010258268249119023.
[I 2025-02-06 08:35:11,946] Trial 2 finished with value: 0.009395685918245324 and parameters: {'feature_weighting_width': 0.8217411108182091, 'period': 1.8879605565150837}. Best is trial 1 with value: 0.0010258268249119023.
[I 2025-02-06 08:35:12,004] Trial 3 finished with value: 0.006901742597557516 and parameters: {'feature_weighting_width': 0.4428345981828354, 'period': 1.870062981273098}. Best is trial 1 with value: 0.001025826824

In [7]:
print(joint_tuner)
fwls.set_X_kwargs(update=True, **joint_tuner.best_hyperpars)
fwls.set_X_train(data.predictors)
fwls.train(data.targets)

Tuner:
 feature_weighting_width: 0.023203539528116962
period: 1.880482581895621


Great, now we have a model that was tuned on 100% of the data.

First, at fixed p = 2 * n_harmonics + 1 = 201, we'll explore how the regression coefficients behave as N decreases, removing a further 5% of the data at each step.
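To make the sampling grid concrete, the sketch below lists how many points are deleted and retained at each step, together with the resulting N / p ratio. It assumes N = 15,000 (the figure quoted at the top of this notebook; the real value is `len(time)`), and names such as `n_total`, `n_deleted`, and `n_retained` are illustrative only.

```python
import numpy as np

n_total = 15_000          # assumption: in the notebook this is len(time)
n_harmonics = 100
# Assumed layout: one constant term plus a sine/cosine pair per harmonic
p = 2 * n_harmonics + 1

step_size = int(n_total * 0.05)
# Same grid as the loop over `step` used in this notebook
n_deleted = np.arange(step_size, n_total - step_size + 1, step_size)
n_retained = n_total - n_deleted
ratio = n_retained / p

for d, m, r in zip(n_deleted, n_retained, ratio):
    print(f"delete {d:6d}  keep {m:6d}  N/p = {r:6.1f}")
```

Under this assumption the grid has 19 steps, which is consistent with the first dimension of the `(19, 201)` shape reported for `vars_i` later in the notebook.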

In [16]:
n = len(time)
coefficients_i = []
means_i = []
vars_i = []

step_size = int(n * 0.05)

estimator = UncertaintyEstimator(data, fwls)

for step in range(step_size, n - step_size + 1, step_size):
    # Each pass deletes `step` points per group: 5%, 10%, ..., 95% of the data
    ddjk_i = estimator.run_delete_d_jackknife_sampling(n_groups=500, n_delete=step)
    coefficients_i.append(ddjk_i['sampled_coefs'])
    means_i.append(ddjk_i['coefs_mean'])
    vars_i.append(ddjk_i['coefs_var'])


Delete-d Jackkinfe Sampling: 100%|██████████| 500/500 [00:26<00:00, 19.07it/s]
Delete-d Jackkinfe Sampling: 100%|██████████| 500/500 [00:24<00:00, 20.12it/s]
Delete-d Jackkinfe Sampling: 100%|██████████| 500/500 [00:23<00:00, 21.01it/s]
Delete-d Jackkinfe Sampling: 100%|██████████| 500/500 [00:23<00:00, 21.55it/s]
Delete-d Jackkinfe Sampling: 100%|██████████| 500/500 [00:21<00:00, 22.89it/s]
Delete-d Jackkinfe Sampling: 100%|██████████| 500/500 [00:20<00:00, 24.39it/s]
Delete-d Jackkinfe Sampling: 100%|██████████| 500/500 [00:20<00:00, 24.75it/s]
Delete-d Jackkinfe Sampling: 100%|██████████| 500/500 [00:20<00:00, 24.09it/s]
Delete-d Jackkinfe Sampling: 100%|██████████| 500/500 [00:18<00:00, 26.48it/s]
Delete-d Jackkinfe Sampling: 100%|██████████| 500/500 [00:16<00:00, 29.90it/s]
Delete-d Jackkinfe Sampling: 100%|██████████| 500/500 [00:17<00:00, 28.12it/s]
Delete-d Jackkinfe Sampling: 100%|██████████| 500/500 [00:14<00:00, 35.70it/s]
Delete-d Jackkinfe Sampling: 100%|██████████| 500/50

TypeError: unsupported operand type(s) for /: 'float' and 'NoneType'
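
For readers who want to see what a delete-d jackknife does under the hood, here is a minimal NumPy sketch. It is not the `puffins` implementation: the function `delete_d_jackknife` is hypothetical, it uses ordinary (unweighted) least squares on synthetic data, and it reports the plain sample variance across groups without any jackknife rescaling factor. Only the dictionary keys are borrowed from the loop above.

```python
import numpy as np

def delete_d_jackknife(X, y, n_delete, n_groups, rng):
    """Refit least squares on random subsets with n_delete rows removed.

    Hypothetical helper for illustration, not the puffins API.
    """
    n = len(y)
    coefs = np.empty((n_groups, X.shape[1]))
    for g in range(n_groups):
        # Keep a random subset of n - n_delete rows, drawn without replacement
        keep = rng.choice(n, size=n - n_delete, replace=False)
        coefs[g], *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    return {
        "sampled_coefs": coefs,
        "coefs_mean": coefs.mean(axis=0),
        # Plain sample variance across groups (no jackknife rescaling)
        "coefs_var": coefs.var(axis=0, ddof=1),
    }

# Synthetic data: a constant plus a single sine/cosine pair
rng = np.random.default_rng(42)
t = np.linspace(0.0, 10.0, 400)
X = np.column_stack([np.ones_like(t),
                     np.sin(2 * np.pi * t / 1.88),
                     np.cos(2 * np.pi * t / 1.88)])
true_coefs = np.array([1.0, 0.3, -0.2])
y = X @ true_coefs + rng.normal(0.0, 0.005, t.size)

result = delete_d_jackknife(X, y, n_delete=200, n_groups=100, rng=rng)
print(result["coefs_mean"])
print(result["coefs_var"])
```

Repeating the call for increasing `n_delete` and tracking how `coefs_var` grows is the same experiment this notebook runs, only on a toy model.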

In [18]:
print(np.shape(vars_i))

(19, 201)


In [None]:
# Box and whisker plot for the 1st, 5th, 10th, and 50th harmonics.
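
One possible way to lay out such a plot is sketched below. It runs on synthetic stand-in data rather than the real `coefficients_i`, and the mapping from harmonic to column index in `harmonic_columns` is a guess (it assumes columns ordered as constant, sin 1, cos 1, sin 2, ...), so check the actual design-matrix layout before reusing it.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Synthetic stand-in for coefficients_i: 19 steps, 200 groups, 201 features.
# The scatter is made to grow as more data are deleted (purely illustrative).
fractions = np.arange(1, 20) * 0.05
coefficients_demo = [
    rng.normal(0.0, 1e-3 / np.sqrt(1.0 - f), size=(200, 201))
    for f in fractions
]

# Hypothetical column indices; the true layout depends on the design matrix
harmonic_columns = {"1st": 1, "5th": 9, "10th": 19, "50th": 99}

fig, axes = plt.subplots(2, 2, sharex=True, figsize=(10, 6))
n_boxes = []
for ax, (label, col) in zip(axes.ravel(), harmonic_columns.items()):
    # One box per deleted fraction, built from that step's sampled coefficients
    samples = [c[:, col] for c in coefficients_demo]
    boxes = ax.boxplot(samples, showfliers=False)
    n_boxes.append(len(boxes["boxes"]))
    ax.set_xticks(range(1, len(fractions) + 1))
    ax.set_xticklabels([f"{100 * f:.0f}" for f in fractions])
    ax.set_title(f"{label} harmonic")
for ax in axes[-1]:
    ax.set_xlabel("Data deleted (%)")
for ax in axes[:, 0]:
    ax.set_ylabel("Coefficient value")
fig.tight_layout()
```

Replacing `coefficients_demo` with the real `coefficients_i` list should work as long as each entry has shape `(n_groups, p)`.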