# Toy model hyper experiments

## Toy problem setting

Let's consider a simple analytical function to be learned by a neural network.

y = f(x) = x^2

From the work of Weinan E and Qingcan Wang (https://arxiv.org/pdf/1807.00297.pdf),  
we expect that for a deep ReLU network with:  
- depth = L + 1  
- width = M + d + 1

where d is the dimension of the input x (whose components lie in [-1, 1]), L is a positive integer and M = 2,  
there exists a network function g such that  
- sup_x |g(x) - f(x)| < 2^(-2L)
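The error bound can be illustrated with the classical sawtooth construction of x^2 by Yarotsky (2017). This is a sketch that assumes the construction captures the spirit of the cited result. It is not necessarily the exact network from that paper, and it checks only the error bound, not the depth/width count.

The construction takes u = |x| = relu(x) + relu(-x) and subtracts L composed tent functions g_s(u)/4^s from u. The result is the piecewise-linear interpolant of u^2 on a grid of spacing 2^(-L), whose error is at most 2^(-2L-2) < 2^(-2L). The helper names `relu`, `tent` and `approx_square` are ours.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def tent(u):
    # Tent map on [0, 1] built from three ReLUs:
    # 2u on [0, 1/2] and 2(1 - u) on [1/2, 1]
    return 2.0 * relu(u) - 4.0 * relu(u - 0.5) + 2.0 * relu(u - 1.0)

def approx_square(x, L):
    # |x| = relu(x) + relu(-x) maps [-1, 1] onto [0, 1], and x^2 = |x|^2
    u = relu(x) + relu(-x)
    out, g = u, u
    for s in range(1, L + 1):
        g = tent(g)                 # s-fold composition: a sawtooth with 2^(s-1) teeth
        out = out - g / 4.0 ** s    # each term refines the interpolation of u^2
    return out

x = np.linspace(-1.0, 1.0, 10001)
for L in range(1, 6):
    err = np.max(np.abs(approx_square(x, L) - x ** 2))
    print(f"L={L}  max error={err:.2e}  bound 2^(-2L)={2.0 ** (-2 * L):.2e}")
```

Each extra composition layer divides the worst-case error by 4, which matches the 2^(-2L) rate quoted above.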

With the Self-Tuning Networks method of MacKay et al. (https://arxiv.org/pdf/1903.03088.pdf), we can only tune hyperparameters that enter the training loss and thereby constrain the weights/parameters of the network, directly or indirectly, during training. Examples of such hyperparameters include:
- data modification/augmentation parameters, such as masks applied to image data or per-sample weights
- L1 and L2 regularization coefficients of the loss function
- parameters of functions applied to the weights/parameters, such as dropout. Since the proposed method optimizes an expectation, hyperparameters with stochastic effects such as dropout can also be taken into account.
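The bilevel structure behind this restriction is easiest to see on a tiny problem. In this sketch, the hyperparameter lam (an L2 penalty) appears only in the training loss. The validation loss depends on lam only through the trained weights w*(lam).

This is not the STN algorithm, which learns an approximate best-response function. As a swapped-in simplification, we use ridge regression, where w*(lam) has a closed form and the hypergradient follows from implicit differentiation: dw*/dlam = -(X^T X + lam I)^(-1) w*. The data, the step size and the helper names are illustrative assumptions.

```python
import numpy as np

def best_response(X, y, lam):
    # Exact minimiser w*(lam) of the training loss ||X w - y||^2 + lam * ||w||^2
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)

def val_loss(w, Xv, yv):
    # The validation loss has no lam term: it sees lam only through w
    r = Xv @ w - yv
    return float(r @ r) / len(yv)

def hypergradient(X, y, Xv, yv, lam):
    # d val_loss(w*(lam)) / d lam by the chain rule, with the implicit derivative
    # dw*/dlam = -(X^T X + lam I)^{-1} w*, obtained by differentiating A w* = X^T y
    A = X.T @ X + lam * np.eye(X.shape[1])
    w = np.linalg.solve(A, X.T @ y)
    dw_dlam = -np.linalg.solve(A, w)
    grad_w = 2.0 * Xv.T @ (Xv @ w - yv) / len(yv)
    return float(grad_w @ dw_dlam)

rng = np.random.default_rng(0)
X, Xv = rng.normal(size=(20, 5)), rng.normal(size=(50, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.5 * rng.normal(size=20)
yv = Xv @ w_true + 0.5 * rng.normal(size=50)

lam = 1.0
for _ in range(100):
    # Gradient step on the hyperparameter, keeping it positive
    lam = max(lam - 0.5 * hypergradient(X, y, Xv, yv, lam), 1e-6)
print(lam, val_loss(best_response(X, y, lam), Xv, yv))
```

The same chain rule breaks down for hyperparameters that also appear directly in the validation loss, or that only shape the optimizer's path rather than its solution.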

Examples of hyperparameters that are not covered:
- Hyperparameters that also directly affect the validation loss, such as the number of neurons. By contrast, dropout is not applied during validation, and the validation loss does not include the L1 or L2 weight penalties used in training.
- Hyperparameters tied to the optimization procedure itself, such as the learning rate of the optimizer.
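The dropout remarks can be made concrete with inverted dropout. This is a common convention that we assume here; it is not taken from the self_tuning_nets package.

During training, each unit is zeroed with probability p and the survivors are rescaled by 1/(1 - p), so the expected activation is unchanged. At validation time dropout is switched off, so the validation loss does not depend on p directly. The helper name `dropout` is ours.

```python
import numpy as np

def dropout(h, p, training, rng):
    # Inverted dropout: during training, zero each unit with probability p and
    # rescale the survivors by 1 / (1 - p); at validation time, return h unchanged.
    if not training or p == 0.0:
        return h
    mask = rng.random(h.shape) >= p
    return h * mask / (1.0 - p)

rng = np.random.default_rng(0)
h = np.array([1.0, -2.0, 3.0])
p = 0.3
# 200000 independent training-time passes over the same activations
samples = dropout(np.broadcast_to(h, (200000, 3)), p, True, rng)
print(samples.mean(axis=0))                     # close to h: the expectation is preserved
print(dropout(h, p, training=False, rng=rng))   # exactly h: no dropout at validation
```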

For this toy problem, we study the effect of a simple hyperparameter, x_range, where:
- training inputs are sampled uniformly at random from the interval [-t, t], t being the x_range hyperparameter
- the learned function is evaluated on equally spaced points of the interval [-e, e]
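The two samplings can be sketched as follows, assuming t plays the role of x_range and e the role of the evaluation range (EVAL_RANGE in the notebook code). Training inputs are written as x = t * u with u uniform on [-1, 1]. This has the same distribution as sampling [-t, t] directly and makes the dependence on t explicit. Whether the package samples exactly this way is an assumption, and the helper names are hypothetical.

```python
import numpy as np

def sample_training_batch(rng, t, batch_size):
    # x = t * u with u ~ U(-1, 1): uniform on [-t, t], explicitly scaled by t
    u = rng.uniform(-1.0, 1.0, size=(batch_size, 1))
    x = t * u
    return x, x ** 2          # targets y = f(x) = x^2

def evaluation_grid(e, size):
    # `size` equally spaced points on [-e, e], shaped (size, 1)
    x = np.expand_dims(np.linspace(-e, e, size), 1)
    return x, x ** 2

rng = np.random.default_rng(0)
x_train, y_train = sample_training_batch(rng, t=0.5, batch_size=64)
x_eval, y_eval = evaluation_grid(e=1.0, size=101)
# With t < e, part of the evaluation grid lies outside the training interval,
# so the network has to extrapolate there.
print(x_train.shape, x_eval.shape, np.mean(np.abs(x_eval) > 0.5))
```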

## Example of a single experiment

In [None]:
# %load_ext autoreload
# %autoreload 2

from dataclasses import replace
from itertools import product

import matplotlib.pyplot as plt
import numpy as np
from tqdm.notebook import tqdm

from self_tuning_nets.hyper.experiments.relu_toy_model import \
    ExperimentConfig, run_deterministic_cpu_hyper_relu_experiment
from self_tuning_nets.visualization import function_animation, trajectories_plot, \
    trajectories_legend, trajectories_dist_from_target, trajectories_general_plot

In [None]:
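experiment_config = ExperimentConfig()
# Evaluation grid: EVAL_SIZE equally spaced points on [-EVAL_RANGE, EVAL_RANGE], shape (EVAL_SIZE, 1)
X_eval = np.expand_dims(np.linspace(
    -experiment_config.EVAL_RANGE,
    experiment_config.EVAL_RANGE,
    experiment_config.EVAL_SIZE), 1)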

In [None]:
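# Train once with the default configuration and record the learned function,
# hyperparameter and distance trajectories, plus the weight and hyper losses
f_trajectory, x_range_trajectory, x_scaling_trajectory, \
dist_trajectory, wlosses, hlosses = \
run_deterministic_cpu_hyper_relu_experiment(experiment_config, verbose=0)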

In [None]:
function_animation(X_eval, [f_trajectory], ["b"])

## Effect of random initialization

In [None]:
SAMPLING_SPEED = 100
# Same configuration, 10 different framework seeds (40 to 49)
sample_seeds = range(40, 50)
results = [
    run_deterministic_cpu_hyper_relu_experiment(
        replace(experiment_config,
                FRAMEWORK_SEED=sample_seed,
                MAX_TRAINING_CYCLES=3000,
                PRED_SAMPLING_STEP=SAMPLING_SPEED))
    for sample_seed in sample_seeds
]

In [None]:
function_trajectories, x_range_trajectories, x_scaling_trajectories, \
dist_trajectories, wlosses, hlosses = \
zip(*results)

In [None]:
# Smallest distance from x^2 reached by each run
min_dist = [min(dist) for dist in dist_trajectories]

def rescaled_color(v, group, palette="Reds"):
    maxv = max(group)
    minv = min(group)
    if maxv == minv:
        return plt.get_cmap(palette)(1.0)
    return plt.get_cmap(palette)((v - minv) / (maxv - minv))

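# Color runs that got within distance 1.0 of x^2 in Reds, the others in Blues,
# shading each group by its own range of minimum distances
g1 = [d for d in min_dist if d < 1.0]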
g2 = [d for d in min_dist if d >= 1.0]
lines_palette = []
for d in min_dist:
    g = g1
    c = "Reds"
    if d >= 1.0:
        g = g2
        c = "Blues"
    lines_palette.append(rescaled_color(d, g, c))

In [None]:
# Which seeds reached a distance below 1.0
[d < 1.0 for d in min_dist]

In [None]:
function_animation(X_eval, function_trajectories, lines_palette)

In [None]:
trajectories_legend(sample_seeds, lines_palette)
plt.gcf().set_size_inches(1.0, 1.0)
plt.show()

In [None]:
trajectories_general_plot(x_range_trajectories, lines_palette, ylabel="x_range")
plt.show()
trajectories_general_plot(dist_trajectories, lines_palette, ylabel="Function distance from x^2")
plt.show()

## Demonstration of converging hyperparameter trajectories

In [None]:
SAMPLING_SPEED = 100
# Two seeds x four initial x_range values
exp_settings = list(product([42, 47], [0.5, 1.0, 2.0, 3.0]))
results = [
    run_deterministic_cpu_hyper_relu_experiment(
        replace(experiment_config,
                FRAMEWORK_SEED=sample_seed,
                MAX_TRAINING_CYCLES=10000,
                INIT_TRAIN_RANGE=x_range,
                PRED_SAMPLING_STEP=SAMPLING_SPEED))
    for sample_seed, x_range in exp_settings
]

In [None]:
function_trajectories, x_range_trajectories, x_scaling_trajectories, \
dist_trajectories, wlosses, hlosses = \
zip(*results)

In [None]:
min_dist = [min(dist) for dist in dist_trajectories]

def rescaled_color(v, group, palette="Reds"):
    maxv = max(group)
    minv = min(group)
    if maxv == minv:
        return plt.get_cmap(palette)(1.0)
    return plt.get_cmap(palette)((v - minv) / (maxv - minv))

g1 = [d for d in min_dist if d < 1.0]
g2 = [d for d in min_dist if d >= 1.0]
lines_palette = []
for d in min_dist:
    g = g1
    c = "Reds"
    if d >= 1.0:
        g = g2
        c = "Blues"
    lines_palette.append(rescaled_color(d, g, c))

In [None]:
# lines_palette = [plt.get_cmap("viridis")(i) for i in np.linspace(0, 0.7, len(function_trajectories))]
function_animation(X_eval, function_trajectories, lines_palette)

In [None]:
trajectories_legend(exp_settings, lines_palette)
plt.gcf().set_size_inches(1.0, 1.0)
plt.show()

In [None]:
trajectories_general_plot(x_range_trajectories, lines_palette, ylabel="x_range")
plt.show()
trajectories_general_plot(dist_trajectories, lines_palette, ylabel="Function distance from x^2")
plt.show()
trajectories_general_plot(x_scaling_trajectories, lines_palette, ylabel="Scale x_range")
plt.show()