# Introduction to the fixed-hyperparameter comparison of PyTorch and TensorFlow

We want to test the self-tuning network proposal of MacKay et al. (https://arxiv.org/pdf/1903.03088.pdf) in the simplest possible setting, so that we can verify the basic behaviors expected from the method in parallel on the provided PyTorch implementation and on a TensorFlow implementation intended for use on other networks.

# Toy problem setting

Let's consider a simple analytical function to be learned by a network architecture.

y = f(x) = x^2

From the work of Weinan et al. (https://arxiv.org/pdf/1807.00297.pdf),  
we expect that, with a deep ReLU network of:  
- depth = L + 1  
- width = M + d + 1

where d is the dimensionality of x, x is contained in [-1, 1], L is a positive integer and M = 2,  
we can eventually find a function g for which  
- sup|g(x) - f(x)| < 2^(-2L)
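
To make the bound concrete, here is a minimal NumPy sketch that builds a ReLU approximation of x^2 by hand and measures its sup error on [-1, 1]. It uses the well-known sawtooth (composed tent map) construction, which is an assumption on our side and not necessarily the exact network of the cited paper; the helper names `hat` and `relu_square` are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def hat(z):
    # Tent map on [0, 1] written with ReLU units: 0 -> 0, 0.5 -> 1, 1 -> 0.
    return 2.0 * relu(z) - 4.0 * relu(z - 0.5) + 2.0 * relu(z - 1.0)

def relu_square(x, L):
    # |x| = relu(x) + relu(-x) folds [-1, 1] onto [0, 1].
    z = relu(x) + relu(-x)
    g = z.copy()
    approx = z.copy()
    for s in range(1, L + 1):
        g = hat(g)                      # s-fold composition of the tent map
        approx = approx - g / 4.0 ** s  # subtract the s-th sawtooth correction
    return approx

x = np.linspace(-1.0, 1.0, 10001)
# Sup error on the grid for several values of L.
sup_errors = {
    L: float(np.max(np.abs(relu_square(x, L) - x ** 2)))
    for L in range(1, 7)
}
for L, err in sup_errors.items():
    print(f"L={L}: sup error={err:.3e}, bound 2^(-2L)={2.0 ** (-2 * L):.3e}")
```

Each additional composition adds one layer and shrinks the error by a factor of four, which is the kind of behavior we hope a trained network can approach.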


With the method of MacKay et al. (https://arxiv.org/pdf/1903.03088.pdf), we can only tune hyperparameters that constrain the training loss, and therefore constrain the weights/parameters of the network directly or indirectly during training. Examples of this type of hyperparameter are:
- data modification/augmentation parameters, such as masks applied to image data or per-sample weights
- L1 and L2 regularization of the loss function
- functions applied to the weights/parameters, such as dropout. Since the proposed method operates on expectations, it can take into account hyperparameters with stochastic effects like dropout.

Examples of hyperparameters not covered:
- Hyperparameters that also affect the validation loss, such as the number of neurons. Note that dropout is not supposed to be used during validation, and the validation loss is not subject to the L1 or L2 weight regularization used in training.
- Hyperparameters linked to the optimization procedure itself, such as the learning rate of the optimizer employed.
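
The distinction above can be illustrated on a much simpler model than a neural network. The sketch below is an assumption-laden toy, not the method of the paper: it uses ridge regression, where the L2 coefficient `lam` only enters the training loss, the optimal weights are available in closed form, and the validation loss is a plain mean squared error. All names and data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
X_train = rng.uniform(-1.0, 1.0, size=(20, 3))
X_val = rng.uniform(-1.0, 1.0, size=(200, 3))
y_train = X_train @ w_true + 0.3 * rng.normal(size=20)
y_val = X_val @ w_true + 0.3 * rng.normal(size=200)

def best_response(lam):
    # Minimizer of mean((X w - y)^2) + lam * ||w||^2 on the training set.
    n, d = X_train.shape
    A = X_train.T @ X_train / n + lam * np.eye(d)
    b = X_train.T @ y_train / n
    return np.linalg.solve(A, b)

def validation_loss(w):
    # No regularization term here: lam acts on validation only through w.
    return float(np.mean((X_val @ w - y_val) ** 2))

results = []
for lam in [0.0, 0.01, 0.1, 1.0]:
    w = best_response(lam)
    results.append((lam, float(np.linalg.norm(w)), validation_loss(w)))
    print(f"lam={lam}: ||w||={results[-1][1]:.3f}, val loss={results[-1][2]:.3f}")
```

The hyperparameter changes the validation loss only indirectly, through the weights it induces during training, which is the situation the method is designed for.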

For this toy problem we propose to verify the effect of a trivial hyperparameter, x_range, where:
- the training inputs are sampled uniformly at random from an interval [-t, t]
- the function is evaluated on equally spaced points of an interval [-e, e]
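
As a rough sketch of this setup, the data for one run could be generated as follows. The function `make_toy_data` and its arguments are hypothetical and are not part of the `self_tuning_nets` package used in the cells below; `t` and `e` play the roles described above.

```python
import numpy as np

def make_toy_data(t, e, n_train, n_eval, seed=0):
    # Training inputs: uniform random samples in [-t, t].
    rng = np.random.default_rng(seed)
    x_train = rng.uniform(-t, t, size=(n_train, 1))
    # Evaluation inputs: equally spaced points covering [-e, e].
    x_eval = np.linspace(-e, e, n_eval).reshape(-1, 1)
    # Target function y = x^2 for both sets.
    return x_train, x_train ** 2, x_eval, x_eval ** 2

x_train, y_train, x_eval, y_eval = make_toy_data(t=0.5, e=1.0, n_train=256, n_eval=101)
print(x_train.shape, x_eval.shape)
```

With t smaller than e, part of the evaluation interval is never seen during training, so the evaluation error measures extrapolation as well as fit.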

# Torch basics

## Network training example

In [None]:
from tqdm.notebook import tqdm
import numpy as np
from self_tuning_nets.linear_experiment import \
    ExperimentConfig, run_deterministic_cpu_basic_torch_experiment

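# Default experiment settings
experiment_config = ExperimentConfig()
# Column vector of equally spaced evaluation points in [-EVAL_RANGE, EVAL_RANGE]
X_eval = np.expand_dims(np.linspace(
    -experiment_config.EVAL_RANGE,
    experiment_config.EVAL_RANGE,
    experiment_config.EVAL_SIZE), 1)

# Train once and keep the predictions on X_eval sampled along training
f_trajectory = run_deterministic_cpu_basic_torch_experiment(experiment_config, X_eval, verbose=0)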

In [None]:
from self_tuning_nets.visualization import function_animation
function_animation(X_eval, [f_trajectory], ["b"])

## Effect of different parameters

In [None]:
from tqdm.notebook import tqdm
import numpy as np
from self_tuning_nets.linear_experiment import \
    ExperimentConfig, run_deterministic_cpu_basic_torch_experiment

experiment_config = ExperimentConfig()
X_eval = np.expand_dims(np.linspace(
    -experiment_config.EVAL_RANGE,
    experiment_config.EVAL_RANGE,
    experiment_config.EVAL_SIZE), 1)
experiment_config

### Weight seeds

In [None]:
from dataclasses import replace

SAMPLING_SPEED = 100
function_trajectories = [
    run_deterministic_cpu_basic_torch_experiment(
        replace(experiment_config,
                WEIGHTS_SEED=sample_seed,
                PRED_SAMPLING_STEP=SAMPLING_SPEED),
        X_eval)
    for sample_seed in range(40, 50)
]

In [None]:
import matplotlib.pyplot as plt
from self_tuning_nets.visualization import function_animation

lines_palette = [plt.get_cmap("viridis")(i) for i in np.linspace(0, 0.7, len(function_trajectories))]
function_animation(X_eval, function_trajectories, lines_palette)

In [None]:
from self_tuning_nets.visualization import trajectories_plot, trajectories_legend, trajectories_dist_from_target

trajectories_distances = trajectories_dist_from_target(function_trajectories, X_eval ** 2, 100)
trajectories_plot(trajectories_distances, lines_palette, SAMPLING_SPEED, title_extra="weights")
plt.show()
trajectories_legend(range(40, 50), lines_palette)
plt.show()

### Batches seed

In [None]:
from dataclasses import replace
from itertools import product

SAMPLING_SPEED = 1
MAX_BATCHES = 400
function_trajectories = [
    run_deterministic_cpu_basic_torch_experiment(
        replace(experiment_config,
                WEIGHTS_SEED=weight_seed,
                DATA_SEED=data_seed,
                PRED_SAMPLING_STEP=SAMPLING_SPEED,
                MAX_BATCHES=MAX_BATCHES),
        X_eval)
    for weight_seed, data_seed in list(product([40, 45, 47], [42, 43, 44, 45]))
]

In [None]:
from self_tuning_nets.visualization import trajectories_plot, trajectories_legend, trajectories_dist_from_target
import matplotlib.pyplot as plt

lines_palette = [plt.get_cmap("viridis")(i) for i in np.linspace(0, 0.7, len(function_trajectories))]
trajectories_distances = trajectories_dist_from_target(function_trajectories, X_eval ** 2, 100)
trajectories_plot(trajectories_distances, lines_palette, SAMPLING_SPEED, title_extra="(weights, data)")
plt.show()
trajectories_legend(list(product([40, 45, 47], [42, 43, 44, 45])), lines_palette)
plt.show()

## Fixed hyperparameter experiment

In [None]:
from dataclasses import replace

SAMPLING_SPEED = 1
MAX_BATCHES = 2000
function_trajectories = [
    run_deterministic_cpu_basic_torch_experiment(
        replace(experiment_config,
                WEIGHTS_SEED=40,
                TRAIN_RANGE=train_range,
                PRED_SAMPLING_STEP=SAMPLING_SPEED,
                MAX_BATCHES=MAX_BATCHES),
        X_eval)
    for train_range in [4.0, 3.0, 2.5, 2.0, 1.5, 1.0, 0.5, 0.25]
]

In [None]:
from self_tuning_nets.visualization import trajectories_plot, trajectories_legend, trajectories_dist_from_target
import matplotlib.pyplot as plt

lines_palette = [plt.get_cmap("viridis")(i) for i in np.linspace(0, 0.7, len(function_trajectories))]
trajectories_distances = trajectories_dist_from_target(function_trajectories, X_eval ** 2, 100)
trajectories_plot(trajectories_distances, lines_palette, SAMPLING_SPEED, title_extra="X max abs value\nEval max abs value of 1")
plt.show()
trajectories_legend([4.0, 3.0, 2.5, 2.0, 1.5, 1.0, 0.5, 0.25], lines_palette)
plt.show()

# Keras basics

In [None]:
from tqdm.notebook import tqdm
import numpy as np
from self_tuning_nets.linear_experiment import \
    ExperimentConfig, run_deterministic_cpu_basic_keras_experiment

experiment_config = ExperimentConfig()
X_eval = np.expand_dims(np.linspace(
    -experiment_config.EVAL_RANGE,
    experiment_config.EVAL_RANGE,
    experiment_config.EVAL_SIZE), 1)

f_trajectory = run_deterministic_cpu_basic_keras_experiment(experiment_config, X_eval, verbose=0)

In [None]:
from self_tuning_nets.visualization import function_animation
function_animation(X_eval, [f_trajectory], ["b"])

## Fixed hyperparameter experiment reproduction

In [None]:
from dataclasses import replace

SAMPLING_SPEED = 1
MAX_BATCHES = 1000
function_trajectories = [
    run_deterministic_cpu_basic_keras_experiment(
        replace(experiment_config,
                WEIGHTS_SEED=40,
                TRAIN_RANGE=train_range,
                PRED_SAMPLING_STEP=SAMPLING_SPEED,
                MAX_BATCHES=MAX_BATCHES),
        X_eval)
    for train_range in [4.0, 3.0, 2.5, 2.0, 1.5, 1.0, 0.5, 0.25]
]

In [None]:
from self_tuning_nets.visualization import trajectories_plot, trajectories_legend, trajectories_dist_from_target
import matplotlib.pyplot as plt

lines_palette = [plt.get_cmap("viridis")(i) for i in np.linspace(0, 0.7, len(function_trajectories))]
trajectories_distances = trajectories_dist_from_target(function_trajectories, X_eval ** 2, 100)
trajectories_plot(trajectories_distances, lines_palette, SAMPLING_SPEED, title_extra="X max abs value\nEval max abs value of 1")
plt.show()
trajectories_legend([4.0, 3.0, 2.5, 2.0, 1.5, 1.0, 0.5, 0.25], lines_palette)
plt.show()