In [None]:
import json
import os
import pathlib

In [None]:
from autocrit_tools.util import random_string

Note: all of the Python scripts below have documentation accessible with `-h`. If anything is unclear, check the docs.

## Preliminaries

We begin by setting some path variables and hyperparameters.

### Paths and File Structure

The experiments we run involve specifying a neural network to run on a given dataset, training that network with an optimization algorithm, and then using the trajectories of that optimizer as initialization points for a critical point-finding algorithm.

To organize all of this inter-related information, we use the following file structure:

In [None]:
!cat ../etc/example_file_structure.txt

The top-level directory (`results/` above) is called `root_path`. Its contents are directories for datasets. The `data_ID` variable identifies a data directory. Data ready for use by a network is stored within that directory as `data.npz`.

For this example, these files will be created for you by the cells below. You just need to change the `root_path`. I'm trying out a new workflow where my Dropbox syncs results across machines, so that's where I set my `root_path`.

In [None]:
root_path = pathlib.Path("~").expanduser() / "Dropbox" / "OptimizationLandscapes" / "results_mb"

The `setup_network` script can create test Gaussian data with linearly-spaced eigenvalues for use with autoencoders. This determines the value of `data_ID` below.
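
As a rough illustration of what such a dataset looks like, the sketch below draws zero-mean Gaussian samples whose covariance has linearly spaced eigenvalues. It is not the `setup_network` implementation: the function name `make_gaussian_linspace`, the eigenvalue range, and the sample count are all assumptions made for illustration.

```python
import numpy as np


def make_gaussian_linspace(num_samples=1000, dim=16,
                           min_eig=0.1, max_eig=1.0, seed=0):
    """Hypothetical helper: Gaussian data with linearly spaced covariance eigenvalues."""
    rng = np.random.default_rng(seed)
    eigenvalues = np.linspace(min_eig, max_eig, dim)
    # Scaling each standard-normal column by sqrt(eigenvalue) yields a
    # diagonal covariance matrix with exactly these eigenvalues.
    data = rng.standard_normal((num_samples, dim)) * np.sqrt(eigenvalues)
    return data, eigenvalues


example_data, example_eigenvalues = make_gaussian_linspace()
print(example_data.shape)
print(np.round(example_data.var(axis=0), 2))
```

The per-column sample variances should land close to the requested eigenvalues, up to sampling noise.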

In [None]:
data_ID = "gaussian_16_linspace"
data_dir = root_path / data_ID

data_dir

The same dataset can be analyzed with different networks, so the dataset directories contain directories for networks. The `network_ID` variable identifies a network directory.

Networks are specified by a `network.json` file inside this directory. See the **Network** section below for more on how these are generated.

In [None]:
network_ID = "test_network" + "_" + random_string(6)

A network can be trained with any of a number of optimizers, so the network directories contain directories for optimizers. The `optimizer_ID` variable identifies an optimizer.

Optimizers are specified by an `optimizer.json` file inside this directory. See the **Optimization Experiment** section below for more on how these are generated and executed.

The resulting trajectories are saved in the `trajectories/` sub-folder. By default, they are identified by charwidth-4 integers, e.g. `0000`. These trajectories are `npz` files: compressed, dictionary-like collections of numpy arrays.
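
For readers unfamiliar with the `npz` format, here is a minimal sketch of writing and reading such a file with numpy. The array names `theta` and `f_theta` and their shapes are assumptions for illustration only; they are not the actual schema of the trajectory files.

```python
import os
import tempfile

import numpy as np

# Hypothetical contents: these keys and shapes are NOT the real trajectory schema.
example_thetas = np.zeros((100, 128))
example_losses = np.linspace(1.0, 0.1, 100)

with tempfile.TemporaryDirectory() as tmp_dir:
    trajectory_path = os.path.join(tmp_dir, "0000.npz")
    np.savez_compressed(trajectory_path,
                        theta=example_thetas, f_theta=example_losses)

    # np.load returns a dictionary-like object; arrays are read on access.
    with np.load(trajectory_path) as trajectory:
        trajectory_keys = sorted(trajectory.files)
        loaded_thetas = trajectory["theta"]
        loaded_losses = trajectory["f_theta"]

print(trajectory_keys, loaded_thetas.shape, loaded_losses.shape)
```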

In [None]:
optimizer_ID = "test_optimizer" + "_" + random_string(6)

optimizer_dir = data_dir / network_ID / optimizer_ID

optimizer_dir

The path of a given optimizer can be used to seed any of a number of critical point-finding algorithms, so the optimizer directories contain directories for critfinders. The `critfinder_ID` variable identifies a critfinder.

The outputs of a critfinder are saved in the `outputs/` sub-folder, again as `npz`s with charwidth-4 integer identifiers.
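
The nesting described in this section can be summarized in a few lines of `pathlib`. This is a sketch only: the helper `experiment_paths` is hypothetical, and the assumption that trajectory and output files are named `<ID>.npz` is inferred from the description above rather than taken from the scripts.

```python
from pathlib import PurePosixPath


def experiment_paths(root_path, data_ID, network_ID,
                     optimizer_ID, critfinder_ID, index=0):
    """Hypothetical helper mirroring the directory nesting described above."""
    data_dir = PurePosixPath(root_path) / data_ID
    network_dir = data_dir / network_ID
    optimizer_dir = network_dir / optimizer_ID
    critfinder_dir = optimizer_dir / critfinder_ID
    file_ID = str(index).zfill(4)  # four-character, zero-padded identifier
    return {
        "data": data_dir / "data.npz",
        "network": network_dir / "network.json",
        "optimizer": optimizer_dir / "optimizer.json",
        # Assumed file naming: "<ID>.npz" inside each sub-folder.
        "trajectory": optimizer_dir / "trajectories" / (file_ID + ".npz"),
        "output": critfinder_dir / "outputs" / (file_ID + ".npz"),
    }


example_paths = experiment_paths("results", "gaussian_16_linspace",
                                 "test_network", "test_optimizer", "test_finder")
for name, path in example_paths.items():
    print(name, path)
```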

In [None]:
critfinder_base_ID = "test_finder"

### Hyperparameters

Lastly, there are some hyperparameters that are typically not changed across a wide variety of experiments. We set these at the top. The values below are chosen for speed, not accuracy.

There are many more hyperparameters. See the docs to determine their default values.

In [None]:
num_optimizer_steps = 100
num_gnm_steps = 500
num_newton_steps = 50

## Network

The `setup_network` script connects a dataset and a specification of network layers.

This specification is a `json` file. It can be automatically generated by first building a network with tools from `autocrit.nn` and then calling `json.dump` on the network's `layer_dicts` attribute. If this attribute doesn't exist, first build it with `construct_dict`.

Below, we specify a simple group of layers by hand, then save the result.
The network below has _no nonlinearities_.
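
One consequence of omitting nonlinearities and biases is that a two-layer fully-connected autoencoder collapses to a single linear map whose rank is at most the width of the bottleneck. The numpy sketch below illustrates this for a 16-4-16 network. The random weights and the `(inputs, outputs)` weight layout are assumptions for illustration, not the conventions of `autocrit.nn`.

```python
import numpy as np

bottleneck, width = 4, 16
rng = np.random.default_rng(0)

# Random weights stand in for trained parameters (illustration only).
W_encode = rng.standard_normal((width, bottleneck))
W_decode = rng.standard_normal((bottleneck, width))

batch = rng.standard_normal((8, width))  # a batch of 8 inputs
reconstruction = batch @ W_encode @ W_decode

# Without nonlinearities or biases, the network is one width-by-width
# linear map, and its rank cannot exceed the bottleneck size.
end_to_end = W_encode @ W_decode
end_to_end_rank = np.linalg.matrix_rank(end_to_end)
print(reconstruction.shape, end_to_end_rank)
```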

In [None]:
layers_dir = root_path / "layer_specs"

layers_dir

In [None]:
k = 4
p = 16

layer_dicts = [{"type": "fc",
                "params": {"out_nodes": k, "has_biases": False}},
               {"type": "fc",
                "params": {"out_nodes": p, "has_biases": False}}
              ]


In [None]:
layers_path = layers_dir / "16_4_fcae.json"

layers_path.parent.mkdir(exist_ok=True, parents=True)

In [None]:
with open(layers_path, "w") as f:
    json.dump(layer_dicts, f)
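
A quick way to sanity-check a layer specification is to read it back and compare it with what was written. The sketch below does this in a temporary directory so that it does not touch `root_path`; the variable names prefixed with `example_` are placeholders.

```python
import json
import pathlib
import tempfile

example_layer_dicts = [
    {"type": "fc", "params": {"out_nodes": 4, "has_biases": False}},
    {"type": "fc", "params": {"out_nodes": 16, "has_biases": False}},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    example_path = pathlib.Path(tmp_dir) / "16_4_fcae.json"
    with open(example_path, "w") as f:
        json.dump(example_layer_dicts, f)
    with open(example_path) as f:
        loaded_layer_dicts = json.load(f)

# The round trip should preserve the layer order and parameters.
out_nodes = [layer["params"]["out_nodes"] for layer in loaded_layer_dicts]
print(loaded_layer_dicts == example_layer_dicts, out_nodes)
```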

In [None]:
!python ../scripts/setup_network.py -v --results_path {root_path} \
        --ID {network_ID} \
        --data_ID {data_ID} \
        --zero_centering "subtract_mean" \
        --generate_data \
        --task "autoencoding" \
        --layers_path {layers_path}

## Optimization Experiment

Both kinds of experiment are executed in the same fashion. First, a `setup` Python script creates the configuration files for each component of the experiment: the data, the network, the optimizer/finder, etc. Then, a `run` script executes the configured experiment.

The `setup_XYZ_experiment.py` scripts take a very large number of keyword arguments, so they are equipped with more thorough documentation. Run `setup_XYZ_experiment.py -h` to see them.

### Setup

In [None]:
!python ../scripts/setup_optimization_experiment.py \
    --ID {optimizer_ID} \
    --data_dir {data_dir} \
    --network_ID {network_ID} \
    --optimizer "gd" \
    --optimizer_lr 0.01

### Run

We now run the optimization experiment by passing its directory path, a trajectory identifier, and a number of iterations to run to `run_optimization_experiment`.

In [None]:
optimizer_trajectory_increment = 0

In [None]:
trajectory_ID = str(optimizer_trajectory_increment).zfill(4)

!python ../scripts/run_optimization_experiment.py \
    --optimizer_dir {optimizer_dir} --trajectory_ID {trajectory_ID} \
    {num_optimizer_steps}
    
optimizer_trajectory_increment += 1

## Critfinder Experiments

Critfinder experiments are executed in much the same fashion: `setup` and then `run`.

### Setup

The most important variable is the `finder_str`, which identifies the critfinding algorithm. Current choices are `gnm` (gradient norm minimization, as in Pennington and Bahri), `newtonMR` (`m`in`r`es, by Roosta et al.), and `newtonTR` (trust region, as in Dauphin et al.).
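
To make the idea behind `gnm` concrete, here is a toy sketch of gradient norm minimization: plain gradient descent on g(θ) = ½‖∇f(θ)‖², whose gradient is the Hessian of f applied to the gradient of f. The objective f(x, y) = x² − y², the step size, and the function names are all chosen for illustration; this is not the `autocrit` implementation, which offers several minimizers.

```python
import numpy as np


def grad_f(theta):
    # f(x, y) = x**2 - y**2 has a saddle point at the origin.
    x, y = theta
    return np.array([2.0 * x, -2.0 * y])


def hess_f(theta):
    # The Hessian of this quadratic f is constant.
    return np.array([[2.0, 0.0], [0.0, -2.0]])


def gradient_norm_minimization(theta, lr=0.1, num_steps=100):
    """Gradient descent on g(theta) = 0.5 * ||grad f(theta)||**2."""
    for _ in range(num_steps):
        # grad g(theta) = H(theta) @ grad f(theta)
        theta = theta - lr * (hess_f(theta) @ grad_f(theta))
    return theta


theta_final = gradient_norm_minimization(np.array([1.0, 1.0]))
print(theta_final, np.linalg.norm(grad_f(theta_final)))
```

Gradient descent on f itself would move away from this saddle along the y direction, whereas descent on the squared gradient norm is attracted to it.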

See the docs with `!python ../scripts/setup_critfinder_experiment.py -h` for details on the various arguments.

The argument structure is quite different depending on which method is being called: for example, `gnm` needs `minimizer`, either `g`radient `d`escent, `momentum`, or `b`ack`t`racking `l`ine `s`earch, while `newtonXY` methods do not.

This makes it more convenient to encapsulate the setup in a function.

For greater reusability/abstraction, consider using `subprocess` and building the `args` lists, as in `utils/run`.
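
A minimal sketch of that `subprocess` approach follows. The flags are the ones passed to `setup_critfinder_experiment.py` elsewhere in this notebook; the function names `build_critfinder_setup_args` and `run_critfinder_setup` and the default `script_path` are hypothetical and are not taken from `utils/run`.

```python
import subprocess
import sys


def build_critfinder_setup_args(critfinder_ID, finder_str, optimizer_dir,
                                trajectory_ID, init_theta="uniform_f",
                                theta_perturb=None,
                                script_path="../scripts/setup_critfinder_experiment.py"):
    """Hypothetical helper: build the argument list for the setup script."""
    if isinstance(trajectory_ID, int):
        trajectory_ID = str(trajectory_ID).zfill(4)

    args = [sys.executable, script_path, str(optimizer_dir), finder_str,
            "--ID", critfinder_ID,
            "--init_theta", init_theta,
            "--trajectory_ID", trajectory_ID]

    # Method-specific arguments, using the same values as in this notebook.
    if finder_str == "gnm":
        args += ["--minimizer", "btls"]
    elif "newton" in finder_str:
        args += ["--gamma_mx", "2", "--gamma_k", "10"]

    # Optional arguments are appended only when they are set.
    if theta_perturb is not None:
        args += ["--theta_perturb", str(theta_perturb)]
    return args


def run_critfinder_setup(*args, **kwargs):
    """Hypothetical wrapper: execute the setup script, raising on failure."""
    return subprocess.run(build_critfinder_setup_args(*args, **kwargs), check=True)


example_args = build_critfinder_setup_args(
    "test_finder_abc123", "newtonMR", "results/test_optimizer", 0)
print(" ".join(example_args[1:]))
```

Building the list once and appending optional flags avoids duplicating the whole command for each combination of arguments.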

In [None]:
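def setup_critfinder(critfinder_ID, finder_str, optimizer_dir,
                     trajectory_ID, init_theta="uniform_f",
                     theta_perturb=None):
    """Write the configuration for one critfinder with setup_critfinder_experiment.py.

    Integer trajectory_IDs are zero-padded to four characters, and
    theta_perturb is passed to the script only when it is not None.
    """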

    if isinstance(trajectory_ID, int):
        trajectory_ID = str(trajectory_ID).zfill(4)
        
    if finder_str == "gnm":
        if theta_perturb is not None:
            !python ../scripts/setup_critfinder_experiment.py \
            {optimizer_dir} {finder_str} \
                --ID {critfinder_ID} \
                --minimizer "btls" \
                --init_theta {init_theta} \
                --trajectory_ID {trajectory_ID} \
                --theta_perturb {theta_perturb}
        else:
            !python ../scripts/setup_critfinder_experiment.py \
            {optimizer_dir} {finder_str} \
                --ID {critfinder_ID} \
                --minimizer "btls" \
                --init_theta {init_theta} \
                --trajectory_ID {trajectory_ID}
            
    elif "newton" in finder_str:
        if theta_perturb is not None:
            !python ../scripts/setup_critfinder_experiment.py \
            {optimizer_dir} {finder_str} \
                --ID {critfinder_ID} \
                --init_theta {init_theta} \
                --trajectory_ID {trajectory_ID} \
                --gamma_mx 2 \
                --gamma_k 10 \
                --theta_perturb {theta_perturb}
        else:
            !python ../scripts/setup_critfinder_experiment.py \
            {optimizer_dir} {finder_str} \
                --ID {critfinder_ID} \
                --init_theta {init_theta} \
                --trajectory_ID {trajectory_ID} \
                --gamma_mx 2 \
                --gamma_k 10

### Run

For the original set of experiments, it was most important to compare lots of configurations of critfinders on the same data and network, so the code for running critfinders was organized for looping over those configurations, as below.

In [None]:
runs_per_critfinder = 1

In [None]:
theta_perturbs = [None]

trajectories = [0]

init_theta = "uniform_f"
finder_str = "newtonMR"

for theta_perturb in theta_perturbs:
    
    for trajectory in trajectories:

        finder_ID = critfinder_base_ID + "_" + random_string(6)

        print(trajectory, finder_ID)

        setup_critfinder(finder_ID, finder_str, optimizer_dir,
                         trajectory,
                         init_theta=init_theta,
                         theta_perturb=theta_perturb)

        for ii in range(runs_per_critfinder):
            print("\t" + str(ii))
            critfinder_dir = optimizer_dir / finder_ID

            output_ID = str(ii).zfill(4)
            
            if finder_str == "gnm":
                num_steps = num_gnm_steps
            else:
                num_steps = num_newton_steps

            !python ../scripts/run_critfinder_experiment.py \
            {critfinder_dir} {output_ID} {num_steps}