In [None]:
import autograd.numpy as np
import matplotlib.pyplot as plt

import autocrit
import autocrit.nn as nn

In [None]:
def shortdir(obj):
    return [elem for elem in dir(obj) if not elem.startswith("_")]

`autocrit` serves two intertwined goals:
1. Provide implementations of critical point-finding algorithms for neural networks in `autograd`.
2. Allow for saving and reproduction of critical point-finding experiments.

As such, there are often two APIs: one aimed at easing goal 1, the other aimed at easing goal 2.

When fiddling around in a notebook, the first type of API is preferred -- it's easy for humans to work with.
When programmatically executing experiments, the second type is preferred -- it's easy to write programs that use it.

Programmatic execution of reproducible experiments is supported by the scripts in the `scripts/` folder in `autocrit_tools/`.

## The top-level namespace

The top-level namespace exposes the submodules, plus convenient access to the main classes.

In [None]:
top_lvl = shortdir(autocrit)

top_lvl

Uncomment and run the cell to see the docstrings for all of the classes (and their methods) and modules.

In [None]:
# help(autocrit)

### Main Classes

These are the classes that are most useful for finding the critical points of simple neural network loss functions.

In [None]:
main_classes = [elem for elem in shortdir(autocrit) if elem[0].isupper()]

main_classes

#### `FastNewton{MR, TR}` and `GradientNormMinimizer`

These are the critical point-finding algorithms.

Each is a sub-class of `autocrit.finders.base.Finder`,
an abstract base class that handles basics like
logging.

See the help for details.

WARNING: **The functions optimized by the `Finder` need to take _column vectors_ as inputs**.
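
As a minimal sketch of what that shape requirement means in practice: the example below uses plain `numpy` (whose array API `autograd.numpy` wraps) and a hypothetical function `f`. It does not call any `autocrit` code, and the shape check inside `f` is only there for illustration.

```python
import numpy as np

def f(theta):
    """Hypothetical scalar function of a column vector of shape (n, 1)."""
    assert theta.ndim == 2 and theta.shape[1] == 1, "expected a column vector"
    return 0.5 * np.sum(np.square(theta))

flat = np.array([1.0, 2.0, 3.0])   # shape (3,): a flat array, not a column vector
theta = flat.reshape(-1, 1)        # shape (3, 1): a column vector

value = f(theta)
```

Passing `flat` directly to this `f` would trip the shape check, which is the kind of mistake the warning is about.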

In [None]:
issubclass(autocrit.FastNewtonMR, autocrit.finders.base.Finder)

In [None]:
print(autocrit.finders.base.__doc__)

In [None]:
# help(autocrit.finders.base)

#### `FullyConnectedNetwork`

This is the easiest way to specify a simple neural network.

Each layer must be fully connected and have the same hyperparameters (`has_biases`, `nonlinearity_str`).

For more general neural networks, see `autocrit.nn.networks`.

In [None]:
# help(autocrit.FullyConnectedNetwork)

#### `CritFinderExperiment` and `OptimizationExperiment`

These are `Experiment`s, which know how to use `Finder`s or `Optimizer`s and save their results to files.

They also know how to convert an `Experiment` to a `.json` file and how to recreate an `Experiment` from its `.json` file.
For more on how these are used, see the `scripts/` in `autocrit_tools`.

This functionality is only important for running lots of reproducible experiments
and tracking the results.
These classes are unnecessary for doing simple things
(for example, they aren't used in the tests of the `Optimizer`s or `Finder`s).

In [None]:
issubclass(autocrit.CritFinderExperiment, autocrit.experiments.Experiment)

In [None]:
print(autocrit.experiments.Experiment.__doc__)

The last bit you'll need is a way to define optimizers,
since optimization trajectories are often used as "seeds"
for critical point-finding methods.

See `optimizers` below.

### Modules

In [None]:
modules = [elem for elem in shortdir(autocrit) if not elem[0].isupper()]

modules

#### `defaults`

Shared default values of all of the major numerical parameters.

In [None]:
shortdir(autocrit.defaults)

Uncomment the cell below for (terse) definitions.

They should point you to the place where the values are used.

In [None]:
# autocrit.defaults??

#### `experiments`

This is where `CritFinderExperiment` and `OptimizationExperiment` are defined. See discussion above.

#### `nn`

`nn` is the library for building `n`eural `n`etworks.

In [None]:
shortdir(nn)

Its style loosely resembles the `Sequential` API in PyTorch:
networks are made of `Layer`s, and the output of one `Layer` is the input to the next.
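
The chaining idea can be sketched in a few lines. Here plain Python callables stand in for `Layer`s, and `forward`, `double`, and `add_one` are hypothetical names chosen for illustration; this is not how `autocrit.nn` is implemented.

```python
from functools import reduce

def double(x):
    return 2 * x

def add_one(x):
    return x + 1

def forward(layers, x):
    """Feed x through each layer in order: the output of one is the input to the next."""
    return reduce(lambda out, layer: layer(out), layers, x)

result = forward([double, add_one], 3)  # computes add_one(double(3))
```

The order of the list matters: swapping the two stand-in layers computes `double(add_one(3))` instead.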

`Layer`s are defined inside `nn.layers`.

In [None]:
print(nn.layers.Layer.__doc__)

Arbitrary non-parameterized transformations are supported by a `LambdaLayer`,
but `Network`s containing a `LambdaLayer` can't be rebuilt,
so they're incompatible with `Experiment`s.

In [None]:
print(nn.layers.LambdaLayer.__doc__)

`Network`s are built from `Layer`s based on the `layer_spec` argument to a call to `Network()`. `layer_spec` can be either

1. A literal list of `Layer`s
2. A list of dictionaries, each with the keys `"type"` and `"params"`. `"type"` is the name of the layer type, as below. `"params"` is a dictionary passed as the `kwargs` to the layer's constructor.

In [None]:
nn.layers._LAYERS

For example, a fully connected layer (`FCLayer`)
with four output nodes would be specified by

In [None]:
{"type": "fc",
 "params": {"out_nodes": 4}}

See the docstrings for `__init__` methods for details about the parameters.

In [None]:
print(nn.layers.FCLayer.__init__.__doc__)

When building networks by hand, it's usually easier to just build them directly with the `Layer` constructors.
This API is intended for use with rebuilding networks from their `.json` representation.
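
One reason the dictionary form suits that purpose is that it is built only from lists, dictionaries, strings, and numbers, so it survives a round trip through the standard `json` module unchanged. A small sketch, in which the two-layer spec is purely illustrative and only the `"fc"` type and its `"out_nodes"` parameter come from the text above:

```python
import json

# Illustrative spec: two fully connected layers described as plain dictionaries.
layer_spec = [
    {"type": "fc", "params": {"out_nodes": 4}},
    {"type": "fc", "params": {"out_nodes": 1}},
]

serialized = json.dumps(layer_spec)   # a string that could be written to a .json file
restored = json.loads(serialized)     # the same list of dictionaries, rebuilt
```

A literal list of `Layer` objects has no such ready-made text representation, which is why it is the more convenient form for interactive use but not for storage.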

Data must be provided to the network as
a tuple of inputs and targets,
and is stored as an attribute
`network.data`,
which has attributes `data.x` and `data.y`
for inputs and targets.

The loss is calculated by a method called `.loss`,
which calculates the loss on the entire dataset.

In [None]:
nn.networks.Network.loss??

To do stochastic gradient descent,
you need to use `.loss_on_random_batch`.

Note that if the `batch_size` is not specified during creation of the network,
then it defaults to the entire dataset.
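
The relationship between the full-dataset loss and a random-batch loss can be sketched as follows. This is not `autocrit`'s implementation: `mse`, `random_batch_loss`, and the linear model are hypothetical, `batch_size` is passed per call here rather than fixed at network creation, and sampling without replacement is an assumption.

```python
import numpy as np

def mse(x, y, w):
    """Mean squared error of a hypothetical linear model y = w * x."""
    return np.mean(np.square(w * x - y))

def random_batch_loss(x, y, w, batch_size=None, rng=None):
    """Evaluate the loss on a random subset; with no batch_size, use all the data."""
    n = x.shape[0]
    if batch_size is None:
        batch_size = n
    rng = np.random.default_rng() if rng is None else rng
    idxs = rng.choice(n, size=batch_size, replace=False)
    return mse(x[idxs], y[idxs], w)

x = np.arange(10.0)
y = 3.0 * x

full = mse(x, y, w=2.0)                              # loss on the entire dataset
default_batch = random_batch_loss(x, y, w=2.0)       # batch defaults to all 10 points
small_batch = random_batch_loss(x, y, w=2.0, batch_size=4,
                                rng=np.random.default_rng(0))
```

With the default batch size the "random batch" is just a shuffled copy of the whole dataset, so the two losses agree up to floating-point rounding.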

#### `optimizers`

`FirstOrderOptimizer`s (the only kind I ever got around to implementing)
take an `autograd` function `f`, or optionally a `grad_f` calculator,
and use it to do first-order optimization.

In [None]:
print(autocrit.optimizers.__doc__)

In [None]:
print(autocrit.optimizers.FirstOrderOptimizer.__doc__)

The key method defined in the base class is `.run`:

In [None]:
autocrit.optimizers.FirstOrderOptimizer.run??

It delegates the algorithm itself to the concrete class,
which must implement an `.update` method,
as in `GradientDescentOptimizer`.
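
The general shape of this pattern, a base class that owns the loop and a subclass that supplies the step rule, is sketched below. The class names, constructor arguments, and the convention that `update` returns a step to add to the parameters are all assumptions made for illustration; check the source printed by the cells here for what `autocrit` actually does.

```python
class SketchOptimizer:
    """Hypothetical base class: owns the loop, leaves the step rule to subclasses."""

    def __init__(self, grad_f):
        self.grad_f = grad_f

    def run(self, theta, num_iters):
        for _ in range(num_iters):
            theta = theta + self.update(theta)
        return theta

    def update(self, theta):
        raise NotImplementedError


class SketchGradientDescent(SketchOptimizer):
    """Concrete class: the update is a step against the gradient."""

    def __init__(self, grad_f, lr=0.1):
        super().__init__(grad_f)
        self.lr = lr

    def update(self, theta):
        return -self.lr * self.grad_f(theta)


# f(theta) = (theta - 3)^2 has gradient 2 * (theta - 3) and its minimum at 3.
optimizer = SketchGradientDescent(grad_f=lambda theta: 2.0 * (theta - 3.0), lr=0.1)
theta_final = optimizer.run(theta=0.0, num_iters=100)
```

Adding another first-order method in this style only requires a new subclass with a different `update`.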

In [None]:
autocrit.optimizers.GradientDescentOptimizer??

#### `finders` and `gradnormmin`/`newtons`

`finders` contains the implementations of the `Finder` classes,
inside the submodules
`gradnormmin` and `newtons`,
which are also accessible from the top-level namespace.

The various Newton methods are defined by over-riding methods of a base class,
`NewtonMethod`,
with the inheritance structure below:

```
            NewtonBTLS - NewtonMR - FastNewtonMR
        /
NewtonMethod 
        \
            NewtonPI  - NewtonTR - FastNewtonTR
```

`BTLS` stands for "back-tracking line search" and `PI` stands for "pseudo-inverse".

The docs for `NewtonMethod` explain this well:

In [None]:
print(autocrit.finders.newtons.NewtonMethod.__doc__)
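
To make the `BTLS` abbreviation concrete, here is a generic textbook-style sketch of a Newton iteration with back-tracking line search on a one-dimensional function. It is not the `NewtonBTLS` implementation: the test function, the function names, and the acceptance rule (shrink the step until the gradient magnitude decreases) are all assumptions chosen for illustration.

```python
def grad_f(x):
    """Gradient of f(x) = x**4 / 4 - x, whose only critical point is x = 1."""
    return x ** 3 - 1.0

def hess_f(x):
    """Second derivative of the same f."""
    return 3.0 * x ** 2

def newton_btls(x, num_iters=50, shrink=0.5, min_alpha=1e-8):
    for _ in range(num_iters):
        g = grad_f(x)
        if g == 0.0:
            break
        step = -g / hess_f(x)   # Newton step for solving grad_f(x) = 0
        alpha = 1.0
        # Back-track: shrink the step until the gradient magnitude decreases.
        while abs(grad_f(x + alpha * step)) >= abs(g) and alpha > min_alpha:
            alpha *= shrink
        x = x + alpha * step
    return x

x_crit = newton_btls(0.1)
```

Starting from `0.1`, the full Newton step overshoots badly because the second derivative is nearly zero there, so the line search has to cut it down several times before the iteration settles near `x = 1`.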

`gradnormmin.GradientNormMinimizer` makes use of the `FirstOrderOptimizer` classes,
but applies them to the squared gradient norm.
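
The idea can be sketched on a one-dimensional function with hand-written derivatives. This is an illustration rather than the `GradientNormMinimizer` code: the function names are hypothetical, and the factor of one half in `g` is a conventional scaling assumed here.

```python
def grad_f(x):
    """Gradient of f(x) = x**3 / 3 - x, which has critical points at x = -1 and x = 1."""
    return x ** 2 - 1.0

def hess_f(x):
    """Second derivative of the same f."""
    return 2.0 * x

def grad_g(x):
    """Gradient of g(x) = 0.5 * grad_f(x)**2, by the chain rule: hess_f(x) * grad_f(x)."""
    return hess_f(x) * grad_f(x)

def minimize_grad_norm(x, lr=0.1, num_iters=500):
    """Plain gradient descent applied to g rather than to f."""
    for _ in range(num_iters):
        x = x - lr * grad_g(x)
    return x

near_min = minimize_grad_norm(0.5)    # approaches x = 1, a local minimum of f
near_max = minimize_grad_norm(-0.5)   # approaches x = -1, a local maximum of f
```

Because `g` is zero exactly where the gradient of `f` vanishes, descending on `g` can land on any kind of critical point of `f`, not only on minima.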