## Theanets
- [main site](http://theanets.readthedocs.org/en/stable/quickstart.html)
- [git repository](https://github.com/lmjohns3/theanets)

The library's implementation and documentation are both elegant and detailed. Please read the online tutorial by [Leif Johnson](http://lmjohns3.com/) if you need to learn the package. I will just put some notes here from my learning experience.

import theanets
import numpy as np

### main steps in typical workflow for theanets
1. Create a Network Layout
2. Train the Network on Data
3. Use the network to make predictions
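
The three steps can be sketched without theanets itself. The following is a minimal NumPy illustration, assuming a toy regression task and a hand-written gradient. The names `predict`, `train`, and `mse` are hypothetical and are not the theanets API, which builds the equivalent computation symbolically with Theano.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1. Create a network layout: 2 inputs -> 8 tanh hidden units -> 1 linear output.
layers = (2, 8, 1)
params = {
    "W1": rng.normal(scale=0.5, size=(layers[0], layers[1])),
    "b1": np.zeros(layers[1]),
    "W2": rng.normal(scale=0.5, size=(layers[1], layers[2])),
    "b2": np.zeros(layers[2]),
}


def predict(params, x):
    """Forward pass: map inputs to outputs using the current parameters."""
    hidden = np.tanh(x @ params["W1"] + params["b1"])
    return hidden @ params["W2"] + params["b2"]


def mse(params, x, y):
    """Mean squared error of the network on (x, y)."""
    return float(np.mean((predict(params, x) - y) ** 2))


def train(params, x, y, learning_rate=0.1, iterations=500):
    """Full-batch gradient descent on the MSE (updates params in place)."""
    for _ in range(iterations):
        hidden = np.tanh(x @ params["W1"] + params["b1"])
        out = hidden @ params["W2"] + params["b2"]
        d_out = 2.0 * (out - y) / len(x)                  # dMSE/d(out)
        d_hidden = (d_out @ params["W2"].T) * (1.0 - hidden ** 2)
        grads = {
            "W2": hidden.T @ d_out,
            "b2": d_out.sum(axis=0),
            "W1": x.T @ d_hidden,
            "b1": d_hidden.sum(axis=0),
        }
        for name in params:
            params[name] -= learning_rate * grads[name]
    return params


# Toy data (an assumption for illustration): y = x0 - x1.
x = rng.uniform(-1, 1, size=(256, 2))
y = x[:, :1] - x[:, 1:]

# 2. Train the network on data.
loss_before = mse(params, x, y)
train(params, x, y)
loss_after = mse(params, x, y)

# 3. Use the network to make predictions.
predictions = predict(params, x[:5])
print(loss_before, loss_after, predictions.shape)
```

In theanets the same three steps are much shorter, because the layout is given as a list of layers and the gradients are derived automatically.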

### usage patterns for deep learning tasks
1. Most of the time you need to pick or customize a `loss` function. It defines your problem, e.g., classification (cross entropy), regression (MSE) or autoencoding.
    - 1.1 The `loss` function is composed of `error` and `regularization`; it usually involves both the outputs of different layers and their weights.
    - 1.2 In theanets you customize `loss` and `error` by inheriting from `theanets.feedforward.Network` and overriding the `loss` or `error` function. Using the `find` method is the recommended way of getting parameters.
    - Details: by default, the `error` function only uses the outputs of different layers. In the `loss` function, the built-in regularization terms include `l1/l2 norm for weights`, `l1/l2 norm for all hiddens`, and `contractive - Frobenius norm of hidden Jacobian`. The `setup_vars` method defines the variables needed to calculate the error.
2. The second part usually involves defining the layout of the `layers` in the network, e.g., their number of neurons, connections, activations, weight initializations, and noise mechanisms for outputs. It essentially defines a parameterized function mapping inputs to outputs.
    - 2.1 As common practice, most of the time you re-use pre-defined layers with different numbers of neurons or activation functions.
    - 2.2 Sometimes, when you need to completely redefine the input-output mapping, you need to override the `transform` method (or the `output` method if you need something other than dropout or noise) and register all the parameters used in the `setup` method by calling factory methods such as `add_weights` or `add_bias`.
3. You also need to specify the training algorithm. Most of the time you will choose different hyperparameters, e.g., batch_size, learning_rate, momentum, etc., instead of inventing your own optimization method (because it is hard to come up with good ones).
    - 3.1 Common scenarios include (1) supervised training of the whole network and (2) layer-wise unsupervised training of the hidden layers and fine-tuning the last one.
    - 3.2 `batch_size` is a parameter of the trainer.
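
To make the "loss = error + regularization" pattern concrete, here is a rough NumPy sketch. It assumes a hypothetical `TinyNetwork` class (not `theanets.feedforward.Network`) and uses mean-based penalties as an arbitrary choice; theanets expresses the same idea with Theano symbolic expressions. The subclass shows the customization pattern from 1.2: only the `error` term is overridden, and the regularization is inherited.

```python
import numpy as np


class TinyNetwork:
    """Hypothetical one-hidden-layer regressor, for illustration only."""

    def __init__(self, n_in, n_hidden, n_out,
                 weight_l1=0.0, weight_l2=0.0, hidden_l1=0.0, seed=0):
        rng = np.random.default_rng(seed)
        self.weights = [rng.normal(scale=0.1, size=(n_in, n_hidden)),
                        rng.normal(scale=0.1, size=(n_hidden, n_out))]
        self.biases = [np.zeros(n_hidden), np.zeros(n_out)]
        self.weight_l1 = weight_l1
        self.weight_l2 = weight_l2
        self.hidden_l1 = hidden_l1

    def forward(self, x):
        """Return (hidden activations, output) for inputs x."""
        hidden = np.tanh(x @ self.weights[0] + self.biases[0])
        output = hidden @ self.weights[1] + self.biases[1]
        return hidden, output

    def error(self, x, y):
        """Problem-specific term; uses only layer outputs (here: MSE)."""
        _, output = self.forward(x)
        return np.mean((output - y) ** 2)

    def regularization(self, x):
        """Penalties on the weights and on the hidden-layer outputs."""
        hidden, _ = self.forward(x)
        penalty = self.weight_l1 * sum(np.abs(w).mean() for w in self.weights)
        penalty += self.weight_l2 * sum((w ** 2).mean() for w in self.weights)
        penalty += self.hidden_l1 * np.abs(hidden).mean()
        return penalty

    def loss(self, x, y):
        """The quantity the trainer minimizes."""
        return self.error(x, y) + self.regularization(x)


class TinyClassifier(TinyNetwork):
    """Overrides only the error term: cross entropy instead of MSE."""

    def error(self, x, labels):
        _, logits = self.forward(x)
        logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
        log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        return -np.mean(log_probs[np.arange(len(labels)), labels])


x = np.random.default_rng(1).normal(size=(16, 4))
labels = np.arange(16) % 3

plain = TinyClassifier(4, 5, 3)
penalized = TinyClassifier(4, 5, 3, weight_l2=0.01, hidden_l1=0.1)
print(plain.loss(x, labels), penalized.loss(x, labels))
```

Both models share the same seed, so they have identical weights and identical `error`; the only difference in `loss` comes from the regularization terms.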
    
### summary of theanets (v 0.5.3) parameters 
Some of them are hidden in the code, so if you are interested in the details, read the code! It is definitely worth the time. Most of the parameters are passed as arguments to the **`theanets.Experiment` constructor** or its `train`/`itertrain` methods.

1. Construction of the model
    - `network_class`: e.g., `theanets.Classifier`; specifies the problem type, specifically the `loss` and `error` functions
    - `layers`: list of layers
    - `save_progress`: filename; if present, the constructor will restore the saved model
2. Training algorithm (several training runs can be chained in sequence to approximate simulated annealing)
    - `optimize`: string naming the optimization method
    - `learning_rate`: float, step size of the parameter updates
    - `momentum`: float, momentum applied to the parameter updates
    - `save_progress`: filename, where the model will be saved during training
    - `save_every`: positive values save every N training iterations (+10 means every 10 iterations), negative values save every N minutes (-10 means every 10 minutes)
    - `weight_l1`: float, l1-norm regularization for weights in all layers
    - `weight_l2`: float, l2-norm regularization for weights in all layers
    - `hidden_l1`: float, l1-norm regularization for outputs of all hidden layers
    - `hidden_l2`: float, l2-norm regularization for outputs of all hidden layers
    - `input_noise`: standard deviation of the Gaussian noise added to the input layer (parameterless/activationless)
    - `input_dropout`: proportion of inputs randomly set to zero, e.g. a value in [0, 0.1]
    - `batch_size`: size for mini-batch based optimization, default 32
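
The effect of a few of these parameters can be illustrated with a small NumPy sketch. The helpers `corrupt_inputs`, `iter_batches`, and `describe_save_every` are hypothetical names written for these notes, not theanets functions, and the details are assumptions (for instance, dropout here zeroes inputs without rescaling the survivors, and `save_every` is assumed to be nonzero).

```python
import numpy as np


def corrupt_inputs(x, input_noise=0.0, input_dropout=0.0, rng=None):
    """Add Gaussian noise with std `input_noise`, then zero out a
    proportion `input_dropout` of the entries (on average)."""
    rng = np.random.default_rng() if rng is None else rng
    out = np.array(x, dtype=float)  # copy, so the caller's data is untouched
    if input_noise > 0:
        out = out + rng.normal(scale=input_noise, size=out.shape)
    if input_dropout > 0:
        keep = rng.random(out.shape) >= input_dropout
        out = out * keep
    return out


def iter_batches(x, batch_size=32):
    """Yield consecutive mini-batches; the last one may be smaller."""
    for start in range(0, len(x), batch_size):
        yield x[start:start + batch_size]


def describe_save_every(save_every):
    """Spell out the sign convention: positive = iterations, negative = minutes."""
    if save_every > 0:
        return f"every {save_every} training iterations"
    return f"every {-save_every} minutes"


x = np.ones((100, 4))
batch_sizes = [len(batch) for batch in iter_batches(x, batch_size=32)]
dropped = corrupt_inputs(x, input_dropout=0.5, rng=np.random.default_rng(0))
noisy = corrupt_inputs(x, input_noise=0.1, rng=np.random.default_rng(0))

print(batch_sizes)
print(describe_save_every(10), "/", describe_save_every(-10))
```

With 100 samples and the default `batch_size` of 32, one pass over the data consists of three full batches and one partial batch.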