# Notes from studying PyTorch

## What:
These notes cover common patterns found in most deep learning libraries, as they appear in PyTorch.

## What are the building blocks in pytorch
- **Tensors**: equivalents of `np.array`; they work directly with the many operations provided in the `torch` module.
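- **Variables**: think of them as wrappers around Tensors. Most layers in the `nn` module expect a Variable input rather than a Tensor. Compared to raw tensors, Variables remember the dependencies among them, and thus support autograd. (Since PyTorch 0.4, `Variable` has been merged into `Tensor`: a tensor with `requires_grad=True` tracks its own history, so the wrapper is no longer needed.)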
- **Operations**: the `torch` module provides low-level operations (most of them apply to both tensors and variables). The `nn` module provides high-level layers (similar to `tf.contrib.layers`), and most of them expect Variables rather than Tensors.
- **Layers**: layers in the `nn` module are functors (function objects with state). A layer expects a Variable input and returns a Variable output.
- **Models**: models are derived from `nn.Module`, as are the layers themselves, so a model is essentially a composite layer. Most importantly, a model must implement `forward`; the backward pass is derived automatically by autograd. Use `parameters()` to get all its parameters and `children()` to get its immediate layers/submodels.
- **Sequential**: a container (`nn.Sequential`) that organizes linearly stacked layers into a submodel.
- **loss function**: the loss value is just another variable that depends on all the variables used to compute it. `loss.backward()` is usually the start of back-propagation.
- **optimizer**: the optimizer in PyTorch is quite lightweight compared to other frameworks. It takes a list of parameters (usually from a call to `model.parameters()`) and does the housekeeping: `zero_grad()` (clear the stored gradients), `step()` (update each parameter from the gradient that `loss.backward()` stored in its `.grad`), etc.
- **dataset**: a dataset (e.g., from `torchvision.datasets`) is conceptually a list (it implements `__getitem__()` and `__len__()`). It is usually combined with preprocessing transforms (a pipeline of multiple steps) and wrapped in a data loader.
- **dataloader**: a data loader (`torch.utils.data.DataLoader`) is the "batch generator", with support for multiple worker processes (`num_workers`). It is iterable, so it can be used directly in a `for` loop or with `iter()`.
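
To show how these building blocks fit together, here is a minimal training-loop sketch. It assumes PyTorch 0.4 or later, where plain tensors can be fed to `nn` layers directly. The toy data, layer sizes, learning rate, and names such as `inputs` and `loss_fn` are made up for illustration and are not part of the original notes.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Toy regression data (hypothetical): 64 samples, 10 features, 1 target each.
inputs = torch.randn(64, 10)
targets = torch.randn(64, 1)

# A dataset behaves like a list: it supports len() and indexing.
dataset = TensorDataset(inputs, targets)
# The data loader turns the dataset into shuffled mini-batches.
loader = DataLoader(dataset, batch_size=16, shuffle=True)

# Sequential stacks layers linearly; the result is itself an nn.Module.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(2):
    for batch_inputs, batch_targets in loader:
        optimizer.zero_grad()                       # clear gradients from the previous step
        predictions = model(batch_inputs)           # runs the model's forward pass
        loss = loss_fn(predictions, batch_targets)  # scalar tensor that depends on the parameters
        loss.backward()                             # autograd stores gradients in each parameter's .grad
        optimizer.step()                            # update the parameters from their .grad

num_batches = len(loader)
```

Note that `backward()` is called on the loss, not on the optimizer: the optimizer only reads the gradients that autograd has already stored.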


## How to use regularization

## How to use weights initialization

## How to use batch normalization
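- `nn.BatchNorm2d` takes the number of channels (`num_features`) as its constructor argument.
- Use `model.eval()` to switch to eval mode (so BN uses the recorded moving mean/variance instead of the statistics of the current batch).
- Batch norm is usually placed before the nonlinear activation.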
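
The points above can be sketched as a small conv, batch norm, activation block. The channel counts and image size are arbitrary choices for illustration. In eval mode the output for one image does not depend on what else is in the batch, because the recorded running statistics are used.

```python
import torch
from torch import nn

torch.manual_seed(0)

block = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.BatchNorm2d(8),   # argument is the number of channels produced by the conv
    nn.ReLU(),           # batch norm sits before the nonlinearity
)

images = torch.randn(4, 3, 16, 16)  # hypothetical batch: 4 RGB images of 16x16

block.train()                # training mode: normalize with batch statistics
train_out = block(images)    # this call also updates the running mean/variance

block.eval()                 # eval mode: normalize with the recorded running statistics
with torch.no_grad():
    eval_full = block(images)
    eval_single = block(images[:1])  # a batch of one image is fine in eval mode
```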

## How to group layers into blocks (e.g., for resnet)
- Use `Sequential` to group layers into blocks
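
As a rough sketch of this idea, a residual block can keep its convolutional path in one `nn.Sequential` and add the shortcut in `forward`. The class name `ResidualBlock` and its layout (two 3x3 convolutions with an identity shortcut) are assumptions for illustration, not the exact ResNet definition.

```python
import torch
from torch import nn


class ResidualBlock(nn.Module):
    """Hypothetical basic block: two 3x3 convs plus an identity shortcut."""

    def __init__(self, channels):
        super().__init__()
        # The linear part of the block is grouped with Sequential.
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.activation = nn.ReLU()

    def forward(self, x):
        # The skip connection is not a linear stack, so it lives in forward().
        return self.activation(self.body(x) + x)


# Blocks are modules too, so they can be stacked with Sequential again.
stage = nn.Sequential(ResidualBlock(16), ResidualBlock(16))
features = torch.randn(2, 16, 8, 8)
output = stage(features)
```

`Sequential` only covers the straight-line part; anything with branching, such as the shortcut here, needs a custom `nn.Module`.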

## How to calculate image size in cnn with padding and kernel_size, and how to use the correct padding to implement “SAME” mode
- For stride 1 and no dilation, new_image_size = image_size + padding * 2 - kernel_size + 1.
- So to achieve “SAME” mode, choose an odd number as kernel_size and use padding = (kernel_size - 1) // 2.
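
The arithmetic above can be checked with two small helper functions. The names `conv_output_size` and `same_padding` are hypothetical, and the `stride` parameter is an addition to the formula in these notes (with `stride=1` it reduces to the formula above); dilation is assumed to be 1.

```python
def conv_output_size(image_size, kernel_size, padding=0, stride=1):
    """Output size along one spatial dimension, assuming dilation 1."""
    return (image_size + 2 * padding - kernel_size) // stride + 1


def same_padding(kernel_size):
    """Padding that keeps the size unchanged for stride 1 and an odd kernel."""
    if kernel_size % 2 == 0:
        raise ValueError("symmetric SAME padding needs an odd kernel_size")
    return (kernel_size - 1) // 2


# Without padding a 5x5 kernel shrinks a 32-pixel side to 28.
no_padding = conv_output_size(32, kernel_size=5)

# With SAME padding the size is preserved for every odd kernel size.
same_sizes = {k: conv_output_size(32, k, padding=same_padding(k)) for k in (1, 3, 5, 7)}
```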

## How to change “training/evaluation” mode of a model (e.g., for batch normalisation, or dropout)
- Under the hood, the mode is kept as the boolean attribute `training` on the model. It can be switched with `model.train(True)` or `model.train(False)`, and `model.eval()` is equivalent to `model.train(False)`.
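
A small sketch of the flag in action, using a made-up two-layer model with dropout. Switching the mode on the parent also switches every submodule.

```python
import torch
from torch import nn

torch.manual_seed(0)

model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))
x = torch.ones(1, 4)

flags = [model.training]          # modules start in training mode

model.eval()                      # same effect as model.train(False)
flags.append(model.training)
dropout_training = model[1].training  # the flag propagates to child modules

# In eval mode dropout is a no-op, so repeated calls give identical outputs.
eval_a = model(x)
eval_b = model(x)

model.train(True)                 # back to training mode
flags.append(model.training)
```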

## How to freeze certain variables (or parameters). Two ways to specify a “trainable” parameter
- Pass the list of variables that you want to optimise (e.g., `model.parameters()`) to the constructor of the optimiser. In some cases you can pass the input variables to the optimiser as well (e.g., DeepDream).
- An optimiser only updates variables that are explicitly passed to its constructor and that have `requires_grad` set to True. So turning off `requires_grad` prevents the optimiser from updating a variable.
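
Both approaches can be combined, as in the sketch below: the first layer is frozen via `requires_grad`, and only the remaining parameters are handed to the optimiser. The model, the layer sizes, and the dummy loss are assumptions for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# Way 2: turn off requires_grad so autograd computes no gradient for the first layer.
for param in model[0].parameters():
    param.requires_grad = False

# Way 1: hand the optimiser only the parameters that should be trained.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.1)

frozen_before = model[0].weight.clone()
head_before = model[2].weight.clone()

# One update step on a dummy loss (hypothetical objective).
loss = model(torch.randn(16, 4)).pow(2).mean()
loss.backward()
optimizer.step()

frozen_unchanged = torch.equal(model[0].weight, frozen_before)
head_changed = not torch.equal(model[2].weight, head_before)
```

After the step, the frozen layer keeps its weights and has no stored gradient, while the last layer has been updated.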