# Getting Started

## Installation

```bash
pip install score_models
```

## Neural network architectures

```python
from score_models import NCSNpp, MLP
```

### 1D

- Shape of the input is `[B, C]`, where `B` is an unspecified batch size.
- `C` is the number of channels of the input vector.
- `units` is the width of the hidden layers.
- `layers` is the number of hidden layers, not counting the attention bottleneck.
- `attention=True` adds an attention mechanism to the bottleneck of the MLP.

```python
net = MLP(C, units=100, layers=2, activation='silu', attention=True)  # Example MLP
```

### Time-series, long sequences, etc.

- Shape of the input is `[B, C, L]`, where `L` is an unspecified sequence length.
- `dimensions=1` specifies 1D CNN layers in the architecture.
- `nf` is the base number of filters for the CNN layers.
- `ch_mult` sets the number of levels in the U-net (one per entry) and the multiplicative factor applied to `nf` at each level. In the example below, the CNN layers of the first level have `1 x nf` filters, the second `2 x nf`, and the third `4 x nf`.
- `L` must be divisible by `2^len(ch_mult)`.
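The filter counts and the divisibility rule above can be checked before building a network. The sketch below is plain Python and does not call `score_models`; the helper names `filters_per_level` and `is_valid_length` are hypothetical and only restate the bullets.

```python
def filters_per_level(nf, ch_mult):
    """Number of filters at each U-net level: ch_mult[i] x nf."""
    return [m * nf for m in ch_mult]


def is_valid_length(length, ch_mult):
    """True if `length` is divisible by 2^len(ch_mult), as required above."""
    return length % (2 ** len(ch_mult)) == 0


nf, ch_mult = 64, (1, 2, 4)            # same values as the NCSN++ example
print(filters_per_level(nf, ch_mult))  # filters at levels 1, 2 and 3
print(is_valid_length(256, ch_mult))   # 256 is a multiple of 2^3 = 8
print(is_valid_length(100, ch_mult))   # 100 is not a multiple of 8
```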

```python
net = NCSNpp(C, dimensions=1, nf=64, ch_mult=(1, 2, 4), attention=True)  # Example NCSN++
```

### Images

- Shape of the input is `[B, C, H, W]`.
- Both `H` and `W` must be divisible by `2^len(ch_mult)`.
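When an image size does not satisfy this rule, it helps to know the nearest size that does, for instance to decide how much to pad or crop. The helper below is a hypothetical illustration in plain Python, not part of `score_models`. Padding or cropping is an assumption about how you might preprocess your data, not something the library is stated to do.

```python
def next_valid_size(size, ch_mult):
    """Smallest multiple of 2^len(ch_mult) that is >= size."""
    factor = 2 ** len(ch_mult)
    return ((size + factor - 1) // factor) * factor


ch_mult = (2, 2, 2, 2)                 # 4 levels, so the factor is 2^4 = 16
for size in (28, 64, 100):
    print(size, "->", next_valid_size(size, ch_mult))
```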

```python
net = NCSNpp(C, nf=128, ch_mult=(2, 2, 2, 2), attention=True)  # Example NCSN++
```

### Cube

- Shape of the input is `[B, C, H, W, D]`.
- `H`, `W` and `D` must be divisible by `2^len(ch_mult)`.
- `dimensions=3` specifies 3D CNN layers in the architecture.

```python
net = NCSNpp(C, dimensions=3, nf=8, ch_mult=(1, 1, 2, 2), attention=True)  # Example NCSN++
```

## Score-Based Model (SBM)

```python
from score_models import ScoreModel, VPSDE

sde = VPSDE()
sbm = ScoreModel(net, sde)
```

### Training

- `dataset` must be a PyTorch `Dataset` or `DataLoader` instance. See e.g. [this tutorial](https://pytorch.org/tutorials/beginner/basics/data_tutorial.html) to get started.
- If a `Dataset` instance is provided, it is wrapped automatically in a `DataLoader`. In that case, pass `batch_size` to the `fit` method.

```python
sbm.fit(
    dataset,
    epochs=100,
    learning_rate=1e-3,
    batch_size=16,          # Only if dataset is not a torch DataLoader
    ema_decay=0.999,
    checkpoints_every=10,   # Save model every 10 epochs
    models_to_keep=1,       # Keep only the last model
    path='/path/to/checkpoints_directory',
)
```
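To give some intuition for `ema_decay`, here is a minimal sketch of an exponential moving average of model weights. It assumes the conventional update rule `ema = decay * ema + (1 - decay) * weight`; the way `score_models` applies it internally is not shown here, and `ema_update` is a hypothetical helper operating on plain floats rather than tensors.

```python
def ema_update(ema_params, params, decay):
    """Return the exponential moving average after one update step."""
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_params, params)]


ema = [0.0, 1.0]        # running average of two toy weights
weights = [1.0, 1.0]    # current weights, held fixed for illustration
for _ in range(1000):
    ema = ema_update(ema, weights, decay=0.999)
print(ema)              # the first entry moves slowly toward 1.0
```

Under this convention, a decay of 0.999 averages over roughly `1 / (1 - 0.999) = 1000` recent steps, which smooths out noise from individual optimizer updates.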

### Training on HPC Cluster (Upcoming)

This requires setting up `milex-scheduler` with:
```bash
milex-configuration
```
[Link to documentation]. The code below schedules training on a cluster, possibly a remote one, over SSH. See [link] to configure SSH correctly.

```python
sbm.scheduled_fit(
    dataset,
    epochs=100,
    learning_rate=1e-3,
    batch_size=16,          # Only if dataset is not a torch DataLoader
    ema_decay=0.999,
    checkpoints_every=10,   # Save model every 10 epochs
    models_to_keep=1,       # Keep only the last model
    path='/path/to/checkpoints_directory',
    time="03-00:00",        # Time allocation for the job (DD-HH:MM)
    gres="gpu:1",           # Number of GPUs to allocate
    machine="remote"
)
```
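The `time` argument uses a `DD-HH:MM` string. As a sanity check before submitting a job, the format can be parsed with the standard library. This is a sketch only: `parse_time_allocation` is a hypothetical helper, and how the scheduler itself validates the string is not covered here.

```python
from datetime import timedelta


def parse_time_allocation(value):
    """Parse a 'DD-HH:MM' string into a timedelta."""
    days, clock = value.split("-")
    hours, minutes = clock.split(":")
    return timedelta(days=int(days), hours=int(hours), minutes=int(minutes))


print(parse_time_allocation("03-00:00"))  # three days, as in the example above
print(parse_time_allocation("00-12:30"))  # twelve and a half hours
```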

You can also allocate more than one GPU across multiple nodes for heavy-duty training (upcoming). The following requests 4 nodes with 4 GPUs each (assuming your compute nodes support 4 GPUs connected with NVLink), for a total of 16 GPUs training in data-parallel mode. For this type of training, the dataset must be a `Dataset` instance, not a `DataLoader`. You can also scale the batch size to make full use of all GPUs.

```python
sbm.scheduled_fit(
    ...
    batch_size=16*B,        # Increase the batch to be 16 times B, where B would be optimal for one GPU.
    node=4,                 # Number of nodes to allocate
    gres="gpu:4",           # Number of GPUs to allocate per node
    machine="remote"
)
```
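The `16*B` factor follows from the allocation: 4 nodes times 4 GPUs per node. The sketch below spells out that arithmetic in plain Python. It assumes `gres` has the form `"gpu:N"` as in the examples and that data-parallel training splits each batch evenly across GPUs; both helper names are hypothetical.

```python
def gpus_per_node(gres):
    """Extract the GPU count from a 'gpu:N' string."""
    kind, count = gres.split(":")
    if kind != "gpu":
        raise ValueError(f"expected 'gpu:N', got {gres!r}")
    return int(count)


def total_batch_size(per_gpu_batch, nodes, gres):
    """Batch size that gives every GPU `per_gpu_batch` examples per step."""
    return per_gpu_batch * nodes * gpus_per_node(gres)


B = 16  # hypothetical optimal batch size for a single GPU
print(total_batch_size(B, nodes=4, gres="gpu:4"))  # equals 16 * B
```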

### Sampling

- The `shape` of the input must be provided. `B` samples are produced in parallel.
- `steps` specifies the discretization of the SDE. Increasing it can improve sample quality, at the cost of more network evaluations.
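To illustrate what `steps` controls, here is a toy reverse-time sampler in plain Python. It is not the sampler used by `score_models`: it assumes Euler-Maruyama integration, a variance-preserving SDE with a linear noise schedule (`beta_min=0.1` and `beta_max=20` are assumed values), and it replaces the neural network with the analytic score of a 1D Gaussian data distribution. All function names are hypothetical.

```python
import math
import random


def beta(t, beta_min=0.1, beta_max=20.0):
    """Linear noise schedule (assumed values)."""
    return beta_min + t * (beta_max - beta_min)


def alpha(t, beta_min=0.1, beta_max=20.0):
    """Mean scaling of the VP perturbation kernel: exp(-0.5 * integral of beta)."""
    return math.exp(-0.5 * (beta_min * t + 0.5 * t**2 * (beta_max - beta_min)))


def sample_toy(n_samples, steps, mean=2.0, std=0.5, seed=0):
    """Euler-Maruyama integration of the reverse VP SDE for data ~ N(mean, std^2)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]  # prior at t = 1
    dt = 1.0 / steps
    for i in range(steps):
        t = 1.0 - i * dt
        b, a = beta(t), alpha(t)
        var = (std * a) ** 2 + 1.0 - a**2    # marginal variance at time t
        noise_scale = math.sqrt(b * dt)
        for k in range(n_samples):
            score = -(x[k] - mean * a) / var  # analytic score of the Gaussian marginal
            x[k] += (0.5 * b * x[k] + b * score) * dt + noise_scale * rng.gauss(0.0, 1.0)
    return x


samples = sample_toy(n_samples=2000, steps=200)
print(sum(samples) / len(samples))  # sample mean, to compare with the data mean of 2.0
```

Each step costs one score evaluation per sample, so a finer discretization trades compute for a smaller integration error.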

```python
sbm.sample(shape=(B, C, ...), steps=1000)
```

#### Heavy-duty sampling (Upcoming)

```python
sbm.scheduled_sample(...)  # Similar to scheduled_fit
```

## Hessian Diagonal Models (HDM)

## Conditional SBM

## Guided Diffusion

## Parameter-efficient fine-tuning of SBMs with LoRA