Commit 364bd42

Docs: add optimizer parameter to model page

cabralpinto committed Sep 17, 2023
1 parent cb9a508 commit 364bd42

Showing 5 changed files with 10 additions and 5 deletions.
docs/package-lock.json (2 additions, 1 deletion)

Some generated files are not rendered by default.

docs/src/pages/guides/custom-modules.mdx (1 addition, 1 deletion)

@@ -152,7 +152,7 @@ The `schedule` method precomputes `alpha` and `delta` (cumulative product of `alpha`)

## Denoiser neural network

- Modular Diffusion comes with general-use `UNet` and `Transformer` classes, which have proven to be effective denoising networks in the context of Diffusion Models. However, it is not uncommon to see authors make modifications to these networks to achieve even better results. To design your own original network, extend the base abstract `Net` class. This class acts as only a thin wrapper over the standard PyTorch `nn.Module` class, meaning you can use it exactly the same way. The `forward` method should take three tensor arguments: the noisy input `x`, the conditioning matrix `y`, and the diffusion time steps `t`.
+ Modular Diffusion comes with general-use `UNet` and `Transformer` classes, which have proven to be effective denoising networks in the context of Diffusion Models. However, it is not uncommon to see authors make modifications to these networks to achieve even better results. To design your own original network, extend the base abstract `Net` class. This class acts as only a thin wrapper over the standard Pytorch `nn.Module` class, meaning you can use it exactly the same way. The `forward` method should take three tensor arguments: the noisy input `x`, the conditioning matrix `y`, and the diffusion time steps `t`.

> Network output shape
>
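
To make the hunk above concrete: a custom network extends `Net` and implements `forward(x, y, t)`. Below is a minimal sketch of that contract; the `diffusion.net` import path for `Net` is inferred from the imports elsewhere in this commit, and the layer sizes and flat input shape are illustrative assumptions, not code from the guide.

```python
# Minimal sketch of a custom denoising network, assuming the abstract `Net`
# base class lives in `diffusion.net` (as `UNet` does elsewhere in this
# commit). Shapes and layers are illustrative only.
import torch
from torch import nn

from diffusion.net import Net


class TinyNet(Net):
    def __init__(self, dim: int, steps: int = 1000) -> None:
        super().__init__()
        self.embed = nn.Embedding(steps, dim)  # one embedding per time step
        self.body = nn.Sequential(nn.Linear(dim, dim), nn.SiLU(), nn.Linear(dim, dim))

    def forward(self, x: torch.Tensor, y: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # x: noisy input (batch, dim); y: conditioning matrix (ignored here
        # for brevity); t: integer diffusion time steps of shape (batch,)
        return self.body(x + self.embed(t))
```
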
docs/src/pages/guides/getting-started.mdx (2 additions, 2 deletions)

@@ -20,11 +20,11 @@ Before you start, please install Modular Diffusion in your local Python environment
python -m pip install modular-diffusion
```

- Additionally, ensure you've installed the correct [PyTorch distribution](https://pytorch.org/get-started/locally/) for your system.
+ Additionally, ensure you've installed the correct [Pytorch distribution](https://pytorch.org/get-started/locally/) for your system.

## Train a simple model

- The first step before training a Diffusion Model is to load your dataset. In this example, we will be using [MNIST](http://yann.lecun.com/exdb/mnist/), which includes 70,000 grayscale images of handwritten digits, and is a great simple dataset to prototype your image models. We are going to load MNIST with [PyTorch Vision](https://pytorch.org/vision/stable/index.html), but you can load your dataset any way you like, as long as it results in a `torch.Tensor` object. We are also going to discard the labels and scale the data to the commonly used $[-1, 1]$ range.
+ The first step before training a Diffusion Model is to load your dataset. In this example, we will be using [MNIST](http://yann.lecun.com/exdb/mnist/), which includes 70,000 grayscale images of handwritten digits, and is a great simple dataset to prototype your image models. We are going to load MNIST with [Pytorch Vision](https://pytorch.org/vision/stable/index.html), but you can load your dataset any way you like, as long as it results in a `torch.Tensor` object. We are also going to discard the labels and scale the data to the commonly used $[-1, 1]$ range.

```python
import torch
```
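
The rest of the guide's snippet is collapsed in this diff view. As a stand-in, here is a hedged sketch of the loading step the paragraph describes, assuming a standard torchvision install; it is not the guide's actual code.

```python
# Sketch of the loading step described above: load MNIST via torchvision,
# drop the labels, and rescale from [0, 1] to [-1, 1]. Illustrative only;
# this uses the default train split rather than all 70,000 images.
import torch
from torchvision.datasets import MNIST
from torchvision.transforms.functional import to_tensor

dataset = MNIST("data", download=True, transform=to_tensor)
x = torch.stack([image for image, _ in dataset])  # discard the labels
x = x * 2 - 1  # scale to the commonly used [-1, 1] range
```
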
docs/src/pages/modules/denoising-network.mdx (1 addition, 1 deletion)

@@ -7,7 +7,7 @@ visualizations: maybe

# {frontmatter.title}

- The backbone of Diffusion Models is a denoising network, which is trained to gradually denoise data. While earlier works used a **U-Net** architecture, newer research has shown that **Transformers** can be used to achieve comparable or superior results. Modular Diffusion ships with both types of denoising network. Both are implemented in PyTorch and thinly wrapped in a `Net` module.
+ The backbone of Diffusion Models is a denoising network, which is trained to gradually denoise data. While earlier works used a **U-Net** architecture, newer research has shown that **Transformers** can be used to achieve comparable or superior results. Modular Diffusion ships with both types of denoising network. Both are implemented in Pytorch and thinly wrapped in a `Net` module.

> Future warning
>
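
For quick reference, the bundled U-Net can be instantiated with the exact call from this commit's `Model` example below; reading `channels` as feature maps per resolution and `labels` as the number of class conditions is an inference, not something documented in this hunk.

```python
# Instantiating the bundled U-Net denoising network; the constructor call is
# reused verbatim from the diffusion-model example later in this commit.
from diffusion.net import UNet

net = UNet(channels=(1, 64, 128, 256), labels=10)
```
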
docs/src/pages/modules/diffusion-model.mdx (4 additions, 0 deletions)

@@ -16,6 +16,7 @@ In Modular Diffusion, the `Model` class is a high-level interface that allows you
- `net` -> Denoising network module.
- `loss` -> Loss function module.
- `guidance` (Default: `None`) -> Optional guidance module.
+ - `optimizer` (Default: `partial(Adam, lr=1e-4)`) -> Pytorch optimizer constructor function.
- `device` (Default: `"cpu"`) -> Device to train the model on.
- `compile` (Default: `true`) -> Whether to compile the model with `torch.compile` for faster training.

Expand All @@ -28,6 +29,8 @@ from diffusion.loss import Simple
from diffusion.net import UNet
from diffusion.noise import Gaussian
from diffusion.schedule import Cosine
+ from torch.optim import AdamW
+ from functools import partial

model = diffusion.Model(
    data=Identity(x, y, batch=128, shuffle=True),
@@ -36,6 +39,7 @@ model = diffusion.Model(
    net=UNet(channels=(1, 64, 128, 256), labels=10),
    loss=Simple(parameter="epsilon"),
    guidance=ClassifierFree(dropout=0.1, strength=2),
+     optimizer=partial(AdamW, lr=3e-4),
    device="cuda" if torch.cuda.is_available() else "cpu",
)
```
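
Why the new `optimizer` parameter takes a constructor function rather than an optimizer instance: the denoising network's parameters only exist once the model has built its network, so `Model` presumably binds the factory to them internally. A self-contained sketch of that pattern, with the binding step being an assumption about the library's internals:

```python
# Sketch of the constructor-function pattern behind the `optimizer` parameter.
# A `partial` freezes the hyperparameters while leaving the parameter list
# open; the last line mimics what `Model` presumably does once the network
# exists. The stand-in network is hypothetical.
from functools import partial

import torch
from torch.optim import AdamW

net = torch.nn.Linear(4, 4)  # stand-in for the denoising network
make_optimizer = partial(AdamW, lr=3e-4)  # the commit's example configuration
optimizer = make_optimizer(net.parameters())
```
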
