In [1]:
%matplotlib inline
%reload_ext autoreload
%autoreload 2

In [2]:
import torch
from torch import Tensor, distributions

## Random sampling
As seen in part 01, PyTorch can generate tensors filled with random numbers sampled from basic distributions:

In [3]:
torch.rand(10,3)  # Sample 30 numbers from a uniform distribution between 0 and 1

tensor([[0.4722, 0.6718, 0.2287],
        [0.1896, 0.7300, 0.5554],
        [0.3913, 0.9610, 0.9941],
        [0.5253, 0.1720, 0.7818],
        [0.7470, 0.3337, 0.5323],
        [0.3313, 0.1358, 0.6493],
        [0.8164, 0.4990, 0.7478],
        [0.3392, 0.0306, 0.2842],
        [0.7723, 0.4456, 0.8774],
        [0.9072, 0.1643, 0.8128]])

In [4]:
torch.randn(8)  # Sample 8 numbers from a unit-Gaussian

tensor([-0.2394, -0.6538, -1.0392, -0.7268,  0.6277, -0.1179,  0.4908, -1.6001])
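Because these draws are random, the numbers shown above will differ from run to run. As a minimal sketch, assuming reproducible draws are wanted, the global seed can be fixed with `torch.manual_seed` (the variable names here are illustrative only):

```python
import torch

torch.manual_seed(0)            # fix the global RNG seed
a = torch.rand(10, 3)           # uniform on [0, 1)
torch.manual_seed(0)            # reset to the same seed
b = torch.rand(10, 3)           # repeats the same draw

same_draw = torch.equal(a, b)
in_range = bool(((a >= 0) & (a < 1)).all())
g = torch.randn(8)              # unit Gaussian, shape (8,)
```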

## Distribution classes
Sometimes we instead want to sample from more complex distributions, or to be able to treat distributions as objects with their own parameters and methods. 
`torch.distributions` contains a variety of such classes.

Most of this example uses the `Normal` distribution; see https://pytorch.org/docs/stable/distributions.html for the other available distributions.

In [7]:
norm = distributions.Normal(loc=0,scale=1)  # scale here is the standard deviation

Once instantiated, `Distribution`s have a variety of methods, e.g.:

In [12]:
norm.log_prob(Tensor([2]))  # evaluate the log PDF at x=2

tensor([-2.9189])

Methods generally accept multi-element tensors, in which case the operation is applied to each element:

In [14]:
norm.log_prob(Tensor([-2,-1,0,1,2]))  # evaluate the log probability at multiple values

tensor([-2.9189, -1.4189, -0.9189, -1.4189, -2.9189])

In [15]:
norm.cdf(Tensor([-2,-1,0,1,2]))  # evaluate the cumulative probability

tensor([0.0228, 0.1587, 0.5000, 0.8413, 0.9772])
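As a sanity check on what these methods compute, the following sketch compares `log_prob` with the closed-form log PDF of a unit Gaussian, and uses `icdf` to invert `cdf`. The tolerance is an assumption chosen for single-precision arithmetic.

```python
import math
import torch
from torch import distributions

norm = distributions.Normal(loc=0.0, scale=1.0)
x = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0])

# Closed-form log PDF of a unit Gaussian: -x^2/2 - log(sqrt(2*pi))
manual_log_pdf = -0.5 * x**2 - 0.5 * math.log(2 * math.pi)
matches_formula = torch.allclose(norm.log_prob(x), manual_log_pdf)

# icdf is the inverse of cdf, so the round trip should recover x
round_trip = norm.icdf(norm.cdf(x))
recovers_x = torch.allclose(round_trip, x, atol=1e-3)
```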

We can also randomly sample from the distribution by specifying the desired shape of the resulting tensor:

In [19]:
norm.sample([3,2])

tensor([[0.6697, 0.6439],
        [0.3628, 0.2887],
        [1.7138, 0.7125]])
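A rough way to check that sampling behaves as expected is to draw many values and compare the empirical mean and standard deviation with `loc` and `scale`. This is only a sketch; the seed and sample count are arbitrary choices.

```python
import torch
from torch import distributions

torch.manual_seed(0)
norm = distributions.Normal(loc=0.0, scale=1.0)

small = norm.sample([3, 2])          # tensor of shape (3, 2)
large = norm.sample([100_000])       # many draws for stable statistics

sample_mean = large.mean().item()    # should be close to loc = 0
sample_std = large.std().item()      # should be close to scale = 1
```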

## Parameterised distributions
Previously, we created a distribution using floats, but using tensors gives us a bit more flexibility:

In [23]:
norm = distributions.Normal(loc=Tensor([[0,2],[-1,3]]),scale=Tensor([[1,1.5],[6,2]]))

Effectively, our `norm` now contains 4 different Gaussians arranged in a specific shape (2×2), and methods will now return tensors with that shape:

In [25]:
norm.log_prob(Tensor([2]))  # evaluate the log PDF of all 4 Gaussians at x=2

tensor([[-2.9189, -1.3244],
        [-2.8357, -1.7371]])

To evaluate at multiple points, the evaluation points must be shaped so that they broadcast against the shape of the distribution's parameters:

In [38]:
norm.log_prob(Tensor([-2,-1,0,1,2])[:,None,None])  # evaluate the log PDF of all 4 Gaussians at several points

tensor([[[-2.9189, -4.8800],
         [-2.7246, -4.7371]],

        [[-1.4189, -3.3244],
         [-2.7107, -3.6121]],

        [[-0.9189, -2.2133],
         [-2.7246, -2.7371]],

        [[-1.4189, -1.5466],
         [-2.7663, -2.1121]],

        [[-2.9189, -1.3244],
         [-2.8357, -1.7371]]])

In [40]:
norm.log_prob(Tensor([[-2,-1],[1,2]]))  # evaluate the log PDF of each Gaussian at its own specific point

tensor([[-2.9189, -3.3244],
        [-2.7663, -1.7371]])
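The shapes involved in the evaluations above can be inspected directly. The sketch below rebuilds the same 2×2 distribution and checks the result shapes; the last case, a plain length-5 input, is included as an assumption-labelled counterexample that is expected to fail because (5,) does not broadcast against (2, 2).

```python
import torch
from torch import distributions

norm = distributions.Normal(
    loc=torch.tensor([[0.0, 2.0], [-1.0, 3.0]]),
    scale=torch.tensor([[1.0, 1.5], [6.0, 2.0]]),
)
batch_shape = tuple(norm.batch_shape)             # (2, 2): four Gaussians

points = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0])
# Trailing singleton axes let (5, 1, 1) broadcast against (2, 2)
all_points = norm.log_prob(points[:, None, None])
all_points_shape = tuple(all_points.shape)        # (5, 2, 2)

# A (2, 2) input pairs each point with the Gaussian at the same position
per_gaussian = norm.log_prob(torch.tensor([[-2.0, -1.0], [1.0, 2.0]]))
per_gaussian_shape = tuple(per_gaussian.shape)    # (2, 2)

# A plain (5,) input cannot broadcast against (2, 2)
try:
    norm.log_prob(points)
    broadcast_failed = False
except (RuntimeError, ValueError):
    broadcast_failed = True
```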

### Parameter updates
When a distribution is initialised, the values of the parameter tensors are not copied; instead, the distribution holds a reference to the same underlying data. This means that if the values of a tensor are changed in-place, the distribution changes accordingly:

In [66]:
loc = torch.tensor([0.])  # floating-point parameters, as used in the differentiable example later on
scale = torch.tensor([1.])

In [67]:
norm = distributions.Normal(loc=loc,scale=scale)

In [68]:
norm.log_prob(Tensor([2]))

tensor([-2.9189])

Now let's change the parameters in-place:

In [69]:
loc[0] = 3

In [70]:
norm.log_prob(Tensor([2]))

tensor([-1.4189])

In [71]:
scale *= 4

In [72]:
norm.log_prob(Tensor([2]))

tensor([-2.3365])
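Note that this only applies to in-place modifications. As a sketch of the distinction (assuming both parameters are passed as tensors, as above), rebinding the Python name to a new tensor leaves the distribution untouched:

```python
import torch
from torch import distributions

loc = torch.tensor([0.0])
scale = torch.tensor([1.0])
norm = distributions.Normal(loc=loc, scale=scale)
x = torch.tensor([2.0])

before = norm.log_prob(x).item()

loc[0] = 3.0                            # in-place: the distribution sees this
after_inplace = norm.log_prob(x).item()

loc = loc + 1.0                         # rebinding: creates a new tensor
after_rebind = norm.log_prob(x).item()  # unchanged from after_inplace
```

Operations such as `loc[0] = ...`, `loc *= ...`, or `loc.add_(...)` modify the existing data, whereas `loc = loc + ...` allocates a new tensor that the distribution knows nothing about.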

## Differentiable distributions
Most of the methods of a `Distribution` are differentiable, meaning that if the parameters of the distribution require gradients, the returned values will carry a gradient function:

In [74]:
loc = torch.tensor([0.], requires_grad=True)
scale = torch.tensor([1.], requires_grad=True)

In [75]:
norm = distributions.Normal(loc=loc,scale=scale)

In [76]:
norm.log_prob(Tensor([2]))

tensor([-2.9189], grad_fn=<SubBackward0>)

In [77]:
norm.cdf(Tensor([0,1,2]))

tensor([0.5000, 0.8413, 0.9772], grad_fn=<MulBackward0>)
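To see that the gradient function is usable, the sketch below back-propagates through `log_prob` and compares the result with the analytic derivatives of the Gaussian log PDF. The evaluation point x=2 matches the cell above; the variable names are illustrative.

```python
import torch
from torch import distributions

loc = torch.tensor([0.0], requires_grad=True)
scale = torch.tensor([1.0], requires_grad=True)
norm = distributions.Normal(loc=loc, scale=scale)

log_p = norm.log_prob(torch.tensor([2.0]))
log_p.sum().backward()             # populate loc.grad and scale.grad

# Analytic gradients of the Gaussian log PDF at x=2, loc=0, scale=1:
#   d/dloc   = (x - loc) / scale^2                = 2
#   d/dscale = (x - loc)^2 / scale^3 - 1 / scale  = 3
loc_grad = loc.grad.item()
scale_grad = scale.grad.item()
```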

The exception is the `sample()` method:

In [78]:
norm.sample([2])

tensor([[1.1032],
        [1.8139]])

However, some distributions can be re-parameterised such that the samples are differentiable, e.g. Gaussian samples can be drawn as `loc + scale*z`, where `z ~ N(0,1)`.
The `rsample` method will return differentiable samples, if that is possible for the distribution.

In [79]:
norm.rsample([2])

tensor([[-0.7684],
        [ 0.1469]], grad_fn=<AddBackward0>)
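The following sketch contrasts `sample` and `rsample` and back-propagates through the re-parameterised samples. It also checks the `has_rsample` attribute, which indicates whether a distribution supports `rsample`; `Bernoulli` is used here only as an assumed example of a distribution that does not.

```python
import torch
from torch import distributions

torch.manual_seed(0)
loc = torch.tensor([0.0], requires_grad=True)
scale = torch.tensor([1.0], requires_grad=True)
norm = distributions.Normal(loc=loc, scale=scale)

detached = norm.sample([5])        # no gradient history
samples = norm.rsample([5])        # computed as loc + scale * eps

samples.sum().backward()
# d(sample)/d(loc) = 1 for each of the 5 samples, so loc.grad sums to 5
# d(sample)/d(scale) = eps, which here equals the sample (loc=0, scale=1)
loc_grad = loc.grad.item()
scale_grad = scale.grad.item()
eps_sum = samples.detach().sum().item()

supports_rsample = norm.has_rsample                                  # True for Normal
bernoulli_rsample = distributions.Bernoulli(probs=0.5).has_rsample   # False
```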