# `utils.py`

This notebook tests the `utils` module.

This module contains a collection of miscellaneous functions which are not strictly necessary to FIGARO itself but can be useful when including FIGARO in a piece of code.

## Utilities

### `recursive_grid`

This method takes a 2D array of bounds and a 1D array containing the desired number of grid points for each dimension and returns an N-dimensional grid, where the number of dimensions is inferred from `bounds`.\
The shape of the returned grid is `(prod(n_pts), len(bounds))`, so the output of `recursive_grid` can be passed directly to FIGARO.
Please note that `len(bounds)` must be equal to or smaller than `len(n_pts)`.
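
As an illustration of the grid layout described above, the following is a minimal NumPy-only sketch, not FIGARO's implementation. The helper name `grid_sketch` is hypothetical, and the exclusion of the interval endpoints is an assumption taken from the double-loop comparison in this notebook.

```python
import numpy as np

def grid_sketch(bounds, n_pts):
    """Hypothetical stand-in for an N-dimensional grid builder (not FIGARO's code)."""
    # One 1D axis per dimension; endpoints are excluded (assumption based on
    # the double-loop comparison in this notebook).
    axes = [np.linspace(lo, hi, n + 2)[1:-1] for (lo, hi), n in zip(bounds, n_pts)]
    # 'ij' indexing makes the first dimension the slowest-varying one,
    # matching the ordering of nested for loops.
    mesh = np.meshgrid(*axes, indexing="ij")
    # Flatten each coordinate array and stack them as columns.
    return np.stack([m.ravel() for m in mesh], axis=-1)

grid = grid_sketch([[0, 1], [0, 1]], [20, 30])
print(grid.shape)  # (600, 2)
```

Each row of the result is one grid point, so the array has one column per dimension and `prod(n_pts)` rows.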

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from figaro.utils import recursive_grid

bounds = [[0,1],[0,1]]
n_pts  = [20,30]

grid, dgrid = recursive_grid(bounds, n_pts)

plt.scatter(grid[:,0], grid[:,1])

print(grid.shape)

Comparison with double for loop:

In [None]:
grid_check = []
for i in np.linspace(bounds[0][0], bounds[0][1], n_pts[0]+2)[1:-1]:
    for j in np.linspace(bounds[1][0], bounds[1][1], n_pts[1]+2)[1:-1]:
        grid_check.append([i,j])
        
np.all(np.array(grid_check) == grid)

4D grid:

In [None]:
bounds = [[0,1] for _ in range(4)]
n_pts  = [30 for _ in range(4)]

grid, dgrid = recursive_grid(bounds, n_pts)

print(grid.shape)

### `rejection_sampler`

This method implements a 1D rejection sampling algorithm. The probability density is passed as a callable, as is the (optional) selection function.
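
For readers unfamiliar with the technique, here is a minimal sketch of 1D rejection sampling with a flat envelope. It is not FIGARO's implementation: the name `rejection_sampler_sketch`, the `n_grid` and `rng` parameters, and the choice of estimating the envelope height from a grid are all assumptions made for illustration.

```python
import numpy as np

def rejection_sampler_sketch(n_samples, pdf, bounds, selfunc=None, n_grid=1000, rng=None):
    """Hypothetical 1D rejection sampler with a uniform envelope (not FIGARO's code)."""
    rng = np.random.default_rng() if rng is None else rng
    # Target is the density, optionally multiplied by the selection function.
    target = pdf if selfunc is None else (lambda x: pdf(x) * selfunc(x))
    # Envelope height: maximum of the target on a grid. This is an approximation
    # and assumes the target is positive somewhere within the bounds.
    top = np.max(target(np.linspace(bounds[0], bounds[1], n_grid)))
    samples = np.empty(0)
    while len(samples) < n_samples:
        x = rng.uniform(bounds[0], bounds[1], n_samples)  # candidate points
        h = rng.uniform(0, top, n_samples)                # candidate heights
        samples = np.concatenate([samples, x[h < target(x)]])  # keep points under the curve
    return samples[:n_samples]

samples = rejection_sampler_sketch(
    5000,
    lambda x: 2 * x,              # f(x) = 2x
    [0, 1],
    selfunc=lambda x: 1 - x,      # g(x) = 1 - x
    rng=np.random.default_rng(0),
)
print(len(samples), samples.mean())
```

The target does not need to be normalized, because only the ratio between the target and the envelope height enters the acceptance step.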

Gaussian distribution:

In [None]:
from figaro.utils import rejection_sampler
from scipy.stats import norm

n_samples = 10000
bounds = [-5,5]

x = np.linspace(bounds[0], bounds[1], 1000)

samples = rejection_sampler(n_samples, norm().pdf, bounds)

plt.hist(samples, bins = int(np.sqrt(len(samples))), histtype = 'step', density = True)
plt.plot(x, norm().pdf(x), lw = 0.7, color = 'r')

$f(x) = 2x$ with selection function $g(x) = 1-x$, $x \in[0,1]$:

In [None]:
def probability_density(x):
    return 2*x

def selfunc(x):
    return 1-x

n_samples = 10000
bounds = [0,1]

x  = np.linspace(bounds[0], bounds[1], 1000)
dx = x[1]-x[0]

samples = rejection_sampler(n_samples, probability_density, bounds, selfunc = selfunc)

plt.hist(samples, bins = int(np.sqrt(len(samples))), histtype = 'step', density = True)
pdf = probability_density(x)*selfunc(x)
plt.plot(x, pdf/np.sum(pdf*dx), lw = 0.7, color = 'r')

### `get_priors`

This method takes the prior parameters for the Normal-Inverse-Wishart distribution in the natural space and returns them as parameters in the probit space, ordered as required by FIGARO. In the following, $D$ will denote the dimensionality of the inferred distribution.

Four parameters are returned:
* $\nu$, here denoted by `df`, is the number of degrees of freedom of the Inverse Wishart distribution. It must be greater than $D+1$. If this parameter is `None` or does not satisfy the condition $\nu > D+1$, the default value $D+2$ is used;
* $k$ is the scale parameter for the multivariate Normal distribution. Suggested values are $k \lesssim 10^{-1}$. If `None`, the default value $10^{-2}$ is used;
* $\mu$ is the mean of the multivariate Normal distribution. It can be either estimated from the available samples or passed directly as a 1D array with length $D$ (the keyword argument `mean` overrides the samples). If `None`, the default value 0 (corresponding to the parameter space center) is used;
* $\Lambda$ is the expected value for the Inverse Wishart distribution. This parameter can be either (in descending priority order):
    * passed as a 2D array with shape ($D$,$D$), the covariance matrix - keyword `cov`;
    * passed as a 1D array with shape ($D$,) or as a `double`: vector of standard deviations (a `double` means that the same std is used for all dimensions) - keyword `std`;
    * estimated from samples - keyword `samples`.
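
The default rule for $\nu$ and the priority order for $\Lambda$ listed above can be sketched as follows. This is only an illustration in natural space, not FIGARO's code: the helper names `pick_df` and `pick_cov` are hypothetical, and the fallback used when none of `cov`, `std` or `samples` is given is left unspecified (`None`), since it is not stated here.

```python
import numpy as np

def pick_df(D, df=None):
    """Hypothetical helper: fall back to D+2 unless df > D+1."""
    if df is None or df <= D + 1:
        return D + 2
    return df

def pick_cov(D, cov=None, std=None, samples=None):
    """Hypothetical helper: choose the covariance in descending priority order."""
    if cov is not None:
        # Highest priority: full covariance matrix.
        return np.atleast_2d(cov)
    if std is not None:
        # A scalar std is broadcast to all D dimensions.
        return np.diag(np.ones(D) * np.asarray(std, dtype=float) ** 2)
    if samples is not None:
        # Lowest priority: covariance estimated from the samples.
        return np.atleast_2d(np.cov(np.asarray(samples).reshape(-1, D), rowvar=False))
    return None  # default behaviour not specified in this sketch

print(pick_df(2), pick_df(2, df=3), pick_df(2, df=10))
print(pick_cov(2, cov=[[4, -1], [-1, 4]], std=[2, 2]))
print(pick_cov(2, std=2.0))
```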
   
The order in which they are returned is $(k,\Lambda,\nu,\mu)$.\
A small fluctuation in $\Lambda$ between subsequent calls with the same arguments is expected, because transforming a covariance matrix to probit space is nontrivial. To simplify the process, we sample $10^4$ points from a multivariate Gaussian centered on $\mu$ with the given covariance or std (still in natural space), transform the samples to probit space and use the covariance of the transformed samples as $\Lambda$. The finite number of samples is the source of the fluctuations.
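
The sampling procedure just described can be sketched as below. This is an illustration rather than FIGARO's code: the name `probit_cov_sketch` is hypothetical, and the probit transform is assumed to rescale each coordinate to the unit interval using the bounds and then apply the standard normal quantile function. Draws falling outside the bounds are discarded here, which is also an assumption.

```python
import numpy as np
from scipy.stats import norm

def probit_cov_sketch(mean, cov, bounds, n_draws=10_000, rng=None):
    """Hypothetical Monte Carlo estimate of a covariance in probit space."""
    rng = np.random.default_rng() if rng is None else rng
    bounds = np.atleast_2d(bounds)
    # Draw points in natural space.
    draws = rng.multivariate_normal(mean, cov, size=n_draws)
    # Keep only draws strictly inside the bounds (assumption of this sketch).
    inside = np.all((draws > bounds[:, 0]) & (draws < bounds[:, 1]), axis=1)
    draws = draws[inside]
    # Assumed probit transform: rescale to (0, 1), then apply the normal quantile.
    unit = (draws - bounds[:, 0]) / (bounds[:, 1] - bounds[:, 0])
    probit = norm.ppf(unit)
    # Covariance of the transformed samples.
    return np.atleast_2d(np.cov(probit, rowvar=False))

bounds = np.array([[-10, 10]])
lam_a = probit_cov_sketch([1.0], [[4.0]], bounds, rng=np.random.default_rng(1))
lam_b = probit_cov_sketch([1.0], [[4.0]], bounds, rng=np.random.default_rng(2))
print(lam_a, lam_b)  # close to each other, but not identical
```

Two calls with different random states give slightly different matrices, which is the kind of fluctuation mentioned above.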

Estimate from samples:

In [None]:
from figaro.utils import get_priors

bounds = np.array([[-10,10]])
samples = norm().rvs(1000)

get_priors(bounds, samples = samples)

User-defined parameters (these override the samples):

In [None]:
get_priors(bounds, 
           samples = samples, 
           mean = np.array([1]), 
           df = 10, 
           std = np.array([2]), 
           k = 1,
          )

Default parameters:

In [None]:
get_priors(bounds)

Same as above, with multiple dimensions:

In [None]:
from scipy.stats import multivariate_normal as mn
bounds = np.array([[-10,10],[-10,10]])

samples = mn(np.zeros(2), np.identity(2)).rvs(1000)

get_priors(bounds, samples = samples)

User-defined parameters:

In [None]:
get_priors(bounds, 
           samples = samples, 
           mean = np.array([1,1]), 
           df = 10, 
           std = np.array([2,2]), 
           k = 1,
          )

The `cov` keyword overrides the `std` keyword:

In [None]:
get_priors(bounds, 
           samples = samples, 
           mean = np.array([1,1]), 
           df = 10, 
           std = np.array([2,2]), 
           k = 1, 
           cov = np.array([[4,-1],[-1,4]]),
          )