Clear and efficient sampling from customisable frugal causal models.
Clone the repo and run
pip install git+https://github.com/12kleingordon34/numpyro.git@master#egg=numpyro
pip install -e .
pip install -r requirements.txt
to install the package and its dependencies.
The core of the package is the frugalCopyla.CopulaModel
class, which takes in a dictionary specifying a frugal causal model and allows you to draw data from that distribution. The input provided to the class (i.e. the specification of the model) is flexible and straightforward, and does not require you to program a custom model/sampling routine in the numpyro
/jax
backend.
See example of use here.
The input should be a dictionary whose keys label the variables in your model. For each of these, specify in a sub-dictionary:
dist
: The distribution the variable is drawn from. These must be selected fromnumpyro.distributions
formula
: For each parameter in the chosen distribution, specify its linear model only using variables defined earlier in the dictionary. The names of the correct parameters can be found by either searching thenumpyro
documentation or looking at thearg_constraints
of the distribution by running (using the Normal as an example) :
> numpyro.distributions.Normal.arg_constraints
{'loc': Real(), 'scale': GreaterThan(lower_bound=0.0)}
-
params
(name will most likely change): Specifies the linear coefficients used to generate the primary variable through the linear model. A set of coefficients must be provided for each parameter. Note that the labelling of parameters (e.g.'formula': {'rate': 'X ~ 1 + Z + A'}, 'params': {'rate': {'x_0': 0., 'x_1': 2., 'x_2': 1}}
) does not affect the linear model. Only the order of the specification matters. For example,x_0
will be the coefficient of the first variable in the formula (always the intercept) andx_2
will always be the last. -
link
allows the user to provide an inverse link function for each of the linear formulas. For example, the command'X': {'dist': dist.Exponential, 'formula': {'rate': 'X ~ 1 + Z + A'}, 'params': {'rate': {'x_0': 0., 'x_1': 2., 'x_2': 1}}, 'link': {'rate': jnp.exp}},
will wrap the linear predictor in an exponential function such that the probabilistic model is$$X \sim \text{Exponential}(\lambda=\exp(2Z + A)).$$ Note that the link function must have ajax
base. If no inverse link function is require, leave it asNone
.- Additionally, note that the order of the floats within
'params'
will automatically be the the order they are multiplied to the ordered variables in'rate'
. The intercept term will always be the first term in the parsed formula.
- Additionally, note that the order of the floats within
-
copula
: To specify a copula, first choose a'class'
of copula from frugalCopyla/copula_lpdfs.py. The copula functions will take in keyword arguments to calculate the log-likelihood of the copula factor.- Under
vars
, provide a mapping of the variables linked by the copula and the function arguments using a dictionary. For example, themultivar_gaussian_copula_lpdf(vars, rhos)
factor takes two variables: a dictionary of random variables (vars
) and a dictionary of the copula correlation matrix elements (rhos
). If we wish to simulate a copula betweenZ
andY
, providevars
the dictionary..., 'vars': {'u': 'Z', 'v': 'Y'}
. - Under
'formula'
, specify the form of the linear predictor for the parameters passed to the copula. The coefficients for the linear predictor are specified under'params'
. - Similarly to the other inputs, an inverse link function can be chosen to wrap the linear predictor specified in
'formula'
and'params'
.
- Under
To check whether the model has been parsed correctly, you can check the parsed_model
property of a CopulaModel
class:
...
>>> cop_mod = CopulaModel(input_dict)
>>> cop_mod.parsed_model
{'Z': {'dist': numpyro.distributions.continuous.Normal,
'formula': {'loc': 'A ~ 1', 'scale': 'A ~ 1'},
'coeffs': {'loc': [0.0], 'scale': [1.0]},
'link': {},
'linear_predictor': {'loc': '0.0', 'scale': '1.0'}},
'X': {'dist': numpyro.distributions.continuous.Normal,
'formula': {'loc': "X ~ 1 + record_dict['Z']", 'scale': 'X ~ 1'},
'coeffs': {'loc': [0.0, 0.5], 'scale': [1.0]},
'link': {},
'linear_predictor': {'loc': "0.0 + 0.5 * record_dict['Z']", 'scale': '1.0'}},
'Y': {'dist': numpyro.distributions.continuous.Normal,
'formula': {'loc': "Y ~ 1 + record_dict['X']", 'scale': 'Y ~ 1'},
'coeffs': {'loc': [0.0, 0.5], 'scale': [0.5]},
'link': {'loc': None},
'linear_predictor': {'loc': "0.0 + 0.5 * record_dict['X']", 'scale': '0.5'}}}
Note that the linear_predictor
field shows the form of the linear predictor for a distribution's parameter (before being passed to the inverse link function).
Consider the following input:
import numpyro.distributions as dist
import jax
import jax.numpy as jnp
import frugalCopyla
input_dict = {
'Z': {'dist': dist.Normal, 'formula': {'loc': 'Z ~ 1', 'scale': 'Z ~ 1'}, 'coeffs': {'loc': [0.], 'scale': [1.]}, 'link': None},
'X': {'dist': dist.Exponential, 'formula': {'rate': 'X ~ Z'}, 'coeffs': {'rate': [1., 1.]}, 'link': {'rate': jnp.exp}},
'Y': {'dist': dist.Normal, 'formula': {'loc': 'Y ~ X', 'scale': 'Y ~ 1'}, 'coeffs': {'loc': [-0.5, 1.], 'scale': [1.]}, 'link': None},
'copula': {
'class': frugalCopyla.copula_lpdfs.multivar_gaussian_copula_lpdf,
'vars': ['Z', 'Y'],
'formula': {'rho': 'c ~ Z'},
'coeffs': {'rho': [1., 0.]},
'link': {'rho': jax.nn.sigmoid}
}
}
model = frugalCopyla.model.CopulaModel(input_dict)
data = model.simulate_data(num_warmup=1000, num_samples=1000, joint_status='continuous')
which allows one to simulate from the following causal model:
Note that the joint_status
field requires you to specify whether you are sampling from a fully continuous
, discrete
, or mixed
distribution. The model.simulate_data
function infact allows you to specify:
num_warmup
: The number of MCMC warmup stepsnum_samples
: The number of MCMC sampling stepsjoint_status
: The type of variables within the frugal modelnum_chains
: The number of chains to sample withseed
: The seed for the sampler. If unspecified, the sampler outcome will vary across successive runs.
In principle, frugalCopyla
can accomodate any copula framework. These can be found in copula_lpdfs.py
.
Multivariate Gaussian and Multivariate Student-T copulas are currently implemented. To parameterise the copula:
...
'copula': {
'class': <copula-log-likelihood-function-found-in-copula_lpdfs.py>,
'vars': <the-variables-correlated-via-the-copula>,
'formula': <the-linear-formula-parameterising-the-copula-covariances>,
'coeffs': <coefficients-for-each-formula>,
'link': <link-function-for-each-copula-parameter-(can-be-none)>,
'misc': <any-additional-copula-variables>
}
The 'misc'
field is not used for Gaussian copulas. However, the Degrees of Freedom for a Student T (parameterised as df
) can be entered here. For example:
...,
'copula': {
'class': copula_lpdfs.multivar_studentt_copula_lpdf,
'vars': ['Z1', 'Z2', 'Y'],
'formula': {'rho_Z1_Z2': 'cop ~ 1', 'rho_Z1_Y': 'cop ~ 1','rho_Z2_Y': 'cop ~ 1'},
'coeffs': {'rho_Z1_Z2': [0.0], 'rho_Z1_Y': [0.5],'rho_Z2_Y': [0.2]},
'link': {'rho_Z1_Z2': None, 'rho_Z1_Y': None,'rho_Z2_Y': None},
'misc': {'df': 2.}
}
}