
vectorize calls to distribution log densities #47

Closed
dustinvtran opened this issue Mar 12, 2016 · 0 comments

dustinvtran commented Mar 12, 2016

Consider a B x d array of zs, where a row corresponds to one sample of a d-dimensional latent variable, and we have a mini-batch of size B.

Univariate Distributions
For mean-field methods, we'd like to call something like bernoulli.logpmf(zs[:, i], p), where p is a scalar in [0, 1]. This returns the B-dimensional vector

[ log Bernoulli(zs[1, i] | p), ..., log Bernoulli(zs[B, i] | p) ]^T

For a univariate distribution, it takes a B-dimensional input and returns a B-dimensional output.
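A minimal sketch of the univariate case using SciPy's `stats.bernoulli`, assuming a made-up mini-batch (`B = 6`, `d = 3`) and an arbitrary scalar `p`. It checks the single vectorized call against the Bernoulli log-pmf written out by hand.

```python
import numpy as np
from scipy import stats

# Hypothetical mini-batch: B = 6 samples of a d = 3 dimensional binary latent variable.
rng = np.random.default_rng(0)
B, d = 6, 3
zs = rng.integers(0, 2, size=(B, d))

i = 1      # latent dimension of interest (illustrative choice)
p = 0.3    # scalar Bernoulli parameter for dimension i (illustrative value)

# One vectorized call over the whole column: input shape (B,), output shape (B,).
logp = stats.bernoulli.logpmf(zs[:, i], p)

# The same quantity written out by hand: z log p + (1 - z) log(1 - p).
manual = zs[:, i] * np.log(p) + (1 - zs[:, i]) * np.log(1 - p)

print(logp.shape)                 # (6,)
print(np.allclose(logp, manual))  # True
```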

Multivariate Distributions
Consider a d-dimensional multivariate Gaussian. We call multivariate_normal.logpdf(zs, mu, Sigma), where mu is d-dimensional and Sigma is d x d; it returns the B-dimensional vector

[ log Normal(zs[1, :] | mu, Sigma), ..., log Normal(zs[B, :] | mu, Sigma) ]^T

For a d-dimensional distribution, it takes a B x d matrix of inputs and returns a B-dimensional output.
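A minimal sketch of the multivariate case with SciPy's `stats.multivariate_normal`, assuming made-up values for `zs`, `mu`, and a random positive-definite `Sigma`. SciPy treats the last axis as indexing the d components, so the B x d array is passed as is, and the result is compared with one call per row.

```python
import numpy as np
from scipy import stats

# Hypothetical mini-batch: B = 5 samples of a d = 3 dimensional Gaussian latent variable.
rng = np.random.default_rng(1)
B, d = 5, 3
zs = rng.normal(size=(B, d))

mu = np.zeros(d)
A = rng.normal(size=(d, d))
Sigma = A @ A.T + d * np.eye(d)  # a positive-definite d x d covariance

# One vectorized call: each row of zs is one d-dimensional sample.
logp = stats.multivariate_normal.logpdf(zs, mean=mu, cov=Sigma)

# Reference: one call per row.
looped = np.array([
    stats.multivariate_normal.logpdf(zs[b], mean=mu, cov=Sigma) for b in range(B)
])

print(logp.shape)                 # (5,)
print(np.allclose(logp, looped))  # True
```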

SciPy does this too!

```python
import numpy as np
from scipy import stats

# 4-d vector input, univariate normal
stats.norm.logpdf([0.0, 1.0, 1.0, 2.0], loc=0, scale=1)
## array([-0.91893853, -1.41893853, -1.41893853, -2.91893853])

# 4 x 2 matrix input, 2-d normal
stats.multivariate_normal.logpdf(
    np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]),
    mean=np.zeros(2), cov=np.diag(np.ones(2)))
## array([ -1.83787707,  -2.83787707,  -5.83787707, -10.83787707])
```
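The two SciPy calls above connect through the mean-field factorization: with a diagonal covariance the components are independent, so the joint log density of each row is the sum of its univariate log densities. A quick check, reusing the same 4 x 2 input:

```python
import numpy as np
from scipy import stats

# The same 4 x 2 input as in the SciPy example.
X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])

# Joint log density under a 2-d normal with identity covariance: shape (4,).
joint = stats.multivariate_normal.logpdf(X, mean=np.zeros(2), cov=np.eye(2))

# Mean-field style: elementwise univariate log densities, shape (4, 2), summed over the d axis.
factored = stats.norm.logpdf(X, loc=0, scale=1).sum(axis=1)

print(np.allclose(joint, factored))  # True
```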

Higher-dimensional arguments
We can also consider something like bernoulli.logpmf(zs[:, i], ps), where zs[:, i] is a B-dimensional vector and ps is also a B-dimensional vector (in [0,1]^B), so that each sample is paired with its own parameter. I propose not supporting this; it is bound to lead to bugs. Any time this comes up, I propose we make individual calls, bernoulli.logpmf(zs[1, i], ps[1]), ..., bernoulli.logpmf(zs[B, i], ps[B]).

(I don't know of a situation where this comes up often enough that vectorizing the computation is crucial; if we notice one, we can make the change. Note that SciPy's univariate distributions do accept array-valued parameters and broadcast them elementwise.)
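The individual-call approach can be sketched with SciPy as follows, pairing each entry of one column of samples with its own Bernoulli parameter; `z_col` and `ps` are made-up illustrative values. SciPy's univariate distributions broadcast array-valued parameters, so a single call with two equal-length arrays gives the same numbers. That pairing is purely positional and raises no error, which is one way such calls can silently go wrong if the arrays are not aligned as intended.

```python
import numpy as np
from scipy import stats

# Hypothetical setup: one column of B = 4 binary samples, each with its own parameter.
z_col = np.array([0, 1, 1, 0])
ps = np.array([0.2, 0.5, 0.9, 0.4])

# The approach proposed in the issue: one scalar call per (sample, parameter) pair.
individual = np.array([stats.bernoulli.logpmf(z, p) for z, p in zip(z_col, ps)])

# SciPy broadcasts array-valued parameters elementwise, so this single call agrees.
broadcast = stats.bernoulli.logpmf(z_col, ps)

print(np.allclose(individual, broadcast))  # True
```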
