think about what it means to "default" to reparameterization gradient #38
Comments
@mariru is working on MAP, which is another case where we don't necessarily need this score-vs-reparam dichotomy. We also need to think about how the class should later incorporate sampling methods (e.g., do we just treat it as an "optimization"?).
how about having a hierarchical class structure?
You mean for specifying the inference method? E.g.,
hmm. now that i think about it, i'm not sure. perhaps we have some sort of added hierarchy. i don't know how to communicate this, so bear with me: the reparam/score loss stuff happens at the inference-method level. does that make sense?
I like the ASCII! This makes sense. I would also put optimization inside variational.
optimization with the score function estimator? is that useful?
For example, MAP (and by extension, MLE) is variational inference with a point mass variational family. This is how Maja is currently implementing it.
what does sampling from a point mass mean? the way i view it: variational inference in this library is basically stochastic optimization. but MAP and MLE do not need to be based on stochastic optimization. (i could be missing something here.)
"sampling" for the point mass simply means returning its value. If you check out the branch feature/map, I implemented a variational family PMGaussian for modeling unconstrained parameters using a point estimate. It should probably get a better name, but I wanted to make the distinction that, like MFGaussian, the transform for the mean parameter is the identity.

So I think it can be useful to have run() in the variational/optimization parent class, but then have methods within run() that get overwritten by the child classes: e.g., call build_loss() within run() in the parent class, and then overwrite build_loss() in the child class to call one of build_score_loss(), build_reparam_loss(), or build_"other"_loss(). These method-specific loss functions can be implemented in the parent class, or, if a modification is needed, they can also be overwritten for a specific inference method.
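To make "sampling means returning its value" concrete, here is a minimal toy sketch of a point-mass family; the name PointMass and the sample() signature are illustrative stand-ins, not the actual PMGaussian API from the feature/map branch:

```python
class PointMass:
    """Degenerate 'variational family' with all mass on one point.

    Hypothetical sketch: 'sampling' is deterministic and just
    returns the stored location, repeated to match the usual
    n-samples interface.
    """
    def __init__(self, params):
        self.params = list(params)

    def sample(self, n=1):
        # no randomness: every "draw" is the point itself
        return [list(self.params) for _ in range(n)]


q = PointMass([0.5, -1.2])
print(q.sample(3))  # [[0.5, -1.2], [0.5, -1.2], [0.5, -1.2]]
```

Because sample() has the same shape as a stochastic family's, the same optimization loop can drive both MFVI and MAP without special-casing.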
Yup that's a great idea. So right now,
so what's the full spec here? and what would be the best way of making this change? (we should be considerate of stuff happening in other branches.)
class Inference:
    def __init__(self, model, data):
        self.model = model
        self.data = data

class MonteCarlo(Inference):
    def __init__(self, *args, **kwargs):
        Inference.__init__(self, *args, **kwargs)
        # not sure what will go here

class VariationalInference(Inference):
    def __init__(self, model, variational, data):
        Inference.__init__(self, model, data)
        self.variational = variational

    def run(self):
        ...
    def initialize(self):
        ...
    def update(self):
        ...
    def build_loss(self):
        raise NotImplementedError()
    def print_progress(self):
        ...

class MFVI(VariationalInference):
    def __init__(self, *args, **kwargs):
        VariationalInference.__init__(self, *args, **kwargs)

    def build_loss(self):
        # e.g., check whether the variational family supports reparam
        if ...:
            return self.build_score_loss()
        else:
            return self.build_reparam_loss()

    def build_score_loss(self):
        ...
    def build_reparam_loss(self):
        ...

class KLpq(VariationalInference):
    def __init__(self, *args, **kwargs):
        VariationalInference.__init__(self, *args, **kwargs)

    def build_loss(self):
        ...

class MAP(VariationalInference):
    def __init__(self, model, data):
        variational = PointMass(...)
        VariationalInference.__init__(self, model, variational, data)

    def build_loss(self):
        ...
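To see the run()/build_loss() override pattern in action, here is a self-contained toy version; the class names echo the sketch above, but the bodies (use_score flag, string losses) are invented placeholders:

```python
class VariationalInference:
    def run(self, n_iter=3):
        # the parent class drives the loop; subclasses only supply the loss
        return [self.build_loss() for _ in range(n_iter)]

    def build_loss(self):
        raise NotImplementedError


class MFVI(VariationalInference):
    def __init__(self, use_score=False):
        self.use_score = use_score

    def build_loss(self):
        # dispatch to one of the estimator-specific losses
        if self.use_score:
            return self.build_score_loss()
        return self.build_reparam_loss()

    def build_score_loss(self):
        return "score"

    def build_reparam_loss(self):
        return "reparam"


print(MFVI().run())                # ['reparam', 'reparam', 'reparam']
print(MFVI(use_score=True).run())  # ['score', 'score', 'score']
```

The point of the design is that KLpq or MAP would override only build_loss() and inherit the whole optimization loop unchanged.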
As for how to implement this, I suggest we do this broad refactor as early as possible to avoid incurring debt. So we write this in a branch and then individually deal with any merge conflicts in each branch once the pull request is made.
very nice. wouldn't it be more flexible to have class MAP(Inference)? again, i'm not entirely following why we want to go with this.
By doing variational inference with a point mass, you are reusing the variational inference machinery.
Broadly, I see inference derived from two paradigms: optimization (variational inference) and sampling (Monte Carlo methods). There are two reasons to include techniques such as MLE, MAP, MML, and MPO as part of the variational inference class:
hmm. not to be pedantic here, but i don't think i agree with either point.
a broader point of 1 is i guess this: did we decide to frame everything as optimization? i also didn't follow some of maja's comments. perhaps this is easier to discuss in person.
Well, let's agree to disagree then. :) MPO: marginal posterior optimization. All optimization methods default to gradient descent (data subsampling is optional). Latent variable sampling is currently used, e.g., in MFVI and KLpq, but it's not a necessary distinction. For example, we would ideally have coordinate ascent MFVI if someone wrote down an exponential family graphical model with VIBES-like metadata. (@heywhoah and I are interested in this.)
agree to disagree? what kind of strange proposal is that? :) let's chat in person. i think i'm missing some things here. (e.g. preferring coordinate ascent? much strangeness abounds :) )
I wrote it in the MAP branch. Here's what it looks like: https://github.com/Blei-Lab/blackbox/blob/af3f0528fd116be3dbcfc6d3871ac9119648abce/blackbox/inferences.py |
nice work! (i'm not saying that what you and maja propose won't work btw.) okay, let's discuss today if you both (@dustinvtran @mariru) are around!
we currently default to the reparameterization gradient if the Variational class implements reparam. however, if the Inference class does not support reparameterization gradients (e.g. KLpq), then it doesn't matter whether the Variational class implements it or not.
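One plausible way to express that defaulting rule is a capability check on the variational object. This is an illustrative sketch only, not the library's actual code; choose_estimator, Gaussian, and Bernoulli are hypothetical stand-in names:

```python
class MFVI:
    """Inference method that can use either gradient estimator."""
    def choose_estimator(self, variational):
        # default to the reparameterization gradient when the
        # variational family supports it, otherwise fall back to score
        if hasattr(variational, "reparam"):
            return "reparam"
        return "score"


class KLpq:
    """No reparameterization variant exists for this method, so the
    variational family's reparam support is simply ignored."""
    def choose_estimator(self, variational):
        return "score"


class Gaussian:
    def reparam(self, eps):
        # a location-scale transform would go here; placeholder
        return eps


class Bernoulli:
    pass  # discrete family: no reparameterization gradient


print(MFVI().choose_estimator(Gaussian()))   # reparam
print(MFVI().choose_estimator(Bernoulli()))  # score
print(KLpq().choose_estimator(Gaussian()))   # score
```

This keeps the decision in the inference class: the variational family merely advertises a capability, and each inference method decides whether that capability is relevant.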