Second-order score gradients #875
Comments
Yes, there is! I was just chatting with Jakob Foerster last week about getting DiCE (https://arxiv.org/abs/1802.05098) into Edward. I don't know his GitHub handle; cc'ing @alshedivat and @rockt, who also worked on it. Contributions are welcome.
@dustinvtran Ah, I did see DiCE when it came out, and I looked it over again this Friday hoping it would solve my problem in an instant, but I think this might be a different problem? With DiCE the goal is to build unbiased estimators of higher-order derivatives, while here the goal is to take the derivative of an existing first-order estimator. I can see how my title might be a tad misleading in that respect.
Right, it depends on what you're taking derivatives of: exact first-order gradients (which DiCE solves) or the first-order gradient estimator. For the latter, have you seen Edward2's https://github.com/blei-lab/edward/blob/feature/2.0/edward/inferences/klqp.py#L36?
That's a slightly dense implementation; I might need a few pointers. Is the idea to have the `surrogate_loss` do an implicit

edit: For the record, using
Yes, so DiCE lets you define an objective such that the gradient of the objective is an estimator of the gradient. This holds for arbitrary orders of derivatives, so you don't have to worry about how to differentiate the estimator. I think I understand your use case, though, and I agree that it's not obvious that DiCE would solve this out of the box.
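To make the "arbitrary orders" property concrete, here is a toy NumPy sketch of the DiCE trick (my own minimal example, not Edward or Pyro code). The surrogate multiplies each loss by `exp(log_q - stop_gradient(log_q))`, which evaluates to 1 but carries the score through every order of differentiation. Since NumPy has no autodiff, `stop_gradient` is simulated by freezing one copy of `log_q` at the current parameter value, and derivatives of the surrogate are taken by finite differences:

```python
import numpy as np

rng = np.random.default_rng(0)
theta0 = 1.0
n = 500_000
x = rng.normal(theta0, 1.0, size=n)      # samples drawn at the current theta

def log_q(theta):                        # log N(x; theta, 1), constants dropped
    return -0.5 * (x - theta) ** 2

def dice_surrogate(theta):
    # magic_box(tau) = exp(tau - stop_gradient(tau)): value 1, derivative tau'.
    # stop_gradient is simulated by freezing the second copy at theta0.
    magic_box = np.exp(log_q(theta) - log_q(theta0))
    return np.mean(magic_box * x ** 2)   # per-sample loss f(x) = x^2

h = 1e-3
g1 = (dice_surrogate(theta0 + h) - dice_surrogate(theta0 - h)) / (2 * h)
g2 = (dice_surrogate(theta0 + h) - 2 * dice_surrogate(theta0)
      + dice_surrogate(theta0 - h)) / h ** 2

# E[x^2] under N(theta, 1) is theta^2 + 1, so the true first and second
# derivatives at theta0 = 1 are 2*theta0 = 2 and 2.
print(g1, g2)  # both approx 2, up to Monte Carlo error
```

Both the first and second finite differences of the same surrogate recover unbiased gradient estimates, which is exactly the property that plain `stop_gradient` surrogates lose at second order.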
This Pyro DiCE implementation & use case might be informative:
Thanks for the pointer! It would appear that they don't have a clever way of vectorizing over samples.
This is admittedly an esoteric issue.

The score gradient is cleverly implemented in Edward with the step `q_log_prob * tf.stop_gradient(losses)`, ensuring that the result is of the (pseudo-code) form `mean(score * losses)`. Now say that I want to define an operation which takes the score gradient as an input. If I try to take the gradient of this derived expression, the result will be wrong due to the `stop_gradient` op. Is there a clever, idiomatic way to define the score gradient without compromising its derivative?

Note that the score could be computed prior to taking the product with `losses`, but since TensorFlow can only compute derivatives of scalar quantities, this would involve unstacking and looping over `q_log_prob`.
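For concreteness, here is a toy NumPy version of the estimator above, with `q = N(theta, 1)` and a hypothetical per-sample loss `f(x) = x^2` (the setup and names are mine, not Edward's):

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 1.5
n = 200_000
x = rng.normal(theta, 1.0, size=n)  # samples from q = N(theta, 1)

score = x - theta        # d/dtheta log N(x; theta, 1)
losses = x ** 2          # hypothetical per-sample loss f(x)
grad_est = np.mean(score * losses)

# The true objective is E[x^2] = theta^2 + 1, so the true gradient is
# 2 * theta = 3.
print(grad_est)  # approx 3.0, up to Monte Carlo error
```

The `stop_gradient` in Edward's version exists precisely so that autodiff of the scalar surrogate produces this `mean(score * losses)` quantity in one pass, without looping over samples.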
Thinking it over, maybe the simplest way to attack the problem is to use graph modification to swap the `tf.stop_gradient(losses)` node for a plain `losses`?

edit: For those who don't find this an interesting intellectual pursuit in and of itself, I can note that it becomes fairly relevant if one wants to calculate the variance gradient for the REBAR and RELAX estimators used in discrete variational approximations.
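To illustrate why the `stop_gradient` node matters, here is a NumPy sketch (my own toy BBVI setup, not Edward code) that differentiates the score-gradient estimator with the losses frozen versus unfrozen. Freezing plays the role of `tf.stop_gradient`, and the frozen version drops the `mean(score * dlosses/dtheta)` term:

```python
import numpy as np

rng = np.random.default_rng(0)
theta0 = 0.5
x = rng.normal(theta0, 1.0, size=100_000)  # fixed samples (common random numbers)

def log_q(theta):                 # log N(x; theta, 1), constants dropped
    return -0.5 * (x - theta) ** 2

def log_p():                      # log N(x; 0, 1), constants dropped
    return -0.5 * x ** 2

def grad_estimator(theta, freeze_losses=False):
    score = x - theta                         # d/dtheta log_q
    th = theta0 if freeze_losses else theta   # freeze = stop_gradient analogue
    losses = log_p() - log_q(th)              # ELBO integrand
    return np.mean(score * losses)

# Differentiate the estimator itself by central differences.
eps = 1e-4
d_full = (grad_estimator(theta0 + eps) - grad_estimator(theta0 - eps)) / (2 * eps)
d_frozen = (grad_estimator(theta0 + eps, True)
            - grad_estimator(theta0 - eps, True)) / (2 * eps)

# The frozen version misses mean(score * dlosses/dtheta) = -mean(score**2),
# which is approx -1 here since score ~ N(0, 1) at theta0.
print(d_full - d_frozen)  # approx -1.0
```

The gap between the two derivatives is exactly the term that a `stop_gradient`-based surrogate silently discards, which is why the second-order result comes out wrong unless the node is swapped or a DiCE-style objective is used.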