Improve capabilities for non-Gaussian likelihoods #1631
base: develop
Conversation
Codecov Report
@@ Coverage Diff @@
## develop #1631 +/- ##
===========================================
- Coverage 97.02% 95.53% -1.50%
===========================================
Files 92 86 -6
Lines 4407 3895 -512
===========================================
- Hits 4276 3721 -555
- Misses 131 174 +43
Continue to review full report at Codecov.
Very nice work - I would be happy to incorporate this feature in GPflow. I left you a few questions and remarks.
# y_lo_lo, y_lo, y_hi, y_hi_hi = np.quantile(samples, q=(0.025, 0.159, 0.841, 0.975), axis=0)
# Note how, contrary to the binary classification case, here we get the percentiles directly from the
# conditional output distribution
y_lo_lo, y_lo, y_hi, y_hi_hi = y_dist.y_percentile(p=(2.5, 15.9, 84.1, 97.5), num_samples=10_000)
Q: it's a bit annoying that numpy uses fractions of 1 and we use percentiles in our API. I guess this is because of TFP's API?
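The convention clash the reviewer is pointing at can be seen directly in numpy itself, where `np.quantile` takes fractions of 1 while `np.percentile` (like `tfp.stats.percentile`) takes levels in [0, 100]; a minimal illustration:

```python
import numpy as np

samples = np.random.default_rng(0).normal(size=10_000)

# numpy's quantile API takes levels as fractions of 1...
hi_q = np.quantile(samples, q=0.975)

# ...while percentile-style APIs (np.percentile, and likewise
# tfp.stats.percentile) take levels in [0, 100].
hi_p = np.percentile(samples, q=97.5)

# Same level, two spellings: 0.975 vs 97.5 — both give the same value.
```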
# %% [markdown]
# ## The conditional output distribution
#
# Here we show how to get the conditional output distribution and plot samples from it. In order to plot the uncertainty associated with that, we also get the percentiles from a sample of the corresponding likelihood parameter values, because the those of the output values are binary and would neither be convenient nor interesting to plot.
I'm not sure what is meant with the second bit of the sentence:
, because the those of the output values are binary and would neither be convenient nor interesting to plot
gpflow/likelihoods/base.py
Outdated
@@ -294,6 +295,86 @@ def variational_expectations(self, Fmu, Fvar, Y):
    def _variational_expectations(self, Fmu, Fvar, Y):
        raise NotImplementedError

    # @abc.abstractmethod
Can we delete the commented-out code?
    """
    return self.likelihood.predict_log_density(self.f_mean, self.f_var, Y)

def sample(self, num_samples: int = 1000) -> tf.Tensor:
Q: this seems like quite a large default value. Any reason for this?
gpflow/likelihoods/base.py
Outdated
    """
    y_samples = self.sample(num_samples)

    return tfp.stats.percentile(y_samples, q=p, axis=0)
It would be nice to add the expected return shapes.
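A shape-only sketch of what those return shapes would be: with samples of shape `[num_samples, N, P]` and a vector of levels `p`, percentile reduction over the sample axis prepends one dimension per level. The example uses `np.percentile`, which follows the same q-prepending convention as `tfp.stats.percentile` (the `[N, P]` sample layout is an assumption based on the diff):

```python
import numpy as np

# Stand-in for the output of sample(): [num_samples, N, P].
y_samples = np.zeros((10_000, 7, 1))

p = (2.5, 15.9, 84.1, 97.5)

# Reducing over axis 0 prepends a dimension of size len(p),
# giving a result of shape [len(p), N, P] = [4, 7, 1].
percentiles = np.percentile(y_samples, q=p, axis=0)
```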
@@ -120,3 +120,9 @@ def conditional_distribution(Fs) -> tfp.distributions.Distribution:
    super().__init__(
        latent_dim=2, conditional_distribution=conditional_distribution, **kwargs,
    )

def conditional_parameters(self, F):
Because this is quite an important method, would it be possible to add docstrings?
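A docstring along these lines could spell out the contract; the toy class below is a stand-in for the heteroskedastic likelihood in the diff (the latent-to-parameter mapping and shapes are illustrative assumptions, not the PR's actual implementation):

```python
import numpy as np

class HeteroskedasticSketch:
    """Toy stand-in for the PR's two-latent-dimension likelihood class."""

    def _conditional_mean(self, F):
        # First latent channel parameterises the mean (assumed).
        return F[..., :1]

    def _conditional_variance(self, F):
        # Second latent channel maps through exp to a positive variance (assumed).
        return np.exp(F[..., 1:])

    def conditional_parameters(self, F):
        """
        Return the parameters of the conditional output distribution p(y | f).

        :param F: latent function values, shape [..., N, latent_dim].
        :returns: tuple of parameter tensors, here (mean, variance),
            each of shape [..., N, 1].
        """
        return self._conditional_mean(F), self._conditional_variance(F)
```

Usage on a batch of 5 points with 2 latent dimensions yields two `[5, 1]` parameter tensors:

```python
mu, var = HeteroskedasticSketch().conditional_parameters(np.zeros((5, 2)))
```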
def conditional_parameters(self, F):
    return self._conditional_mean(F), self._conditional_variance(F)

def conditional_sample(self, F):
Because this is quite an important method, would it be possible to add docstrings?
Given that we are not converting all the likelihoods (rightfully so), I think it is important that we explain/document the code as well as possible, so that people doing the work on the other likelihoods know what to change/implement.
Co-authored-by: Vincent Dutordoir <dutordoirv@gmail.com>
Rearranging and updating notebooks with more user-friendly descriptions.
PR type: bugfix / enhancement / new feature / doc improvement
Enhancement
Related issue(s)/PRs:
#1345
Summary
This PR proposes updates to the API in order to better represent the situation when dealing with non-Gaussian likelihoods.
It introduces a conditional likelihood, which is then used in a model to obtain the conditional output distribution. All relevant model
calls (e.g. predict_y, predict_mean_and_var, predict_log_density) delegate to the corresponding methods in this particular likelihood class.
In order to show the potential of this approach, two likelihoods are changed: Bernoulli and Heteroskedastic. The corresponding
notebooks (classification and heteroskedastic) are updated accordingly to show how to plot the conditional output
distribution using the new interface. Note that in the binary classification case a new notebook has been introduced (classification-redesigned).
This PR is a proposal and is left open for discussion and feedback. It is by no means to be considered complete.
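The end-to-end flow the PR enables can be sketched without GPflow: a conditional output distribution object exposing `sample` and `y_percentile` (the two method names taken from the diff; the Gaussian internals and constructor are toy assumptions purely for illustration):

```python
import numpy as np

class YDistSketch:
    """Minimal stand-in for the conditional output distribution proposed
    in this PR. Only the method names mirror the diff; everything else
    is a toy assumption."""

    def __init__(self, f_mean, f_var):
        self.f_mean, self.f_var = f_mean, f_var

    def sample(self, num_samples: int = 1000):
        # Toy conditional: draw from N(f_mean, f_var) per input point.
        rng = np.random.default_rng(0)
        return rng.normal(self.f_mean, np.sqrt(self.f_var),
                          size=(num_samples,) + self.f_mean.shape)

    def y_percentile(self, p, num_samples: int = 10_000):
        # As in the diff: sample, then reduce over the sample axis.
        return np.percentile(self.sample(num_samples), q=p, axis=0)

# Percentile bands over 3 input points, each returned with shape [3, 1].
y_dist = YDistSketch(f_mean=np.zeros((3, 1)), f_var=np.ones((3, 1)))
y_lo, y_hi = y_dist.y_percentile(p=(2.5, 97.5))
```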