
Improve capabilities for non-Gaussian likelihoods #1631

Draft: wants to merge 24 commits into develop
Conversation

@avullo (Contributor) commented on Jan 18, 2021

PR type: enhancement

Related issue(s)/PRs:
#1345

Summary

This PR proposes updates to the API to better represent models with non-Gaussian likelihoods.
It introduces a conditional likelihood, which a model then uses to obtain the conditional output distribution. All relevant model
calls (e.g. predict_y, predict_mean_and_var, predict_log_density) dispatch to the corresponding methods of this likelihood class.
To show the potential of this approach, two likelihoods are changed: Bernoulli and Heteroskedastic. The corresponding
notebooks (classification and heteroskedastic) are updated accordingly to show how to plot the conditional output
distribution using the new interface. Note that for the binary classification case a new notebook has been introduced (classification-redesigned).

This PR is a proposal and is left open for discussion and feedback. It is by no means complete.
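To make the proposed design concrete, here is a minimal sketch of the idea of a model returning a conditional output distribution that model-level calls delegate to. All class and method names here are hypothetical stand-ins (the PR's actual classes live in gpflow/likelihoods/base.py), and the Gaussian likelihood stub is a toy, not GPflow's implementation:

```python
import numpy as np


class GaussianLikelihoodStub:
    """Toy Gaussian likelihood with fixed noise variance (a stand-in, not GPflow's class)."""

    def __init__(self, noise_variance: float = 0.1):
        self.noise_variance = noise_variance

    def predict_mean_and_var(self, f_mean, f_var):
        # For a Gaussian likelihood, marginalising q(f) just adds the noise variance.
        return f_mean, f_var + self.noise_variance


class ConditionalOutputDistribution:
    """Hypothetical wrapper pairing a likelihood with the latent posterior moments,
    so that model-level calls such as predict_mean_and_var can delegate to it."""

    def __init__(self, likelihood, f_mean, f_var):
        self.likelihood = likelihood
        self.f_mean = f_mean
        self.f_var = f_var

    def predict_mean_and_var(self):
        # Dispatch to the likelihood's corresponding method.
        return self.likelihood.predict_mean_and_var(self.f_mean, self.f_var)


f_mean, f_var = np.zeros((5, 1)), np.ones((5, 1))
y_dist = ConditionalOutputDistribution(GaussianLikelihoodStub(0.1), f_mean, f_var)
y_mean, y_var = y_dist.predict_mean_and_var()
```

The point of the pattern is that non-Gaussian likelihoods can override the same delegated methods without the model needing to know anything likelihood-specific.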

@st-- changed the title from "Avullo willcowley/working bee ef1" to "Improve capabilities for non-Gaussian likelihoods" on Jan 18, 2021
codecov bot commented on Jan 18, 2021

Codecov Report

Merging #1631 (5c6db88) into develop (ed6acf5) will decrease coverage by 1.49%.
The diff coverage is 68.88%.

❗ Current head 5c6db88 differs from pull request most recent head 458c6a6. Consider uploading reports for the commit 458c6a6 to get more accurate results

@@             Coverage Diff             @@
##           develop    #1631      +/-   ##
===========================================
- Coverage    97.02%   95.53%   -1.50%     
===========================================
  Files           92       86       -6     
  Lines         4407     3895     -512     
===========================================
- Hits          4276     3721     -555     
- Misses         131      174      +43     
Impacted Files Coverage Δ
gpflow/likelihoods/__init__.py 100.00% <ø> (ø)
gpflow/likelihoods/multilatent.py 94.11% <50.00%> (-5.89%) ⬇️
gpflow/likelihoods/scalar_discrete.py 93.24% <60.00%> (-2.41%) ⬇️
gpflow/likelihoods/base.py 92.85% <64.28%> (-4.79%) ⬇️
gpflow/models/model.py 98.55% <100.00%> (+0.06%) ⬆️
gpflow/optimizers/scipy.py 85.52% <0.00%> (-11.95%) ⬇️
gpflow/conditionals/multioutput/conditionals.py 88.23% <0.00%> (-11.77%) ⬇️
gpflow/conditionals/util.py 90.00% <0.00%> (-10.00%) ⬇️
gpflow/covariances/multioutput/kufs.py 90.32% <0.00%> (-9.68%) ⬇️
gpflow/base.py 91.83% <0.00%> (-0.62%) ⬇️
... and 24 more

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ed6acf5...458c6a6. Read the comment docs.

@vdutor self-requested a review on February 9, 2021
@vdutor (Contributor) left a comment


Very nice work - I would be happy to incorporate this feature in GPflow. I left you a few questions and remarks.

doc/source/notebooks/advanced/heteroskedastic.pct.py (outdated)
# y_lo_lo, y_lo, y_hi, y_hi_hi = np.quantile(samples, q=(0.025, 0.159, 0.841, 0.975), axis=0)
# Note how, contrary to the binary classification case, here we get the percentiles directly from the
# conditional output distribution
y_lo_lo, y_lo, y_hi, y_hi_hi = y_dist.y_percentile(p=(2.5, 15.9, 84.1, 97.5), num_samples=10_000)

Q: it's a bit annoying that numpy uses fractions of 1 and we use percentiles in our API. I guess this is because of TFP's API?
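The mismatch the reviewer points at can be shown with numpy alone: np.quantile takes levels as fractions in [0, 1], while a percentile-style API (np.percentile below, tfp.stats.percentile in the PR) takes the same levels scaled to [0, 100]. This is only an illustration of the two conventions, not the PR's code:

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(size=(10_000, 3))

# np.quantile takes fractions in [0, 1] ...
q_lo, q_hi = np.quantile(samples, q=(0.025, 0.975), axis=0)

# ... while np.percentile takes the same levels in [0, 100].
p_lo, p_hi = np.percentile(samples, q=(2.5, 97.5), axis=0)

# The two calls agree once the levels are rescaled by 100.
assert np.allclose(q_lo, p_lo) and np.allclose(q_hi, p_hi)
```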

# %% [markdown]
# ## The conditional output distribution
#
# Here we show how to get the conditional output distribution and plot samples from it. In order to plot the uncertainty associated with that, we also get the percentiles from a sample of the corresponding likelihood parameter values, because the those of the output values are binary and would neither be convenient nor interesting to plot.

I'm not sure what is meant with the second bit of the sentence:

, because the those of the output values are binary and would neither be convenient nor interesting to plot

@@ -294,6 +295,86 @@ def variational_expectations(self, Fmu, Fvar, Y):
def _variational_expectations(self, Fmu, Fvar, Y):
raise NotImplementedError

# @abc.abstractmethod

can we delete commented out code?

gpflow/likelihoods/base.py (outdated)
"""
return self.likelihood.predict_log_density(self.f_mean, self.f_var, Y)

def sample(self, num_samples: int = 1000) -> tf.Tensor:

Q: this seems like quite a large default value. Any reason for this?

gpflow/likelihoods/base.py (outdated)
"""
y_samples = self.sample(num_samples)

return tfp.stats.percentile(y_samples, q=p, axis=0)

It would be nice to add the expected return shapes.
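For reference, the shape behaviour being asked about can be sketched with np.percentile as a stand-in for tfp.stats.percentile (the sample counts and N, P sizes below are made up for illustration): reducing over the sample axis with a vector of levels prepends a quantile axis.

```python
import numpy as np

num_samples, N, P = 10_000, 7, 1
rng = np.random.default_rng(1)
y_samples = rng.normal(size=(num_samples, N, P))  # [num_samples, N, P]

# A length-4 `q` reduced over axis 0 yields shape [4, N, P]:
percentiles = np.percentile(y_samples, q=(2.5, 15.9, 84.1, 97.5), axis=0)
assert percentiles.shape == (4, N, P)
```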

@@ -120,3 +120,9 @@ def conditional_distribution(Fs) -> tfp.distributions.Distribution:
super().__init__(
latent_dim=2, conditional_distribution=conditional_distribution, **kwargs,
)

def conditional_parameters(self, F):

Because this is quite an important method, would it be possible to add docstrings?

def conditional_parameters(self, F):
return self._conditional_mean(F), self._conditional_variance(F)

def conditional_sample(self, F):

Because this is quite an important method, would it be possible to add docstrings?
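As one possible shape for the requested docstrings, here is an illustrative two-latent sketch in the spirit of the heteroskedastic likelihood. The class name, the exp positive transform, and the shapes in the docstrings are assumptions for illustration, not the PR's actual implementation:

```python
import numpy as np


class HeteroskedasticConditionalSketch:
    """Illustrative two-latent likelihood: the first latent gives the conditional
    mean, the second (through a positive transform, assumed exp here) the
    conditional variance."""

    def conditional_parameters(self, F):
        """Map latent values F of shape [..., 2] to the parameters
        (mean, variance), each of shape [..., 1], of p(y | f)."""
        mean = F[..., :1]
        var = np.exp(F[..., 1:])
        return mean, var

    def conditional_sample(self, F, rng=None):
        """Draw one sample of y from p(y | f) for each row of F."""
        if rng is None:
            rng = np.random.default_rng()
        mean, var = self.conditional_parameters(F)
        return mean + np.sqrt(var) * rng.standard_normal(mean.shape)


lik = HeteroskedasticConditionalSketch()
F = np.zeros((4, 2))
mean, var = lik.conditional_parameters(F)
```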


Given that we are not converting all the likelihoods (rightfully so), I think it is important that we explain and document the code as well as possible, so that people doing the work on other likelihoods know what to change and implement.

4 participants