
visualization of uncertainty #24

Open
yuanqing-wang opened this issue May 13, 2020 · 7 comments

@yuanqing-wang
Member

Since the inputs are graphs and can't be squeezed onto one axis, how should we visualize uncertainty predictions in regression tasks?

@karalets
Collaborator

karalets commented May 13, 2020

Hey,

I would also reference this in the metrics issue #4

If I understand correctly, you are concerned about ordering items.
Initially, I believe one could represent uncertainty per batch/dataset, but not easily per item on a plot, since ordering graphs is weird.

Here's one fun way to think about this down the line:
if the semi- and unsupervised learning work makes some progress, we may find ourselves in a position to have embeddings of graphs in a low-dimensional domain.

Then you could have a d-dimensional plot with a heatmap representing log-likelihood (LLK) etc., which would be a very nice way to represent chemical space.

2-d might even work and is very easy to visualize, but even higher-d embeddings could be reduced further.
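To make the heatmap idea concrete, here is a minimal sketch: it assumes we already have a 2-d embedding per graph and a per-graph predictive log-likelihood (both synthetic stand-ins here, not outputs of any real model in this repo), and bins chemical space into a grid whose cells hold the mean LLK.

```python
import numpy as np

# Hypothetical inputs: 2-d embeddings for 200 graphs plus a per-graph
# predictive log-likelihood. In practice both would come from the real
# unsupervised embedding model and the regression model.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 2))
llk = -np.linalg.norm(embeddings, axis=1)  # stand-in for real LLK values

# Bin the embedding space into a coarse grid and average LLK per cell,
# giving a heatmap of "where in chemical space the model is confident".
bins = 10
ix = np.digitize(embeddings[:, 0], np.linspace(-3, 3, bins - 1))
iy = np.digitize(embeddings[:, 1], np.linspace(-3, 3, bins - 1))
heat = np.full((bins, bins), np.nan)
for gx in range(bins):
    for gy in range(bins):
        mask = (ix == gx) & (iy == gy)
        if mask.any():
            heat[gx, gy] = llk[mask].mean()
```

Empty cells stay NaN, which conveniently reads as "no graphs observed in this region" when the grid is rendered as an image.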

@yuanqing-wang
Member Author

If all we need is a fixed-dimensional representation of graphs, then we might as well just apply some dimension-reduction tricks to an eigenspace representation?
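One way to read this suggestion: use the graph Laplacian spectrum as a fixed-length descriptor. A rough sketch under one simple convention (zero-padding spectra of small graphs so every graph lands in the same k-dimensional space; other conventions exist, and whether this works well across very different graph sizes is exactly the open question):

```python
import numpy as np

def laplacian_spectrum(adj, k=8):
    """Smallest k Laplacian eigenvalues as a fixed-length graph descriptor.

    Graphs with fewer than k nodes are zero-padded so every graph maps
    into the same k-dimensional space.
    """
    deg = np.diag(adj.sum(axis=1))
    lap = deg - adj
    eigvals = np.sort(np.linalg.eigvalsh(lap))
    out = np.zeros(k)
    n = min(k, len(eigvals))
    out[:n] = eigvals[:n]
    return out

# Two toy graphs of different sizes land in the same 8-d space.
triangle = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
path4 = np.array([[0, 1, 0, 0], [1, 0, 1, 0],
                  [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
v1, v2 = laplacian_spectrum(triangle), laplacian_spectrum(path4)
```

The resulting vectors could then be reduced further (PCA, etc.) for plotting, though the padding convention is a real modeling choice, not a neutral one.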

@karalets
Collaborator

karalets commented May 13, 2020

I'm not really sure there are good eigenspace representations for graphs of different sizes etc. If I were you I would not focus on that now; I would focus on trying out some of these unsupervised and semi-supervised models for graphs, to get to the thing we actually need in order to incorporate knowledge from graphs we have no measurements for.

@karalets
Collaborator

Here's a recent paper doing chem-stuff:
https://jcheminf.biomedcentral.com/articles/10.1186/s13321-019-0396-x

Here's a very rough overview of the core idea:
https://towardsdatascience.com/tutorial-on-variational-graph-auto-encoders-da9333281129

Lots of papers of various degrees of complexity exist, and lots of code-bases. In an ideal universe we would open an issue to survey the landscape of the different graph models out there with code, and start a script to test their usefulness systematically, just like the current experiment about graph nets for regression.

How does that sound?

And here's a classic just for starters:
https://github.com/tkipf/gae

But there are many more recent papers extending those ideas significantly.

@karalets
Collaborator

I suggest we continue the graph-modeling discussion in the new issue I created for that.

@maxentile
Member

I think there are two distinct aspects of this question: how to visually "index into" the domain of possible input graphs, and how to visualize a model's predictive uncertainty for a single input graph.

For predictions of a scalar property of a single molecule (e.g. its "affinity" or "overall goodness score"), a complete representation would be the whole predictive pdf (aka a histogram of samples from the posterior predictive distribution). That can be summarized by an interval or scalar measuring its "spread" (stddev, quantiles, ...). Depending on the shapes of the predictive distributions, these summaries may be more or less lossy.
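The summaries described above are cheap to compute once you have posterior predictive samples. A minimal sketch (the samples here are a synthetic normal stand-in; real draws would come from the model, and the lossiness of these summaries shows up exactly when the real predictive is skewed or multimodal):

```python
import numpy as np

# Hypothetical posterior predictive samples for one molecule's scalar
# property (e.g. an affinity). Synthetic stand-in, not model output.
rng = np.random.default_rng(0)
samples = rng.normal(loc=-7.2, scale=0.8, size=5000)

# Scalar and interval summaries of "spread". Each is a lossy
# compression of the full predictive histogram.
spread_sd = samples.std()
lo, med, hi = np.quantile(samples, [0.025, 0.5, 0.975])
```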

For predictions of more than one property simultaneously of a given molecule ("solubility", "on-target affinity", "off-target affinity", "toxicity", ...), the predictions of the various quantities will probably be correlated, which will make summarization even harder. A complete representation would be the joint distribution for all these predictions, which can be lossily visualized in the usual ways (reduce jointly to 2D, show all bivariate marginals, ...). Although it will be too big to look at for more than a handful of molecules, I imagine it will be informative to take a look at the joint predictive distribution for all properties of a molecule or two (maybe for one molecule that looks very similar to something in the training set, and one that looks very different), using the same model but different approximate inference algorithms.
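A small sketch of why marginal summaries lose information in the multi-property case: given joint predictive samples for several properties (synthetic correlated draws here, standing in for real model output), the correlation matrix captures dependence that per-property intervals would hide, and bivariate marginals are just column pairs of the sample array.

```python
import numpy as np

# Hypothetical joint posterior predictive samples for three properties
# of one molecule (e.g. solubility, on-target and off-target affinity).
rng = np.random.default_rng(1)
cov = np.array([[1.0, 0.6, -0.3],
                [0.6, 1.0, 0.1],
                [-0.3, 0.1, 1.0]])
samples = rng.multivariate_normal(mean=[0.0, 0.0, 0.0], cov=cov, size=10000)

# Dependence structure that marginal intervals would hide; a bivariate
# marginal for a 2-d scatter is just samples[:, [0, 1]].
corr = np.corrcoef(samples, rowvar=False)
```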

I don't have any special insight into how to visually "index into" the domain of input graphs, to get a more global picture of what the model's doing. Associating each graph with a point in 2D using the approaches @karalets describes here and on the new issue sounds good to me. Each 2D point would further be associated with a scalar (@karalets suggests LLK, and perhaps other scalar summaries of posterior predictive distribution would be appropriate), to form a colored scatterplot or heatmap maybe hinting at "where in chemical space the model is confident or not."

@karalets
Collaborator

> I think there are two distinct aspects of this question: how to visually "index into" the domain of possible input graphs, and how to visualize a model's predictive uncertainty for a single input graph.
>
> For predictions of a scalar property of a single molecule (e.g. its "affinity" or "overall goodness score"), a complete representation would be the whole predictive pdf (aka a histogram of samples from the posterior predictive distribution). That can be summarized by an interval or scalar measuring its "spread" (stddev, quantiles, ...). Depending on the shapes of the predictive distributions, these summaries may be more or less lossy.
>
> For predictions of more than one property simultaneously of a given molecule ("solubility", "on-target affinity", "off-target affinity", "toxicity", ...), the predictions of the various quantities will probably be correlated, which will make summarization even harder. A complete representation would be the joint distribution for all these predictions, which can be lossily visualized in the usual ways (reduce jointly to 2D, show all bivariate marginals, ...).

I think @yuanqing-wang asks about how to build an axis over graphs here, not about the output variable. That discussion should probably happen in the metrics issue #4.
I agree that the title might also point to discussions about how to visualize output; however, the description of the issue makes me think otherwise.

> I don't have any special insight into how to visually "index into" the domain of input graphs, to get a more global picture of what the model's doing. Associating each graph with a point in 2D using the approaches @karalets describes here and on the new issue sounds good to me. Each 2D point would further be associated with a scalar (@karalets suggests LLK, and perhaps other scalar summaries of posterior predictive distribution would be appropriate), to form a colored scatterplot or heatmap maybe hinting at "where in chemical space the model is confident or not."

To give some more color:

First, the metrics we care about predicting should be anything we decide to have in #4; I mention LLK here as one example. I am sad nobody is interacting with #4, as it is a very important issue that @yuanqing-wang brought up as a blocker for reproduction when he initially looked at the Cambridge paper, and we have not yet made overview plots like the ones they have there.

The 2-d (or whatever-d) plot would display such metrics, whichever one chooses.
