
Kernel variance vs. Gaussian noise variance #848

Open
jmren168 opened this issue Jun 23, 2020 · 9 comments

Comments

@jmren168

jmren168 commented Jun 23, 2020

Hi,

In my experiments, when I optimized a GPy.models.GPRegression model with an RBF kernel, I got two different kinds of results:

  1. the kernel variance (~0.00001) is much smaller than the Gaussian noise variance (~1). What does this result mean?
  2. the kernel variance (~10) is much larger than the Gaussian noise variance (~0.009). What does this result mean?

Any help would be highly appreciated.

Best,
JM

@lawrennd
Member

In (1) the model has found a minimum where the signal-to-noise ratio is very low (for a simple RBF kernel you can divide the RBF variance by the Gaussian noise variance to find this ratio).

In (2) the opposite has happened.

These can be local minima; you need to be sensitive to the initialisation (such as the kernel lengthscale) and perhaps try some different starting points.

This paper uses signal-to-noise ratios to find 'quiet genes' in gene expression; it might be helpful.

https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-12-180
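For the two fits quoted in the question, that ratio works out as follows (a rough back-of-envelope check using the approximate values reported above; in GPy the two numbers come from the fitted kernel and likelihood variances):

```python
# Signal-to-noise ratio = kernel (RBF) variance / Gaussian noise variance,
# using the approximate fitted values quoted in the question.
snr_case1 = 0.00001 / 1.0  # case (1): ~1e-5, almost everything is modelled as noise
snr_case2 = 10.0 / 0.009   # case (2): ~1111, almost everything is modelled as signal
print(f"case 1 SNR: {snr_case1:.1e}")
print(f"case 2 SNR: {snr_case2:.1f}")
```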

@jmren168
Author

Thanks for the reply. I've tried different starting points (via optimize_restarts) and got similar results.

BTW, how do I decide whether a GPRegression model is fitted well? My view is that the kernel variance should be larger than the Gaussian noise variance, and the Gaussian noise variance should be about 0.25~0.5.

Any comments are helpful.
JM

@lawrennd
Member

If you have a look at the paper, the key is to look at the likelihood of the different fits. Often one likelihood will be far larger than the other.
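To make the likelihood comparison concrete, here is a minimal NumPy sketch of the GP log marginal likelihood, evaluated on toy data at hyperparameter settings shaped like the two cases above (illustrative values, not the actual GPy fit):

```python
import numpy as np

def rbf_kernel(X, variance, lengthscale):
    # Squared-exponential (RBF) kernel: k(x, x') = variance * exp(-|x - x'|^2 / (2 l^2))
    sq = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2 * X @ X.T
    return variance * np.exp(-0.5 * sq / lengthscale**2)

def log_marginal_likelihood(X, y, kern_var, lengthscale, noise_var):
    # GP regression evidence: log N(y | 0, K + noise_var * I)
    n = len(y)
    K = rbf_kernel(X, kern_var, lengthscale) + noise_var * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ alpha - np.sum(np.log(np.diag(L))) - 0.5 * n * np.log(2 * np.pi)

rng = np.random.default_rng(0)
X = np.linspace(0, 5, 40)[:, None]
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(40)

# Hyperparameters shaped like case (1): tiny kernel variance, large noise.
ll_noise = log_marginal_likelihood(X, y, 1e-5, 1.0, 1.0)
# Hyperparameters shaped like case (2): large kernel variance, small noise.
ll_signal = log_marginal_likelihood(X, y, 10.0, 1.0, 0.009)

print(ll_noise, ll_signal)  # the fit that actually explains this data scores far higher
```

On data that really does contain signal, the "everything is noise" setting gets a much lower likelihood, which is the diagnostic the paper uses.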

@lionfish0
Member

lionfish0 commented Jun 23, 2020 via email

@jmren168
Author

Thanks for the reply. I'm not sure how to bring priors into the problem.

More details about this REAL case.

  1. ~30 samples, say x1, x2, ..., x30, each of dimensionality 100.
  2. Only 3~5 values of x2 differ from x1; the same phenomenon appears when comparing x3 and x2. In addition, 30%~50% of these 100 values are the same.

One idea is to use leave-one-out (LOO) cross-validation to select plausible kernel variances and lengthscales, and then use these LOO-selected kernel parameters to set constrain_bounded. But we are not sure whether this is correct.

Any suggestions are appreciated.
JM
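For the LOO idea above, a hedged sketch: in GP regression the leave-one-out residuals have a closed form (Rasmussen & Williams, Eq. 5.12), so candidate hyperparameters can be scored without refitting one model per held-out point. The toy data and grid below are illustrative assumptions, not the real 30×100 case:

```python
import numpy as np

def rbf_kernel(X, variance, lengthscale):
    # Squared-exponential (RBF) kernel matrix.
    sq = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2 * X @ X.T
    return variance * np.exp(-0.5 * sq / lengthscale**2)

def loo_residuals(K, y, noise_var):
    # Closed-form LOO residuals for GP regression (no refitting):
    # mu_{-i} - y_i = -[K_inv y]_i / [K_inv]_ii, with K_inv = (K + noise * I)^-1.
    Kinv = np.linalg.inv(K + noise_var * np.eye(len(y)))
    return -(Kinv @ y) / np.diag(Kinv)

rng = np.random.default_rng(0)
X = np.linspace(0, 5, 30)[:, None]
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)

# Score a small grid of candidate lengthscales by LOO mean squared error;
# the best-scoring values could then motivate bounds for constrain_bounded.
mses = {}
for lengthscale in [0.01, 1.0, 100.0]:
    K = rbf_kernel(X, 1.0, lengthscale)
    mses[lengthscale] = np.mean(loo_residuals(K, y, 0.01) ** 2)
print(mses)
```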

@zhenwendai
Member

If you do not use ARD (which gives one lengthscale per input dimension), the model should not overfit, but it may treat everything as noise (kernel variance close to zero).

If you need ARD, point estimate with cross validation or MCMC could be a solution to the problem.

@jmren168
Author

@lionfish0 Hi Mike,

Do you have any further references for (c) or (d)? Thanks in advance.

c) Slightly more principled is to add a prior. use "set_prior", and then
use normal ML estimation.
d) More principled still is to integrate (sample) over the hyperparameters.

Best,
JM
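On (c), the general idea: a prior on a hyperparameter turns maximum likelihood into MAP estimation, so the optimiser maximises log-likelihood plus log-prior, which can stop the kernel variance collapsing to zero. A minimal sketch of the penalty term alone, assuming an illustrative Gamma(2, 1) prior (GPy's `set_prior` plays this role; check its current API for the exact call):

```python
import numpy as np

def log_gamma_prior(x, a=2.0):
    # Log-density of a Gamma(a, 1) prior, up to the constant -log(Gamma(a))
    # (which is 0 for a = 2): (a - 1) * log(x) - x.
    return (a - 1.0) * np.log(x) - x

# MAP objective = log marginal likelihood + log prior: a near-zero kernel
# variance is now heavily penalised, while moderate values barely are.
print(log_gamma_prior(1e-5))  # ~ -11.5 : heavy penalty on a collapsed variance
print(log_gamma_prior(1.0))   # = -1.0  : mild penalty
```

Option (d) replaces the point estimate altogether: sample the hyperparameters (e.g. with MCMC) and average predictions over the samples instead of optimising; Chapter 5 of Rasmussen & Williams covers both approaches.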

@jmren168
Author

I found a paper discussing the estimation of kernel variance via MLE and LOO-CV.

F. Bachoc, Cross Validation and Maximum Likelihood estimations of hyper-parameters of Gaussian processes with model misspecification. Computational Statistics & Data Analysis 66 (2013): 55-69.

@jmren168
Author

jmren168 commented Jul 1, 2020

Hi,

After using ARD, I noticed two phenomena in the optimized lengthscale values:

  1. a large lengthscale, reaching the upper bound of the constrain_bounded setting, say 1000;
  2. a very small lengthscale, say 0.00000001.

My comments for these two cases:

  1. if the values in a dimension do not affect y at all, its optimized lengthscale is large;
  2. the dimension with the smallest lengthscale should affect y the most.

Please correct me if I'm wrong. Many thanks.
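That matches the usual ARD reading, where relevance is often summarised as 1/lengthscale (assuming the input dimensions are on comparable scales). A small sketch with illustrative lengthscales, not your fitted values, showing why a huge lengthscale makes a dimension irrelevant:

```python
import numpy as np

def ard_rbf(xa, xb, lengthscales, variance=1.0):
    # ARD RBF kernel between two points: each input dimension is scaled by
    # its own lengthscale before the squared distance is taken.
    d = (xa - xb) / lengthscales
    return variance * np.exp(-0.5 * np.sum(d**2))

lengthscales = np.array([1000.0, 0.00000001])  # the two extremes described above
x = np.zeros(2)

# Perturb each dimension by the same amount and watch the kernel value.
x_shift0 = np.array([0.1, 0.0])
x_shift1 = np.array([0.0, 0.1])
k_large_ls = ard_rbf(x, x_shift0, lengthscales)  # stays ~1: dimension is ignored
k_small_ls = ard_rbf(x, x_shift1, lengthscales)  # drops to 0: dimension dominates
print(k_large_ls, k_small_ls)
```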
