
Kernel variance vs. Gaussian noise variance #848

Open
jmren168 opened this issue Jun 23, 2020 · 9 comments

Comments

@jmren168

jmren168 commented Jun 23, 2020

Hi,

In my experiments, when I optimized a GPy.models.GPRegression model with an RBF kernel, I got two different kinds of results:

  1. the kernel variance (~0.00001) is much smaller than the Gaussian noise variance (~1). What does this result mean?
  2. the kernel variance (~10) is much larger than the Gaussian noise variance (~0.009). What does this result mean?

Any help would be highly appreciated.

Best,
JM

@lawrennd
Member

In (1) the model has found a minimum where the signal-to-noise ratio is very low (for a simple RBF kernel you can divide the RBF variance by the Gaussian noise variance to find this ratio).

In (2) the opposite has happened.

These can be local minima; you need to be sensitive to the initialisation (such as the kernel lengthscale) and perhaps try some different starting points.

This paper uses signal-to-noise ratios to find 'quiet genes' in gene expression; it might be helpful.

https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-12-180
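For the two fits quoted in the question, that ratio works out as follows (a rough back-of-envelope check using the approximate values reported above; in GPy the two numbers come from the fitted kernel and likelihood variances):

```python
# Signal-to-noise ratio = kernel (RBF) variance / Gaussian noise variance,
# using the approximate fitted values quoted in the question.
snr_case1 = 0.00001 / 1.0  # case (1): ~1e-5, almost everything is modelled as noise
snr_case2 = 10.0 / 0.009   # case (2): ~1111, almost everything is modelled as signal
print(f"case 1 SNR: {snr_case1:.1e}")
print(f"case 2 SNR: {snr_case2:.1f}")
```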

@jmren168
Author

Thanks for the reply. I've tried different starting points (via optimize_restarts) and got similar results.

BTW, how do I decide whether a GPRegression model is fitted well? My view is that the kernel variance should be larger than the Gaussian noise variance, and the Gaussian noise variance should be about 0.25~0.5.

Any comments are helpful.
JM

@lawrennd
Member

If you have a look at the paper, the key is to look at the likelihood of the different fits. Often one likelihood will be far larger than the other.
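To make the likelihood comparison concrete, here is a minimal NumPy sketch of the GP log marginal likelihood, evaluated on toy data at hyperparameter settings shaped like the two cases above (illustrative values, not the actual GPy fit):

```python
import numpy as np

def rbf_kernel(X, variance, lengthscale):
    # Squared-exponential (RBF) kernel: k(x, x') = variance * exp(-|x - x'|^2 / (2 l^2))
    sq = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2 * X @ X.T
    return variance * np.exp(-0.5 * sq / lengthscale**2)

def log_marginal_likelihood(X, y, kern_var, lengthscale, noise_var):
    # GP regression evidence: log N(y | 0, K + noise_var * I)
    n = len(y)
    K = rbf_kernel(X, kern_var, lengthscale) + noise_var * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ alpha - np.sum(np.log(np.diag(L))) - 0.5 * n * np.log(2 * np.pi)

rng = np.random.default_rng(0)
X = np.linspace(0, 5, 40)[:, None]
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(40)

# Hyperparameters shaped like case (1): tiny kernel variance, large noise.
ll_noise = log_marginal_likelihood(X, y, 1e-5, 1.0, 1.0)
# Hyperparameters shaped like case (2): large kernel variance, small noise.
ll_signal = log_marginal_likelihood(X, y, 10.0, 1.0, 0.009)

print(ll_noise, ll_signal)  # the fit that actually explains this data scores far higher
```

On data that really does contain signal, the "everything is noise" setting gets a much lower likelihood, which is the diagnostic the paper uses.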

@lionfish0
Member

lionfish0 commented Jun 23, 2020 via email

@jmren168
Author

Thanks for the reply. I'm not sure how to bring priors into the problem.

More details about this REAL case.

  1. ~30 samples, say x1, x2, ..., x30, each of dimensionality 100.
  2. Only 3~5 values of x2 differ from x1; the same phenomenon appears when comparing x3 and x2. In addition, 30%~50% of these 100 values are the same.

One idea is to use leave-one-out (LOO) cross-validation to select plausible kernel variances and lengthscales, and then use these LOO-selected kernel parameters to set constrain_bounded. But we are not sure whether this is correct.

Any suggestions are appreciated.
JM
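For the LOO idea above, a hedged sketch: in GP regression the leave-one-out residuals have a closed form (Rasmussen & Williams, Eq. 5.12), so candidate hyperparameters can be scored without refitting one model per held-out point. The toy data and grid below are illustrative assumptions, not the real 30×100 case:

```python
import numpy as np

def rbf_kernel(X, variance, lengthscale):
    # Squared-exponential (RBF) kernel matrix.
    sq = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2 * X @ X.T
    return variance * np.exp(-0.5 * sq / lengthscale**2)

def loo_residuals(K, y, noise_var):
    # Closed-form LOO residuals for GP regression (no refitting):
    # mu_{-i} - y_i = -[K_inv y]_i / [K_inv]_ii, with K_inv = (K + noise * I)^-1.
    Kinv = np.linalg.inv(K + noise_var * np.eye(len(y)))
    return -(Kinv @ y) / np.diag(Kinv)

rng = np.random.default_rng(0)
X = np.linspace(0, 5, 30)[:, None]
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)

# Score a small grid of candidate lengthscales by LOO mean squared error;
# the best-scoring values could then motivate bounds for constrain_bounded.
mses = {}
for lengthscale in [0.01, 1.0, 100.0]:
    K = rbf_kernel(X, 1.0, lengthscale)
    mses[lengthscale] = np.mean(loo_residuals(K, y, 0.01) ** 2)
print(mses)
```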

@zhenwendai
Member

If you do not use ARD (which gives one lengthscale per input dimension), the model should not overfit, but it may treat everything as noise (kernel variance close to zero).

If you need ARD, point estimate with cross validation or MCMC could be a solution to the problem.

@jmren168
Author

@lionfish0 Hi Mike,

Do you have any further references for (c) or (d)? Thanks in advance.

c) Slightly more principled is to add a prior. use "set_prior", and then
use normal ML estimation.
d) More principled still is to integrate (sample) over the hyperparameters.

Best,
JM
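On (c), the general idea: a prior on a hyperparameter turns maximum likelihood into MAP estimation, so the optimiser maximises log-likelihood plus log-prior, which can stop the kernel variance collapsing to zero. A minimal sketch of the penalty term alone, assuming an illustrative Gamma(2, 1) prior (GPy's `set_prior` plays this role; check its current API for the exact call):

```python
import numpy as np

def log_gamma_prior(x, a=2.0):
    # Log-density of a Gamma(a, 1) prior, up to the constant -log(Gamma(a))
    # (which is 0 for a = 2): (a - 1) * log(x) - x.
    return (a - 1.0) * np.log(x) - x

# MAP objective = log marginal likelihood + log prior: a near-zero kernel
# variance is now heavily penalised, while moderate values barely are.
print(log_gamma_prior(1e-5))  # ~ -11.5 : heavy penalty on a collapsed variance
print(log_gamma_prior(1.0))   # = -1.0  : mild penalty
```

Option (d) replaces the point estimate altogether: sample the hyperparameters (e.g. with MCMC) and average predictions over the samples instead of optimising; Chapter 5 of Rasmussen & Williams covers both approaches.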

@jmren168
Author

I found a paper discussing the estimation of kernel variance via MLE and LOO-CV.

F. Bachoc, Cross Validation and Maximum Likelihood estimations of hyper-parameters of Gaussian processes with model misspecification. Computational Statistics & Data Analysis 66 (2013): 55-69.

@jmren168
Author

jmren168 commented Jul 1, 2020

Hi,

After using ARD, I noticed two phenomena in the optimized lengthscale values:

  1. a large lengthscale, reaching the upper bound of the constrain_bounded setting, say 1000;
  2. a very small lengthscale, say 0.00000001.

My comments for these two cases:

  1. if the values in a dimension do not affect y at all, its optimized lengthscale is large;
  2. the dimension with the smallest lengthscale should affect y the most.

Please correct me if I'm wrong. Many thanks.
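That matches the usual ARD reading, where relevance is often summarised as 1/lengthscale (assuming the input dimensions are on comparable scales). A small sketch with illustrative lengthscales, not your fitted values, showing why a huge lengthscale makes a dimension irrelevant:

```python
import numpy as np

def ard_rbf(xa, xb, lengthscales, variance=1.0):
    # ARD RBF kernel between two points: each input dimension is scaled by
    # its own lengthscale before the squared distance is taken.
    d = (xa - xb) / lengthscales
    return variance * np.exp(-0.5 * np.sum(d**2))

lengthscales = np.array([1000.0, 0.00000001])  # the two extremes described above
x = np.zeros(2)

# Perturb each dimension by the same amount and watch the kernel value.
x_shift0 = np.array([0.1, 0.0])
x_shift1 = np.array([0.0, 0.1])
k_large_ls = ard_rbf(x, x_shift0, lengthscales)  # stays ~1: dimension is ignored
k_small_ls = ard_rbf(x, x_shift1, lengthscales)  # drops to 0: dimension dominates
print(k_large_ls, k_small_ls)
```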
