Kernel variance vs. Gaussian noise variance #848
In (1) the model has found a minimum where the signal-to-noise ratio is very low (for a simple RBF kernel you can divide the variance of the RBF by the variance of the Gaussian noise to find this ratio). In (2) the opposite has happened. These can be local minima, so you need to be sensitive to initialisation, such as the kernel lengthscale, and perhaps try some different starting points. This paper uses signal-to-noise ratios to find 'quiet genes' in gene expression; it might be helpful: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-12-180 |
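A minimal sketch of the SNR computation suggested above (pure numpy, with hypothetical fitted values; in GPy the corresponding parameters would typically be read from `m.kern.variance` and `m.likelihood.variance`, which are assumptions here, not values from this thread):

```python
# Signal-to-noise ratio of a fitted GP: kernel (signal) variance divided
# by Gaussian noise variance. A ratio far below ~1 suggests case (1):
# the model has explained almost everything as noise.
rbf_variance = 0.8      # hypothetical, e.g. m.kern.variance[0] in GPy
noise_variance = 0.2    # hypothetical, e.g. m.likelihood.variance[0] in GPy

snr = rbf_variance / noise_variance
print(snr)  # 4.0 -> signal dominates noise in this illustrative case
```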
Thanks for the reply. I've tried different starting points (via optimize_restart), and got similar results. BTW, how to decide whether a GPRegression is fitted well? My option is that kernel variance is larger than Gaussian noise variance, and Gaussian noise variance is about 0.25~0.5. Any comments are helpful. |
If you have a look at the paper, the key is to look at the likelihood of the different fits. Often one likelihood will be far larger than the other. |
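To make the "compare the likelihoods of the different fits" advice concrete, here is a self-contained numpy sketch (data, hyperparameter values, and the helper function are illustrative, not from the thread) that evaluates the standard GP log marginal likelihood under a noise-dominated and a signal-dominated setting; in GPy the same number is available as `m.log_likelihood()`:

```python
import numpy as np

def rbf_log_marginal_likelihood(X, y, signal_var, lengthscale, noise_var):
    """Standard GP log marginal likelihood with an RBF kernel:
    -0.5 y^T K^-1 y - 0.5 log|K| - n/2 log(2*pi), K = k(X,X) + noise*I."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = signal_var * np.exp(-0.5 * sq / lengthscale ** 2)
    K += noise_var * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))
            - 0.5 * len(X) * np.log(2 * np.pi))

rng = np.random.default_rng(0)
X = np.linspace(0, 5, 40)[:, None]
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(40)

# Two competing local optima: a noise-dominated and a signal-dominated fit.
ll_noise = rbf_log_marginal_likelihood(X, y, 1e-3, 1.0, 1.0)
ll_signal = rbf_log_marginal_likelihood(X, y, 1.0, 1.0, 0.01)
print(ll_signal > ll_noise)  # the signal-dominated fit should win on this data
```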
Do you have any priors you can bring to the problem?
On a scale from easiest solution to most principled solution:
a) Rather than use the random restarts in the optimize_restarts method, could you initialise the parameters roughly to the values you expect, so the optimiser is likely to find the correct (hopefully global) maximum likelihood?
b) Put constraints on the parameters to keep them in known bounds: use "constrain_bounded".
c) Slightly more principled is to add a prior: use "set_prior", and then use normal ML estimation.
d) More principled still is to integrate (sample) over the hyperparameters.
Just some thoughts.
Mike
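A numpy sketch of option (c) under stated assumptions: rather than GPy's set_prior, the Gamma log-prior on the noise variance is added to the log marginal likelihood by hand, and the maximum a posteriori value is found on a grid (the data, the fixed signal variance and lengthscale, and the prior shape/rate are all illustrative):

```python
import numpy as np

def log_lik(X, y, noise_var, signal_var=1.0, lengthscale=1.0):
    """GP log marginal likelihood (RBF kernel) as a function of the noise
    variance; signal variance and lengthscale are held fixed here."""
    sq = (X[:, None] - X[None, :]) ** 2
    K = signal_var * np.exp(-0.5 * sq / lengthscale ** 2)
    K += noise_var * np.eye(len(X))
    L = np.linalg.cholesky(K)
    a = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ a - np.log(np.diag(L)).sum() - 0.5 * len(X) * np.log(2 * np.pi)

def log_gamma_prior(v, shape=2.0, rate=4.0):
    # Unnormalised Gamma(shape, rate) log density: with shape > 1 it pushes
    # the noise variance away from the degenerate value 0.
    return (shape - 1) * np.log(v) - rate * v

rng = np.random.default_rng(1)
X = np.linspace(0, 4, 30)
y = np.sin(X) + 0.2 * rng.standard_normal(30)  # true noise variance 0.04

grid = np.linspace(1e-3, 1.0, 200)
log_post = np.array([log_lik(X, y, v) + log_gamma_prior(v) for v in grid])
noise_map = grid[np.argmax(log_post)]
print(noise_map)
```

In GPy itself the analogous move would be along the lines of `m.likelihood.variance.set_prior(GPy.priors.Gamma(2., 4.))` followed by `m.optimize()`; option (d) replaces the maximisation with sampling over the hyperparameters (e.g. MCMC).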
|
Thanks for the reply. I'm not sure how to bring priors to the problem. More details about this real case.
One approach we are considering is to use leave-one-out (LOO) cross-validation to select plausible kernel variances and lengthscales, and then use each of these LOO-selected kernel parameters with constrain_bounded. But I'm not sure whether this is correct. Any suggestions are appreciated. |
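On the LOO idea: for GP regression the leave-one-out predictive mean and variance have a closed form, so no refitting loop is needed. A numpy sketch with illustrative data and hyperparameters (not from this thread):

```python
import numpy as np

rng = np.random.default_rng(2)
X = np.linspace(0, 4, 25)
y = np.sin(X) + 0.2 * rng.standard_normal(25)

# RBF kernel plus noise, with illustrative hyperparameters.
signal_var, lengthscale, noise_var = 1.0, 1.0, 0.05
K = signal_var * np.exp(-0.5 * (X[:, None] - X[None, :]) ** 2 / lengthscale ** 2)
K += noise_var * np.eye(len(X))

# Closed-form LOO predictions for GP regression:
# mu_{-i} = y_i - [K^-1 y]_i / [K^-1]_ii,   var_{-i} = 1 / [K^-1]_ii
Kinv = np.linalg.inv(K)
loo_mean = y - Kinv @ y / np.diag(Kinv)
loo_var = 1.0 / np.diag(Kinv)

# LOO log predictive density: a score for comparing hyperparameter settings.
loo_lpd = np.sum(-0.5 * np.log(2 * np.pi * loo_var)
                 - 0.5 * (y - loo_mean) ** 2 / loo_var)
print(np.isfinite(loo_lpd))  # True
```

Maximising this score over the hyperparameters is exactly the cross-validation alternative to maximum likelihood discussed in the Bachoc paper cited below.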
If you do not use ARD, which gives one lengthscale per input dimension, the model should not overfit, but it may treat everything as noise (kernel variance close to zero). If you need ARD, a point estimate chosen by cross-validation, or MCMC over the hyperparameters, could be a solution. |
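For reference, ARD simply means one lengthscale per input dimension in the RBF kernel (in GPy, `GPy.kern.RBF(input_dim=d, ARD=True)`); a numpy sketch of the kernel matrix it induces, with illustrative lengthscales:

```python
import numpy as np

def rbf_ard(X1, X2, signal_var, lengthscales):
    """RBF kernel with Automatic Relevance Determination: each input
    dimension has its own lengthscale, and a very large lengthscale
    effectively switches that dimension off."""
    d = (X1[:, None, :] - X2[None, :, :]) / lengthscales  # scaled per dimension
    return signal_var * np.exp(-0.5 * np.sum(d ** 2, axis=-1))

rng = np.random.default_rng(3)
X = rng.standard_normal((5, 2))
# Dimension 0 relevant (short lengthscale), dimension 1 nearly ignored.
K = rbf_ard(X, X, 1.0, np.array([0.5, 50.0]))
print(np.allclose(K, K.T), np.allclose(np.diag(K), 1.0))  # True True
```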
@lionfish0 Hi Mike, do you have any further references for (c) or (d)? Thanks in advance. Best, |
I found a paper discussing the estimation of kernel hyperparameters via MLE and LOO cross-validation: F. Bachoc, "Cross Validation and Maximum Likelihood estimations of hyper-parameters of Gaussian processes with model misspecification," Computational Statistics & Data Analysis 66 (2013): 55-69.
Hi, after using ARD, I noticed some phenomena in the "lengthscale" results.
My comments on these two cases:
Please correct me if I'm wrong. Many thanks. |
Hi,
In my case, when I optimised GPy.models.GPRegression with an RBF kernel, I got different results.
Any help would be highly appreciated.
Best,
JM