
GP model has positive log-likelihood after optimisation #1043

Open
enushi opened this issue Nov 5, 2023 · 5 comments

enushi commented Nov 5, 2023

Hi,

My GP model has a positive log-likelihood after optimisation and, furthermore, I get no warning that something might be wrong. Here is the code to reproduce the problem:

import numpy as np
import GPy

Y = np.array([[ 2.64743328,  1.03329583,  0.76342846, -0.2054021 ,  0.55964613,
         0.37885527, -0.67376998, -0.71804405, -0.77531237, -0.77549353,
        -0.77524664, -0.77258172, -0.68680858,  2.64743328,  1.03329583,
         0.76342846, -0.2054021 ,  0.55964613,  0.37885527, -0.67376998,
        -0.71804405, -0.77531237, -0.77549353, -0.77524664, -0.77258172,
        -0.68680858,  2.64743328,  1.03329583,  0.76342846, -0.2054021 ,
         0.55964613,  0.37885527, -0.67376998, -0.71804405, -0.77531237,
        -0.77549353, -0.77524664, -0.77258172, -0.68680858]])

X = np.array([[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12],[13.5],[15]]*3)

kernel = GPy.kern.RBF(input_dim=1, variance=0.5, lengthscale=1)

# targets must be an (n, 1) column vector; np.mat is deprecated, so use Y.T
m = GPy.models.GPRegression(X, Y.T, kernel)

m.optimize(messages=True)
m.plot()
m.log_likelihood()

The log-likelihood of the model is 189.98926713194757. And this is the resulting posterior plot:
[screenshot: posterior plot]

Is this an expected behaviour of the GPy package or a possible bug?

MartinBubel (Contributor) commented

Hi @enushi
I am not an expert on this, but here are my thoughts; maybe they are helpful:

A positive log-likelihood shouldn't be problematic in itself: the value is a log density, and densities can exceed 1, so their logarithms can be positive. Since we are maximizing the likelihood, larger is better.
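
To make this concrete, here is a minimal, GPy-independent illustration (a sketch assuming scipy is available):

# A Gaussian density with a small scale exceeds 1 near its mean, so its
# log is positive; the GP marginal likelihood is likewise a density value.
from scipy.stats import norm
print(norm.logpdf(0.0, loc=0.0, scale=0.1))  # ~ 1.384 > 0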
Also, you are getting the optimizer message

Running L-BFGS-B (Scipy implementation) Code:
  runtime   i      f              |g|
    00s00  0002   4.730200e+01   5.290785e+01
    00s02  0009   1.469069e+01   1.635310e+02 
    00s05  0021  -1.818694e+02   3.202323e-02 
    00s12  0046  -1.897958e+02   5.945395e+01 
Runtime:     00s12
Optimization status: Converged

which looks fine.

After all, I think you are confused by the plot you are getting, which shows that the model's mean is simply zero.
That, however, is because you set the kernel variance to 0.5, which is enough to explain the data points around a zero mean.

Decrease the variance, e.g. to 0.05, and you will see a totally different result.
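For reference, that experiment amounts to changing just this one line:

kernel = GPy.kern.RBF(input_dim=1, variance=0.05, lengthscale=1)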

Is this helpful?


enushi commented Nov 9, 2023

Hi @MartinBubel,

Thanks for the explanation! It makes more sense now. I added the line m.optimize_restarts(num_restarts=10, verbose=1) after m.optimize(messages=True) and I get a totally different plot:

[screenshot: posterior plot after restarts]

However, I noticed that in both cases (i.e., whether or not I call m.optimize_restarts(num_restarts=10, verbose=1)) the posterior means and variances returned by means, variances = m.predict(np.array([[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12],[13.5],[15]])) are almost exactly the same. This makes me think something is wrong with the first plot, as it does not reflect the posterior mean and variance after optimisation. This can be seen by running the following:

import numpy as np
import GPy
import matplotlib.pyplot as plt

Y = np.array([[ 2.64743328,  1.03329583,  0.76342846, -0.2054021 ,  0.55964613,
         0.37885527, -0.67376998, -0.71804405, -0.77531237, -0.77549353,
        -0.77524664, -0.77258172, -0.68680858,  2.64743328,  1.03329583,
         0.76342846, -0.2054021 ,  0.55964613,  0.37885527, -0.67376998,
        -0.71804405, -0.77531237, -0.77549353, -0.77524664, -0.77258172,
        -0.68680858,  2.64743328,  1.03329583,  0.76342846, -0.2054021 ,
         0.55964613,  0.37885527, -0.67376998, -0.71804405, -0.77531237,
        -0.77549353, -0.77524664, -0.77258172, -0.68680858]])

X = np.array([[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12],[13.5],[15]]*3)

kernel = GPy.kern.RBF(input_dim=1, variance=0.5, lengthscale=1)

# targets must be an (n, 1) column vector; np.mat is deprecated, so use Y.T
m = GPy.models.GPRegression(X, Y.T, kernel)

m.optimize(messages=True)
#m.optimize_restarts(num_restarts=2, verbose=0)
m.plot()
m.log_likelihood()

X_test = np.array([[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12],[13.5],[15]])
means, variances = m.predict(X_test)

plt.plot(X_test, means, c='red')
plt.show()

which outputs the following:

[screenshot: plot of the predicted means (red)]

MartinBubel (Contributor) commented


Hi @enushi

Thanks for following up. I see now that my previous answer was not quite accurate. In fact, there is nothing wrong with your initial variance guess; rather, the optimizer converges to a local optimum. An initial variance of 0.05 instead of 0.5 simply gives the optimizer a better starting point and thus yields a better fit. This is particularly surprising since the optimizer eventually converges to a variance of 1, where you could argue that the initial guess of 0.5 is much closer than 0.05. With nonlinear optimization, however, "closeness" in that sense can be misleading.

Restarting the optimization adds some "globalization" by using different (random) initial guesses for the model parameters; see the sketch below.
So I would conclude that one should always use restarts. However, I would not have expected to see such issues with an example as simple as this one, but that might be due to my lack of experience here.
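
Roughly, optimize_restarts does something like the following sketch (the actual GPy implementation differs in details; the loop body here is an assumption):

best_ll, best_params = float("-inf"), None
for _ in range(10):
    m.randomize()                  # resample hyperparameters at random
    m.optimize()                   # run the local optimizer from the new start
    if m.log_likelihood() > best_ll:
        best_ll, best_params = m.log_likelihood(), m.param_array.copy()
m[:] = best_params                 # restore the best parameters found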

I admit this is not an optimal user experience.
However, we are currently busy with "basic maintenance", and there won't be any change to the hyperparameter optimization soon.

Ultimately, regarding your question: yes, this is expected behavior and not a bug, as by default the L-BFGS solver (a local solver) is selected.
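
If I recall correctly, GPy also lets you pick a different (still local) optimizer by name via the optimizer argument, e.g.:

m.optimize(optimizer='scg', messages=True)  # scaled conjugate gradients instead of L-BFGS-B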

Does that help you or are there any questions remaining?

Best regards

Martin


enushi commented Nov 11, 2023

Hi @MartinBubel,

Thanks for your answer, but I still don't understand the discrepancy between what the plot shows and the posterior mean and variance that I get when I don't use restarts.

best regards,

Elio

MartinBubel (Contributor) commented


Hi @enushi
If you don't use restarts, the solver converges to a local optimum with variance = 1 and a kernel lengthscale near 0. Thus the posterior is dominated by the variance. In the plot above, the filled area is obtained as mean ± 2·std, so if your variance is 1 and your kernel lengthscale is near 0, the plot above is exactly what you get. A sketch of that band computation follows.
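
For reference, a sketch of how such a band can be computed from m.predict (note that GPy's predict includes the likelihood noise by default; X_test is a hypothetical (n, 1) array of prediction inputs):

means, variances = m.predict(X_test)   # variances include observation noise
std = np.sqrt(variances)
lower, upper = means - 2 * std, means + 2 * std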

Also, I am not sure whether rbf.variance is the posterior or simply an updated prior guess. I am not yet familiar with how GPy is architected, and documentation is scarce.

To my understanding, the GP posterior variance (not the kernel variance) should be at or near 0 at the training locations.
This holds in the restart case but, as you noted, not in the non-restart case.
Maybe you can use this as a criterion for checking whether the optimization converged to a "good" solution? A sketch of such a check follows.
That is also something I would consider beneficial to report to the user, instead of just a "solver converged successfully" flag...
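
A hedged sketch of that criterion (tol is a hypothetical threshold; predict_noiseless excludes the observation noise, so it gives the latent posterior variance):

_, latent_var = m.predict_noiseless(X)  # posterior variance of the latent GP at the training inputs
tol = 1e-2                              # hypothetical tolerance
if latent_var.max() > tol:
    print("Warning: large posterior variance at training inputs; possibly a poor local optimum.")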

Also, is it possible that something got mixed up in the screenshots in your last post?
When I run it with num_restarts > 1, I get an rbf.lengthscale of about 0.89. It also looks suspicious that both screenshots show the exact same loss value.
Are you sure the screenshots match the code next to them? Maybe it would be better if we iterate on a script or notebook.

Maybe you will find more insight if you debug into m.plot().

Best regards
Martin
