
Cholesky solver #38

Closed · david-cortes opened this issue Nov 15, 2020 · 8 comments

@david-cortes (Contributor) commented Nov 15, 2020

I'm getting some pretty bad results using WRMF with the Cholesky solver.

I tried doing a reproducible experiment as follows:

  • Took the LastFM360k data.
  • Set the first 357k users as train and the remaining ~1800 as test.
  • Fit a WRMF model with k=40, lambda=1, and 10 iterations.
  • Calculated factors and top-5 predictions for the test users.
  • Calculated hit rate at 5 for them.

And got the following results:

  • CG: 0.4441335
  • Cholesky: 0.4179763

This is certainly not what I'd expect, since Cholesky is a more exact method and should lead to better results. Using different random seeds did not change the outcome significantly.

I additionally see strange behavior in the timings:

  • CG: ~21s
  • Chol: ~32s

According to the reference paper "Applications of the conjugate gradient method for implicit feedback collaborative filtering", the Cholesky solver should take close to 3x longer than the CG one in this setting. I ran the same experiment with a different package and got the expected results: Cholesky takes 3x longer and achieves a better end result:

  • CG: ~23s | 0.4501615
  • Chol: ~64s | 0.4512379

I tried playing with the Armadillo option arma::solve_opts::fast, changing it to something more exact, but it didn't make any difference in the HR@5 metric.

I'm not familiar with Armadillo, but I suspect this line is incorrect:

arma::Mat<T> inv = XtX + X_nnz.each_row() % (confidence.t() - 1) * X_nnz.t();

It appears to do a row-wise operation, whereas it should be doing rank-1 updates.
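
As a sanity check, here is a minimal standalone Armadillo sketch (hypothetical sizes; X_nnz, confidence, and XtX are stand-ins rather than the package's actual objects) comparing that expression against explicit rank-1 updates, i.e. XtX + sum_j (c_j - 1) * x_j * x_j', where the x_j are the columns of X_nnz:

#include <armadillo>

int main() {
  const arma::uword rank = 4, nnz = 6;
  arma::mat X_nnz(rank, nnz, arma::fill::randn);  // factors of the nonzero items, one per column
  arma::vec confidence(nnz, arma::fill::randu);
  confidence += 1;                                // confidence weights >= 1
  arma::mat XtX = arma::eye(rank, rank);          // stand-in for the precomputed X'X + lambda*I

  // the expression from the solver, parenthesized as C++ actually parses it
  arma::mat a = XtX + (X_nnz.each_row() % (confidence.t() - 1)) * X_nnz.t();

  // the same matrix built from explicit rank-1 updates
  arma::mat b = XtX;
  for (arma::uword j = 0; j < nnz; ++j)
    b += (confidence(j) - 1) * X_nnz.col(j) * X_nnz.col(j).t();

  std::cout << arma::norm(a - b, "fro") << std::endl;  // ~0 if the two forms agree
  return 0;
}

Algebraically, scaling column j of X_nnz by (c_j - 1) and then multiplying by X_nnz.t() sums exactly those rank-1 terms, so if this prints ~0 the line itself is fine and the problem is elsewhere.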

@dselivanov (Owner)
Thanks for reporting, I will take a look. Actually I expect the problem is the same as in #35: XtX.

> This is certainly not what I'd expect, since Cholesky is a more exact method and should lead to better results.

Loss with Cholesky should be lower, but the hit rate might be worse since we don't optimize it directly...

@dselivanov (Owner)
@david-cortes could you set the logger to debug and check the loss? From what I see, Cholesky achieves a smaller (or about the same) loss:

library(rsparse)
library(lgr)
lg = get_logger('rsparse')
lg$set_threshold('debug')
data('movielens100k')

train = movielens100k
train = log1p(train)

for (seed in c(1, 2, 3)) {
  for (solver in c('cholesky', 'conjugate_gradient')) {
    set.seed(seed)  # make each run reproducible and give both solvers the same initialization
    model = WRMF$new(rank = 50, lambda = 1, solver = solver)
    message(glue::glue("seed {seed}, solver {solver}"))
    user_emb = model$fit_transform(train, n_iter = 10, convergence_tol = -1, cg_steps = 3)
  }
}

@david-cortes (Contributor, Author) commented Nov 15, 2020

But that loss is not the same function that is being minimized. The solver minimizes (or is supposed to minimize) the weighted squared error over all entries of the matrix, whereas the logged loss computes it only over the present entries.
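
To make the distinction concrete, here is a small dense sketch (hypothetical names and shapes, not the package's code) contrasting the two functions, with confidence 1 on missing entries and the 0/1 preference matrix P doubling as the mask for the present entries (regularization omitted for brevity):

#include <armadillo>

int main() {
  const arma::uword n_users = 5, n_items = 7, rank = 3;
  arma::mat X(rank, n_users, arma::fill::randn);    // user factors
  arma::mat Y(rank, n_items, arma::fill::randn);    // item factors
  arma::mat P(n_users, n_items, arma::fill::zeros); // preference: 1 where observed
  arma::mat C(n_users, n_items, arma::fill::ones);  // confidence: 1 where missing
  P(0, 1) = 1; C(0, 1) = 5;                         // a few observed entries
  P(2, 3) = 1; C(2, 3) = 3;

  arma::mat R = P - X.t() * Y;                 // residuals over the full matrix
  double loss_all = arma::accu(C % R % R);     // what the ALS solver minimizes
  double loss_obs = arma::accu(P % C % R % R); // the loss restricted to present entries
  std::cout << loss_all << " vs " << loss_obs << std::endl;
  return 0;
}

A solver that lowers loss_all can still come out slightly worse on loss_obs, which would explain the logged numbers.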

@dselivanov (Owner)
@david-cortes you are right! I'm starting to forget all the details...

@david-cortes (Contributor, Author) commented Nov 15, 2020

Well, I did some experiments calculating the loss over all entries, and the two solvers do indeed end up with a similar loss. It's weird that the result from Cholesky then performs worse on other metrics, and that it takes so little time.

@david-cortes (Contributor, Author) commented Nov 15, 2020

I did some more experiments, this time on the same MovieLens data. After running for 10 iterations with seed=1, I get these losses:

  • CG: 74202.05
  • Cholesky: 74295.25

Whereas with the other package I was comparing against, I get these:

  • CG: 126483.4
  • Cholesky: 78421.88

I wonder where the difference comes from...

@dselivanov (Owner)
And if you remove the arma::solve_opts::fast argument from the solver?
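
For context: arma::solve_opts::fast tells Armadillo's solver to skip its extra accuracy machinery (equilibration and iterative refinement), so removing it trades some speed for a more accurate solution. A minimal standalone illustration of the two call forms (not the package's code):

#include <armadillo>

int main() {
  const arma::uword n = 40;
  arma::mat A(n, n, arma::fill::randu);
  A = A.t() * A + arma::eye(n, n);   // SPD system, like the ALS normal equations
  arma::vec b(n, arma::fill::randu);

  // current code path: fast mode, lower accuracy
  arma::vec x_fast = arma::solve(A, b, arma::solve_opts::fast);
  // proposed: default mode, slower but more accurate
  arma::vec x_def = arma::solve(A, b);

  std::cout << arma::norm(x_fast - x_def) << std::endl;  // size of the discrepancy
  return 0;
}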

@david-cortes (Contributor, Author)
Then the loss does end up lower than for CG, even though on a test set it didn't improve HR@5 or AUC. I guess technically it is working as it should - that is, it minimizes the function it is intended to - so I'll close this.
