
Memory allocation error #995

Open · sschooler opened this issue Mar 1, 2024 · 8 comments

@sschooler commented Mar 1, 2024

I'm trying to run a Gaussian model with temporal autocorrelation. The data are GPS observations from multiple animals, taken once every hour: 300,000 data points in total, with the temporal autocorrelation factor having 20,407 levels. I followed this vignette to add temporal autocorrelation grouped by animal, so the autocorrelation term is ar1(date-time factor + 0 | individual id). Besides that term I have five other predictors, and I have set the dispersion formula to zero in the model specification.
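For concreteness, the call described above would look roughly like this (a sketch with placeholder names; dat, timefac, id, and x1 through x5 stand in for the real columns):

library(glmmTMB)
fit <- glmmTMB(y ~ x1 + x2 + x3 + x4 + x5 +
                 ar1(timefac + 0 | id),  ## AR(1) over the date-time factor, grouped by animal
               dispformula = ~ 0,        ## dispersion formula set to zero
               family = gaussian,
               data = dat)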

I'm running into a memory allocation issue, with the error "Error in h(simpleError(msg, call)) : error in evaluating the argument 'x' in selecting a method for function 't': cannot allocate vector of size 47.1 Gb". I know this is likely because the dataset is too large and my computer is running out of memory, but I was wondering whether there are workarounds other than reducing the size of my dataset.

Thank you!

@bbolker (Contributor) commented Mar 1, 2024

I'm not sure we have any magic tricks, but it would be useful to know what traceback() reports when you run your model. It would also be useful, and should be possible, to write code that simulates data similar in size and structure to yours (for troubleshooting memory problems it probably doesn't even matter whether the response variable has any pattern in it; e.g., you could generate the response with y <- rnorm(300000)).

If your model is fairly simple, you may be able to use lme from the nlme package (one of R's recommended packages), with correlation = corAR1(form = ~ time | individual).
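A minimal sketch of that alternative, assuming placeholder columns y, x1, time, and id in a data frame dat (not your real names):

library(nlme)
fit <- lme(y ~ x1,
           random = ~ 1 | id,                         ## random intercept per individual
           correlation = corAR1(form = ~ time | id),  ## AR(1) residuals within individual
           data = dat)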

@bbolker (Contributor) commented Mar 3, 2024

Here's a start at a reproducible example. So far it seems slow on my machine but not hugely memory-hungry: R is reporting 2.4 GB of RAM use (I think, but am not sure, that TMB's memory use is correctly charged to R). In any case, total memory allocation on the machine is at about 20 GB and doesn't seem to be blowing up.

Does it cause problems for you? The error you're reporting above ("cannot allocate vector of size 47.1 Gb") doesn't seem consistent with what's happening here (although I haven't gotten to the end of the computation yet, maybe something bad will happen when the model gets finalized ...). Again, more details from traceback() would be useful.

  • this is non-parallelized (a single thread); using more OpenMP threads will make it faster (and possibly more memory-hungry, although I wouldn't expect so, since OpenMP is shared-memory); a sketch of how to request more threads follows the example below
nt <- 2e4  ## number of individuals (groups)
nn <- 3e5  ## target total number of observations

set.seed(101)
## per-individual observation counts, with at least 2 times per individual
nt_id <- rpois(nt, lambda = mean(nn/nt) - 2) + 2
nn2 <- sum(nt_id)  ## close to nn

tvec <- lapply(nt_id, seq.int) |> unlist() |> factor()
idvec <- rep(seq_along(nt_id), times = nt_id)
dd <- (replicate(6, rnorm(nn2), simplify = FALSE)
    |> as.data.frame()
    |> setNames(c("y", paste0("x", 1:5)))
)
dd <- data.frame(dd, tvec, id = idvec)

library(glmmTMB)
library(peakRAM)  ## for measuring peak memory use
peakRAM(
  glmmTMB(y ~ x1 + x2 + x3 + x4 + x5 + ar1(0 + tvec | id) + (1 | id),
          dispformula = ~ 0,
          data = dd, verbose = TRUE)
)
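On the threading point above: if your glmmTMB build has OpenMP support, the number of threads can be requested through glmmTMBControl. A sketch (reusing the simulated dd from above; the thread count of 4 is arbitrary):

peakRAM(
  glmmTMB(y ~ x1 + x2 + x3 + x4 + x5 + ar1(0 + tvec | id) + (1 | id),
          dispformula = ~ 0,
          data = dd, verbose = TRUE,
          control = glmmTMBControl(parallel = 4))  ## request 4 OpenMP threads
)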

@sschooler (Author) commented Mar 4, 2024

Thank you! One thing I noticed that differs between your code and mine is that I used ar1(timevec + 0 | id) whereas you used ar1(0 + timevec | id); the vignette gives the former as the correct formulation. It also looks like your time vector has only about 30 time steps, whereas my dataset has 12,000 distinct time steps across 160 individual IDs.

Here's the traceback:

13: h(simpleError(msg, call))
12: .handleSimpleError(function (cond) 
    .Internal(C_tryCatchHelper(addr, 1L, cond)), "cannot allocate vector of size 42.6 Gb", 
        base::quote(NULL))
11: as.vector(mm[, jj])
10: data.frame(i = as.vector(mm[, ii]) + boff[i], j = as.vector(mm[, 
        jj]) + boff[i], x = as.double(rep.int(seq_along(ii), rep.int(nl[i], 
        length(ii))) + thoff[i]))
9: FUN(X[[i]], ...)
8: lapply(seq_along(blist), function(i) {
       mm <- matrix(seq_len(nb[i]), ncol = nc[i], byrow = TRUE)
       dd <- diag(nc[i])
       ltri <- lower.tri(dd, diag = TRUE)
       ii <- row(dd)[ltri]
       jj <- col(dd)[ltri]
       data.frame(i = as.vector(mm[, ii]) + boff[i], j = as.vector(mm[, 
           jj]) + boff[i], x = as.double(rep.int(seq_along(ii), 
           rep.int(nl[i], length(ii))) + thoff[i]))
   })
7: do.call(rbind, lapply(seq_along(blist), function(i) {
       mm <- matrix(seq_len(nb[i]), ncol = nc[i], byrow = TRUE)
       dd <- diag(nc[i])
       ltri <- lower.tri(dd, diag = TRUE)
       ii <- row(dd)[ltri]
       jj <- col(dd)[ltri]
       data.frame(i = as.vector(mm[, ii]) + boff[i], j = as.vector(mm[, 
           jj]) + boff[i], x = as.double(rep.int(seq_along(ii), 
           rep.int(nl[i], length(ii))) + thoff[i]))
   }))
6: do.call(sparseMatrix, do.call(rbind, lapply(seq_along(blist), 
       function(i) {
           mm <- matrix(seq_len(nb[i]), ncol = nc[i], byrow = TRUE)
           dd <- diag(nc[i])
           ltri <- lower.tri(dd, diag = TRUE)
           ii <- row(dd)[ltri]
           jj <- col(dd)[ltri]
           data.frame(i = as.vector(mm[, ii]) + boff[i], j = as.vector(mm[, 
               jj]) + boff[i], x = as.double(rep.int(seq_along(ii), 
               rep.int(nl[i], length(ii))) + thoff[i]))
       })))
5: t(do.call(sparseMatrix, do.call(rbind, lapply(seq_along(blist), 
       function(i) {
           mm <- matrix(seq_len(nb[i]), ncol = nc[i], byrow = TRUE)
           dd <- diag(nc[i])
           ltri <- lower.tri(dd, diag = TRUE)
           ii <- row(dd)[ltri]
           jj <- col(dd)[ltri]
           data.frame(i = as.vector(mm[, ii]) + boff[i], j = as.vector(mm[, 
               jj]) + boff[i], x = as.double(rep.int(seq_along(ii), 
               rep.int(nl[i], length(ii))) + thoff[i]))
       }))))
4: mkReTrms(no_specials(findbars_x(formula)), fr, reorder.terms = FALSE)
3: getXReTrms(formula, mf, fr, type = "conditional", contrasts = contrasts, 
       sparse = sparseX[["cond"]], old_smooths = old_smooths$cond)
2: mkTMBStruc(formula, ziformula, dispformula, combForm, mf, fr, 
       yobs = y, respCol, weights, contrasts = contrasts, family = family, 
       se = se, call = call, verbose = verbose, REML = REML, start = start, 
       map = map, sparseX = sparseX, control = control)
1: glmmTMB(data = summer.df, logV ~ scale(slope) + scale(elevation) + 
       scale(bearsuit) + lc + sex + (1 | elkyr) + ar1(datetimenumber + 
       0 | elkyr), family = gaussian, dispformula = ~0, )

@bbolker (Contributor) commented Mar 4, 2024

There should be no difference between ar1(timevec + 0|id) and ar1(0+timevec|id).
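Both forms simply suppress the intercept for that term, so they generate identical model matrices; a quick way to convince yourself, on toy data:

tvec <- factor(rep(1:3, 4))
all.equal(model.matrix(~ tvec + 0), model.matrix(~ 0 + tvec))  ## TRUE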

Can you show us the output of summary(summer.df) and with(summer.df, summary(lengths(split(datetimenumber, elkyr))))?

@sschooler (Author) commented Mar 5, 2024

Here you go!

[Screenshots of the summary(summer.df) output and the split-lengths summary, dated 2024-03-05]

@bbolker (Contributor) commented Mar 8, 2024

I will try this again when I get a chance. I'm pretty sure I know what's happening: at some point the internal machinery tries to set up a dense matrix with dimensions proportional to the number of levels of the autocorrelation factor. glmmTMB doesn't actually need this matrix; we're reusing a function from lme4, which does need that component. I think the correct solution, ultimately, is to (1) move mkReTrms into the reformulas package; (2) modify it to make the computation of certain components optional; (3) have both lme4 and glmmTMB use that version.

The simpler, hackier fix is to copy mkReTrms from lme4 into glmmTMB and modify it to skip this part of the calculation -- that might be an interim solution.
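As a rough consistency check (taking the ~12,000 time-factor levels and 160 individuals reported above), the dense lower-triangle index matrix built in the mkReTrms code shown in the traceback is about the right size to account for the error:

nc <- 12000                ## levels of the date-time factor (approximate, from above)
nl <- 160                  ## individuals (grouping levels)
ltri <- nc * (nc + 1) / 2  ## lower-triangle entries: ~7.2e7
nl * ltri * 4 / 2^30       ## nl x ltri integer matrix in GiB: ~43, close to "42.6 Gb"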

@kaskr (Contributor) commented Mar 8, 2024

There's another similar issue here, causing O(n^2) time/memory complexity for models that should only be O(n). A flag should probably be added to skip calculation of these giant matrices.

@bbolker (Contributor) commented Mar 10, 2024

I've copied mkReTrms into the (new!) reformulas package and added an option to skip this part of the calculation. You could try the following:

remotes::install_github("bbolker/reformulas")
remotes::install_github("glmmTMB/glmmTMB/glmmTMB@reformulas")

and see if you get the same error (you'll need development tools installed). You may run into problems a little further down the line at the step @kaskr mentions above, but if so we can add something to skip that step too ...
