
Error in BIOMOD_Projection using sfLapply #265

Closed
pabloriul opened this issue May 21, 2023 · 6 comments
Labels
bug Something isn't working

Comments

@pabloriul

pabloriul commented May 21, 2023

Hi biomod team,

I am trying to create separate models for each environmental layer using the same PA data. The code works fine using lapply but it fails when I try it in parallel using sfLapply.

Cheers,

Pablo

Code used to get the error

# Load packages, species occurrences, select one species, get presence/absence data, load environmental variables and format data

library(biomod2) 
library(terra)

data(DataSpecies) 
head(DataSpecies)

myRespName <- 'GuloGulo' 

myResp <- as.numeric(DataSpecies[, myRespName])

myRespXY <- DataSpecies[, c('X_WGS84', 'Y_WGS84')] # Get corresponding XY coordinates

data(bioclim_current)

myExpl <- terra::rast(bioclim_current)

myBiomodData_full <- BIOMOD_FormatingData(resp.var = myResp,
                                       expl.var = myExpl,
                                       resp.xy = myRespXY,
                                       resp.name = myRespName)

# Function to create and project the model and its ensemble for each var in a vector
biomod <- function(var) {

  myBiomodData <- myBiomodData_full                              # copy myBiomodData
  myBiomodData@data.env.var <- myBiomodData@data.env.var[var]    # subset myBiomodData env vars
  var_names <- names(myBiomodData@data.env.var)                  # names of the selected variables

  myBiomodOptions <- BIOMOD_ModelingOptions()   # Create default modeling options

  myBiomodModelOut <- BIOMOD_Modeling(bm.format = myBiomodData,
                                      modeling.id = paste(var_names, collapse = "_"),
                                      models = c('RF', 'GLM'),
                                      bm.options = myBiomodOptions,
                                      nb.rep = 2,
                                      data.split.perc = 80,
                                      metric.eval = c('TSS', 'ROC'),
                                      var.import = 3,
                                      do.full.models = FALSE,
                                      seed.val = 42)

  myBiomodProj <- BIOMOD_Projection(bm.mod = myBiomodModelOut,
                                    proj.name = paste0(paste(var_names, collapse = "_"), "_Current"),
                                    new.env = subset(myExpl, var_names),
                                    models.chosen = 'all',
                                    build.clamping.mask = TRUE)

  myBiomodEM <- BIOMOD_EnsembleModeling(bm.mod = myBiomodModelOut,
                                        models.chosen = 'all',
                                        em.by = 'all',
                                        em.algo = c('EMmean', 'EMca'),
                                        metric.select = c('TSS'),
                                        metric.select.thresh = c(0.1),   # 0.1 to avoid excluding all models
                                        metric.eval = c('TSS', 'ROC'),
                                        var.import = 3,
                                        seed.val = 42)

  myBiomodEMProj <- BIOMOD_EnsembleForecasting(bm.em = myBiomodEM,
                                               proj.name = paste0(paste(var_names, collapse = "_"), "_CurrentEM"),
                                               new.env = subset(myExpl, var_names),
                                               models.chosen = 'all',
                                               metric.binary = 'all',
                                               metric.filter = 'all')
}

#create a vector with numbers of variables in myExpl
env_n <- seq(1,nlyr(myExpl))

# using lapply #
#lapply(env_n, biomod)

# using sfLapply #

#load snowfall
library(snowfall)

#init cluster
sfInit(parallel=TRUE, cpus=4)

#export vars
sfExport("myRespName","myResp","myRespXY","myExpl","myBiomodData_full","biomod", local=TRUE)

#export packages
sfLibrary('biomod2', character.only=TRUE)
sfLibrary('terra', character.only=TRUE)

#build the models using sfLapply
mymodels <- sfLapply(env_n, biomod)

#stop cluster
sfStop( nostop=FALSE )

Error in checkForRemoteErrors(val) : 
  4 nodes produced errors; first error: NULL value passed as symbol address

Environment Information
Please paste the output of sessionInfo() in your current R session below.

R version 4.2.2 (2022-10-31)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 11.5.2

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] snowfall_1.84-6.2 snow_0.4-4        terra_1.7-3       biomod2_4.2-2    

loaded via a namespace (and not attached):
 [1] gbm_2.1.8.1            tidyselect_1.2.0       reshape2_1.4.4         splines_4.2.2          lattice_0.20-45        colorspace_2.1-0       vctrs_0.6.2            generics_0.1.3         utf8_1.2.3            
[10] survival_3.4-0         rlang_1.1.1            pillar_1.9.0           glue_1.6.2             DBI_1.1.3              sp_1.6-0               plotmo_3.6.2           foreach_1.5.2          lifecycle_1.0.3       
[19] plyr_1.8.8             mda_0.5-3              stringr_1.5.0          munsell_0.5.0          gtable_0.3.1           codetools_0.2-18       maxnet_0.1.4           parallel_4.2.2         class_7.3-20          
[28] fansi_1.0.4            Rcpp_1.0.10            scales_1.2.1           plotrix_3.8-2          abind_1.4-5            TeachingDemos_2.12     ggplot2_3.4.2          stringi_1.7.12         dplyr_1.1.2           
[37] grid_4.2.2             cli_3.6.1              tools_4.2.2            magrittr_2.0.3         PresenceAbsence_1.1.11 tibble_3.2.1           randomForest_4.7-1.1   Formula_1.2-5          pkgconfig_2.0.3       
[46] MASS_7.3-58.1          Matrix_1.5-1           data.table_1.14.8      pROC_1.18.0            reshape_0.8.9          iterators_1.0.14       earth_5.3.2            R6_2.5.1               rpart_4.1.19          
[55] nnet_7.3-18            compiler_4.2.2    

Additional information
Testing the code in chunks, I noticed that BIOMOD_Modeling works with sfLapply, but the next step, BIOMOD_Projection, fails.

@pabloriul pabloriul added the bug Something isn't working label May 21, 2023
@rpatin
Contributor

rpatin commented May 22, 2023

Hello Pablo,
Thank you for reporting this and for using our new issue template; your issue was really clear and easy to reproduce 🙏

The problem you encounter is linked to terra objects, which require special care in parallel computation. In short, a SpatRaster (the new object type replacing RasterStack and its cousins) cannot be given directly to a node: it wraps a pointer to a C++ object, and that pointer does not survive being copied to another R process. It has to be packed with wrap() and, once on the node, converted back into a normal SpatRaster with unwrap(). That is why BIOMOD_Modeling, which did not use myExpl, ran properly while BIOMOD_Projection failed. Here is the idea of the changes to be made in your code:

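myExpl.wrap <- wrap(myExpl)   # pack the SpatRaster so it can be sent to the nodes

biomod <- function(var) {
  [...]
  myExpl <- unwrap(myExpl.wrap)   # rebuild a normal SpatRaster on the node
  [...]
}

[...]
# export myExpl.wrap instead of myExpl
sfExport("myRespName","myResp","myRespXY","myExpl.wrap","myBiomodData_full","biomod", local=TRUE)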
[...]
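The same wrap()/unwrap() pattern can be tried outside biomod2 with the minimal sketch below. It assumes a terra version that provides unwrap(), and it swaps snowfall for base R's parallel package (makeCluster(), clusterExport(), parLapply()), which play the roles of sfInit(), sfExport() and sfLapply(). The raster r and the name r_wrapped are hypothetical stand-ins for myExpl and myExpl.wrap.

```r
library(terra)
library(parallel)

# Hypothetical small in-memory SpatRaster standing in for myExpl
r <- rast(nrows = 10, ncols = 10, vals = 1:100)

# Pack the SpatRaster into a serializable object before export
r_wrapped <- wrap(r)

cl <- makeCluster(2)
# Export the packed object, not the SpatRaster itself
clusterExport(cl, "r_wrapped")
# Load terra on each worker so unwrap() and global() are available there
invisible(clusterEvalQ(cl, library(terra)))

res <- parLapply(cl, 1:2, function(i) {
  # Rebuild a usable SpatRaster inside the worker, then compute on it
  x <- unwrap(r_wrapped)
  global(x, "sum")[1, 1]
})

stopCluster(cl)
```

Only the packed object crosses the process boundary; each worker rebuilds its own SpatRaster before using it, which is the step that exporting myExpl directly skips.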

With the wrap trick you should be able to run the projection and ensemble forecasting properly. However, you may encounter some issues with the ensemble forecasting, because forecasts running on different nodes use the same temporary directory. I pushed a commit for that, so if need be you can update to the current GitHub version (with devtools::install_github('biomodhub/biomod2')). Note that by doing so you will need to adjust the cross-validation arguments (nb.rep, do.full.models, ...; you should get a message telling you so).

Also note that instead of running your different workflows in parallel, you could have run them sequentially while using the internal parallelization provided by the nb.cpu argument of BIOMOD_Modeling, BIOMOD_Projection, BIOMOD_EnsembleModeling and BIOMOD_EnsembleForecasting.

Cheers,
Rémi

@pabloriul
Author

Dear Rémi,

Thank you very much for your quick reply. I had been struggling for three days :-). I had the impression the issue was related to SpatRaster, because the code used to work with the previous version, which used a RasterStack.

Thanks for suggesting nb.cpu; I will run some tests, although in the past it did not work on Mac OS.

Cheers,

Pablo

@rpatin
Contributor

rpatin commented May 23, 2023

Dear Pablo,
Indeed, internal parallelization is mostly functional on Linux; we have not tested it on Mac OS.

Concerning the RasterStack/SpatRaster, and in case you did not know, the raster package will soon be deprecated and fully replaced by terra, hence our motivation to migrate to terra.

Cheers,
Rémi

@pabloriul
Author

pabloriul commented May 24, 2023

Dear Rémi,

Thank you again for your attention. I made some changes to the script, and I now model all possible combinations of the environmental variables.
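One way to enumerate every combination of layers is base R's combn(), as in the sketch below. It is not Pablo's actual script (which is only in the attached zip); the layer names are hypothetical, and each resulting character vector is assumed to be passed as var to a function like biomod(), where myBiomodData@data.env.var[var] can select columns by name.

```r
# Hypothetical layer names standing in for names(myExpl)
vars <- c("bio3", "bio4", "bio7")

# Every non-empty subset of vars, from single layers up to all of them.
# combn(..., simplify = FALSE) returns a list of character vectors for each size k,
# and unlist(recursive = FALSE) flattens those lists into one list.
var_combos <- unlist(
  lapply(seq_along(vars), function(k) combn(vars, k, simplify = FALSE)),
  recursive = FALSE
)

# 3 variables give 2^3 - 1 = 7 combinations
length(var_combos)
```

Each element of var_combos could then be handed to lapply() or sfLapply() in place of the integer vector env_n.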

The code is almost the same, with some minor changes. I also added some code to extract and write the evaluation and variable importance tables.

The code works great with the biomod example data (both occurrences and environmental layers), and also when using the example occurrence data with predictors loaded from a folder.

However, I cannot understand why it fails when I use my own data (my occurrences plus predictors loaded from a folder, the same environmental layers used with the GuloGulo data), but succeeds when I use my occurrence data with the biomod example data as predictors (I just can't imagine why).

The error apparently occurs in BIOMOD_EnsembleModeling:

Error in checkForRemoteErrors(val) : 
 2 nodes produced errors; first error: task 1 failed - "task 1 failed - "error in evaluating the argument 'obj' in selecting a method for function 'get_predictions': 
new.env must be a 'matrix', 'data.frame', 'SpatRaster' or 'Raster' object""

Oddly, though, it occurs in two of the three models, while one model is fully computed. I noticed that changing em.by in BIOMOD_EnsembleModeling makes it succeed but, as I said, I don't know why it works with the biomod environmental data.

Sorry for bringing you more issues :-).

Cheers,

Pablo

script_data.zip

@rpatin
Contributor

rpatin commented May 24, 2023

Dear Pablo,
Do not worry about raising new issues, we are happy to help and answer nicely formatted issues 😉
And thank you for spotting the issue and providing a nice reproducible example 🙏 It appears that when using only one environmental variable with pseudo-absence datasets and merging all models together (em.by = 'all'), additional projections are needed (individual models have to be projected onto the other PA datasets), and these failed when there was only one environmental layer (because the one-column data.frame was simplified into a vector).
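The simplification described above is base R's default behavior for `[` on a data.frame: selecting rows of a one-column data.frame returns a bare vector unless drop = FALSE is given. The sketch below only illustrates that behavior with a hypothetical column bio3; it is not the actual biomod2 patch.

```r
# Hypothetical one-column data.frame, as when a single environmental variable is used
env <- data.frame(bio3 = c(10, 20, 30))

# With only one column, row subsetting drops the result to a plain numeric vector
rows_vec <- env[1:2, ]

# drop = FALSE keeps a one-column data.frame with its column name
rows_df <- env[1:2, , drop = FALSE]

class(rows_vec)
class(rows_df)
```

A vector like rows_vec is exactly the kind of object that a check such as "new.env must be a 'matrix', 'data.frame', 'SpatRaster' or 'Raster' object" would reject.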
It is now fixed; if you update to the current GitHub version, it should work fine: devtools::install_github('biomodhub/biomod2')
Of course, feel free to let us know if this does not fix your issue or if you encounter additional troubles.
Cheers,
Rémi

@pabloriul
Author

Dear Rémi,

Thank you again! Everything working great now :-).

Cheers,

Pablo

@rpatin rpatin closed this as completed Jun 20, 2023