Gaps in the projected ensemble maps #242

Closed
jpmaalouf opened this issue Apr 17, 2023 · 7 comments
Labels
modeling question About modeling workflow and output


jpmaalouf commented Apr 17, 2023

Issue

We are having an issue with the final projections from ensemble models. In all ensemble models except CV, the map includes gaps corresponding to NA projection values.

This is surprising, as CV uses the mean in its formula. If CV can be computed, why can't the mean?

Here is the basic biomod2 code we used, along with three projection maps:

  • Mean showing a white gap (northwest)
  • Median showing a white gap (northwest)
  • Coefficient of variation (no gaps)

Code

Biomod parameters

Parametres <- BIOMOD_ModelingOptions(
  GLM = list(type = "quadratic", interaction.level = 0),
  GBM = list(n.trees = 1000),
  GAM = list(algo = "GAM_mgcv", k = 4))

Biomod modeling

MODEL <- BIOMOD_Modeling(bm.format = BioDATA, # BioDATA is an object returned from BIOMOD_FormatingData()
                        modeling.id = "Test",
                        models = c("GLM", "FDA", "GBM", "RF", "MARS", "GAM"),
                        bm.options = Parametres,
                        nb.rep = 2,
                        data.split.perc = 70,
                        metric.eval = c('TSS','ROC'),
                        var.import = 2,
                        do.full.models = FALSE)

EnsMOD <- BIOMOD_EnsembleModeling(bm.mod = MODEL,
                               models.chosen="all",
                               em.by = "all",
                               metric.select = "TSS",
                               metric.select.thresh = THRESH,
                               metric.eval = c("ROC", "TSS"),
                               prob.mean = T, prob.cv=T, prob.ci=T,
                               prob.ci.alpha=0.05, 
                               prob.median=T,
                               committee.averaging = T,
                               prob.mean.weight = T,
                               prob.mean.weight.decay="proportional")

EnsPROJ<-BIOMOD_EnsembleForecasting(bm.em = EnsMOD,
                                    proj.name = "Test",
                                    new.env = EnvData,
                                    models.chosen = 'all',
                                    metric.binary = 'all',
                                    metric.filter = 'all')

Projections

Mean (gap in the northwest)

[image: mean projection map]

Median (gap in the northwest)

[image: median projection map]

Coefficient of variation (no gaps)

[image: coefficient of variation projection map]

jpmaalouf changed the title from "Gaps in the proejected ensemble maps" to "Gaps in the projected ensemble maps" Apr 17, 2023
rpatin added the "modeling question" (About modeling workflow and output) label Apr 17, 2023

rpatin commented Apr 17, 2023

Bonjour Jean-Paul,
Thank you for submitting the issue on github, it is really appreciated 🙏
Your issue highlights a difference in the code between the mean, median and CV calculations. Mean and median use na.rm = FALSE while CV uses na.rm = TRUE. So EMcv is able to compute the mean and standard deviation it needs, whereas EMmean and EMmedian return NA. This also implies that some of your individual models predict NA in the area concerned.
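
The difference can be reproduced in base R on a single pixel. The sketch below uses made-up prediction values for five hypothetical individual models, one of which returns NA. It only illustrates the na.rm behaviour described above and is not the biomod2 internal code.

```r
# Hypothetical predictions of five individual models for one pixel;
# the third model failed to predict and returned NA.
preds <- c(0.62, 0.70, NA, 0.55, 0.66)

# Mean and median with na.rm = FALSE: a single NA propagates to the result.
em_mean   <- mean(preds, na.rm = FALSE)
em_median <- median(preds, na.rm = FALSE)

# CV with na.rm = TRUE: the NA is dropped, so a value is returned.
em_cv <- sd(preds, na.rm = TRUE) / mean(preds, na.rm = TRUE)

print(c(mean = em_mean, median = em_median, cv = em_cv))

# With na.rm = TRUE everywhere, all three statistics are defined.
em_mean_narm   <- mean(preds, na.rm = TRUE)
em_median_narm <- median(preds, na.rm = TRUE)
print(c(mean = em_mean_narm, median = em_median_narm))
```

With na.rm = TRUE the ensemble value for that pixel is computed from the four models that did predict, which is why the CV map shows no gap.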

A few conclusions:

  1. About na.rm in BIOMOD_EnsembleForecasting:
    1. We are discussing internally how to harmonize the na.rm setting and will likely make na.rm = TRUE the default for all ensemble algorithms.
    2. We will also likely add an option to set na.rm = FALSE.
    3. Those changes will likely be available on GitHub in the next few days, but depending on your version you may need to adjust other things or re-run your workflow.
    4. Alternatively, you can update your CV map by masking out the zone predicted as NA in the other maps.
    5. If your workflow takes too long to re-run but you want values in the zone in question (i.e. na.rm = TRUE for all), I can also set up a dedicated biomod2 branch for you, depending on the version you are currently using.
  2. About the missing values in your individual model projections:
    1. They may be caused by missing values in the environmental data.
    2. They can also be caused by NaN predictions when environmental variables are outside the calibration range.
    3. If you want to understand what is happening, I would look at the individual model predictions to identify which algorithms or runs are affected, and at the distribution of your environmental variables in the given zone.
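
As a rough illustration of the last point and of the CV masking option, the sketch below works on a small made-up matrix of individual predictions (one row per pixel, one column per model). The model names and values are hypothetical; with real projections the matrix would first have to be extracted from the projection output.

```r
# Hypothetical matrix of individual model predictions:
# one row per pixel, one column per model (names are made up).
preds <- cbind(
  PA1_RUN1_GLM = c(0.2, 0.8, 0.5, 0.9),
  PA1_RUN1_GBM = c(0.3, 0.7, 0.4, 0.8),
  PA2_RUN1_GLM = c(0.1, 0.9, NA,  NA)
)

# Number of NA predictions per model: non-zero entries point to the
# models responsible for gaps in the ensemble mean and median.
na_per_model <- colSums(is.na(preds))
print(na_per_model)

# Pixels where at least one model predicts NA (the "gap" pixels).
gap_pixels <- rowSums(is.na(preds)) > 0

# CV computed with na.rm = TRUE, then masked on the gap pixels so that
# it covers the same area as the mean and median maps.
cv <- apply(preds, 1, function(x) sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE))
cv_masked <- ifelse(gap_pixels, NA, cv)
print(cv_masked)
```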

Note that you can also identify out of range predictions by using build.clamping.mask = TRUE in BIOMOD_Projection or BIOMOD_EnsembleForecasting. Some areas will then be filled with NA when variables are out of calibration ranges.
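
For a quick manual check outside biomod2, out-of-range pixels can also be flagged by comparing each variable to its calibration range. This is a hand-written sketch on hypothetical data frames (calib and proj are made-up names and values), not the way build.clamping.mask is implemented.

```r
# Hypothetical calibration data (values seen when fitting the models)
calib <- data.frame(temp = c(5, 10, 15, 20), precip = c(300, 600, 900, 1200))
# Hypothetical projection data (one row per pixel)
proj <- data.frame(temp = c(8, 22, 12), precip = c(500, 700, 1500))

# Calibration range of each variable: row 1 = min, row 2 = max
rng <- sapply(calib, range)

# TRUE where a projection value falls outside the calibration range
out_of_range <- sapply(names(proj), function(v) {
  proj[[v]] < rng[1, v] | proj[[v]] > rng[2, v]
})

# Number of out-of-range variables per pixel (0 = fully within range)
n_out <- rowSums(out_of_range)
print(n_out)
```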

Feel free to update the issue if you have additional information or questions. I will update it when the na.rm fix is available.

Best regards,
The biomod2 team


jpmaalouf commented Apr 17, 2023

Bonjour the biomod2 team!

Thank you very much for your prompt and efficient reply.

  • Great news that we'll be able to handle this issue with the upcoming GitHub version in the coming days. We'd rather have complete maps than exclude NA pixels from the CV map, so we'll just wait for this version to be published. Do you know how long it will take to publish it?
  • Also, thank you for offering to set up a dedicated branch so that we don't have to re-run our workflow from the start :). There is no need. Once the new version is published, we'll try to adapt the code to the individual models that are already computed (on the latest version: 4.2-2). I'll let you know if things don't work properly.
  • Thank you for the tips on identifying the individual models that failed to predict. I was expecting that all models from specific methods (e.g. all FDA models) would fail to predict, but surprisingly that is not the case. Maybe this behavior is linked to differences in the sampled pseudo-absences, which relates to your idea that some observation datasets do not cover the variable range widely enough.
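
One way to check whether failures follow the algorithm or the pseudo-absence dataset is to tabulate NA counts by the components of the model names. The sketch below assumes names laid out as species_PA_run_algorithm and uses made-up NA counts; both the names and the numbers are hypothetical.

```r
# Hypothetical individual model names and their NA pixel counts
na_counts <- c(
  Sp_PA1_RUN1_GLM = 0, Sp_PA1_RUN1_FDA = 0,
  Sp_PA1_RUN2_GLM = 0, Sp_PA1_RUN2_FDA = 0,
  Sp_PA2_RUN1_GLM = 120, Sp_PA2_RUN1_FDA = 120,
  Sp_PA2_RUN2_GLM = 0, Sp_PA2_RUN2_FDA = 0
)

# Split the names on "_" (assumed layout: species_PA_run_algorithm)
parts <- do.call(rbind, strsplit(names(na_counts), "_", fixed = TRUE))
info <- data.frame(
  pa = parts[, 2], run = parts[, 3], algo = parts[, 4],
  n_na = unname(na_counts)
)

# Total NA pixels by algorithm, and by pseudo-absence dataset and run
by_algo   <- tapply(info$n_na, info$algo, sum)
by_pa_run <- tapply(info$n_na, list(info$pa, info$run), sum)
print(by_algo)
print(by_pa_run)
```

In this made-up case the NA counts are identical across algorithms but concentrated in a single PA/run combination, which would point to the calibration dataset rather than to a particular method.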

Cheers!

Jean Paul


rpatin commented Apr 18, 2023

Bonjour Jean Paul,

  • We published the new version with the na.rm fix this morning. You can install it with devtools::install_github('biomodhub/biomod2'). It will likely be available on CRAN in the coming weeks.
  • If you are lucky you may only have to re-run the BIOMOD_EnsembleForecasting part, but it is also likely that you'll have to start again. Let me know if that happens and you need a dedicated branch that applies the correction to version 4.2-2.
  • I would also have expected some methods to fail, but it is interesting to see that the failures are more closely related to the pseudo-absence dataset (or the cross-validation repetition).

Cheers,
Rémi

jpmaalouf commented

Bonjour Rémi !

Thank you very much for the na.rm fix. I installed the updated version from GitHub, and the help documentation of the function now includes the na.rm argument. I re-ran the whole workflow from BIOMOD_Modeling(), and it stopped at BIOMOD_EnsembleForecasting() with the following error: Error in { : task 6 failed - "[write] unknown option(s): na.rm".

Cheers

Jean Paul


rpatin commented Apr 18, 2023

Bonjour Jean Paul,

Indeed, there was a small mistake on my side: the na.rm argument was on the wrong side of a bracket in some of the calculations (especially for the EMci algorithm). This should be fixed now if you update again. You should only have to re-launch BIOMOD_EnsembleForecasting and it should (hopefully) work.
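
To illustrate how a misplaced bracket can produce this kind of error, here is a toy reproduction in base R. The functions summarise_cells and write_result are hypothetical stand-ins written for this example; they are not the biomod2 or terra functions involved in the actual fix.

```r
# Toy stand-in for a raster summary function: extra arguments in ...
# are forwarded to `fun`.
summarise_cells <- function(x, fun, ...) apply(x, 1, fun, ...)

# Toy stand-in for a writer that rejects options it does not know.
write_result <- function(x, ...) {
  extra <- names(list(...))
  if (length(extra) > 0) {
    stop("[write] unknown option(s): ", paste(extra, collapse = ", "))
  }
  invisible(x)
}

preds <- rbind(c(0.2, 0.4, NA), c(0.6, 0.8, 1.0))

# Wrong: na.rm sits outside the inner call, so it reaches the writer.
wrong <- tryCatch(
  write_result(summarise_cells(preds, mean), na.rm = TRUE),
  error = function(e) conditionMessage(e)
)
print(wrong)

# Right: na.rm sits inside the inner call, so it reaches mean().
right <- write_result(summarise_cells(preds, mean, na.rm = TRUE))
print(right)
```

Moving na.rm = TRUE one bracket to the left is enough to send it to the summary function instead of the writer.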

Cheers,
Rémi


rpatin commented Apr 19, 2023

Bonjour Jean Paul,
There was a tiny mistake on my end when implementing the na.rm argument (a misplaced bracket). It is now corrected, sorry for that 🙏
If you update again to the current GitHub version, it should now work, and hopefully you will only have to rerun BIOMOD_EnsembleForecasting.
Cheers,
Rémi

rpatin closed this as completed May 17, 2023

jpmaalouf commented

Bonjour Rémi,

I never got the chance to say a big thank you for what you did on this issue :). It is never too late. Our project is now done and delivered.

All the best,

JP
