Serious differences in biomod2 versions seen in model results #216

Closed
suicmezburak opened this issue Mar 18, 2023 · 5 comments
Labels: bug (Something isn't working), modeling question (About modeling workflow and output)

Comments

suicmezburak commented Mar 18, 2023

I built models with the GAM and RF algorithms in biomod2 (version 3.5.1), using 1606 occurrence records, 10 pseudo-absence datasets of the same size, 10 cross-validation repetitions, and 7 environmental variables, with the settings recommended by biomod2. The model performance values for a single run are as follows.

, , GAM, RUN1, PA1

    Testing.data Cutoff Sensitivity Specificity
TSS        0.850  348.0      99.688      85.358
ROC        0.959  345.5      99.688      85.358

, , RF, RUN1, PA1

    Testing.data Cutoff Sensitivity Specificity
TSS        0.879    474      96.885      90.966
ROC        0.964    476      96.885      90.966

(I have not included the values for all repetitions, as that would be too long, but I obtained similar performance values across all repetitions.)

When I build models with the same data and the same settings, using the GAM and RF algorithms with biomod2 version 4.2.2 this time, the model performance values are as follows.

                 full.name  PA  run algo metric.eval cutoff sensitivity specificity calibration validation evaluation
1 Quercus_PA1_RUN1_GAM PA1 RUN1  GAM         TSS    586      97.588      54.086       0.517      0.551         NA
2 Quercus_PA1_RUN1_GAM PA1 RUN1  GAM         ROC    580      97.665      54.086       0.654      0.691         NA
3  Quercus_PA1_RUN1_RF PA1 RUN1   RF         TSS    641      98.054      99.844       0.978      0.798         NA
4  Quercus_PA1_RUN1_RF PA1 RUN1   RF         ROC    642      98.054      99.844       0.999      0.935         NA

As you can see, there are serious differences in the new version: very low ROC and TSS values for GAM, and very high ROC and specificity values for RF, perhaps indicating overfitting. What is the reason for this difference? I could not identify why the results diverge so drastically even though I used the same data and the same variables.

Thanks in advance.

rpatin added the "modeling question" label Mar 20, 2023
rpatin (Contributor) commented Mar 20, 2023

Hi @suicmezburak,
Thank you for reporting 🙏

First, note that a lot has changed in the package since version 3.5.1, and in general we place more trust in the current version.

If I summarize your issue (please correct me if I have misunderstood anything):

  • you built 10 PA datasets × 10 cross-validation repetitions and then ran the resulting 100 models with both RF and GAM
  • you did that with both biomod2 versions 3.5.1 and 4.2-2
  • pseudo-absences and cross-validation datasets were sampled independently for the two versions, i.e. they are not the same in the two runs
  • with version 3.5.1, ROC and TSS had similar values for RF and GAM (ROC ~ 0.96 and TSS ~ 0.85) for all pseudo-absence datasets and cross-validation repetitions
  • with version 4.2-2, ROC and TSS are very different, with much lower values for GAM compared to RF, and this for all pseudo-absence datasets and cross-validation repetitions
  • I also noticed that the selected cutoffs are much higher in version 4.2 than in version 3.5 for both GAM and RF. Is this observed for all pseudo-absence datasets and cross-validation repetitions?

With that overview in mind, we see no obvious explanation.

  1. The first suspect would be the difference in pseudo-absence datasets and cross-validation repetitions between the two versions. However, if you really built 10 pseudo-absence datasets times 10 cross-validation repetitions, that would be quite surprising. Your post did not make it explicit that you built 10 pseudo-absence datasets; could you confirm that? (One way to rule this suspect out in future comparisons is sketched just after this list.)
  2. Alternatively, it could be a code error in one of the versions. However, that is quite unlikely, as the code is largely shared between the two algorithms, so we could not see why an error would improve RF results while degrading GAM. We have not ruled out this hypothesis yet, but a first code review did not reveal anything suspicious, and we have few hints suggesting such a code error.
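
As an aside: fixing the random seed before building the datasets makes the pseudo-absence sampling and cross-validation splits reproducible, which removes suspect (1) from future comparisons. A minimal sketch, with placeholder object names, and with the caveat that a shared seed only yields identical datasets if both versions draw random numbers in the same order:

set.seed(42)  # fix the RNG state before any random sampling
myData <- BIOMOD_FormatingData(
  resp.var  = my_presences,  # hypothetical response vector
  expl.var  = my_rasters,    # hypothetical environmental stack
  resp.xy   = my_xy,         # hypothetical coordinates
  resp.name = "MySpecies",
  PA.nb.rep = 10,
  PA.nb.absences = 1606,
  PA.strategy = "random"
)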

With that in mind, could you:

  1. Check that our understanding of the situation is correct and add any information we missed.
  2. Share the code used for the modeling with versions 3.5.1 and 4.2-2.
  3. I can also try to reproduce what you observe, but I would need your data. With our example data it did not lead to such differences. If you want, you can send your data to remi.patin@univ-grenoble-alpes.fr so that I can try to reproduce and solve the issue.

Thanks in advance,
Rémi

suicmezburak (Author) commented

Hi @rpatin,
Thank you for your response 🙏

  • you built 10 PA datasets × 10 cross-validation repetitions and then ran the resulting 100 models with both RF and GAM

  • you did that with both biomod2 versions 3.5.1 and 4.2-2

  • pseudo-absences and cross-validation datasets were sampled independently for the two versions, i.e. they are not the same in the two runs

  • with version 3.5.1, ROC and TSS had similar values for RF and GAM (ROC ~ 0.96 and TSS ~ 0.85) for all pseudo-absence datasets and cross-validation repetitions

  • with version 4.2-2, ROC and TSS are very different, with much lower values for GAM compared to RF, and this for all pseudo-absence datasets and cross-validation repetitions
    There are no misunderstandings so far; the answer to all of the above points is yes.

  • I also noticed that the selected cutoffs are much higher in version 4.2 than in version 3.5 for both GAM and RF. Is this observed for all pseudo-absence datasets and cross-validation repetitions?
    Yes, the cutoffs selected for GAM and RF are much higher in version 4.2.2 than in version 3.5.1 for all pseudo-absence datasets and cross-validation repetitions.

The occurrence data used for modeling were the same for both versions: 1606 records.
The environmental variables were also the same for both versions: BIO5, BIO7, BIO8, BIO11, BIO15, BIO16 and BIO17 (from WorldClim v1.4), with a raster extent of (-11, 45, 28, 60).

When modeling with version 3.5.1 I used the raster package to crop the environmental variables, whereas with version 4.2.2 I used the terra package (version 1.7-3).
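
Roughly, the two cropping steps looked like this (illustrative sketch only; bio_files stands for the paths to the WorldClim layers):

# biomod2 3.5.1 workflow: raster package
library(raster)
current_bio <- crop(stack(bio_files), extent(-11, 45, 28, 60))

# biomod2 4.2.2 workflow: terra package (1.7-3)
library(terra)
current_bio <- crop(rast(bio_files), ext(-11, 45, 28, 60))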

Code used for the modeling with version 3.5.1

library(biomod2)  # version 3.5.1

## Format data (respvar, respxy, respname and current_bio are defined earlier)
quercus_data <- 
  BIOMOD_FormatingData(
    resp.var = respvar,
    expl.var = current_bio,
    resp.xy = respxy,
    resp.name = respname,
    PA.nb.rep = 10,
    PA.nb.absences = 1606, 
    PA.strategy = "random",
    na.rm = TRUE
  )


## Define individual models options
modeloption <- BIOMOD_ModelingOptions()


## Modelling
# by algorithm
quercus_model_output <- 
  BIOMOD_Modeling(
    data = quercus_data,
    models = c("GAM", "RF"),
    models.options = modeloption,
    NbRunEval = 10, 
    DataSplit = 80,
    VarImport = 3,
    models.eval.meth = c("TSS", "ROC"),
    do.full.models = FALSE,
    rescal.all.models = FALSE,
    SaveObj = TRUE,
    modeling.id = paste(respname,"modelling", sep = "")
  )

Code used for the modeling with version 4.2.2

library(biomod2)  # version 4.2.2

## Format data (same input objects as above)
quercus_data <- 
  BIOMOD_FormatingData(
    resp.name = respname,
    resp.var = respvar,
    expl.var = current_bio,
    resp.xy = respxy,
    PA.nb.rep = 10,
    PA.nb.absences = 1606, 
    PA.strategy = "random"
  )


## Define individual models options
modeloption <- BIOMOD_ModelingOptions()


## Modelling
# by algorithm
quercus_model_output <- 
  BIOMOD_Modeling(
    bm.format = quercus_data,
    bm.options = modeloption,
    modeling.id = "all_Models",
    models = c("GAM", "RF"),
    nb.rep = 10, 
    data.split.perc = 80,
    metric.eval = c("TSS", "ROC"),
    var.import = 3,
    save.output = TRUE,
    do.full.models = FALSE
  )
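
For reference, the evaluation tables I pasted earlier come from get_evaluations(), which returns an array in 3.5.1 and a long-format data frame in 4.2.2:

# Retrieve the evaluation scores shown above
get_evaluations(quercus_model_output)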

Unfortunately, I can't send my data (it is for a research paper), but to test whether the problem is caused by the data, I tried the data from an ENM scientific paper that was shared as open source. I ran into the same issue with this species as well: version 3.5.1 gave similar results for all pseudo-absence datasets and cross-validation repeats for GAM and RF (10 PA × 10 CV = 100 models), whereas version 4.2.2 showed the same problem I described above. If appropriate, I can share the data and environmental variables I used for this species.

Thank you for your time.

rpatin (Contributor) commented Mar 21, 2023

Hi @suicmezburak,
Thank you for the additional information 🙏
If you have open-source data from another scientific paper, that would be great for reproducing the issue.
Best,
Rémi

rpatin (Contributor) commented Mar 22, 2023

Hi @suicmezburak,
Thank you for the data 🙏 With it, I had no trouble reproducing the problem.
Here is what we found:

  1. There was a mistake in biomod2 version 4.2-2 that led to additional subsampling of the data used by some of the algorithms (e.g. GAM but not RF). This was likely responsible for the larger difference between GAM and RF in the current version.
  2. There was a second mistake in biomod2 version 4.2-2: evaluation metrics calculated on the validation dataset were using the threshold derived from the validation dataset instead of the calibration dataset. This led to optimistic TSS values.
  3. Version 3.5.1 had the same behaviour as (2), but there it was by design rather than an error.

Now that these are corrected, the values obtained with biomod2 version 4.2-3 are much more satisfying. TSS is still lower than in version 3.5.1, but that is expected, because it is now calculated with the threshold from the calibration data.
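
To make point (2) concrete, here is a minimal sketch of the corrected logic in plain R. This is not the actual biomod2 internals; the observation and prediction vectors are toy data, with predictions on biomod2's 0-1000 scale:

# Toy calibration and validation data (hypothetical)
set.seed(1)
obs_calib  <- rbinom(200, 1, 0.5)                    # 0/1 observations
pred_calib <- obs_calib * 600 + runif(200, 0, 400)   # predictions, 0-1000 scale
obs_valid  <- rbinom(100, 1, 0.5)
pred_valid <- obs_valid * 600 + runif(100, 0, 400)

# TSS = sensitivity + specificity - 1 for a given binarisation cutoff
tss_at <- function(cutoff, pred, obs) {
  bin  <- as.integer(pred >= cutoff)
  sens <- sum(bin == 1 & obs == 1) / sum(obs == 1)
  spec <- sum(bin == 0 & obs == 0) / sum(obs == 0)
  sens + spec - 1
}
cutoffs <- 0:1000

# Corrected behaviour: pick the cutoff maximising TSS on the CALIBRATION
# data, then re-use that same cutoff on the validation data
best <- cutoffs[which.max(sapply(cutoffs, tss_at, pred = pred_calib, obs = obs_calib))]
tss_valid <- tss_at(best, pred_valid, obs_valid)

# The 4.2-2 bug amounted to re-optimising the cutoff on the validation
# data itself, which inflates the validation TSS
tss_valid_optimistic <- max(sapply(cutoffs, tss_at, pred = pred_valid, obs = obs_valid))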

The good news is that our work on the next biomod2 version had already fixed (1) as a side effect of some code cleaning, and (2) was easy to fix.

The next version fixing these bugs should be out in the upcoming weeks.

Thanks again for helping us find these errors; it is really appreciated to have such support 🙏

Best,
Rémi

rpatin added the "bug" label Mar 22, 2023
suicmezburak (Author) commented

Hi @rpatin,
I would like to thank the biomod2 team for their time. I wish you continued success in your work.

Best wishes,
Burak

rpatin closed this as completed Apr 18, 2023