
training set, validation set, and test set #475

Closed
chenyongpeng1 opened this issue Jun 11, 2024 · 3 comments
chenyongpeng1 commented Jun 11, 2024

Hello, biomod2 team!
Species records are randomly divided into a training set (75% of the data) for model calibration and a test set (25% of the data) for validation.

--This is somewhat misleading. Typically there should be three parts: a training set, a validation set, and a test set. The validation set is used to fine-tune the model's hyperparameters, while the test set is used to assess the model's accuracy on unseen data.
Does biomod2 include a validation set? If not, could you add one and use techniques such as cross-validation to mitigate overfitting?

@MayaGueguen
Contributor

Hello Chenyong 👋

Actually, we do have calibration, validation and evaluation datasets 🙂

  • calibration and validation both come from your original dataset
  • evaluation is a completely separate dataset that, if provided, is only used to compute evaluation metrics for the models built on the original dataset

Datasets are given to the BIOMOD_FormatingData function:

  • the original dataset is given through the resp.var, expl.var and resp.xy parameters
  • the evaluation dataset is provided through the eval.resp.var, eval.expl.var and eval.resp.xy parameters
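For example, a call passing both datasets could look like the sketch below (object names such as my_presences or eval_env are placeholders, and resp.name is the species label used for the outputs; please check the BIOMOD_FormatingData help page for the full argument list):

```r
# Hedged sketch: providing both the original and an independent evaluation
# dataset to BIOMOD_FormatingData. All data objects are placeholders.
library(biomod2)

bm_data <- BIOMOD_FormatingData(
  resp.name = "MySpecies",
  resp.var  = my_presences,       # original response (e.g. presence/absence)
  expl.var  = my_env,             # original explanatory variables
  resp.xy   = my_xy,              # coordinates of the original records
  eval.resp.var = eval_presences, # independent evaluation response
  eval.expl.var = eval_env,       # evaluation explanatory variables
  eval.resp.xy  = eval_xy         # coordinates of the evaluation records
)
```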

The split of the original dataset into calibration and validation is done within the BIOMOD_Modeling function through the CV.[...] parameters. (This is done by calling the bm_CrossValidation function, which you can also use on your own if necessary.)

ℹ️ Note that there are several ways to build your calibration and validation datasets. You can find more details within the Cross-validation vignette.
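As an illustration, a random cross-validation setup could be requested like this (the values are only examples, and the model list is arbitrary; see the BIOMOD_Modeling help page for all CV.[...] parameters):

```r
# Hedged sketch: splitting the original dataset into calibration and
# validation via the CV.[...] parameters of BIOMOD_Modeling.
bm_models <- BIOMOD_Modeling(
  bm.format   = bm_data,      # object returned by BIOMOD_FormatingData
  models      = c("GLM", "RF"),
  CV.strategy = "random",     # cross-validation strategy
  CV.nb.rep   = 3,            # number of cross-validation repetitions
  CV.perc     = 0.75,         # 75% calibration / 25% validation
  CV.do.full.models = FALSE   # do not also fit models on all the data
)
```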

When calling the get_evaluations function to retrieve your evaluation values, you will see 3 columns in your output table: calibration, validation and evaluation. They are only filled if the corresponding dataset was provided: evaluation will be empty if you did not provide an evaluation dataset to BIOMOD_FormatingData, and the validation column will be empty if, for example, you ask to build a model with all the data (CV.do.full.models = TRUE in BIOMOD_Modeling).
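Retrieving and inspecting those three columns could look like this (column selection shown here assumes the column names mentioned above):

```r
# Hedged sketch: the calibration, validation and evaluation columns of the
# get_evaluations output are filled only when the matching dataset exists.
evals <- get_evaluations(bm_models)
head(evals[, c("metric.eval", "calibration", "validation", "evaluation")])
```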

➡️ Note also that we have plenty of tutorials and documentation on our website, and you can get an overview of the package functions in this presentation. So do not hesitate to have a look 👀

Hope it helps,
Maya

@chenyongpeng1
Author

chenyongpeng1 commented Jun 12, 2024

Hello, Maya
Thank you for your patience and attention; I have learned a lot.
But I still have some questions:

  1. Can the original dataset be automatically divided into three parts through the CV function, for example 70:10:20 into calibration, validation, and test (evaluation)? And are the eval.resp.var, eval.expl.var, eval.resp.xy parameters the same as before?
  2. I'm a little confused: biomod2 allows you to use different strategies to separate your data into a calibration dataset and a validation dataset for the cross-validation. But in this picture, which one is the validation set? In other words, is the validation dataset in biomod2 equivalent to the validation fold from the training set, or to the test set in this picture?
    [image: diagram showing a Training Data set split into Training folds and a Validation fold, plus a separate Test Data set]

Looking forward to your reply! Thanks again!

Best wishes,
Chenyongpeng

@MayaGueguen
Contributor

Hello Chenyongpeng,

Can the original dataset be automatically divided into three parts through the CV function, for example 70:10:20 into calibration, validation, and test (evaluation)? And are the eval.resp.var, eval.expl.var, eval.resp.xy parameters the same as before?

No. As the evaluation dataset is supposed to be independent from the data used to build the models, the original data can only be divided into calibration and validation through biomod2.
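If a 70:10:20 style split is really wanted, one workaround (not a built-in biomod2 feature) is to hold out the evaluation part manually before formatting, and let biomod2 split only the remainder; object names below are placeholders:

```r
# Hedged sketch: hold out 20% of the records as an independent evaluation
# set BEFORE calling BIOMOD_FormatingData, then let biomod2 split the
# remaining 80% into calibration and validation.
set.seed(42)
n <- nrow(my_records)
eval_idx <- sample(n, size = round(0.2 * n))  # 20% held out as evaluation

eval_records <- my_records[eval_idx, ]   # pass via eval.resp.var / eval.expl.var / eval.resp.xy
orig_records <- my_records[-eval_idx, ]  # pass via resp.var / expl.var / resp.xy

# In BIOMOD_Modeling, CV.perc then splits the remaining 80%:
# CV.perc = 0.875 gives 0.875 * 0.8 = 70% calibration and 10% validation
# of the full dataset, i.e. roughly a 70:10:20 split overall.
```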

I'm a little confused: biomod2 allows you to use different strategies to separate your data into a calibration dataset and a validation dataset for the cross-validation. But in this picture, which one is the validation set? In other words, is the validation dataset in biomod2 equivalent to the validation fold from the training set, or to the test set in this picture?

In your picture:

  • Training Data set = what I called original data in previous answer
    • Training fold = calibration dataset in biomod2
    • Validation fold = validation dataset in biomod2
  • Test Data set = evaluation dataset in biomod2

Hope it helps,
Maya
