Support train/test/validation splitting #11

kkovary · 2024-04-01T22:31:19Z

As far as I can tell, the splitters will only do train/test splits. It would be really useful to allow for a third validation split.

cwognum · 2024-04-01T22:35:52Z

Hi @kkovary,

Thank you for your question!

Similar to how Scikit-learn does this, you can achieve this by simply splitting the train set again. For example:

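import datamol as dm
from splito import ScaffoldSplit

# Load some data
data = dm.data.chembl_drugs()
all_smiles = data["smiles"].tolist()

# Generate the trainval-test split
splitter = ScaffoldSplit(smiles=all_smiles, test_size=0.2)
trainval_idx, test_idx = next(splitter.split(X=all_smiles))

# Generate the train-val split.
# all_smiles is a plain list, so select the trainval subset element by element
# (a list cannot be indexed with an array of indices).
trainval_smiles = [all_smiles[i] for i in trainval_idx]
splitter = ScaffoldSplit(smiles=trainval_smiles, test_size=0.2)
train_idx, val_idx = next(splitter.split(X=trainval_smiles))

# train_idx and val_idx are positions within trainval_smiles, not all_smiles.
# Map them back through trainval_idx to index the original data.
train_idx = [trainval_idx[i] for i in train_idx]
val_idx = [trainval_idx[i] for i in val_idx]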

I do agree that with our current setup, this is a bit verbose. Having something like #8 would probably help to make this easier!
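Until something like that exists, the two-step split can be wrapped in a small helper. The following is only a sketch, not part of splito: `train_val_test_split`, `make_splitter`, and the `RandomSplit` stand-in are hypothetical names introduced here for illustration. The helper assumes the splitter exposes a scikit-learn style `split(X)` that yields `(train, test)` index pairs, and it returns all three index sets relative to the original data.

```python
import random
from typing import Callable, List, Sequence, Tuple


class RandomSplit:
    """Hypothetical stand-in splitter with a scikit-learn style split() method."""

    def __init__(self, test_size: float, seed: int = 0):
        self.test_size = test_size
        self.seed = seed

    def split(self, X: Sequence):
        # Shuffle positions deterministically and carve off the test fraction.
        idx = list(range(len(X)))
        random.Random(self.seed).shuffle(idx)
        n_test = max(1, round(len(idx) * self.test_size))
        yield idx[n_test:], idx[:n_test]


def train_val_test_split(
    items: Sequence,
    make_splitter: Callable[[Sequence, float], object],
    val_size: float = 0.2,
    test_size: float = 0.2,
) -> Tuple[List[int], List[int], List[int]]:
    """Split twice and return indices that all refer to `items`."""
    # First split: (train + val) versus test.
    trainval_idx, test_idx = next(make_splitter(items, test_size).split(items))

    # Second split: train versus val, within the trainval subset only.
    trainval_items = [items[i] for i in trainval_idx]
    inner_train, inner_val = next(
        make_splitter(trainval_items, val_size).split(trainval_items)
    )

    # The inner indices are positions in trainval_items; map them back.
    train_idx = [trainval_idx[i] for i in inner_train]
    val_idx = [trainval_idx[i] for i in inner_val]
    return train_idx, val_idx, list(test_idx)


items = [f"mol_{i}" for i in range(100)]
train_idx, val_idx, test_idx = train_val_test_split(
    items, lambda xs, size: RandomSplit(test_size=size)
)
```

With splito, the factory could presumably be `lambda xs, size: ScaffoldSplit(smiles=xs, test_size=size)`, matching the constructor call used above. Note that `val_size` is a fraction of the trainval subset, not of the full dataset, so 0.2 and 0.2 give roughly a 64/16/20 split.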

Let me know if that helps!

kkovary · 2024-04-01T23:12:40Z

Thanks so much!

cwognum closed this as completed Apr 2, 2024
