Support train/test/validation splitting #11

Closed
kkovary opened this issue Apr 1, 2024 · 2 comments

Comments

@kkovary

kkovary commented Apr 1, 2024

As far as I can tell, the splitters will only do train/test splits. It would be really useful to allow for a third validation split.

@cwognum
Contributor

cwognum commented Apr 1, 2024

Hi @kkovary,

Thank you for your question!

As with scikit-learn, you can get a validation set by splitting the train set a second time. For example:

import datamol as dm
from splito import ScaffoldSplit

# Load some data
data = dm.data.chembl_drugs()
all_smiles = data["smiles"].tolist()

# Generate the trainval-test split
splitter = ScaffoldSplit(smiles=all_smiles, test_size=0.2)
trainval_idx, test_idx = next(splitter.split(X=all_smiles))

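# Generate the train-val split
# all_smiles is a plain list, so select the subset with a comprehension
trainval_smiles = [all_smiles[i] for i in trainval_idx]
splitter = ScaffoldSplit(smiles=trainval_smiles, test_size=0.2)
# Note: train_idx and val_idx index into trainval_smiles, not all_smiles
train_idx, val_idx = next(splitter.split(X=trainval_smiles))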

I do agree that with our current setup, this is a bit verbose. Having something like #8 would probably help to make this easier!
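Two details of the two-step approach are easy to miss: the indices from the second split are relative to the train+val subset, and applying `test_size=0.2` twice gives roughly a 64/16/20 split rather than 60/20/20. The sketch below illustrates both points using only the standard library. Note that `toy_split` and `three_way_split` are hypothetical helpers written for this illustration; `toy_split` is a random stand-in for a splitter and is not part of splito's API.

```python
import random


def toy_split(n_items, test_size, seed=0):
    """Hypothetical stand-in for a splitter.

    Returns (train_idx, test_idx) as positions into a sequence of n_items.
    """
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    n_test = round(n_items * test_size)
    return sorted(idx[n_test:]), sorted(idx[:n_test])


def three_way_split(n_items, test_size=0.2, val_size=0.2, seed=0):
    """Split twice, then map the second split back to original positions."""
    # First split: train+val versus test, indexed against the full data
    trainval_idx, test_idx = toy_split(n_items, test_size, seed)

    # Second split: indices here are relative to the train+val subset
    rel_train, rel_val = toy_split(len(trainval_idx), val_size, seed + 1)

    # Map the relative indices back to positions in the original data
    train_idx = [trainval_idx[i] for i in rel_train]
    val_idx = [trainval_idx[i] for i in rel_val]
    return train_idx, val_idx, test_idx


train_idx, val_idx, test_idx = three_way_split(100)
print(len(train_idx), len(val_idx), len(test_idx))  # 64 16 20
```

With a real splitter, the same mapping step (`trainval_idx[i]`) would apply to the indices returned by the second call, assuming the splitter returns positional indices as in the example above.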

Let me know if that helps!

@kkovary
Author

kkovary commented Apr 1, 2024

Thanks so much!

@cwognum cwognum closed this as completed Apr 2, 2024