Data handling needs to be restructured #10
Comments
Hi @sfluegel05, I have a question about this issue. Do we have to implement the restructuring above only for the ChEBI dataset, or for all other datasets too?
This is only for the ChEBI datasets. The other datasets have their own structure. That should be adjusted as well at some point, but that would be a different issue.
A special case for the data splits is the
Use case
You want to compare two models trained on different versions of ChEBI. To make a fair comparison, you need to evaluate both models on the same test set (and train them on training sets that don't overlap with this test set).
Tasks
Most of the functionality for that is already implemented; it just needs to be adapted to the dynamic data splits. In the end, no new files should be created for specific splits.
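As a minimal sketch of the use case described above, a split can be derived at runtime from a fixed set of test identifiers, so that nothing from that test set ends up in training or validation. The function name `split_with_fixed_test`, its parameters, and the dictionary keys are hypothetical and not taken from the project's code; the identifiers stand in for ChEBI IDs.

```python
import random


def split_with_fixed_test(all_ids, fixed_test_ids, val_fraction=0.1, seed=0):
    """Hypothetical helper: build train/validation/test splits in memory.

    all_ids        -- identifiers available in the ChEBI version being trained on
    fixed_test_ids -- identifiers of the shared test set (e.g. from another version)
    """
    fixed_test = set(fixed_test_ids)

    # Test set: the shared test identifiers that exist in this version.
    test = sorted(i for i in all_ids if i in fixed_test)

    # Everything else may be used for training and validation.
    remaining = sorted(i for i in all_ids if i not in fixed_test)

    # A seeded shuffle keeps the split reproducible without writing it to a file.
    rng = random.Random(seed)
    rng.shuffle(remaining)

    n_val = int(len(remaining) * val_fraction)
    return {
        "train": remaining[n_val:],
        "validation": remaining[:n_val],
        "test": test,
    }


splits = split_with_fixed_test(range(100), fixed_test_ids=[1, 2, 3, 200])
```

Because the split is computed from the seed and the fixed test identifiers, it can be regenerated on demand instead of being stored as a separate file per split.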
Status quo
chebi.obo, classes.txt, train/test/val splits (unprocessed SMILES, labels)
Goal
chebi.obo (raw)
data/ChEBIX/chebi_version/raw
/data/ChEBIX/chebi_version/processed/encoding
data/chebi_version/raw
data/chebi_version/ChEBIX/processed
data/chebi_version/ChEBIX/processed/encoding
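The last three paths can be sketched as a small path helper, assuming they describe the target layout: raw files are stored once per ChEBI version, and processed data lives below a subset folder inside that version. The function `chebi_dirs` and its parameter names are hypothetical; `version_dir`, `subset`, and `encoding` simply stand in for the `chebi_version`, `ChEBIX`, and `encoding` path components.

```python
from pathlib import Path


def chebi_dirs(base, version_dir, subset, encoding="encoding"):
    """Hypothetical helper: directories for one ChEBI version and subset."""
    version_root = Path(base) / version_dir
    processed = version_root / subset / "processed"
    return {
        # Shared by all subsets of the same ChEBI version.
        "raw": version_root / "raw",
        # Specific to one subset of that version.
        "processed": processed,
        # Specific to one encoding of the processed data.
        "encoding": processed / encoding,
    }


dirs_a = chebi_dirs("data", "chebi_version", "ChEBIX")
dirs_b = chebi_dirs("data", "chebi_version", "ChEBIY")
```

Under this assumption, two subsets of the same version resolve to the same `raw` directory, so `chebi.obo` would only need to be stored once per version.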
Things to keep in mind (for later implementations)