@aditya0by0 aditya0by0 commented Jul 3, 2024

  • When initialising a dataset, the user can optionally provide the path to a CSV file that lists ChEBI IDs and assigns each one to a split (train, validation or test). The provided split is then used instead of creating a new one.
  • When a dataset is initialised without such a file, the splits are created automatically (as before) and the resulting split is saved as a CSV file.
  • When running the migration script, the ChEBI data files are copied into the new structure. The existing split files are combined into a single file, and a CSV file with the split assignment is created in addition.
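The load-or-create behaviour described in the first two points could look roughly like the sketch below. It is not the code from this PR: the function name `load_or_create_splits`, the column names `id` and `split`, the default file name and the plain random split are all assumptions made for illustration (the project's own splitting logic is stratified and uses dataframes).

```python
import csv
import random


def load_or_create_splits(ids, splits_csv=None, save_to="splits.csv",
                          fractions=(0.8, 0.1, 0.1), seed=0):
    """Return {"train": [...], "validation": [...], "test": [...]}.

    Hypothetical helper: names, columns and defaults are illustrative only.
    """
    names = ("train", "validation", "test")
    splits = {name: [] for name in names}

    if splits_csv is not None:
        # A split file was provided: reuse its assignment instead of re-splitting.
        # An unknown split name in the file raises a KeyError.
        with open(splits_csv, newline="") as fh:
            for row in csv.DictReader(fh):
                splits[row["split"]].append(row["id"])
        return splits

    # No file given: create a new (here: plain random) split ...
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_train = int(fractions[0] * len(shuffled))
    n_val = int(fractions[1] * len(shuffled))
    splits["train"] = shuffled[:n_train]
    splits["validation"] = shuffled[n_train:n_train + n_val]
    splits["test"] = shuffled[n_train + n_val:]

    # ... and save the assignment so that later runs can reuse it.
    with open(save_to, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["id", "split"])
        for name in names:
            writer.writerows((identifier, name) for identifier in splits[name])
    return splits
```

Calling the function once without `splits_csv` writes the assignment file; passing that file back in on a later call reproduces the same train, validation and test sets.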

aditya0by0 and others added 27 commits May 27, 2024 22:51
Updates
- Evaluation notebook
- classification.py
- utils.py
- pre-commit + some suggestions
- removed list comprehension from data split logic
- used dataframe operations instead as they are faster
- remove looping for msss.split as no need for it, used `next` instead
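The last note can be illustrated as follows, assuming `msss` is a splitter whose `split` method is a generator in the scikit-learn style and that it is configured to produce a single split. `ToyShuffleSplit` below is a hypothetical stand-in written only for this sketch, not the splitter used in the PR.

```python
import random


class ToyShuffleSplit:
    """Hypothetical stand-in with a scikit-learn-style ``split`` generator."""

    def __init__(self, n_splits=1, test_size=0.2, seed=0):
        self.n_splits = n_splits
        self.test_size = test_size
        self.seed = seed

    def split(self, X, y=None):
        # Yield (train_indices, test_indices) once per requested split.
        rng = random.Random(self.seed)
        n_test = int(len(X) * self.test_size)
        for _ in range(self.n_splits):
            indices = list(range(len(X)))
            rng.shuffle(indices)
            yield indices[n_test:], indices[:n_test]


X = [f"CHEBI:{i}" for i in range(10)]
splitter = ToyShuffleSplit(n_splits=1, test_size=0.2)

# Loop version: the body runs exactly once when n_splits == 1.
for loop_train_idx, loop_test_idx in splitter.split(X):
    pass

# Equivalent without the loop: take the first (and only) split directly.
train_idx, test_idx = next(splitter.split(X))
```

With a single split, `next` states the intent more directly than a loop whose body only ever runs once.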
@aditya0by0 aditya0by0 self-assigned this Jul 3, 2024
@aditya0by0 aditya0by0 requested a review from sfluegel05 July 3, 2024 18:08
@aditya0by0 aditya0by0 linked an issue Jul 3, 2024 that may be closed by this pull request
@sfluegel05 sfluegel05 marked this pull request as ready for review July 5, 2024 11:48
@sfluegel05 (Collaborator) commented

As far as I am aware, this branch originates from the branch used in PR #29. Therefore, I will merge this directly into dev.
I also made some minor changes: the CLI for the migration script now uses jsonargparse, so one can pass a config file directly instead of having to supply the individual parameters for each class as separate arguments (this also covers other ChEBI configurations such as ChEBIOverXPartial). Files that are not found are simply skipped. I also added some hints for users.

@sfluegel05 sfluegel05 merged commit 7dc4e63 into dev Jul 5, 2024
@sfluegel05 sfluegel05 deleted the data_migration branch July 5, 2024 11:49
Development

Successfully merging this pull request may close these issues.

Data migration / setting fixed splits