Caching logic improvement #432
Explore finetuning
@DomInvivo I tested the caching branch and at closer inspection found some bugs that were not evident from the changelogs: The unit tests currently fail because we need to add After this is fixed, there is an issue because the graphium/graphium/data/dataset.py Lines 136 to 151 in 7fa23b6
However, in the graphium/graphium/data/datamodule.py Lines 1286 to 1297 in 7fa23b6
It seems With this change, it runs for
@DomInvivo here are the main updates:
graphium/data/datamodule.py
# else:
# processed_train_data_path = None
# processed_val_data_path = None
Forgot to remove commented lines. Will do so shortly.
Codecov Report
@@            Coverage Diff             @@
##             main     #432      +/-   ##
==========================================
+ Coverage   64.74%   65.56%   +0.81%
==========================================
  Files          89       90       +1
  Lines        8211     8226      +15
==========================================
+ Hits         5316     5393      +77
+ Misses       2895     2833      -62
Minor comments. Good work :)
@DomInvivo some finishing touches for the PR:
@DomInvivo should be ready to merge
@WenkelF @DomInvivo I know I'm a little late to the party (apologies!), but I just saw you added a new CLI in this PR. You added a new command in the For your information, using the For the particular command you introduced in this PR and given the changes in #441, we could for example do:

    from .data import data_app
    from hydra import compose, initialize

    # Register the command on the imported `data_app` sub-application
    @data_app.command(name="prepare", help="Prepare the data in advance")
    def cli(config_path: str, config_name: str) -> None:
        # Compose the Hydra config from the given directory and config name
        with initialize(version_base=None, config_path=config_path):
            cfg = compose(config_name=config_name, overrides=[])
        run_prepare_data(cfg)

This command is then available as:

    graphium data prepare
Changelogs
This is a draft PR that changes the logic of how caching is done. The motivation is described in #431. I'll wait for comments and suggestions before pursuing this.
cache_data_path option. We don't want to load
dataloading_from option to select whether to load from Disk or RAM
Checklist:
feature, fix or test (or ask a maintainer to do it for you).
discussion related to that PR
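To make the Disk versus RAM choice from the changelog concrete, here is a minimal sketch of how a `dataloading_from`-style switch could behave. It is not the Graphium implementation: the class `CachedDataset`, its `cache_dir` argument, the `"ram"`/`"disk"` values, and the one-pickle-file-per-item layout are all assumptions made for illustration.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, List, Optional


class CachedDataset:
    """Hypothetical dataset serving items from RAM or from per-item files on disk."""

    def __init__(
        self,
        items: List[Any],
        dataloading_from: str = "ram",  # assumed values: "ram" or "disk"
        cache_dir: Optional[str] = None,  # hypothetical; only used for "disk"
    ) -> None:
        if dataloading_from not in ("ram", "disk"):
            raise ValueError(f"Unknown dataloading_from: {dataloading_from!r}")
        self.dataloading_from = dataloading_from
        self._len = len(items)
        self._items: Optional[List[Any]] = None
        self._dir: Optional[Path] = None

        if dataloading_from == "ram":
            # Keep every item in memory for the lifetime of the dataset.
            self._items = list(items)
        else:
            if cache_dir is None:
                raise ValueError("cache_dir is required when loading from disk")
            # Write each item to its own file so it can be read back lazily.
            self._dir = Path(cache_dir)
            self._dir.mkdir(parents=True, exist_ok=True)
            for idx, item in enumerate(items):
                with open(self._path(idx), "wb") as f:
                    pickle.dump(item, f)

    def _path(self, idx: int) -> Path:
        return self._dir / f"{idx}.pkl"

    def __len__(self) -> int:
        return self._len

    def __getitem__(self, idx: int) -> Any:
        if not 0 <= idx < self._len:
            raise IndexError(idx)
        if self.dataloading_from == "ram":
            return self._items[idx]
        # Disk mode: only the requested item is read into memory.
        with open(self._path(idx), "rb") as f:
            return pickle.load(f)


if __name__ == "__main__":
    data = [{"smiles": "CCO", "label": 0}, {"smiles": "CCN", "label": 1}]
    with tempfile.TemporaryDirectory() as tmp:
        on_disk = CachedDataset(data, dataloading_from="disk", cache_dir=tmp)
        in_ram = CachedDataset(data, dataloading_from="ram")
        print(on_disk[1] == in_ram[1])
```

The trade-off in this sketch is the usual one: RAM mode gives fast item access at the cost of holding the whole dataset in memory, while disk mode keeps memory use low and pays a file read per item.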