
Refactor & standardize evaluation with Evaluator #287

Merged · 107 commits into main from evaluator · Jun 6, 2024
Conversation

@vict0rsch (Collaborator) commented Feb 15, 2024

  • remove test logic from the main GFlowNetAgent
  • rename the test logic to eval() so it is semantically distinct from (unit, integration, etc.) tests
  • clean up evaluation return values (a dict instead of a messy tuple)
  • fix the logger vs. evaluator roles
  • isolate the plotting logic

Check out tutorial and docs @ https://gflownet.readthedocs.io/en/evaluator

Questions / Need help

  1. clean up BaseEvaluator.compute_density_metrics
  2. define / discuss the behaviour of .eval() in Scenario 1
  3. prevent the buffer from systematically writing files (very annoying when evaluating a trained gfn)
  4. update all existing experiment configs?
  5. use utils in test__gflownet_minimal_runs? (for instance gflownet_for_tests in conftest.py, or at least common.py:gflownet_from_config())

Scenario 1

$ python main.py user=$USER +experiments=scrabble/jay.yaml logger.do.online=False evaluator.metrics=\'l1,kl,jsd\' evaluator.checkpoints_period=5
from gflownet.evaluator.base import BaseEvaluator

# then, in Python: point to the output directory of the run launched above
gfn_run_dir = "path to previous dir"

gfne = BaseEvaluator.from_dir(gfn_run_dir)

results = gfne.eval()                # compute the configured metrics
figs = gfne.plot(**results["data"])  # build figures from the returned eval data

Scenario 2

$ python main.py user=$USER +experiments=icml23/ctorus device=cpu logger.do.online=False evaluator.checkpoints_period=20 
from gflownet.evaluator.base import BaseEvaluator
gfn_run_dir = "path to previous dir"

gfne = BaseEvaluator.from_dir(gfn_run_dir)

results = gfne.eval()
figs = gfne.plot(**results["data"])

for f, fig in figs.items():
    fig.savefig(f"{f}.pdf")

@vict0rsch (Collaborator, Author) commented:

@alexhernandezgarcia what was the reason for storing metrics as attributes of the GFlowNetAgent? Can I safely remove this procedure from the evaluation / logging logic? Or would there be side effects somewhere, in your opinion? It looks ok from the code, but I'm checking with you.

@alexhernandezgarcia (Owner) commented:

> @alexhernandezgarcia what was the reason for storing metrics as attributes of the GFlowNetAgent? Can I safely remove this procedure from the evaluation / logging logic? Or would there be side effects somewhere, in your opinion? It looks ok from the code, but I'm checking with you.

No important reason. I think Nikita started it that way at some point and we just never followed up on it. Like I said, we should not feel restricted by the way things are currently done.

@vict0rsch (Collaborator, Author) commented Feb 21, 2024

Controversial change @alexhernandezgarcia @michalkoziarski @carriepl @josephdviviano @AlexandraVolokhova thoughts?

5581e71 (#287)

If you agree this should be used in the eval_gflownet.py script and in tests.env.common::BaseTestsCommon.test__gflownet_minimal_runs

@alexhernandezgarcia (Owner) commented:

> Controversial change @alexhernandezgarcia @michalkoziarski @carriepl @josephdviviano @AlexandraVolokhova thoughts?
>
> 5581e71 (#287)
>
> If you agree this should be used in the eval_gflownet.py script and in tests.env.common::BaseTestsCommon.test__gflownet_minimal_runs

I am not against using a method to hide all the complexity in the tests or in the evaluation script. However, I think it is good that main.py shows explicitly the components that are needed and what depends on what.

Also, as a separate comment: I would try, if possible, not to include too many changes in a single PR, especially if they are out of the scope of the PR. In other words, I would try to spin off changes like this one into a separate PR. It's just to make the review process a tad easier.

@vict0rsch (Collaborator, Author) commented:

>> Controversial change @alexhernandezgarcia @michalkoziarski @carriepl @josephdviviano @AlexandraVolokhova thoughts?
>> 5581e71 (#287)
>> If you agree this should be used in the eval_gflownet.py script and in tests.env.common::BaseTestsCommon.test__gflownet_minimal_runs
>
> I am not against using a method to hide all the complexity in the tests or in the evaluation script. However, I think it is good that main.py shows explicitly the components that are needed and what depends on what.
>
> Also, as a separate comment: I would try, if possible, not to include too many changes in a single PR, especially if they are out of the scope of the PR. In other words, I would try to spin off changes like this one into a separate PR. It's just to make the review process a tad easier.

Sure, I'll move that to another PR to merge after this one, then.

@vict0rsch (Collaborator, Author) commented:

@alexhernandezgarcia I'm following these steps: https://stackoverflow.com/a/30893291/3867406 -> have you pulled or fetched from THIS branch (evaluator)?

If I edit the commit history on MY machine and then FORCE-push to GitHub while you have a local version of evaluator that has diverged, we may end up blowing everything up. In that case, I suggest we just keep the offending commit (5581e71 (#287)).

@alexhernandezgarcia (Owner) commented:

> @alexhernandezgarcia I'm following these steps: https://stackoverflow.com/a/30893291/3867406 -> have you pulled or fetched from THIS branch (evaluator)?
>
> If I edit the commit history on MY machine and then FORCE-push to GitHub while you have a local version of evaluator that has diverged, we may end up blowing everything up. In that case, I suggest we just keep the offending commit (5581e71 (#287)).

You can go ahead without impact on my work / local copies.

@josephdviviano (Collaborator) commented:

> Controversial change @alexhernandezgarcia @michalkoziarski @carriepl @josephdviviano @AlexandraVolokhova thoughts?
>
> 5581e71 (#287)
>
> If you agree this should be used in the eval_gflownet.py script and in tests.env.common::BaseTestsCommon.test__gflownet_minimal_runs

I can't see what this controversial change is

@vict0rsch (Collaborator, Author) commented:

>> Controversial change @alexhernandezgarcia @michalkoziarski @carriepl @josephdviviano @AlexandraVolokhova thoughts?
>> 5581e71 (#287)
>> If you agree this should be used in the eval_gflownet.py script and in tests.env.common::BaseTestsCommon.test__gflownet_minimal_runs
>
> I can't see what this controversial change is

Great, that means I reverted the commit appropriately :p

I think we should not have multiple places in the code that instantiate a GFlowNetAgent from a Hydra config. That's why I created gflownet.utils.common::gflownet_from_config(). I was suggesting to use it in main.py too, but Alex suggested we not do that in this PR, to avoid mixing things up (and he seems against it for main.py anyway, though I think I can convince him 😄).
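For illustration, a rough sketch of how such a single entry point could be used, assuming gflownet_from_config() takes a loaded Hydra/OmegaConf config (the exact signature lives in gflownet/utils/common.py, and the config path below is hypothetical):

from omegaconf import OmegaConf
from gflownet.utils.common import gflownet_from_config

# Hypothetical: load the Hydra config saved alongside a previous run
config = OmegaConf.load("path to previous dir/.hydra/config.yaml")

# Build the GFlowNetAgent and its dependencies from that single config
gflownet = gflownet_from_config(config)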

@josephdviviano (Collaborator) commented:

Without full context, I agree with @vict0rsch - this sounds like something that should be one and done.

for name, metric in results["metrics"].items():
    print(f"{name:20}: {metric:.4f}")

data = results.get("data", {})


Note to self: understand why the argument {} is needed.
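A minimal illustration of why the {} default matters, assuming the results dict may not always contain a "data" key:

results = {"metrics": {"l1": 0.01}}  # hypothetical: eval() returned no "data" entry
data = results.get("data", {})       # {} instead of a KeyError from results["data"]
print(data)                          # -> {}
# A later call like gfne.plot(**data) then receives no keyword arguments
# instead of crashing before it is even reached.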

carriepl and others added 3 commits on May 30, 2024, co-authored by Alex <alexhg15@gmail.com>.
@@ -51,21 +52,21 @@ policy:
shared_weights: False
checkpoint: backward

# Evaluator
@carriepl (Collaborator) commented May 29, 2024:


The format for these evaluator arguments is different than in the icml23/ctorus.yaml config file. Is that a problem?


I am unsure what this comment refers to exactly, but I would just say that the icml23/ctorus.yaml file is really old (January 2023) so it would be fine to deprecate it / adapt it if needed. Yes, it contains the experiments of a paper, but I believe it's ok to adapt it to the new state of the repo.



# def setup(sphinx):
# sphinx.connect("autoapi-skip-member", skip_util_classes)

Not sure what this part is meant to do. Is that something outdated that should be removed from the PR? Or is this a work in progress that should be finished and then uncommented?

@carriepl (Collaborator) commented Jun 4, 2024

Alright, at this point:

  • I think that the conflicts should be sorted out
  • the tests, black and isort are happy
  • I've done the first 3 out of 5 sanity check runs and they look great
  • I have not done the changes to the logger outlined by @vict0rsch. That will be the next step.

@alexhernandezgarcia (Owner) commented:

Thanks Pierre Luc!

> • I have not done the changes to the logger outlined by @vict0rsch. That will be the next step.

Could this be done in a new PR or should it be done before merging?

@carriepl (Collaborator) commented Jun 5, 2024

Interesting... the CI is currently failing because of a test that was passing before my last commit, which only changes a comment. I guess this is a test that fails very infrequently. At this point, I don't think it's related to this PR, but I could be wrong.

@alexhernandezgarcia (Owner) commented:

> Interesting... the CI is currently failing because of a test that was passing before my last commit, which only changes a comment. I guess this is a test that fails very infrequently. At this point, I don't think it's related to this PR, but I could be wrong.

Don't worry. I am pretty sure this is related to tests of the Batch class that, in this branch, still use torch.equal but would be fine with torch.isclose. All this has been changed in the famous big PR.
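Not the actual Batch tests, but a minimal sketch of the difference between the two comparisons:

import torch

a = torch.tensor([0.1, 0.2, 0.3])
b = (a / 7.0) * 7.0  # a floating-point round trip may introduce tiny errors

print(torch.equal(a, b))          # may be False: requires bit-exact equality
print(torch.isclose(a, b).all())  # True: tolerates small numerical differences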

@alexhernandezgarcia (Owner) left a review:


I had to make a few additional changes after checking the sanity runs (Tetris topK figures were missing). The solution is a bit of a quick fix.

I have realised that a bunch of things will need more work, but this is a great step forward, since the Evaluator will give us the flexibility to extend the evaluation without having to make ugly additions to the former test() function of the GFN.

I have added a couple of quick issues about things that are needed as a reminder.

Great work everyone!!! I will merge.

@alexhernandezgarcia merged commit 2321aa4 into main on Jun 6, 2024 (1 check failed).
@alexhernandezgarcia deleted the evaluator branch on June 6, 2024 at 01:15.