About the evaluation time spent #15
Evaluation for these models is slow. It currently takes us ~16 hrs to finish the unseen validation / test sets using 7 GPUs on an AWS EC2 instance. There isn't any other good way to evaluate a model. You could imagine some sort of comparison of ground truth and predicted action sequences, but this is likely to be unreliable for a few reasons.
Overall, we are aware that evaluation on this dataset is resource intensive. However, I am closing this issue, as I do not think we are in a position to commit to improving this aspect of it at the moment.
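For anyone hitting the same wall, a minimal sketch of sharding the evaluation across GPUs, one worker process per device. This is an assumption about how one might structure it, not the repo's actual scripts; the directory path and the per-instance evaluation body are placeholders.

```python
import os
from multiprocessing import Process

NUM_GPUS = 7

def run_shard(gpu_id, shard):
    # Pin this process to a single GPU before any CUDA initialization,
    # so workers do not contend for the same device.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    for instance_path in shard:
        # Placeholder: load the model once per worker, replay the EDH
        # instance in the simulator, and log success metrics.
        pass

if __name__ == "__main__":
    # Hypothetical layout: one file per EDH instance in this directory.
    instances = sorted(os.listdir("edh_instances/valid_unseen"))
    # Round-robin split so each GPU gets a similar mix of instance lengths.
    shards = [instances[i::NUM_GPUS] for i in range(NUM_GPUS)]
    procs = [Process(target=run_shard, args=(i, s))
             for i, s in enumerate(shards)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```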
I looked at the code and found two places where the evaluation could be sped up.
Then I have a question about the paper: each game is divided into multiple EDH instances. Will this cause data leakage during training? A longer EDH instance's action sequence contains the shorter EDH instances' action sequences from the same game, so if I train on the longer sequence first, the model will already know the answer for the shorter EDH action sequence.
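To make the overlap concrete, here is a hedged sketch of how one could check for it. The field names (`game_id`, `driver_action_history`) are illustrative, not necessarily the dataset's exact schema.

```python
from collections import defaultdict

def is_prefix(short, long):
    """True if `short` is a prefix of `long` (both lists of actions)."""
    return len(short) <= len(long) and long[:len(short)] == short

def count_overlapping_instances(instances):
    # Group EDH instances' action sequences by the game they came from.
    by_game = defaultdict(list)
    for inst in instances:
        by_game[inst["game_id"]].append(inst["driver_action_history"])
    overlaps = 0
    for seqs in by_game.values():
        seqs.sort(key=len)
        for i, short in enumerate(seqs):
            # Any longer instance in the same game that starts with this
            # shorter sequence reveals its "answer" during training.
            if any(is_prefix(short, long) for long in seqs[i + 1:]):
                overlaps += 1
    return overlaps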
Sorry, please ignore the second acceleration suggestion.
Could you please let me know how much time the evaluation took on your side? It took me about two days to evaluate with 4 processes, and I found that a large part of the time was spent on the state initialization of each EDH instance, as well as on trajectories that run until max_api_fails or max_traj_steps is reached. The time for the agent to take a single step is also very long and depends heavily on CPU frequency. Can you share the specs of your experimental hardware? And is there any other way to evaluate a trained model?
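For what it's worth, here is a small sketch of how I attributed wall-clock time to phases. `reset_to_edh_state`, `agent_step`, and `env_step` are placeholders for the real calls in the evaluation loop, not the repo's actual API.

```python
import time
from collections import Counter

phase_time = Counter()

def timed(phase, fn, *args, **kwargs):
    """Run fn, accumulating its wall-clock time under the given phase."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    phase_time[phase] += time.perf_counter() - start
    return result

# Inside the evaluation loop, wrap each phase:
#   env = timed("state_init", reset_to_edh_state, instance)
#   while not done:
#       action = timed("agent_step", agent_step, obs)
#       obs, done = timed("env_step", env_step, action)
#
# After the run, print where the time went:
#   for phase, secs in phase_time.most_common():
#       print(f"{phase}: {secs:.1f}s")
```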