Study and discuss design and implementation of online measurements and actions #44
X:
The essential differences are only of purpose, not of structure.
Having said that, it can be the case that whatever part of the algorithm is expected to act dynamically with a trial under evaluation could be thought of as a subclass of […]. So an alternative to what exists right now could be:
Example:
A certain algorithm, implementation of […]
I agree with all that. I agree there would be two different methods, one for final observations and one for runtime exchanges with the process. However, the documentation of the methods is not clear enough about that. It should be crystal clear that observe will take a result, which is what the process sends when it is completed, and might change the algorithm's internal state, while judge will take measurements, which are sent from the process during its lifetime, and might also change the algorithm's internal state. The doc of each method should also refer to the other to help understand the difference. I insist however on the fact that the method names […]
By the way, the discussion diverged from the initial point. The initial discussion was solely about what algorithms need to score trials. The current implementation passes hyper-parameters. I believe it should pass the entire history, including results and measurements.
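A minimal sketch of how the two methods could be split, assuming hypothetical names, signatures, and an early-stopping rule chosen purely for illustration (the project's actual API may differ). The algorithm also keeps a full per-trial history of results and measurements, matching the point above about what scoring should receive:

```python
class Algorithm:
    """Hypothetical sketch, not the project's real base class."""

    def __init__(self):
        # Full history per trial: final result AND runtime measurements,
        # as suggested above for scoring.
        self.history = {}

    def observe(self, trial_id, result):
        # `result` is what the process sends once it has COMPLETED.
        # Counterpart of `judge`, which handles measurements sent
        # during the process's lifetime. May change internal state.
        entry = self.history.setdefault(trial_id, {"measurements": []})
        entry["result"] = result

    def judge(self, trial_id, measurement):
        # `measurement` is sent by the process DURING its lifetime.
        # Counterpart of `observe`, which handles final results.
        # May change internal state and reply with an action, e.g. a
        # (made-up) request to stop a clearly bad trial early.
        entry = self.history.setdefault(trial_id, {"measurements": []})
        entry["measurements"].append(measurement)
        return {"stop": measurement > 10}
```

With this shape, the docstring of each method can point at the other, which is exactly the cross-referencing asked for above.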
Discussion closed. Implementation of dynamic algorithms is in progress.
Discussion from #36.
So what kind of interface do you propose for this? Consider that algorithms speak Python data structures and Numpy only. I am moving this to an issue, because we should study FreezeThaw and other possible client-server designs, and how we are going to save online measurements and replies (I suggest reusing `Trial` objects and the trials database).
Also, we should keep in mind that future exploitation by RL projects, like BabyAI Game, is possible. An environment (user script, acting as client) could then be used to train asynchronous and distributed agents (algorithm, acting as server). A static trial (usually a set of hyperparameters) means the training environment's variation factors and a game instance's initial state (this is `params`); `results` is possibly the episode's return. A dynamic trial means an observation tensor from the environment + a reward scalar + possibly a set of eligible actions (this is `results`), and an eligible action chosen as a response to sensing this information (this is `params`).
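The static/dynamic mapping above could be sketched as two message shapes. All field names and values here are illustrative assumptions (plain Python data structures only, per the constraint stated above), not an actual wire format:

```python
# Static trial: params are the environment's variation factors /
# a game instance's initial state; results are final, e.g. the
# episode's return.
static_trial = {
    "params": {"lr": 0.01, "seed": 7},
    "results": {"objective": 123.4},
}

# Dynamic trial: one client-server exchange during the trial's
# lifetime.  The environment (client) sends `results`; the
# algorithm (server) replies with `params`.
dynamic_exchange = {
    "results": {
        "observation": [0.1, 0.2, 0.3],   # observation tensor as a plain list
        "reward": 1.0,                    # reward scalar
        "eligible_actions": [0, 1, 2],    # possibly a set of eligible actions
    },
    "params": {"action": 1},              # an eligible action chosen in reply
}
```

One design consequence worth noting: if both shapes reuse the same `params`/`results` keys, the same `Trial` objects and trials database could store static and dynamic exchanges uniformly, as suggested above.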