
Study and discuss design and implementation of online measurements and actions #44

Closed
tsirif opened this issue Feb 27, 2018 · 3 comments
Labels: enhancement, feedback

Comments


tsirif commented Feb 27, 2018

Discussion from #36.

X:

Based on the trial_to_tuple code, only hyper-parameter values are passed to score_handle, not the results? Does score_handle need to keep results internally to match them with the hyper-parameters?

C:

I would restrict this to scoring a predicted performance scalar for trials which have not been evaluated yet. The algorithm will score based on what it has already seen through observe.

X:

score_handle could receive suspended or interrupted trials, which means there could be measurements available for the scoring functions. An example where this is necessary is FreezeThaw.

C:
So what kind of interface do you propose for this? Consider that algorithms speak Python data structures and NumPy only. I am moving this to an issue, because we should study FreezeThaw and other possible client-server schemes, and how we are going to save online measurements and replies (I suggest reusing Trial objects and the trials database).

Also, we should keep in mind that future use by RL projects, like the BabyAI Game, is possible, so that an environment (the user script, acting as client) could be used to train agents (the algorithm, acting as server) asynchronously and in a distributed fashion. A static trial (usually a set of hyperparameters) would mean the training environment's variation factors and a game instance's initial state (this is params); results would possibly be the episode's return. A dynamic trial would mean an observation tensor from the environment, a reward scalar, and possibly a set of eligible actions (this is results), plus an eligible action chosen in response to sensing this information (this is params). A hypothetical illustration of these two shapes follows.
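
To make that mapping concrete, here is a purely hypothetical illustration of the two trial shapes; every field name is invented for this example and is not part of any existing Trial object:

```python
# Static trial: one exchange per training run.
# params fix the environment's variation factors and the game instance's
# initial state; results hold the episode's return once the run completes.
static_trial = {
    "params": {"env_seed": 12345, "difficulty": 0.3},  # hypothetical names
    "results": {"episode_return": 17.0},
}

# Dynamic trial: one exchange per step of a running trial.
# results carry what the environment sensed (observation, reward, and
# possibly the eligible actions); params carry the action chosen in reply.
dynamic_trial = {
    "results": {
        "observation": [0.1, 0.4, -0.2],
        "reward": -1.0,
        "eligible_actions": [0, 1, 2],
    },
    "params": {"action": 2},
}
```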

tsirif added the enhancement label on Feb 27, 2018

tsirif commented Mar 6, 2018

X:

@tsirif Do you think observe and judge could be merged? Otherwise I think they should sit next to each other, with a clear distinction between them in the docs. Besides the fact that one works on results and the other on intermediate results, what is the difference?

The essential differences are only of purpose, not of structure,
so that (see the sketch after this list):

  1. observe takes points and results and returns None; suggest, by contrast, is called to return new samples.
  2. judge takes a point and measurements and possibly returns serializable data.
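
A minimal sketch of that distinction, docstrings only; the argument names come from the list above, while the class layout and return conventions are assumptions:

```python
class BaseAlgorithm:
    """Stripped-down sketch; not the actual class."""

    def suggest(self, num=1):
        """Return `num` new points (samples from the space) to evaluate."""
        raise NotImplementedError

    def observe(self, points, results):
        """Record final results of completed trials. Returns None."""
        raise NotImplementedError

    def judge(self, point, measurements):
        """React to runtime measurements of a trial under evaluation.

        Possibly returns serializable data (a reply to the running
        process), or None when no action is needed.
        """
        raise NotImplementedError
```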

Having said that, whatever part of the algorithm is expected to interact dynamically with a trial under evaluation could be thought of as a subclass of BaseAlgorithm and do exactly the same stuff as it does.

So an alternative to what exists right now could be (a skeleton follows the list):

  1. DynamicAlgo inherits from BaseAlgorithm and implements default reactions; it also mixes in a property should_suspend and possibly score. Its init necessarily takes as positional arguments a Space object (as a BaseAlgorithm does) and a Channel object (I will come to that later).

  2. PrimaryAlgo inherits from DynamicAlgo. It necessarily holds a BaseAlgorithm and possibly a DynamicAlgo. It is also responsible for routing calls to observe/suggest correctly to its components, based on the question "is a trial currently active?".

  3. An appropriate Channel object is exposed by implementations of BaseAlgorithm; it fulfills an API proposed by a concrete implementation of a DynamicAlgo.
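
A rough skeleton of this composition, just to make the routing concrete; everything beyond the names given above (DynamicAlgo, PrimaryAlgo, Space, Channel, should_suspend) is an assumption of this sketch:

```python
class BaseAlgorithm:
    """Static algorithm stub (see the earlier sketch)."""

    def __init__(self, space):
        self.space = space

    def suggest(self, num=1):
        raise NotImplementedError

    def observe(self, points, results):
        raise NotImplementedError


class DynamicAlgo(BaseAlgorithm):
    """Implements default reactions to trials that are still running."""

    def __init__(self, space, channel):
        super().__init__(space)
        self.channel = channel  # view into a static algorithm's state

    @property
    def should_suspend(self):
        return False  # default reaction: never suspend a trial

    def judge(self, point, measurements):
        return None  # default reaction: no reply to the process


class PrimaryAlgo(DynamicAlgo):
    """Holds a BaseAlgorithm and possibly a DynamicAlgo, and routes calls."""

    def __init__(self, space, algorithm, dynamic=None):
        # Skip DynamicAlgo.__init__: PrimaryAlgo owns no channel itself.
        BaseAlgorithm.__init__(self, space)
        self.algorithm = algorithm  # necessarily present
        self.dynamic = dynamic      # possibly None

    def suggest(self, num=1):
        return self.algorithm.suggest(num)

    def observe(self, points, results):
        # Final results always reach the static component.
        self.algorithm.observe(points, results)

    def judge(self, point, measurements):
        # Runtime measurements only make sense while a trial is active,
        # and only if a dynamic component exists.
        if self.dynamic is not None:
            return self.dynamic.judge(point, measurements)
        return None
```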

Example:

FreezeThaw inherits from DynamicAlgo and implements it. It also exposes an API through an abstract, to-be-implemented class FreezeThaw.Channel. FreezeThaw necessarily owns an object of this Channel.

A certain algorithm, an implementation of BaseAlgorithm, wants to expose its state information in a manner useful to FreezeThaw. The developer of this algorithm expresses compatibility with FreezeThaw by implementing the interface class FreezeThaw.Channel. A concrete instance is retrieved by calling a BaseAlgorithm property (perhaps BaseAlgorithm.channel) which will instantiate it conditionally on the optional existence of the FreezeThaw algorithm on the system (poll DynamicAlgo.__subclasses__() or a corresponding Factory class for its types!). A sketch of this discovery step follows.
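
Continuing the skeleton above, the compatibility declaration and conditional discovery could look roughly like this; FreezeThaw.Channel and the channel property come from the text, while the channel's method and the algorithm name are invented for illustration:

```python
class FreezeThaw(DynamicAlgo):
    """Dynamic algorithm; owns a Channel into a static algorithm."""

    class Channel:
        """Abstract API a static algorithm implements for FreezeThaw."""

        def training_curve(self, point):  # hypothetical method
            raise NotImplementedError


class MyBayesianAlgo(BaseAlgorithm):
    """A static algorithm declaring compatibility with FreezeThaw."""

    # In practice this inner class would itself be defined conditionally,
    # since FreezeThaw.Channel may not be importable on every system.
    class _FreezeThawChannel(FreezeThaw.Channel):
        def __init__(self, algo):
            self.algo = algo

        def training_curve(self, point):
            ...  # expose this algorithm's state in FreezeThaw's terms

    @property
    def channel(self):
        # Instantiate the channel only if FreezeThaw is present, e.g. by
        # polling DynamicAlgo.__subclasses__() (a Factory lookup would
        # work the same way).
        names = {cls.__name__ for cls in DynamicAlgo.__subclasses__()}
        if "FreezeThaw" in names:
            return self._FreezeThawChannel(self)
        return None
```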

tsirif added the feedback label on Mar 6, 2018

bouthilx commented Mar 7, 2018

I agree with all of that. I agree there should be two different methods, one for final observations and one for runtime exchanges with the process. However, the documentation of the methods is not clear enough about that. It should be crystal clear that observe takes a result, which is what the process sends when it completes, and may change the algorithm's internal state, while judge takes measurements, which are sent by the process during its lifetime and may likewise change the algorithm's internal state. The doc of each method should also refer to the other, to help readers understand the difference.

I insist, however, that the method names observe, judge and score are not explicit enough. I'll try to find alternatives; you are welcome to suggest (:laughing:) some.

By the way, the discussion has diverged from the initial point, which was solely about what algorithms need in order to score trials. The current implementation passes hyper-parameters; I believe it should pass the entire history, including results and measurements. A sketch of that difference follows.
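
The change could then be as small as what the score handle receives; a hedged sketch, where the Trial attribute names are assumptions of this example:

```python
def score_current(point):
    """Current shape (per trial_to_tuple): hyper-parameter values only."""
    ...


def score_proposed(trial):
    """Proposed shape: the entire trial, history included."""
    params = trial.params              # hyper-parameter values
    results = trial.results            # final results, if completed
    measurements = trial.measurements  # runtime measurements so far
    ...
```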

bouthilx commented

Discussion closed. Implementation of dynamic algorithms is in progress.
