Problem

In the rollout storage, we currently flatten tensors along some dimensions (combining the rollout-index dimension and the time dimension into one). This is awkward and means that every actor-critic model needs to remember this arbitrary ordering.
Solution
Let's stop flattening and fix existing models to expect these unflattened tensors (fixing the RNNStateEncoder should go a long way towards this).
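The shape bookkeeping above can be sketched as follows. This is an illustrative example only, not the real RolloutStorage layout; the dimension names (T = time steps, N = rollout index, F = feature size) are assumptions for the sketch.

```python
import numpy as np

# Assumed dimensions for illustration: T = time steps, N = rollout index,
# F = feature size.
T, N, F = 4, 3, 8
observations = np.random.randn(T, N, F)

# Current behavior: the time and rollout-index dims are merged before the
# tensor reaches the model, so the model must remember the (T, N) ordering.
flattened = observations.reshape(T * N, F)

# Proposed behavior: keep the tensor unflattened; the model (e.g. a fixed
# RNNStateEncoder) works directly with the [T, N, F] layout.
assert flattened.shape == (T * N, F)
assert observations.shape == (T, N, F)
# Unflattening is a pure reshape: no information was lost, only structure.
assert np.array_equal(flattened.reshape(T, N, F), observations)
```

The point of the change is that the `(T, N)` structure becomes explicit in the tensor's shape rather than an ordering convention every model must re-derive.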
Extend the Task abstraction (and also VectorSampledTasks) to:

- Accept multiple actions at once, e.g. the signature of step should change from action: int to action: Union[int, Sequence[int]].
- Return a sequence of rewards and observations (one for each agent). I don't know whether we'd rather return Sequence[RLStepResult] or instead update RLStepResult to hold sequences of values where appropriate. I have a slight preference for the second variant (it would make it easier, in the future, to return values common to all agents, e.g. a joint reward) but could be convinced otherwise.

Update the RolloutStorage class so that one dimension is dedicated to the different agents.

Make whatever other changes are needed in the light_engine to handle the above.

I'm sure I'm missing some additional places that will need to be updated.
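The second variant (a single RLStepResult whose fields hold per-agent sequences) could be sketched as below. This is a hypothetical sketch: the names mirror the issue (RLStepResult, step), but the exact fields, signature, and toy dynamics are assumptions, not the real definitions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Sequence, Union


@dataclass
class RLStepResult:
    # In multi-agent tasks these hold one entry per agent; a future
    # extension could add fields common to all agents (e.g. a joint reward).
    observation: Union[Any, Sequence[Any]]
    reward: Union[float, Sequence[float]]
    done: bool
    info: Dict[str, Any] = field(default_factory=dict)


class MultiAgentTask:
    """Toy stand-in for the extended Task abstraction (illustration only)."""

    def __init__(self, num_agents: int):
        self.num_agents = num_agents

    def step(self, action: Union[int, Sequence[int]]) -> RLStepResult:
        # Accept a single action (single-agent case) or one action per agent.
        actions: List[int] = [action] if isinstance(action, int) else list(action)
        assert len(actions) == self.num_agents
        # Toy dynamics: each agent's reward is just its own action value.
        return RLStepResult(
            observation=[{"agent_id": i} for i in range(self.num_agents)],
            reward=[float(a) for a in actions],
            done=False,
        )


task = MultiAgentTask(num_agents=2)
result = task.step([1, 3])
assert result.reward == [1.0, 3.0]
assert len(result.observation) == 2
```

A single result object keeps the return type uniform across single- and multi-agent tasks, whereas Sequence[RLStepResult] would leave nowhere natural to put values shared by all agents.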
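For the storage change, dedicating a dimension to agents might look like the following. The shapes are assumptions for illustration (steps x samplers x agents x feature), not the actual RolloutStorage layout.

```python
import numpy as np

# Assumed dimensions: T = time steps, N = task samplers, A = agents,
# F = feature size.
T, N, A, F = 4, 3, 2, 8
rewards = np.zeros((T, N, A))          # one reward per step, sampler, agent
observations = np.zeros((T, N, A, F))  # likewise for observations

# Writing one step's multi-agent results stays a simple indexed assignment:
step_rewards = np.array([[1.0, 3.0]] * N)  # shape (N, A)
rewards[0] = step_rewards
assert rewards[0, 0, 1] == 3.0
assert observations.shape == (T, N, A, F)
```

With an explicit agent axis, single-agent tasks are just the A = 1 case, and no model needs to know how agents were folded into another dimension.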