
Unflatten tensors in RolloutStorage #108

Closed
Lucaweihs opened this issue Aug 5, 2020 · 2 comments · Fixed by #141

Lucaweihs commented Aug 5, 2020

Problem

In the rollout storage, we currently flatten tensors along some dimensions (combining the rollout index dim and the time dim into one). This is awkward and means that every actor-critic model must remember this arbitrary ordering.
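A minimal sketch of the problem being described, using hypothetical shapes (the variable names and sizes below are illustrative, not the actual RolloutStorage code):

```python
import torch

# Hypothetical dimensions for illustration.
num_steps, num_samplers, hidden = 4, 3, 8
obs = torch.randn(num_steps, num_samplers, hidden)

# Current convention: the time dim and rollout-index dim are flattened
# into one before the model sees the tensor, so every model must
# remember how the two dims were combined.
flat = obs.view(num_steps * num_samplers, hidden)

# Recovering the structure requires knowing that arbitrary ordering:
unflat = flat.view(num_steps, num_samplers, hidden)
assert torch.equal(unflat, obs)
```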

Solution

Let's stop flattening and fix existing models to expect these unflattened tensors (fixing the RNNStateEncoder should go a long way towards this).
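One reason fixing the RNN-based encoder goes a long way: PyTorch RNN modules already consume `(seq_len, batch, input_size)` tensors natively, so an RNN state encoder can accept the unflattened `(time, sampler, ...)` rollout tensor directly. A sketch (sizes are made up; this is not the actual RNNStateEncoder implementation):

```python
import torch
import torch.nn as nn

# PyTorch RNNs default to seq-first input, i.e. (seq_len, batch, input_size),
# which matches an unflattened (time, sampler, feature) rollout tensor.
rnn = nn.GRU(input_size=8, hidden_size=16)

num_steps, num_samplers = 4, 3
x = torch.randn(num_steps, num_samplers, 8)  # unflattened rollout tensor
h0 = torch.zeros(1, num_samplers, 16)        # (num_layers, batch, hidden)

out, hn = rnn(x, h0)
# No (num_steps * num_samplers, ...) reshape needed on the caller's side.
assert out.shape == (num_steps, num_samplers, 16)
```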

@Lucaweihs Lucaweihs added the enhancement New feature or request label Aug 5, 2020
@jordis-ai2

I'm working on it

@anikem anikem added this to the 0.1 milestone Aug 12, 2020
jordis-ai2 commented Aug 15, 2020

Merged from #109:

Problem

We do not currently support multiple agents.

Solution

To support multiple agents we'll need to:

  • Extend the Task abstraction (and also VectorSampledTasks) to:
    • Accept multiple actions at once, e.g. the step should change from action: int to Union[int, Sequence[int]].
    • Return a sequence of rewards and observations (one for each agent). I don't know if we'd rather return Sequence[RLStepResult] or, instead, update RLStepResult to return sequences of values when appropriate. I have a slight preference for the second variant (as it would make it easier in the future to return values that are common to all agents, e.g. a joint reward) but could be convinced otherwise.
  • Update the RolloutStorage class so that one dimension is dedicated to different agents.
  • Make the corresponding changes in the light_engine to handle the above.
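The "second variant" above can be sketched as a single step result whose fields hold per-agent sequences. Everything below is hypothetical (field names, the toy `step` function, and the reward scheme are invented for illustration, not the actual AllenAct API):

```python
from dataclasses import dataclass
from typing import Any, Sequence, Union

# Hypothetical sketch: one RLStepResult whose fields are per-agent
# sequences, rather than a Sequence[RLStepResult]. A field common to
# all agents (e.g. a joint reward) could be added alongside these.
@dataclass
class RLStepResult:
    observation: Sequence[Any]
    reward: Sequence[float]  # one entry per agent
    done: bool
    info: Any

def step(action: Union[int, Sequence[int]]) -> RLStepResult:
    # Toy environment: accept either a single action (single-agent)
    # or a sequence of actions (one per agent), as proposed above.
    actions = [action] if isinstance(action, int) else list(action)
    return RLStepResult(
        observation=[None] * len(actions),
        reward=[float(a) for a in actions],
        done=False,
        info={},
    )

result = step([1, 2])
assert result.reward == [1.0, 2.0]
```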

I'm sure I'm missing some additional places that will need to be updated.

Dependencies

This should be completed after #108.

@jordis-ai2 jordis-ai2 linked a pull request Aug 15, 2020 that will close this issue