
POCA trainer #5005

Merged 289 commits into main from develop-coma2-trainer on Mar 12, 2021
Conversation

@andrewcoh (Contributor) commented Feb 24, 2021

Proposed change(s)

This PR adds the POCA trainer and associated tests. In addition, it changes the extrinsic reward provider so that team-based rewards work.
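For context, a rough sketch of what the team-based reward change could mean: each agent's extrinsic reward combines its own environment reward, a reward shared by the whole group, and, optionally, its groupmates' individual rewards. The buffer keys mirror the snippets reviewed below, but the class and method names here are illustrative assumptions, not the merged implementation.

import numpy as np
from typing import Dict, List

# Hypothetical string stand-ins for the BufferKey entries discussed in this PR.
ENV_REWARDS = "environment_rewards"
GROUP_REWARD = "group_reward"
GROUPMATE_REWARDS = "groupmate_rewards"

class ExtrinsicRewardSketch:
    """Illustrative only: fold team-based rewards into the extrinsic signal."""

    def __init__(self, add_groupmate_rewards: bool = False):
        self.add_groupmate_rewards = add_groupmate_rewards

    def evaluate(self, mini_batch: Dict[str, List]) -> np.ndarray:
        # Start from each agent's own environment reward.
        rewards = np.asarray(mini_batch[ENV_REWARDS], dtype=np.float32)
        # Optionally credit each agent with its teammates' rewards.
        if self.add_groupmate_rewards and GROUPMATE_REWARDS in mini_batch:
            for i, groupmates in enumerate(mini_batch[GROUPMATE_REWARDS]):
                rewards[i] += np.sum(groupmates)
        # The group reward is shared by every agent in the group.
        if GROUP_REWARD in mini_batch:
            rewards += np.asarray(mini_batch[GROUP_REWARD], dtype=np.float32)
        return rewards

For example, with environment rewards [1.0, 0.0], a shared group reward of [0.5, 0.5], and groupmate rewards enabled, both agents end up with 1.5: agent 0 gets 1.0 + 0.0 + 0.5 and agent 1 gets 0.0 + 1.0 + 0.5.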

Useful links (Github issues, JIRA tickets, ML-Agents forum threads etc.)

  • Documentation PR, to be merged after this one: #5056
  • Explanation of some of the design choices:

Types of change(s)

  • Bug fix
  • New feature
  • Code refactor
  • Breaking change
  • Documentation update
  • Other (please describe)

Checklist

  • Added tests that prove my fix is effective or that my feature works
  • Updated the changelog (if applicable)
  • Updated the documentation (if applicable)
  • Updated the migration guide (if applicable)

Other comments

andrewcoh and others added 7 commits March 10, 2021 15:57
* simple rl multiagent env

* runs but does not train

* assemble terminal steps

* seems to train

* fix final reward

* Merge changes

* fix multiple discrete actions

* Lots of small fixes for multiagent env

* Fix just_died

* Add simple RL tests

* Add LSTM simple_rl for COMA

* adding comments to multiagent rl

* Address comments

Co-authored-by: Ervin Teng <ervin@unity3d.com>
Merge branch 'develop-poca-trainer' into develop-coma2-trainer
@andrewcoh changed the title from COMA2 trainer to POCA trainer on Mar 10, 2021
@ervteng (Contributor) left a comment:

LGTM, will wait for at least one more reviewer.

@vincentpierre (Contributor) left a comment:

I approve once all comments have been resolved, including those that were wrongly marked as outdated by GitHub.

)
return value_outputs, critic_mem_out

def forward(
Contributor: Remove this method. It has no reason to be public.
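For illustration, the change could be as small as routing callers through the public entry point and renaming the method with a leading underscore; the class below is an assumption for the sketch, not the actual networks.py code.

import torch
from typing import List, Tuple

class CriticSketch(torch.nn.Module):
    # Public API: callers obtain value estimates through critic_pass only.
    def critic_pass(
        self, inputs: List[torch.Tensor], memories: torch.Tensor
    ) -> Tuple[torch.Tensor, torch.Tensor]:
        return self._value_pass(inputs, memories)

    # The leading underscore marks this as an internal helper, not public API.
    def _value_pass(self, inputs, memories):
        value_outputs = torch.stack(inputs).mean(dim=0)  # placeholder computation
        critic_mem_out = memories
        return value_outputs, critic_mem_out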

# Convert to tensors
current_obs = [ModelUtils.list_to_tensor(obs) for obs in current_obs]
group_obs = GroupObsUtil.from_buffer(batch, n_obs)
group_obs = [
Contributor: Some of my comments got lost. Please review them in the conversation tab: #5005 (comment)

Comment on lines 28 to 32
if (
BufferKey.GROUPMATE_REWARDS in mini_batch
and BufferKey.GROUP_REWARD in mini_batch
):
if self.add_groupmate_rewards:
Contributor: Invert these two ifs. No need to check the first one if there are no groupmate rewards.

Contributor: These if conditions could be better: gate the groupmate-reward branch on both self.add_groupmate_rewards and the presence of BufferKey.GROUPMATE_REWARDS, and check BufferKey.GROUP_REWARD independently, as in the sketch below.
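Rendered as code, the suggestion might look like the following. mini_batch, BufferKey, and add_groupmate_rewards follow the snippet above; numpy as np, the base environment-reward key, and the accumulation details are assumed for the sketch.

def evaluate(self, mini_batch):
    rewards = np.asarray(mini_batch[BufferKey.ENVIRONMENT_REWARDS], dtype=np.float32)
    # Check the setting first, so the buffer is not probed when the feature is off.
    if self.add_groupmate_rewards and BufferKey.GROUPMATE_REWARDS in mini_batch:
        for i, groupmates in enumerate(mini_batch[BufferKey.GROUPMATE_REWARDS]):
            rewards[i] += np.sum(groupmates)
    # The group reward is handled independently of add_groupmate_rewards.
    if BufferKey.GROUP_REWARD in mini_batch:
        rewards += np.asarray(mini_batch[BufferKey.GROUP_REWARD], dtype=np.float32)
    return rewards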

Contributor: Updated.

ml-agents/mlagents/trainers/torch/networks.py (resolved)
return rsa, x_self_encoder

@staticmethod
def encode_observations(
Contributor: I stand by my statement: make create_residual_self_attention a module, with encode_observations as its forward method.

Contributor: Call it ObservationEncoder.

@andrewcoh (Author): Courtesy of @ervteng: #5093
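For illustration, such an ObservationEncoder might look like the sketch below. The submodule types, sizes, and aggregation are assumptions (the actual refactor landed in #5093), but it shows the shape of the suggestion: build the attention pieces in __init__ and make the former static encode_observations the module's forward.

import torch
from torch import nn
from typing import List

class ObservationEncoder(nn.Module):
    def __init__(self, embedding_size: int = 64, num_heads: int = 4):
        super().__init__()
        # Stand-ins for the RSA block and the x_self encoder that
        # create_residual_self_attention used to return as a pair.
        self.rsa = nn.MultiheadAttention(embedding_size, num_heads)
        self.x_self_encoder = nn.Linear(embedding_size, embedding_size)

    def forward(self, entity_embeddings: List[torch.Tensor]) -> torch.Tensor:
        # Encode the agent's own observation embedding.
        x_self = self.x_self_encoder(entity_embeddings[0])
        # Attend over all entities: shape (entities, batch, embed).
        stacked = torch.stack(entity_embeddings, dim=0)
        attended, _ = self.rsa(stacked, stacked, stacked)
        attended = attended + stacked  # residual connection
        # Combine the self encoding with the attended summary.
        return torch.cat([x_self, attended.mean(dim=0)], dim=-1)

Usage: ObservationEncoder()([torch.randn(8, 64) for _ in range(3)]) returns an (8, 128) tensor.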

ml-agents/mlagents/trainers/settings.py (outdated, resolved)
@andrewcoh merged commit d63a9d7 into main on Mar 12, 2021.
The delete-merged-branch bot deleted the develop-coma2-trainer branch on March 12, 2021 at 01:48.
The github-actions bot locked this conversation as resolved and limited it to collaborators on Mar 12, 2022.