Add networksettings to reward providers #4982

andrewcoh · 2021-02-19T20:00:56Z

Proposed change(s)

Describe the changes made in this PR.

Useful links (Github issues, JIRA tickets, ML-Agents forum threads etc.)

Types of change(s)

Checklist

Added tests that prove my fix is effective or that my feature works
Updated the changelog (if applicable)
Updated the documentation (if applicable)
Updated the migration guide (if applicable)

Other comments

config/imitation/Hallway.yaml

andrewcoh · 2021-02-19T20:05:16Z

ml-agents/mlagents/trainers/settings.py

@@ -199,28 +199,37 @@ def structure(d: Mapping, t: type) -> Any:
            enum_key = RewardSignalType(key)
            t = enum_key.to_settings()
            d_final[enum_key] = strict_to_cls(val, t)
+            if "encoding_size" in val:


Backward compatible with old configs

Can you add a comment around this code so we will remember it?
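For readers following the thread, the backward-compatibility branch could look roughly like the sketch below. It uses plain dataclasses as stand-ins for the attrs-based `NetworkSettings` and `RewardSignalSettings`. The helper name `structure_reward_signal`, the warning text, and the rule that an explicit `network_settings` entry takes precedence over `encoding_size` are assumptions for illustration, not the code merged in this PR. The 2-layer, 128-unit defaults follow the discriminator size mentioned later in the review.

```python
import warnings
from dataclasses import dataclass, field
from typing import Any, Mapping


@dataclass
class NetworkSettings:
    # Stand-in for the real NetworkSettings class; only two fields are modeled.
    hidden_units: int = 128
    num_layers: int = 2


@dataclass
class RewardSignalSettings:
    gamma: float = 0.99
    strength: float = 1.0
    network_settings: NetworkSettings = field(default_factory=NetworkSettings)


def structure_reward_signal(val: Mapping[str, Any]) -> RewardSignalSettings:
    """Hypothetical helper: build settings from one reward-signal config entry."""
    net_cfg = dict(val.get("network_settings", {}))
    settings = RewardSignalSettings(
        gamma=val.get("gamma", 0.99),
        strength=val.get("strength", 1.0),
        network_settings=NetworkSettings(**net_cfg),
    )
    # Backward compatibility: old configs set the network width through
    # `encoding_size`. Map it onto hidden_units unless a new-style
    # `network_settings` block is present (assumed precedence).
    if "encoding_size" in val:
        warnings.warn(
            "'encoding_size' is deprecated for reward signals; "
            "use 'network_settings' instead.",
            DeprecationWarning,
        )
        if "network_settings" not in val:
            settings.network_settings.hidden_units = val["encoding_size"]
    return settings


# An old-style entry and a new-style entry both resolve to NetworkSettings.
old_style = structure_reward_signal({"strength": 0.5, "encoding_size": 64})
new_style = structure_reward_signal({"network_settings": {"hidden_units": 256}})
```

Keeping the deprecated key working while warning about it lets existing YAML files load unchanged, which is the behavior the "backward compatible" note refers to.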

ml-agents/mlagents/trainers/settings.py

ml-agents/mlagents/trainers/torch/components/reward_providers/gail_reward_provider.py

ervteng · 2021-02-19T20:12:15Z

cc @hvpeteet, @sini - this will introduce a (backwards-compatible) change for the YAMLs - just checking to make sure it won't cause any issues.

vincentpierre

Need to edit the documentation before merging.

vincentpierre · 2021-02-19T21:23:04Z

ml-agents/mlagents/trainers/settings.py

@@ -183,7 +183,7 @@ def to_settings(self) -> type:
 class RewardSignalSettings:
    gamma: float = 0.99
    strength: float = 1.0
-    normalize: bool = False
+    network_settings: NetworkSettings = attr.ib(factory=NetworkSettings)


I would make this one optional and, if it is None, use the Policy's network settings rather than our own defaults. How does that sound?

I think I'd prefer to use our defaults, since the policy may have significantly more capacity than is needed, e.g. the Crawler policy at 3/512 vs. the 2/128 we use for the discriminator. That said, I also realize this lets users specify memory, which we probably want to explicitly prevent in the reward providers. cc @ervteng

Not opposed to either route; each has its own pros and cons. Either way, as long as it's documented it should be fine.
Is getting the Policy settings super ugly?

I'm not sure how future-proof it is for multi-agent scenarios, where we could have different policies to select from. Additionally, we currently create reward signals in optimizer/torch_optimizer.py, and in the future I think it will be necessary to remove the policy from the optimizer (also for multi-agent). In that case this would need to be addressed either by keeping the policy around or by moving the creation of the reward provider. My vote is for default network settings.
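To make the option voted for above concrete, here is a minimal sketch in which the reward signal owns its default network settings and any `memory` entry is dropped with a warning. The classes are dataclass stand-ins for the real attrs classes, and `resolve_network_settings`, the `MemorySettings` fields, and the choice to ignore memory rather than raise are all assumptions for illustration.

```python
import warnings
from dataclasses import dataclass, field, replace
from typing import Optional


@dataclass
class MemorySettings:
    # Illustrative fields only.
    sequence_length: int = 64
    memory_size: int = 128


@dataclass
class NetworkSettings:
    hidden_units: int = 128
    num_layers: int = 2
    memory: Optional[MemorySettings] = None


@dataclass
class RewardSignalSettings:
    gamma: float = 0.99
    strength: float = 1.0
    # The reward signal keeps its own defaults, independent of the policy.
    network_settings: NetworkSettings = field(default_factory=NetworkSettings)


def resolve_network_settings(settings: RewardSignalSettings) -> NetworkSettings:
    """Hypothetical helper: settings a reward provider would build its network from."""
    net = settings.network_settings
    if net.memory is not None:
        warnings.warn(
            "memory was specified in network_settings but is not supported "
            "by reward providers; it will be ignored."
        )
        # Return a copy without memory so the user's settings stay untouched.
        net = replace(net, memory=None)
    return net


# With no network_settings in the config, the reward provider gets the small
# defaults, regardless of how large the policy network is.
default_net = resolve_network_settings(RewardSignalSettings())
```

Because the settings never reference the policy, this approach does not depend on the optimizer holding a policy object, which is the multi-agent concern raised above.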

vincentpierre · 2021-02-19T21:23:40Z

ml-agents/mlagents/trainers/settings.py

@@ -199,28 +199,37 @@ def structure(d: Mapping, t: type) -> Any:
            enum_key = RewardSignalType(key)
            t = enum_key.to_settings()
            d_final[enum_key] = strict_to_cls(val, t)
+            if "encoding_size" in val:


Can you add a comment around this code so we will remember it?

hvpeteet

Looks good to me, we don't use these fields for anything cloud specific.

add network settings to reward providers

b68fbdc

andrewcoh changed the base branch from master to fix-gail February 19, 2021 20:01

andrewcoh requested review from ervteng and vincentpierre and removed request for ervteng February 19, 2021 20:01

andrewcoh commented Feb 19, 2021

config/imitation/Hallway.yaml

update pyramids rnd config

4f83bbf

andrewcoh commented Feb 19, 2021

ml-agents/mlagents/trainers/settings.py

remove print statement

00eb2b2

ervteng reviewed Feb 19, 2021

ml-agents/mlagents/trainers/torch/components/reward_providers/gail_reward_provider.py

fix vail tests

d971b14

andrewcoh requested a review from hvpeteet February 19, 2021 20:16

andrewcoh mentioned this pull request Feb 19, 2021

Set ignore done=False in GAIL #4971 (merged)

vincentpierre approved these changes Feb 19, 2021

hvpeteet approved these changes Feb 19, 2021

andrewcoh added 4 commits February 20, 2021 08:52

add comment to RewardSignal structure

55ebc79

set default encoding size to optional int

fbb3ebe

update documentations in training config

cc72294

update changelog

22db076

ervteng approved these changes Feb 22, 2021

raise warning if memory specified in reward providers

b43e20c

andrewcoh merged commit 5fcbbc4 into fix-gail Feb 22, 2021

delete-merged-branch bot deleted the fix-gail-networksettings branch February 22, 2021 21:21

github-actions bot locked as resolved and limited conversation to collaborators Feb 23, 2022


andrewcoh commented Feb 19, 2021

andrewcoh Feb 19, 2021

vincentpierre Feb 19, 2021

ervteng commented Feb 19, 2021

vincentpierre left a comment

vincentpierre Feb 19, 2021

andrewcoh Feb 20, 2021

ervteng Feb 22, 2021

andrewcoh Feb 22, 2021

vincentpierre Feb 19, 2021

hvpeteet left a comment
