
bug fix: compute_reward for batch input #153

Merged
merged 1 commit into Farama-Foundation:main on Jun 20, 2023

Conversation

@nicehiro (Contributor) commented May 19, 2023

Description

Fix: the compute_reward function cannot compute rewards for batch input in the AntMaze environments.


  • Bug fix (non-breaking change which fixes an issue)

Checklist:

  • I have run the pre-commit checks with pre-commit run --all-files (see CONTRIBUTING.md instructions to set it up)
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@rodrigodelazcano (Member)

Hi @nicehiro! Can you give me a bit more detail on what type of bug this is? A code example would help.

Also, can you run pre-commit (https://pre-commit.com/) locally to fix the CI?

  elif self.reward_type == "sparse":
-     return 1.0 if np.linalg.norm(achieved_goal - desired_goal) <= 0.45 else 0.0
+     return -(d > 0.45).astype(np.float32)
@rodrigodelazcano (Member) commented on the diff:


Why are you changing the reward function? With your implementation it returns -1 if the goal is not reached and 0 otherwise.

@nicehiro (Contributor, Author)

Hi @rodrigodelazcano, thanks for your great work!

Can you give me a bit more detail on what type of bug this is?

We expect to get a list of rewards, one for each observation passed to the compute_reward function. This is helpful when we use HER to re-compute rewards. FYI, here's the implementation of HER in Stable-Baselines.

Why are you changing the reward function? With your implementation it returns -1 if the goal is not reached and 0 otherwise.

-1 is more convenient for training in a sparse-reward environment, because a reward of 0 generates no gradient.

By the way, all the changes follow the design here.

Here's a simple example:

import gymnasium as gym
import gymnasium_robotics  # noqa: F401 -- registers the AntMaze environments
import numpy as np

env = gym.make('AntMaze_UMaze-v3')
obs, _ = env.reset()

# compute the reward for a single observation
# (signature: compute_reward(achieved_goal, desired_goal, info))
desired_goal = obs['desired_goal']
achieved_goal = obs['achieved_goal']
reward = env.compute_reward(achieved_goal, desired_goal, None)

# compute rewards for a batch of observations
batch_desired_goal = np.array([desired_goal for _ in range(10)])
batch_achieved_goal = np.array([achieved_goal for _ in range(10)])
rewards = env.compute_reward(batch_achieved_goal, batch_desired_goal, None)

Before:

reward: 0.0
rewards: 0.0

After:

reward: -1.0
rewards: [-1. -1. -1. -1. -1. -1. -1. -1. -1. -1.]
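
For context, the kind of HER-style relabeling this enables is roughly the following (a hypothetical sketch reusing env from the example above, not the Stable-Baselines implementation):

import numpy as np

# Hypothetical relabeling: goals achieved later in an episode are substituted
# as desired goals, and rewards for the whole batch are recomputed in one call.
# AntMaze goals are 2-D (x, y) positions.
achieved_goals = np.random.uniform(-1.0, 1.0, size=(10, 2))
relabeled_goals = np.random.uniform(-1.0, 1.0, size=(10, 2))
relabeled_rewards = env.compute_reward(achieved_goals, relabeled_goals, None)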

@pseudo-rnd-thoughts (Member) left a comment

@nicehiro The suggested reward change makes sense, but as we are trying to preserve the original (now unmaintained) work, we are going to keep the reward function as is.
If you want to use the alternative reward function, I would recommend using a reward wrapper to modify the value.
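
For example, a minimal reward-wrapper sketch (an illustration only; it assumes the kept sparse reward is 1.0 on success and 0.0 otherwise):

import gymnasium as gym
import gymnasium_robotics  # noqa: F401 -- registers the AntMaze environments
from gymnasium.wrappers import TransformReward

env = gym.make("AntMaze_UMaze-v3")
# map the 1.0 / 0.0 sparse reward onto the 0.0 / -1.0 convention
env = TransformReward(env, lambda r: r - 1.0)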

Could you add a test that confirms the function works as expected with single and batched inputs?
Otherwise, this looks good to merge.

@pseudo-rnd-thoughts (Member)

@nicehiro Could you add the multi-fetch environment in a different PR?

@nicehiro (Contributor, Author)

@nicehiro Could you add the multi-fetch environment in a different PR?

Yes. Sorry about that.

Could you add a test that confirms the function works as expected with single and batched inputs?

Yes. Is it OK if I add a test in tests/env/fetch/?

@pseudo-rnd-thoughts (Member)

Yes. Is it OK if I add a test in tests/env/fetch/?

Yes, I think that makes sense.
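
A pytest sketch of such a test (a hypothetical illustration, not the exact test added in the PR):

import gymnasium as gym
import gymnasium_robotics  # noqa: F401 -- registers the AntMaze environments
import numpy as np


def test_compute_reward_single_and_batch():
    env = gym.make("AntMaze_UMaze-v3")
    obs, _ = env.reset(seed=0)
    achieved = obs["achieved_goal"]
    desired = obs["desired_goal"]

    # a single goal pair should give a single (scalar) reward
    single = env.compute_reward(achieved, desired, None)
    assert np.asarray(single).shape == ()

    # a batch of goal pairs should give one reward per pair
    batch_achieved = np.tile(achieved, (10, 1))
    batch_desired = np.tile(desired, (10, 1))
    batch = env.compute_reward(batch_achieved, batch_desired, None)
    assert np.asarray(batch).shape == (10,)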

  elif self.reward_type == "sparse":
-     return 1.0 if np.linalg.norm(achieved_goal - desired_goal) <= 0.45 else 0.0
+     return -(d > 0.45).astype(np.float32)
@Kallinteris-Andreas (Collaborator) commented on the diff:

fp64 not fp32
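
For illustration (a standalone numpy snippet, not the repository code), the point is that the reward array should be float64 rather than float32:

import numpy as np

d = np.array([0.3, 0.9])  # hypothetical goal distances
print((-(d > 0.45).astype(np.float32)).dtype)  # float32 -- as written in the diff
print((-(d > 0.45).astype(np.float64)).dtype)  # float64 -- what the review asks for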

@rodrigodelazcano (Member)

I'm merging this. Thank you @nicehiro, and thanks for the reviews @pseudo-rnd-thoughts and @Kallinteris-Andreas.

@rodrigodelazcano merged commit 65bfa85 into Farama-Foundation:main on Jun 20, 2023
12 checks passed
@Kallinteris-Andreas (Collaborator) commented Jun 21, 2023

@rodrigodelazcano Changing the reward function would require a new environment revision.

Also, my suggested changes have not been applied; should we revert?

Kallinteris-Andreas added a commit to Kallinteris-Andreas/Gymnasium-Robotics-Kalli that referenced this pull request Jun 21, 2023
@pseudo-rnd-thoughts (Member)

@Kallinteris-Andreas I think we should revert the reward function change, but the batch-input fix shouldn't require a version change.

@rodrigodelazcano (Member)

Sorry @pseudo-rnd-thoughts and @Kallinteris-Andreas. I thought the changes were made.

Can you make a PR with the old reward function?

@Kallinteris-Andreas (Collaborator)

Also, I have no idea what axis=-1 does.

@pseudo-rnd-thoughts (Member)

Yeah, we need testing for this as well.

@rodrigodelazcano (Member)

Also, I have no idea what axis=-1 does.

Basically, this is used to compute vector norms along a specific axis of a matrix, instead of a single norm of the full matrix. In this case, it is the last axis.

Since the fix is for computing rewards for batch inputs of achieved and desired goals, this addition is required.
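
For illustration, a standalone numpy snippet with hypothetical goal values:

import numpy as np

# a batch of 3 goal pairs, each goal being a 2-D (x, y) position
achieved = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
desired = np.array([[0.0, 1.0], [1.0, 1.0], [0.0, 0.0]])

# without axis, np.linalg.norm collapses the whole matrix to one scalar
print(np.linalg.norm(achieved - desired))           # 2.236...
# with axis=-1, it returns one distance per goal pair
print(np.linalg.norm(achieved - desired, axis=-1))  # [1. 0. 2.]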

Am I missing anything, @nicehiro?

@nicehiro (Contributor, Author)

nicehiro commented Jun 21, 2023 via email

@Kallinteris-Andreas (Collaborator)

To clarify, I know what axis=1 does, but I do not know what axis=-1 does.

@pseudo-rnd-thoughts (Member)

-1 is the last axis, I believe.
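
For what it's worth, a quick standalone check (axis=-1 is the last axis, so it matches axis=1 for a 2-D batch and still works for a single 1-D goal, where axis=1 would raise an error):

import numpy as np

batch = np.arange(6.0).reshape(3, 2)   # shape (3, 2)
single = np.array([3.0, 4.0])          # shape (2,)

print(np.linalg.norm(batch, axis=-1))  # same result as axis=1 for a 2-D array
print(np.linalg.norm(batch, axis=1))
print(np.linalg.norm(single, axis=-1)) # 5.0; axis=1 would fail for a 1-D array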
