
Training multiple models in same environment #1094

Closed

anilkurkcu opened this issue Oct 5, 2022 · 3 comments
Labels
duplicate: This issue or pull request already exists
question: Further information is requested

Comments

@anilkurkcu

Hello,

I want to train an agent that takes actions from two independent networks, so the final action of the agent would be the concatenation of the outputs of these two independent policies. The point is that the same reward function will be used to train both policies, and the reward can only be calculated once the actions from both networks are available; because of this, I cannot train the first model and only then move on to the next one.

Would there be any suggestion for this?

@anilkurkcu added the question label on Oct 5, 2022
@araffin
Member

araffin commented Oct 5, 2022

Hello,
I'm not sure why you would do that instead of having a single network that outputs both actions.
If you are doing multi-agent training or self-play, you can take a look at #957 (comment).
The answer is usually a callback (cf. doc).

If you need something more custom, then you will probably need to fork SB3.
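For illustration, here is a minimal sketch of the "single network that outputs both actions" suggestion: one flat Box action space that the environment splits into the two sub-actions, so one SB3 policy produces both and the reward can use both at once. It assumes a Gymnasium / SB3 2.x setup; `TwoPartActionEnv`, its spaces and its dummy reward are made up for the example, not code from this thread.

```python
import gymnasium as gym
import numpy as np
from stable_baselines3 import PPO


class TwoPartActionEnv(gym.Env):
    """Hypothetical env whose single Box action is split into two sub-actions."""

    def __init__(self):
        super().__init__()
        self.observation_space = gym.spaces.Box(-1.0, 1.0, shape=(6,), dtype=np.float32)
        # dims 0-2 play the role of "network A", dims 3-4 the role of "network B"
        self.action_space = gym.spaces.Box(-1.0, 1.0, shape=(5,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        return self.observation_space.sample(), {}

    def step(self, action):
        part_a, part_b = action[:3], action[3:]
        # The reward can use both parts at once, since they come from one policy output.
        reward = -float(np.linalg.norm(part_a)) - float(np.abs(part_b).sum())
        obs = self.observation_space.sample()
        return obs, reward, True, False, {}  # terminated=True: dummy one-step episodes


model = PPO("MlpPolicy", TwoPartActionEnv(), verbose=0)
model.learn(total_timesteps=1_000)
```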

@anilkurkcu
Author

Thank you for your reply.

The reason behind this is that I am combining two different action types when my network outputs both actions.

To give a concrete example, think of the output layer of a policy that decides on X-Y-Z coordinate displacements and, at the same time, which robot to apply these displacements to, assuming multiple robots are present. So the first 3 nodes are for the displacement, and the remaining nodes form a binary encoding that selects the robot to apply the displacements to.

I was just thinking that two individual policies would be a better fit for such a scenario. Would training a single policy for this purpose work anyway?
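For illustration only (not from the thread): SB3 policies use a single action space (Box, Discrete, MultiDiscrete or MultiBinary) rather than a dict of spaces, so a common workaround for this kind of hybrid action is to flatten everything into one Box and let the environment decode it, e.g. picking the robot with an argmax over the trailing dimensions instead of a binary encoding. The names and numbers below are hypothetical.

```python
import gymnasium as gym
import numpy as np

N_ROBOTS = 4  # hypothetical number of robots

# One flat Box action: 3 displacement values + N_ROBOTS robot-selection scores
action_space = gym.spaces.Box(low=-1.0, high=1.0, shape=(3 + N_ROBOTS,), dtype=np.float32)


def decode(action):
    """Split one flat action into (xyz displacement, robot index)."""
    displacement = action[:3]
    robot_id = int(np.argmax(action[3:]))  # argmax over the trailing dims picks the robot
    return displacement, robot_id


displacement, robot_id = decode(action_space.sample())
```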

@araffin
Member

araffin commented Oct 6, 2022

Then it is a duplicate of #527

You can also discretize a continuous output if you want to try things quickly.

Would training a single policy for this purpose work anyway?

I would expect similar performance, as you're using the same data to train both.
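Regarding the suggestion above to discretize a continuous output to try things quickly, here is a minimal sketch using a Gymnasium ActionWrapper. `DiscretizeActionWrapper`, the bin count and the Pendulum-v1 example are illustrative assumptions, not code from this thread.

```python
import gymnasium as gym
import numpy as np


class DiscretizeActionWrapper(gym.ActionWrapper):
    """Hypothetical wrapper: expose a 1D Box action space as a Discrete one with n bins."""

    def __init__(self, env, n_bins=11):
        super().__init__(env)
        assert isinstance(env.action_space, gym.spaces.Box) and env.action_space.shape == (1,)
        self._bins = np.linspace(env.action_space.low[0], env.action_space.high[0], n_bins)
        self.action_space = gym.spaces.Discrete(n_bins)

    def action(self, act):
        # Map the discrete index back to the continuous value expected by the wrapped env
        return np.array([self._bins[act]], dtype=np.float32)


# A policy trained on this wrapped env now acts in a Discrete(n_bins) space,
# e.g. PPO("MlpPolicy", env) or DQN("MlpPolicy", env).
env = DiscretizeActionWrapper(gym.make("Pendulum-v1"), n_bins=11)
```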
