
Training multiple models in same environment #1094

Closed

anilkurkcu opened this issue Oct 5, 2022 · 3 comments
Labels
duplicate: This issue or pull request already exists
question: Further information is requested

Comments

@anilkurkcu

Hello,

I want to train an agent that takes actions from two independent networks, so the final action of the agent would be the concatenation of the outputs of these two independent policies. The point is that the same reward function will be used to train both policies, and the reward can only be calculated once the actions from both networks are available; because of this, I cannot train the first model and only then move on to the next one.

Would there be any suggestion for this?

@anilkurkcu added the question label on Oct 5, 2022
@araffin
Member

araffin commented Oct 5, 2022

Hello,
I'm not sure why you would do that instead of having a single network that outputs both actions.
If you are doing multi-agent training or self-play, you can take a look at #957 (comment).
The answer is usually a callback (cf. doc).

If you need something more custom, then you will probably need to fork SB3.
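For illustration, here is a minimal sketch of the "single network that outputs both actions" suggestion: one flat Box action space that the environment splits into the two sub-actions, so one SB3 policy produces both and the reward can use both at once. It assumes a Gymnasium / SB3 2.x setup; `TwoPartActionEnv`, its spaces and its dummy reward are made up for the example, not code from this thread.

```python
import gymnasium as gym
import numpy as np
from stable_baselines3 import PPO


class TwoPartActionEnv(gym.Env):
    """Hypothetical env whose single Box action is split into two sub-actions."""

    def __init__(self):
        super().__init__()
        self.observation_space = gym.spaces.Box(-1.0, 1.0, shape=(6,), dtype=np.float32)
        # dims 0-2 play the role of "network A", dims 3-4 the role of "network B"
        self.action_space = gym.spaces.Box(-1.0, 1.0, shape=(5,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        return self.observation_space.sample(), {}

    def step(self, action):
        part_a, part_b = action[:3], action[3:]
        # The reward can use both parts at once, since they come from one policy output.
        reward = -float(np.linalg.norm(part_a)) - float(np.abs(part_b).sum())
        obs = self.observation_space.sample()
        return obs, reward, True, False, {}  # terminated=True: dummy one-step episodes


model = PPO("MlpPolicy", TwoPartActionEnv(), verbose=0)
model.learn(total_timesteps=1_000)
```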

@anilkurkcu
Author

Thank you for your reply.

The reason behind this is that I am combining two different action types when my network outputs both actions.

To give a concrete example, think of the output layer of a policy that decides on X-Y-Z coordinate displacements and, at the same time, which robot to apply these displacements to, assuming multiple robots are present. So the first 3 nodes are for the displacement, and the remaining nodes form a binary encoding that selects the robot to apply the displacements to.

I was just thinking that two individual policies would be a better fit for such a scenario. Would training a single policy for this purpose work anyway?
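For illustration only (not from the thread): SB3 policies use a single action space (Box, Discrete, MultiDiscrete or MultiBinary) rather than a dict of spaces, so a common workaround for this kind of hybrid action is to flatten everything into one Box and let the environment decode it, e.g. picking the robot with an argmax over the trailing dimensions instead of a binary encoding. The names and numbers below are hypothetical.

```python
import gymnasium as gym
import numpy as np

N_ROBOTS = 4  # hypothetical number of robots

# One flat Box action: 3 displacement values + N_ROBOTS robot-selection scores
action_space = gym.spaces.Box(low=-1.0, high=1.0, shape=(3 + N_ROBOTS,), dtype=np.float32)


def decode(action):
    """Split one flat action into (xyz displacement, robot index)."""
    displacement = action[:3]
    robot_id = int(np.argmax(action[3:]))  # argmax over the trailing dims picks the robot
    return displacement, robot_id


displacement, robot_id = decode(action_space.sample())
```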

@araffin
Member

araffin commented Oct 6, 2022

Then it is a duplicate of #527

You can also discretize a continuous output if you want to try things quickly.

Would training a single policy for this purpose work anyway?

I would expect similar performance, as you're using the same data to train both.
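Regarding the suggestion above to discretize a continuous output to try things quickly, here is a minimal sketch using a Gymnasium ActionWrapper. `DiscretizeActionWrapper`, the bin count and the Pendulum-v1 example are illustrative assumptions, not code from this thread.

```python
import gymnasium as gym
import numpy as np


class DiscretizeActionWrapper(gym.ActionWrapper):
    """Hypothetical wrapper: expose a 1D Box action space as a Discrete one with n bins."""

    def __init__(self, env, n_bins=11):
        super().__init__(env)
        assert isinstance(env.action_space, gym.spaces.Box) and env.action_space.shape == (1,)
        self._bins = np.linspace(env.action_space.low[0], env.action_space.high[0], n_bins)
        self.action_space = gym.spaces.Discrete(n_bins)

    def action(self, act):
        # Map the discrete index back to the continuous value expected by the wrapped env
        return np.array([self._bins[act]], dtype=np.float32)


# A policy trained on this wrapped env now acts in a Discrete(n_bins) space,
# e.g. PPO("MlpPolicy", env) or DQN("MlpPolicy", env).
env = DiscretizeActionWrapper(gym.make("Pendulum-v1"), n_bins=11)
```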
