@johnnyL7 will provide more details.
When we collect metrics from a multi-agent training run, the results seem different from what we see back in AL.
One theory is that in the multi-agent case we aren't adjusting for skipped agents. If there are 10 agents and 9 are skipped, we still divide by 10 to calculate the metric, rather than dividing by 1 (the one agent that actually performed an action).
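To make the theory concrete, here is a minimal sketch of the two ways the average could be computed. It is not taken from our code: the function names are hypothetical, and representing a skipped agent as `None` is an assumption made only for illustration.

```python
from typing import Optional, Sequence


def mean_over_all(values: Sequence[Optional[float]]) -> float:
    """Suspected current behaviour (assumption): skipped agents, marked
    as None, contribute nothing to the sum but still count in the divisor."""
    total = sum(v for v in values if v is not None)
    return total / len(values)


def mean_over_active(values: Sequence[Optional[float]]) -> Optional[float]:
    """Proposed behaviour: divide only by the agents that actually acted."""
    active = [v for v in values if v is not None]
    if not active:
        return None  # every agent was skipped, so there is nothing to average
    return sum(active) / len(active)


# 10 agents, 9 skipped, one agent reports a value of 5.0
step_values = [5.0] + [None] * 9
print(mean_over_all(step_values))     # 0.5
print(mean_over_active(step_values))  # 5.0
```

If the theory is right, the reported metric would be off by a factor of (total agents / active agents) on every step where some agents are skipped, which could account for the mismatch with AL.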