The POMG struct gives the following objects:
γ: discount factor
ℐ: agents
𝒮: state space
𝒜: joint action space
𝒪: joint observation space
T: transition function
O: joint observation function
R: joint reward function
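As a rough illustration, the eight fields can be mirrored in a small Python container. This is only a sketch: the ASCII field names (gamma, agents, states, joint_actions, joint_observations) and the choice of a frozen dataclass are assumptions made for this example and are not taken from the original struct definition.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence


@dataclass(frozen=True)
class POMG:
    """Illustrative container mirroring the listed fields (field names are assumed)."""
    gamma: float                             # γ: discount factor
    agents: Sequence[Any]                    # ℐ: agents
    states: Sequence[Any]                    # 𝒮: state space
    joint_actions: Sequence[tuple]           # 𝒜: joint action space
    joint_observations: Sequence[tuple]      # 𝒪: joint observation space
    T: Callable[[Any, tuple, Any], float]    # T(s, a, s') -> transition probability
    O: Callable[[Any, tuple, tuple], float]  # O(s, a, o) -> observation probability
    R: Callable[[Any, tuple], Any]           # R(s, a) -> reward
```

Storing T, O, and R as plain callables keeps the container agnostic about how the functions are implemented, whether as lookup tables or as computed models.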
The agents ℐ are the players of the game. The joint action space 𝒜 is the set of all possible joint actions, where a joint action is an ordered tuple containing one action for each agent. The joint observation space 𝒪 is the set of all possible joint observations. The transition function T takes a state s in 𝒮, a joint action a in 𝒜, and a new state s' in 𝒮, and returns the probability of transitioning from s to s' when a is taken. The joint observation function O takes a state s, a joint action a, and a joint observation o in 𝒪, and returns the probability of observing o when a is taken from s. The joint reward function R takes a state and a joint action in 𝒜 and returns a reward value.
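To make the three function signatures concrete, here is a minimal sketch of a made-up two-agent game written in Python. Every state, action, observation, probability, and reward below is hypothetical and chosen only for illustration; the helper is_normalized is likewise an assumption of this sketch, not part of the struct.

```python
from itertools import product

# Hypothetical two-agent game; all names and numbers are made up for illustration.
agents = [1, 2]
states = ["left", "right"]
actions = ["stay", "switch"]              # per-agent actions
observations = ["see_left", "see_right"]  # per-agent observations

# Joint spaces hold one entry per agent, in agent order.
joint_actions = list(product(actions, repeat=len(agents)))
joint_observations = list(product(observations, repeat=len(agents)))


def T(s, a, s_next):
    """Probability of moving from s to s_next under joint action a."""
    # Assumed dynamics: the state flips only when every agent chooses "switch".
    flipped = "right" if s == "left" else "left"
    target = flipped if all(ai == "switch" for ai in a) else s
    return 1.0 if s_next == target else 0.0


def O(s, a, o):
    """Probability of joint observation o when joint action a is taken from s."""
    # Assumed sensor model: each agent independently sees the state correctly
    # with probability 0.8; the joint action is ignored in this toy model.
    p = 1.0
    for oi in o:
        p *= 0.8 if oi == "see_" + s else 0.2
    return p


def R(s, a):
    """Reward for taking joint action a in state s."""
    # Assumed reward: 1.0 when all agents switch away from "left", else 0.0.
    return 1.0 if s == "left" and all(ai == "switch" for ai in a) else 0.0


def is_normalized(f, outcomes, s, a, tol=1e-9):
    """Check that f(s, a, x) sums to 1 over all outcomes x."""
    return abs(sum(f(s, a, x) for x in outcomes) - 1.0) < tol
```

A quick sanity check for any such model is that T sums to 1 over next states and O sums to 1 over joint observations for every state and joint action, which is_normalized(T, states, s, a) and is_normalized(O, joint_observations, s, a) verify here.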