# An Analysis of Absorbing States in the BR Process with Inertia

## Introduction and Background

This note describes an approach to computing the probability of being absorbed by each absorbing state, given the initial state. It is a convenient approach because it does not require computing limits or eigenvectors.

The goal to keep in mind is that we want to compute $P_\infty = \lim_{n\to\infty}{P^n}$, whose $(i, j)$ entry is the probability that the BR process terminates at state $j$ given that it started at state $i$.

This method is based on Chapter 3 of "Finite Markov Chains" by Kemeny and Snell.

**Def:** A Markov chain is _absorbing_ if there is at least one absorbing state, and it is possible to transition in a finite number of steps from any state to at least one of the absorbing states.

This is very reminiscent of the definition of weakly acyclic games, and it is exactly the type of Markov chain we wish to study: the best-reply process with inertia is an absorbing Markov chain.

Suppose our weakly acyclic game has $r$ absorbing states (equilibrium policies) and $t$ transient states (non-equilibrium policies).

### Canonical Form:
We can represent the transition matrix for the BR process of such a game in "canonical form" by grouping the transient states first and the absorbing states last. This takes the form:
$$
\begin{bmatrix}
Q & R\\ 
\mathbf{0} & I_r
\end{bmatrix}
$$

where $Q$ is a $t \times t$ matrix giving the transition probabilities between transient states, $R$ is a $t \times r$ matrix giving the transition probabilities from transient states to absorbing states, and $I_r$ is the $r \times r$ identity matrix (since once you reach an absorbing state you stay there).
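The grouping step can be sketched in code. The helper below is a hypothetical illustration (the name `canonical_form` and the 3-state example chain are made up here): it treats a state as absorbing when its diagonal entry equals 1 and assumes every other state is transient, which holds for an absorbing chain.

```python
import numpy as np

def canonical_form(P, tol=1e-12):
    """Reorder a transition matrix so transient states come first.

    Hypothetical helper: state i is treated as absorbing when P[i, i] == 1
    (up to `tol`); all remaining states are assumed to be transient.
    """
    P = np.asarray(P, dtype=float)
    is_absorbing = np.abs(np.diag(P) - 1.0) < tol
    transient = np.flatnonzero(~is_absorbing)
    absorbing = np.flatnonzero(is_absorbing)
    order = np.concatenate([transient, absorbing])
    P_canonical = P[np.ix_(order, order)]  # permute rows and columns together
    t = len(transient)
    Q = P_canonical[:t, :t]  # transient -> transient
    R = P_canonical[:t, t:]  # transient -> absorbing
    return P_canonical, Q, R, order

# Illustrative chain (not from the notebook): states 0 and 2 absorb, state 1 is transient.
P_example = np.array([[1.0, 0.0, 0.0],
                      [0.5, 0.25, 0.25],
                      [0.0, 0.0, 1.0]])
P_c, Q_ex, R_ex, order = canonical_form(P_example)
print(order)
print(P_c)
```

Returning `order` alongside the blocks keeps track of which original state each row of $Q$ and $R$ refers to.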

We have the following theorem.

**Theorem [3.1.1; Chapter 3]:** *In any finite Markov chain, the probability that the process is at an ergodic state after $n$ steps tends to 1 as $n \to \infty$*

In an absorbing chain the ergodic states are exactly the absorbing states, so by this theorem the probability of being in a transient state vanishes: the top-left block of $P^n$, which is $Q^n$, tends to $\mathbf{0}$ as $n \to \infty$. Hence the first $t$ columns of $P_\infty$ (in canonical form) are $0$.
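The block structure behind this argument can be written out explicitly. Multiplying the canonical form by itself (with $n$ denoting the number of steps, a symbol introduced here) gives, by induction on $n$:

$$
P^n =
\begin{bmatrix}
Q^n & \left(\sum_{k=0}^{n-1} Q^k\right) R\\
\mathbf{0} & I_r
\end{bmatrix}
$$

The top-left block holds the $n$-step transition probabilities among transient states, so its entries are the ones that must vanish as $n$ grows.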


### The Fundamental Matrix:

Recalling the fact that $Q^n$ tends to $\mathbf{0}$ as $n \to \infty$, we have the following theorem.

**Theorem [3.2.1; Chapter 3]:** *For any absorbing Markov chain, $(I_t - Q)$ has an inverse, and it is given by*
$$
(I_t - Q)^{-1} = \sum_{k=0}^{\infty}{Q^k}
$$

**Def:** Let the "fundamental matrix" be $N = (I_t - Q)^{-1}$
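As a quick numerical illustration of Theorem 3.2.1, the sketch below compares partial sums of the series with the inverse. The $2 \times 2$ block `Q_demo` is made up for this check and is not taken from the notebook's problem.

```python
import numpy as np

# Illustrative transient block (made up): each row sums to less than 1,
# so some probability leaks to the absorbing states at every step.
Q_demo = np.array([[0.2, 0.3],
                   [0.1, 0.4]])

N_demo = np.linalg.inv(np.eye(2) - Q_demo)

partial_sum = np.zeros_like(Q_demo)
Q_power = np.eye(2)                # Q^0
for k in range(200):
    partial_sum += Q_power         # adds Q^k
    Q_power = Q_power @ Q_demo     # becomes Q^(k+1)

max_gap = np.max(np.abs(partial_sum - N_demo))
print(max_gap)  # largest entrywise gap between the partial sum and the inverse
```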

**Theorem [3.2.4; Chapter 3]**: *$(N)_{i,j}$ is the expected number of times the process is in transient state $j$ given that it starts in transient state $i$.*

This is very useful. Apart from establishing that the expected number of times the process is in any transient state is finite, it also allows us to compute other useful quantities about the process (including what we ultimately want for our research).
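One informal way to see why Theorem 3.2.4 is plausible (a sketch, not the book's full proof): write $X_k$ for the state at step $k$, a notation introduced here. For transient states $i$ and $j$, $(Q^k)_{i,j}$ is the probability of being in $j$ after $k$ steps, so by linearity of expectation

$$
\mathbb{E}\left[\#\{k \ge 0 : X_k = j\} \mid X_0 = i\right]
= \sum_{k=0}^{\infty}{\Pr(X_k = j \mid X_0 = i)}
= \sum_{k=0}^{\infty}{(Q^k)_{i,j}}
= (N)_{i,j}
$$

Note that the count includes step $k = 0$, so the diagonal entries of $N$ are at least 1.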


### Probability of being absorbed by each equilibrium

Finally, we can compute the probability of being absorbed by each equilibrium.

Define the $t \times r$ matrix $B = NR$.

**Theorem [3.2.7; Chapter 3]:** *$(B)_{i,j}$ is the probability that the process is absorbed by absorbing state $j$ given that it started at transient state $i$.*

This is exactly what we wanted. $B$ is the only part of $P_\infty$ that is unknown; the rest is either $0$ or $1$ depending on whether the states are absorbing/transient.
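In block form, using the canonical ordering (transient states first), the limit can therefore be written as:

$$
P_\infty =
\begin{bmatrix}
\mathbf{0} & B\\
\mathbf{0} & I_r
\end{bmatrix}
$$

Since absorption happens with probability 1, each row of $B$ sums to 1.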

With this method, we didn't have to compute any eigenvalues or eigenvectors, and we didn't have to set up and solve a system of equations by hand. We also didn't need to compute a limit (Theorem 3.2.1 saved us from that). We did need to compute one matrix inverse, but otherwise all we needed were submatrices of the transition matrix.
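In floating point, the inverse does not even need to be formed explicitly. A minimal sketch, assuming NumPy: since $B = (I_t - Q)^{-1} R$, it is the solution of $(I_t - Q)B = R$, which `np.linalg.solve` handles directly. The function name and the demo blocks below are hypothetical and not part of the notebook.

```python
import numpy as np

def absorption_probabilities(Q, R):
    """Return B = (I - Q)^{-1} R by solving (I - Q) B = R."""
    Q = np.asarray(Q, dtype=float)
    R = np.asarray(R, dtype=float)
    I_t = np.eye(Q.shape[0])
    return np.linalg.solve(I_t - Q, R)

# Illustrative blocks (made up): 2 transient and 2 absorbing states.
# Each row of [Q | R] sums to 1, as required of a transition matrix.
Q_demo = np.array([[0.2, 0.3],
                   [0.1, 0.4]])
R_demo = np.array([[0.5, 0.0],
                   [0.25, 0.25]])

B_demo = absorption_probabilities(Q_demo, R_demo)
print(B_demo)
print(B_demo.sum(axis=1))  # each row should sum to 1
```

For a handful of states the difference is negligible; solving instead of inverting mainly matters for larger or poorly conditioned chains.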

## Applying to our simple problem

Let us use this method to compute $B$ for the simple problem we studied earlier with the following best-reply graph.

![brgraph](https://i.imgur.com/iDXX9n0.png)

In [1]:
import numpy as np
from sympy import Matrix, nsimplify  # explicit names instead of a wildcard import

from sympy.interactive import printing
from IPython.display import Math, display

This team problem has the following transition matrix for its best-reply process with inertia. 

Note: states are enumerated according to the binary representation of the joint policy (e.g., joint policy ((1,), (0,)) is state 2). States 0 and 3 are the two equilibria/absorbing states (3 being the globally optimal one). States 1 and 2 are transient states.

In [2]:
# define the transition matrix
P_ = np.array([[1.    , 0.    , 0.    , 0.    ],
               [0.0625, 0.1875, 0.1875, 0.5625],
               [0.5625, 0.1875, 0.1875, 0.0625],
               [0.    , 0.    , 0.    , 1.    ]])
P = Matrix(P_).applyfunc(nsimplify)
display(Math(f'P = {printing.default_latex(P)}'))

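$$
P = \begin{bmatrix}
1 & 0 & 0 & 0\\
\frac{1}{16} & \frac{3}{16} & \frac{3}{16} & \frac{9}{16}\\
\frac{9}{16} & \frac{3}{16} & \frac{3}{16} & \frac{1}{16}\\
0 & 0 & 0 & 1
\end{bmatrix}
$$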

We now put this in canonical form. First, find the $Q$ and $R$ submatrices.

In [3]:
abs_states = [0, 3]
trans_states = [1, 2]

In [4]:
Q_ = P_[trans_states][:, trans_states]
Q = Matrix(Q_).applyfunc(nsimplify)
display(Math(f'Q = {printing.default_latex(Q)}'))

$$
Q = \begin{bmatrix}
\frac{3}{16} & \frac{3}{16}\\
\frac{3}{16} & \frac{3}{16}
\end{bmatrix}
$$

In [5]:
R_ = P_[trans_states][:, abs_states]
R = Matrix(R_).applyfunc(nsimplify)
display(Math(f'R = {printing.default_latex(R)}'))

$$
R = \begin{bmatrix}
\frac{1}{16} & \frac{9}{16}\\
\frac{9}{16} & \frac{1}{16}
\end{bmatrix}
$$

Now we get the transition matrix in canonical form by simply reordering the indices of the states.

In [6]:
reorder = [1, 2, 0, 3]
P_ro_ = P_[reorder][:, reorder]
P_ro = Matrix(P_ro_).applyfunc(nsimplify)
can_P = '\\begin{bmatrix} Q & R\\\\ \\mathbf{0} & I_r \\end{bmatrix}'
display(Math(f'{can_P} = {printing.default_latex(P_ro)}'))

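$$
\begin{bmatrix}
Q & R\\
\mathbf{0} & I_r
\end{bmatrix}
= \begin{bmatrix}
\frac{3}{16} & \frac{3}{16} & \frac{1}{16} & \frac{9}{16}\\
\frac{3}{16} & \frac{3}{16} & \frac{9}{16} & \frac{1}{16}\\
0 & 0 & 1 & 0\\
0 & 0 & 0 & 1
\end{bmatrix}
$$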

Now, we compute the fundamental matrix $N = (I_t - Q)^{-1}$

In [7]:
I_t_ = np.eye(len(trans_states))
N_ = np.linalg.inv(I_t_ - Q_)
N = Matrix(N_).applyfunc(nsimplify)
display(Math(f'N = {printing.default_latex(N)}'))

$$
N = \begin{bmatrix}
\frac{13}{10} & \frac{3}{10}\\
\frac{3}{10} & \frac{13}{10}
\end{bmatrix}
$$
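The cell above inverts in floating point and then recovers fractions with `nsimplify`. As an alternative sketch (not the notebook's original approach), the same matrix can be computed in exact rational arithmetic with SymPy, which avoids relying on `nsimplify` to guess the intended fraction. The names `Q_exact` and `N_exact` are introduced here for illustration.

```python
from sympy import Matrix, Rational, eye

# Transient block of the example, entered as exact fractions (3/16 = 0.1875)
Q_exact = Matrix([[Rational(3, 16), Rational(3, 16)],
                  [Rational(3, 16), Rational(3, 16)]])

# Fundamental matrix N = (I_t - Q)^{-1}, computed without floating point
N_exact = (eye(2) - Q_exact).inv()
print(N_exact)
```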

Finally, we can compute the absorption probability matrix $B = NR$. Its $(i, j)$ entry gives the probability of ending at the $j$-th of the $r$ absorbing states, given that we start at the $i$-th of the $t$ transient states.

In [8]:
B_ = N_ @ R_
B = Matrix(B_).applyfunc(nsimplify)
display(Math(f'B = {printing.default_latex(B)}'))

$$
B = \begin{bmatrix}
\frac{1}{4} & \frac{3}{4}\\
\frac{3}{4} & \frac{1}{4}
\end{bmatrix}
$$

This matches the results from our earlier analysis via eigendecomposition, which were confirmed via simulation.
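For readers who want to repeat such a simulation check, here is a minimal Monte Carlo sketch. It is not the earlier simulation code, which is not shown here; the function name, seed, and number of runs are arbitrary choices made for this illustration.

```python
import numpy as np

# Transition matrix of the example, in the original state ordering 0..3
P_sim = np.array([[1.    , 0.    , 0.    , 0.    ],
                  [0.0625, 0.1875, 0.1875, 0.5625],
                  [0.5625, 0.1875, 0.1875, 0.0625],
                  [0.    , 0.    , 0.    , 1.    ]])

def simulate_absorption(P, start, absorbing, rng, max_steps=10_000):
    """Run the chain from `start` until it reaches a state in `absorbing`."""
    state = start
    for _ in range(max_steps):
        if state in absorbing:
            return state
        state = int(rng.choice(len(P), p=P[state]))
    raise RuntimeError("chain was not absorbed within max_steps")

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
n_runs = 10_000
hits = sum(simulate_absorption(P_sim, 1, {0, 3}, rng) == 3 for _ in range(n_runs))
estimate = hits / n_runs
print(estimate)  # empirical frequency of ending in state 3 when starting from state 1
```

The estimate is the fraction of runs from state 1 that end in state 3, so it is the empirical counterpart of the corresponding entry of $B$.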

It tells us, for example, that the probability of converging to the globally optimal equilibrium (2nd absorbing state) given that we started at the policy ((0,), (1,)) (the 1st transient state) is $3/4$.
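As a brute-force sanity check (not part of the method itself), one can also raise the transition matrix to a high power and inspect the result directly. The exponent 100 is an arbitrary choice that is large enough here for the transient columns to be numerically zero.

```python
import numpy as np

# Transition matrix of the example, in the original state ordering 0..3
P_check = np.array([[1.    , 0.    , 0.    , 0.    ],
                    [0.0625, 0.1875, 0.1875, 0.5625],
                    [0.5625, 0.1875, 0.1875, 0.0625],
                    [0.    , 0.    , 0.    , 1.    ]])

P_100 = np.linalg.matrix_power(P_check, 100)
print(np.round(P_100, 6))
```

In this original ordering, the rows for states 1 and 2 carry the absorption probabilities in columns 0 and 3, while columns 1 and 2 (the transient states) are zero.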