# Succinct Explanation of Issue

The end goal is to calculate $P(\tau_{e,i} < \tau_{g,i})$, where, as before:

$\tau_{e,i} := \min\{t>0 : \pi_t=\pi_e\}$ given $\pi_0 = i$

The approach Serdar suggested is to use conditional expectations of the hitting times to calculate $P(\tau_{e,i} < \tau_{g,i})$. Via a different approach (using $\lim_{t\to\infty}P^t$), as well as simulation, we know that for $i=1$, $P(\tau_{e,1} < \tau_{g,1}) = 1/4$ and $P(\tau_{g,1} < \tau_{e,1}) = 3/4$ (for $i=2$ those probabilities are switched, but let's focus on $i=1$ for now). What I am trying to do is recover these values using Serdar's approach with expected hitting times.

This is the calculation to get to $P(\tau_{e,i} < \tau_{g,i})$. By the law of total expectation, we have:
$$E[\min(\tau_{e,i}, \tau_{g,i})] = P(\tau_{e,i} < \tau_{g,i}) E[\min(\tau_{e,i},\tau_{g,i}) | \tau_{e,i} < \tau_{g,i}]
	+ P(\tau_{g,i} < \tau_{e,i}) E[\min(\tau_{e,i},\tau_{g,i}) | \tau_{g,i} < \tau_{e,i}]$$
    
Using the fact that $P(\tau_{g,i} < \tau_{e,i}) = 1 - P(\tau_{e,i} < \tau_{g,i})$ and rearranging, we get:

$$P(\tau_{e,i} < \tau_{g,i}) = \frac{E[\min(\tau_{e,i},\tau_{g,i})] - E[\tau_{g,i} | \tau_{g,i}<\tau_{e,i}]}{E[\tau_{e,i} | \tau_{e,i} < \tau_{g,i}] - E[\tau_{g,i} | \tau_{g,i} < \tau_{e,i}]}$$
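For completeness, the rearrangement can be spelled out. The shorthand $p$, $A$, $B$, $M$ is introduced here only for this step: $p = P(\tau_{e,i} < \tau_{g,i})$, $A = E[\tau_{e,i} | \tau_{e,i} < \tau_{g,i}]$, $B = E[\tau_{g,i} | \tau_{g,i} < \tau_{e,i}]$, and $M = E[\min(\tau_{e,i},\tau_{g,i})]$. On the event $\tau_{e,i} < \tau_{g,i}$ the minimum equals $\tau_{e,i}$, and on the complementary event it equals $\tau_{g,i}$, so (assuming $A \neq B$):

$$
M = pA + (1-p)B = B + p(A - B)
\quad\Longrightarrow\quad
p = \frac{M - B}{A - B}
$$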

So, if we know $E[\min(\tau_{e,i},\tau_{g,i})]$, $E[\tau_{g,i} | \tau_{g,i}<\tau_{e,i}]$, and $E[\tau_{e,i} | \tau_{e,i} < \tau_{g,i}]$, we can calculate $P(\tau_{e,i} < \tau_{g,i})$. This is what we want. The idea was that we can compute those expectations by the law of total expectation conditioning on the first transition. 

Using this method, you and I got the same answers. $E[\tau_{e,1} | \tau_{e,1} < \tau_{g,1}] = 43/13$ and $E[\tau_{e,2} | \tau_{e,2} < \tau_{g,2}] = 27/13$. By the same method, we get $E[\tau_{g,1} | \tau_{g,1} < \tau_{e,1}] = 27/13$ and $E[\tau_{g,2} | \tau_{g,2} < \tau_{e,2}] = 43/13$ (makes sense by the symmetry of the problem). But my claim is that those answers are somehow "wrong". 

We have that $E[\min(\tau_{e,i}, \tau_{g,i})] = 8/5$ for both $i=1,2$. This calculation is not the issue; I'm pretty confident it is correct.
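Conditioning on the first transition turns $E[\min(\tau_{e,i}, \tau_{g,i})]$ into a small linear system. The sketch below shows the idea on a hypothetical 4-state toy chain (`P_toy`, with states 0 and 1 transient and states 2 and 3 absorbing). It is not the chain from this note, so its numbers are not expected to match $8/5$.

```python
import numpy as np

# Hypothetical toy chain, for illustration only (NOT the chain from this note).
# States 0 and 1 are transient; states 2 and 3 play the role of the two equilibria.
P_toy = np.array([
    [0.2, 0.3, 0.4, 0.1],
    [0.3, 0.2, 0.1, 0.4],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])
transient = [0, 1]

# Q holds the transition probabilities among transient states only.
Q = P_toy[np.ix_(transient, transient)]

# First-step analysis: m = 1 + Q m, i.e. (I - Q) m = 1, where m[i] is the
# expected number of steps until either absorbing state is hit from state i.
expected_min = np.linalg.solve(np.eye(len(transient)) - Q, np.ones(len(transient)))
print(expected_min)
```

Solving the system directly avoids forming the inverse of $I - Q$ explicitly, which is usually the more stable choice.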

Putting this together, for $i=1$ we get:

$$
P(\tau_{e,1} < \tau_{g,1}) = \frac{E[\min(\tau_{e,1},\tau_{g,1})] - E[\tau_{g,1} | \tau_{g,1}<\tau_{e,1}]}{E[\tau_{e,1} | \tau_{e,1} < \tau_{g,1}] - E[\tau_{g,1} | \tau_{g,1} < \tau_{e,1}]}
= \frac{(8/5)-(27/13)}{(43/13)-(27/13)} = -\frac{31}{80}
$$

Obviously this doesn't make sense. This is why I'm concluding that the expectations we're computing actually correspond to something else (a different condition).
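The arithmetic can be double-checked with exact fractions. This is only a sanity check on the numbers quoted above; the helper name `prob_e_first` is made up for this sketch.

```python
from fractions import Fraction

def prob_e_first(e_min, e_tau_e, e_tau_g):
    """Solve E[min] = p * E[tau_e | e first] + (1 - p) * E[tau_g | g first] for p."""
    return (e_min - e_tau_g) / (e_tau_e - e_tau_g)

# Values quoted in this note for i = 1.
p = prob_e_first(Fraction(8, 5), Fraction(43, 13), Fraction(27, 13))
print(p)            # -31/80
print(0 <= p <= 1)  # False: not a valid probability
```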

Like you, I also initially confirmed my answers with simulation. But, now I believe that simulation is also "wrong" / corresponds to a different condition.

The two simulations below both give the answers we get via the iterated expectation calculations.

The first one uses the original transition matrix (not reduced) and samples the next state from it. If the next state is the other equilibrium, it samples again until it gets a valid state. It does this until it reaches the correct equilibrium.

The second one uses the reduced transition matrix directly (where the other equilibrium is removed and the remaining rows are renormalized) to sample the next state until the equilibrium is reached.

At the end, each simulation returns the number of steps it took to reach equilibrium and we average over many simulations.

In [None]:
# simulations that give the "wrong" answer
import numpy as np

def sim1(P, x_0, n_x, x_end):
    """Full matrix P; transitions into the excluded equilibrium n_x are resampled."""
    x_t = x_0
    t = 0

    while True:
        # if equilibrium reached, return hitting time
        if x_t == x_end:
            return t

        # sample next state, rejecting n_x until a valid state is drawn
        while True:
            x_next = np.random.choice(P.shape[0], p=P[x_t])
            if x_next != n_x:
                x_t = x_next
                break
        t += 1

def sim2(P_ng, x_0, x_end):
    """Reduced matrix P_ng (other equilibrium removed, rows renormalized)."""
    x_t = x_0
    t = 0

    while True:
        # if equilibrium reached, return hitting time
        if x_t == x_end:
            return t

        # sample next state from the reduced transition matrix
        x_t = np.random.choice(P_ng.shape[0], p=P_ng[x_t])
        t += 1

Both of these simulations give the same answers we got (i.e., $E[\tau_{e,1} | \tau_{e,1} < \tau_{g,1}] = 43/13$, etc.). But as we saw above, these answers must be wrong.
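The construction of the reduced matrix `P_ng` is not shown in this note. A minimal sketch of how it might be built is given below, assuming "reduced" means deleting the other equilibrium's row and column and rescaling each remaining row to sum to 1. The helper `reduce_matrix` and the chain `P_toy` are hypothetical and used only for illustration.

```python
import numpy as np

def reduce_matrix(P, drop):
    """Delete state `drop` (row and column), then rescale each row to sum to 1."""
    kept = np.delete(np.delete(P, drop, axis=0), drop, axis=1)
    return kept / kept.sum(axis=1, keepdims=True)

# Hypothetical toy chain (NOT the chain from this note): states 2 and 3 are absorbing.
P_toy = np.array([
    [0.2, 0.3, 0.4, 0.1],
    [0.3, 0.2, 0.1, 0.4],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])

# Remove state 3 and renormalize the remaining rows.
P_ng_toy = reduce_matrix(P_toy, drop=3)
print(P_ng_toy)
```

Two things are worth keeping in mind. States that come after `drop` shift down by one index in the reduced matrix, so `x_0` and `x_end` have to be given in the reduced indexing. Also, the resample-until-valid loop in `sim1` draws from exactly this renormalized row, which would explain why the two simulations agree.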

There is a different simulation I found that does give the correct answer. It simulates each episode to completion using the full transition matrix, regardless of which equilibrium is reached. It then returns which equilibrium was reached and the time it took to get there. To compute each conditional expectation, you keep only the runs in which the relevant equilibrium was reached and average the hitting times over those runs.

In [None]:
import numpy as np

def sim_correct(P, x_0, x_end):
    """Run the full chain until any state in x_end (a collection of equilibria) is hit."""
    x_t = x_0  # start at given initial state
    t = 0

    while True:
        # if one of the equilibria is reached
        if x_t in x_end:
            return [x_t, t]  # terminating equilibrium and stopping time

        # sample next state from the full transition matrix
        x_t = np.random.choice(P.shape[0], p=P[x_t])
        t += 1

This simulation gives the following answers (rounded to 1 decimal place):

$E[\tau_{e,1} | \tau_{e,1} < \tau_{g,1}] \approx 2.2$, $E[\tau_{g,1} | \tau_{g,1} < \tau_{e,1}] \approx 1.4$

Together with $E[\min(\tau_{e,i}, \tau_{g,i})] = 8/5$, this gives:

$$
P(\tau_{e,1} < \tau_{g,1}) = \frac{E[\min(\tau_{e,1},\tau_{g,1})] - E[\tau_{g,1} | \tau_{g,1}<\tau_{e,1}]}{E[\tau_{e,1} | \tau_{e,1} < \tau_{g,1}] - E[\tau_{g,1} | \tau_{g,1} < \tau_{e,1}]}
\approx \frac{(8/5)-1.4}{2.2-1.4} = \frac{1}{4}
$$

This matches the known value $P(\tau_{e,1} < \tau_{g,1}) = 1/4$.
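The filter-then-average step used with `sim_correct` can be sketched as follows. Everything here is illustrative: the chain `P_toy` is a hypothetical toy example rather than the chain from this note, and `run_episode` and `conditional_stats` are names made up for the sketch.

```python
import numpy as np

def run_episode(P, x_0, x_end, rng):
    """Run the full chain from x_0 until any state in x_end is reached."""
    x_t, t = x_0, 0
    while x_t not in x_end:
        x_t = int(rng.choice(P.shape[0], p=P[x_t]))
        t += 1
    return x_t, t

def conditional_stats(results, target):
    """Return (fraction of runs ending in target, mean hitting time of those runs)."""
    times = [t for x, t in results if x == target]
    return len(times) / len(results), float(np.mean(times))

# Hypothetical toy chain (NOT the chain from this note): states 2 and 3 are absorbing.
P_toy = np.array([
    [0.2, 0.3, 0.4, 0.1],
    [0.3, 0.2, 0.1, 0.4],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])

rng = np.random.default_rng(0)
results = [run_episode(P_toy, 0, (2, 3), rng) for _ in range(10_000)]

p_e, mean_e = conditional_stats(results, 2)  # runs that ended in state 2
p_g, mean_g = conditional_stats(results, 3)  # runs that ended in state 3
mean_min = float(np.mean([t for _, t in results]))

# Because every run ends in exactly one of the two states, the law of total
# expectation holds on the sample itself (up to floating-point rounding).
print(p_e * mean_e + p_g * mean_g, mean_min)
```

Because the conditional averages are taken over complete, unmodified runs, they satisfy the decomposition of $E[\min(\tau_{e,i}, \tau_{g,i})]$ by construction, which is the property the reduced-matrix simulations lack.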