Your author only focused on a portion of the problems in this chapter due to significant overlap with problems and concepts in other texts covering renewal theory (e.g. Feller volume 1, Gallager and the associated MIT OCW Discrete Stochastic Processes course -- approximately 1/4 of that course is focused on renewal processes -- and Ross's Stochastic Processes).  

While there are some very good items in the chapter, on the whole the chapter felt rather rushed and a bit lopsided.  Selected problems, as well as expansions on items from the main chapter, have been included below.  

Because simulations seem to tie in especially well with queueing problems, your author has dropped in code as a 'companion' to several of the problems in this chapter.  


In [1]:
import numpy as np
import numba
import sympy as sp

In [2]:
np.exp(-0.5)

0.60653065971263342

**remark on Renewal Rewards process:**  

the section beginning on page 174 tells us that 

$\frac{R(t)}{t} \longrightarrow_{as} \frac{E[R_1]}{E[X_1]}\longleftarrow \frac{E[R(t)]}{t} $   

The key building blocks for this are the SLLN (for LHS) and Elementary Renewal Theorem, which relies on Wald's Equation plus truncation (for the RHS).  

among these, the typical proof (note that martingale methods can get this result and sidestep the below) has a final step which uses the fact that, for a positive integer valued random variable, 

$E\big[N\big] = \sum_{k=0}^\infty Pr\{N\gt k\}$  
which was proven on page 28, by expanding the table containing these probabilities and summing the table two different ways (the fact that probabilities are non-negative means such a summation converges absolutely or not at all).  

There is a deeper point: for any non-negative random variable $X$ we have   
$E\big[X\big] = \int_{0}^\infty Pr\{X\gt x\}dx$  

Typically this result is proven via integration by parts, or by 'opening up' the CDF into a double integral and then justifying the interchange of integration.  It is worth remarking that the result is implied by the key theorem for Renewal Rewards.  

In particular consider non-negative random variables $X_i$ (with finite mean) as the interarrival times in a renewal process.  And suppose we assign a constant reward, at rate 1, at each instant during a renewal period.  

Then   
$\frac{1}{t}\int_0^t 1 dx = 1 = \frac{R(t)}{t} \longrightarrow_{as} \frac{E[R_1]}{E[X_1]}= \frac{E[R_1]}{\mu}$  
(i.e. in the limit of $t$)  

so, clearing the denominator,   
$E\big[R_1\big] = \mu$    

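but, since the reward accrues at rate 1 at time $x$ into the period exactly when $X \gt x$, exchanging expectation and integration (everything is non-negative) gives  
$E\big[R_1\big] = E\Big[\int_{0}^\infty \mathbb I_{\{X \gt x\}}dx\Big] = \int_{0}^\infty Pr\{X\gt x\}dx$ 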

hence 

$ E\big[X\big]  = \mu =\int_{0}^\infty Pr\{X\gt x\}dx$ 
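As a small numerical companion to the identity above, here is a sketch that checks it on an empirical distribution: for any sample of non-negative values, integrating the empirical survival function over $[0,\infty)$ reproduces the sample mean exactly. The helper name `integrate_empirical_survival` and the Gamma test distribution are illustrative assumptions, not something taken from the text.

```python
import numpy as np

def integrate_empirical_survival(samples):
    """Integrate x -> (fraction of samples > x) over [0, inf).

    The empirical survival function is a step function, so the integral is a
    finite sum of (step width) * (step height).
    """
    xs = np.sort(np.asarray(samples, dtype=float))
    n = xs.shape[0]
    # widths of the intervals [0, x_(1)], [x_(1), x_(2)], ...
    widths = np.diff(np.concatenate(([0.0], xs)))
    # on the j-th interval (0-indexed), n - j samples are still larger than x
    heights = (n - np.arange(n)) / n
    return float(np.sum(widths * heights))

rng = np.random.default_rng(0)
# assumption: any non-negative law works; Gamma(2, 1.5) has mean 3
samples = rng.gamma(shape=2.0, scale=1.5, size=10_000)
tail_integral = integrate_empirical_survival(samples)
sample_mean = float(samples.mean())
print(tail_integral, sample_mean)
```

The two printed numbers agree up to floating point error, because the sum telescopes to the sample mean.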

- - - - 
in certain problems in this chapter, e.g. problem 7, we have a constant cost assigned during a distinct 'phase' of the renewal process, so e.g. after the first part fails, taking advantage of memorylessness of exponential random variables, we can see that the expected cost with only component $i$ is given by the time averaged 'reward' of 

$c_i = \frac{E[R(t)^{(i)}]}{t} \longrightarrow \frac{E[R_1^{(i)}]}{E[X_1^{(i)}]}$   
or  
$c_i \cdot E\big[X_1^{(i)}\big] = E\big[R_1^{(i)}\big]= \int_{0}^\infty c_i \cdot Pr\{X^{(i)}\gt x\}dx = c_i\Big(\int_{0}^\infty Pr\{X^{(i)}\gt x\}dx\Big)= c_i \cdot E\big[X_1^{(i)}\big]$  

and in general if the renewal distribution variable tends to a limit (e.g. no periodicity concerns), we should not be surprised by the above, because recalling e.g. from Feller, the limiting distribution is given by 

$\text{limiting CDF} = \frac{1}{E[X^{(i)}]} \int_{0}^t Pr\{X^{(i)}\gt x\}dx$, so   
$\frac{Pr\{X^{(i)}\gt x\}}{E[X^{(i)}]}$  
is the limiting probability measure, and taking the expectation of the intra-period cost with this limiting distribution / measure then shows up in our time averaged value (which is approximately a Cesaro mean).  Interpreting this limiting probability distribution can be challenging depending on viewpoint.  If we think of it in the same way as the steady state distribution of (homogeneous) markov chains, it is very straightforward.    



**Remark on the theory of runs:**  

While this is a fully worked problem in the text, it seems worth doing a short writeup on it here, because the approach is an extremely pleasant way of using the renewal rewards theorem.  Note: your author still slightly prefers the renewal argument given in Feller -- see 'Feller_chp13_notes.ipynb' -- however this is a close second and nicely illuminates the power of setting up a problem in terms of renewal rewards.  It's worth remarking that the two approaches are structurally almost identical.  In each case we have the exact same raw event probability on one side and a fraction on the right hand side with $\bar{X_1}$ in the denominator and the same finite series of probabilities in the numerator.  The difference is that the Feller approach has this RHS as 'step 2', i.e. after passing a limit to a renewal equation and then relying on the main renewal limit theorem (Feller Erdos Pollard), which makes each renewal probability tend to $\frac{1}{\bar{X}}$ -- prior to passing the limit, we could work directly with the equation and e.g. tease out a generating function, which is what Feller does; no concept of renewal rewards is needed here.  On the other hand the Ross Pekoz approach does not need the machinery of Feller Erdos Pollard (though it is proven in this book via markov chain developments) and it also does not need to discuss the potential nuisances of periodic behavior that could 'mess up' that limit, since the renewal rewards approach here is a 'mere' time average.  Hence the calculations are in some ways almost identical here, yet they offer somewhat complementary frameworks and insights.      

The setup is to "count" something two different ways.  In particular, we can recall that a reward may only depend on events in the current epoch, but we have free rein to come up with any rewards scheme we are interested in.  Furthermore as a bit of book-keeping we can consider the initial epoch as $X_0$ and think of this as a delayed reward process.  For all subsequent epochs, we know that a renewal occurred because a k-length pattern occurred.  For a worked example consider the pattern $HHTHHH$ (which overlaps with itself in a single $H$ and in $HH$) where probability of heads is $p \in (0,1)$ and tails is $q = 1-p$.  (For avoidance of doubt re convergence:  we can model this problem as an irreducible finite state markov chain, so there are no transient states / we know that a renewal/ new cycle occurs with probability one and $E\Big[\big \vert X_i\big \vert \Big] = E\Big[X_i\Big] \lt \infty$ -- or we can make a stochastically larger argument involving the geometric distribution or using the simple Renewal Age matrix to get to the same result.)   

So we have the typical renewal process of getting a cold start with a new epoch every time the pattern comes up.  However, we are free to select any reward we like.  And a well chosen reward would be to give a reward of 1 every time $HHTHHH$ is seen by some 3rd party observer irrespective of epochs, overlaps, etc.  Since overlaps are not a concern for the 3rd party observer, after $t\geq k = 6$ coin tosses there are $t-k+1$ windows of length $k$, each matching the pattern with probability $p^5q$, so applying linearity of expectations gives 

$\text{expected reward} = (t-k+1)p^5q$   

and 

$\text{time averaged expected reward} = \lim_{t \to \infty}\frac{(t-k+1)}{t}p^5q = p^5q $  

but we also know that 

$p^5q = \text{time averaged expected reward}  = \lim_{t \to \infty}\frac{E[R(t)]}{t} = \frac{E[R_1]}{E[X_1]}$   

now to belabor the point: $E[R_1] = 1 +\text{something}$  
because with probability one a renewal happens in each epoch and hence the reward is at least 1.  However there is something extra, because if the sequence of first tosses, to start the epoch is $THHH$, then we *know* the last epoch ended as $HHTHHH$ i.e. ended with $HH$, giving raw string of $HHTHHH$ which is our pattern and hence merits a reward of $1$ with probability of $qp^3 = P(A_4^{(n)})$ for this sequence occurring at the beginning of our epoch.  Further we also can say that if $HTHHH$ occurs at the beginning of the epoch, then we know that the prior epoch ended with $H$ and hence there is a combined string of $HHTHHH$ which gets a reward of 1 with probability $qp^4 = P(A_5^{(n)})$.   The fact that these two 'mini patterns' are dependent may throw the reader off but in typical form we can just write this as, for $n\geq 1$, we have a reward random variable $R_n$ for each epoch given by  

$R_n = \mathbb I_{A_4^{(n)}} +  \mathbb I_{A_5^{(n)}} + 1$  

which is our reward function, defined as a random variable, for epoch $n$ (where $A_4^{(n)}$ and $A_5^{(n)}$ denote the events of the above 4 and 5 'minipatterns' occurring at the beginning of the nth epoch respectively).  Now taking expectations gives 

$E\big[R_n\big] = E\big[\mathbb I_{A_4} +  \mathbb I_{A_5} + 1\big] = E\big[\mathbb I_{A_4}\big] +  E\big[\mathbb I_{A_5}\big] + E\big[1\big] = qp^3 + qp^4 + 1$   


putting this all together gives us 

$p^5q = \lim_{t \to \infty}\frac{E[R(t)]}{t} = \frac{E[R_1]}{E[X_1]} = \frac{qp^3 + qp^4 + 1}{E[X_1]}$   

and hence  

$E\big[X_1\big]= \frac{qp^3 + qp^4 + 1}{p^5q }$    
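As a cross-check on the closed form above, the following sketch computes the expected waiting time for $HHTHHH$ from a cold start by a different route: it builds the usual 'longest matched prefix' chain for the pattern and solves the first-step linear system. The function name `expected_wait_for_pattern` and the choice of a linear solve are illustrative assumptions; this is not the method used in the text.

```python
import numpy as np

def expected_wait_for_pattern(pattern, p):
    """Expected number of tosses until `pattern` first appears, with P(H) = p.

    State j = number of leading pattern characters currently matched.
    """
    k = len(pattern)
    probs = {"H": p, "T": 1.0 - p}

    def next_state(matched, toss):
        # longest suffix of (matched prefix + toss) that is a prefix of pattern
        s = pattern[:matched] + toss
        while s and not pattern.startswith(s):
            s = s[1:]
        return len(s)

    # first-step analysis: E[j] = 1 + sum_toss P(toss) * E[next_state], E[k] = 0
    M = np.eye(k)
    b = np.ones(k)
    for j in range(k):
        for toss, pr in probs.items():
            nxt = next_state(j, toss)
            if nxt < k:
                M[j, nxt] -= pr
    return float(np.linalg.solve(M, b)[0])

p = 0.5
q = 1 - p
formula = (1 + q * p**3 + q * p**4) / (p**5 * q)
exact = expected_wait_for_pattern("HHTHHH", p)
print(exact, formula)
```

For a fair coin both routes give $2^6 + 2^2 + 2^1 = 70$ tosses.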


**The key ideas are**   

(i) compute something two different ways, one of which is easy (i.e. use linearity of expectations on the ignoring overlap case)

(ii) set up the reward function to map to the easy to compute thing

(iii) recognize that while the reward function would appear to depend on events outside the epoch, for $n \geq 1$ we know surely what happened at the end of a prior epoch, and hence we are not making any assumptions or contingencies for events outside the current epoch -- the contingencies are entirely based on what happens (at the beginning) of the epoch and the reward function allows a nice way to bridge the 'cold start' property associated with problems of runs with the raw observation of events / strings of heads, and tails, etc.  



**additional remark:**  
This is a very flexible setup which can accommodate modifications to the runs problem.  For example, we may consider the expected number of rolls until a run of any kind of length $k$ -- say $k = 8$ -- when rolling a die.  

while symmetry isn't technically needed, it does simplify the results.  With $p = \frac{1}{6}$ the probability of each face, we have  

$\text{time averaged expected reward}  = \lim_{t \to \infty}\frac{R(t)}{t} = 6\cdot p^8$  

because these are disjoint events and we apply linearity of expectations   

But   
$E\Big[R_n\big \vert Y_{n-1} = i\Big] = 1 + p + p^2 + ... + p^{7} = \frac{1 - p^{8}}{1-p}$  
which reads as the conditional expected value of the reward in epoch n given that the prior reward was for a run of type $i$  

however this is the same, by symmetry, for all $i$ so 

$E\Big[ R_n \Big]=  E\Big[E\big[R_n\big \vert Y_{n-1} = i\big]\Big] =\frac{1 - p^{8}}{1-p}$  

and as always 

$\text{time averaged expected reward}  = \lim_{t \to \infty}\frac{R(t)}{t} = 6\cdot p^8 = \frac{E\big[R_n\big]}{E\big[X_n\big]}$  

$E\big[X_1\big] = E\big[X_n\big]  = \frac{1 - p^{8}}{6\cdot p^8(1-p)}$  

which is the 'typical' length of the streaks problem, except divided by $6$ (or the number of acceptable outcomes)  
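A quick simulation sketch of this variant is below. Since a run of length $8$ takes about $(6^8-1)/5$ rolls on average, the sketch uses the smaller illustrative value $k=3$, where the same formula gives $43$. The function name, seed, and trial count are assumptions made for the example.

```python
import random

def rolls_until_run(k, rng, sides=6):
    """Roll a fair die until the last k rolls all show the same face."""
    rolls = 0
    run_length = 0
    last_face = None
    while run_length < k:
        face = rng.randrange(sides)
        rolls += 1
        if face == last_face:
            run_length += 1
        else:
            # a different face starts a new run of length one
            run_length = 1
            last_face = face
    return rolls

k = 3  # assumption: small k so the simulation is fast
p = 1 / 6
formula = (1 - p**k) / (6 * p**k * (1 - p))  # equals (6**k - 1) / 5

rng = random.Random(12345)
n_trials = 20_000
estimate = sum(rolls_until_run(k, rng) for _ in range(n_trials)) / n_trials
print(estimate, formula)
```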


# 2.) Derive the Renewal Equation   

with a renewal process with $S_n = X_1 + X_2 + ... + X_n$  
and $t \geq 0$  

using first step analysis / conditional expectations, we condition on $X_1$  

$m(t) = E\Big[N(t)\Big] = E\Big[E\big[N(t)\big \vert X_1\big]\Big]$   

where 
$E\big[N(t)\big \vert X_1\big]$ is a random variable where for $X_1\big(\omega\big)= x_1$  
$E\big[N(t)\big \vert X_1\big] = 1 + m(t-x_1) \text{  if  } x_1 \in [0,t], \text{which has probability measure of }  dF(x_1)$   
$E\big[N(t)\big \vert X_1\big] = 0 \text{ otherwise, i.e. for } x_1 \gt t, \text{which occurs with probability 1-F(t)}$   
Thus 
$m(t) = E\Big[N(t)\Big] = E\Big[E\big[N(t)\big \vert X_1\big]\Big] = \int_0^t \big(1 + m(t-x_1)\big)dF(x_1) = F(t) + \int_0^t  m(t-x_1)dF(x_1)$   

but since the $X_i$ are iid, we can more succinctly write this as 

$m(t) = F(t) + \int_0^t  m(t-x)dF(x)$   

which is the renewal equation  
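To make the renewal equation concrete, here is a minimal numerical sketch that solves it on a uniform grid and compares against the one case where the answer is known in closed form: exponential interarrivals with rate $\lambda$, for which $m(t) = \lambda t$. The trapezoid-style discretization of the Stieltjes integral, the function name `solve_renewal_equation`, and the grid size are all assumptions of this sketch.

```python
import numpy as np

def solve_renewal_equation(cdf, t_max, n_steps):
    """Approximate m(t) = F(t) + int_0^t m(t - x) dF(x) on a uniform grid."""
    h = t_max / n_steps
    grid = h * np.arange(n_steps + 1)
    F = cdf(grid)
    dF = np.diff(F)            # dF[i-1] = F(t_i) - F(t_{i-1})
    m = np.zeros(n_steps + 1)  # assumes F(0) = 0, so m(0) = 0
    for j in range(1, n_steps + 1):
        # x in (t_{i-1}, t_i] contributes roughly
        # 0.5 * (m(t_j - t_i) + m(t_j - t_{i-1})) * dF_i;
        # the i = 1 term contains m(t_j) itself, so it moves to the left side
        known = F[j] + 0.5 * m[j - 1] * dF[0]
        if j >= 2:
            i = np.arange(2, j + 1)
            known += np.sum(0.5 * (m[j - i] + m[j - i + 1]) * dF[i - 1])
        m[j] = known / (1.0 - 0.5 * dF[0])
    return grid, m

rate = 2.0  # illustrative exponential rate
grid, m = solve_renewal_equation(lambda t: 1.0 - np.exp(-rate * t),
                                 t_max=3.0, n_steps=300)
m_end = float(m[-1])
print(m_end, rate * grid[-1])
```

The same solver can be pointed at any interarrival CDF with $F(0)=0$ by swapping the `cdf` argument.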


# 3.) Inspection Paradox 
for a renewal process with iid interarrival times of $X_i$, we have  
$P\big(X_{N(t)+1} \gt x\big) \geq 1 - F(x)$  

which reads that the interarrival time $X_{N(t)+1}$ covering time $t$ is stochastically larger than a typical iid interarrival time, i.e. its complementary CDF dominates $1-F$.  In effect, this is adverse sampling.  Technical nit -- this applies for 'regular' renewal processes, not necessarily at the beginning of a delayed renewal process.  

**proof**  

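Condition on the age of the process at time $t$, i.e. let $a = t - S_{N(t)}$ be the time elapsed since the prior arrival (at which point the process renewed).  Given the age, all we know about the interval in progress is that it is longer than $a$.  For $x \lt a$ the conditional probability that this interval exceeds $x$ is $1 \geq 1 - F(x)$, and for 

$x \in [a,\infty)$  

$P\big(X_{N(t)+1} \gt x \big \vert \text{age} = a\big) = 1 \cdot P\big(X \gt x\big \vert X \gt a \big) \geq P\big(X \gt a\big) \cdot P\big(X \gt x\big \vert X \gt a \big) = P\big(X \gt x\big) =1 - F(x)$   

Since the bound holds for every value of the age, it also holds after averaging over the age.  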
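The inequality is easy to see in simulation. The sketch below assumes exponential interarrival times with rate 1 and an inspection time of $t = 20$ (both illustrative choices), and compares the interval that covers $t$ against the first interarrival time. For this case the covering interval has mean close to $2$, against a typical mean of $1$.

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials, n_draws, t = 20_000, 100, 20.0

# each row is one realisation of Exp(1) interarrival times X_1, X_2, ...
X = rng.exponential(scale=1.0, size=(n_trials, n_draws))
S = np.cumsum(X, axis=1)                 # arrival epochs S_1, S_2, ...
n_t = np.sum(S <= t, axis=1)             # N(t) for each realisation
assert n_t.max() < n_draws               # enough draws to get past time t
covering = X[np.arange(n_trials), n_t]   # X_{N(t)+1}, stored in 0-indexed column N(t)

typical_mean = float(X[:, 0].mean())
covering_mean = float(covering.mean())
x0 = 1.0                                 # compare the two tails at one point
typical_tail = float(np.mean(X[:, 0] > x0))
covering_tail = float(np.mean(covering > x0))
print(typical_mean, covering_mean, typical_tail, covering_tail)
```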




# 5.)  a queueing problem  

A room has $n$ machines, each with an iid exponential lifetime with parameter $m$, and a repairman is called as soon as $k\lt n$ machines break and it takes the repairman $d$ days to arrive -- and he instantly fixes the broken machines on site and the process probabilistically starts over.  

So there are two stages (i.) all machines running and this continues until $k$ break then (ii.) repairman stage  
Thus the renewal process is $(i.) \to (ii.) \to (i.)$.  However, we could just as easily call the renewal process $(ii.) \to (i.) \to (ii.)$ where the very first stage starts in $(i.)$ and hence is a delayed renewal process.  Technically this latter interpretation fits slightly more naturally for question (a) as the repairman is called at the end of each renewal there-- but this is a minor bookkeeping point.  

(a) *question:* How often in the long run does the repairman get called?  
*answer:* This is a little messy linguistically, but your author's read on this is that it is asking for the long-run rate of calls, which needs the expected duration of stage $(i.)$ plus the expected time of stage $(ii.)$.  The latter has length $d$.  The former, by a Poisson embedding argument (with $n-i$ machines running, the next failure arrives at rate $(n-i)m$), has expected time until absorption of 

$=\text{expected time until first arrival} +\text{expected time }\{1 \to 2\} +... + \text{expected time }\{k-1 \to k\} $  
$= \frac{1}{(n-0)m} +\frac{1}{(n-1)m} + ... + \frac{1}{(n-(k-1))m} = \frac{1}{m}\sum_{i=0}^{k-1}\frac{1}{n-i}$  

so the total expected length of a renewal cycle is  
$E\big[X\big] = d + \frac{1}{m}\sum_{i=0}^{k-1}\frac{1}{n-i}$  

hence the long-run / time averaged frequency of the repairman being called is given by 
$\frac{E[R]}{E[X]}=\frac{1}{d + \frac{1}{m}\sum_{i=0}^{k-1}\frac{1}{n-i}} =\frac{m}{md + \sum_{i=0}^{k-1}\frac{1}{n-i}}$  

where the repairman is called (with reward value 1) exactly once per cycle.  


(b) *question:*  What is the distribution of the total number of broken machines the repairman finds when he arrives?  
*answer:* There are several ways to tackle this problem, some quite unpleasant.  An easy and probabilistically satisfying one is as follows: 
at the beginning of (ii) we have, with probability one, $k$ machines broken and at the end we have at most $n$ machines broken.  Making use of memorylessness of exponential arrivals, we get a fresh start on all surviving machines once we've entered (ii.).  

It is convenient to do an affine shift by $k$ such that we consider whether  $0$ or $ 1$ or $2$, or $...$ or  $r$ machines break during the period (where $r=n-k$).  It is a simple translation by $k$ from this to total machines found by the repairman.  

Now whether working directly with the independent exponential distributions, or via a Poisson splitting into $r$ streams argument, we find that at time $d$ when the repairman arrives, there is probability of $p = 1-\exp(-md)$ that a given surviving machine has broken and probability of $1-p$ that it has not broken.  Hence we have a binomial probability of 

$\binom{r}{j}p^j(1-p)^{r-j}$
that $j$ machines break during period $(ii)$, or equivalently, the probability is 
$\binom{r}{j}p^j(1-p)^{r-j}$ that the repairman finds $k + j$ machines broken for $j \in\{0, 1, 2, ..., n-k\}$  


(c) *question:* what fraction of time in the long-run are there more than $k$ broken machines in the room?  
*answer:*  This question in effect re-uses items from $(a)$.  It is convenient to focus on the complementary time per cycle with at most $k$ broken machines in the room, calling its expected value $\bar{Y}$, and noticing that the value we seek is given by $E\big[R\big] = \bar{X} - \bar{Y}$ (i.e. reward at rate one at each instant that more than $k$ machines are broken).  The ultimate answer of course is given by 

$\frac{E[R]}{E[X]} = \frac{\bar{X}-\bar{Y}}{\bar{X}} = 1 - \frac{\bar{Y}}{\bar{X}}$  

to finish this off, recall that stage (i) has expected duration $\frac{1}{m}\sum_{i=0}^{k-1}\frac{1}{n-i}$ (the $i$th term being the expected wait for the next failure while $n-i$ machines are running), so that $\bar{X} = d + \frac{1}{m}\sum_{i=0}^{k-1}\frac{1}{n-i}$ and  
$\bar{Y} = \text{expected value of all of stage (i) } + \text{expected time until minimum of first arrival and d, while in stage (ii)}$    
$= \frac{1}{m}\sum_{i=0}^{k-1}\frac{1}{n-i} + \text{expected time until minimum of first arrival and d, while in stage (ii)}$  

a nice way to calculate $\text{expected time until minimum of first arrival and d, while in stage (ii)}$ involves Poisson embedding and renewal rewards.  At the start of stage (ii) there are $r = n-k$ machines still running, so the first additional failure arrives at rate $rm = (n-k)m$. 

$\text{expected time until minimum of first arrival and d, while in stage (ii)} $  
$= \int_0^d Pr\{\text{1st arrival} \gt x\} dx = \int_0^d \exp(- rm x )dx = \frac{1 - \exp(-rmd)}{rm}$  

- - - - 
note that the complementary value here is  
$\int_0^d Pr\{\text{1st arrival} \leq x\} dx = d-\big(\frac{1 - \exp(-rmd)}{rm}\big)  $  

of course for total expected time of $d$ in stage (ii) 
- - - - 
Putting all this together gives   
$\bar{Y} = \big(\frac{1}{m}\sum_{i=0}^{k-1}\frac{1}{n-i} \big) + \frac{1 - \exp(-rmd)}{rm}$  

$\frac{E[R]}{E[X]} = 1 - \frac{\frac{1}{m}\sum_{i=0}^{k-1}\frac{1}{n-i} + \frac{1 - \exp(-rmd)}{rm}}{ d + \frac{1}{m}\sum_{i=0}^{k-1}\frac{1}{n-i}} = 1 - \frac{ \frac{1 - \exp(-rmd)}{r}+ \sum_{i=0}^{k-1}\frac{1}{n-i}}{ md + \sum_{i=0}^{k-1}\frac{1}{n-i}}$  
$= \frac{md + \sum_{i=0}^{k-1}\frac{1}{n-i}- \frac{1 - \exp(-rmd)}{r}- \sum_{i=0}^{k-1}\frac{1}{n-i}}{ md + \sum_{i=0}^{k-1}\frac{1}{n-i}}$  
$= \frac{md  - \frac{1 - \exp(-rmd)}{r}}{ md + \sum_{i=0}^{k-1}\frac{1}{n-i}}$, where $r = n-k$.  
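In keeping with the simulation 'companions' elsewhere in these notes, here is a sketch that simulates problem 5 cycle by cycle and compares against closed forms for parts (a), (b) and (c): the call rate $\frac{m}{md + \sum_{i=0}^{k-1}\frac{1}{n-i}}$, the mean number found broken $k + (n-k)(1-e^{-md})$, and the fraction of time with more than $k$ broken $\frac{md - (1-e^{-(n-k)md})/(n-k)}{md + \sum_{i=0}^{k-1}\frac{1}{n-i}}$. The function name and the parameter values $n=5$, $k=2$, $m=0.5$, $d=1$ are illustrative assumptions. The simulation relies on memorylessness to treat every machine as fresh at the start of each cycle.

```python
import numpy as np

def simulate_repair_cycles(n, k, m, d, n_cycles, seed=0):
    """Simulate renewal cycles: all n machines start fresh, the repairman is
    called at the k-th failure and arrives d time units later."""
    rng = np.random.default_rng(seed)
    lifetimes = np.sort(rng.exponential(scale=1.0 / m, size=(n_cycles, n)), axis=1)
    call_time = lifetimes[:, k - 1]       # time of the k-th failure
    arrival_time = call_time + d          # end of the cycle
    # time of the (k+1)-th failure, capped at the repairman's arrival
    next_failure = np.minimum(lifetimes[:, k], arrival_time)
    time_more_than_k = arrival_time - next_failure
    found_broken = np.sum(lifetimes <= arrival_time[:, None], axis=1)
    total_time = arrival_time.sum()
    return (n_cycles / total_time,                # calls per unit time
            time_more_than_k.sum() / total_time,  # fraction of time with > k broken
            float(found_broken.mean()))           # mean number found broken

n, k, m, d = 5, 2, 0.5, 1.0  # illustrative parameters, with k < n
r = n - k
harmonic = sum(1.0 / (n - i) for i in range(k))
call_rate_formula = m / (m * d + harmonic)
fraction_formula = (m * d - (1 - np.exp(-r * m * d)) / r) / (m * d + harmonic)
found_formula = k + r * (1 - np.exp(-m * d))

call_rate, fraction, found = simulate_repair_cycles(n, k, m, d, n_cycles=200_000)
print(call_rate, call_rate_formula)
print(fraction, fraction_formula)
print(found, found_formula)
```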




# 6.)  
Each produced item is either defective or acceptable.  (a) Initially each item is inspected and this continues until $k$ consecutive acceptable items are discovered.  At this point the inspection mode changes to (b) inspect items independently at random, each with probability $\alpha$, which goes on until a defective item is found and then the process converts back to the process of $(a)$.  Suppose each item is iid defective with probability $q$ -- what portion of items are inspected?  

**remark:  we'll set this up as a renewal rewards process, though it is worth considering that this problem consists of nesting various Bernoulli processes**  

The renewal process is cold start at $a \to b \to a$ and a renewal occurs immediately upon returning to $a$.  

$\text{portion of items inspected} = \frac{E[R_1]}{E[X_1]}$  

where $E[X_1]$ is expected number of iterations / items in the renewal process, and we get a reward of 1 for each item inspected. 

$E[X_1] = \big(\text{expected iterations from a to b}\big) + \big(\text{expected iterations from b to a}\big)$   

$\text{expected iterations from a to b}= \mu = \frac{1 - (1-q)^k}{q(1-q)^k}$  
*rationale:*  expected time until a run of $k$ acceptables, with success probability of $(1-q)$.  This calculation is covered in this chapter using an interesting form of renewal reward *and* covered in the Martingales Chapter (with a martingale technique for recovering the variance).  However your author's preferred technique for recovering the expected renewal time is still the one that Feller mentions in chp 13 of volume 1 (i.e. partition events and pass limits using key renewal theorem).  

$\text{expected iterations from b to a} = (q\alpha)^{-1}$  
*rationale:*   geometric distribution, indexing at one, that has a probability of stopping / success of $(q\alpha)$  
- - - -
$E[R_1] = \big(\text{expected items inspected in a }\big) + \big(\text{expected items inspected while in b}\big)$   
$\text{expected items inspected in a } = \text{expected iterations from a to b}= \mu$  
*rationale:* all items are inspected in (a)    

$\text{expected items inspected while in b} =  (q)^{-1}$   
*rationale:*  if we only count the items inspected, and we count items up to and including the first defective one, this makes the distribution geometric with 'success' parameter $q$ (indexing at 1).  


$\text{portion of items inspected} = \frac{E[R_1]}{E[X_1]}=\frac{\mu +(q)^{-1}}{\mu + (\alpha q)^{-1}}=\frac{q\mu +1}{q\mu + \alpha^{-1}} = \frac{\alpha(q\mu +1)}{\alpha q\mu + 1} =  \frac{\alpha q\mu +\alpha}{\alpha q\mu + 1} $   


**note: I think there is also a clever way to do this with conditional expectations that I should type up**  



In [3]:
#interlude: some code for problem 6 

@numba.jit(nopython= True)
def my_sim(prob_inspect, prob_defective, k, n_trials):
    # for problem 6
    alpha = prob_inspect
    q = prob_defective
    counter_of_inspected = 0
    total_iterations = 0 # total iterations 
    for _ in range(n_trials):
        # state 0 
        success_run_counter = 0
        while True: 
            my_first_number = np.random.rand()
            total_iterations += 1
            counter_of_inspected += 1 # all items are inspected while in state 0
            if my_first_number <= q: 
                # i.e. defective 
                success_run_counter = 0
            else:
                success_run_counter += 1
            if success_run_counter == k:
                break
        # state 1
        while True: 
            my_first_number = np.random.rand()
            my_second_number = np.random.rand()
            total_iterations += 1
            if my_first_number <= alpha:
                counter_of_inspected += 1
                if my_second_number <= q:
                    break
    return counter_of_inspected / total_iterations 
        

In [4]:
def the_function(q, alpha, k):
    mu = (1 -(1-q)**k)/(q*(1-q)**k)
    print("mu is ", mu)
    numerator = alpha * q * mu + alpha 
    denominator = alpha * q * mu + 1
    return numerator / denominator

In [5]:
q = 0.2
alpha = 0.03
k = 9
sim_results = my_sim(alpha, q, k, n_trials=100000)
# print(sim_results[0]/sim_results[1])
print(sim_results)
print(the_function(q, alpha, k))

0.18728997416365592
mu is  32.25290298461912
0.18727620942436682


# 7
This is a basic Poisson embedding problem   
$p_1 = \text{probability 1 survives } = 1- \frac{\lambda_1 }{\lambda_1 + \lambda_2} = \frac{\lambda_2 }{\lambda_1 + \lambda_2} = \text{aka probability that 2 is the first arrival}$    
$p_2 = \text{probability 2 survives } = \frac{\lambda_1 }{\lambda_1 + \lambda_2}$    

- - - - 
let $A_1$ be the event that $1$ arrives second (and ignore the zero probability event of a tie), and we create the random variables:  
$T_0$ to be the time of the first cycle, $T_1$ be the time for a $1$ to arrive given a fresh start and $T_2$ be the time for a $2$ to arrive given a fresh start.  So, making use of memorylessness, we have  

$\text{expected total cost per cycle} $  
$= K + p_1 \cdot E\Big[(c T_0 + c_1 T_1)\big \vert A_1\Big] + p_2 \cdot E\Big[(c T_0 + c_2 T_2)\big \vert A_1^C\Big]$  
$= K + p_1 c \cdot E\Big[T_0\big \vert A_1\Big] + p_1 c_1\cdot E\Big[T_1\big \vert A_1\Big] + p_2 c\cdot E\Big[T_0\big \vert A_1^C\Big]+ p_2 c_2\cdot E\Big[ T_2\big \vert A_1^C\Big]$  
$= K + \Big( p_1c\cdot E\Big[T_0\big \vert A_1\Big] + p_2c \cdot E\Big[T_0\big \vert A_1^C\Big]\Big) + p_1c_1 \cdot E\Big[T_1\big \vert A_1\Big] + p_2c_2 \cdot E\Big[ T_2\big \vert A_1^C\Big]$  
$= K + c\Big( p_1\cdot E\Big[T_0\big \vert A_1\Big] + p_2 \cdot E\Big[T_0\big \vert A_1^C\Big]\Big) + p_1c_1 \cdot E\Big[T_1\Big] + p_2c_2 \cdot E\Big[ T_2\Big]$  
$= K + c\cdot \Big(E\Big[T_0\Big]\Big) + p_1c_1 \cdot E\Big[T_1\Big] + p_2c_2 \cdot E\Big[ T_2\Big]$  
$= K + c\big(\frac{1}{\lambda_1 + \lambda_2}\big) + p_1c_1 \frac{1}{\lambda_1} +p_2c_2 \frac{1}{\lambda_2} $  
where we justify  
$E\Big[T_1\big \vert A_1\Big] = E\Big[T_1\Big]$  
by memorylessness and   

$E\Big[T_0\Big] = \Big( p_1\cdot E\Big[T_0\big \vert A_1\Big] + p_2 \cdot E\Big[T_0\big \vert A_1^C\Big]\Big)$  
by total expectation 
- - - - 
alternatively we could also note, e.g. that  
$ E\Big[T_0\big \vert A_1\Big] = E\Big[T_0\Big]$  
because $P(A_1) = p_1 = \frac{\lambda_2}{\lambda_1 + \lambda_2}$ for any choice of $t \gt 0$ for the time of the first failure  
This may be verified by examining the PDF of our merged process, and interpreting it as a prior distribution.  Then the event $A_1$ occurs which has a uniform likelihood function for any $t\gt 0$ which means that the posterior distribution is the same as the prior.  Since the distributions are the same, the expected values are the same.  
- - - - 
The expected total time per cycle calculation is the same as the above, except $K=0$ and $c = c_1 = c_2 = 1$.  

$\text{expected total time per cycle}  = \big(\frac{1}{\lambda_1 + \lambda_2}\big) + p_1 \frac{1}{\lambda_1} +p_2 \frac{1}{\lambda_2} $  
- - - 
For a different look at the expected time per cycle, we could use order statistics which gives:  

the second 'arrival' has order statistic $X_{(2)}$ with complementary CDF   
$Pr\big(X_{(2)} \gt x\big) = 1-F_{X_{(2)}}(x) =1 -\big(1- e^{-\lambda_1 x}\big)\big(1- e^{-\lambda_2 x}\big)$  

$E\big[X_{(2)}\big] = \int_{0}^\infty \Big(1 -\big(1- e^{-\lambda_1 x}\big)\big(1- e^{-\lambda_2 x}\big)\Big)dx = \int_{0}^\infty \Big(-e^{-\lambda_1 x - \lambda_2 x} + e^{-\lambda_1 x}+e^{-\lambda_2 x}\Big)dx = \frac{1}{\lambda_1}+\frac{1}{\lambda_2}- \frac{1}{\lambda_1 + \lambda_2} $    
which, after some algebra, agrees with the above, e.g. see here:  

https://www.wolframalpha.com/input/?i=(1+%2Fa+%2B+1%2Fb+-+1%2F(a%2Bb))-(1%2F(a%2Bb)+%2B+b%2F(a%2Bb)*(1%2Fa)+%2B+a%2F(a%2Bb)*(1%2Fb))
- - - 
per renewal rewards theorem, we thus have  
$\text{expected long-run cost per unit time} = \frac{\text{expected total cost per cycle} }{\text{expected total time per cycle} } = \frac{c\big(\frac{1}{\lambda_1 + \lambda_2}\big) + K + c_1 p_1 \frac{1}{\lambda_1} +c_2 p_2 \frac{1}{\lambda_2}}{\frac{1}{\lambda_1}+\frac{1}{\lambda_2}- \frac{1}{\lambda_1 + \lambda_2}}$   
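A simulation sketch for this result is below. It assumes the cost structure read off from the derivation above: cost rate $c$ while both components work, rate $c_i$ while only component $i$ works, and a lump cost $K$ once per cycle. The function name and the numerical parameter values are illustrative assumptions.

```python
import numpy as np

def simulate_cost_rate(lam1, lam2, c, c1, c2, K, n_cycles, seed=0):
    """Long-run cost per unit time, estimated as total cost / total time."""
    rng = np.random.default_rng(seed)
    life1 = rng.exponential(scale=1.0 / lam1, size=n_cycles)
    life2 = rng.exponential(scale=1.0 / lam2, size=n_cycles)
    both_up = np.minimum(life1, life2)   # T_0: time with both components working
    cycle = np.maximum(life1, life2)     # the cycle ends when the survivor fails
    solo = cycle - both_up               # time with a single component working
    solo_rate = np.where(life1 > life2, c1, c2)  # component 1 survives -> rate c1
    cost = K + c * both_up + solo_rate * solo
    return float(cost.sum() / cycle.sum())

lam1, lam2 = 1.0, 2.0            # illustrative failure rates
c, c1, c2, K = 1.0, 2.0, 3.0, 5.0  # illustrative costs
p1 = lam2 / (lam1 + lam2)        # probability that component 1 survives
p2 = lam1 / (lam1 + lam2)
cost_per_cycle = K + c / (lam1 + lam2) + p1 * c1 / lam1 + p2 * c2 / lam2
time_per_cycle = 1 / lam1 + 1 / lam2 - 1 / (lam1 + lam2)
formula = cost_per_cycle / time_per_cycle

estimate = simulate_cost_rate(lam1, lam2, c, c1, c2, K, n_cycles=400_000)
print(estimate, formula)
```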

**tbc**.  and it would be good to tie this in with a theorem or proposition related to renewal rewards in this chapter.  A bi-partition argument would probably do it with the two different failure types... 





# 10 
this problem is of some interest and still an open item  

# 11 

someone rolls a die repeatedly and adds up the numbers:  
what's more likely: the probability that the sum ever hits 2 or the probability that the sum ever hits 102?  
- - - - -

See the two main matrices in 'Feller_chp15_notes.ipynb'. 

The matrix chosen in the chapter in the proof of Feller-Erdos-Pollard (or what the chapter calls lattice case of Blackwell)  is the first one in that notebook -- the renewal matrix corresponding to Age.  Your author's preference is the second matrix, the renewal matrix corresponding to Residual Life.  Either one works here.  Via 'grabbing' the top left component after two iterations, or direct calculation, we see 

$P\big(\text{sum ever hits 2}\big) = \frac{1}{6} + \frac{1}{36} = \frac{7}{36}$  

however this renewal chain has $\mu = \frac{1 + 2 + 3 + 4 + 5 + 6}{6}=  3.5 = \frac{7}{2}$, which gives an asymptotic estimate of 

$P\big(\text{sum ever hits 102}\big) \approx \frac{1}{\mu}  = \frac{2}{7}=\frac{10}{35}\gt \frac{10}{36}\gt \frac{7}{36}$

the exact calculations are in code, below



In [6]:

A = np.zeros((6,6))

for k in range(1,A.shape[0]):
    A[k,k-1]= 1
A[0,:] += 1/6

e_1 = np.zeros(6)
e_1[0] = 1
print(e_1 @ np.linalg.matrix_power(A,2) @ e_1)
print(e_1 @ np.linalg.matrix_power(A,102) @ e_1)
# note this is extremely close to 1/mu = 1/3.5 which is the steady state estimate 

B = np.zeros((6,6))
for k in range(5):
    q = 1/(6-k)
    B[k,0] = q
    B[k,k+1] = 1-q
B[-1,0] = 1
print("vs \n")
print(e_1 @ np.linalg.matrix_power(B,2) @ e_1)
print(e_1 @ np.linalg.matrix_power(B,102) @ e_1)

print("\nvs lr estimate of \n", 1/3.5)

0.194444444444
0.285714285714
vs 

0.194444444444
0.285714285714

vs lr estimate of 
 0.2857142857142857


# 13 

**the end result seems to tie in with example 5.6 on pages 145, 146 which has a very natural tie in with renewal rewards, as well as this particular queueing problem I think... there are more insights to be had in this problem**  

the resulting (jump) chain for $b$ is awfully similar to that used in ex 16 of chapter 5... 

$\lambda := \lambda_1 + \lambda_2 + ... + \lambda_n$   

(a) using the renewal rewards theorem 6.8 from p.174

we see that   
$\frac{R(t)}{t} \to_{as} \frac{E[R_1]}{E[X_1]} = \lim_{t \to \infty}\frac{E[R(t)]}{t}$  

where $R$ is the 'reward' for the server being busy, and $X_1$ is the time of a renewal interval.  For avoidance of doubt, we define the renewal process to be the server is empty, and the time until a first arrival (hence server is busy) and then time until it is empty again -- there is capacity for only one entity to be processed at a time.  By memorylessness of the exponential arrival and processing times, the entire process probabilistically starts over / "renews" after this sequence.  We give a reward only for the second of the two stages of this process.  

$E\big[X_1\big] = \big(\frac{1}{\lambda}\big) +\big(\frac{\lambda_1}{\lambda}\frac{1}{\mu_1} + \frac{\lambda_2}{\lambda}\frac{1}{\mu_2}+... + \frac{\lambda_n}{\lambda}\frac{1}{\mu_n}  \big) = \big(\frac{1}{\lambda}\big) +\big(\frac{1}{\lambda}\sum_{i=1}^n\frac{\lambda_i}{\mu_i}\big) = \frac{1}{\lambda}(1+ \sum_{i=1}^n\frac{\lambda_i}{\mu_i})$  

$E\big[R_1\big] = \big(\frac{1}{\lambda}\sum_{i=1}^n\frac{\lambda_i}{\mu_i}\big)$  

hence the time averaged reward is   
$\lim_{t \to \infty}\frac{E[R(t)]}{t} = \frac{E[R_1]}{E[X_1]}=\frac{\frac{1}{\lambda}\sum_{i=1}^n\frac{\lambda_i}{\mu_i}}{\frac{1}{\lambda}(1+ \sum_{i=1}^n\frac{\lambda_i}{\mu_i})}=\frac{\sum_{i=1}^n\frac{\lambda_i}{\mu_i}}{1+ \sum_{i=1}^n\frac{\lambda_i}{\mu_i}}$  


(b)
note the official problem overloaded $n$ in (b), so we instead call it $X_m$  


in row stochastic form  

$\mathbf A = 
\left[\begin{matrix}
q & \frac{\lambda_1}{\lambda+\mu_1} & \frac{\lambda_2}{\lambda+\mu_2} & \frac{\lambda_3}{\lambda+\mu_3} & \frac{\lambda_4}{\lambda + \mu_4} & \dots & \frac{\lambda_n}{\lambda + \mu_n} \\
\frac{\mu_1}{\lambda+\mu_1} & \frac{\lambda}{\lambda+\mu_1} & 0 & 0 & 0 & \dots & 0\\
\frac{\mu_2}{\lambda+\mu_2} & 0 & \frac{\lambda}{\lambda+\mu_2} & 0 & 0 & \dots & 0\\
\frac{\mu_3}{\lambda+\mu_3} & 0 & 0 & \frac{\lambda}{\lambda+\mu_3} & 0 & \dots & 0\\
\frac{\mu_4}{\lambda+\mu_4} & 0 & 0 & 0 & \frac{\lambda}{\lambda+\mu_4} & \dots & 0\\
\vdots & \vdots & \vdots & \vdots & \vdots & \ddots  & \vdots\\\frac{\mu_n}{\lambda+\mu_n} & 0 & 0 & 0 & 0 & \dots & \frac{\lambda}{\lambda+\mu_n}\end{matrix}\right] = \left[\begin{matrix}q & p_1\frac{\lambda_1}{\lambda} & p_2\frac{\lambda_2}{\lambda} & p_3\frac{\lambda_3}{\lambda} & p_4\frac{\lambda_4}{\lambda} & \dots & p_n\frac{\lambda_n}{\lambda} \\
1-p_1 & p_1 & 0 & 0 & 0 & \dots & 0\\
1-p_2 & 0 & p_2 & 0 & 0 & \dots & 0\\
1-p_3 & 0 & 0 & p_3 & 0 & \dots & 0\\
1-p_4 & 0 & 0 & 0 & p_4 & \dots & 0\\
\vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots\\
1-p_n & 0 & 0 & 0 & 0 & \dots & p_n\end{matrix}\right]$  

where $q = 1 - \sum_{i=1}^n \frac{\lambda_i}{\lambda+\mu_i} = \sum_{i=1}^n \frac{\lambda_i}{\lambda}\frac{\mu_i}{\lambda+\mu_i}$ (using $\sum_{i=1}^n \frac{\lambda_i}{\lambda} = 1$)  

being row stochastic, of course $\mathbf{A1} = \mathbf 1$  

we are to verify that this is in fact a reversible markov chain (note: given the self loops we can be sure that this is aperiodic which is nice and by inspection there is one communicating class)  

we have the test for reversibility -- i.e. the detailed balance equations --  

$\pi_i P_{i,j} = \pi_j P_{j,i}$  

where we have 

$(1-p_1) = P_{1,0}$  
and in general for off diagonal components of the transition matrix    
$(1-p_i) = P_{i,0}$  
$\frac{\lambda_i}{\lambda} p_i = P_{0,i}$ 

fixing $\pi_0:=1$ and normalizing later, 

this implies for $i \in \{1, 2, \dots, n\}$   
$\pi_i (1-p_i) = 1 \cdot \frac{\lambda_i}{\lambda} p_i$  
$\pi_i = \frac{\lambda_i}{\lambda}\frac{p_i}{1-p_i} = \frac{\lambda_i}{\lambda}\frac{\frac{\lambda}{\lambda + \mu_i}}{\frac{\mu_i}{\lambda + \mu_i}} =\frac{\lambda_i}{\lambda}\frac{\lambda}{\mu_i}=\frac{\lambda_i}{\mu_i}$    

and we compute a normalizing constant 

$c^{-1} = 1 + \sum_{i=1}^n \frac{\lambda_i}{\mu_i}$  

hence we have a steady state vector of   

$\mathbf \pi^T = c\cdot\begin{bmatrix}1\\ 
\frac{\lambda_1}{\mu_1}\\ 
\frac{\lambda_2}{\mu_2}\\ 
\vdots\\ 
\frac{\lambda_n}{\mu_n}\\  
\end{bmatrix}^T$

**remark:**   
$1 - \pi_0 = \sum_{i=1}^n \pi_i= \frac{\sum_{i=1}^n\frac{\lambda_i}{\mu_i}}{1+ \sum_{i=1}^n\frac{\lambda_i}{\mu_i}} = \frac{E[R_1]}{E[X_1]}=\lim_{t \to \infty}\frac{E[R(t)]}{t}$  
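
As a quick numerical sanity check on the algebra above, the matrix can be built for some made-up rates and tested against the detailed balance equations. This is only a sketch: the helper names `build_chain` and `claimed_pi` and the demo rates are illustrative assumptions, and it takes $\lambda = \sum_i \lambda_i$.

```python
import numpy as np

def build_chain(lambdas, mus):
    """Row-stochastic matrix from the display above; state 0 = arrival finds the system empty."""
    lam = lambdas.sum()                    # assumption: lambda is the total arrival rate
    p = lam / (lam + mus)                  # p_i = lambda / (lambda + mu_i)
    n = lambdas.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = p * lambdas / lam           # P_{0,i} = p_i * lambda_i / lambda
    A[0, 0] = 1.0 - A[0, 1:].sum()         # q
    A[1:, 0] = 1.0 - p                     # P_{i,0} = 1 - p_i
    idx = np.arange(1, n + 1)
    A[idx, idx] = p                        # self loops P_{i,i} = p_i
    return A

def claimed_pi(lambdas, mus):
    """Candidate steady state: proportional to (1, lambda_1/mu_1, ..., lambda_n/mu_n)."""
    pi = np.concatenate(([1.0], lambdas / mus))
    return pi / pi.sum()

# illustrative rates only, not taken from the problem
lambdas_demo = np.array([1.0, 2.5, 0.5])
mus_demo = np.array([2.0, 1.0, 3.0])
lam_demo = lambdas_demo.sum()

A_demo = build_chain(lambdas_demo, mus_demo)
pi_demo = claimed_pi(lambdas_demo, mus_demo)
flow = pi_demo[:, None] * A_demo           # flow[i, j] = pi_i * P_{i,j}
q_closed = np.sum(lambdas_demo / lam_demo * mus_demo / (lam_demo + mus_demo))

print("rows sum to one: ", np.allclose(A_demo.sum(axis=1), 1.0))
print("detailed balance:", np.allclose(flow, flow.T))
print("pi A = pi:       ", np.allclose(pi_demo @ A_demo, pi_demo))
print("q closed form:   ", np.isclose(A_demo[0, 0], q_closed))
```

Symmetry of `flow` is exactly the detailed balance condition $\pi_i P_{i,j} = \pi_j P_{j,i}$, so one `allclose` call checks every pair of states at once.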


In [20]:
# @numba.jit(nopython = True)
def run_simulation_part_b(lambda_array, mu_array, r_trials):
    # slightly different than typical: r_trials is the number of iterations / new arrivals to count
    # the goal is to get an estimate of the steady state vector  
    n = lambda_array.shape[0]
    counter_array = np.zeros(n + 1)
    cur_state = np.random.randint(1, n + 1)  # high is exclusive, so n + 1 lets any type 1..n be the start
    # starting at one is a bit odd, but call it a delayed renewal setup      
    for _ in range(r_trials):
#         assert(cur_state > 0)
        departures_batch = np.random.exponential(1/mu_array, n) 
        arrivals_batch = np.random.exponential(1/lambda_array, n)         
        # unfortunately these seem to be throwing errors with numba... 
        # https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.random.exponential.html
        # recall that beta, the inverse lambda parameter is used here 
        fastest_arrival_index_shifted_up_one = np.argmin(arrivals_batch) + 1
        if np.min(arrivals_batch) < departures_batch[cur_state - 1]:
#         if arrivals_batch[cur_state - 1] <= departures_batch[cur_state - 1]:
            counter_array[cur_state] += 1
            # cur state doesn't change 
        else:
            # i.e. the system has cleared and is empty when next arrival occurs, and arrival is accepted
            counter_array[0] += 1
            cur_state = fastest_arrival_index_shifted_up_one
    return counter_array/r_trials    


In [21]:
# main simulation for part b
n_arrival_types = 10
lambdas = np.random.randint(2, 30, n_arrival_types)/2
mu_s = np.random.randint(2, 30, n_arrival_types)/2
big_lambda = np.sum(lambdas)

steady_state_vec = np.zeros(n_arrival_types + 1)
steady_state_vec[0] = 1
steady_state_vec[1:] = lambdas / mu_s
steady_state_vec *= 1/np.sum(steady_state_vec)

#normalization 
print(steady_state_vec)

print("now for the simulation")  
sim_result = run_simulation_part_b(lambdas, mu_s, 300000)
sim_result

[ 0.03299941  0.11549794  0.1402475   0.04242781  0.03688169  0.02121391
  0.04949912  0.03712434  0.06599882  0.02911713  0.42899234]
now for the simulation


array([ 0.03348667,  0.12136333,  0.13525   ,  0.04374   ,  0.03901333,
        0.02171333,  0.05072333,  0.03435   ,  0.06188   ,  0.02845667,
        0.43002333])

In [24]:
print(lambdas)
print(mu_s)

[ 14.    8.5  13.5   9.5   9.   13.5   9.    5.    7.5  13. ]
[  4.    2.   10.5   8.5  14.    9.    8.    2.5   8.5   1. ]


In [25]:
if n_arrival_types == 3:
    M = A.subs(l_1, lambdas[0]).subs(l_2, lambdas[1]).subs(l_3, lambdas[2])
    M = M.subs(u_1, mu_s[0]).subs(u_2, mu_s[1]).subs(u_3, mu_s[2])
    M = np.array(M).astype(np.float64)
    ones_v = np.ones(4)
    # np.linalg.matrix_power(M, 1000)
    Q, R = np.linalg.qr(M - np.identity(4))
    steady_state = Q[:,-1] 
    steady_state = steady_state / np.sum(steady_state)
    steady_state
else:
    pass

# 16 
This is of interest but the question, as written, does not feel complete.  An inhomogeneous Poisson process doesn't have stationary increments -- e.g. reference 
http://www.randomservices.org/random/poisson/Nonhomogeneous.html  
or http://www.randomservices.org/random/poisson/Compound.html 
with $p_i(s)$ as the embedded function.  Yet in the much too short treatment of Poissons in this chapter, stationary and independent increments are held out as being key for a Poisson.    


**lemma on Poisson splitting**: 
if we have a Poisson process in which each arrival has (homogeneous) probability $\alpha$ of being accepted -- i.e. a Bernoulli process layered into a Poisson -- then the acceptance 'stream' is a Poisson process with parameter 
$\alpha \lambda$  

To prove this we want 3 things: 1) Poisson distributed, 2) stationary increments and 3) independent increments.  In fact it is enough to prove 2 and 3, as the Poisson process is the only continuous time counting process (with arrivals occurring one at a time) possessing 2 and 3 -- so we then get 1 for free.  
- - - -
note: more machinery is involved in the background but an even easier approach is to recognize that the above process has a renewal function of $m_{\alpha, \lambda}(t) = \alpha \cdot \lambda \cdot t$ which is linear in $t$ (and $t$ varies continuously), which is unique to the Poisson process over all renewal processes.  

in particular a 'regular' Poisson has expected value 

$E\big[N(t)\big] = m(t) = \lambda t = e^{-\lambda t} \sum_{k=0}^\infty \frac{(\lambda t)^k}{k!}\cdot k= e^{-\lambda t} \sum_{k=1}^\infty \frac{(\lambda t)^k}{k!}\cdot k = e^{-\lambda t} \sum_{k=1}^\infty \frac{(\lambda t)^k}{k!}\big(\sum_{i=1}^k 1\big)$  

when we embed coin tossing to determine whether individual arrivals are accepted (with probability $\alpha$) we get 

$E\big[N_\alpha(t)\big] = m_\alpha(t) = e^{-\lambda t} \sum_{k=1}^\infty \frac{(\lambda t)^k}{k!}\big(\sum_{i=1}^k \alpha\big) = \alpha \Big( e^{-\lambda t} \sum_{k=1}^\infty \frac{(\lambda t)^k}{k!}\big(\sum_{i=1}^k 1\big)\Big) = \alpha \Big(\lambda t\Big)$  

or equivalently, where $J$ is Poisson distributed and independent of the iid $\mathbb I_j$, we have 

$m_\alpha(t) = \alpha \cdot \lambda t = E\big[\sum_{j=1}^J \mathbb I_j\big] =  E\big[\mathbb I_1\big]E\big[J\big]$  
- - - - 
*A slightly longer approach:*  
we can verify directly that   
$P\big(N_{\alpha, \lambda}(t) = 0\big) = P(N(t) =0) + P(N(t) =1)(1-\alpha) + P(N(t) =2)(1-\alpha)^2 + ... $  
$=P(N(t) =0)(1-\alpha)^0 + P(N(t) =1)(1-\alpha) + P(N(t) =2)(1-\alpha)^2 + ... $  
$=e^{-\lambda t}\sum_{k=0}^\infty \frac{((1-\alpha)\lambda t)^k}{k!}$  
$=e^{-\lambda t \alpha} \big(e^{-\lambda t(1-\alpha)}\sum_{k=0}^\infty \frac{((1-\alpha)\lambda t)^k}{k!}\big)$  
$=e^{-\lambda t \alpha}$  

but given the time homogeneity of $\alpha$ we can easily verify that memorylessness still holds in this 'residual Poisson' -- in effect re-running through the argument on page 186

But this is overkill -- in the homogeneous case we already have iid Bernoulli trials layered on top of exponential inter-arrival times.  Conditioned on being at time $t$, the probability of no accepted arrival in the next $s$ units of time is $e^{-\lambda s \alpha}$, given the memorylessness of the underlying Poisson process, and the above work gives us that the waiting time until the next accepted arrival is exponentially distributed irrespective of whether there was an arrival at time $t$ and irrespective of whether that arrival was accepted or rejected.  Hence the process is entirely characterized by the exponential inter-arrival times with parameter $\lambda \alpha$, and re-using page 186 or other background material we *know* this is a Poisson counting process.  This proves in the homogeneous case that Poisson splitting splits the process into other Poissons.  (Technical nit: it remains to verify that the splits are independent -- this is implied by the fact that the $\alpha$ stream has countably many arrivals and hence, for any $k$ arrivals in that stream, the rejected stream keeps the same probability of $r$ arrivals, namely that of a Poisson with parameter $(1-\alpha)\lambda$.)  

We thus know that 
$N_{\alpha, \lambda}(t) $ is Poisson distributed.  We skipped the most direct and cumbersome approach of verifying, which is to use convolutions and show 

$P\big(N_{\alpha, \lambda}(t) = k\big)= \sum_{i= k}^\infty P\big(N(t) = i\big)\cdot \alpha^k(1-\alpha)^{i-k}\binom{i}{k}= \sum_{i= k}^\infty P\big(N(t) = i\big)\cdot \text{Binomial}\big(i,k,\alpha\big) $  

we did this for the $k=0$ case and via memorylessness of the underlying processes (Poisson and Bernoulli -- which are both memoryless) we get the above for free.  
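
The convolution identity that was skipped can at least be checked numerically. The sketch below truncates the infinite sum at a hypothetical cutoff `i_max` and works in log space to avoid overflowing factorials. The parameter values are illustrative only, and the helpers assume $0 < \alpha < 1$ and $\lambda t > 0$.

```python
import math

def log_poisson_pmf(i, mean):
    """log of P(N = i) for N ~ Poisson(mean), with mean > 0."""
    return -mean + i * math.log(mean) - math.lgamma(i + 1)

def log_binomial_pmf(k, i, alpha):
    """log of C(i, k) alpha^k (1 - alpha)^(i - k), with 0 < alpha < 1."""
    log_comb = math.lgamma(i + 1) - math.lgamma(k + 1) - math.lgamma(i - k + 1)
    return log_comb + k * math.log(alpha) + (i - k) * math.log(1.0 - alpha)

def thinned_pmf(k, lam, t, alpha, i_max=150):
    """Truncated version of sum_{i >= k} P(N(t) = i) * Binomial(i, k, alpha)."""
    return sum(
        math.exp(log_poisson_pmf(i, lam * t) + log_binomial_pmf(k, i, alpha))
        for i in range(k, i_max + 1)
    )

# illustrative parameters only
lam_demo, t_demo, alpha_demo = 3.0, 2.0, 0.35

# largest gap between the convolution sum and the Poisson(alpha * lambda * t) pmf
max_gap = max(
    abs(thinned_pmf(k, lam_demo, t_demo, alpha_demo)
        - math.exp(log_poisson_pmf(k, alpha_demo * lam_demo * t_demo)))
    for k in range(15)
)
print(max_gap)
```

A gap at the level of floating point noise is what the lemma predicts. It is not a proof, but it does catch a mis-stated parameter quickly.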


*Now for the inhomogeneous case*  

the problem asks us to examine the case where the acceptance (/hazard) rate changes over time -- an arrival at time $s$ is classified as type 1 with probability $p_1(s)$ -- and to show that this results in a Poisson distributed random variable with mean 

$E\big[N_{\alpha, \lambda}(t) \big]=\lambda \cdot \alpha  t  = \lambda \int_0^t p_1(s)ds$  

i.e. where $\alpha := \frac{1}{t}\int_0^t p_1(s)ds$  

what is interesting is if we use the results from problem 15 (or better: your author's long writeup in Gallagher folder on uniform arrivals for Poissons conditioned on $i$ arrivals at time $t$) 

Then we can find that, conditioned on 
$N(t) = i$, each arrival is uniformly distributed in $[0,t]$ and has acceptance probability of $\alpha = \frac{1}{t}\int_0^t p_1(s)ds$. 

(as a sweetener, consider the renewal rewards interpretation where a reward of value 1 is accrued each time something is accepted   
$\frac{r(z)}{z} \longrightarrow_{as} \frac{E[R]}{E[X]} = \frac{\int_0^t p_1(s)ds}{t}$ as $z \to \infty$,  
where a renewal occurs exactly at time $t$ in a given epoch and hence the above time average gives the long-run average reward / acceptance probability)  


hence re-using the convolutional approach, we can see 

$P\big(N_{\alpha, \lambda}(t) = k\big)= \sum_{i= k}^\infty P\big(N(t) = i\big)\cdot \text{Binomial}\big(i,k, \alpha\big) $  

as before. Thus the resulting random variable *is* Poisson distributed.  Note that we do lose stationarity with this approach.  (Lingering technical nit: we can re-run this for the $1-\alpha$ case and verify that the two random variables are independent -- as before the point is that the acceptance / rejection at each moment in time, while not stationary / time homogeneous, is independent of past actions, and the duality of countably many arrivals in each stream in a continuous time setting, as well as the underlying generator of arrivals being memoryless, means that knowledge of arrivals in one process does not change estimates of arrivals in another process.)  
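
In keeping with the simulation companions elsewhere in these notes, here is a small sketch of the inhomogeneous case. It leans on the conditional uniformity of arrival times from problem 15. The function name `simulate_type1_counts`, the choice $p_1(s) = s/t$ and all numeric values are illustrative assumptions, not part of the problem.

```python
import numpy as np

def simulate_type1_counts(lam, t, p1, n_runs, rng):
    """Count type 1 arrivals on [0, t] when an arrival at time s is type 1 w.p. p1(s)."""
    counts = np.empty(n_runs, dtype=np.int64)
    totals = rng.poisson(lam * t, size=n_runs)           # N(t) for each run
    for r, total in enumerate(totals):
        # conditioned on N(t) = total, arrival times are iid uniform on [0, t]
        arrival_times = rng.uniform(0.0, t, size=total)
        accepted = rng.uniform(size=total) < p1(arrival_times)
        counts[r] = accepted.sum()
    return counts

def p1_demo(s, t=3.0):
    """Illustrative acceptance probability p_1(s) = s / t, which integrates to t / 2 on [0, t]."""
    return s / t

rng = np.random.default_rng(0)
lam_demo, t_demo = 4.0, 3.0
counts = simulate_type1_counts(lam_demo, t_demo, p1_demo, 20000, rng)

target_mean = lam_demo * t_demo / 2.0    # lambda * integral of p_1 over [0, t]
print("target mean:    ", target_mean)
print("sample mean:    ", counts.mean())
print("sample variance:", counts.var())
```

For a Poisson random variable the mean and variance agree, so seeing both sample statistics land near $\lambda \int_0^t p_1(s)ds$ is consistent with (though of course does not prove) the claim.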

The very ending here could be tightened up a bit.  Overall this is a nice approach of first distilling as much as we can from a homogenous case and then using it to glean insights for an inhomogenous case.  



**The below ideas work in many cases but more thought is needed to further distill them**  

renewal rewards argument for figuring out probability of arrival from one of two disjoint paths  

key theorem  
$\frac{R(t)}{t} \longrightarrow_{as} \frac{E[R_1]}{E[Z_1]}\longleftarrow \frac{E[R(t)]}{t}$  

suppose we have an ergodic Markov chain with a state $m$ of interest and paths that reach it through $k$ or $n$. For now suppose there are no self loops at our node $m$ and no other way of reaching it.  

We want to know the probability of an arrival having come from $k$ vs $n$, and to be able to compute such a thing via the expected recurrence times for those nodes of $\bar{X}$ and $\bar{Y}$ respectively.  

using the renewal rewards setup, we can say  
suppose there is a reward of one each time $m$ is entered.  Hence, with $\bar{Z}$ the expected recurrence time of $m$, we have 

$\frac{R(t)}{t} \longrightarrow_{as} \frac{1}{\bar{Z}} $  




an interesting extension is to consider via bipartition, 2 distinct classes that feed into $m$.  

we know that in general at iteration $r$, with a bipartition, for something with, say, time homogeneous transition probabilities (i.e. a typical Markov chain)

$u_m^{(r)} = p_{k,m}u_k^{(r-1)} + p_{n,m}u_n^{(r-1)}$  

where $u_j^{(r)}$ is the probability of being in state $j$ on the $r$th iteration, given some fixed starting state -- for notational cleanliness, we agree to fix the starting state, but not mention it in the above notation.  

So assuming we have an ergodic chain, we take a limit 

$\frac{1}{\bar{Z}}=\pi_m = \lim_{r \to \infty} u_m^{(r)} = \lim_{r \to \infty}p_{k,m}u_k^{(r-1)} + \lim_{r \to \infty}p_{n,m}u_n^{(r-1)} = p_{k,m}\pi_k + p_{n,m}\pi_n = p_{k,m}\frac{1}{\bar{X}} + p_{n,m}\frac{1}{\bar{Y}}$  

(i.e. for ergodic chains, we recover the fact that starting position doesn't matter)  

i.e. we get  

$\frac{1}{\bar{Z}}=p_{k,m}\pi_k + p_{n,m}\pi_n  = p_{k,m}\frac{1}{\bar{X}} + p_{n,m}\frac{1}{\bar{Y}}$  


now suppose we consider a cycle that starts and ends in $m$ and we get a reward each time we enter state $m$ through state $k$ (and not state $n$)    

then 

$ \frac{R(t)}{t} \longrightarrow_{as} p_{k,m}\pi_k  = \frac{E[R_1]}{\bar{Z}}$  
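
To see the last limit in action, here is a sketch on a made-up 4-state ergodic chain in which state 3 plays the role of $m$ and can only be entered from $k = 1$ or $n = 2$. The transition matrix, the helper `rate_of_entries` and the run length are all illustrative assumptions.

```python
import numpy as np

# hypothetical chain: column 3 is nonzero only in rows 1 and 2, and state 3 has no self loop
P = np.array([
    [0.2, 0.5, 0.3, 0.0],
    [0.4, 0.0, 0.0, 0.6],
    [0.0, 0.0, 0.3, 0.7],
    [0.5, 0.5, 0.0, 0.0],
])
k, n, m = 1, 2, 3

# stationary vector: solve pi (P - I) = 0 together with sum(pi) = 1 by least squares
system = np.vstack([P.T - np.eye(4), np.ones(4)])
rhs = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
pi = np.linalg.lstsq(system, rhs, rcond=None)[0]

def rate_of_entries(P, src, dst, n_steps, rng):
    """Long-run fraction of steps that are a src -> dst transition (reward 1 per such entry)."""
    cum = np.cumsum(P, axis=1)
    cum[:, -1] = 1.0                      # guard against round-off in the last column
    draws = rng.random(n_steps)
    state, hits = 0, 0
    for u in draws:
        nxt = int(np.searchsorted(cum[state], u, side="right"))
        if state == src and nxt == dst:
            hits += 1
        state = nxt
    return hits / n_steps

rng = np.random.default_rng(1)
rate_via_k = rate_of_entries(P, k, m, 200000, rng)

print("pi_m vs p_km pi_k + p_nm pi_n:", pi[m], P[k, m] * pi[k] + P[n, m] * pi[n])
print("simulated R(t)/t vs p_km pi_k:", rate_via_k, P[k, m] * pi[k])
print("share of entries to m via k:  ", P[k, m] * pi[k] / pi[m])
```

The last printed ratio, $p_{k,m}\pi_k / \pi_m$, is the quantity of interest in this sketch: the long-run share of arrivals at $m$ that came through $k$ rather than $n$.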
