
author: David K.L. Hui

This project discusses different approaches to modelling slot game mathematics using Markov Chains.

# Section 1: Introduction

## 1.1: Background

In previous projects, we discussed the mathematics of 3x3 and 5x3 slot game design including simulations; finding feasible solutions using hit frequency and RTP; payout allocations; and probability calculations.

In those projects, the main assumption was that the outcomes of the reels are completely independent from one game to the next. This is convenient when designing a basic game, because there are no dependencies between games.

However, in many modern casinos the slot machine has its own internal mechanism that controls the sequence of outcomes. For example, it can make consecutive large and medium prizes impossible.

To control this kind of prize behavior, very basic if-then-else flow control is sufficient. Another possible way is to use Markov Chains. With Markov Chains we can easily visualize the transitions between states (using a state-space diagram, or simply the transition probability matrix), quantify the transition probabilities, and derive the limiting distribution.

Note that all the key concepts we used before, such as Hit Frequency and RTP, still apply, although the calculations differ somewhat.

## 1.2: Stochastic Processes

A stochastic process is a sequence of random variables indexed by time. We usually classify a stochastic process by its **state** and **stage**.

1. State: the states of the process are the possible values of the random variables, e.g. the non-negative integers for the length of a queue, or +1 / 0 / -1 to represent a win / draw / loss of a game.

2. Stage: the time index at which the random variables are observed, e.g. discrete time (n) or continuous time (t).

Stochastic processes form an extremely broad class in probability theory. In this project we use a specific type called Markov Chains: discrete-time, discrete-state stochastic processes with the **Markov property**.

## 1.3: Markov Chains

An important property shared by Markov Chains and all other Markov processes is the **Markov property**.

The Markov Property is a memoryless property of a process, which means that if we know the present state of the process, then the future states are independent of the past history.

Mathematically, the Markov property states that, given the present state $X_n$, the future states $X_{n+1}, X_{n+2}, ...$ are independent of the past states $X_{n-1}, X_{n-2}, ...$, i.e.

$$
P(X_{n+1}=j | X_n = i, X_{n-1} = i_{n-1}, X_{n-2} = i_{n-2}, ....) = P(X_{n+1}=j | X_n = i) \ \forall i,j,n
$$

In general, a Markov Chain is represented by either a **State-Space Diagram** or a **Transition Probability Matrix**. In this project we simply use the transition probability matrix **P**, whose rows represent the current state and whose columns represent the next state; the element $p_{i,j}$ is the probability of moving from state i to state j, i.e.

$$
p_{i,j} = P(X_{n+1} = j | X_n = i) \ \forall i,j
$$
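Because row i of **P** is the probability distribution of the next state given the current state i, a valid transition matrix is square, has non-negative entries, and has every row summing to 1. The sketch below uses a hypothetical helper `is_transition_matrix` to check this, reads a one-step probability $p_{i,j}$ as `P[i, j]`, and uses the standard result that n-step transition probabilities are the entries of the matrix power $P^n$. The matrix is the one used in Example 1 of Section 2.

```python
import numpy as np

def is_transition_matrix(P, tol=1e-9):
    """True if P is square, non-negative and every row sums to 1 (within tol)."""
    P = np.asarray(P, dtype=float)
    if P.ndim != 2 or P.shape[0] != P.shape[1]:
        return False
    return bool(np.all(P >= 0) and np.allclose(P.sum(axis=1), 1.0, rtol=0.0, atol=tol))

# transition matrix of Example 1 in Section 2 (states 0, 1, 2, 3)
P = np.array([[0.69, 0.205, 0.1, 0.005],
              [0.7, 0.2, 0.099, 0.001],
              [0.9, 0.09, 0.01, 0],
              [0.99, 0.01, 0, 0]])

print(is_transition_matrix(P))  # True
print(P[3, 3])                  # p_{3,3}: large prize directly after a large prize, 0.0
print((P @ P)[3, 3])            # two-step probability 3 -> 3, approx. 0.00496
```

So in this design a large prize can never follow a large prize on the very next spin, but it can occur two spins later.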


## 1.4: Limiting Distribution

An important aspect of a stochastic process is its long-run behavior. For certain Markov Chains (for a finite chain, those that are irreducible and aperiodic) we can derive the limiting distribution, which then coincides with the stationary distribution.

Because the process is random, it is impossible to know its exact future state (except for specific classes of chains, such as periodic ones, or after an absorbing state has been reached). However, if the limiting distribution exists, it gives the long-run proportion of time the process spends in each state, regardless of the initial state.

This concept is particularly useful for defining the Hit Frequency and RTP because:
1. we know the long-run proportion of time spent in each state, in particular the non-winning state(s), so we can calculate the hit frequency;
2. similarly, we know how often the process is in each prize state, so we can calculate the RTP as well.
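Concretely, let $\pi_k$ be the long-run proportion of spins spent in state $k$, $W$ the set of winning states, $r_k$ the payout in state $k$ (zero for a losing state) and $c$ the bet per spin. The two quantities then follow directly; this is the calculation carried out in the examples of Section 2:

$$
\text{Hit frequency} = \sum_{k \in W} \pi_k, \qquad \text{RTP} = \frac{1}{c} \sum_{k} \pi_k \, r_k
$$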


To calculate the limiting distribution, we can use the famous relationship

$$
\pi = \pi P
$$

where $\pi$ is the limiting distribution written as a row vector, normalised so that its entries sum to 1, and $P$ is the transition probability matrix.
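On its own, $\pi = \pi P$ only determines $\pi$ up to a scale factor, so it has to be solved together with the normalisation $\sum_k \pi_k = 1$. A minimal sketch, assuming the limiting distribution exists and using a hypothetical helper `stationary_distribution`, stacks both conditions into one linear system and solves it with `np.linalg.lstsq`. This is an alternative to the eigenvector approach used in Section 2, shown here on the Example 1 matrix.

```python
import numpy as np

def stationary_distribution(P):
    """Solve pi = pi P together with sum(pi) = 1 as a single least-squares system."""
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    # pi (P - I) = 0  is equivalent to  (P - I)^T pi^T = 0; append a row of ones for sum(pi) = 1
    A = np.vstack([(P - np.eye(n)).T, np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

P = np.array([[0.69, 0.205, 0.1, 0.005],
              [0.7, 0.2, 0.099, 0.001],
              [0.9, 0.09, 0.01, 0],
              [0.99, 0.01, 0, 0]])
pi = stationary_distribution(P)
print(pi)  # should agree with the eigenvector result of Example 1
```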


# Section 2: Modeling

The above only scratches the surface of Markov Chains and stochastic processes; there are many other application areas that may be useful for the iGaming industry.

In this project, we model the game under the following assumptions:
1. The results are classified into at least 4 states: Not win (0); Small Prize (1); Medium Prize (2); Large Prize (3).
2. Homogeneous Markov Chain: the transition probabilities are time-invariant, i.e. independent of the stage. (Subsection 2.2 shows how to retain some information about consecutive losses by adding extra states.)
3. The Markov property holds: the latest state alone determines the transition probabilities, regardless of the past.
4. The limiting distribution exists: as mentioned, it exists only under specific conditions. It does no harm to assume it exists; if it does not, try another transition probability matrix.
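For a finite chain, a convenient sufficient check for assumption 4 is that the chain is regular: some power $P^k$ has all entries strictly positive. Such a chain is irreducible and aperiodic, so its limiting distribution exists and does not depend on the initial state. For an $n$-state chain it is enough to check powers up to $(n-1)^2 + 1$ (Wielandt's bound). The sketch below uses a hypothetical helper `is_regular` and only looks at the zero/non-zero pattern of P, which avoids floating-point underflow.

```python
import numpy as np

def is_regular(P):
    """True if some power of P is strictly positive (the finite chain is irreducible and aperiodic)."""
    B = (np.asarray(P) > 0).astype(int)   # zero / non-zero pattern of P
    n = B.shape[0]
    Q = B.copy()                          # pattern of P^1
    for _ in range((n - 1) ** 2 + 1):     # Wielandt's bound for primitive matrices
        if np.all(Q > 0):
            return True
        Q = ((Q @ B) > 0).astype(int)     # pattern of the next power
    return False

P = np.array([[0.69, 0.205, 0.1, 0.005],
              [0.7, 0.2, 0.099, 0.001],
              [0.9, 0.09, 0.01, 0],
              [0.99, 0.01, 0, 0]])
print(is_regular(P))                          # Example 1 matrix: True
print(is_regular(np.array([[0, 1], [1, 0]]))) # periodic chain, no limiting distribution: False
```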


## 2.1: Example

Here we provide a very basic setup:

Step 1: define a transition probability matrix <br/>
Step 2: find the limiting distribution by solving the eigenproblem (the eigenvector with eigenvalue 1) <br/>
Step 3: find Prob(win) <br/>
Step 4: allocate the payouts according to the probabilities, and calculate the RTP

In [1]:
import numpy as np

In [2]:
P = np.array([[0.69, 0.205, 0.1, 0.005], [0.7, 0.2, 0.099, 0.001], [0.9, 0.09, 0.01, 0], [0.99, 0.01, 0, 0]])
P

array([[0.69 , 0.205, 0.1  , 0.005],
       [0.7  , 0.2  , 0.099, 0.001],
       [0.9  , 0.09 , 0.01 , 0.   ],
       [0.99 , 0.01 , 0.   , 0.   ]])

In [3]:
## For large n, every row of the n-step transition matrix P^n approaches the limiting distribution
from numpy.linalg import matrix_power

p = matrix_power(P, 10000)
p

array([[0.71221088, 0.19281342, 0.09122183, 0.00375387],
       [0.71221088, 0.19281342, 0.09122183, 0.00375387],
       [0.71221088, 0.19281342, 0.09122183, 0.00375387],
       [0.71221088, 0.19281342, 0.09122183, 0.00375387]])
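Raising P to the power 10000 is far more than needed. For a diagonalizable chain the rows of $P^n$ approach each other roughly like $|\lambda_2|^n$, where $\lambda_2$ is the second-largest eigenvalue in magnitude (about -0.102 for this matrix), so convergence here takes only a handful of steps. A sketch with a hypothetical helper `steps_to_converge`:

```python
import numpy as np

def steps_to_converge(P, tol=1e-10, max_steps=1000):
    """Smallest n for which all rows of P^n agree with the first row to within tol."""
    P = np.asarray(P, dtype=float)
    Pn = np.eye(P.shape[0])
    for n in range(1, max_steps + 1):
        Pn = Pn @ P
        if np.max(np.abs(Pn - Pn[0])) < tol:  # every row is (numerically) the same distribution
            return n, Pn[0]
    return None, Pn[0]                        # no convergence within max_steps

P = np.array([[0.69, 0.205, 0.1, 0.005],
              [0.7, 0.2, 0.099, 0.001],
              [0.9, 0.09, 0.01, 0],
              [0.99, 0.01, 0, 0]])
n, row = steps_to_converge(P)
print(n, row)
```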

In [4]:
# Alternatively, solve the eigenproblem to get the limiting distribution.
# pi = pi P means pi is a left eigenvector of P, but numpy's eig returns right (column) eigenvectors,
# so we first transpose the transition probability matrix,
# then take the eigenvector associated with eigenvalue 1,
# and finally rescale it to sum to 1 (eig returns eigenvectors of unit Euclidean norm)

from numpy.linalg import eig

eigenvalues, eigenvectors = eig(np.transpose(P))

In [5]:
eigenvalues

array([ 1.        , -0.10229626,  0.01652407, -0.01422782])

In [6]:
eigenvectors

array([[ 0.95794711,  0.82418853,  0.68136944, -0.64133836],
       [ 0.2593404 , -0.45994563, -0.69796184,  0.70684662],
       [ 0.12269637, -0.32845472, -0.14734329, -0.24120953],
       [ 0.00504908, -0.03578818,  0.16393569,  0.17570126]])

In [7]:
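# pick the eigenvector whose eigenvalue is (numerically) closest to 1, rather than assuming it comes first
idx = np.argmin(np.abs(eigenvalues - 1))
p = eigenvectors[:, idx].real
p = p / p.sum()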

p

array([0.71221088, 0.19281342, 0.09122183, 0.00375387])

In [8]:
# perform further analysis
p0, p1, p2, p3 = p

In [9]:
p0

0.7122108798046978

In [10]:
p1, p2, p3

(0.19281341769910385, 0.09122183467947585, 0.0037538678167225933)

In [11]:
# P(Win) = Hit frequency in the long run
1 - p0

0.28778912019530223

In [12]:
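# Suggested payout: with a bet of c per spin, give each of the 3 prize tiers an equal share c/3
# of the expected return, i.e. choose r_k so that p_k * r_k = c/3 (100% RTP before rounding)
c = 10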

r1 = c / 3 / p1
r2 = c / 3 / p2
r3 = c / 3 / p3


r1, r2, r3

(17.287870175794442, 36.54095913599626, 887.9730177189836)
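The cell above sets $r_k = c / (3 p_k)$, so each prize tier contributes $p_k r_k = c/3$ and the expected payout per spin is exactly $c$. A hedged generalisation, using a hypothetical helper `payouts_from_shares`, lets each tier take an arbitrary share of a target RTP below 100%; the shares `[0.5, 0.3, 0.2]` and the 96% target are illustrative values only.

```python
import numpy as np

def payouts_from_shares(p_prize, bet, target_rtp=1.0, shares=None):
    """Payout per prize tier so that tier k returns shares[k] * target_rtp * bet per spin on average."""
    p_prize = np.asarray(p_prize, dtype=float)
    if shares is None:
        # equal split across tiers; for three tiers this reproduces r_k = c / 3 / p_k
        shares = np.full(len(p_prize), 1.0 / len(p_prize))
    shares = np.asarray(shares, dtype=float)
    return target_rtp * bet * shares / p_prize

# long-run probabilities of the small, medium and large prize states in Example 1
p_prize = np.array([0.19281341769910385, 0.09122183467947585, 0.0037538678167225933])
r_equal = payouts_from_shares(p_prize, bet=10)
r_custom = payouts_from_shares(p_prize, bet=10, target_rtp=0.96, shares=[0.5, 0.3, 0.2])
print(r_equal, r_custom)
```

As in the notebook, the resulting payouts would normally be rounded and the expected payout recomputed afterwards.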

In [13]:
# round the suggested payouts to sensible values (index = state; state 0 pays nothing)
r = [0, 15, 35, 1000]

In [14]:
# expected payout per spin; divided by c = 10 this is the theoretical RTP (about 98.4%)
np.dot(r, p)

9.838833295990806

In [15]:
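def sim(N):
  """Simulate N spins of the chain, starting from state 0 (not winning)."""
  traces = []
  latest_state = 0

  for _ in range(N):
    # draw the next state from the row of P belonging to the current state
    x = np.random.choice(4, p=P[latest_state])
    traces.append(x)
    latest_state = x

  return traces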


N = 1000000
traces = sim(N)

In [16]:
from collections import Counter

Counter(traces)

Counter({0: 711779, 2: 91219, 1: 193280, 3: 3722})

In [17]:
# expected number of spins in each state under the limiting distribution, to compare with the simulated counts
[xx * N for xx in p]

[712210.8798046978, 192813.41769910385, 91221.83467947585, 3753.8678167225935]

In [18]:
r

[0, 15, 35, 1000]

In [19]:
# simulated RTP = total payout / total amount bet
sum(r[x] for x in traces) / (N * c)

0.9813865
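The loop in `sim` calls `np.random.choice` once per spin, which is slow for N = 1,000,000. A faster sketch (an alternative technique, not the notebook's original method) draws all uniform numbers up front and picks each next state by inverse-CDF sampling with `np.searchsorted` on the cumulative rows of P; it also reports the empirical hit frequency next to the RTP. The function name `simulate_chain`, the seed and N = 200,000 are illustrative assumptions.

```python
import numpy as np

def simulate_chain(P, n_steps, start=0, seed=None):
    """Simulate n_steps transitions of a finite Markov chain by inverse-CDF sampling."""
    P = np.asarray(P, dtype=float)
    rng = np.random.default_rng(seed)
    cum = np.cumsum(P, axis=1)
    cum = cum / cum[:, -1:]        # normalise so each row ends at exactly 1.0 despite rounding
    u = rng.random(n_steps)        # all uniform draws in [0, 1), generated up front
    states = np.empty(n_steps, dtype=np.int64)
    s = start
    for t in range(n_steps):
        # first column whose cumulative probability exceeds u[t]; zero-probability columns are skipped
        s = int(np.searchsorted(cum[s], u[t], side="right"))
        states[t] = s
    return states

# Example 1 transition matrix and payouts, bet c = 10 per spin
P = np.array([[0.69, 0.205, 0.1, 0.005],
              [0.7, 0.2, 0.099, 0.001],
              [0.9, 0.09, 0.01, 0],
              [0.99, 0.01, 0, 0]])
r = np.array([0, 15, 35, 1000])
c = 10

N = 200_000
states = simulate_chain(P, N, seed=1)
freq = np.bincount(states, minlength=len(P)) / N   # empirical proportion of each state
hit_frequency = 1 - freq[0]                        # state 0 is the only losing state here
rtp = r[states].sum() / (N * c)                    # total payout / total amount bet
print(freq, hit_frequency, rtp)
```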

## 2.2: Example 2

Suppose this time we wish to add two more states representing a loss: -1 and -2. States {0, -1, -2} all represent a loss, but their transitions to a win differ: the longer the losing streak, the slightly larger the transition probabilities to 1 (small prize), 2 (medium prize) and 3 (large prize). From state 0 a further loss moves to -1, from -1 to -2, and -2 stays at -2; a loss immediately after any win always returns to state 0 first. In the matrix below, rows and columns are ordered -2, -1, 0, 1, 2, 3.

In [75]:
# rows/columns are ordered by state: -2, -1, 0, 1, 2, 3 (state s sits at index s + 2)
P = np.array([[0.6, 0, 0, 0.26, 0.135, 0.005],
              [0.65, 0, 0, 0.22, 0.125, 0.005],
              [0, 0.69, 0, 0.205, 0.1, 0.005],
              [0, 0, 0.7, 0.2, 0.099, 0.001],
              [0, 0, 0.9, 0.09, 0.01, 0],
              [0, 0, 0.99, 0.01, 0, 0]])
P

array([[0.6  , 0.   , 0.   , 0.26 , 0.135, 0.005],
       [0.65 , 0.   , 0.   , 0.22 , 0.125, 0.005],
       [0.   , 0.69 , 0.   , 0.205, 0.1  , 0.005],
       [0.   , 0.   , 0.7  , 0.2  , 0.099, 0.001],
       [0.   , 0.   , 0.9  , 0.09 , 0.01 , 0.   ],
       [0.   , 0.   , 0.99 , 0.01 , 0.   , 0.   ]])

In [76]:
# sanity check: every row should sum to 1 (up to floating-point rounding)
[sum(x) for x in P]

[1.0, 1.0, 0.9999999999999999, 0.9999999999999999, 1.0, 1.0]
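To confirm that the matrix encodes the intended behaviour (a longer losing streak makes the next win slightly more likely), one can sum each row over the winning columns. A short sketch, assuming the state order -2, -1, 0, 1, 2, 3 used in this example:

```python
import numpy as np

# Example 2 transition matrix; rows/columns ordered by state -2, -1, 0, 1, 2, 3
states = np.array([-2, -1, 0, 1, 2, 3])
P = np.array([[0.6, 0, 0, 0.26, 0.135, 0.005],
              [0.65, 0, 0, 0.22, 0.125, 0.005],
              [0, 0.69, 0, 0.205, 0.1, 0.005],
              [0, 0, 0.7, 0.2, 0.099, 0.001],
              [0, 0, 0.9, 0.09, 0.01, 0],
              [0, 0, 0.99, 0.01, 0, 0]])

win_cols = states > 0                    # boolean mask: small, medium and large prize columns
p_win_next = P[:, win_cols].sum(axis=1)  # P(next spin wins | current state)

for s, q in zip(states, p_win_next):
    print(f"state {int(s):+d}: P(win next) = {q:.3f}")
# approx. 0.400, 0.350, 0.310 for the loss states -2, -1, 0, then 0.300, 0.100, 0.010 after a win
```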

In [77]:
## For large n, every row of the n-step transition matrix P^n approaches the limiting distribution
from numpy.linalg import matrix_power

p = matrix_power(P, 10000)
p

array([[0.27269918, 0.16781488, 0.24320997, 0.20882443, 0.1038241 ,
        0.00362744],
       [0.27269918, 0.16781488, 0.24320997, 0.20882443, 0.1038241 ,
        0.00362744],
       [0.27269918, 0.16781488, 0.24320997, 0.20882443, 0.1038241 ,
        0.00362744],
       [0.27269918, 0.16781488, 0.24320997, 0.20882443, 0.1038241 ,
        0.00362744],
       [0.27269918, 0.16781488, 0.24320997, 0.20882443, 0.1038241 ,
        0.00362744],
       [0.27269918, 0.16781488, 0.24320997, 0.20882443, 0.1038241 ,
        0.00362744]])

In [78]:
# Alternatively, solve the eigenproblem to get the limiting distribution.
# pi = pi P means pi is a left eigenvector of P, but numpy's eig returns right (column) eigenvectors,
# so we first transpose the transition probability matrix,
# then take the eigenvector associated with eigenvalue 1,
# and finally rescale it to sum to 1 (eig returns eigenvectors of unit Euclidean norm)

from numpy.linalg import eig

eigenvalues, eigenvectors = eig(np.transpose(P))

In [79]:
eigenvalues

array([ 1.00000000e+00+0.j        ,  5.46320879e-02+0.29808585j,
        5.46320879e-02-0.29808585j, -2.53918219e-01+0.j        ,
       -4.58939655e-02+0.j        ,  5.48008700e-04+0.j        ])

In [80]:
eigenvectors

array([[-5.86649279e-01+0.j        , -6.88614914e-01+0.j        ,
        -6.88614914e-01-0.j        ,  5.79021741e-01+0.j        ,
         6.82606339e-01+0.j        ,  6.86162374e-01+0.j        ],
       [-3.61014941e-01+0.j        ,  5.77766889e-01-0.3157944j ,
         5.77766889e-01+0.3157944j , -7.60672637e-01+0.j        ,
        -6.78294331e-01+0.j        , -6.32802156e-01+0.j        ],
       [-5.23210059e-01+0.j        ,  1.82171674e-01+0.22459656j,
         1.82171674e-01-0.22459656j,  2.79925567e-01+0.j        ,
         4.51153864e-02+0.j        , -5.02581285e-04+0.j        ],
       [-4.49237525e-01+0.j        , -3.64411264e-02+0.0769563j ,
        -3.64411264e-02-0.0769563j , -8.75606601e-02+0.j        ,
        -2.12411149e-01+0.j        , -1.16505619e-01+0.j        ],
       [-2.23353577e-01+0.j        , -3.38427512e-02+0.01550622j,
        -3.38427512e-02-0.01550622j, -9.12368538e-03+0.j        ,
         1.63740408e-01+0.j        , -2.06023948e-01+0.j        ],
     

In [81]:
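# pick the eigenvector whose eigenvalue is (numerically) closest to 1; it may be complex-typed, so keep the real part
idx = np.argmin(np.abs(eigenvalues - 1))
p = eigenvectors[:, idx].real
p = p / p.sum()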

p

array([0.27269918, 0.16781488, 0.24320997, 0.20882443, 0.1038241 ,
       0.00362744])

In [82]:
# long-run probabilities of states -2, -1, 0, 1, 2, 3 (pm2 and pm1 are the two extra loss states)
pm2, pm1, p0, p1, p2, p3 = p

In [83]:
# P(Win) = Hit frequency in the long run
p1+p2+p3

0.31627598117000605

In [84]:
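# Suggested payout: with a bet of c per spin, give each of the 3 prize tiers an equal share c/3
# of the expected return, i.e. choose r_k so that p_k * r_k = c/3 (100% RTP before rounding)
c = 10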

r1 = c / 3 / p1
r2 = c / 3 / p2
r3 = c / 3 / p3


r1, r2, r3

(15.962372328758876, 32.10558223990266, 918.9205539589251)

In [85]:
# round the suggested payouts to sensible values (states -2, -1 and 0 pay nothing)
r = [0, 0, 0, 15, 30, 1000]

In [86]:
# expected payout per spin; divided by c = 10 this is the theoretical RTP (about 98.7%)
np.dot(r, p)

9.874534146865146

In [92]:
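def sim(N):
  """Simulate N spins starting from state 0; states are -2..3, so state s uses row s + 2 of P."""
  states = [-2, -1, 0, 1, 2, 3]
  traces = []
  latest_state = 0

  for _ in range(N):
    x = np.random.choice(states, p=P[latest_state + 2])
    traces.append(x)
    latest_state = x

  return traces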


N = 1000000
traces = sim(N)

In [93]:
from collections import Counter

Counter(traces)

Counter({-1: 167815, -2: 273419, 2: 103360, 0: 243010, 1: 208808, 3: 3588})

In [94]:
# expected number of spins in each state under the limiting distribution, to compare with the simulated counts
[xx * N for xx in p]

[272699.1751402867,
 167814.87700940706,
 243209.9666803001,
 208824.43190024942,
 103824.10474370638,
 3627.4445260502143]

In [95]:
r

[0, 0, 0, 15, 30, 1000]

In [96]:
# simulated RTP = total payout / total amount bet (state x uses payout r[x + 2])
sum(r[x + 2] for x in traces) / (N * c)

0.982092

## 2.3: Discussion

From these two examples, we can see that we can define different states for wins and losses (including consecutive losses, as in subsection 2.2). In general, we can define as many states as we wish. The major challenges of this method are:

1. How to determine the transition probabilities between states
2. How to achieve the desired limiting distribution
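Neither challenge has a closed-form answer in this setup; in practice the matrix is tuned by trial and error. A hedged helper, hypothetically named `evaluate_design`, bundles the steps of Section 2 (limiting distribution, hit frequency, RTP) so that each trial is cheap. It assumes the limiting distribution exists and solves $\pi = \pi P$ with $\sum_k \pi_k = 1$ as a least-squares system instead of selecting an eigenvector.

```python
import numpy as np

def evaluate_design(P, payouts, bet, loss_states):
    """Limiting distribution, hit frequency and RTP of a candidate design (assumes the limit exists)."""
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    # pi (P - I) = 0 together with sum(pi) = 1, solved as one least-squares system
    A = np.vstack([(P - np.eye(n)).T, np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    hit_frequency = 1.0 - pi[list(loss_states)].sum()  # every state outside loss_states is a win
    rtp = float(np.dot(pi, payouts)) / bet              # expected payout per spin / bet per spin
    return pi, hit_frequency, rtp

# Example 1: states 0..3, only state 0 is a loss
P1 = np.array([[0.69, 0.205, 0.1, 0.005],
               [0.7, 0.2, 0.099, 0.001],
               [0.9, 0.09, 0.01, 0],
               [0.99, 0.01, 0, 0]])
_, hf1, rtp1 = evaluate_design(P1, [0, 15, 35, 1000], bet=10, loss_states=[0])

# Example 2: rows/columns ordered -2, -1, 0, 1, 2, 3; the first three indices are losses
P2 = np.array([[0.6, 0, 0, 0.26, 0.135, 0.005],
               [0.65, 0, 0, 0.22, 0.125, 0.005],
               [0, 0.69, 0, 0.205, 0.1, 0.005],
               [0, 0, 0.7, 0.2, 0.099, 0.001],
               [0, 0, 0.9, 0.09, 0.01, 0],
               [0, 0, 0.99, 0.01, 0, 0]])
_, hf2, rtp2 = evaluate_design(P2, [0, 0, 0, 15, 30, 1000], bet=10, loss_states=[0, 1, 2])

print(f"Example 1: hit frequency {hf1:.4f}, RTP {rtp1:.4f}")
print(f"Example 2: hit frequency {hf2:.4f}, RTP {rtp2:.4f}")
```

To try a variant, edit some entries of P, keep every row summing to 1, and re-run the function; the returned hit frequency and RTP show immediately whether the change moves the design toward its targets.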