## Introduction

There are lottery systems in many games. The simplest implementation fixes a winning probability p and draws every trial independently with that probability. The number of trials until the first win then follows a geometric distribution. Its drawback is a very large spread: the standard deviation is sqrt(1 - p) / p, almost as large as the mean 1 / p, so it grows quickly as p gets small. We would like alternative systems that keep the same average but reduce this variance.

In [1]:
import numpy as np
import plotly
import plotly.graph_objs as go
import random

from utils import *  # provides mc, get_mean_std, plot_pdf and plot_cdf (module not shown)
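
The `utils` module is not part of this notebook. To make the cells runnable without it, here is a minimal hypothetical stand-in for two of its helpers. It assumes `mc(procedure, simulation)` calls `procedure()` `simulation` times and returns a dict mapping "number of trials until a win" to how often that count occurred, and that `get_mean_std` returns the mean and standard deviation of such a dict. The real helpers may behave differently, and `plot_pdf`/`plot_cdf` are not covered.

```python
import math
from collections import Counter

def mc(procedure, simulation=1_000_000):
    """Hypothetical stand-in: run `simulation` trials and count how many
    trials each win took. Returns {trials_until_win: frequency}."""
    counts = Counter()
    trials = 0
    for _ in range(simulation):
        trials += 1
        if procedure():
            counts[trials] += 1
            trials = 0
    return dict(counts)  # an unfinished run at the very end is dropped

def get_mean_std(dic):
    """Hypothetical stand-in: mean and population standard deviation."""
    n = sum(dic.values())
    mean = sum(k * c for k, c in dic.items()) / n
    var = sum(c * (k - mean) ** 2 for k, c in dic.items()) / n
    return mean, math.sqrt(var)
```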

### Let's first see what the geometric distribution looks like

In [2]:
def geometric(p):
    """Return a procedure that simulates one trial, winning with fixed probability p"""
    def procedure():
        return random.random() < p
    return procedure

In [3]:
geometric(0.1)()

False

In [4]:
geometric_dic = mc(geometric(0.01), simulation=1_000_000)

In [5]:
get_mean_std(geometric_dic)

(102.2163957886129, 102.34614340528987)

In [6]:
def theoretical_geometric_mean_std(p):
    mean = 1 / p
    var = (1 - p) / p / p
    return mean, np.sqrt(var)

In [7]:
theoretical_geometric_mean_std(0.01)

(100.0, 99.498743710662)
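
As an independent cross-check of the theoretical values, NumPy can sample the geometric distribution directly. `Generator.geometric` counts the trials up to and including the first success, which matches the convention used here; the seed is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed, for reproducibility
p = 0.01
samples = rng.geometric(p, size=1_000_000)  # trials until the first win (support 1, 2, ...)

sample_mean, sample_std = samples.mean(), samples.std()
print(sample_mean, sample_std)  # should land close to the theoretical (100.0, 99.4987...)
```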

In [8]:
plot_pdf(geometric_dic)

In [9]:
plot_cdf(geometric_dic)

### Even though the mean of this geometric distribution is only 100, there is still about a 5% chance of needing more than 300 trials
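
The tail can be computed exactly: needing more than n trials means the first n trials all fail, which has probability (1 - p)^n. A short sketch (the helper names are hypothetical):

```python
import math

def geometric_tail(p, n):
    """P(more than n trials are needed) = P(the first n trials all fail)."""
    return (1 - p) ** n

def trials_for_tail(p, eps):
    """Smallest n such that P(more than n trials are needed) < eps."""
    return math.floor(math.log(eps) / math.log(1 - p)) + 1

print(f"{geometric_tail(0.01, 300):.3f}")  # 0.049
print(trials_for_tail(0.01, 0.01))         # 459
```

So with p = 0.01 roughly one player in twenty needs more than 300 trials, and one in a hundred still needs more than 458.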

## A Better Solution

The challenge is to build a different probability distribution while keeping the expectation unchanged. Suppose we want the expected gain of every trial to be 0.01. When a trial fails, we do not want to come away with nothing; instead the failure gains a partial value, such as 0.005. This value is banked, and it raises the winning probability of later trials.

Now we need the probabilities of winning and losing. Let the winning probability be a, so the losing probability is 1 - a. A win gains 1 and a loss gains 0.005, so the expectation of the trial is a + 0.005 * (1 - a), which should equal 0.01. Solving gives a = 0.005 / 0.995 = 1/199 ≈ 0.00503.
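
The same calculation with exact fractions, which avoids rounding and shows that the first-trial probability is exactly 1/199:

```python
from fractions import Fraction

p = Fraction(1, 100)  # target expected gain per trial
x = Fraction(1, 200)  # value gained by a failed trial (0.005)

# a * 1 + (1 - a) * x = p  =>  a = (p - x) / (1 - x)
a = (p - x) / (1 - x)
print(a)  # 1/199
```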

So the initial winning probability is only about half of 0.01, and later probabilities must rise. How? Suppose we have failed 60 trials, so the gained value has accumulated to 60 * 0.005 = 0.3. If we win on the next trial, that 0.3 is considered used up and the accumulated value is reset to zero, so the win itself only gains 1 - 0.3 = 0.7. If the winning probability stayed the same, the expected gain would fall below 0.01, so the probability has to increase. The expectation of the next trial is a * (1 - 0.3) + (1 - a) * 0.005; setting it equal to 0.01 gives a = 0.005 / 0.695 = 1/139 ≈ 0.0072.
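
Generalizing, after k failures the accumulated value is k * x, and solving a * (1 - k * x) + (1 - a) * x = p gives a = (p - x) / (1 - k * x - x). A small sketch (hypothetical helper name) that tabulates how this probability rises for p = 0.01 and x = 0.005:

```python
from fractions import Fraction

def win_probability(p, x, left):
    """Solve a * (1 - left) + (1 - a) * x = p for the winning probability a."""
    return (p - x) / (1 - left - x)

p, x = Fraction(1, 100), Fraction(1, 200)
for failures in (0, 60, 120, 197):
    a = win_probability(p, x, failures * x)  # accumulated value is failures * x
    print(failures, a)
```

This prints 1/199, 1/139, 1/79 and 1/2: with these parameters the probability after k failures is exactly 1/(199 - k), climbing to 1/2 just before the guaranteed win.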

One more case needs attention. Once the accumulated value reaches 0.99 or more, adding the expected gain of 0.01 for the next trial would bring it to 1 or beyond. This case is a guaranteed win, and after the win 1 is subtracted, so the accumulated value becomes (accumulated value + 0.01) - 1.
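
Starting from zero, the guaranteed win comes after the smallest number of failures k with k * x >= 1 - p. A sketch (hypothetical helper, exact fractions, assuming x > 0) that finds k and the value carried over after the guaranteed win:

```python
import math
from fractions import Fraction

def forced_win_point(p, x):
    """Return (failures before the guaranteed win, value left over after it),
    assuming the accumulated value starts at 0 and x > 0."""
    failures = math.ceil((1 - p) / x)  # smallest k with k * x >= 1 - p
    carry = failures * x + p - 1       # accumulated + p - 1 once the win is forced
    return failures, carry

p = Fraction(1, 100)
for x in (Fraction(1, 200), Fraction(3, 1000), Fraction(7, 1000)):
    print(float(x), forced_win_point(p, x))
```

For x = 0.005 and x = 0.003 the win is forced after 198 and 330 failures with nothing carried over; for x = 0.007 it comes after 142 failures and 0.004 is carried into the next round.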

### Let's see how to implement it in general

In [10]:
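def accumulated(p, x):
    """Create a procedure whose expected gain per trial is p, where every failed trial adds x to the accumulated value"""
    assert p > x >= 0
    left = 0  # value accumulated from failed trials since the last win
    def procedure():
        nonlocal left
        if left + p < 1:
            # Choose r so that r * (1 - left) + (1 - r) * x == p
            r = (p - x) / (1 - left - x)
            win = random.random() < r
            if win:
                left = 0  # the accumulated value is used up by the win
            else:
                left += x
            return win
        else:
            # left + p >= 1: guaranteed win, carry over the remainder
            left = left + p - 1
            return True
    return procedure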

In [11]:
procedure = accumulated(0.01, 0.005)

In [12]:
acc_dic = mc(procedure, simulation=1_000_000)

In [13]:
get_mean_std(acc_dic)

(99.98200179982001, 57.217674507379265)

In [14]:
plot_pdf(acc_dic)

In [15]:
plot_cdf(acc_dic)

### With x = 0.005, half of p, the number of trials is uniform on 1 to 199
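
The uniform shape can be confirmed exactly. With x = p / 2 the winning probability after k failures is 1/(199 - k), so the probability of still being without a win after k failures telescopes to (199 - k)/199, and each trial from 1 to 199 is the first win with probability 1/199. A sketch (hypothetical helper name) that computes the exact distribution produced by `accumulated` from a zero start, using fractions:

```python
from fractions import Fraction

def exact_trials_distribution(p, x):
    """Exact P(first win on trial k) for accumulated(p, x), starting from 0."""
    assert p > x > 0  # x = 0 would be the unbounded geometric case
    dist = {}
    survive = Fraction(1)  # probability that every trial so far failed
    left = Fraction(0)
    k = 0
    while True:
        k += 1
        if left + p < 1:
            r = (p - x) / (1 - left - x)
            dist[k] = survive * r
            survive *= 1 - r
            left += x
        else:
            dist[k] = survive  # guaranteed win on this trial
            return dist

dist = exact_trials_distribution(Fraction(1, 100), Fraction(1, 200))
mean = sum(k * q for k, q in dist.items())
var = sum(k * k * q for k, q in dist.items()) - mean ** 2
print(mean, var)  # 100 3300
```

The exact mean is 100 and the variance 3300 (standard deviation ≈ 57.4), in line with the simulated (99.98, 57.2).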

In [16]:
procedure = accumulated(0.01, 0.003)

In [17]:
acc_dic = mc(procedure, simulation=1_000_000)

In [18]:
get_mean_std(acc_dic)

(100.12996895964754, 72.96439030913295)

In [19]:
plot_pdf(acc_dic)

In [20]:
plot_cdf(acc_dic)

In [21]:
procedure = accumulated(0.01, 0.007)

In [22]:
acc_dic = mc(procedure, simulation=1_000_000)

In [23]:
get_mean_std(acc_dic)

(99.46290033817387, 41.970559544514906)

In [24]:
plot_pdf(acc_dic)

In [25]:
plot_cdf(acc_dic)

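From the three examples above, the variance is clearly reduced (standard deviations of about 73, 57 and 42 for x = 0.003, 0.005 and 0.007, versus about 100 for the geometric case), and no run needs more than 400 trials: starting from zero, the win is forced by trial 331, 199 and 143 respectively. A larger x gives a lower variance and a tighter cap.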
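
A sketch (hypothetical helper name) that computes these caps and, for comparison, the chance that a plain geometric lottery with p = 0.01 runs past each of them. It assumes the accumulated value starts at 0; a positive carry-over only makes the cap smaller.

```python
import math
from fractions import Fraction

def worst_case_trials(p, x):
    """Trial on which the win is forced when starting from 0 (x > 0)."""
    return math.ceil((1 - p) / x) + 1

p = Fraction(1, 100)
for x in (Fraction(3, 1000), Fraction(1, 200), Fraction(7, 1000)):
    cap = worst_case_trials(p, x)
    geo_tail = 0.99 ** cap  # chance a plain geometric lottery needs more than cap trials
    print(f"x = {float(x)}: at most {cap} trials; geometric exceeds this with probability {geo_tail:.3f}")
```

For example, roughly 13.5% of plain geometric runs take longer than 199 trials, a length the x = 0.005 scheme never exceeds.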

## More complex methods

In the method above, a failed trial always gains the fixed value x. We can vary this to build more complex methods, for example gaining x1 while the accumulated value is below 0.5 and x2 otherwise.

In [26]:
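def accumulated_complex(p, x1, x2):
    """Like accumulated, but a failed trial adds x1 while the accumulated value is below 0.5 and x2 afterwards"""
    assert p > x1 >= 0 and p > x2 >= 0
    left = 0  # value accumulated from failed trials since the last win
    def procedure():
        nonlocal left
        to_left = x1 if left < 0.5 else x2  # gain for a failure at this stage
        if left + p < 1:
            # Same rule as before, with to_left in place of the fixed x
            r = (p - to_left) / (1 - left - to_left)
            win = random.random() < r
            if win:
                left = 0
            else:
                left += to_left
            return win
        else:
            # Guaranteed win, carry over the remainder
            left = left + p - 1
            return True
    return procedure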
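
`accumulated_complex` is one instance of a more general pattern: the value gained on a failure can be any function of the current accumulated value, as long as it stays in [0, p). A hypothetical generalization (the names `accumulated_general`, `gain` and `two_level` are not from the original) that takes that function as a parameter, plus an injectable random generator for reproducible runs:

```python
import random

def accumulated_general(p, gain, rng=random):
    """Hypothetical generalization: gain(left) is the value banked by a failed
    trial when the accumulated value is `left`; it must satisfy 0 <= gain < p."""
    left = 0.0
    def procedure():
        nonlocal left
        x = gain(left)
        assert 0 <= x < p
        if left + p < 1:
            r = (p - x) / (1 - left - x)  # keeps the expected gain per trial at p
            win = rng.random() < r
            left = 0.0 if win else left + x
            return win
        left = left + p - 1  # guaranteed win, carry over the remainder
        return True
    return procedure

# The two-level rule of accumulated_complex(0.01, 0.007, 0.003) as a gain function
two_level = lambda left: 0.007 if left < 0.5 else 0.003
procedure = accumulated_general(0.01, two_level, random.Random(0))
```

With the two-level rule the accumulated value reaches 0.504 after 72 failures at 0.007 each, then 0.99 after 162 more at 0.003, so the win is forced by trial 235 at the latest.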

In [27]:
procedure = accumulated_complex(0.01, 0.007, 0.003)

In [28]:
acc_dic = mc(procedure, simulation=1_000_000)

In [29]:
get_mean_std(acc_dic)

(99.90578479368568, 48.44833147640655)

In [30]:
plot_pdf(acc_dic)

In [31]:
plot_cdf(acc_dic)

## Conclusion

We introduced a lottery method that keeps the expected gain of every trial unchanged, has a much lower variance than the geometric distribution, and puts a hard cap on the number of trials needed for a win.