# Linear Congruential Random Number Generators

>A **linear congruential generator** (**LCG**) is an algorithm that yields a sequence of pseudo-randomized numbers calculated with a discontinuous piecewise linear function. The method represents one of the oldest and best-known pseudorandom number generator algorithms. The theory behind them is relatively easy to understand, and they are easily implemented and fast, especially on computer hardware which can provide modular arithmetic by storage-bit truncation.
>
>The generator is defined by the recurrence relation:
>$$X_{n+1} = \left( a X_n + c \right)\bmod m$$
>where $X$ is the sequence of pseudo-random values, and
>- $m,\, 0\lt m$ is the "modulus",
>- $a,\,0 \lt a \lt  m$ is the "multiplier",
>- $c,\,0 \le c \lt  m$ is the "increment",
>- $X_0,\,0 \le X_0 \lt  m$ is the "seed" or "start value",
>These are integer constants that specify the generator.
>If $c=0$, the generator is often called a "multiplicative congruential generator" (MCG), or *Lehmer RNG*.
>If $c \neq 0$, the method is called a "mixed congruential generator".
>
>[[Wikipedia](https://en.wikipedia.org/wiki/Linear_congruential_generator)]
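To make the recurrence concrete, here is a minimal sketch with deliberately tiny, illustrative parameters ($m = 9$, $a = 4$; these are not the values used later in the notebook, and `lcg_terms` is a hypothetical helper). With $c = 1$ the sequence visits every residue before returning to the seed. With $c = 0$ (the multiplicative form), a zero seed is a fixed point, which is why a Lehmer generator needs a non-zero seed.

```python
def lcg_terms(seed, m, a, c, n):
    """Return X_1, ..., X_n for X_{k+1} = (a * X_k + c) mod m."""
    terms = []
    x = seed
    for _ in range(n):
        x = (a * x + c) % m
        terms.append(x)
    return terms


# Mixed generator (c != 0): all 9 residues appear before the seed 0 comes back.
print(lcg_terms(0, 9, 4, 1, 10))   # [1, 5, 3, 4, 8, 6, 7, 2, 0, 1]

# Multiplicative generator (c == 0): seed 0 never leaves 0 ...
print(lcg_terms(0, 9, 4, 0, 5))    # [0, 0, 0, 0, 0]
# ... while a non-zero seed cycles, here with period 3.
print(lcg_terms(1, 9, 4, 0, 6))    # [4, 7, 1, 4, 7, 1]
```

Since the next value depends only on the current one and there are only $m$ possible values, no LCG can have a period longer than $m$.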

### Tasks

- Create an LCG with your Student ID as the modulus $m$, and suitable random values for $a, c$, and the *seed*. (See starter code below.)
- Use Decision Trees (DTs) from the `scikit-learn` library to assess the quality of your chosen PRNG. (If it is easy to predict the next digits then it is less random.)
    - Select 3 hyperparameters and study their effect.

Explain your reasoning, and justify any choices of the hyperparameters (and/or run experiments to find the optimal ones).

Evaluate your models, and use visualisation to show the trees and any relevant plots.

Write a conclusion that summarises your findings, and makes recommendations.

In [1]:
from math import log
from random import randint
from matplotlib import pyplot as plt
from sklearn import tree
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.metrics import r2_score
from sklearn.ensemble import RandomForestClassifier

## Initialisation of the LCG parameters

Assign suitable values to the following variables.

In [None]:
#.#.#.#.#.#.# IMPORTANT #.#.#.#.#.#.#

MODULUS = ............. # Set this to your Student ID

#.#.#.#.#.#.# IMPORTANT #.#.#.#.#.#.#

In [None]:
A = 101
C = 13
SEED = 321
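Whether `A = 101` and `C = 13` are "suitable" depends on `MODULUS`. One standard criterion for a mixed LCG ($c \neq 0$) is the Hull-Dobell theorem: the generator has full period $m$ for every seed if and only if (1) $c$ and $m$ are coprime, (2) $a - 1$ is divisible by every prime factor of $m$, and (3) $a - 1$ is divisible by 4 whenever $m$ is. The sketch below checks these conditions; `prime_factors` and `has_full_period` are hypothetical helper names, and $m \ge 2$ is assumed.

```python
from math import gcd


def prime_factors(n):
    """Distinct prime factors of n by trial division (fast enough for ID-sized n)."""
    factors = set()
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors.add(p)
            n //= p
        p += 1
    if n > 1:
        factors.add(n)
    return factors


def has_full_period(m, a, c):
    """Hull-Dobell test: True iff X -> (a*X + c) mod m cycles through all m values (m >= 2)."""
    if c == 0:
        return False          # multiplicative form: 0 is a fixed point, so the period is < m
    if gcd(c, m) != 1:
        return False          # condition 1: c and m coprime
    if any((a - 1) % p for p in prime_factors(m)):
        return False          # condition 2: every prime factor of m divides a - 1
    if m % 4 == 0 and (a - 1) % 4:
        return False          # condition 3: 4 | m implies 4 | a - 1
    return True


print(has_full_period(9, 4, 1))     # True
print(has_full_period(16, 3, 3))    # False (4 divides 16 but not a - 1 = 2)
```

In the notebook this could be called as `has_full_period(MODULUS, A, C)`. With `A = 101`, $a - 1 = 100 = 2^2 \cdot 5^2$, so condition (2) can only hold if every prime factor of `MODULUS` is 2 or 5. Full period is necessary but not sufficient for good randomness: $a = 1$ satisfies all three conditions whenever $\gcd(c, m) = 1$, yet $X_n = (X_0 + nc) \bmod m$ is trivially predictable.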

### Base $b$ representation of numbers

In [3]:
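def base_b(n, b):
    """ Get a list of the base-'b' digits of n (most significant first),
        zero-padded to the number of base-'b' digits of MODULUS """
    n_digits = 1
    while b ** n_digits <= MODULUS:   # integer arithmetic avoids float rounding in log(MODULUS)/log(b)
        n_digits += 1
    r = []
    for _ in range(n_digits):
        r.insert(0, n % b)            # prepend the least significant remaining digit
        n //= b
    return r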

In [4]:
base_b(11,3) # Example: 11 in base 3 is:   2+0*3+1*3^2   -->   102   -->   [0,0,...,1,0,2]

[0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 2]
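Because every value is below `MODULUS` and the padding width is fixed, the digit vector loses no information. A small, self-contained round-trip check follows; `to_digits` and `from_digits` are illustrative names, and the width is passed explicitly instead of being derived from `MODULUS`.

```python
def to_digits(n, b, width):
    """Base-b digits of n, most significant first, zero-padded to `width`."""
    digits = []
    for _ in range(width):
        digits.append(n % b)
        n //= b
    return digits[::-1]


def from_digits(digits, b):
    """Inverse of to_digits: Horner evaluation of the digit list."""
    value = 0
    for d in digits:
        value = value * b + d
    return value


print(to_digits(11, 3, 11))                  # [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 2]
print(from_digits(to_digits(11, 3, 11), 3))  # 11
```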

## LCG

In [5]:
def lcg(seed, modulus, a, c):
    """ Linear congruential generator: 𝑋_{𝑛+1} = (𝑎𝑋_𝑛+𝑐) mod 𝑚 """
    while True:
        seed = (a * seed + c) % modulus
        yield seed
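`lcg` is an infinite generator, so `itertools.islice` is a convenient way to take a fixed number of terms; note that it yields $X_1, X_2, \dots$ and never the seed itself. A quick check with the notebook's `A`, `C` and `SEED` also shows that $aX_0 + c = 101 \cdot 321 + 13 = 32434$, which is already below the modulus, so the first value in the stream below is not reduced at all and is a trivially predictable function of the seed. In this sketch `10**6` is only a stand-in modulus, since the real `MODULUS` is the student ID.

```python
from itertools import islice


def lcg(seed, modulus, a, c):
    """Linear congruential generator: X_{n+1} = (a*X_n + c) mod m (yields X_1, X_2, ...)."""
    while True:
        seed = (a * seed + c) % modulus
        yield seed


# Take a fixed number of terms from the infinite generator.
print(list(islice(lcg(0, 9, 4, 1), 9)))    # [1, 5, 3, 4, 8, 6, 7, 2, 0]

# With A = 101, C = 13, SEED = 321 and any modulus above 32434
# (10**6 is a stand-in), the first term is a*SEED + c with no reduction.
print(next(lcg(321, 10**6, 101, 13)))      # 32434
```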

In [6]:
generator = lcg(SEED, MODULUS, A, C)

## Data generation

In [7]:
stream = [next(generator) for _ in range(10_000)]
stream[:10] # Example

[32434, 65991, 121936, 93405, 51262, 115779, 88828, 82809, 92170, 49983]
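Since each LCG output depends only on the previous one, the stream cycles as soon as any value repeats. With `A` and `C` fixed rather than chosen for the modulus, the period may well be shorter than the 10,000 values drawn here. In that case identical (X, y) pairs would land in both the training and the test split, and a decision tree could score well simply by memorising them. A sketch of a check follows (`first_repeat` is a hypothetical helper; in the notebook it would be called as `first_repeat(stream)`, where `None` means no value repeated).

```python
def first_repeat(stream):
    """Return (i, j) for the first index j whose value already appeared at i, else None."""
    seen = {}
    for j, value in enumerate(stream):
        if value in seen:
            return seen[value], j
        seen[value] = j
    return None


toy = [4, 7, 1, 4, 7, 1]          # X_1, ... of the small generator m=9, a=4, c=0, seed 1
i, j = first_repeat(toy)
print(i, j, "period:", j - i)     # 0 3 period: 3
```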

In [8]:
def get_features(stream, base):
    ''' Replace each random number in 'stream' by the vector of its base-b digits '''
    return [base_b(n, base) for n in stream]
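scikit-learn converts list-of-lists input to a NumPy array anyway, so the features can also be built directly as a 2-D array. The vectorised sketch below assumes NumPy is available and passes the digit width explicitly (`digits_matrix` is a hypothetical name); it should agree with `base_b` when `n_digits` equals the width `base_b` uses.

```python
import numpy as np


def digits_matrix(values, b, n_digits):
    """Row i holds the base-b digits of values[i], most significant first."""
    values = np.asarray(values, dtype=np.int64)
    powers = b ** np.arange(n_digits - 1, -1, -1, dtype=np.int64)  # b^(n_digits-1), ..., b, 1
    return (values[:, None] // powers) % b                          # broadcast: one column per digit


print(digits_matrix([11, 32434], 3, 11))
```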

In [9]:
data = get_features(stream, base=3)

In [10]:
stream[0], data[0] # Example

(32434, [0, 1, 1, 2, 2, 1, 1, 1, 0, 2, 1])

In [11]:
X = data[:-1]
y = data[1:]
len(X), len(y)

(9999, 9999)

In [12]:
X[0], y[0] # Example

([0, 1, 1, 2, 2, 1, 1, 1, 0, 2, 1], [1, 0, 1, 0, 0, 1, 1, 2, 0, 1, 0])

In [13]:
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
len(X_train), len(X_test)

(7499, 2500)
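To judge whether a tree "predicts the next digits", its accuracy needs a reference point. Guessing uniformly would give $1/b$ per digit, but the leading positions are generally not uniform: every value is below `MODULUS`, so with the zero-padding used by `base_b` the most significant digits are skewed. A majority-class baseline per position, computed from the training split only, is a fairer yardstick. A sketch follows (`majority_baseline` is a hypothetical helper; in the notebook it would be called as `majority_baseline(y_train, y_test)`).

```python
from collections import Counter


def majority_baseline(y_train, y_test):
    """Per-digit accuracy of always predicting the most frequent training digit."""
    accuracies = []
    for j in range(len(y_train[0])):
        majority = Counter(row[j] for row in y_train).most_common(1)[0][0]
        hits = sum(row[j] == majority for row in y_test)
        accuracies.append(hits / len(y_test))
    return accuracies


print(majority_baseline([[0, 1], [0, 2], [1, 2]], [[0, 2], [1, 1]]))  # [0.5, 0.5]
```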

In [None]:
# ...................
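One possible shape for the experiments is sketched below on a hypothetical toy generator ($m = 2^{10}$, $a = 109$, $c = 7$, binary digits), not on the notebook's data. `y` here holds one target column per digit, which `DecisionTreeClassifier` accepts as a multi-output problem. However, `accuracy_score`, and therefore the estimator's default `.score` and `cross_val_score`'s default scoring, does not (as far as we know) support such multiclass-multioutput targets, so the sketch computes per-digit accuracy by hand. `max_depth` is swept as one example hyperparameter; `min_samples_leaf` and `criterion` are other real `DecisionTreeClassifier` parameters that could be swept the same way.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Illustrative toy generator, NOT the notebook's parameters: m = 2**10, a = 109, c = 7.
# a - 1 = 108 is divisible by 4 and c is odd, so (Hull-Dobell) the period is the full 1024.
M_TOY, A_TOY, C_TOY, SEED_TOY, N_BITS = 2**10, 109, 7, 1, 10


def toy_stream(n):
    x, out = SEED_TOY, []
    for _ in range(n):
        x = (A_TOY * x + C_TOY) % M_TOY
        out.append(x)
    return out


# 900 < 1024 draws, so no value repeats and no (X, y) pair is duplicated across splits.
values = toy_stream(900)
digits = np.array([[(v >> k) & 1 for k in range(N_BITS - 1, -1, -1)] for v in values])
X_toy, y_toy = digits[:-1], digits[1:]
Xtr, Xte, ytr, yte = train_test_split(X_toy, y_toy, random_state=0)


def per_digit_accuracy(**params):
    """Fit one multi-output tree and return test accuracy for each digit position."""
    clf = DecisionTreeClassifier(random_state=0, **params).fit(Xtr, ytr)
    return (clf.predict(Xte) == yte).mean(axis=0)


# Sweep one hyperparameter and print per-digit test accuracy (MSB first).
for depth in (1, 3, 5, None):
    print(depth, np.round(per_digit_accuracy(max_depth=depth), 2))

# With odd a and odd c the lowest bit simply alternates, so a depth-1 tree on that
# single output is perfect: a concrete sign of non-randomness.
low = DecisionTreeClassifier(max_depth=1, random_state=0).fit(Xtr, ytr[:, -1])
print("lowest-bit accuracy:", (low.predict(Xte) == yte[:, -1]).mean())
```

In the notebook, the same pattern would be applied to `X_train`, `X_test`, `y_train` and `y_test` after converting them with `np.array`, and compared against a per-digit baseline. `tree.plot_tree(clf, max_depth=2)` together with `plt` can draw the top levels of a fitted tree without the figure becoming unreadable.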

# Conclusion

........