# Random Digits Exercise

### Description:
This is an exercise in hypothesis testing: determining whether a series of digits in the range 0-9 is random, and whether its origin is human (i.e. yourself!). To decide this, one has to design and apply statistical tests, from which the degree of consistency with randomness can be calculated.

### Your task:
Using tests of your choice/design, determine which of the given data sets are consistent with being random, and which aren't. One is based on the digits you entered in the course questionnaire, while the others range from obviously non-random, through poor/quasi-random, to digits of pi and truly random. See if you can identify your own (human) dataset among these.

I chose to try to implement the 32x32 binary rank test instead of the above exercise.

###  Author: 
 - Troels Petersen ([petersen@nbi.dk](mailto:petersen@nbi.dk))

###  Date:
 - 7th of December 2023

---

In [1]:
import numpy as np
from scipy import stats

---
## Define your tests:

Here is an example plot from the data, just for convenience. It is all up to you from here...

In [162]:
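def get_32_matrix_row(x, bits=32):
    """Return the binary expansion of integer x as an array of 0/1 entries, zero-padded to `bits`."""
    return np.array([int(i) for i in bin(x)[2:].zfill(bits)])

def binary_rank_test_32(numbers):
    """Build 32x32 binary matrices from `numbers` and return the rank of each one."""
    N_iterations = 40_000
    ranks = np.zeros(N_iterations)
    for i in range(N_iterations):
        if i % 5000 == 0:
            print(f'On {i}th iteration')
        # Note: np.random.choice draws with replacement by default, so rows can repeat
        sample = np.random.choice(numbers, size=32)
        matrix = np.array([get_32_matrix_row(num) for num in sample])
        # Note: np.linalg.matrix_rank computes the rank over the real numbers, not over GF(2)
        rank = np.linalg.matrix_rank(matrix)
        ranks[i] = rank
    return ranks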
# Random unsigned 32-bit integers; the upper bound of randint is exclusive, hence 2**32
numbers1 = np.random.randint(0, 2**32, size=1000, dtype=np.int64)
ranks = binary_rank_test_32(numbers1)

On 0th iteration
On 5000th iteration
On 10000th iteration
On 15000th iteration
On 20000th iteration
On 25000th iteration
On 30000th iteration
On 35000th iteration
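
In its usual formulation, the binary rank test takes the rank of each matrix over GF(2) (arithmetic modulo 2), which can differ from the real-valued rank returned by `np.linalg.matrix_rank`. Below is a minimal sketch of a GF(2) rank using XOR-based elimination directly on the integer rows. The function name `gf2_rank` is hypothetical and not part of the original notebook.

```python
import numpy as np

def gf2_rank(rows):
    """Rank over GF(2) of a binary matrix whose rows are given as non-negative integers."""
    pivots = {}  # leading-bit position -> reduced row having that leading bit
    for row in rows:
        row = int(row)
        while row:
            lead = row.bit_length() - 1
            if lead not in pivots:
                # New independent row: keep it as the pivot for this bit position
                pivots[lead] = row
                break
            # Clear the leading bit by adding (XOR-ing) the existing pivot row
            row ^= pivots[lead]
    return len(pivots)

# Small illustration: rows 110, 011, 101 sum to zero modulo 2,
# but are linearly independent over the real numbers.
rows = [0b110, 0b011, 0b101]
real_matrix = np.array([[1, 1, 0], [0, 1, 1], [1, 0, 1]])
rank_gf2 = gf2_rank(rows)
rank_real = int(np.linalg.matrix_rank(real_matrix))
print(rank_gf2, rank_real)
```

The two ranks disagree for this small matrix, which illustrates why the choice of field matters when comparing against tabulated expectations.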


In [163]:
ranks[ranks<=29] = 29
unique, counts = np.unique(ranks, return_counts=True)
unique_expected = ["=<29", "30", "31", "32"]
counts_expected = [211.4, 5134.0, 23103.0, 11551.5]
print("Rank value | count \n")
for i in range(len(unique)):
    print(f"{unique[i]} | {counts[i]}")
print("\n")
print("Rank value | Expected counts (given true randomness)\n")
for i in range(len(unique_expected)):
    print(f"{unique_expected[i]} | {counts_expected[i]}")


Rank value | count 

29.0 | 461
30.0 | 2836
31.0 | 12409
32.0 | 24294


Rank value | Expected counts (given true randomness)

=<29 | 211.4
30 | 5134.0
31 | 23103.0
32 | 11551.5
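
The expected counts above can be checked against the standard formula for the rank of a uniformly random $n \times n$ matrix over GF(2), $P(r) = 2^{r(2n-r)-n^2} \prod_{i=0}^{r-1} \frac{(1-2^{i-n})^2}{1-2^{i-r}}$. The sketch below evaluates it for $n = 32$ and 40,000 matrices; the helper name `gf2_rank_probability` is hypothetical, and the result should agree with the listed values to within rounding.

```python
import numpy as np

def gf2_rank_probability(r, n=32):
    """Probability that a uniformly random n x n matrix over GF(2) has rank r."""
    prob = 2.0 ** (r * (2 * n - r) - n * n)
    for i in range(r):
        prob *= (1.0 - 2.0 ** (i - n)) ** 2 / (1.0 - 2.0 ** (i - r))
    return prob

n_matrices = 40_000
p32 = gf2_rank_probability(32)
p31 = gf2_rank_probability(31)
p30 = gf2_rank_probability(30)
p_le29 = 1.0 - (p32 + p31 + p30)  # everything else is pooled into the "<= 29" bin

# Same bin order as in the notebook: <=29, 30, 31, 32
expected = n_matrices * np.array([p_le29, p30, p31, p32])
print(np.round(expected, 1))
```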


The idea is to perform a chi-square test on these counts to see whether the observed distribution matches the expected one.

In [164]:
chi2 = np.sum((counts-counts_expected)**2 / counts_expected)
ndof = 3
p_val = stats.chi2.sf(chi2, ndof)
print(p_val)

0.0
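
The printed value is exactly 0.0 because the chi-square statistic is so large that the survival function underflows in double precision. As a cross-check, the sketch below repeats the calculation with `scipy.stats.chisquare`, using the observed and expected counts printed earlier. The expected counts are rescaled to the observed total, an assumption made here because recent SciPy versions reject inputs whose sums differ.

```python
import numpy as np
from scipy import stats

# Counts copied from the printed output, in bin order <=29, 30, 31, 32
observed = np.array([461, 2836, 12409, 24294], dtype=float)
expected = np.array([211.4, 5134.0, 23103.0, 11551.5])

# Rescale so both arrays sum to the same total (the tabulated values are rounded)
expected *= observed.sum() / expected.sum()

# With the default ddof=0 the test uses len(observed) - 1 = 3 degrees of freedom
chi2_stat, p_value = stats.chisquare(observed, f_exp=expected)
print(f"chi2 = {chi2_stat:.1f}, p = {p_value}")
```

Most of the statistic comes from the rank-32 and rank-31 bins, where the observed counts are roughly twice and half the expectation, respectively.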


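With a p-value of 0 (to floating-point precision), the observed rank distribution is clearly inconsistent with the expected one. This does not, however, show that the generated numbers are non-random: the expected counts apply to ranks computed over GF(2), whereas `np.linalg.matrix_rank` computes the rank over the real numbers, and `np.random.choice` samples with replacement, so many matrices contain repeated rows. The discrepancy therefore most likely reflects the test implementation rather than the pseudo-random number generator.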
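
One effect worth quantifying is the sampling step: drawing 32 values with replacement from a pool of 1000 can pick the same value twice, and a repeated row forces the rank below 32. The sketch below computes the probability of at least one repeat (the classic birthday problem), assuming the 1000 pool values are all distinct. The helper name `prob_repeated_draw` is hypothetical.

```python
import numpy as np

def prob_repeated_draw(pool_size, n_draws):
    """Probability that n_draws uniform draws with replacement contain a repeat."""
    k = np.arange(n_draws)
    return 1.0 - np.prod(1.0 - k / pool_size)

p_dup = prob_repeated_draw(pool_size=1000, n_draws=32)

# Fraction of matrices in the printed output with rank below 32
observed_deficient = 1.0 - 24294 / 40_000
print(f"P(repeat) = {p_dup:.3f}, observed rank < 32 fraction = {observed_deficient:.3f}")
```

The two numbers are close, which is consistent with repeated rows accounting for most of the rank-deficient matrices in this run.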