### Random Seeds

The `random` module provides a variety of functions related to (pseudo) random numbers.

The problem with using random numbers in your code is that it can be difficult to debug, because the random number sequence changes from run to run of your program. If your code fails somewhere in the middle of a run, it is difficult to make the problem **repeatable**. Debugging intermittent and non-repeatable failures is one of the worst things to do!

Fortunately, when using the `random` module, we can set the `seed` for the underlying random number generator.

Random numbers are not truly random - they are generated in such a way that the numbers *appear* random and evenly distributed, but in fact they are being generated using a specific algorithm.

That algorithm depends on a **seed** value. That seed value will determine the exact sequence of randomly generated numbers (so as you can see, it's not truly random). Setting different seeds will result in different random sequences, but setting the seed to the same value will result in the same sequence being generated.

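By default, the seed comes from a randomness source provided by the operating system (or from the system time if no such source is available), hence every time you run your program a different seed is set. But we can easily set the seed to something specific - very useful for debugging purposes.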
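
To see why a seed pins down the whole sequence, here is a minimal sketch of a toy generator. Note that `toy_lcg` is a hypothetical helper written for illustration only: it is a simple linear congruential generator, which is *not* the algorithm the `random` module uses (that one is the Mersenne Twister). The constants are commonly cited LCG parameters, chosen here just as an assumption for the demo.

```python
def toy_lcg(seed, n):
    """Return n pseudo-random floats in [0, 1) from a toy linear congruential generator."""
    # Illustrative constants (assumption): multiplier, increment, modulus.
    a, c, m = 1664525, 1013904223, 2**32
    state = seed
    values = []
    for _ in range(n):
        # Each new state is computed purely from the previous one,
        # so the starting state (the seed) determines everything that follows.
        state = (a * state + c) % m
        values.append(state / m)
    return values

print(toy_lcg(1, 3) == toy_lcg(1, 3))  # True: same seed, same sequence
print(toy_lcg(1, 3) == toy_lcg(2, 3))  # False: different seed, different sequence
```

The numbers look scattered, but nothing about them is random: the same seed always replays the same sequence.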

In [4]:
import random

In [4]:
for _ in range(10):
    print(random.randint(10, 20), random.random())

15 0.2744539473546337
13 0.8696250242406662
14 0.3697144258854075
18 0.5945778682818538
15 0.7694636962835182
17 0.820862450549917
10 0.6467347679589829
20 0.8048988506681894
12 0.5880472380199475
20 0.8715275342775027


In [5]:
for _ in range(10):
    print(random.randint(10, 20), random.random())

14 0.931305656287958
10 0.23039405306234007
12 0.8337388005835649
18 0.4590462920405187
10 0.36743475564890316
13 0.7100875772566404
12 0.9750441656612154
12 0.7442020027100001
18 0.23667309950795434
20 0.41553858798609267


As you can see the sequence of numbers is not the same (and even restarting the kernel will result in different numbers).

We can set the **seed** as follows:

In [6]:
random.seed(0)
for i in range(10):
    print(random.randint(10, 20), random.random())

16 0.7579544029403025
16 0.04048437818077755
18 0.48592769656281265
14 0.9677999949201714
15 0.5833820394550312
13 0.5046868558173903
14 0.1397457849666789
11 0.6183689966753316
14 0.9872592010330129
18 0.9827854760376531


If we run the loop again without resetting the seed, the generator simply continues from where it left off, so the numbers differ from the ones above:

In [7]:
for i in range(10):
    print(random.randint(10, 20), random.random())

19 0.9021659504395827
14 0.09876334465914771
11 0.8988382879679935
20 0.33019721859799855
18 0.1007012080683658
16 0.31619669952159346
20 0.9130110532378982
18 0.47700977655271704
18 0.2604923103919594
18 0.9159944803568847


Instead, what we have to do is reset the seed. (This is what happens when you set the seed to a specific number at the start of your program - then every random number generated will be repeatable from run to run.)

Here, we just need to reset the seed before running the loop to get the same effect. This time we generate 20 numbers: the first 10 match the first seeded run above, and the next 10 match the run that followed it:

In [8]:
random.seed(0)
for i in range(20):
    print(random.randint(10, 20), random.random())

16 0.7579544029403025
16 0.04048437818077755
18 0.48592769656281265
14 0.9677999949201714
15 0.5833820394550312
13 0.5046868558173903
14 0.1397457849666789
11 0.6183689966753316
14 0.9872592010330129
18 0.9827854760376531
19 0.9021659504395827
14 0.09876334465914771
11 0.8988382879679935
20 0.33019721859799855
18 0.1007012080683658
16 0.31619669952159346
20 0.9130110532378982
18 0.47700977655271704
18 0.2604923103919594
18 0.9159944803568847


In [9]:
random.seed(0)
for i in range(20):
    print(random.randint(10, 20), random.random())

16 0.7579544029403025
16 0.04048437818077755
18 0.48592769656281265
14 0.9677999949201714
15 0.5833820394550312
13 0.5046868558173903
14 0.1397457849666789
11 0.6183689966753316
14 0.9872592010330129
18 0.9827854760376531
19 0.9021659504395827
14 0.09876334465914771
11 0.8988382879679935
20 0.33019721859799855
18 0.1007012080683658
16 0.31619669952159346
20 0.9130110532378982
18 0.47700977655271704
18 0.2604923103919594
18 0.9159944803568847


As you can see, the sequence of random numbers generated is now the same every time.
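
For the debugging scenario described earlier, one possible pattern is to choose the seed once at program start and print it, so that a failing run can be replayed later. This is only a sketch under assumptions: `main` and `simulate` are hypothetical names, and `simulate` is a stand-in for whatever your real program does.

```python
import random
import sys

def simulate(n=5):
    """Hypothetical stand-in for the real program logic."""
    return [random.randint(10, 20) for _ in range(n)]

def main(seed=None):
    # If no seed is supplied, pick one ourselves so that it can be reported.
    if seed is None:
        seed = random.randrange(sys.maxsize)
    random.seed(seed)
    print(f"seed = {seed}")  # record this value to replay a failing run
    return seed, simulate()

first_seed, first_run = main()   # a fresh, unpredictable run
_, replay = main(first_seed)     # the same run, replayed from its seed
print(first_run == replay)       # True
```

Normal runs still vary from one to the next, but any of them can be reproduced exactly by passing the printed seed back in.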

What's interesting is that even functions like `shuffle` will shuffle in the same order!

Let's see this:

In [10]:
def generate_random_stuff(seed=None):
    random.seed(seed)
    results = []
    
    # randint will generate the same sequence (for same seed)
    for _ in range(5):
        results.append(random.randint(0, 5))
    
    # even shuffling generates in the same way (for same seed)
    characters = ['a', 'b', 'c']
    random.shuffle(characters)
    results.append(characters)
    
    # same with the Gaussian distribution
    for _ in range(5):
        results.append(random.gauss(0, 1))
        
    return results

In [11]:
print(generate_random_stuff())

[4, 3, 2, 0, 5, ['b', 'c', 'a'], 0.2753548343351636, -0.5989933403172317, -0.6515943978936821, 1.7412073870280294, 0.24161779723044724]


In [12]:
print(generate_random_stuff())

[3, 5, 1, 5, 3, ['c', 'a', 'b'], -0.6334510789171736, -0.3564859849845763, 0.46562328656890606, -2.1891281426767746, -1.1983958517185107]


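With no seed (`None`), the generator is reseeded unpredictably on each call, so the two results differ. Now let's use a seed value: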

In [13]:
print(generate_random_stuff(0))

[3, 3, 0, 2, 4, ['a', 'c', 'b'], 1.6391095109274887, -0.9249345372119703, 0.9223306019157185, -0.1891931090669293, 0.5456115709634167]


In [14]:
print(generate_random_stuff(0))

[3, 3, 0, 2, 4, ['a', 'c', 'b'], 1.6391095109274887, -0.9249345372119703, 0.9223306019157185, -0.1891931090669293, 0.5456115709634167]


As long as we use the same seed value, the results are repeatable. A different seed value produces a different sequence, which is itself repeatable for that seed:

In [15]:
print(generate_random_stuff(100))

[1, 3, 3, 1, 5, ['a', 'c', 'b'], -1.639893943131093, 0.7278930291928233, -0.4000719319137612, -0.08390378703116254, -0.3013546798244102]


In [16]:
print(generate_random_stuff(100))

[1, 3, 3, 1, 5, ['a', 'c', 'b'], -1.639893943131093, 0.7278930291928233, -0.4000719319137612, -0.08390378703116254, -0.3013546798244102]
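
Note that `random.seed` changes the module-level generator, which is shared by all code that calls `random.randint`, `random.shuffle` and so on. If that is a concern, one option is to create a separate generator with the `random.Random` class, which takes the seed as its argument and has the same methods. The function below is a hypothetical variant of `generate_random_stuff`, sketched under that assumption:

```python
import random

def generate_with_local_rng(seed):
    # A Random instance keeps its own state, independent of random.seed().
    rng = random.Random(seed)
    numbers = [rng.randint(0, 5) for _ in range(5)]
    characters = ['a', 'b', 'c']
    rng.shuffle(characters)
    return numbers, characters

# Same seed, same results, exactly as with random.seed()
print(generate_with_local_rng(100) == generate_with_local_rng(100))  # True

# The module-level generator is left untouched by the calls above
random.seed(0)
expected_next = random.random()
random.seed(0)
generate_with_local_rng(100)
print(random.random() == expected_next)  # True
```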


Lastly, let's see how we would calculate the frequency of randomly generated integers, just to see how even the distribution is.

Basically, given a sequence of random integers, we are going to create a dictionary that contains the integers as keys, and the values will be the frequency of each:

In [5]:
def freq_analysis(lst):
    return {k: lst.count(k) for k in set(lst)}

In [6]:
lst = [random.randint(0, 10) for _ in range(100)]

In [7]:
print(lst)

[7, 8, 6, 1, 6, 1, 2, 6, 7, 9, 1, 5, 4, 6, 1, 9, 4, 10, 5, 5, 4, 0, 8, 3, 1, 7, 7, 3, 6, 1, 8, 0, 5, 7, 3, 5, 0, 7, 6, 1, 4, 9, 3, 6, 9, 4, 3, 2, 5, 0, 1, 6, 5, 7, 9, 1, 0, 5, 6, 2, 10, 2, 4, 0, 2, 1, 8, 9, 7, 3, 5, 7, 2, 10, 4, 8, 1, 10, 4, 10, 6, 10, 0, 5, 8, 8, 10, 7, 4, 8, 10, 3, 9, 8, 3, 9, 3, 5, 8, 8]


In [11]:
freq_analysis(lst)

{0: 7, 1: 11, 2: 6, 3: 9, 4: 9, 5: 11, 6: 10, 7: 10, 8: 11, 9: 8, 10: 8}

In [14]:
random.seed(0)
freq_analysis([random.randint(0, 10) for _ in range(1_000_000)])

{0: 90935,
 1: 91184,
 2: 91002,
 3: 91042,
 4: 90766,
 5: 91072,
 6: 90678,
 7: 90985,
 8: 90409,
 9: 91383,
 10: 90544}
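
To put a number on "how even", we can compare each count with the count a perfectly even distribution would give. There are 11 possible values (0 through 10 inclusive), so for one million draws we would expect about 1,000,000 / 11 ≈ 90,909 of each, and the counts above are all within roughly 0.6% of that. The sketch below does the same comparison on a smaller sample (the sample size `n` is an arbitrary choice for illustration):

```python
import random

def freq_analysis(lst):
    return {k: lst.count(k) for k in set(lst)}

n = 110_000
random.seed(0)
freq = freq_analysis([random.randint(0, 10) for _ in range(n)])

expected = n / 11  # 11 possible values: 0 through 10 inclusive
for value in sorted(freq):
    # Relative deviation of the observed count from the ideal count
    deviation = (freq[value] - expected) / expected
    print(f"{value:>2}: {freq[value]:>6}  {deviation:+.2%}")
```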

Of course, it usually pays to know what's in the standard library :-)

The `collections` module has a `Counter` class that does precisely this!

In [17]:
from collections import Counter

In [21]:
random.seed(0)
Counter([random.randint(0, 10) for _ in range(1_000_000)])

Counter({0: 90935,
         1: 91184,
         2: 91002,
         3: 91042,
         4: 90766,
         5: 91072,
         6: 90678,
         7: 90985,
         8: 90409,
         9: 91383,
         10: 90544})
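
As a quick sanity check, here is a small sketch confirming that `Counter` produces the same counts as our hand-written `freq_analysis`. The sample size of 1,000 is an arbitrary choice to keep it fast. `Counter` also comes with extras such as `most_common`:

```python
import random
from collections import Counter

def freq_analysis(lst):
    return {k: lst.count(k) for k in set(lst)}

random.seed(0)
lst = [random.randint(0, 10) for _ in range(1_000)]

counts = Counter(lst)

# Counter is a dict subclass, so it converts directly to a plain dict
print(dict(counts) == freq_analysis(lst))  # True

# The three most frequent values, as (value, count) pairs
print(counts.most_common(3))
```

`Counter` makes a single pass over the data, whereas `freq_analysis` calls `lst.count` once per distinct value, so `Counter` is also the better choice for large lists with many distinct values.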