Read in the historical draw data on which to base our new numbers.

In [1]:
import pandas as pd

df = pd.read_csv("euromillions-hotpicks-draw-history.csv")

df.head()


Unnamed: 0,DrawDate,Ball 1,Ball 2,Ball 3,Ball 4,Ball 5,DrawNumber
0,09-May-2023,13,17,21,28,46,1631
1,05-May-2023,3,8,18,34,49,1630
2,02-May-2023,7,32,44,47,48,1629
3,28-Apr-2023,11,13,16,23,34,1628
4,25-Apr-2023,10,29,30,40,45,1627
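
The `Unnamed: 0` column in the preview suggests that the CSV already carries a saved index, and `DrawDate` is read as plain text. A minimal sketch of one way to handle both, using only the five preview rows as inline data (the `sample_csv` string and the `draws` name are illustrative and not part of the original notebook; the real file is assumed to share this layout):

```python
import io
import pandas as pd

# Inline copy of the five preview rows; the real file is assumed to look the same.
sample_csv = """,DrawDate,Ball 1,Ball 2,Ball 3,Ball 4,Ball 5,DrawNumber
0,09-May-2023,13,17,21,28,46,1631
1,05-May-2023,3,8,18,34,49,1630
2,02-May-2023,7,32,44,47,48,1629
3,28-Apr-2023,11,13,16,23,34,1628
4,25-Apr-2023,10,29,30,40,45,1627
"""

# index_col=0 reuses the saved index instead of creating an "Unnamed: 0" column.
draws = pd.read_csv(io.StringIO(sample_csv), index_col=0)

# Parse dates such as "09-May-2023" explicitly so they sort chronologically.
draws["DrawDate"] = pd.to_datetime(draws["DrawDate"], format="%d-%b-%Y")

print(draws.dtypes)
print(draws.sort_values("DrawDate").head())
```

With real dates in place, the history can be sliced by period before counting, for example to compare recent draws with older ones.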


In [2]:
probabilities = {i+1: 1/50 for i in range(50)}
print(probabilities)


{1: 0.02, 2: 0.02, 3: 0.02, 4: 0.02, 5: 0.02, 6: 0.02, 7: 0.02, 8: 0.02, 9: 0.02, 10: 0.02, 11: 0.02, 12: 0.02, 13: 0.02, 14: 0.02, 15: 0.02, 16: 0.02, 17: 0.02, 18: 0.02, 19: 0.02, 20: 0.02, 21: 0.02, 22: 0.02, 23: 0.02, 24: 0.02, 25: 0.02, 26: 0.02, 27: 0.02, 28: 0.02, 29: 0.02, 30: 0.02, 31: 0.02, 32: 0.02, 33: 0.02, 34: 0.02, 35: 0.02, 36: 0.02, 37: 0.02, 38: 0.02, 39: 0.02, 40: 0.02, 41: 0.02, 42: 0.02, 43: 0.02, 44: 0.02, 45: 0.02, 46: 0.02, 47: 0.02, 48: 0.02, 49: 0.02, 50: 0.02}


As the number of samples approaches infinity, we expect the observed frequencies to approach the uniform probabilities above. Our data set is small, however, so we work with the observed frequencies instead, which should move towards this distribution as more draws accumulate.

In [3]:
import numpy as np

# Total number of balls observed: five per draw.
n = df.shape[0] * 5
# Pool all five ball columns into a single array of drawn numbers.
occurrences = np.concatenate([df[f"Ball {i}"] for i in range(1, 6)])

# Count how often each number has been drawn.
unique, counts = np.unique(occurrences, return_counts=True)
print(unique, len(unique))
running_counts = dict(zip(unique, counts))
print(running_counts)


[ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
 49 50] 50
{1: 2, 2: 4, 3: 5, 4: 2, 5: 3, 6: 2, 7: 4, 8: 7, 9: 4, 10: 3, 11: 6, 12: 6, 13: 9, 14: 2, 15: 4, 16: 10, 17: 6, 18: 4, 19: 8, 20: 5, 21: 10, 22: 2, 23: 5, 24: 7, 25: 5, 26: 7, 27: 5, 28: 4, 29: 6, 30: 3, 31: 7, 32: 3, 33: 7, 34: 10, 35: 9, 36: 4, 37: 7, 38: 3, 39: 2, 40: 3, 41: 3, 42: 7, 43: 4, 44: 7, 45: 6, 46: 8, 47: 7, 48: 4, 49: 4, 50: 5}
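
To illustrate the convergence claim, the sketch below simulates fair draws (five distinct balls from 1 to 50) and reports the largest gap between any observed frequency and 1/50. The function name `max_deviation`, the seed, and the draw counts are illustrative choices, not part of the original analysis.

```python
import numpy as np

def max_deviation(num_draws, seed=0):
    """Largest |observed frequency - 1/50| over simulated fair draws."""
    rng = np.random.default_rng(seed)
    # Each draw picks 5 distinct balls out of 1..50 without replacement.
    balls = np.concatenate(
        [rng.choice(np.arange(1, 51), size=5, replace=False) for _ in range(num_draws)]
    )
    counts = np.bincount(balls, minlength=51)[1:]  # drop the unused 0 slot
    return np.abs(counts / counts.sum() - 1 / 50).max()

for num_draws in (52, 1_000, 20_000):
    print(num_draws, max_deviation(num_draws))
```

The gap should shrink roughly with the square root of the number of draws, which is why a history of a few dozen draws still looks far from uniform.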


Next we calculate the standard deviation of the observed probabilities across the 50 numbers. This treats every ball as an independent draw, as if a number could repeat within a draw, even though the five balls of a single draw are always distinct. The approximation becomes closer as the data set grows.

In [15]:
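import math

# Observed probability of each number: its count divided by the total balls drawn.
running_probabilities = {i + 1: running_counts[i + 1] / n for i in range(50)}
print(running_probabilities)
# Mean observed probability (1/50 whenever the counts sum to n).
ave = sum(running_probabilities[i + 1] for i in range(50)) / 50.0
# Root-mean-square deviation of the observed probabilities from the uniform 1/50.
sd = math.sqrt(sum((probabilities[i + 1] - running_probabilities[i + 1]) ** 2 for i in range(50)) / 50.0)
print(ave, sd)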

{1: 0.007692307692307693, 2: 0.015384615384615385, 3: 0.019230769230769232, 4: 0.007692307692307693, 5: 0.011538461538461539, 6: 0.007692307692307693, 7: 0.015384615384615385, 8: 0.026923076923076925, 9: 0.015384615384615385, 10: 0.011538461538461539, 11: 0.023076923076923078, 12: 0.023076923076923078, 13: 0.03461538461538462, 14: 0.007692307692307693, 15: 0.015384615384615385, 16: 0.038461538461538464, 17: 0.023076923076923078, 18: 0.015384615384615385, 19: 0.03076923076923077, 20: 0.019230769230769232, 21: 0.038461538461538464, 22: 0.007692307692307693, 23: 0.019230769230769232, 24: 0.026923076923076925, 25: 0.019230769230769232, 26: 0.026923076923076925, 27: 0.019230769230769232, 28: 0.015384615384615385, 29: 0.023076923076923078, 30: 0.011538461538461539, 31: 0.026923076923076925, 32: 0.011538461538461539, 33: 0.026923076923076925, 34: 0.038461538461538464, 35: 0.03461538461538462, 36: 0.015384615384615385, 37: 0.026923076923076925, 38: 0.011538461538461539, 39: 0.00769230769230769
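
As a rough yardstick for the spread computed above: under the same with-replacement approximation each ball's count is binomial, so the expected standard deviation of an observed frequency is sqrt(p(1 - p)/n). The sketch assumes n = 260 ball observations, which is inferred from the printed value 2/260 ≈ 0.00769 rather than stated in the notebook; `expected_sd` is an illustrative name.

```python
import math

p = 1 / 50   # uniform probability of any single ball
n = 260      # assumed: 52 draws x 5 balls, inferred from 2/260 = 0.00769...

# Standard deviation of an observed frequency under a binomial model.
expected_sd = math.sqrt(p * (1 - p) / n)
print(expected_sd)

# The same spread expressed in counts, around the expected count n * p.
print(n * p, n * expected_sd)
```

If the measured `sd` is of the same order as `expected_sd`, the differences between numbers are about what pure chance would produce at this sample size.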

Now we draw random numbers and compare each number's observed probability with the mean value of about 0.02. A number is accepted as a candidate if its observed probability is less than 0.7 standard deviations above the mean, which filters out the numbers that have come up most often so far. We repeat until we have four distinct candidates:

In [16]:
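import random

candidates = []
# Keep guessing until four distinct numbers pass the filter.
while len(candidates) < 4:
    new_guess = random.randint(1, 50)  # inclusive on both ends
    # Reject repeats and numbers drawn noticeably more often than average.
    if new_guess not in candidates and running_probabilities[new_guess] - ave < (0.7 * sd):
        candidates.append(new_guess)
print(candidates)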

[14, 48, 32, 2]
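
The rejection loop above can also be written as a filter followed by a single sample. A minimal sketch, using a small made-up frequency table so that it runs on its own; `pick_candidates`, its parameters, and `demo_counts` are hypothetical names, and the 0.7 threshold simply mirrors the cell above.

```python
import math
import random

def pick_candidates(probs, k=4, threshold=0.7, rng=random):
    """Sample k distinct numbers whose probability is below mean + threshold * sd."""
    mean = sum(probs.values()) / len(probs)
    sd = math.sqrt(sum((p - mean) ** 2 for p in probs.values()) / len(probs))
    eligible = [num for num, p in probs.items() if p - mean < threshold * sd]
    if len(eligible) < k:
        raise ValueError("not enough eligible numbers")
    return rng.sample(eligible, k)

# Made-up counts for illustration only: numbers 1-10, with 9 and 10 drawn more often.
demo_counts = {1: 4, 2: 5, 3: 5, 4: 6, 5: 5, 6: 4, 7: 5, 8: 6, 9: 12, 10: 13}
total = sum(demo_counts.values())
demo_probabilities = {num: c / total for num, c in demo_counts.items()}

print(pick_candidates(demo_probabilities))
```

Filtering first makes the eligible set explicit and fails loudly when fewer than `k` numbers qualify, a case in which the loop would never terminate.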


Now we repeat the calculation with counts taken from a longer draw history.

In [17]:
# Counts per number from the longer history. Only 1-12 are filled in;
# 13-50 are still zero placeholders.
running_counts = {
    1: 54, 2: 57, 3: 58, 4: 42, 5: 58, 6: 52, 7: 52, 8: 49, 9: 53, 10: 54,
    11: 51, 12: 60, 13: 0, 14: 0, 15: 0, 16: 0, 17: 0, 18: 0, 19: 0, 20: 0,
    21: 0, 22: 0, 23: 0, 24: 0, 25: 0, 26: 0, 27: 0, 28: 0, 29: 0, 30: 0,
    31: 0, 32: 0, 33: 0, 34: 0, 35: 0, 36: 0, 37: 0, 38: 0, 39: 0, 40: 0,
    41: 0, 42: 0, 43: 0, 44: 0, 45: 0, 46: 0, 47: 0, 48: 0, 49: 0, 50: 0
}
# Recompute the total from the new counts rather than reusing the earlier n.
n = sum(running_counts.values())
running_probabilities = {i + 1: running_counts[i + 1] / n for i in range(50)}
print(running_probabilities)
ave = sum(running_probabilities[i + 1] for i in range(50)) / 50.0
sd = math.sqrt(sum((probabilities[i + 1] - running_probabilities[i + 1]) ** 2 for i in range(50)) / 50.0)
print(ave, sd)

In [23]:
number_of_attempts = 1000 * 2 * 52
chance_of_winning = 1 / (50 * 49 * 48 * 47 * 0.5**4)
number_of_attempts * 1.5

156000.0
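
For reference, the chance that four chosen numbers all appear among the five main balls drawn from 50 can be computed combinatorially. The sketch below assumes that this is what counts as a win and that each attempt costs 1.5, as the multiplication above suggests. The payout is not given in the notebook, so only the break-even prize is derived. `p_match_all`, `exact_chance`, and `break_even_prize` are illustrative names.

```python
from math import comb

def p_match_all(picked=4, drawn=5, pool=50):
    """Probability that all `picked` numbers are among the `drawn` balls."""
    # Ways to choose the picks from the drawn balls / ways to choose them from the pool.
    return comb(drawn, picked) / comb(pool, picked)

exact_chance = p_match_all()
number_of_attempts = 1000 * 2 * 52   # same figure as the cell above
cost_per_attempt = 1.5               # assumed from `number_of_attempts * 1.5`

# Expected number of wins over all attempts, and the prize at which
# the expected return equals the cost of a single attempt.
expected_wins = number_of_attempts * exact_chance
break_even_prize = cost_per_attempt / exact_chance

print(exact_chance, expected_wins, break_even_prize)
```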