In this project, we contribute to the development of a mobile app by writing a few functions that mostly calculate probabilities. The app aims to both prevent and treat lottery addiction by helping people better estimate their chances of winning.

The app idea comes from a medical institute that specializes in treating gambling addictions. The institute already has a team of engineers who will build the app, but they need us to create its logical core and calculate the probabilities. For the first version of the app, they want us to focus on the 6/49 lottery and build functions that can answer the following questions for users:

- What is the probability of winning the big prize with a single ticket?
- What is the probability of winning the big prize if we play 40 different tickets (or any other number)?
- What is the probability of having at least five (or four, or three) winning numbers on a single ticket?

The scenario we're following throughout this project is fictional — the main purpose is to practice applying probability and combinatorics (permutations and combinations) concepts in a setting that simulates a real-world scenario.

In [1]:
def factorial(n):
    final_product = 1
    for i in range(n, 0, -1):
        final_product *= i
    return final_product

def combinations(n, k):
    numerator = factorial(n)
    denominator = factorial(k) * factorial(n-k)
    # Integer division keeps the count exact; "/" would return a float
    return numerator // denominator
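
As a quick sanity check on the two helpers above, the count of possible 6/49 tickets can be compared against the standard library. This is only a sketch: it assumes Python 3.8 or later, where `math.comb` is available, and the names `total_outcomes` and `from_formula` are made up for illustration.

```python
import math

# math.comb returns an exact integer, so no float rounding is involved.
total_outcomes = math.comb(49, 6)
print(f"{total_outcomes:,}")  # 13,983,816

# The same count from the factorial formula n! / (k! * (n - k)!).
from_formula = math.factorial(49) // (math.factorial(6) * math.factorial(43))
print(from_formula == total_outcomes)  # True
```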


# One-ticket Probability

We need to build a function that calculates the probability of winning the big prize for any given ticket. In each drawing, six numbers are drawn from a set of 49, and a player wins the big prize if the six numbers on their ticket match all six numbers drawn.
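
For reference, the arithmetic behind this probability can be written out as follows. It assumes that every six-number combination is equally likely and that the order of the numbers does not matter:

$$
P(\text{big prize}) = \frac{1}{\binom{49}{6}} = \frac{6!\,43!}{49!} = \frac{1}{13{,}983{,}816} \approx 7.15 \times 10^{-8}
$$

Expressed as a percentage, this is roughly 0.0000072%.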

The engineering team told us to keep the following details in mind when we write the function:

- Inside the app, the user inputs six different numbers from 1 to 49.
- Under the hood, the six numbers will come as a Python list and serve as the input to our function.
- The engineering team wants the function to print the probability value in a friendly way, one that people without any probability training can understand.

Below, we write the one_ticket_probability() function, which takes in a list of six unique numbers and prints the probability of winning in a way that's easy to understand.

In [2]:
def one_ticket_probability(user_numbers):
    n_combinations = combinations(49,6)
    probability_one_ticket = 1/n_combinations
    percentage_form = probability_one_ticket * 100
    print('''Your chances to win the big prize with the numbers {} are {:.7f}%.
In other words, you have a 1 in {:,} chances to win.'''.format(user_numbers,
                    percentage_form, int(n_combinations)))
    

In [3]:
test_input_1 = [5,25,48,6,8,26]
one_ticket_probability(test_input_1)

Your chances to win the big prize with the numbers [5, 25, 48, 6, 8, 26] are 0.0000072%.
In other words, you have a 1 in 13,983,816 chances to win.


In [4]:
test_input_2 = [77,100,6,99,7,1]
one_ticket_probability(test_input_2)

Your chances to win the big prize with the numbers [77, 100, 6, 99, 7, 1] are 0.0000072%.
In other words, you have a 1 in 13,983,816 chances to win.
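
The second test passes numbers outside 1 to 49, and the function still prints a probability, because it relies on the app to supply six different numbers in range. If a defensive check were wanted on our side, it might look like the sketch below. `validate_ticket` and its parameters are hypothetical names, not part of the engineering team's specification.

```python
def validate_ticket(user_numbers, pool_size=49, picks=6):
    """Hypothetical helper: return a list of problems found in a ticket (empty if valid)."""
    problems = []
    if len(user_numbers) != picks:
        problems.append(f"expected {picks} numbers, got {len(user_numbers)}")
    if len(set(user_numbers)) != len(user_numbers):
        problems.append("numbers must all be different")
    # Collect every value that falls outside the 1..pool_size range.
    out_of_range = [n for n in user_numbers if not 1 <= n <= pool_size]
    if out_of_range:
        problems.append(f"outside 1-{pool_size}: {out_of_range}")
    return problems

print(validate_ticket([5, 25, 48, 6, 8, 26]))   # []
print(validate_ticket([77, 100, 6, 99, 7, 1]))  # ['outside 1-49: [77, 100, 99]']
```

Returning a list of messages rather than raising an exception is just one possible design; it would let the app show all of the problems at once.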


# Historical Data Check for Canada Lottery

The institute also wants us to consider the data from the national 6/49 lottery game in Canada. The data set contains historical data for 3,665 drawings, dating from 1982 to 2018 (the data set is included in this repository).

In [11]:
import pandas as pd

lottery_canada = pd.read_csv('649.csv')
lottery_canada.shape


(3665, 11)

In [12]:
lottery_canada.head()

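| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 6/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 6/19/1982 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 6/26/1982 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |
| 3 | 649 | 4 | 0 | 7/3/1982 | 3 | 9 | 10 | 13 | 20 | 43 | 34 |
| 4 | 649 | 5 | 0 | 7/10/1982 | 5 | 14 | 21 | 31 | 34 | 47 | 45 |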
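
To make the column layout concrete, the five rows shown above can be tallied by hand with the standard library alone. This is only an illustrative sketch: the values are copied from the head() output, while `sample_draws`, `draw_dates`, `number_counts`, and `repeated` are made-up names.

```python
from collections import Counter
from datetime import datetime

# The five rows shown by head(): (draw date, six drawn numbers, bonus number).
sample_draws = [
    ("6/12/1982", [3, 11, 12, 14, 41, 43], 13),
    ("6/19/1982", [8, 33, 36, 37, 39, 41], 9),
    ("6/26/1982", [1, 6, 23, 24, 27, 39], 34),
    ("7/3/1982", [3, 9, 10, 13, 20, 43], 34),
    ("7/10/1982", [5, 14, 21, 31, 34, 47], 45),
]

# DRAW DATE is stored as month/day/year text.
draw_dates = [datetime.strptime(date, "%m/%d/%Y") for date, _, _ in sample_draws]

# Count how often each main number (bonus excluded) appears in the sample.
number_counts = Counter(n for _, numbers, _ in sample_draws for n in numbers)
repeated = sorted(n for n, count in number_counts.items() if count > 1)

print(repeated)                                # [3, 14, 39, 41, 43]
print((draw_dates[-1] - draw_dates[0]).days)   # 28
```

Five rows are far too few to say anything about the lottery itself; the point is only to show how the date and number columns are laid out.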


In [14]:
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

In [15]:
def visualize_lottery_data(df):
    """Creates visualizations for lottery draw data."""
    
    # Convert 'DRAW DATE' to datetime format
    df['DRAW DATE'] = pd.to_datetime(df['DRAW DATE'])
    
    # Select only the number columns
    number_columns = ['NUMBER DRAWN 1', 'NUMBER DRAWN 2', 'NUMBER DRAWN 3',
                      'NUMBER DRAWN 4', 'NUMBER DRAWN 5', 'NUMBER DRAWN 6', 'BONUS NUMBER']
    
    # Correlation heatmap
    plt.figure(figsize=(10, 6))
    sns.heatmap(df[number_columns].corr(), annot=True, cmap='coolwarm', fmt='.2f')
    plt.title("Correlation Heatmap of Drawn Numbers")
    plt.show()
    
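    # Distribution plot of all drawn numbers
    plt.figure(figsize=(12, 6))
    all_numbers = df[number_columns].values.flatten()
    # discrete=True centers one bar on each integer; bins=50 over the
    # range 1-49 would leave one bin empty and show a spurious gap
    sns.histplot(all_numbers, discrete=True, kde=True, color='blue')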
    plt.xlabel("Drawn Numbers")
    plt.ylabel("Frequency")
    plt.title("Distribution of Drawn Lottery Numbers")
    plt.show()
    
    # Pairplot for relationships between drawn numbers
    sns.pairplot(df[number_columns])
    plt.show()