# Canada Lottery Insights – A Mobile App for Lottery Addiction

## 1. Introduction

Many people start playing the lottery for fun, but for some the activity becomes a habit that eventually escalates into an addiction. Like other compulsive gamblers, lottery addicts may begin spending their savings, taking out loans, accumulating debt, and even resorting to desperate behaviors such as theft.

A medical institute focused on preventing and treating gambling addictions plans to build a dedicated mobile app to help lottery addicts better understand their chances of winning. The institute has a team of engineers to develop the app, but they need us to create its logical core and calculate the necessary probabilities.

For the first version of the app, they want us to focus on the [6/49 lottery](https://en.wikipedia.org/wiki/Lotto_6/49) and develop functions that allow users to answer questions such as:
- What is the probability of winning the big prize with a single ticket?
- What is the probability of winning the big prize if we play `40` different tickets (or any other number)?
- What is the probability of having at least `5` (or `4`, or `3`, or `2`) winning numbers on a single ticket?

The institute also wants us to incorporate historical data from the national 6/49 lottery game in Canada. The [dataset](https://www.kaggle.com/datasets/datascienceai/lottery-dataset) we'll use contains information on `3,665` drawings, dating from `1982` to `2018`.

## 2. Building the Core Functions

In the 6/49 lottery, `6` numbers are drawn from a set of `49`, ranging from `1` to `49`. The drawing is done without replacement, meaning once a number is drawn, it's not returned to the set.

First, we'll define two functions that we will use frequently:
- `factorial()` – a function that calculates factorials.
- `combinations()` – a function that calculates combinations.

In [1]:
# Function to calculate the factorial of a given number 'n'
def factorial(n):
    final_product = 1
    for i in range(n, 0, -1):
        final_product *= i
    return final_product

# Function to calculate the number of combinations (n choose k)
def combinations(n, k):
    numerator = factorial(n)
    denominator = factorial(k) * factorial(n - k)
    # The quotient is always a whole number, so integer division keeps the result an exact int
    return numerator // denominator
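As a quick sanity check on these two helpers, the same values can be obtained from Python's standard library. The sketch below assumes Python 3.8 or later (for `math.comb`); the name `n_outcomes` is only illustrative.

```python
import math

# Standard-library equivalents of the two helpers (math.comb requires Python 3.8+)
n_outcomes = math.comb(49, 6)
print(math.factorial(6))   # 720
print(n_outcomes)          # 13983816

# Same value from the factorial formula n! / (k! * (n - k)!)
assert n_outcomes == math.factorial(49) // (math.factorial(6) * math.factorial(43))
```

The value `13,983,816` is the number of possible 6/49 outcomes used throughout the rest of this project.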

## 3. Calculating the Probability for a Single Lottery Ticket

In the 6/49 lottery, a player wins the big prize if all six numbers on their ticket match the six numbers drawn. If even one number differs, the player doesn't win.

For the first version of the app, we want players to be able to calculate the probability of winning the big prize based on the numbers they play on a single ticket, where each ticket consists of `6` numbers chosen out of `49`. To achieve this, we'll start by building a function that calculates the probability of winning the big prize for any given ticket.

We discussed the requirements with the engineering team from the medical institute, and they informed us of a few details to keep in mind when writing the function:
- In the app, the user will input `6` different numbers from `1` to `49`.
- Behind the scenes, these six numbers will be passed as a Python list, which will be the sole input to our function.
- The engineering team wants the function to display the probability in a user-friendly format, making it easy for people without a background in probability to understand.

In [2]:
# Function to calculate the probability of winning the big prize with a single lottery ticket
def single_ticket_prob(user_numbers):
    
    # Calculate the total number of possible combinations in 6/49 lottery
    n_combinations = combinations(49, 6)
    
    # Calculate the probability of winning with a single ticket, and convert it into percentage form
    single_ticket_p = 1 / n_combinations
    percentage_form = single_ticket_p * 100
    
    # Print the result in a user-friendly format
    print('''Your chances of winning the big prize with the numbers {} are {:.7f}%.
In other words, you have a 1 in {:,} chance of winning.'''.format(user_numbers, percentage_form,
                                                               int(n_combinations)))

Next, we're going to test the `single_ticket_prob` function using two different inputs.

In [3]:
# Calculate the probability of winning the lottery with the defined test input using one ticket
test_input_1 = [2, 43, 22, 23, 11, 5]
single_ticket_prob(test_input_1)

Your chances of winning the big prize with the numbers [2, 43, 22, 23, 11, 5] are 0.0000072%.
In other words, you have a 1 in 13,983,816 chance of winning.


In [4]:
# Calculate the probability of winning the lottery with the defined test input using one ticket
test_input_2 = [9, 26, 41, 7, 15, 6]
single_ticket_prob(test_input_2)

Your chances of winning the big prize with the numbers [9, 26, 41, 7, 15, 6] are 0.0000072%.
In other words, you have a 1 in 13,983,816 chance of winning.
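Both tickets get the same probability because the result depends only on the number of possible outcomes, not on which numbers were picked. One way to cross-check the figure is to multiply the chances of each successive draw hitting one of the ticket's remaining numbers. This is a small sketch using exact fractions, and the variable name `prob` is illustrative.

```python
from fractions import Fraction

# Chance that each successive drawn number is one of the ticket's remaining numbers:
# 6/49 for the first draw, 5/48 for the second, and so on (drawing without replacement)
prob = Fraction(1)
for i in range(6):
    prob *= Fraction(6 - i, 49 - i)

print(prob)                           # 1/13983816
print(f"{float(prob) * 100:.7f}%")    # 0.0000072%
```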


## 4. Using Historical Data from the Canada Lottery

For the first version of the app, users should also be able to compare their tickets against historical lottery data to determine whether they would have ever won by now. We'll focus on exploring the historical data from the Canada 6/49 lottery using a clean dataset that can be downloaded from [Kaggle](https://www.kaggle.com/datasets/datascienceai/lottery-dataset).

In [5]:
# Import the relevant libraries
import pandas as pd

# Load the Canada lottery dataset and print its dimensions
lottery_canada = pd.read_csv('Datasets/649.csv')
print(lottery_canada.shape)

# Display the first and last 3 rows of the dataset
display(lottery_canada.head(3))
display(lottery_canada.tail(3))

(3665, 11)


Unnamed: 0,PRODUCT,DRAW NUMBER,SEQUENCE NUMBER,DRAW DATE,NUMBER DRAWN 1,NUMBER DRAWN 2,NUMBER DRAWN 3,NUMBER DRAWN 4,NUMBER DRAWN 5,NUMBER DRAWN 6,BONUS NUMBER
0,649,1,0,6/12/1982,3,11,12,14,41,43,13
1,649,2,0,6/19/1982,8,33,36,37,39,41,9
2,649,3,0,6/26/1982,1,6,23,24,27,39,34


Unnamed: 0,PRODUCT,DRAW NUMBER,SEQUENCE NUMBER,DRAW DATE,NUMBER DRAWN 1,NUMBER DRAWN 2,NUMBER DRAWN 3,NUMBER DRAWN 4,NUMBER DRAWN 5,NUMBER DRAWN 6,BONUS NUMBER
3662,649,3589,0,6/13/2018,6,22,24,31,32,34,16
3663,649,3590,0,6/16/2018,2,15,21,31,38,49,8
3664,649,3591,0,6/20/2018,14,24,31,35,37,48,17


The dataset contains `3,665` rows and `11` columns. Each row represents one lottery drawing and includes the draw date, the six main numbers drawn, and a bonus number. The earliest draw is from `June 12, 1982`, and the most recent is from `June 20, 2018`. With one row per drawing, we can check any user ticket against every past result in the Canada 6/49 lottery.

## 5. Comparing User Tickets to Historical Data

Next, we're going to write a function that will enable users to compare their tickets against the historical lottery data in Canada and determine whether they would have ever won by now. The engineering team informed us that we need to be aware of the following details:

- Inside the app, the user inputs `6` different numbers from `1` to `49`.
- Under the hood, the six numbers will come as a Python list and serve as input to our function.
- The engineering team wants us to write a function that prints the number of times the selected combination occurred in the Canada lottery dataset, as well as the probability of winning the big prize in the next drawing with that combination.

In [6]:
# Define a function to extract winning numbers from a row of the dataset
def winning_numbers(row):
    
    # Extract the six main winning numbers (column positions 4 through 9, excluding the bonus number)
    row = row.iloc[4:10]
    # Convert to a set so that combinations can be compared regardless of order
    row_set = set(row.values)
    return row_set

# Apply the function to each row of the dataset to get winning numbers
winning_nums = lottery_canada.apply(winning_numbers, axis=1)
winning_nums.head()

0    {3, 41, 11, 12, 43, 14}
1    {33, 36, 37, 39, 8, 41}
2     {1, 6, 39, 23, 24, 27}
3     {3, 9, 10, 43, 13, 20}
4    {34, 5, 14, 47, 21, 31}
dtype: object
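Sets are useful here because two sets are equal whenever they hold the same numbers, whatever the order. The pure-Python sketch below illustrates that idea on the first three draws shown in the dataset preview. The names `sample_draws`, `draw_counts`, and `count_occurrences` are hypothetical and not part of the project code.

```python
from collections import Counter

# First three draws as displayed in the dataset preview (main numbers only)
sample_draws = [
    [3, 11, 12, 14, 41, 43],
    [8, 33, 36, 37, 39, 41],
    [1, 6, 23, 24, 27, 39],
]

# frozenset is hashable, so each combination can serve as a Counter key
draw_counts = Counter(frozenset(draw) for draw in sample_draws)

def count_occurrences(user_numbers, counts):
    """Return how many times the combination appears, ignoring order."""
    return counts[frozenset(user_numbers)]

print(count_occurrences([33, 36, 37, 39, 8, 41], draw_counts))  # 1
print(count_occurrences([1, 2, 3, 4, 5, 6], draw_counts))       # 0
```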

Now, we'll write a function that takes the user numbers and the historical numbers as input, then prints information regarding the number of occurrences and the probability of winning in the next drawing.

In [7]:
# Define a function to check the historical occurrence of user-provided lottery numbers
def check_occurrence(user_numbers, historical_numbers):

    # Convert user numbers to a set and count how often they occur in the historical data
    user_numbers_set = set(user_numbers)
    n_occurrences = (historical_numbers == user_numbers_set).sum()
    
    # Check if the combination has never occurred
    if n_occurrences == 0:
        print('''The combination {} has never occurred.
This doesn't mean it's more likely to occur in the next drawing.
Your chances of winning the big prize in the next drawing using the same combination are 0.0000072%.
In other words, you have a 1 in 13,983,816 chance of winning.'''.format(user_numbers_set))
    
    # If the combination has occurred, display the number of occurrences 
    # and the probability of winning in the next drawing
    else:
        print('''The number of times the combination {} has occurred in the past is {}.
Your chances of winning the big prize in the next drawing using the same combination are 0.0000072%.
In other words, you have a 1 in 13,983,816 chance of winning.'''.format(user_numbers_set, n_occurrences))

Let's now test the `check_occurrence` function using two different inputs.

In [8]:
# Check the historical occurrence of the defined test input combination in winning numbers
test_input_3 = [33, 36, 37, 39, 8, 41]
check_occurrence(test_input_3, winning_nums)

The number of times the combination {33, 36, 37, 39, 8, 41} has occurred in the past is 1.
Your chances of winning the big prize in the next drawing using the same combination are 0.0000072%.
In other words, you have a 1 in 13,983,816 chance of winning.


In [9]:
# Check the historical occurrence of a second test input
# (note: this list repeats 44, so it contains only five distinct numbers)
test_input_4 = [3, 2, 44, 22, 1, 44]
check_occurrence(test_input_4, winning_nums)

The combination {1, 2, 3, 44, 22} has never occurred.
This doesn't mean it's more likely to occur in the next drawing.
Your chances of winning the big prize in the next drawing using the same combination are 0.0000072%.
In other words, you have a 1 in 13,983,816 chance of winning.
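The second test input lists `44` twice, so the set built from it has only five numbers and could never equal a six-number draw. The requirements state that the app will pass `6` different numbers from `1` to `49`, but a defensive check could catch such inputs before the comparison. The helper below is a hypothetical sketch, not part of the engineering team's specification.

```python
def validate_ticket(user_numbers):
    """Hypothetical helper: return a list of problems found in a ticket (empty if valid)."""
    problems = []
    if len(user_numbers) != 6:
        problems.append('expected 6 numbers, got {}'.format(len(user_numbers)))
    if len(set(user_numbers)) != len(user_numbers):
        problems.append('numbers must all be different')
    if any(not isinstance(n, int) or n < 1 or n > 49 for n in user_numbers):
        problems.append('numbers must be integers from 1 to 49')
    return problems

print(validate_ticket([3, 2, 44, 22, 1, 44]))    # ['numbers must all be different']
print(validate_ticket([33, 36, 37, 39, 8, 41]))  # []
```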


## 6. Calculating the Probability for Multiple Lottery Tickets

Lottery addicts often play multiple tickets in a single drawing, believing this will significantly increase their chances of winning. Since our goal is to help them better understand their actual odds, we'll write a function that allows users to calculate their chances of winning based on the number of different tickets they play.

We consulted with the engineering team, and they provided the following details:
- Users will input the number of tickets they plan to play, without specifying the exact combinations.
- The input will be an integer between `1` and `13,983,816`. The latter represents the maximum number of different possible tickets.
- The function should print the probability of winning the big prize based on the number of tickets played.

In [10]:
# Define a function to calculate the probability of winning with one or more tickets
def multi_ticket_prob(n_tickets):
    
    # Calculate the total number of possible combinations in 6/49 lottery
    n_combinations = combinations(49, 6)
    
    # Calculate the probability based on the number of tickets played
    multi_ticket_p = n_tickets / n_combinations
    percentage_form = multi_ticket_p * 100
    
    # Display probability for one ticket
    if n_tickets == 1:
        print('''Your chances of winning the big prize with one ticket are {:.6f}%.
In other words, you have a 1 in {:,} chance of winning.'''.format(percentage_form, 
                                                                  int(n_combinations)))
    # Display probability for multiple tickets
    else:
        adj_n_combinations = round(n_combinations / n_tickets)
        print('''Your chances of winning the big prize with {:,} different tickets are {:.6f}%.
In other words, you have a 1 in {:,} chance of winning.'''.format(n_tickets, percentage_form,
                                                                  adj_n_combinations))

Now, let's run several tests for the `multi_ticket_prob` function.

In [11]:
# Store different ticket counts to test
test_inputs = [1, 10, 100, 10000, 1000000, 6991908, 13983816]

# Loop through each test input and calculate the probability
for t_input in test_inputs:
    multi_ticket_prob(t_input)
    print('------------------------')

Your chances of winning the big prize with one ticket are 0.000007%.
In other words, you have a 1 in 13,983,816 chance of winning.
------------------------
Your chances of winning the big prize with 10 different tickets are 0.000072%.
In other words, you have a 1 in 1,398,382 chance of winning.
------------------------
Your chances of winning the big prize with 100 different tickets are 0.000715%.
In other words, you have a 1 in 139,838 chance of winning.
------------------------
Your chances of winning the big prize with 10,000 different tickets are 0.071511%.
In other words, you have a 1 in 1,398 chance of winning.
------------------------
Your chances of winning the big prize with 1,000,000 different tickets are 7.151124%.
In other words, you have a 1 in 14 chance of winning.
------------------------
Your chances of winning the big prize with 6,991,908 different tickets are 50.000000%.
In other words, you have a 1 in 2 chance of winning.
------------------------
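Your chances of winning the big prize with 13,983,816 different tickets are 100.000000%.
In other words, you have a 1 in 1 chance of winning.
------------------------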
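Because every ticket is different, the probability is simply the number of tickets divided by the number of possible outcomes. The sketch below returns that ratio as an exact fraction and enforces the input range the engineering team described. The names `N_OUTCOMES` and `multi_ticket_fraction` are hypothetical, and `math.comb` assumes Python 3.8 or later.

```python
from fractions import Fraction
from math import comb  # Python 3.8+

N_OUTCOMES = comb(49, 6)  # 13,983,816 possible tickets

def multi_ticket_fraction(n_tickets):
    """Hypothetical helper: exact probability of winning with n different tickets."""
    if not 1 <= n_tickets <= N_OUTCOMES:
        raise ValueError('n_tickets must be between 1 and {:,}'.format(N_OUTCOMES))
    return Fraction(n_tickets, N_OUTCOMES)

print(multi_ticket_fraction(6991908))    # 1/2
print(multi_ticket_fraction(13983816))   # 1
```

Returning a value instead of printing it makes the calculation easier to test and lets the app decide how to format the message.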

## 7. Calculating the Probability of Matching Winning Numbers (2 to 5)

Since users might be interested in knowing the probability of matching exactly `2`, `3`, `4`, or `5` winning numbers, we'll write a function to calculate these probabilities. In most 6/49 lotteries, there are smaller prizes for matching `2`, `3`, `4`, or `5` numbers.

These are the engineering details to keep in mind:
- Users will input `6` different numbers from `1` to `49`, along with an integer between `2` and `5` representing the number of winning numbers they expect to match.
- The function will print the probability of matching exactly that many winning numbers. Because this probability is the same for every ticket, the function only needs the integer as input.

For the sake of example, let's say a player chooses the following six numbers on a ticket: `(1, 2, 3, 4, 5, 6)`. Out of these six numbers, we can form `6` five-number combinations: `(1, 2, 3, 4, 5)`, `(1, 2, 3, 4, 6)`, `(1, 2, 3, 5, 6)`, `(1, 2, 4, 5, 6)`, `(1, 3, 4, 5, 6)`, `(2, 3, 4, 5, 6)`. For each of these combinations, there are `44` possible successful outcomes in a lottery drawing. For instance, there are `44` lottery outcomes that would return a prize for the combination `(1, 2, 3, 4, 5)`.

However, we need to leave out the outcome `(1, 2, 3, 4, 5, 6)` because we're only interested in outcomes that match exactly five numbers (not at least five numbers). This means that for each of our `6` five-number combinations, we have only `43` possible successful outcomes, not `44`.

Since there are `6` five-number combinations and each combination corresponds to `43` successful outcomes, we need to multiply `6` by `43` to find the total number of successful outcomes: `6 × 43 = 258`. Furthermore, since there are `13,983,816` total possible outcomes, the probability of having exactly `5` winning numbers for a single lottery ticket is: `258 / 13,983,816 = 0.00001845`.
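The count of `258` can also be confirmed by listing the outcomes directly. This is a brute-force sketch for the example ticket only; `subsets`, `non_ticket_numbers`, and `exactly_five` are illustrative names.

```python
from itertools import combinations as subsets  # renamed to avoid clashing with our own combinations()

ticket = (1, 2, 3, 4, 5, 6)
non_ticket_numbers = [n for n in range(1, 50) if n not in ticket]  # the other 43 numbers

# Every draw with exactly five matches = five ticket numbers + one non-ticket number
exactly_five = {
    frozenset(five) | {other}
    for five in subsets(ticket, 5)
    for other in non_ticket_numbers
}

print(len(exactly_five))  # 258
```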

In [12]:
# Define a function to calculate the probability of matching winning numbers (2 to 5)
def winning_prob_under_6(n_winning_numbers):
    
    # Calculate the number of ways to choose the winning numbers from the ticket,
    # and the remaining numbers from the non-winning pool
    n_combinations_ticket = combinations(6, n_winning_numbers)
    n_combinations_remaining = combinations(43, 6 - n_winning_numbers)
    
    # Calculate the total number of successful outcomes and possible combinations in the lottery
    n_successful_outcomes = n_combinations_ticket * n_combinations_remaining
    n_combinations_total = combinations(49, 6)
    
    # Calculate the probability of having the specified number of winning numbers
    prob = n_successful_outcomes / n_combinations_total
    prob_percentage = prob * 100
    
    # Adjust the total combinations and print the results in a user-friendly format
    adj_n_combinations = round(n_combinations_total / n_successful_outcomes)
    print('''Your chances of having {} winning numbers with this ticket are {:.4f}%.
In other words, you have a 1 in {:,} chance of winning.'''.format(n_winning_numbers, prob_percentage,
                                                                  int(adj_n_combinations)))

Now, let's run several tests for the `winning_prob_under_6` function.

In [13]:
# Iterate through the counts of winning numbers and calculate probabilities
for t_input in [2, 3, 4, 5]:
    winning_prob_under_6(t_input)
    print('------------------------')

Your chances of having 2 winning numbers with this ticket are 13.2378%.
In other words, you have a 1 in 8 chance of winning.
------------------------
Your chances of having 3 winning numbers with this ticket are 1.7650%.
In other words, you have a 1 in 57 chance of winning.
------------------------
Your chances of having 4 winning numbers with this ticket are 0.0969%.
In other words, you have a 1 in 1,032 chance of winning.
------------------------
Your chances of having 5 winning numbers with this ticket are 0.0018%.
In other words, you have a 1 in 54,201 chance of winning.
------------------------
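The introduction asked about having at least `5` (or `4`, or `3`, or `2`) winning numbers, whereas the function above covers exact matches. One possible way to get the "at least" figures is to add up the exact-match probabilities from the chosen count up to `6`. The helpers `prob_exactly` and `prob_at_least` below are hypothetical, use exact fractions, and assume Python 3.8 or later for `math.comb`.

```python
from fractions import Fraction
from math import comb  # Python 3.8+

def prob_exactly(k):
    """Probability that a ticket matches exactly k of the 6 drawn numbers."""
    return Fraction(comb(6, k) * comb(43, 6 - k), comb(49, 6))

def prob_at_least(k):
    """Hypothetical helper: probability of matching k or more numbers."""
    return sum(prob_exactly(j) for j in range(k, 7))

for k in [2, 3, 4, 5]:
    print('At least {}: {:.4f}%'.format(k, float(prob_at_least(k)) * 100))

# The probabilities for 0 through 6 matches cover every outcome, so they sum to 1
assert prob_at_least(0) == 1
```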


## 8. Conclusion

In this project, we built the logical core of a mobile app that a medical institute focused on preventing and treating gambling addictions plans to use to help lottery addicts understand their chances of winning. The institute's engineering team will develop the app itself; our role was to write the core functions and calculate the probabilities.

For the first version, they asked us to focus on the [6/49 lottery](https://en.wikipedia.org/wiki/Lotto_6/49) and develop functions to answer key questions. We also incorporated historical data from Canada's national 6/49 lottery, which included `3,665` drawings from `1982` to `2018`. Here are the functions we defined:
- `factorial()` – calculates the factorial of a number `n`.
- `combinations()` – calculates combinations (`n` choose `k`).
- `single_ticket_prob()` – calculates the probability of winning the big prize with a single ticket.
- `winning_numbers()` – extracts winning numbers from a dataset row.
- `check_occurrence()` – checks how often user-provided numbers have won.
- `multi_ticket_prob()` – calculates the probability of winning with one or more tickets.
- `winning_prob_under_6()` – calculates the probability of matching exactly `2`, `3`, `4`, or `5` numbers.

We tested the `single_ticket_prob` and `check_occurrence` functions with two different inputs each, and ran several tests of the `multi_ticket_prob` and `winning_prob_under_6` functions.