# Cracking the Odds: Understanding Your Chances in a 6/49 Lottery

<img src='background.jpg' style='width:100%'/>

## Introduction

This project examines the probabilities of winning in a 6/49 lottery, focusing on scenarios such as matching all six numbers, partial matches, and the impact of purchasing multiple tickets. The goal is to provide clear insights into the odds of success, helping users understand the improbability of jackpot wins and the likelihood of smaller prizes.

To achieve this, I used mathematical combinatorics to calculate probabilities, analyzed historical occurrences of specific number combinations, and assessed how multiple ticket purchases affect winning chances. Additionally, I explored the odds of matching 2, 3, 4, or 5 numbers to highlight how these outcomes compare to hitting the jackpot.

The analysis revealed that while the odds of winning the jackpot are extremely low (approximately **0.0000072%**), smaller matches are much more achievable, with probabilities of **13.24%** for matching 2 numbers and **1.77%** for matching 3 numbers. These findings offer a practical perspective on lottery probabilities, illustrating the steep odds of major wins while emphasizing the importance of setting realistic expectations when playing.

## Table of Contents
1. [Essential Functions for Probability](#Essential-Functions-for-Probability)  
2. [Calculating the Probability of Winning the Big Prize with a Single Ticket](#Calculating-the-Probability-of-Winning-the-Big-Prize-with-a-Single-Ticket)  
3. [Loading and Exploring the Canada Lottery Data Set](#Loading-and-Exploring-the-Canada-Lottery-Data-Set)  
4. [Checking Historical Occurrences of a Lottery Combination](#Checking-Historical-Occurrences-of-a-Lottery-Combination)  
5. [Calculating Multi-Ticket Probability of Winning the Lottery](#Calculating-Multi-Ticket-Probability-of-Winning-the-Lottery)  
6. [Function to Calculate the Probability of Matching Less Winning Numbers](#Function-to-Calculate-the-Probability-of-Matching-Less-Winning-Numbers)  
7. [Conclusion](#Conclusion)  

## Essential Functions for Probability

In [6]:
from math import prod, comb

# Calculates the factorial of a given number n
def factorial(n):
    return prod(range(1, n + 1))

# Calculates the number of ways to choose k items from n items
def combinations(n, k):
    return comb(n, k)

## Calculating Factorials and Combinations

To answer questions about lottery probabilities, we need to repeatedly calculate factorials and combinations. These mathematical concepts are essential for understanding probabilities, especially in games like the 6/49 lottery.

---

### Factorials
A factorial, denoted as `n!`, is the product of all positive integers from `1` to `n`. Factorials are crucial for calculating combinations, which are key to determining lottery probabilities.

---

### Combinations
Combinations calculate the number of ways to select `k` items from a total of `n` items without considering the order. This is critical for the 6/49 lottery, where we need to determine the total number of possible combinations of winning numbers.

---

By using factorials and combinations, we can accurately compute the probabilities associated with lottery games like the 6/49, providing a clearer understanding of the odds involved.
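
As a quick illustration of how the two concepts connect, the sketch below computes a combination count from factorials using the standard identity C(n, k) = n! / (k! (n - k)!) and compares it with `math.comb`. The helper name `combinations_from_factorials` is hypothetical and not part of this notebook, and the sketch uses `math.factorial` rather than the `factorial` helper defined above.

```python
from math import comb, factorial

def combinations_from_factorials(n, k):
    """Number of ways to choose k items from n, via n! / (k! * (n - k)!)."""
    if not 0 <= k <= n:
        return 0  # choosing more items than exist (or a negative count) is impossible
    # Integer division is exact here because the quotient is always a whole number
    return factorial(n) // (factorial(k) * factorial(n - k))

total = combinations_from_factorials(49, 6)
print(f"{total:,}")           # 13,983,816
print(total == comb(49, 6))   # True: both approaches agree
```

In practice `math.comb` is the simpler choice; the factorial version is shown only to make the formula concrete.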

## Calculating the Probability of Winning the Big Prize with a Single Ticket

In [9]:
def one_ticket_probability(ticket):
    # Check if the input is valid (exactly six unique numbers, each from 1 to 49)
    if len(ticket) != 6 or len(set(ticket)) != 6 or not all(1 <= num <= 49 for num in ticket):
        print("Invalid ticket. Please provide a list of six unique numbers between 1 and 49.")
        return
    
    # Total number of possible combinations in a 6/49 lottery
    total_combinations = combinations(49, 6)
    
    # The probability of winning the big prize
    probability = 1 / total_combinations
    probability_percentage = probability * 100
    
    # Print the result in a friendly way
    print(f"The probability of winning the big prize with your ticket is 1 in {total_combinations:,}.")
    print(f"That's approximately {probability_percentage:.7f}%.")

# Example 1:
one_ticket_probability([1, 2, 3, 4, 5, 6])

The probability of winning the big prize with your ticket is 1 in 13,983,816.
That's approximately 0.0000072%.


In [10]:
# Example 2:
one_ticket_probability([1, 1, 2, 3, 4, 5])

Invalid ticket. Please provide a list of six unique numbers between 1 and 49.


In [11]:
# Example 3:
one_ticket_probability([1, 2, 3, 4, 5])

Invalid ticket. Please provide a list of six unique numbers between 1 and 49.


## Analysis of The Probability Calculation Function

In a 6/49 lottery, players select six unique numbers from a range of 1 to 49. To win the jackpot, all six numbers on the player's ticket must match the six numbers drawn. The odds of this happening are extremely low, and the `one_ticket_probability` function calculates and presents this probability in a clear, user-friendly way.

---

### How the Function Works

#### **1. Input Validation**
- The ticket must contain exactly six unique numbers.  
- Each number must be between 1 and 49.  
- If the input fails to meet these criteria, the function displays an error message and stops execution.  

#### **2. Probability Calculation**
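- The function counts the possible combinations in the 6/49 lottery with \( C(49, 6) \), the number of ways to choose six numbers from a pool of 49, which is 13,983,816. Because a single ticket covers exactly one of these equally likely combinations, the jackpot probability is 1 divided by that total.  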

#### **3. User-Friendly Output**
- The probability is displayed in the format "1 in X," where \( X \) represents the total number of possible combinations.  
- The probability is also expressed as a percentage for easier understanding.  

---

This function provides a straightforward way to comprehend the extremely low odds of winning the 6/49 lottery jackpot.
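
For readers who want to reuse the result in further calculations, here is a minimal sketch of a variant that returns the probability instead of printing it and raises an exception on invalid input. The name `jackpot_probability` and the parameters `pool_size` and `picks` are assumptions made for illustration; they do not appear elsewhere in this notebook.

```python
from math import comb

def jackpot_probability(ticket, pool_size=49, picks=6):
    """Return the chance that `ticket` matches every drawn number (hypothetical helper)."""
    numbers = list(ticket)
    # The ticket needs exactly `picks` numbers, with no repeats
    if len(numbers) != picks or len(set(numbers)) != picks:
        raise ValueError(f"A ticket needs exactly {picks} unique numbers.")
    # Every number has to fall inside the pool
    if not all(1 <= n <= pool_size for n in numbers):
        raise ValueError(f"Numbers must be between 1 and {pool_size}.")
    # One favourable combination out of all equally likely combinations
    return 1 / comb(pool_size, picks)

p = jackpot_probability([1, 2, 3, 4, 5, 6])
print(f"1 in {round(1 / p):,}")   # 1 in 13,983,816
print(f"{p * 100:.7f}%")          # 0.0000072%
```

Returning a value keeps the calculation separate from the presentation, so the same number can feed the later multi-ticket and partial-match calculations.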

## Loading and Exploring the Canada Lottery Data Set

In [14]:
import pandas as pd
import janitor

# Load the dataset
raw_data = pd.read_csv('649.csv').clean_names()

# Display the first 5 rows of the raw dataset
raw_data.head()

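|  | product | draw_number | sequence_number | draw_date | number_drawn_1 | number_drawn_2 | number_drawn_3 | number_drawn_4 | number_drawn_5 | number_drawn_6 | bonus_number |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 6/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 6/19/1982 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 6/26/1982 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |
| 3 | 649 | 4 | 0 | 7/3/1982 | 3 | 9 | 10 | 13 | 20 | 43 | 34 |
| 4 | 649 | 5 | 0 | 7/10/1982 | 5 | 14 | 21 | 31 | 34 | 47 | 45 |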


In [15]:
# Display the last 5 rows of the raw dataset
raw_data.tail()

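|  | product | draw_number | sequence_number | draw_date | number_drawn_1 | number_drawn_2 | number_drawn_3 | number_drawn_4 | number_drawn_5 | number_drawn_6 | bonus_number |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3660 | 649 | 3587 | 0 | 6/6/2018 | 10 | 15 | 23 | 38 | 40 | 41 | 35 |
| 3661 | 649 | 3588 | 0 | 6/9/2018 | 19 | 25 | 31 | 36 | 46 | 47 | 26 |
| 3662 | 649 | 3589 | 0 | 6/13/2018 | 6 | 22 | 24 | 31 | 32 | 34 | 16 |
| 3663 | 649 | 3590 | 0 | 6/16/2018 | 2 | 15 | 21 | 31 | 38 | 49 | 8 |
| 3664 | 649 | 3591 | 0 | 6/20/2018 | 14 | 24 | 31 | 35 | 37 | 48 | 17 |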


In [16]:
# Display the number of rows and columns
print('Number of rows and columns:', raw_data.shape, '\n')

# Display the number of missing values in each column
print('Number of missing values:', '\n',raw_data.isnull().sum(), sep='') 

Number of rows and columns: (3665, 11) 

Number of missing values:
product            0
draw_number        0
sequence_number    0
draw_date          0
number_drawn_1     0
number_drawn_2     0
number_drawn_3     0
number_drawn_4     0
number_drawn_5     0
number_drawn_6     0
bonus_number       0
dtype: int64


In [17]:
raw_data.describe()

|  | product | draw_number | sequence_number | number_drawn_1 | number_drawn_2 | number_drawn_3 | number_drawn_4 | number_drawn_5 | number_drawn_6 | bonus_number |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 3665.0 | 3665.0 | 3665.0 | 3665.0 | 3665.0 | 3665.0 | 3665.0 | 3665.0 | 3665.0 | 3665.0 |
| mean | 649.0 | 1819.494952 | 0.030832 | 7.327694 | 14.568076 | 21.890859 | 28.978445 | 36.162619 | 43.099045 | 24.599454 |
| std | 0.0 | 1039.239544 | 0.237984 | 5.811669 | 7.556939 | 8.170073 | 8.069724 | 7.19096 | 5.506424 | 14.360038 |
| min | 649.0 | 1.0 | 0.0 | 1.0 | 2.0 | 3.0 | 4.0 | 11.0 | 13.0 | 0.0 |
| 25% | 649.0 | 917.0 | 0.0 | 3.0 | 9.0 | 16.0 | 23.0 | 31.0 | 40.0 | 12.0 |
| 50% | 649.0 | 1833.0 | 0.0 | 6.0 | 14.0 | 22.0 | 30.0 | 37.0 | 45.0 | 25.0 |
| 75% | 649.0 | 2749.0 | 0.0 | 10.0 | 20.0 | 28.0 | 35.0 | 42.0 | 47.0 | 37.0 |
| max | 649.0 | 3591.0 | 3.0 | 38.0 | 43.0 | 45.0 | 47.0 | 48.0 | 49.0 | 49.0 |


## Checking Historical Occurrences of a Lottery Combination

In [19]:
# Function to extract winning numbers as a set from a row
def extract_numbers(row):
    """
    Extracts six winning numbers from a row of the DataFrame and returns them as a set.
    """
    return {
        row['number_drawn_1'], 
        row['number_drawn_2'], 
        row['number_drawn_3'], 
        row['number_drawn_4'], 
        row['number_drawn_5'], 
        row['number_drawn_6']
    }

# Create a copy of the raw_data DataFrame to avoid modifying the original directly
raw_data = raw_data.copy()

# Apply the function to each row of the DataFrame to create a new column 'winning_numbers'
# The 'winning_numbers' column will store sets containing the six winning numbers for each row
raw_data['winning_numbers'] = raw_data.apply(extract_numbers, axis=1)

# Display the first five rows of the new 'winning_numbers' column
raw_data['winning_numbers'].head()

0    {3, 41, 11, 12, 43, 14}
1    {33, 36, 37, 39, 8, 41}
2     {1, 6, 39, 23, 24, 27}
3     {3, 9, 10, 43, 13, 20}
4    {34, 5, 14, 47, 21, 31}
Name: winning_numbers, dtype: object

In [20]:
def check_historical_occurence(user_numbers, winning_numbers=raw_data['winning_numbers']):   
    """
    Compares users' tickets against the historical lottery data in Canada
    and determines whether they would have ever won by now.

    """
    # Check if the input is valid (exactly six unique numbers, each from 1 to 49)
    if len(user_numbers) != 6 or len(set(user_numbers)) != 6 or not all(1 <= num <= 49 for num in user_numbers):
        print("Invalid ticket. Please provide a list of six unique numbers between 1 and 49.")
        return
        
    # Convert the user's list of numbers into a set for easier comparison
    set_user_numbers = set(user_numbers)
    
    # Initialize a counter for how many times the user's combination has matched the historical winning numbers
    winning_count = 0
    
    # Iterate through the Series of winning numbers
    for row in winning_numbers:
        # Find the intersection between the user's numbers and the winning numbers
        winning_matches = set_user_numbers & row
        
        # If the intersection contains all 6 numbers, count it as a match
        if len(winning_matches) == 6:
            winning_count += 1
    
    # Inform the user how many times their combination has appeared historically
    times = 'time' if winning_count == 1 else 'times'
    print(
        f'Your combination: {user_numbers} has occurred in the past exactly {winning_count} {times}!',
        '\nBut don\'t despair! With this astonishing combination of numbers, you still have a chance of winning the lottery!'
    )
    
    # Call the one_ticket_probability function to show the probability of winning
    one_ticket_probability(user_numbers)

# Example input to test the function
test_numbers = [1, 6, 23, 24, 27, 39]
check_historical_occurence(test_numbers)

Your combination: [1, 6, 23, 24, 27, 39] has occurred in the past exactly 1 time! 
But don't despair! With this astonishing combination of numbers, you still have a chance of winning the lottery!
The probability of winning the big prize with your ticket is 1 in 13,983,816.
That's approximately 0.0000072%.


In [21]:
test_numbers_2 = [1, 2, 3, 4, 5, 6]
check_historical_occurence(test_numbers_2)

Your combination: [1, 2, 3, 4, 5, 6] has occurred in the past exactly 0 times! 
But don't despair! With this astonishing combination of numbers, you still have a chance of winning the lottery!
The probability of winning the big prize with your ticket is 1 in 13,983,816.
That's approximately 0.0000072%.


In [22]:
test_numbers_3 = [5, 4, 3, 2, 1]
check_historical_occurence(test_numbers_3)

Invalid ticket. Please provide a list of six unique numbers between 1 and 49.


## Analysis of Checking Historical Occurrences of a Lottery Combination

The `check_historical_occurence()` function determines whether a user's chosen lottery combination has ever been a winning set in the past. It takes two inputs:  

1. **`user_numbers`**: A Python list of six numbers (e.g., `[1, 6, 23, 24, 27, 39]`).  
2. **`winning_numbers`**: A pandas Series of sets containing historical winning numbers (default: `raw_data['winning_numbers']`).  

---

### Key Steps in the Function

1. **Convert Input to Set**:  
   - The user's combination is converted into a set using Python’s `set()` for easy comparison with historical winning combinations.  

2. **Iterate Through Historical Data**:  
   - The function checks for exact matches between the user's set and any historical winning combination.  

3. **Count Matches**:  
   - A counter keeps track of how many times the user’s combination has appeared as a winning set.  

4. **Display Results**:  
   - Outputs the number of times the combination occurred historically.  
   - Shows the probability of winning in the next draw using the `one_ticket_probability()` function.  
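
The counting logic in the steps above can be sketched without pandas. The example below uses only the five draws shown in the `raw_data.head()` output earlier in this notebook as sample data; `count_exact_matches` and `sample_draws` are hypothetical names introduced for this illustration.

```python
# First five historical draws, copied from the raw_data.head() output in this notebook
sample_draws = [
    {3, 11, 12, 14, 41, 43},
    {8, 33, 36, 37, 39, 41},
    {1, 6, 23, 24, 27, 39},
    {3, 9, 10, 13, 20, 43},
    {5, 14, 21, 31, 34, 47},
]

def count_exact_matches(user_numbers, draws):
    """Count how many draws contain exactly the same six numbers as the user's ticket."""
    user_set = set(user_numbers)
    # Set equality ignores order, so [39, 27, 24, 23, 6, 1] matches {1, 6, 23, 24, 27, 39}
    return sum(1 for draw in draws if set(draw) == user_set)

print(count_exact_matches([1, 6, 23, 24, 27, 39], sample_draws))  # 1
print(count_exact_matches([1, 2, 3, 4, 5, 6], sample_draws))      # 0
```

Comparing sets directly is equivalent to checking that the intersection has six elements, provided both sides hold exactly six unique numbers.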

---

### Why This Step Matters

This step shows users whether their chosen numbers have ever come up as a winning combination. Because every draw is independent, a past appearance (or absence) does not change the odds of the next draw: each combination keeps the same 1-in-13,983,816 chance. Pairing the historical count with the output of `one_ticket_probability()` keeps expectations grounded while still letting users explore the history of their numbers.

## Calculating Multi-Ticket Probability of Winning the Lottery

In [25]:
def multi_ticket_probability(tickets_number):
    """
    Calculates and prints the probability of winning the lottery's big prize based on the 
    number of tickets played. The lottery assumes a 6/49 system where 6 numbers are drawn 
    from a pool of 49 without replacement.
    """
    
    # Total number of possible combinations in a 6/49 lottery
    total_combinations = combinations(49, 6)

    # The probability of winning the big prize
    probability = tickets_number / total_combinations
    probability_percentage = probability * 100

    # Print the probability as a percentage in an easy-to-understand format
    ticket_word = 'ticket' if tickets_number == 1 else 'tickets'
    print(
        f'The probability of winning the lottery with {tickets_number} {ticket_word} is '
        f'{probability_percentage:.7f}%.'
    )

# Testing the function with various inputs representing the number of tickets
test_inputs = [1, 10, 100, 10000, 1000000, 6991908, 13983816]
for tickets in test_inputs:
    multi_ticket_probability(tickets)

The probability of winning the lottery with 1 ticket is 0.0000072%.
The probability of winning the lottery with 10 tickets is 0.0000715%.
The probability of winning the lottery with 100 tickets is 0.0007151%.
The probability of winning the lottery with 10000 tickets is 0.0715112%.
The probability of winning the lottery with 1000000 tickets is 7.1511238%.
The probability of winning the lottery with 6991908 tickets is 50.0000000%.
The probability of winning the lottery with 13983816 tickets is 100.0000000%.


## Analysis of Calculating Multi-Ticket Probability

The `multi_ticket_probability()` function calculates and displays the probability of winning the jackpot in a 6/49 lottery based on the number of tickets played. Here’s how the function works:

---

1. **Calculate Total Possible Outcomes**:  
   - In a 6/49 lottery, six numbers are drawn from a pool of 49 without replacement, resulting in a fixed number of possible combinations.  

2. **Determine Successful Outcomes**:  
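   - Each ticket with a distinct combination adds one successful outcome, so playing `n` different tickets gives `n` successful outcomes (e.g., playing 1 ticket means there is 1 successful outcome). Duplicate combinations do not increase the chance of winning.  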

3. **Calculate Probability**:  
   - The probability of winning is calculated as:  
     Probability = Number of Tickets Played / Total Possible Combinations  

4. **Display Results as a Percentage**:  
   - The calculated probability is converted into a percentage and displayed with precision up to 7 decimal places for clarity.  
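
The formula above only holds when every ticket carries a different combination. The sketch below makes that assumption explicit by counting each distinct combination once; `multi_ticket_win_probability` and `TOTAL_COMBINATIONS` are hypothetical names used for illustration, not part of this notebook.

```python
from math import comb

TOTAL_COMBINATIONS = comb(49, 6)  # 13,983,816 possible draws

def multi_ticket_win_probability(tickets):
    """Jackpot probability for a collection of tickets, counting each distinct combination once."""
    # frozenset ignores the order of numbers, so reordered duplicates collapse into one entry
    distinct = {frozenset(ticket) for ticket in tickets}
    return len(distinct) / TOTAL_COMBINATIONS

tickets = [
    [1, 2, 3, 4, 5, 6],
    [6, 5, 4, 3, 2, 1],       # same combination as the first ticket
    [7, 14, 21, 28, 35, 42],
]
probability = multi_ticket_win_probability(tickets)
print(f"{probability * 100:.7f}%")  # two distinct combinations out of 13,983,816
```

The sketch assumes the tickets have already been validated, for example with the checks used in `one_ticket_probability`.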

---

## Function to Calculate the Probability of Matching Less Winning Numbers

In [28]:
def probability_less_6(number):
    """
    Calculates and prints the probability of matching a specific number of winning numbers
    in a 6/49 lottery where 6 numbers are drawn without replacement.
    """
    
    # Calculate the number of combinations to choose 'number' correct winning numbers
    winning_combinations = combinations(6, number)

    # Calculate the number of combinations for the remaining numbers (non-winning)
    remaining_combinations = combinations(43, 6 - number)

    # Calculate the total number of successful outcomes for the given match count
    successful_outcomes = winning_combinations * remaining_combinations

    # Calculate the total number of possible combinations in the 6/49 lottery
    total_combinations = combinations(49, 6)

    # Calculate the probability as a percentage
    probability = successful_outcomes / total_combinations
    probability_percentage = probability * 100

    # Display the result in an intuitive format
    print(
        f'The probability of matching exactly {number} winning numbers is '
        f'{probability_percentage:.8f}%.'
    )

# Test the function for matching 2, 3, 4, or 5 winning numbers
for i in range(2, 6):
    probability_less_6(i)

The probability of matching exactly 2 winning numbers is 13.23780290%.
The probability of matching exactly 3 winning numbers is 1.76504039%.
The probability of matching exactly 4 winning numbers is 0.09686197%.
The probability of matching exactly 5 winning numbers is 0.00184499%.


## Calculating Probabilities for Matching 2, 3, 4, or 5 Winning Numbers

### Overview
This analysis examines the probabilities of matching 2, 3, 4, or 5 numbers in a 6/49 lottery. While most players aim for the jackpot by matching all six numbers, understanding the odds of fewer matches offers insight into the likelihood of winning smaller prizes.

---

### What the Function Does
The `probability_less_6()` function calculates and displays the probability of matching exactly 2, 3, 4, or 5 numbers in a 6/49 lottery. Here's how it works:

1. **Input**:  
   - The function accepts a single input, `number`, which specifies how many numbers the user wants to match (e.g., 2, 3, 4, or 5).  

2. **Calculation Steps**:  
   - **Winning Combinations**: Calculates the number of ways to select `number` correct numbers from the 6 drawn.  
   - **Remaining Combinations**: Determines the number of ways to choose the remaining numbers from the non-winning pool of 43 numbers.  
   - **Successful Outcomes**: Multiplies the values for winning and remaining combinations to get the total number of successful outcomes.  
   - **Probability**: Divides the total successful outcomes by the total possible combinations for the 6/49 lottery.  

3. **Output**:  
   - The function displays the probability as a percentage, rounded to eight decimal places, accompanied by a user-friendly message.  

---

By calculating these probabilities, players gain a clearer understanding of their chances of winning smaller prizes, which can enhance their overall perspective on lottery odds.
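
One way to sanity-check the calculation is to compute the probability for every possible match count from 0 to 6 and confirm that the results add up to 1, since exactly one of those outcomes must occur. The sketch below does this with a hypothetical helper, `match_probability`, that returns the value instead of printing it; the parameter names `pool_size` and `picks` are also assumptions.

```python
from math import comb

def match_probability(matches, pool_size=49, picks=6):
    """Probability that a ticket matches exactly `matches` of the drawn numbers."""
    if not 0 <= matches <= picks:
        raise ValueError(f"matches must be between 0 and {picks}.")
    # Ways to pick the matching numbers from the drawn ones...
    winning = comb(picks, matches)
    # ...times ways to pick the rest from the numbers that were not drawn
    remaining = comb(pool_size - picks, picks - matches)
    return winning * remaining / comb(pool_size, picks)

probabilities = {k: match_probability(k) for k in range(7)}
for k, p in probabilities.items():
    print(f"{k} matches: {p:.8%}")

# The seven outcomes are mutually exclusive and exhaustive
print(f"Total: {sum(probabilities.values()):.6f}")
```

For 2 to 5 matches this should give the same percentages as `probability_less_6()`, and the 6-match case equals the single-ticket jackpot probability.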

## Conclusion

This project analyzed the probabilities of winning a 6/49 lottery, focusing on various scenarios and outcomes. Starting with the odds of winning the jackpot with a single ticket, I expanded the analysis to include multi-ticket probabilities and the likelihood of matching fewer numbers, providing a comprehensive view of potential outcomes.

Using detailed calculations and real-world examples, I quantified the rarity of winning combinations and contextualized these probabilities with historical data from the Canada Lottery. The results emphasize the astronomical odds of winning the jackpot while offering a structured framework for understanding lottery probabilities. This analysis not only illustrates the mathematics of chance but also provides practical insights into similar probability-based scenarios.