# Helping Lottery Participants to Play Responsibly

## Author: Salvatore Porcheddu
### Date: 2021/05/12

# Introduction

This project develops the logical core of an application that calculates winning probabilities for the [Canada 6/49 lottery](https://en.wikipedia.org/wiki/Lotto_6/49), so that players know exactly what their odds of winning are and can play responsibly. Knowing the real odds may help prevent lottery addiction or support its treatment.

A 6/49 lottery game usually works this way:

- there are 49 numbers ranging from 1 to 49;
- six numbers are drawn from this set; *there is usually a bonus number too, but we will not be considering it during this project*;
- players win the big prize if all the numbers on their ticket match the numbers drawn.

In the course of the project we will build functions that allow users to answer questions like the following:

- what is the probability of winning the big prize with a single ticket?
- what is the probability of winning the big prize playing n different tickets?
- what is the probability of having exactly n winning numbers on a single ticket?

We will also consider [historical data](https://www.kaggle.com/datascienceai/lottery-dataset) of Canada 6/49 lottery drawings between 1982 and 2018 to see which combinations of winning numbers have already occurred.

## Defining basic functions

We will start by defining two functions that we will use extensively in the next steps:

- a function that computes the **factorial** of a given number;
- a function that computes the number of **combinations**, that is, the number of ways to take k objects from a group of n objects when sampling without replacement and ignoring the order.
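For reference, these are the standard definitions that the two functions are meant to implement, with $n$ the size of the group and $k$ the number of objects taken:

$$n! = n \times (n - 1) \times \dots \times 2 \times 1, \qquad 0! = 1$$

$$C(n, k) = \binom{n}{k} = \frac{n!}{k!\,(n - k)!}$$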

In [88]:
# factorial function
def factorial(n):
    """This function receives an integer and
    computes its factorial;
    
    Parameters:
        n: an int >= 0
        
    Returns:
        the factorial of n
    """
    n = int(n)
    if n < 0:
        raise ValueError ("Please insert a value >= 0")
    
    # start from 1 so that factorial(0) correctly returns 1
    f = 1
    while n > 1:
        f *= n
        n -= 1
    return f

# combinations function
def combinations(n, k):
    """This function receives two integer numbers, n and k
    and computes the number of possible combinations 
    when only k objects are taken from a group of n objects;
    
    Parameters:
        n: an int >= 0, corresponding to the total number of 
           objects in the group
        k: an int >= 0 and <= n, corresponding to the number of
           objects taken from the group
    
    Returns:
        the number of combinations possible when taking from a group
        of n objects a subset k of objects.
    """
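    n = int(n)
    k = int(k)
    if k < 0 or k > n:
        raise ValueError ("Please insert a value of k between 0 and n")
    fn = factorial(n)
    fk = factorial(k)
    fnk = factorial(n - k)
    # integer division keeps the result exact: this division never leaves a remainder
    return fn // (fk * fnk)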
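As a quick sanity check, the values these functions should produce can be compared with Python's standard library. The sketch below uses only `math.factorial` and `math.comb` (the latter assumes Python 3.8 or later), which are not used elsewhere in this project; it is an independent cross-check, not part of the application.

```python
import math

# 0! is defined as 1, which is the edge case worth checking first
assert math.factorial(0) == 1
assert math.factorial(5) == 120

# number of ways to choose 6 numbers out of 49, ignoring order
total_outcomes = math.comb(49, 6)
print("{:,}".format(total_outcomes))  # 13,983,816

# the same value obtained from the factorial formula, using integer division
from_formula = math.factorial(49) // (math.factorial(6) * math.factorial(43))
assert from_formula == total_outcomes
```

Integer division is used in the formula because the factorials involved are far larger than what a float can represent exactly.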

## Probability of winning the big prize

Now that we have elementary functions to compute factorials and combinations, we will focus on determining the **probability of winning the big prize on a single ticket**.

Let's build a function that computes this probability.

In [89]:
def one_ticket_probability(numbers):
    """
    This function gives the probability of winning the big prize
    of a 6/49 lottery when playing a single ticket.
    
    Parameters:
        numbers: list of 6 integers.
        
    Returns:
        prints the probability of winning the big prize.
    """
    if not isinstance(numbers, list):
        raise TypeError ("Please insert the numbers into a Python list.")
        
    if len(numbers) != 6:
        raise ValueError ("Please insert six numbers.")
        
    for number in numbers:
        if number > 49 or number < 1:
            raise ValueError ("Please insert only numbers ranging from 1 to 49.")
    
    # a valid ticket cannot contain the same number twice
    if len(set(numbers)) != 6:
        raise ValueError ("Please insert six different numbers.")
    
    n_outcomes = combinations(49, len(numbers)) # total number of outcomes possible
    prob = 1 / n_outcomes # only one outcome is successful
    print("With a single ticket, you will win once every {0:,} attempts, which means that your probability of winning is only {1:.6%}.\n".format(n_outcomes, prob))

In [90]:
# for example, given the following list of numbers, the function returns...
numbers = [2, 34, 6, 19, 1, 40]
one_ticket_probability(numbers)

With a single ticket, you will win once every 13,983,816 attempts, which means that your probability of winning is only 0.000007%.
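
Spelled out, the figure printed above comes from counting the possible drawings and noting that exactly one of them matches the ticket:

$$C(49, 6) = \frac{49!}{6!\,43!} = \frac{49 \times 48 \times 47 \times 46 \times 45 \times 44}{720} = 13{,}983{,}816$$

$$P(\text{big prize}) = \frac{1}{13{,}983{,}816} \approx 7.15 \times 10^{-8} \approx 0.000007\%$$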



## Analyzing the historical data

Let's now proceed and take a look at the historical data from the Canada 6/49 lottery from 1982 to 2018.

We will then create another function that lets users check whether their chosen set of numbers would have won any past drawing.

In [91]:
import pandas as pd
import numpy as np

drawings = pd.read_csv("649.csv")

print("The dataset has {} rows and {} columns".format(drawings.shape[0], drawings.shape[1]))

# displaying the first row and the last three rows of the dataset
drawings.iloc[[0, -3, -2, -1],:]

The dataset has 3665 rows and 11 columns


Unnamed: 0,PRODUCT,DRAW NUMBER,SEQUENCE NUMBER,DRAW DATE,NUMBER DRAWN 1,NUMBER DRAWN 2,NUMBER DRAWN 3,NUMBER DRAWN 4,NUMBER DRAWN 5,NUMBER DRAWN 6,BONUS NUMBER
0,649,1,0,6/12/1982,3,11,12,14,41,43,13
3662,649,3589,0,6/13/2018,6,22,24,31,32,34,16
3663,649,3590,0,6/16/2018,2,15,21,31,38,49,8
3664,649,3591,0,6/20/2018,14,24,31,35,37,48,17


As we can see, our historical data contains 3665 drawings and the six winning numbers are listed in the columns `NUMBER DRAWN ...`.

Now, we will create a function to extract all the winning numbers for every draw, which we will then use to build the function that we mentioned above.

In [92]:
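def extract_numbers(row):
    """Returns the six winning numbers of a drawing (one row of the dataset) as a set."""
    # positions 4 to 9 hold the columns NUMBER DRAWN 1 ... NUMBER DRAWN 6
    row = row.iloc[4:10]
    winning_numbers = set()
    for number in row:
        number = int(number)
        winning_numbers.add(number)
    return winning_numbers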

# now let's create a new column in the drawings dataframe
# for the sets of winning numbers
drawings["winning_numbers"] = drawings.apply(extract_numbers, axis=1)

drawings.head()

Unnamed: 0,PRODUCT,DRAW NUMBER,SEQUENCE NUMBER,DRAW DATE,NUMBER DRAWN 1,NUMBER DRAWN 2,NUMBER DRAWN 3,NUMBER DRAWN 4,NUMBER DRAWN 5,NUMBER DRAWN 6,BONUS NUMBER,winning_numbers
0,649,1,0,6/12/1982,3,11,12,14,41,43,13,"{3, 41, 11, 12, 43, 14}"
1,649,2,0,6/19/1982,8,33,36,37,39,41,9,"{33, 36, 37, 39, 8, 41}"
2,649,3,0,6/26/1982,1,6,23,24,27,39,34,"{1, 6, 39, 23, 24, 27}"
3,649,4,0,7/3/1982,3,9,10,13,20,43,34,"{3, 9, 10, 43, 13, 20}"
4,649,5,0,7/10/1982,5,14,21,31,34,47,45,"{34, 5, 14, 47, 21, 31}"
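
The same extraction can be sketched without pandas by selecting the columns by name rather than by position, which keeps working if the column order changes. The example below is a minimal, dependency-free sketch: the helper name `extract_numbers_by_name` is hypothetical, and the inline CSV text simply reproduces the first two rows shown above instead of reading `649.csv`.

```python
import csv
import io

# the first two drawings from the table above, inlined so the example is self-contained
sample_csv = """PRODUCT,DRAW NUMBER,SEQUENCE NUMBER,DRAW DATE,NUMBER DRAWN 1,NUMBER DRAWN 2,NUMBER DRAWN 3,NUMBER DRAWN 4,NUMBER DRAWN 5,NUMBER DRAWN 6,BONUS NUMBER
649,1,0,6/12/1982,3,11,12,14,41,43,13
649,2,0,6/19/1982,8,33,36,37,39,41,9
"""

# names of the six columns holding the winning numbers (the bonus number is excluded)
NUMBER_COLUMNS = ["NUMBER DRAWN {}".format(i) for i in range(1, 7)]

def extract_numbers_by_name(row):
    """Hypothetical helper: returns the six winning numbers of one row as a set."""
    return {int(row[column]) for column in NUMBER_COLUMNS}

reader = csv.DictReader(io.StringIO(sample_csv))
winning_sets = [extract_numbers_by_name(row) for row in reader]

for winning_numbers in winning_sets:
    print(sorted(winning_numbers))
```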


In [93]:
# next, we build the function to check the historical occurrence
# of a given set of numbers

def check_historical_occurrence(numbers, historical_data):
    """
    Function that takes in a list of six numbers and checks
    if ALL of these numbers have occurred in a past lottery drawing.
    
    Parameters:
        numbers: list of six numbers
        historical_data: pandas series containing winning numbers 
                         from past lottery drawings.
    
    Returns:
        Prints information about the number of times that the combination
        of numbers has occurred.
        Prints information about the probability of winning the big prize
        in the next drawing with these numbers.
    """
    if not isinstance(numbers, list):
        raise TypeError ("Please insert the numbers into a Python list.")
        
    if len(numbers) != 6:
        raise ValueError ("Please insert six numbers.")
        
    for number in numbers:
        if number > 49 or number < 1:
            raise ValueError ("Please insert only numbers ranging from 1 to 49.")
            
    # pd is the pandas module imported at the top of the notebook
    if not isinstance(historical_data, pd.Series):
        raise TypeError ("Please provide a pandas Series object containing past winning numbers.")
        
    # compare the ticket with each past drawing as sets, so the order does not matter
    numbers_set = set(numbers)
    matchings = historical_data.apply(lambda drawn: drawn == numbers_set)
    n_matchings = matchings.sum()
    
    n = 49
    k = len(numbers)
    outcomes = combinations(n, k)
    prob = 1 / outcomes 
    
    print("The combination {} has occurred {} times in the past {} drawings.".format(numbers, n_matchings, historical_data.shape[0]))
    print("--------------------------------------------")
    print("If you play a single ticket with the combination {} in the next drawing, your chance of winning is only 1 in {:,} times, which translates to a probability of winning of only {:.6%}.\n".format(numbers, outcomes, prob))

    
# for example, given the following list, the function returns...
combination = [2, 29, 37, 14, 48, 8]

check_historical_occurrence(combination, drawings["winning_numbers"])

The combination [2, 29, 37, 14, 48, 8] has occurred 0 times in the past 3665 drawings.
--------------------------------------------
If you play a single ticket with the combination [2, 29, 37, 14, 48, 8] in the next drawing, your chance of winning is only 1 in 13,983,816 times, which translates to a probability of winning of only 0.000007%.
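
The same occurrence check can be sketched without pandas by storing each drawing as a `frozenset`, which is hashable and ignores order, and counting occurrences with `collections.Counter`. The drawings below are the five rows displayed earlier, used here as a stand-in for the full dataset; the names `past_drawings` and `times_drawn` are illustrative and not part of the project code.

```python
from collections import Counter

# the first five drawings of the dataset, as displayed earlier
past_drawings = [
    [3, 11, 12, 14, 41, 43],
    [8, 33, 36, 37, 39, 41],
    [1, 6, 23, 24, 27, 39],
    [3, 9, 10, 13, 20, 43],
    [5, 14, 21, 31, 34, 47],
]

# frozensets are hashable and order-independent, so they can be counted directly
occurrences = Counter(frozenset(drawing) for drawing in past_drawings)

def times_drawn(numbers):
    """Illustrative helper: how many past drawings match all six numbers."""
    return occurrences[frozenset(numbers)]

print(times_drawn([43, 41, 14, 12, 11, 3]))  # 1, the order of the numbers does not matter
print(times_drawn([2, 29, 37, 14, 48, 8]))   # 0
```

A `Counter` returns 0 for keys it has never seen, so combinations that were never drawn need no special handling.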



## Multiple lottery tickets

Many people, and especially addicts, play more than one ticket on a single drawing to increase their chances of winning. So, we will build another function that lets users **specify the number of tickets they want to play**, from 1 to 13,983,816 (the total number of different combinations, as we have seen earlier).

Once more than one ticket is played, it no longer matters which numbers are on each ticket, as long as all the combinations are different. So this time the function will not accept a list of numbers: users will only input the number of tickets they want to play.

After processing the number of tickets, the function will output the **probability of winning the big prize associated with the multi-ticket scenario**.

In [94]:
def multi_ticket_probability(tickets):
    """
    This function computes the probability of winning the big prize
    of a 6/49 lottery by playing one or more tickets on a single drawing, 
    each containing a different combination of numbers.
    
    Parameters:
        tickets: int between 1 and 13,983,816 inclusive.
    
    Returns:
        Prints information about the probability of winning using n tickets
    """
    tickets = int(tickets)
    
    if tickets < 1 or tickets > 13983816:
        raise ValueError ("Please insert a number between 1 and 13,983,816")
        
    n = 49
    k = 6
    outcomes = combinations(n, k)
    attempts_to_win = round(outcomes / tickets) # rounded to the nearest integer
    prob = tickets / outcomes
    ticket_word = "ticket" if tickets == 1 else "tickets"
    
    print("If you play {:,} {} on a single drawing, your odds of winning are about 1 in {:,}, which means that your probability of winning is {:.6%}.\n".format(tickets, ticket_word, attempts_to_win, prob))
    

# for example, if we play the following numbers of tickets, we get...
tickets = [1, 10, 100, 10000, 1000000, 6991908, 13983816]

for ticket in tickets:
    multi_ticket_probability(ticket)

If you play 1 ticket on a single drawing, your odds of winning are about 1 in 13,983,816, which means that your probability of winning is 0.000007%.

If you play 10 tickets on a single drawing, your odds of winning are about 1 in 1,398,382, which means that your probability of winning is 0.000072%.

If you play 100 tickets on a single drawing, your odds of winning are about 1 in 139,838, which means that your probability of winning is 0.000715%.

If you play 10,000 tickets on a single drawing, your odds of winning are about 1 in 1,398, which means that your probability of winning is 0.071511%.

If you play 1,000,000 tickets on a single drawing, your odds of winning are about 1 in 14, which means that your probability of winning is 7.151124%.

If you play 6,991,908 tickets on a single drawing, your odds of winning are about 1 in 2, which means that your probability of winning is 50.000000%.

If you play 13,983,816 tickets on a single drawing, your odds of winning are about 1 in 1, which means that your probability of winning is 100.000000%.
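
Because each different ticket covers exactly one of the 13,983,816 possible outcomes, the probability grows linearly with the number of tickets. The sketch below expresses this with exact fractions from the standard library, which avoids any rounding; `multi_ticket_fraction` is a hypothetical name, and `math.comb` assumes Python 3.8 or later.

```python
from fractions import Fraction
from math import comb

TOTAL_OUTCOMES = comb(49, 6)  # 13,983,816 possible drawings

def multi_ticket_fraction(tickets):
    """Hypothetical helper: exact probability of winning with `tickets` different tickets."""
    if not 1 <= tickets <= TOTAL_OUTCOMES:
        raise ValueError("tickets must be between 1 and {:,}".format(TOTAL_OUTCOMES))
    return Fraction(tickets, TOTAL_OUTCOMES)

print(multi_ticket_fraction(1))           # 1/13983816
print(multi_ticket_fraction(6991908))     # 1/2
print(float(multi_ticket_fraction(100)))  # roughly 7.15e-06
```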



## Matching fewer numbers

Most lotteries don't just award a big prize when all numbers are matched; they also award smaller prizes for matching five, four, three or even two numbers. The Canada 6/49 lottery is no exception, so we will build one last function that computes the odds of winning when fewer numbers are matched.

We will once again assume that only one ticket is played, and let users enter their desired number of matches, between two and five.

The function will print the **probability of winning the prize associated with the given number of matches**.
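
The count behind this probability can be written compactly. For a ticket to match exactly $m$ numbers, the drawing must contain $m$ of the 6 numbers on the ticket and $6 - m$ of the 43 numbers that are not on it:

$$P(\text{exactly } m \text{ matches}) = \frac{\binom{6}{m}\binom{43}{6 - m}}{\binom{49}{6}}$$

For example, for $m = 5$ there are $\binom{6}{5}\binom{43}{1} = 6 \times 43 = 258$ successful outcomes out of 13,983,816.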

In [96]:
def probability_less_6(matches):
    """
    Function that computes the probability of winning with a single ticket
    when matching as many numbers as specified by the user.
    
    Parameters:
        matches: int between 2 and 5 inclusive.
        
    Returns:
        Prints information about the probability of winning.
    """
    matches = int(matches)
    
    if matches > 5 or matches < 2:
        raise ValueError ("Please insert a number of matches between 2 and 5 inclusive.")
    
    n = 49
    
    # number of ways to choose which of the 6 ticket numbers are matched
    comb = combinations(6, matches)
    
    # number of outcomes matching exactly `matches` numbers: the remaining
    # 6 - matches drawn numbers must come from the 43 numbers not on the ticket
    succ_outcomes = comb * combinations(43, 6 - matches)
    
    # calculating the probability of winning
    total_outcomes = combinations(n, 6)
    prob = succ_outcomes / total_outcomes
    odds = round(total_outcomes / succ_outcomes) # rounded to the nearest integer
    
    print("If you want to match exactly {} numbers, your odds are about 1 in {:,}, which means that your probability of winning is {:.4%}.\n".format(matches, odds, prob))

# the odds of winning with every possible input are the following
matches = [2, 3, 4, 5]

for match in matches:
    probability_less_6(match)

If you want to match exactly 2 numbers, your odds are about 1 in 8, which means that your probability of winning is 13.2378%.

If you want to match exactly 3 numbers, your odds are about 1 in 57, which means that your probability of winning is 1.7650%.

If you want to match exactly 4 numbers, your odds are about 1 in 1,032, which means that your probability of winning is 0.0969%.

If you want to match exactly 5 numbers, your odds are about 1 in 54,201, which means that your probability of winning is 0.0018%.
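
The probabilities above are for matching exactly the given number of numbers. A closely related quantity is the probability of matching at least that many, obtained by summing the exact probabilities up to six matches. The following is a minimal sketch using exact fractions; the function names `exact_match_probability` and `at_least_probability` are hypothetical and not part of the project code, and `math.comb` assumes Python 3.8 or later.

```python
from fractions import Fraction
from math import comb

def exact_match_probability(matches):
    """Probability that a single ticket matches exactly `matches` of the 6 drawn numbers."""
    if not 0 <= matches <= 6:
        raise ValueError("matches must be between 0 and 6")
    return Fraction(comb(6, matches) * comb(43, 6 - matches), comb(49, 6))

def at_least_probability(matches):
    """Probability of matching `matches` or more numbers."""
    return sum(exact_match_probability(m) for m in range(matches, 7))

# the seven exact probabilities cover every possible outcome, so they sum to 1
assert sum(exact_match_probability(m) for m in range(7)) == 1

for m in [2, 3, 4, 5]:
    print("at least {}: {:.4%}".format(m, float(at_least_probability(m))))
```

Using `Fraction` keeps every intermediate value exact, so the check that all outcomes sum to 1 holds without any floating-point tolerance.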



# Conclusion

The functions that we developed show that **the odds of winning the Canada 6/49 lottery, or any other 6/49 lottery, are very slim, especially for the big prize**: a single ticket wins it with a probability of 1 in 13,983,816.

Hopefully they will one day help educate people about their real odds of winning and thus encourage them to play more responsibly.