# Programming a Mobile App for Estimating Lottery Winning Probabilities

## Introduction

We are working with a medical institute to help develop a mobile app intended to prevent and treat lottery addiction. The app will help lottery addicts better estimate their chances of winning by answering basic probability questions about the lottery, e.g., "What is the probability of winning the big prize if we play n different tickets?"

While the institute's engineers will build the app itself, we are responsible for creating the core logic and calculating the probabilities. For the first version of the app, we focus on the 6/49 lottery, one of the national lottery games in Canada, where six (6) numbers are drawn from a set of forty-nine (49). The app will also enable users to compare their habits to historical data. For this purpose, we use a [lottery dataset](https://www.kaggle.com/datascienceai/lottery-dataset) containing 3,665 drawings of the 6/49 game in Canada between 1982 and 2018.

## Coding Central Functions

Before we begin calculating probabilities, we need to define two core combinatorics functions that we will use throughout our work.

The `factorial` function calculates factorials using the formula $$n! = n \times (n-1) \times (n-2) \times \dots \times 2 \times 1 \quad (\text{with } 0! = 1),$$

and the `combinations` function calculates n choose k, or the number of ways of choosing an unordered subset of k elements from a set of n elements, as $$_nC_k = \frac{n!}{k!(n-k)!}.$$
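For the 6/49 game, most of the factorial terms cancel, so the total number of possible tickets can be checked by hand:

$$ {}_{49}C_6 = \frac{49!}{6!\,43!} = \frac{49 \times 48 \times 47 \times 46 \times 45 \times 44}{6 \times 5 \times 4 \times 3 \times 2 \times 1} = \frac{10{,}068{,}347{,}520}{720} = 13{,}983{,}816. $$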

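```python
def factorial(n):
    """Calculate n!.
    
    Parameters
    ----------
    n : int
        Non-negative integer of which to calculate the factorial.
    
    Returns
    -------
    int
        Result of n!.
    """
    if n < 0:
        raise ValueError('n must be a non-negative integer')
    if n <= 1:
        return 1  # base case: 0! = 1! = 1
    return n * factorial(n - 1)
```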
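The recursive `factorial` makes one nested call per integer, so its input size is bounded by the interpreter's recursion limit (1,000 frames by default in CPython). The sketch below shows a loop-based variant under the hypothetical name `factorial_iterative`. It is an illustration of the design trade-off, not part of the original app.

```python
import math


def factorial_iterative(n: int) -> int:
    """Hypothetical loop-based variant of `factorial`.

    Uses no recursion, so it is not limited by the recursion depth.
    """
    if n < 0:
        raise ValueError('n must be a non-negative integer')
    result = 1  # covers 0! and 1!, since the loop below does not run
    for i in range(2, n + 1):
        result *= i
    return result


print(factorial_iterative(0))  # 1
print(factorial_iterative(6))  # 720
print(factorial_iterative(5000) == math.factorial(5000))  # True
```

For the numbers used in this project (at most 49!), either version works. The standard library's `math.factorial` is also available.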

```python
def combinations(n, k):
    """Calculate number n choose k combinations.
    
    Parameters
    ----------
    n : int
        Number of elements in set.
    k : int
        Number of elements in unordered subset.
        
    Returns
    -------
    int
        Result of n choose k (number of combinations).
    """
    # The quotient of these factorials is always a whole number, so floor
    # division returns an exact int, matching the docstring.
    return factorial(n) // (factorial(k) * factorial(n - k))
```
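Dividing three full factorials creates very large intermediate numbers (49! has 63 digits). The sketch below, under the hypothetical name `combinations_multiplicative`, builds n choose k incrementally instead and cross-checks the result against `math.comb` (standard library, Python 3.8+).

```python
from math import comb


def combinations_multiplicative(n: int, k: int) -> int:
    """Hypothetical alternative to `combinations` that avoids full factorials."""
    if k < 0 or k > n:
        return 0
    k = min(k, n - k)  # symmetry: C(n, k) == C(n, n - k)
    result = 1
    for i in range(1, k + 1):
        # After this step result == C(n - k + i, i), which is always an
        # integer, so the floor division is exact.
        result = result * (n - k + i) // i
    return result


print(combinations_multiplicative(49, 6))  # 13983816
print(combinations_multiplicative(49, 6) == comb(49, 6))  # True
```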

## Calculating the Probability of Winning the Big Prize

With the above core functions in place, we can now begin calculating probabilities. For the initial app version, we want to allow users to calculate the probability of winning the big prize with a given ticket. For the 6/49 lottery, this means calculating the probability that the six numbers drawn *without replacement* from a set of 49 match the six numbers on a player's ticket exactly (in any order).
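The same probability can also be reached draw by draw, which makes the "in any order" condition concrete. The first ball must be one of the ticket's 6 numbers (6 of 49 balls). The second must be one of the remaining 5 (of 48 balls), and so on. A minimal sketch using exact fractions from the standard library:

```python
from fractions import Fraction
from math import comb

# Draw-by-draw reasoning: after i matching balls have been drawn, 6 - i of
# the ticket's numbers remain among the 49 - i balls still in the drum.
p_sequential = Fraction(1)
for i in range(6):
    p_sequential *= Fraction(6 - i, 49 - i)

# Combinatorial reasoning: 1 winning subset out of C(49, 6) equally likely ones.
p_combinatorial = Fraction(1, comb(49, 6))

print(p_sequential)                     # 1/13983816
print(p_sequential == p_combinatorial)  # True
```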

Below, we define a function to calculate the probability of winning the big prize for any given ticket. The function takes a list of six different numbers entered by the user, prints the probability both as a percentage and as "1 in N" odds, and returns it as a float.

```python
def one_ticket_probability(ticket):
    """
    Calculate probability of winning big prize of 6/49 lottery game 
    for given ticket.
    
    Parameters
    ----------
    ticket : list of int
        Six unique integers, each ranging from 1 to 49.
    
    Returns
    -------
    float
        Probability of ticket matching the winning numbers exactly.
    """
    
    n_outcomes = combinations(49, 6)
    n_successes = 1
    p_big_win = n_successes / n_outcomes
    
    print('The probability of winning the big prize with ticket',
          '{} is {:.7f}%.'.format(ticket, p_big_win * 100))
    print('This means you have a {} in {:,} chance of winning.'
          .format(n_successes, int(n_outcomes)))
    
    return p_big_win
```

```python
# test function
test_one_ticket_prob = one_ticket_probability([1, 2, 3, 4, 5, 6])
```

```
The probability of winning the big prize with ticket [1, 2, 3, 4, 5, 6] is 0.0000072%.
This means you have a 1 in 13,983,816 chance of winning.
```
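The docstring of `one_ticket_probability` requires six unique integers between 1 and 49, but the function never checks its input. The sketch below shows one way the app could validate a ticket before computing anything. `validate_ticket` is a hypothetical helper that is not part of the original code, and its error messages are illustrative.

```python
from numbers import Integral


def validate_ticket(ticket, n_numbers=6, max_number=49):
    """Hypothetical input check for the 6/49 app.

    Raises ValueError unless `ticket` holds exactly `n_numbers` distinct
    integers between 1 and `max_number`; returns the numbers sorted.
    """
    numbers = list(ticket)
    if len(numbers) != n_numbers:
        raise ValueError(f'a ticket needs exactly {n_numbers} numbers')
    for number in numbers:
        # bool is a subclass of int, so reject it explicitly; numbers.Integral
        # also accepts NumPy integer types, such as values read by pandas.
        if isinstance(number, bool) or not isinstance(number, Integral):
            raise ValueError(f'{number!r} is not an integer')
        if not 1 <= number <= max_number:
            raise ValueError(f'{number} is outside the range 1 to {max_number}')
    if len(set(numbers)) != n_numbers:
        raise ValueError('ticket numbers must all be different')
    return sorted(numbers)


print(validate_ticket([6, 1, 2, 3, 4, 5]))  # [1, 2, 3, 4, 5, 6]
```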


### Comparing Against Historical Lottery Data

The app will also contain an option for users to compare their tickets against historical lottery data to determine if they would have ever been winners in the past.

#### Exploring the Historical Data

We begin programming this task by exploring the historical lottery dataset, which contains 3,665 drawings of the Canadian 6/49 lottery game between the years 1982 and 2018. Each drawing is represented as a single row, with each of the six numbers drawn represented in an individual column.

```python
# import pandas and read in dataset
import pandas as pd
data_649 = pd.read_csv('649.csv')
```

```python
# print number of rows and columns
data_649.shape
```

```
(3665, 11)
```

```python
# print first three rows
data_649.head(3)
```

| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 6/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 6/19/1982 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 6/26/1982 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |


```python
# print last three rows
data_649.tail(3)
```

| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3662 | 649 | 3589 | 0 | 6/13/2018 | 6 | 22 | 24 | 31 | 32 | 34 | 16 |
| 3663 | 649 | 3590 | 0 | 6/16/2018 | 2 | 15 | 21 | 31 | 38 | 49 | 8 |
| 3664 | 649 | 3591 | 0 | 6/20/2018 | 14 | 24 | 31 | 35 | 37 | 48 | 17 |


#### Determining Past Winners

Using this historical data, we can now write a function that lets users check whether their ticket would have won the big prize in any past 6/49 drawing. This function takes a list of six different numbers, representing the user's ticket, and outputs the number of times the ticket was a big prize winner in the past, as well as the probability of the ticket being a big prize winner in the future.

We first extract all the sets of winning numbers from the dataframe. Then we pass these along with the user's ticket to the historical comparison function.

```python
def extract_numbers(drawing):
    """
    Return set of winning numbers from drawn number columns in df row.
    
    Parameters
    ----------
    drawing : pandas.Series
        Row of dataframe, representing individual lottery drawing.
    
    Returns
    -------
    set of int
        Six winning numbers.
    """
    
    # Positions 4 to 9 hold the columns NUMBER DRAWN 1 to NUMBER DRAWN 6;
    # .iloc makes the positional (not label-based) slicing explicit.
    return set(drawing.iloc[4:10].values)
```

```python
# extract all sets of winning numbers
winners = data_649.apply(extract_numbers, axis=1)
winners.head()
```

```
0    {3, 41, 11, 12, 43, 14}
1    {33, 36, 37, 39, 8, 41}
2     {1, 6, 39, 23, 24, 27}
3     {3, 9, 10, 43, 13, 20}
4    {34, 5, 14, 47, 21, 31}
dtype: object
```
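Row-wise `apply` calls a Python function once per drawing and depends on the drawn numbers sitting at positions 4 to 9. An alternative is to select the six `NUMBER DRAWN` columns by name and build one `frozenset` per row. The sketch below runs on a small stand-in frame holding the drawn numbers from the three rows shown by `head(3)`. The names `sample`, `drawn_cols` and `winners_alt` are illustrative only.

```python
import pandas as pd

# Stand-in for data_649: the drawn numbers of the first three rows shown above.
drawn_cols = [f'NUMBER DRAWN {i}' for i in range(1, 7)]
sample = pd.DataFrame(
    [[3, 11, 12, 14, 41, 43],
     [8, 33, 36, 37, 39, 41],
     [1, 6, 23, 24, 27, 39]],
    columns=drawn_cols,
)

# Select the six columns by name, convert them to plain Python ints, and build
# one frozenset per row. Unlike sets, frozensets are hashable, so the result
# can also be counted or used as dictionary keys.
winners_alt = pd.Series(
    [frozenset(row) for row in sample[drawn_cols].to_numpy().tolist()],
    index=sample.index,
)

print(winners_alt[0] == {3, 11, 12, 14, 41, 43})  # True
```

On the real data, the same statements would apply to `data_649` in place of `sample`.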

```python
def check_historical_occurence(ticket, winners):
    """
    Determine the number of times ticket has won in past lotteries.
    
    Parameters
    ----------
    ticket : list of int
        Six unique integers, each ranging from 1 to 49.
    winners : pandas.Series
        Series of sets of six winning numbers.
    
    Returns
    -------
    int
        Number of times ticket has won in the past.
    """
    
    # Compare as sets so that the order of the numbers does not matter.
    ticket_set = set(ticket)
    n_wins = int((winners == ticket_set).sum())
    
    print('This ticket has won the big prize {} time(s) in the past.'
          .format(n_wins))
    
    # Also report the probability of winning in a future drawing.
    one_ticket_probability(ticket)
    
    return n_wins
```

```python
check_historical_occurence([6, 22, 24, 31, 32, 34], winners)
```

```
This ticket has won the big prize 1 time(s) in the past.
The probability of winning the big prize with ticket [6, 22, 24, 31, 32, 34] is 0.0000072%.
This means you have a 1 in 13,983,816 chance of winning.
```

```
1
```
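The example ticket has exactly one historical win because it is the drawing of 6/13/2018 (row 3662 in the `tail(3)` output). For an arbitrary ticket, zero wins is by far the most likely result. A minimal sketch of the expected count, assuming each of the 3,665 drawings is an independent, uniformly random draw:

```python
from math import comb

n_draws = 3665            # drawings in the historical dataset
n_outcomes = comb(49, 6)  # 13,983,816 equally likely winning sets

# A fixed ticket wins any single drawing with probability 1 / n_outcomes,
# so the expected number of wins over all drawings is:
expected_wins = n_draws / n_outcomes

# Probability that the ticket won at least once in those drawings.
p_at_least_once = 1 - (1 - 1 / n_outcomes) ** n_draws

print(f'{expected_wins:.6f}')    # 0.000262
print(f'{p_at_least_once:.6f}')  # 0.000262
```

A past win (or its absence) does not change the ticket's probability of winning a future drawing.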

### Calculating Probabilities for Multiple Tickets

The app we're developing is geared towards lottery addicts, who often play multiple tickets per drawing in the hope of increasing their odds of winning. We want to add a function that shows users their chances of winning when playing any number of different tickets at once. This function takes an integer between 1 and 13,983,816 (the number of distinct possible tickets), indicating how many different tickets the user is playing, and returns the probability of winning the big prize with those tickets. The specific numbers on the tickets are not relevant, as long as all the tickets are different.

```python
def multi_ticket_probability(n):
    """
    Calculate probability of winning big prize of 6/49 lottery game 
    for given number of tickets.
    
    Parameters
    ----------
    n : int
        Number of lottery tickets being played.
    
    Returns
    -------
    float
        Probability of one of tickets being a winner.
    """
    
    n_outcomes = combinations(49, 6)
    n_successes = n
    p_big_win = n_successes / n_outcomes
    
    print('The probability of winning the big prize with',
          '{} ticket(s) is {:.7f}%.'.format(n, p_big_win * 100))
    print('This means you have a {} in {:,} chance of winning.'
          .format(1, int(n_outcomes / n_successes)))
    
    return p_big_win
```
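The text above limits `n` to the range 1 to 13,983,816, but `multi_ticket_probability` does not enforce it. For a larger `n`, it would report a "probability" above 100%. The sketch below enforces the range and returns an exact fraction. The name `multi_ticket_probability_exact` is hypothetical, and the sketch assumes all `n` tickets are different.

```python
from fractions import Fraction
from math import comb

N_OUTCOMES = comb(49, 6)  # 13,983,816 distinct 6/49 tickets


def multi_ticket_probability_exact(n: int) -> Fraction:
    """Hypothetical range-checked variant of `multi_ticket_probability`.

    Assumes the n tickets are all different, so they cover n of the
    N_OUTCOMES equally likely winning sets.
    """
    if not 1 <= n <= N_OUTCOMES:
        raise ValueError(f'n must be between 1 and {N_OUTCOMES:,}')
    return Fraction(n, N_OUTCOMES)


print(multi_ticket_probability_exact(6991908))   # 1/2
print(multi_ticket_probability_exact(13983816))  # 1
```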

```python
n_tickets = [1, 10, 100, 10000, 1000000, 6991908, 13983816]
for n_tix in n_tickets:
    n_tix_prob = multi_ticket_probability(n_tix)
```

```
The probability of winning the big prize with 1 ticket(s) is 0.0000072%.
This means you have a 1 in 13,983,816 chance of winning.
The probability of winning the big prize with 10 ticket(s) is 0.0000715%.
This means you have a 1 in 1,398,381 chance of winning.
The probability of winning the big prize with 100 ticket(s) is 0.0007151%.
This means you have a 1 in 139,838 chance of winning.
The probability of winning the big prize with 10000 ticket(s) is 0.0715112%.
This means you have a 1 in 1,398 chance of winning.
The probability of winning the big prize with 1000000 ticket(s) is 7.1511238%.
This means you have a 1 in 13 chance of winning.
The probability of winning the big prize with 6991908 ticket(s) is 50.0000000%.
This means you have a 1 in 2 chance of winning.
The probability of winning the big prize with 13983816 ticket(s) is 100.0000000%.
This means you have a 1 in 1 chance of winning.
```
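The "1 in N" lines above come from `int(n_outcomes / n_successes)`, which truncates rather than rounds. For example, 1,000,000 tickets is reported as "1 in 13", although 13,983,816 / 1,000,000 ≈ 13.98. A minimal sketch of a rounding alternative, using a hypothetical helper named `one_in`:

```python
def one_in(n_successes: int, n_outcomes: int) -> int:
    """Hypothetical helper: express a probability as '1 in N', rounding N
    to the nearest integer instead of truncating it with int()."""
    return round(n_outcomes / n_successes)


n_outcomes = 13_983_816
for n in [10, 1_000_000]:
    # columns: tickets, truncated N, rounded N
    print(n, int(n_outcomes / n), one_in(n, n_outcomes))
# 10 1398381 1398382
# 1000000 13 14
```

The same truncation affects the smaller-prize odds computed in the next section.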


## Calculating the Probability of Smaller Winnings

The final feature we want to include is to allow users to calculate the probability of matching some, but not all, of the winning numbers. Most 6/49 lotteries offer smaller prizes for tickets with two, three, four, or five of the winning numbers, so our users will likely be interested in their chances of these smaller winnings.

Below, we define a function to calculate the probability of winning each of the smaller prizes, with between two and five winning numbers, for a given ticket. The function takes an integer representing the number of winning numbers matched, then prints and returns the probability of matching exactly that many.
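The count of successful outcomes splits the ticket's six numbers in two. Exactly $n$ of them must come from the six winning numbers, and the remaining $6-n$ from the 43 non-winning numbers. This is a hypergeometric probability. For example, with $n = 5$:

$$ P(n) = \frac{{}_6C_n \times {}_{43}C_{6-n}}{{}_{49}C_6}, \qquad P(5) = \frac{6 \times 43}{13{,}983{,}816} = \frac{258}{13{,}983{,}816} \approx 0.0018450\%. $$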

```python
def probability_less_6(n):
    """
    Calculate probability of having exactly n winning numbers in the
    6/49 lottery game, where n is between 2 and 5.
    
    Parameters
    ----------
    n : int
        Number of winning numbers matched.
        
    Returns
    -------
    float
        Probability of matching exactly n winning numbers.
    """
    
    # Choose n of the 6 winning numbers and 6 - n of the 43 other numbers.
    n_successes = combinations(6, n) * combinations(43, 6-n)
    n_outcomes = combinations(49, 6)
    p = n_successes / n_outcomes
    
    print('The probability of having {} winning numbers is {:.7f}%.'
          .format(n, p * 100))
    print('This means you have a {} in {:,} chance of winning.'
          .format(1, int(n_outcomes / n_successes)))
    
    return p
```

```python
for i in [2, 3, 4, 5]:
    probability_less_6(i)
```

```
The probability of having 2 winning numbers is 13.2378029%.
This means you have a 1 in 7 chance of winning.
The probability of having 3 winning numbers is 1.7650404%.
This means you have a 1 in 56 chance of winning.
The probability of having 4 winning numbers is 0.0968620%.
This means you have a 1 in 1,032 chance of winning.
The probability of having 5 winning numbers is 0.0018450%.
This means you have a 1 in 54,200 chance of winning.
```
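As a consistency check on the formula, the probabilities of matching exactly k winning numbers for every k from 0 to 6 should add up to 1. This is Vandermonde's identity, $\sum_{k=0}^{6} {}_6C_k \times {}_{43}C_{6-k} = {}_{49}C_6$. A minimal sketch with exact fractions, independent of the functions defined above:

```python
from fractions import Fraction
from math import comb

n_outcomes = comb(49, 6)

# Exact probability of matching exactly k of the six winning numbers,
# for every possible k from 0 to 6.
p_exact = {k: Fraction(comb(6, k) * comb(43, 6 - k), n_outcomes)
           for k in range(7)}

# The seven cases are mutually exclusive and cover every outcome.
print(sum(p_exact.values()) == 1)  # True
print(p_exact[5])                  # 43/2330636, i.e. 258/13983816 reduced
```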


## Conclusion

We have now created the logic behind the first version of our lottery probability app. There are many improvements we could make in a second version. These include presenting probabilities in more understandable ways by comparing them to real-world events, and calculating the probability of matching *at least*, rather than exactly, a given number of winning numbers. We leave these steps for future work.