# Lottery Probabilities

People are often optimistic about their chances of winning the lottery. They form cognitive biases about numbers or events and make predictions based on factors that have no influence on the numbers actually drawn.

In this project, we take a data-driven look at the probabilities of winning the lottery, considering the following questions:
 - What is the probability of winning the big prize with a single ticket?
 - What is the probability of winning the big prize if we play 40 different tickets (or any other number)?
 - What is the probability of having at least five (or four, or three, or two) winning numbers on a single ticket?

We will also look at the historical data of a Canadian lottery, [Lotto 6/49](https://www.kaggle.com/datascienceai/lottery-dataset).

## Core functions

Since we are dealing with probability calculations, two helper functions are used throughout the analysis: `factorial` and `combinations`.

In [1]:
def factorial(n):
    # Multiply n * (n-1) * ... * 1; for n = 0 the loop is skipped and 1 is returned.
    result = 1
    for i in range(n, 0, -1):
        result *= i
    return result

def combinations(n, k):
    # Number of ways to choose k items from n when the order does not matter.
    numerator = factorial(n)
    denominator = factorial(k) * factorial(n - k)
    # The division is always exact, so integer division keeps the result an int.
    return numerator // denominator
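As a cross-check on the helpers above, the same count can be obtained from the standard library. This is only a sketch and assumes Python 3.8 or newer, where `math.comb` is available; the names `n_outcomes` and `by_formula` are illustrative and not part of the original notebook.

```python
import math

# Assumption: Python 3.8+ (math.comb was added in that release).
# Exact number of 6-number sets that can be drawn from 49 numbers.
n_outcomes = math.comb(49, 6)

# The same value from the factorial formula n! / (k! * (n - k)!).
by_formula = math.factorial(49) // (math.factorial(6) * math.factorial(43))

print(n_outcomes)                # 13983816
print(n_outcomes == by_formula)  # True
```

Both approaches work on exact integers, so no rounding is involved at this stage.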

## One Ticket Probability

To win the big prize with one ticket, you have to pick the 6 correct numbers from the range 1 to 49. All numbers have to match, but the order does not matter. Given this, we can calculate the probability of winning.
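Written out as a formula, and assuming every 6-number combination is equally likely to be drawn, the calculation is:

$$
P(\text{win}) = \frac{1}{\binom{49}{6}} = \frac{6!\,43!}{49!} = \frac{1}{13{,}983{,}816} \approx 0.0000072\%
$$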

In [2]:
def one_ticket_probability(picked_numbers):
    # Every 6-number combination is equally likely, so the ticket itself
    # does not change the result; it is only echoed in the message.
    n_combinations = combinations(49, 6)
    probability = 1 / n_combinations
    percentage_prob = probability * 100
    print('''The probability that a ticket with the numbers {} will win the lottery is {:.7f}%, as you have 1 in {:,} chances of winning.'''.format(picked_numbers, percentage_prob, int(n_combinations)))

In [3]:
#Test Inputs
# The function prints its message itself and returns nothing, so it is called directly.
one_ticket_probability([1,2,3,4,5,6])

The probability that a ticket with the numbers [1, 2, 3, 4, 5, 6] will win the lottery is 0.0000072%, as you have 1 in 13,983,816 chances of winning.


In [4]:
# Note: this input repeats 7, so it is not a valid ticket; the function does not validate its input.
one_ticket_probability([5,6,7,4,3,7])

The probability that a ticket with the numbers [5, 6, 7, 4, 3, 7] will win the lottery is 0.0000072%, as you have 1 in 13,983,816 chances of winning.


## Historical Data

In [6]:
import pandas as pd

lottery_data = pd.read_csv("649.csv", parse_dates=["DRAW DATE"])
lottery_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3665 entries, 0 to 3664
Data columns (total 11 columns):
PRODUCT            3665 non-null int64
DRAW NUMBER        3665 non-null int64
SEQUENCE NUMBER    3665 non-null int64
DRAW DATE          3665 non-null datetime64[ns]
NUMBER DRAWN 1     3665 non-null int64
NUMBER DRAWN 2     3665 non-null int64
NUMBER DRAWN 3     3665 non-null int64
NUMBER DRAWN 4     3665 non-null int64
NUMBER DRAWN 5     3665 non-null int64
NUMBER DRAWN 6     3665 non-null int64
BONUS NUMBER       3665 non-null int64
dtypes: datetime64[ns](1), int64(10)
memory usage: 315.0 KB


In [7]:
lottery_data.head()

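| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 1982-06-12 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 1982-06-19 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 1982-06-26 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |
| 3 | 649 | 4 | 0 | 1982-07-03 | 3 | 9 | 10 | 13 | 20 | 43 | 34 |
| 4 | 649 | 5 | 0 | 1982-07-10 | 5 | 14 | 21 | 31 | 34 | 47 | 45 |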


In [8]:
lottery_data.tail()

| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3660 | 649 | 3587 | 0 | 2018-06-06 | 10 | 15 | 23 | 38 | 40 | 41 | 35 |
| 3661 | 649 | 3588 | 0 | 2018-06-09 | 19 | 25 | 31 | 36 | 46 | 47 | 26 |
| 3662 | 649 | 3589 | 0 | 2018-06-13 | 6 | 22 | 24 | 31 | 32 | 34 | 16 |
| 3663 | 649 | 3590 | 0 | 2018-06-16 | 2 | 15 | 21 | 31 | 38 | 49 | 8 |
| 3664 | 649 | 3591 | 0 | 2018-06-20 | 14 | 24 | 31 | 35 | 37 | 48 | 17 |


Most importantly, the dataset includes the 6 winning numbers of each draw. We extract them as sets so that past draws can be compared with a ticket regardless of order.

In [14]:
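def extract_numbers(row):
    # Positions 4 to 9 hold "NUMBER DRAWN 1" to "NUMBER DRAWN 6";
    # the bonus number at position 10 is left out.
    row = row[4:10]
    # A set ignores the order in which the numbers were drawn.
    row = set(row.values)
    return row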

winning_numbers = lottery_data.apply(extract_numbers, axis=1)
winning_numbers = winning_numbers["DRAW NUMBER"]
winning_numbers.head()

0    {3, 41, 11, 12, 43, 14}
1    {33, 36, 37, 39, 8, 41}
2     {1, 6, 39, 23, 24, 27}
3     {3, 9, 10, 43, 13, 20}
4    {34, 5, 14, 47, 21, 31}
Name: DRAW NUMBER, dtype: object

In [17]:
def historical_occurrence(user_numbers, winning_nos):
    # Compare the ticket, as a set, against every past winning combination.
    user_numbers = set(user_numbers)
    check_occurrence = winning_nos == user_numbers
    n_occurrences = check_occurrence.sum()
    if n_occurrences == 0:
        print('''The combination {} has never occurred.
This doesn't mean it's more likely to occur now. Your chances to win the big prize in the next drawing using the combination {} are 0.0000072%.
In other words, you have a 1 in 13,983,816 chances to win.'''.format(user_numbers, user_numbers))
        
    else:
        print('''The number of times combination {} has occurred in the past is {}.
Your chances to win the big prize in the next drawing using the combination {} are 0.0000072%.
In other words, you have a 1 in 13,983,816 chances to win.'''.format(user_numbers, n_occurrences,
                                                                            user_numbers))

In [19]:
#Test Inputs
historical_occurrence([7,8,9,10,49,2], winning_numbers)

The combination {2, 7, 8, 9, 10, 49} has never occurred.
This doesn't mean it's more likely to occur now. Your chances to win the big prize in the next drawing using the combination {2, 7, 8, 9, 10, 49} are 0.0000072%.
In other words, you have a 1 in 13,983,816 chances to win.


## Multi-Ticket Probability

Many people buy multiple tickets, believing that this makes their chances significantly higher. As we saw previously, there are 13,983,816 possible unique combinations of numbers. Let's see the probability of winning with multiple different lottery tickets.

In [20]:
def multi_ticket_probability(n_tickets):
    
    n_combinations = combinations(49, 6)
    
    probability = n_tickets / n_combinations
    percentage_form = probability * 100
    
    if n_tickets == 1:
        print('''Your chances to win the big prize with one ticket are {:.6f}%.
In other words, you have a 1 in {:,} chances to win.'''.format(percentage_form, int(n_combinations)))
    
    else:
        combinations_simplified = round(n_combinations / n_tickets)   
        print('''Your chances to win the big prize with {:,} different tickets are {:.6f}%.
In other words, you have a 1 in {:,} chances to win.'''.format(n_tickets, percentage_form,
                                                               combinations_simplified))

In [21]:
test_inputs = [1, 10, 100, 10000, 1000000, 6991908, 13983816]

for test_input in test_inputs:
    multi_ticket_probability(test_input)
    print('------------------------') 

Your chances to win the big prize with one ticket are 0.000007%.
In other words, you have a 1 in 13,983,816 chances to win.
------------------------
Your chances to win the big prize with 10 different tickets are 0.000072%.
In other words, you have a 1 in 1,398,382 chances to win.
------------------------
Your chances to win the big prize with 100 different tickets are 0.000715%.
In other words, you have a 1 in 139,838 chances to win.
------------------------
Your chances to win the big prize with 10,000 different tickets are 0.071511%.
In other words, you have a 1 in 1,398 chances to win.
------------------------
Your chances to win the big prize with 1,000,000 different tickets are 7.151124%.
In other words, you have a 1 in 14 chances to win.
------------------------
Your chances to win the big prize with 6,991,908 different tickets are 50.000000%.
In other words, you have a 1 in 2 chances to win.
------------------------
Your chances to win the big prize with 13,983,816 different tickets are 100.000000%.
In other words, you have a 1 in 1 chances to win.
------------------------

## Probability of Getting Fewer Than 6 Winning Numbers

In most 6/49 lotteries, there are smaller prizes if a player's ticket matches two, three, four, or five of the six numbers drawn, so players might also be interested in the probability of having exactly two, three, four, or five winning numbers. Here we look at these probabilities.
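One way to write this down, using $k$ (a symbol introduced here for the number of matching numbers) and assuming all draws are equally likely: a ticket matches exactly $k$ numbers when $k$ of its 6 numbers are among the drawn ones and the other $6-k$ are among the 43 numbers that were not drawn.

$$
P(k) = \frac{\binom{6}{k}\binom{43}{6-k}}{\binom{49}{6}}
$$

For example, for $k = 5$ there are $6 \times 43 = 258$ favourable outcomes, so $P(5) = 258 / 13{,}983{,}816 \approx 0.001845\%$.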

In [24]:
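def probability_less_6(n_winning_numbers):
    
    # Ways to pick which of the 6 ticket numbers are the matching ones.
    n_combinations_ticket = combinations(6, n_winning_numbers)
    # Ways to fill the rest of the draw from the 43 numbers not on the ticket.
    n_combinations_remaining = combinations(43, 6 - n_winning_numbers)
    successful_outcomes = n_combinations_ticket * n_combinations_remaining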
    
    n_combinations_total = combinations(49, 6)    
    probability = successful_outcomes / n_combinations_total
    
    probability_percentage = probability * 100    
    combinations_simplified = round(n_combinations_total/successful_outcomes)    
    print('''Your chances of having {} winning numbers with this ticket are {:.6f}%.
In other words, you have a 1 in {:,} chances to win.'''.format(n_winning_numbers, probability_percentage,
                                                               int(combinations_simplified)))

In [25]:
for test_input in [2, 3, 4, 5]:
    probability_less_6(test_input)
    print('--------------------------')

Your chances of having 2 winning numbers with this ticket are 13.237803%.
In other words, you have a 1 in 8 chances to win.
--------------------------
Your chances of having 3 winning numbers with this ticket are 1.765040%.
In other words, you have a 1 in 57 chances to win.
--------------------------
Your chances of having 4 winning numbers with this ticket are 0.096862%.
In other words, you have a 1 in 1,032 chances to win.
--------------------------
Your chances of having 5 winning numbers with this ticket are 0.001845%.
In other words, you have a 1 in 54,201 chances to win.
--------------------------
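The results above are for exactly that many matching numbers. The "at least" version of the question can be sketched by summing the exact cases up to 6 matches. The function `probability_at_least` and its parameters `pool` and `picks` are hypothetical names introduced for this illustration, and `math.comb` assumes Python 3.8 or newer.

```python
import math

def probability_at_least(k, pool=49, picks=6):
    """Probability that one ticket matches at least k of the drawn numbers."""
    total = math.comb(pool, picks)
    # Add up the outcomes with exactly j matches for j = k, ..., picks.
    favourable = sum(
        math.comb(picks, j) * math.comb(pool - picks, picks - j)
        for j in range(k, picks + 1)
    )
    return favourable / total

for k in [2, 3, 4, 5]:
    print("At least {} winning numbers: {:.6f}%".format(k, probability_at_least(k) * 100))
```

Because the higher match counts are so rare, each "at least" probability is only slightly larger than the corresponding "exactly" probability.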
