<h1 style="text-align:center;">The Probability of Winning 6/49 Lottery in Canada</h1>

This project builds functions to calculate the probabilities of winning the lottery and to check tickets against the national 6/49 lottery [dataset](https://www.kaggle.com/datasets/datascienceai/lottery-dataset) for Canada, which covers draws from 1982 to 2018.
In the 6/49 lottery, six numbers are drawn from the numbers 1 to 49. The drawing is done without replacement: once a number is drawn, it is not put back in the set.

The project has two main goals:
1. Calculate the probability of winning the big prize (all 6 numbers matching, for one or multiple tickets on a single drawing).
2. Calculate the probability of winning a smaller prize (2, 3, 4, or 5 numbers matching, for one or multiple tickets on a single drawing).

## Introduction and Historical Data

In [1]:
import pandas as pd

In [26]:
lottery_data=pd.read_csv("649.csv")

In [3]:
print(lottery_data.shape)

(3665, 11)


In [4]:
lottery_data['SEQUENCE NUMBER'].value_counts()

0    3591
1      45
2      19
3      10
Name: SEQUENCE NUMBER, dtype: int64

In [5]:
lottery_data.head(3)

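| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 6/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 6/19/1982 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 6/26/1982 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |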


In [6]:
lottery_data.tail(3)

| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3662 | 649 | 3589 | 0 | 6/13/2018 | 6 | 22 | 24 | 31 | 32 | 34 | 16 |
| 3663 | 649 | 3590 | 0 | 6/16/2018 | 2 | 15 | 21 | 31 | 38 | 49 | 8 |
| 3664 | 649 | 3591 | 0 | 6/20/2018 | 14 | 24 | 31 | 35 | 37 | 48 | 17 |


## Core functions
To find the probability of winning the lottery, we need one basic building block:
- A function to compute the number of combinations that can be drawn from the lottery number set

In [8]:
#the function to calculate the factorial
def factorial(n):
    f=1
    for i in range(n,1,-1):
        f *= i
    return f

In [9]:
# the function to calculate the number of combinations drawn from the lottery number set

def combination(n,k):# n is the size of the lottery number set, k is the number of numbers drawn out
    # integer division keeps the count exact; the quotient is always a whole number
    return factorial(n)//(factorial(k)*factorial(n-k))

In [10]:
win=1/combination(49,6)

In [11]:
win

7.151123842018516e-08

In [12]:
factorial(3)

6
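As a cross-check on the hand-written helpers, Python's standard library (3.8 and later) provides `math.comb`, which computes the same binomial coefficient n! / (k! (n-k)!). The following is a minimal sketch and not part of the original notebook:

```python
import math

# Number of ways to choose 6 numbers out of 49 when order does not matter
total_combinations = math.comb(49, 6)
print(total_combinations)        # 13983816

# Probability that a single ticket matches all six drawn numbers
print(1 / total_combinations)
```

The count agrees with the "1 in 13,983,816" figure used throughout the rest of the notebook.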

## One-ticket Probability for the Big Prize
For the first version of the app, we want players to be able to calculate the probability of winning the big prize with the various numbers they play on a single ticket (for each ticket a player chooses six numbers out of 49). So, we'll start by building a function that calculates the probability of winning the big prize for any given ticket.

In [21]:
def one_ticket_probability(array):# array is a list of 6 unique numbers
    total_combinations=combination(49,6)
    print("The chance of your ticket {} winning the big prize is 1 in {:,.0f}".format(array,total_combinations))

In [22]:
#testing the function
array=[13,22,24,27,42,44]
one_ticket_probability(array)

The chance of your ticket [13, 22, 24, 27, 42, 44] winning the big prize is 1 in 13,983,816
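The function assumes its input is a list of 6 unique numbers but does not check this. One way to guard against bad input is sketched below; `validate_ticket` is a hypothetical helper introduced here for illustration, not something the notebook defines:

```python
def validate_ticket(numbers):
    """Return True if numbers is a valid 6/49 ticket (hypothetical helper)."""
    numbers = list(numbers)
    return (
        len(numbers) == 6
        and len(set(numbers)) == 6  # no repeated numbers
        and all(isinstance(x, int) and 1 <= x <= 49 for x in numbers)
    )

print(validate_ticket([13, 22, 24, 27, 42, 44]))  # True
print(validate_ticket([13, 13, 24, 27, 42, 44]))  # False: 13 appears twice
print(validate_ticket([0, 22, 24, 27, 42, 50]))   # False: 0 and 50 are out of range
```

A check like this could run before the probability is printed, so that an invalid ticket is rejected instead of being reported with the same odds as a valid one.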


## Compare the Ticket against the Historical Data

Now that we have a function to calculate the probability of winning the big prize for any given ticket, we also want users to be able to compare their ticket against the historical lottery data in Canada and see whether it would ever have won.
To do this, we need to build two functions:
- A function to extract the six winning numbers of each draw from the historical data
- A function to compare the user's input (ticket) against the historical draws

In [28]:
#define a function that takes a row of the lottery dataframe and returns a set containing the six winning numbers
def extract_numbers(row):
    row=row.iloc[4:10]  # positions 4-9 hold NUMBER DRAWN 1 through NUMBER DRAWN 6
    row=set(row.values)
    return row

In [29]:
hist_draw=lottery_data.apply(extract_numbers,axis=1)

In [30]:
hist_draw

0        {3, 41, 11, 12, 43, 14}
1        {33, 36, 37, 39, 8, 41}
2         {1, 6, 39, 23, 24, 27}
3         {3, 9, 10, 43, 13, 20}
4        {34, 5, 14, 47, 21, 31}
5        {8, 41, 20, 21, 25, 31}
6       {33, 36, 42, 18, 25, 28}
7        {7, 40, 16, 17, 48, 31}
8        {5, 38, 37, 10, 23, 27}
9        {4, 37, 46, 15, 48, 30}
10        {33, 38, 7, 9, 42, 21}
11      {36, 11, 43, 17, 19, 20}
12       {37, 7, 14, 47, 17, 20}
13      {35, 44, 25, 28, 29, 30}
14       {36, 39, 8, 41, 47, 18}
15       {9, 12, 13, 14, 44, 48}
16       {4, 40, 43, 44, 14, 18}
17      {34, 35, 36, 13, 16, 18}
18      {36, 11, 23, 25, 28, 29}
19       {37, 7, 45, 18, 23, 25}
20      {37, 11, 45, 18, 19, 31}
21       {8, 14, 16, 48, 18, 31}
22       {4, 11, 45, 23, 24, 25}
23        {33, 34, 3, 4, 48, 19}
24       {5, 43, 17, 21, 28, 30}
25       {36, 6, 38, 46, 17, 24}
26        {4, 9, 10, 11, 43, 46}
27       {32, 33, 7, 13, 45, 23}
28      {35, 37, 11, 18, 22, 28}
29      {35, 45, 48, 25, 26, 31}
          

In [35]:
# a function to count how many times the input ticket matches a historical draw

def check_historical_occurence(user_inputs):
    user_inputs_set=set(user_inputs)
    winning_numbers=lottery_data.apply(extract_numbers,axis=1)
    occurence=(user_inputs_set==winning_numbers).sum()  # the number of matches with the historical draws
    
    print("Your ticket {} occurred {} time(s) in the historical winning drawings. Your chance to win the big prize in the next drawing is 1 in {:,.0f}".format(user_inputs,occurence,combination(49,6)))

In [37]:
check_historical_occurence([35,37,14,48,24,31])

Your ticket [35, 37, 14, 48, 24, 31] occurred 1 time(s) in the historical winning drawings. Your chance to win the big prize in the next drawing is 1 in 13,983,816
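Storing each draw as a set is what makes the comparison independent of the order in which the numbers are entered. The sketch below shows the same idea in plain Python, without pandas, using the three draws displayed by `lottery_data.head(3)`; `count_matches` is a hypothetical helper used only for illustration:

```python
# The first three draws shown by lottery_data.head(3), stored as sets
history = [
    {3, 11, 12, 14, 41, 43},
    {8, 33, 36, 37, 39, 41},
    {1, 6, 23, 24, 27, 39},
]

def count_matches(ticket, draws):
    """Count the draws whose six numbers equal the ticket, ignoring order."""
    ticket_set = set(ticket)
    return sum(1 for draw in draws if draw == ticket_set)

# Same numbers as the first draw, entered in reverse order
print(count_matches([43, 41, 14, 12, 11, 3], history))  # 1
print(count_matches([1, 2, 3, 4, 5, 6], history))       # 0
```

A past match does not change the odds for the next drawing, which is why the printed message always reports the same 1 in 13,983,816 chance.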


## Multi-ticket Probability for the Big Prize
Lottery players usually play more than one ticket on a single drawing, thinking that this might increase their chances of winning significantly. Our purpose is to help them better estimate their chances of winning on a single drawing.

We will write a function to calculate the probability of winning the big prize depending on the number of different tickets on a single drawing.


In [39]:
def multi_ticket_probability(n_tickets):
    n_combination=combination(49,6)
    prob= n_tickets/n_combination
    percentage=prob*100
    combination_sp=n_combination/n_tickets
    print("Your chance to win the big prize with {:,} different tickets on a single drawing is {:.6f}%. In other words, you have a 1 in {:,} chance to win".format(n_tickets,percentage,combination_sp))
    

In [40]:
n_tickets=100
multi_ticket_probability(n_tickets)

Your chance to win the big prize with 100 different tickets on a single drawing is 0.000715%. In other words, you have a 1 in 139,838.16 chance to win
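Because different tickets cover different six-number combinations, their chances of hitting the big prize simply add up, so the calculation can also be run in reverse: how many different tickets are needed to reach a given probability? The sketch below is an illustration only; `tickets_needed` is a hypothetical helper, and the target is passed as a string so that `Fraction` can represent it exactly:

```python
import math
from fractions import Fraction

TOTAL = math.comb(49, 6)  # 13,983,816 possible tickets

def tickets_needed(target):
    """Smallest number of different tickets whose big-prize probability is at least target.

    target is a string or Fraction (e.g. "0.5") to avoid float rounding.
    """
    target = Fraction(target)
    if not 0 < target <= 1:
        raise ValueError("target must be in (0, 1]")
    return math.ceil(target * TOTAL)

print(tickets_needed("0.5"))   # 6991908
print(tickets_needed("0.01"))  # 139839
```

Even a 1% chance on a single drawing requires close to 140,000 different tickets.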


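## Fewer Winning Numbers
In most 6/49 lotteries there are smaller prizes if a player's ticket matches two, three, four, or five of the six numbers drawn. As a consequence, users might be interested in knowing the probability of having exactly two, three, four, or five winning numbers.
We will build a function probability_less_numbers() to calculate the probability of winning a prize with two, three, four, or five numbers. The function takes the number of winning numbers and the number of tickets as input and prints the probability of winning with a corresponding message. For a single ticket the result is exact. For several tickets the function multiplies the single-ticket probability by the number of tickets, which is only an upper bound because several tickets can match in the same drawing, so the value is capped at 100%.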

In [56]:
def probability_less_numbers(n,n_tickets): # n is the number of winning numbers: 2,3,4,or 5; n_tickets is the number of tickets on a single drawing
    combination_winning_numbers=combination(6,n)   # ways to pick n of the 6 winning numbers
    combination_remaining=combination(43,6-n)      # ways to pick the other 6-n from the 43 non-winning numbers
    combination_less_numbers=combination_winning_numbers * combination_remaining
    
    combination_total=combination(49,6)
    
    # multiplying by n_tickets gives an upper bound; it is exact only for a single ticket
    prob=(combination_less_numbers/combination_total)*n_tickets
    
    percentage=prob *100
    
    if percentage > 100:
        percentage=100
    
    chance_n=round(combination_total/(combination_less_numbers*n_tickets))
    
    print("Your chance of winning with {} winning numbers on {:,} ticket(s) is at most {:.6f}%. In other words, you have at best a 1 in {:,} chance to win.".
          format(n,n_tickets,percentage, chance_n))

In [58]:
#testing the function
probability_less_numbers(2,8)

Your chance of winning with 2 winning numbers on 8 ticket(s) is at most 100.000000%. In other words, you have at best a 1 in 1 chance to win.
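Multiplying the single-ticket probability by the number of tickets overstates the chance once several tickets can match in the same drawing, which is why the function above can reach the 100% cap. A tighter figure is possible under an extra assumption that the notebook does not make: that each ticket is picked independently and uniformly at random. In that case the chance that at least one ticket matches exactly n numbers is 1 - (1 - p)^k for k tickets. The helper names below are hypothetical:

```python
import math

def prob_exactly_n(n):
    """Probability that one ticket matches exactly n of the 6 drawn numbers."""
    return math.comb(6, n) * math.comb(43, 6 - n) / math.comb(49, 6)

def prob_at_least_one(n, n_tickets):
    """Chance that at least one of n_tickets tickets matches exactly n numbers.

    Assumption: each ticket is picked independently and uniformly at random.
    """
    return 1 - (1 - prob_exactly_n(n)) ** n_tickets

p = prob_exactly_n(2)
print("one ticket:    {:.4%}".format(p))
print("eight tickets: {:.4%}".format(prob_at_least_one(2, 8)))
```

Under this assumption the result always stays below 100% and below the simple product of probability and ticket count. Tickets chosen deliberately to overlap or to avoid overlapping would need a different calculation.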
