# Probabilities of Winning the Lottery
In this project I will write several functions to calculate the probability of winning the [6/49 lottery](https://en.wikipedia.org/wiki/Lotto_6/49). The project is divided into four main parts:
* calculate the probability of winning the big prize with a single ticket
* check whether a certain combination has occurred in the Canada lottery data set
* calculate the probability for any number of tickets between 1 and 13,983,816
* calculate the probability of having two, three, four or five winning numbers

## Core Functions
Throughout the project, I will need to calculate probabilities and combinations repeatedly. Therefore I will create two functions:

* A function that calculates factorials
* A function that calculates combinations.

To calculate factorials, the formula is:

$$ n!=n\times(n-1)\times(n-2)\times...\times2\times1$$

In the 6/49 lottery, six numbers are drawn from a set of 49 numbers that range from 1 to 49. The drawing is done without replacement, which means once a number is drawn, it's not put back in the set.

To find the number of combinations when sampling without replacement and taking only k objects from a group of n objects, the following formula is used:
$$ _nC_k=\binom{n}{k}=\frac{n!}{k!(n-k)!}$$
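
As a quick sanity check of the two formulas, the sketch below compares hand-rolled versions against Python's standard library. `math.comb` (available from Python 3.8) is a standard-library function; the helper names `factorial_loop` and `combinations_formula` are placeholders chosen for this illustration only.

```python
import math

def factorial_loop(n):
    """Multiply 1 * 2 * ... * n; returns 1 for n = 0."""
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

def combinations_formula(n, k):
    """n! / (k! * (n-k)!), using integer division so the count stays exact."""
    return factorial_loop(n) // (factorial_loop(k) * factorial_loop(n - k))

print(combinations_formula(49, 6))  # 13983816
print(math.comb(49, 6))             # 13983816
```

Both routes give the same count of possible six-number tickets, which is the denominator used in the rest of the project.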

In [1]:
# factorial function
def factorial(n):
    end_result = 1
    for i in range(n,0,-1):
        end_result *= i
    return end_result

# combinations function
def combinations(n, k):
    numerator = factorial(n)
    denominator = factorial(k) * factorial(n-k)
    # the division is always exact, so integer division keeps the count an int
    return numerator // denominator

## One-ticket Probability
In the 6/49 lottery, six numbers are drawn from a set of 49 numbers that range from 1 to 49. A player wins the big prize if the six numbers on their ticket match all six numbers drawn. If a player has a ticket with the numbers {13, 22, 24, 27, 42, 44}, they only win the big prize if the numbers drawn are exactly {13, 22, 24, 27, 42, 44}. If even one number differs, they don't win.
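
Written out with the combinations formula, and assuming every six-number combination is equally likely to be drawn, the single-ticket probability works out to:

$$ P(\text{big prize})=\frac{1}{\binom{49}{6}}=\frac{6!\,43!}{49!}=\frac{1}{13{,}983{,}816}\approx 7.15\times10^{-8}$$

which is roughly 0.0000072%.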

In [2]:
# function that takes a list of six unique numbers and prints the probability of winning in a readable way
def one_ticket_probability(user_numbers):
    total_outcomes = combinations(49, 6)
    probability = 1 / total_outcomes
    result = probability * 100
    print('''Your chances to win the big prize with the numbers {} are {:.7f}%.
In other words, you have a 1 in {:,} chances to win.'''.format(user_numbers,
                    result, int(total_outcomes)))

In [3]:
one_ticket_probability([13, 22, 24, 27, 42, 44])

Your chances to win the big prize with the numbers [13, 22, 24, 27, 42, 44] are 0.0000072%.
In other words, you have a 1 in 13,983,816 chances to win.


## Historical Data Check for Canada Lottery

The next step is to check the lottery ticket against the historical lottery data in Canada to determine whether that number combination has ever won before. The data set can be downloaded from [Kaggle](https://www.kaggle.com/datascienceai/lottery-dataset) and contains historical data for 3,665 drawings (each row shows data for a single drawing), dating from 1982 to 2018. For each drawing, the six numbers drawn are in the following six columns:

* NUMBER DRAWN 1
* NUMBER DRAWN 2
* NUMBER DRAWN 3
* NUMBER DRAWN 4
* NUMBER DRAWN 5
* NUMBER DRAWN 6

In [4]:
import pandas as pd

lottery = pd.read_csv("649.csv")

In [5]:
print(lottery.shape)
lottery.head()

(3665, 11)


Unnamed: 0,PRODUCT,DRAW NUMBER,SEQUENCE NUMBER,DRAW DATE,NUMBER DRAWN 1,NUMBER DRAWN 2,NUMBER DRAWN 3,NUMBER DRAWN 4,NUMBER DRAWN 5,NUMBER DRAWN 6,BONUS NUMBER
0,649,1,0,6/12/1982,3,11,12,14,41,43,13
1,649,2,0,6/19/1982,8,33,36,37,39,41,9
2,649,3,0,6/26/1982,1,6,23,24,27,39,34
3,649,4,0,7/3/1982,3,9,10,13,20,43,34
4,649,5,0,7/10/1982,5,14,21,31,34,47,45


In [6]:
lottery.tail()

Unnamed: 0,PRODUCT,DRAW NUMBER,SEQUENCE NUMBER,DRAW DATE,NUMBER DRAWN 1,NUMBER DRAWN 2,NUMBER DRAWN 3,NUMBER DRAWN 4,NUMBER DRAWN 5,NUMBER DRAWN 6,BONUS NUMBER
3660,649,3587,0,6/6/2018,10,15,23,38,40,41,35
3661,649,3588,0,6/9/2018,19,25,31,36,46,47,26
3662,649,3589,0,6/13/2018,6,22,24,31,32,34,16
3663,649,3590,0,6/16/2018,2,15,21,31,38,49,8
3664,649,3591,0,6/20/2018,14,24,31,35,37,48,17


## Function for Historical Data Check
The goal is a function that determines whether a given combination of six numbers has ever won.

First, the single numbers in the data set need to be combined into a set of six numbers. The `extract_numbers()` function goes over each row of the dataframe and extracts the six winning numbers as a Python set.

In [7]:
# function that extracts the six winning numbers of a drawing as a set
def extract_numbers(row):
    row = row.iloc[4:10]  # positions 4 to 9 hold NUMBER DRAWN 1 to NUMBER DRAWN 6
    row = set(row.values)
    return row

lottery["winning_numbers"] = lottery.apply(extract_numbers, axis = 1)
lottery.head(3)

Unnamed: 0,PRODUCT,DRAW NUMBER,SEQUENCE NUMBER,DRAW DATE,NUMBER DRAWN 1,NUMBER DRAWN 2,NUMBER DRAWN 3,NUMBER DRAWN 4,NUMBER DRAWN 5,NUMBER DRAWN 6,BONUS NUMBER,winning_numbers
0,649,1,0,6/12/1982,3,11,12,14,41,43,13,"{3, 41, 11, 12, 43, 14}"
1,649,2,0,6/19/1982,8,33,36,37,39,41,9,"{33, 36, 37, 39, 8, 41}"
2,649,3,0,6/26/1982,1,6,23,24,27,39,34,"{1, 6, 39, 23, 24, 27}"
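
The positional slice above depends on the column order of the file. A hedged alternative is to pick the columns by name. The sketch below does this with the standard-library `csv` module on the first three drawings shown above, so it runs without the data file. The name `extract_by_name` is a placeholder for illustration, not part of the project code.

```python
import csv
import io

# First three drawings, copied from the table above (subset of columns).
sample_csv = """DRAW DATE,NUMBER DRAWN 1,NUMBER DRAWN 2,NUMBER DRAWN 3,NUMBER DRAWN 4,NUMBER DRAWN 5,NUMBER DRAWN 6,BONUS NUMBER
6/12/1982,3,11,12,14,41,43,13
6/19/1982,8,33,36,37,39,41,9
6/26/1982,1,6,23,24,27,39,34
"""

number_columns = [f"NUMBER DRAWN {i}" for i in range(1, 7)]

def extract_by_name(row):
    """Return the six drawn numbers of one row as a set, ignoring the bonus number."""
    return {int(row[column]) for column in number_columns}

winning_sets = [extract_by_name(row) for row in csv.DictReader(io.StringIO(sample_csv))]
print(winning_sets[0] == {3, 11, 12, 14, 41, 43})  # True
```

Selecting by name keeps the extraction correct even if a column is added or reordered in the file.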


The `check_historical_occurrence()` function below takes in the user numbers and the historical numbers and prints the number of past occurrences together with the probability of winning in the next drawing.

In [8]:
# function to check historical occurrences
def check_historical_occurrence(user_numbers, historical_numbers = lottery["winning_numbers"]):   
    # sets ignore order, so the ticket matches a drawing regardless of number order
    user_numbers_set = set(user_numbers)
    check_occurrence = historical_numbers == user_numbers_set
    n_occurrences = check_occurrence.sum()
    
    if n_occurrences == 0:
        print('''The combination {} has never occurred.
This doesn't mean it's more likely to occur now. Your chances to win the big prize in the next drawing using the combination {} are 0.0000072%.
In other words, you have a 1 in 13,983,816 chances to win.'''.format(user_numbers, user_numbers))
        
    else:
        print('''The number of times combination {} has occurred in the past is {}.
Your chances to win the big prize in the next drawing using the combination {} are 0.0000072%.
In other words, you have a 1 in 13,983,816 chances to win.'''.format(user_numbers, n_occurrences,
                                                                            user_numbers))

#### Function testing

In [9]:
check_historical_occurrence([33, 31, 37, 39, 5, 49])

The combination [33, 31, 37, 39, 5, 49] has never occurred.
This doesn't mean it's more likely to occur now. Your chances to win the big prize in the next drawing using the combination [33, 31, 37, 39, 5, 49] are 0.0000072%.
In other words, you have a 1 in 13,983,816 chances to win.


In [10]:
check_historical_occurrence([3, 11, 12, 14, 41, 43])

The number of times combination [3, 11, 12, 14, 41, 43] has occurred in the past is 1.
Your chances to win the big prize in the next drawing using the combination [3, 11, 12, 14, 41, 43] are 0.0000072%.
In other words, you have a 1 in 13,983,816 chances to win.
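
The check relies on comparing sets, which ignore the order of the numbers. A minimal sketch of the same idea without pandas is shown below. It uses only the first three historical drawings from the table earlier in this section, and the name `count_occurrences` is a hypothetical helper, not a function from the project.

```python
from collections import Counter

# First three historical drawings, taken from the table earlier in this section.
historical_draws = [
    {3, 11, 12, 14, 41, 43},
    {8, 33, 36, 37, 39, 41},
    {1, 6, 23, 24, 27, 39},
]

def count_occurrences(user_numbers, draws):
    """Count how often the combination appears, regardless of number order."""
    counts = Counter(frozenset(draw) for draw in draws)
    return counts[frozenset(user_numbers)]

print(count_occurrences([43, 41, 14, 12, 11, 3], historical_draws))  # 1
print(count_occurrences([33, 31, 37, 39, 5, 49], historical_draws))  # 0
```

`frozenset` is used instead of `set` because it is hashable and can therefore serve as a `Counter` key.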


## Multi-ticket Probability
Many people play more than one ticket on a single drawing, thinking that this might increase their chances of winning significantly. The next function helps to estimate the chances of winning for any number of different tickets.

The `multi_ticket_probability()` function below takes in the number of tickets and prints probability information depending on the input.

In [11]:
def multi_ticket_probability(number_tickets):
    total_outcomes = combinations(49, 6)
    probability = number_tickets / total_outcomes
    probability = probability * 100
    if number_tickets == 1:
        print('''Your chances to win the big prize with one ticket are {:.6f}%.
In other words, you have a 1 in {:,} chances to win.'''.format(probability, int(total_outcomes)))
    
    else:
        combinations_simplified = round(total_outcomes / number_tickets)   
        print('''Your chances to win the big prize with {:,} different tickets are {:.6f}%.
In other words, you have a 1 in {:,} chances to win.'''.format(number_tickets, probability,
                                                               combinations_simplified))

#### Some test runs for the function

In [12]:
for i in [1,10,100,10000, 1000000, 6991908, 13983816]:
    multi_ticket_probability(i)
    print("")

Your chances to win the big prize with one ticket are 0.000007%.
In other words, you have a 1 in 13,983,816 chances to win.

Your chances to win the big prize with 10 different tickets are 0.000072%.
In other words, you have a 1 in 1,398,382 chances to win.

Your chances to win the big prize with 100 different tickets are 0.000715%.
In other words, you have a 1 in 139,838 chances to win.

Your chances to win the big prize with 10,000 different tickets are 0.071511%.
In other words, you have a 1 in 1,398 chances to win.

Your chances to win the big prize with 1,000,000 different tickets are 7.151124%.
In other words, you have a 1 in 14 chances to win.

Your chances to win the big prize with 6,991,908 different tickets are 50.000000%.
In other words, you have a 1 in 2 chances to win.

Your chances to win the big prize with 13,983,816 different tickets are 100.000000%.
In other words, you have a 1 in 1 chances to win.
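
The printed "1 in N" figures are rounded, so it can help to look at the exact ratio as well. The sketch below assumes all tickets are different and uses the standard-library `fractions.Fraction` and `math.comb`; `multi_ticket_fraction` and `TOTAL_OUTCOMES` are names introduced here for illustration only.

```python
from fractions import Fraction
from math import comb

TOTAL_OUTCOMES = comb(49, 6)  # 13,983,816 possible tickets

def multi_ticket_fraction(number_tickets):
    """Exact winning probability for distinct tickets in a single drawing."""
    if not 1 <= number_tickets <= TOTAL_OUTCOMES:
        raise ValueError("number_tickets must be between 1 and 13,983,816")
    return Fraction(number_tickets, TOTAL_OUTCOMES)

print(multi_ticket_fraction(6991908))         # 1/2
print(float(multi_ticket_fraction(1000000)))  # about 0.0715
```

With 1,000,000 tickets the exact ratio is about 1 in 13.98, which the function above reports as 1 in 14 after rounding.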



## Fewer Winning Numbers: Function
In most 6/49 lotteries there are smaller prizes if a player's ticket matches two, three, four, or five of the six numbers drawn. It is therefore interesting to know the probability of having two, three, four, or five winning numbers.
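
Reading "having k winning numbers" as matching exactly k of the six drawn numbers, one standard way to count the favourable draws is to choose k numbers from the six on the ticket and the remaining 6 - k from the 43 numbers not on the ticket. The sketch below illustrates this counting argument with `math.comb`; the helper name `draws_with_exact_matches` is an assumption made for this example.

```python
from math import comb

TOTAL_OUTCOMES = comb(49, 6)

def draws_with_exact_matches(k):
    """Number of possible draws sharing exactly k numbers with a fixed ticket."""
    return comb(6, k) * comb(43, 6 - k)

for k in range(2, 6):
    ways = draws_with_exact_matches(k)
    print(k, ways, f"{100 * ways / TOTAL_OUTCOMES:.6f}%")

# The counts for k = 0..6 cover every possible draw exactly once.
print(sum(draws_with_exact_matches(k) for k in range(7)) == TOTAL_OUTCOMES)  # True
```

The final check is a useful guard: if the counts for all values of k did not add up to the total number of draws, some draws would be counted twice or not at all.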

In [13]:
# function takes an integer between 2 and 5
def probability_less_6(n_winning_numbers):
    # ways to choose which of the six ticket numbers are among the numbers drawn
    n_combinations_ticket = combinations(6, n_winning_numbers)
    # the other drawn numbers must come from the 43 numbers that are not on the ticket
    n_combinations_remaining = combinations(43, 6 - n_winning_numbers)
    successful_outcomes = n_combinations_ticket * n_combinations_remaining
    n_combinations_total = combinations(49, 6)
    
    probability = successful_outcomes / n_combinations_total
    probability_percentage = probability * 100
    
    combinations_simplified = round(n_combinations_total/successful_outcomes)
    
    print('''Your chances of having {} winning numbers with this ticket are {:.6f}%.
In other words, you have a 1 in {:,} chances to win.'''.format(n_winning_numbers, probability_percentage,
                                                               int(combinations_simplified)))

#### Testing the function

In [14]:
for i in [2, 3, 4, 5]:
    probability_less_6(i)
    print("")

Your chances of having 2 winning numbers with this ticket are 13.237803%.
In other words, you have a 1 in 8 chances to win.

Your chances of having 3 winning numbers with this ticket are 1.765040%.
In other words, you have a 1 in 57 chances to win.

Your chances of having 4 winning numbers with this ticket are 0.096862%.
In other words, you have a 1 in 1,032 chances to win.

Your chances of having 5 winning numbers with this ticket are 0.001845%.
In other words, you have a 1 in 54,201 chances to win.

