# Mobile app for lottery addiction - Probability in 6/49 lottery

In this project, we create the logical core of a dedicated mobile app that helps lottery addicts better estimate their chances of winning by calculating the relevant probabilities. We start with the 6/49 lottery and answer the following questions:

1. probability of winning the big prize with a single ticket
2. probability of winning the big prize if we play 40 different tickets (or any other number)
3. probability of having at least five (or four, or three, or two) winning numbers on a single ticket

Moreover, we use historical data from the national 6/49 lottery game in Canada to check whether a given combination has ever been drawn. [The data set](https://www.kaggle.com/datascienceai/lottery-dataset) covers 3,665 drawings, dating from 1982 to 2018.

In [1]:
import pandas as pd
import numpy as np
import re

In [2]:
# Read data
hist = pd.read_csv('649.csv')
hist.shape

(3665, 11)

In [3]:
hist.head(5)

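| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 6/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 6/19/1982 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 6/26/1982 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |
| 3 | 649 | 4 | 0 | 7/3/1982 | 3 | 9 | 10 | 13 | 20 | 43 | 34 |
| 4 | 649 | 5 | 0 | 7/10/1982 | 5 | 14 | 21 | 31 | 34 | 47 | 45 |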


Note that in the 6/49 lottery, six numbers are drawn without replacement from a set of 49 numbers that range from 1 to 49. A player wins the big prize only if the six numbers on their ticket match all six numbers drawn. Since the order does not matter, we need to count combinations rather than ordered arrangements (permutations).
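
As a worked equation, the number of distinct tickets follows from the standard combinations formula. The notation $C(n, k)$ is chosen here for illustration only:

$$
C(n, k) = \frac{n!}{k!\,(n-k)!}, \qquad
C(49, 6) = \frac{49!}{6!\,43!} = \frac{49 \cdot 48 \cdot 47 \cdot 46 \cdot 45 \cdot 44}{6!} = \frac{10{,}068{,}347{,}520}{720} = 13{,}983{,}816
$$

The numerator counts the ordered ways to pick six numbers, and dividing by $6! = 720$ removes the different orderings of the same six numbers.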

In [4]:
# function for calculating factorial: n! = 1 * 2 * ... * n
def factorial(n):
    value = 1
    for i in range(1, n+1):
        value *= i
    return value

# function for calculating combinations: n! / (k! * (n-k)!)
def combinations(n,k):
    # integer division keeps the count exact instead of converting it to a float
    return factorial(n)//(factorial(n-k)*factorial(k))
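
As a quick sanity check, the hand-written helpers can be compared with the standard library. The sketch below is self-contained and assumes Python 3.8 or later, where `math.comb` is available. The name `n_choose_k` is a hypothetical stand-in for the notebook's `combinations` function.

```python
import math

def n_choose_k(n, k):
    """Number of ways to choose k items from n when order does not matter."""
    # Hypothetical helper mirroring the notebook's combinations(n, k)
    return math.factorial(n) // (math.factorial(n - k) * math.factorial(k))

total_tickets = n_choose_k(49, 6)

# math.comb computes the same quantity directly
assert total_tickets == math.comb(49, 6)
print(f"{total_tickets:,}")  # 13,983,816
```

In an app, calling `math.comb` directly would probably be the simpler choice. The explicit version is kept here because it shows the formula.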

## Q1. Probability of winning the big prize with a single ticket

Winning the big prize with a single ticket means that exactly one of all possible combinations is the winning one, so the probability is 1 divided by the total number of combinations. Here is the information required from the users:

- six different numbers from 1 to 49.
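
Because the app relies on the user entering six different numbers from 1 to 49, it may be worth validating the input before computing anything. The following is a minimal sketch. The helper name `is_valid_ticket` is an assumption and is not part of the original notebook.

```python
def is_valid_ticket(user_numbers):
    """Return True if user_numbers holds six distinct integers from 1 to 49."""
    numbers = list(user_numbers)
    return (
        len(numbers) == 6
        and len(set(numbers)) == 6  # no repeated numbers
        and all(isinstance(n, int) and 1 <= n <= 49 for n in numbers)
    )

print(is_valid_ticket([1, 2, 3, 4, 5, 6]))   # True
print(is_valid_ticket([1, 2, 3, 4, 5, 5]))   # False: repeated number
print(is_valid_ticket([0, 2, 3, 4, 5, 50]))  # False: numbers out of range
```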

In [5]:
def one_ticket_probability(user_numbers):
    # only one of all possible combinations wins the big prize
    n_combinations = combinations(49,6)
    print(
        '''
The probability of winning the big prize with the numbers {} is {:.7%}.
In other words, the winning chance is 1 in {:,}.
        '''.format(
            user_numbers, 1/n_combinations, int(n_combinations)
        )
    )

In [6]:
one_ticket_probability([1,2,3,4,5,6])


The probability of winning the big prize with the numbers [1, 2, 3, 4, 5, 6] is 0.0000072%.
In other words, the winning chance is 1 in 13,983,816.
        


In the historical data, the winning numbers are stored separately in the columns `NUMBER DRAWN 1` to `NUMBER DRAWN 6`. We have to select these numbers and compare them with the numbers picked by the user to show how many times the user's combination has been drawn in the past.

In [7]:
# select the six winning numbers by position (columns 4 to 9) and transform into a set
def extract_numbers(row):
    winning_num = row.iloc[4:10]
    return set(winning_num)

In [8]:
winning_num_hist = hist.apply(extract_numbers, axis=1)
winning_num_hist.head()

0    {3, 41, 11, 12, 43, 14}
1    {33, 36, 37, 39, 8, 41}
2     {1, 6, 39, 23, 24, 27}
3     {3, 9, 10, 43, 13, 20}
4    {34, 5, 14, 47, 21, 31}
dtype: object

In [27]:
def check_historical_occurence(user_nums, winning_nums):
    user_nums_set = set(user_nums)
    # count the past drawings whose six numbers match the user's six numbers
    n_win = winning_nums.apply(lambda drawn: drawn == user_nums_set).sum()

    if n_win == 0:
        print('''
The numbers {} have never occurred in the history.
        '''.format(user_nums))
    else:
        print('''
The numbers {} have occurred {} time(s) in the history.
        '''.format(user_nums, int(n_win)))

    # past drawings do not affect the next one, so the odds stay 1 in C(49, 6)
    n_combinations = combinations(49,6)
    print('''
The chance of winning the big prize in the next drawing with the above numbers is {:.7%}.
In other words, that is 1 in {:,}.
    '''.format(1/n_combinations, int(n_combinations)))

In [28]:
user_nums = [3,11,12,14,41,43]
check_historical_occurence(user_nums, winning_num_hist)


The numbers [3, 11, 12, 14, 41, 43] have occurred 1 time(s) in the history.
        

The chance of winning the big prize in the next drawing with the above numbers is 0.0000072%.
In other words, that is 1 in 13,983,816.
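
The lookup can also be sketched without pandas, which may be convenient inside a mobile app. The example below is only an illustration: `sample_draws` holds the first five drawings shown by `hist.head(5)`, and the names `draw_counts` and `count_occurrences` are assumptions.

```python
from collections import Counter

# First five drawings, copied from the output of hist.head(5)
sample_draws = [
    [3, 11, 12, 14, 41, 43],
    [8, 33, 36, 37, 39, 41],
    [1, 6, 23, 24, 27, 39],
    [3, 9, 10, 13, 20, 43],
    [5, 14, 21, 31, 34, 47],
]

# frozenset is hashable and ignores order, so it can be used as a Counter key
draw_counts = Counter(frozenset(draw) for draw in sample_draws)

def count_occurrences(user_nums, counts):
    """Return how many recorded drawings match user_nums exactly."""
    return counts[frozenset(user_nums)]  # Counter returns 0 for unseen keys

print(count_occurrences([43, 41, 14, 12, 11, 3], draw_counts))  # 1
print(count_occurrences([1, 2, 3, 4, 5, 6], draw_counts))       # 0
```

Building the counter once makes each later lookup a single dictionary access instead of a scan over all drawings.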
    


## Q2. Probability of winning the big prize if we play 40 different tickets (or any other number)

Lottery addicts usually play more than one ticket on a single drawing, thinking that this might increase their chances of winning significantly. To help them better estimate their chances, we write a function that calculates the probability of winning for any number of different tickets. Since each different ticket covers one more of the possible combinations, the probability is the number of tickets divided by the total number of combinations. Here is the information required from the users:

- number of tickets they want to buy

In [29]:
def multi_ticket_probability(n_ticket):
    n_combinations = combinations(49,6)
    # n_ticket different tickets cover n_ticket of all possible combinations
    odds = int(n_combinations)/int(n_ticket)
    print('''
The probability of winning the big prize with {:,} ticket(s) is {:.7%}.
In other words, the winning chance is 1 in {:,}.
        '''.format(
            int(n_ticket), n_ticket/n_combinations, int(odds))
    )

In [30]:
n_tickets = [1,10,100,10000,1000000, 6691908, 13983816]
for i in n_tickets:
    multi_ticket_probability(i)
    print('------')


The probability of winning the big prize with 1 ticket(s) is 0.0000072%.
In other words, the winning chance is 1 in 13,983,816.
        
------

The probability of winning the big prize with 10 ticket(s) is 0.0000715%.
In other words, the winning chance is 1 in 1,398,381.
        
------

The probability of winning the big prize with 100 ticket(s) is 0.0007151%.
In other words, the winning chance is 1 in 139,838.
        
------

The probability of winning the big prize with 10,000 ticket(s) is 0.0715112%.
In other words, the winning chance is 1 in 1,398.
        
------

The probability of winning the big prize with 1,000,000 ticket(s) is 7.1511238%.
In other words, the winning chance is 1 in 13.
        
------

The probability of winning the big prize with 6,691,908 ticket(s) is 47.8546628%.
In other words, the winning chance is 1 in 2.
        
------

The probability of winning the big prize with 13,983,816 ticket(s) is 100.0000000%.
In other words, the winning chance is 1 in 1.
        
------
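
For the 40-ticket case named in the question, it can help to return the probability as a value rather than print it. This is a hedged sketch using exact fractions; the function name `multi_ticket_chance` and its parameters are assumptions, and `math.comb` requires Python 3.8 or later.

```python
from fractions import Fraction
from math import comb  # available from Python 3.8

def multi_ticket_chance(n_tickets, n_numbers=49, n_drawn=6):
    """Exact probability that one of n_tickets different tickets wins the big prize."""
    total = comb(n_numbers, n_drawn)
    if not 1 <= n_tickets <= total:
        raise ValueError("n_tickets must be between 1 and the number of combinations")
    return Fraction(n_tickets, total)

chance = multi_ticket_chance(40)
print(chance)                  # 5/1747977
print(f"{float(chance):.7%}")  # 0.0002860%
```

Returning a `Fraction` avoids rounding until the value is formatted for display, and the range check rejects ticket counts that cannot all be different.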


## Q3. Probability of having at least five (or four, or three, or two) winning numbers on a single ticket

For extra context, in most 6/49 lotteries there are smaller prizes if a player's ticket matches two, three, four, or five of the six numbers drawn. As a consequence, users might be interested in the probability of having two, three, four, or five winning numbers, so we let them choose a number of winning numbers from 2 to 5. Note that the function below computes the probability of matching exactly that many numbers. Here is the information required from the users:

- six different numbers from 1 to 49; and
- an integer between 2 and 5 that represents the number of winning numbers expected
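
As a worked equation, the probability of matching exactly $k$ of the six drawn numbers can be written as follows, where $C(n, k)$ denotes the number of combinations (notation chosen here for illustration). A ticket with exactly $k$ matches takes $k$ of the 6 drawn numbers and the remaining $6-k$ from the 43 numbers that were not drawn:

$$
P(\text{exactly } k \text{ matches}) = \frac{C(6, k)\, C(43, 6-k)}{C(49, 6)}
$$

For $k = 2$, this gives

$$
\frac{C(6, 2)\, C(43, 4)}{C(49, 6)} = \frac{15 \cdot 123{,}410}{13{,}983{,}816} = \frac{1{,}851{,}150}{13{,}983{,}816} \approx 0.1324
$$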

In [37]:
def probability_less_6(n_winning_num):
    # ways to match exactly n_winning_num of the 6 drawn numbers and
    # fill the rest of the ticket from the 43 numbers that were not drawn
    n_comb = combinations(6,n_winning_num)*combinations(49-6, 6-n_winning_num)
    prob = n_comb/combinations(49,6)
    print('''
The probability of having exactly {} winning numbers on a single ticket is {:.5%}.
In other words, the chance is 1 in {:,}.
    '''.format(int(n_winning_num), prob, int(1/prob)))

In [38]:
for i in [2,3,4,5]:
    probability_less_6(i)
    print('------')


The probability of having exactly 2 winning numbers on a single ticket is 13.23780%.
In other words, the chance is 1 in 7.
    
------

The probability of having exactly 3 winning numbers on a single ticket is 1.76504%.
In other words, the chance is 1 in 56.
    
------

The probability of having exactly 4 winning numbers on a single ticket is 0.09686%.
In other words, the chance is 1 in 1,032.
    
------

The probability of having exactly 5 winning numbers on a single ticket is 0.00184%.
In other words, the chance is 1 in 54,200.
    
------
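
The question is phrased as "at least" a given number of winning numbers, while the figures above are for an exact number of matches. One possible way to get the "at least" probability is to sum the exact-match probabilities from that number up to 6. The sketch below assumes Python 3.8 or later, and the names `exactly_k` and `at_least_k` are hypothetical.

```python
from fractions import Fraction
from math import comb  # available from Python 3.8

def exactly_k(k, n_numbers=49, n_drawn=6):
    """Probability that a ticket matches exactly k of the drawn numbers."""
    ways = comb(n_drawn, k) * comb(n_numbers - n_drawn, n_drawn - k)
    return Fraction(ways, comb(n_numbers, n_drawn))

def at_least_k(k, n_numbers=49, n_drawn=6):
    """Probability that a ticket matches k or more of the drawn numbers."""
    return sum(exactly_k(j, n_numbers, n_drawn) for j in range(k, n_drawn + 1))

for k in [2, 3, 4, 5]:
    print(f"at least {k}: {float(at_least_k(k)):.5%}")
```

For five or more matches this adds the single jackpot combination to the 258 five-match combinations, giving 259 out of 13,983,816.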
