This project is about a hypothetical 6/49 lottery app that computes the theoretical probabilities of different lottery outcomes, to help users make rational choices when playing. Historical data from the national 6/49 lottery in Canada is also examined to see how closely the empirical results match the theoretical ones.

In [1]:
from math import factorial
def combinations(n, k):
    # Integer division keeps the count exact instead of returning a float
    return factorial(n) // (factorial(k) * factorial(n - k))

In the 6/49 lottery, six numbers are drawn from a set of 49 numbers that range from 1 to 49. A player wins the big prize if the six numbers on their ticket match all six numbers drawn. If a player has a ticket with the numbers {8, 40, 15, 27, 19, 29}, they win the big prize only if the numbers drawn are {8, 40, 15, 27, 19, 29}. If even one number differs, they don't win.
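
To make the rule above concrete, here is a minimal sketch that counts the possible draws. It assumes Python 3.8+ so that the standard library's `math.comb` is available, and the variable names are illustrative only. Since the order of the drawn numbers does not matter, the count is "49 choose 6", which is 13,983,816.

```python
from math import comb

# Number of ways to choose 6 distinct numbers out of 49 when order does not matter
total_outcomes = comb(49, 6)

# Exactly one of those outcomes matches a given ticket
p_big_prize = 1 / total_outcomes

print('Possible draws: {:,}'.format(total_outcomes))
print('Chance for a single ticket: 1 in {:,}'.format(total_outcomes))
```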

Let's write a function that takes the numbers on a single ticket as input and prints the probability of that ticket winning the big prize.

In [2]:
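def one_ticket_probability(user_numbers):
    # Every ticket is equally likely to win, so the result does not
    # depend on which six numbers are on it
    total_number = combinations(49, 6)
    success = 1/total_number
    print('The probability of winning for this particular ticket is {:.5%}'.format(success))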

In [3]:
one_ticket_probability([8, 40, 15, 27, 19, 29])

The probability of winning for this particular ticket is 0.00001%
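
The function above accepts any input without checking it. As a hedged sketch only, the following variant validates the ticket first and returns the probability instead of printing it. The names `validate_ticket` and `ticket_probability` are hypothetical and not part of the original notebook, and `math.comb` (Python 3.8+) is used in place of the `combinations` helper.

```python
from math import comb

def validate_ticket(user_numbers, pool_size=49, picks=6):
    """Hypothetical helper: True if the ticket has `picks` distinct integers in 1..pool_size."""
    numbers = list(user_numbers)
    if len(numbers) != picks or len(set(numbers)) != picks:
        return False
    return all(isinstance(n, int) and 1 <= n <= pool_size for n in numbers)

def ticket_probability(user_numbers):
    """Hypothetical variant that returns the big-prize probability for a valid ticket."""
    if not validate_ticket(user_numbers):
        raise ValueError('A ticket needs six distinct integers between 1 and 49.')
    # All valid tickets share the same probability: one outcome out of C(49, 6)
    return 1 / comb(49, 6)

print('{:.5%}'.format(ticket_probability([8, 40, 15, 27, 19, 29])))
```

Returning the value rather than printing it makes the function easier to reuse and to test.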


Let's now take a look at the historical data from the Canadian 6/49 lottery.

In [6]:
import pandas as pd
lottery_canada = pd.read_csv('649.csv')
lottery_canada.head()

Unnamed: 0,PRODUCT,DRAW NUMBER,SEQUENCE NUMBER,DRAW DATE,NUMBER DRAWN 1,NUMBER DRAWN 2,NUMBER DRAWN 3,NUMBER DRAWN 4,NUMBER DRAWN 5,NUMBER DRAWN 6,BONUS NUMBER
0,649,1,0,6/12/1982,3,11,12,14,41,43,13
1,649,2,0,6/19/1982,8,33,36,37,39,41,9
2,649,3,0,6/26/1982,1,6,23,24,27,39,34
3,649,4,0,7/3/1982,3,9,10,13,20,43,34
4,649,5,0,7/10/1982,5,14,21,31,34,47,45


In [7]:
def extract_numbers(row):
    # Positions 4 to 9 hold NUMBER DRAWN 1 through NUMBER DRAWN 6
    row = row.iloc[4:10]
    row = set(row.values)
    return row

winning_numbers = lottery_canada.apply(extract_numbers, axis=1)
winning_numbers.head()



0    {3, 41, 11, 12, 43, 14}
1    {33, 36, 37, 39, 8, 41}
2     {1, 6, 39, 23, 24, 27}
3     {3, 9, 10, 43, 13, 20}
4    {34, 5, 14, 47, 21, 31}
dtype: object
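
Selecting by position works here, but it breaks silently if the column order changes. A minimal sketch of selecting the six number columns by name instead is shown below. It is self-contained, so it rebuilds the first two draws from the table above in a small DataFrame called `sample` (an illustrative name), and the real `649.csv` file is not needed to run it.

```python
import pandas as pd

# The first two draws from the table above, rebuilt by hand for illustration
sample = pd.DataFrame({
    'DRAW DATE': ['6/12/1982', '6/19/1982'],
    'NUMBER DRAWN 1': [3, 8],
    'NUMBER DRAWN 2': [11, 33],
    'NUMBER DRAWN 3': [12, 36],
    'NUMBER DRAWN 4': [14, 37],
    'NUMBER DRAWN 5': [41, 39],
    'NUMBER DRAWN 6': [43, 41],
    'BONUS NUMBER': [13, 9],
})

# Build the column names explicitly so the bonus number is never included
number_columns = ['NUMBER DRAWN {}'.format(i) for i in range(1, 7)]

# One set of plain Python ints per draw, keeping the original row index
sample_winning_numbers = pd.Series(
    [set(int(n) for n in values) for values in sample[number_columns].to_numpy()],
    index=sample.index,
)
print(sample_winning_numbers)
```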

In [15]:
def check_historical_occurrence(user_numbers, historical_numbers):
    '''
    user_numbers: a Python list
    historical_numbers: a pandas Series of sets
    '''

    # Sets ignore order, so the ticket matches a draw whenever the six numbers agree
    user_numbers_set = set(user_numbers)
    check_occurrence = historical_numbers == user_numbers_set
    n_occurrences = check_occurrence.sum()

    if n_occurrences == 0:
        print('The combination {} has never occurred.'.format(user_numbers))

    else:
        print('The number of times combination {} has occurred in the past is {}.'.format(
            user_numbers, n_occurrences))

In [16]:
test_input_3 = [25, 36, 2, 39, 8, 41]
check_historical_occurrence(test_input_3, winning_numbers)

The combination [25, 36, 2, 39, 8, 41] has never occurred.
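
As a hedged illustration of the same check, the sketch below returns the count instead of printing a message. It runs on three draws copied from the head of the dataset rather than the full history. The name `count_historical_occurrences` is hypothetical, and the comparison is done with a plain Python loop rather than the Series comparison used above.

```python
import pandas as pd

# Three historical draws taken from the first rows shown earlier
historical = pd.Series([
    {3, 11, 12, 14, 41, 43},
    {8, 33, 36, 37, 39, 41},
    {1, 6, 23, 24, 27, 39},
])

def count_historical_occurrences(user_numbers, historical_numbers):
    """Hypothetical variant: return how many past draws match the ticket exactly."""
    target = set(user_numbers)
    # Set equality ignores the order in which the numbers were entered
    return sum(1 for drawn in historical_numbers if drawn == target)

print(count_historical_occurrences([33, 36, 37, 39, 8, 41], historical))
print(count_historical_occurrences([25, 36, 2, 39, 8, 41], historical))
```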


In [17]:
def multi_ticket_probability(n_tickets):

    n_combinations = combinations(49, 6)

    # Each different ticket covers one more of the equally likely outcomes
    probability = n_tickets / n_combinations
    percentage_form = probability * 100

    if n_tickets == 1:
        print('The probability of winning the big prize with one ticket is {:.6f}%.'.format(
            percentage_form))

    else:
        print('The probability of winning the big prize with {:,} different tickets is {:.6f}%.'.format(
            n_tickets, percentage_form))

In [18]:
multi_ticket_probability(3)

The probability of winning the big prize with 3 different tickets is 0.000021%.
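
Percentages this small are hard to read, so a "1 in N" form may be more useful to a user. The following is a minimal sketch, assuming Python 3.8+ for `math.comb`. The helper name `multi_ticket_odds` is hypothetical, and `fractions.Fraction` is used to keep the arithmetic exact.

```python
from fractions import Fraction
from math import comb

def multi_ticket_odds(n_tickets):
    """Hypothetical helper: exact big-prize probability with n different tickets."""
    n_combinations = comb(49, 6)
    if not 1 <= n_tickets <= n_combinations:
        raise ValueError('n_tickets must be between 1 and {:,}.'.format(n_combinations))
    # Different tickets cover disjoint outcomes, so their chances simply add up
    return Fraction(n_tickets, n_combinations)

p = multi_ticket_odds(3)
print('{:.6f}%'.format(float(p) * 100))
print('About 1 in {:,}'.format(round(1 / p)))
```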
