## Dataset and Problem Introduction

In this analysis, we build the probability calculations behind an app that aims to prevent and treat lottery addiction by helping people better estimate their chances of winning.

Data Source: https://www.kaggle.com/datascienceai/lottery-dataset
<br>Reference: https://dataquest.io/

## Data

In [1]:
import pandas as pd

lottery_canada = pd.read_csv('datasets/649.csv')
lottery_canada.head(3)

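|   | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 6/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 6/19/1982 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 6/26/1982 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |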
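Before relying on column positions, it can help to select the six number columns by name and confirm that each draw looks like a valid 6/49 result. The sketch below is a minimal illustration: it rebuilds the three preview rows inline (an assumption made so the snippet runs without `datasets/649.csv`), and the names `sample`, `number_columns`, `drawn`, and `is_valid` are hypothetical.

```python
import io

import pandas as pd

# The three preview rows, rebuilt inline so the sketch runs without the CSV file.
sample_csv = io.StringIO(
    "PRODUCT,DRAW NUMBER,SEQUENCE NUMBER,DRAW DATE,"
    "NUMBER DRAWN 1,NUMBER DRAWN 2,NUMBER DRAWN 3,"
    "NUMBER DRAWN 4,NUMBER DRAWN 5,NUMBER DRAWN 6,BONUS NUMBER\n"
    "649,1,0,6/12/1982,3,11,12,14,41,43,13\n"
    "649,2,0,6/19/1982,8,33,36,37,39,41,9\n"
    "649,3,0,6/26/1982,1,6,23,24,27,39,34\n"
)
sample = pd.read_csv(sample_csv)

# Select the six drawn numbers by column name rather than by position.
number_columns = [f"NUMBER DRAWN {i}" for i in range(1, 7)]
drawn = sample[number_columns]

# A valid draw has six distinct numbers, all between 1 and 49.
is_valid = (
    drawn.nunique(axis=1).eq(6)
    & drawn.ge(1).all(axis=1)
    & drawn.le(49).all(axis=1)
)
print(is_valid.all())
```

The same checks could be run on the full `lottery_canada` DataFrame, assuming its column names match the preview.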


## Core Functions
- `factorial()`: calculates the factorial of a non-negative integer
- `combinations()`: calculates the number of ways to choose `k` items from `n` when order does not matter

In [2]:
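def factorial(n):
    # Multiply n * (n-1) * ... * 1; the loop is skipped for n = 0, so 0! = 1
    final_product = 1
    for i in range(n, 0, -1):
        final_product *= i
    return final_product

def combinations(n, k):
    # Number of ways to choose k items from n when order does not matter
    numerator = factorial(n)
    denominator = factorial(k) * factorial(n-k)
    # The division is always exact, so integer division keeps the count an int
    return numerator // denominator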
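As a cross-check on the hand-written helpers, the standard library provides `math.factorial` and, from Python 3.8 onward, `math.comb`. The sketch below uses a hypothetical name, `combinations_exact`, and sticks to integer arithmetic so the count stays exact for any input size.

```python
import math

def combinations_exact(n, k):
    """Hypothetical integer-only variant: n! / (k! * (n - k)!)."""
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

total_outcomes = combinations_exact(49, 6)

# math.comb computes the same binomial coefficient directly.
assert total_outcomes == math.comb(49, 6)
print(f"{total_outcomes:,}")  # 13,983,816
```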

## One-ticket Probability Function
A function that calculates the probability of winning the big prize for any given ticket. For each drawing, six numbers are drawn from a set of 49, and a player wins the big prize if the six numbers on their ticket match all six numbers drawn. Because every combination is equally likely, the probability is the same for every ticket.
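
The count behind this function can be written out explicitly. Since the order of the six numbers does not matter, the number of possible tickets is a binomial coefficient, and a single ticket covers exactly one of those outcomes:

$$
\binom{49}{6} = \frac{49!}{6!\,43!} = 13{,}983{,}816,
\qquad
P(\text{big prize}) = \frac{1}{13{,}983{,}816} \approx 7.15 \times 10^{-8} \approx 0.0000072\%.
$$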

In [3]:
def one_ticket_probability(user_numbers):
    
    n_combinations = combinations(49, 6)
    probability_one_ticket = 1/n_combinations
    percentage_form = probability_one_ticket * 100
    
    print('''Your chances to win the big prize with the numbers {} are {:.7f}%.
In other words, you have a 1 in {:,} chance to win.'''.format(user_numbers,
                    percentage_form, int(n_combinations)))
    
#Test function 1
print('Test Function 1:')
test_input_1 = [2, 43, 22, 23, 11, 5]
one_ticket_probability(test_input_1)

print()

#Test function 2
print('Test Function 2:')
test_input_2 = [9, 26, 42, 7, 15, 6]
one_ticket_probability(test_input_2)

Test Function 1:
Your chances to win the big prize with the numbers [2, 43, 22, 23, 11, 5] are 0.0000072%.
In other words, you have a 1 in 13,983,816 chance to win.

Test Function 2:
Your chances to win the big prize with the numbers [9, 26, 42, 7, 15, 6] are 0.0000072%.
In other words, you have a 1 in 13,983,816 chance to win.


## Historical Data Check Function
A function that lets users check whether a given combination of six numbers has ever won in a past draw.

In [4]:
def extract_numbers(row):
    # Positions 4 to 9 hold the six drawn numbers (the bonus number is excluded)
    row = row.iloc[4:10]
    row = set(row.values)
    return row

winning_numbers = lottery_canada.apply(extract_numbers, axis=1)
winning_numbers.head()

def check_historical_occurrence(user_numbers, historical_numbers):   
    '''
    user_numbers: a Python list of six numbers
    historical_numbers: a pandas Series of sets, one per past draw
    '''
    
    user_numbers_set = set(user_numbers)
    check_occurrence = historical_numbers == user_numbers_set
    n_occurrences = check_occurrence.sum()
    
    if n_occurrences == 0:
        print('''The combination {} has never occurred.
This doesn't mean it's more likely to occur now. Your chances to win the big prize in the next draw using the combination {} are 0.0000072%.
In other words, you have a 1 in 13,983,816 chance to win.'''.format(user_numbers, user_numbers))
        
    else:
        print('''The number of times combination {} has occurred in the past is {}.
Your chances to win the big prize in the next draw using the combination {} are 0.0000072%.
In other words, you have a 1 in 13,983,816 chance to win.'''.format(user_numbers, n_occurrences,
                                                                            user_numbers))

#Test function 1 
print('Test Function 1:')
test_input_3 = [33, 36, 37, 39, 8, 41]
check_historical_occurrence(test_input_3, winning_numbers)

print()

#Test function 2
print('Test Function 2:')
test_input_4 = [3, 2, 44, 22, 1, 44]
check_historical_occurrence(test_input_4, winning_numbers)

Test Function 1:
The number of times combination [33, 36, 37, 39, 8, 41] has occurred in the past is 1.
Your chances to win the big prize in the next draw using the combination [33, 36, 37, 39, 8, 41] are 0.0000072%.
In other words, you have a 1 in 13,983,816 chance to win.

Test Function 2:
The combination [3, 2, 44, 22, 1, 44] has never occurred.
This doesn't mean it's more likely to occur now. Your chances to win the big prize in the next draw using the combination [3, 2, 44, 22, 1, 44] are 0.0000072%.
In other words, you have a 1 in 13,983,816 chance to win.
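
The second test ticket lists 44 twice, so as a set it holds only five numbers and could never match a real draw. A small validation step would catch this before the lookup. The sketch below is one possible approach, not part of the original notebook: `validate_ticket`, `count_occurrences`, and `past_draws` are hypothetical names, and the three draws are copied from the data preview rather than read from the full file.

```python
def validate_ticket(user_numbers):
    """Return the ticket as a frozenset, or raise ValueError if it is not
    six distinct integers from 1 to 49."""
    numbers = frozenset(user_numbers)
    if len(user_numbers) != 6 or len(numbers) != 6:
        raise ValueError("a ticket needs exactly six distinct numbers")
    if not all(isinstance(n, int) and 1 <= n <= 49 for n in numbers):
        raise ValueError("ticket numbers must be integers from 1 to 49")
    return numbers

def count_occurrences(user_numbers, past_draws):
    """Count how many past draws match the ticket exactly."""
    ticket = validate_ticket(user_numbers)
    return sum(1 for draw in past_draws if draw == ticket)

# The three draws shown in the data preview (a stand-in for the full history).
past_draws = [
    frozenset({3, 11, 12, 14, 41, 43}),
    frozenset({8, 33, 36, 37, 39, 41}),
    frozenset({1, 6, 23, 24, 27, 39}),
]

print(count_occurrences([33, 36, 37, 39, 8, 41], past_draws))

try:
    count_occurrences([3, 2, 44, 22, 1, 44], past_draws)
except ValueError as error:
    print("Invalid ticket:", error)
```

Comparing sets rather than lists keeps the check independent of the order in which the user enters the numbers.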


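## Multi-ticket Probability Function
A function that calculates the probability of winning the big prize with any number of different tickets, from a single ticket up to all 13,983,816 possible combinations.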

In [5]:
def multi_ticket_probability(n_tickets):
    
    n_combinations = combinations(49, 6)
    
    probability = n_tickets / n_combinations
    percentage_form = probability * 100
    
    if n_tickets == 1:
        print('''Your chances to win the big prize with one ticket are {:.6f}%.
In other words, you have a 1 in {:,} chance to win.'''.format(percentage_form, int(n_combinations)))
    
    else:
        combinations_simplified = round(n_combinations / n_tickets)   
        print('''Your chances to win the big prize with {:,} different tickets are {:.6f}%.
In other words, you have a 1 in {:,} chance to win.'''.format(n_tickets, percentage_form,
                                                               combinations_simplified))


#Test function
print('Test Function:\n')        
test_inputs = [1, 10, 100, 10000, 1000000, 6991908, 13983816]

for test_input in test_inputs:
    multi_ticket_probability(test_input)
    print('------------------------') # output delimiter

Test Function:

Your chances to win the big prize with one ticket are 0.000007%.
In other words, you have a 1 in 13,983,816 chance to win.
------------------------
Your chances to win the big prize with 10 different tickets are 0.000072%.
In other words, you have a 1 in 1,398,382 chance to win.
------------------------
Your chances to win the big prize with 100 different tickets are 0.000715%.
In other words, you have a 1 in 139,838 chance to win.
------------------------
Your chances to win the big prize with 10,000 different tickets are 0.071511%.
In other words, you have a 1 in 1,398 chance to win.
------------------------
Your chances to win the big prize with 1,000,000 different tickets are 7.151124%.
In other words, you have a 1 in 14 chance to win.
------------------------
Your chances to win the big prize with 6,991,908 different tickets are 50.000000%.
In other words, you have a 1 in 2 chance to win.
------------------------
Your chances to win the big prize with 13,983,816 different tickets are 100.000000%.
In other words, you have a 1 in 1 chance to win.
------------------------
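
The multi-ticket calculation divides the number of tickets by the number of possible outcomes, which only holds when every ticket is a different combination. It also assumes the ticket count stays between 1 and 13,983,816; a larger input would report more than 100%. The sketch below shows one way to make both points explicit with exact fractions. The function name `multi_ticket_fraction` is hypothetical.

```python
from fractions import Fraction
from math import comb

def multi_ticket_fraction(n_tickets):
    """Exact probability of winning the big prize with n_tickets
    distinct tickets (assumes no combination is played twice)."""
    total = comb(49, 6)
    if not 1 <= n_tickets <= total:
        raise ValueError(f"n_tickets must be between 1 and {total:,}")
    return Fraction(n_tickets, total)

for n in (10, 1_000_000, 6_991_908):
    p = multi_ticket_fraction(n)
    # 1 / p is the unrounded "1 in N" figure.
    print(f"{n:,} tickets: {float(p) * 100:.6f}% (about 1 in {float(1 / p):.2f})")
```

Printing the unrounded ratio shows how much the "1 in N" figure is affected by rounding once the ticket count gets large.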

## Fewer Winning Numbers Function

In most 6/49 lotteries, there are smaller prizes if a player's ticket matches two, three, four, or five of the six numbers drawn. This means that players might be interested in finding out the probability of having two, three, four, or five winning numbers.

This function calculates the probability that a player's ticket matches exactly the given number of winning numbers. If the player wants to find out the probability of having five winning numbers, the function will return the probability of having five winning numbers exactly (no more and no less). The function will not return the probability of having _at least_ five winning numbers.
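
The counting argument behind this function can be stated as a formula. To match exactly $k$ winning numbers, a ticket must contain $k$ of the six drawn numbers and $6-k$ of the 43 numbers that were not drawn:

$$
P(\text{exactly } k \text{ matches}) = \frac{\binom{6}{k}\,\binom{43}{6-k}}{\binom{49}{6}},
\qquad k = 0, 1, \dots, 6.
$$

For example, with $k = 5$ there are $6 \times 43 = 258$ successful outcomes out of 13,983,816, which is about 0.001845%.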

In [6]:
def probability_less_6(n_winning_numbers):
    
    n_combinations_ticket = combinations(6, n_winning_numbers)
    n_combinations_remaining = combinations(43, 6 - n_winning_numbers)
    successful_outcomes = n_combinations_ticket * n_combinations_remaining
    
    n_combinations_total = combinations(49, 6)    
    probability = successful_outcomes / n_combinations_total
    
    probability_percentage = probability * 100    
    combinations_simplified = round(n_combinations_total/successful_outcomes)    
    print('''Your chances of having {} winning numbers with this ticket are {:.6f}%.
In other words, you have a 1 in {:,} chance to win.'''.format(n_winning_numbers, probability_percentage,
                                                               int(combinations_simplified)))

#Test function 1 
print('Test Function:\n') 
for test_input in [2, 3, 4, 5]:
    probability_less_6(test_input)
    print('--------------------------') # output delimiter

Test Function:

Your chances of having 2 winning numbers with this ticket are 13.237803%.
In other words, you have a 1 in 8 chance to win.
--------------------------
Your chances of having 3 winning numbers with this ticket are 1.765040%.
In other words, you have a 1 in 57 chance to win.
--------------------------
Your chances of having 4 winning numbers with this ticket are 0.096862%.
In other words, you have a 1 in 1,032 chance to win.
--------------------------
Your chances of having 5 winning numbers with this ticket are 0.001845%.
In other words, you have a 1 in 54,201 chance to win.
--------------------------
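
Since the function above reports exact matches only, an "at least" figure would have to be built by summing the exact cases. The sketch below is one way that could look, using exact fractions; `probability_exactly` and `probability_at_least` are hypothetical names and are not part of the notebook. It also checks that the exact-match probabilities for 0 through 6 matches add up to 1.

```python
from fractions import Fraction
from math import comb

TOTAL_OUTCOMES = comb(49, 6)

def probability_exactly(k):
    """Exact probability of matching exactly k of the six winning numbers."""
    return Fraction(comb(6, k) * comb(43, 6 - k), TOTAL_OUTCOMES)

def probability_at_least(k):
    """Probability of matching k or more winning numbers (sum of exact cases)."""
    return sum((probability_exactly(j) for j in range(k, 7)), Fraction(0))

# The seven exact cases (0 to 6 matches) cover every possible ticket.
assert sum(probability_exactly(k) for k in range(7)) == 1

for k in (2, 3, 4, 5):
    exact = float(probability_exactly(k)) * 100
    at_least = float(probability_at_least(k)) * 100
    print(f"{k} matches: exactly {exact:.6f}%, at least {at_least:.6f}%")
```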


## Summary
We have defined functions that help users:
- Find the probability of winning the big prize for one ticket and any number of tickets.
- Find the probability of having exactly 2, 3, 4, or 5 winning numbers.
- Check whether their number combination has ever won the lottery.