# Introduction

As is well known in social science literature, most people who play the lottery do so for fun and without major problems; however, for some this activity turns into a habit which eventually escalates into addiction. A medical institute that aims to prevent and treat gambling addictions wants to build a dedicated mobile app to help lottery addicts better estimate their chances of winning.

The purpose of this project is to assist in the development of this app by answering questions such as:

- What is the probability of winning the big prize with a single ticket?
- What is the probability of winning the big prize if we play 40 different tickets (or any other number)?
- What is the probability of having at least five (or four, or three, or two) winning numbers on a single ticket?

The data set contains historical results from Canada's national 6/49 lottery game: 3,665 drawings from 1982 to 2018.


# Core Functions

In [6]:
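# two helper functions used throughout the notebook

def factorial(n):
    # base case covers both 0! and 1! (number_of_combinations(n, n) needs 0! = 1)
    if n <= 1: return 1
    return n * factorial(n-1)

def number_of_combinations(n, k):
    # C(n, k) = n! / (k! * (n - k)!); integer division keeps the result an exact int
    numerator = factorial(n)
    denominator = factorial(k) * factorial(n-k)
    return numerator // denominator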
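The recursive `factorial` above works for n = 49, but an iterative loop avoids Python's recursion limit entirely. The sketch below shows one alternative. It assumes Python 3.8+ so that the standard library's `math.comb` can serve as a cross-check. The names `iterative_factorial` and `exact_combinations` are hypothetical and are not part of the notebook.

```python
import math

def iterative_factorial(n):
    # Multiply 2 * 3 * ... * n; the empty product gives 0! = 1! = 1
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

def exact_combinations(n, k):
    # C(n, k) = n! / (k! * (n - k)!); floor division is exact because C(n, k) is an integer
    return iterative_factorial(n) // (iterative_factorial(k) * iterative_factorial(n - k))

# Compare with the standard library on a few (n, k) pairs that appear in this notebook
for n, k in [(49, 6), (6, 2), (43, 4), (6, 6), (43, 0)]:
    assert exact_combinations(n, k) == math.comb(n, k)

print(exact_combinations(49, 6))  # 13983816
```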

# Probability of winning with one ticket

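For each drawing, six numbers are drawn from a set of 49, and a player wins the big prize if the six numbers on their ticket match all six numbers drawn. There are C(49, 6) = 13,983,816 possible six-number combinations and exactly one of them wins, so the probability for any single ticket is 1 / 13,983,816. The function below calculates this probability for any given ticket and prints a snappy message.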
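Written out, the single-ticket calculation is a single division. Here is a minimal sketch using the standard library's `fractions.Fraction` and `math.comb` (Python 3.8+). It keeps the probability exact rather than rounding it to a float.

```python
from fractions import Fraction
from math import comb

total = comb(49, 6)             # every possible 6-number ticket
p_jackpot = Fraction(1, total)  # exactly one ticket matches the six numbers drawn

print(total)                             # 13983816
print(p_jackpot)                         # 1/13983816
print(f"{float(p_jackpot) * 100:.7f}%")  # 0.0000072%
```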

In [36]:
def one_ticket_probability(ticket):
    num_combinations = number_of_combinations(49,len(ticket))
    probability = 1 / num_combinations * 100
    message = '''Your chances to win the big prize with the numbers {} are {:.7f}%.\nIn other words, you have a 1 in {:,} chance to win.\n'''.format(ticket,
                    probability, int(num_combinations))
    print(message)

In [37]:
# test two different tickets; both should report the same probability

one_ticket_probability([2,5,7,9,6,8])
one_ticket_probability([2,5,4,9,6,32])

Your chances to win the big prize with the numbers [2, 5, 7, 9, 6, 8] are 0.0000072%.
In other words, you have a 1 in 13,983,816 chance to win.

Your chances to win the big prize with the numbers [2, 5, 4, 9, 6, 32] are 0.0000072%.
In other words, you have a 1 in 13,983,816 chance to win.



# Analysis of Historical Data

The engineering team wants a function that lets users check whether a given combination of six numbers would have won the big prize in any past drawing.

In [38]:
import pandas as pd

lotto = pd.read_csv("649.csv")

In [39]:
lotto.shape

(3665, 11)

In [40]:
lotto.head()

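| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 6/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 6/19/1982 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 6/26/1982 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |
| 3 | 649 | 4 | 0 | 7/3/1982 | 3 | 9 | 10 | 13 | 20 | 43 | 34 |
| 4 | 649 | 5 | 0 | 7/10/1982 | 5 | 14 | 21 | 31 | 34 | 47 | 45 |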


In [50]:
# this function extracts a row's six winning numbers (NUMBER DRAWN 1 to NUMBER DRAWN 6, i.e., column positions 4 to 9) as a set

def extract_numbers(row):
    return set(row.iloc[4:10])

In [54]:
# make a pd.Series of the winning numbers for each row in the dataset.

winners = lotto.apply(extract_numbers, axis=1)
winners.head()

0    {3, 41, 11, 12, 43, 14}
1    {33, 36, 37, 39, 8, 41}
2     {1, 6, 39, 23, 24, 27}
3     {3, 9, 10, 43, 13, 20}
4    {34, 5, 14, 47, 21, 31}
dtype: object

In [66]:
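# count how often each winning combination appears; a single unique count of 1 means no combination has been drawn twice
winners.value_counts().sort_values().unique()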

array([1], dtype=int64)
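The `array([1])` result means every winning combination occurs exactly once in the data. The same check can be sketched in plain Python. A combination repeats only if the number of distinct draws is smaller than the total number of draws. The sketch assumes the draws are lists of six integers, and the five rows are copied from the `lotto.head()` output above. It uses `frozenset` because plain sets are not hashable and so cannot be stored inside another set.

```python
# First five draws from lotto.head() (NUMBER DRAWN 1 to NUMBER DRAWN 6)
draws = [
    [3, 11, 12, 14, 41, 43],
    [8, 33, 36, 37, 39, 41],
    [1, 6, 23, 24, 27, 39],
    [3, 9, 10, 13, 20, 43],
    [5, 14, 21, 31, 34, 47],
]

# frozenset ignores order and is hashable, so identical combinations collapse into one entry
distinct = {frozenset(d) for d in draws}
print(len(distinct) == len(draws))  # True: no combination was drawn twice
```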

In [60]:
# this function counts how many times a user's six-number combination was the winning combination historically and prints a snappy message.

def check_historical_occurence(user_nums, historical_nums=winners):
    user_nums = set(user_nums)
    # element-wise comparison: True for every past draw equal to the user's set
    matches = historical_nums == user_nums
    num_matches = matches.sum()
    
    if num_matches == 0:
        print('''Ticket {} has never occurred in the past!'''.format(user_nums))
        
    elif num_matches == 1:
        print('''Ticket {} has occurred in the past {} time!'''.format(user_nums, num_matches))

    else:
        print('''Ticket {} has occurred in the past {} times!'''.format(user_nums, num_matches))


In [61]:
check_historical_occurence([3, 41, 11, 12, 43, 14], winners)
check_historical_occurence([3, 41, 11, 12, 43, 15], winners)

Ticket {3, 41, 11, 12, 43, 14} has occurred in the past 1 time!
Ticket {3, 41, 11, 12, 43, 15} has never occurred in the past!
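`check_historical_occurence` compares the ticket against every past draw on each call. If the app needs to answer many queries, one option is to count each combination once with `collections.Counter` keyed by `frozenset`. After that, every lookup is a single dictionary access. This is a sketch, not the engineering team's specification. `history` and `times_drawn` are hypothetical names, and the sample rows are again the five draws shown by `lotto.head()`.

```python
from collections import Counter

# Sample history: the first five draws from lotto.head()
draws = [
    [3, 11, 12, 14, 41, 43],
    [8, 33, 36, 37, 39, 41],
    [1, 6, 23, 24, 27, 39],
    [3, 9, 10, 13, 20, 43],
    [5, 14, 21, 31, 34, 47],
]

# Count each combination once; frozenset makes the key independent of number order
history = Counter(frozenset(d) for d in draws)

def times_drawn(ticket, history=history):
    """Return how many past draws matched all six numbers of `ticket`."""
    # Counter returns 0 for combinations it has never seen
    return history[frozenset(ticket)]

print(times_drawn([3, 41, 11, 12, 43, 14]))  # 1
print(times_drawn([3, 41, 11, 12, 43, 15]))  # 0
```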


# Probability of winning with multiple tickets

This part outputs the probability of winning if a player buys multiple different tickets.

In [91]:
# with distinct tickets, this is just the number of tickets purchased divided by the total number of possible combinations. The function below also prints a snazzy message.

def multi_ticket_probability(num_tickets):
    combos = number_of_combinations(49,6)
    probability_percent = num_tickets / combos * 100
    
    if num_tickets == 1:
        print('''Your chances to win the big prize with one ticket are {:.6f}%.\nIn other words, you have a 1 in {:,.0f} chance to win!'''.format(probability_percent, combos))
        
    else:
        combos_simplified = round(combos / num_tickets) 
        print('''Your chances to win the big prize with {:,} tickets are {:.6f}%.\nIn other words, you have a 1 in {:,.0f} chance to win!'''.format(num_tickets,probability_percent,combos_simplified))

In [92]:
test_inputs = [1, 10, 100, 10000, 1000000, 6991908, 13983816]

for test_input in test_inputs:
    multi_ticket_probability(test_input)
    print('--------------------------------------------------------------------')

Your chances to win the big prize with one ticket are 0.000007%.
In other words, you have a 1 in 13,983,816 chance to win!
--------------------------------------------------------------------
Your chances to win the big prize with 10 tickets are 0.000072%.
In other words, you have a 1 in 1,398,382 chance to win!
--------------------------------------------------------------------
Your chances to win the big prize with 100 tickets are 0.000715%.
In other words, you have a 1 in 139,838 chance to win!
--------------------------------------------------------------------
Your chances to win the big prize with 10,000 tickets are 0.071511%.
In other words, you have a 1 in 1,398 chance to win!
--------------------------------------------------------------------
Your chances to win the big prize with 1,000,000 tickets are 7.151124%.
In other words, you have a 1 in 14 chance to win!
--------------------------------------------------------------------
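Your chances to win the big prize with 6,991,908 tickets are 50.000000%.
In other words, you have a 1 in 2 chance to win!
--------------------------------------------------------------------
Your chances to win the big prize with 13,983,816 tickets are 100.000000%.
In other words, you have a 1 in 1 chance to win!
--------------------------------------------------------------------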
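The formula in `multi_ticket_probability` relies on the tickets being different. Distinct tickets cover disjoint outcomes, so their probabilities simply add. Suppose instead that each ticket were picked at random, so duplicates were possible (an assumption this notebook does not make). Then the chance of at least one jackpot would be 1 - (1 - p)^n, which is lower. A short sketch comparing the two follows. The function names are hypothetical.

```python
import math
from fractions import Fraction

TOTAL = math.comb(49, 6)

def p_distinct(num_tickets):
    # n different tickets: n winning outcomes out of TOTAL equally likely draws
    return Fraction(num_tickets, TOTAL)

def p_independent(num_tickets):
    # Assumption: each ticket is chosen uniformly at random, so tickets may repeat.
    # Computes 1 - (1 - 1/TOTAL)**n with log1p/expm1 to limit floating-point error.
    return -math.expm1(num_tickets * math.log1p(-1 / TOTAL))

print(p_distinct(6991908))              # 1/2
print(f"{p_independent(6991908):.4f}")  # 0.3935
```

With half of all combinations covered, distinct tickets give exactly a 50% chance. Random tickets give about 39%, because some of them duplicate each other.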

# Probability of matching fewer than 6 of the numbers


This section outputs the probability of having exactly two, three, four, or five winning numbers. The user inputs an integer between 2 and 5 for the number of winning numbers, plus the ticket they are thinking of buying (which is irrelevant to the calculation, since every ticket has the same odds).
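The count behind this calculation is the hypergeometric formula. First choose which n of the ticket's six numbers are among those drawn, then choose the remaining 6 - n drawn numbers from the 43 numbers that are not on the ticket. For example, for exactly two matches:

$$
P(\text{exactly } n \text{ matches}) = \frac{\binom{6}{n}\binom{43}{6-n}}{\binom{49}{6}},
\qquad
P(\text{exactly } 2) = \frac{15 \cdot 123\,410}{13\,983\,816} = \frac{1\,851\,150}{13\,983\,816} \approx 0.1324
$$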

In [113]:
def probability_less_6(n, ticket=(1, 2, 3, 4, 5, 6)):
    # ways to choose which n of the ticket's 6 numbers are among the winning numbers
    n_combinations_ticket = number_of_combinations(6, n)
    # ways to fill the other 6 - n winning numbers from the 43 numbers NOT on the ticket
    n_combinations_remaining = number_of_combinations(43, 6 - n)
    successful_outcomes = n_combinations_ticket * n_combinations_remaining
    n_combinations_total = number_of_combinations(49, 6)
    
    probability = successful_outcomes / n_combinations_total
    probability_percentage = probability * 100
    
    combinations_simplified = round(n_combinations_total/successful_outcomes)
    
    print('''Your chances of having {} winning numbers with this ticket are {:.6f}%.\nIn other words, you have a 1 in {:,} chance to win.'''.format(n, probability_percentage,int(combinations_simplified)))

In [114]:
test_inputs = list(range(2,6))

for test_input in test_inputs:
    probability_less_6(test_input)
    print('-------------------------------------------------------------------------')

Your chances of having 2 winning numbers with this ticket are 13.237803%.
In other words, you have a 1 in 8 chance to win.
-------------------------------------------------------------------------
Your chances of having 3 winning numbers with this ticket are 1.765040%.
In other words, you have a 1 in 57 chance to win.
-------------------------------------------------------------------------
Your chances of having 4 winning numbers with this ticket are 0.096862%.
In other words, you have a 1 in 1,032 chance to win.
-------------------------------------------------------------------------
Your chances of having 5 winning numbers with this ticket are 0.001845%.
In other words, you have a 1 in 54,201 chance to win.
-------------------------------------------------------------------------
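To double-check the counting argument, a small sketch can enumerate every ticket in a scaled-down game and compare the counts with the formula. The game used here is a hypothetical 6/15 lottery with 5,005 possible tickets. The same helper then answers the introduction's "at least n" question by adding up the disjoint "exactly j" probabilities for j = n, ..., 6. The function names are hypothetical; `itertools.combinations` and `math.comb` are standard library.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def p_exactly(k, pool=49, drawn=6):
    # Hypergeometric probability: choose k of the drawn numbers for the ticket,
    # and the other drawn - k ticket numbers from the pool - drawn numbers not drawn
    return Fraction(comb(drawn, k) * comb(pool - drawn, drawn - k), comb(pool, drawn))

def p_at_least(k, pool=49, drawn=6):
    # "At least k" is the sum of the disjoint "exactly j" events for j >= k
    return sum(p_exactly(j, pool, drawn) for j in range(k, drawn + 1))

# Brute-force check on a small hypothetical 6/15 game (5,005 possible tickets)
small_pool, small_drawn = 15, 6
winning = set(range(1, small_drawn + 1))  # any fixed draw works by symmetry
tickets = list(combinations(range(1, small_pool + 1), small_drawn))
for k in range(small_drawn + 1):
    hits = sum(1 for t in tickets if len(winning.intersection(t)) == k)
    assert Fraction(hits, len(tickets)) == p_exactly(k, small_pool, small_drawn)

# Exactly-n versus at-least-n for the real 6/49 game
for k in range(2, 6):
    print(k, f"exactly: {float(p_exactly(k)) * 100:.6f}%",
          f"at least: {float(p_at_least(k)) * 100:.6f}%")
```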
