In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

## Mobile app to help lottery addicts to better estimate their chances of winning

<p>Hypothetically, there is a medical institute that aims to prevent and treat gambling addictions. It wants to build a dedicated mobile app to help lottery addicts better estimate their chances of winning. We will focus on the 6/49 lottery.</p>
<p>The app should be able to answer the following questions:</p>
<ul>
<li> What is the probability of winning the big prize with a single ticket?</li>
<li>What is the probability of winning the big prize if we play 40 different tickets (or any other number)?</li>
<li>What is the probability of having at least five (or four, or three, or two) winning numbers on a single ticket?</li>
</ul>
<p>We look at historical data coming from the national 6/49 lottery game in Canada from 1982 to 2018. <a href="https://www.kaggle.com/datascienceai/lottery-dataset">The data can be found here</a>.</p>
<p>We will need two functions: one for factorials and one for combinations:</p>

In [2]:
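def factorial(n):
    # Multiply n * (n-1) * ... * 2; returns 1 for n <= 1
    a = 1
    while n>1:
        a = n*a
        n = n-1
    return a

def combinations(n,k):
    # Number of ways to choose k items from n without regard to order:
    # n! / (k! * (n-k)!)
    b= (factorial(k)*factorial(n-k))
    return factorial(n)/b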
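
As a cross-check on the hand-written helpers, the same count can be computed with Python's standard library. The sketch below assumes Python 3.8 or later, where `math.comb` is available; unlike the `/` division used above, it returns an exact integer rather than a float.

```python
import math

n, k = 49, 6

# Binomial coefficient from the factorial formula n! / (k! * (n - k)!),
# using integer division so the result stays an exact int.
by_formula = math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

# Same quantity from the standard library (Python 3.8+).
by_comb = math.comb(n, k)

print(by_formula, by_comb)
```

Both values should agree with the 13983816 combinations reported by the notebook's own `combinations(49,6)`.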
    

Next, we write a function that could be used in the app. It takes a list of six unique numbers and prints the probability of winning in a way that's easy to understand.

In [3]:
def one_ticket_probability(number):
    num_possibility = combinations(49,6)
    prob = 1/num_possibility
    print('The numbers {} have a chance of {}% of winning. There are {} possible combinations.'
          .format(number, prob*100,num_possibility))

In [4]:
one_ticket_probability([1,2,3,4,5,6])

The numbers [1, 2, 3, 4, 5, 6] have a chance of 7.151123842018516e-06% of winning. There are 13983816.0 possible combinations.


<p>We tested the function with an arbitrary combination. Note that the input numbers are not used in the calculation, because every combination of six numbers has the same probability of winning.</p>
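Although the chosen numbers do not change the probability, an app would likely still want to reject malformed tickets before reporting a result. A minimal sketch of such a check follows; `is_valid_ticket` is a hypothetical helper, not part of the original notebook, and the defaults assume the 6/49 rules described above.

```python
def is_valid_ticket(numbers, pool_size=49, picks=6):
    """Return True if `numbers` is a valid 6/49 ticket (hypothetical helper)."""
    numbers = list(numbers)
    # A ticket needs exactly `picks` entries, all distinct.
    if len(numbers) != picks or len(set(numbers)) != picks:
        return False
    # Every entry must be an integer between 1 and pool_size inclusive.
    return all(isinstance(n, int) and 1 <= n <= pool_size for n in numbers)


print(is_valid_ticket([1, 2, 3, 4, 5, 6]))   # valid ticket
print(is_valid_ticket([1, 2, 3, 4, 5, 5]))   # duplicate number
print(is_valid_ticket([1, 2, 3, 4, 5, 50]))  # number out of range
```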
<p>Now we will explore the historical data from the Canada 6/49 lottery.</p>

In [5]:
lottery = pd.read_csv('649.csv')
lottery.head()

Unnamed: 0,PRODUCT,DRAW NUMBER,SEQUENCE NUMBER,DRAW DATE,NUMBER DRAWN 1,NUMBER DRAWN 2,NUMBER DRAWN 3,NUMBER DRAWN 4,NUMBER DRAWN 5,NUMBER DRAWN 6,BONUS NUMBER
0,649,1,0,6/12/1982,3,11,12,14,41,43,13
1,649,2,0,6/19/1982,8,33,36,37,39,41,9
2,649,3,0,6/26/1982,1,6,23,24,27,39,34
3,649,4,0,7/3/1982,3,9,10,13,20,43,34
4,649,5,0,7/10/1982,5,14,21,31,34,47,45


In [6]:
lottery.tail()

Unnamed: 0,PRODUCT,DRAW NUMBER,SEQUENCE NUMBER,DRAW DATE,NUMBER DRAWN 1,NUMBER DRAWN 2,NUMBER DRAWN 3,NUMBER DRAWN 4,NUMBER DRAWN 5,NUMBER DRAWN 6,BONUS NUMBER
3660,649,3587,0,6/6/2018,10,15,23,38,40,41,35
3661,649,3588,0,6/9/2018,19,25,31,36,46,47,26
3662,649,3589,0,6/13/2018,6,22,24,31,32,34,16
3663,649,3590,0,6/16/2018,2,15,21,31,38,49,8
3664,649,3591,0,6/20/2018,14,24,31,35,37,48,17


In [7]:
lottery.describe()

Unnamed: 0,PRODUCT,DRAW NUMBER,SEQUENCE NUMBER,NUMBER DRAWN 1,NUMBER DRAWN 2,NUMBER DRAWN 3,NUMBER DRAWN 4,NUMBER DRAWN 5,NUMBER DRAWN 6,BONUS NUMBER
count,3665.0,3665.0,3665.0,3665.0,3665.0,3665.0,3665.0,3665.0,3665.0,3665.0
mean,649.0,1819.494952,0.030832,7.327694,14.568076,21.890859,28.978445,36.162619,43.099045,24.599454
std,0.0,1039.239544,0.237984,5.811669,7.556939,8.170073,8.069724,7.19096,5.506424,14.360038
min,649.0,1.0,0.0,1.0,2.0,3.0,4.0,11.0,13.0,0.0
25%,649.0,917.0,0.0,3.0,9.0,16.0,23.0,31.0,40.0,12.0
50%,649.0,1833.0,0.0,6.0,14.0,22.0,30.0,37.0,45.0,25.0
75%,649.0,2749.0,0.0,10.0,20.0,28.0,35.0,42.0,47.0,37.0
max,649.0,3591.0,3.0,38.0,43.0,45.0,47.0,48.0,49.0,49.0


In [8]:
lottery.isnull().sum()

PRODUCT            0
DRAW NUMBER        0
SEQUENCE NUMBER    0
DRAW DATE          0
NUMBER DRAWN 1     0
NUMBER DRAWN 2     0
NUMBER DRAWN 3     0
NUMBER DRAWN 4     0
NUMBER DRAWN 5     0
NUMBER DRAWN 6     0
BONUS NUMBER       0
dtype: int64

<p>The data set contains the date on which each drawing took place, as well as all the numbers that were drawn (including the bonus number).</p>
<p>We want to write an app that can do the following:</p>
<ul>
<li> inside the app, the user inputs six different numbers from 1 to 49</li>
<li> the app yields the number of times the combination selected occurred in the Canada data set, and </li>
<li> the probability of winning the big prize in the next drawing with that combination</li>
</ul>
<p>First, we extract the six winning numbers of each draw from the historical data set as a Python set:</p>

In [9]:
def extract_numbers(row_df):
    # Positions 4 to 9 hold the columns NUMBER DRAWN 1 to NUMBER DRAWN 6
    row_df = row_df.iloc[4:10]
    return set(row_df.values)

winners = lottery.apply(extract_numbers, axis = 1)
winners.head()

0    {3, 41, 11, 12, 43, 14}
1    {33, 36, 37, 39, 8, 41}
2     {1, 6, 39, 23, 24, 27}
3     {3, 9, 10, 43, 13, 20}
4    {34, 5, 14, 47, 21, 31}
dtype: object

Next, we will write a function to check the historical occurrence:

In [10]:
def check_historical_occurrence(user_numbers,winning_numbers):
    user_numbers = set(user_numbers)
    match = user_numbers == winning_numbers
    print('The combination {} occurred {} times in the past.'.format(user_numbers, match.sum()))
    

In [11]:
check_historical_occurrence([1,2,3,4,5,6],winners)

The combination {1, 2, 3, 4, 5, 6} occurred 0 times in the past.
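
The same check can be expressed without pandas, which may be closer to what a mobile back end would run. The sketch below is only an illustration: `count_occurrences` is a hypothetical name, and `sample_draws` holds just the first five draws shown in the `lottery.head()` output rather than the full data set.

```python
def count_occurrences(user_numbers, past_draws):
    """Count how many past draws match the user's six numbers exactly."""
    ticket = set(user_numbers)
    # Set equality ignores order, so the numbers may be entered in any order.
    return sum(1 for draw in past_draws if set(draw) == ticket)


# First five draws from the data set (main numbers only, no bonus number).
sample_draws = [
    {3, 11, 12, 14, 41, 43},
    {8, 33, 36, 37, 39, 41},
    {1, 6, 23, 24, 27, 39},
    {3, 9, 10, 13, 20, 43},
    {5, 14, 21, 31, 34, 47},
]

print(count_occurrences([1, 2, 3, 4, 5, 6], sample_draws))
print(count_occurrences([43, 41, 14, 12, 11, 3], sample_draws))
```

Returning the count instead of printing it leaves the wording of the message to the app's user interface.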


## Multi-ticket probability

We want to write a function that returns the probability of winning when playing with multiple tickets. The user inputs the number of different tickets and gets the probability of winning the big prize:

In [12]:
def multi_ticket_probability(number_tickets):
    number_poss = combinations(49,6)
    prob = number_tickets/number_poss
    print('The probability to win the big prize with {} tickets is {:.6f}%.'.format(number_tickets, prob*100))

In [13]:
multi_ticket_probability(100)

The probability to win the big prize with 100 tickets is 0.000715%.


In [14]:
tests = [1, 10, 100, 10000, 1000000, 6991908, 13983816]
for test in tests:
    multi_ticket_probability(test)

The probability to win the big prize with 1 tickets is 0.000007%.
The probability to win the big prize with 10 tickets is 0.000072%.
The probability to win the big prize with 100 tickets is 0.000715%.
The probability to win the big prize with 10000 tickets is 0.071511%.
The probability to win the big prize with 1000000 tickets is 7.151124%.
The probability to win the big prize with 6991908 tickets is 50.000000%.
The probability to win the big prize with 13983816 tickets is 100.000000%.
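
For use inside an app, it may be more convenient to return the probability than to print it, and to reject ticket counts that make no sense. The following is a sketch under that assumption; `multi_ticket_probability_value` is a hypothetical name, and it relies on `math.comb` (Python 3.8+) instead of the notebook's `combinations` helper.

```python
import math

TOTAL_COMBINATIONS = math.comb(49, 6)  # 13,983,816 possible tickets


def multi_ticket_probability_value(number_tickets):
    """Probability (0 to 1) of the big prize with that many different tickets."""
    if not 0 <= number_tickets <= TOTAL_COMBINATIONS:
        raise ValueError("number_tickets must be between 0 and 13,983,816")
    # Each distinct ticket covers exactly one of the equally likely outcomes.
    return number_tickets / TOTAL_COMBINATIONS


for n in (1, 100, 6991908, 13983816):
    print(n, multi_ticket_probability_value(n))
```

The upper bound reflects the fact that the tickets must all be different: once every combination has been bought, additional tickets cannot raise the probability any further.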


## Probability of fewer matching numbers
<p>There is a smaller prize to be won if only some of the numbers match. We want to write a function that takes the following arguments:</p>
<ul>
<li> six different numbers from 1 to 49</li>
<li> an integer between 2 and 5 that represents the number of winning numbers expected</li>
</ul>
<p>The function should print the probability of having the given number of winning numbers. Because this probability does not depend on which six numbers are chosen, the implementation below takes only the second argument.</p>
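Written out, the probability of exactly $x$ matches is the number of ways to pick $x$ of the 6 winning numbers together with $6 - x$ of the 43 non-winning numbers, divided by the number of all possible tickets. The symbol $X$ for the number of matches is introduced here only for illustration:

$$
P(X = x) = \frac{\binom{6}{x}\,\binom{43}{6-x}}{\binom{49}{6}}, \qquad x \in \{2, 3, 4, 5\}
$$

For example, with $x = 5$ the numerator is $\binom{6}{5}\binom{43}{1} = 6 \cdot 43 = 258$, so $P(X = 5) = 258 / 13983816 \approx 0.001845\%$.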

In [17]:
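def probability_less_6(x):
    # Ways to pick x of the 6 winning numbers
    x_combis = combinations(6,x)
    # Ways to pick the remaining 6-x numbers from the 43 non-winning numbers
    leftover_combis = combinations(43, 6-x)
    success = x_combis*leftover_combis
    combis_total = combinations(49,6)

    # Probability of exactly x matches, expressed as a percentage
    prob = success/combis_total*100
    print('The probability of having {} winning numbers is {:.6f}%'.format(x, prob))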

In [18]:
for test in [2,3,4,5]:
    probability_less_6(test)

The probability of having 2 winning numbers is 13.237803%
The probability of having 3 winning numbers is 1.765040%
The probability of having 4 winning numbers is 0.096862%
The probability of having 5 winning numbers is 0.001845%
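
The function above gives the probability of exactly `x` matches, whereas the app is also meant to answer the "at least" version of the question. One way to get there is to sum the exact cases from `x` up to 6. The sketch below assumes Python 3.8+ for `math.comb`; `ways_exactly` and `probability_at_least` are hypothetical names.

```python
import math


def ways_exactly(x):
    """Number of tickets with exactly x of the 6 winning numbers."""
    return math.comb(6, x) * math.comb(43, 6 - x)


def probability_at_least(x):
    """Probability (0 to 1) of matching x or more winning numbers."""
    favourable = sum(ways_exactly(k) for k in range(x, 7))
    return favourable / math.comb(49, 6)


for x in (2, 3, 4, 5):
    print('At least {}: {:.6f}%'.format(x, probability_at_least(x) * 100))
```

Because the cases "exactly 2", "exactly 3", and so on are mutually exclusive, their counts can simply be added before dividing by the total number of combinations.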
