## Estimate the chance of winning the 6/49 lottery ##

A medical institute wants to provide a mobile app that helps lottery addicts estimate their chances of winning the lottery. Using historical data from 3,665 drawings between 1982 and 2018, we will build an app that answers the following questions:
- What is the probability of winning the big prize with a single ticket?
- What is the probability of winning the big prize if we play 40 different tickets (or any other number)?
- What is the probability of having at least five (or four, or three, or two) winning numbers on a single ticket?

In [1]:
import pandas as pd
import numpy as np

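def factorial(n):
    # n! = n * (n-1) * ... * 1, computed iteratively with integers
    facn = 1
    for i in range(0, n):
        facn *= (n - i)
    return facn

def combinations(n, k):
    # Ways to choose k items out of n without replacement, ignoring order:
    # n! / (k! * (n-k)!). Integer division keeps the result exact.
    return factorial(n) // (factorial(k) * factorial(n - k))

def one_ticket_probability(user_numbers):
    # Every 6-number combination is equally likely, so the numbers the user
    # picked do not change the result: exactly one combination wins.
    outcomes = combinations(49, 6)
    successful = 1
    prob = successful / outcomes
    print("The probability of winning the lottery is {0} %".format(prob * 100))

lottery = []

# Ask the user for the six numbers on their ticket
for i in range(0, 6):
    ele = int(input('Enter number:'))
    lottery.append(ele)

one_ticket_probability(lottery)

Enter number:2
Enter number:3
Enter number:12
Enter number:13
Enter number:45
Enter number:35
The probability of winning the lottery is 7.151123842018516e-06 %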


## Getting probability ##

In the lotto 6/49 system, a player picks 6 numbers from 1 to 49. When the winning numbers are drawn, a number that has been drawn is not put back, so the draw is *sampled without replacement*, and the order in which the numbers come out does not matter. So when we calculate the total number of combinations, we use the combinations formula for sampling without replacement.
>To get the probability of winning the lottery, we divide the number of successful outcomes by the total number of outcomes (total combinations). Since a user picks one combination of numbers, the number of successful outcomes is 1.
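As a cross-check of the hand-written combinations() function, Python's standard library (3.8 and later) provides math.comb, which computes the same binomial coefficient. A minimal sketch of the single-ticket calculation using it:

```python
import math

# Total number of distinct 6-number tickets: C(49, 6) = 49! / (6! * 43!)
total_outcomes = math.comb(49, 6)

# Exactly one of these equally likely combinations matches the draw
p_win = 1 / total_outcomes

print(total_outcomes)  # 13983816
print(p_win * 100)     # probability as a percentage
```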

## Comparing Against Historical Lotto data ##

### Step 1: Reading dataset into pandas dataframe ###

In [2]:
lotto = pd.read_csv('649.csv')
lotto.shape

(3665, 11)

In [3]:
lotto.head(5)

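|   | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 6/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 6/19/1982 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 6/26/1982 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |
| 3 | 649 | 4 | 0 | 7/3/1982 | 3 | 9 | 10 | 13 | 20 | 43 | 34 |
| 4 | 649 | 5 | 0 | 7/10/1982 | 5 | 14 | 21 | 31 | 34 | 47 | 45 |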


### Step 2: Extracting the winning numbers of each drawing ###

**CHECKING PROBABILITY OF A SINGLE TICKET:**

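We select the six "NUMBER DRAWN" columns into a separate pandas DataFrame, and extract_numbers() converts each row (one drawing) into a Python set. Because sets ignore order, a ticket matches a drawing whenever it contains the same six numbers in any order.
Then check_historical_occurrence() compares the user's numbers against every historical drawing.

In [4]:
def extract_numbers(row):
    # Turn one drawing (a row of six numbers) into a set so that order is ignored
    return set(row)

# Keep only the six regular winning numbers (the bonus number is excluded)
lotto6 = lotto[['NUMBER DRAWN 1', 'NUMBER DRAWN 2', 'NUMBER DRAWN 3',
                'NUMBER DRAWN 4', 'NUMBER DRAWN 5', 'NUMBER DRAWN 6']]
winning_sets = lotto6.apply(extract_numbers, axis=1)

In [5]:
def check_historical_occurrence(li, sets):
    # Compare the user's numbers, as a set, with every historical drawing;
    # returns a boolean Series with one entry per drawing
    user_set = set(li)
    return sets.apply(lambda drawn: drawn == user_set)

match = check_historical_occurrence(lottery, winning_sets)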

print('Your combination has occurred {} time(s) in the Canadian lotto history. Your chance of winning the big prize with this ticket is 1 in 13,983,816'.format(sum(match)))


Your combination has occurred 0 time(s) in the Canadian lotto history. Your chance of winning the big prize with this ticket is 1 in 13,983,816
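The check above scans every drawing for each ticket. A minimal alternative sketch, assuming all drawings fit in memory, counts each historical drawing as a frozenset (hashable, order-independent), so a lookup is a single dictionary access. The sample data are the first five drawings from the head() output above; count_occurrences is a hypothetical helper name, not part of the app.

```python
from collections import Counter

# First five drawings from the head() output (six regular numbers each)
draws = [
    [3, 11, 12, 14, 41, 43],
    [8, 33, 36, 37, 39, 41],
    [1, 6, 23, 24, 27, 39],
    [3, 9, 10, 13, 20, 43],
    [5, 14, 21, 31, 34, 47],
]

# frozenset ignores order and is hashable, so each drawing can be counted
draw_counts = Counter(frozenset(d) for d in draws)

def count_occurrences(ticket, counts):
    """Hypothetical helper: how many times this exact 6-number set was drawn."""
    # Counter returns 0 for combinations that never occurred
    return counts[frozenset(ticket)]

print(count_occurrences([43, 41, 14, 12, 11, 3], draw_counts))  # 1 (same numbers, different order)
print(count_occurrences([2, 3, 12, 13, 45, 35], draw_counts))   # 0
```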


**CHECKING PROBABILITY OF MULTIPLE TICKETS:**

Since most lottery addicts buy multiple tickets, we also check the probability of winning with several tickets. Here the number of successful outcomes equals the number of different tickets the customer buys (two identical tickets cover the same combination), and the total number of outcomes is again calculated by calling the combinations() function.

Remember that 6 numbers are drawn, each between 1 and 49.
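A minimal sketch of the same calculation with exact fractions, assuming all tickets are distinct. The helper name multi_ticket_fraction and its range check are illustrative additions, not part of the original app; the valid range matches the 1 to 13,983,816 tickets the app supports.

```python
from fractions import Fraction
import math

TOTAL = math.comb(49, 6)  # 13,983,816 possible tickets

def multi_ticket_fraction(n_tickets):
    """Hypothetical helper: exact probability that one of n distinct tickets wins."""
    if not 1 <= n_tickets <= TOTAL:
        raise ValueError(f"n_tickets must be between 1 and {TOTAL}")
    # Each distinct ticket covers one of TOTAL equally likely combinations
    return Fraction(n_tickets, TOTAL)

print(multi_ticket_fraction(40))       # 5/1747977
print(multi_ticket_fraction(6991908))  # 1/2
```

Using Fraction keeps results such as "exactly one half" exact instead of relying on floating-point formatting.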

In [6]:
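def multi_ticket_probability(n_tickets):
    # Total outcomes: every possible 6-number combination out of 49
    t_o = combinations(49, 6)
    # Successful outcomes: one per ticket, assuming all tickets are different
    s_o = n_tickets
    probability = s_o / t_o
    print("The probability of winning the lottery with your tickets is {}% OR 1 in {} chances".format(probability * 100, int(t_o / s_o)))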
    print('--------------------------------------------------------------')

test = [1, 10, 100, 10000, 1000000, 6991908, 13983816]
for val in test:
    multi_ticket_probability(val)


The probability of winning the lottery with your tickets is 7.151123842018516e-06% OR 1 in 13983816 chances
--------------------------------------------------------------
The probability of winning the lottery with your tickets is 7.151123842018517e-05% OR 1 in 1398381 chances
--------------------------------------------------------------
The probability of winning the lottery with your tickets is 0.0007151123842018516% OR 1 in 139838 chances
--------------------------------------------------------------
The probability of winning the lottery with your tickets is 0.07151123842018516% OR 1 in 1398 chances
--------------------------------------------------------------
The probability of winning the lottery with your tickets is 7.151123842018517% OR 1 in 13 chances
--------------------------------------------------------------
The probability of winning the lottery with your tickets is 50.0% OR 1 in 2 chances
--------------------------------------------------------------
The probability o

**CALCULATING THE PROBABILITY OF WINNING 2, 3, 4 or 5 numbers**

To calculate the probability of matching exactly k of the 6 winning numbers (k less than 6), we first count how many ways the ticket can contain k of the 6 winning numbers, which is combinations(6, k).

The other 6 - k numbers on the ticket must then come from the **43** numbers that were not drawn (49 - 6 = 43). Taking them only from these 43 numbers guarantees that the ticket matches *exactly* k numbers rather than *at least* k, because none of the remaining picks can be another winning number.

Multiplying the two counts gives the number of successful outcomes, combinations(6, k) * combinations(43, 6 - k), which we divide by the total number of outcomes, combinations(49, 6).
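Written compactly, this counting argument is the hypergeometric probability of exactly $k$ matching numbers:

$$
P(\text{exactly } k) = \frac{\binom{6}{k}\binom{43}{6-k}}{\binom{49}{6}}, \qquad k = 0, 1, \dots, 6
$$

For example, with $k = 5$ there are $\binom{6}{5}\binom{43}{1} = 6 \times 43 = 258$ successful tickets, so $P = 258 / 13{,}983{,}816 \approx 0.0018\%$, or about 1 in 54,201.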

In [40]:
def probability_less_6(no):
    # Choose `no` of the 6 winning numbers and the other 6 - no from the 43 non-winning numbers
    success_outcomes = combinations(43, 6 - no) * combinations(6, no)
    total_o = combinations(49, 6)
    p = (success_outcomes / total_o) * 100
    print("The probability of winning {} numbers is {}% OR 1 in {} chances".format(no, round(p,4),round(total_o/success_outcomes)))
    print('---------------------------------------------')

In [41]:
inp = [2,3,4,5]

for i in inp:
    probability_less_6(i)


The probability of winning 2 numbers is 13.2378% OR 1 in 8 chances
---------------------------------------------
The probability of winning 3 numbers is 1.765% OR 1 in 57 chances
---------------------------------------------
The probability of winning 4 numbers is 0.0969% OR 1 in 1032 chances
---------------------------------------------
The probability of winning 5 numbers is 0.0018% OR 1 in 54201 chances
---------------------------------------------
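As a sanity check on the counting argument, the probabilities for every possible number of matches, from 0 to 6, should add up to exactly 1 (this follows from Vandermonde's identity). A minimal sketch with exact fractions; exact_match_probability is a hypothetical name:

```python
from fractions import Fraction
from math import comb

def exact_match_probability(k):
    """Hypothetical helper: exact P(ticket matches exactly k of the 6 drawn numbers)."""
    return Fraction(comb(6, k) * comb(43, 6 - k), comb(49, 6))

# Sum over every possible number of matches, 0 through 6
total = sum(exact_match_probability(k) for k in range(7))
print(total)                       # 1
print(exact_match_probability(6))  # 1/13983816, the big prize
```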


## Next steps ##

**For the first version of the app, we coded four main functions:**

- one_ticket_probability(): calculates the probability of winning the big prize with a single ticket
- check_historical_occurrence(): checks whether a certain combination has occurred in the Canadian lottery data set
- multi_ticket_probability(): calculates the probability for any number of tickets between 1 and 13,983,816
- probability_less_6(): calculates the probability of having exactly two, three, four or five winning numbers

Possible features for a second version of the app include:

- Making the outputs even easier to understand by adding fun analogies. For example, we could find probabilities for unusual events and compare them with the chance of winning the lottery, outputting something along the lines of "You are 100 times more likely to be the victim of a shark attack than to win the lottery".
- Combining one_ticket_probability() and check_historical_occurrence() to output information on probability and historical occurrence at the same time.
- Creating a function similar to probability_less_6() that calculates the probability of having at least two, three, four or five winning numbers. Hint: the number of successful outcomes for having at least four winning numbers is the sum of these three numbers:
  - The number of successful outcomes for having exactly four winning numbers
  - The number of successful outcomes for having exactly five winning numbers
  - The number of successful outcomes for having exactly six winning numbers
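Following the hint above, a minimal sketch of the "at least" calculation sums the exact-match counts from k up to 6. The name probability_at_least is a hypothetical suggestion for the second version, not part of the current app:

```python
from math import comb

def probability_at_least(k):
    """Hypothetical helper: P(ticket matches at least k of the 6 drawn numbers)."""
    total = comb(49, 6)
    # Exactly j matches: choose j of the 6 winners and 6 - j of the 43 non-winners
    successful = sum(comb(6, j) * comb(43, 6 - j) for j in range(k, 7))
    return successful / total

for k in [2, 3, 4, 5]:
    p = probability_at_least(k)
    print(f"At least {k} winning numbers: {p * 100:.4f}% OR 1 in {round(1 / p)} chances")
```

For k = 4 this sums 13,545 + 258 + 1 = 13,804 successful tickets, matching the three terms listed in the hint.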