# Lotto probability of winning

## Introduction

In this project we analyse data from the 6/49 lottery and answer questions like:
- What is the probability of winning if we buy x tickets?
- What is the probability of having at least five, four, etc. winning numbers on a single ticket?

The historical data set can be found on Kaggle: [Link](https://www.kaggle.com/datascienceai/lottery-dataset)

### Summary of results

Winning the big prize in 6/49 Lotto is extremely unlikely even if you buy a lot of tickets: a single ticket has a 0.0000072% chance, and 20 tickets raise it only to 0.0001430%.

## Calculations

We will calculate probabilities using combinations because, in the 6/49 lottery, six numbers are drawn from a set of 49 numbers that range from 1 to 49. Once a number is drawn, it is not put back in the set, and the order in which the numbers are drawn does not matter. Let's create a function to calculate combinations, which we will use throughout.

In [2]:
import pandas as pd

In [4]:
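from math import factorial

def combinations(n,k):
    # Ways to choose k items out of n without replacement, ignoring order
    # (integer division keeps the count an exact int)
    return factorial(n)//(factorial(k)*factorial(n-k))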
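
As a cross-check, the factorial formula can be compared with the standard library's `math.comb`. This is only a sketch: it assumes Python 3.8 or newer (where `math.comb` is available), and `combinations_check` is a hypothetical name used here so that it does not shadow the notebook's own function.

```python
from math import comb, factorial

def combinations_check(n, k):
    # Same factorial formula as in the text, kept as an exact integer
    return factorial(n) // (factorial(k) * factorial(n - k))

total = combinations_check(49, 6)
print(total)                  # 13983816
print(total == comb(49, 6))   # True
```

So there are 13,983,816 possible tickets in a 6/49 draw, which is the denominator used in every probability below.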

We imported the *factorial* function from the *math* library and wrote the *combinations* function ourselves. Let's now create a function that calculates the probability of winning if we buy a single ticket.

In [5]:
def one_ticket_probability(numbers):
    # `numbers` is not used in the calculation: every six-number combination is equally likely
    return 'Your chance of winning is ' + '{:.7f}'.format((1/combinations(49,6))*100) +'%'

In [6]:
one_ticket_probability(3)

'Your chance of winning is 0.0000072%'

Now we know that the chance of winning with one ticket is 0.0000072%. We can also analyse historical data to check whether a user would ever have won with the chosen numbers.

In [3]:
lottery_data = pd.read_csv('data/649.csv')
lottery_data.shape

(3665, 11)

In [8]:
lottery_data

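| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 6/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 6/19/1982 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 6/26/1982 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |
| 3 | 649 | 4 | 0 | 7/3/1982 | 3 | 9 | 10 | 13 | 20 | 43 | 34 |
| 4 | 649 | 5 | 0 | 7/10/1982 | 5 | 14 | 21 | 31 | 34 | 47 | 45 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3660 | 649 | 3587 | 0 | 6/6/2018 | 10 | 15 | 23 | 38 | 40 | 41 | 35 |
| 3661 | 649 | 3588 | 0 | 6/9/2018 | 19 | 25 | 31 | 36 | 46 | 47 | 26 |
| 3662 | 649 | 3589 | 0 | 6/13/2018 | 6 | 22 | 24 | 31 | 32 | 34 | 16 |
| 3663 | 649 | 3590 | 0 | 6/16/2018 | 2 | 15 | 21 | 31 | 38 | 49 | 8 |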


As we can see, the *NUMBER DRAWN 1,2,3,4,5,6* columns tell us which numbers were drawn. We need to combine these columns into a single set per draw and then compare the user's input to the historical results.

In [9]:
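def extract_numbers(row):
    # Columns at positions 4 to 9 hold NUMBER DRAWN 1 to NUMBER DRAWN 6
    return set(row.iloc[4:10].values.tolist())

past_lotto = lottery_data.apply(extract_numbers,axis=1)
# The column at position 3 holds DRAW DATE
lotto = pd.DataFrame({'numbers':past_lotto,'date':lottery_data.iloc[:,3].values})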

In [10]:
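def check_historical_occurence(numbers,hist=lotto):
    # Flag every past draw whose six numbers equal the user's numbers (order is ignored)
    slicer = hist['numbers'].apply(lambda drawn: drawn == set(numbers))
    # Dates of all matching draws; the list is empty if the combination never occurred
    dates = hist[slicer]['date'].tolist()
    return 'Inputted combination occurred %s time(s) on %s' %(len(dates),', '.join(map(str,dates))), one_ticket_probability(numbers)

In [11]:
check_historical_occurence([32, 34, 6, 22, 24, 31])

('Inputted combination occurred 1 time(s) on 6/13/2018',
 'Your chance of winning is 0.0000072%')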
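
The reason for storing each draw as a set can be illustrated with a small sketch. It reuses the numbers of the 6/13/2018 draw from the table above; the variable names `drawn` and `ticket` are only illustrative.

```python
drawn = {6, 22, 24, 31, 32, 34}     # one historical draw stored as a set
ticket = [32, 34, 6, 22, 24, 31]    # the same numbers typed in a different order

print(set(ticket) == drawn)         # True: sets ignore order
print(ticket == sorted(drawn))      # False: lists compare position by position
```

Converting the user's input with `set()` therefore lets a ticket match a draw regardless of the order in which the numbers were entered.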

We now have two functions: *extract_numbers*, which turns the drawn numbers into a set, and *check_historical_occurence*, which compares the user's input against the historical data. Let's now write a function that calculates the probability of winning depending on the number of tickets played.

In [37]:
def multi_ticket_probability(n_tickets):
    total_comb = combinations(49,6)
    chance = '{:.7f}'.format((n_tickets/total_comb)*100)
    return 'Playing %s ticket(s) you have %s' %(n_tickets,chance) +'% ' 'chance to win'

In [41]:
multi_ticket_probability(20)

'Playing 20 ticket(s) you have 0.0001430% chance to win'
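
Dividing the number of tickets by the number of combinations is valid only when every ticket carries a different combination, because then the winning events cannot overlap and their chances add up. The sketch below makes that assumption explicit and returns a number instead of a string. `multi_ticket_chance` is a hypothetical helper, and `math.comb` requires Python 3.8 or newer.

```python
from math import comb

def multi_ticket_chance(n_tickets):
    # Assumes all tickets are different combinations, so the chance of
    # winning is simply the share of all combinations that are covered
    total = comb(49, 6)
    if not 0 <= n_tickets <= total:
        raise ValueError('n_tickets must be between 0 and %d' % total)
    return n_tickets / total

print('{:.7f}%'.format(multi_ticket_chance(20) * 100))   # 0.0001430%
print(multi_ticket_chance(comb(49, 6)))                  # 1.0
```

Only buying every one of the 13,983,816 combinations would guarantee a win under this assumption.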

We have created a function that, given the number of tickets bought by a user, calculates the probability of winning the big prize. Next we are going to create a function to calculate the probability of matching exactly 2, 3, 4 or 5 numbers out of the 6 drawn.

In [182]:
def probability_less_6(shoots):
    outcomes = combinations(6,shoots)
    total_out = outcomes * combinations(43,(6-shoots))
    total_comb = combinations(49,6)
    prob = '{:.7f}'.format((total_out/total_comb)*100)
    
    return 'The probability of having %s winning numbers is %s' %(shoots,prob) +'%'

In [184]:
probability_less_6(2)

'The probability of having 2 winning numbers is 13.2378029%'
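
The calculation in *probability_less_6* can be written as a formula, where $k$ stands for the `shoots` argument: choose $k$ of the 6 winning numbers and the remaining $6-k$ from the 43 numbers that were not drawn.

$$
P(\text{exactly } k) = \frac{\binom{6}{k}\binom{43}{6-k}}{\binom{49}{6}}
$$

For $k = 2$ this gives

$$
P(\text{exactly } 2) = \frac{15 \times 123410}{13983816} = \frac{1851150}{13983816} \approx 0.1324,
$$

which agrees with the 13.2378029% printed above.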

Now we can calculate the probability of having exactly two, three, four or five winning numbers. Let's add one more function to calculate the probability of having at least two, three, four or five winning numbers.

In [185]:
def probability_at_least_n(shoots):
    total_comb = combinations(49,6)
    # Add up the successful outcomes for exactly shoots, shoots+1, ..., 6 winning numbers
    total_out = sum(combinations(6,k) * combinations(43,(6-k)) for k in range(shoots,7))
    prob = '{:.7f}'.format((total_out/total_comb)*100)

    return 'The probability of having at least %s winning numbers is %s' %(shoots,prob) +'%'


## Conclusion 

Winning the big prize in 6/49 Lotto is extremely unlikely, even if you buy a lot of tickets.

### Next steps

Check *probability_at_least_n()* against the definition of "at least": as the counterpart of *probability_less_6()*, it should add up the successful outcomes for every match count from the requested one up to six.
For example, the number of successful outcomes for having at least four winning numbers is the sum of these three numbers:
- The number of successful outcomes for having four winning numbers exactly
- The number of successful outcomes for having five winning numbers exactly
- The number of successful outcomes for having six winning numbers exactly
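
The three counts listed above can be worked out with a short sketch. `exact_outcomes` and `at_least_outcomes` are hypothetical helper names, and `math.comb` assumes Python 3.8 or newer.

```python
from math import comb

def exact_outcomes(k):
    # Tickets matching exactly k of the 6 drawn numbers: choose k of the
    # 6 winning numbers and the other 6 - k from the 43 numbers not drawn
    return comb(6, k) * comb(43, 6 - k)

def at_least_outcomes(n):
    # Add up the counts for exactly n, n + 1, ..., 6 matches
    return sum(exact_outcomes(k) for k in range(n, 7))

parts = [exact_outcomes(k) for k in (4, 5, 6)]
print(parts)                  # [13545, 258, 1]
print(at_least_outcomes(4))   # 13804
print(at_least_outcomes(4) / comb(49, 6) * 100)   # percentage, roughly 0.0987
```

Out of 13,983,816 possible tickets, 13,804 match at least four numbers, so the chance is a little under 0.1%.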