# A Mobile App For Lottery Addiction
In this project we will be looking at the [6/49 Lottery](https://en.wikipedia.org/wiki/Lotto_6/49), a lottery held in Canada.

As in many lotteries, matching all 6 numbers wins the jackpot (the big prize).

Smaller prizes are also on offer for matching two to five of the winning numbers.

We will be creating different functions for a Lottery Addiction app to do the following:
- Calculate the probability of winning the big prize with a single ticket
- Check whether a certain combination has occurred in the Canadian lottery data set
- Calculate how likely a given combination is to come up within a given number of draws
- Calculate the probability of winning the big prize for any number of tickets between 1 and 13,983,816
- Calculate the probability of having two, three, four or five winning numbers

### Possible Outcomes Calculators
A key part of calculating probabilities is counting the number of possible outcomes. Here we will construct two functions:

- factorial(): for a whole number n, calculate $n! = n \times (n-1) \times (n-2) \times \dots \times 1$, with $0! = 1$.

  e.g. $6! = 6 \times 5 \times 4 \times 3 \times 2 \times 1 = 720$

- combinations(): for choosing k objects from a total of n objects without replacement (once a number is picked it cannot be picked again) and where order does not matter, calculate $\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$.

  e.g. choosing 4 numbers out of 6: $\binom{6}{4} = \frac{6!}{4!\,(6-4)!} = \frac{720}{24 \times 2} = 15$

In [1]:
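def factorial(n):
    # n! = n * (n-1) * ... * 1, with 0! = 1 as the base case
    if n < 1:
        return 1
    return n * factorial(n - 1)

def combinations(n, k):
    # Number of ways to choose k items from n, ignoring order: n! / (k! (n-k)!)
    numerator = factorial(n)
    denominator = factorial(k) * factorial(n - k)
    # Integer division keeps the count an exact whole number
    return numerator // denominator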
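Python 3.8 and later ship `math.factorial` and `math.comb`, so one way to sanity-check the hand-written functions is to compare them with the standard library. This sketch re-declares the two functions (using integer division so counts stay exact) so that it runs on its own:

```python
import math

def factorial(n):
    # Recursive factorial, mirroring the notebook's version
    return 1 if n < 1 else n * factorial(n - 1)

def combinations(n, k):
    # Integer division keeps the count exact
    return factorial(n) // (factorial(k) * factorial(n - k))

# Cross-check against the standard library (math.comb needs Python 3.8+)
for n, k in [(6, 4), (49, 6), (43, 4), (6, 0)]:
    assert combinations(n, k) == math.comb(n, k)

print(factorial(6))         # 720
print(combinations(6, 4))   # 15
print(combinations(49, 6))  # 13983816
```

In practice `math.comb` could replace the hand-written `combinations()` entirely; the recursive version is kept in the notebook for teaching purposes.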

### The Elusive Single Ticket Big Prize Win

In the 6/49 lottery, six numbers are drawn from the numbers 1 to 49.

A player wins the big prize if all six of their numbers match the six drawn.

Let's begin by calculating the probability of winning the big prize with just one ticket.
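Because exactly one of the $\binom{49}{6}$ equally likely combinations wins, the single-ticket probability is $1/\binom{49}{6}$. A minimal sketch using `fractions.Fraction` keeps the value exact before converting it to a percentage:

```python
from fractions import Fraction
from math import comb

# Every 6-number combination is equally likely, and exactly one wins the jackpot
total_outcomes = comb(49, 6)
p_jackpot = Fraction(1, total_outcomes)

print(p_jackpot)                                 # 1/13983816
print("{:.7f}%".format(float(p_jackpot) * 100))  # 0.0000072%
```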

In [2]:
total_possible_outcomes = combinations(49, 6)
total_possible_outcomes_million = total_possible_outcomes / 1000000

def one_ticket_probability(ticket_list):
    # A ticket must contain exactly six numbers...
    if len(ticket_list) != 6:
        print("Please select 6 numbers")
        return
    # ...with no number repeated...
    if len(set(ticket_list)) != 6:
        print("Cannot select same numbers more than once")
        return
    # ...and every number between 1 and 49
    if not all(1 <= number <= 49 for number in ticket_list):
        print("Numbers must be between 1 and 49")
        return
    # Only one of the possible combinations wins the big prize
    probability_of_one_ticket = 1 / total_possible_outcomes
    percent_probability = probability_of_one_ticket * 100
    print("With these numbers", ticket_list, "you have a {:.7f}% chance of winning.".format(percent_probability))
    print("There are {:.2f} million possible ticket combinations".format(total_possible_outcomes_million))

In [3]:
one_ticket_probability([1,2,3,4,5,6,7])
print('\n')
one_ticket_probability([1,2,3,4,5,5])
print('\n')
one_ticket_probability([49,2,48,4,47,6])
print('\n')
one_ticket_probability([1,2,3,4,5,6])

Please select 6 numbers


Cannot select same numbers more than once


With these numbers [49, 2, 48, 4, 47, 6] you have a 0.0000072% chance of winning.
There are 13.98 million possible ticket combinations


With these numbers [1, 2, 3, 4, 5, 6] you have a 0.0000072% chance of winning.
There are 13.98 million possible ticket combinations


Here we have created a simple program that allows a player to enter their numbers and calculate their chances of winning. 

We have also added some validation in case too few, too many or duplicate numbers are selected.
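The checks in the function above only print a message. If the app needs to reject bad input programmatically, one option is a validator that raises `ValueError` and also rejects numbers outside 1 to 49. `validate_ticket` is a hypothetical helper name, not part of the original notebook:

```python
def validate_ticket(numbers, low=1, high=49, size=6):
    """Hypothetical helper: return the ticket as a sorted tuple or raise ValueError."""
    numbers = list(numbers)
    if len(numbers) != size:
        raise ValueError(f"Expected {size} numbers, got {len(numbers)}")
    if len(set(numbers)) != size:
        raise ValueError("Numbers must not repeat")
    if any(not isinstance(n, int) or not low <= n <= high for n in numbers):
        raise ValueError(f"Numbers must be whole numbers from {low} to {high}")
    return tuple(sorted(numbers))

print(validate_ticket([49, 2, 48, 4, 47, 6]))  # (2, 4, 6, 47, 48, 49)
try:
    validate_ticket([1, 2, 3, 4, 5, 50])
except ValueError as err:
    print(err)  # Numbers must be whole numbers from 1 to 49
```

Raising an exception lets the app's interface decide how to show the error, instead of the calculation function printing it directly.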

### Testing The Real Data Set
This [dataset](https://www.kaggle.com/datascienceai/lottery-dataset) contains 3665 drawings (lottery events) from 1982 to 2018.

Let's preview the data.

In [4]:
import pandas as pd
lottery = pd.read_csv('649.csv')
print(lottery.sample(5))
print("The dataset has", lottery.shape[0], "rows and", lottery.shape[1], "columns.")

      PRODUCT  DRAW NUMBER  SEQUENCE NUMBER   DRAW DATE  NUMBER DRAWN 1  \
2927      649         2919                0   1/11/2012               9   
2941      649         2933                0   2/29/2012               5   
1440      649         1441                0  11/12/1997               4   
1076      649         1077                0   5/18/1994               1   
177       649          178                0   10/5/1985               4   

      NUMBER DRAWN 2  NUMBER DRAWN 3  NUMBER DRAWN 4  NUMBER DRAWN 5  \
2927              10              11              18              39   
2941               6               8               9              11   
1440               5              14              30              32   
1076               4               8              30              31   
177                6              22              27              31   

      NUMBER DRAWN 6  BONUS NUMBER  
2927              45            23  
2941              40            28  
1440 

In [5]:
lottery.isnull().sum()

PRODUCT            0
DRAW NUMBER        0
SEQUENCE NUMBER    0
DRAW DATE          0
NUMBER DRAWN 1     0
NUMBER DRAWN 2     0
NUMBER DRAWN 3     0
NUMBER DRAWN 4     0
NUMBER DRAWN 5     0
NUMBER DRAWN 6     0
BONUS NUMBER       0
dtype: int64

The dataset contains the draw numbers, draw dates and the 6 numbers drawn (plus 1 extra bonus number).

Helpfully, there are no null (missing) values.

### Historical Winning Numbers
It may also interest our players whether their numbers have been a winning combination in the past.

Note that past draws do not influence future ones; this feature is just for fun.

In [6]:
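def extract_numbers(row):
    # Positions 4 to 9 hold NUMBER DRAWN 1 to NUMBER DRAWN 6
    numbers = row.iloc[4:10]
    # Sort so that the same six numbers always produce the same tuple
    return tuple(sorted(int(n) for n in numbers))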

In [7]:
lottery['winning_numbers'] = lottery.apply(extract_numbers, axis='columns')
# axis='columns' passes each row (one draw) to extract_numbers
lottery['winning_numbers'].sample(5)

3353     (7, 15, 27, 31, 35, 44)
733      (5, 10, 16, 20, 29, 42)
120     (21, 22, 29, 36, 38, 41)
3394      (3, 8, 25, 27, 36, 38)
51       (4, 14, 24, 31, 33, 35)
Name: winning_numbers, dtype: object

In [8]:
lottery['winning_numbers'].value_counts(sort = True).head()

(5, 9, 23, 30, 34, 36)     1
(6, 15, 23, 30, 34, 43)    1
(3, 8, 9, 33, 39, 49)      1
(9, 21, 22, 29, 48, 49)    1
(6, 10, 20, 30, 31, 39)    1
Name: winning_numbers, dtype: int64
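Reading the top of `value_counts` works, but `Series.duplicated` tests for repeated combinations directly. The sketch below also sorts the six `NUMBER DRAWN` columns with NumPy instead of calling a Python function per row. The first two rows are copied from the preview above; the third row is an artificial duplicate of the first, added only to show the check firing:

```python
import numpy as np
import pandas as pd

number_cols = [f"NUMBER DRAWN {i}" for i in range(1, 7)]

# Two rows from the dataset preview, plus one ARTIFICIAL duplicate of the first
sample = pd.DataFrame(
    [[9, 10, 11, 18, 39, 45], [5, 6, 8, 9, 11, 40], [45, 39, 18, 11, 10, 9]],
    columns=number_cols,
)

# Sort each row's six numbers, then store each row as a tuple
sorted_numbers = np.sort(sample[number_cols].to_numpy(), axis=1)
sample["winning_numbers"] = pd.Series(
    [tuple(row) for row in sorted_numbers.tolist()], index=sample.index
)

# True for every row whose combination already appeared in an earlier row
print(sample["winning_numbers"].duplicated().tolist())  # [False, False, True]
print(sample["winning_numbers"].duplicated().any())     # True
```

On the real data, `lottery['winning_numbers'].duplicated().any()` returning `False` would confirm the observation below in a single line.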

Interestingly, across the 3,665 draws in the dataset, no combination of winning numbers has ever been repeated.

What are the odds of this happening?

In [9]:
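def likelihood_of_repeat_combination(number_of_draws):
    # (1 - 1/N) ** n is the probability that ONE specific combination (e.g. a
    # player's favourite numbers) is never drawn in n draws, where N = C(49, 6).
    # It is not the probability that ANY two draws match (the birthday problem).
    odds = (1 - (1 / combinations(49, 6))) ** number_of_draws
    print("The percent likelihood of a given combination never being drawn is {} %".format(odds * 100))
    print("\n")
    print("The percent likelihood of a given combination being drawn at least once is {} %".format((1 - odds) * 100))

In [10]:
likelihood_of_repeat_combination(3665)

The percent likelihood of a given combination never being drawn is 99.97379456442258 %


The percent likelihood of a given combination being drawn at least once is 0.026205435577431047 %


With ~14 million possible combinations and only ~3,700 draws, it is hardly surprising that the chance of any one particular combination having come up is so low. The chance that *some* pair of draws share a combination is considerably higher, for the same reason that two people in a modest-sized group often share a birthday.

In [11]:
likelihood_of_repeat_combination(9700000)

The percent likelihood of a given combination never being drawn is 49.97441372465429 %


The percent likelihood of a given combination being drawn at least once is 50.02558627534571 %


It would take roughly 9.7 million draws for a given combination to have a ~50:50 chance of being drawn.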
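The $(1 - 1/N)^n$ formula above answers the question for one particular combination. The chance that *any* two of n draws share a combination is a birthday-problem calculation: multiply the chances that each new draw avoids all earlier ones, $\prod_{i=0}^{n-1}(1 - i/N)$ with $N = 13{,}983{,}816$. A minimal sketch (the function names are hypothetical, not from the original notebook):

```python
from math import comb

N = comb(49, 6)  # 13,983,816 possible combinations

def prob_any_repeat(number_of_draws, outcomes=N):
    """Probability that at least two of the draws share the same combination."""
    p_all_distinct = 1.0
    for i in range(number_of_draws):
        # Draw i+1 must avoid the i combinations already drawn
        p_all_distinct *= 1 - i / outcomes
    return 1 - p_all_distinct

def draws_for_even_odds(outcomes=N):
    """Smallest number of draws giving at least a 50% chance of some repeat."""
    p_all_distinct, n = 1.0, 0
    while 1 - p_all_distinct < 0.5:
        p_all_distinct *= 1 - n / outcomes
        n += 1
    return n

print("{:.1%}".format(prob_any_repeat(3665)))
print(draws_for_even_odds())
```

Under this model, 3,665 draws give roughly a 38% chance of at least one repeated combination, and even odds are reached after only about 4,400 draws, so seeing no repeat so far is still unremarkable.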

Now let's create a function that allows a player to check whether their chosen numbers have won in the past.

In [12]:
def check_historical_occurence(player_numbers, historical_data):
    winners = historical_data['winning_numbers']
    player_numbers = tuple(sorted(player_numbers))
    # Compare every historical draw (not just the first) with the player's numbers
    matches = winners.apply(lambda numbers: numbers == player_numbers)
    draw_dates = historical_data.loc[matches, 'DRAW DATE']
    if draw_dates.empty:
        return "No previous winners with these numbers"
    return "These were winning numbers on {}".format(", ".join(draw_dates))

In [13]:
player_1_numbers = [3, 11, 12, 14, 41, 43]
print(player_1_numbers, "\n", check_historical_occurence(player_1_numbers, lottery))
print("\n")
player_2_numbers = [1, 2, 3, 4, 5, 6]
print(player_2_numbers, "\n", check_historical_occurence(player_2_numbers, lottery))

[3, 11, 12, 14, 41, 43] 
 These were winning numbers on 6/12/1982


[1, 2, 3, 4, 5, 6] 
 No previous winners with these numbers
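Scanning every row is fine for 3,665 draws. If the app has to answer many lookups, one option is to build a dictionary keyed on the sorted combination once, so each check becomes a single lookup and a repeated winner would list every date. A minimal pure-Python sketch, using the two fully shown rows from the dataset preview as stand-in data (`build_index` and `check_numbers` are hypothetical names):

```python
from collections import defaultdict

def build_index(draws):
    """Map each sorted 6-number combination to the list of dates it was drawn."""
    index = defaultdict(list)
    for date, numbers in draws:
        index[tuple(sorted(numbers))].append(date)
    return index

def check_numbers(player_numbers, index):
    # .get avoids inserting empty entries into the defaultdict
    dates = index.get(tuple(sorted(player_numbers)), [])
    if dates:
        return "These were winning numbers on {}".format(", ".join(dates))
    return "No previous winners with these numbers"

# Stand-in data: two draws copied from the dataset preview
draws = [
    ("1/11/2012", [9, 10, 11, 18, 39, 45]),
    ("2/29/2012", [5, 6, 8, 9, 11, 40]),
]
index = build_index(draws)
print(check_numbers([45, 39, 18, 11, 10, 9], index))  # These were winning numbers on 1/11/2012
print(check_numbers([1, 2, 3, 4, 5, 6], index))        # No previous winners with these numbers
```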


### Do Multiple Tickets Mean Better Odds?
More tickets obviously mean better odds. What a player may want to know is how much better: what are my odds if I buy x tickets?

In [14]:
def multi_ticket_probability(ticket_number):
    no_of_outcomes = combinations(49, 6)
    # Valid range: 1 ticket up to one ticket per possible combination
    if not 1 <= ticket_number <= no_of_outcomes:
        print("Please choose between 1 and {:,} tickets".format(int(no_of_outcomes)))
        return
    # Each distinct ticket covers one combination, so the chance grows linearly
    percent_chance = ticket_number / no_of_outcomes * 100
    print("With {:,} ticket(s) you have a {:.7f}% chance of winning.".format(ticket_number, percent_chance))

In [15]:
multi_ticket_probability(1)
print("\n")
multi_ticket_probability(10)
print("\n")
multi_ticket_probability(100)
print("\n")
multi_ticket_probability(1000)
print("\n")
multi_ticket_probability(10000)
print("\n")
multi_ticket_probability(100000)

With 1 ticket(s) you have a 0.0000072% chance of winning.


With 10 ticket(s) you have a 0.0000715% chance of winning.


With 100 ticket(s) you have a 0.0007151% chance of winning.


With 1,000 ticket(s) you have a 0.0071511% chance of winning.


With 10,000 ticket(s) you have a 0.0715112% chance of winning.


With 100,000 ticket(s) you have a 0.7151124% chance of winning.


These are not great odds.
Even buying 100 thousand tickets would give you less than a 1% chance (about 0.72%) of winning.
Since each ticket costs 3 Canadian dollars (~£1.70), those 100,000 tickets would cost 300,000 CAD: you could lose a lot of money.
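To put numbers on that, here is a small sketch tabulating cost against jackpot probability. It assumes the 3 CAD ticket price quoted above and that every ticket carries a different combination (a duplicate ticket adds cost but no extra chance of winning the jackpot):

```python
from math import comb

TICKET_PRICE_CAD = 3  # price quoted in the text above
TOTAL = comb(49, 6)   # 13,983,816 combinations

for tickets in [1, 100, 100_000, TOTAL]:
    # Distinct tickets cover distinct combinations, so probability grows linearly
    probability = tickets / TOTAL
    cost = tickets * TICKET_PRICE_CAD
    print(f"{tickets:>12,} tickets  cost {cost:>12,} CAD  chance {probability:.5%}")
```

Under these assumptions, guaranteeing the jackpot by covering every combination would cost 41,951,448 CAD.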

### Smaller Prizes, Better Odds
In addition to the big prize, and like most lotteries, 6/49 offers extra prizes for matching two, three, four or five of the winning numbers.

We will show a player the odds of winning a lower prize in our app.
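Written out, the counting described in the comments of the next cell is a hypergeometric probability: choose k of the 6 winning numbers and the remaining 6 - k from the 43 non-winning numbers, out of all $\binom{49}{6}$ possible tickets.

$$
P(\text{exactly } k \text{ matches}) = \frac{\binom{6}{k}\binom{43}{6-k}}{\binom{49}{6}}, \qquad k = 0, 1, \ldots, 6
$$

For example, $k = 3$ gives $\frac{20 \times 12{,}341}{13{,}983{,}816} = \frac{246{,}820}{13{,}983{,}816} \approx 1.765\%$.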

In [16]:
def probability_less_than_6(no_of_winning_numbers):
    if (no_of_winning_numbers < 2) or (no_of_winning_numbers >= 6):
        print("Please select 2, 3, 4, or 5 winning numbers")
        return
    # There are 6 winning numbers. To match exactly k of them, a ticket must
    # contain k of the 6 winning numbers AND (6 - k) of the 43 non-winning numbers,
    # giving combinations(6, k) x combinations(43, 6 - k) winning tickets.
    ticket_combinations = combinations(6, no_of_winning_numbers)
    remaining_combinations = combinations(43, 6 - no_of_winning_numbers)
    successful_outcomes = ticket_combinations * remaining_combinations
    possible_outcomes = combinations(49, 6)
    probability = successful_outcomes / possible_outcomes
    percent_prob = probability * 100
    # Express the odds as "1 in X", rounded to the nearest whole number
    simplified = round(possible_outcomes / successful_outcomes)
    print("The odds of getting {} number of winning numbers exactly is a {:.5f}% chance. \nThis is a 1 in {:,} chance of winning".format(no_of_winning_numbers, percent_prob, int(simplified)))

In [17]:
probability_less_than_6(2)

The odds of getting 2 number of winning numbers exactly is a 13.23780% chance. 
This is a 1 in 8 chance of winning


In [18]:
probability_less_than_6(3)

The odds of getting 3 number of winning numbers exactly is a 1.76504% chance. 
This is a 1 in 57 chance of winning


In [19]:
probability_less_than_6(4)

The odds of getting 4 number of winning numbers exactly is a 0.09686% chance. 
This is a 1 in 1,032 chance of winning
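The same formula can produce the whole distribution, including k = 5 (not run above) and k = 0 or 1. A sketch using `math.comb` and `fractions.Fraction`; the exact probabilities sum to 1, which is a handy sanity check on the counting:

```python
from fractions import Fraction
from math import comb

TOTAL = comb(49, 6)

def match_probability(k):
    """Exact probability that a ticket matches exactly k of the 6 drawn numbers."""
    return Fraction(comb(6, k) * comb(43, 6 - k), TOTAL)

distribution = {k: match_probability(k) for k in range(7)}
for k, p in distribution.items():
    print(f"{k} matches: {float(p):.5%}  (about 1 in {round(1 / p):,})")

# Every ticket matches some number of winning numbers from 0 to 6
assert sum(distribution.values()) == 1
```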


### Conclusion
We have now created five fun and informative functions for the mobile app.

Future additions could include comparing these probabilities with other unlikely events (e.g. being struck by lightning) or