# Advising Lottery Estimations
Playing the lottery is usually a harmless pastime: a bit of fun and a hope for a better life. For some people, though, the habit turns into an addiction. Addicts waste their money and may turn to desperate behaviors, like theft, to find more money to bet.

Our goal is to help a medical institute build a dedicated mobile app that helps lottery addicts better estimate their chances of winning. The institute has a dedicated team of engineers to build the app, but we are responsible for the core probability logic that the app's functionality depends on.

For the first version of the app, we will focus on the [6/49 lottery](https://en.wikipedia.org/wiki/Lotto_6/49) and build functions that allow users to answer questions like:
- What is the probability of winning the big prize with a single ticket?
- What is the probability of winning the big prize if we play 40 different tickets (or any other number)?
- What is the probability of having at least five (or four, or three, or two) winning numbers on a single ticket?

The institute also wants us to consider historical data for this lottery. Our dataset is a [Kaggle](https://www.kaggle.com/datasets/datascienceai/lottery-dataset) resource covering drawings from 1982 to 2018.

### Defining Functions

Throughout this project, we'll need to calculate probabilities and combinations repeatedly. So let's start by coding two essential functions:
- A function that calculates factorials; and
- A function that calculates combinations.

In [2]:
def factorial(n):  # Same as the mathematical definition, e.g. 5! = 5 x 4 x 3 x 2 x 1
    result = 1  # Separate name so we don't shadow the function itself
    for i in range(1, n + 1):
        result *= i
    return result

def combinations(n, k):  # Combination formula: n! / (k! * (n-k)!)
    numerator = factorial(n)
    denominator = factorial(k) * factorial(n - k)
    return numerator // denominator  # The division is always exact, so keep the result an integer
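As a quick sanity check, the values produced by the two helpers above can be compared against the standard library. This is only a sketch and assumes Python 3.8 or later, where `math.comb` is available; `TOTAL_OUTCOMES` is a name introduced here for illustration.

```python
import math

# Reference values from the standard library; factorial(5) and
# combinations(49, 6) from the cell above should agree with these.
print(math.factorial(5))  # 120

TOTAL_OUTCOMES = math.comb(49, 6)
print(f"{TOTAL_OUTCOMES:,}")  # 13,983,816
```

The second value is the number of distinct six-number tickets, which is the denominator used throughout the rest of the project.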

### Winning the Big Prize

In this lottery, six numbers are drawn from a set of 49 numbers ranging from 1 to 49. A player wins the big prize if the six numbers on their ticket match all six numbers drawn. For the first version of this app, we want players to be able to calculate the probability of winning the big prize with the numbers they play on a single ticket.

The institute's engineering team told us to be aware of the following details:
- Inside the app, the user inputs six different numbers from 1 to 49.
- Under the hood, the six numbers will come as a Python list, which will serve as the single input to our function.
- The engineering team wants the function to print the probability value in a friendly way — in a way that people without any probability training are able to understand.

In [3]:
def one_ticket_probability(numbers):
    possible_outcomes = combinations(49,6)
    success_prob = (1/possible_outcomes)
    print('''Your numbers, {}, have a {:.8%} chance of winning the lottery. You have a 1 in {:,} chance of winning'''.format(numbers, success_prob,
                                                  int(possible_outcomes)))

We can test the function with two inputs:

In [4]:
array = [1,2,3,4,5,6]
one_ticket_probability(array)

Your numbers, [1, 2, 3, 4, 5, 6], have a 0.00000715% chance of winning the lottery. You have a 1 in 13,983,816 chance of winning


In [5]:
array = [40,32,15,27,2]
one_ticket_probability(array)

Your numbers, [40, 32, 15, 27, 2], have a 0.00000715% chance of winning the lottery. You have a 1 in 13,983,816 chance of winning
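Note that `one_ticket_probability()` prints the same result for any input, including the five-number list used in the second test. If the app needs to reject malformed tickets before reporting a probability, a check along these lines could run first. `validate_ticket` is a hypothetical helper and is not part of the engineering team's specification.

```python
def validate_ticket(numbers):
    """Return True if `numbers` holds six distinct integers from 1 to 49."""
    return (
        len(numbers) == 6
        and len(set(numbers)) == 6  # no repeated numbers
        and all(isinstance(n, int) and 1 <= n <= 49 for n in numbers)
    )

print(validate_ticket([1, 2, 3, 4, 5, 6]))   # True
print(validate_ticket([40, 32, 15, 27, 2]))  # False: only five numbers
```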


### Scoping out our Historical Data
For the first version of the app, users should be able to compare their ticket against the historical lottery data in Canada and determine whether they would have ever won by now. 

We'll be using the following [dataset](https://www.kaggle.com/datasets/datascienceai/lottery-dataset) which contains 3,665 rows dating from 1982 to 2018. 

For each row (or drawing), the six numbers drawn are stored in these columns:
- NUMBER DRAWN 1
- NUMBER DRAWN 2
- NUMBER DRAWN 3
- NUMBER DRAWN 4
- NUMBER DRAWN 5
- NUMBER DRAWN 6

In [6]:
import pandas as pd
history = pd.read_csv('649.csv')
print(history.shape,'\nTop 3 rows of dataset:')
pd.DataFrame(history.head(3))

(3665, 11) 
Top 3 rows of dataset:


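| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 6/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 6/19/1982 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 6/26/1982 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |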


In [7]:
print('**Bottom 3 rows of dataset:**')
pd.DataFrame(history.tail(3))

**Bottom 3 rows of dataset:**


| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3662 | 649 | 3589 | 0 | 6/13/2018 | 6 | 22 | 24 | 31 | 32 | 34 | 16 |
| 3663 | 649 | 3590 | 0 | 6/16/2018 | 2 | 15 | 21 | 31 | 38 | 49 | 8 |
| 3664 | 649 | 3591 | 0 | 6/20/2018 | 14 | 24 | 31 | 35 | 37 | 48 | 17 |


The dataset has 3,665 rows and 11 columns, ordered by draw date. It consists of the six number-drawn columns, a bonus number column, and a few identifying columns.

### Comparison Function
We will now write a function enabling users to compare their ticket against the historical lottery data in Canada and determine whether they would have ever won by now. 

We've been told to be aware of the following details:
- Inside the app, the user inputs six different numbers from 1 to 49.
- Under the hood, the six numbers will come as a Python list and serve as an input to our function.
- The engineering team wants us to write a function that prints:
    - the number of times the combination selected occurred in the Canada data set; and
    - the probability of winning the big prize in the next drawing with that combination.

In [8]:
def extract_numbers(row):
    row = row.iloc[4:10]  # Select the six NUMBER DRAWN columns by position (4 to 9)
    row = set(row)        # Convert to a set so the order of the numbers doesn't matter
    return row

In [9]:
Winning_Numbers = history.apply(extract_numbers, axis = 1)
Winning_Numbers.head()

0    {3, 41, 11, 12, 43, 14}
1    {33, 36, 37, 39, 8, 41}
2     {1, 6, 39, 23, 24, 27}
3     {3, 9, 10, 43, 13, 20}
4    {34, 5, 14, 47, 21, 31}
dtype: object
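Selecting the six columns by position works for this file, but it would break silently if the column order ever changed. A minimal sketch of the same extraction by column name is shown below. It uses only the standard library and the three drawings from the preview above, so it runs without `649.csv`; `SAMPLE_CSV`, `NUMBER_COLUMNS`, and `extract_numbers_by_name` are names introduced here for illustration.

```python
import csv
import io

# First three drawings, copied from the dataset preview (index column omitted).
SAMPLE_CSV = """PRODUCT,DRAW NUMBER,SEQUENCE NUMBER,DRAW DATE,NUMBER DRAWN 1,NUMBER DRAWN 2,NUMBER DRAWN 3,NUMBER DRAWN 4,NUMBER DRAWN 5,NUMBER DRAWN 6,BONUS NUMBER
649,1,0,6/12/1982,3,11,12,14,41,43,13
649,2,0,6/19/1982,8,33,36,37,39,41,9
649,3,0,6/26/1982,1,6,23,24,27,39,34
"""

NUMBER_COLUMNS = [f"NUMBER DRAWN {i}" for i in range(1, 7)]

def extract_numbers_by_name(row):
    """Build the set of six drawn numbers from a dict-like row."""
    return {int(row[col]) for col in NUMBER_COLUMNS}

reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
sample_draws = [extract_numbers_by_name(row) for row in reader]
print(sample_draws[0] == {3, 11, 12, 14, 41, 43})  # True
```

The bonus number is deliberately left out, matching the positional version above.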

In [10]:
def check_historical_occurence(user_numbers, winning_numbers):
    user_numbers = set(user_numbers)
    bool_check = user_numbers == winning_numbers
    num_of_times = bool_check.sum()
    
    if num_of_times == 0:
        print('''Your numbers, {}, have never appeared in the history of this lottery event. 
Attempting to win the lottery comes with a 1 in 13,983,816, or 0.00000715%, chance of winning.'''.format(user_numbers))
    else:
        print('''The numbers {} have appeared {} times since the inception of this lottery.
Your chances to win with these numbers is 1 in 13,983,816, or 0.00000715%.'''.format(user_numbers,
                                                                                            num_of_times))

In [11]:
check_historical_occurence([3, 41, 11, 12, 43, 14], Winning_Numbers)

The numbers {3, 41, 11, 12, 43, 14} have appeared 1 times since the inception of this lottery.
Your chances to win with these numbers is 1 in 13,983,816, or 0.00000715%.


In [12]:
check_historical_occurence([1,2,3,4,5,6], Winning_Numbers)

Your numbers, {1, 2, 3, 4, 5, 6}, have never appeared in the history of this lottery event. 
Attempting to win the lottery comes with a 1 in 13,983,816, or 0.00000715%, chance of winning.


With this function we can test a user's ticket against the historical winning draws. It tells them whether their combination has ever matched, and either way it reminds them that the chance of winning the next drawing is still only 1 in 13,983,816.
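If the engineering team would rather receive a number than a printed message, the lookup can also be sketched in plain Python. The example below is an assumption about how that might look, not the project's implementation: it stores each drawing as a `frozenset` (which is hashable) and counts occurrences with `collections.Counter`. Only three drawings from the dataset preview are included, and `count_historical_occurrences` is a hypothetical name.

```python
from collections import Counter

# Three drawings copied from the dataset preview; a real run would use every row.
sample_draws = [
    {3, 11, 12, 14, 41, 43},
    {8, 33, 36, 37, 39, 41},
    {1, 6, 23, 24, 27, 39},
]

# frozenset is hashable, so each combination can serve as a dictionary key.
draw_counts = Counter(frozenset(draw) for draw in sample_draws)

def count_historical_occurrences(user_numbers, counts):
    """Return how many past drawings match the ticket exactly."""
    return counts[frozenset(user_numbers)]  # Counter returns 0 for unseen keys

print(count_historical_occurrences([3, 41, 11, 12, 43, 14], draw_counts))  # 1
print(count_historical_occurrences([1, 2, 3, 4, 5, 6], draw_counts))       # 0
```

Building the counter once makes every later lookup a single dictionary access, and the order of the user's numbers does not matter.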

### Multiple Ticket Probabilities
Lottery addicts usually play more than one ticket on a single drawing. To help them better estimate their chances of winning, we'll now write a function that calculates the probability of winning for any number of different tickets.

We've been given the following information:
- The user will input the number of different tickets they want to play (without inputting the specific combinations they intend to play).
- Our function will see an integer between 1 and 13,983,816 (the maximum number of different tickets).
- The function should print information about the probability of winning the big prize depending on the number of different tickets played.

In [42]:
def multi_ticket_probability(tickets):
    possible_outcomes = combinations(49,6) # All possible outcomes
    probability = tickets / possible_outcomes #Probability
    print('''The number of tickets you have: {}, will give you a {:.8%} chance of winning'''.format(tickets, probability))

In [54]:
test_input = [1,10,100,10000,1000000,6991908,13983816]

for i in test_input:
    multi_ticket_probability(i)
    if i != 13983816:
        print('-' * 16)

The number of tickets you have: 1, will give you a 0.00000715% chance of winning
----------------
The number of tickets you have: 10, will give you a 0.00007151% chance of winning
----------------
The number of tickets you have: 100, will give you a 0.00071511% chance of winning
----------------
The number of tickets you have: 10000, will give you a 0.07151124% chance of winning
----------------
The number of tickets you have: 1000000, will give you a 7.15112384% chance of winning
----------------
The number of tickets you have: 6991908, will give you a 50.00000000% chance of winning
----------------
The number of tickets you have: 13983816, will give you a 100.00000000% chance of winning


This function lets users see how their chances of winning change with the number of tickets they buy. Even with 10,000 different tickets, the chance of winning is only about 0.07%, which may dissuade them from playing.
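The calculation behind these figures is simply the number of different tickets divided by the number of possible outcomes. The sketch below shows one way to express that with exact fractions and an explicit range check. `multi_ticket_fraction` and `TOTAL_OUTCOMES` are hypothetical names, and `math.comb` assumes Python 3.8 or later.

```python
from fractions import Fraction
from math import comb

TOTAL_OUTCOMES = comb(49, 6)  # 13,983,816 possible tickets

def multi_ticket_fraction(tickets):
    """Exact probability of winning the big prize with `tickets` different tickets."""
    if not 1 <= tickets <= TOTAL_OUTCOMES:
        raise ValueError(f"tickets must be between 1 and {TOTAL_OUTCOMES:,}")
    return Fraction(tickets, TOTAL_OUTCOMES)

print(multi_ticket_fraction(6991908))         # 1/2
print(float(multi_ticket_fraction(1000000)))  # roughly 0.0715, i.e. about 7.15%
```

Working with `Fraction` avoids floating-point rounding, which makes statements like "half of all tickets gives exactly a 50% chance" easy to verify.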

### Smaller Prizes
In most 6/49 lotteries, there are smaller prizes if a player's ticket matches two, three, four, or five of the six numbers drawn. Users may be interested in knowing the probability of matching that many numbers.

The details for this function are:
- Inside the app, the user inputs:
    - six different numbers from 1 to 49; and
    - an integer between 2 and 5 that represents the number of winning numbers expected
- Our function prints information about the probability of having the inputted number of winning numbers.

First we need to differentiate between two probability questions:
- What is the probability of having exactly five winning numbers?
- What is the probability of having at least five winning numbers?

Let's say a player chose these six numbers on a ticket: (1, 2, 3, 4 ,5 ,6). Out of these six numbers, we can form six five-number combinations:
- (1,2,3,4,5)
- (1,2,3,4,6)
- (1,2,3,5,6)
- (1,2,4,5,6)
- (1,3,4,5,6)
- (2,3,4,5,6)

For each of the six five-number combinations above, there are 44 lottery outcomes that contain it. For the combination (1, 2, 3, 4, 5), for instance, the 44 outcomes are:
- (1,2,3,4,5,6)
- (1,2,3,4,5,7)
- (1,2,3,4,5,8) through (1,2,3,4,5,49)

However, we can't count (1,2,3,4,5,6), because that outcome would be the big prize, not the smaller one. **We are matching exactly 5 numbers, not at least 5 numbers.** This means that for each of our six five-number combinations we have 43 possible successful outcomes, not 44.

Since there are six five-number combinations and each combination corresponds to 43 successful outcomes, we multiply 6 by 43 to find the total number of successful outcomes (6 x 43 = 258).

Since there are 258 successful outcomes and there are 13,983,816 total possible outcomes, the probability of having exactly five winning numbers for a single lottery ticket is: (258 / 13,983,816) = 0.00001845.
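The same counting argument extends to any number of matches. Writing $k$ for the number of winning numbers (a symbol introduced here for illustration), we choose $k$ of the six numbers on the ticket and the remaining $6 - k$ from the 43 numbers that are not on the ticket:

```latex
$$
P(\text{exactly } k)
  = \frac{\binom{6}{k}\,\binom{43}{6-k}}{\binom{49}{6}},
\qquad
P(\text{exactly } 5)
  = \frac{6 \times 43}{13{,}983{,}816}
  = \frac{258}{13{,}983{,}816}
  \approx 0.00001845
$$
```

For $k = 5$ this reproduces the 258 successful outcomes counted above.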

The probability doesn't depend on which six numbers are on the ticket, so the function only needs an integer between 2 and 5 representing the number of winning numbers expected. Now let's write the function.

In [47]:
def probability_less_6(integer):
    combos_ticket = combinations(6,integer)
    combos_remaining = combinations(43, 6 - integer)
    successful_outcomes = combos_ticket * combos_remaining
    
    probability = successful_outcomes / combinations(49,6)
    print('''If you are trying to win the smaller prize of matching {} numbers,
the chances of you winning are {:.8%}'''.format(integer,probability))

In [53]:
test_input = [2,3,4,5]

for i in test_input:
    probability_less_6(i)
    if i != 5:
        print('-'*16)

If you are trying to win the smaller prize of matching 2 numbers,
the chances of you winning are 13.23780290%
----------------
If you are trying to win the smaller prize of matching 3 numbers,
the chances of you winning are 1.76504039%
----------------
If you are trying to win the smaller prize of matching 4 numbers,
the chances of you winning are 0.09686197%
----------------
If you are trying to win the smaller prize of matching 5 numbers,
the chances of you winning are 0.00184499%
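The function above answers the "exactly" question. The "at least" version can be sketched by summing the exact probabilities for every match count from the requested number up to six. The helper names below are hypothetical, and `math.comb` assumes Python 3.8 or later.

```python
from math import comb

TOTAL_OUTCOMES = comb(49, 6)  # 13,983,816

def exactly_k_probability(k):
    """Probability that exactly k of the six ticket numbers are drawn."""
    return comb(6, k) * comb(43, 6 - k) / TOTAL_OUTCOMES

def at_least_k_probability(k):
    """Probability that k or more of the six ticket numbers are drawn."""
    return sum(exactly_k_probability(j) for j in range(k, 7))

print(f"{exactly_k_probability(5):.8%}")   # 0.00184499%
print(f"{at_least_k_probability(5):.8%}")  # 0.00185214%
```

For five numbers the two answers differ only by the single jackpot outcome (259 successful outcomes instead of 258), so the gap is tiny; for two or three numbers the difference is more noticeable.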


# Conclusion
In this project we set out to build the probability logic for an app that helps lottery addicts recognize how slim their chances of winning are. To do so, we wrote four functions:
- one_ticket_probability(): calculates the probability of winning the big prize with a single ticket
- check_historical_occurence(): checks whether a certain combination has occurred in the Canada lottery data set
- multi_ticket_probability(): calculates the probability for any number of tickets between 1 and 13,983,816
- probability_less_6(): calculates the probability of having exactly two, three, four, or five winning numbers