# Mobile App To Estimate Chances Of Winning The Lottery

For this guided project, I assume that a medical institution that treats and prevents gambling addiction wants to build a mobile app that helps lottery addicts better estimate their chances of winning.

There is a team of engineers that will build the final app, but they have asked my team to build a few functions that calculate probabilities of winning the lottery.

They would like us to create functions that enable users to answer the following questions:

- What is the probability of winning the big prize with a single ticket?

- What is the probability of winning the big prize if we play 40 different tickets (or any other number)?

- What is the probability of having at least five (or four, or three, or two) winning numbers on a single ticket?

They've also asked that we consider historical data from the national 6/49 lottery game in Canada. The dataset has records of 3,665 drawings, dating from 1982 to 2018. The dataset is available on Kaggle [here](https://www.kaggle.com/datascienceai/lottery-dataset).

**Please note:** This is a fictional scenario meant for learning purposes only, and it is in no way meant to offer medical or financial advice.

## Probability Functions

Our task consists mostly of calculating probabilities, so we'll need tools to help solve probability problems. In the 6/49 lottery, six numbers are drawn at random from a set of 49 numbers ranging from 1 to 49. The draws are done without replacement.

To calculate probabilities, we need to know how many different combinations we can get when choosing 6 (or 5, 4, 3, or 2) numbers from a set of 49 numbers.

To find the number of combinations when sampling without replacement where order doesn't matter, we'll use this formula:

\begin{equation}
_nC_k = {n \choose k} =  \frac{n!}{k!(n-k)!}
\end{equation}

We can see that we'll often need to calculate factorials, so we'll start by creating two core functions: one to calculate factorials and one to calculate combinations.

In [1]:
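# Factorial and combinations functions.

def factorial(n):
    # Multiply n * (n-1) * ... * 1; returns 1 when n is 0.
    final_product = 1
    for i in range(n, 0, -1):
        final_product *= i
    return final_product

def combinations(n, k):
    # Integer division keeps the count exact (the quotient is always a whole number).
    return factorial(n) // (factorial(k) * factorial(n - k))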
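
As a quick sanity check, the two helpers can be compared against `math.comb` from Python's standard library (available from Python 3.8). The sketch below redefines the helpers so that it runs on its own. It uses integer division (`//`), which is a choice made for this sketch so that the count stays an exact integer.

```python
import math

def factorial(n):
    """Product of the integers from 1 to n (returns 1 when n is 0)."""
    final_product = 1
    for i in range(n, 0, -1):
        final_product *= i
    return final_product

def combinations(n, k):
    """Number of ways to choose k items from n when order does not matter."""
    return factorial(n) // (factorial(k) * factorial(n - k))

# Number of distinct six-number tickets in a 6/49 lottery.
total_tickets = combinations(49, 6)
print(total_tickets)

# Cross-check against the standard library implementation.
assert total_tickets == math.comb(49, 6)
```

The printed count is the same 13,983,816 that appears as the denominator throughout the rest of this project.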

## What is the probability of winning the big prize with a single ticket?

To win the big prize in the 6/49 lottery the player must match all six numbers on their ticket with the six numbers drawn from the set of 49 numbers.

For the first version of our app we want players to be able to calculate the probability of winning the big prize with one ticket.

We're told by engineering to be aware of the following details when we write the function:

- Inside the app, the user inputs six different numbers from 1 to 49.
- Under the hood, the six numbers will come as a Python list, which will serve as the single input to our function.
- The engineering team wants the function to print the probability value in a friendly way — in a way that people without any probability training are able to understand.

Let's write the function:

In [2]:
def one_ticket_probability(nums):
    k = len(nums)
    n = 49
    C = combinations(n,k)
    probability = 1/C * 100
    return "You have {:.7f}% chance of winning the lottery with one ticket, which is 1 out of {} chances.".format(probability, int(C))


# Test the function.
test_nums = [3,8,11,27,34,41]
one_ticket_probability(test_nums)

'You have 0.0000072% chance of winning the lottery with one ticket, which is 1 out of 13983816 chances.'
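
The function above trusts that the list really holds six different numbers from 1 to 49, as the engineering team described. If that guarantee ever needs checking on our side, a minimal sketch could look like the following. The name `validate_ticket` and its parameters are hypothetical and not part of the engineering brief.

```python
def validate_ticket(nums, pool_size=49, picks=6):
    """Return True if nums holds `picks` distinct integers from 1 to pool_size.

    Hypothetical helper: the name and defaults are assumptions for illustration.
    """
    if len(nums) != picks:
        return False
    if len(set(nums)) != picks:
        # At least one number is repeated.
        return False
    return all(isinstance(n, int) and 1 <= n <= pool_size for n in nums)

print(validate_ticket([3, 8, 11, 27, 34, 41]))  # a well-formed ticket
print(validate_ticket([3, 3, 11, 27, 34, 41]))  # a repeated number
print(validate_ticket([0, 8, 11, 27, 34, 50]))  # numbers outside 1 to 49
```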

## Compare Tickets With Historical Lottery Data

The one_ticket_probability function above is based on theoretical probability, where we assume that every possible outcome (every combination of six numbers chosen from 49) has an equal chance of happening.

However, players might also like to compare their tickets with historical lottery data. We can calculate empirical probabilities with the 6/49 lottery dataset we downloaded from Kaggle.

We'll start by reading and exploring the dataset.

In [3]:
# Read the 6/49 dataset into a pandas dataframe

import pandas as pd

df_649 = pd.read_csv('649.csv')

print("Rows: {}, Columns: {}".format(df_649.shape[0], df_649.shape[1]))
print("\n")
print("Head")
print("\n")
print(df_649.head(3))
print("\n")
print("Tail")
print("\n")
print(df_649.tail(3))

Rows: 3665, Columns: 11


Head


   PRODUCT  DRAW NUMBER  SEQUENCE NUMBER  DRAW DATE  NUMBER DRAWN 1  \
0      649            1                0  6/12/1982               3   
1      649            2                0  6/19/1982               8   
2      649            3                0  6/26/1982               1   

   NUMBER DRAWN 2  NUMBER DRAWN 3  NUMBER DRAWN 4  NUMBER DRAWN 5  \
0              11              12              14              41   
1              33              36              37              39   
2               6              23              24              27   

   NUMBER DRAWN 6  BONUS NUMBER  
0              43            13  
1              41             9  
2              39            34  


Tail


      PRODUCT  DRAW NUMBER  SEQUENCE NUMBER  DRAW DATE  NUMBER DRAWN 1  \
3662      649         3589                0  6/13/2018               6   
3663      649         3590                0  6/16/2018               2   
3664      649         3591            

The columns labeled NUMBER DRAWN 1, NUMBER DRAWN 2... NUMBER DRAWN 6 are the six winning numbers drawn on each draw, and are the numbers of interest to us.

Once again, the engineering team has asked us to be aware of the following details when creating our function:

- Inside the app, the user inputs six different numbers from 1 to 49.
- Under the hood, the six numbers will come as a Python list and serve as an input to our function.
- The engineering team wants us to write a function that prints:
  - the number of times the selected combination occurred in the Canada data set; and
  - the probability of winning the big prize in the next drawing with that combination.

Before creating a function to check historical occurrences, we'll write a function that extracts the six winning numbers from each row in our dataset.

In [4]:
# Function to extract the 6 winning numbers from a row
# in the pandas dataframe and return as a set.

def extract_numbers(row):
    return set(row[4:10].values) 

# Test the function on the first row:
extract_numbers(df_649.iloc[0])

{3, 11, 12, 14, 41, 43}

In [5]:
# Apply the extract_numbers function to all the rows in the dataset
# And save the sets of numbers as a pandas series.

extracted_numbers = df_649.apply(extract_numbers, axis=1)

# Display the first five rows:
extracted_numbers.head()

0    {3, 41, 11, 12, 43, 14}
1    {33, 36, 37, 39, 8, 41}
2     {1, 6, 39, 23, 24, 27}
3     {3, 9, 10, 43, 13, 20}
4    {34, 5, 14, 47, 21, 31}
dtype: object

We can now write a function to check historical occurrences, following the guidelines provided by the engineers above.

In [6]:
# The function takes as input a list of 6 numbers that a player chooses
# and a series of past winning numbers. It checks and prints how many times
# the player's numbers have occurred and the probability of winning in the
# next draw.

def check_historical_occurence(nums, winning_nums):
    set_nums = set(nums)
    matching_nums = winning_nums == set_nums
    wins = winning_nums[matching_nums].shape[0]
    print('''Your numbers matched previous winning numbers {0} times.\nAlthough your numbers matched {0} times in past draws, the probability\nof winning the big prize in the next draw with these numbers is 0.0000072%'''.format(wins))

# list of 6 test numbers
nums_test = [1,6,39,23,24,27]

# Test the function using the test numbers and the previous winning numbers.
check_historical_occurence(nums_test, extracted_numbers)

Your numbers matched previous winning numbers 1 times.
Although your numbers matched 1 times in past draws, the probability
of winning the big prize in the next draw with these numbers is 0.0000072%
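
The same count can also be obtained without pandas. The sketch below is one possible alternative, not the method used above: it stores each past draw as a `frozenset` and tallies them with `collections.Counter`. The three draws are the first three rows shown in the dataset preview, and the name `count_occurrences` is hypothetical.

```python
from collections import Counter

# First three draws shown in the dataset preview.
past_draws = [
    [3, 11, 12, 14, 41, 43],
    [8, 33, 36, 37, 39, 41],
    [1, 6, 23, 24, 27, 39],
]

# frozenset is hashable and ignores order, so each draw can be a Counter key.
draw_counts = Counter(frozenset(draw) for draw in past_draws)

def count_occurrences(nums, counts):
    """Return how many past draws contained exactly the numbers in nums."""
    return counts[frozenset(nums)]

print(count_occurrences([1, 6, 39, 23, 24, 27], draw_counts))  # 1
print(count_occurrences([1, 2, 3, 4, 5, 6], draw_counts))      # 0
```

Because a `Counter` returns 0 for keys it has never seen, combinations that never won need no special handling.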


## What is the probability of winning the big prize if we play 40 different tickets (or any other number)?

Lottery addicts often play more than one ticket on a single draw, believing they will significantly increase their chances of winning.

So we'll now code a function that calculates the chances of winning with any number of different tickets.

Engineering has provided the following guidelines:

- The user will input the number of different tickets they want to play (without inputting the specific combinations they intend to play).
- Our function will see an integer between 1 and 13,983,816 (the maximum number of different tickets).
- The function should print information about the probability of winning the big prize depending on the number of different tickets played.

Let's write the function:

In [7]:
# The user inputs the number of tickets they wish to play.

def multi_ticket_probability(num_tickets):
    outcomes = combinations(49, 6)
    probability = num_tickets/outcomes
    percentage = probability * 100
    outcomes_simplified = round(outcomes / num_tickets)
    return """You have {:.7f}% chance of winning with {} tickets, 
that is 1 out of {} chance to win""".format(percentage, num_tickets, outcomes_simplified)

# list of different number of tickets to test.
num_tickets = [1, 10, 100, 10000, 1000000, 6991908, 13983816]

for i in num_tickets:
    chances = multi_ticket_probability(i)
    print(chances)
    print("\n")

You have 0.0000072% chance of winning with 1 tickets, 
that is 1 out of 13983816 chance to win


You have 0.0000715% chance of winning with 10 tickets, 
that is 1 out of 1398382 chance to win


You have 0.0007151% chance of winning with 100 tickets, 
that is 1 out of 139838 chance to win


You have 0.0715112% chance of winning with 10000 tickets, 
that is 1 out of 1398 chance to win


You have 7.1511238% chance of winning with 1000000 tickets, 
that is 1 out of 14 chance to win


You have 50.0000000% chance of winning with 6991908 tickets, 
that is 1 out of 2 chance to win


You have 100.0000000% chance of winning with 13983816 tickets, 
that is 1 out of 1 chance to win
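
The section title mentions 40 tickets, which the test list above does not cover. As a small illustration, the sketch below computes the exact odds for 40 different tickets using `fractions.Fraction` and `math.comb` from the standard library. The name `multi_ticket_odds` is hypothetical, and this is only one way to express the result.

```python
from fractions import Fraction
from math import comb

def multi_ticket_odds(num_tickets):
    """Exact probability of the big prize with num_tickets different tickets."""
    return Fraction(num_tickets, comb(49, 6))

odds = multi_ticket_odds(40)
print(odds)               # 5/1747977
print(float(odds) * 100)  # the same probability as a percentage
```

Even with 40 different tickets, the chance of winning stays below 0.0003%.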




## What is the probability of having at least five (or four, or three, or two) winning numbers on a single ticket?

In most 6/49 lotteries you can win smaller prizes if you match two, three, four, or five numbers.

So our final function will calculate the probabilities of winning a smaller prize.

Engineering has told us to be aware of the following details:

- Inside the app, the user inputs:
  - six different numbers from 1 to 49; and
  - an integer between 2 and 5 that represents the number of winning numbers expected
- Our function prints information about the probability of having the inputted number of winning numbers.

To calculate the probabilities, we tell the engineering team that the specific combination on the ticket is irrelevant behind the scenes, and we only need the integer between 2 and 5 representing the number of winning numbers expected.
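
The reasoning can be written as a formula. A ticket with exactly $k$ winning numbers contains $k$ of the 6 numbers drawn and $6-k$ of the 43 numbers that were not drawn, so:

\begin{equation}
P(\text{exactly } k) = \frac{{6 \choose k}{43 \choose 6-k}}{{49 \choose 6}}
\end{equation}

For example, with $k = 5$ there are $6 \times 43 = 258$ such tickets out of 13,983,816. Neither factor in the numerator depends on which six numbers are printed on the ticket, which is why only $k$ is needed.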

Let's code the function:

In [9]:
# The function returns the probability of matching exactly 2, 3, 4, or 5
# numbers, no more and no less.

def probability_less_6(num):
    # Number of ways to pick `num` of the 6 winning numbers.
    num_outcomes = combinations(6,num)
    # Number of ways to pick the rest of the ticket from the 43 non-winning numbers.
    remaining_outcomes = combinations(43,6-num)
    successful_outcomes = num_outcomes * remaining_outcomes
    total_outcomes = combinations(49,6)
    probability = successful_outcomes/total_outcomes * 100
    simplified_outcomes = total_outcomes/successful_outcomes
    return '''Your chance of getting {} numbers is 
{:.7f}%, which is 1 out of {} chances.'''.format(num,probability,int(round(simplified_outcomes)))

test_nums = [2,3,4,5]

for i in test_nums:
    chances = probability_less_6(i)
    print(chances)
    print("\n")

Your chance of getting 2 numbers is 
13.2378029%, which is 1 out of 8 chances.


Your chance of getting 3 numbers is 
1.7650404%, which is 1 out of 57 chances.


Your chance of getting 4 numbers is 
0.0968620%, which is 1 out of 1032 chances.


Your chance of getting 5 numbers is 
0.0018450%, which is 1 out of 54201 chances.
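
The function above covers exactly 2, 3, 4, or 5 matches, while the question in the section title asks about at least that many. One way to answer it, sketched here with `math.comb` and hypothetical function names, is to add up the successful outcomes for every match count from the minimum up to 6.

```python
from math import comb

def successful_outcomes_at_least(min_matches):
    """Count tickets with min_matches or more of the six winning numbers."""
    return sum(comb(6, k) * comb(43, 6 - k) for k in range(min_matches, 7))

def probability_at_least(min_matches):
    """Probability, as a percentage, of at least min_matches winning numbers."""
    return successful_outcomes_at_least(min_matches) / comb(49, 6) * 100

for min_matches in [2, 3, 4, 5]:
    print("At least {}: {:.7f}%".format(min_matches, probability_at_least(min_matches)))
```

For five numbers the difference from the exact case is tiny: 259 successful tickets instead of 258, because the single jackpot ticket is now included.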


