# Lottery Probabilities App

## Intro

In this project, we will demonstrate how to use various probability tools for data analysis. By the end, we aim to highlight the importance and value of probabilities in predicting and analyzing data.

#### Scenario

This project is based on a hypothetical situation in which a medical institute is developing an app designed to predict the likelihood of winning the lottery. The app specifically targets individuals who struggle with lottery addiction. The goal is to help these individuals understand the actual odds of winning, which may assist them in overcoming their addiction and prevent them from falling deeper into debt.

Our role in this project is to serve as the underlying logic of the app. We will provide answers to hypothetical lottery questions using the same calculations that the app would employ.

## Functions

As mentioned earlier, we will be addressing numerous probability questions throughout this project. To do this effectively, we will need to utilize specific probability functions multiple times. Therefore, our first step will be to define the two functions outlined below.

![Probability Functions](functions.png "Probability Functions")
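In case the image does not render, the two formulas are written out here, assuming it shows the standard definitions of the factorial and of combinations (the number of ways to choose $k$ items from $n$ when order does not matter):

```latex
n! = n \times (n-1) \times \cdots \times 2 \times 1
\qquad
{}_nC_k = \binom{n}{k} = \frac{n!}{k!\,(n-k)!}
```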

In [31]:
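def factorial(n):
    # multiply n * (n - 1) * ... * 1; the empty product gives 0! = 1
    final_product = 1
    for i in range(n, 0, -1):
        final_product *= i
    return final_product

def combinations(n, k):
    # number of ways to choose k items from n when order does not matter
    numerator = factorial(n)
    denominator = factorial(k) * factorial(n - k)
    # the quotient is always a whole number, so integer division keeps it exact
    return numerator // denominator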
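As a quick sanity check, the same quantities can be computed with Python's standard library. The sketch below assumes Python 3.8 or later (for `math.comb`). The helper name `combinations_reference` is introduced here for illustration only.

```python
import math

# Reference value from the standard library (math.comb needs Python 3.8+)
total_tickets = math.comb(49, 6)
print(f"{total_tickets:,}")  # 13,983,816

def combinations_reference(n, k):
    """Hypothetical cross-check built on math.factorial."""
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

# Both routes should agree on the number of possible 6/49 tickets
assert combinations_reference(49, 6) == total_tickets
```

If a hand-written `combinations()` ever disagrees with `math.comb`, the hand-written version is the one to inspect first.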

Now that we have established our two main functions for this project, we can create a function that calculates the chances of winning the grand prize. We are focusing on a 6/49 lottery, where six numbers are drawn without replacement from a set of 49 numbers, ranging from 1 to 49. A player wins the grand prize if the six numbers on their ticket match all six numbers drawn. The order of the numbers does not matter, which is why we count possible tickets with combinations rather than permutations.

The function we are about to build will serve as the first version of the app. In this initial version, users will be able to calculate the probability of winning based on the numbers they select for a single ticket. Users will choose six numbers from 1 to 49, which will represent their "single ticket."

Additionally, the engineering team at the medical institute has provided a few important details for this function:

- Users will input six different numbers from 1 to 49 within the app.
- These six numbers will be processed as a Python list, which will be the input for our function.
- The engineering team requests that the function presents the probability value in a clear and friendly manner, making it easy for individuals without any background in probability to understand.
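The first requirement assumes the ticket holds six different numbers from 1 to 49. A minimal sketch of how the app might check this before calculating anything is shown below. The helper `validate_ticket` is hypothetical and not part of the engineering team's specification.

```python
def validate_ticket(six_numbers):
    """Return True if the ticket holds six different integers from 1 to 49.

    Hypothetical helper: the name and behavior are assumptions for illustration.
    """
    if len(six_numbers) != 6:
        return False
    if len(set(six_numbers)) != 6:  # duplicates collapse inside a set
        return False
    return all(isinstance(n, int) and 1 <= n <= 49 for n in six_numbers)

print(validate_ticket([6, 7, 27, 32, 30, 2]))    # True
print(validate_ticket([41, 17, 22, 22, 8, 13]))  # False, 22 appears twice
```

The probability itself does not depend on which numbers are chosen, so a check like this only guards against tickets that could never be played.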

Let's get started! The code comments (lines starting with `#`) explain each step of the function.

In [32]:
def one_ticket_probability(six_numbers):
    # using the combinations function to calculate the number of possible combinations
    total_outcomes = combinations(49, 6)
    # there is only 1 winning ticket
    successful_outcomes = 1
    # calculating the probability of winning
    probability = successful_outcomes/total_outcomes
    # converting the probability to a percentage format
    probability_pct = probability * 100
    # presenting the result of the calculation in a readable format
    print('''Your chance of winning with {} is {:.7f}%. To put it another way, your ticket has a 1 in {:,} chances to win.'''.format(six_numbers,
                    probability_pct, int(total_outcomes)))

Now that we have written our function, we will test out a few combinations, to ensure it performs as desired.

In [33]:
test1 = [6,7,27,32,30,2]
test2 = [41,17,22,23,8,13]

one_ticket_probability(test1)

Your chance of winning with [6, 7, 27, 32, 30, 2] is 0.0000072%. To put it another way, your ticket has a 1 in 13,983,816 chances to win.


In [34]:
one_ticket_probability(test2)

Your chance of winning with [41, 17, 22, 23, 8, 13] is 0.0000072%. To put it another way, your ticket has a 1 in 13,983,816 chances to win.


## Historical Data Check for Canada Lottery

An additional feature that the medical institute wants to include in the first version of the app is the ability for users to compare their ticket against historical lottery data. The data comes from past 6/49 lottery drawings in Canada. By checking it, users will be able to see whether their ticket would ever have been a winning combination.

For the next step in our project, we will import the necessary libraries and data, and then proceed to explore the dataset.

In [35]:
# importing necessary libraries
import pandas as pd
import numpy as np

# Read in the data
canada_lottery = pd.read_csv('649.csv')

# Quick exploration of the data
print(canada_lottery.shape)

(3665, 11)


In [36]:
canada_lottery.head(5)

Unnamed: 0,PRODUCT,DRAW NUMBER,SEQUENCE NUMBER,DRAW DATE,NUMBER DRAWN 1,NUMBER DRAWN 2,NUMBER DRAWN 3,NUMBER DRAWN 4,NUMBER DRAWN 5,NUMBER DRAWN 6,BONUS NUMBER
0,649,1,0,6/12/1982,3,11,12,14,41,43,13
1,649,2,0,6/19/1982,8,33,36,37,39,41,9
2,649,3,0,6/26/1982,1,6,23,24,27,39,34
3,649,4,0,7/3/1982,3,9,10,13,20,43,34
4,649,5,0,7/10/1982,5,14,21,31,34,47,45


In [37]:
canada_lottery.tail(5)

Unnamed: 0,PRODUCT,DRAW NUMBER,SEQUENCE NUMBER,DRAW DATE,NUMBER DRAWN 1,NUMBER DRAWN 2,NUMBER DRAWN 3,NUMBER DRAWN 4,NUMBER DRAWN 5,NUMBER DRAWN 6,BONUS NUMBER
3660,649,3587,0,6/6/2018,10,15,23,38,40,41,35
3661,649,3588,0,6/9/2018,19,25,31,36,46,47,26
3662,649,3589,0,6/13/2018,6,22,24,31,32,34,16
3663,649,3590,0,6/16/2018,2,15,21,31,38,49,8
3664,649,3591,0,6/20/2018,14,24,31,35,37,48,17


As previously mentioned, the data is sourced from historical 6/49 lotteries held in Canada. The dataset includes information from 3,665 drawings that took place between 1982 and 2018. Each drawing is represented by a row, and the six numbers drawn in each lottery have their own separate columns.

## Historical Data Check Function

Next, we will create a function that allows users to see how their chosen numbers have performed in past lotteries and whether their ticket would have ever been selected. Here are some important details from the engineering team that we need to keep in mind:

- Users will input six different numbers from 1 to 49 within the app.
- These six numbers will be processed as a Python list, which will serve as the input for our function.
- The engineering team requests that our function provides the following information:
    - The number of times the selected combination has occurred in the Canada dataset.
    - The probability of winning the grand prize in the next drawing with that combination.

In [38]:
def extract_numbers(row):
    # selecting the six 'NUMBER DRAWN' columns by position (the bonus number is excluded)
    row = row.iloc[4:10]
    # converting the drawn numbers to a 'set' data type, so that their order does not matter
    row = set(row.values)
    return row

# using our new function to extract all the winning numbers from our dataset
winning_numbers = canada_lottery.apply(extract_numbers, axis=1)
winning_numbers.head()

0    {3, 41, 11, 12, 43, 14}
1    {33, 36, 37, 39, 8, 41}
2     {1, 6, 39, 23, 24, 27}
3     {3, 9, 10, 43, 13, 20}
4    {34, 5, 14, 47, 21, 31}
dtype: object

Now that we have extracted the winning numbers from our dataset, we can proceed to compare the numbers selected by our users with the historical winning numbers.

In [39]:
def check_historical_occurrence(six_numbers, winners):
    # converting the user's ticket numbers into a 'set' data type
    user_attempt = set(six_numbers)
    # checking to see if the user's ticket was ever selected in the historical data
    check_occurrence = winners == user_attempt
    # adding up all occurrences of the user's ticket in the historical data
    n_occurrences = check_occurrence.sum()

    # presenting the results of the calculations in a readable format
    if n_occurrences == 0:
        print('''The combination {} has never occurred as a winning ticket.
Your chance of winning using {} is 0.0000072%.
To put it another way, your ticket has a 1 in 13,983,816 chances to win.'''.format(user_attempt, user_attempt))
    else:
        print('''The number of times your combination, {}, has occurred as a winning ticket in the past is {}.
Your chance of winning using {} is 0.0000072%.
To put it another way, your ticket has a 1 in 13,983,816 chances to win.'''.format(user_attempt, n_occurrences, user_attempt))

In [40]:
test1 = [6,7,27,32,30,2]
test2 = [33, 36, 37, 39, 8, 41]

check_historical_occurrence(test1, winning_numbers)

The combination {32, 2, 6, 7, 27, 30} has never occurred as a winning ticket.
Your chance of winning using {32, 2, 6, 7, 27, 30} is 0.0000072%.
To put it another way, your ticket has a 1 in 13,983,816 chances to win.


In [41]:
check_historical_occurrence(test2, winning_numbers)

The number of times your combination, {33, 36, 37, 39, 8, 41}, has occurred as a winning ticket in the past is 1.
Your chance of winning using {33, 36, 37, 39, 8, 41} is 0.0000072%.
To put it another way, your ticket has a 1 in 13,983,816 chances to win.
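The comparison works because two sets are equal whenever they hold the same elements, regardless of the order in which the numbers were entered or drawn. The following pandas-free sketch illustrates the idea with the first three draws from the dataset preview above. The helper `count_occurrences` is a hypothetical stand-in, not the function used by the app.

```python
# First three draws from the dataset preview, stored as sets
past_draws = [
    {3, 11, 12, 14, 41, 43},
    {8, 33, 36, 37, 39, 41},
    {1, 6, 23, 24, 27, 39},
]

def count_occurrences(six_numbers, draws):
    """Count how many past draws contain exactly the same six numbers."""
    ticket = set(six_numbers)  # order of the input list is irrelevant
    return sum(1 for draw in draws if draw == ticket)

print(count_occurrences([33, 36, 37, 39, 8, 41], past_draws))  # 1
print(count_occurrences([6, 7, 27, 32, 30, 2], past_draws))    # 0
```

Whatever the count turns out to be, the probability for the next drawing stays the same, since each drawing is independent of the previous ones.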


## Multi-ticket Probability

So far, we have developed functions that handle only a single ticket. However, in reality, lottery enthusiasts are likely to purchase multiple tickets to enhance their chances of winning. The objective of our next function is to calculate and display the actual probability of winning based on the number of tickets entered by the user. The specifications provided by the engineering team for this function are as follows:

- The user will specify the number of different tickets they wish to play, without needing to provide the specific combinations.
- Our function will accept an integer input ranging from 1 to 13,983,816, which is the maximum number of different tickets possible.
- The function will output information regarding the probability of winning the grand prize based on the number of different tickets played.

In [42]:
def multi_ticket_probability(num_of_tickets):
    # using the combinations function to calculate the number of possible combinations
    total_outcomes = combinations(49, 6)
    
    # calculating the probability of winning
    probability = num_of_tickets/total_outcomes
    # converting the probability to a percentage format
    probability_pct = probability * 100

    # presenting the results of the calculations in a readable format
    if num_of_tickets == 1:
        print('''Your chance of winning with 1 ticket is {:.6f}%. 
To put it another way, your ticket has a 1 in {:,} chances to win.'''.format(probability_pct, int(total_outcomes)))
    else:
        combinations_simplified = round(total_outcomes / num_of_tickets)
        print('''Your chance of winning with {:,} different tickets are {:.6f}%. 
To put it another way, you have a 1 in {:,} chances to win.'''.format(num_of_tickets, probability_pct, combinations_simplified))

Now that we have developed this function, we will conduct a few tests to evaluate its performance.

In [43]:
test_inputs = [1, 10, 100, 10000, 1000000, 6991908, 13983816]

for test_input in test_inputs:
    multi_ticket_probability(test_input)
    print('------------------------') # output delimiter

Your chance of winning with 1 ticket is 0.000007%. 
To put it another way, your ticket has a 1 in 13,983,816 chances to win.
------------------------
Your chance of winning with 10 different tickets are 0.000072%. 
To put it another way, you have a 1 in 1,398,382 chances to win.
------------------------
Your chance of winning with 100 different tickets are 0.000715%. 
To put it another way, you have a 1 in 139,838 chances to win.
------------------------
Your chance of winning with 10,000 different tickets are 0.071511%. 
To put it another way, you have a 1 in 1,398 chances to win.
------------------------
Your chance of winning with 1,000,000 different tickets are 7.151124%. 
To put it another way, you have a 1 in 14 chances to win.
------------------------
Your chance of winning with 6,991,908 different tickets are 50.000000%. 
To put it another way, you have a 1 in 2 chances to win.
------------------------
Your chance of winning with 13,983,816 different tickets are 100.000000%. 
To put it another way, you have a 1 in 1 chances to win.
------------------------
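Because every ticket is different, the probability of winning with several tickets is simply the number of tickets divided by the number of possible tickets. A small sketch with exact fractions makes the round figures in the test output easy to verify. It uses `fractions.Fraction` and `math.comb` from the standard library, and the function name `exact_multi_ticket_probability` is an assumption made for this illustration.

```python
from fractions import Fraction
from math import comb

TOTAL_OUTCOMES = comb(49, 6)  # 13,983,816 possible tickets

def exact_multi_ticket_probability(num_of_tickets):
    """Hypothetical helper: exact probability for n different tickets."""
    if not 1 <= num_of_tickets <= TOTAL_OUTCOMES:
        raise ValueError("num_of_tickets must be between 1 and 13,983,816")
    return Fraction(num_of_tickets, TOTAL_OUTCOMES)

print(exact_multi_ticket_probability(6991908))   # 1/2
print(exact_multi_ticket_probability(13983816))  # 1
```

Exact fractions also avoid the rounding that the "1 in N" wording introduces for ticket counts that do not divide 13,983,816 evenly.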

## Fewer Winning Numbers

For the final function in this project, we will focus on tickets that match only some of the winning numbers. In many 6/49 lottery systems, tickets that match a few of the winning numbers receive smaller prizes. The function we are going to write calculates the probability of a ticket matching exactly a given number of winning numbers. Here are the details from the engineering team:

- Inside the app, the user inputs:
    - six different numbers from 1 to 49; and
    - an integer between 2 and 5 that represents the number of winning numbers expected
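- Our function prints the probability of a ticket having exactly that many winning numbers. Because this probability does not depend on which six numbers were chosen, the function only takes the integer as input.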
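The counting argument behind this probability can be sketched as follows. A ticket holds 6 numbers and the other 43 numbers are not on it, so a draw matches exactly $k$ ticket numbers when it contains $k$ of those 6 and $6-k$ of the remaining 43. The symbol $k$ is introduced here for illustration, and the worked value uses $k = 5$:

```latex
P(k \text{ matches}) = \frac{\binom{6}{k}\,\binom{43}{6-k}}{\binom{49}{6}},
\qquad
P(5 \text{ matches}) = \frac{6 \times 43}{13{,}983{,}816} = \frac{258}{13{,}983{,}816} \approx 0.001845\%
```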

In [44]:
def probability_less_6(n_winning_numbers):
    # calculating the number of successful outcomes using our combinations function
    n_combinations_ticket = combinations(6, n_winning_numbers)
    n_combinations_remaining = combinations(43, 6 - n_winning_numbers)
    successful_outcomes = n_combinations_ticket * n_combinations_remaining
    
    # calculating the total number of combinations in the entire lottery
    n_combinations_total = combinations(49, 6)
    
    # dividing the number of successful outcomes by the number of total outcomes to get the probability value
    probability = successful_outcomes / n_combinations_total
    
    # converting the probability value to a percentage, a more readable format
    probability_percentage = probability * 100
    
    # expressing the probability as '1 in N', rounding N to the nearest whole number to be more readable
    combinations_simplified = round(n_combinations_total/successful_outcomes)
    print('''Your chances of having {} winning numbers with this ticket are {:.6f}%.
In other words, you have a 1 in {:,} chances to win.'''.format(n_winning_numbers, probability_percentage,
                                                               int(combinations_simplified)))

In [46]:
# testing our above function with different selections of winning numbers
for test_input in [2, 3, 4, 5]:
    probability_less_6(test_input)
    print('--------------------------') # output delimiter

Your chances of having 2 winning numbers with this ticket are 13.237803%.
In other words, you have a 1 in 8 chances to win.
--------------------------
Your chances of having 3 winning numbers with this ticket are 1.765040%.
In other words, you have a 1 in 57 chances to win.
--------------------------
Your chances of having 4 winning numbers with this ticket are 0.096862%.
In other words, you have a 1 in 1,032 chances to win.
--------------------------
Your chances of having 5 winning numbers with this ticket are 0.001845%.
In other words, you have a 1 in 54,201 chances to win.
--------------------------


The code comments above outline how the function calculates the probability of a single ticket containing exactly a given number of winning numbers. The test results show that the fewer winning numbers we ask for, the higher the probability: about 1 in 8 for two winning numbers, compared with about 1 in 54,201 for five. For lottery ticket customers, this information could be quite valuable.
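One way to gain confidence in these figures is to check that the probabilities for every possible number of matches, from zero to six, add up to exactly 1. The sketch below does this with exact fractions. The helper name `exact_match_probability` is an assumption made for this illustration, and `math.comb` requires Python 3.8 or later.

```python
from fractions import Fraction
from math import comb

def exact_match_probability(n_matches):
    """Hypothetical helper: exact probability of matching n_matches numbers."""
    successful = comb(6, n_matches) * comb(43, 6 - n_matches)
    return Fraction(successful, comb(49, 6))

# The seven cases (0 to 6 matches) cover every possible draw
total = sum(exact_match_probability(k) for k in range(7))
print(total)  # 1

# Percentages for 2 to 5 matches, for comparison with the output above
for k in range(2, 6):
    print(k, f"{float(exact_match_probability(k)) * 100:.6f}%")
```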

## Conclusion

In this project, we developed four main functions for the initial version of an app designed for lottery enthusiasts. These functions include:

- `one_ticket_probability()` — calculates the probability of winning the grand prize with a single ticket
- `check_historical_occurrence()` — checks whether a specific combination has appeared in the Canadian lottery dataset
- `multi_ticket_probability()` — calculates the probability for any number of tickets between 1 and 13,983,816
- `probability_less_6()` — calculates the probability of having exactly two, three, four, or five winning numbers

One of our primary objectives was to illustrate the value of using probability techniques when analyzing data. Although this project is small, it effectively demonstrates how probabilities can provide a clear and concise understanding of the overall data landscape. Additionally, these techniques enable us to create more complex and comprehensive functions that would be quite challenging to develop without a solid understanding of probabilities. Overall, probabilities serve as a versatile and practical tool, useful for a variety of tasks, including exploration, analysis, and prediction.