<img src="Lotto_649_logo.svg" style="display:block; margin:auto" width=250>

<div align="center"> <h1 align="center"> Project: Mobile App for Lottery Addiction </h1> </div>

#### The Scenario

Many people start playing the lottery for fun, but for some of them the pastime turns into a habit and eventually into an addiction. Like other compulsive gamblers, people addicted to the lottery often spend their savings or take on debt, which can lead to desperate actions such as theft.

A medical institution dedicated to addressing and mitigating gambling addictions has tasked us with developing the core logic for a mobile application aimed at assisting lottery addicts in better assessing their chances of winning. While the institution's engineering team will handle the app's technical implementation, they require us to design the foundational algorithms and calculate relevant probabilities.

In the initial iteration of the app, the focus will be on the 6/49 lottery. We are tasked with creating functions that allow users to answer key questions, such as:

- What is the likelihood of winning the jackpot with a single ticket?
- What is the likelihood of winning the jackpot when playing multiple tickets (e.g., 40 tickets)?
- What is the likelihood of matching at least five, four, three, or two winning numbers on a single ticket?

#### The Data

Furthermore, the institution requests that we incorporate historical data obtained from the national 6/49 lottery game in Canada. This [dataset](https://www.kaggle.com/datascienceai/lottery-dataset) includes information from 3,665 drawings spanning the period from 1982 to 2018, providing a comprehensive basis for analysis and probability calculation.

In [1]:
# Import relevant packages
import pandas as pd
import time

#### Core Functions

Below, we're going to write two functions that we'll be using frequently:

- `factorial()` — a function that calculates `factorials`
- `combinations()` — a function that calculates `combinations`

In [2]:
# Create a function that calculates the factorial
def factorial(n):
    result = 1
    for num in reversed(range(1, n+1)):
        result *= num
    return result

In [3]:
# Create a function that calculates the number of combinations
def combinations(n, k):
    numerator = factorial(n)
    denominator = factorial(k) * factorial(n - k)
    
    return numerator / denominator

# Testing function for combinations of n=49, k=6
combinations(49, 6)

13983816.0
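
As a cross-check on the result above, here is a minimal sketch that repeats the calculation with integer arithmetic and compares it with `math.comb` from the standard library (available from Python 3.8). The name `combinations_int` is hypothetical and not part of the project code.

```python
import math

def combinations_int(n, k):
    """Hypothetical integer-only variant of combinations(), used only as a cross-check."""
    # Floor division is exact here because n! is always divisible by k! * (n - k)!
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

total_outcomes = combinations_int(49, 6)
print(total_outcomes)                      # 13983816
print(total_outcomes == math.comb(49, 6))  # True
```

Both approaches agree on 13,983,816 possible tickets. The integer version avoids the trailing `.0` that true division produces.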

#### One-ticket Probability


In the preceding section, we wrote the `factorial()` and `combinations()` functions, which the calculations below depend on. We now turn to a function that computes the probability of winning the grand prize.

In the initial phase of app development, our primary objective is to empower users to assess the likelihood of winning the grand prize based on the numbers they select for a single ticket in the 6/49 lottery. Therefore, our immediate task involves creating a function capable of computing the probability of winning the big prize for any given ticket.

In consultation with the engineering team from the medical institute, we've established specific guidelines for crafting this function:

- Users input six distinct numbers ranging from 1 to 49.
- These six numbers are represented as a Python list, serving as the sole input for our function.
- The function output should present the probability value in a user-friendly format, ensuring accessibility for individuals with minimal background in probability theory.

In [4]:
# Create the one_ticket_probability function

def one_ticket_probability():
    
    check = False
    
    while not check:
        input_numbers = input('Please enter six numbers between 1 and 49 separated by space:')
        try:
            entered_numbers = [int(i) for i in input_numbers.split()]
        except ValueError:
            # Non-numeric entries are rejected like any other invalid input
            entered_numbers = []
        user_numbers = set(entered_numbers)
        
        # Require exactly six entries, all distinct and all within 1 to 49
        if (len(entered_numbers) == 6 and len(user_numbers) == 6
                and all(1 <= num <= 49 for num in user_numbers)):
            check = True
            print('We are Processing your Chance of Winning... \n')
            time.sleep(1)
            probability = 1 / combinations(49, 6) * 100
            probability_transformed = "%.7f" % probability
            
            print('''The chance of winning, buying the ticket with the numbers {} is: {}%.
In other words, you have a 1 in {:,.0f} chance to win the big prize'''.format(
                user_numbers, probability_transformed, combinations(49, 6)))
        
        else:
            print('Something went wrong. Please choose six numbers between 1 and 49, separated by space')

The `one_ticket_probability` function serves the purpose of calculating the probability of winning the big prize in a 6/49 lottery based on a single ticket chosen by the user. The function prompts the user to input six unique numbers between 1 and 49, ensuring validity and uniqueness of the selected numbers. 

It iteratively checks the user's input, rejecting duplicates and non-compliant entries, until six valid and distinct numbers are provided. Upon receiving a valid input, the function proceeds to calculate the probability of winning the lottery, expressed as a percentage. This is achieved by dividing 1 (representing the successful outcome of winning) by the total number of possible combinations of six numbers out of 49, and then transforming the result into a formatted percentage value. 

Finally, the function presents the calculated probability along with the user's chosen numbers, offering a clear interpretation of the odds of winning the big prize.

Now let's test the function with different inputs.




In [5]:
# Use the function
one_ticket_probability()

Please enter six numbers between 1 and 49 separated by space: 2 15 22 35 42 47


We are Processing your Chance of Winning... 

The chance of winning, buying the ticket with the numbers {2, 35, 42, 15, 47, 22} is: 0.0000072%.
In other words, you have a 1 in 13,983,816 chance to win the big prize


The input is valid and the function calculates the probability.

In [6]:
one_ticket_probability()

Please enter six numbers between 1 and 49 separated by space: 1 15 26 35 45 50


Something went wrong. Please choose six numbers between 1 and 49, separated by space


Please enter six numbers between 1 and 49 separated by space: 1 15 26 35 45 49


We are Processing your Chance of Winning... 

The chance of winning, buying the ticket with the numbers {1, 35, 45, 15, 49, 26} is: 0.0000072%.
In other words, you have a 1 in 13,983,816 chance to win the big prize


In the first attempt, the input is invalid because the user chose a number outside the range of 1 to 49. In the second attempt, the user chose valid numbers.

In [7]:
one_ticket_probability()

Please enter six numbers between 1 and 49 separated by space: 2 22 23 23 43 44


Something went wrong. Please choose six numbers between 1 and 49, separated by space


Please enter six numbers between 1 and 49 separated by space: 2 22 23 33 43 44


We are Processing your Chance of Winning... 

The chance of winning, buying the ticket with the numbers {33, 2, 43, 44, 22, 23} is: 0.0000072%.
In other words, you have a 1 in 13,983,816 chance to win the big prize


In the first attempt, the user entered a duplicate number, so the input was rejected. In the second attempt, the numbers are valid and the probability is calculated.
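
The guidelines in this section describe the ticket as a Python list passed to the function, whereas `one_ticket_probability()` reads the numbers from the keyboard. A minimal non-interactive sketch of the same calculation might look like the following. The name `one_ticket_probability_from_list` and the choice to raise `ValueError` on invalid tickets are assumptions made for illustration, not part of the agreed design.

```python
from math import comb

def one_ticket_probability_from_list(user_numbers):
    """Hypothetical list-based variant: return (probability in percent, message)."""
    numbers = set(user_numbers)
    # Exactly six entries, with no duplicates among them
    if len(user_numbers) != 6 or len(numbers) != 6:
        raise ValueError('A ticket needs exactly six distinct numbers.')
    # Every entry must be an integer from 1 to 49
    if not all(isinstance(n, int) and 1 <= n <= 49 for n in numbers):
        raise ValueError('Every number must be an integer from 1 to 49.')

    total_outcomes = comb(49, 6)
    probability_percent = 100 / total_outcomes
    message = ('Your chance of winning the big prize with {} is {:.7f}%, '
               'or 1 in {:,}.').format(sorted(numbers), probability_percent, total_outcomes)
    return probability_percent, message

_, text = one_ticket_probability_from_list([2, 15, 22, 35, 42, 47])
print(text)
```

Returning the values instead of printing them would let the engineering team decide how to display the result in the app.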

#### Historical Data Check for Canada Lottery

In the preceding section, we developed a function that computes the probability of winning the grand prize with a single lottery ticket. In the next iteration of the application, users should also be able to compare their ticket against historical Canada 6/49 lottery data to find out whether it would ever have won. In this section, we look at the structure of that historical data, which is available on Kaggle. This [dataset](https://www.kaggle.com/datasets/datascienceai/lottery-dataset) has the following structure:





In [8]:
# Read in the dataset
c_6_49 = pd.read_csv('649.csv')

# Print the dataset
c_6_49

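| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 6/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 6/19/1982 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 6/26/1982 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |
| 3 | 649 | 4 | 0 | 7/3/1982 | 3 | 9 | 10 | 13 | 20 | 43 | 34 |
| 4 | 649 | 5 | 0 | 7/10/1982 | 5 | 14 | 21 | 31 | 34 | 47 | 45 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3660 | 649 | 3587 | 0 | 6/6/2018 | 10 | 15 | 23 | 38 | 40 | 41 | 35 |
| 3661 | 649 | 3588 | 0 | 6/9/2018 | 19 | 25 | 31 | 36 | 46 | 47 | 26 |
| 3662 | 649 | 3589 | 0 | 6/13/2018 | 6 | 22 | 24 | 31 | 32 | 34 | 16 |
| 3663 | 649 | 3590 | 0 | 6/16/2018 | 2 | 15 | 21 | 31 | 38 | 49 | 8 |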


In [9]:
# Check info of dataset
c_6_49.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3665 entries, 0 to 3664
Data columns (total 11 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   PRODUCT          3665 non-null   int64 
 1   DRAW NUMBER      3665 non-null   int64 
 2   SEQUENCE NUMBER  3665 non-null   int64 
 3   DRAW DATE        3665 non-null   object
 4   NUMBER DRAWN 1   3665 non-null   int64 
 5   NUMBER DRAWN 2   3665 non-null   int64 
 6   NUMBER DRAWN 3   3665 non-null   int64 
 7   NUMBER DRAWN 4   3665 non-null   int64 
 8   NUMBER DRAWN 5   3665 non-null   int64 
 9   NUMBER DRAWN 6   3665 non-null   int64 
 10  BONUS NUMBER     3665 non-null   int64 
dtypes: int64(10), object(1)
memory usage: 315.1+ KB


The dataset contains 3,665 drawings from 1982 to 2018, one per row. The six winning numbers of each drawing are stored in six separate columns, `NUMBER DRAWN 1` to `NUMBER DRAWN 6`, next to a `BONUS NUMBER` column. There are no `NaN` values, and every column except `DRAW DATE` (stored as `object`) holds `int64` values.

#### Function for Historical Data Check

Now we'll improve the `one_ticket_probability()` function to make it possible for the users to compare their chosen ticket against the historical lottery data in Canada, enabling them to check if their ticket would have previously won. The engineering team has outlined specific details for consideration:

- User Input: Users will input six distinct numbers from 1 to 49 within the app.
- Function Input: The six numbers will be submitted as a Python list, serving as the function's input.
- Function Output: The function should output two key metrics:
    - The frequency of occurrence of the user's combination in the Canadian dataset.
    - The probability of winning the jackpot in the subsequent drawing using that combination.

We're going to begin by extracting all the winning numbers from the lottery data set. The `extract_numbers()` function will go over each row of the dataframe and extract the six winning numbers as a Python `set`. Sets cannot have two items with the same value.


In [10]:
# Create the extract_numbers function
def extract_numbers(row):
    # Positions 4 to 9 hold the columns NUMBER DRAWN 1 to NUMBER DRAWN 6
    row = row.iloc[4:10]
    row = set(row.values)
    return row

In [11]:
# Apply function and save to variable winning_numbers
winning_numbers = c_6_49.apply(extract_numbers, axis=1)

# Print winning_numbers variable, check output
winning_numbers

0        {3, 41, 11, 12, 43, 14}
1        {33, 36, 37, 39, 8, 41}
2         {1, 6, 39, 23, 24, 27}
3         {3, 9, 10, 43, 13, 20}
4        {34, 5, 14, 47, 21, 31}
                  ...           
3660    {38, 40, 41, 10, 15, 23}
3661    {36, 46, 47, 19, 25, 31}
3662     {32, 34, 6, 22, 24, 31}
3663     {2, 38, 15, 49, 21, 31}
3664    {35, 37, 14, 48, 24, 31}
Length: 3665, dtype: object

Below, we write the `check_historical_occurence()` function, which builds on the `one_ticket_probability()` function. It compares the user's chosen combination of six numbers against the historical winning numbers from the Canadian lottery dataset. Here's how the function works:

- Initialization: The function uses two variables, `check` and `count`, to control the input loop and to count the occurrences of the user's combination in the historical data, respectively.
- User Input Handling: The function prompts the user to input six numbers between 1 and 49, separated by spaces. It then parses the input, converting it into a set of integers representing the user's chosen numbers.
- Input Validation: The function verifies that the user input contains exactly six unique numbers within the valid range of 1 to 49.
- Historical Data Comparison: The function iterates over the winning number sets from the historical dataset, one set per drawing. A drawing counts as a match only if all six of its winning numbers are also in the user's combination, in which case the `count` variable is incremented.
- Probability Calculation: After processing the historical data, the function calculates the probability of winning the jackpot with the user's chosen combination using the `combinations()` function. This probability does not depend on the historical count.
- Output: Finally, the function displays how often the user's combination occurred in the historical data, followed by the probability of winning the jackpot with that combination.

In [12]:
# Create the check_historical_occurence function, set one parameter for comparison
def check_historical_occurence(set_of_winning_numbers):

    check = False
    count = 0
    
    while not check:
        input_numbers = input('Please enter six numbers between 1 and 49 separated by space:')
        try:
            entered_numbers = [int(i) for i in input_numbers.split()]
        except ValueError:
            # Non-numeric entries are rejected like any other invalid input
            entered_numbers = []
        user_numbers = set(entered_numbers)
        
        if (len(entered_numbers) == 6 and len(user_numbers) == 6
                and all(1 <= num <= 49 for num in user_numbers)):
            
            # Count every past drawing whose six numbers equal the user's six numbers.
            # The counter is never reset inside the loop; resetting it on a mismatch
            # would report only matches in the most recent drawings.
            for win_number in set_of_winning_numbers:
                if win_number == user_numbers:
                    count += 1
            
            check = True
            print('We are Processing your Chance of Winning... \n')
            time.sleep(1)
            probability = 1 / combinations(49, 6) * 100
            probability_transformed = "%.7f" % probability
            
            print('''Out of {:,} drawings dating from 1982 to 2018 this number occurred {} time/s. \n
The chance of winning, buying the ticket with the numbers {} is: {}%.
In other words, you have a 1 in {:,.0f} chance to win the big prize. \n 
'''.format(len(set_of_winning_numbers), count, user_numbers,
           probability_transformed, combinations(49, 6)))
        
        else:
            print('Something went wrong. Please choose six unique numbers between 1 and 49, separated by space')

Let's test the new function.

In [13]:
# Pass the winning_numbers variable created with extract_numbers() as the required argument
check_historical_occurence(winning_numbers) 

Please enter six numbers between 1 and 49 separated by space: 37 35 48 14 24 31


We are Processing your Chance of Winning... 

Out of 3,665 drawings dating from 1982 to 2018 this number occurred 1 time/s. 

The chance of winning, buying the ticket with the numbers {35, 37, 14, 48, 24, 31} is: 0.0000072%.
In other words, you have a 1 in 13,983,816 chance to win the big prize. 
 



The combination 37 35 48 14 24 31 occurred once in the past. This does not change the chance of winning, which is the same for every combination in every drawing and remains extremely low. Let's check another combination:

In [14]:
check_historical_occurence(winning_numbers)

Please enter six numbers between 1 and 49 separated by space: 0 35 48 14 24 31


Something went wrong. Please choose six unique numbers between 1 and 49, separated by space


Please enter six numbers between 1 and 49 separated by space: 12 23 26 31 35 43


We are Processing your Chance of Winning... 

Out of 3,665 drawings dating from 1982 to 2018 this number occurred 0 time/s. 

The chance of winning, buying the ticket with the numbers {35, 43, 12, 23, 26, 31} is: 0.0000072%.
In other words, you have a 1 in 13,983,816 chance to win the big prize. 
 



First, we got an error message because the number 0 is out of range for a 1 to 49 lottery. After the input was corrected, the function reported that the combination occurred 0 times.
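
The comparison step can also be isolated from the input handling. The sketch below uses a hypothetical helper, `count_historical_matches`, and three drawings copied from the `winning_numbers` output above. It is an illustration of the matching rule, not the function used in this notebook.

```python
def count_historical_matches(user_numbers, winning_sets):
    """Hypothetical helper: count drawings whose six numbers equal the user's six."""
    ticket = set(user_numbers)
    # Two sets are equal only if they contain exactly the same numbers
    return sum(1 for winning in winning_sets if set(winning) == ticket)

# Three drawings copied from the winning_numbers output shown earlier
sample_draws = [
    {3, 41, 11, 12, 43, 14},
    {33, 36, 37, 39, 8, 41},
    {35, 37, 14, 48, 24, 31},
]

print(count_historical_matches([37, 35, 48, 14, 24, 31], sample_draws))  # 1
print(count_historical_matches([12, 23, 26, 31, 35, 43], sample_draws))  # 0
```

Because the helper only iterates over its second argument, it should also accept the full `winning_numbers` Series in place of the sample list.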

#### Multi-ticket Probability

The `multi_ticket_probability()` function below is designed to calculate the probability of winning the lottery jackpot based on the number of tickets purchased by the user. Here's how the function works:

- Initialization: The function initializes a check variable to manage the loop for validating user input.
- User Input Handling: Within a while loop, the function prompts the user to input the number of lottery tickets they want to purchase. The input is converted to an integer using `int()`.
- Input Validation: If `int()` cannot parse the input, it raises a `ValueError`. The function catches this exception, displays an error message, and prompts again.
- Probability Calculation: If the input is valid, the function calculates the probability of winning the jackpot as the number of tickets divided by the total number of possible combinations, which it obtains from the `combinations()` function. This assumes that every ticket carries a different combination.
- Output: The function displays the probability of winning the jackpot for the specified number of tickets purchased by the user. It also provides additional information, such as the adjusted number of combinations and the corresponding chance of winning the jackpot.

In [15]:
# Create the multi_ticket_probability function
def multi_ticket_probability():
    
    check = False
    total_outcomes = combinations(49, 6)
    
    while not check:
        try:
            input_number = int(input('How many lottery tickets do you want to play? Please enter a number:'))
            
            # Reject ticket counts below 1 or above the number of possible combinations
            if not 1 <= input_number <= total_outcomes:
                print('Please enter a number between 1 and {:,.0f}.'.format(total_outcomes))
                continue
            
            check = True
            print('We are Processing your Chance of Winning... \n')
            time.sleep(1)
            
            # Assumes every ticket carries a different combination
            probability = input_number / total_outcomes * 100
            probability_transformed = "%.7f" % probability
            combinations_adjusted = total_outcomes / input_number
            
            print('''The chance of winning, buying {:,.0f} ticket/s is: {}%.
You have a {:,.0f} in {:,.0f} chance (respectively: 1 in {:,.0f}) to win the big prize. '''.format(
    input_number, probability_transformed, input_number, total_outcomes, combinations_adjusted))
        
        except ValueError:
            print('The input was not a valid integer.') 

In [16]:
# Use the function for 7 different values
for _ in range(7):
    multi_ticket_probability()
    print('-'*20)

How many lottery tickets do you want to play? Please enter a number: 1


We are Processing your Chance of Winning... 

The chance of winning, buying 1 ticket/s is: 0.0000072%.
You have a 1 in 13,983,816 chance (respectively: 1 in 13,983,816) to win the big prize. 
--------------------


How many lottery tickets do you want to play? Please enter a number: 10


We are Processing your Chance of Winning... 

The chance of winning, buying 10 ticket/s is: 0.0000715%.
You have a 10 in 13,983,816 chance (respectively: 1 in 1,398,382) to win the big prize. 
--------------------


How many lottery tickets do you want to play? Please enter a number: 100


We are Processing your Chance of Winning... 

The chance of winning, buying 100 ticket/s is: 0.0007151%.
You have a 100 in 13,983,816 chance (respectively: 1 in 139,838) to win the big prize. 
--------------------


How many lottery tickets do you want to play? Please enter a number: 10000


We are Processing your Chance of Winning... 

The chance of winning, buying 10,000 ticket/s is: 0.0715112%.
You have a 10,000 in 13,983,816 chance (respectively: 1 in 1,398) to win the big prize. 
--------------------


How many lottery tickets do you want to play? Please enter a number: 100000


We are Processing your Chance of Winning... 

The chance of winning, buying 100,000 ticket/s is: 0.7151124%.
You have a 100,000 in 13,983,816 chance (respectively: 1 in 140) to win the big prize. 
--------------------


How many lottery tickets do you want to play? Please enter a number: 1000000


We are Processing your Chance of Winning... 

The chance of winning, buying 1,000,000 ticket/s is: 7.1511238%.
You have a 1,000,000 in 13,983,816 chance (respectively: 1 in 14) to win the big prize. 
--------------------


How many lottery tickets do you want to play? Please enter a number: 13983816


We are Processing your Chance of Winning... 

The chance of winning, buying 13,983,816 ticket/s is: 100.0000000%.
You have a 13,983,816 in 13,983,816 chance (respectively: 1 in 1) to win the big prize. 
--------------------
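
For the 40-ticket example mentioned in the scenario, the same calculation can be sketched as a small non-interactive function. The name `multi_ticket_chance` is hypothetical, and the sketch assumes that all tickets carry different combinations. It uses `fractions.Fraction` so the result stays exact until it is formatted.

```python
from fractions import Fraction
from math import comb

def multi_ticket_chance(n_tickets):
    """Hypothetical helper: exact jackpot probability for n_tickets distinct tickets."""
    total_outcomes = comb(49, 6)
    # Fewer than one ticket, or more tickets than combinations, is not meaningful
    if not 1 <= n_tickets <= total_outcomes:
        raise ValueError('n_tickets must be between 1 and {:,}.'.format(total_outcomes))
    return Fraction(n_tickets, total_outcomes)

chance = multi_ticket_chance(40)
print(chance)                                 # 5/1747977
print('{:.7f}%'.format(float(chance) * 100))  # percentage with seven decimals
```

Playing 40 different tickets therefore gives roughly a 1 in 349,595 chance, which is still very small.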


#### Probability of Having Fewer Winning Numbers

In many 6/49 lottery games, there are smaller prizes for tickets matching two, three, four, or five of the six numbers drawn. Consequently, users of the application may wish to determine the likelihood of having two, three, four, or five winning numbers on their tickets. For the initial version of the application, it's essential to provide users with these probabilities.

When developing the function to calculate these probabilities, we need to consider the following details:

- User Inputs: Users will input six distinct numbers from 1 to 49, representing their ticket, along with an integer between 2 and 5 indicating the expected number of winning numbers.
- Function Output: The function should print information about the probability of having a specific number of winning numbers.

To streamline the calculation process, we inform the engineering team that the specific combination on the ticket is irrelevant, and we only require the integer representing the expected number of winning numbers. As a result, we'll create a function named `probability_less_6()` to handle these calculations.

The `probability_less_6()` function calculates the probability of a player's ticket matching exactly the specified number of winning numbers. For instance, if a player wants to determine the probability of having five winning numbers, the function will provide the probability of precisely five winning numbers, excluding scenarios with fewer or more winning numbers.
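
The calculation can be written out as a formula. Here $k$ is a symbol introduced for illustration and stands for the requested number of winning numbers. A drawing matches the ticket in exactly $k$ numbers when $k$ of the six ticket numbers are drawn and the other $6-k$ drawn numbers come from the 43 numbers not on the ticket:

$$
P(\text{exactly } k \text{ matches}) = \frac{\binom{6}{k}\,\binom{43}{6-k}}{\binom{49}{6}}
$$

For $k = 5$ this gives

$$
P(\text{exactly } 5 \text{ matches}) = \frac{\binom{6}{5}\,\binom{43}{1}}{\binom{49}{6}} = \frac{6 \cdot 43}{13{,}983{,}816} = \frac{258}{13{,}983{,}816} \approx 0.0018\%,
$$

which corresponds to about 1 in 54,201.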

In [17]:
# Create the probability_less_6 function
def probability_less_6():
    check = False
    
    while not check:
        try:
            input_number = int(input('Please insert an integer between 2 and 5 (2 and 5 including):'))
            if input_number >= 2 and input_number <= 5:
                check = True
            
                n_combinations_ticket = combinations(6, input_number)
                n_combinations_remaining = combinations(43, 6 - input_number)
                total_success_outcomes = n_combinations_ticket * n_combinations_remaining 
            
                p_winning = (total_success_outcomes / combinations(49, 6)) * 100
                combinations_adjusted = combinations(49, 6) / total_success_outcomes
            
                print('''Your chances of having {} winning numbers with this ticket are: {:,.4f}%.
You have a 1 in {:,.0f} chance to win. '''.format(input_number, p_winning, combinations_adjusted))
            else:
                print('The integer is not in the specified range')
        except ValueError:
            print('The input was not a valid integer.') 

The function works as follows:

- Initialization: The function initializes a `check` variable to manage the loop for validating user input.
- User Input Handling: Within a `while` loop, the function prompts the user to input an integer between 2 and 5. The input is converted to an integer using `int()`.
- Input Validation: If the input cannot be converted to an integer, `int()` raises a `ValueError`, which the function catches to display an error message. If the integer falls outside the range of 2 to 5, the function displays a separate message. In both cases the user is prompted again.
- Probability Calculation: If the input is valid, the function calculates the probability of having exactly that many winning numbers on a lottery ticket. It uses the `combinations()` function to count the ways of choosing the matching numbers from the six on the ticket and the ways of choosing the remaining drawn numbers from the 43 numbers not on the ticket. The product of the two is the number of successful outcomes, which is divided by the total number of possible outcomes.
- Output: The function displays the probability of having exactly the requested number of winning numbers, both as a percentage and as a "1 in N" chance.

In [18]:
# Let's check the function for 2, 3, 4, and 5 winning numbers
for _ in range(4):
    probability_less_6()
    print('-'*20)

Please insert an integer between 2 and 5 (2 and 5 including): 2


Your chances of having 2 winning numbers with this ticket are: 13.2378%.
You have a 1 in 8 chance to win. 
--------------------


Please insert an integer between 2 and 5 (2 and 5 including): 3


Your chances of having 3 winning numbers with this ticket are: 1.7650%.
You have a 1 in 57 chance to win. 
--------------------


Please insert an integer between 2 and 5 (2 and 5 including): 4


Your chances of having 4 winning numbers with this ticket are: 0.0969%.
You have a 1 in 1,032 chance to win. 
--------------------


Please insert an integer between 2 and 5 (2 and 5 including): 5


Your chances of having 5 winning numbers with this ticket are: 0.0018%.
You have a 1 in 54,201 chance to win. 
--------------------


#### Conclusion and Next Steps

In conclusion, we have successfully developed the first version of our lottery probability app, which provides users with valuable insights into their chances of winning the big prize and compares their ticket against historical lottery data. Through the implementation of four main functions:

1. `one_ticket_probability()`
2. `check_historical_occurence()`
3. `multi_ticket_probability()`
4. `probability_less_6()`

the users can analyze various scenarios and make informed decisions when participating in the lottery. 

Moving forward, we have identified several potential features for a second version of the app: making the outputs easier to grasp with fun analogies, combining the probability and historical occurrence information, and calculating the probability of having at least two, three, four, or five winning numbers. These additions are meant to improve the user experience and give users a fuller picture of lottery probabilities.
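
One way the "at least" feature could be approached is to add up the probabilities of the exact outcomes from the requested number of matches up to six. The sketch below is an assumption about how a second version might do this. The names `probability_exactly` and `probability_at_least` are hypothetical.

```python
from math import comb

def probability_exactly(k):
    """Probability (0 to 1) of matching exactly k of the six drawn numbers."""
    return comb(6, k) * comb(43, 6 - k) / comb(49, 6)

def probability_at_least(k):
    """Probability (0 to 1) of matching k or more of the six drawn numbers."""
    # The exact outcomes are mutually exclusive, so their probabilities add up
    return sum(probability_exactly(j) for j in range(k, 7))

for k in range(2, 6):
    print('At least {}: {:.4f}%'.format(k, probability_at_least(k) * 100))
```

A quick sanity check for this approach is that `probability_at_least(0)` should equal 1, up to floating-point rounding.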


