# Project: Mobile App for Lottery Addiction
---

This is a theoretical project in which we work for a medical institute that aims to prevent and treat gambling addiction. The institute wants to build a dedicated mobile app to help lottery addicts better estimate their chances of winning. In this project we build the 'logical core' of the app, which calculates the probabilities.

For the app's first version we focus on the [6/49 lottery](https://en.wikipedia.org/wiki/Lotto_6/49) and build functions that enable users to answer questions like: what's the probability of...

- Winning the big prize with a single ticket?
- Winning the big prize if we play 40 different tickets (or any other number)?
- Having at least five (or four, or three, or two) winning numbers on a single ticket?

To answer these questions we perform both theoretical and empirical probability calculations. The empirical side uses historical data from the national 6/49 lottery of Canada. The data set can be found [here](https://www.kaggle.com/datasets/datascienceai/lottery-dataset).

The data was downloaded from the source above as a file called `649.csv`. Its first three rows look like this:

---

|    |   PRODUCT |   DRAW NUMBER |   SEQUENCE NUMBER | DRAW DATE   |   NUMBER DRAWN 1 |   NUMBER DRAWN 2 |   NUMBER DRAWN 3 |   NUMBER DRAWN 4 |   NUMBER DRAWN 5 |   NUMBER DRAWN 6 |   BONUS NUMBER |
|---:|----------:|--------------:|------------------:|:------------|-----------------:|-----------------:|-----------------:|-----------------:|-----------------:|-----------------:|---------------:|
|  0 |       649 |             1 |                 0 | 6/12/1982   |                3 |               11 |               12 |               14 |               41 |               43 |             13 |
|  1 |       649 |             2 |                 0 | 6/19/1982   |                8 |               33 |               36 |               37 |               39 |               41 |              9 |
|  2 |       649 |             3 |                 0 | 6/26/1982   |                1 |                6 |               23 |               24 |               27 |               39 |             34 |

The data set contains historical data for 3,665 drawings (each row shows data for a single drawing), dating from 1982 to 2018. For each drawing, we can find the six numbers drawn in the following six columns:

- `NUMBER DRAWN 1`
- `NUMBER DRAWN 2`
- `NUMBER DRAWN 3`
- `NUMBER DRAWN 4`
- `NUMBER DRAWN 5`
- `NUMBER DRAWN 6`


---

```python
# Importing libraries used throughout this project
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pyperclip  # copies text to the clipboard (used to produce the table above)

pd.options.display.max_columns = 80  # avoid truncating displayed columns
```

## 1. Exploring the Data

```python
# Reading 649.csv as a pandas DataFrame
lotto_df = pd.read_csv('data/649.csv', low_memory=False)

# Copying the first 3 rows to the clipboard as a markdown table (shown above);
# DataFrame.to_markdown() requires the optional `tabulate` package
markdown_output = lotto_df.head(3).to_markdown()
pyperclip.copy(markdown_output)
```

```python
lotto_df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3665 entries, 0 to 3664
Data columns (total 11 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   PRODUCT          3665 non-null   int64 
 1   DRAW NUMBER      3665 non-null   int64 
 2   SEQUENCE NUMBER  3665 non-null   int64 
 3   DRAW DATE        3665 non-null   object
 4   NUMBER DRAWN 1   3665 non-null   int64 
 5   NUMBER DRAWN 2   3665 non-null   int64 
 6   NUMBER DRAWN 3   3665 non-null   int64 
 7   NUMBER DRAWN 4   3665 non-null   int64 
 8   NUMBER DRAWN 5   3665 non-null   int64 
 9   NUMBER DRAWN 6   3665 non-null   int64 
 10  BONUS NUMBER     3665 non-null   int64 
dtypes: int64(10), object(1)
memory usage: 315.1+ KB
```


```python
lotto_df.shape
```

```
(3665, 11)
```

## 2. Core Functions
---

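This section defines two functions used throughout the project: `factorial()` and `combinations()`. The latter counts the number of ways to choose `k` items from `n` items when the items are drawn without replacement and their order does not matter, which is exactly how lottery numbers are drawn.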
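
For reference, these are the standard definitions the two functions compute. The last line evaluates the count for a 6/49 draw, which is the 13,983,816 figure used throughout this project:

$$
\begin{aligned}
n! &= n \times (n-1) \times \cdots \times 2 \times 1, \qquad 0! = 1 \\
\binom{n}{k} &= \frac{n!}{k!\,(n-k)!} \\
\binom{49}{6} &= \frac{49 \times 48 \times 47 \times 46 \times 45 \times 44}{6!} = \frac{10{,}068{,}347{,}520}{720} = 13{,}983{,}816
\end{aligned}
$$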

```python
def factorial(n):
    '''
        Calculate the factorial of a non-negative integer.

        Parameters:
            n (int): the non-negative integer whose factorial is calculated

        Returns:
            int: the factorial of n, i.e. n! = n * (n-1) * ... * 1
    '''
    # Factorial is not defined for negative inputs
    if n < 0:
        raise ValueError("Factorial is not defined for negative values")
    # Base case: 0! is defined as 1
    elif n == 0:
        return 1
    else:
        # Recursive case: n! = n * (n-1)!
        return n * factorial(n - 1)
```
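
The recursive `factorial()` is fine for the inputs used here (at most 49), but every call adds a stack frame, so a large enough `n` would exceed Python's recursion limit (1000 by default) and raise `RecursionError`. A minimal iterative sketch, named `factorial_iterative` here purely for illustration, avoids that and can be cross-checked against the standard library's `math.factorial`:

```python
import math


def factorial_iterative(n: int) -> int:
    """Compute n! with a loop instead of recursion (illustrative sketch)."""
    if n < 0:
        raise ValueError("Factorial is not defined for negative values")
    result = 1
    # Multiply 2 * 3 * ... * n; for n = 0 or 1 the loop body never runs and 1 is returned
    for i in range(2, n + 1):
        result *= i
    return result


# Cross-check against the standard library for every value this project needs
assert all(factorial_iterative(n) == math.factorial(n) for n in range(50))
print(factorial_iterative(6))  # 720
```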

```python
def combinations(n, k):
    '''
        Calculate the number of combinations of k items from a group of n items.

        Parameters:
            n (int): the size of the group
            k (int): the number of items to choose from the group

        Returns:
            num_combinations (int): the number of combinations
    '''
    # calculate the factorials of n, k, and n-k
    n_factorial = factorial(n)
    k_factorial = factorial(k)
    n_minus_k_factorial = factorial(n - k)

    # calculate the number of combinations
    num_combinations = n_factorial // (k_factorial * n_minus_k_factorial)

    return num_combinations
```
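
Computing three full factorials is simple, but the intermediate values (such as 49!) are far larger than the final count. A common alternative is the multiplicative form, where the division is done after each multiplication so every intermediate value stays an exact integer. The sketch below uses the hypothetical name `combinations_multiplicative` and compares it with the standard library's `math.comb` (Python 3.8+):

```python
import math


def combinations_multiplicative(n: int, k: int) -> int:
    """Count k-item subsets of n items without computing full factorials (sketch)."""
    if n < 0 or k < 0:
        raise ValueError("n and k must be non-negative")
    if k > n:
        return 0  # math.comb also returns 0 when k > n
    k = min(k, n - k)  # C(n, k) == C(n, n - k); the smaller k needs fewer steps
    result = 1
    for i in range(1, k + 1):
        # After step i, result == C(n - k + i, i), so the floor division is always exact
        result = result * (n - k + i) // i
    return result


print(combinations_multiplicative(49, 6))  # 13983816
assert combinations_multiplicative(49, 6) == math.comb(49, 6)
```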

## 3. One Ticket Probability
---

We are tasked with creating a function that can calculate the probability of winning the big prize for any given ticket. In a 6/49 lottery game, six numbers are drawn from a set of 49 numbers ranging from 1 to 49, and a player wins the big prize if all six numbers on their ticket match all six numbers drawn.

The engineering team has provided us with the following details to keep in mind when writing this function:

1. The user will input six different numbers from 1 to 49 within the app.
2. These six numbers will be given to our function as a Python list input.
3. The probability value should be printed in a way that's easily understandable for people without any probability training.

To achieve this, we have written the `one_ticket_probability()` function which takes a list of six unique numbers as input and calculates the probability of winning the big prize for one ticket.

The function calculates the total number of possible outcomes by using the `combinations()` function that we have previously written. It then determines the number of successful outcomes, which is always 1 for a single ticket. The probability of winning is calculated by dividing the number of successful outcomes by the total number of possible outcomes.

Finally, the function prints the probability value in a friendly way that is easy to understand for users, using the `str.format()` method to make the printed message more personalized with respect to the user's input.

By following these steps, we have successfully created a function that can calculate the probability of winning the big prize for any given ticket and prints it in a user-friendly way.
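
As a quick check of the arithmetic the function performs, the snippet below redoes it with Python's standard-library `fractions.Fraction`, which keeps the probability exact until it is formatted. It uses `math.comb` (Python 3.8+) in place of the project's `combinations()`; both return 13,983,816 for a 6/49 draw.

```python
from fractions import Fraction
from math import comb

n_outcomes = comb(49, 6)          # every possible set of six numbers from 1 to 49
p_win = Fraction(1, n_outcomes)   # one ticket covers exactly one of those sets

print(n_outcomes)                       # 13983816
print(f"{float(p_win) * 100:.7f}%")     # 0.0000072%
print(f"1 in {p_win.denominator:,}")    # 1 in 13,983,816
```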

```python
def one_ticket_probability(numbers):
    '''
    Calculate the probability of winning the big prize in the 6/49 lottery game,
    given a set of six numbers.

    Parameters:
    numbers (list): a list of six unique numbers from 1 to 49

    Returns:
    probability (float): probability of winning the big prize (unrounded);
        the printed percentage is shown to 7 decimal places
    '''
    # calculate the total number of possible outcomes for a six-number lottery ticket
    n_combinations = combinations(49, 6)

    # since the user is only entering one combination, the number of successful outcomes is 1
    n_success = 1

    # calculate the probability of winning the big prize
    probability = n_success / n_combinations

    # print the probability in a friendly way
    message = "Your chances of winning the big prize with the numbers {} are {:.7f}%."
    print(message.format(numbers, probability * 100))

    return probability
```

```python
test_input_1 = [2, 43, 22, 23, 11, 5]
one_ticket_probability(test_input_1)
```

```
Your chances of winning the big prize with the numbers [2, 43, 22, 23, 11, 5] are 0.0000072%.

7.151123842018516e-08
```

```python
test_input_2 = [9, 26, 41, 7, 15, 6]
one_ticket_probability(test_input_2)
```

```
Your chances of winning the big prize with the numbers [9, 26, 41, 7, 15, 6] are 0.0000072%.

7.151123842018516e-08
```

## 4. Function for Historical Data Check
---

For the historical check against `lotto_df`, the engineering team asked us to keep the following details in mind:

- Inside the app, the user inputs six different numbers from 1 to 49.
- Under the hood, the six numbers will come as a Python list and serve as an input to our function.
- The engineering team wants us to write a function that prints:
    - the number of times the combination selected occurred in the Canada data set; and
    - the probability of winning the big prize in the next drawing with that combination.

```python
def extract_numbers(row: pd.Series):
    """
        Extract the drawn numbers from a row, i.e. the values in columns whose
        names contain the string 'NUMBER DRAWN'.

        Args:
            row: a pandas Series row

        Returns:
            A set of the numbers from the row's 'NUMBER DRAWN' columns
    """

    # Filter the row to include only columns with names containing 'NUMBER DRAWN'
    filtered_row = row.filter(regex='NUMBER DRAWN')

    # Convert the filtered row to a set and return it
    return set(filtered_row)
```
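
`DataFrame.apply` with `axis=1` calls a Python function once per row, which is fine for 3,665 drawings. A minimal alternative sketch, assuming the same six `NUMBER DRAWN` column names, selects those columns once and builds a `frozenset` per row; frozensets are hashable, so repeated combinations can also be counted with `value_counts()`. The two-row `sample_df` below is hypothetical sample data mirroring the first two drawings in the table above, not a load of `649.csv`.

```python
import pandas as pd

# Hypothetical sample mirroring the first two drawings shown in the table above
sample_df = pd.DataFrame({
    "DRAW NUMBER": [1, 2],
    "NUMBER DRAWN 1": [3, 8],
    "NUMBER DRAWN 2": [11, 33],
    "NUMBER DRAWN 3": [12, 36],
    "NUMBER DRAWN 4": [14, 37],
    "NUMBER DRAWN 5": [41, 39],
    "NUMBER DRAWN 6": [43, 41],
})

# Select the six drawn-number columns once, then turn each row into a frozenset
drawn_cols = [f"NUMBER DRAWN {i}" for i in range(1, 7)]
draw_sets = pd.Series(
    [frozenset(row) for row in sample_df[drawn_cols].to_numpy().tolist()],
    index=sample_df.index,
)

# Frozensets are hashable, so counting identical draws is a one-liner
print(draw_sets.value_counts().max())  # 1: no combination repeats in this sample
print((draw_sets == frozenset({33, 36, 37, 39, 8, 41})).sum())  # 1
```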

```python
winning_numbers = lotto_df.apply(extract_numbers, axis=1)
winning_numbers.head(5)
```

```
0    {3, 41, 11, 12, 43, 14}
1    {33, 36, 37, 39, 8, 41}
2     {1, 6, 39, 23, 24, 27}
3     {3, 9, 10, 43, 13, 20}
4    {34, 5, 14, 47, 21, 31}
dtype: object
```

```python
def check_historical_occurence(user_numbers: list[int], historical_numbers: pd.Series):
    """
    Check the historical occurrence of a given combination of numbers.

    Args:
        user_numbers: A list of six integers representing the user's chosen numbers.
        historical_numbers: A pandas Series of sets, one set of winning numbers per drawing.

    Returns:
        None. Prints a message to the console with the number of times the
        combination has occurred and the user's chances of winning with that
        combination.
    """

    # Convert user_numbers to a set so that the order of the numbers does not matter
    user_numbers_set = set(user_numbers)

    # Compare the user's set with every historical set and count the matches
    check_occurrence = historical_numbers == user_numbers_set
    n_occurrences = check_occurrence.sum()

    # Report the historical count, using "time" or "times" as appropriate
    if n_occurrences == 0:
        print(f"The combination {user_numbers} has never occurred.")
    else:
        times = "time" if n_occurrences == 1 else "times"
        print(f"The combination {user_numbers} has occurred {n_occurrences} {times}.")

    # Past results do not change the odds: every drawing is independent
    print("Your chances of winning the big prize with this combination are "
          "1 in 13,983,816.")
```

```python
test_input_3 = [33, 36, 37, 39, 8, 41]
check_historical_occurence(test_input_3, winning_numbers)
```

```
The combination [33, 36, 37, 39, 8, 41] has occurred 1 time.
Your chances of winning the big prize with this combination are 1 in 13,983,816.
```

```python
test_input_4 = [3, 2, 44, 22, 1, 44]
check_historical_occurence(test_input_4, winning_numbers)
```

```
The combination [3, 2, 44, 22, 1, 44] has never occurred.
Your chances of winning the big prize with this combination are 1 in 13,983,816.
```
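
The engineering team's spec says the user enters six *different* numbers from 1 to 49, but neither `one_ticket_probability()` nor `check_historical_occurence()` checks this. `test_input_4` actually repeats 44, so `set()` silently reduces it to five numbers and the "never occurred" result is guaranteed rather than informative. A minimal validation sketch (the helper name `validate_ticket` is hypothetical) that could run before any lookup:

```python
def validate_ticket(numbers: list[int]) -> list[int]:
    """Return the ticket sorted if it holds six distinct integers from 1 to 49 (sketch)."""
    if len(numbers) != 6:
        raise ValueError(f"Expected 6 numbers, got {len(numbers)}")
    if any(not isinstance(n, int) or not 1 <= n <= 49 for n in numbers):
        raise ValueError("Every number must be an integer from 1 to 49")
    if len(set(numbers)) != 6:
        raise ValueError("The six numbers must all be different")
    return sorted(numbers)


print(validate_ticket([33, 36, 37, 39, 8, 41]))  # [8, 33, 36, 37, 39, 41]
try:
    validate_ticket([3, 2, 44, 22, 1, 44])
except ValueError as err:
    print(err)  # The six numbers must all be different
```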


## 5. Multi-ticket Probability
---

We aim to assist lottery players in accurately assessing their chances of winning. To that end, we'll create a function that enables users to calculate their odds of winning based on the number of tickets they plan to play.

After consulting with the engineering team, we've determined that users will input the number of tickets they want to purchase (without specifying the particular combinations). Our function will receive an integer between 1 and 13,983,816, which is the maximum number of different tickets.

To provide users with helpful information, our function will output details on the likelihood of winning the grand prize based on the number of tickets played.
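
Because every ticket is a different combination, each one covers a distinct outcome, so the probability is simply the number of tickets divided by 13,983,816 (playing the same combination twice would not add anything). The spec also bounds the input to 1 to 13,983,816, which `multi_ticket_probability()` does not enforce. A minimal sketch of that check, using the hypothetical name `checked_multi_ticket_probability` and returning an exact fraction instead of printing:

```python
from fractions import Fraction
from math import comb

TOTAL_OUTCOMES = comb(49, 6)  # 13,983,816 distinct 6/49 tickets


def checked_multi_ticket_probability(num_tickets: int) -> Fraction:
    """Probability that one of num_tickets *distinct* tickets wins the big prize (sketch)."""
    if not isinstance(num_tickets, int) or not 1 <= num_tickets <= TOTAL_OUTCOMES:
        raise ValueError(f"num_tickets must be an integer from 1 to {TOTAL_OUTCOMES:,}")
    # Distinct tickets cover disjoint outcomes, so their probabilities simply add up
    return Fraction(num_tickets, TOTAL_OUTCOMES)


print(checked_multi_ticket_probability(6_991_908))  # 1/2
print(f"{float(checked_multi_ticket_probability(40)) * 100:.7f}%")  # 0.0002860%
```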

```python
def multi_ticket_probability(num_tickets: int):
    """
        Calculate the probability of winning the big prize in a lottery game
        given the number of different tickets played.

        Args:
            num_tickets (int): The number of different tickets played by the user
                (an integer from 1 to 13,983,816).

        Returns:
            None. The function displays a message with the calculated probability.
    """

    # Calculate the total number of possible outcomes for the lottery
    n_possible_outcomes = combinations(49, 6)

    # Each different ticket covers one distinct outcome, so the probabilities add up
    winning_probs = num_tickets / n_possible_outcomes

    # Use the singular "ticket" when only one ticket is played
    ticket_word = "ticket" if num_tickets == 1 else "tickets"

    # Format the message, with thousands separators for the ticket count
    message = "Your chances of winning the big prize with {:,} {} are {:.7f}%."

    # Display the message to the user with the calculated probability
    print(message.format(num_tickets, ticket_word, winning_probs * 100))
```

```python
test = [1, 10, 100, 10000, 1000000, 6991908, 13983816]

for i in test:
    multi_ticket_probability(i)
```

```
Your chances of winning the big prize with 1 ticket are 0.0000072%.
Your chances of winning the big prize with 10 tickets are 0.0000715%.
Your chances of winning the big prize with 100 tickets are 0.0007151%.
Your chances of winning the big prize with 10,000 tickets are 0.0715112%.
Your chances of winning the big prize with 1,000,000 tickets are 7.1511238%.
Your chances of winning the big prize with 6,991,908 tickets are 50.0000000%.
Your chances of winning the big prize with 13,983,816 tickets are 100.0000000%.
```


## 6. Fewer Winning Numbers
---

**Function to Calculate Probability of Winning with 2-5 Numbers**

This function allows users to calculate the probability of having two, three, four, or five winning numbers in a 6/49 lottery.

**Input:**

- Six different numbers from 1 to 49.
- An integer between 2 and 5 representing the number of winning numbers expected.

**Output:**

Information about the probability of having the inputted number of winning numbers.

For example, to calculate the probability of having exactly five winning numbers, we can use the following steps:

- From a ticket with six numbers, we can form six five-number combinations.
- For each of these five-number combinations, the sixth number drawn can be any of the 44 numbers not in that combination. One of those 44 is the sixth number on the ticket, which would give six matches rather than five, so we leave it out.
- That leaves 43 successful outcomes for each of the six five-number combinations.
- Multiplying 6 by 43 gives a total of 258 successful outcomes.
- With 258 successful outcomes out of 13,983,816 possible outcomes, the probability of having exactly five winning numbers on a single ticket is 258 / 13,983,816 ≈ 0.00001845 (about 0.001845%).

To calculate the probabilities for two, three, four, or five winning numbers, we generalize these steps to the number of expected winning numbers. The specific combination on the ticket is irrelevant, so the function only needs the integer between 2 and 5 representing the number of winning numbers expected.
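
Generalizing the five-number steps: choose which $k$ of the ticket's six numbers are drawn, then choose the remaining $6 - k$ drawn numbers from the 43 numbers not on the ticket. This is the hypergeometric count that `probability_less_6()` computes:

$$
\begin{aligned}
P(\text{exactly } k \text{ winning numbers}) &= \frac{\binom{6}{k}\binom{43}{6-k}}{\binom{49}{6}} \\
P(\text{exactly } 5 \text{ winning numbers}) &= \frac{6 \times 43}{13{,}983{,}816} = \frac{258}{13{,}983{,}816} \approx 0.00001845
\end{aligned}
$$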

```python
# Uses the combinations() function defined in section 2. Importing math.comb
# under the same name would silently shadow it (both give identical results).

def probability_less_6(n_winning_numbers: int):
    """
        Calculates the probability of having exactly n_winning_numbers (fewer than 6)
        winning numbers on a single 6/49 lottery ticket.

        Args:
        n_winning_numbers (int): An integer between 2 and 5 representing the number of winning numbers expected.

        Returns:
        None: Prints the probability of having the inputted number of winning numbers, as well as the
        corresponding chance expressed as "1 in N" (rounded to the nearest integer).
    """

    # Number of ways to pick which n_winning_numbers of the ticket's six numbers are drawn
    n_combinations_ticket = combinations(6, n_winning_numbers)

    # Number of ways to pick the remaining drawn numbers from the 43 numbers not on the ticket
    n_combinations_remaining = combinations(49 - 6, 6 - n_winning_numbers)

    # Calculate the total number of successful outcomes for the given ticket
    successful_outcomes = n_combinations_ticket * n_combinations_remaining

    # Calculate the total number of possible outcomes for any given ticket
    n_combinations_total = combinations(49, 6)

    # Calculate the probability and convert it to a percentage
    probability = successful_outcomes / n_combinations_total
    probability_percentage = probability * 100

    # Express the odds as "1 in N", rounding N to the nearest integer
    combinations_simplified = round(n_combinations_total / successful_outcomes)

    # Print the results
    print(f"Your chances of having {n_winning_numbers} winning numbers with this ticket "
          f"are {probability_percentage:.6f}%.")
    print(f"In other words, you have about a 1 in {combinations_simplified:,} chance.")
```

```python
test = [2, 3, 4, 5]

for i in test:
    probability_less_6(i)
    print()
```

```
Your chances of having 2 winning numbers with this ticket are 13.237803%.
In other words, you have about a 1 in 8 chance.

Your chances of having 3 winning numbers with this ticket are 1.765040%.
In other words, you have about a 1 in 57 chance.

Your chances of having 4 winning numbers with this ticket are 0.096862%.
In other words, you have about a 1 in 1,032 chance.

Your chances of having 5 winning numbers with this ticket are 0.001845%.
In other words, you have about a 1 in 54,201 chance.
```
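
The project's opening questions ask about *at least* five (or four, three, two) winning numbers, while `probability_less_6()` gives the probability of *exactly* that many. Because "exactly j matches" events for different j cannot happen together, "at least k" is the sum of the exact probabilities for j = k up to 6. A minimal sketch, using the hypothetical name `probability_at_least` and the standard library's `math.comb`:

```python
from fractions import Fraction
from math import comb


def probability_at_least(k: int) -> Fraction:
    """Exact probability that a single 6/49 ticket matches at least k drawn numbers (sketch)."""
    if not 0 <= k <= 6:
        raise ValueError("k must be between 0 and 6")
    total = comb(49, 6)
    # 'Exactly j matches' events are disjoint, so add them for j = k..6
    return sum(
        (Fraction(comb(6, j) * comb(43, 6 - j), total) for j in range(k, 7)),
        Fraction(0),
    )


for k in range(2, 6):
    p = probability_at_least(k)
    print(f"At least {k} winning numbers: {float(p) * 100:.6f}% (about 1 in {round(1 / p):,})")
```

Summing over every j from 0 to 6 gives exactly 1, which is a handy sanity check that the per-k counts are complete.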


