# Mobile App for Lottery Addiction

<p style="text-align:center;">
  <img src="https://images.unsplash.com/photo-1517085908802-f56a43681c18?ixlib=rb-4.0.3&ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&auto=format&fit=crop&w=815&q=80" width="700" height="100">
  <br>
  Source: <a href="https://unsplash.com/">Unsplash</a>
</p>

The goal of this project is to develop a mobile app that helps lottery addicts estimate their chances of winning. The app will be created by a team of engineers from a medical institute that focuses on preventing and treating gambling addictions. Our role is to build the app's logical core and calculate probabilities.

For the first version of the app, we will focus on the [6/49 lottery](https://en.wikipedia.org/wiki/Lotto_6/49) and develop functions that allow users to answer critical questions, such as:

- What is the probability of winning the big prize with a single ticket?
- What is the probability of winning the big prize if we play 40 different tickets (or any other number)?
- What is the probability of having at least five, four, three, or two winning numbers on a single ticket?

We will analyze historical data from the national 6/49 lottery game in Canada to accomplish this task. [The data set](https://www.kaggle.com/datascienceai/lottery-dataset) contains information on 3,665 drawings dating from 1982 to 2018. By analyzing these data and building accurate probability models, we hope to create an app that gives lottery addicts the information they need to make informed decisions and keep their gambling in check.

## Core Functions

As explained, our goal in this project is to develop a mobile app that can help lottery addicts better estimate their chances of winning. To achieve this, we need to calculate probabilities and combinations repeatedly. Hence, we'll start by writing two essential functions:

- A factorial function that calculates factorials.
- A combination function that calculates combinations.

The formula to calculate factorials is:

$n! = n \times (n-1) \times (n-2) \times \cdots \times 2 \times 1$

For the 6/49 lottery, six numbers are drawn from a set of 49 numbers that range from 1 to 49, and the drawing is done without replacement. This means that once a number is drawn, it's not put back in the set.

To find the number of combinations when we're sampling without replacement and taking only k objects from a group of n objects, we can use the formula:

${}_{n}C_{k} = \binom{n}{k} = \frac{n!}{k!\,(n-k)!}$
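For the 6/49 lottery, $n = 49$ and $k = 6$. Cancelling $43!$ from the numerator and denominator gives the total number of possible tickets:

$$
\binom{49}{6} = \frac{49!}{6!\,43!} = \frac{49 \times 48 \times 47 \times 46 \times 45 \times 44}{6 \times 5 \times 4 \times 3 \times 2 \times 1} = \frac{10{,}068{,}347{,}520}{720} = 13{,}983{,}816
$$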

In [1]:
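```python
# Function to compute n! (the empty product gives 0! = 1)
def factorial(n):
    result = 1
    for i in range(1, n + 1):
        result *= i
    return result

# Function to find the number of ways to choose k objects from n without replacement
def combinations(n, k):
    numerator = factorial(n)
    denominator = factorial(k) * factorial(n - k)
    # The division is always exact, so // keeps the result an exact int
    return numerator // denominator
```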
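To double-check the two helpers, and assuming Python 3.8 or later, they can be compared with the standard library's `math.factorial()` and `math.comb()`. The sketch below restates both functions so it runs on its own; it uses integer division (`//`) so the number of combinations stays an exact `int` instead of a float.

```python
import math

def factorial(n):
    # Product 1 * 2 * ... * n; the empty product makes 0! = 1
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

def combinations(n, k):
    # The division is always exact, so // keeps the result an int
    return factorial(n) // (factorial(k) * factorial(n - k))

# Cross-check against the standard library (math.comb needs Python 3.8+)
for n, k in [(49, 6), (49, 5), (10, 3), (5, 0)]:
    assert factorial(n) == math.factorial(n)
    assert combinations(n, k) == math.comb(n, k)

print(combinations(49, 6))  # 13983816
```

In other contexts `math.comb(49, 6)` could replace the hand-written version outright; the explicit functions are kept in this project because they mirror the formulas above.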

## One-ticket Probability

Our next step is to build a function that can calculate the probability of winning the big prize in the 6/49 lottery. To win the big prize, a player's ticket must match all six numbers drawn from a set of 49 numbers ranging from 1 to 49. For instance, if a player's ticket has the numbers {13, 22, 24, 27, 42, 44}, they only win the big prize if the drawn numbers are {13, 22, 24, 27, 42, 44}. If any one number is different, they lose.

For the first version of our app, we want users to be able to calculate the probability of winning the big prize with the numbers they play on a single ticket, where a ticket consists of six numbers chosen from 1 to 49. To achieve this, we will create a function that takes a Python list of six different numbers between 1 and 49 as input and calculates the probability of winning the big prize.

Before building the function, we discussed a few details with the medical institute's engineering team that the function must take into account:

- The user inputs six different numbers from 1 to 49 inside the app.
- Under the hood, the six numbers will come as a Python list, which will serve as the single input to our function.
- The engineering team has requested that the function print the probability value in an easy-to-understand way, suitable for users who have no prior probability training.
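One way to meet the last requirement is to phrase a probability as "1 in N" odds next to the percentage, since a tiny percentage is hard to interpret. The helper below, `describe_odds()`, is a hypothetical name used only for illustration; it takes the number of winning outcomes and the total number of outcomes.

```python
def describe_odds(successful_outcomes, total_outcomes):
    """Hypothetical helper: phrase a probability as '1 in N' odds for non-experts."""
    if not 0 < successful_outcomes <= total_outcomes:
        raise ValueError("need 0 < successful_outcomes <= total_outcomes")
    # Odds of 1 in N means N = total / successful
    one_in = total_outcomes / successful_outcomes
    percent = 100 * successful_outcomes / total_outcomes
    return f"about 1 in {one_in:,.0f} ({percent:.7f}%)"

print(describe_odds(1, 13_983_816))
# about 1 in 13,983,816 (0.0000072%)
```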

In [2]:
```python
# Function to check valid numbers in the list
def check_numbers(lst):
    """
    Check that a list has exactly 6 unique integers, all between 1 and 49.
    Numeric strings such as '7' are converted to integers first.

    Parameters:
    lst (list): A list of numbers

    Returns:
    list: A cleaned copy of the numbers if all conditions are met.
    str: Otherwise, all error messages joined with " and ".
    """
    numbers = []
    for x in lst:
        # Convert numeric strings; leave anything else unchanged for the checks below
        if isinstance(x, str):
            try:
                x = int(x)
            except ValueError:
                pass
        numbers.append(x)

    # bool is a subclass of int, so exclude it explicitly
    all_ints = all(isinstance(x, int) and not isinstance(x, bool) for x in numbers)

    errors = []
    if len(numbers) != 6:
        errors.append("List should contain exactly 6 numbers")
    if not all_ints:
        errors.append("List should only contain integers")
    # Compare ranges only when every element is an int (avoids a TypeError on strings)
    elif not all(1 <= x <= 49 for x in numbers):
        errors.append("List should only contain numbers from 1 to 49")
    if len(set(numbers)) != len(numbers):
        errors.append("List should not contain duplicates")
    return " and ".join(errors) if errors else numbers
```

In [3]:
```python
# Function to compute the win probability for a given list of numbers
def one_ticket_probability(user_numbers):
    # Validate the input; check_numbers() returns a string when something is wrong
    valid_numbers = check_numbers(user_numbers)
    if isinstance(valid_numbers, str):
        return valid_numbers

    # Exactly one of all possible 6-number combinations wins the big prize
    total_num_outcomes = combinations(n=49, k=6)
    num_successful_outcomes = 1
    prob_one_ticket = num_successful_outcomes / total_num_outcomes
    prob_percent = prob_one_ticket * 100
    msg = (f'Your chances of winning the big prize with the numbers {valid_numbers} are extremely low: only {prob_percent:.7f}%! '
           f'In other words, each ticket has a 1 in {int(total_num_outcomes):,} chance of winning.')
    return msg
```

The function `one_ticket_probability()` uses the fact that only one of all possible combinations wins the big prize. Since there are $\binom{49}{6} = 13{,}983{,}816$ equally likely combinations, the probability of winning with a single ticket is the number of winning outcomes divided by the total number of outcomes: $1/13{,}983{,}816 \approx 7.15 \times 10^{-8}$, or about 0.0000072%.

We can now write a function `test_probabilities()` to check how `one_ticket_probability()` handles valid and invalid inputs.

In [4]:
```python
def test_probabilities():
    tests = [[1, 2, 3, 4, 5, 6],      # correct input
             [1, 2, 3, 4, 5],         # fewer than 6 numbers
             [1, 2, 3, 4, 5, 1],      # duplicate numbers
             [1, 2, 3, 4, 5, 100],    # number outside 1-49
             [1, '2', 3, 4, 5, 6],    # contains a numeric string
             [1, 200, 3, 3, 4, 5]]    # list with multiple errors

    for test in tests:
        # check_numbers() converts numeric strings such as '2', so no preprocessing is needed
        print(one_ticket_probability(test))
        print('-' * 60, '\n')

test_probabilities()
```

```
Your chances of winning the big prize with the numbers [1, 2, 3, 4, 5, 6] are extremely low: only 0.0000072%! In other words, each ticket has a 1 in 13,983,816 chance of winning.
------------------------------------------------------------

List should contain exactly 6 numbers
------------------------------------------------------------

List should not contain duplicates
------------------------------------------------------------

List should only contain numbers from 1 to 49
------------------------------------------------------------

Your chances of winning the big prize with the numbers [1, 2, 3, 4, 5, 6] are extremely low: only 0.0000072%! In other words, each ticket has a 1 in 13,983,816 chance of winning.
------------------------------------------------------------

List should only contain numbers from 1 to 49 and List should not contain duplicates
------------------------------------------------------------
```
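The counting argument behind `one_ticket_probability()` can be verified by brute force on a lottery small enough to enumerate. The sketch below assumes a hypothetical 3/7 game (three numbers drawn from 1 to 7) and uses `itertools.combinations()`, imported under another name so it does not shadow the notebook's own `combinations()` helper. Exactly one of the $\binom{7}{3} = 35$ drawings matches a given ticket.

```python
from fractions import Fraction
from itertools import combinations as all_draws
from math import comb

def brute_force_one_ticket(ticket, pool_size, k):
    """Enumerate every drawing of k numbers from 1..pool_size and return
    the exact fraction of drawings that match the ticket."""
    draws = list(all_draws(range(1, pool_size + 1), k))
    wins = sum(1 for draw in draws if set(draw) == set(ticket))
    return Fraction(wins, len(draws))

# Hypothetical 3/7 lottery: small enough to enumerate all 35 possible drawings
p = brute_force_one_ticket([2, 5, 7], pool_size=7, k=3)
print(p)                             # 1/35
print(p == Fraction(1, comb(7, 3)))  # True
```

The same loop would work for 6/49 in principle, but it would have to walk through all 13,983,816 drawings, which is why the closed-form formula is used instead.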



## Historical Data Check for Canada Lottery

We previously created a function that tells users their chances of winning the big prize with a single ticket. For the first version of the app, however, users should also be able to compare their ticket against past Canada 6/49 results and find out whether they would have won in any previous drawing.

Next, we will concentrate on examining the historical data from the Canada 6/49 lottery. The dataset is available for download from [Kaggle](https://www.kaggle.com/datascienceai/lottery-dataset) and has the following structure:

In [5]:
```python
import pandas as pd

# Read the data
lottery = pd.read_csv('649.csv')

# View number of rows and columns in the dataset
lottery.shape
```

```
(3665, 11)
```

In [6]:
# Display first and last three rows
display(lottery.head(3))
display(lottery.tail(3))

Unnamed: 0,PRODUCT,DRAW NUMBER,SEQUENCE NUMBER,DRAW DATE,NUMBER DRAWN 1,NUMBER DRAWN 2,NUMBER DRAWN 3,NUMBER DRAWN 4,NUMBER DRAWN 5,NUMBER DRAWN 6,BONUS NUMBER
0,649,1,0,6/12/1982,3,11,12,14,41,43,13
1,649,2,0,6/19/1982,8,33,36,37,39,41,9
2,649,3,0,6/26/1982,1,6,23,24,27,39,34


Unnamed: 0,PRODUCT,DRAW NUMBER,SEQUENCE NUMBER,DRAW DATE,NUMBER DRAWN 1,NUMBER DRAWN 2,NUMBER DRAWN 3,NUMBER DRAWN 4,NUMBER DRAWN 5,NUMBER DRAWN 6,BONUS NUMBER
3662,649,3589,0,6/13/2018,6,22,24,31,32,34,16
3663,649,3590,0,6/16/2018,2,15,21,31,38,49,8
3664,649,3591,0,6/20/2018,14,24,31,35,37,48,17


The dataset consists of 3,665 records, each representing a single drawing of the Canada 6/49 lottery. The data spans from 1982 to 2018. Each record contains information on the six numbers that were drawn, which are listed in six columns:

- `NUMBER DRAWN 1`
- `NUMBER DRAWN 2`
- `NUMBER DRAWN 3`
- `NUMBER DRAWN 4`
- `NUMBER DRAWN 5`
- `NUMBER DRAWN 6`

## Function for Historical Data Check

We will now create a function that allows users to check whether their ticket has ever won in the Canada lottery historical data. The engineering team has advised us to keep the following details in mind:

- Inside the app, the user inputs six different numbers from 1 to 49.
- Under the hood, the six numbers will come as a Python list and serve as an input to our function.
- The engineering team wants us to write a function that prints:
    - the number of times the combination selected occurred in the Canada data set; and
    - the probability of winning the big prize in the next drawing with that combination.
    
For the first task, we'll create a function called `extract_numbers()` that takes a row of the lottery dataframe as input and returns a set containing all six winning numbers for that row. For example, if the function is called with the first row of the lottery dataframe, it should return the set `{3, 11, 12, 14, 41, 43}`, which contains the six winning numbers of that drawing.

To extract the winning numbers from the entire lottery dataframe, we can pass `extract_numbers()` to the `DataFrame.apply()` method with `axis=1`. This applies the function to each row and returns a pandas Series that holds one set of winning numbers per drawing.
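As a minimal, self-contained sketch, assuming pandas is installed, the code below builds a three-row DataFrame from the first drawings shown in the table above instead of reading `649.csv`, then applies a row function with `axis=1`. The names `DRAW_COLUMNS` and `row_to_set()` are illustrative, and selecting the draw columns by name assumes the labels match the data set exactly.

```python
import pandas as pd

# First three drawings from the Canada 6/49 data set (other columns omitted)
sample = pd.DataFrame({
    'NUMBER DRAWN 1': [3, 8, 1],
    'NUMBER DRAWN 2': [11, 33, 6],
    'NUMBER DRAWN 3': [12, 36, 23],
    'NUMBER DRAWN 4': [14, 37, 24],
    'NUMBER DRAWN 5': [41, 39, 27],
    'NUMBER DRAWN 6': [43, 41, 39],
})

DRAW_COLUMNS = [f'NUMBER DRAWN {i}' for i in range(1, 7)]

def row_to_set(row):
    # Collect the six drawn numbers of one drawing, ignoring their order
    return {int(x) for x in row[DRAW_COLUMNS]}

# axis=1 passes each row as a Series; the result is a Series of sets
winning_sets = sample.apply(row_to_set, axis=1)
print(sorted(winning_sets.iloc[0]))  # [3, 11, 12, 14, 41, 43]
```

Because each set ignores order, two drawings with the same numbers in a different order compare as equal, which is exactly what the historical check needs.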

In [7]:
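```python
# Columns holding the six numbers of each drawing (the bonus number is excluded)
DRAW_COLUMNS = [f'NUMBER DRAWN {i}' for i in range(1, 7)]

def extract_numbers(row):
    return set(row[DRAW_COLUMNS])
```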

For the second task, we will write a function called `check_historical_occurrence()`. It takes two arguments: a list of numbers entered by the user, and a pandas Series of sets holding the winning numbers of each drawing, obtained with `extract_numbers()`.
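If the app had to answer many lookups against the same history, one alternative (not the approach used in this project) would be to count every historical combination once and then look each ticket up directly. The sketch assumes the drawings are available as plain Python iterables; the helper names are hypothetical, and `frozenset` is used because ordinary sets are not hashable and cannot serve as dictionary keys.

```python
from collections import Counter

def build_occurrence_index(draws):
    """Count how often each 6-number combination was drawn, ignoring order."""
    return Counter(frozenset(draw) for draw in draws)

def count_occurrences(occurrence_index, user_numbers):
    # A Counter returns 0 for combinations that never occurred
    return occurrence_index[frozenset(user_numbers)]

# Illustrative history: the first two drawings in the data set
history = [(3, 11, 12, 14, 41, 43), (8, 33, 36, 37, 39, 41)]
occurrence_index = build_occurrence_index(history)
print(count_occurrences(occurrence_index, [3, 41, 11, 12, 43, 14]))  # 1
print(count_occurrences(occurrence_index, [3, 4, 11, 12, 43, 14]))   # 0
```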

In [8]:
```python
def check_historical_occurrence(user_numbers, historical_numbers):
    """
    Report how often a combination appeared in past drawings and the
    probability of winning the big prize with it in the next drawing.

    Parameters:
    user_numbers (list): six different numbers from 1 to 49
    historical_numbers (pandas.Series): one set of winning numbers per drawing

    Returns:
    str: a message for the user, or the validation errors from check_numbers()
    """
    # Validate the input with the same rules as one_ticket_probability()
    valid_numbers = check_numbers(user_numbers)
    if isinstance(valid_numbers, str):
        return valid_numbers

    # Compare the user's set with every historical set (gives a Series of bool)
    check_occurrence = historical_numbers == set(valid_numbers)
    # True counts as 1, so the sum is the number of matching drawings
    n_occurrence = int(check_occurrence.sum())

    if n_occurrence == 0:
        history = f'The combination {valid_numbers} has never occurred in the Canada 6/49 data set.'
    else:
        history = f'The combination {valid_numbers} has occurred {n_occurrence} time(s) in the Canada 6/49 data set.'

    # Past results do not change the odds: every drawing is independent
    n_combinations = combinations(49, 6)
    probability_percent = 100 / n_combinations
    return (f'{history} Your chance of winning the big prize in the next drawing '
            f'with this combination is {probability_percent:.7f}% (1 in {int(n_combinations):,}).')
```

Let's evaluate `check_historical_occurrence()` with a few example user inputs.

In [9]:
```python
# Extract the winning numbers with extract_numbers()
winning_numbers = lottery.apply(extract_numbers, axis=1)

tests = [[3, 41, 11, 12, 43],         # fewer than 6 numbers
         [3, 41, 11, 12, 43, 14],     # combination that occurred
         [3, 4, 11, 12, 43, 14]]      # absent combination

for test in tests:
    print(check_historical_occurrence(test, winning_numbers))
    print('-' * 60, '\n')
```

```
List should contain exactly 6 numbers
------------------------------------------------------------

The combination [3, 41, 11, 12, 43, 14] has occurred 1 time(s) in the Canada 6/49 data set. Your chance of winning the big prize in the next drawing with this combination is 0.0000072% (1 in 13,983,816).
------------------------------------------------------------

The combination [3, 4, 11, 12, 43, 14] has never occurred in the Canada 6/49 data set. Your chance of winning the big prize in the next drawing with this combination is 0.0000072% (1 in 13,983,816).
------------------------------------------------------------
```



## Multi-ticket Probability

So far, we wrote two functions:

- `one_ticket_probability()` — calculates the probability of winning the big prize with a single ticket
- `check_historical_occurrence()` — checks whether a certain combination has occurred in the Canada lottery data set

Lottery addicts usually play more than one ticket in a single drawing, thinking that this might increase their chances of winning significantly. To help them better estimate those chances, we'll now write a function that lets users calculate the probability of winning for any number of different tickets.

We've talked with the engineering team, and they gave us the following information:

- The user will input the number of *different* tickets they want to play (without inputting the specific combinations they intend to play).
- Our function will receive an integer between 1 and 13,983,816 (the maximum number of different tickets).
- The function should print information about the probability of winning the big prize depending on the number of different tickets played.
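Because the tickets are all different, each one covers a different combination, so at most one of them can win and their probabilities simply add. For $n$ different tickets:

$$
P(\text{win with } n \text{ tickets}) = \frac{n}{\binom{49}{6}} = \frac{n}{13{,}983{,}816}, \qquad 1 \le n \le 13{,}983{,}816
$$

For example, $n = 6{,}991{,}908$ gives a probability of exactly $1/2$. If $n$ different tickets are played in every drawing, the drawings are independent, so the number of drawings until the first win follows a geometric distribution with mean $1/P = 13{,}983{,}816 / n$.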

We'll write a function named `multi_ticket_probability()` that prints the probability of winning the big prize depending on the number of different tickets played.
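Inverting the formula $P = n / \binom{49}{6}$ tells a user how many different tickets a target probability would require. The name `tickets_needed()` is hypothetical, and the sketch assumes the target is passed as a number or decimal string between 0 and 1; `Fraction` keeps the arithmetic exact before rounding up.

```python
import math
from fractions import Fraction

TOTAL_COMBINATIONS = math.comb(49, 6)  # 13,983,816 different 6/49 tickets

def tickets_needed(target_probability):
    """Smallest number of different tickets whose winning probability reaches the target."""
    target = Fraction(target_probability)
    if not 0 < target <= 1:
        raise ValueError("target_probability must be in (0, 1]")
    # n / TOTAL_COMBINATIONS >= target  <=>  n >= target * TOTAL_COMBINATIONS
    return math.ceil(target * TOTAL_COMBINATIONS)

for target in ("0.01", "0.1", "0.5", "1"):
    print(target, f"{tickets_needed(target):,}")
# 0.01 139,839
# 0.1 1,398,382
# 0.5 6,991,908
# 1 13,983,816
```

These values match the 1/100, 1/10, 1/2, and maximum inputs used in the test cell of this section.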

In [32]:
```python
def multi_ticket_probability(n_tickets):
    """
    Describe the probability of winning the big prize in the 6/49 lottery
    when playing n_tickets different tickets in a single drawing.

    Args:
    - n_tickets: int. Number of different tickets played by the user.

    Returns:
    - A string that informs the user about their chances of winning, or an error
      message if n_tickets is not a whole number between 1 and 13,983,816.
    """
    n_combinations = combinations(49, 6)

    # Validate the input before computing anything
    if not isinstance(n_tickets, int) or not 1 <= n_tickets <= n_combinations:
        return f'You should input a whole number between 1 and {int(n_combinations):,} inclusive.'

    if n_tickets == n_combinations:
        return (f'Playing all {n_tickets:,} possible tickets guarantees that one of them wins the big prize. '
                'But keep in mind that other players may have the same numbers, so the prize may be shared.')

    # Different tickets cover different combinations, so their probabilities add up
    probability = n_tickets / n_combinations
    probability_percent = probability * 100
    ticket_word = 'ticket' if n_tickets == 1 else 'tickets'

    # Show more decimals for tiny probabilities so they do not round to 0
    if probability < 0.00001:
        decimals = 7
    elif probability < 0.01:
        decimals = 5
    else:
        decimals = 2

    # With probability p per drawing, the expected number of drawings until the first win is 1/p
    expected_drawings = n_combinations / n_tickets
    return (f'Your chance of winning the big prize with {n_tickets:,} {ticket_word} is '
            f'{probability_percent:.{decimals}f}%. Playing {n_tickets:,} {ticket_word} in every drawing, '
            f'you would need about {expected_drawings:,.0f} drawings on average to win the big prize once.')
```

Let's put `multi_ticket_probability()` to the test with the inputs below: an invalid value (0), a single ticket, 10,000 tickets, and several fractions of the 13,983,816 possible tickets. This shows how the function describes the probability of winning the big prize as the number of tickets grows.

In [38]:
```python
tests = [0,             # invalid input
         1,
         10000,
         139839,        # 1/100 of the maximum number of different tickets
         1398382,       # 1/10 of the maximum number of different tickets
         1666667,       # a number of tickets that costs as much as the big prize
         6991908,       # 1/2 of the maximum number of different tickets
         13983816]      # the maximum number of different tickets

for test in tests:
    print(multi_ticket_probability(test))
    print('-' * 60, '\n')
```

```
You should input a whole number between 1 and 13,983,816 inclusive.
------------------------------------------------------------

Your chance of winning the big prize with 1 ticket is 0.0000072%. Playing 1 ticket in every drawing, you would need about 13,983,816 drawings on average to win the big prize once.
------------------------------------------------------------

Your chance of winning the big prize with 10,000 tickets is 0.07151%. Playing 10,000 tickets in every drawing, you would need about 1,398 drawings on average to win the big prize once.
------------------------------------------------------------

Your chance of winning the big prize with 139,839 tickets is 1.00%. Playing 139,839 tickets in every drawing, you would need about 100 drawings on average to win the big prize once.
------------------------------------------------------------

Your chance of winning the big prize with 1,398,382 tickets is 10.00%. Playing 1,398,382 tickets in every drawing, you would need about 10 drawings on average to win the big prize once.
------------------------------------------------------------

Your chance of winning the big prize with 1,666,667 tickets is 11.92%. Playing 1,666,667 tickets in every drawing, you would need about 8 drawings on average to win the big prize once.
------------------------------------------------------------

Your chance of winning the big prize with 6,991,908 tickets is 50.00%. Playing 6,991,908 tickets in every drawing, you would need about 2 drawings on average to win the big prize once.
------------------------------------------------------------

Playing all 13,983,816 possible tickets guarantees that one of them wins the big prize. But keep in mind that other players may have the same numbers, so the prize may be shared.
------------------------------------------------------------
```

## Fewer Winning Numbers: Function

