# Mobile App for Lottery Addiction

---

## 1. Introduction

In this project, we are part of a team that is developing a mobile application, which **calculates the probability of winning the 6/49 lottery** to deter addicts from gambling. For the [6/49 lottery](https://en.wikipedia.org/wiki/Lotto_6/49), six numbers are drawn without replacement from a set of 49 numbers that range from 1 to 49. Specifically, we will build functions to determine:

- What is the probability of winning the big prize with a single ticket?
- What is the probability of winning the big prize if we play multiple different tickets?
- What is the probability of having exactly two, three, four or five winning numbers on a single ticket?
- What is the probability of having at least two, three, four or five winning numbers on a single ticket?

We are also provided with a supplementary set of [historical data](https://www.kaggle.com/datasets/datascienceai/lottery-dataset) from the national 6/49 lottery game in Canada. 

---

## 2. Open and Read the Data

Firstly, we will explore the historical lottery data from Canada to find potential use cases.

In [1]:
```python
import pandas as pd
lottery_canada = pd.read_csv('649.csv')

# Print the number of rows and columns
print(f'The dataset has {lottery_canada.shape[0]} rows and {lottery_canada.shape[1]} columns.')
```

```
The dataset has 3665 rows and 11 columns.
```


In [2]:
```python
# Print the first few rows
lottery_canada.head()
```

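|   | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 6/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 6/19/1982 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 6/26/1982 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |
| 3 | 649 | 4 | 0 | 7/3/1982 | 3 | 9 | 10 | 13 | 20 | 43 | 34 |
| 4 | 649 | 5 | 0 | 7/10/1982 | 5 | 14 | 21 | 31 | 34 | 47 | 45 |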


In [3]:
```python
# Print the last few rows
lottery_canada.tail()
```

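|   | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3660 | 649 | 3587 | 0 | 6/6/2018 | 10 | 15 | 23 | 38 | 40 | 41 | 35 |
| 3661 | 649 | 3588 | 0 | 6/9/2018 | 19 | 25 | 31 | 36 | 46 | 47 | 26 |
| 3662 | 649 | 3589 | 0 | 6/13/2018 | 6 | 22 | 24 | 31 | 32 | 34 | 16 |
| 3663 | 649 | 3590 | 0 | 6/16/2018 | 2 | 15 | 21 | 31 | 38 | 49 | 8 |
| 3664 | 649 | 3591 | 0 | 6/20/2018 | 14 | 24 | 31 | 35 | 37 | 48 | 17 |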


- For each lottery draw, the six numbers drawn are saved in the following columns:
    - `NUMBER DRAWN 1`
    - `NUMBER DRAWN 2`
    - `NUMBER DRAWN 3`
    - `NUMBER DRAWN 4`
    - `NUMBER DRAWN 5`
    - `NUMBER DRAWN 6`
- The data set consists of 3665 lottery draws, dating from 1982 to 2018.
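Because the six winning numbers always sit in the `NUMBER DRAWN 1` to `NUMBER DRAWN 6` columns, they can be read by column name rather than by position. The sketch below uses only the standard library and assumes the CSV header matches the column names shown in the preview. `SAMPLE_CSV` is a hypothetical stand-in for `649.csv` that contains two rows copied from the tables above, and `read_draws` is a hypothetical helper name.

```python
import csv
import io

# Hypothetical stand-in for 649.csv: the header plus two rows copied from the preview tables
SAMPLE_CSV = """PRODUCT,DRAW NUMBER,SEQUENCE NUMBER,DRAW DATE,NUMBER DRAWN 1,NUMBER DRAWN 2,NUMBER DRAWN 3,NUMBER DRAWN 4,NUMBER DRAWN 5,NUMBER DRAWN 6,BONUS NUMBER
649,1,0,6/12/1982,3,11,12,14,41,43,13
649,3591,0,6/20/2018,14,24,31,35,37,48,17
"""

# Column names of the six main numbers (the bonus number is left out)
DRAWN_COLUMNS = [f'NUMBER DRAWN {i}' for i in range(1, 7)]


def read_draws(file_obj):
    """Return one frozenset of the six drawn numbers for each draw in the CSV."""
    reader = csv.DictReader(file_obj)
    return [frozenset(int(row[col]) for col in DRAWN_COLUMNS) for row in reader]


# With the real file this would be: with open('649.csv', newline='') as f: draws = read_draws(f)
draws = read_draws(io.StringIO(SAMPLE_CSV))
print(len(draws))  # 2
print(sorted(draws[1]))  # [14, 24, 31, 35, 37, 48]
```

With pandas, you can pass the same list straight into the DataFrame to select the columns, for example `lottery_canada[DRAWN_COLUMNS]`. This avoids any dependence on column order.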

In addition to computing the theoretical probabilities of winning, we can use this historical data to check whether a given combination of numbers has ever won the big prize.

---

## 3. Compute Probabilities

Next, we will define two core functions that will be used frequently:

- `factorial()`: a function that calculates factorials and
- `combinations()`: a function that calculates combinations.
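For reference, the two helpers implement the standard definitions below. The last line evaluates the number of possible 6/49 tickets, which every probability in this notebook uses as its denominator:

$$
\begin{aligned}
n! &= n \times (n-1) \times \cdots \times 2 \times 1, \qquad 0! = 1 \\
\binom{n}{k} &= \frac{n!}{k!\,(n-k)!} \\
\binom{49}{6} &= \frac{49 \times 48 \times 47 \times 46 \times 45 \times 44}{6!} = \frac{10{,}068{,}347{,}520}{720} = 13{,}983{,}816
\end{aligned}
$$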

In [4]:
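```python
# Function to compute the factorial of n (n! = n * (n - 1) * ... * 1, with 0! = 1)
def factorial(n):
    product = 1
    while n > 0:
        product *= n
        n -= 1
    return product

# Function to compute the number of combinations of k objects chosen from n objects
def combinations(n, k):
    # Floor division keeps the result an exact integer; each division here is exact
    return factorial(n) // factorial(n - k) // factorial(k)
```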
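Since Python 3.8, the standard library's `math.comb` computes the same quantity with exact integer arithmetic. This makes it a convenient cross-check for a hand-written helper. The sketch below defines `combinations_exact`, a hypothetical integer-only variant of `combinations()`, so that it runs on its own, and compares it with `math.comb`.

```python
import math


def combinations_exact(n, k):
    """Integer-only n choose k built from math.factorial; the floor division is exact."""
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))


# Cross-check against the standard library for every k from 0 to 6
for k in range(7):
    assert combinations_exact(49, k) == math.comb(49, k)

print(math.comb(49, 6))  # 13983816
```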

**a. What is the probability of winning the big prize with a single ticket?**

A player wins the big prize if the six numbers on the ticket are identical to all six numbers drawn. 

We will write the `one_ticket_probability` function to calculate the probability of winning the big prize for any given ticket.

In [5]:
```python
# Function to calculate theoretical probability of winning big prize
def one_ticket_probability():

    # Total number of possible outcomes
    n_outcomes = combinations(49, 6)

    # Probability for one ticket
    prob_big_prize = 1 / n_outcomes

    # Print the probability as a percentage
    per_big_prize = prob_big_prize * 100

    print(f'Your chances to win the big prize are {per_big_prize:.7f}%.')
    print(f'This means you have a 1 in {1 / prob_big_prize:,.0f} chance to win.')

# Test function
one_ticket_probability()
```

```
Your chances to win the big prize are 0.0000072%.
This means you have a 1 in 13,983,816 chance to win.
```


The app should also let users enter their own combination of six numbers and compare it against the historical lottery data. For a given combination, we will compute and display:
- the number of times the selected combination has been drawn in the historical data set, and
- the probability of winning the big prize in the next draw with that combination.
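Here is a sketch of the lookup described above. It assumes each historical draw has already been reduced to its six main numbers. Converting every draw to a hashable `frozenset` and counting them once with `collections.Counter` turns each user query into a dictionary lookup instead of a loop over all 3665 draws. The draws listed are copied from the preview tables, and `count_wins` is a hypothetical helper name.

```python
from collections import Counter

# A few historical draws (main numbers only), copied from the preview tables
historical_draws = [
    [3, 11, 12, 14, 41, 43],
    [8, 33, 36, 37, 39, 41],
    [14, 24, 31, 35, 37, 48],
]

# Count each winning combination once; frozenset ignores the order of the numbers
win_counts = Counter(frozenset(draw) for draw in historical_draws)


def count_wins(user_numbers):
    """Return how many recorded draws matched all six user numbers."""
    return win_counts[frozenset(user_numbers)]


print(count_wins([48, 35, 37, 24, 14, 31]))  # 1
print(count_wins([1, 2, 3, 4, 5, 6]))  # 0
```

A `Counter` returns 0 for a missing key without inserting it. Repeated queries therefore leave the table unchanged.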

In [6]:
```python
# Function that returns the set of six winning numbers from a lottery draw
def extract_numbers(row):
    # Positions 4 to 9 hold the NUMBER DRAWN 1 ... NUMBER DRAWN 6 columns
    return set(row.iloc[4:10])

# Extract the set of winning numbers for every draw (a Series of sets)
winning_numbers = lottery_canada.apply(extract_numbers, axis=1)

# Function that counts past big-prize wins and reports the theoretical probability
def check_historical_occurrence(user_inputs):

    # Convert input numbers to a set so that their order does not matter
    user_numbers = set(user_inputs)

    # Count the number of draws whose winning numbers equal the input numbers
    historical_occurrence = sum(user_numbers == numbers for numbers in winning_numbers)
    print(f'Your numbers have won the big prize {historical_occurrence} time(s) in the history of Canada lottery.')

    # Past draws do not change the theoretical probability for the next draw
    n_outcomes = combinations(49, 6)
    prob_big_prize = 1 / n_outcomes
    per_big_prize = prob_big_prize * 100

    print(f'Your chances to win the big prize in the next draw are {per_big_prize:.7f}%.')
    print(f'This means you have a 1 in {1 / prob_big_prize:,.0f} chance to win.')

# Test function
test_inputs = [48, 35, 37, 24, 14, 31]
check_historical_occurrence(test_inputs)
```

```
Your numbers have won the big prize 1 time(s) in the history of Canada lottery.
Your chances to win the big prize in the next draw are 0.0000072%.
This means you have a 1 in 13,983,816 chance to win.
```
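The historical check trusts its input. A list with repeated numbers or values outside 1 to 49 simply never matches a draw and produces no error. Since the numbers come from app users, it seems worth validating them before the lookup. The following is a minimal sketch; `validate_ticket` and its error messages are hypothetical and not part of the original project.

```python
def validate_ticket(numbers):
    """Return the ticket as a set, or raise ValueError if it is not a valid 6/49 ticket."""
    if len(numbers) != 6:
        raise ValueError('A ticket needs exactly six numbers.')
    if any(not isinstance(n, int) or not 1 <= n <= 49 for n in numbers):
        raise ValueError('Every number must be an integer from 1 to 49.')
    ticket = set(numbers)
    if len(ticket) != 6:
        raise ValueError('The six numbers must all be different.')
    return ticket


print(validate_ticket([48, 35, 37, 24, 14, 31]) == {14, 24, 31, 35, 37, 48})  # True
try:
    validate_ticket([1, 1, 2, 3, 4, 5])
except ValueError as err:
    print(err)  # The six numbers must all be different.
```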


**b. What is the probability of winning the big prize if we play multiple different tickets?**

Gambling addicts may also buy multiple tickets for a single lottery draw, in the belief that this will significantly increase their chances of winning.

To help them estimate their chances more realistically, let's write a function that takes the number of different tickets a user wants to play (minimum of 1, maximum of 13,983,816) and prints the probability of winning the big prize. The function does not need the specific combinations on each ticket, only that the tickets are all different.
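The probabilities simply add up because each draw has exactly one winning combination. For distinct tickets, the events "this ticket wins" are therefore mutually exclusive. With $x$ different tickets:

$$
P(\text{big prize with } x \text{ tickets}) = \sum_{i=1}^{x} \frac{1}{\binom{49}{6}} = \frac{x}{13{,}983{,}816}, \qquad 1 \le x \le 13{,}983{,}816.
$$

For example, $x = 6{,}991{,}908$ (half of all combinations) gives a probability of exactly $1/2$. Buying all $13{,}983{,}816$ combinations guarantees holding the winning one.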

In [7]:
```python
# Function to compute winning probability with x different tickets
def multi_ticket_probability(x):

    # Total number of possible outcomes
    n_outcomes = combinations(49, 6)

    # Reject inputs outside the supported range (1 to 13,983,816 tickets)
    if not 1 <= x <= n_outcomes:
        raise ValueError(f'The number of tickets must be between 1 and {n_outcomes:,.0f}.')

    # Calculate the probability for the number of tickets entered
    prob_big_prize = x / n_outcomes

    # Print the probability as a percentage
    per_big_prize = prob_big_prize * 100

    print(f'Your chances to win the big prize with {x} different ticket(s) are {per_big_prize:.7f}%.')
    print(f'This means you have a 1 in {1 / prob_big_prize:,.0f} chance to win.')
    print('-------------------------------------------------------------------------------------')

# Test function
test_inputs = [1, 10, 100, 10000, 1000000, 6991908, 13983816]
for test_input in test_inputs:
    multi_ticket_probability(test_input)
```

```
Your chances to win the big prize with 1 different ticket(s) are 0.0000072%.
This means you have a 1 in 13,983,816 chance to win.
-------------------------------------------------------------------------------------
Your chances to win the big prize with 10 different ticket(s) are 0.0000715%.
This means you have a 1 in 1,398,382 chance to win.
-------------------------------------------------------------------------------------
Your chances to win the big prize with 100 different ticket(s) are 0.0007151%.
This means you have a 1 in 139,838 chance to win.
-------------------------------------------------------------------------------------
Your chances to win the big prize with 10000 different ticket(s) are 0.0715112%.
This means you have a 1 in 1,398 chance to win.
-------------------------------------------------------------------------------------
Your chances to win the big prize with 1000000 different ticket(s) are 7.1511238%.
This means you have a 1 in 14 chance to win.
----------
```
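Floating-point percentages hide the fact that these probabilities are exact ratios. The sketch below reproduces the multi-ticket calculation exactly with `fractions.Fraction`, including the two largest test inputs. `exact_multi_ticket_probability` is a hypothetical name, and `math.comb` stands in for the notebook's `combinations()` so the snippet runs on its own.

```python
import math
from fractions import Fraction

# Number of possible 6/49 tickets (13,983,816)
N_OUTCOMES = math.comb(49, 6)


def exact_multi_ticket_probability(x):
    """Exact probability of winning the big prize with x distinct tickets."""
    if not 1 <= x <= N_OUTCOMES:
        raise ValueError(f'x must be between 1 and {N_OUTCOMES:,}.')
    return Fraction(x, N_OUTCOMES)


print(exact_multi_ticket_probability(6991908))   # 1/2
print(exact_multi_ticket_probability(13983816))  # 1
```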

**c. What is the probability of having exactly two, three, four or five winning numbers on a single ticket?**

In most 6/49 lotteries, there are smaller prizes if a player's ticket matches two, three, four, or five of the six numbers drawn. We will therefore also calculate the probability of having exactly two, three, four, or five winning numbers on a single ticket.
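Here is the counting argument behind this calculation. A ticket with exactly $y$ winning numbers takes $y$ of the 6 drawn numbers, and its other $6 - y$ numbers come from the $49 - 6 = 43$ numbers that were not drawn. With $Y$ the number of winning numbers on a single ticket, this is a hypergeometric probability:

$$
P(Y = y) = \frac{\binom{6}{y}\binom{43}{6 - y}}{\binom{49}{6}}, \qquad y = 0, 1, \dots, 6.
$$

For example, $y = 5$ gives $\binom{6}{5}\binom{43}{1} = 6 \times 43 = 258$ tickets out of $13{,}983{,}816$.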

In [8]:
```python
# Function to print the probability of exactly y winning numbers on a single ticket
def probability_less_6(y):

    # Number of ways to choose which y of the six drawn numbers appear on the ticket
    n_user_combinations = combinations(6, y)

    # Number of ways to choose the other 6 - y ticket numbers from the 43 numbers not drawn
    non_matching_combinations = combinations(49 - 6, 6 - y)

    # Total number of successful outcomes
    successful_outcomes = n_user_combinations * non_matching_combinations

    # Total number of possible outcomes
    total_outcomes = combinations(49, 6)

    # Probability of y winning numbers
    prob_win = successful_outcomes / total_outcomes
    per_win = prob_win * 100

    print(f'The probability of having {y} winning numbers is {per_win:.7f}%.')
    print(f'This means you have a 1 in {1 / prob_win:,.0f} chance to win.')
    print('-----------------------------------------------------------')

# Test function
for i in range(2, 6):
    probability_less_6(i)
```

```
The probability of having 2 winning numbers is 13.2378029%.
This means you have a 1 in 8 chance to win.
-----------------------------------------------------------
The probability of having 3 winning numbers is 1.7650404%.
This means you have a 1 in 57 chance to win.
-----------------------------------------------------------
The probability of having 4 winning numbers is 0.0968620%.
This means you have a 1 in 1,032 chance to win.
-----------------------------------------------------------
The probability of having 5 winning numbers is 0.0018450%.
This means you have a 1 in 54,201 chance to win.
-----------------------------------------------------------
```
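One way to sanity-check the formula is to use the fact that the cases of 0 to 6 matching numbers cover every possible ticket, so their counts must add up to C(49, 6). The quick sketch below uses `math.comb` in place of the notebook's `combinations()` so that it runs independently. `matching_tickets` is a hypothetical name.

```python
import math


def matching_tickets(y):
    """Number of tickets sharing exactly y numbers with the six drawn numbers."""
    return math.comb(6, y) * math.comb(43, 6 - y)


counts = {y: matching_tickets(y) for y in range(7)}
print(counts)
print(sum(counts.values()) == math.comb(49, 6))  # True
```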


**d. What is the probability of having at least two, three, four or five winning numbers on a single ticket?**

Lastly, we will build on the `probability_less_6()` function to determine probabilities of having at least two, three, four or five winning numbers.
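Let $Y$ be the number of winning numbers on a single ticket. A ticket has exactly one value of $Y$, so the events "exactly $i$ winning numbers" are mutually exclusive. The probability of at least $y$ winning numbers is therefore the sum of the exact probabilities from $y$ up to 6, or equivalently one minus the probabilities below $y$:

$$
P(Y \ge y) = \sum_{i=y}^{6} P(Y = i) = 1 - \sum_{i=0}^{y-1} P(Y = i).
$$

For example, 258 tickets match exactly five numbers and 1 ticket matches all six, so $P(Y \ge 5) = (258 + 1) / 13{,}983{,}816$.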

In [9]:
```python
# Redefine probability_less_6() so that it returns the probability instead of printing it
def probability_less_6(y):

    # Number of ways to choose which y of the six drawn numbers appear on the ticket
    n_user_combinations = combinations(6, y)

    # Number of ways to choose the other 6 - y ticket numbers from the 43 numbers not drawn
    non_matching_combinations = combinations(49 - 6, 6 - y)

    # Total number of successful outcomes
    successful_outcomes = n_user_combinations * non_matching_combinations

    # Total number of possible outcomes
    total_outcomes = combinations(49, 6)

    # Probability of exactly y winning numbers
    return successful_outcomes / total_outcomes

# Function to print the probability of at least y winning numbers on a single ticket
def probability_at_least(y):

    # Add the probabilities of exactly y, y + 1, ..., 6 winning numbers
    prob_win = 0
    for i in range(y, 7):
        prob_win += probability_less_6(i)
    per_win = prob_win * 100

    print(f'The probability of having at least {y} winning numbers is {per_win:.7f}%.')
    print(f'This means you have a 1 in {1 / prob_win:,.0f} chance to win.')
    print('-----------------------------------------------------------')

# Test function
for i in range(2, 6):
    probability_at_least(i)
```

```
The probability of having at least 2 winning numbers is 15.1015574%.
This means you have a 1 in 7 chance to win.
-----------------------------------------------------------
The probability of having at least 3 winning numbers is 1.8637545%.
This means you have a 1 in 54 chance to win.
-----------------------------------------------------------
The probability of having at least 4 winning numbers is 0.0987141%.
This means you have a 1 in 1,013 chance to win.
-----------------------------------------------------------
The probability of having at least 5 winning numbers is 0.0018521%.
This means you have a 1 in 53,992 chance to win.
-----------------------------------------------------------
```
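The sketch below cross-checks the "at least" values exactly with `fractions.Fraction`. It sums the upper tail and also confirms that the complement rule gives the same result. `exact_probability` and `exact_at_least` are hypothetical helpers, and `math.comb` again stands in for `combinations()`.

```python
import math
from fractions import Fraction

# Number of possible 6/49 tickets
TOTAL = math.comb(49, 6)


def exact_probability(y):
    """Exact probability of exactly y winning numbers on one ticket."""
    return Fraction(math.comb(6, y) * math.comb(43, 6 - y), TOTAL)


def exact_at_least(y):
    """Exact probability of at least y winning numbers (sum of the upper tail)."""
    return sum(exact_probability(i) for i in range(y, 7))


for y in range(2, 6):
    upper_tail = exact_at_least(y)
    # Complement rule: one minus the probabilities of 0, 1, ..., y - 1 winning numbers
    complement = 1 - sum(exact_probability(i) for i in range(y))
    assert upper_tail == complement
    print(y, f'{float(upper_tail) * 100:.7f}%')
```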


---

## 4. Conclusion

In summary, we wrote five main functions for the app:

- `one_ticket_probability()`: calculates the probability of winning the big prize with a single ticket.
- `check_historical_occurrence()`: counts how many times a given combination has won the big prize in the Canadian lottery data set and reports the probability of winning with it in the next draw.
- `multi_ticket_probability()`: calculates the probability of winning the big prize with any number of different tickets between 1 and 13,983,816.
- `probability_less_6()`: calculates the probability of having exactly two, three, four or five winning numbers.
- `probability_at_least()`: calculates the probability of having at least two, three, four or five winning numbers.