# Mobile App for Lottery Addiction

## 1) Introduction

Many people start playing the lottery for fun, but for some this activity turns into a habit that eventually escalates into addiction. Like other compulsive gamblers, lottery addicts soon begin spending their savings and taking out loans, start to accumulate debt, and eventually engage in desperate behaviors like theft.

A medical institute that aims to prevent and treat gambling addictions wants to build a dedicated mobile app to help lottery addicts better estimate their chances of winning. The institute has a team of engineers that will build the app, but they need us to create the logical core of the app and calculate probabilities.

For the first version of the app, they want us to focus on the [6/49 lottery](https://en.wikipedia.org/wiki/Lotto_6/49) and build functions that enable users to answer questions like:

- What is the probability of winning the big prize with a single ticket?
- What is the probability of winning the big prize if we play 40 different tickets (or any other number)?
- What is the probability of having at least five (or four, or three, or two) winning numbers on a single ticket?

The institute also wants us to consider historical data coming from the national 6/49 lottery game in Canada. The [data set](https://www.kaggle.com/datasets/datascienceai/lottery-dataset) has data for 3,665 drawings, dating from 1982 to 2018 (we'll come back to this).

The scenario we follow throughout this project is fictional. Its main purpose is to practice the concepts we have learned in a setting that simulates a real-world scenario.


## 2) Creating the functions

We start by importing the packages we need. Then we define two helper functions, `factorial` and `combinations`, that the rest of the project builds on:

```python
# Import needed packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
```

```python
# Calculate the factorial of a non-negative integer n
def factorial(n):
    result = 1
    for x in range(n, 0, -1):
        result *= x
    return result

# Calculate the number of combinations of k items chosen from a population of n, without replacement
def combinations(n, k):
    # Integer division keeps the result exact; float division can lose precision for large n
    return factorial(n) // (factorial(n - k) * factorial(k))
```
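This is a cross-check that assumes Python 3.8 or later. The standard library's `math.comb` computes the same binomial coefficient directly, so it can validate the hand-written `combinations` function. The helper `combinations_from_factorials` is hypothetical. It only restates the factorial formula.

```python
import math

# Hypothetical helper: n! / (k! * (n - k)!) with integer division, so the result stays exact
def combinations_from_factorials(n, k):
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

# math.comb (Python 3.8+) computes the same binomial coefficient without building huge factorials
for n, k in [(49, 6), (6, 2), (43, 4)]:
    assert combinations_from_factorials(n, k) == math.comb(n, k)

print(math.comb(49, 6))  # 13983816
```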

### 2.1) Probability of winning the big prize

For the first version of the app, we want players to be able to calculate the probability of winning the big prize with the various numbers they play on a single ticket (for each ticket a player chooses six numbers out of 49). So, we'll start by building a function that calculates the probability of winning the big prize for any given ticket.
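A draw picks one of C(49, 6) equally likely combinations, and only one of them matches a given ticket. The probability is therefore 1 / C(49, 6), whatever numbers the ticket contains. The sketch below writes this with exact fractions from the standard library, as an alternative to the float percentages used in this notebook. The variable names are illustrative.

```python
from fractions import Fraction
from math import comb

# Every 6-number ticket is one of C(49, 6) equally likely outcomes of the draw
total_outcomes = comb(49, 6)

# Exactly one outcome matches the ticket, whatever numbers it contains
p_big_prize = Fraction(1, total_outcomes)

print(total_outcomes)  # 13983816
print(p_big_prize)     # 1/13983816
print("{:.6f}%".format(float(p_big_prize) * 100))  # 0.000007%
```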

```python
# Take a list of 6 numbers as input and return the probability of winning the big prize.
# Every 6-number ticket has the same chance, so the chosen numbers do not change the result.
def one_ticket_probability(user_numbers):
    total_combinations = combinations(49, 6)
    probability = 1 / total_combinations * 100
    return "The probability of winning the big prize with these numbers is " + "{:.6f}".format(probability) + "% - one in " + str(total_combinations)

# Test the function
test_list = [1, 2, 3, 4, 5, 6]
one_ticket_probability(test_list)
```

```
'The probability of winning the big prize with these numbers is 0.000007% - one in 13983816'
```

### 2.2) Probability the ticket has already won

Above, we wrote a function that tells users the probability of winning the big prize with a single ticket. For the first version of the app, however, users should also be able to compare their ticket against the historical lottery data in Canada and determine whether they would have won by now.
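Having won in the past does not change a ticket's chance in future draws. For a sense of scale, assume each draw is independent and the player kept the same ticket for all 3,665 drawings in the data set. The chance that this fixed ticket would have hit the big prize at least once is then 1 - (1 - p)^n. The sketch below computes it under those assumptions; the names are hypothetical.

```python
from fractions import Fraction
from math import comb

# Probability of winning the big prize with one ticket in a single draw
p = Fraction(1, comb(49, 6))

# Assumption: 3,665 independent draws, always playing the same ticket
n_draws = 3665

# Complement rule: 1 - P(losing every one of the n draws)
p_at_least_once = 1 - (1 - p) ** n_draws

print("{:.4f}%".format(float(p_at_least_once) * 100))  # about 0.0262%
```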

We start by reading the historical data and doing a first inspection:

```python
six49 = pd.read_csv("649.csv", parse_dates=["DRAW DATE"])

display("Shape of the dataset:", six49.shape)
display("Types of data:", six49.dtypes)
display("Null values per column:", six49.isnull().sum())
display(six49.head(3))
display(six49.tail(3))
```

```
'Shape of the dataset:'

(3665, 11)

'Types of data:'

PRODUCT                     int64
DRAW NUMBER                 int64
SEQUENCE NUMBER             int64
DRAW DATE          datetime64[ns]
NUMBER DRAWN 1              int64
NUMBER DRAWN 2              int64
NUMBER DRAWN 3              int64
NUMBER DRAWN 4              int64
NUMBER DRAWN 5              int64
NUMBER DRAWN 6              int64
BONUS NUMBER                int64
dtype: object

'Null values per column:'

PRODUCT            0
DRAW NUMBER        0
SEQUENCE NUMBER    0
DRAW DATE          0
NUMBER DRAWN 1     0
NUMBER DRAWN 2     0
NUMBER DRAWN 3     0
NUMBER DRAWN 4     0
NUMBER DRAWN 5     0
NUMBER DRAWN 6     0
BONUS NUMBER       0
dtype: int64
```

|  | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 1982-06-12 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 1982-06-19 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 1982-06-26 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |

|  | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3662 | 649 | 3589 | 0 | 2018-06-13 | 6 | 22 | 24 | 31 | 32 | 34 | 16 |
| 3663 | 649 | 3590 | 0 | 2018-06-16 | 2 | 15 | 21 | 31 | 38 | 49 | 8 |
| 3664 | 649 | 3591 | 0 | 2018-06-20 | 14 | 24 | 31 | 35 | 37 | 48 | 17 |


Then, we will define two functions:

1. A function that gets all the winning numbers from the historical data.
2. A function that tells the user how many times a particular combination of numbers has already won the lottery, and the probability of winning the lottery again with those numbers.
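The sketch below covers both steps under slightly different assumptions. It selects the winning-number columns by name (`NUMBER DRAWN 1` to `NUMBER DRAWN 6`, as in `649.csv`) rather than by position. It also counts every historical combination once with `collections.Counter`, so later lookups are constant-time. The two-row `draws` frame is a hypothetical excerpt built from the first two drawings shown above, not the full data set.

```python
from collections import Counter

import pandas as pd

# Hypothetical excerpt: the first two drawings of the data set, winning-number columns only
number_columns = [f"NUMBER DRAWN {i}" for i in range(1, 7)]
draws = pd.DataFrame(
    [[3, 11, 12, 14, 41, 43], [8, 33, 36, 37, 39, 41]],
    columns=number_columns,
)

# Step 1: one frozenset of six numbers per draw (frozensets are hashable, so they can be counted)
winning_sets = [frozenset(row) for row in draws[number_columns].itertuples(index=False)]

# Step 2: count how many times each combination was drawn
occurrences = Counter(winning_sets)

def times_won(user_numbers):
    # Order does not matter: the user's numbers are compared as a set
    return occurrences[frozenset(user_numbers)]

print(times_won([43, 41, 14, 12, 11, 3]))  # 1
print(times_won([1, 2, 3, 4, 5, 6]))       # 0
```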

```python
# Take a row of the lottery dataframe and return a set containing its six winning numbers
def extract_numbers(row):
    # Positions 4 to 9 hold the columns NUMBER DRAWN 1 ... NUMBER DRAWN 6
    return set(row.iloc[4:10])

# Apply the function to every draw and store the results as a list of sets
winning_numbers = six49.apply(extract_numbers, axis=1).tolist()

# Display results of the function
display(six49.head())
display("First elements of the winning_numbers list:", winning_numbers[:5])
```

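|  | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 1982-06-12 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 1982-06-19 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 1982-06-26 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |
| 3 | 649 | 4 | 0 | 1982-07-03 | 3 | 9 | 10 | 13 | 20 | 43 | 34 |
| 4 | 649 | 5 | 0 | 1982-07-10 | 5 | 14 | 21 | 31 | 34 | 47 | 45 |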


```
'First elements of the winning_numbers list:'

[{3, 11, 12, 14, 41, 43},
 {8, 33, 36, 37, 39, 41},
 {1, 6, 23, 24, 27, 39},
 {3, 9, 10, 13, 20, 43},
 {5, 14, 21, 31, 34, 47}]
```

```python
# Return how many times the user's combination won in the past, plus the probability of winning the big prize
def check_historical_occurrence(user_list, winning_list):
    # Sets ignore order, so the user's numbers match a draw regardless of how they were entered
    win_times = winning_list.count(set(user_list))
    return "The combination " + str(user_list) + " has already won the lottery " + str(win_times) + " times.\n" + one_ticket_probability(user_list)

# Test the function with a combination that occurred and another that did not
test_occurred = set([3, 11, 12, 14, 41, 43])
test_not_occurred = set([1, 2, 3, 4, 5, 6])

print(check_historical_occurrence(test_occurred, winning_numbers))
print(check_historical_occurrence(test_not_occurred, winning_numbers))
```

```
The combination {3, 41, 11, 12, 43, 14} has already won the lottery 1 times.
The probability of winning the big prize with these numbers is 0.000007% - one in 13983816
The combination {1, 2, 3, 4, 5, 6} has already won the lottery 0 times.
The probability of winning the big prize with these numbers is 0.000007% - one in 13983816
```


### 2.3) Probability of winning with more than one ticket

Lottery addicts usually play more than one ticket on a single drawing, thinking that this might increase their chances of winning significantly. Our purpose is to help them better estimate their chances of winning, so we are going to write a function that will allow the users to calculate the chances of winning for any number of different tickets.
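All the tickets are different and only one combination is drawn, so their winning events cannot overlap. That means n tickets win with probability n / C(49, 6). Turning the formula around gives the number of distinct tickets needed to reach a target probability. The sketch below shows both; `tickets_needed` is a hypothetical helper, not part of the app's specification. Passing the target as a `Fraction` avoids float rounding at the boundary.

```python
from fractions import Fraction
from math import ceil, comb

TOTAL = comb(49, 6)  # 13,983,816 possible tickets

def multi_ticket_fraction(n_tickets):
    # n distinct tickets cover n of the TOTAL equally likely outcomes
    return Fraction(n_tickets, TOTAL)

def tickets_needed(target):
    # Smallest n with n / TOTAL >= target, for 0 < target <= 1
    return ceil(target * TOTAL)

print(multi_ticket_fraction(40))         # 5/1747977
print(tickets_needed(Fraction(1, 100)))  # 139839
print(tickets_needed(Fraction(1, 2)))    # 6991908
```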

```python
# Take a number of different tickets as input and return the probability of winning the big prize
def multi_ticket_probability(number):
    total_combinations = combinations(49, 6)
    # Only 1 to 13,983,816 distinct tickets make sense for a single draw
    if not 1 <= number <= total_combinations:
        raise ValueError("number must be between 1 and " + str(total_combinations))
    probability = number / total_combinations * 100
    output_odds = round(total_combinations / number)
    return "The probability of winning the big prize with " + str(number) + " tickets is " + "{:.6f}".format(probability) + "% - 1 in " + str(output_odds)

# Test the function
display(multi_ticket_probability(1))
display(multi_ticket_probability(10))
display(multi_ticket_probability(100))
display(multi_ticket_probability(10000))
display(multi_ticket_probability(1000000))
display(multi_ticket_probability(6991908))
display(multi_ticket_probability(13983816))
```

```
'The probability of winning the big prize with 1 tickets is 0.000007% - 1 in 13983816'

'The probability of winning the big prize with 10 tickets is 0.000072% - 1 in 1398382'

'The probability of winning the big prize with 100 tickets is 0.000715% - 1 in 139838'

'The probability of winning the big prize with 10000 tickets is 0.071511% - 1 in 1398'

'The probability of winning the big prize with 1000000 tickets is 7.151124% - 1 in 14'

'The probability of winning the big prize with 6991908 tickets is 50.000000% - 1 in 2'

'The probability of winning the big prize with 13983816 tickets is 100.000000% - 1 in 1'
```

### 2.4) Probability of getting some winning numbers

For extra context, most 6/49 lotteries award smaller prizes if a player's ticket matches two, three, four, or five of the six numbers drawn. Users might therefore want to know the probability of having exactly two, three, four, or five winning numbers.
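The counting behind these smaller prizes is the hypergeometric distribution. To match exactly k numbers, k of the drawn numbers must come from the player's 6, and the other 6 - k from the 43 numbers the player did not pick. That gives `C(6, k) * C(43, 6 - k) / C(49, 6)`. The sketch below uses a hypothetical function name and also checks that the probabilities for k = 0 to 6 add up to 1.

```python
from fractions import Fraction
from math import comb

def exactly_k_matches(k):
    # Choose k winners from the player's 6 numbers and 6 - k from the other 43
    return Fraction(comb(6, k) * comb(43, 6 - k), comb(49, 6))

# Every ticket matches some number of drawn numbers between 0 and 6, so the total is exactly 1
assert sum(exactly_k_matches(k) for k in range(7)) == 1

for k in range(7):
    print(k, "{:.6f}%".format(float(exactly_k_matches(k)) * 100))
```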

```python
# Take a list of numbers and a count N (2 to 5) and return the probability that exactly N of them are winning numbers
def probability_less_6(user_list, number):
    # Ways to choose the N winning numbers from the user's 6 numbers
    user_combinations = combinations(6, number)

    # Ways to choose the remaining 6 - N drawn numbers from the 43 numbers the user did not pick
    other_numbers_combinations = combinations(49 - 6, 6 - number)

    # Total combinations that give the user exactly N winning numbers
    winning_combinations = user_combinations * other_numbers_combinations

    # Total possible combinations
    total_combinations = combinations(49, 6)

    # Percentage of getting exactly N winning numbers
    probability = winning_combinations / total_combinations * 100

    # Output in odds
    output_odds = round(total_combinations / winning_combinations)
    return "The combination " + str(user_list) + " has a " + "{:.3f}".format(probability) + "% probability of having exactly " + str(number) + " winning numbers - 1 in " + str(output_odds)

# Test the function
test_list = [1, 2, 3, 4, 5, 6]
print(probability_less_6(test_list, 2))
print(probability_less_6(test_list, 3))
print(probability_less_6(test_list, 4))
print(probability_less_6(test_list, 5))
```

```
The combination [1, 2, 3, 4, 5, 6] has a 13.238% probability of having exactly 2 winning numbers - 1 in 8
The combination [1, 2, 3, 4, 5, 6] has a 1.765% probability of having exactly 3 winning numbers - 1 in 57
The combination [1, 2, 3, 4, 5, 6] has a 0.097% probability of having exactly 4 winning numbers - 1 in 1032
The combination [1, 2, 3, 4, 5, 6] has a 0.002% probability of having exactly 5 winning numbers - 1 in 54201
```
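The introduction also asks for the probability of having at least five (or four, or three, or two) winning numbers, while `probability_less_6` returns the probability of exactly N. Summing the exact counts from N up to 6 answers the "at least" question. The sketch below does this; the function name is hypothetical.

```python
from fractions import Fraction
from math import comb

TOTAL = comb(49, 6)

def at_least_n_matches(n):
    # Add up the counts of tickets with exactly k matches, for k = n .. 6
    favourable = sum(comb(6, k) * comb(43, 6 - k) for k in range(n, 7))
    return Fraction(favourable, TOTAL)

for n in range(2, 6):
    p = at_least_n_matches(n)
    print("at least", n, "winning numbers: {:.3f}% - 1 in {}".format(float(p) * 100, round(1 / p)))
```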


## 3) Conclusion

We created this project to practice computing probabilities in Python. We defined several functions related to the probability of winning the lottery:

1. Probability of winning the big prize
2. Probability the ticket has already won
3. Probability of winning with more than one ticket
4. Probability of getting some winning numbers

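The results we obtained clearly show that <ins>lotteries are not worth our money</ins>. A single ticket has a 1 in 13,983,816 chance of winning the big prize, and even 1,000,000 different tickets give only about a 7% chance. We can find better things to do with that money.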