# Treating Gambling Addictions

The goal of this project is to contribute to an app that helps people addicted to the lottery. We will build the app's logical core, which presents the probabilities of winning to users.

Why? We want gambling addicts to confront how improbable a lottery win really is. We aim to present it to them in a personalized manner for the purpose of reflection. In other words, they will discover the truth for themselves through our app.

Other engineers will design & develop the app around our logical core. Our core scripts need to answer the following questions:

- What is the probability of winning a big prize with a single ticket?

- What is the probability of winning said prize with 40 different tickets?

- What is the probability of having at least 2, 3, 4, or 5 winning numbers on a single ticket?

We will use the historical data coming from the national 6/49 lottery in Canada. The data set, found at https://www.kaggle.com/datascienceai/lottery-dataset, has data from 1982 to 2018.

# Formulae for Probabilities

The lottery does not return drawn numbers to the pool, so the drawing is done without replacement. We count the possible outcomes with factorials and combinations; both functions are defined below.

In [1]:
#The first function is a simple factorial operation. Example: 3! = 6

def factorial(n):
    final_product = 1
    for i in range(n, 0, -1):
        final_product *= i
    return final_product

#The second function is nCk = (n!) / (k! * (n - k)!). This finds the number of combinations when drawing without replacement.

def combinations(n, k):
    numerator = factorial(n)
    denominator = factorial(k) * factorial(n-k)
    return numerator / denominator
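
As a cross-check, Python's standard library (version 3.8 and later) provides `math.comb`, which returns the same count as an exact integer rather than a float. The sketch below is not part of the original core; `combinations_exact` is a hypothetical name used only for illustration.

```python
import math

def combinations_exact(n, k):
    # Hypothetical helper: the same nCk count as above, returned as an exact int.
    return math.comb(n, k)

# 6 numbers chosen from 49, order ignored.
print(combinations_exact(49, 6))  # 13983816
```
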

# What is the probability of winning a big prize with a single ticket in the 6/49 Lottery?

The 6/49 lottery draws 6 numbers from a set of 49 numbers ranging from 1 to 49. If the 6 numbers on a player's ticket match the 6 numbers drawn, the player wins the big prize. All six numbers must match, but their order does not matter, which is why we count combinations.

Example: [13, 22, 24, 27, 42, 44] and [13, 22, 24, 27, 42, 44].

For the first version of our app, players need to calculate the probability of winning with their chosen set of numbers. Below is a function that calculates the probability of winning with any given ticket.

- In the app, the user inputs 6 numbers from 1-49.
- The 6 numbers will be a Python list, becoming an input to our function.
- The engineering team needs the probability printed in a user-friendly way. Assume users have no background in probability and explain the result simply.

In [2]:
def one_ticket_probability(six_numbers):
    total_combinations = combinations(49, len(six_numbers))
    success_chance = 1 / total_combinations
    percentage = success_chance * 100
    
    print("Your chances to win the prize with the numbers {} are {:.7f}%. Another way of saying it: you have a 1 in {:,} chance to win.".format(six_numbers, percentage, int(total_combinations)))

In [3]:
one_ticket_probability([1, 2, 3, 4, 5, 6])

Your chances to win the prize with the numbers [1, 2, 3, 4, 5, 6] are 0.0000072%. Another way of saying it: you have a 1 in 13,983,816 chance to win. 


Any set of six numbers the user picks is exactly one of the 13,983,816 possible combinations, so the numerator in success_chance is hardcoded as 1. We could also pass a literal 6 as the k argument of combinations, but len(six_numbers) gives the same result while referring to the user's input.
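
The input requirements listed earlier (6 numbers, each from 1 to 49) could be checked before the probability is printed. The sketch below is one possible way to do that; `validate_ticket` is a hypothetical helper, and the rule that the 6 numbers must all be different is an assumption based on how the lottery is drawn.

```python
def validate_ticket(numbers):
    # Hypothetical helper: returns the ticket as a sorted list, or raises ValueError.
    if len(numbers) != 6:
        raise ValueError("A ticket needs exactly 6 numbers.")
    if len(set(numbers)) != 6:
        raise ValueError("The 6 numbers must all be different.")
    for n in numbers:
        if not isinstance(n, int) or not 1 <= n <= 49:
            raise ValueError("Each number must be a whole number from 1 to 49.")
    return sorted(numbers)

print(validate_ticket([44, 13, 22, 24, 27, 42]))  # [13, 22, 24, 27, 42, 44]
```

Raising an error keeps the sketch short; the app itself might prefer to show the user a friendly message instead.
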

# Loading Our Data & Necessary Steps

Below we import our data from the aforementioned lottery. We also write necessary functions to extract sets from our lottery data.

In [4]:
import pandas as pd
#ld stands for lottery data
ld = pd.read_csv("649.csv")
ld.head(3)

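| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 6/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 6/19/1982 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 6/26/1982 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |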


In [5]:
#This extracts the full winning combination of numbers.
def extract_numbers(row):
    winner = set((row["NUMBER DRAWN 1"],
                  row["NUMBER DRAWN 2"],
                  row["NUMBER DRAWN 3"],
                  row["NUMBER DRAWN 4"],
                  row["NUMBER DRAWN 5"],
                  row["NUMBER DRAWN 6"]))
    return winner

In [6]:
winners = ld.apply(extract_numbers, axis=1)
winners.head(4)

0    {3, 41, 11, 12, 43, 14}
1    {33, 36, 37, 39, 8, 41}
2     {1, 6, 39, 23, 24, 27}
3     {3, 9, 10, 43, 13, 20}
dtype: object
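
For a quick check that does not depend on pandas (in a unit test, for instance), the same extraction can be sketched with the standard `csv` module. The three rows below are the ones shown in the table earlier; `SAMPLE_CSV`, `NUMBER_COLUMNS`, and `extract_numbers_from_record` are hypothetical names, and the sketch assumes the file's column headers match that table.

```python
import csv
import io

# The first three draws shown earlier, written as CSV text for this sketch.
SAMPLE_CSV = """PRODUCT,DRAW NUMBER,SEQUENCE NUMBER,DRAW DATE,NUMBER DRAWN 1,NUMBER DRAWN 2,NUMBER DRAWN 3,NUMBER DRAWN 4,NUMBER DRAWN 5,NUMBER DRAWN 6,BONUS NUMBER
649,1,0,6/12/1982,3,11,12,14,41,43,13
649,2,0,6/19/1982,8,33,36,37,39,41,9
649,3,0,6/26/1982,1,6,23,24,27,39,34
"""

NUMBER_COLUMNS = ["NUMBER DRAWN {}".format(i) for i in range(1, 7)]

def extract_numbers_from_record(record):
    # Hypothetical helper: builds the set of 6 winning numbers from one CSV record.
    # The bonus number is deliberately left out, as in extract_numbers above.
    return {int(record[column]) for column in NUMBER_COLUMNS}

reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
sample_winners = [extract_numbers_from_record(record) for record in reader]
print(sample_winners[0] == {3, 11, 12, 14, 41, 43})  # True
```
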

In [7]:
#This checks how many times a combination has won in the past.
def check_historical_occurrence(un, winners):
    un = set(un)
    check = winners == un
    
    if sum(check) == 0:
        print("The combination {} has never won before. Your chances of winning with combination {} are 0.0000072%, or 1 in 13,983,816.".format(un, un))
        
    else:
        print("The combination {} has won {} time(s) before. Your chances of winning with {} in the next drawing are 0.0000072%, or 1 in 13,983,816.".format(un, sum(check), un))
    return sum(check)

check_historical_occurrence([3, 41, 11, 12, 43, 14],winners)

The combination {3, 41, 11, 12, 43, 14} has won 1 time(s) before. Your chances of winning with {3, 41, 11, 12, 43, 14} in the next drawing are 0.0000072%, or 1 in 13,983,816.


1

Regardless of how many times a combination has won in the past, it is not excluded from winning in the future. Each drawing is independent, so the probability of winning stays the same for every combination, and we don't need to change any of the earlier results.

This is the fact our app will try to communicate to addicts. We reflect their number back to them, instead of simply giving them the odds, to make the truth more personal.

# What is the probability of winning said prize with 40 different tickets?

Addicts & gamblers will often buy more than one ticket to improve their chances. Below is code to reveal how much some extra tickets will help.

In [8]:
def multi_ticket_probability(n_tickets):
    possible_outcomes = combinations(49, 6)
    chance = n_tickets / possible_outcomes
    probability = chance * 100
    print("If you purchase {} ticket(s), you have a {:.7f}% chance of winning the big prize. Or, 1 in {}".format(n_tickets, probability, round(possible_outcomes/n_tickets)))

multi_ticket_probability(600000)

If you purchase 600000 ticket(s), you have a 4.2906743% chance of winning the big prize. Or, 1 in 23


There are a few reasons we made the code this way, some mechanical and some practical.

- The "ticket(s)" wording reads correctly for both one ticket and many.
- The code calculates the chance of winning for any number of different tickets. Addicts will often buy many tickets, so entering how many tickets they can or want to buy will reveal the mathematical futility of their decisions.
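
The question in this section's heading asks about 40 tickets specifically. The sketch below answers it with a variant that returns the message instead of printing it. `multi_ticket_summary` and `TOTAL_OUTCOMES` are hypothetical names, and the sketch assumes every ticket carries a different combination.

```python
import math

TOTAL_OUTCOMES = math.comb(49, 6)  # 13,983,816 possible tickets

def multi_ticket_summary(n_tickets):
    # Hypothetical variant: assumes every ticket carries a different combination.
    if not 1 <= n_tickets <= TOTAL_OUTCOMES:
        raise ValueError("n_tickets must be between 1 and {:,}.".format(TOTAL_OUTCOMES))
    probability = n_tickets / TOTAL_OUTCOMES
    noun = "ticket" if n_tickets == 1 else "tickets"
    return "{:,} {}: {:.7f}% chance, or about 1 in {:,}.".format(
        n_tickets, noun, probability * 100, round(TOTAL_OUTCOMES / n_tickets))

print(multi_ticket_summary(40))
# 40 tickets: 0.0002860% chance, or about 1 in 349,595.
```

Returning a string rather than printing would let the engineering team decide how to display the message.
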

# What is the probability of having at least 2, 3, 4, or 5 winning numbers on a single ticket?

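There are sometimes small prizes for having a few matching numbers. Users might be interested in knowing the probability of winning small prizes with 2, 3, 4, or 5 numbers. Note that the function in this section calculates the probability of matching exactly that many numbers, not at least that many.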

For the inquiry below, users need to input:

- 6 different numbers between 1-49.
- An integer between 2 & 5 representing the number of winning numbers.

Our code accomplishes this below, and we've answered some questions our engineers might have about the math.

- Why do we have 43 as the number of outcomes remaining?
Of the 49 numbers, 6 are drawn and 43 are not. Take 5 matches as an example: once 5 of the ticket's numbers match, 49 - 5 = 44 numbers are left for the sixth slot, but one of those 44 is the sixth winning number, and picking it would make 6 matches. We are looking for EXACTLY 5 winning numbers, not AT LEAST 5 winning numbers, so we exclude it. 44 - 1 = 43.

- Why is there 6 - less_6?
A ticket has 6 numbers. If exactly less_6 of them match, the other 6 - less_6 must all come from the 43 numbers that were not drawn, so "outcomes_remaining" counts the ways to choose 6 - less_6 numbers out of 43.
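
Written as a formula, the two answers above amount to the standard counting argument for matching exactly $k$ numbers, with $k$ standing for `less_6`. The case $k = 5$ is worked out as a check against the function's output.

```latex
P(\text{exactly } k \text{ matches})
  = \frac{\binom{6}{k}\,\binom{43}{6-k}}{\binom{49}{6}},
  \qquad k = 2, 3, 4, 5

P(\text{exactly } 5 \text{ matches})
  = \frac{6 \cdot 43}{13{,}983{,}816}
  = \frac{258}{13{,}983{,}816}
  \approx \frac{1}{54{,}201}
```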

In [9]:
def probability_less_6(less_6):
    #Ways to choose which less_6 of the 6 winning numbers appear on the ticket.
    outcomes_ticket = combinations(6, less_6)
    #Ways to fill the ticket's other slots from the 43 non-winning numbers.
    outcomes_remaining = combinations(43, 6 - less_6)
    successful_outcomes = outcomes_ticket * outcomes_remaining

    n_combinations_total = combinations(49, 6)
    probability = successful_outcomes / n_combinations_total
    probability_percent = probability * 100
    combinations_simplified = round(n_combinations_total/successful_outcomes)
    print("Your ticket has a {:.6f}% chance of matching exactly {} numbers. In other words, you have a 1 in {:,} chance.".format(probability_percent, less_6, combinations_simplified))
    

In [10]:
for test in [2, 3, 4, 5]:
    probability_less_6(test)
    print("--------------------") #this segments the answers nicely.

Your ticket has a 13.237803% chance of matching exactly 2 numbers. In other words, you have a 1 in 8 chance.
--------------------
Your ticket has a 1.765040% chance of matching exactly 3 numbers. In other words, you have a 1 in 57 chance.
--------------------
Your ticket has a 0.096862% chance of matching exactly 4 numbers. In other words, you have a 1 in 1,032 chance.
--------------------
Your ticket has a 0.001845% chance of matching exactly 5 numbers. In other words, you have a 1 in 54,201 chance.
--------------------
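
The section heading asks about at least 2, 3, 4, or 5 winning numbers, while probability_less_6() handles one exact count at a time. One way to bridge the two, sketched here under that reading, is to add up the exact cases from k through 6. `probability_exactly` and `probability_at_least` are hypothetical names and not part of the original core.

```python
import math

def probability_exactly(k):
    # Chance that exactly k of a ticket's 6 numbers are among the 6 drawn.
    return math.comb(6, k) * math.comb(43, 6 - k) / math.comb(49, 6)

def probability_at_least(k):
    # Hypothetical helper: adds the exact cases k, k + 1, ..., 6.
    return sum(probability_exactly(m) for m in range(k, 7))

for k in [2, 3, 4, 5]:
    print("At least {} matching numbers: {:.6f}%".format(k, probability_at_least(k) * 100))
```
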


# Conclusion

We set out to develop a logical core for engineers to design an app around, and the first iteration is finished! We developed:

- one_ticket_probability()
- check_historical_occurrence()
- multi_ticket_probability()
- probability_less_6()

These should allow users to face their gambling addictions with a basic, fundamental level of truth.

Future features might include:

- Making outputs more fun by adding analogies. Example: "You are 300 times more likely to get salmonella than win the lottery." Note: We don't know if that's true. It's just an example.
- Combining one_ticket_probability() and check_historical_occurrence() to show probability & historical occurrence at the same time.
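
The second future feature could look roughly like the sketch below, which reports the odds and the historical count in a single message. `ticket_report` is a hypothetical name, and the two sets in `history` are the first two draws from the data shown earlier, standing in for the full list of past winners.

```python
import math

def ticket_report(user_numbers, past_winners):
    # Hypothetical combined helper; past_winners is any iterable of sets of 6 numbers.
    ticket = set(user_numbers)
    times_won = sum(1 for winner in past_winners if set(winner) == ticket)
    total = math.comb(49, 6)
    return ("The combination {} has won {} time(s) before. "
            "Your chance in the next drawing is {:.7f}%, or 1 in {:,}."
            ).format(sorted(ticket), times_won, 100 / total, total)

# Stand-in history: the first two draws from the data shown earlier.
history = [{3, 11, 12, 14, 41, 43}, {8, 33, 36, 37, 39, 41}]
print(ticket_report([3, 41, 11, 12, 43, 14], history))
```
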