# An idea: Mobile App for Lottery Addiction

## Introduction

We know that many people start playing the lottery for fun, but for some of them this activity turns into a habit which can eventually escalate into an addiction. Like other compulsive gamblers, lottery addicts soon begin spending their savings and taking out loans, they start to accumulate debts, and eventually they engage in desperate behaviors like theft. These situations can then destroy their social relationships and badly affect their families too.

For this project we imagine that a medical institute wants to develop a dedicated mobile app to help prevent and treat gambling addiction and to help players better estimate their chances of winning.

For this version we're going to focus on the [6/49 lottery](https://en.wikipedia.org/wiki/Lotto_6/49), the Canadian version of the game, in which **6** numbers are randomly drawn out of a set of **49** in each drawing.
The player wins the first prize if the ticket they bought contains all six numbers. In addition to the six numbers, a **bonus number** is drawn, so that a player who matches **5** numbers plus this **bonus number** wins the second prize.

We need to build a series of functions for the app in order to answer questions like:
* What is the probability of winning the big prize with a single ticket?
* What is the probability of winning the big prize if we play 40 different tickets (or any other number)?
* What is the probability of having at least five (or four, or three, or two) winning numbers on a single ticket?

The institute also wants us to consider historical data coming from the national 6/49 lottery game in Canada. [The data set](https://www.kaggle.com/datascienceai/lottery-dataset) has data for 3,665 drawings, dating from 1982 to 2018 (we'll come back to this).

#### Disclaimer

The scenario we're following throughout this project is purely fictional, and its main purpose is to apply some of the rules and concepts that underlie probability calculations. In no case is the writer inviting anyone to play the lottery or to spend money on gambling.

## Calculate the probability for one winning ticket

### 1.1

Throughout the project, we'll need to repeatedly calculate probabilities and combinations. As a consequence, we'll start by writing two functions we're going to use a lot:

* A function that calculates factorials
* A function that calculates combinations

In a lottery game we sample without replacement. This means that, starting from **49** numbers, each time a new number is called the set of possible outcomes gets smaller (49, then 48, then 47, and so on). Because the order in which the six numbers are called doesn't matter, the **total number of outcomes** is the number of combinations of 6 numbers taken from 49.
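As a worked step, and assuming (as the game rules imply) that only the set of six numbers matters and not the order in which they are called, the count of ordered draws can be divided by the number of ways to order six numbers:

$$
\binom{49}{6} = \frac{49 \cdot 48 \cdot 47 \cdot 46 \cdot 45 \cdot 44}{6!} = \frac{10{,}068{,}347{,}520}{720} = \frac{49!}{6!\,43!} = 13{,}983{,}816
$$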

In [1]:
# function to calculate factorials

def factorial(n):
    
    final_product = 1
    
    # multiply 2 * 3 * ... * n (the loop is skipped for n = 0 and n = 1)
    for i in range(2, n + 1):
        final_product *= i
    
    return final_product

# function to calculate combinations

def combinations(n, k): # number of ways to take "k" values from a group of "n" objects, ignoring order
    
    numerator = factorial(n)
    denominator = factorial(k) * factorial(n - k)
    
    # the division is always exact, so integer division keeps the result an int
    return numerator // denominator
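A quick way to gain confidence in hand-written helpers like these is to compare them with the standard library. The sketch below restates the two helpers so that it runs on its own and checks them against `math.comb` (available from Python 3.8); the sample `(n, k)` pairs are arbitrary choices for illustration.

```python
import math

def factorial(n):
    """Iterative factorial, mirroring the notebook's helper."""
    product = 1
    for i in range(2, n + 1):
        product *= i
    return product

def combinations(n, k):
    """Number of ways to choose k items from n, ignoring order."""
    return factorial(n) // (factorial(k) * factorial(n - k))

# Cross-check a few values against the standard library
for n, k in [(49, 6), (6, 5), (43, 1), (5, 0)]:
    assert combinations(n, k) == math.comb(n, k)

print(combinations(49, 6))  # 13983816
```

In an app, calling `math.comb` directly would probably be the simpler choice; the hand-written versions are kept here because they make the formula visible.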

### 1.2

We wrote two fundamental functions for our project; at this point we are able to calculate the probability of winning the big prize (6 correct numbers) with any given ticket.

We need to be aware of the following details when we write the function:

* Inside the app, the user inputs six different numbers from 1 to 49
* Under the hood, the six numbers will come as a list, which will serve as the single input to our function.
* The engineering team wants the function to print the probability value in a friendly way, so that people without any probability training are able to understand it.

In [2]:
# define function to calculate the probability for a ticket to win the big prize

def one_ticket_probability(a_ticket):
    
    # formatting a_ticket as a string
    
    a_ticket_str = [str(i) for i in sorted(a_ticket)]
    user_ticket = '-'.join(a_ticket_str)
    
    # calculate all the possible combinations
    
    combs = combinations(49, len(a_ticket))
    
    # calculate probability as a percentage
    
    probability = (1 / combs) * 100
    
    # converting into standard notation
    
    probability_std = format(probability, '.6f')
    
    print(f"The user chose the following numbers: {user_ticket}")
    print(f"The probability of winning the big prize is equal to {probability_std} %")

# testing 

one_ticket_probability([33,48,36,19,22,4])    

The user chose the following numbers: 4-19-22-33-36-48
The probability of winning the big prize is equal to 0.000007 %


So we wrote the function that calculates the probability of winning the first prize by buying a single ticket with six numbers.
The number of successful outcomes is just **1**, so to find the value we need to divide 1 by the number of possible combinations.

As we can see, the probability of winning the lottery with a single ticket is extremely low: **0.000007 %**.

We could say that it is *almost* impossible.
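The function above trusts that the app always sends six different numbers from 1 to 49. A minimal sketch of how that assumption could be checked before any probability is computed is shown below; `validate_ticket` is a hypothetical helper that is not part of the original notebook.

```python
def validate_ticket(a_ticket):
    """Return the ticket as a sorted list, or raise ValueError if it is not playable.

    Hypothetical helper: the rules checked here are the ones listed above
    (six different numbers, each from 1 to 49).
    """
    if len(a_ticket) != 6:
        raise ValueError("A ticket must contain exactly six numbers.")
    if len(set(a_ticket)) != 6:
        raise ValueError("The six numbers must all be different.")
    if not all(isinstance(n, int) and 1 <= n <= 49 for n in a_ticket):
        raise ValueError("Every number must be an integer from 1 to 49.")
    return sorted(a_ticket)

print(validate_ticket([33, 48, 36, 19, 22, 4]))  # [4, 19, 22, 33, 36, 48]
```

Raising an exception rather than printing leaves it to the calling code to decide how the error is shown to the user.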

## Compare user's ticket

### 2.1 

For this project we also want the user to be able to compare their ticket against the historical lottery data in Canada and determine whether they would ever have won by now.

To do this we're going to use a data set that collects historical data from the **Canada 6/49** lottery.
The data set can be downloaded [here](https://www.kaggle.com/datascienceai/lottery-dataset).

The data set contains historical data for 3,665 drawings (each row shows data for a single drawing), dating from **1982** to **2018**. For each drawing, we can find the six numbers drawn in the following six columns:

* ```NUMBER DRAWN 1```
* ```NUMBER DRAWN 2```
* ```NUMBER DRAWN 3```
* ```NUMBER DRAWN 4```
* ```NUMBER DRAWN 5```
* ```NUMBER DRAWN 6```

We'll import the data set and briefly explore it.

In [3]:
import pandas as pd

lottery = pd.read_csv('649.csv')

In [4]:
# checking the structure

print(f"The dataset is composed of {lottery.shape[0]} rows and {lottery.shape[1]} columns")

The dataset is composed of 3665 rows and 11 columns


In [5]:
# printing the first 3 rows

lottery.head(3)

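| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 6/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 6/19/1982 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 6/26/1982 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |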


In [6]:
# printing last 3 rows

lottery.tail(3)

| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3662 | 649 | 3589 | 0 | 6/13/2018 | 6 | 22 | 24 | 31 | 32 | 34 | 16 |
| 3663 | 649 | 3590 | 0 | 6/16/2018 | 2 | 15 | 21 | 31 | 38 | 49 | 8 |
| 3664 | 649 | 3591 | 0 | 6/20/2018 | 14 | 24 | 31 | 35 | 37 | 48 | 17 |


One important thing to note is that in this dataset the numbers of each drawing are sorted in ascending order.

This means that ```NUMBER DRAWN 1``` is not necessarily the first number that was called; it is simply the lowest number of that drawing.

At this point we're going to write a function that extracts the six winning numbers of a drawing as a set, and apply it to every drawing in the data set.

### 2.2

In [37]:
#function for extracting the sets

def extract_numbers(row):
    # positions 4 to 9 hold the columns NUMBER DRAWN 1 ... NUMBER DRAWN 6
    row = row.iloc[4:10]
    row = set(row)
    return row

winners = lottery.apply(extract_numbers, axis=1)

{3, 41, 11, 12, 43, 14}
{33, 36, 37, 39, 8, 41}
{1, 6, 39, 23, 24, 27}
{3, 9, 10, 43, 13, 20}
{34, 5, 14, 47, 21, 31}
{8, 41, 20, 21, 25, 31}
{33, 36, 42, 18, 25, 28}
{7, 40, 16, 17, 48, 31}
{37, 5, 38, 10, 23, 27}
{4, 37, 46, 15, 48, 30}
{33, 38, 7, 9, 42, 21}
{36, 11, 43, 17, 19, 20}
{37, 7, 14, 47, 17, 20}
{35, 44, 25, 28, 29, 30}
{36, 39, 8, 41, 47, 18}
{9, 12, 13, 14, 44, 48}
{4, 40, 43, 44, 14, 18}
{34, 35, 36, 13, 16, 18}
{36, 11, 23, 25, 28, 29}
{37, 7, 45, 18, 23, 25}
{37, 11, 45, 18, 19, 31}
{8, 14, 16, 48, 18, 31}
{4, 11, 45, 23, 24, 25}
{33, 34, 3, 4, 48, 19}
{5, 43, 17, 21, 28, 30}
{36, 6, 38, 46, 17, 24}
{4, 9, 10, 11, 43, 46}
{32, 33, 7, 13, 45, 23}
{35, 37, 11, 18, 22, 28}
{35, 45, 48, 25, 26, 31}
{32, 36, 11, 19, 25, 31}
{34, 37, 39, 8, 41, 47}
{35, 36, 5, 10, 45, 30}
{34, 8, 42, 22, 26, 27}
{33, 3, 5, 38, 6, 39}
{10, 44, 14, 47, 25, 31}
{11, 13, 15, 16, 24, 31}
{11, 12, 44, 19, 25, 29}
{33, 2, 7, 21, 22, 30}
{38, 41, 11, 43, 15, 25}
{2, 10, 11, 43, 49, 31}
...

We extracted each combination of winning numbers and stored them in a pandas Series named **winners**. At this point we want to write a function that, given six numbers chosen by the user, reports how many times that combination has already been drawn and the probability of winning the big prize in the next drawing with that combination.
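The core of the comparison is plain set equality, which ignores the order of the numbers. The sketch below shows the idea without pandas, using only the first three drawings printed by `lottery.head(3)`; `sample_draws` and `count_occurrences` are illustrative names that do not appear in the notebook.

```python
# First three drawings, copied from the lottery.head(3) output
sample_draws = [
    {3, 11, 12, 14, 41, 43},
    {8, 33, 36, 37, 39, 41},
    {1, 6, 23, 24, 27, 39},
]

def count_occurrences(user_numbers, draws):
    """Count the drawings whose six numbers equal the user's six numbers."""
    user_set = set(user_numbers)
    return sum(1 for draw in draws if draw == user_set)

# Same numbers as the first drawing, entered in a different order
print(count_occurrences([43, 3, 41, 11, 14, 12], sample_draws))  # 1
# A combination that is not among the three sample drawings
print(count_occurrences([7, 12, 19, 30, 28, 49], sample_draws))  # 0
```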

In [70]:
# function for comparison

def check_historical_occurence(user_numbers, lucky_numbers):
    
    user_numbers = set(user_numbers)
    # comparing the set against the Series checks every historical drawing at once
    check_repetition = user_numbers == lucky_numbers
    nr_repetitions = check_repetition.sum()
    
    if nr_repetitions == 0:
        print(f"The combination you chose never occurred.\n" 
              f"Please, now don't start thinking that it needs to happen and that you'll surely win...\n"
              f"Your chances to win the big prize in the next drawing using the combination {user_numbers} are still 0.000007%.\n"
              f"Or, if you prefer, you have 1 in 13,983,816 chances to win.\nDon't lose your money!\nBuy something for your wife, kids or whoever you want!"
             )
    else:
        print(f"The combination you chose occurred {nr_repetitions} time(s) in the past.\n" 
              f"Please, now don't start thinking that you're a lucky guy and that you'll surely win...\n"
              f"Your chances to win the big prize in the next drawing using the combination {user_numbers} are still 0.000007%.\n"
              f"Or, if you prefer, you have 1 in 13,983,816 chances to win.\nDon't lose your money!\nBuy something for your wife, kids or whoever you want!"
             )

In [76]:
# testing 

check_historical_occurence([7,12,19,30,28,49], winners)

The combination you chose never occurred.
Please, now don't start thinking that it needs to happen and that you'll surely win...
Your chances to win the big prize in the next drawing using the combination {49, 19, 7, 12, 28, 30} are still 0.000007%.
Or, if you prefer, you have 1 in 13,983,816 chances to win.
Don't lose your money!
Buy something for your wife, kids or whoever you want!


In [77]:
check_historical_occurence([35, 41, 22, 23, 25, 26], winners)

The combination you chose occurred 1 time(s) in the past.
Please, now don't start thinking that you're a lucky guy and that you'll surely win...
Your chances to win the big prize in the next drawing using the combination {35, 22, 23, 41, 26, 25} are still 0.000007%.
Or, if you prefer, you have 1 in 13,983,816 chances to win.
Don't lose your money!
Buy something for your wife, kids or whoever you want!


We can see that, no matter how many times a combination was drawn in the past, the probability of it being drawn again is still extremely low and is exactly the same as for a combination that never occurred. Each drawing is independent of the previous ones, which should dispel the idea that numbers which appeared many times are more likely to be drawn.

## What if I buy more than one ticket?

### 3.1

When we talk about gambling addiction, we're talking about people who usually waste a lot of money on gambling. For the lottery, we're focusing on those who buy more than one ticket, convinced that their chances of winning will increase significantly.

Our purpose is to help them better estimate their chances of winning, so we're going to write a function that allows users to estimate the chances of winning for any number of tickets. Also, from the [website](https://www.olg.ca/en/lottery/play-lotto-649-encore/about.html#:~:text=Each%20play%20costs%20%243%20and,a%20guaranteed%20%241%2DMILLION%20PRIZE.), we know that each ticket costs **3 $**, so we'll also calculate how much a user would spend.

Our function will be structured like this:
* The user will input the number of different tickets they want to play (without inputting the specific combinations they intend to play).
* Our function will receive an integer between 1 and 13,983,816 (the maximum number of different tickets).
* The function should print information about the probability of winning the big prize depending on the number of different tickets played and the total cost the user should face.

In [88]:
def multi_ticket_probability(nr_tickets): # we input the number of tickets
    
    # calculate number of unique combinations
    nr_combs = combinations(49, 6)
    # calculate the probability as a percentage
    success = (nr_tickets / nr_combs) * 100
    # calculate the cost
    ticket_cost = 3
    total = ticket_cost * nr_tickets
    #print message
    if nr_tickets > 1:
        print(f"By buying {nr_tickets} tickets, you have the {format(success, '.6f')}% of chances to win the big prize.\nYou would also spend {total}$")
    else:
        print(f"By buying {nr_tickets} ticket, you have the {format(success, '.6f')}% of chances to win the big prize.\nYou would also spend {total}$")       

Having written the function, we're going to run some tests.

In [108]:
tests = [1, 10, 100, 10000, 1000000, 6991908, 13983816]

for test in tests:
    multi_ticket_probability(test)
    print("___________________________________\n") # delimiter

By buying 1 ticket, you have the 0.000007% of chances to win the big prize.
You would also spend 3$
___________________________________

By buying 10 tickets, you have the 0.000072% of chances to win the big prize.
You would also spend 30$
___________________________________

By buying 100 tickets, you have the 0.000715% of chances to win the big prize.
You would also spend 300$
___________________________________

By buying 10000 tickets, you have the 0.071511% of chances to win the big prize.
You would also spend 30000$
___________________________________

By buying 1000000 tickets, you have the 7.151124% of chances to win the big prize.
You would also spend 3000000$
___________________________________

By buying 6991908 tickets, you have the 50.000000% of chances to win the big prize.
You would also spend 20975724$
___________________________________

By buying 13983816 tickets, you have the 100.000000% of chances to win the big prize.
You would also spend 41951448$
___________________________________

Let's analyze what we see here.

We know that the big prize is equal to **5,000,000 $**; that's the first point to keep in mind.
Testing the function shows that up to **10,000** tickets the probability of winning the first prize stays extremely low, at about **0.07%**. To a good approximation, whether we buy a single ticket or **10,000** of them, winning is almost impossible.

The probability becomes noticeable only from **1,000,000** tickets onward, but at that point we would spend **3,000,000 $** for a chance of about **7.15%**. To reach a 50% chance we would have to spend **20,975,724 $**, more than four times what we could win.
At this point, do we still believe that gambling on the lottery makes any statistical sense?

Knowing what you would have to spend to have any significant chance of winning, would you still believe it's a fair game to play?
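The trade-off between cost and chance can be made concrete with a rough expected-value calculation. The sketch below is only an approximation under stated assumptions: it uses the **3 $** ticket price and the **5,000,000 $** big prize assumed in this project, and it ignores the smaller prizes and the possibility of sharing the prize with other winners. The names `expected_net_result` and `tickets_needed` are hypothetical.

```python
import math

TICKET_COST = 3       # dollars per ticket, as stated in this project
JACKPOT = 5_000_000   # dollars, the big prize assumed in this project
TOTAL_COMBS = math.comb(49, 6)

def expected_net_result(nr_tickets):
    """Average net result in dollars of playing nr_tickets different tickets.

    Simplifying assumption: only the big prize counts.
    """
    p_win = nr_tickets / TOTAL_COMBS
    return p_win * JACKPOT - nr_tickets * TICKET_COST

def tickets_needed(target_probability):
    """Smallest number of different tickets reaching the target probability."""
    return math.ceil(target_probability * TOTAL_COMBS)

print(round(expected_net_result(1), 2))  # -2.64
print(tickets_needed(0.5))               # 6991908
```

Under these assumptions each ticket loses about 2.64 $ on average, and because both the chance and the cost grow in proportion to the number of tickets, buying more tickets only multiplies that average loss.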

### 3.2

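In most lotteries there are smaller prizes if a player's ticket matches two, three, four, or five of the six numbers drawn. As a consequence, users might be interested in knowing the probability of having exactly two, three, four, or five winning numbers. To count the successful outcomes, we choose which of the **6** drawn numbers appear on the ticket and then fill the rest of the ticket with numbers from the **43** that were not drawn.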

In [132]:
def probability_less_6(winning_numbers):
    # number of ways to pick the matching numbers among the 6 numbers drawn
    ticket_combs = combinations(6, winning_numbers)
    # number of ways to pick the rest of the ticket among the 43 numbers not drawn
    remaining_combs = combinations(43, 6 - winning_numbers)
    
    # calculating total number of successful outcomes
    total_succ_outputs = ticket_combs * remaining_combs
    
    # calculating total combinations
    total_combs = combinations(49, 6)
    
    probability = (total_succ_outputs / total_combs) 
    
    print(f"The chances of having {winning_numbers} winning numbers is equal to {format(probability, '.6f')}\nor, if you prefer, there's a {format(probability * 100, '.6f')}% of success.")   

# testing

probability_less_6(5)

The chances of having 5 winning numbers is equal to 0.000018
or, if you prefer, there's a 0.001845% of success.


In [133]:
for i in [2, 3, 4, 5]:
    probability_less_6(i)
    print("___________________________________\n") # delimiter

The chances of having 2 winning numbers is equal to 0.132378
or, if you prefer, there's a 13.237803% of success.
___________________________________

The chances of having 3 winning numbers is equal to 0.017650
or, if you prefer, there's a 1.765040% of success.
___________________________________

The chances of having 4 winning numbers is equal to 0.000969
or, if you prefer, there's a 0.096862% of success.
___________________________________

The chances of having 5 winning numbers is equal to 0.000018
or, if you prefer, there's a 0.001845% of success.
___________________________________
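The values above are the probabilities of matching *exactly* a given number of winning numbers. The question from the introduction asks for *at least* five (or four, or three, or two), which can be obtained by adding up the exact cases. The sketch below assumes `math.comb` (Python 3.8+) instead of the notebook's own helpers so that it runs on its own; `probability_exactly` and `probability_at_least` are illustrative names.

```python
import math

def probability_exactly(k):
    """Probability that exactly k of the 6 numbers on a ticket are winning numbers."""
    return math.comb(6, k) * math.comb(43, 6 - k) / math.comb(49, 6)

def probability_at_least(k):
    """Probability that k or more of the 6 numbers on a ticket are winning numbers."""
    return sum(probability_exactly(i) for i in range(k, 7))

for k in [2, 3, 4, 5]:
    print(f"At least {k} winning numbers: {probability_at_least(k) * 100:.6f}%")

# Sanity check: the cases 0, 1, ..., 6 cover every possible ticket
assert abs(probability_at_least(0) - 1) < 1e-12
```

The final assertion is a useful guard: if the exact probabilities did not add up to 1, the counting argument would be wrong somewhere.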



## Conclusion

As we have seen, the lottery is an extremely unfair game, and everyone should at least understand the basic probability that lies under the hood.

We showed that up to **10,000** tickets the probability of success remains extremely low and the differences are negligible.

The probability becomes noticeable from **1,000,000** tickets onward, but if every ticket costs **3 $** and the big prize is equal to **5,000,000 $**, the cost overtakes the prize well before winning becomes likely: a 50% chance already costs **20,975,724 $**. That makes the "investment" senseless, since we would spend far more on tickets than we could obtain from an uncertain win.

That said, gambling should be done, if it needs to be done, just for fun and without expecting anything from it, for there are better and faster ways to make money.