# Mobile App for Lottery Addiction

A medical institute that aims to prevent and treat gambling addictions wants to build a dedicated mobile app to help lottery addicts better estimate their chances of winning. They need help designing the logical core of the app.

The first version of the app will focus on the [6/49 lottery](https://en.wikipedia.org/wiki/Lotto_6/49). Features they would like to include will answer some of the following questions:
- What is the probability of winning the big prize with a single ticket?
- What is the probability of winning the big prize if X tickets are played?
- What is the probability of having at least X winning numbers on a single ticket?

Furthermore, there is [historical data](https://www.kaggle.com/datascienceai/lottery-dataset) that will be considered, covering 3,665 drawings from 1982 to 2018.

## Core Functions

A core function of this app will be calculating probabilities given a certain input. Two functions that will be used a lot are:
- A function for calculating factorials
- A function for calculating combinations

The factorial of a number *n*, written *n*!, is calculated as follows:

$$ n! = n \cdot (n-1) \cdot (n-2) \cdots 2 \cdot 1 $$

As the 6/49 Lottery draws numbers without replacement and the order of the numbers does not matter, the number of combinations when taking *k* objects from a group of *n* objects can be calculated as follows:

$$ _{n}C_{k} = \binom{n}{k} = \frac{n!}{k!(n - k)!} $$

Let's see these in Python:

#### Factorial

In [1]:
def factorial(n: int) -> int:
    #Recursively calculate the factorial of an integer number
    if n > 0:
        return n * factorial(n - 1)
    else:
        return 1

#### Combinations

In [2]:
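def combinations(n: int, k: int) -> int:
    #Calculate the number of combinations when taking k objects from a group of n
    denominator = factorial(k) * factorial(n - k)
    #Integer division keeps the result an exact int, matching the return annotation
    return factorial(n) // denominator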
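
As a quick sanity check, the standard library offers `math.factorial` and `math.comb` (the latter from Python 3.8 onward), which compute the same quantities. The sketch below is not part of the app's code; it only confirms the count of possible 6/49 tickets that the rest of this notebook relies on.

```python
import math

# Number of ways to choose 6 numbers out of 49, ignoring order
total_tickets = math.comb(49, 6)
print(total_tickets)  # 13983816

# The same value from the factorial formula, using integer division
by_formula = math.factorial(49) // (math.factorial(6) * math.factorial(43))
print(by_formula == total_tickets)  # True
```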

## Single Ticket Probability

These core functions form the basis of the next major feature: calculating the probability of winning the big prize. In the 6/49 Lottery, winning the big prize requires all six numbers on a ticket to match the six numbers drawn. The numbers range from 1 to 49 and are entered by the user.

In [3]:
def one_ticket_probability(numbers: list):
    #There are 13,983,816 ways to choose 6 numbers from 49 choices,
    #the chance of choosing the correct ones is 1 / 13,983,816.
    c_total = combinations(49, 6)
    prob = 1 / c_total
    percentage = prob * 100
    print('The numbers you chose: {0}\nhave a {1:.8f} % probability of winning the big prize.'.format(numbers, percentage))


In [4]:
one_ticket_probability([1, 2, 3, 4, 5, 6])

The numbers you chose: [1, 2, 3, 4, 5, 6]
have a 0.00000715 % probability of winning the big prize.


In [5]:
one_ticket_probability([6, 5, 4, 3, 2, 1])

The numbers you chose: [6, 5, 4, 3, 2, 1]
have a 0.00000715 % probability of winning the big prize.


In [6]:
one_ticket_probability([49, 48, 47, 46, 45, 44])

The numbers you chose: [49, 48, 47, 46, 45, 44]
have a 0.00000715 % probability of winning the big prize.


As shown with a few sample inputs, the probability of winning the big prize with one ticket is very small. This probability also does *NOT* change by inputting different numbers.
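
Written out, the calculation behind these outputs is the same for any ticket, because every set of six numbers is exactly one of the 13,983,816 equally likely combinations:

$$ P(\text{big prize}) = \frac{1}{_{49}C_{6}} = \frac{1}{13{,}983{,}816} \approx 0.0000000715 = 0.00000715\% $$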

## Taking A Look at History

With the [data](https://www.kaggle.com/datascienceai/lottery-dataset) available from previous lottery drawings, the user could see if the numbers they choose have been winning numbers in the past.

Let's take a look at the data to start.

In [7]:
import pandas as pd

lottery_data = pd.read_csv('649.csv')
print(lottery_data.shape)

(3665, 11)


In [8]:
lottery_data.head(3)

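| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 6/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 6/19/1982 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 6/26/1982 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |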


In [9]:
lottery_data.tail(3)

| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3662 | 649 | 3589 | 0 | 6/13/2018 | 6 | 22 | 24 | 31 | 32 | 34 | 16 |
| 3663 | 649 | 3590 | 0 | 6/16/2018 | 2 | 15 | 21 | 31 | 38 | 49 | 8 |
| 3664 | 649 | 3591 | 0 | 6/20/2018 | 14 | 24 | 31 | 35 | 37 | 48 | 17 |


The relevant columns here are the six numbers drawn in each past drawing. They can be extracted and compared against a user's numbers to see whether that combination has won before.

The first step is to extract the numbers for each drawing into a set, so that the order of the numbers does not matter.

In [16]:
def extract_numbers(row) -> set:
    #Pick the six numbers drawn (columns 4 to 9 by position) and return them as a set
    numbers = row.iloc[4:10]
    return set(numbers)


In [17]:
winning_numbers = lottery_data.apply(extract_numbers, axis=1)
winning_numbers.head()

0    {3, 41, 11, 12, 43, 14}
1    {33, 36, 37, 39, 8, 41}
2     {1, 6, 39, 23, 24, 27}
3     {3, 9, 10, 43, 13, 20}
4    {34, 5, 14, 47, 21, 31}
dtype: object

Next will be comparing numbers a user inputs to all the previous winning numbers.

In [20]:
def check_historical_occurence(user_num: list, series):
    #Compare user numbers against a pandas Series of historical winning sets
    user_num_set = set(user_num)
    #True for each past drawing whose set of numbers equals the user's set
    matches = series.apply(lambda drawn: drawn == user_num_set)
    n_occurence = matches.sum()

    if n_occurence == 0:
        print('The numbers you entered, {} have never previously won. This does not mean your chances of winning are better, however.'.format(user_num))

    else:
        print('The numbers you entered, {} have occurred {} times. However, this does not increase your chances of winning.'.format(user_num, n_occurence))

    print('\nYour chance to win the big prize in the next drawing using the combination {} is 0.0000072%. In other words, you have a 1 in 13,983,816 chance to win.'.format(user_num))

In [21]:
check_historical_occurence([8, 33, 37, 36, 39, 41], winning_numbers)

The numbers you entered, [8, 33, 37, 36, 39, 41] have occurred 1 times. However, this does not increase your chances of winning.

Your chance to win the big prize in the next drawing using the combination [8, 33, 37, 36, 39, 41] is 0.0000072%. In other words, you have a 1 in 13,983,816 chance to win.


In [22]:
check_historical_occurence([1,2,3,4,5,6], winning_numbers)

The numbers you entered, [1, 2, 3, 4, 5, 6] have never previously won. This does not mean your chances of winning are better, however.

Your chance to win the big prize in the next drawing using the combination [1, 2, 3, 4, 5, 6] is 0.0000072%. In other words, you have a 1 in 13,983,816 chance to win.


As shown above, entering a set of winning numbers (copied from `winning_numbers.head()`) and comparing it to all the past winning numbers shows that it has in fact occurred once. In contrast, the numbers 1 through 6 have never been drawn together. Either way, the chances of winning do not change.
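
The same lookup can be sketched without pandas. The example below is only an illustration: `past_draws` holds just the three drawings shown by `lottery_data.head(3)`, and `count_past_wins` is a hypothetical helper name, not part of the notebook's code. It stores each drawing as a `frozenset` so that order is ignored and the sets can be counted.

```python
from collections import Counter

# Sample drawings copied from the first three rows of the dataset shown earlier
past_draws = [
    (3, 11, 12, 14, 41, 43),
    (8, 33, 36, 37, 39, 41),
    (1, 6, 23, 24, 27, 39),
]

# frozenset ignores order and is hashable, so it can serve as a Counter key
draw_counts = Counter(frozenset(draw) for draw in past_draws)


def count_past_wins(user_numbers, counts=draw_counts):
    """Return how many past drawings used exactly these numbers (hypothetical helper)."""
    return counts[frozenset(user_numbers)]


print(count_past_wins([8, 33, 37, 36, 39, 41]))  # 1
print(count_past_wins([1, 2, 3, 4, 5, 6]))       # 0
```

Building the counter once makes each later lookup a single dictionary access, which may matter if the app checks many tickets against the full history.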

## Multiple Entries Probability

Lottery addicts are likely to play more than one ticket in a single drawing, assuming this greatly increases their odds of winning. The next task is to estimate their odds of winning based on the number of different tickets they plan on playing. Let's create a function to calculate this, utilizing the previous core functions.

In [34]:
def n_entries_probability(n: int):
    #There are 13,983,816 ways to choose 6 numbers from 49 choices,
    #the chance of choosing the correct ones is n / 13,983,816.
    
    c_total = combinations(49, 6)
    prob = n / c_total
    percentage = prob * 100
    print('\nPlaying {} different tickets provides a {:.8f}% probability of winning the big prize.'.format(n, percentage))


In [35]:
entries = [1, 10, 100, 10000, 1000000, 6991908, 13983816]

for x in entries:
    n_entries_probability(x)


Playing 1 different tickets provides a 0.00000715% probability of winning the big prize.

Playing 10 different tickets provides a 0.00007151% probability of winning the big prize.

Playing 100 different tickets provides a 0.00071511% probability of winning the big prize.

Playing 10000 different tickets provides a 0.07151124% probability of winning the big prize.

Playing 1000000 different tickets provides a 7.15112384% probability of winning the big prize.

Playing 6991908 different tickets provides a 50.00000000% probability of winning the big prize.

Playing 13983816 different tickets provides a 100.00000000% probability of winning the big prize.
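
The question can also be turned around: how many different tickets are needed to reach a given probability of winning? The sketch below assumes the same 1-in-13,983,816 model; `tickets_needed` is a hypothetical helper name, and because it multiplies floats, results for targets that are not exactly representable may be off by one ticket.

```python
import math

TOTAL_TICKETS = math.comb(49, 6)  # 13,983,816 possible tickets


def tickets_needed(target_probability: float) -> int:
    """Smallest number of distinct tickets whose win probability reaches the target."""
    if not 0 <= target_probability <= 1:
        raise ValueError('target_probability must be between 0 and 1')
    return math.ceil(target_probability * TOTAL_TICKETS)


print(tickets_needed(0.5))  # 6991908
print(tickets_needed(1.0))  # 13983816
```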


#### A Guaranteed Win Isn't Profitable

As seen above, unless you enter every possible ticket combination, your odds of winning do not increase dramatically. Entering 1,000,000 tickets gives only a slightly better than 7% chance of winning.

As of 2013, the price per entry in this lottery is $3, so guaranteeing a win would require spending $41,951,448.

For this to be a 'profitable' endeavor when winning the minimum jackpot of $5,000,000, each entry would need to cost less than about 26 cents, using the tax factor of 1.37 on winnings assumed in the calculation below.

In [38]:
# $3 per entry * 13983816 entries
3 * 13983816

41951448

In [40]:
# $5 million jackpot / (number of tickets * assumed tax factor of 1.37)
5000000 / (13983816*1.37)

0.2609899212415517
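
The two calculations above can be wrapped in small helpers so that other prices or jackpots can be tried. This is a minimal sketch: the function names are hypothetical, and the default `tax_factor` of 1.37 is simply the value used in the cell above, not an official figure.

```python
TOTAL_TICKETS = 13_983_816  # every possible 6/49 combination


def cost_to_guarantee(price_per_entry: float) -> float:
    """Total cost of buying one ticket for every possible combination."""
    return price_per_entry * TOTAL_TICKETS


def break_even_price(jackpot: float, tax_factor: float = 1.37) -> float:
    """Highest price per entry at which buying every combination breaks even.

    tax_factor is an assumption carried over from the notebook's calculation.
    """
    return jackpot / (TOTAL_TICKETS * tax_factor)


print(cost_to_guarantee(3))                   # 41951448
print(round(break_even_price(5_000_000), 4))  # 0.261
```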

## Partial Winning Numbers

Most lotteries have smaller prizes for players who match some of the winning numbers, but not all. For example, matching two numbers in the 6/49 Lottery provides a free entry to the next draw. These players might be interested in knowing the probability of such an occurrence.

Let's create a function that allows users to calculate the probability of matching 2, 3, 4, or 5 numbers.

In [59]:
def partial_probability(n: int):
    #Calculate the probability of matching exactly n numbers
    #c_ticket is the number of ways to pick n of the 6 winning numbers
    #c_remain is the number of ways to fill the other 6 - n spots from the 43 non-winning numbers
    c_ticket = combinations(6, n)
    c_remain = combinations(43, 6 - n)
    n_success = c_ticket * c_remain
    
    probability = n_success / combinations(49, 6)
    percent = probability * 100
    
    print('\nThere is a {:.8f}% chance to match {} numbers'.format(percent, n))
    
    return

In [58]:
for x in [1, 2, 3, 4, 5]:
    partial_probability(x)


There is a 41.30194505% chance to match 1 numbers

There is a 13.23780290% chance to match 2 numbers

There is a 1.76504039% chance to match 3 numbers

There is a 0.09686197% chance to match 4 numbers

There is a 0.00184499% chance to match 5 numbers

As shown above, the probability drops sharply as the number of matches required increases: matching exactly one number happens about 41% of the time, while matching exactly five happens less than 0.002% of the time. This is intuitive.
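
In formula form, `partial_probability` counts the tickets that contain exactly *n* of the 6 winning numbers and 6 - *n* of the 43 non-winning numbers, then divides by the total number of tickets:

$$ P(\text{exactly } n \text{ matches}) = \frac{_{6}C_{n} \cdot _{43}C_{6-n}}{_{49}C_{6}} $$

For example, with *n* = 5:

$$ P(\text{exactly 5 matches}) = \frac{6 \cdot 43}{13{,}983{,}816} = \frac{258}{13{,}983{,}816} \approx 0.00184499\% $$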

## v2.0

The first version of this app consists of rudimentary functions, all specific to the 6/49 lottery. While this is a good start, there are many directions the app could go. The functions could be parameterized to fit other well-known lotteries, a function could be added to find the probability of having at *least* n winning numbers, and 'similar chance' facts could be added to connect gambling probabilities with the real world.
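
As one possible starting point for the "at least n" feature, the exact-match counts can be summed from n up to 6. This is only a sketch under the same 6/49 assumptions; `at_least_probability` is a hypothetical name, and it uses `math.comb` (Python 3.8+) instead of the notebook's own `combinations` function so that it runs on its own.

```python
import math

TOTAL_TICKETS = math.comb(49, 6)  # 13,983,816 possible tickets


def at_least_probability(n: int) -> float:
    """Probability that a single ticket matches n or more of the 6 winning numbers."""
    if not 0 <= n <= 6:
        raise ValueError('n must be between 0 and 6')
    # Tickets with exactly k matches: choose k winning and 6 - k non-winning numbers
    ways = sum(math.comb(6, k) * math.comb(43, 6 - k) for k in range(n, 7))
    return ways / TOTAL_TICKETS


# Chance of winning at least the free entry awarded for two matches (roughly 15.1%)
print('{:.8f}%'.format(at_least_probability(2) * 100))
```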