# A Mobile App for Lottery Addiction

A (fictional) medical institute wants to build an app for calculating the probability of winning the [6/49 Lotto](https://en.wikipedia.org/wiki/Lotto_6/49), a lottery in which six numbers are drawn from a pool of numbers ranging from 1 to 49. The aim of the app is to help people with a lottery addiction understand their real chances of winning. <br>
To build the app, we'll use historical data from the national 6/49 lottery in Canada; the original data can be found [here on Kaggle](https://www.kaggle.com/datascienceai/lottery-dataset).

The data is real, but the scenario described in this notebook is clearly fictional: we have no evidence that an addiction can be cured by statistics (though nobody can stop us from doing our research on it), and we doubt any medical institute would find it useful to give a gambling addict an app about the very game they are addicted to.

### Helper functions

The following helper functions calculate the number of possible combinations of k elements drawn from a group of n elements without replacement: once a number has been drawn, it cannot appear in the combination again.
They use the formula:

<br>
\begin{equation}
_nC_k = {n \choose k} =  \frac{n!}{k!(n-k)!}
\end{equation}
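As a worked example for the 6/49 lottery (n = 49, k = 6), the factor (49 − 6)! in the denominator cancels against the tail of 49!, so only six factors remain in the numerator:

\begin{equation}
{49 \choose 6} = \frac{49 \cdot 48 \cdot 47 \cdot 46 \cdot 45 \cdot 44}{6!} = \frac{10\,068\,347\,520}{720} = 13\,983\,816
\end{equation}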

In [63]:
def factorial(n):
    total = 1
    for i in range(2, n + 1):
        total *= i
    return total    
    

In [64]:
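def combinations(n, k):
    # C(n, k) is defined for 0 <= k <= n, and C(n, n) == 1, so n == k must not be rejected
    if k < 0 or k > n:
        return None
    # integer division: the factorial ratio is always a whole number
    return factorial(n) // (factorial(k) * factorial(n - k))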
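Computing three full factorials creates very large intermediate integers such as 49!. A minimal sketch of an alternative, assuming Python 3.8+ so that the standard library's `math.comb` is available for cross-checking: the multiplicative formula builds the result one factor at a time with exact integer arithmetic. `combinations_exact` is a hypothetical name, not part of the original notebook.

```python
import math


def combinations_exact(n: int, k: int) -> int:
    """Number of k-element subsets of an n-element set, using exact integer arithmetic."""
    if k < 0 or k > n:
        return 0  # no way to choose more elements than are available
    k = min(k, n - k)  # C(n, k) == C(n, n - k); use the shorter loop
    result = 1
    for i in range(1, k + 1):
        # After this step result == C(n - k + i, i), which is always an integer,
        # so the floor division below is exact.
        result = result * (n - k + i) // i
    return result


# Cross-check against the standard library implementation.
for n, k in [(49, 6), (6, 6), (43, 4), (6, 2), (10, 0)]:
    assert combinations_exact(n, k) == math.comb(n, k)

print(combinations_exact(49, 6))  # 13983816
```

Dividing at every step keeps the intermediate values no larger than roughly k times the final result, instead of growing to 49!.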
    

The following function calculates the probability that a single ticket wins the jackpot, i.e. that its six numbers match the six numbers drawn.

In [65]:
def one_ticket_probability(numbers):
    # number of equally likely outcomes for a ticket of len(numbers) numbers (no validation here)
    all_combinations = combinations(49, len(numbers))
    ticket_prob_perc = 1 / all_combinations * 100
    print("The list of numbers {} has {:f}% of probability to win in the next drawing.".format(str(numbers)[1:-1], ticket_prob_perc))

In [66]:
##checking the probability of a ticket
one_ticket_probability([1, 2, 3, 4, 5, 6])

The list of numbers 1, 2, 3, 4, 5, 6 has 0.000007% of probability to win in the next drawing.


In [67]:
# a 3-number ticket is not valid for 6/49: the function does not validate it and returns 1/C(49, 3)
one_ticket_probability([1, 3, 4])

The list of numbers 1, 3, 4 has 0.005428% of probability to win in the next drawing.
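`one_ticket_probability` derives the denominator from `len(numbers)`, so it does not check that a ticket really has six numbers: the 0.005428% reported for `[1, 3, 4]` is 1/C(49, 3), which does not correspond to any 6/49 outcome. A minimal sketch of a stricter variant, assuming Python 3.8+ (`math.comb`) and using `fractions.Fraction` to keep the probability exact; `one_ticket_probability_exact` is a hypothetical name.

```python
import math
from fractions import Fraction


def one_ticket_probability_exact(numbers):
    """Exact probability that one six-number ticket matches all six numbers drawn."""
    # Hypothetical stricter variant: reject anything that is not a 6/49 ticket.
    if len(numbers) != 6 or len(set(numbers)) != 6:
        raise ValueError("a 6/49 ticket needs six distinct numbers")
    if not all(1 <= n <= 49 for n in numbers):
        raise ValueError("numbers must be between 1 and 49")
    # Every valid ticket is exactly one of the C(49, 6) equally likely outcomes.
    return Fraction(1, math.comb(49, 6))


p = one_ticket_probability_exact([1, 2, 3, 4, 5, 6])
print(p)                          # 1/13983816
print(f"1 in {p.denominator:,}")  # 1 in 13,983,816
```

Expressing the result as "1 in 13,983,816" may be easier for the app's users to grasp than a percentage with many leading zeros.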


### The Canadian 6/49 Historical Data

The dataset contains 3,665 drawings from 1982 to 2018; the six numbers drawn are stored in the columns:
<ul>
<li>NUMBER DRAWN 1</li>
<li>NUMBER DRAWN 2</li>
<li>NUMBER DRAWN 3</li>
<li>NUMBER DRAWN 4</li>
<li>NUMBER DRAWN 5</li>
<li>NUMBER DRAWN 6</li>
</ul>


In [68]:
import pandas as pd
import numpy as np
import datetime

In [69]:
dateparse = lambda x: datetime.datetime.strptime(x, '%m/%d/%Y')

In [70]:
lottery = pd.read_csv("649.csv", parse_dates=["DRAW DATE"], date_parser = dateparse)

In [71]:
lottery.head(3)

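|   | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 1982-06-12 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 1982-06-19 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 1982-06-26 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |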


In [72]:
lottery.tail(3)

|   | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3662 | 649 | 3589 | 0 | 2018-06-13 | 6 | 22 | 24 | 31 | 32 | 34 | 16 |
| 3663 | 649 | 3590 | 0 | 2018-06-16 | 2 | 15 | 21 | 31 | 38 | 49 | 8 |
| 3664 | 649 | 3591 | 0 | 2018-06-20 | 14 | 24 | 31 | 35 | 37 | 48 | 17 |


In [73]:
lottery.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3665 entries, 0 to 3664
Data columns (total 11 columns):
 #   Column           Non-Null Count  Dtype         
---  ------           --------------  -----         
 0   PRODUCT          3665 non-null   int64         
 1   DRAW NUMBER      3665 non-null   int64         
 2   SEQUENCE NUMBER  3665 non-null   int64         
 3   DRAW DATE        3665 non-null   datetime64[ns]
 4   NUMBER DRAWN 1   3665 non-null   int64         
 5   NUMBER DRAWN 2   3665 non-null   int64         
 6   NUMBER DRAWN 3   3665 non-null   int64         
 7   NUMBER DRAWN 4   3665 non-null   int64         
 8   NUMBER DRAWN 5   3665 non-null   int64         
 9   NUMBER DRAWN 6   3665 non-null   int64         
 10  BONUS NUMBER     3665 non-null   int64         
dtypes: datetime64[ns](1), int64(10)
memory usage: 315.1 KB


### Function for Historical Data Check

The following function extracts the six numbers drawn in a row and transforms them into a set.

In [74]:
def extract_numbers(row):
    # positions 4 through 9 hold NUMBER DRAWN 1 through NUMBER DRAWN 6 (see lottery.info() above)
    return set(row.iloc[4:10].values)

Below I create a Series containing the set of six winning numbers for each drawing in the dataset.

In [75]:
winning_numbers = lottery.apply(extract_numbers, axis = 1)

In [76]:
winning_numbers.head(3)

0    {3, 41, 11, 12, 43, 14}
1    {33, 36, 37, 39, 8, 41}
2     {1, 6, 39, 23, 24, 27}
dtype: object
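`extract_numbers` relies on the six number columns sitting at positions 4 to 9. A dependency-free sketch that selects them by name instead, assuming the CSV header shown in `lottery.head(3)` above; it swaps in the standard `csv` module for pandas and runs on an in-memory copy of the first three rows, and `winning_sets` is a hypothetical helper. In pandas, the equivalent idea would be selecting the six `NUMBER DRAWN` columns by label before applying `set` row by row.

```python
import csv
import io

# First three rows of the dataset, copied from lottery.head(3) above
# (the real data comes from 649.csv; dates here use the displayed ISO form).
SAMPLE = """PRODUCT,DRAW NUMBER,SEQUENCE NUMBER,DRAW DATE,NUMBER DRAWN 1,NUMBER DRAWN 2,NUMBER DRAWN 3,NUMBER DRAWN 4,NUMBER DRAWN 5,NUMBER DRAWN 6,BONUS NUMBER
649,1,0,1982-06-12,3,11,12,14,41,43,13
649,2,0,1982-06-19,8,33,36,37,39,41,9
649,3,0,1982-06-26,1,6,23,24,27,39,34
"""

NUMBER_COLUMNS = [f"NUMBER DRAWN {i}" for i in range(1, 7)]


def winning_sets(csv_text):
    """Return one frozenset of the six drawn numbers per row, selected by column name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # The bonus number is deliberately excluded, as in extract_numbers.
    return [frozenset(int(row[col]) for col in NUMBER_COLUMNS) for row in reader]


draws = winning_sets(SAMPLE)
print(draws[0] == {3, 11, 12, 14, 41, 43})  # True
```

Selecting by name keeps working if columns are added or reordered in a future export of the data.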

In [77]:
# this function checks that a ticket has exactly 6 distinct numbers, all between 1 and 49
def check_is_a_valid_ticket(ticket):
    if len(ticket) != 6:
        return (False, "A valid ticket has 6 number, your ticket has {}".format(len(ticket)))
    if len(set(ticket))!= 6:
        return (False, "Your ticket contains repeated numbers")
    valid_numbers = np.arange(1, 50)
    for number in ticket:
        if number not in valid_numbers:
            return (False, "Only numbers between 1 and 49 are valid, your ticket contains: {}".format(number))
    return (True, "Your ticket is valid")    
    
    

In [78]:
check_is_a_valid_ticket([3, 41, 11, 12, 43, 61])

(False, 'Only numbers between 1 and 49 are valid, your ticket contains: 61')

In [79]:
### winning ticket (drawn on 1982-06-12)
ticket_1 = [3, 41, 11, 12, 43, 14]
### valid ticket, not a historical winner
ticket_2 = [1, 2, 3, 4, 5, 6]
### repeated number
ticket_3 = [3, 41, 11, 12, 14, 14]
### too short (4 numbers)
ticket_4 = [3, 41, 11, 12]
### too long (7 numbers)
ticket_5 = [3, 41, 11, 12, 43, 14, 7]

In [80]:
assert check_is_a_valid_ticket(ticket_1)[0]
assert check_is_a_valid_ticket(ticket_2)[0]
assert not check_is_a_valid_ticket(ticket_3)[0]
assert not check_is_a_valid_ticket(ticket_4)[0]
assert not check_is_a_valid_ticket(ticket_5)[0]
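The three checks in `check_is_a_valid_ticket` can also be expressed with set operations. A minimal sketch, run against the same example tickets; `is_valid_ticket` and `VALID_NUMBERS` are hypothetical names, and unlike the original this version returns only a boolean, without the explanatory message.

```python
VALID_NUMBERS = frozenset(range(1, 50))  # 1 to 49 inclusive


def is_valid_ticket(ticket):
    """True if the ticket has exactly six distinct numbers, all between 1 and 49."""
    numbers = set(ticket)
    # Six entries that form a six-element set cannot contain repeats;
    # the subset test rejects out-of-range values such as 0, 50 or 61.
    return len(ticket) == 6 and len(numbers) == 6 and numbers <= VALID_NUMBERS


tickets = {
    "ticket_1": [3, 41, 11, 12, 43, 14],
    "ticket_2": [1, 2, 3, 4, 5, 6],
    "ticket_3": [3, 41, 11, 12, 14, 14],
    "ticket_4": [3, 41, 11, 12],
    "ticket_5": [3, 41, 11, 12, 43, 14, 7],
    "out of range": [3, 41, 11, 12, 43, 61],
}
for label, ticket in tickets.items():
    print(f"{label}: {is_valid_ticket(ticket)}")
```

The original tuple-returning design is more useful for the app, since it tells the user why a ticket was rejected; the set version is a compact way to express the same rules.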

The following function takes a Python list representing a ticket, validates it, and checks it against the historical winning numbers.

In [81]:
def check_historical_occurence(ticket):
    is_valid = check_is_a_valid_ticket(ticket)
    if is_valid[0]:
        print(is_valid[1])
        # count the past drawings whose set of six numbers equals the ticket (order ignored)
        occurrences = (winning_numbers == set(ticket)).sum()
        print("The number of times the ticket {} won is: {}".format(str(ticket)[1:-1], occurrences))
        one_ticket_probability(ticket)
    else:
        print(is_valid[1])

In [82]:
check_historical_occurence(ticket_1)

Your ticket is valid
The number of times the ticket 3, 41, 11, 12, 43, 14 won is: 1
The list of numbers 3, 41, 11, 12, 43, 14 has 0.000007% of probability to win in the next drawing.


In [83]:
check_historical_occurence(ticket_2)

Your ticket is valid
The number of times the ticket 1, 2, 3, 4, 5, 6 won is: 0
The list of numbers 1, 2, 3, 4, 5, 6 has 0.000007% of probability to win in the next drawing.


In [84]:
check_historical_occurence(ticket_3)

Your ticket contains repeated numbers


In [85]:
check_historical_occurence(ticket_4)

A valid ticket has 6 number, your ticket has 4


In [86]:
check_historical_occurence(ticket_5)

A valid ticket has 6 number, your ticket has 7
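`check_historical_occurence` compares the ticket against every past drawing on each call. If many tickets have to be checked, one option (a sketch, not the notebook's method) is to count each combination once with `collections.Counter`, keyed by `frozenset` because ordinary sets are not hashable. The sample uses the first three and last three drawings shown earlier; `build_history` and `times_won` are hypothetical names.

```python
from collections import Counter

# Six-number combinations from lottery.head(3) and lottery.tail(3) above.
history = [
    [3, 11, 12, 14, 41, 43],
    [8, 33, 36, 37, 39, 41],
    [1, 6, 23, 24, 27, 39],
    [6, 22, 24, 31, 32, 34],
    [2, 15, 21, 31, 38, 49],
    [14, 24, 31, 35, 37, 48],
]


def build_history(draws):
    """Map each drawn combination (order ignored) to the number of times it occurred."""
    return Counter(frozenset(draw) for draw in draws)


def times_won(counts, ticket):
    # Counter returns 0 for missing keys, so unseen combinations need no special case.
    return counts[frozenset(ticket)]


counts = build_history(history)
print(times_won(counts, [3, 41, 11, 12, 43, 14]))  # 1
print(times_won(counts, [1, 2, 3, 4, 5, 6]))       # 0
```

The counter is built once in a single pass, after which each lookup is a hash-table access instead of a scan of all drawings.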


Even if a ticket has won in the past, every new drawing is independent of the previous ones, so any combination of six numbers always has the same (very low) probability of being drawn next time.

### Multi-ticket probability

Many lottery addicts play more than one ticket at once, so I'm going to write a function that calculates the probability of winning when playing several tickets, assuming every ticket is different from the others.

In [87]:
def multi_ticket_probability(number_of_tickets):
    all_outcomes = combinations(49, 6)
    success_prob_perc = (number_of_tickets / all_outcomes) * 100
    print("The probability of winning with {} tickets is {:f}%".format(
    number_of_tickets, success_prob_perc))

In [88]:
multi_ticket_probability(1)
multi_ticket_probability(10)
multi_ticket_probability(100)
multi_ticket_probability(10000)
multi_ticket_probability(1000000)
multi_ticket_probability(6991908)
multi_ticket_probability(13983816)

The probability of winning with 1 tickets is 0.000007%
The probability of winning with 10 tickets is 0.000072%
The probability of winning with 100 tickets is 0.000715%
The probability of winning with 10000 tickets is 0.071511%
The probability of winning with 1000000 tickets is 7.151124%
The probability of winning with 6991908 tickets is 50.000000%
The probability of winning with 13983816 tickets is 100.000000%


Playing all 13,983,816 possible combinations guarantees a win, but it is clearly not very convenient.
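Because each distinct ticket covers exactly one of the C(49, 6) equally likely outcomes, the relationship can also be inverted: reaching a target probability p requires at least ceil(p × 13,983,816) distinct tickets. A minimal sketch with exact fractions, assuming Python 3.8+; `tickets_needed` is a hypothetical name.

```python
import math
from fractions import Fraction

ALL_OUTCOMES = math.comb(49, 6)  # 13983816 possible six-number combinations


def tickets_needed(target_probability):
    """Smallest number of distinct tickets whose winning probability reaches the target."""
    # Fraction keeps the arithmetic exact; pass a Fraction or a decimal string
    # such as "0.01", since a float like 0.01 is not exactly one hundredth.
    p = Fraction(target_probability)
    if not 0 <= p <= 1:
        raise ValueError("probability must be between 0 and 1")
    return math.ceil(p * ALL_OUTCOMES)


print(tickets_needed(Fraction(1, 2)))  # 6991908
print(tickets_needed("0.01"))          # 139839
print(tickets_needed(1))               # 13983816
```

The 50% case reproduces the 6,991,908 tickets used in the cell above.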

### Fewer winning numbers

The 6/49 lottery awards the biggest prize to tickets matching all six numbers drawn, but there are also lesser prizes for tickets matching 5, 4, 3 or 2 numbers. <br>
I'll write a function that calculates the probability of matching exactly z numbers, the basis of the lesser prizes, with a six-number ticket.
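The count used below works as follows: the z matching numbers are chosen among the six numbers drawn, and the other 6 − z numbers on the ticket must come from the 43 numbers that were not drawn. Divided by all possible tickets, this gives:

\begin{equation}
P(\text{exactly } z \text{ matches}) = \frac{{6 \choose z}{43 \choose 6-z}}{{49 \choose 6}}
\end{equation}

For z = 2 this is (15 × 123,410) / 13,983,816 = 1,851,150 / 13,983,816 ≈ 0.1324, i.e. about 13.24%.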

In [89]:
def probability_less_6(z):
    # probability (in %) that a six-number ticket matches exactly z of the six numbers drawn
    if z < 2 or z > 5:
        print("You should enter a number between 2 and 5")
    else:
        total_6_combinations = combinations(49, 6)
        # ways to pick the z matching numbers among the six drawn
        total_z_combinations = combinations(6, z)
        # ways to pick the other 6 - z ticket numbers among the 43 numbers not drawn
        other_combinations = combinations(43, 6 - z)

        successful_outcomes = other_combinations * total_z_combinations
        success_prob = (successful_outcomes / total_6_combinations) * 100
        return success_prob

In [90]:
probability_less_6(2)

13.237802900152577

In [91]:
for i in range(2, 6):
    message = "The probability of {} winning numbers out of six is: {:5f}%"
    print(message.format(i, probability_less_6(i)))

The probability of 2 winning numbers out of six is: 13.237803%
The probability of 3 winning numbers out of six is: 1.765040%
The probability of 4 winning numbers out of six is: 0.096862%
The probability of 5 winning numbers out of six is: 0.001845%
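As a sanity check, extending the same formula to every z from 0 to 6 covers all possible numbers of matches, so the seven probabilities must add up to exactly 1 (Vandermonde's identity). A minimal sketch with exact fractions, assuming Python 3.8+; `match_distribution` is a hypothetical name, and its z = 6 term is the jackpot probability computed earlier.

```python
import math
from fractions import Fraction


def match_distribution(pool=49, drawn=6):
    """Exact probability of matching exactly z of the drawn numbers, for z = 0..drawn."""
    total = math.comb(pool, drawn)
    return {
        z: Fraction(math.comb(drawn, z) * math.comb(pool - drawn, drawn - z), total)
        for z in range(drawn + 1)
    }


dist = match_distribution()
assert sum(dist.values()) == 1  # the outcomes are disjoint and cover every ticket
for z, p in dist.items():
    print(f"{z} matches: {float(p) * 100:.6f}%")

# Probability of matching at least two of the six numbers drawn
print(f"at least 2: {float(sum(dist[z] for z in range(2, 7))) * 100:.6f}%")
```

The rows for z = 2 to 5 should agree with the percentages printed by `probability_less_6` above.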
