# Mobile App For Lottery Addiction - Helping Lottery Buyers Understand Their Chances Of Winning


## Introduction
---

From Dataquest: 

>A medical institute that aims to prevent and treat gambling addictions wants to build a dedicated mobile app to help lottery addicts better estimate their chances of winning. The institute has a team of engineers that will build the app, but they need us to create the logical core of the app and calculate probabilities.
For the first version of the app, they want us to focus on the 6/49 lottery and build functions that enable users to answer questions like:
>
>
>   - What is the probability of winning the big prize with a single ticket?
>
>
>   - What is the probability of winning the big prize if we play 40 different tickets (or any other number)?
>
>
>   - What is the probability of having at least five (or four, or three, or two) winning numbers on a single ticket?
>
>
>The institute also wants us to consider historical data coming from the national [6/49](https://en.wikipedia.org/wiki/Lotto_6/49) lottery game in Canada. The data set has data for 3,665 drawings, dating from 1982 to 2018 (...).

## The Probability Of Winning The Lottery With One Bet
---

The first task is to write an algorithm that calculates the probability of winning the big prize, i.e. guessing all 6 numbers drawn out of the 49 available.

`one_ticket_probability()` first passes the bet (its input) to an auxiliary function, `check_input()`, which checks whether the bet meets the five conditions below.

Conditions for a bet to be considered correctly placed:
1. the bet must be a list object (inside square brackets).
2. the list must have six values.
3. the values must be integers (natural numbers).
4. the numbers must not repeat.
5. every number must be within the 1 to 49 range.

If all conditions are met, `check_input()` passes the bet to `one_ticket_probability_calc()`, which calculates and returns the probability of winning the lottery. If one or more conditions are broken, `check_input()` prints a message detailing which conditions were broken and asks the user to insert a corrected bet.
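The five conditions above can also be checked by a function that returns its findings instead of printing them, which may be easier for the app's engineers to test. The following is only a sketch: `find_bet_problems` and its message texts are hypothetical names chosen for illustration, not part of the project code. Unlike `check_input()`, it applies the range and repetition checks only to the integer values, so a non-integer value produces a single message.

```python
def find_bet_problems(bet, n=49, k=6):
    """Return a list of problems found in the bet; an empty list means it is valid."""
    if not isinstance(bet, list):
        return ["input is not a list"]

    problems = []
    # `type(el) is int` (rather than isinstance) also rejects True/False,
    # which Python otherwise treats as integers.
    ints = [el for el in bet if type(el) is int]
    if len(ints) != len(bet):
        problems.append("non-integer value(s) present")
    if len(bet) != k:
        problems.append(f"expected {k} numbers, got {len(bet)}")
    if any(not 1 <= el <= n for el in ints):
        problems.append(f"value(s) outside 1 to {n}")
    if len(ints) != len(set(ints)):
        problems.append("repeated number(s)")
    return problems


print(find_bet_problems([1, 2, 3, 4, 5, 6]))              # []
print(find_bet_problems([21, 9, 4, 44, 44, 80, 20, 1]))
```

Returning a list keeps validation separate from presentation, so the caller decides how to display the messages.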

We also write two helper functions that will be useful for calculating probabilities:
- `factorial(n)` calculates the [factorial](https://en.wikipedia.org/wiki/Factorial) of an integer 'n': 
    - $n! = \prod_{i=1} ^{n} i$


- `combinations(n, k)`  calculates the number of combinations without replacement, given a pool of 'n' numbers and 'k' draws: 
    - $\frac{n!}{k!(n-k)!}$
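As a quick sanity check of the two formulas above, the number of possible 6/49 tickets can be computed with the standard library. This is a minimal sketch, assuming Python 3.8 or later (for `math.comb`); `n_choose_k` is a hypothetical name used only here.

```python
import math


def n_choose_k(n: int, k: int) -> int:
    """Number of ways to choose k items from n, using exact integer arithmetic."""
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))


total_tickets = n_choose_k(49, 6)
print(total_tickets)                      # 13983816
print(total_tickets == math.comb(49, 6))  # True
```

Integer division (`//`) is used so that the result stays exact even when the factorials are very large.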
      

In [1]:
import numpy as np
import pandas as pd
import re


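def factorial(n):
    """Return n! for a non-negative integer n (0! is 1)."""
    factorial_n = 1

    for i in range(2, n + 1):
        factorial_n *= i

    return factorial_n


def combinations(n, k):
    """Return the number of ways to choose k items out of n, order ignored."""
    # Integer division keeps the result exact; float division can lose precision.
    return factorial(n) // (factorial(k) * factorial(n - k))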

In [2]:
def one_ticket_probability_calc(bet):
    """Takes in a list of six unique numbers and returns the probability
    of winning as a percentage, rounded to six decimal places.
    """
    
    n = 49
    k = 6
    
    n_choose_k = combinations(n, k)

    one_ticket_prob_perc = (1 / n_choose_k) * 100

    one_ticket_prob_perc = round(one_ticket_prob_perc, 6)

    return one_ticket_prob_perc


def check_input(bet):
    
    # Essential information: the input is a '49 choose 6' type of combination:
    n = 49
    k = 6
    
    # Condition 1 - Is the input a list?
    # Condition 2 - Are the values in the list integers?
    # Condition 3 - Does the input list have the required number of values?
    # Condition 4 - Are the values in the list inside the 1 to 49 range?
    # Condition 5 - Are all integers unique?
    
    # Generic error statement (printed whenever a condition fails).
    common_error_statement = 'Error: the function input must be a list composed of 6 natural non-repeated numbers from 1 to 49. Fill the bet again please.' 
        
       
    error_spec = [
        '    - Input is not a list; example of an O.K. list: [5, 10, 15, 20, 25, 30]',
        '    - Value(s) which are not natural numbers (integers) were inserted.',
        '    - The bet must be composed exactly of 6 natural numbers.', 
        '    - At least one value does not belong to the range between 1 to 49 (both extremes included).',
        '    - At least one number inserted is repeated.'
        ]

    # We must raise an early error if the input is not a list,
    # otherwise we would have another error raised when the 
    # program tries to use list comprehension inside 'conditions'.
    if not isinstance(bet, list):
        print(common_error_statement)
        print('Error specification:')
        print(error_spec[0])
        return

    conditions = [isinstance(bet, list),
                  all([type(el) == int for el in bet]),
                  len(bet)==k,
                  all([el in range(1, 49+1) for el in bet]),
                  len(bet) == len(set(bet))]  
    
    
    # If every input condition is met:
    if all(conditions):
        return one_ticket_probability_calc(bet)

    else:
        print(common_error_statement)
        print('Error specification:')

        # Condition 3 (list length) failed: report how many values were inserted.
        if not conditions[2]:
            error_spec[2] += f' {len(bet)} were inserted.'

        for index, cond in enumerate(conditions):
            if not cond:
                print(error_spec[index])
                

def one_ticket_probability(bet):
    prob = check_input(bet)
    if prob:
        # Format the probability as a string without scientific notation.
        prob = np.format_float_positional(prob, trim='-')
        return f'The probability of winning the lottery with the inserted bet is {prob}%.'

### Testing the algorithm

Testing `one_ticket_probability()` by inserting six different numbers:

In [3]:
one_ticket_probability([1, 2, 3, 4, 5, 6])

'The probability of winning the lottery with the inserted bet is 0.000007%.'

#### Checking whether `check_input()` helps the user to identify errors in the placed bet. 

First error: passing an argument that is not a list. Two examples:

In [4]:
one_ticket_probability(2324)

Error: the function input must be a list composed of 6 natural non-repeated numbers from 1 to 49. Fill the bet again please.
Error specification:
    - Input is not a list; example of an O.K. list: [5, 10, 15, 20, 25, 30]


In [5]:
one_ticket_probability('dog')

Error: the function input must be a list composed of 6 natural non-repeated numbers from 1 to 49. Fill the bet again please.
Error specification:
    - Input is not a list; example of an O.K. list: [5, 10, 15, 20, 25, 30]


Second error: inserting a list with 5 elements, and another with 7.

In [6]:
one_ticket_probability([22, 41, 35, 5, 3])

Error: the function input must be a list composed of 6 natural non-repeated numbers from 1 to 49. Fill the bet again please.
Error specification:
    - The bet must be composed exactly of 6 natural numbers. 5 were inserted.


In [7]:
one_ticket_probability([1, 2, 3, 4, 5, 6, 7])

Error: the function input must be a list composed of 6 natural non-repeated numbers from 1 to 49. Fill the bet again please.
Error specification:
    - The bet must be composed exactly of 6 natural numbers. 7 were inserted.


Third error: inserting a decimal number or any other value that is not an integer (see the last value in the input list). Because such a value is not an integer within the 1 to 49 range, the range condition fails as well. Two examples:

In [8]:
one_ticket_probability([22, 41, 35, 12, 5, 22.5])

Error: the function input must be a list composed of 6 natural non-repeated numbers from 1 to 49. Fill the bet again please.
Error specification:
    - Value(s) which are not natural numbers (integers) were inserted.
    - At least one value does not belong to the range between 1 to 49 (both extremes included).


In [9]:
one_ticket_probability([22, 41, 35, 12, 5, 'dog'])

Error: the function input must be a list composed of 6 natural non-repeated numbers from 1 to 49. Fill the bet again please.
Error specification:
    - Value(s) which are not natural numbers (integers) were inserted.
    - At least one value does not belong to the range between 1 to 49 (both extremes included).


In this case we make two mistakes on purpose: a repeated number (6) and a string value.

In [10]:
one_ticket_probability([22, 41, 35, 6, 6, 'dog'])

Error: the function input must be a list composed of 6 natural non-repeated numbers from 1 to 49. Fill the bet again please.
Error specification:
    - Value(s) which are not natural numbers (integers) were inserted.
    - At least one value does not belong to the range between 1 to 49 (both extremes included).
    - At least one number inserted is repeated.


Inserting a list with three mistakes: eight values, a number outside the range (80), and a repeated number (44).

In [11]:
one_ticket_probability([21, 9, 4, 44, 44, 80, 20, 1])

Error: the function input must be a list composed of 6 natural non-repeated numbers from 1 to 49. Fill the bet again please.
Error specification:
    - The bet must be composed exactly of 6 natural numbers. 8 were inserted.
    - At least one value does not belong to the range between 1 to 49 (both extremes included).
    - At least one number inserted is repeated.


## Adding Features To The App
---

In the first version of the app, users should also be able to check whether the bet they inserted was a winning bet in an earlier drawing, which requires the historical data of the Canadian lottery.

To that end, we create a Series that stores all the past winning combinations of the Canadian lottery, built by a function that can be added to the main algorithm.

### Introducing the Canadian Lottery history data set

In [12]:
lot_hist = pd.read_csv('649.csv')

lot_hist.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3665 entries, 0 to 3664
Data columns (total 11 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   PRODUCT          3665 non-null   int64 
 1   DRAW NUMBER      3665 non-null   int64 
 2   SEQUENCE NUMBER  3665 non-null   int64 
 3   DRAW DATE        3665 non-null   object
 4   NUMBER DRAWN 1   3665 non-null   int64 
 5   NUMBER DRAWN 2   3665 non-null   int64 
 6   NUMBER DRAWN 3   3665 non-null   int64 
 7   NUMBER DRAWN 4   3665 non-null   int64 
 8   NUMBER DRAWN 5   3665 non-null   int64 
 9   NUMBER DRAWN 6   3665 non-null   int64 
 10  BONUS NUMBER     3665 non-null   int64 
dtypes: int64(10), object(1)
memory usage: 315.1+ KB


In [13]:
lot_hist.head()

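|   | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---------|-------------|-----------------|-----------|----------------|----------------|----------------|----------------|----------------|----------------|--------------|
| 0 | 649 | 1 | 0 | 6/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 6/19/1982 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 6/26/1982 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |
| 3 | 649 | 4 | 0 | 7/3/1982 | 3 | 9 | 10 | 13 | 20 | 43 | 34 |
| 4 | 649 | 5 | 0 | 7/10/1982 | 5 | 14 | 21 | 31 | 34 | 47 | 45 |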


### Support Function That Retrieves All Winning Bets 

As we can see, this lottery also includes a 'Bonus Number', but as explained in this [Wikipedia article](https://en.wikipedia.org/wiki/Lottery_mathematics#Powerballs_and_bonus_balls), it does not affect the main prize, so we do not consider it. To make it easy to compare the full draws with the bets inserted by our users, we produce a Series of set objects.

In [14]:
def extract_winning_bets():
    """Return a Series holding the six winning numbers of each draw as a set."""

    # Columns 4 to 9 hold 'NUMBER DRAWN 1' to 'NUMBER DRAWN 6'; the bonus number is excluded.
    drawn_numbers = lot_hist.iloc[:, 4:9+1]

    # One set per draw (row), so that the order of the numbers does not matter.
    winning_set = [set(row) for _, row in drawn_numbers.iterrows()]

    return pd.Series(winning_set, name='all_winning_bets')
    

Testing `extract_winning_bets()`: `winning_bets` holds the winning combination of every draw in `lot_hist`.

In [15]:
winning_bets = extract_winning_bets()

In [16]:
winning_bets

0        {3, 41, 11, 12, 43, 14}
1        {33, 36, 37, 39, 8, 41}
2         {1, 6, 39, 23, 24, 27}
3         {3, 9, 10, 43, 13, 20}
4        {34, 5, 14, 47, 21, 31}
                  ...           
3660    {38, 40, 41, 10, 15, 23}
3661    {36, 46, 47, 19, 25, 31}
3662     {32, 34, 6, 22, 24, 31}
3663     {2, 38, 15, 49, 21, 31}
3664    {35, 37, 14, 48, 24, 31}
Name: all_winning_bets, Length: 3665, dtype: object

### Reformulating The Main Algorithm So That It Informs The User About The Historical Occurrence Of The Placed Bet

- `check_historical_occurrence()` prints the same message that `one_ticket_probability()` returns and also counts the bet's historical occurrences.

- `extract_winning_bets()` is called inside it, so that the winning bets are extracted automatically from the data set we're working with.

In [17]:
def check_historical_occurrence(bet):
    
    # PART ONE: defining input restrictions and calculating probabilities of winning.
    one_ticket_prob = one_ticket_probability(bet)
    
    print(one_ticket_prob)
    
    
    # PART TWO: comparing the actual bet with past bets of the Canadian Lottery history.
    if one_ticket_prob:
        winning_bets = extract_winning_bets()

        # Set logic: `set >= other` tests whether every element of `other` is in the set.
        # Both sets hold six numbers, so this is equivalent to testing equality.
        win_count = sum(winning_bets>=set(bet)) 

        print(f"Number of times the user's bet was a winning combination in past events of the Canadian lottery: {win_count}.")
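The set comparison used above can be illustrated without pandas. The sketch below uses only the first five draws shown in the `lot_hist.head()` output; `past_draws` and `count_past_wins` are hypothetical names introduced for this example.

```python
# First five draws, copied from the lot_hist.head() output (bonus number excluded).
past_draws = [
    [3, 11, 12, 14, 41, 43],
    [8, 33, 36, 37, 39, 41],
    [1, 6, 23, 24, 27, 39],
    [3, 9, 10, 13, 20, 43],
    [5, 14, 21, 31, 34, 47],
]


def count_past_wins(bet, draws):
    """Count the draws whose six numbers equal the bet, ignoring order."""
    target = set(bet)
    return sum(set(draw) == target for draw in draws)


print(count_past_wins([3, 41, 11, 12, 43, 14], past_draws))  # 1
print(count_past_wins([23, 41, 3, 35, 12, 8], past_draws))   # 0
```

Converting both the bet and each draw to a set is what makes the comparison independent of the order in which the user typed the numbers.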

### Testing the algorithm

Three cases:
 - a winning combination is placed.
 - a random combination is placed.
 - an erroneous bet is placed.

In [18]:
check_historical_occurrence([3, 41, 11, 12, 43, 14])

The probability of winning the lottery with the inserted bet is 0.000007%.
Number of times the user's bet was a winning combination in past events of the Canadian lottery: 1.


In [19]:
check_historical_occurrence([23, 41, 3, 35, 12, 8])

The probability of winning the lottery with the inserted bet is 0.000007%.
Number of times the user's bet was a winning combination in past events of the Canadian lottery: 0.


Inserting a bet with a mistake on purpose (only 5 numbers), to check whether `check_historical_occurrence(bet)` also reports errors correctly through the `check_input()` support function. Because the bet is invalid, `one_ticket_probability()` returns `None`, which is printed after the error message.

In [20]:
check_historical_occurrence([23, 41, 3, 35, 12]) 

Error: the function input must be a list composed of 6 natural non-repeated numbers from 1 to 49. Fill the bet again please.
Error specification:
    - The bet must be composed exactly of 6 natural numbers. 5 were inserted.
None


### Another Feature: Inform The User About The Probability Of Winning A Lottery Event Given $n$ Random Bets

`multi_ticket_probability()` helps the lottery player understand the realistic chances of winning given a certain number of different bets placed on a single lottery event.

`check_input_2()` ensures that the number of bets is within the possible range, i.e. from 1 up to 13,983,816, the total number of possible combinations and therefore the number of different bets needed to guarantee a win.

In [21]:
def check_input_2(n):
    
    maximum_number_bets = combinations(49, 6)
    
    # Check the type first: `n in range(...)` falls back to a slow
    # element-by-element search when n is not an integer.
    if isinstance(n, int) and 1 <= n <= maximum_number_bets:
        return n
    else:
        print(f'Error: the inserted number should be a natural number between 1 and {maximum_number_bets} - the total number of possible combinations. Insert another number please.\n')


def multi_ticket_probability(number_of_bets):
    """Takes the number of bets the user wants to play and returns the probability of
    winning a '49 choose 6' type of lottery game.
    """
    
    n = check_input_2(number_of_bets)
    
    if n:
        # Probability of winning.
        prob_win = n / combinations(49, 6)

        # Rounded and percentage converted.
        prob_win_perc = round(prob_win*100, 10)

        # To print the probability without scientific notation.
        prob_win_perc = np.format_float_positional(prob_win_perc, trim='-')

        return f'The probability of winning the lottery with {number_of_bets} bet(s) is:\n- {prob_win_perc}%.'


### Testing the algorithm

In [22]:
list_test_1 = [1, 10, 40, 100, 10000, 1000000, 6991908, 13983816]

test_1 = [multi_ticket_probability(i) for i in list_test_1]

In [23]:
print(*test_1, sep='\n\n')

The probability of winning the lottery with 1 bet(s) is:
- 0.0000071511%.

The probability of winning the lottery with 10 bet(s) is:
- 0.0000715112%.

The probability of winning the lottery with 40 bet(s) is:
- 0.000286045%.

The probability of winning the lottery with 100 bet(s) is:
- 0.0007151124%.

The probability of winning the lottery with 10000 bet(s) is:
- 0.0715112384%.

The probability of winning the lottery with 1000000 bet(s) is:
- 7.151123842%.

The probability of winning the lottery with 6991908 bet(s) is:
- 50%.

The probability of winning the lottery with 13983816 bet(s) is:
- 100%.


Checking invalid inputs and the error message.

In [24]:
list_test_2 = ['dog', [10], -1]

test_2 = [multi_ticket_probability(i) for i in list_test_2]

Error: the inserted number should be a natural number between 1 and 13983816 - the total number of possible combinations. Insert another number please.

Error: the inserted number should be a natural number between 1 and 13983816 - the total number of possible combinations. Insert another number please.

Error: the inserted number should be a natural number between 1 and 13983816 - the total number of possible combinations. Insert another number please.
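The multi-ticket figures can be cross-checked with exact fractions, which avoids rounding altogether. This is a minimal sketch under two assumptions: all tickets are different from one another (otherwise the probability is not simply the ticket count divided by the number of combinations), and Python 3.8 or later is available for `math.comb`. `multi_ticket_fraction` is a hypothetical name, not part of the project code.

```python
from fractions import Fraction
from math import comb

TOTAL = comb(49, 6)  # 13,983,816 possible tickets


def multi_ticket_fraction(number_of_bets: int) -> Fraction:
    """Exact probability of winning with `number_of_bets` distinct tickets."""
    if type(number_of_bets) is not int or not 1 <= number_of_bets <= TOTAL:
        raise ValueError(f"number_of_bets must be an integer from 1 to {TOTAL}")
    return Fraction(number_of_bets, TOTAL)


for bets in (1, 40, 6991908, 13983816):
    p = multi_ticket_fraction(bets)
    print(f"{bets} bet(s): {p} (about 1 in {float(1 / p):,.0f})")
```

Expressing the result as "1 in N" may be easier for app users to grasp than a percentage with many leading zeros.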



## Chances Of Partially Matching The Winning Combination
---

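Next we write a function, `probability_less_6()`, that tells the user the probability of matching exactly $i$ of the 6 numbers drawn, for $i$ from 1 to 5. A ticket matches exactly $i$ numbers when $i$ of its numbers are among the 6 drawn and the remaining $6 - i$ are among the 43 not drawn.

To build `probability_less_6()` we followed this progression: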



- Probability of getting 5 out of 6 numbers right: 

\begin{equation}
\frac{{6 \choose 5} \times {43 \choose 1}}{{49 \choose 6}}
\end{equation}

- Probability of getting 4 out of 6 numbers right: 

\begin{equation}
\frac{{6 \choose 4} \times {43 \choose 2}}{{49 \choose 6}}
\end{equation}


- Probability of getting 3 out of 6 numbers right: 

\begin{equation}
\frac{{6 \choose 3} \times {43 \choose 3}}{{49 \choose 6}}
\end{equation}


- Probability of getting 2 out of 6 numbers right: 

\begin{equation}
\frac{{6 \choose 2} \times {43 \choose 4}}{{49 \choose 6}}
\end{equation}


- Probability of getting 1 out of 6 numbers right: 

\begin{equation}
\frac{{6 \choose 1} \times {43 \choose 5}}{{49 \choose 6}}
\end{equation}


So, with $n = 49$ numbers in the pool and $k = 6$ numbers drawn, we can generalize: 

\begin{equation}
\text{Probability of matching } i \text{ numbers} = \frac{{k \choose i} \times {n-k \choose k-i}}{{n \choose k}}
\end{equation}
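The general formula can be checked numerically: summed over every possible number of matches (0 to 6), the probabilities must add up to exactly 1. The sketch below assumes Python 3.8 or later; `exact_match_probability` is a hypothetical name, and exact fractions are used so the check is not affected by rounding.

```python
from fractions import Fraction
from math import comb


def exact_match_probability(i: int, n: int = 49, k: int = 6) -> Fraction:
    """Probability that exactly i of the k numbers on a ticket are among the k drawn."""
    if not 0 <= i <= k:
        raise ValueError(f"i must be between 0 and {k}")
    return Fraction(comb(k, i) * comb(n - k, k - i), comb(n, k))


probabilities = [exact_match_probability(i) for i in range(7)]
print(sum(probabilities) == 1)                  # True
print(round(float(probabilities[5]) * 100, 4))  # 0.0018
```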


In [25]:
def probability_less_6(i):
    
    if isinstance(i, int) and 1 <= i <= 5:
        dividend = combinations(6, i) * combinations(43, 6 - i)

        divisor = combinations(49, 6) 

        probability = dividend / divisor

        percentage = round(probability*100, 4)

        return f'The probability of getting {i} number(s) out of 6, in a single lottery event with a total of 49 numbers, is {percentage}%.'
    
    else:
        return 'Error: The number (integer) inserted must be between 1 and 5 (included). Try again please.'

### Testing the algorithm

- The first entry in the list below is a deliberate mistake: $i = 0$.

In [26]:
test_a = [probability_less_6(i) for i in range(0, 5+1)]

print(*test_a, sep='\n\n')

Error: The number (integer) inserted must be between 1 and 5 (included). Try again please.

The probability of getting 1 number(s) out of 6, in a single lottery event with a total of 49 numbers, is 41.3019%.

The probability of getting 2 number(s) out of 6, in a single lottery event with a total of 49 numbers, is 13.2378%.

The probability of getting 3 number(s) out of 6, in a single lottery event with a total of 49 numbers, is 1.765%.

The probability of getting 4 number(s) out of 6, in a single lottery event with a total of 49 numbers, is 0.0969%.

The probability of getting 5 number(s) out of 6, in a single lottery event with a total of 49 numbers, is 0.0018%.
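The institute's original question asks for the probability of having *at least* five (or four, or three, or two) winning numbers, whereas the results above are for *exactly* $i$ matches. One way to get the cumulative figure is to sum the exact-match counts from $i$ up to 6. This is only a sketch, assuming Python 3.8 or later; `at_least_probability` is a hypothetical name, not part of the project code.

```python
from math import comb


def at_least_probability(i: int, n: int = 49, k: int = 6) -> float:
    """Probability of matching i or more of the k drawn numbers with one ticket."""
    if not 0 <= i <= k:
        raise ValueError(f"i must be between 0 and {k}")
    # Count the tickets with exactly j matches for every j from i to k.
    favourable = sum(comb(k, j) * comb(n - k, k - j) for j in range(i, k + 1))
    return favourable / comb(n, k)


for i in (5, 4, 3, 2):
    print(f"At least {i}: {at_least_probability(i) * 100:.4f}%")
```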


## Conclusion
---

In this project we built a series of algorithms that can be integrated into an app and that allow the user to answer the most common questions about the chances of winning the lottery. These algorithms not only check whether the input values were correctly specified but also try to identify the specific mistakes. Along the same lines, the information returned by the algorithms is conveyed clearly, allowing the user to relate the input/bet to its associated probability.


\[End of Project\]
