# Guided Project 11: Mobile App for Lottery Addiction

In [2]:
import numpy as np
import pandas as pd
import re
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

## Introduction

From the tutorial: 'A medical institute that aims to prevent and treat gambling addictions wants to build a dedicated mobile app to help lottery addicts better estimate their chances of winning. The institute has a team of engineers that will build the app, but they need us to create the logical core of the app and calculate probabilities.'

In [3]:
def factorial(n):
    """Takes as input a non-negative integer n and computes the factorial of n."""

    # Start at 1 so that factorial(0) correctly returns 1.
    factorial_n = 1

    # Multiply by every integer from 2 up to n (inclusive).
    for i in range(2, n + 1):
        factorial_n *= i

    return factorial_n

In [4]:
def combinations(n, k):
    """Takes in two inputs (n and k) and outputs the number of combinations
    when we're taking only k objects from a group of n objects.
    """

    return (1 / factorial(k)) * (factorial(n) / factorial(n - k))
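
As a quick sanity check, the number of combinations can be compared against the standard library. The sketch below assumes Python 3.8 or later, where `math.comb` is available; the helper name `n_choose_k_exact` is hypothetical and not part of the notebook. It uses integer division, so the result stays an exact integer rather than a float.

```python
import math

def n_choose_k_exact(n, k):
    """Number of ways to choose k objects from n, using exact integer arithmetic."""
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

# Total number of possible tickets in a '49 choose 6' lottery.
total_tickets = n_choose_k_exact(49, 6)
print(total_tickets)                      # 13983816
print(total_tickets == math.comb(49, 6))  # True
```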

Writing a function that calculates the probability of winning the big prize with a single ticket.

In [5]:
def one_ticket_probability(list_=[int, int, int, int, int, int]):
    """Takes in a list of six unique numbers and returns a message with the
    probability of winning, expressed as a percentage.
    """

    # Condition 1 - Is the input a list?
    if (type(list_) is not list):
        return 'Error: the function input must be a list composed of 6 natural non-repeated numbers from 1 to 49.'

    list_lenght = len(list_)

    nr_draws = 6

    # Condition 2 - Does the input list have as many values as there are draws?
    if list_lenght != nr_draws:
        return 'Error: the bet must be composed by six natural numbers from 1 to 49.'

    
    # Conditions 3, 4 and 5.
    
    repetitions = 0

    outside_range = 0

    not_int = 0

    for el in list_:
        if list_.count(el) > 1:
            repetitions += 1

        if el not in range(1, 50):
            outside_range += 1

        if type(el) is not int:
            not_int += 1

    # Condition 3 - Are the values in the list integers?
    if not_int > 0:
        return 'Error: only natural numbers within the 1 to 49 range are accepeted.'

    # Conditions 4 and 5 - Does the list have repeated or out-of-range values?
    if (repetitions != 0) or (outside_range != 0):
        return 'Error: the inserted bet has either repeated number(s) or number(s) outside the 1 to 49 range.'

    # If every restriction is met:
    if (repetitions == 0) and (outside_range == 0):

        nr_draws = len(list_)

        n_choose_k = combinations(49, nr_draws)

        one_ticket_prob_perc = (1 / n_choose_k) * 100

        return 'The probability of winning the lottery with the inserted bet is {}'.format(one_ticket_prob_perc)

The function computes the desired output only if the input meets all of the following conditions:

- 1. the input argument of our function must be a list.
- 2. the list must have six values.
- 3. every value is an integer (natural number).
- 4. there are no repeated numbers.
- 5. every number is inside the 1 to 49 range.

The function checks conditions 1 to 3 in order and then checks conditions 4 and 5 together, which is why those two share a single error message.

If all the conditions above are met, the function computes the probability of winning a lottery of the style $49 \choose 6$ with the given bet.
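
The same checks can be written more compactly. The following is only a sketch: `validate_bet` is a hypothetical helper that is not part of the notebook, and its shortened error messages are assumptions made for illustration. It keeps the same order of checks, with conditions 4 and 5 evaluated together.

```python
def validate_bet(bet, n=49, k=6):
    """Return an error message if the bet is invalid, otherwise None.

    Hypothetical helper for illustration; it is not part of the notebook.
    """
    # Condition 1: the input must be a list.
    if not isinstance(bet, list):
        return 'input must be a list'
    # Condition 2: the list must hold exactly k values.
    if len(bet) != k:
        return f'bet must contain exactly {k} numbers'
    # Condition 3: `type(el) is int` also rejects floats, strings and booleans.
    if any(type(el) is not int for el in bet):
        return 'all values must be integers'
    # Conditions 4 and 5: no repetitions and every number within 1..n.
    if len(set(bet)) != k or any(not 1 <= el <= n for el in bet):
        return f'numbers must be unique and within 1 to {n}'
    return None

print(validate_bet([1, 2, 3, 4, 5, 6]))         # None
print(validate_bet([22, 41, 35, 6, 6, 'dog']))  # all values must be integers
```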

Testing `one_ticket_probability()` by inserting six different numbers:

In [6]:
one_ticket_probability([1, 2, 3, 4, 5, 6])

'The probability of winning the lottery with the inserted bet is 7.151123842018516e-06'

First error: passing an argument that is not a list. Two examples:

In [7]:
one_ticket_probability(2324)

'Error: the function input must be a list composed of 6 natural non-repeated numbers from 1 to 49.'

In [8]:
one_ticket_probability('dog')

'Error: the function input must be a list composed of 6 natural non-repeated numbers from 1 to 49.'

Second error: passing one list with 5 elements and another with 7.

In [9]:
# 5 values.
one_ticket_probability([22, 41, 35, 5, 3])

'Error: the bet must be composed by six natural numbers from 1 to 49.'

In [10]:
# 7 values.
one_ticket_probability([1, 2, 3, 4, 5, 6, 7])

'Error: the bet must be composed by six natural numbers from 1 to 49.'

Third error: passing a number with decimals or any other value that is not an integer within the 1 to 49 range (see the last value in the input list). Two examples:

In [11]:
one_ticket_probability([22, 41, 35, 12, 5, 22.5])

'Error: only natural numbers within the 1 to 49 range are accepeted.'

In [12]:
one_ticket_probability([22, 41, 35, 12, 5, 'dog'])

'Error: only natural numbers within the 1 to 49 range are accepeted.'

In this last case we make two mistakes on purpose: a repeated number (6) and a string value. The function first checks whether the input list consists exclusively of integers. If a non-integer is included in the list, it returns the error shown below. If the string value is replaced by an integer, the function then returns the error about repeated or out-of-range numbers.

In [13]:
one_ticket_probability([22, 41, 35, 6, 6, 'dog'])

'Error: only natural numbers within the 1 to 49 range are accepeted.'

For the first version of the app, however, users should also be able to compare their ticket against the historical lottery data in Canada and determine whether they would have ever won by now.

In [14]:
# The name of the df is short for 'lottery history'.
lot_hist = pd.read_csv('649.csv')

In [15]:
# First three and last three rows.
lot_hist.iloc[[0, 1, 2, -3, -2, -1], :]

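|      | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|------|---------|-------------|-----------------|-----------|----------------|----------------|----------------|----------------|----------------|----------------|--------------|
| 0    | 649 | 1    | 0 | 6/12/1982 | 3  | 11 | 12 | 14 | 41 | 43 | 13 |
| 1    | 649 | 2    | 0 | 6/19/1982 | 8  | 33 | 36 | 37 | 39 | 41 | 9  |
| 2    | 649 | 3    | 0 | 6/26/1982 | 1  | 6  | 23 | 24 | 27 | 39 | 34 |
| 3662 | 649 | 3589 | 0 | 6/13/2018 | 6  | 22 | 24 | 31 | 32 | 34 | 16 |
| 3663 | 649 | 3590 | 0 | 6/16/2018 | 2  | 15 | 21 | 31 | 38 | 49 | 8  |
| 3664 | 649 | 3591 | 0 | 6/20/2018 | 14 | 24 | 31 | 35 | 37 | 48 | 17 |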


In [16]:
lot_hist.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3665 entries, 0 to 3664
Data columns (total 11 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   PRODUCT          3665 non-null   int64 
 1   DRAW NUMBER      3665 non-null   int64 
 2   SEQUENCE NUMBER  3665 non-null   int64 
 3   DRAW DATE        3665 non-null   object
 4   NUMBER DRAWN 1   3665 non-null   int64 
 5   NUMBER DRAWN 2   3665 non-null   int64 
 6   NUMBER DRAWN 3   3665 non-null   int64 
 7   NUMBER DRAWN 4   3665 non-null   int64 
 8   NUMBER DRAWN 5   3665 non-null   int64 
 9   NUMBER DRAWN 6   3665 non-null   int64 
 10  BONUS NUMBER     3665 non-null   int64 
dtypes: int64(10), object(1)
memory usage: 300.7+ KB


In [17]:
def extract_numbers(n = int):
    """Takes as input the integer position n of a row in the lottery dataframe
    and returns a set containing the six winning numbers of that draw.
    """
    
    # n should be an integer (whole number).
    if type(n) is not int:
        return 'The input should be an integer (whole number).'
    
    
    # n should be between 0 and 3664.
    if n not in range(0, 3665):
        return 'The input should be a whole number inside the 0 to 3664 range.'
    

    # `{}` gives us an empty dictionary, `set()` gives an empty set.
    winning_set = set()
    
    # columns that represent draws are lot_hist.iloc[:, 4:10] (6 draws)
    for i in range(4, 10, 1):
        draw = lot_hist.iloc[n, i]
        winning_set.add(draw)
    
    return winning_set

Testing `extract_numbers()` against a slice of the draw columns for row 0:

In [18]:
lot_hist.iloc[0, 4:10]

NUMBER DRAWN 1     3
NUMBER DRAWN 2    11
NUMBER DRAWN 3    12
NUMBER DRAWN 4    14
NUMBER DRAWN 5    41
NUMBER DRAWN 6    43
Name: 0, dtype: object

In [19]:
extract_numbers(0)

{3, 11, 12, 14, 41, 43}

Creating a series with all the winning bets. I use a list comprehension because `extract_numbers()` only accepts row positions from the `lot_hist` index (0 to 3664 inclusive). Using `df.apply()` would instead require `extract_numbers()` to accept a whole row from `lot_hist` as input.

In [20]:
past_winning_bets = pd.Series([extract_numbers(n) for n in range(0, lot_hist.shape[0])], name='Winning Bets')

Checking new series:

In [21]:
past_winning_bets.iloc[[0, 1, 2, -3, -2, -1]]

0        {3, 41, 11, 12, 43, 14}
1        {33, 36, 37, 39, 8, 41}
2         {1, 6, 39, 23, 24, 27}
3662     {32, 34, 6, 22, 24, 31}
3663     {2, 38, 15, 49, 21, 31}
3664    {35, 37, 14, 48, 24, 31}
Name: Winning Bets, dtype: object

The next function, `check_historical_occurence`, takes two inputs, the six-number bet as before and the history of past winning combinations, and serves a twofold purpose: 1. it reports the same result as `one_ticket_probability`, i.e. the probability of winning the lottery with the inserted bet; 2. it also reports the number of times the inserted bet has been drawn in the Canadian lottery.

Inside the function's logic, the second output depends on the first: output 2 is produced only if all the input restrictions for output 1 are met. In other words, both outputs are produced only if the input is a combination of six non-repeated natural numbers from 1 to 49 (inclusive).
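
The comparison step can be illustrated without pandas. The sketch below is an assumption-laden simplification: `count_past_wins` is a hypothetical helper, and `history` is a plain list holding only the first three draws shown earlier rather than the full series. Because both sides are sets, the order in which the user types the numbers does not matter.

```python
# First three historical draws, copied from the rows shown earlier.
history = [
    {3, 11, 12, 14, 41, 43},
    {8, 33, 36, 37, 39, 41},
    {1, 6, 23, 24, 27, 39},
]

def count_past_wins(bet, history):
    """Count how many past draws match the bet exactly (order is ignored)."""
    bet_set = set(bet)
    return sum(1 for draw in history if draw == bet_set)

print(count_past_wins([43, 41, 14, 12, 11, 3], history))  # 1
print(count_past_wins([1, 2, 3, 4, 5, 6], history))       # 0
```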

In [22]:
def check_historical_occurence(bet=[int]*6, history = past_winning_bets):
    """Takes in two inputs: a Python list containing the user numbers and 
    a pandas Series containing sets with the winning numbers; returns information 
    about the number of times the combination inputted by the user occurred in the past.
    And information about the probability of winning the big prize in the next drawing 
    with that combination.
    """
    
    # Essential information: the input is a '49 choose 6' type of combination:
    n = 49
    k = 6

    # PART ONE: defining input restrictions and calculating probabilities of winning. 
    
    # Condition 1 - Is the input a list?
    if type(bet) is not list:
        return 'Error: the function input must be a list composed of 6 natural non-repeated numbers from 1 to 49.'

    bet_lenght = len(bet)


    # Condition 2 - Does the input list have as many values as there are draws?
    if bet_lenght != k:
        return 'Error: the bet must be composed by six natural numbers from 1 to 49.'

    
    # Conditions 3, 4 and 5.
    
    repetitions = 0

    outside_range = 0

    not_int = 0

    for el in bet:
        if bet.count(el) > 1:
            repetitions += 1

        if el not in range(1, n+1):
            outside_range += 1

        if type(el) is not int:
            not_int += 1

    # Condition 3 - Are the values in the list integers?
    if not_int > 0:
        return 'Error: only natural numbers within the 1 to 49 range are accepeted.'

    # Conditions 4 and 5 - Does the list have repeated or out-of-range values?
    if (repetitions != 0) or (outside_range != 0):
        return 'Error: the inserted bet has either repeated number(s) or number(s) outside the 1 to 49 range.'

    # If every restriction is met:
    if (repetitions == 0) and (outside_range == 0):

        n_choose_k = combinations(n, k)

        one_ticket_prob_perc = round((1 / n_choose_k) * 100, 10)

        # For printing probability without scientific notation.
        one_ticket_prob_perc_1 = np.format_float_positional(one_ticket_prob_perc, trim='-')
    
    
        # PART TWO: comparing actual bet with past bets of the Canadian Lottery history.

        bet_set = set(bet)

        condition = history == bet_set

        # Was the bet placed by the user a winning bet in the past?
        past_winning_bet = history[condition]

        # Number of times the user bet was a winning bet in the past.
        past_winning_bet_count = past_winning_bet.size
        
        
        print_list = [f'The probability of winning the lottery with the inserted bet is {one_ticket_prob_perc_1}.']
        
        print_list += [f'Number of times the user bet was a winning bet in the past: {past_winning_bet_count}.']
        
        return  print(*print_list, sep='\n\n')

Testing the function.

In [23]:
# Because the second argument has a default (`history = past_winning_bets`), we can omit it.
check_historical_occurence([23, 41, 3, 35, 12, 8])

The probability of winning the lottery with the inserted bet is 0.0000071511.

Number of times the user bet was a winning bet in the past: 0.


In [24]:
check_historical_occurence([23, 41, 3, 35, 12]) # Mistake on purpose: 5 numbers.

'Error: the bet must be composed by six natural numbers from 1 to 49.'

`multi_ticket_probability` is a function that helps the lottery player understand the realistic chances of winning the lottery given a certain number of bets on a single draw.

In [25]:
def multi_ticket_probability(number_of_bets = int):
    """Takes the number of bets the user wants to play and returns the probability of
    winning a '49 choose 6' type of lottery game.
    """
    
    # Fixed values:
    n = 49
    k = 6
    
    # Number of possible combinations.
    n_choose_k = int(combinations(n, k)) 
    
    # Is `number_of_bets` an integer?
    if type(number_of_bets) != int:
        return 'The inserted number must be a natural number (integer).'
    
    # Is `number_of_bets` within the range of possible combinations?
    if number_of_bets not in range(1, n_choose_k+1):
        return 'Error: number of bets must be within the 1 to {} range.'.format(n_choose_k) 
    
    # Probability of winning.
    prob_win = number_of_bets / (n_choose_k)
    
    # Rounded and percentage converted.
    prob_win_perc = round(prob_win*100, 10)

    # For printing probability without scientific notation.
    prob_win_perc = np.format_float_positional(prob_win_perc, trim='-')
    
    return 'The probability of winning with the lottery with {p} bet(s) is {z}%.'.format(p=number_of_bets, z=prob_win_perc)

Testing the function:

In [26]:
list_test_1 = [1, 10, 100, 10000, 1000000, 6991908, 13983816]

tests_1 = [multi_ticket_probability(i) for i in list_test_1]

In [27]:
print(*tests_1, sep='\n\n')

The probability of winning with the lottery with 1 bet(s) is 0.0000071511%.

The probability of winning with the lottery with 10 bet(s) is 0.0000715112%.

The probability of winning with the lottery with 100 bet(s) is 0.0007151124%.

The probability of winning with the lottery with 10000 bet(s) is 0.0715112384%.

The probability of winning with the lottery with 1000000 bet(s) is 7.151123842%.

The probability of winning with the lottery with 6991908 bet(s) is 50%.

The probability of winning with the lottery with 13983816 bet(s) is 100%.
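
The arithmetic behind these results is simply the number of tickets divided by the number of possible combinations. The sketch below cross-checks two of the values with exact fractions; it assumes Python 3.8 or later for `math.comb`, assumes that every ticket is a different combination, and uses the hypothetical name `multi_ticket_percentage`.

```python
import math
from fractions import Fraction

# Number of possible combinations in a '49 choose 6' lottery.
TOTAL_TICKETS = math.comb(49, 6)

def multi_ticket_percentage(number_of_bets):
    """Percentage chance of winning with the given number of distinct tickets."""
    return float(Fraction(number_of_bets, TOTAL_TICKETS) * 100)

print(multi_ticket_percentage(6991908))   # 50.0
print(multi_ticket_percentage(13983816))  # 100.0
```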


Next we write a function, `probability_less_6()`, that tells the user the probability of matching exactly i of the six winning numbers in a '49 choose 6' lottery; e.g. for i = 2, what is the chance of getting exactly 2 of the 6 winning numbers right? `i` can take a value from {2, 3, 4, 5}.

In [28]:
def probability_less_6(i=int):
    """Takes an integer that represents a combination composed of i numbers (2 to 5), 
    and returns its probability of success when a '49 choose 6' type of lottery is drawn,
    e.g. what is the probability of getting right 4 numbers (i = 4) out of 6, in a '49 choose 6' 
    type of lottery single event?
    """

    n = 49

    k = 6

    n_choose_k = combinations(n, k)

    if (i not in range(2, 5+1)) or (type(i) != int):
        return 'The number (integer) inserted must be between 2 and 5 (included).'

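    # Ways to match i of the 6 ticket numbers, and ways for the remaining
    # k - i drawn numbers to come from the 43 numbers not on the ticket.
    n_combinations_ticket = combinations(k, i)
    n_combinations_remaining = combinations(43, k - i)
    
    # successful_outcomes = combinations(k, i) * combinations(43, k - i)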
    successful_outcomes = n_combinations_ticket * n_combinations_remaining
    
    prob_success_perc = (successful_outcomes / n_choose_k) * 100

    prob_success_perc = round(prob_success_perc, 3)
    
    # For printing probability without scientific notation.
    prob_success_perc = np.format_float_positional(prob_success_perc, trim='-')

    return 'The probability of getting {f1} numbers out of 6, in a single lottery event with a total of 49 numbers, is {f2}%.'.format(f1=i, f2=prob_success_perc)

Testing the function (the first entry in the list below is a mistake on purpose: i = 1).

In [29]:
test_b = [probability_less_6(i) for i in range(1, 5+1)]

print(*test_b, sep='\n\n')

The number (integer) inserted must be between 2 and 5 (included).

The probability of getting 2 numbers out of 6, in a single lottery event with a total of 49 numbers, is 13.238%.

The probability of getting 3 numbers out of 6, in a single lottery event with a total of 49 numbers, is 1.765%.

The probability of getting 4 numbers out of 6, in a single lottery event with a total of 49 numbers, is 0.097%.

The probability of getting 5 numbers out of 6, in a single lottery event with a total of 49 numbers, is 0.002%.
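
These percentages can be cross-checked with the standard library. The sketch below assumes Python 3.8 or later for `math.comb`; the function name `exactly_i_matches_percentage` is hypothetical. It counts the ways to pick i of the 6 winning numbers and the remaining 6 - i numbers from the 43 non-winning ones, then divides by the total number of combinations.

```python
import math

def exactly_i_matches_percentage(i, n=49, k=6):
    """Percentage chance that a ticket matches exactly i of the k drawn numbers."""
    ways_to_match = math.comb(k, i) * math.comb(n - k, k - i)
    return 100 * ways_to_match / math.comb(n, k)

for i in range(2, 6):
    print(i, round(exactly_i_matches_percentage(i), 3))
# 2 13.238
# 3 1.765
# 4 0.097
# 5 0.002
```

Summing the function over i = 0 to 6 gives 100%, since every ticket matches some number of drawn values.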


## Exploring Other Possible App Features
---

In [30]:
def check_historical_occurence_v2(bet=[int]*6, history = past_winning_bets):
    """Takes in two inputs: a Python list containing the user numbers and 
    a pandas Series containing sets with the winning numbers; returns information 
    about the number of times the combination inputted by the user occurred in the past.
    And information about the probability of winning the big prize in the next drawing 
    with that combination.
    """
    
    # Essential information: the input is a '49 choose 6' type of combination:
    n = 49
    k = 6

    # PART ONE: defining input restrictions and calculating probabilities of winning. 
    
    # Condition 1 - Is the input a list?
    if type(bet) is not list:
        return 'Error: the function input must be a list composed of 6 natural non-repeated numbers from 1 to 49.'

    bet_lenght = len(bet)


    # Condition 2 - Does the input list have as many values as there are draws?
    if bet_lenght != k:
        return 'Error: the bet must be composed by six natural numbers from 1 to 49.'

    
    # Conditions 3, 4 and 5.
    
    repetitions = 0

    outside_range = 0

    not_int = 0

    for el in bet:
        if bet.count(el) > 1:
            repetitions += 1

        if el not in range(1, n+1):
            outside_range += 1

        if type(el) is not int:
            not_int += 1

    # Condition 3 - Are the values in the list integers?
    if not_int > 0:
        return 'Error: only natural numbers within the 1 to 49 range are accepeted.'

    # Conditions 4 and 5 - Does the list have repeated or out-of-range values?
    if (repetitions != 0) or (outside_range != 0):
        return 'Error: the inserted bet has either repeated number(s) or number(s) outside the 1 to 49 range.'

    # If every restriction is met:
    if (repetitions == 0) and (outside_range == 0):

        n_choose_k = combinations(n, k)

        one_ticket_prob_perc = round((1 / n_choose_k) * 100, 10)

        # For printing probability without scientific notation.
        one_ticket_prob_perc_1 = np.format_float_positional(one_ticket_prob_perc, trim='-')
    
    
        # PART TWO: comparing actual bet with past bets of the Canadian Lottery history.

        bet_set = set(bet)

        condition = history == bet_set

        # Was the bet placed by the user a winning bet in the past?
        past_winning_bet = history[condition]

        # Number of times the user bet was a winning bet in the past.
        past_winning_bet_count = past_winning_bet.size
        
        print_list = [f'The probability of winning the lottery with the inserted bet is {one_ticket_prob_perc_1}.']
        
        print_list += [f'Number of times the user bet was a winning bet in the past: {past_winning_bet_count}.']
        
        return  print(*print_list, sep='\n\n')


In [31]:
check_historical_occurence_v2([1, 2, 3, 4, 5, 6])

The probability of winning the lottery with the inserted bet is 0.0000071511.

Number of times the user bet was a winning bet in the past: 0.


In [32]:
b = ['a']

In [33]:
b += ['b']

In [34]:
b

['a', 'b']

In [35]:
'\\n'.join(b)

'a\\nb'

In [36]:
newline = ord('\n')