# Mobile App for Lottery Addiction

#### Many people play the lottery for fun, but for some it becomes an unhealthy habit that can lead to desperate criminal activity to feed the addiction. A medical institute has hired my team to build a dedicated mobile app that helps lottery addicts better estimate their chances of winning, so that they see a realistic figure rather than the optimistic one they commonly perceive.

### My job will be to create the logical core for the app, calculating the probabilities. I'll focus on the 6/49 lottery and make the logic for users to answer questions like:
#### - What is the probability of winning the big prize with a single ticket?
#### - What is the probability of winning the big prize if we play 40 different tickets (or any other number)?
#### - What is the probability of having at least five (or four, or three, or two) winning numbers on a single ticket?

#### The dataset I will work with is historical data from the 6/49 national lottery game in Canada. It contains 3,665 drawings dating from 1982 to 2018.

##### (This project addresses a fictional scenario; its purpose is to apply and practise my statistical knowledge.)



### Core Functions

#### Define the statistical functions I'll need throughout the project


In [17]:
import math

def factorial(n):
    final_product = 1
    for i in range(n, 0, -1):
        final_product *= i
    return final_product

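# Number of combinations when sampling without replacement and taking only k objects from a group of n objects.
def combinations(n, k):
    # Integer division keeps the result exact; float division can lose precision for large factorials
    permutation = factorial(n) // factorial(n - k)
    
    return permutation // factorial(k)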
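
#### As a sanity check, the hand-written helpers can be compared against Python's built-in `math.comb`. The sketch below assumes Python 3.8 or later (where `math.comb` is available); `combinations_exact` is a hypothetical name used only for this illustration and is not part of the project code.

```python
import math

def combinations_exact(n, k):
    # Hypothetical helper: n! / (k! * (n - k)!), kept in integers throughout
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

# math.comb computes the same binomial coefficient directly
total_tickets = combinations_exact(49, 6)
print(total_tickets == math.comb(49, 6))
print(total_tickets)
```

#### Both approaches should agree on the number of possible 6/49 tickets, which is the 13,983,816 figure used in the rest of the project.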


#### I need to create a function that calculates the probability of winning for any 6 number input, which then prints out a clear and easy to understand message for the user. I have been told to be aware of the following details when creating the function:
####  - Inside the app, the user inputs six different numbers from 1 to 49.
####  - Under the hood, the six numbers will come as a Python list, which will serve as the single input to our function.
####  - The engineering team wants the function to print the probability value in a friendly way — in a way that people without any probability training are able to understand.

In [39]:
def one_ticket_probability(numbers):
    
    # Error handling for input
    error_message = 'This was not valid input for this function. Please enter a list of 6 numbers between 1 and 49.'
    if not isinstance(numbers, list):
        raise TypeError(error_message)
    elif len(numbers) != 6:
        raise TypeError(error_message)
    for i in numbers:
        # Valid lottery numbers run from 1 to 49, so 0 must be rejected as well
        if (not isinstance(i, int)) or (i > 49) or (i < 1):
            raise TypeError(error_message)
            
    # Calculate number of possible combinations for a six-numbers lottery ticket
    c = combinations(49, 6)
    
    prob_win = 1 / c
    percent_win = prob_win * 100
    
    # Print an easy to understand message with the calculated values and input numbers
    print("Your calculated chance of winning with the numbers {} are {:0.5f}%. In other words 1 in {}".format(numbers, percent_win,
                                                                                                               round(c)))
    return 1
    

In [64]:
# Quick test for the function.

user_inputs = [23, 26, 2, 35, 16, 42]

one_ticket_probability(user_inputs)
# one_ticket_probability([1,50,3,4,5,6])
# one_ticket_probability([1,2,3,4])
# one_ticket_probability('random string')

Your calculated chance of winning with the numbers [23, 26, 2, 35, 16, 42] are 0.00001%. In other words 1 in 13983816


1
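
#### Note that the five-decimal format rounds the percentage up to 0.00001%, which slightly overstates the true value. A minimal sketch of a more precise display follows, assuming Python 3.8+ for `math.comb`; the variable names are illustrative and not part of the original function.

```python
import math

total_outcomes = math.comb(49, 6)
percent_win = 100 / total_outcomes

# Seven decimals keep two significant digits of the real percentage
precise_message = "{:.7f}% (1 in {:,})".format(percent_win, total_outcomes)
print(precise_message)
```

#### Whether the extra digits help a non-technical user is a design choice; the "1 in 13,983,816" phrasing is probably the clearer of the two.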

### Historical Data Check for Canada Lottery

#### We need to explore the historical data from the Canada 6/49 lottery. The data can be found on Kaggle (https://www.kaggle.com/datasets/datascienceai/lottery-dataset). It contains data from 3665 drawings between 1982 and 2018. Here we will open the dataset and explore it briefly using pandas.

In [51]:
import pandas as pd

lottery_legacy = pd.read_csv('649.csv')
lottery_legacy.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3665 entries, 0 to 3664
Data columns (total 11 columns):
PRODUCT            3665 non-null int64
DRAW NUMBER        3665 non-null int64
SEQUENCE NUMBER    3665 non-null int64
DRAW DATE          3665 non-null object
NUMBER DRAWN 1     3665 non-null int64
NUMBER DRAWN 2     3665 non-null int64
NUMBER DRAWN 3     3665 non-null int64
NUMBER DRAWN 4     3665 non-null int64
NUMBER DRAWN 5     3665 non-null int64
NUMBER DRAWN 6     3665 non-null int64
BONUS NUMBER       3665 non-null int64
dtypes: int64(10), object(1)
memory usage: 315.0+ KB


In [53]:
lottery_legacy.head(3)

Unnamed: 0,PRODUCT,DRAW NUMBER,SEQUENCE NUMBER,DRAW DATE,NUMBER DRAWN 1,NUMBER DRAWN 2,NUMBER DRAWN 3,NUMBER DRAWN 4,NUMBER DRAWN 5,NUMBER DRAWN 6,BONUS NUMBER
0,649,1,0,6/12/1982,3,11,12,14,41,43,13
1,649,2,0,6/19/1982,8,33,36,37,39,41,9
2,649,3,0,6/26/1982,1,6,23,24,27,39,34


In [54]:
lottery_legacy.tail(3)

Unnamed: 0,PRODUCT,DRAW NUMBER,SEQUENCE NUMBER,DRAW DATE,NUMBER DRAWN 1,NUMBER DRAWN 2,NUMBER DRAWN 3,NUMBER DRAWN 4,NUMBER DRAWN 5,NUMBER DRAWN 6,BONUS NUMBER
3662,649,3589,0,6/13/2018,6,22,24,31,32,34,16
3663,649,3590,0,6/16/2018,2,15,21,31,38,49,8
3664,649,3591,0,6/20/2018,14,24,31,35,37,48,17


#### The data looks correct: it contains 3665 entries with six drawn-number columns and an additional bonus-number column, with draws dating from 6/12/1982 to 6/20/2018 (note that these dates use the American month/day/year format).
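
#### Because the dates are stored as text in month/day/year order, they can be parsed explicitly to avoid any ambiguity. The sketch below is only an illustration: it uses a small hand-built frame with three dates copied from the output above rather than the full `649.csv` file, and the names `sample`, `first_draw` and `last_draw` are assumptions.

```python
import pandas as pd

# Three draw dates copied from the head/tail output above
sample = pd.DataFrame({'DRAW DATE': ['6/12/1982', '6/19/1982', '6/20/2018']})

# An explicit format avoids any day/month ambiguity
sample['DRAW DATE'] = pd.to_datetime(sample['DRAW DATE'], format='%m/%d/%Y')

first_draw = sample['DRAW DATE'].min()
last_draw = sample['DRAW DATE'].max()
print(first_draw.date(), last_draw.date())
```

#### The same `pd.to_datetime` call could be applied to the `DRAW DATE` column of `lottery_legacy` if date-based filtering is ever needed.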

### Function for Historical Data Check

#### Here I'll write a function that lets users compare their ticket against the historical lottery data in Canada and determine whether they would ever have won by now. The engineering team wants the function to print the number of times the selected combination occurred in the Canada data set, along with the probability of winning the big prize in the next drawing with that combination.

In [62]:
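# Convert each row into a set of the six winning numbers (the bonus number is excluded)
def extract_numbers(row):
    # Columns 4 to 9 hold NUMBER DRAWN 1 to NUMBER DRAWN 6; iloc makes the positional slice explicit
    row = row.iloc[4:10]
    return set(row.values)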

# Applies the function to each data set row
winning_numbers = lottery_legacy.apply(extract_numbers, axis=1)
winning_numbers.head()

0    {3, 41, 11, 12, 43, 14}
1    {33, 36, 37, 39, 8, 41}
2     {1, 6, 39, 23, 24, 27}
3     {3, 9, 10, 43, 13, 20}
4    {34, 5, 14, 47, 21, 31}
dtype: object

#### Here I'll create a function that checks the user's numbers against the series of past winning numbers produced by the extract_numbers function and prints a message based on the number of matches.

In [68]:
def check_historical_occurence(input_numbers, series):
    
    user_numbers = set(input_numbers)
    
    # Make a Series of bools for match/no-match, using the series passed in rather than the global variable
    match_check = series.apply(lambda drawn: drawn == user_numbers)
    # Count the number of matches
    match_count = match_check.sum()
    
    if match_count == 1:
        print('Your numbers have come up once in the past, this does not change the probability of them coming up again. The chances of you winning today are less than 0.00001%, or more specifically 1 in 13,983,816.')
    elif match_count > 1:
        print('Your numbers have come up {} times in the past, these wins do not change the probability of them coming up again. The chances of you winning today are less than 0.00001%, or more specifically 1 in 13,983,816.'.format(match_count))
    else:
        print('Your numbers have not come up in the past, this does not change the probability of them coming up again. The chances of you winning today are less than 0.00001%, or more specifically 1 in 13,983,816.')
    
    return 1

In [69]:
# Quick test for the function.

check_historical_occurence(user_inputs, winning_numbers)

Your numbers have not come up in the past, this does not change the probability of them coming up again. The chances of you winning today are less than 0.00001%, or more specifically 1 in 13,983,816.


1
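
#### The matching step relies on set equality, so the order in which the user enters the numbers does not matter. A minimal self-contained sketch of that idea follows, using only the first three draws shown earlier; `past_draws` and `count_exact_matches` are hypothetical names for this illustration.

```python
import pandas as pd

# First three historical draws, copied from the output above
past_draws = pd.Series([
    {3, 11, 12, 14, 41, 43},
    {8, 33, 36, 37, 39, 41},
    {1, 6, 23, 24, 27, 39},
])

def count_exact_matches(ticket, draws):
    # Hypothetical helper: a draw matches only if all six numbers are equal
    ticket_set = set(ticket)
    return int(draws.apply(lambda drawn: drawn == ticket_set).sum())

# Same numbers as the first draw, entered in a different order
print(count_exact_matches([41, 3, 12, 11, 43, 14], past_draws))
# The test ticket used earlier in this notebook
print(count_exact_matches([23, 26, 2, 35, 16, 42], past_draws))
```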

### Multi-ticket Probability

#### Lottery addicts commonly play many tickets on one drawing in the belief that this will significantly improve their chances of winning. Here I will write a function that calculates the chances of winning for any number of different tickets.

#### I have the following instructions:
#### - The user will input the number of different tickets they want to play (without inputting the specific combinations they intend to play).
#### - Our function will receive an integer between 1 and 13,983,816 (the maximum number of different tickets).
#### - The function should print information about the probability of winning the big prize depending on the number of different tickets played.

In [84]:
def multi_ticket_probability(n):
    # Calculate the total number of possible outcomes
    c = combinations(49,6)
    # Probability of winning
    p = n / c
    percent_win = p * 100
    
    # Print an easy to understand message with the calculated values and input numbers
    if n == 1:
        # A single ticket needs more decimals, otherwise the percentage rounds to 0.00001%
        print("Your calculated chance of winning with one lottery ticket is {:0.7f}%. In other words 1 in {}".format(percent_win, round(c)))
    elif n <= 1000000:
        print("Your calculated chance of winning with {} lottery tickets is {:0.5f}%. In other words {} in {}".format(n, percent_win, n, round(c)))
    else:
        # For very large ticket counts one decimal place is enough
        print("Your calculated chance of winning with {} lottery tickets is {:0.1f}%. In other words {} in {}".format(n, percent_win, n, round(c)))
    
    return 1
    

In [86]:
multi_ticket_probability(100000)

Your calculated chance of winning with 100000 lottery tickets is 0.71511%. In other words 100000 in 13983816


1
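
#### Since the probability grows linearly with the number of different tickets (n divided by 13,983,816), the question can also be turned around: how many different tickets would be needed to reach a given chance of winning? The sketch below is an illustration only, assuming Python 3.8+ for `math.comb`; `tickets_needed` is a hypothetical helper, not part of the app's specification.

```python
import math

TOTAL_OUTCOMES = math.comb(49, 6)

def tickets_needed(target_probability):
    # Hypothetical helper: smallest n with n / TOTAL_OUTCOMES >= target_probability
    return math.ceil(target_probability * TOTAL_OUTCOMES)

# Tickets required for an even chance, and for a 1% chance
print(tickets_needed(0.5))
print(tickets_needed(0.01))
```

#### Even a 1% chance requires well over a hundred thousand different tickets, which may be a useful figure to show users who believe buying a few extra tickets makes a real difference.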

### Fewer Winning Numbers

#### In most 6/49 lotteries there are smaller prizes if a player's ticket matches at least two of the numbers drawn. The user may therefore be interested in knowing the probability of having two, three, four or five winning numbers.

#### I will therefore write a function that makes this possible. I have been given the following instructions:
#### - The user inputs six different numbers from 1 to 49 and an integer between 2 and 5 that represents the number of winning numbers expected.
#### - Our function prints information about the probability of having the inputted number of winning numbers.

In [108]:
def probability_less_6(n):
    
    # Error handling to make sure input is correct
    error_message = 'The input for this function must be an integer between 2 and 5.'
    # Check the type first so that non-numeric input produces our own error message
    if not isinstance(n, int) or n > 5 or n < 2:
        raise TypeError(error_message)
        
    # Find the n-number combinations out of the chosen six numbers that can potentially win.
    matching_n = combinations(6,n)
    # Find the (6-n)-number combinations for the remaining drawn numbers, none of which may be on the ticket.
    # They come from the 43 numbers not on the ticket because we want exactly n matches, not at least n.
    non_matching_n = combinations(43, 6 - n)
    # Can have any combination of non-matching n for each combination of matching n.
    successful_n_matches = matching_n * non_matching_n
    
    total_c = combinations(49, 6)
    p_nmatches = successful_n_matches / total_c
    
    percent_nmatches = p_nmatches * 100
    
    # Print an easy to understand message with the calculated values and input numbers
    print("Your calculated chance of winning with {} matching number(s) is {:0.5f}%. In other words 1 in {:0.0f}"
          .format(n, percent_nmatches, 1/p_nmatches))
                                                  
    return 1
           
        

In [109]:
for i in [2, 3, 4, 5]:
    probability_less_6(i)
    print('--------------------------')

Your calculated chance of winning with 2 matching number(s) is 13.23780%. In other words 1 in 8
--------------------------
Your calculated chance of winning with 3 matching number(s) is 1.76504%. In other words 1 in 57
--------------------------
Your calculated chance of winning with 4 matching number(s) is 0.09686%. In other words 1 in 1032
--------------------------
Your calculated chance of winning with 5 matching number(s) is 0.00184%. In other words 1 in 54201
--------------------------
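
#### The function above gives the chance of exactly n matching numbers. For the "at least n" version of the question, the exact-match counts can be summed from n up to 6. The following is a minimal sketch under that assumption, using `math.comb` (Python 3.8+); `exact_matches` and `probability_at_least` are hypothetical names and not part of the original project.

```python
import math

def exact_matches(k):
    # Ways for a draw to contain exactly k of the ticket's 6 numbers and 6 - k of the 43 others
    return math.comb(6, k) * math.comb(43, 6 - k)

def probability_at_least(k):
    # Hypothetical helper: sum the exact-match counts for k, k + 1, ..., 6
    favourable = sum(exact_matches(j) for j in range(k, 7))
    return favourable / math.comb(49, 6)

for k in [2, 3, 4, 5]:
    print("At least {} matching numbers: {:0.5f}%".format(k, probability_at_least(k) * 100))
```

#### As a consistency check, the exact-match counts for zero to six matches add up to all 13,983,816 possible draws.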


#### Here I have provided a way to calculate the probability of having exactly two, three, four or five matches out of the six chosen numbers.