# Mobile App for Lottery Addiction

In this project, we are going to look at a fictional scenario where a medical institute is attempting to prevent and treat gambling addictions by building a mobile app to help lottery addicts better estimate their chances of winning. 

For the first version of the app, our task will be to create the logical core and calculate probabilities. We'll focus on the 6/49 lottery and build functions that will enable the app users to answer questions like:

* What is the probability of winning the jackpot with a single ticket?
* What is the probability of winning the jackpot with x number of different tickets?
* What is the probability of having x number of winning numbers on a single ticket?

We'll be looking at a dataset from the national 6/49 lottery game in Canada that contains historical data for 3,665 drawings from 1982 to 2018. It can be downloaded from Kaggle [here](https://www.kaggle.com/datascienceai/lottery-dataset).

## Core Functions

To start off, we'll write two functions that we will frequently use throughout our project.

* `factorial()`  – a function to calculate factorials
* `combinations()` – a function to calculate combinations

In [1]:
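# factorial(n) returns n! = n * (n - 1) * ... * 1 (and 0! = 1)
def factorial(n):
    final_product = 1
    for i in range(n, 0, -1):
        final_product *= i
    return final_product

# The lottery drawing is done without replacement (a drawn number isn't put back in the set)
# and the order of the numbers doesn't matter, so we count combinations:
# C(n, k) = n! / (k! * (n - k)!)
def combinations(n, k):
    numerator = factorial(n)
    denominator = factorial(k) * factorial(n - k)
    return numerator / denominator  # true division, so the result is a float (e.g. 13983816.0)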
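As a cross-check, Python 3.8 and later ship `math.comb`, which computes the same binomial coefficient with exact integer arithmetic. The sketch below recomputes the factorial formula with integer division and compares it against `math.comb`; `combinations_exact` is an illustrative name, not part of the original project.

```python
import math

# combinations_exact is a hypothetical helper used only for this comparison
def combinations_exact(n, k):
    """Number of ways to choose k items from n without replacement, as an int."""
    # n! is always divisible by k! * (n - k)!, so integer division loses nothing
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

# Both approaches should agree on the size of the 6/49 ticket space
print(combinations_exact(49, 6))
print(math.comb(49, 6))
```

Returning an integer rather than a float also means the count can be formatted with `{:,}` without a trailing `.0`.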

## One-Ticket Probability

In the 6/49 lottery, six numbers are drawn from a set ranging from 1 to 49. For a player to win the jackpot, all six of the numbers on their ticket must match all six of the numbers drawn.
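Since every possible ticket is equally likely to be the one drawn, the single-ticket probability is one divided by the number of ways to choose 6 numbers out of 49:

$$
P(\text{jackpot}) = \frac{1}{\binom{49}{6}} = \frac{6!\,43!}{49!} = \frac{1}{13{,}983{,}816} \approx 7.15 \times 10^{-8} \approx 0.0000072\%
$$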

Next, we'll write the `one_ticket_probability()` function. For the function to work in the app, the user will need to enter six different numbers from 1 to 49. That data will come as a Python list and serve as the input to our function. The function then needs to print the probability in a user-friendly way that people without probability training can easily understand.
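The function itself simply trusts that the list really holds six different numbers from 1 to 49. A minimal sketch of a guard the app could run first, assuming the numbers arrive as a plain Python list of integers; `validate_ticket` is a hypothetical helper, not part of the original project.

```python
# validate_ticket is a hypothetical helper, not part of the original project
def validate_ticket(number_list):
    """Check that a ticket holds six distinct integers from 1 to 49."""
    if len(number_list) != 6:
        raise ValueError('A ticket must contain exactly six numbers.')
    if len(set(number_list)) != 6:
        raise ValueError('The six numbers must all be different.')
    if not all(isinstance(n, int) and 1 <= n <= 49 for n in number_list):
        raise ValueError('Each number must be an integer from 1 to 49.')
    # Sorting makes the ticket easier to display and compare
    return sorted(number_list)

print(validate_ticket([11, 12, 18, 24, 44, 10]))
```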

In [2]:
def one_ticket_probability(number_list):
    num_combinations = combinations(49, 6)
    one_ticket = 1 / num_combinations
    probability_pct = one_ticket * 100
    
    print('''The chances of winning the jackpot with the numbers {} are only {:.7f}%. \nThis means that you have only a 1 in {:,} chance of winning.'''.format(number_list, probability_pct, int(num_combinations)))

Let's test our function using a few inputs to make sure it's working correctly.

In [3]:
test_input_1 = [11, 12, 18, 24, 44, 10]
one_ticket_probability(test_input_1)

The chances of winning the jackpot with the numbers [11, 12, 18, 24, 44, 10] are only 0.0000072%. 
This means that you have only a 1 in 13,983,816 chance of winning.


In [4]:
test_input_2 = [1, 2, 3, 4, 5, 6]
one_ticket_probability(test_input_2)

The chances of winning the jackpot with the numbers [1, 2, 3, 4, 5, 6] are only 0.0000072%. 
This means that you have only a 1 in 13,983,816 chance of winning.


## Looking at the Historical Data for the Canadian Lottery

Next, we'll explore the dataset so that we can compare the user's ticket numbers against Canada's historical lottery results and determine whether those numbers would ever have won the jackpot.

In [5]:
import pandas as pd

canada_lottery = pd.read_csv('649.csv')
canada_lottery.shape

(3665, 11)

In [6]:
canada_lottery.head()

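|   | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 6/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 6/19/1982 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 6/26/1982 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |
| 3 | 649 | 4 | 0 | 7/3/1982 | 3 | 9 | 10 | 13 | 20 | 43 | 34 |
| 4 | 649 | 5 | 0 | 7/10/1982 | 5 | 14 | 21 | 31 | 34 | 47 | 45 |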


In [7]:
canada_lottery.tail()

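|   | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3660 | 649 | 3587 | 0 | 6/6/2018 | 10 | 15 | 23 | 38 | 40 | 41 | 35 |
| 3661 | 649 | 3588 | 0 | 6/9/2018 | 19 | 25 | 31 | 36 | 46 | 47 | 26 |
| 3662 | 649 | 3589 | 0 | 6/13/2018 | 6 | 22 | 24 | 31 | 32 | 34 | 16 |
| 3663 | 649 | 3590 | 0 | 6/16/2018 | 2 | 15 | 21 | 31 | 38 | 49 | 8 |
| 3664 | 649 | 3591 | 0 | 6/20/2018 | 14 | 24 | 31 | 35 | 37 | 48 | 17 |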


## Function for Historical Data Check

We'll write a function that takes in the 6 numbers from the ticket as input, prints the number of times that combination occurred in the data, and prints the probability of winning the jackpot in the next drawing with those numbers.

We'll begin by creating a function to extract all of the winning numbers from the dataset.

In [8]:
def extract_numbers(row):
    numbers = row.iloc[4:10]  # positions 4-9 hold NUMBER DRAWN 1 to NUMBER DRAWN 6
    numbers = set(numbers.values)
    return numbers

winning_numbers = canada_lottery.apply(extract_numbers, axis=1)
winning_numbers.head()

0    {3, 41, 11, 12, 43, 14}
1    {33, 36, 37, 39, 8, 41}
2     {1, 6, 39, 23, 24, 27}
3     {3, 9, 10, 43, 13, 20}
4    {34, 5, 14, 47, 21, 31}
dtype: object
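`extract_numbers()` relies on the six drawn numbers sitting in positions 4 to 9. A sketch that selects them by label instead, assuming the column names shown in `canada_lottery.head()`; a two-row `sample` frame copied from that output stands in for the full CSV so the snippet runs on its own.

```python
import pandas as pd

# Two drawings copied from canada_lottery.head(); stands in for the full dataset
sample = pd.DataFrame({
    'DRAW DATE': ['6/12/1982', '6/19/1982'],
    'NUMBER DRAWN 1': [3, 8],
    'NUMBER DRAWN 2': [11, 33],
    'NUMBER DRAWN 3': [12, 36],
    'NUMBER DRAWN 4': [14, 37],
    'NUMBER DRAWN 5': [41, 39],
    'NUMBER DRAWN 6': [43, 41],
    'BONUS NUMBER': [13, 9],
})

# Select the six drawn-number columns by name, which also leaves out the bonus number
drawn_columns = ['NUMBER DRAWN {}'.format(i) for i in range(1, 7)]

# .values.tolist() yields plain Python ints, one list per drawing
winning_sets = pd.Series(
    [set(row) for row in sample[drawn_columns].values.tolist()],
    index=sample.index,
)
print(winning_sets)
```

Selecting by name keeps the function correct even if the CSV's column order changes.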

Next, we're going to write a function named `check_historical_occurrence()` that takes two inputs: a Python list of the user's ticket numbers and the pandas Series of winning-number sets that we get by applying the `extract_numbers()` function.

Inside the function, comparing the user's set against that Series gives True for every drawing that matches the user's numbers and False otherwise, so summing the result counts how many times the combination has occurred. The function then prints that count along with the probability of winning the jackpot with that number combination, in an easy-to-understand way.
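Comparing the ticket against the whole Series scans every drawing on each lookup. If the app has to check many tickets, one alternative (our own variation, not the project's method) is to count each combination once with `collections.Counter`, using `frozenset` keys so that number order does not matter. The sketch uses a few drawings copied from the `head()` and `tail()` output in place of the full dataset; `count_occurrences` is a hypothetical name.

```python
from collections import Counter

# A few drawings copied from the head() and tail() output; stands in for the full dataset
draws = [
    [3, 11, 12, 14, 41, 43],
    [8, 33, 36, 37, 39, 41],
    [1, 6, 23, 24, 27, 39],
    [14, 24, 31, 35, 37, 48],
]

# frozenset is hashable, so each combination can be a Counter key; order is ignored
occurrences = Counter(frozenset(draw) for draw in draws)

# count_occurrences is a hypothetical helper, not part of the original project
def count_occurrences(user_numbers, occurrences):
    """Return how many past drawings matched the ticket exactly (0 if none)."""
    # Counter returns 0 for missing keys instead of raising KeyError
    return occurrences[frozenset(user_numbers)]

print(count_occurrences([48, 37, 35, 31, 24, 14], occurrences))
print(count_occurrences([11, 12, 18, 24, 44, 10], occurrences))
```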

In [9]:
def check_historical_occurrence(user_numbers, historical_numbers):
    user_number_set = set(user_numbers)
    occurrence_check = historical_numbers == user_number_set
    num_occurrences = occurrence_check.sum()
    
    if num_occurrences == 0:
        print('''The combination of {} has never occurred.
The chance of winning the jackpot in the next drawing with {} is 0.0000072%.
In other words, you have a 1 in 13,983,816 chance of winning.'''
             .format(user_numbers, user_numbers))
    
    else:
        print('''The combination of {} has occurred {} time(s) before.
The chance of winning the jackpot in the next drawing with {} is 0.0000072%.
In other words, you have a 1 in 13,983,816 chance of winning.'''
             .format(user_numbers, num_occurrences, user_numbers))

In [10]:
test_input_3 = [11, 12, 18, 24, 44, 10]
check_historical_occurrence(test_input_3, winning_numbers)

The combination of [11, 12, 18, 24, 44, 10] has never occurred.
The chance of winning the jackpot in the next drawing with [11, 12, 18, 24, 44, 10] is 0.0000072%.
In other words, you have a 1 in 13,983,816 chance of winning.


In [11]:
test_input_4 = [14, 24, 31, 35, 37, 48]
check_historical_occurrence(test_input_4, winning_numbers)

The combination of [14, 24, 31, 35, 37, 48] has occurred 1 time(s) before.
The chance of winning the jackpot in the next drawing with [14, 24, 31, 35, 37, 48] is 0.0000072%.
In other words, you have a 1 in 13,983,816 chance of winning.


## Multi-ticket Probability

Lottery addicts usually play more than one ticket for each drawing in the hopes that this will significantly increase their chances of winning. We want to help them better estimate their chances of winning, so we'll write another function called `multi_ticket_probability()` that will allow the user to calculate the chances of winning for any number of tickets.

The function's input will be the number of tickets the user wants to play, assuming each ticket carries a different combination of numbers. The number of tickets can therefore range from 1 up to the total number of possible combinations, 13,983,816. The function's output will be similar to that of our previous functions.
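Because the tickets are all different, each one covers a distinct, equally likely combination, so the probability is simply the number of tickets divided by 13,983,816. A minimal sketch of that calculation, assuming the app only accepts ticket counts between 1 and the total number of combinations; `multi_ticket_odds` is a hypothetical name, it needs Python 3.8+ for `math.comb`, and it keeps the "1 in N" figure as a float rather than rounding it to a whole number.

```python
from fractions import Fraction
from math import comb

TOTAL_COMBINATIONS = comb(49, 6)  # 13,983,816 distinct 6/49 tickets

# multi_ticket_odds is a hypothetical helper, not part of the original project
def multi_ticket_odds(num_tickets):
    """Exact jackpot probability and '1 in N' odds for distinct tickets."""
    if not 1 <= num_tickets <= TOTAL_COMBINATIONS:
        raise ValueError('num_tickets must be between 1 and {:,}'.format(TOTAL_COMBINATIONS))
    # Each distinct ticket covers one of the equally likely combinations
    probability = Fraction(num_tickets, TOTAL_COMBINATIONS)
    # Unrounded odds, so 10,000,000 tickets reads as about 1 in 1.40, not 1 in 1
    one_in = TOTAL_COMBINATIONS / num_tickets
    return probability, one_in

for tickets in [100, 10_000_000]:
    probability, one_in = multi_ticket_odds(tickets)
    print('{:,} tickets: {:.6%} (1 in {:.2f})'.format(tickets, float(probability), one_in))
```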

In [12]:
def multi_ticket_probability(num_tickets):
    
    num_combinations = combinations(49, 6)
    
    probability = num_tickets / num_combinations
    probability_pct = probability * 100
    
    if num_tickets == 1:
        print('''The chance of winning the jackpot with 1 ticket is {:.6f}%.
In other words, you have a 1 in {:,} chance of winning.'''
              .format(probability_pct, int(num_combinations)))
        
    else:
        user_combinations = round(num_combinations / num_tickets)
        print('''The chance of winning the jackpot with {:,} tickets is {:.6f}%.
In other words, you have a 1 in {:,} chance of winning.'''
              .format(num_tickets, probability_pct, int(user_combinations))) # round() makes "1 in N" coarse above half of all combinations (10,000,000 tickets shows as 1 in 1)

In [13]:
test_inputs = [1, 11, 100, 10000, 6991908, 13983816]

for test_input in test_inputs:
    multi_ticket_probability(test_input)
    print('\n')

The chance of winning the jackpot with 1 ticket is 0.000007%.
In other words, you have a 1 in 13,983,816 chance of winning.


The chance of winning the jackpot with 11 tickets is 0.000079%.
In other words, you have a 1 in 1,271,256 chance of winning.


The chance of winning the jackpot with 100 tickets is 0.000715%.
In other words, you have a 1 in 139,838 chance of winning.


The chance of winning the jackpot with 10,000 tickets is 0.071511%.
In other words, you have a 1 in 1,398 chance of winning.


The chance of winning the jackpot with 6,991,908 tickets is 50.000000%.
In other words, you have a 1 in 2 chance of winning.


The chance of winning the jackpot with 13,983,816 tickets is 100.000000%.
In other words, you have a 1 in 1 chance of winning.




## Quantity of Winning Numbers

The last function we are going to write will be for the user to calculate the probabilities of having two, three, four, or five winning numbers on the ticket.

In most lotteries, there are smaller prizes when a ticket matches two or more of the six numbers drawn. These smaller wins can keep people who are addicted to the lottery hooked, so we want to make sure the user understands these probabilities as well.

Our function `small_prize_probability()` will take an integer between 2 and 5 representing the quantity of winning numbers and calculate the probability of having exactly that many winning numbers on a single ticket. Because every combination is equally likely, this probability does not depend on which six numbers the user picked, so the ticket numbers themselves are not needed as an input. The function will print the probability information in an understandable way, just like our previous functions.

Note that there are two ways to frame this question: What is the probability of having exactly five winning numbers? What is the probability of having at least five winning numbers? Our function answers the first question, not the second. Keep in mind that the probability of having exactly five winning numbers equals the number of successful outcomes (drawings that share exactly five numbers with the ticket) divided by the total number of possible outcomes.
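Spelled out, a ticket has exactly $k$ winning numbers when $k$ of its six numbers are among the six drawn and the remaining $6-k$ drawn numbers come from the 43 numbers not on the ticket. Counting both choices gives the successful outcomes:

$$
P(\text{exactly } k) = \frac{\binom{6}{k}\binom{43}{6-k}}{\binom{49}{6}}, \qquad
P(\text{exactly } 5) = \frac{6 \times 43}{13{,}983{,}816} = \frac{258}{13{,}983{,}816} \approx 0.001845\%
$$

That is roughly a 1 in 54,201 chance.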

In [14]:
def small_prize_probability(qty_winning_numbers):
    
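    # Ways the draw can include exactly qty_winning_numbers of the ticket's 6 numbers...
    num_combinations_ticket = combinations(6, qty_winning_numbers)
    # ...with the other drawn numbers coming from the 43 numbers not on the ticket
    num_combinations_remaining = combinations(43, 6 - qty_winning_numbers)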
    successful_outcomes = num_combinations_ticket * num_combinations_remaining
    
    num_combinations_total = combinations(49, 6)
    probability = successful_outcomes / num_combinations_total
    probability_pct = probability * 100
    
    user_combinations = round(num_combinations_total / successful_outcomes)
    
    print('''The chance of winning a {} number match with this ticket is {:.6f}%.
In other words, you have a 1 in {:,} chance of winning a prize.'''
         .format(qty_winning_numbers, probability_pct, int(user_combinations)))

In [15]:
# Testing all possible inputs
for test_input in [2, 3, 4, 5]:
    small_prize_probability(test_input)
    print('\n')

The chance of winning a 2 number match with this ticket is 13.237803%.
In other words, you have a 1 in 8 chance of winning a prize.


The chance of winning a 3 number match with this ticket is 1.765040%.
In other words, you have a 1 in 57 chance of winning a prize.


The chance of winning a 4 number match with this ticket is 0.096862%.
In other words, you have a 1 in 1,032 chance of winning a prize.


The chance of winning a 5 number match with this ticket is 0.001845%.
In other words, you have a 1 in 54,201 chance of winning a prize.




## Conclusion & Next Steps

In this project we coded four main functions for our app:

* `one_ticket_probability()` to calculate the probability of winning the jackpot with a single ticket
* `check_historical_occurrence()` to check whether a given combination has occurred before in the dataset
* `multi_ticket_probability()` to calculate the probability of winning the jackpot with any number of tickets in a single drawing
* `small_prize_probability()` to calculate the probability of having exactly two, three, four, or five winning numbers

If we wanted to continue building more features into our app, some next steps could be:

* Make the outputs even easier for the user to understand by adding memorable analogies of strange events that occur in life at similar probabilities.
* Combine the `one_ticket_probability()` and `check_historical_occurrence()` functions to output probability and historical occurrence information at the same time.
* Create another function similar to `small_prize_probability()`, but one that calculates the probability of having at least two, three, four, or five winning numbers instead of the quantity entered exactly.
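The last idea could reuse the counting argument behind `small_prize_probability()`: "at least k" winning numbers is the sum of the disjoint "exactly j" cases for j = k, ..., 6. A sketch, with `at_least_probability` as a hypothetical name and Python 3.8+ assumed for `math.comb`:

```python
from math import comb

# at_least_probability is a hypothetical helper, not part of the original project
def at_least_probability(min_winning_numbers):
    """Probability that a single ticket matches min_winning_numbers or more."""
    total = comb(49, 6)
    # The "exactly j" events are disjoint, so their counts can simply be added
    successful = sum(
        comb(6, j) * comb(43, 6 - j)
        for j in range(min_winning_numbers, 7)
    )
    return successful / total

for k in [2, 3, 4, 5]:
    print('At least {}: {:.6%}'.format(k, at_least_probability(k)))
```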

The idea for this project comes from the [DATAQUEST](https://app.dataquest.io/) **Probability: Fundamentals** course.