# Guided Project: Mobile App for Lottery Addiction

## Introduction

The aim of this project is to contribute to the development of a mobile app that is meant to help lottery addicts better estimate their chances of winning.

Many people start playing the lottery for fun, but for some this activity turns into a habit which eventually escalates into addiction. Like other compulsive gamblers, lottery addicts soon begin spending from their savings and loans, start to accumulate debts, and eventually engage in desperate behaviors like theft.

A medical institute that aims to prevent and treat gambling addictions wants to build a dedicated mobile app to help lottery addicts better estimate their chances of winning. The institute has a team of engineers that will build the app, but they need us to create the logical core of the app and calculate probabilities.

For the first version of the app, they want us to focus on the [6/49 lottery](https://en.wikipedia.org/wiki/Lotto_6/49) and build functions that enable users to answer questions like:

* What is the probability of winning the big prize with a single ticket?
* What is the probability of winning the big prize if we play 40 different tickets (or any other number)?
* What is the probability of having at least five (or four, or three, or two) winning numbers on a single ticket?


The institute also wants us to consider historical data coming from the national 6/49 lottery game in Canada. The [data set](https://www.kaggle.com/datascienceai/lottery-dataset) has data for 3,665 drawings, dating from 1982 to 2018 (we'll come back to this).

## Calculate Factorials and Combinations

Our goal is to write code that enables users to answer probability questions about playing the lottery. Throughout the project, we'll repeatedly need to calculate probabilities and combinations, so we'll start by writing two functions that we'll use often:

* A function that calculates factorials; and
* A function that calculates combinations.

To calculate factorials, this is the formula we need to use:

<div align="center">$$n! = n \times (n-1) \times (n-2) \times \dots \times 2 \times 1$$</div>

In the 6/49 lottery, six numbers are drawn from a set of 49 numbers that range from 1 to 49. The drawing is done without replacement, which means once a number is drawn, it's not put back in the set.

To find the number of combinations when we're sampling without replacement and taking only k objects from a group of n objects, we can use the formula:
$$_{n}^{}\textrm{C}_{k}= \binom{n}{k} = \frac{n!}{k!(n-k)!}$$
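
As a worked example, applying this formula to the 6/49 lottery (n = 49, k = 6) gives the total number of possible tickets. Most factors of 49! cancel against 43!, which leaves only six factors in the numerator:
$$\binom{49}{6} = \frac{49!}{6!\,43!} = \frac{49 \times 48 \times 47 \times 46 \times 45 \times 44}{6 \times 5 \times 4 \times 3 \times 2 \times 1} = 13{,}983{,}816$$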

Now, let's start coding the two functions.

In [48]:
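def factorial(n):
    # Multiply n * (n-1) * ... * 1; the empty product gives 0! = 1
    result = 1
    for i in range(n, 0, -1):
        result *= i
    return result

def combinations(n, k):
    # Number of ways to choose k items from n without replacement
    numerator = factorial(n)
    denominator = factorial(k) * factorial(n - k)
    # The quotient is always a whole number, so integer division is exact
    return numerator // denominator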
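
As a quick sanity check, the two helpers can be compared against Python's standard library. The sketch below re-implements them compactly so that it runs on its own, and it assumes Python 3.8 or later, where `math.comb()` is available:

```python
import math

def factorial(n):
    # Product of the integers from 1 to n (gives 1 when n is 0)
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

def combinations(n, k):
    # n choose k, using exact integer division
    return factorial(n) // (factorial(k) * factorial(n - k))

# The hand-written helpers should agree with the standard library
assert factorial(10) == math.factorial(10)
assert combinations(49, 6) == math.comb(49, 6)

print(combinations(49, 6))  # 13983816
```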

In this section, we focused on writing `factorial()` and `combinations()`, two core functions that we're going to need repeatedly moving forward.
In the next section, we focus on writing a function that calculates the probability of winning the big prize.

## Probability of Winning the Big Prize

In the 6/49 lottery, six numbers are drawn from a set of 49 numbers that range from 1 to 49. A player wins the big prize if the six numbers on their ticket match all six numbers drawn. If a player has a ticket with the numbers {13, 22, 24, 27, 42, 44}, they win the big prize only if the numbers drawn are {13, 22, 24, 27, 42, 44}. If even one number differs, they don't win.

For the first version of the app, we want players to be able to calculate the probability of winning the big prize with the various numbers they play on a single ticket (for each ticket a player chooses six numbers out of 49). So, we'll start by building a function that calculates the probability of winning the big prize for any given ticket.

We discussed with the engineering team of the medical institute, and they told us we need to be aware of the following details when we write the function:

* Inside the app, the user inputs six different numbers from 1 to 49.
* Under the hood, the six numbers will come as a Python list, which will serve as the single input to our function.
* The engineering team wants the function to print the probability value in a friendly way, so that people without any probability training are able to understand it.

Let's write this function!

In [62]:
def one_ticket_probability(user_numbers):
    # Every ticket is one of the 49-choose-6 equally likely outcomes,
    # so the specific numbers don't change the probability
    number_total_outcomes = combinations(49,6)
    number_successful_outcomes = 1
    probability_one_ticket = number_successful_outcomes / number_total_outcomes
    returned_str = "The probability of winning the big prize with the numbers: {} is of {:.7%}"
    print(returned_str.format(user_numbers, probability_one_ticket))

# Test of the function with a few inputs
one_ticket_probability([13,22,24,27,42,44])
one_ticket_probability([13,15,3,30,22,16])

The probability of winning the big prize with the numbers: [13, 22, 24, 27, 42, 44] is of 0.0000072%
The probability of winning the big prize with the numbers: [13, 15, 3, 30, 22, 16] is of 0.0000072%
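
Very small percentages can be hard to picture, so one possible way to make the message friendlier is to also express the chance as "1 in N". The sketch below is only an illustration: `one_in_n_message()` is a hypothetical helper that is not part of the project, and it uses `math.comb()` (Python 3.8+) instead of our own `combinations()` so that it runs on its own:

```python
import math

def one_in_n_message(user_numbers):
    # Hypothetical helper: every six-number ticket is one of the
    # 49-choose-6 equally likely outcomes
    total_outcomes = math.comb(49, 6)
    template = "Your chance of winning the big prize with {} is 1 in {:,}"
    return template.format(user_numbers, total_outcomes)

print(one_in_n_message([13, 22, 24, 27, 42, 44]))
# Your chance of winning the big prize with [13, 22, 24, 27, 42, 44] is 1 in 13,983,816
```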


In this section, we wrote a function that tells users the probability of winning the big prize with a single ticket. For the first version of the app, however, users should also be able to compare their ticket against the historical lottery data in Canada and determine whether they would have ever won by now.

In the next section, we focus on writing a function that will enable users to make this comparison.

## Comparing to Historical Lottery Data

In this section, we focus on writing a function that will enable users to compare their ticket against the historical lottery data in Canada and determine whether they would have ever won by now.

But first, let's explore the data set, which can be downloaded from [Kaggle](https://www.kaggle.com/datascienceai/lottery-dataset).

In [15]:
import pandas as pd

draws = pd.read_csv("649.csv")

print(draws.shape)

print(draws.head(3))
print(draws.tail(3))

(3665, 11)
   PRODUCT  DRAW NUMBER  SEQUENCE NUMBER  DRAW DATE  NUMBER DRAWN 1  \
0      649            1                0  6/12/1982               3   
1      649            2                0  6/19/1982               8   
2      649            3                0  6/26/1982               1   

   NUMBER DRAWN 2  NUMBER DRAWN 3  NUMBER DRAWN 4  NUMBER DRAWN 5  \
0              11              12              14              41   
1              33              36              37              39   
2               6              23              24              27   

   NUMBER DRAWN 6  BONUS NUMBER  
0              43            13  
1              41             9  
2              39            34  
      PRODUCT  DRAW NUMBER  SEQUENCE NUMBER  DRAW DATE  NUMBER DRAWN 1  \
3662      649         3589                0  6/13/2018               6   
3663      649         3590                0  6/16/2018               2   
3664      649         3591                0  6/20/2018              1

The data set contains historical data for 3,665 drawings (each row shows data for a single drawing), dating from 1982 to 2018. For each drawing, we can find the six numbers drawn in the following six columns:

* `NUMBER DRAWN 1`
* `NUMBER DRAWN 2`
* `NUMBER DRAWN 3`
* `NUMBER DRAWN 4`
* `NUMBER DRAWN 5`
* `NUMBER DRAWN 6`

Let's now write the function!

In [43]:
# Extract the six winning numbers of each draw as a set
def extract_numbers(row):
    # Positions 4 to 9 hold NUMBER DRAWN 1 through NUMBER DRAWN 6
    numbers = set(row.iloc[4:10])
    return numbers

winning_numbers = draws.apply(extract_numbers, axis=1)
print(winning_numbers.head())

0    {3, 41, 11, 12, 43, 14}
1    {33, 36, 37, 39, 8, 41}
2     {1, 6, 39, 23, 24, 27}
3     {3, 9, 10, 43, 13, 20}
4    {34, 5, 14, 47, 21, 31}
dtype: object
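
Storing each draw as a set pays off when comparing tickets, because two sets are equal whenever they contain the same numbers, regardless of order. A minimal pure-Python sketch of the idea, using the first three draws shown above and a hypothetical `count_matches()` helper (a plain list stands in for the pandas Series here):

```python
# First three draws from the data set, stored as sets
past_draws = [
    {3, 11, 12, 14, 41, 43},
    {8, 33, 36, 37, 39, 41},
    {1, 6, 23, 24, 27, 39},
]

def count_matches(user_numbers, draws):
    # Hypothetical helper: count the draws whose six numbers equal the ticket's
    ticket = set(user_numbers)
    return sum(ticket == draw for draw in draws)

# The order in which the user enters the numbers does not matter
print(count_matches([43, 41, 14, 12, 11, 3], past_draws))   # 1
print(count_matches([13, 22, 24, 27, 42, 44], past_draws))  # 0
```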


In [44]:
# Check if user numbers were already drawn in the past
def check_historical_occurrence(user_numbers, winning_numbers):
    user_numbers = set(user_numbers)
    # The comparison is evaluated draw by draw; summing the booleans counts the matches
    occurrences = sum(user_numbers == winning_numbers)
    message = "The combination of numbers: {} occurred {} times in the past."
    print(message.format(user_numbers, occurrences))

# Test of the function with a few inputs
check_historical_occurrence([13,22,24,27,42,44], winning_numbers)
check_historical_occurrence([13,15,3,30,22,16], winning_numbers)

The combination of numbers: {42, 44, 13, 22, 24, 27} occurred 0 times in the past.
The combination of numbers: {3, 13, 15, 16, 22, 30} occurred 0 times in the past.


So far, we wrote two functions:
* `one_ticket_probability()` - calculates the probability of winning the big prize with a single ticket
* `check_historical_occurrence()` - checks whether a certain combination has occurred in the Canada lottery data set

Lottery addicts usually play more than one ticket on a single drawing, thinking that this might increase their chances of winning significantly. Our purpose is to help them better estimate their chances of winning. In the next section, we're going to write a function that allows users to calculate the chances of winning for any number of different tickets.

## Chances of Winning for Any Number of Different Tickets

We've talked with the engineering team and they gave us the following information:

* The user will input the number of different tickets they want to play (without inputting the specific combinations they intend to play).
* Our function will receive an integer between 1 and 13,983,816 (the maximum number of different tickets).
* The function should print information about the probability of winning the big prize depending on the number of different tickets played.

Let's now start writing this function.

In [47]:
def multi_ticket_probability(number_tickets):
    number_possible_outcomes = combinations(49,6)
    number_successful_outcomes = number_tickets
    probability = number_successful_outcomes / number_possible_outcomes
    # Avoid naming the variable `str`, which would shadow the built-in type
    message = "The probability of winning with {} tickets is of {:.6%}"
    print(message.format(number_tickets,probability))

# Test of the function with a few inputs
multi_ticket_probability(1)
multi_ticket_probability(10)
multi_ticket_probability(100)
multi_ticket_probability(10000)
multi_ticket_probability(100000)
multi_ticket_probability(6991908)
multi_ticket_probability(13983816)

The probability of winning with 1 tickets is of 0.000007%
The probability of winning with 10 tickets is of 0.000072%
The probability of winning with 100 tickets is of 0.000715%
The probability of winning with 10000 tickets is of 0.071511%
The probability of winning with 100000 tickets is of 0.715112%
The probability of winning with 6991908 tickets is of 50.000000%
The probability of winning with 13983816 tickets is of 100.000000%
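
The engineering team specified that the input is an integer between 1 and 13,983,816. A defensive variant could reject anything outside that range and return the probability instead of printing it, which makes the result easier to test. The sketch below is an assumption about how that might look; `multi_ticket_probability_value()` is a hypothetical name, and `math.comb()` requires Python 3.8+:

```python
import math

TOTAL_OUTCOMES = math.comb(49, 6)  # 13,983,816 possible tickets

def multi_ticket_probability_value(number_tickets):
    # Hypothetical variant: validate the input and return a float
    if not isinstance(number_tickets, int) or isinstance(number_tickets, bool):
        raise TypeError("number_tickets must be an integer")
    if not 1 <= number_tickets <= TOTAL_OUTCOMES:
        raise ValueError("number_tickets must be between 1 and 13,983,816")
    # Each different ticket covers exactly one of the possible outcomes
    return number_tickets / TOTAL_OUTCOMES

print("{:.6%}".format(multi_ticket_probability_value(6991908)))  # 50.000000%
```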


So far, we wrote three main functions:

* `one_ticket_probability()` — calculates the probability of winning the big prize with a single ticket
* `check_historical_occurrence()` — checks whether a certain combination has occurred in the Canada lottery data set
* `multi_ticket_probability()` — calculates the probability for any number of tickets between 1 and 13,983,816

In the next section, we're going to write one more function to allow the users to calculate probabilities for two, three, four, or five winning numbers.

## Chances of Winning for Two, Three, Four, or Five Winning Numbers

For extra context, in most 6/49 lotteries there are smaller prizes if a player's ticket matches two, three, four, or five of the six numbers drawn. As a consequence, users might be interested in knowing the probability of having two, three, four, or five winning numbers.

These are the engineering details we'll need to be aware of:

* Inside the app, the user inputs:
    * six different numbers from 1 to 49; and
    * an integer between 2 and 5 that represents the number of winning numbers expected
* Our function prints information about the probability of having the inputted number of winning numbers.

First, we need to differentiate between these two probability questions:

* What is the probability of having _exactly_ five winning numbers?
* What is the probability of having _at least_ five winning numbers?

For our purposes here, we want to answer the first question.

For the sake of example, let's say a player chose these six numbers on a ticket: (1, 2, 3, 4, 5, 6). Out of these six numbers, we can form six five-number combinations:

* (1,2,3,4,5)
* (1,2,3,4,6)
* (1,2,3,5,6)
* (1,2,4,5,6)
* (1,3,4,5,6)
* (2,3,4,5,6)

We can also find the total number of five-number combinations by calculating "6 choose 5":
$$_{6}^{}\textrm{C}_{5}= \binom{6}{5} = \frac{6!}{5!(6-5)!} = 6$$

For each one of the six five-number combinations above, there are 44 possible successful outcomes in a lottery drawing. For the combination (1, 2, 3, 4, 5), for instance, there are 44 lottery outcomes that would return a prize:
* (**1, 2, 3, 4, 5,** 6)
* (**1, 2, 3, 4, 5,** 7)
* ...
* (**1, 2, 3, 4, 5,** 30)
* (**1, 2, 3, 4, 5,** 31)
* ...
* (**1, 2, 3, 4, 5,** 49)

However, we need to leave out the outcome (1, 2, 3, 4, 5, 6) because we're only interested in outcomes that match _exactly_ five numbers, not _at least_ five numbers. This means that for each of our six five-number combinations we have 43 possible successful outcomes, not 44.

Since there are six five-number combinations and each combination corresponds to 43 successful outcomes, we need to multiply 6 by 43 to find the _total_ number of successful outcomes:
$$6 \times 43 = 258$$

Since there are 258 successful outcomes and there are 13,983,816 total possible outcomes (the result of 49 choose 6), the probability of having exactly five winning numbers for a single lottery ticket is:
$$P(\mathit{5-winning\:numbers}) = \frac{258}{\binom{49}{6}} = 0.00001845$$
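
The count of 258 can also be confirmed by brute force. The sketch below enumerates, for the example ticket (1, 2, 3, 4, 5, 6), every draw that keeps exactly five of its numbers and takes the sixth from the numbers that are not on the ticket. The standard-library `itertools.combinations` is imported under the alias `subsets` to avoid a clash with our own `combinations()`:

```python
from itertools import combinations as subsets

ticket = {1, 2, 3, 4, 5, 6}
other_numbers = set(range(1, 50)) - ticket  # the 43 numbers not on the ticket

# Build every draw that shares exactly five numbers with the ticket
exactly_five = set()
for five in subsets(sorted(ticket), 5):
    for sixth in other_numbers:
        exactly_five.add(frozenset(five) | {sixth})

print(len(other_numbers))  # 43
print(len(exactly_five))   # 258
```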

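Now let's code the function. The specific combination on the ticket doesn't affect the probability, so we told the engineering team that we only need the integer between 2 and 5 representing the number of winning numbers expected. Generalizing the reasoning above, a ticket has exactly k winning numbers when k of its six numbers are drawn and the other 6 - k drawn numbers come from the 43 numbers that are not on the ticket.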

In [60]:
def probability_less_6(number_winning_numbers):
    # Choose which of the ticket's six numbers match the draw
    number_matching = combinations(6, number_winning_numbers)
    # The remaining drawn numbers must come from the 43 numbers not on the ticket
    number_non_matching = combinations(43, 6 - number_winning_numbers)
    number_successful_outcomes = number_matching * number_non_matching
    number_possible_outcomes = combinations(49,6)
    probability = number_successful_outcomes / number_possible_outcomes
    message = "The probability of having exactly {} winning numbers is of {:.4%}"
    print(message.format(number_winning_numbers,probability))

# Test of the function with a few inputs
probability_less_6(2)
probability_less_6(3)
probability_less_6(4)
probability_less_6(5)

The probability of having exactly 2 winning numbers is of 13.2378%
The probability of having exactly 3 winning numbers is of 1.7650%
The probability of having exactly 4 winning numbers is of 0.0969%
The probability of having exactly 5 winning numbers is of 0.0018%
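
One way to sanity-check any "exactly k winning numbers" count is that the counts for k = 0 through 6 must cover every possible draw exactly once, so they should add up to 49 choose 6. The sketch below assumes, as in the 6 × 43 = 258 derivation above, that the non-matching numbers come from the 43 numbers not on the ticket. `exact_match_count()` is a hypothetical helper, and `math.comb()` needs Python 3.8+:

```python
import math

def exact_match_count(k):
    # Hypothetical helper: draws sharing exactly k numbers with a ticket.
    # Pick k of the ticket's 6 numbers and 6 - k of the 43 other numbers.
    return math.comb(6, k) * math.comb(43, 6 - k)

counts = [exact_match_count(k) for k in range(7)]

print(counts[5])                        # 258
print(sum(counts) == math.comb(49, 6))  # True
```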


## Conclusion and Next Steps

That was all for the guided part of the project! We managed to write four main functions for our app:

* `one_ticket_probability()` — calculates the probability of winning the big prize with a single ticket
* `check_historical_occurrence()` — checks whether a certain combination has occurred in the Canada lottery data set
* `multi_ticket_probability()` — calculates the probability for any number of tickets between 1 and 13,983,816
* `probability_less_6()` — calculates the probability of having exactly two, three, four, or five winning numbers

Possible features for a second version of the app include:

* Making the outputs even easier to understand by adding fun analogies. For example, we can find probabilities for unusual events and compare them with the chances of winning the lottery, and then output something along the lines of "You are 100 times more likely to be the victim of a shark attack than to win the lottery".
* Combining `one_ticket_probability()` and `check_historical_occurrence()` to output information on probability and historical occurrence at the same time.
* Creating a function similar to `probability_less_6()` which calculates the probability of having at least two, three, four or five winning numbers.
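
As a starting point for the last idea, "at least k" can be obtained by adding up the "exactly j" counts for j = k through 6. This is only a sketch under the same assumption as the 6 × 43 = 258 derivation (non-matching numbers come from the 43 numbers not on the ticket). `probability_at_least()` is a hypothetical name, it returns the value rather than printing it, and `math.comb()` needs Python 3.8+:

```python
import math

def probability_at_least(k):
    # Hypothetical helper: probability of k or more winning numbers
    successful = sum(
        math.comb(6, j) * math.comb(43, 6 - j)  # draws with exactly j matches
        for j in range(k, 7)
    )
    return successful / math.comb(49, 6)

for k in (5, 4, 3, 2):
    print("At least {}: {:.4%}".format(k, probability_at_least(k)))
```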