# Title: Mobile App for Lottery Addiction

## Introduction: 

### Context:

Many people start playing the lottery for fun, but for some this activity turns into a habit which eventually escalates into addiction. Like other compulsive gamblers, lottery addicts soon begin spending from their savings and loans, start to accumulate debts, and eventually engage in desperate behaviors like theft.

A medical institute that aims to prevent and treat gambling addictions wants to build a dedicated mobile app to help lottery addicts better estimate their chances of winning. The institute has a team of engineers that will build the app, but they need us to create the logical core of the app and calculate probabilities.


### Goals:

For the first version of the app, they want us to focus on the [6/49](https://en.wikipedia.org/wiki/Lotto_6/49) lottery and build functions that enable users to answer questions like:

- What is the probability of winning the big prize with a single ticket?
- What is the probability of winning the big prize if we play 40 different tickets (or any other number)?
- What is the probability of having at least five (or four, or three, or two) winning numbers on a single ticket?

The institute also wants us to consider historical data coming from the national 6/49 lottery game in Canada. 

The data set has data for 3,665 drawings, dating from 1982 to 2018 (we'll come back to this).

_The scenario we're following throughout this project is fictional — the main purpose is to practice applying the concepts we learned in a setting that simulates a real-world scenario._


## Part 1: Factorial & Combinations Function

Throughout the project, we'll repeatedly need to calculate probabilities and combinations. We'll therefore start by writing two functions that we'll use often:

- A function that calculates factorials; and
- A function that calculates combinations.

Factorial is defined as:

$$n! = n \times (n - 1) \times (n - 2) \times \dots \times 2 \times 1$$

Combination is defined as:

$${}_{n}C_{k} = \binom{n}{k} = \frac{n!}{k!(n - k)!}$$

In [1]:
# We define the factorial function recursively.
# A for loop would use less memory, but that is not an issue for the small inputs used here.

def factorial(n):
    """
    Recursive function used to calculate the factorial of a non-negative integer n.
    Returns 1 if n is 0 or 1 (base case, since 0! = 1! = 1), else
    returns n * factorial(n-1).
    """
    if n <= 1:
        return 1
    else:
        return n * factorial(n-1)

# since the definition of a combination mathematically uses factorial,
# we will build our combination function with it 

def combinations(n,k):
    """
    Function that returns the number of combinations when choosing k items out of n (n choose k).
    Uses true division, so the result is returned as a float.
    """
    return factorial(n)/(factorial(k)*factorial(n-k))

In [2]:
# let's test our functions
print("{:,}".format(factorial(10)))
print("{:,}".format(combinations(10,5)))

3,628,800
252.0
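The comment in the first cell mentions that a for loop would be a less memory-hungry option than recursion. As a minimal sketch of that alternative, the hypothetical helpers `factorial_iterative` and `combinations_exact` below (names not part of the original notebook) use a loop and integer division, so the result stays an exact integer instead of a float:

```python
import math

def factorial_iterative(n):
    """Iterative factorial of a non-negative integer (hypothetical alternative)."""
    if n < 0:
        raise ValueError("n must be a non-negative integer")
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

def combinations_exact(n, k):
    """Number of ways to choose k items out of n, kept as an exact integer."""
    return factorial_iterative(n) // (factorial_iterative(k) * factorial_iterative(n - k))

print(factorial_iterative(10))     # 3628800
print(combinations_exact(10, 5))   # 252
print(combinations_exact(49, 6))   # 13983816
```

Python 3.8 and later also ship `math.factorial` and `math.comb`, which could be used to cross-check these values.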


Our functions work correctly. Let's move on and start applying them.

## Part 2: Simple Probability Function

In the 6/49 lottery, six numbers are drawn from a set of 49 numbers that range from 1 to 49. 

A player wins the big prize if the six numbers on their ticket match all six numbers drawn. If a player has a ticket with the numbers {13, 22, 24, 27, 42, 44}, they only win the big prize if the numbers drawn are {13, 22, 24, 27, 42, 44}. 

If even one number differs, they don't win.
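Because the order of the numbers does not matter and exactly one combination wins, the probability for a single ticket can be worked out directly from the combination formula above:

$$P(\text{big prize}) = \frac{1}{\binom{49}{6}} = \frac{1}{\frac{49!}{6! \, 43!}} = \frac{1}{13{,}983{,}816} \approx 0.000007151\%$$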

We discussed with the engineering team of the medical institute, and they told us we need to be aware of the following details when we write the function:

- Inside the app, the user inputs six different numbers from 1 to 49.
- Under the hood, the six numbers will come as a Python list, which will serve as the single input to our function.
- The engineering team wants the function to print the probability value in a friendly way — in a way that people without any probability training are able to understand.

**Write a function named** `one_ticket_probability()`**, which takes in a list of six unique numbers and prints the probability of winning in a way that's easy to understand.**

In [3]:
def one_ticket_probability(array):
    """
    Simple theoretical probability function. The ticket (array) is only used for display,
    because every ticket has the same probability of matching the six numbers drawn
    out of 49 possible numbers, hence 49 choose 6.
    Returns the probability as a formatted message, expressed in percent.
    """
    total_combinations = combinations(49,6)
    successful_outcomes = 1
    probability = (successful_outcomes/total_combinations)*100
    return "Your chances of winning with entry {} is {:,.9f} percent.".format(array,probability)

In [4]:
print(one_ticket_probability([1,2,3,4,5,6]))
print('------------------------------------')
print(one_ticket_probability([5,4,3,6,9,11]))
print('------------------------------------')
print(one_ticket_probability([14,33,62,9,11]))

Your chances of winning with entry [1, 2, 3, 4, 5, 6] is 0.000007151 percent.
------------------------------------
Your chances of winning with entry [5, 4, 3, 6, 9, 11] is 0.000007151 percent.
------------------------------------
Your chances of winning with entry [14, 33, 62, 9, 11] is 0.000007151 percent.
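The third call above passes only five numbers, one of which (62) is outside the range 1 to 49, and the function still reports the same probability. The engineering team said the app supplies six different numbers from 1 to 49, but if the function were ever called directly, a check along the following lines could guard against invalid tickets. This is only a sketch; `validate_ticket` is a hypothetical helper, not part of the original notebook:

```python
def validate_ticket(numbers):
    """Return True if `numbers` holds six distinct integers between 1 and 49."""
    return (
        len(numbers) == 6
        and len(set(numbers)) == 6  # no repeated numbers
        and all(isinstance(n, int) and 1 <= n <= 49 for n in numbers)
    )

print(validate_ticket([1, 2, 3, 4, 5, 6]))    # True
print(validate_ticket([14, 33, 62, 9, 11]))   # False
```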



## Part 3: Comparing to Historic Wins

For the first version of the app, however, users should also be able to compare their ticket against the historical lottery data in Canada and determine whether they would have ever won by now.

In this part, we'll focus on exploring the historical data coming from the Canada 6/49 lottery.

The data set can be downloaded from [Kaggle](https://www.kaggle.com/datasets/datascienceai/lottery-dataset).

The data set contains historical data for 3,665 drawings (each row shows data for a single drawing), dating from 1982 to 2018. For each drawing, we can find the six numbers drawn in the following six columns:

- NUMBER DRAWN 1
- NUMBER DRAWN 2
- NUMBER DRAWN 3
- NUMBER DRAWN 4
- NUMBER DRAWN 5
- NUMBER DRAWN 6

In [5]:
import pandas as pd

lottery_historical_df_raw = pd.read_csv('649.csv')
print('Number of rows: {}'.format(lottery_historical_df_raw.shape[0]))

Number of rows: 3665


In [6]:
lottery_historical_df_raw.head(3)

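| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 6/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 6/19/1982 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 6/26/1982 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |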


In [7]:
lottery_historical_df_raw.tail(3)

| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3662 | 649 | 3589 | 0 | 6/13/2018 | 6 | 22 | 24 | 31 | 32 | 34 | 16 |
| 3663 | 649 | 3590 | 0 | 6/16/2018 | 2 | 15 | 21 | 31 | 38 | 49 | 8 |
| 3664 | 649 | 3591 | 0 | 6/20/2018 | 14 | 24 | 31 | 35 | 37 | 48 | 17 |


## Part 3.5: Functions to Compare User Input to Previous Winnings

The engineering team told us that we need to be aware of the following details:

- Inside the app, the user inputs six different numbers from 1 to 49.
- Under the hood, the six numbers will come as a Python list and serve as an input to our function.
- The engineering team wants us to write a function that prints:
    - the number of times the combination selected occurred in the Canada data set; and
    - the probability of winning the big prize in the next drawing with that combination.

In [8]:
def extract_numbers(row):
    """
    Function takes as input a row of the lottery dataframe and returns a set containing all the six winning numbers.
    For the first row, for instance, the function should return the set {3, 41, 11, 12, 43, 14}
    """
    winning_numbers_row = [row['NUMBER DRAWN 1'],row['NUMBER DRAWN 2'],row['NUMBER DRAWN 3'],row['NUMBER DRAWN 4'],
                           row['NUMBER DRAWN 5'],row['NUMBER DRAWN 6']]
    winning_numbers_row = set(winning_numbers_row)
    return winning_numbers_row

# apply the function to every row (axis=1) to get a Series of sets
winning_numbers  = pd.Series(lottery_historical_df_raw.apply(extract_numbers,axis=1))

In [9]:
winning_numbers.head()

0    {3, 41, 11, 12, 43, 14}
1    {33, 36, 37, 39, 8, 41}
2     {1, 6, 39, 23, 24, 27}
3     {3, 9, 10, 43, 13, 20}
4    {34, 5, 14, 47, 21, 31}
dtype: object

In [10]:
def check_historical_occurence(usernumbers, series):
    """
    Function that takes in two inputs: a Python list containing the user numbers and a pandas Series containing sets with the winning numbers.
    Returns a message with the number of times the combination occurred in the historical data
    and the probability of winning the big prize in the next drawing with that combination.
    """
    input_set = set(usernumbers)
    # count the drawings whose six numbers match the user's entry exactly
    successful_outcomes = 0
    for row in series:
        if (input_set==row):
            successful_outcomes += 1
    # past drawings do not affect the next one: the odds are always 1 in (49 choose 6)
    total_combinations = int(combinations(49,6))
    if (successful_outcomes == 0):
        txtoutput = 'Your entry {} was not found in the historical winning index.'.format(input_set)
    else:
        txtoutput = 'Your entry {} occurred {} time(s) in the historical winning index.'.format(input_set,successful_outcomes)
    txtoutput += ' This does not change your odds of winning: you still have a 1/{:,} chance of winning the big prize in the next drawing with that combination.'.format(total_combinations)
    return txtoutput
            
print(check_historical_occurence([3, 41, 11, 12, 43, 14],winning_numbers))
print('------------------------------------------------------------------')
print(check_historical_occurence([3, 9, 10, 43, 13, 20],winning_numbers))
print('------------------------------------------------------------------')
print(check_historical_occurence([3,10,10,10,10,10],winning_numbers))

Your entry {3, 41, 11, 12, 43, 14} occurred 1 time(s) in the historical winning index. This does not change your odds of winning: you still have a 1/13,983,816 chance of winning the big prize in the next drawing with that combination.
------------------------------------------------------------------
Your entry {3, 9, 10, 43, 13, 20} occurred 1 time(s) in the historical winning index. This does not change your odds of winning: you still have a 1/13,983,816 chance of winning the big prize in the next drawing with that combination.
------------------------------------------------------------------
Your entry {10, 3} was not found in the historical winning index. This does not change your odds of winning: you still have a 1/13,983,816 chance of winning the big prize in the next drawing with that combination.


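We wanted to show that, even measured against the historical record, the odds of winning the lottery are extremely slim. Past drawings have no bearing on the next one, so the chance of winning the big prize remains 1 in 13,983,816 whether or not a combination has appeared before. If a user happens to choose a combination that has won in the past, we want to show them quantifiably that this was extreme luck and that expecting a win is unrealistic. 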
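As a side note on the counting step, the comparison against every past drawing could also be done with a lookup table. The sketch below is an illustration only: it uses three drawings copied from the first rows of the data set shown earlier rather than the full file, and `occurrence_counts` and `count_occurrences` are hypothetical names. Each drawing is stored as a `frozenset`, which is hashable and ignores the order of the numbers:

```python
from collections import Counter

# Three drawings copied from the first rows of the data set shown earlier
past_drawings = [
    {3, 11, 12, 14, 41, 43},
    {8, 33, 36, 37, 39, 41},
    {1, 6, 23, 24, 27, 39},
]

# A frozenset can be used as a dictionary key, unlike a regular set
occurrence_counts = Counter(frozenset(drawing) for drawing in past_drawings)

def count_occurrences(user_numbers):
    """Number of past drawings that match the user's numbers exactly."""
    return occurrence_counts[frozenset(user_numbers)]

print(count_occurrences([3, 41, 11, 12, 43, 14]))  # 1
print(count_occurrences([1, 2, 3, 4, 5, 6]))       # 0
```

A `Counter` returns 0 for keys it has never seen, so combinations that never occurred need no special handling.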

## Part 4: Multi Ticket Probability Function

Lottery addicts usually play more than one ticket on a single drawing, thinking that this might increase their chances of winning significantly. Our purpose is to help them better estimate their chances of winning. We're going to write a function that will allow the users to calculate the chances of winning for any number of different tickets.

In [11]:
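def multi_ticket_probability(n_tickets):
    """
    Returns the probability (in percent) of winning the big prize with n_tickets different tickets.
    Accepts an int (returns a formatted message) or a list of ints
    (returns a dict mapping each number of tickets to its probability in percent).
    """
    total_n_possibleoutcome = combinations(49,6)
    if isinstance(n_tickets, list):
        prob_winning_dict = {}
        for i in n_tickets:
            # every different ticket adds one successful outcome
            successful_outcomes = i
            prob_winning_dict[i] = (successful_outcomes/total_n_possibleoutcome)*100
        return prob_winning_dict
    if isinstance(n_tickets, int):
        successful_outcomes = n_tickets
        prob_winning = successful_outcomes/total_n_possibleoutcome*100
        txt = 'Your chances of winning are {:,.9f}% given you supplied {:,d} number of tickets.'.format(prob_winning,n_tickets)
        return txt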
        
        
print(multi_ticket_probability([1, 10, 100, 10000, 1000000, 6991908, 13983816]))
print('----------')
print(multi_ticket_probability(1000000))

{10000: 0.07151123842018516, 1: 7.151123842018516e-06, 100: 0.0007151123842018516, 1000000: 7.151123842018517, 6991908: 50.0, 10: 7.151123842018517e-05, 13983816: 100.0}
----------
Your chances of winning are 7.151123842% given you supplied 1,000,000 number of tickets.


Many lottery addicts buy multiple tickets to increase their odds of winning, even though winning is still unlikely. This function shows users how their odds of winning change with the number of tickets they purchase. 

The function accepts two input types, a list or an integer, for different use cases. If a person wants to check several different numbers of tickets at once, the function can handle them as a single list instead of being called once per value; a single integer is also accepted.
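Since every different ticket covers exactly one of the 13,983,816 possible outcomes, the probability for n different tickets is simply n divided by that total. A minimal sketch of this arithmetic using exact fractions is shown below; `multi_ticket_fraction` is a hypothetical helper, and `math.comb` (Python 3.8+) is used here instead of the notebook's own `combinations` function:

```python
from fractions import Fraction
from math import comb

TOTAL_OUTCOMES = comb(49, 6)  # 13,983,816 possible tickets

def multi_ticket_fraction(n_tickets):
    """Exact probability of winning the big prize with n_tickets different tickets."""
    if not 0 <= n_tickets <= TOTAL_OUTCOMES:
        raise ValueError("n_tickets must be between 0 and {:,}".format(TOTAL_OUTCOMES))
    return Fraction(n_tickets, TOTAL_OUTCOMES)

for n in (1, 40, 6991908):
    p = multi_ticket_fraction(n)
    # print the number of tickets, the reduced fraction, and the percentage
    print(n, p, "{:.9f}%".format(float(p) * 100))
```

The reduced fraction makes it easy to see, for example, that half of all possible tickets (6,991,908) are needed for a probability of exactly 1/2.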

## Part 5: Winning Combinations Function

We're going to write one more function to allow the users to calculate probabilities for two, three, four, or five winning numbers.

For extra context, in most 6/49 lotteries there are smaller prizes if a player's ticket matches two, three, four, or five of the six numbers drawn. As a consequence, the users might be interested in knowing the probability of having two, three, four, or five winning numbers.
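The counting argument behind this probability can be written out as follows. A ticket with exactly k winning numbers contains k of the six numbers drawn and 6 - k of the 43 numbers that were not drawn, so:

$$P(\text{exactly } k \text{ winning numbers}) = \frac{\binom{6}{k}\binom{43}{6-k}}{\binom{49}{6}}$$

For example, for k = 5:

$$P(\text{exactly } 5) = \frac{\binom{6}{5}\binom{43}{1}}{\binom{49}{6}} = \frac{6 \times 43}{13{,}983{,}816} = \frac{258}{13{,}983{,}816} \approx 0.001845\%$$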

In [54]:
def probability_less_6(n_winning_numbers):
    """
    Prints the probability that a single ticket has exactly n_winning_numbers
    (2, 3, 4, or 5) of the six numbers drawn.
    """
    # ways to choose the matching numbers from the six numbers drawn
    n_combinations_ticket = combinations(6, n_winning_numbers)
    # ways to choose the ticket's other numbers from the 43 numbers not drawn
    n_combinations_remaining = combinations(43, 6 - n_winning_numbers)
    successful_outcomes = n_combinations_ticket * n_combinations_remaining
    
    n_combinations_total = combinations(49, 6)    
    probability = successful_outcomes / n_combinations_total
    
    probability_percentage = probability * 100    
    combinations_simplified = round(n_combinations_total/successful_outcomes)    
    print('''Your chances of having {} winning numbers with this ticket are {:.6f}%.
In other words, you have a 1 in {:,} chance to win.'''.format(n_winning_numbers, probability_percentage,
                                                               int(combinations_simplified)))

In [55]:
for test_input in [2, 3, 4, 5]:
    probability_less_6(test_input)
    print('--------------------------') # output delimiter

Your chances of having 2 winning numbers with this ticket are 13.237803%.
In other words, you have a 1 in 8 chance to win.
--------------------------
Your chances of having 3 winning numbers with this ticket are 1.765040%.
In other words, you have a 1 in 57 chance to win.
--------------------------
Your chances of having 4 winning numbers with this ticket are 0.096862%.
In other words, you have a 1 in 1,032 chance to win.
--------------------------
Your chances of having 5 winning numbers with this ticket are 0.001845%.
In other words, you have a 1 in 54,201 chance to win.
--------------------------
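The function above gives the probability of having exactly a given number of winning numbers, while the goals at the start of the project ask about having at least that many. One possible way to bridge the two is to sum the "exactly" probabilities from the requested count up to six. The sketch below assumes this reading of the goal; `probability_exactly` and `probability_at_least` are hypothetical helpers, and `math.comb` (Python 3.8+) stands in for the notebook's `combinations` function:

```python
from math import comb

def probability_exactly(k):
    """Probability that a single ticket has exactly k of the six numbers drawn."""
    return comb(6, k) * comb(43, 6 - k) / comb(49, 6)

def probability_at_least(k):
    """Probability that a single ticket has k or more of the six numbers drawn."""
    return sum(probability_exactly(i) for i in range(k, 7))

for k in (2, 3, 4, 5):
    print("At least {}: {:.6f}%".format(k, probability_at_least(k) * 100))
```

Because the events "exactly k" are mutually exclusive for different k, their probabilities can simply be added.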
