# Mobile App for Lottery Addiction

A medical institute that aims to prevent and treat gambling addictions wants to build a dedicated mobile app to help lottery addicts better estimate their chances of winning. The institute has a team of engineers that will build the app, but they need us to create the logical core of the app and calculate probabilities.

Some of the questions that we need to answer are:
* What is the probability of winning the big prize with a single ticket?
* What is the probability of winning the big prize if we play 40 different tickets (or any other number)?
* What is the probability of having at least five (or four, or three, or two) winning numbers on a single ticket?

Throughout the project, we'll need to calculate probabilities and combinations repeatedly. We'll therefore start by writing two functions that we'll use often:
* A function that calculates factorials

$$P(n) = {n!}$$

* A function that calculates combinations

$$C(n,k) = \frac{n!}{k!(n-k)!}$$

In [1]:
import time

# factorial function
def factorial_manual(n):
    # Multiply n * (n-1) * ... * 1; an empty range leaves result at 1 for n = 0
    result = 1
    for i in range(n,0,-1):
        result *= i
    return result

# Time a single call
s = time.time()
print(factorial_manual(5))
e = time.time()
print(e-s)

120
0.0


In [2]:
from math import factorial
s = time.time()
print(factorial(5))
e = time.time()
print(e-s)

120
0.0


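Both functions are very fast: for an input this small, the elapsed time is below what `time.time()` can resolve, so both runs report 0.0. For large inputs, the `factorial` function imported from `math` performs better, so we'll use it from here on.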
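
A single call on a small input is too quick to time reliably. The sketch below shows one way to compare the two functions with the standard `timeit` module instead. The input size `n = 500` and the repeat count are assumptions chosen only for illustration, and the timings will vary by machine.

```python
import math
import timeit

def factorial_manual(n):
    """Loop-based factorial, same logic as the manual version above."""
    result = 1
    for i in range(n, 0, -1):
        result *= i
    return result

n = 500        # assumed "large" input, chosen only for illustration
repeats = 200  # number of calls timed per function

# timeit runs each callable `repeats` times and returns the total seconds
manual_time = timeit.timeit(lambda: factorial_manual(n), number=repeats)
builtin_time = timeit.timeit(lambda: math.factorial(n), number=repeats)

print(f"manual:  {manual_time:.6f} s for {repeats} calls")
print(f"builtin: {builtin_time:.6f} s for {repeats} calls")
```

Repeating the call many times averages out timer resolution, which is what a single `time.time()` measurement cannot do.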

In [3]:
# function combinations
def combinations(n,k):
    return factorial(n)/(factorial(n-k)*factorial(k))
combinations(5,3)

10.0
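
Because the function above uses true division, it returns a float (`10.0`). As a possible alternative, the sketch below keeps the result an integer with floor division and cross-checks it against `math.comb`, which is available from Python 3.8. The name `combinations_int` is hypothetical and not part of the project code.

```python
from math import comb, factorial

def combinations_int(n, k):
    """n! / (k! * (n-k)!) with floor division, so the result stays an int."""
    return factorial(n) // (factorial(n - k) * factorial(k))

print(combinations_int(5, 3))   # 10
print(combinations_int(49, 6))  # 13983816
print(comb(49, 6))              # 13983816, standard-library equivalent
```

The division is always exact for valid inputs, so floor division loses nothing here.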

In the 6/49 lottery, six numbers are drawn from a set of 49 numbers that range from 1 to 49. A player wins the big prize if the six numbers on their ticket match all six numbers drawn.

We'll start by building a function that calculates the probability of winning the big prize for **any given ticket**.

In [4]:
def one_ticket_probability(numbers):
    probability = 1/combinations(49,6)
    return print("You have a chance of winning the lottery of {:.8f} %,\
                \nIn other words you have a chance of 1 in {:,} to win with the numbers {}.".format(probability*100,
                                                                                       int(combinations(49,6)),
                                                                                       numbers))
numbers = [10,8,5,3,48,7]
one_ticket_probability(numbers)

You have a chance of winning the lottery of 0.00000715 %,                
In other words you have a chance of 1 in 13,983,816 to win with the numbers [10, 8, 5, 3, 48, 7].
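
Since the engineering team will build the interface, it may help to separate the calculation from the message. The sketch below is one way to do that, assuming a hypothetical helper `one_ticket_odds` that returns the numbers instead of printing them. The probability does not depend on which six numbers are on the ticket, so the helper takes no ticket argument.

```python
from math import comb

def one_ticket_odds(pool_size=49, picks=6):
    """Return (probability, total_outcomes) for a single ticket (hypothetical helper)."""
    total_outcomes = comb(pool_size, picks)
    return 1 / total_outcomes, total_outcomes

probability, total = one_ticket_odds()
print(f"{probability * 100:.8f} % (1 in {total:,})")  # 0.00000715 % (1 in 13,983,816)
```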


The institute also wants us to consider historical data coming from the national 6/49 lottery game in Canada. The [data set](https://www.kaggle.com/datascienceai/lottery-dataset) has data for 3,665 drawings, dating from 1982 to 2018.

In [5]:
import pandas as pd
df = pd.read_csv("649.csv")

In [6]:
df.head(3)

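| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 6/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 6/19/1982 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 6/26/1982 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |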


In [7]:
df.tail(3)

| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3662 | 649 | 3589 | 0 | 6/13/2018 | 6 | 22 | 24 | 31 | 32 | 34 | 16 |
| 3663 | 649 | 3590 | 0 | 6/16/2018 | 2 | 15 | 21 | 31 | 38 | 49 | 8 |
| 3664 | 649 | 3591 | 0 | 6/20/2018 | 14 | 24 | 31 | 35 | 37 | 48 | 17 |


In [8]:
df.shape

(3665, 11)

In [9]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3665 entries, 0 to 3664
Data columns (total 11 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   PRODUCT          3665 non-null   int64 
 1   DRAW NUMBER      3665 non-null   int64 
 2   SEQUENCE NUMBER  3665 non-null   int64 
 3   DRAW DATE        3665 non-null   object
 4   NUMBER DRAWN 1   3665 non-null   int64 
 5   NUMBER DRAWN 2   3665 non-null   int64 
 6   NUMBER DRAWN 3   3665 non-null   int64 
 7   NUMBER DRAWN 4   3665 non-null   int64 
 8   NUMBER DRAWN 5   3665 non-null   int64 
 9   NUMBER DRAWN 6   3665 non-null   int64 
 10  BONUS NUMBER     3665 non-null   int64 
dtypes: int64(10), object(1)
memory usage: 315.1+ KB


There are no null values in the file. The data set has 3,665 rows and 11 columns.
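
The same check can be made explicit in code, and the `DRAW DATE` column, which is stored as text (dtype `object`), can be parsed into real dates. The sketch below assumes a small `sample` frame built from the three rows shown by `df.head(3)`, so it runs without the CSV file. The month/day/year format is inferred from values such as `6/13/2018`.

```python
import pandas as pd

# Three rows copied from the df.head(3) output; only a few columns are kept
sample = pd.DataFrame({
    "DRAW NUMBER": [1, 2, 3],
    "DRAW DATE": ["6/12/1982", "6/19/1982", "6/26/1982"],
    "NUMBER DRAWN 1": [3, 8, 1],
    "NUMBER DRAWN 6": [43, 41, 39],
})

# Total number of missing cells across the whole frame
missing_cells = int(sample.isna().sum().sum())
print(missing_cells)  # 0

# Parse the text dates as month/day/year
sample["DRAW DATE"] = pd.to_datetime(sample["DRAW DATE"], format="%m/%d/%Y")
years = sample["DRAW DATE"].dt.year.tolist()
print(years)  # [1982, 1982, 1982]
```

On the full data, the same two steps would apply to `df` directly.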

Write a function named `extract_numbers()` that takes as input a row of the lottery dataframe and returns a set containing all six winning numbers.

In [10]:
def extract_numbers(row):
    return set(row)

winner_numbers = df.iloc[:,4:10].apply(extract_numbers,axis = 1)
winner_numbers.head(10)

0     {3, 41, 11, 12, 43, 14}
1     {33, 36, 37, 39, 8, 41}
2      {1, 6, 39, 23, 24, 27}
3      {3, 9, 10, 43, 13, 20}
4     {34, 5, 14, 47, 21, 31}
5     {8, 41, 20, 21, 25, 31}
6    {33, 36, 42, 18, 25, 28}
7     {7, 40, 16, 17, 48, 31}
8     {37, 5, 38, 10, 23, 27}
9     {4, 37, 46, 15, 48, 30}
dtype: object

In [11]:
def check_historical_occurence(numbers, historical_register):
    '''
    numbers: a Python list with the six numbers on the ticket
    historical_register: a pandas Series of sets of winning numbers
    '''
    # Element-wise comparison: True wherever a past draw equals the ticket
    occurred = set(numbers) == historical_register
    n_occurred = occurred.sum()
    total = int(combinations(49, 6))

    if n_occurred == 0:
        print(f"The combination {numbers} has never occurred. "
              "This doesn't mean it's more likely to occur now.")
    else:
        times = "time" if n_occurred == 1 else "times"
        print(f"The combination {numbers} has occurred {n_occurred} {times} before.")
    # Past draws do not change the odds of the next one
    print(f"In other words you have a chance of 1 in {total:,} to win the big prize.\n")

check_historical_occurence([1, 6, 39, 23, 24, 27], winner_numbers)
check_historical_occurence(numbers, winner_numbers)

The combination [1, 6, 39, 23, 24, 27] has occurred 1 time before.
In other words you have a chance of 1 in 13,983,816 to win the big prize.

The combination [10, 8, 5, 3, 48, 7] has never occurred. This doesn't mean it's more likely to occur now.
In other words you have a chance of 1 in 13,983,816 to win the big prize.
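
As an alternative to comparing a set against a pandas Series, the lookup can be sketched in plain Python by counting past draws in a `Counter` keyed by `frozenset`. This is not the approach used above. The helper name `times_drawn` is hypothetical, and the three draws are copied from the `winner_numbers.head(10)` output.

```python
from collections import Counter

# First three draws, copied from the winner_numbers output
past_draws = [
    {3, 41, 11, 12, 43, 14},
    {33, 36, 37, 39, 8, 41},
    {1, 6, 39, 23, 24, 27},
]

# frozenset is hashable, so each draw can serve as a dictionary key
draw_counts = Counter(frozenset(draw) for draw in past_draws)

def times_drawn(numbers, counts):
    """Return how many past draws match the ticket exactly (hypothetical helper)."""
    return counts[frozenset(numbers)]

print(times_drawn([1, 6, 39, 23, 24, 27], draw_counts))  # 1
print(times_drawn([10, 8, 5, 3, 48, 7], draw_counts))    # 0
```

Building the counter once makes each later lookup a single dictionary access, which may matter if the app checks many tickets.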



## Multiple Tickets

Lottery addicts usually play more than one ticket on a single drawing, thinking that this might increase their chances of winning significantly. Our purpose is to help them better estimate their chances of winning.

In [12]:
def multi_ticket_probability(number_of_tickets):
    # Assumes every ticket carries a different combination of six numbers
    probability = number_of_tickets/combinations(49,6)
    return print(f"  Your chances of winning the big prize with {number_of_tickets} ticket(s) are {probability*100:.5f}%\n\
------------------------------------------------------------------------")
test = [1,10,100,10000,1000000,6991908,13983816]
for i in test:
    multi_ticket_probability(i)

  Your chances of winning the big prize with 1 ticket(s) are 0.00001%
------------------------------------------------------------------------
  Your chances of winning the big prize with 10 ticket(s) are 0.00007%
------------------------------------------------------------------------
  Your chances of winning the big prize with 100 ticket(s) are 0.00072%
------------------------------------------------------------------------
  Your chances of winning the big prize with 10000 ticket(s) are 0.07151%
------------------------------------------------------------------------
  Your chances of winning the big prize with 1000000 ticket(s) are 7.15112%
------------------------------------------------------------------------
  Your chances of winning the big prize with 6991908 ticket(s) are 50.00000%
------------------------------------------------------------------------
  Your chances of winning the big prize with 13983816 ticket(s) are 100.00000%
------------------------------------------------------------------------
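
For the 40-ticket case raised in the opening questions, the exact value can be sketched with `fractions.Fraction`, which avoids rounding. The helper name `multi_ticket_fraction` is hypothetical, and it assumes all tickets carry different combinations.

```python
from fractions import Fraction
from math import comb

def multi_ticket_fraction(number_of_tickets, pool_size=49, picks=6):
    """Exact chance that one of `number_of_tickets` distinct tickets wins."""
    total_outcomes = comb(pool_size, picks)
    if not 0 <= number_of_tickets <= total_outcomes:
        raise ValueError("number_of_tickets must be between 0 and the number of combinations")
    return Fraction(number_of_tickets, total_outcomes)

p = multi_ticket_fraction(40)
print(p)                         # 5/1747977
print(f"{float(p) * 100:.5f}%")  # 0.00029%
```

Even with 40 different tickets, the chance is about 1 in 349,595, which is the kind of figure the app can show directly.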


In most 6/49 lotteries there are smaller prizes if a player's ticket matches two, three, four or five of the six numbers drawn. As a consequence, the user might be interested in knowing the probability of having two, three, four or five winning numbers.

Write a function named probability_less_6() which takes in an integer between 2 and 5 and prints information about the chances of winning depending on the value of that integer.

In [13]:
def probability_less_6(n_numbers):
    # Ways to pick exactly n_numbers of the 6 drawn numbers ...
    combination_ticket = combinations(6,n_numbers)
    # ... times ways to fill the remaining slots from the 43 numbers not drawn
    probability = (combination_ticket*combinations(43,6-n_numbers))/combinations(49,6)
    return print(f"Your chances of having exactly {n_numbers} winning numbers with this ticket are {probability*100:.3f} %.")
for i in range(2,6):
    probability_less_6(i)

Your chances of having exactly 2 winning numbers with this ticket are 13.238 %.
Your chances of having exactly 3 winning numbers with this ticket are 1.765 %.
Your chances of having exactly 4 winning numbers with this ticket are 0.097 %.
Your chances of having exactly 5 winning numbers with this ticket are 0.002 %.
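
One way to sanity-check this formula is to confirm that the probabilities of matching exactly 0, 1, ..., 6 numbers add up to 1. The sketch below does this with exact fractions. The helper name `exact_match_probability` is hypothetical.

```python
from fractions import Fraction
from math import comb

def exact_match_probability(k, pool_size=49, picks=6):
    """Probability that exactly k of the ticket's numbers are drawn."""
    # k of the drawn numbers, and the other picks-k from the numbers not drawn
    ways = comb(picks, k) * comb(pool_size - picks, picks - k)
    return Fraction(ways, comb(pool_size, picks))

distribution = {k: exact_match_probability(k) for k in range(7)}
for k, p in distribution.items():
    print(f"{k} matches: {float(p) * 100:.5f} %")

print(sum(distribution.values()))  # 1
```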


Now let's create a function that calculates the probability of having at least two, three, four or five winning numbers.

In [14]:
def probability_at_least(n_numbers):
    result = 0
    # Sum the exact-match probabilities for n_numbers, n_numbers+1, ..., 6 matches
    for i in range(n_numbers,7):
        result+=combinations(6,i)*combinations(43,6-i)/combinations(49,6)
    return print(f"The probability of having at least {n_numbers} winning numbers is {100*result:.3f}%.")
for i in [2,3,4,5]:
    probability_at_least(i)

The probability of having at least 2 winning numbers is 15.102%.
The probability of having at least 3 winning numbers is 1.864%.
The probability of having at least 4 winning numbers is 0.099%.
The probability of having at least 5 winning numbers is 0.002%.
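
A simulation offers an independent check of an "at least" probability. The sketch below sums the exact-match terms from `n_numbers` up to 6 and compares the result with a Monte Carlo estimate. The function names, the trial count and the seed are assumptions made for illustration, and the estimate will differ slightly from the closed-form value.

```python
import random
from math import comb

def at_least_exact(n_numbers):
    """Closed form: sum the exact-match probabilities from n_numbers up to 6."""
    ways = sum(comb(6, i) * comb(43, 6 - i) for i in range(n_numbers, 7))
    return ways / comb(49, 6)

def at_least_simulated(n_numbers, trials=100_000, seed=0):
    """Estimate the same probability by simulating random draws."""
    rng = random.Random(seed)
    ticket = set(range(1, 7))  # any fixed ticket works; all are equally likely
    hits = 0
    for _ in range(trials):
        draw = rng.sample(range(1, 50), 6)  # six distinct numbers from 1 to 49
        if len(ticket.intersection(draw)) >= n_numbers:
            hits += 1
    return hits / trials

exact = at_least_exact(2)
estimate = at_least_simulated(2)
print(f"exact: {exact:.4f}, simulated: {estimate:.4f}")
```

If the two values disagree by much more than sampling noise, that usually points to a bug in the closed-form sum.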
