# Fighting Lottery Addiction


We work for a medical institute aiming to prevent and treat gambling addictions. Many people start playing the lottery for fun, but for some, it escalates into addiction. Like other compulsive gamblers, lottery addicts start to accumulate debts and eventually engage in desperate behavior.

Our goal is to build a dedicated mobile app to help addicts estimate their chances of winning.

For the first version of the app, we will focus on the 6/49 lottery.
Six numbers are drawn from a set of 49. If a ticket matches all six numbers, the jackpot prize of at least 5,000,000 is won. A bonus number is also drawn, and if a player's ticket matches five numbers and the bonus number, the player wins the "second prize," which is usually between 100,000 and 500,000. If the top prize is not won, the jackpot prize increases for the next draw.

We aim to define functions that will help the user answer the following questions:


  - What is the probability of winning the big prize with a single ticket?
  - What is the likelihood of winning the big prize if we play 40 different tickets (or any other number)?
  - What is the probability of having at least five (or four, or three, or two) winning numbers on a single ticket?
  

## Defining helper functions

Throughout the project, we'll need to calculate probabilities and combinations repeatedly. We'll therefore start by writing two functions that we'll use often:

  - A function that calculates factorials; and
  - A function that calculates combinations.
  
Let's start by defining the function `factorial`. We will use the following formula:

$$n! = n \cdot (n-1) \cdot (n-2) \cdots 2 \cdot 1$$

In [1]:
def factorial(n):
    '''
    Return the factorial of a number n.
    
    Example: factorial(3) returns 6
    Value returned is an integer
    
    Parameter n: number we want to calculate the factorial for
    Precondition: n is a non-negative integer
    '''
    # Start from 1 so that factorial(0) correctly returns 1
    result = 1
    
    for i in range(2, n + 1):
        result *= i
    
    return result
    
    

In [2]:
# testing the function factorial
import numpy as np

# test case factorial 3
result = 6
np.testing.assert_equal(factorial(3),result)

# test case factorial 6
result= 720

np.testing.assert_equal(factorial(6),result)


In the 6/49 lottery, six numbers are drawn from 49 numbers that range from 1 to 49. The drawing is done without replacement, which means that once a number is drawn, it is not put back in the set.

To find the number of combinations when we're sampling without replacement and taking only k objects from a group of n objects, we can use the formula:


$$_nC_k = \binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$

In [3]:
def combinations(n,k):
    '''
    Return the number of possible combinations of k objects taken from a group of n objects.
    
    Example: combinations(10, 2) returns 45
    
    Value returned is an integer
    
    Parameter n: the size of the set or population
    Precondition: n is an integer
    
    Parameter k: the size of the subset of n or sample set
    Precondition: k is an integer with 0 <= k <= n
    '''
    numerator = factorial(n)
    denominator = factorial(k)*factorial(n-k)
    
    # The division is always exact, so use integer division to return an int
    return numerator // denominator
    
    

In [4]:
#testing the function combination

# test case n = 10 k= 2
result = 45
np.testing.assert_equal(combinations(10,2),result)

# test case n = 20 k=5
result = 15504
np.testing.assert_equal(combinations(20,5),result)
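
As an extra sanity check, the expected values can be compared against `math.factorial` and `math.comb` from Python's standard library (`math.comb` requires Python 3.8 or later). The sketch below is self-contained and does not call the helper functions defined above. It only illustrates the reference values, including the edge case 0! = 1.

```python
import math

# Reference values from the standard library
assert math.factorial(0) == 1      # edge case: 0! is defined as 1
assert math.factorial(3) == 6
assert math.factorial(6) == 720

# math.comb(n, k) computes n! / (k! * (n - k)!) as an exact integer
assert math.comb(10, 2) == 45
assert math.comb(20, 5) == 15504

# Number of possible tickets in a 6/49 lottery
total_tickets = math.comb(49, 6)
print(total_tickets)  # 13983816
```

Writing the helpers by hand is still useful for learning purposes, but in the app itself the standard-library versions could be used instead.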

## Probability of winning the big prize

In the 6/49 lottery, six numbers are drawn from 49 numbers that range from 1 to 49. A player wins the big prize if the six numbers on their tickets match all the six numbers drawn.

We want players to be able to calculate the probability of winning the big prize with the numbers they play on a single ticket. After a discussion with the engineering team, they gave us the following guidance:

 - Inside the app, the user inputs six different numbers from 1 to 49.
 - Under the hood, the six numbers will come as a Python list, which will serve as the single input to our function.
 - The engineering team wants the function to print the probability value in a friendly way.


In [5]:
def one_ticket_probability(array):
    ''' 
    Take a list of 6 unique numbers and return the probability
    of winning in a readable manner.
    
    Parameter array: combination for a six-number lottery
    Precondition: array is a list with 6 unique numbers
    '''
    # Every six-number combination is equally likely, so the result
    # does not depend on which numbers are in array
    total_outcomes = combinations(49,6)
    probability = (1 / total_outcomes)
    return "You have a {:%} chance of winning the big prize.".format(probability)


In [6]:
# Testing the function one_ticket_probability
one_ticket_probability([5,25,34,32,15,41])

'You have a 0.000007% chance of winning the big prize.'
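
The printed value can be checked by hand. Only one of the possible six-number combinations wins the big prize, so:

$$
\binom{49}{6} = \frac{49 \cdot 48 \cdot 47 \cdot 46 \cdot 45 \cdot 44}{6!} = \frac{10{,}068{,}347{,}520}{720} = 13{,}983{,}816
$$

$$
P(\text{big prize}) = \frac{1}{13{,}983{,}816} \approx 7.15 \times 10^{-8} \approx 0.000007\%
$$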

## Historical Data Check for the Canada Lottery

The engineering team would like to add a feature that lets users compare their tickets against the historical lottery data and determine whether they would ever have won.

The dataset is available on [Kaggle](https://www.kaggle.com/datascienceai/lottery-dataset).

###  Exploring the data

In [7]:
import pandas as pd

lottery = pd.read_csv('649.csv')

shape = lottery.shape

"The data set has {} rows and {} columns".format(shape[0],shape[1])

'The data set has 3665 rows and 11 columns'

In [8]:
lottery.head(3)

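|   | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 6/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 6/19/1982 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 6/26/1982 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |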


In [9]:
lottery.tail(3)

|   | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3662 | 649 | 3589 | 0 | 6/13/2018 | 6 | 22 | 24 | 31 | 32 | 34 | 16 |
| 3663 | 649 | 3590 | 0 | 6/16/2018 | 2 | 15 | 21 | 31 | 38 | 49 | 8 |
| 3664 | 649 | 3591 | 0 | 6/20/2018 | 14 | 24 | 31 | 35 | 37 | 48 | 17 |


The data set contains the historical data for 3,665 drawings, from 1982 to 2018. We can find the six numbers drawn in the following six columns:

- `NUMBER DRAWN 1`
- `NUMBER DRAWN 2`
- `NUMBER DRAWN 3`
- `NUMBER DRAWN 4`
- `NUMBER DRAWN 5`
- `NUMBER DRAWN 6`

## Function for Historical Data Check

The engineering team wants us to write a function that prints:

- the number of times the combination selected occurred in the Canada data set; and
- the probability of winning the big prize in the next drawing with that combination.


First, we will define a function that will help us extract the six winning numbers of each drawing from the data set.

In [10]:
def extract_numbers(row):
    '''
    Return a set containing the six winning numbers in a row of the lottery data set
     
    Parameter row: a row from the data set lottery
    '''
    # Columns at positions 4 to 9 are NUMBER DRAWN 1 to NUMBER DRAWN 6;
    # .iloc makes the positional lookup explicit
    winning_number = set(row.iloc[4:10])
    
    return winning_number


# Use extract_numbers function to extract all winning numbers.
won_lottery = lottery.apply(extract_numbers,axis =1)
    
won_lottery.head()

0    {3, 41, 43, 12, 11, 14}
1    {33, 36, 37, 39, 8, 41}
2     {1, 6, 39, 23, 24, 27}
3     {3, 9, 10, 43, 13, 20}
4    {34, 5, 14, 47, 21, 31}
dtype: object

The next step is to define a function that takes the user's numbers and checks whether or not the user would have won the lottery by now.

In [11]:
def check_historical_occurence(array,history=won_lottery):
    '''
    Return two messages: how many times the combination entered by the user
    occurred in the past, and the probability of winning the next drawing
    
    Parameter array: a combination of 6 numbers selected by the user
    Precondition : array is a list of 6 unique numbers from 1 to 49
    
    Parameter history: Series containing the historical combinations that won the lottery
    Precondition : history is a Pandas Series with Python sets
    Default value: won_lottery; historical records from 1982 to 2018
    '''
    user_input = set(array)
    
    # Count the past drawings whose winning numbers equal the user's numbers
    won_in_past = history[history == user_input].count()
    
    if won_in_past == 0:
        past = "Your current combination has never won the lottery."
    elif won_in_past == 1:
        past = "Your current combination won the lottery {} time in the past."
    else:
        past = "Your current combination won the lottery {} times in the past."
    
    probability_win_next = one_ticket_probability(array)
    
    return past.format(won_in_past), probability_win_next


In [12]:
# Testing the function check_historical_occurence
check_historical_occurence([5,25,34,32,15,41])

('Your current combination has never won the lottery.',
 'You have a 0.000007% chance of winning the big prize.')

In [13]:
# Test for a combination that won the lottery
check_historical_occurence([3, 41, 43, 12, 11, 14])

('Your current combination won the lottery 1 time in the past.',
 'You have a 0.000007% chance of winning the big prize.')

In [14]:
# Check whether any combination has won the lottery multiple times
# use frozenset() to be able to check for duplicates
won_check = [frozenset(row) for row in won_lottery]
pd.Series(won_check).duplicated().value_counts()

False    3665
dtype: int64

No combination has won the lottery more than once. Therefore, the two tests above cover every scenario that can occur with this data set for the function `check_historical_occurence`.
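
The `frozenset` conversion is needed because regular Python sets are mutable and therefore not hashable, so they cannot be counted or deduplicated directly. A minimal sketch of the same idea using only the standard library follows. The `draws` list is a hypothetical sample: its first two entries are the first two rows of the data set, and the third repeats the first in a different order to force a duplicate.

```python
from collections import Counter

# Hypothetical sample: the third draw repeats the first, in a different order
draws = [
    {3, 11, 12, 14, 41, 43},
    {8, 33, 36, 37, 39, 41},
    {43, 41, 14, 12, 11, 3},
]

# frozenset is hashable, so it can be used as a Counter key
counts = Counter(frozenset(draw) for draw in draws)

# Keep only the combinations that appear more than once
repeated = {combo: times for combo, times in counts.items() if times > 1}
print(len(repeated))  # 1
```

Because sets ignore order, the first and third draws count as the same combination, which is the behavior we want when comparing lottery tickets.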

## Multi-ticket Probability

Lottery addicts usually play more than one ticket on a single drawing, thinking that this might significantly increase their chances of winning.

In the next step, we will define a function to help them better estimate their chances of winning.



In [15]:
def multi_ticket_probability(n):
    '''
    Return the probability of winning the big prize based on the number of tickets purchased
    
    Parameter n: number of different tickets purchased
    Precondition: n is an integer from 1 to 13 983 816
    '''
    total_combinations = combinations(49,6)
    # Each of the n different tickets covers one of the equally likely combinations
    probability = n / total_combinations
    
    # Use the singular form when only one ticket is purchased
    ticket_word = "ticket" if n == 1 else "tickets"
    
    return "By purchasing {:,} {}, you have a {:%} chance of winning the big prize.".format(n,ticket_word,probability)


In [16]:
# testing the function multi_ticket_probability

test_case =[1, 10, 100, 10000, 1000000, 6991908, 13983816]

for n in test_case:
    print(multi_ticket_probability(n))

By purchasing 1 ticket, you have a 0.000007% chance of winning the big prize.
By purchasing 10 tickets, you have a 0.000072% chance of winning the big prize.
By purchasing 100 tickets, you have a 0.000715% chance of winning the big prize.
By purchasing 10,000 tickets, you have a 0.071511% chance of winning the big prize.
By purchasing 1,000,000 tickets, you have a 7.151124% chance of winning the big prize.
By purchasing 6,991,908 tickets, you have a 50.000000% chance of winning the big prize.
By purchasing 13,983,816 tickets, you have a 100.000000% chance of winning the big prize.
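
The function above returns a formatted message. If the app also needs the raw number (for example, to draw a chart), a numeric variant with input validation might look like the sketch below. The name `multi_ticket_probability_value` is hypothetical, and the sketch assumes that all `n` tickets carry different combinations, which is what makes the probability simply `n` divided by the number of possible tickets.

```python
import math

TOTAL_COMBINATIONS = math.comb(49, 6)  # 13,983,816 possible tickets

def multi_ticket_probability_value(n):
    """Return the probability (as a float) of winning with n distinct tickets.

    Hypothetical helper: assumes all n tickets carry different combinations.
    """
    if not isinstance(n, int) or not 1 <= n <= TOTAL_COMBINATIONS:
        raise ValueError("n must be an integer from 1 to 13,983,816")
    return n / TOTAL_COMBINATIONS

print("{:%}".format(multi_ticket_probability_value(100)))
```

Rejecting values outside the valid range avoids reporting a probability above 100%.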


## Probability of winning smaller prizes

In 6/49 lotteries, there are smaller prizes if a player's ticket matches two, three, four, or five of the six numbers drawn. Consequently, users might be interested in knowing the probability of having two, three, four, or five winning numbers.
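
The probability of matching exactly $n$ numbers can be written as a formula. A ticket matches exactly $n$ numbers when it contains $n$ of the 6 winning numbers and $6-n$ of the 43 non-winning numbers:

$$
P(\text{exactly } n) = \frac{\binom{6}{n}\binom{43}{6-n}}{\binom{49}{6}}
$$

For example, for $n = 5$ there are $\binom{6}{5}\binom{43}{1} = 6 \cdot 43 = 258$ successful outcomes out of 13,983,816.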


In [19]:
def probability_less_6(n):
    '''
    Return the probability of having exactly n winning numbers
    
    Parameter n: the number of matching numbers for which the user wants the probability
    Precondition: n is an integer between 2 and 5
    '''
    # Ways to pick n of the 6 winning numbers
    ticket_combination = combinations(6,n)
    # Ways to pick the remaining 6-n numbers from the 43 non-winning numbers
    combination_remaining = combinations(43,6-n)
    successful_outcomes = ticket_combination*combination_remaining
    
    total_outcomes = combinations(49,6)
    
    probability = successful_outcomes/total_outcomes
    
    return "You have {:%} chance of getting exactly {} winning numbers".format(probability,n)
    

In [21]:
# testing the function probability_less_6
test_case = [2,3,4,5]

for n in test_case:
    print(probability_less_6(n))

You have 13.237803% chance of getting exactly 2 winning numbers
You have 1.765040% chance of getting exactly 3 winning numbers
You have 0.096862% chance of getting exactly 4 winning numbers
You have 0.001845% chance of getting exactly 5 winning numbers
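
These values are the probabilities of matching exactly `n` numbers. One of the questions listed in the introduction asks about having at least `n` winning numbers, which can be obtained by summing the "exactly `k`" probabilities for `k` from `n` to 6. The sketch below is one possible way to do this. The function name `probability_at_least` is hypothetical, and the sketch uses `math.comb` (Python 3.8+) so that it is self-contained.

```python
import math

def probability_at_least(n):
    """Return the probability (as a float) of matching at least n numbers.

    Hypothetical helper: sums the 'exactly k' outcomes for k = n..6.
    """
    total_outcomes = math.comb(49, 6)
    successful_outcomes = sum(
        math.comb(6, k) * math.comb(43, 6 - k) for k in range(n, 7)
    )
    return successful_outcomes / total_outcomes

for n in [2, 3, 4, 5]:
    print("At least {}: {:%}".format(n, probability_at_least(n)))
```

For `n = 5`, this counts 258 tickets with exactly five matches plus the single jackpot ticket, so 259 successful outcomes out of 13,983,816.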


## Summary

We have defined four functions that will help users:
 - Find the probability of winning the big prize with one ticket or with any number of tickets,
 - Find the probability of having 2, 3, 4, or 5 winning numbers, and
 - Check whether their number combination has ever won the lottery.
