# Mobile App for Lottery Addiction

## Introduction

Many people start playing the lottery for fun, but for some this activity turns into a habit that eventually escalates into addiction. Like other compulsive gamblers, lottery addicts soon begin spending from their savings and loans, accumulate debts, and eventually engage in desperate behaviors like theft.

A medical institute that aims to prevent and treat gambling addictions wants to build a dedicated mobile app to help lottery addicts better estimate their chances of winning. The institute has a team of engineers who will build the app, but they need us to create the logical core of the app and calculate probabilities.

## Objective

For the first version of the app, they want us to focus on the 6/49 lottery and build functions that enable users to answer questions like:

* What is the probability of winning the big prize with a single ticket?
* What is the probability of winning the big prize if we play 40 different tickets (or any other number)?
* What is the probability of having at least five (or four, or three, or two) winning numbers on a single ticket?

The institute also wants us to consider historical data from the national 6/49 lottery game in Canada. The data set covers 3,665 drawings, dating from 1982 to 2018.

# Writing the core functions

Throughout the project, we'll repeatedly calculate probabilities and combinations, so we'll start by writing two functions that we'll use often: one that calculates factorials and one that calculates combinations.

## Function for factorials

In [1]:
def factorial(n):
    final_product = 1
    for i in range(n, 0, -1):
        final_product *= i
    return final_product

## Function for combinations

In [2]:
def combination(n, k):
    numerator = factorial(n)
    denominator = factorial(k) * factorial(n - k)
    return numerator / denominator
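As a quick sanity check, the two helpers can be compared against the standard library. The sketch below assumes Python 3.8 or later, where `math.comb` is available, and repeats the two functions only so that it runs on its own. Note that `combination` returns a float because it uses true division, while `math.comb` returns an exact integer.

```python
import math

def factorial(n):
    final_product = 1
    for i in range(n, 0, -1):
        final_product *= i
    return final_product

def combination(n, k):
    numerator = factorial(n)
    denominator = factorial(k) * factorial(n - k)
    return numerator / denominator

# Compare the hand-written helper with math.comb on a few inputs.
for n, k in [(5, 2), (10, 3), (49, 6)]:
    assert combination(n, k) == math.comb(n, k)

print(math.comb(49, 6))  # 13983816
```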

# The 6/49 lottery

In the 6/49 lottery, six numbers are drawn from a set of 49 numbers that range from 1 to 49. A player wins the big prize if the six numbers on their ticket match all six numbers drawn. If a player has a ticket with the numbers {13, 22, 24, 27, 42, 44}, they win the big prize only if the numbers drawn are {13, 22, 24, 27, 42, 44} (the order does NOT matter). If even one number differs, they don't win.

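The main thing to consider about lottery drawings is that they are **unordered samples drawn without replacement**: a number cannot be drawn twice, and the order in which the numbers come out doesn't matter.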
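To make the idea concrete, here is a small illustration on a toy pool of four numbers (the pool is an assumption chosen only to keep the output short). `itertools.combinations` enumerates unordered selections without replacement, and their count agrees with the combinations formula.

```python
from itertools import combinations
from math import comb

pool = [1, 2, 3, 4]  # toy pool, for illustration only

# Every way to pick 2 numbers where order doesn't matter and no number repeats.
draws = list(combinations(pool, 2))
print(draws)
# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]

# The count agrees with "4 choose 2".
assert len(draws) == comb(4, 2) == 6

# Comparing as sets ignores order, which is how a ticket should be compared to a draw.
assert set((2, 1)) == set((1, 2))
```

The same reasoning with 49 numbers and six picks gives the 13,983,816 possible outcomes of a 6/49 draw.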

## Writing a function to calculate probability for a single ticket

The engineering team of the medical institute told us we need to be aware of the following details when we write the function:

* Inside the app, the user inputs six different numbers from 1 to 49.
* Under the hood, the six numbers will come as a Python list, which will serve as the single input to our function.
* The engineering team wants the function to print the probability value in a friendly way, so that people without any probability training are able to understand it.
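The input contract described in the list above could be checked before any probability is computed. The following is only a sketch: `validate_ticket` is a hypothetical helper that is not part of the original notebook, and the engineering team may already handle validation inside the app.

```python
def validate_ticket(numbers):
    """Return True if `numbers` holds six different integers from 1 to 49.

    Hypothetical helper, shown only to illustrate the input contract.
    """
    if len(numbers) != 6:
        return False
    if len(set(numbers)) != 6:  # duplicates collapse inside a set
        return False
    return all(isinstance(n, int) and 1 <= n <= 49 for n in numbers)

print(validate_ticket([6, 34, 26, 44, 12, 31]))  # True
print(validate_ticket([6, 6, 26, 44, 12, 31]))   # False: repeated number
print(validate_ticket([0, 34, 26, 44, 12, 31]))  # False: 0 is out of range
```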

In [3]:
def one_ticket_probability(user_numbers):
    # The ticket arrives as a single list of six numbers. Its contents don't
    # change the result: every combination is equally likely to be drawn.
    total_comb = int(combination(49, 6))
    probability = (1 / total_comb) * 100
    formatted_number = f"{probability:.10f}"
    return "The probability of you winning the big prize with these numbers is {} %, or 1 chance in {:,}.".format(
        formatted_number, total_comb)

## Testing the function

Let's say we get a lottery ticket with numbers 6, 34, 26, 44, 12 and 31.

In [4]:
one_ticket_probability([6, 34, 26, 44, 12, 31])

'The probability of you winning the big prize with these numbers is 0.0000071511 %, or 1 chance in 13,983,816.'

It's important to note that no matter what numbers the players input into the function, the answer will always be the same.

# Empirical probabilities

So far, we've told users the theoretical probability of winning the big prize with a single ticket. Now let's allow them to compare their ticket numbers with previous draws in Canada from 1982 to 2018 and see whether those numbers would ever have won.

## Loading the dataset

In [5]:
import pandas as pd
lottery_data = pd.read_csv(r'C:\Users\nbnav\OneDrive\Desktop\Dataquest\649.csv')
print(lottery_data.head())
print(lottery_data.shape)

   PRODUCT  DRAW NUMBER  SEQUENCE NUMBER  DRAW DATE  NUMBER DRAWN 1  \
0      649            1                0  6/12/1982               3   
1      649            2                0  6/19/1982               8   
2      649            3                0  6/26/1982               1   
3      649            4                0   7/3/1982               3   
4      649            5                0  7/10/1982               5   

   NUMBER DRAWN 2  NUMBER DRAWN 3  NUMBER DRAWN 4  NUMBER DRAWN 5  \
0              11              12              14              41   
1              33              36              37              39   
2               6              23              24              27   
3               9              10              13              20   
4              14              21              31              34   

   NUMBER DRAWN 6  BONUS NUMBER  
0              43            13  
1              41             9  
2              39            34  
3              43     

We can see that the date column "DRAW DATE" is not a datetime object. Let's fix that.

In [6]:
lottery_data['DRAW DATE'] = pd.to_datetime(lottery_data['DRAW DATE'], format='%m/%d/%Y')

In [7]:
lottery_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3665 entries, 0 to 3664
Data columns (total 11 columns):
 #   Column           Non-Null Count  Dtype         
---  ------           --------------  -----         
 0   PRODUCT          3665 non-null   int64         
 1   DRAW NUMBER      3665 non-null   int64         
 2   SEQUENCE NUMBER  3665 non-null   int64         
 3   DRAW DATE        3665 non-null   datetime64[ns]
 4   NUMBER DRAWN 1   3665 non-null   int64         
 5   NUMBER DRAWN 2   3665 non-null   int64         
 6   NUMBER DRAWN 3   3665 non-null   int64         
 7   NUMBER DRAWN 4   3665 non-null   int64         
 8   NUMBER DRAWN 5   3665 non-null   int64         
 9   NUMBER DRAWN 6   3665 non-null   int64         
 10  BONUS NUMBER     3665 non-null   int64         
dtypes: datetime64[ns](1), int64(10)
memory usage: 315.1 KB


We have 3,665 rows of data, which means the lottery was drawn 3,665 times from 1982 to 2018.

# Function to match numbers with historical draws

We need to convert the six winning numbers of each draw into a Python set so that we can later compare them with a user's numbers.

In [8]:
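def extract_numbers(data_row):
    # Columns at positions 4 to 9 hold NUMBER DRAWN 1 to NUMBER DRAWN 6;
    # the bonus number (position 10) is left out.
    numbers = set(data_row.iloc[4:10].values)
    return numbers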

Now we apply the function to every row in lottery_data dataset.

In [9]:
all_win_num = lottery_data.apply(extract_numbers, axis=1)

In [10]:
all_win_num = pd.DataFrame(all_win_num, columns=['winning_numbers'])

In [11]:
all_win_num['draw_num'] = lottery_data['DRAW NUMBER']

In [12]:
print(all_win_num.head())
print(all_win_num.shape)

           winning_numbers  draw_num
0  {3, 41, 11, 12, 43, 14}         1
1  {33, 36, 37, 39, 8, 41}         2
2   {1, 6, 39, 23, 24, 27}         3
3   {3, 9, 10, 43, 13, 20}         4
4  {34, 5, 14, 47, 21, 31}         5
(3665, 2)


Now that we have extracted the winning numbers of every draw and converted them into sets, we'll define a function that takes a user's numbers, compares them with each historical draw, and tells the user whether those numbers would ever have won.

In [13]:
def check_historical_occurence(user_num_list, pandas_df=all_win_num):
    for idx, num_set in enumerate(pandas_df['winning_numbers']):
        if set(user_num_list) == num_set:
            draw_number = pandas_df['draw_num'].iloc[idx]
            return "Congrats! You would have won in the draw number {}".format(draw_number)
    return "Oops! These numbers have never won!"

## Testing the function

Now let's check that the function works as intended. We'll first try an arbitrary set of numbers, which will almost certainly not match any draw. Then, to confirm that a real match is detected, we'll pick a row from the winning numbers and pass its numbers in a different order.

In [14]:
check_historical_occurence([21, 23, 1, 2, 3, 4])

'Oops! These numbers have never won!'

Now let's pick a random set of winning numbers and also scramble the order of those numbers.

In [15]:
all_win_num.iloc[456]

winning_numbers    {2, 34, 37, 6, 22, 27}
draw_num                              457
Name: 456, dtype: object

In [16]:
check_historical_occurence([34, 2, 6, 37, 27, 22])

'Congrats! You would have won in the draw number 457'

Awesome, our code works as intended. Now users can check if their numbers have ever won the big prize.
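If the app ends up running this check very often, one possible alternative to scanning every draw is to build a dictionary once and look tickets up directly. This is only a sketch under that assumption: `build_lookup` is a hypothetical helper, and the three sample draws are the first rows shown earlier. In the notebook, the pairs could come from `zip(all_win_num['winning_numbers'], all_win_num['draw_num'])`.

```python
def build_lookup(pairs):
    """Map each winning combination to the first draw number where it appeared.

    `pairs` is any iterable of (set_of_six_numbers, draw_number).
    Hypothetical helper, not part of the original notebook.
    """
    lookup = {}
    for numbers, draw_number in pairs:
        # frozenset is hashable, so it can serve as a dictionary key,
        # and it ignores the order of the numbers.
        lookup.setdefault(frozenset(numbers), draw_number)
    return lookup

# Stand-in data: the first three draws from the dataset preview.
sample_pairs = [
    ({3, 11, 12, 14, 41, 43}, 1),
    ({8, 33, 36, 37, 39, 41}, 2),
    ({1, 6, 23, 24, 27, 39}, 3),
]

lookup = build_lookup(sample_pairs)
print(lookup.get(frozenset([41, 3, 43, 12, 11, 14])))  # 1
print(lookup.get(frozenset([1, 2, 3, 4, 5, 6])))       # None
```

The trade-off is a little extra memory for the dictionary in exchange for a constant-time lookup per ticket.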

# Function for multi-ticket probability

Lottery addicts usually play more than one ticket on a single drawing, thinking that this might significantly increase their chances of winning. To help them better estimate those chances, we'll write a function that calculates the probability of winning for any number of different tickets.

For this function, users won't submit their lottery numbers, only the number of tickets they are playing.

In [17]:
def multi_ticket_probability(n):
    total_comb = combination(49, 6)
    probability = (n / total_comb) * 100
    formatted_number = f"{probability:.10f}"
    return "The probability of you winning the big prize with these tickets is {} %".format(formatted_number)
    

## Testing the function

Let's say a user had purchased 20 lottery tickets.

In [18]:
multi_ticket_probability(20)

'The probability of you winning the big prize with these tickets is 0.0001430225 %'
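To put such a figure in perspective, the sketch below tabulates the same calculation for several ticket counts. It assumes that all tickets are different from one another, uses `math.comb` (Python 3.8+) instead of the notebook's `combination`, and introduces a hypothetical helper `win_probability` that returns a fraction rather than a formatted string.

```python
from math import comb

TOTAL_OUTCOMES = comb(49, 6)  # 13,983,816 possible six-number combinations

def win_probability(n_tickets):
    """Probability (0 to 1) that one of n different tickets wins the big prize."""
    if not 0 <= n_tickets <= TOTAL_OUTCOMES:
        raise ValueError("n_tickets must be between 0 and the number of possible outcomes")
    return n_tickets / TOTAL_OUTCOMES

# Each different ticket covers exactly one of the equally likely outcomes.
for n in [1, 20, 100, 10_000, 1_000_000, TOTAL_OUTCOMES // 2, TOTAL_OUTCOMES]:
    print(f"{n:>10,} tickets: {win_probability(n) * 100:.7f} %")
```

Reaching a 50% chance takes 6,991,908 different tickets, which is half of all possible combinations.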

# Fewer winning numbers

In most 6/49 lotteries there are smaller prizes if a player's ticket matches two, three, four, or five of the six numbers drawn. As a consequence, users might be interested in knowing the probability of having two, three, four, or five winning numbers.

In [26]:
def probability_exactly_n(integer):
    poss_for_chosen_numbers = combination(6, integer)
    poss_for_remaining_numbers = combination(49 - 6, 6 - integer)
    total_possible_outcomes = poss_for_chosen_numbers * poss_for_remaining_numbers
    possible_outcomes_in_6_49 = combination(49, 6)
    p_winning_number = total_possible_outcomes / possible_outcomes_in_6_49
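    # Convert to a percentage first, then round, to avoid floating-point
    # artifacts from multiplying an already rounded value by 100.
    formatted_number = round(p_winning_number * 100, 3)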
    return "The probability of you winning {} exact numbers is {} %".format(integer, formatted_number)

## Testing the function

Let's say a user wants to know the probability of matching exactly 3 numbers.

In [27]:
probability_exactly_n(3)

'The probability of you winning 3 exact numbers is 1.765 %'

In [28]:
probability_exactly_n(2)

'The probability of you winning 2 exact numbers is 13.238 %'

We can see that the probability rises as the number of required matches drops: about 1.8% for exactly three winning numbers versus about 13.2% for exactly two.
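The objective also mentions "at least" questions, such as the probability of having at least five winning numbers. One way to sketch this, assuming the same counting argument as above, is to sum the favorable outcomes for every match count from n to 6. `probability_at_least_n` is a hypothetical helper, it uses `math.comb` (Python 3.8+), and it returns a fraction rather than a formatted string.

```python
from math import comb

def probability_at_least_n(n):
    """Probability (0 to 1) that one ticket matches at least n of the six drawn numbers.

    Hypothetical helper; assumes 0 <= n <= 6.
    """
    # For each exact match count k: choose k of the 6 winning numbers
    # and the remaining 6 - k from the 43 non-winning numbers.
    favorable = sum(comb(6, k) * comb(43, 6 - k) for k in range(n, 7))
    return favorable / comb(49, 6)

for n in [5, 4, 3, 2]:
    print(f"At least {n} winning numbers: {probability_at_least_n(n) * 100:.5f} %")
```

For n = 5 the favorable count is 6 × 43 + 1 = 259 outcomes out of 13,983,816.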

# Conclusion

We built four functions that can be embedded into the app, allowing users to estimate their chances of winning in several ways.
These functions are:

* Function to calculate probability for a single ticket - Allows the user to calculate the probability of winning the big prize (always outputs the same answer, whatever numbers are entered).

* Function to match numbers with historical draws - Allows the user to compare their numbers with historical draws (from Canada) and see whether that combination has ever won the big prize.

* Function for multi-ticket probability - Allows the user to check the probability of winning the big prize if they buy more than one ticket.

* Function for fewer winning numbers - Allows the user to know the probability of matching exactly two, three, four, or five of the six numbers drawn.