# Mobile App for Lottery Prediction
## Introduction
Lottery addiction is a problem we all know too well. Although it starts out as a fun activity, it often turns into a habit and can end in addiction.

Like sports betting addicts, lottery addicts commonly spend their savings and take out loans to keep playing. This leaves them in debt and can lead them to borrow more or even steal.

Suppose a medical institute that aims to prevent and treat gambling addictions wants to build a dedicated mobile app to help lottery addicts better estimate their chances of winning. The institute has a team of engineers who will build the app, but they first need the logical core of the app, which calculates the probabilities.

I am going to zero in on the 6/49 lottery and develop functions that will help users answer questions like these:

* What is the probability that I will win the big prize with just one ticket?
* What is the probability that I will win the big prize if I play multiple tickets?
* What is the probability that I will have at least five winning numbers on a single ticket?

For the purpose of this project, I will consider historical data from the national 6/49 lottery game in Canada. This data set is available on Kaggle and contains data for 3,665 drawings made between 1982 and 2018.

## Core Functions
Throughout the project, I will need to calculate probabilities and combinations repeatedly.

In the 6/49 lottery, six numbers are drawn from a set of 49 numbers that range from 1 to 49. The drawing is done without replacement, which means once a number is drawn, it's not put back in the set.

As a consequence, I will start by writing two functions that we'll use throughout this project:

* A function that calculates factorials; and
* A function that calculates combinations.


To calculate factorials, this is the formula:

$$n! = n \times (n-1) \times (n-2) \times \dots \times 2 \times 1$$

In [4]:
#Function for computing the factorial.
def factorial(n):
    final_product = 1
    for i in range(n, 0, -1):
        final_product *= i
    return final_product

To find the number of combinations when we're sampling without replacement and taking only k objects from a group of n objects, we can use this formula:

$$_nC_k = \binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$

In [5]:
#Function for calculating combinations.
def combinations(n, k):
    numerator = factorial(n)
    denominator = factorial(k) * factorial(n-k)
    return numerator/denominator
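
As a quick sanity check, the two helpers can be compared against Python's standard library. The sketch below assumes Python 3.8 or later (needed for `math.comb` and `math.perm`) and redefines `factorial()` so that it runs on its own. `combinations_exact()` is a hypothetical variant, not the notebook's function: it uses integer division so that the result stays an exact integer instead of a float.

```python
import math

def factorial(n):
    """Iterative factorial, same logic as the helper above."""
    final_product = 1
    for i in range(n, 0, -1):
        final_product *= i
    return final_product

def combinations_exact(n, k):
    """Hypothetical integer variant of combinations(): // avoids floats."""
    return factorial(n) // (factorial(k) * factorial(n - k))

# Cross-check against the standard library.
assert factorial(6) == math.factorial(6) == 720
assert combinations_exact(49, 6) == math.comb(49, 6) == 13983816

# Same count from first principles: ordered draws without replacement
# (49 * 48 * 47 * 46 * 45 * 44) divided by the 6! orderings of each ticket.
assert math.perm(49, 6) // math.factorial(6) == 13983816

print("{:,}".format(combinations_exact(49, 6)))  # 13,983,816
```

The float returned by `combinations()` is accurate for numbers of this size, so the rest of the notebook is unaffected either way.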

## Probability of Winning with One Ticket
One of the main questions that the app is supposed to help answer is: What is the probability that I will win the big prize with just one ticket?

Keep in mind that in the 6/49 lottery, a player chooses 6 out of 49 numbers for a single ticket.

Next, I will write a function that calculates the probability of winning the big prize with any given ticket.

According to the engineering team, the following details should be noted.

* Inside the app, the user inputs six different numbers from 1 to 49.
* Under the hood, the six numbers will come as a Python list, which will serve as the single input to our function.
* The probability value must be printed in a friendly way — in a way that people without any probability training are able to understand.

In [6]:
#Function for finding the probability of winning with one ticket.
def one_ticket_probability(six_numbers):
    no_combinations = combinations(49, 6)
    probability_one_ticket = 1/no_combinations
    probability = probability_one_ticket * 100
    # Print the message directly; print() returns None, so there is nothing useful to return.
    print('''You have a {:.7f}% chance of winning the big prize with one ticket when you use the numbers {}!
This means you have 1 in {:,} chances of winning the lottery.'''.format(probability, six_numbers, int(no_combinations)))

We now test the function on two different inputs.

In [7]:
one_ticket_probability([1,2,43,13,5,6])

You have a 0.0000072% chance of winning the big prize with one ticket when you use the numbers [1, 2, 43, 13, 5, 6]!
This means you have 1 in 13,983,816 chances of winning the lottery.


In [8]:
one_ticket_probability([12,41,33,21,7,9])

You have a 0.0000072% chance of winning the big prize with one ticket when you use the numbers [12, 41, 33, 21, 7, 9]!
This means you have 1 in 13,983,816 chances of winning the lottery.
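
Both tickets give the same result because the probability does not depend on which six numbers are chosen; the list is only echoed back in the message. The function also does not check that the list really holds six different numbers from 1 to 49. A minimal sketch of such a check follows; `validate_ticket()` is a hypothetical helper that is not part of the original notebook.

```python
def validate_ticket(six_numbers):
    """Return True if the list holds six different integers between 1 and 49."""
    return (
        len(six_numbers) == 6
        and len(set(six_numbers)) == 6  # no repeated numbers
        and all(isinstance(n, int) and 1 <= n <= 49 for n in six_numbers)
    )

print(validate_ticket([1, 2, 43, 13, 5, 6]))   # True
print(validate_ticket([1, 2, 43, 13, 5, 5]))   # False: 5 appears twice
print(validate_ticket([1, 2, 43, 13, 5, 50]))  # False: 50 is out of range
```

The app could run a check like this before calling `one_ticket_probability()`, though where validation belongs is a design decision for the engineering team.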


## Historical Data Check for Canada Lottery

Here we will add a feature that lets users compare their tickets against the historical lottery data and see whether they would ever have won.

This dataset is available on Kaggle.

Let's scrutinize the data...

In [19]:
#Importing the dataset and saving it as a pandas DataFrame
import pandas as pd
lottery_canada = pd.read_csv("649.csv")

#Printing the number of rows and columns
shape = lottery_canada.shape
print("The dataset has {} rows and {} columns.".format(shape[0],shape[1]))

The dataset has 3665 rows and 11 columns.


In [20]:
#Printing the first three rows
lottery_canada.head(3)

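| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 6/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 6/19/1982 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 6/26/1982 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |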


In [21]:
#Printing the last three rows
lottery_canada.tail(3)

| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3662 | 649 | 3589 | 0 | 6/13/2018 | 6 | 22 | 24 | 31 | 32 | 34 | 16 |
| 3663 | 649 | 3590 | 0 | 6/16/2018 | 2 | 15 | 21 | 31 | 38 | 49 | 8 |
| 3664 | 649 | 3591 | 0 | 6/20/2018 | 14 | 24 | 31 | 35 | 37 | 48 | 17 |


The data set contains the historical data for 3,665 drawings, dating from 1982 to 2018. We can find the six numbers drawn in the following six columns:

* NUMBER DRAWN 1
* NUMBER DRAWN 2
* NUMBER DRAWN 3
* NUMBER DRAWN 4
* NUMBER DRAWN 5
* NUMBER DRAWN 6

## Function for Historical Data Check

We are now going to write a function that helps users compare their ticket with the historical data from the Canada lottery.

Here are a few things that we will consider when writing the function:

* To use the lottery app, the user will input 6 different numbers from 1 to 49.
* The 6 numbers will be presented as a Python list under the hood and will serve as the input to our function.
* The function will print:
 - the frequency of the selected combination in the Canada data set
 - the probability of winning the big prize with the selected combination in the next drawing.


In [46]:
#Function for extracting winning numbers.
def extract_numbers(row):
    row = row[4:10]
    row = set(row.values)
    return row


# Using extract_numbers function to extract all winning numbers.
won_lottery = lottery_canada.apply(extract_numbers,axis = 1)
    
won_lottery.head()

0    {3, 41, 11, 12, 43, 14}
1    {33, 36, 37, 39, 8, 41}
2     {1, 6, 39, 23, 24, 27}
3     {3, 9, 10, 43, 13, 20}
4    {34, 5, 14, 47, 21, 31}
dtype: object

In [45]:
def check_historical_occurrence(user_list, user_series):
    # Compare as sets so that the order of the numbers does not matter.
    user_numbers_set = set(user_list)
    check_occurrence = user_series == user_numbers_set
    n_occurrences = sum(check_occurrence)
    
    if n_occurrences == 0:
        print('The combination {} has never occurred. Your chance of winning the big prize in the next drawing with the combination {} is 0.0000072%, or 1 in 13,983,816.'.format(user_list, user_list))
        
    else:
        print('The combination {} has occurred {} time(s). Your chance of winning the big prize in the next drawing with the combination {} is 0.0000072%, or 1 in 13,983,816.'.format(user_list, n_occurrences, user_list))

In [49]:
#testing the function
check_historical_occurrence([5,20,14,23,15,44],won_lottery)

The combination [5, 20, 14, 23, 15, 44] has never occurred. Your chance of winning the big prize in the next drawing with the combination [5, 20, 14, 23, 15, 44] is 0.0000072%, or 1 in 13,983,816.


In [50]:
# Test for numbers that won the lottery
check_historical_occurrence([3, 41, 43, 12, 11, 14], won_lottery)

The combination [3, 41, 43, 12, 11, 14] has occurred 1 time(s). Your chance of winning the big prize in the next drawing with the combination [3, 41, 43, 12, 11, 14] is 0.0000072%, or 1 in 13,983,816.
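
The same lookup can be sketched without pandas, which may be useful if the app's core should not depend on it. This is an alternative under stated assumptions, not the notebook's method: `count_occurrences()` is a hypothetical helper, and `past_draws` holds only the first three drawings from the data preview rather than the full data set.

```python
from collections import Counter

# First three drawings shown in the data preview above.
past_draws = [
    [3, 11, 12, 14, 41, 43],
    [8, 33, 36, 37, 39, 41],
    [1, 6, 23, 24, 27, 39],
]

# frozenset is hashable and ignores order, so it can serve as a Counter key.
draw_counts = Counter(frozenset(draw) for draw in past_draws)

def count_occurrences(user_list, counts):
    """Hypothetical helper: how often the combination appears in counts."""
    return counts[frozenset(user_list)]

print(count_occurrences([3, 41, 43, 12, 11, 14], draw_counts))  # 1
print(count_occurrences([5, 20, 14, 23, 15, 44], draw_counts))  # 0
```

Building the counter once makes every later lookup a single dictionary access instead of a scan over all past drawings.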


The point of this check is to show users whether their combination has ever been drawn. Assuming every drawing is independent, past results do not change the odds of a future drawing:

* A combination that has occurred in the past is exactly as likely to be drawn again as any other combination.
* A combination that has never occurred is no more likely to be drawn next time; it still has a 1 in 13,983,816 chance.
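
To put the historical check in perspective, the sketch below estimates how likely it is that one specific combination appears at least once in 3,665 drawings. It assumes that every drawing is independent and uniformly random, which is a modelling assumption rather than something verified from the data; the variable names are illustrative.

```python
import math

total_combinations = math.comb(49, 6)  # 13,983,816 possible tickets
n_drawings = 3665                      # drawings in the Canada data set

# Chance that one specific combination matches a single drawing.
p_single = 1 / total_combinations

# Chance that it matched at least once across all past drawings,
# assuming the drawings are independent and uniformly random.
p_seen_before = 1 - (1 - p_single) ** n_drawings

print("{:.4%}".format(p_seen_before))  # about 0.026%
```

Under these assumptions almost every combination a user enters will show up as never drawn, which says nothing about its chance in the next drawing.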

Lottery addicts often play more than one ticket on a single drawing, assuming that this increases their chances of winning significantly. To help them better estimate those chances, the next function calculates the probability of winning for any number of different tickets.

Here are a few important details to consider when writing the function:

* Users will input the number of different tickets they would like to play, without specifying the combinations.
* The function will receive an integer from 1 to 13,983,816 as input.
* The function should produce a personalized message about the chances of winning the big prize based on the number of different tickets entered.

## Multi-ticket Probability

In [58]:
#Function for finding the probability of winning with multiple tickets.
def multi_ticket_probability(no_of_tickets):
    total_combinations = combinations(49,6)
    probability = (1 /total_combinations) * no_of_tickets
    
    return "If you purchase {:,} tickets, you have {:%} chance of winning the lottery.".format(no_of_tickets, probability)

In [72]:
# Testing the function using the following inputs

test = [1, 10, 100, 10000, 1000000, 6991908, 13983816]

for x in test:
    print(multi_ticket_probability(x))
    print('--------------------------') # output delimiter

If you purchase 1 tickets, you have 0.000007% chance of winning the lottery.
--------------------------
If you purchase 10 tickets, you have 0.000072% chance of winning the lottery.
--------------------------
If you purchase 100 tickets, you have 0.000715% chance of winning the lottery.
--------------------------
If you purchase 10,000 tickets, you have 0.071511% chance of winning the lottery.
--------------------------
If you purchase 1,000,000 tickets, you have 7.151124% chance of winning the lottery.
--------------------------
If you purchase 6,991,908 tickets, you have 50.000000% chance of winning the lottery.
--------------------------
If you purchase 13,983,816 tickets, you have 100.000000% chance of winning the lottery.
--------------------------


These results show how the chance of winning grows with the number of different tickets played. The more tickets you have, the higher your chance, and the growth is linear because each additional ticket covers one more of the 13,983,816 possible combinations.
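
The relationship can also be read in the other direction: how many different tickets does a given chance require, and what would they cost? The sketch below is an illustration, not part of the original notebook. `tickets_needed()` is a hypothetical helper, and the $3 price per combination is the figure used in the conclusion.

```python
import math

TOTAL_COMBINATIONS = math.comb(49, 6)  # 13,983,816
TICKET_PRICE = 3  # dollars per combination; the figure used in the conclusion

def tickets_needed(target_probability):
    """Hypothetical helper: fewest different tickets that reach the target chance."""
    return math.ceil(target_probability * TOTAL_COMBINATIONS)

for target in (0.01, 0.10, 0.50):
    tickets = tickets_needed(target)
    cost = tickets * TICKET_PRICE
    print("{:.0%} chance: {:,} tickets, ${:,}".format(target, tickets, cost))
```

For a 50% chance this gives 6,991,908 tickets, the same figure used in the test above.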

## Probability of Winning Smaller Prizes
In 6/49 lotteries, there are smaller prizes if a player's ticket matches two, three, four, or five of the six numbers drawn. Consequently, users might be interested in knowing the probability of having two, three, four, or five winning numbers.

These are details that should be noted:
* Inside the app, the user inputs:
  - six different numbers from 1 to 49; and
  - an integer between 2 and 5 that represents the number of winning numbers expected
* The function prints information about the probability of having the inputted number of winning numbers.


In [69]:
#Function for calculating the probabilities of having exactly two, three, four, or five winning numbers.
def probability_less_6(n):
    ticket_combination = combinations(6,n)
    combination_remaining = combinations(43,6-n)
    winning_outcomes = ticket_combination*combination_remaining
    
    total_no_outcomes = combinations(49,6)
    
    probability = winning_outcomes/total_no_outcomes
    combinations_simplified = round(total_no_outcomes/winning_outcomes) 
    
    print("You have {:%} probability of getting exactly {} winning numbers".format(probability,n))
    print("In other words, you have a 1 in {:,} chances to win.".format(combinations_simplified))

In [71]:
# Testing the function probability_less_6
test_case = [2,3,4,5]

for number in test_case:
    # probability_less_6() prints its own message and returns None, so it is not wrapped in print().
    probability_less_6(number)
    print('--------------------------')

You have 13.237803% probability of getting exactly 2 winning numbers
In other words, you have a 1 in 8 chances to win.
--------------------------
You have 1.765040% probability of getting exactly 3 winning numbers
In other words, you have a 1 in 57 chances to win.
--------------------------
You have 0.096862% probability of getting exactly 4 winning numbers
In other words, you have a 1 in 1,032 chances to win.
--------------------------
You have 0.001845% probability of getting exactly 5 winning numbers
In other words, you have a 1 in 54,201 chances to win.
--------------------------


Two things stand out from these results:

* Given an integer between 2 and 5, the function prints the probability of having exactly that many winning numbers.

* The fewer winning numbers required, the higher the probability.
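
One way to check the counting argument behind `probability_less_6()` is to compute the probability for every possible number of matches, from 0 to 6, and confirm that the seven cases add up to exactly 1. The sketch below uses `fractions.Fraction` and `math.comb` (Python 3.8+) for exact arithmetic; the dictionary name `exact` is illustrative.

```python
from fractions import Fraction
from math import comb

total = comb(49, 6)

# Exact probability of matching exactly k of the six drawn numbers:
# choose k of the 6 winning numbers and 6 - k of the 43 losing numbers.
exact = {k: Fraction(comb(6, k) * comb(43, 6 - k), total) for k in range(7)}

# The seven cases are mutually exclusive and cover every possible ticket.
assert sum(exact.values()) == 1

for k, p in exact.items():
    print("exactly {}: {:.6%}".format(k, float(p)))
```

The values for 2 to 5 matches agree with the output above; the table also shows the cases of 0, 1, and 6 matches, which the function does not cover.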



Let's make some changes to the probability_less_6() function to calculate the probability of having at least 2, 3, 4, or 5 winning numbers.

For every entered number n, the new function will sum the number of winning outcomes for having exactly n, n+1, ..., 6 winning numbers.

For instance, the number of successful outcomes for having at least 3 winning numbers will be the sum of:

* The number of winning outcomes for having exactly 3 winning numbers.
* The number of winning outcomes for having exactly 4 winning numbers.
* The number of winning outcomes for having exactly 5 winning numbers.
* The number of winning outcomes for having exactly 6 winning numbers.

In [73]:
#Function for calculating the probability of having at least 2, 3, 4, or 5 winning numbers.
def probability_at_least(n):
    tot_successful_outcomes = 0
    for x in range(n,7):
        number_of_combinations = combinations(6,x)
        number_of_combinations_left = combinations(43, 6-x)
        successful_outcomes = number_of_combinations * number_of_combinations_left
        tot_successful_outcomes = tot_successful_outcomes + successful_outcomes
    
    tot_possible_outcomes = combinations (49, 6)
    
    probability = tot_successful_outcomes / tot_possible_outcomes * 100
    combination_rounded = round(tot_possible_outcomes/tot_successful_outcomes)
    print('''You have a {:.7f}% chance of having at least {} winning numbers with this ticket.
This means you have 1 in {} chances of winning'''.format(probability, n, combination_rounded))

Now I will test the probability_at_least() function with all 4 possible inputs...

In [74]:
for winning_numbers in [2,3,4,5]:
    probability_at_least(winning_numbers)
    print("-------------------------")

You have a 15.1015574% chance of having at least 2 winning numbers with this ticket.
This means you have 1 in 7 chances of winning
-------------------------
You have a 1.8637545% chance of having at least 3 winning numbers with this ticket.
This means you have 1 in 54 chances of winning
-------------------------
You have a 0.0987141% chance of having at least 4 winning numbers with this ticket.
This means you have 1 in 1013 chances of winning
-------------------------
You have a 0.0018521% chance of having at least 5 winning numbers with this ticket.
This means you have 1 in 53992 chances of winning
-------------------------
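
The same results can be cross-checked with the complement rule: the probability of at least n matches equals 1 minus the probability of fewer than n matches. The sketch below is an independent check with exact fractions, not a replacement for the function above; `p_exactly()` and `p_at_least()` are hypothetical names.

```python
from fractions import Fraction
from math import comb

def p_exactly(k):
    """Exact probability of matching exactly k of the six drawn numbers."""
    return Fraction(comb(6, k) * comb(43, 6 - k), comb(49, 6))

def p_at_least(n):
    """Hypothetical variant: 1 minus the probability of fewer than n matches."""
    return 1 - sum(p_exactly(k) for k in range(n))

# Summing upward from n and subtracting the lower cases from 1
# give the same exact fraction.
for n in (2, 3, 4, 5):
    direct = sum(p_exactly(k) for k in range(n, 7))
    assert p_at_least(n) == direct
    print("at least {}: {:.7f}%".format(n, float(p_at_least(n) * 100)))
```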


## Conclusion

I began this project with the goal to write the logic for an app that provides lottery addicts with better estimates of their chances of winning the lottery.


To achieve this, I developed the following functions:

* one_ticket_probability() — calculates the probability of winning the lottery with only one ticket.
* check_historical_occurrence() — checks if a particular combination occurred in the Canada lottery data set.
* multi_ticket_probability() — calculates the probability of winning the lottery with any number of tickets between 1 and 13,983,816.
* probability_less_6() — calculates the probability of having exactly two, three, four or five winning numbers to win smaller prizes.
* probability_at_least() - calculates the probability of having at least two, three, four or five winning numbers to win smaller prizes.



Here are the questions we started with and the answers we got:

* What is the probability that I will win the big prize with just one ticket?
 - With a single ticket, you have a 0.0000072% chance of winning the big prize, which is 1 in 13,983,816.

* What is the probability that I will win the big prize if I play multiple tickets?
 - The more different tickets you play, the higher your chances of winning the lottery, but the chance only becomes substantial with a very large number of tickets, which will cost you a lot of money.
 - Given that a combination costs $3:
   - 3 million dollars' worth of tickets will only give you about a 7.2% chance.
   - You will need about 21 million dollars' worth of tickets to get a 50% chance of winning.



* What is the probability that I will win smaller prizes?
 - The probability of winning smaller prizes is relatively higher with a smaller number of expected winning numbers. You have a better chance of getting exactly 2 winning numbers (13.238%) than getting exactly 5 winning numbers (0.002%).



* What is the probability that I will have at least five winning numbers on a single ticket?
 - You have 1 in 53,992 chances of having at least 5 winning numbers on a ticket. This means you are 5 times more likely to win an Oscar award than you are to have at least 5 winning numbers on a 6/49 lottery ticket. So, probably enrolling in acting classes may be a better investment than buying lottery tickets.

