In this project, we will try to contribute to the development of a mobile app aimed at helping individuals understand their chances of actually winning a lottery. The app idea originates from a medical institute that specializes in treating gambling addictions.

While the institute already has a team in place for developing the app, we have been tasked with developing the core logic to determine the proper probabilities of winning. For the first version of the app, we want to focus on the 6/49 lottery and answer the following questions:

* What is the probability of winning the big prize with a single ticket?
* What is the probability of winning the big prize if we play 40 different tickets (or any other number)?
* What is the probability of having at least five (or four, or three) winning numbers on a single ticket?

*The scenario we're following throughout this project is fictional — the main purpose is to practice applying probability and combinatorics (permutations and combinations) concepts in a setting that simulates a real-world scenario.*

## Core Functions

We will start off by writing two functions that will aid us in the rest of the project:

* *factorial()* - a function dedicated to calculating factorials
* *combinations()* - a function dedicated to calculating combinations

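```python
def factorial(n):
    # Multiply n * (n - 1) * ... * 1; returns 1 for n = 0
    final_product = 1
    for i in range(n, 0, -1):
        final_product *= i
    return final_product

def combinations(n, k):
    # Number of ways to choose k items out of n, ignoring order: n! / (k! (n - k)!)
    numerator = factorial(n)
    denominator = factorial(k) * factorial(n - k)
    # Integer division keeps the result an exact int instead of a float
    return numerator // denominator
```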
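As a sanity check, here is a minimal sketch that compares the hand-rolled helpers with the standard library. It assumes Python 3.8 or later, where `math.comb` is available. The helpers are redefined here so the block runs on its own, and it confirms that there are 13,983,816 possible 6/49 tickets.

```python
import math

def factorial(n):
    # Multiply n * (n - 1) * ... * 1; the empty product for n = 0 is 1
    result = 1
    for i in range(n, 0, -1):
        result *= i
    return result

def combinations(n, k):
    # n! / (k! (n - k)!), using integer division so the result stays exact
    return factorial(n) // (factorial(k) * factorial(n - k))

# Compare against the standard library for every k that can occur in a 6/49 draw
for k in range(7):
    assert combinations(49, k) == math.comb(49, k)

print(combinations(49, 6))  # 13983816
```

Integer division matters for larger inputs. With true division (`/`) the result is a float, and floats cannot represent every integer above 2**53 exactly.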

## One-ticket Probability

We need to build another function that calculates the probability of winning the big prize with any given ticket. In each drawing, six numbers are selected from a total of 49. A ticket wins the big prize only if all six of its numbers match the numbers drawn.
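Every ticket is one of the $\binom{49}{6}$ equally likely combinations, and only one of them matches the draw. The single-ticket probability therefore works out as:

$$
P(\text{big prize}) = \frac{1}{\binom{49}{6}} = \frac{6!\,43!}{49!} = \frac{1}{13{,}983{,}816} \approx 7.15 \times 10^{-8}
$$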

The engineering team gave us the following details:

* Inside the app, the user inputs six different numbers from 1 to 49.
* Under the hood, the six numbers will come as a Python list and serve as an input to our function.
* The engineering team wants the function to print the probability in a friendly way that people without any probability training can understand.
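The requirements above assume the app always sends six different numbers from 1 to 49. The following is a minimal sketch of a guard that enforces this before any probability is printed. The helper name `validate_ticket()` is hypothetical and is not part of the engineering team's spec.

```python
def validate_ticket(user_numbers):
    # Hypothetical helper (not in the spec): reject malformed tickets early
    if len(user_numbers) != 6:
        raise ValueError("A 6/49 ticket needs exactly six numbers")
    for number in user_numbers:
        # bool is a subclass of int, so it is rejected explicitly
        if isinstance(number, bool) or not isinstance(number, int):
            raise ValueError("Each number must be an integer")
        if not 1 <= number <= 49:
            raise ValueError("Each number must be between 1 and 49")
    if len(set(user_numbers)) != 6:
        raise ValueError("The six numbers must all be different")
    # Sorting only helps display; the order of the numbers does not affect the odds
    return sorted(user_numbers)


print(validate_ticket([2, 43, 22, 23, 11, 5]))  # [2, 5, 11, 22, 23, 43]
```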

```python
def one_ticket_probability(user_numbers):
    # Every 6-number combination is equally likely, and only one of them wins
    total_outcome = combinations(49, 6)
    probability_one_ticket = 1 / total_outcome
    percentage = probability_one_ticket * 100

    print("Your numbers are: {}".format(user_numbers))
    print("Your chance of winning the prize is: {:.6f}%".format(percentage))
    print("This means that you have a 1 in {:,} chance of winning".format(int(total_outcome)))
```

```python
test_1 = [2, 43, 22, 23, 11, 5]
one_ticket_probability(test_1)
```

```
Your numbers are: [2, 43, 22, 23, 11, 5]
Your chance of winning the prize is: 0.000007%
This means that you have a 1 in 13,983,816 chance of winning
```


```python
test_2 = [21, 32, 41, 1, 2, 3]
one_ticket_probability(test_2)
```

```
Your numbers are: [21, 32, 41, 1, 2, 3]
Your chance of winning the prize is: 0.000007%
This means that you have a 1 in 13,983,816 chance of winning
```


## Historical Data Check for Canada Lottery

The institute also wants us to use the data from the national 6/49 lottery in Canada. The dataset contains 3,665 drawings dating from 1982 to 2018.

```python
import pandas as pd

# Local path to the 6/49 dataset; adjust it to wherever 649.csv is stored
lottery = pd.read_csv("C:\\Users\\pc\\Desktop\\Dataquest\\Mobile App for Lottery Addiction\\649.csv")
print(lottery.shape)
```

```
(3665, 11)
```


```python
lottery.head(3)
```

|   | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 6/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 6/19/1982 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 6/26/1982 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |


```python
lottery.tail(3)
```

|   | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3662 | 649 | 3589 | 0 | 6/13/2018 | 6 | 22 | 24 | 31 | 32 | 34 | 16 |
| 3663 | 649 | 3590 | 0 | 6/16/2018 | 2 | 15 | 21 | 31 | 38 | 49 | 8 |
| 3664 | 649 | 3591 | 0 | 6/20/2018 | 14 | 24 | 31 | 35 | 37 | 48 | 17 |


## Function for Historical Data Check

We have also been tasked with writing a function that helps users determine whether the numbers they entered would ever have won the big prize in the past. Here are some points we should be aware of:

* Inside the app, the user inputs six different numbers from 1 to 49.
* Under the hood, the six numbers will come as a Python list and serve as an input to our function.
* The engineering team wants us to write a function that prints:
    * the number of times the combination selected occurred; and
    * the probability of winning the big prize in the next drawing with that combination.
    
Let's start by extracting the numbers from the lottery data. We will create the function *extract_numbers()* that will go over every row of the dataset and extract the six winning numbers as a Python set.

```python
def extract_numbers(row):
    # Positions 4-9 hold NUMBER DRAWN 1 to NUMBER DRAWN 6 (the bonus number is excluded)
    row = row.iloc[4:10]
    row = set(row.values)
    return row

winning_numbers = lottery.apply(extract_numbers, axis=1)
winning_numbers.head()
```

```
0    {3, 41, 11, 12, 43, 14}
1    {33, 36, 37, 39, 8, 41}
2     {1, 6, 39, 23, 24, 27}
3     {3, 9, 10, 43, 13, 20}
4    {34, 5, 14, 47, 21, 31}
dtype: object
```
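`extract_numbers()` relies on the drawn numbers sitting at positions 4 to 9 of each row. The following minimal, pandas-free sketch selects the six columns by name instead, using the standard `csv` module. The three sample rows are copied from `lottery.head(3)` above, and the column names are assumed to match the real file's header. The function name `extract_winning_sets()` is hypothetical.

```python
import csv
import io

# Three rows copied from the head of the dataset (header names assumed to match 649.csv)
SAMPLE = """PRODUCT,DRAW NUMBER,SEQUENCE NUMBER,DRAW DATE,NUMBER DRAWN 1,NUMBER DRAWN 2,NUMBER DRAWN 3,NUMBER DRAWN 4,NUMBER DRAWN 5,NUMBER DRAWN 6,BONUS NUMBER
649,1,0,6/12/1982,3,11,12,14,41,43,13
649,2,0,6/19/1982,8,33,36,37,39,41,9
649,3,0,6/26/1982,1,6,23,24,27,39,34
"""

DRAWN_COLUMNS = [f"NUMBER DRAWN {i}" for i in range(1, 7)]

def extract_winning_sets(csv_file):
    # One set per drawing, selected by column name; the bonus number is left out
    reader = csv.DictReader(csv_file)
    return [{int(row[col]) for col in DRAWN_COLUMNS} for row in reader]

winning_sets = extract_winning_sets(io.StringIO(SAMPLE))
print(winning_sets[0] == {3, 11, 12, 14, 41, 43})  # True
```

With the real file, `open("649.csv", newline="")` would take the place of `io.StringIO(SAMPLE)`.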

We will now write a function called *check_historical_occurrence()* that takes the user's numbers and the historical winning numbers. It prints how many times that combination has occurred and the probability of winning the next drawing with it.

```python
def check_historical_occurrence(user_numbers, winning_numbers):
    user_numbers = set(user_numbers)
    # Element-wise comparison: True for every past drawing whose set equals the user's set
    check_occurrence = winning_numbers == user_numbers
    n_occurrences = check_occurrence.sum()

    # Past results don't change the odds of the next drawing
    total_outcomes = combinations(49, 6)
    percentage = 1 / total_outcomes * 100

    if n_occurrences == 0:
        print('''The combination {} has never occurred.
This doesn't mean it's more likely to occur now. Your chance of winning the big prize in the next drawing using the combination {} is {:.7f}%.
In other words, you have a 1 in {:,} chance of winning.'''.format(user_numbers, user_numbers, percentage,
                                                                int(total_outcomes)))

    else:
        print('''The number of times the combination {} has occurred in the past is {}.
Your chance of winning the big prize in the next drawing using the combination {} is {:.7f}%.
In other words, you have a 1 in {:,} chance of winning.'''.format(user_numbers, n_occurrences,
                                                                user_numbers, percentage,
                                                                int(total_outcomes)))
```

```python
test_3 = [21, 42, 23, 11, 14, 1]
check_historical_occurrence(test_3, winning_numbers)
```

```
The combination {1, 42, 11, 14, 21, 23} has never occurred.
This doesn't mean it's more likely to occur now. Your chance of winning the big prize in the next drawing using the combination {1, 42, 11, 14, 21, 23} is 0.0000072%.
In other words, you have a 1 in 13,983,816 chance of winning.
```

```python
test_4 = [34, 5, 14, 47, 21, 31]
check_historical_occurrence(test_4, winning_numbers)
```

```
The number of times the combination {34, 5, 14, 47, 21, 31} has occurred in the past is 1.
Your chance of winning the big prize in the next drawing using the combination {34, 5, 14, 47, 21, 31} is 0.0000072%.
In other words, you have a 1 in 13,983,816 chance of winning.
```
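Comparing the user's set against every row of `winning_numbers` scans the whole history on each call. If the app needs to answer many queries, one alternative is to count every historical combination once up front. The sketch below assumes the drawings are already available as a list of sets. It keys a `collections.Counter` by `frozenset`, which is hashable and ignores order. The names `build_occurrence_index()` and `count_occurrences()` are hypothetical.

```python
from collections import Counter

def build_occurrence_index(winning_sets):
    # frozenset is hashable and order-independent, so each combination gets one key
    return Counter(frozenset(drawing) for drawing in winning_sets)

def count_occurrences(index, user_numbers):
    # Counter returns 0 for combinations that never appeared
    return index[frozenset(user_numbers)]

# Three drawings copied from the winning_numbers.head() output above
history = [
    {3, 41, 11, 12, 43, 14},
    {33, 36, 37, 39, 8, 41},
    {34, 5, 14, 47, 21, 31},
]
index = build_occurrence_index(history)
print(count_occurrences(index, [34, 5, 14, 47, 21, 31]))  # 1
print(count_occurrences(index, [21, 42, 23, 11, 14, 1]))  # 0
```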


## Multi-ticket Probability

Let's go a bit deeper and look at the probability of winning the big prize when the user buys multiple tickets. The engineering team wants us to be aware of the following:

* The user will input the number of different tickets they want to play (without inputting the specific combinations they intend to play).
* Our function will see an integer between 1 and 13,983,816 (the maximum number of different tickets).
* The function should print information about the probability of winning the big prize depending on the number of different tickets played.

We will create a function *multi_ticket_probability()* which takes in the number of tickets and prints the probability.

```python
def multi_ticket_probability(n_tickets):
    total = combinations(49, 6)
    # Each distinct ticket covers a different outcome, so the probabilities add up
    probability = n_tickets / total
    percentage = probability * 100

    if n_tickets == 1:
        print('''Your chance of winning the big prize with one ticket is {:.6f}%.
In other words, you have a 1 in {:,} chance of winning.'''.format(percentage, int(total)))

    else:
        combinations_simplified = round(total / n_tickets)
        print('''Your chance of winning the big prize with {:,} different tickets is {:.6f}%.
In other words, you have a 1 in {:,} chance of winning.'''.format(n_tickets, percentage,
                                                               combinations_simplified))
```

Let's go ahead and test the function:

```python
test_5 = [1, 100, 1000, 10000]

for i in test_5:
    multi_ticket_probability(i)
    print("======================")
```

```
Your chance of winning the big prize with one ticket is 0.000007%.
In other words, you have a 1 in 13,983,816 chance of winning.
======================
Your chance of winning the big prize with 100 different tickets is 0.000715%.
In other words, you have a 1 in 139,838 chance of winning.
======================
Your chance of winning the big prize with 1,000 different tickets is 0.007151%.
In other words, you have a 1 in 13,984 chance of winning.
======================
Your chance of winning the big prize with 10,000 different tickets is 0.071511%.
In other words, you have a 1 in 1,398 chance of winning.
======================
```
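The tickets are all different and exactly one combination is drawn, so the events "ticket i wins" are mutually exclusive. The probability is therefore the number of tickets divided by 13,983,816. The following minimal sketch assumes Python 3.8+ for `math.comb`. It uses `fractions.Fraction` to keep the value exact and rejects inputs outside the 1 to 13,983,816 range from the requirements. The name `multi_ticket_fraction()` is hypothetical.

```python
import math
from fractions import Fraction

TOTAL_OUTCOMES = math.comb(49, 6)  # 13,983,816 possible tickets

def multi_ticket_fraction(n_tickets):
    # Distinct tickets cover distinct outcomes, so their win events are disjoint
    if not 1 <= n_tickets <= TOTAL_OUTCOMES:
        raise ValueError("n_tickets must be between 1 and {:,}".format(TOTAL_OUTCOMES))
    return Fraction(n_tickets, TOTAL_OUTCOMES)

print(multi_ticket_fraction(100))             # 25/3495954
print(multi_ticket_fraction(TOTAL_OUTCOMES))  # 1: buying every ticket guarantees the prize
```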


## Fewer Winning Numbers

Many 6/49 lotteries also award smaller prizes when a player's ticket matches anywhere from two to five of the six numbers drawn. Let's calculate these probabilities as well.

Here are a few details to note:

* Inside the app, the user inputs:
    * six different numbers from 1 to 49; and
    * an integer between 2 and 5 that represents the number of winning numbers expected
* Our function prints information about the probability of having a certain number of winning numbers

To find this probability, note that the specific combination on the ticket is irrelevant. We only need the integer from 2 to 5 that represents the number of winning numbers.
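The count behind this reasoning is the hypergeometric formula. For a ticket to have exactly $k$ winning numbers, $k$ of its six numbers must come from the six numbers drawn, and the remaining $6-k$ must come from the 43 numbers that were not drawn. For $k = 5$, for example:

$$
P(\text{exactly } k) = \frac{\binom{6}{k}\binom{43}{6-k}}{\binom{49}{6}}, \qquad
P(\text{exactly } 5) = \frac{\binom{6}{5}\binom{43}{1}}{\binom{49}{6}} = \frac{6 \cdot 43}{13{,}983{,}816} = \frac{258}{13{,}983{,}816} \approx \frac{1}{54{,}201}
$$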

We will do this by writing a function *probability_less_6()* that takes an integer and prints information about the chances of winning depending on that integer's value.

```python
def probability_less_6(n_winning_numbers):
    # Ways to pick exactly n_winning_numbers of the 6 numbers that were drawn
    n_combinations_ticket = combinations(6, n_winning_numbers)
    # Ways to fill the rest of the ticket from the 43 numbers that were NOT drawn
    n_combinations_remaining = combinations(43, 6 - n_winning_numbers)
    successful_outcomes = n_combinations_ticket * n_combinations_remaining
    n_combinations_total = combinations(49, 6)

    probability = successful_outcomes / n_combinations_total
    probability_percentage = probability * 100

    combinations_simplified = round(n_combinations_total / successful_outcomes)

    print('''Your chance of having exactly {} winning numbers with this ticket is {:.6f}%.
In other words, you have a 1 in {:,} chance of winning.'''.format(n_winning_numbers, probability_percentage,
                                                               int(combinations_simplified)))
```

```python
for test in [2, 3, 4, 5]:
    probability_less_6(test)
    print('===========================')
```

```
Your chance of having exactly 2 winning numbers with this ticket is 13.237803%.
In other words, you have a 1 in 8 chance of winning.
===========================
Your chance of having exactly 3 winning numbers with this ticket is 1.765040%.
In other words, you have a 1 in 57 chance of winning.
===========================
Your chance of having exactly 4 winning numbers with this ticket is 0.096862%.
In other words, you have a 1 in 1,032 chance of winning.
===========================
Your chance of having exactly 5 winning numbers with this ticket is 0.001845%.
In other words, you have a 1 in 54,201 chance of winning.
===========================
```
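The questions at the start of the project ask about having *at least* five (or four, or three) winning numbers. The events "exactly j winning numbers" are disjoint, so the at-least probability is the sum of the exact terms from k up to 6. The following is a minimal sketch of that sum, assuming Python 3.8+ for `math.comb`. The names `probability_exactly()` and `probability_at_least()` are hypothetical.

```python
import math
from fractions import Fraction

TOTAL_OUTCOMES = math.comb(49, 6)  # 13,983,816

def probability_exactly(k):
    # Choose k of the 6 drawn numbers and 6 - k of the 43 numbers not drawn
    return Fraction(math.comb(6, k) * math.comb(43, 6 - k), TOTAL_OUTCOMES)

def probability_at_least(k):
    # Disjoint "exactly j" events, so their probabilities add up
    return sum(probability_exactly(j) for j in range(k, 7))

for k in (3, 4, 5):
    p = probability_at_least(k)
    print("At least {} winning numbers: {:.6f}% (about 1 in {:,})".format(
        k, float(p) * 100, round(1 / p)))
```

Keeping the values as fractions avoids rounding until the final print. It also makes it easy to check that the exact terms for k = 0 to 6 add up to 1.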
