# Mobile App for Lottery Addiction

In this project, we aim to contribute to a medical institute's mobile app that helps users estimate their chances of winning the lottery, in order to prevent and treat gambling addiction.

The institute's team of engineers will build the app; they need us to create its logical core and calculate the probabilities.

To start, we will focus on the 6/49 lottery, a nationwide Canadian lottery game, and build functions that will enable users to answer questions like:

* What is the probability of winning the big prize with a single ticket?
* What is the probability of winning the big prize if we play 40 different tickets (or any other number)?
* What is the probability of having at least five (or four, or three, or two) winning numbers on a single ticket?

We will also consider the historical [data](https://www.kaggle.com/datascienceai/lottery-dataset), which covers 3,665 drawings dating from 1982 (when the game launched) to 2018.

## Core Functions

As mentioned in our goals above, we will answer various probability questions, so we will have to calculate probabilities and combinations repeatedly.

To make our analysis more efficient, we will start by writing two functions that we will use often:

* A function that calculates factorials; and
* A function that calculates combinations.


In [1]:
#writing a factorial() function

def factorial(n):
    final_product = 1
    for i in range(n,0,-1):
        final_product *= i
    return final_product

#writing a combinations() function

def combinations(n,k):
    numerator = factorial(n)
    denominator = factorial(k) * factorial(n-k)
    return numerator/denominator
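
As a sanity check on the two helpers above, the sketch below recomputes the number of 6/49 combinations with Python's standard library. The name `combinations_exact` is hypothetical (it is not part of the original notebook), and `math.comb` assumes Python 3.8 or later. Unlike `combinations()`, which returns a float because of the `/` division, this variant keeps the result as an exact integer.

```python
import math

def combinations_exact(n, k):
    """Number of ways to choose k items from n, as an exact integer."""
    # Floor division is exact here because k! * (n - k)! always divides n!.
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

total = combinations_exact(49, 6)
print(total)                      # 13983816
print(total == math.comb(49, 6))  # True
```

Either version gives the same count for 6/49; the integer form simply avoids the `int()` conversion when the number is formatted for display.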


## One-ticket Probability

Now we will build a function that calculates the probability of winning the big prize with the numbers a user plays on a single ticket.

As the medical institute's engineering team pointed out, we need to be aware of the following details:

* Inside the app, the user inputs six different numbers from 1 to 49.
* Under the hood, the six numbers will come as a Python list, which will serve as the single input to our function.
* The engineering team wants the function to print the probability value in a friendly way — in a way that people without any probability training are able to understand.


In [2]:
def one_ticket_probability(user_numbers):
    # user_numbers: a Python list of six different numbers from 1 to 49
    possible_outcomes = combinations(49,6)
    successful_outcome = 1
    probability = successful_outcome/possible_outcomes
    return "The probability of winning the big prize is {0:.7f} percent".format(probability*100)


one_ticket_probability([34,4,16,7,23,1])

'The probability of winning the big prize is 0.0000072 percent'

In the code above, we created a function that calculates the probability of winning the big prize when the user inputs six different numbers from 1 to 49. Because the user inputs just one combination, the number of successful outcomes is 1. We also calculated the number of possible outcomes, that is, the total number of combinations for a six-number lottery ticket.

As its output, the function returns a sentence that explains the probability of winning the lottery in a friendly way.
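
The probability above only makes sense for a valid ticket: six different numbers from 1 to 49. A minimal sketch of such a check is shown below, assuming the app might want to validate the list before computing anything. The helper `validate_ticket` is hypothetical and not part of the engineering team's specification.

```python
def validate_ticket(user_numbers):
    """Return True if the ticket is six distinct integers between 1 and 49."""
    if len(user_numbers) != 6:
        return False
    # A set drops duplicates, so a shorter set means a repeated number.
    if len(set(user_numbers)) != 6:
        return False
    return all(isinstance(n, int) and 1 <= n <= 49 for n in user_numbers)

print(validate_ticket([34, 4, 16, 7, 23, 1]))    # True
print(validate_ticket([34, 4, 16, 7, 23, 23]))   # False (repeated number)
print(validate_ticket([34, 4, 16, 7, 23, 50]))   # False (out of range)
```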


## Historical Data Check for Canada Lottery

Now that we have a function that tells users the probability of winning the big prize with a single ticket, we should also compare their ticket against the historical lottery data in Canada and determine whether they would ever have won by now.

In [3]:
#Let's start by exploring the data

import pandas as pd

lottery = pd.read_csv('649.csv')
lottery.info()
lottery.size

lottery.head(3)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3665 entries, 0 to 3664
Data columns (total 11 columns):
PRODUCT            3665 non-null int64
DRAW NUMBER        3665 non-null int64
SEQUENCE NUMBER    3665 non-null int64
DRAW DATE          3665 non-null object
NUMBER DRAWN 1     3665 non-null int64
NUMBER DRAWN 2     3665 non-null int64
NUMBER DRAWN 3     3665 non-null int64
NUMBER DRAWN 4     3665 non-null int64
NUMBER DRAWN 5     3665 non-null int64
NUMBER DRAWN 6     3665 non-null int64
BONUS NUMBER       3665 non-null int64
dtypes: int64(10), object(1)
memory usage: 315.0+ KB


Unnamed: 0,PRODUCT,DRAW NUMBER,SEQUENCE NUMBER,DRAW DATE,NUMBER DRAWN 1,NUMBER DRAWN 2,NUMBER DRAWN 3,NUMBER DRAWN 4,NUMBER DRAWN 5,NUMBER DRAWN 6,BONUS NUMBER
0,649,1,0,6/12/1982,3,11,12,14,41,43,13
1,649,2,0,6/19/1982,8,33,36,37,39,41,9
2,649,3,0,6/26/1982,1,6,23,24,27,39,34


In [4]:
lottery.tail(3)

Unnamed: 0,PRODUCT,DRAW NUMBER,SEQUENCE NUMBER,DRAW DATE,NUMBER DRAWN 1,NUMBER DRAWN 2,NUMBER DRAWN 3,NUMBER DRAWN 4,NUMBER DRAWN 5,NUMBER DRAWN 6,BONUS NUMBER
3662,649,3589,0,6/13/2018,6,22,24,31,32,34,16
3663,649,3590,0,6/16/2018,2,15,21,31,38,49,8
3664,649,3591,0,6/20/2018,14,24,31,35,37,48,17


## Function for Historical Data Check

At this stage, we are going to write a function that will enable us to compare the user's ticket against the historical lottery data in Canada and determine whether they would have ever won by now.

In [5]:
#extracting all the winning six numbers from historical data
def extract_numbers(row):
    row = row[4:10]
    row = set(row.values)
    return row
   
winning_numbers = lottery.apply(extract_numbers, axis=1)
winning_numbers.head()


0    {3, 41, 11, 12, 43, 14}
1    {33, 36, 37, 39, 8, 41}
2     {1, 6, 39, 23, 24, 27}
3     {3, 9, 10, 43, 13, 20}
4    {34, 5, 14, 47, 21, 31}
dtype: object

In [7]:
def check_historical_occurrence(user_numbers, historical_numbers):   
    '''
    user_numbers: a Python list
    historical_numbers: a pandas Series of sets, one set of six winning numbers per drawing
    '''
    
    user_numbers_set = set(user_numbers)
    check_occurrence = historical_numbers == user_numbers_set
    n_occurrences = check_occurrence.sum()
    
    if n_occurrences == 0:
        print('''The combination {} has never occurred.
This doesn't mean it's more likely to occur now. Your chances to win the big prize in the next drawing using the combination {} are 0.0000072%.
In other words, you have a 1 in 13,983,816 chances to win.'''.format(user_numbers, user_numbers))

    else:
        print('''The number of times combination {} has occurred in the past is {}.
Your chances to win the big prize in the next drawing using the combination {} are 0.0000072%.
In other words, you have a 1 in 13,983,816 chances to win.'''.format(user_numbers, n_occurrences,
                                                                            user_numbers))

In [8]:
test_input = [40, 18, 27, 20, 5, 26]
check_historical_occurrence(test_input, winning_numbers)

The combination [40, 18, 27, 20, 5, 26] has never occurred.
This doesn't mean it's more likely to occur now. Your chances to win the big prize in the next drawing using the combination [40, 18, 27, 20, 5, 26] are 0.0000072%.
In other words, you have a 1 in 13,983,816 chances to win.


In [9]:
test_input = [33, 36, 37, 39, 8, 41]
check_historical_occurrence(test_input, winning_numbers)

The number of times combination [33, 36, 37, 39, 8, 41] has occurred in the past is 1.
Your chances to win the big prize in the next drawing using the combination [33, 36, 37, 39, 8, 41] are 0.0000072%.
In other words, you have a 1 in 13,983,816 chances to win.
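
The historical check works because sets ignore order: a ticket matches a drawing whenever both contain the same six numbers. The sketch below illustrates the same idea without pandas, using only the first three drawings shown in the data preview. The names `sample_draws`, `draw_counts` and `count_occurrences` are hypothetical, and the `Counter` lookup is an alternative to the Series comparison used above, not the notebook's own method.

```python
from collections import Counter

# First three drawings from the data preview (numbers drawn 1 to 6).
sample_draws = [
    [3, 11, 12, 14, 41, 43],
    [8, 33, 36, 37, 39, 41],
    [1, 6, 23, 24, 27, 39],
]

# frozenset is hashable, so each drawing can serve as a Counter key.
draw_counts = Counter(frozenset(draw) for draw in sample_draws)

def count_occurrences(user_numbers, counts):
    """Return how many past drawings contain exactly the user's numbers."""
    return counts[frozenset(user_numbers)]  # a missing key counts as 0

print(count_occurrences([33, 36, 37, 39, 8, 41], draw_counts))  # 1
print(count_occurrences([40, 18, 27, 20, 5, 26], draw_counts))  # 0
```

Building the counter once makes each later lookup a single dictionary access, which may matter if the app checks many tickets against the same history.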


## Multi-ticket Probability

Besides the function that calculates the probability of winning with one ticket, we are going to write a function that lets users calculate their chances of winning for any number of different tickets. We add this because lottery addicts usually play more than one ticket on a single drawing in order to increase their chances of winning. Our purpose here is to help them better estimate those chances.

While writing the function, we should take the following into consideration:

* The user will input the number of different tickets they want to play (without inputting the specific combinations they intend to play).
* Our function will receive an integer between 1 and 13,983,816 (the maximum number of different tickets).
* The function should print information about the probability of winning the big prize depending on the number of different tickets played.

In [20]:
def multi_ticket_probability(n_tickets):
    n_combinations = combinations(49, 6)
    probability = n_tickets/n_combinations
    percentage_form = probability * 100
    if n_tickets == 1:
        print('''Your chances to win the big prize with one ticket are {:.6f}%.
In other words, you have a 1 in {:,} chances to win.'''.format(percentage_form, int(n_combinations)))
    
    else:
        combinations_simplified = round(n_combinations / n_tickets)   
        print('''Your chances to win the big prize with {:,} different tickets are {:.6f}%.
In other words, you have a 1 in {:,} chances to win.'''.format(n_tickets, percentage_form,
                                                               combinations_simplified))


In [22]:
test_inputs = [1, 10, 100, 10000, 1000000, 6991908, 13983816]

for test_input in test_inputs:
    multi_ticket_probability(test_input)
    print('------------------------') # output delimiter

Your chances to win the big prize with one ticket are 0.000007%.
In other words, you have a 1 in 13,983,816 chances to win.
------------------------
Your chances to win the big prize with 10 different tickets are 0.000072%.
In other words, you have a 1 in 1,398,382 chances to win.
------------------------
Your chances to win the big prize with 100 different tickets are 0.000715%.
In other words, you have a 1 in 139,838 chances to win.
------------------------
Your chances to win the big prize with 10,000 different tickets are 0.071511%.
In other words, you have a 1 in 1,398 chances to win.
------------------------
Your chances to win the big prize with 1,000,000 different tickets are 7.151124%.
In other words, you have a 1 in 14 chances to win.
------------------------
Your chances to win the big prize with 6,991,908 different tickets are 50.000000%.
In other words, you have a 1 in 2 chances to win.
------------------------
Your chances to win the big prize with 13,983,816 different tickets are 100.000000%.
In other words, you have a 1 in 1 chances to win.
------------------------
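
Because the percentages are rounded to six decimal places, it can help to see the same probabilities as exact fractions. The following is a minimal sketch, assuming Python 3.8+ for `math.comb`. The function name `multi_ticket_fraction` and its range check are hypothetical additions, not part of the original function.

```python
from fractions import Fraction
from math import comb

def multi_ticket_fraction(n_tickets):
    """Exact probability of winning the big prize with n_tickets different tickets."""
    total = comb(49, 6)
    # Assumption: reject inputs outside the range stated in the requirements.
    if not 1 <= n_tickets <= total:
        raise ValueError('n_tickets must be between 1 and {:,}'.format(total))
    return Fraction(n_tickets, total)  # automatically reduced to lowest terms

print(multi_ticket_fraction(1))         # 1/13983816
print(multi_ticket_fraction(6991908))   # 1/2
print(multi_ticket_fraction(13983816))  # 1
```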

## Fewer Winning Numbers - Function


Now we are going to write one more function that lets users calculate the probability of having two, three, four, or five winning numbers, since most 6/49 lotteries award smaller prizes when a player's ticket matches two, three, four, or five of the six numbers drawn.


In [34]:
def probability_less_6(n_winning_numbers):
    n_combinations_ticket = combinations(6,n_winning_numbers)
    n_combinations_remaining = combinations(43, 6 - n_winning_numbers)
    successful_outcomes = n_combinations_ticket * n_combinations_remaining
    
    total_outcomes = combinations(49,6)
    probability = successful_outcomes/total_outcomes
    percentage_form = probability * 100
    combinations_simplified = round(total_outcomes/successful_outcomes)
    print('''Your chances to win the small prize with {} winning numbers are {:.6f}%
    In other words, you have a 1 in {:,} chances to win.'''.format(n_winning_numbers,percentage_form, int(combinations_simplified)))
      

In [35]:
test_inputs = [2,3,4,5]

for test_input in test_inputs:
    probability_less_6(test_input)
    print('------------------------') # output delimiter

Your chances to win the small prize with 2 winning numbers are 13.237803%
    In other words, you have a 1 in 8 chances to win.
------------------------
Your chances to win the small prize with 3 winning numbers are 1.765040%
    In other words, you have a 1 in 57 chances to win.
------------------------
Your chances to win the small prize with 4 winning numbers are 0.096862%
    In other words, you have a 1 in 1,032 chances to win.
------------------------
Your chances to win the small prize with 5 winning numbers are 0.001845%
    In other words, you have a 1 in 54,201 chances to win.
------------------------
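
The counting rule used above (choose which of the ticket's six numbers match, then choose the remaining numbers from the 43 that were not drawn) can be checked for consistency: the probabilities of matching exactly 0, 1, ..., 6 numbers should add up to 1. The sketch below does this with exact fractions. The name `exact_match_probability` is hypothetical, and `math.comb` assumes Python 3.8+.

```python
from fractions import Fraction
from math import comb

def exact_match_probability(k):
    """Exact probability that a ticket matches exactly k of the six drawn numbers."""
    # comb(6, k): which k drawn numbers are on the ticket;
    # comb(43, 6 - k): the ticket's other numbers, taken from the 43 not drawn.
    return Fraction(comb(6, k) * comb(43, 6 - k), comb(49, 6))

probabilities = [exact_match_probability(k) for k in range(7)]
print(sum(probabilities) == 1)          # True
print(exact_match_probability(5))       # 43/2330636
```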


## Next Steps

For the first version of the app, we coded four main functions:

* one_ticket_probability(): calculates the probability of winning the big prize with a single ticket
* check_historical_occurrence(): checks whether a certain combination has occurred in the Canada lottery data set
* multi_ticket_probability(): calculates the probability for any number of tickets between 1 and 13,983,816
* probability_less_6(): calculates the probability of having two, three, four or five winning numbers exactly

Possible features for a second version of the app include:

* Making the outputs even easier to understand by adding fun analogies (for example, we can find probabilities for strange events and compare them with the chances of winning the lottery; for instance, we can output something along the lines of "You are 100 times more likely to be the victim of a shark attack than to win the lottery")
* Combining one_ticket_probability() and check_historical_occurrence() to output information on probability and historical occurrence at the same time
* Creating a function similar to probability_less_6() that calculates the probability of having at least two, three, four or five winning numbers. For example, the number of successful outcomes for having at least four winning numbers is the sum of:
    * The number of successful outcomes for having four winning numbers exactly
    * The number of successful outcomes for having five winning numbers exactly
    * The number of successful outcomes for having six winning numbers exactly
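
One possible shape for the "at least" feature is sketched below, assuming Python 3.8+ for `math.comb`. The function name `probability_at_least` is hypothetical, and the sketch returns the probability instead of printing a message, so the wording of the output is left open.

```python
from math import comb

def probability_at_least(n_winning_numbers):
    """Probability of matching at least n_winning_numbers of the six drawn numbers."""
    total_outcomes = comb(49, 6)
    # Add the successful outcomes for every exact count from n up to 6.
    successful_outcomes = sum(
        comb(6, k) * comb(43, 6 - k)
        for k in range(n_winning_numbers, 7)
    )
    return successful_outcomes / total_outcomes

for n in [2, 3, 4, 5]:
    print('At least {} winning numbers: {:.6f}%'.format(n, probability_at_least(n) * 100))
```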