# A Mobile App for Lottery Addiction

In this project, we will play the role of a data science consultant for a medical institute that aims to help lottery addicts better estimate their chances of winning. The institute has a team of engineers who will build the app, but they need us to create its logical core and calculate the probabilities.

Many people start playing the lottery for fun, but for some the activity turns into a habit that eventually escalates into addiction. Like other compulsive gamblers, lottery addicts soon begin spending their savings and taking out loans, accumulate debts, and may eventually resort to desperate behaviors like theft.

For the first version of the app, they want us to focus on the [6/49 lottery](https://en.wikipedia.org/wiki/Lotto_6/49) and build functions that enable users to answer questions like:

* What is the probability of winning the big prize with a single ticket?
* What is the probability of winning the big prize if we play 40 different tickets (or any other number)?
* What is the probability of having at least five (or four, or three, or two) winning numbers on a single ticket?

The institute also wants us to consider historical data coming from the national 6/49 lottery game in Canada. [The data set](https://www.kaggle.com/datascienceai/lottery-dataset) has data for 3,665 drawings, dating from 1982 to 2018.

In [1]:
# Import libraries we'll use
import pandas as pd
import numpy as np
import random
import math
import matplotlib.pyplot as plt
import seaborn as sns

# Display graphs in notebook
%matplotlib inline

## Core functions

We'll be using these functions regularly to make probability calculations:
* `factorial()` to calculate factorials
* `combinations()` to calculate combinations

In [2]:
# Function to calculate the factorial of any integer n
def factorial(n):
    answer = 1
    for i in range(n,0,-1):
        answer *= i
    return int(answer)

# Function to calculate the number of unique combinations of k objects chosen from a group of n objects
def combinations(n, k):
    # Integer division keeps the result exact; float division can lose precision for large n
    return factorial(n) // (factorial(k) * factorial(n - k))
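
As a quick sanity check on the helpers above, the sketch below compares a hand-rolled combinations function with Python's built-in `math.comb` (available from Python 3.8). The name `combinations_exact` is an illustrative stand-in, not part of the notebook, and it relies on integer arithmetic only.

```python
import math

def combinations_exact(n, k):
    """Number of ways to choose k items from n, using integer arithmetic only."""
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

# Cross-check against the standard library for the 6/49 case
print(combinations_exact(49, 6))                       # 13983816
print(combinations_exact(49, 6) == math.comb(49, 6))   # True
```

If the two values ever disagreed, the hand-written helper would be the one to inspect first.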

## One ticket probability

For the first version of the app, we want players to be able to calculate the probability of winning the big prize with the various numbers they play on a single ticket (for each ticket a player chooses six numbers out of 49). So, we'll start by building a function that calculates the probability of winning the big prize for any given ticket.

We discussed this with the institute's engineering team, and they told us to keep the following details in mind when we write the function:
* Inside the app, the user inputs six different numbers from 1 to 49.
* Under the hood, the six numbers will come as a Python list, which will serve as the single input to our function.
* The engineering team wants the function to print the probability in a friendly way that people without any probability training can understand.

We will write the function and then use a random ticket number generator function, `lottery_ticket()`, to test our new `one_ticket_probability()` function.

In [3]:
# Function that takes in a list of six unique numbers and prints the probability of winning
def one_ticket_probability(six_numbers):
    total_outcomes = combinations(49, 6)
    probability = 1 / total_outcomes
    percentage = probability * 100
    
    print('''Your lottery numbers, {}, have a {:.7f}% chance to win the big prize.
In other words, you have a 1 in {:,} chance to win.'''.format(six_numbers, percentage, total_outcomes))

# Function for generating a random lottery ticket with six different numbers
def lottery_ticket():
    # random.sample draws without replacement, so no number can repeat
    return random.sample(range(1, 50), 6)

# Lottery ticket tests
test_1 = lottery_ticket()
test_2 = lottery_ticket()
test_3 = lottery_ticket()

# Our test runs
one_ticket_probability(test_1)
print('\n')
one_ticket_probability(test_2)
print('\n')
one_ticket_probability(test_3)

Your lottery numbers, [32, 40, 49, 8, 48, 3], have a 0.0000072% chance to win the big prize.
In other words, you have a 1 in 13,983,816 chance to win.


Your lottery numbers, [28, 21, 37, 29, 22, 6], have a 0.0000072% chance to win the big prize.
In other words, you have a 1 in 13,983,816 chance to win.


Your lottery numbers, [10, 49, 44, 41, 11, 19], have a 0.0000072% chance to win the big prize.
In other words, you have a 1 in 13,983,816 chance to win.


Each ticket combination will have the same probability, so our output looks like it should.
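
The engineering notes above say a ticket consists of six different numbers from 1 to 49. Below is a minimal sketch of an input check the app might run before calling `one_ticket_probability()`. The helper name `is_valid_ticket` and its parameters are hypothetical; the engineering team did not specify them.

```python
def is_valid_ticket(numbers, pick=6, lowest=1, highest=49):
    """Return True if `numbers` holds `pick` distinct integers in [lowest, highest]."""
    if len(numbers) != pick or len(set(numbers)) != pick:
        return False
    return all(isinstance(n, int) and lowest <= n <= highest for n in numbers)

print(is_valid_ticket([32, 40, 49, 8, 48, 3]))   # True
print(is_valid_ticket([5, 5, 12, 20, 33, 41]))   # False: 5 is repeated
print(is_valid_ticket([0, 7, 12, 20, 33, 41]))   # False: 0 is out of range
```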

## Exploring historical Canada lottery ticket data

For the first version of the app, users should also be able to compare their ticket against the historical lottery data in Canada and determine whether they would have ever won by now.

We'll explore the historical data coming from the Canada 6/49 lottery. The data set can be downloaded from [Kaggle](https://www.kaggle.com/datascienceai/lottery-dataset). 

Let's take a look.

In [4]:
# Read in the lottery history data and display the number of rows and columns
lotto_hist = pd.read_csv('649.csv')
print('The dataset has ' + str(lotto_hist.shape[0]) + ' rows and ' + str(lotto_hist.shape[1]) + ' columns.')

The dataset has 3665 rows and 11 columns.


In [5]:
# Display the first and last three rows of the dataset
pd.concat([lotto_hist.head(3), lotto_hist.tail(3)])

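| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 6/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 6/19/1982 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 6/26/1982 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |
| 3662 | 649 | 3589 | 0 | 6/13/2018 | 6 | 22 | 24 | 31 | 32 | 34 | 16 |
| 3663 | 649 | 3590 | 0 | 6/16/2018 | 2 | 15 | 21 | 31 | 38 | 49 | 8 |
| 3664 | 649 | 3591 | 0 | 6/20/2018 | 14 | 24 | 31 | 35 | 37 | 48 | 17 |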


We can find the six numbers drawn in the following six columns:
* `NUMBER DRAWN 1`
* `NUMBER DRAWN 2`
* `NUMBER DRAWN 3`
* `NUMBER DRAWN 4`
* `NUMBER DRAWN 5`
* `NUMBER DRAWN 6`

## Comparing user lottery tickets against historical lottery data

We'll write a function, `check_historical_occurrence()`, that enables users to compare their ticket against the historical lottery data in Canada and determine whether they would ever have won by now.

The engineering team wants us to write a function that prints:
* The number of times the combination selected occurred in the Canada data set.
* The probability of winning the big prize in the next drawing with that combination.

First, we'll write a function, `extract_numbers()`, to extract all the winning numbers from the dataset.

In [6]:
# Function to extract the six winning lottery numbers from a row of the lottery dataframe
def extract_numbers(row):
    # Positions 4 through 9 hold the columns NUMBER DRAWN 1 to NUMBER DRAWN 6
    numbers = row.iloc[4:10]
    return set(numbers.values)

# Apply the new function to the dataset and display the first few results
winning_numbers = lotto_hist.apply(extract_numbers, axis=1)
winning_numbers.head()

0    {3, 41, 11, 12, 43, 14}
1    {33, 36, 37, 39, 8, 41}
2     {1, 6, 39, 23, 24, 27}
3     {3, 9, 10, 43, 13, 20}
4    {34, 5, 14, 47, 21, 31}
dtype: object

Next, we'll write the `check_historical_occurrence()` function, which prints how many times the user's numbers have occurred in the past and the probability of winning with them.

In [7]:
# Function to compare user lottery numbers to past winning lottery numbers and print info about the probability of winning
def check_historical_occurrence(user_numbers, historical_numbers):
    # user_numbers: a list
    # historical_numbers: a series
    user_numbers = set(user_numbers)
    check_occurrences = user_numbers == historical_numbers
    n_occurrences = check_occurrences.sum()
    
    if n_occurrences == 0:
        print('''Your lottery numbers, {}, have never been drawn in the past.
You have a 0.0000072% chance to win the big prize.
In other words, you have a 1 in 13,983,816 chance to win.'''.format(user_numbers))
    else:
        print('''The number of times your lottery numbers, {}, have occurred in the past is {}.
You have a 0.0000072% chance to win the big prize.
In other words, you have a 1 in 13,983,816 chance to win.'''.format(user_numbers, n_occurrences))

# Test runs
test_4 = [7, 24, 11, 32, 49, 20]
test_5 = [34, 5, 14, 47, 21, 31]

check_historical_occurrence(test_4, winning_numbers)
print('\n')
check_historical_occurrence(test_5, winning_numbers)

Your lottery numbers, {32, 7, 11, 49, 20, 24}, have never been drawn in the past.
You have a 0.0000072% chance to win the big prize.
In other words, you have a 1 in 13,983,816 chance to win.


The number of times your lottery numbers, {34, 5, 14, 47, 21, 31}, have occurred in the past is 1.
You have a 0.0000072% chance to win the big prize.
In other words, you have a 1 in 13,983,816 chance to win.
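
For readers who want to see the matching logic without pandas, here is a minimal pure-Python sketch of the same idea. It swaps the Series comparison for a `collections.Counter` keyed by `frozenset`, which is a different technique from the one used in the cell above. The five drawings are copied from the `winning_numbers.head()` output shown earlier, and `count_occurrences` is a hypothetical helper name.

```python
from collections import Counter

# Five past drawings copied from the output shown earlier in this notebook
past_draws = [
    {3, 11, 12, 14, 41, 43},
    {8, 33, 36, 37, 39, 41},
    {1, 6, 23, 24, 27, 39},
    {3, 9, 10, 13, 20, 43},
    {5, 14, 21, 31, 34, 47},
]

# frozenset is hashable, so each drawing can be used as a Counter key
draw_counts = Counter(frozenset(draw) for draw in past_draws)

def count_occurrences(user_numbers, counts):
    """Return how many past drawings match the ticket exactly, ignoring order."""
    return counts[frozenset(user_numbers)]

print(count_occurrences([7, 24, 11, 32, 49, 20], draw_counts))  # 0
print(count_occurrences([34, 5, 14, 47, 21, 31], draw_counts))  # 1
```

Returning the count rather than printing it would also make the function easier for the engineering team to test, though that is a design suggestion rather than a stated requirement.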


## Multi-ticket winning probability

Lottery addicts usually play more than one ticket on a single drawing, thinking that this might increase their chances of winning significantly. Our purpose is to help them better estimate their chances of winning — we're going to write a function, `multi_ticket_probability()`, that will allow the users to calculate the chances of winning for any number of different tickets.

In [8]:
# Function that calculates the probability of winning the lottery given an n number of tickets
def multi_ticket_probability(n_tickets):
    total_combos = int(combinations(49,6))
    percentage = n_tickets / total_combos * 100
    if n_tickets == 1:
        print('''If you buy 1 lottery ticket, you have a {:.7f}% chance to win the big prize.
In other words, you have a 1 in {:,} chance to win.'''.format(percentage, total_combos))
    else:
        combo_simple = round(total_combos  / n_tickets)
        print('''If you buy {} lottery tickets, you have a {:.7f}% chance to win the big prize.
In other words, you have a 1 in {:,} chance to win.'''.format(n_tickets, percentage, combo_simple))
              
# Test runs
test_inputs = [1, 10, 100, 10000, 1000000, 13983816]

for test in test_inputs:
    multi_ticket_probability(test)
    print('\n')

If you buy 1 lottery ticket, you have a 0.0000072% chance to win the big prize.
In other words, you have a 1 in 13,983,816 chance to win.


If you buy 10 lottery tickets, you have a 0.0000715% chance to win the big prize.
In other words, you have a 1 in 1,398,382 chance to win.


If you buy 100 lottery tickets, you have a 0.0007151% chance to win the big prize.
In other words, you have a 1 in 139,838 chance to win.


If you buy 10000 lottery tickets, you have a 0.0715112% chance to win the big prize.
In other words, you have a 1 in 1,398 chance to win.


If you buy 1000000 lottery tickets, you have a 7.1511238% chance to win the big prize.
In other words, you have a 1 in 14 chance to win.


If you buy 13983816 lottery tickets, you have a 100.0000000% chance to win the big prize.
In other words, you have a 1 in 1 chance to win.
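
The calculation behind these outputs is the number of distinct tickets divided by the number of possible outcomes. Below is a minimal sketch that keeps the result exact with `fractions.Fraction`. The name `multi_ticket_odds` is hypothetical, and the function assumes every ticket carries a different combination.

```python
from fractions import Fraction
import math

TOTAL_OUTCOMES = math.comb(49, 6)  # 13,983,816 possible six-number draws

def multi_ticket_odds(n_tickets):
    """Exact probability of the big prize with n_tickets distinct tickets."""
    if not 1 <= n_tickets <= TOTAL_OUTCOMES:
        raise ValueError('n_tickets must be between 1 and {:,}'.format(TOTAL_OUTCOMES))
    return Fraction(n_tickets, TOTAL_OUTCOMES)

odds = multi_ticket_odds(100)
print(odds)                                  # 25/3495954
print('{:.7f}%'.format(float(odds) * 100))   # 0.0007151%
```

Rejecting values outside 1 to 13,983,816 is an added safeguard here; the original function does not validate its input.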





## Partial lottery number matches

For extra context, most 6/49 lotteries award smaller prizes if a player's ticket matches two, three, four, or five of the six numbers drawn. As a consequence, users might be interested in knowing the probability of having two, three, four, or five winning numbers.

These are the engineering details we'll need to be aware of:
* Inside the app, the user inputs:
    * Six different numbers from 1 to 49; and
    * An integer between 2 and 5 that represents the number of winning numbers expected.

Our function, `probability_less_6()`, will print information about the probability of having the given number of winning numbers.

In [9]:
# Function to calculate the probability of matching exactly n_matches numbers on any given ticket
def probability_less_6(n_matches):
    total_outcomes = int(combinations(49, 6))
    n_number_combinations = combinations(6, n_matches)
    
    outcomes_per_combo = combinations(43, 6 - n_matches)
    successful_outcomes = n_number_combinations * outcomes_per_combo
    
    percentage =  successful_outcomes * 100 / total_outcomes
    combo_simple = round(total_outcomes / successful_outcomes)
    print('''Your chances of having {} winning numbers on your ticket is {:.7f}%.
In other words, you have a 1 in {:,} chance to win.'''.format(n_matches, percentage, combo_simple))
    
# Test runs
test_inputs = [2, 3, 4, 5]
for test in test_inputs:
    probability_less_6(test)
    print('\n')

Your chances of having 2 winning numbers on your ticket is 13.2378029%.
In other words, you have a 1 in 8 chance to win.


Your chances of having 3 winning numbers on your ticket is 1.7650404%.
In other words, you have a 1 in 57 chance to win.


Your chances of having 4 winning numbers on your ticket is 0.0968620%.
In other words, you have a 1 in 1,032 chance to win.


Your chances of having 5 winning numbers on your ticket is 0.0018450%.
In other words, you have a 1 in 54,201 chance to win.




These outputs answer the question: "What is the probability of having **exactly** n winning numbers?"
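
Written out, the count behind `probability_less_6()` follows the standard hypergeometric pattern: choose $n$ of the six winning numbers, then fill the remaining $6 - n$ places from the 43 numbers that were not drawn.

$$
P(\text{exactly } n \text{ matches}) = \frac{\binom{6}{n}\,\binom{43}{6-n}}{\binom{49}{6}}
$$

For $n = 2$ this gives $\frac{15 \times 123{,}410}{13{,}983{,}816} \approx 0.1324$, which agrees with the 13.2378029% printed above.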

But what about the probability for having **at least** n winning numbers? We'll write a separate function below.

In [11]:
# Function to calculate the probability of matching at least n_matches numbers on any given ticket
def probability_less_6_atleast(n_matches):
    total_outcomes = int(combinations(49, 6))

    successful_outcomes = 0   
    for i in range(n_matches, 7):
        n_number_combinations = combinations(6, i)
        outcomes_per_combo = combinations(43, 6 - i)
        successful_outcomes += n_number_combinations * outcomes_per_combo
    
    percentage =  successful_outcomes * 100 / total_outcomes
    combo_simple = round(total_outcomes / successful_outcomes)
    
    print('''Your chances of having at least {} winning numbers on your ticket is {:.7f}%.
In other words, you have a 1 in {:,} chance to win.'''.format(n_matches, percentage, combo_simple))
    
# Test runs
test_inputs = [2, 3, 4, 5]
for test in test_inputs:
    probability_less_6_atleast(test)
    print('\n')

Your chances of having at least 2 winning numbers on your ticket is 15.1015574%.
In other words, you have a 1 in 7 chance to win.


Your chances of having at least 3 winning numbers on your ticket is 1.8637545%.
In other words, you have a 1 in 54 chance to win.


Your chances of having at least 4 winning numbers on your ticket is 0.0987141%.
In other words, you have a 1 in 1,013 chance to win.


Your chances of having at least 5 winning numbers on your ticket is 0.0018521%.
In other words, you have a 1 in 53,992 chance to win.
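
One way to double-check the "at least" figures is to recompute them with exact fractions and confirm that all seven "exactly" cases (0 to 6 matches) add up to 1. The sketch below uses `math.comb` and `fractions.Fraction` from the standard library. The names `p_exactly` and `p_at_least` are illustrative and do not appear elsewhere in the notebook.

```python
from fractions import Fraction
import math

def p_exactly(k):
    """Probability that exactly k of a ticket's six numbers are drawn."""
    return Fraction(math.comb(6, k) * math.comb(43, 6 - k), math.comb(49, 6))

def p_at_least(k):
    """Probability that k or more of a ticket's six numbers are drawn."""
    return sum(p_exactly(i) for i in range(k, 7))

# The cases for 0 to 6 matches cover every outcome, so they sum to 1
print(p_at_least(0) == 1)                             # True
print('{:.7f}%'.format(float(p_at_least(2)) * 100))   # 15.1015574%
```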




## Conclusion

We created five functions for the first version of a gambling addiction app:
* `one_ticket_probability()`: calculates the probability of winning the big prize with a single ticket.
* `check_historical_occurrence()`: checks whether a certain combination has occurred in the Canada lottery data set.
* `multi_ticket_probability()`: calculates the probability of winning for any number of different tickets between 1 and 13,983,816.
* `probability_less_6()`: calculates the probability of having *exactly* two, three, four, or five winning numbers.
* `probability_less_6_atleast()`: calculates the probability of having *at least* two, three, four, or five winning numbers.

We hope this app will help combat gambling addiction by sharing eye-opening probabilities.

## Next steps

Possible features for a second version of the app include:
* Making the outputs even easier to understand by adding fun analogies. For example, we could find the probabilities of unusual events and compare them with the chances of winning the lottery, printing something along the lines of "You are 100 times more likely to be the victim of a shark attack than to win the lottery".
* Combining the `one_ticket_probability()` and `check_historical_occurrence()` functions to output information on probability and historical occurrence at the same time.
* Adding information and probabilities for other lottery formats and gambling games.