**Project: Mobile App for Lottery Addiction**

A medical institute that aims to prevent and treat gambling addictions wants to build a dedicated mobile app to help lottery addicts better estimate their chances of winning. 

The institute has a team of engineers that will build the app, but they need us to create the logical core of the app and calculate probabilities.

For the first version of the app, they want us to focus on the 6/49 lottery and to build functions that let users answer questions such as:

- What is the probability of winning the big prize with a single ticket?
- What is the probability of winning the big prize if we play 40 different tickets (or any other number)?
- What is the probability of having at least five (or four, or three, or two) winning numbers on a single ticket?

The institute also wants us to consider historical data coming from the national 6/49 lottery game in Canada. 

[The dataset](https://www.kaggle.com/datascienceai/lottery-dataset) has data for 3,665 drawings, dating from 1982 to 2018.


In [1]:
# Functions
def factorial(n):
    final_product = 1
    for i in range(n, 0, -1):
        final_product *= i
    return final_product

def combinations(n, k):
    numerator = factorial(n)
    denominator = factorial(k) * factorial(n-k)
    return numerator/denominator

def one_ticket_prob(n, k):
    # Probability that a single ticket matches all k numbers drawn from n
    res = 1 / combinations(n, k)
    print('There is a 1 in {} million chance with one ticket.'.format(round(1 / (1e6 * res), 1) ) )
    return res
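
As a quick sanity check on the helpers above, the sketch below compares the hand-written combinations count with Python's built-in `math.comb` (available from Python 3.8). The two helpers are repeated so the block runs on its own; the name `total_outcomes` is introduced here for illustration only.

```python
import math

def factorial(n):
    # Product of the integers 1..n (returns 1 when n is 0)
    final_product = 1
    for i in range(n, 0, -1):
        final_product *= i
    return final_product

def combinations(n, k):
    # Number of ways to choose k items out of n when order does not matter
    return factorial(n) / (factorial(k) * factorial(n - k))

# Number of possible 6/49 tickets according to the standard library
total_outcomes = math.comb(49, 6)

print(total_outcomes)                         # 13983816
print(combinations(49, 6) == total_outcomes)  # True
```

If the two values agree, the probability for a single ticket is simply 1 divided by this count.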

**Data Model**

In [2]:
import pandas as pd

In [3]:
df = pd.read_csv('649.csv')
print('There are {} rows and {} columns.'.format(df.shape[0], df.shape[1]))

There are 3665 rows and 11 columns.


In [4]:
df.sample(10, random_state = 0)

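| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 775 | 649 | 776 | 0 | 6/29/1991 | 3 | 5 | 10 | 12 | 13 | 40 | 43 |
| 117 | 649 | 118 | 0 | 9/8/1984 | 23 | 25 | 29 | 38 | 39 | 43 | 17 |
| 3457 | 649 | 3384 | 0 | 6/25/2016 | 4 | 5 | 6 | 8 | 39 | 42 | 30 |
| 2601 | 649 | 2602 | 0 | 12/27/2008 | 6 | 10 | 20 | 23 | 45 | 47 | 11 |
| 3324 | 649 | 3251 | 0 | 3/18/2015 | 16 | 17 | 37 | 39 | 41 | 45 | 14 |
| 1593 | 649 | 1594 | 0 | 5/1/1999 | 6 | 12 | 18 | 20 | 30 | 45 | 40 |
| 598 | 649 | 599 | 0 | 10/18/1989 | 1 | 24 | 28 | 40 | 46 | 49 | 39 |
| 3645 | 649 | 3572 | 0 | 4/14/2018 | 1 | 11 | 19 | 36 | 48 | 49 | 9 |
| 2652 | 649 | 2653 | 0 | 6/24/2009 | 13 | 19 | 23 | 33 | 38 | 48 | 2 |
| 3235 | 649 | 3162 | 0 | 5/10/2014 | 16 | 23 | 33 | 36 | 48 | 49 | 37 |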


In [5]:
def extract_numbers(row):
    return set(row[4:10].values)

winners = df.apply(extract_numbers, axis = 1)
winners.head()

0    {3, 41, 11, 12, 43, 14}
1    {33, 36, 37, 39, 8, 41}
2     {1, 6, 39, 23, 24, 27}
3     {3, 9, 10, 43, 13, 20}
4    {34, 5, 14, 47, 21, 31}
dtype: object

In [6]:
def check_historicals(nums, set_series):
    '''
    Count how many times the combination `nums` appears among the
    historical winning sets in `set_series`, print a summary,
    and return the count.
    '''
    nums = set(nums)
    occurs = 0
    for s in set_series:
        # Compare whole sets: `nums in s` would only test whether the set
        # is an element of s, which is never true for a set of integers
        if nums == s:
            occurs += 1
    
    if occurs == 0:
        print('These numbers have never come up.')
    else:
        print('These numbers have come up {} times. This makes no difference to your chance of winning with them.'.format(occurs))
    
    return occurs

In [7]:
check_historicals((8, 37, 22, 12, 1, 3), winners)

These numbers have never come up.


0
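
When a ticket is compared with past draws, it helps to keep in mind how Python treats two sets: `a in b` asks whether `a` is an element of `b`, while `a == b` asks whether both hold the same numbers. The short sketch below illustrates the difference with the first two draws from the `winners.head()` output; the variable names are chosen here for illustration.

```python
# A ticket with the same numbers as the first historical draw
ticket = {3, 11, 12, 14, 41, 43}
past_draws = [
    {3, 41, 11, 12, 43, 14},
    {33, 36, 37, 39, 8, 41},
]

# Membership: "is the ticket an element of the draw?"
# Always False here, because a draw contains integers, not sets.
membership_hits = sum(ticket in draw for draw in past_draws)

# Equality: "do the ticket and the draw contain the same numbers?"
equality_hits = sum(ticket == draw for draw in past_draws)

print(membership_hits, equality_hits)  # 0 1
```

Set equality ignores order, so a ticket matches a draw no matter how the numbers were listed.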

In [8]:
def multi_ticket_prob(n_tickets, n_numbers, n_picked):
    comb = combinations(n_numbers, n_picked)
    prob = n_tickets / comb
    
    print_comb = round(comb / (1e3 * n_tickets), 2)
    print('There is a 1 in {} thousand chance of winning.'.format(print_comb))
    return prob

In [9]:
test_inputs = [1, 10, 100, 10000, 1000000, 6991908, 13983816]

probs = []
for tickets in test_inputs:
    probs.append(
        multi_ticket_prob(tickets, 49, 6)
    )

There is a 1 in 13983.82 thousand chance of winning.
There is a 1 in 1398.38 thousand chance of winning.
There is a 1 in 139.84 thousand chance of winning.
There is a 1 in 1.4 thousand chance of winning.
There is a 1 in 0.01 thousand chance of winning.
There is a 1 in 0.0 thousand chance of winning.
There is a 1 in 0.0 thousand chance of winning.


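As the number of tickets increases, the player's chances of winning improve. The last two lines read "1 in 0.0 thousand" only because of rounding: 6,991,908 different tickets cover half of the 13,983,816 possible combinations (probability 0.5), and 13,983,816 different tickets cover all of them (probability 1).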
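
To see the multi-ticket probabilities without any rounding, one option is to compute them as exact fractions. This is a minimal sketch that assumes every ticket is different; `multi_ticket_fraction` is a hypothetical helper, not part of the app's functions.

```python
from fractions import Fraction
from math import comb

def multi_ticket_fraction(n_tickets, n_numbers=49, n_picked=6):
    # Exact probability that one of n_tickets distinct tickets wins the big prize
    return Fraction(n_tickets, comb(n_numbers, n_picked))

for tickets in [1, 40, 6991908, 13983816]:
    p = multi_ticket_fraction(tickets)
    # Print the ticket count, the exact fraction, and its decimal value
    print(tickets, p, float(p))
```

With 6,991,908 tickets the fraction reduces to 1/2, and with 13,983,816 tickets it reduces to 1.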

In [10]:
def prob_less_6(n_winning):
    # Probability that a single ticket matches exactly n_winning of the 6 drawn numbers
    n_comb_ticket = combinations(6, n_winning)
    n_comb_remaining = combinations(43, 6 - n_winning)
    successful = n_comb_ticket * n_comb_remaining
    
    n_comb_total = combinations(49, 6)    
    prob = successful / n_comb_total
    
    print_comb = round(n_comb_total / successful)
    print('There is a 1 in {} chance of winning.'.format(print_comb))
    return prob

In [11]:
for test_input in [2, 3, 4, 5]:
    prob_less_6(test_input)

There is a 1 in 8 chance of winning.
There is a 1 in 57 chance of winning.
There is a 1 in 1032 chance of winning.
There is a 1 in 54201 chance of winning.
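
One way to check the counting logic behind these results is to confirm that the numbers of tickets with exactly 0, 1, ..., 6 matches add up to the total number of possible tickets. The sketch below uses `math.comb`; `exact_match_count` is a hypothetical helper introduced for this check.

```python
from math import comb

def exact_match_count(k, n_numbers=49, n_picked=6):
    # Number of tickets sharing exactly k numbers with the winning draw:
    # choose k of the 6 winning numbers and the rest from the 43 losing ones
    return comb(n_picked, k) * comb(n_numbers - n_picked, n_picked - k)

total = comb(49, 6)
counts = {k: exact_match_count(k) for k in range(7)}

# Every ticket matches some number of winning numbers between 0 and 6,
# so the counts must cover all possible tickets
print(sum(counts.values()) == total)  # True

for k in [2, 3, 4, 5]:
    print(k, counts[k], round(total / counts[k]))
```

The rounded "1 in N" values printed by the loop should line up with the outputs above (8, 57, 1032, and 54201).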


**Next steps**

For the first version of the app, we coded four main functions:

- one_ticket_prob(): calculates the probability of winning the big prize with a single ticket

- check_historicals(): checks whether a certain combination has occurred in the Canada lottery data set

- multi_ticket_prob(): calculates the probability of winning the big prize for any number of different tickets between 1 and 13,983,816

- prob_less_6(): calculates the probability of having exactly two, three, four, or five winning numbers


Possible features for a second version of the app include:

- Making the outputs even easier to understand by adding fun analogies: for example, we could find probabilities for unusual events and compare them with the chances of winning the lottery, outputting something along the lines of "You are 100 times more likely to be the victim of a shark attack than to win the lottery"

- Combining one_ticket_prob() and check_historicals() to output information on probability and historical occurrence at the same time

- Creating a function similar to prob_less_6() that calculates the probability of having at least two, three, four, or five winning numbers. Hint: the number of successful outcomes for having at least four winning numbers is the sum of these three numbers:

    - The number of successful outcomes for having four winning numbers exactly
    
    - The number of successful outcomes for having five winning numbers exactly
    
    - The number of successful outcomes for having six winning numbers exactly
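
The hint above can be sketched as follows. This is only one possible shape for such a function: `prob_at_least` and its default parameters are assumptions made for illustration, and it relies on `math.comb` rather than the notebook's own helpers.

```python
from math import comb

def prob_at_least(n_winning, n_numbers=49, n_picked=6):
    # Hypothetical helper: probability that a single ticket has
    # at least n_winning of the drawn numbers
    total = comb(n_numbers, n_picked)
    # Add up the successful outcomes for exactly n_winning, n_winning + 1, ..., 6
    successful = sum(
        comb(n_picked, k) * comb(n_numbers - n_picked, n_picked - k)
        for k in range(n_winning, n_picked + 1)
    )
    return successful / total

for n in [2, 3, 4, 5]:
    p = prob_at_least(n)
    print('At least {} winning numbers: 1 in {} chance.'.format(n, round(1 / p)))
```

For at least four winning numbers, the successful outcomes are 13,545 + 258 + 1 = 13,804 out of 13,983,816.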
    