# The Lottery: A Fun Habit That Can End in Gambling Addiction

## Introduction

Many people start playing the lottery for fun, but for some the activity turns into a habit that eventually escalates into addiction. Like other compulsive gamblers, lottery addicts soon begin spending their savings and taking out loans, accumulate debts, and eventually resort to desperate behaviors like theft.

**A medical institute that aims to prevent and treat gambling addictions wants to build a dedicated mobile app to help lottery addicts better estimate their chances of winning. The institute has a team of engineers that will build the app, but they need us to create the logical core of the app and calculate probabilities.**

For the first version of the app, they want us to focus on the **6/49 lottery** and build functions that enable users to answer questions like:

<ul>
<li>What is the probability of winning the big prize with a single ticket?</li>
<li>What is the probability of winning the big prize if we play 40 different tickets (or any other number)?</li>
<li>What is the probability of having at least five (or four, or three, or two) winning numbers on a single ticket?</li>
<li>The institute also wants us to consider historical data coming from the national 6/49 lottery game in Canada. The data set has data for 3,665 drawings, dating from 1982 to 2018.</li>
</ul>

The scenario we're following throughout this project is fictional — the main purpose is to practice applying probability and combinatorics (permutations and combinations) concepts in a setting that simulates a real-world scenario.

## Imports

In [20]:
import math
import pandas as pd
import numpy as np

## Functions

In [12]:
def factorial(n):
    return math.factorial(int(n))

In [13]:
def combinations(n, k):
    n_fac = factorial(n)
    k_fac = factorial(k)
    return n_fac / (k_fac * factorial(n - k))
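
As a cross-check on the helper above, `math.comb` from the standard library returns the same count as an exact integer. This is a minimal sketch that assumes Python 3.8 or later; it re-declares the helper so that it runs on its own.

```python
import math

def combinations(n, k):
    # Same formula as the helper above: n! / (k! * (n - k)!)
    return math.factorial(n) / (math.factorial(k) * math.factorial(n - k))

# math.comb computes the binomial coefficient with integer arithmetic
total_outcomes = math.comb(49, 6)

print(total_outcomes)                         # 13983816
print(combinations(49, 6) == total_outcomes)  # True
```

The helper returns a float because it uses true division, while `math.comb` returns an `int`; for 49 and 6 the two values agree exactly.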

In [18]:
def one_ticket_probability(num_list):
    # Any single ticket covers 1 of the C(49, 6) equally likely outcomes
    comb = combinations(49, 6)
    chances = 1 * 100 / comb  # expressed as a percentage
    print("Your chances to win the big prize is {:.8f}%. \nIn other words, you have a 1 in 13,983,816 chances to win.".format(chances))
    return num_list, chances

In [19]:
one_ticket_probability([])

Your chances to win the big prize is 0.00000715%. 
In other words, you have a 1 in 13,983,816 chances to win.


([], 7.151123842018516e-06)
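
The same count also answers the second question from the introduction. With `n` different tickets, `n` of the 13,983,816 equally likely outcomes are covered, so the probability is `n / C(49, 6)`. The sketch below is one possible way to write this; `multi_ticket_probability` is a hypothetical name that is not defined elsewhere in this notebook, and `math.comb` assumes Python 3.8 or later.

```python
import math

def multi_ticket_probability(n_tickets):
    """Percentage chance of the big prize with n_tickets different tickets."""
    total_outcomes = math.comb(49, 6)  # 13,983,816 possible draws
    if not 0 <= n_tickets <= total_outcomes:
        raise ValueError("n_tickets must be between 0 and 13,983,816")
    # Each distinct ticket covers exactly one outcome
    return n_tickets * 100 / total_outcomes

for n in (1, 40, 13_983_816):
    print(f"{n:>10,} tickets: {multi_ticket_probability(n):.8f}%")
```

The range check is there because the formula only holds when all tickets are different, which caps the number of tickets at the number of possible draws.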

With the functions implemented so far, we can tell users the probability of winning the big prize with a single ticket. For the first version of the app, however, users should also be able to compare their ticket against the historical lottery data in Canada and determine whether they would ever have won by now. We're going to write a function that makes this comparison.

In [31]:
def extract_numbers(row):
    # Positions 4-9 hold the columns NUMBER DRAWN 1 through NUMBER DRAWN 6
    return set(row.iloc[4:10])

In [42]:
def check_historical_occurence(winning_numbers, numbers):
    # Print the single-ticket probability for this ticket
    one_ticket_probability(numbers)
    # Compare the ticket with each historical draw; sets ignore order
    matches = winning_numbers.apply(lambda drawn: drawn == set(numbers))
    if matches.sum() == 0:
        print("The combination has never occured.")
    else:
        print("The combination " + str(numbers) + " has occured: " + str(matches.sum()) + " times.")
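
The check works because Python sets compare equal whenever they hold the same members, regardless of order. Below is a pandas-free sketch of the same idea, using the first three draws from the data set preview in the Reading Data section. `count_occurrences` is a hypothetical helper introduced only for this illustration.

```python
# First three historical draws, copied from the data set preview
draws = [
    {3, 11, 12, 14, 41, 43},
    {8, 33, 36, 37, 39, 41},
    {1, 6, 23, 24, 27, 39},
]

def count_occurrences(draws, ticket):
    """Count how many past draws contain exactly the ticket's numbers."""
    ticket = set(ticket)
    # Set equality ignores order, so [43, 3, ...] matches {3, ..., 43}
    return sum(1 for drawn in draws if drawn == ticket)

print(count_occurrences(draws, [43, 41, 14, 12, 11, 3]))  # 1
print(count_occurrences(draws, [7, 41, 11, 12, 43, 14]))  # 0
```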
    

## Reading Data

In [21]:
df = pd.read_csv("649.csv")

In [22]:
df.head()

Unnamed: 0,PRODUCT,DRAW NUMBER,SEQUENCE NUMBER,DRAW DATE,NUMBER DRAWN 1,NUMBER DRAWN 2,NUMBER DRAWN 3,NUMBER DRAWN 4,NUMBER DRAWN 5,NUMBER DRAWN 6,BONUS NUMBER
0,649,1,0,6/12/1982,3,11,12,14,41,43,13
1,649,2,0,6/19/1982,8,33,36,37,39,41,9
2,649,3,0,6/26/1982,1,6,23,24,27,39,34
3,649,4,0,7/3/1982,3,9,10,13,20,43,34
4,649,5,0,7/10/1982,5,14,21,31,34,47,45


In [24]:
df.tail(3)

Unnamed: 0,PRODUCT,DRAW NUMBER,SEQUENCE NUMBER,DRAW DATE,NUMBER DRAWN 1,NUMBER DRAWN 2,NUMBER DRAWN 3,NUMBER DRAWN 4,NUMBER DRAWN 5,NUMBER DRAWN 6,BONUS NUMBER
3662,649,3589,0,6/13/2018,6,22,24,31,32,34,16
3663,649,3590,0,6/16/2018,2,15,21,31,38,49,8
3664,649,3591,0,6/20/2018,14,24,31,35,37,48,17


In [25]:
df.shape

(3665, 11)

As the previous cells show, the data set has **3,665** rows and **11** columns:
<ul>
 <li>PRODUCT</li>
 <li>DRAW NUMBER</li>
 <li>SEQUENCE NUMBER</li>
 <li>DRAW DATE</li>
 <li>NUMBER DRAWN 1</li>
 <li>NUMBER DRAWN 2</li>
 <li>NUMBER DRAWN 3</li>
 <li>NUMBER DRAWN 4</li>
 <li>NUMBER DRAWN 5</li>
 <li>NUMBER DRAWN 6</li>
 <li>BONUS NUMBER</li>
</ul>

Let's check how many times different combinations of numbers have occurred:

In [33]:
winning_numbers = df.apply(extract_numbers, axis=1)
winning_numbers

0        {3, 41, 11, 12, 43, 14}
1        {33, 36, 37, 39, 8, 41}
2         {1, 6, 39, 23, 24, 27}
3         {3, 9, 10, 43, 13, 20}
4        {34, 5, 14, 47, 21, 31}
                  ...           
3660    {38, 40, 41, 10, 15, 23}
3661    {36, 46, 47, 19, 25, 31}
3662     {32, 34, 6, 22, 24, 31}
3663     {2, 38, 15, 49, 21, 31}
3664    {35, 37, 14, 48, 24, 31}
Length: 3665, dtype: object

In [43]:
check_historical_occurence(winning_numbers, {7, 41, 11, 12, 43, 14})

Your chances to win the big prize is 0.00000715%. 
In other words, you have a 1 in 13,983,816 chances to win.
The combination has never occured.


In [45]:
check_historical_occurence(winning_numbers, {7, 10, 11, 12, 43, 14})

Your chances to win the big prize is 0.00000715%. 
In other words, you have a 1 in 13,983,816 chances to win.
The combination has never occured.


In [47]:
check_historical_occurence(winning_numbers, {12, 41, 11, 12, 43, 14})

Your chances to win the big prize is 0.00000715%. 
In other words, you have a 1 in 13,983,816 chances to win.
The combination has never occured.


In [44]:
check_historical_occurence(winning_numbers, {3, 41, 11, 12, 43, 14})

Your chances to win the big prize is 0.00000715%. 
In other words, you have a 1 in 13,983,816 chances to win.
The combination {3, 41, 11, 12, 43, 14} has occured: 1 times.


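As the **4** checks above show, only one of the combinations appears in the historical record as a winning draw. (The third check passes a set literal that repeats 12, so it collapses to five numbers and could never match a six-number draw.) What we want to make clear, however, is that every valid combination has the same probability of winning: **0.00000715%**.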
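
The introduction also asks for the probability of having at least five (or four, or three, or two) winning numbers on a single ticket. One way to compute it, sketched below, is to count the tickets that share exactly `j` numbers with the draw, `C(6, j) * C(43, 6 - j)`, and sum over `j` from `k` to 6. The function names are hypothetical and not part of the notebook above, and `math.comb` assumes Python 3.8 or later.

```python
import math

def probability_exactly(k):
    """Percentage chance that a ticket matches exactly k of the 6 drawn numbers."""
    # Choose k of the 6 winning numbers and 6 - k of the 43 losing ones
    favorable = math.comb(6, k) * math.comb(43, 6 - k)
    return favorable * 100 / math.comb(49, 6)

def probability_at_least(k):
    """Percentage chance that a ticket matches k or more drawn numbers."""
    return sum(probability_exactly(j) for j in range(k, 7))

for k in (5, 4, 3, 2):
    print(f"At least {k} winning numbers: {probability_at_least(k):.6f}%")
```

For `k = 6` the sum reduces to the single-ticket probability of the big prize, which gives a quick consistency check against `one_ticket_probability`.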