# Mobile App for Lottery Addiction


Many people start playing the lottery for fun, but for some this activity turns into a habit which eventually escalates into addiction. Like other compulsive gamblers, lottery addicts soon begin spending from their savings and loans, start to accumulate debts, and eventually engage in desperate behaviors like theft.

A medical institute that aims to prevent and treat gambling addictions wants to build a dedicated mobile app to help lottery addicts better estimate their chances of winning. The institute has a team of engineers that will build the app, but they need us to create the logical core of the app and calculate probabilities.

For the first version of the app, they want us to focus on the 6/49 lottery and build functions that enable users to answer questions like:

* What is the probability of winning the big prize with a single ticket?
* What is the probability of winning the big prize if we play 40 different tickets (or any other number)?
* What is the probability of having at least five (or four, or three, or two) winning numbers on a single ticket?

The institute also wants us to consider historical data coming from the national 6/49 lottery game in Canada. The [data set](https://www.kaggle.com/datascienceai/lottery-dataset) has data for 3,665 drawings, dating from 1982 to 2018 (we'll come back to this).

The scenario we're following throughout this project is fictional; the main purpose is to practice applying the concepts we learned in a setting that simulates a real-world scenario.

## Core Functions

We will write two functions that we'll use repeatedly to calculate probabilities and combinations:

 - A function that calculates factorials:
     #### `n! = n x (n - 1) x (n - 2) x ... x (2) x (1)`

 - A function that calculates combinations:
     #### `nCk = n! / ( k! * (n - k)! )`

In the 6/49 lottery, six numbers are drawn from a set of 49 numbers that range from 1 to 49. The drawing is done without replacement, which means once a number is drawn, it's not put back in the set. The order of the drawn numbers doesn't matter, so the combinations formula gives the number of possible outcomes when we sample without replacement and take only **k** objects from a group of **n** objects.
     

 


In [1]:
# factorial function: n! = n * (n - 1) * ... * 2 * 1
def factorial(n):
    result = 1
    for i in range(n, 0, -1):  # count down from n to 1
        result *= i
    return result

# combinations function: nCk = n! / (k! * (n - k)!)
def combinations(n, k):
    numerator = factorial(n)
    denominator = factorial(k) * factorial(n - k)
    return numerator // denominator  # the division is always exact, so keep the result an integer
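
As a quick sanity check on the combinations formula, the count of possible 6/49 outcomes can be compared against Python's standard library. This is a minimal sketch, assuming Python 3.8 or later (where `math.comb` is available); the helper name `n_choose_k` is illustrative and not part of the app.

```python
import math

def n_choose_k(n, k):
    """Number of ways to choose k objects from n, via the factorial formula."""
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

# Both approaches should agree on the number of possible 6/49 tickets.
total_outcomes = n_choose_k(49, 6)
print(total_outcomes)                      # 13983816
print(total_outcomes == math.comb(49, 6))  # True
```

The value 13,983,816 is the denominator used by every probability in the rest of the project.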

## One-ticket Probability

In the 6/49 lottery, six numbers are drawn from a set of 49 numbers that range from 1 to 49. A player wins the big prize if the six numbers on their ticket match all six numbers drawn. If a player has a ticket with the numbers {13, 22, 24, 27, 42, 44}, they only win the big prize if the numbers drawn are {13, 22, 24, 27, 42, 44}. If even one number differs, they don't win.

For the first version of the app, we want players to be able to calculate the probability of winning the big prize with the various numbers they play on a single ticket (for each ticket a player chooses six numbers out of 49). So, we'll start by building a function that calculates the probability of winning the big prize for any given ticket.

We discussed with the engineering team of the medical institute, and they told us we need to be aware of the following details when we write the function:

* Inside the app, the user inputs six different numbers from 1 to 49.


* Under the hood, the six numbers will come as a Python list, which will serve as the single input to our function.


* The engineering team wants the function to print the probability value in a friendly way, so that people without any probability training are able to understand it.
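
The input constraints in the list above (six different numbers, each from 1 to 49, arriving as a Python list) can be sketched as a small validation helper. The function name `validate_ticket` is hypothetical; the engineering team may well perform this check elsewhere in the app.

```python
def validate_ticket(numbers):
    """Return True if `numbers` is a valid 6/49 ticket: six distinct integers from 1 to 49."""
    if len(numbers) != 6:
        return False
    if len(set(numbers)) != 6:  # duplicates are not allowed
        return False
    return all(isinstance(n, int) and 1 <= n <= 49 for n in numbers)

print(validate_ticket([5, 1, 3, 10, 31, 6]))   # True
print(validate_ticket([5, 5, 3, 10, 31, 6]))   # False (repeated number)
print(validate_ticket([0, 1, 3, 10, 31, 50]))  # False (numbers out of range)
```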

In [2]:
# function that prints the probability of winning the big prize with one ticket:
def one_ticket_prob(num_list):
    total_outcomes = combinations(49, 6)
    one_ticket_probability = 1 / total_outcomes
    print("Based on your numbers {}, chance to win is {:.7%}".format(num_list, one_ticket_probability))

In [3]:
# function test:
test_numbers = [5,1,3,10,31,6]
one_ticket_prob(test_numbers)

Based on your numbers [5, 1, 3, 10, 31, 6], chance to win is 0.0000072%
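
A percentage with many leading zeros can be hard to read for people without probability training. One possible alternative, sketched here under the assumption that a "1 in N" phrasing is acceptable to the engineering team, is to report the odds directly. The function name `one_in_n_message` is illustrative only.

```python
import math

def one_in_n_message(numbers):
    """Build a message that states the odds of the big prize as '1 in N'."""
    total_outcomes = math.comb(49, 6)  # number of possible 6/49 tickets
    return "Based on your numbers {}, you have a 1 in {:,} chance to win the big prize.".format(
        numbers, total_outcomes
    )

print(one_in_n_message([5, 1, 3, 10, 31, 6]))
# Based on your numbers [5, 1, 3, 10, 31, 6], you have a 1 in 13,983,816 chance to win the big prize.
```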


## Historical Data Check for the Canada Lottery

Now, we'll focus on exploring the historical data coming from the Canada 6/49 lottery. The data set can be downloaded from [Kaggle](https://www.kaggle.com/datascienceai/lottery-dataset).

The data set contains historical data for 3,665 drawings (each row shows data for a single drawing), dating from 1982 to 2018. For each drawing, we can find the six numbers drawn in the following six columns:

- `NUMBER DRAWN 1`


- `NUMBER DRAWN 2`


- `NUMBER DRAWN 3`


- `NUMBER DRAWN 4`


- `NUMBER DRAWN 5`


- `NUMBER DRAWN 6`

In [4]:
# general data set information:
import pandas as pd
lottery = pd.read_csv("649.csv")
print("Number of rows and columns:", lottery.shape)
print("\n")
lottery.sample(10)

Number of rows and columns: (3665, 11)




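| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1508 | 649 | 1509 | 0 | 7/8/1998 | 10 | 26 | 28 | 35 | 38 | 39 | 3 |
| 2912 | 649 | 2904 | 0 | 11/19/2011 | 3 | 4 | 14 | 18 | 28 | 31 | 47 |
| 3593 | 649 | 3520 | 0 | 10/14/2017 | 16 | 22 | 28 | 29 | 32 | 34 | 43 |
| 2674 | 649 | 2675 | 0 | 9/9/2009 | 3 | 6 | 17 | 27 | 28 | 44 | 43 |
| 2870 | 649 | 2871 | 0 | 7/27/2011 | 2 | 3 | 15 | 23 | 41 | 46 | 10 |
| 520 | 649 | 521 | 0 | 1/18/1989 | 6 | 9 | 14 | 16 | 19 | 28 | 26 |
| 3068 | 649 | 3003 | 0 | 10/31/2012 | 4 | 8 | 17 | 23 | 34 | 47 | 30 |
| 2175 | 649 | 2176 | 0 | 11/27/2004 | 12 | 15 | 30 | 38 | 47 | 48 | 27 |
| 562 | 649 | 563 | 0 | 6/14/1989 | 6 | 13 | 24 | 30 | 32 | 33 | 35 |
| 1708 | 649 | 1709 | 0 | 6/7/2000 | 7 | 10 | 27 | 42 | 43 | 47 | 44 |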


## Function for Historical Data Check

We're going to write a function that will enable users to compare their ticket against the historical lottery data in Canada and determine whether they would have ever won by now.

In [5]:
# function to extract the six drawn numbers from a row and return them as a set:
def extract_numbers(row):
    row = row[4:10]  # positions 4-9 hold NUMBER DRAWN 1 through NUMBER DRAWN 6
    row = set(row.values)
    return row

# extracting all winning series of numbers:
winning_numbers = lottery.apply(extract_numbers, axis=1)

# function to check historical occurrence:
def check_historical_occurence(user_list, winning_numbers):
    user_numbers = set(user_list)
    # element-wise comparison: True for every past drawing that matches the user's set
    comparing_result = user_numbers == winning_numbers
    wins = comparing_result.sum()  # each True counts as 1
    print("The inputted combination occurred in the past {} times.".format(wins))

We now have a function that compares a user's combination of numbers with the results of past drawings.
Now let's run some tests:

In [6]:
# test 1, any combination:
user_1_numbers = [49,40,30,20,10,9]
user_1_check = check_historical_occurence(user_1_numbers, winning_numbers)

The inputted combination occurred in the past 0 times.


In [7]:
# test 2, winning combination in the past (6/14/2014):
user_2_numbers = [7,11,12,13,35,41]
user_2_check = check_historical_occurence(user_2_numbers, winning_numbers)

The inputted combination occurred in the past 1 times.


The tests show that when a user inputs a number combination, the function reports how many times that combination occurred in the past.
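
The same idea can be sketched without pandas. The version below is only an illustration: it uses three drawings copied from the sample of the data set shown earlier rather than the full file, and the names `past_drawings`, `drawing_counts`, and `count_past_wins` are hypothetical. Counting the combinations once up front means each later lookup does not need to scan every drawing.

```python
from collections import Counter

# Three drawings copied from the sample of the data set shown earlier.
past_drawings = [
    (10, 26, 28, 35, 38, 39),
    (3, 4, 14, 18, 28, 31),
    (16, 22, 28, 29, 32, 34),
]

# Count each combination once; frozenset ignores the order of the numbers.
drawing_counts = Counter(frozenset(drawing) for drawing in past_drawings)

def count_past_wins(user_list):
    """Return how many past drawings match the user's six numbers exactly."""
    return drawing_counts[frozenset(user_list)]  # Counter returns 0 for unseen keys

print(count_past_wins([39, 38, 35, 28, 26, 10]))  # 1
print(count_past_wins([49, 40, 30, 20, 10, 9]))   # 0
```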

## Multi-ticket Probability

Earlier, we calculated the probability of winning with a single ticket.

Now we're going to write a function that allows users to calculate the chances of winning for any number of different tickets.

In [8]:
# function prints the winning probability depending on the number of different tickets played:
def multi_ticket_probability(num_tickets):
    total_outcomes = combinations(49, 6)
    multi_ticket_prob = num_tickets / total_outcomes
    print("If you buy {} tickets, your chance to win is {:.6%}".format(num_tickets, multi_ticket_prob))

# testing of the function:
testing = [1, 10, 100, 10000, 1000000, 6991908, 13983816]
for i in testing:
    multi_ticket_probability(i)  # the function prints its own message, so no outer print() is needed

If you buy 1 tickets, your chance to win is 0.000007%
If you buy 10 tickets, your chance to win is 0.000072%
If you buy 100 tickets, your chance to win is 0.000715%
If you buy 10000 tickets, your chance to win is 0.071511%
If you buy 1000000 tickets, your chance to win is 7.151124%
If you buy 6991908 tickets, your chance to win is 50.000000%
If you buy 13983816 tickets, your chance to win is 100.000000%


Our results show that a very large number of different tickets is needed to reach a 50% chance of winning: 6,991,908 tickets, or about 7 million.
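
The question can also be turned around: how many different tickets are needed to reach a given chance of winning? The sketch below is one way to answer it, assuming every ticket carries a different combination. The name `tickets_needed` is illustrative, and `Fraction` is used only to avoid floating-point rounding at the boundary.

```python
import math
from fractions import Fraction

TOTAL_OUTCOMES = math.comb(49, 6)  # 13,983,816 possible tickets

def tickets_needed(target_probability):
    """Smallest number of different tickets whose win probability is at least the target."""
    return math.ceil(Fraction(target_probability) * TOTAL_OUTCOMES)

print(tickets_needed(Fraction(1, 2)))    # 6991908
print(tickets_needed(Fraction(1, 100)))  # 139839
```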

## Fewer Winning Numbers - Function

So far, we wrote three main functions:

- `one_ticket_prob()`: calculates the probability of winning the big prize with a single ticket


- `check_historical_occurence()`: checks whether a certain combination has occurred in the Canada lottery data set


- `multi_ticket_probability()`: calculates the probability for any number of tickets between 1 and 13,983,816


Next, we're going to write one more function to allow the users to calculate probabilities for two, three, four, or five winning numbers.

In [9]:
# function that prints the chance of having exactly 2-5 winning numbers:
def probability_less_6(int_2_5):
    possible_outcomes = combinations(6, int_2_5)  # ways to pick the matching numbers from the 6 drawn
    dif_possible_outcomes = combinations(43, 6 - int_2_5)  # ways to pick the rest from the 43 not drawn
    total_possible_outcomes = possible_outcomes * dif_possible_outcomes
    total_outcomes = combinations(49, 6)
    probability = total_possible_outcomes / total_outcomes
    print("With {} winning numbers, your chance to win is {:.4%}".format(int_2_5, probability))

# testing:
probability_less_6(5)

With 5 winning numbers, your chance to win is 0.0018%
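
The function above gives the probability of matching exactly a given number of winning numbers, while the question listed in the introduction asks for at least that many. One way to bridge the two, sketched here with the hypothetical helpers `prob_exactly` and `prob_at_least`, is to add up the exact probabilities for every count from the requested one up to six.

```python
import math
from fractions import Fraction

def prob_exactly(k):
    """Probability that exactly k of the six numbers on a ticket are drawn."""
    return Fraction(math.comb(6, k) * math.comb(43, 6 - k), math.comb(49, 6))

def prob_at_least(k):
    """Probability that k or more of the six numbers on a ticket are drawn."""
    return sum(prob_exactly(j) for j in range(k, 7))

for k in (5, 4, 3, 2):
    print("At least {} winning numbers: {:.4%}".format(k, float(prob_at_least(k))))
```

For five numbers, the "at least" case adds only the single jackpot combination to the 258 combinations that match exactly five, so the two probabilities are nearly identical; the gap grows for smaller counts.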


## Conclusion

We coded four main functions for the first version of the app:

- `one_ticket_prob()`: calculates the probability of winning the big prize with a single ticket


- `check_historical_occurence()`: checks whether a certain combination has occurred in the Canada lottery data set


- `multi_ticket_probability()`: calculates the probability for any number of tickets between 1 and 13,983,816


- `probability_less_6()`: calculates the probability of having exactly two, three, four, or five winning numbers