# Guided Project: Mobile App for Lottery Addiction

Many people start playing the lottery for fun, but for some this activity turns into a habit that eventually escalates into addiction. Like other compulsive gamblers, lottery addicts soon begin spending their savings and taking out loans, accumulate debts, and eventually engage in desperate behaviors like theft.

A medical institute that aims to prevent and treat gambling addictions wants to build a dedicated mobile app to help lottery addicts better estimate their chances of winning. The institute has a team of engineers that will build the app, but they need us to create the logical core of the app and calculate probabilities.

For the first version of the app, they want us to focus on the [6/49 lottery](https://en.wikipedia.org/wiki/Lotto_6/49) and build functions that enable users to answer questions like:
- What is the probability of winning the big prize with a single ticket?
- What is the probability of winning the big prize if we play 40 different tickets (or any other number)?
- What is the probability of having at least five (or four, or three, or two) winning numbers on a single ticket?

We will also consider historical data coming from the national 6/49 lottery game in Canada. [The data set](https://www.kaggle.com/datascienceai/lottery-dataset) has data for 3,665 drawings, dating from 1982 to 2018.

**CORE FUNCTIONS**

In the 6/49 lottery, six numbers are drawn from a set of 49 numbers that range from 1 to 49. The drawing is done without replacement, which means once a number is drawn, it's not put back in the set.

Throughout the project, we'll need to calculate probabilities and combinations repeatedly. We'll therefore start by writing two functions that we'll use often:
- A function that calculates factorials; and 
- A function that calculates combinations.

In [1]:
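def factorial(n):
    # Multiply n * (n - 1) * ... * 1; factorial(0) returns 1
    result = 1
    for i in range(n):
        result *= (n - i)
    return result

def combinations(n, k):
    # Number of ways to choose k items from n when order doesn't matter.
    # Integer division is exact here and avoids floating-point rounding.
    return factorial(n) // (factorial(k) * factorial(n - k))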
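
As a quick sanity check, the two helpers can be compared against Python's standard library. The sketch below assumes Python 3.8 or later, where `math.comb` is available; the variable names `total_tickets` and `by_definition` are illustrative only.

```python
import math

# Number of possible 6/49 tickets: choose 6 numbers out of 49, order ignored
total_tickets = math.comb(49, 6)
print(total_tickets)  # 13983816

# The same count via the factorial definition n! / (k! * (n - k)!)
by_definition = math.factorial(49) // (math.factorial(6) * math.factorial(43))
print(by_definition == total_tickets)  # True
```

The value 13,983,816 is the same figure quoted later as the maximum number of different tickets.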

**ONE-TICKET PROBABILITY**

We'll start by building a function that calculates the probability of winning the big prize for any given ticket. Some details we need to be aware of are:

- Inside the app, the user inputs six different numbers from 1 to 49.
- Under the hood, the six numbers will come as a Python list, which will serve as the single input to our function.
- The engineering team wants the function to print the probability value in a friendly way, one that people without any probability training can understand.

In [2]:
def one_ticket_probability(num_list):
    # Every six-number combination is equally likely, so only the ticket's
    # length matters here; the numbers themselves are not validated
    total_outcome = combinations(49, len(num_list))
    probability = 1 / total_outcome
    outcome_percentage = probability * 100
    message = 'The chances for you to win the big prize with your set of six numbers are of {:.7f} %'
    print(message.format(outcome_percentage), '\n')

Let's test the function using a few inputs.

In [3]:
input_1 = one_ticket_probability([55, 2, 63, 48, 7, 1])

The chances for you to win the big prize with your set of six numbers are of 0.0000072 % 



In [4]:
input_2 = one_ticket_probability([6, 49, 35, 12, 9, 2])

The chances for you to win the big prize with your set of six numbers are of 0.0000072 % 



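From the results above, we see that every ticket has the same probability of winning the big prize, no matter which six numbers are chosen. It's also easy to see how low this probability is: about 1 chance in 13,983,816. Note that the function does not validate its input; the first test ticket contains 55 and 63, which fall outside the 1 to 49 range, yet it still gets the same result.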
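
Since the app expects six different numbers from 1 to 49, a small validation step could run before the probability is computed. The following is a minimal sketch, not part of the original project; `validate_ticket` is a hypothetical helper name.

```python
def validate_ticket(num_list):
    """Return True when num_list holds six different integers from 1 to 49."""
    if len(num_list) != 6:
        return False
    if len(set(num_list)) != 6:  # duplicates collapse inside a set
        return False
    return all(isinstance(n, int) and 1 <= n <= 49 for n in num_list)

print(validate_ticket([55, 2, 63, 48, 7, 1]))  # False: 55 and 63 are out of range
print(validate_ticket([6, 49, 35, 12, 9, 2]))  # True
```

Whether an invalid ticket should raise an error or simply show a message is a decision for the engineering team; the sketch only reports validity.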

**HISTORICAL DATA CHECK FOR CANADA LOTTERY**

For the first version of the app, users should be able to compare their ticket against the historical lottery data in Canada and determine whether they would have ever won by now.

The data set contains historical data for 3,665 drawings (each row shows data for a single drawing), dating from 1982 to 2018. For each drawing, we can find the six numbers drawn in the following six columns:

- NUMBER DRAWN 1
- NUMBER DRAWN 2
- NUMBER DRAWN 3
- NUMBER DRAWN 4
- NUMBER DRAWN 5
- NUMBER DRAWN 6

In [5]:
import pandas as pd

In [6]:
# Read the data set
six_four_nine = pd.read_csv('649.csv')

print('(rows numbers, columns number) : ', six_four_nine.shape)

(rows numbers, columns number) :  (3665, 11)


In [7]:
print('\n- First 3 rows')
six_four_nine.head(3)


- First 3 rows


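| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 6/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 6/19/1982 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 6/26/1982 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |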


In [8]:
print('\n- Last 3 rows')
six_four_nine.tail(3)


- Last 3 rows


| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3662 | 649 | 3589 | 0 | 6/13/2018 | 6 | 22 | 24 | 31 | 32 | 34 | 16 |
| 3663 | 649 | 3590 | 0 | 6/16/2018 | 2 | 15 | 21 | 31 | 38 | 49 | 8 |
| 3664 | 649 | 3591 | 0 | 6/20/2018 | 14 | 24 | 31 | 35 | 37 | 48 | 17 |


**FUNCTION FOR HISTORICAL DATA CHECK**

Next, we're going to write a function that will enable users to compare their ticket against the historical lottery data in Canada and determine whether they would have ever won by now.

In addition to what was stated previously, the engineering team wants the function to print:
- the number of times the combination selected occurred in the Canada data set; and
- the probability of winning the big prize in the next drawing with that combination.

In [9]:
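def extract_numbers(row):
    # Positions 4 to 9 hold NUMBER DRAWN 1 to 6; position 10 is BONUS NUMBER,
    # so each resulting set contains seven values, not six
    winning_number = [row[4], row[5], row[6], row[7], row[8], row[9], row[10]]
    return set(winning_number)

# Build one set of drawn numbers per drawing (row)
winning_numbers = six_four_nine.apply(extract_numbers, axis=1)
winning_numbers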

0        {3, 41, 11, 12, 43, 14, 13}
1         {33, 36, 37, 39, 8, 41, 9}
2         {1, 34, 6, 39, 23, 24, 27}
3         {34, 3, 9, 10, 43, 13, 20}
4        {34, 5, 45, 14, 47, 21, 31}
5        {33, 8, 41, 20, 21, 25, 31}
6        {33, 36, 7, 42, 18, 25, 28}
7        {7, 40, 16, 17, 48, 26, 31}
8        {33, 5, 38, 37, 10, 23, 27}
9         {3, 4, 37, 46, 15, 48, 30}
10        {33, 38, 7, 9, 42, 45, 21}
11       {36, 9, 11, 43, 17, 19, 20}
12       {34, 37, 7, 14, 47, 17, 20}
13       {35, 3, 44, 25, 28, 29, 30}
14       {36, 39, 8, 41, 47, 18, 31}
15       {9, 12, 13, 14, 44, 48, 18}
16        {4, 5, 40, 43, 44, 14, 18}
17      {34, 35, 36, 13, 16, 18, 26}
18      {36, 11, 23, 25, 27, 28, 29}
19       {37, 7, 39, 45, 18, 23, 25}
20      {37, 41, 11, 45, 18, 19, 31}
21       {8, 45, 14, 16, 48, 18, 31}
22       {4, 41, 11, 45, 23, 24, 25}
23        {33, 34, 3, 4, 39, 48, 19}
24       {36, 5, 43, 17, 21, 28, 30}
25       {36, 6, 38, 46, 17, 24, 29}
26         {3, 4, 9, 10, 11, 43, 46}
...
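
Because the sets above include the bonus number, a six-number ticket can never equal one of them. One possible adjustment is to select the six main columns by name and leave the bonus out. The sketch below is an assumption-laden illustration, not the project's code: `extract_main_numbers` and `MAIN_COLUMNS` are hypothetical names, and the rows are the first three drawings shown earlier, written as plain dictionaries keyed by column name.

```python
# The first three drawings shown earlier, as dictionaries keyed by column name
sample_rows = [
    {'NUMBER DRAWN 1': 3, 'NUMBER DRAWN 2': 11, 'NUMBER DRAWN 3': 12,
     'NUMBER DRAWN 4': 14, 'NUMBER DRAWN 5': 41, 'NUMBER DRAWN 6': 43,
     'BONUS NUMBER': 13},
    {'NUMBER DRAWN 1': 8, 'NUMBER DRAWN 2': 33, 'NUMBER DRAWN 3': 36,
     'NUMBER DRAWN 4': 37, 'NUMBER DRAWN 5': 39, 'NUMBER DRAWN 6': 41,
     'BONUS NUMBER': 9},
    {'NUMBER DRAWN 1': 1, 'NUMBER DRAWN 2': 6, 'NUMBER DRAWN 3': 23,
     'NUMBER DRAWN 4': 24, 'NUMBER DRAWN 5': 27, 'NUMBER DRAWN 6': 39,
     'BONUS NUMBER': 34},
]

# Names of the six main-number columns; BONUS NUMBER is deliberately left out
MAIN_COLUMNS = ['NUMBER DRAWN {}'.format(i) for i in range(1, 7)]

def extract_main_numbers(row):
    """Return the six main numbers of one drawing as a set (bonus excluded)."""
    return {row[column] for column in MAIN_COLUMNS}

main_numbers = [extract_main_numbers(row) for row in sample_rows]
print([len(numbers) for numbers in main_numbers])  # [6, 6, 6]
print(13 in main_numbers[0])  # False: the bonus number is left out
```

Since a DataFrame row also supports lookup by column label, the same helper should work with `six_four_nine.apply(extract_main_numbers, axis=1)`, although that has not been run against the full data set here.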

In [10]:
def check_historical_occurence(user_nums, winning_nums):
    # Sets ignore order, so the order in which the numbers were entered doesn't matter
    user_nums = set(user_nums)
    # True for each past drawing whose set of numbers equals the ticket
    bool_serie = winning_nums == user_nums
    message = 'The number of times your combination has occurred in past drawings is {}'
    print(message.format(bool_serie.sum()))


Let's do some tests on the function.

In [11]:
input_1 = check_historical_occurence([6, 49, 35, 12, 9, 2], winning_numbers)
input_1

The number of times your combination has occurred in past drawings is 0


In [12]:
# A combination known to have been drawn at least once (seven values, bonus number included)
input_2 = check_historical_occurence([35, 43, 44, 46, 16, 17, 49], winning_numbers)
input_2

The number of times your combination has occurred in past drawings is 1


From the results above, we see that the test combination we knew had been drawn appeared only once in the historical data. (That test used seven numbers because the sets built earlier include the bonus number.) The combination we chose at random has never been drawn, but that does not mean it never will be. Remember, the chance that any given combination wins the big prize in the next drawing is 0.0000072%, regardless of whether it has won before.
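
The engineering team also asked for the probability of winning the next drawing to be printed alongside the historical count. A minimal sketch of how both pieces could be reported together follows; `check_history_sketch` is a hypothetical name, and `past` holds only the six main numbers of the first three drawings shown earlier rather than the full data set.

```python
import math

def check_history_sketch(user_nums, past_drawings):
    """Count past drawings equal to the ticket and give the next-draw chance."""
    ticket = set(user_nums)
    occurrences = sum(1 for drawing in past_drawings if drawing == ticket)
    # Each drawing is independent, so past results do not change this value
    next_draw_percentage = 100 / math.comb(49, 6)
    print('Your combination has occurred {} time(s) in past drawings.'.format(occurrences))
    print('Its chance of winning the next drawing is {:.7f}%.'.format(next_draw_percentage))
    return occurrences, next_draw_percentage

# Six main numbers of the first three drawings in the data set
past = [{3, 11, 12, 14, 41, 43}, {8, 33, 36, 37, 39, 41}, {1, 6, 23, 24, 27, 39}]
count, pct = check_history_sketch([43, 41, 14, 12, 11, 3], past)
```

Returning the two values in addition to printing them is a design choice made here so the result can be checked or reused; the original function only prints.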

**MULTI-TICKET PROBABILITY**

Lottery addicts usually play more than one ticket on a single drawing, thinking that this might increase their chances of winning significantly. Our purpose is to help them better estimate their chances of winning.

We're going to write a function that will allow the users to calculate the chances of winning for any number of different tickets.

We've talked with the engineering team and they gave us the following information:

- The user will input the number of different tickets they want to play (without inputting the specific combinations they intend to play).
- Our function will see an integer between 1 and 13,983,816 (the maximum number of different tickets).
- The function should print information about the probability of winning the big prize depending on the number of different tickets played.

In [13]:
def multi_ticket_probability(ticket_num):
    total_outcome = combinations(49, 6)
    probability = ticket_num / total_outcome
    outcome_percentage = probability * 100
    message = '\nThe chances for you to win the big prize with your {} purchased tickets are of {:.7f} %'
    print(message.format(ticket_num, outcome_percentage), '\n')
    print('-----------------------------------')


Let's test the function with the following inputs: [1, 10, 100, 10000, 1000000, 6991908, 13983816].

In [14]:
ticket_size = [1, 10, 100, 10000, 1000000, 6991908, 13983816]
for number in ticket_size:
    multi_ticket_probability(number)
    


The chances for you to win the big prize with your 1 purchased tickets are of 0.0000072 % 

-----------------------------------

The chances for you to win the big prize with your 10 purchased tickets are of 0.0000715 % 

-----------------------------------

The chances for you to win the big prize with your 100 purchased tickets are of 0.0007151 % 

-----------------------------------

The chances for you to win the big prize with your 10000 purchased tickets are of 0.0715112 % 

-----------------------------------

The chances for you to win the big prize with your 1000000 purchased tickets are of 7.1511238 % 

-----------------------------------

The chances for you to win the big prize with your 6991908 purchased tickets are of 50.0000000 % 

-----------------------------------

The chances for you to win the big prize with your 13983816 purchased tickets are of 100.0000000 % 

-----------------------------------


From the results above, we see that the more different tickets a user buys for the same drawing, the better their chances of winning the big prize. However, the probability grows only in proportion to the number of tickets: to reach even a 50% chance, someone would need to buy half of all possible combinations (6,991,908 tickets). That is a lot of money to spend for a 50-50 chance of winning, even if each ticket were cheap.
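
To make the cost argument concrete, the relationship can be inverted: given a target probability, how many different tickets are needed? The sketch below is illustrative only. `tickets_needed` and `total_cost` are hypothetical helpers, and the price of 3 currency units per ticket is an assumption for the example, not a figure from the data set.

```python
import math

TOTAL_TICKETS = math.comb(49, 6)  # 13,983,816 possible combinations

def tickets_needed(target_probability):
    """Smallest number of different tickets reaching the target probability."""
    return math.ceil(target_probability * TOTAL_TICKETS)

def total_cost(n_tickets, price_per_ticket):
    """Cost of n_tickets at a hypothetical fixed price per ticket."""
    return n_tickets * price_per_ticket

half = tickets_needed(0.5)
print(half)  # 6991908
# Hypothetical price of 3 currency units per ticket, for illustration only
print(total_cost(half, 3))  # 20975724
```

Under that assumed price, a 50% chance would cost roughly 21 million currency units.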

**FUNCTION FOR FEWER WINNING NUMBERS**

For extra context, in most 6/49 lotteries there are smaller prizes if a player's ticket matches two, three, four, or five of the six numbers drawn. As a consequence, users might be interested in knowing the probability of having two, three, four, or five winning numbers.

These are the engineering details we'll need to be aware of:

- Inside the app, the user inputs:
    - six different numbers from 1 to 49; and
    - an integer between 2 and 5 that represents the number of winning numbers expected
- Our function prints information about the probability of having the inputted number of winning numbers.

We're going to write one more function to allow the users to calculate probabilities for two, three, four, or five winning numbers.

In [15]:
def probability_less_6(integer):
    total_outcomes = combinations(49, 6)
    # Ways to pick which `integer` of the ticket's six numbers are drawn
    n_combinations_ticket = combinations(6, integer)
    # Ways to fill the remaining drawn numbers from the 43 not on the ticket
    n_combinations_remaining = combinations(43, 6 - integer)
    successful_outcomes = n_combinations_ticket * n_combinations_remaining
    outcome_percentage = (successful_outcomes / total_outcomes) * 100
    
    message = '\nThe chances for you to have exactly {} winning numbers are of {:.7f} %'
    print(message.format(integer, outcome_percentage), '\n')
    print('-----------------------------------')


Let's test the function with the following inputs: [2, 3, 4, 5].

In [16]:
possible_inputs = [2, 3, 4, 5]
for num in possible_inputs:
    probability_less_6(num)


The chances for you to have exactly 2 winning numbers are of 13.2378029 % 

-----------------------------------

The chances for you to have exactly 3 winning numbers are of 1.7650404 % 

-----------------------------------

The chances for you to have exactly 4 winning numbers are of 0.0968620 % 

-----------------------------------

The chances for you to have exactly 5 winning numbers are of 0.0018450 % 

-----------------------------------


We can deduce from the results above that a user is far more likely to match exactly two winning numbers (about 13.2%) than three, four, or five, although even that probability is low.
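
The function above reports the probability of matching exactly a given number of winning numbers, while the questions listed in the introduction ask about "at least" that many. One way to bridge the two is to add up the exact probabilities from k through 6. The following is a sketch under that interpretation; `probability_exactly` and `probability_at_least` are hypothetical names, and `fractions.Fraction` is used only to keep the arithmetic exact.

```python
from fractions import Fraction
import math

def probability_exactly(k):
    """Probability that exactly k of a ticket's six numbers are drawn."""
    favorable = math.comb(6, k) * math.comb(43, 6 - k)
    return Fraction(favorable, math.comb(49, 6))

def probability_at_least(k):
    """Probability that k or more of a ticket's six numbers are drawn."""
    return sum(probability_exactly(j) for j in range(k, 7))

for k in [2, 3, 4, 5]:
    print('At least {}: {:.7f}%'.format(k, float(probability_at_least(k)) * 100))
```

As a consistency check, `probability_at_least(0)` sums over every possible outcome and therefore equals 1.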