# Project 15 - Mobile App for Lottery Addiction

Many people start playing the lottery for fun, but for some this activity turns into a habit which eventually escalates into addiction. Like other compulsive gamblers, lottery addicts soon begin spending their savings and taking out loans, accumulate debts, and may eventually resort to desperate behaviors like theft.

## Goal of this project

A fictional medical institute that aims to prevent and treat gambling addictions wants to build a dedicated mobile app to help lottery addicts better estimate their chances of winning. The institute has a team of engineers that will build the app, but they need us to create the logical core of the app and calculate probabilities.

For the first version of the app, they want us to focus on the 6/49 lottery and build functions that can answer the following questions for users:

- What is the probability of winning the big prize with a single ticket?
- What is the probability of winning the big prize if we play 40 different tickets (or any other number)?
- What is the probability of having at least five (or four, or three) winning numbers on a single ticket?

<b>In short, the main purpose of this project is to practice applying probability and combinatorics (permutations and combinations) concepts in a setting that simulates a real-world scenario.</b>

## Core Functions

We will start by writing two functions that the rest of the project relies on:
- a function that calculates factorials
- a function that calculates combinations

In [15]:
# n! = n * (n-1) * (n-2) * ... * 2 * 1
def factorial(n):
    answer = 1
    # range(1, n+1) includes n itself; range(1, n) would stop one number too early
    for number in range(1, n+1):
        answer *= number
    return answer

#this should be 3*2*1 = 6
print(factorial(3))
#this should be 10*9*8...*1 = 3628800
print(factorial(10))

6
3628800


In [20]:
# n! / (k! * (n-k)!)
def combinations(n, k):
    # integer division keeps the result exact; the quotient is always a whole number
    return factorial(n) // (factorial(k) * factorial(n-k))

#this should be 2760681
print(combinations(38,6))

2760681


We now have two core functions that we're going to need repeatedly moving forward.
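
As a sanity check, the two hand-written helpers can be compared against Python's standard library, which provides `math.factorial` and, since Python 3.8, `math.comb`. The sketch below re-declares compact versions of both helpers so that it runs on its own; it is meant as an illustration, not as a replacement for the cells above.

```python
import math

def factorial(n):
    """n! = n * (n-1) * ... * 2 * 1, with 0! = 1."""
    answer = 1
    for number in range(1, n + 1):
        answer *= number
    return answer

def combinations(n, k):
    """Number of ways to choose k items out of n when order does not matter."""
    return factorial(n) // (factorial(k) * factorial(n - k))

# Compare both helpers against the standard library for every n up to 49
for n in range(0, 50):
    assert factorial(n) == math.factorial(n)
    for k in range(0, n + 1):
        assert combinations(n, k) == math.comb(n, k)

print(combinations(49, 6))  # 13983816
```

If the assertions pass silently, the helpers agree with the standard library for every input a 6/49 lottery calculation needs.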

## One-ticket Probability

Now we are going to write a function that calculates the probability of winning the big prize.

In the 6/49 lottery, six numbers are drawn from a set of 49 numbers that range from 1 to 49. A player wins the big prize if the six numbers on their ticket match all six numbers drawn. If a player has a ticket with the numbers {13, 22, 24, 27, 42, 44}, they only win the big prize if the numbers drawn are {13, 22, 24, 27, 42, 44}. If even one number differs, they don't win.
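
The rule described above can be sketched as a small simulation. The helper names `draw_numbers` and `wins_big_prize` are made up for this illustration and are not part of the project's functions; the seed is fixed only so that repeated runs behave the same way.

```python
import random

def draw_numbers(rng):
    """Draw six distinct numbers from 1..49, as in a 6/49 lottery."""
    return set(rng.sample(range(1, 50), 6))

def wins_big_prize(ticket, drawn):
    """A ticket wins the big prize only if all six numbers match."""
    return set(ticket) == set(drawn)

rng = random.Random(0)  # fixed seed (an arbitrary choice) for repeatability
ticket = {13, 22, 24, 27, 42, 44}

print(wins_big_prize(ticket, {13, 22, 24, 27, 42, 44}))  # True
print(wins_big_prize(ticket, {13, 22, 24, 27, 42, 45}))  # False: one number differs

# Simulate 100,000 draws and count how often this ticket wins the big prize
wins = sum(wins_big_prize(ticket, draw_numbers(rng)) for _ in range(100_000))
print(wins)
```

Using sets means the order in which the numbers are drawn does not matter, which is why the number of possible outcomes is a count of combinations rather than permutations.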

In [37]:
def one_ticket_probability(list_of_six_numbers):
    lottery_list = list_of_six_numbers
    if len(lottery_list) != 6:
        print('The list has to have 6 numbers')
    else:
        #total number of possible outcomes: 6 numbers are sampled from 49 without replacement
        ttl_outcomes = combinations(49,6)
        #the player holds only one ticket here, so the probability is 1/ttl_outcomes
        probability = 1/ttl_outcomes
        prob_as_pct = 100*probability
        #fixed-point format is easier to read: 0.0000072% instead of 7.2e-06%
        prob_as_pct = "{:.7f}".format(prob_as_pct)
        #printing the result in a way that should be easy to understand
        print(f"There are {ttl_outcomes} possible combinations in lottery.")
        print(f"The chance to win the lottery with one ticket is only {prob_as_pct}%.")
    
    
one_ticket_probability([1,2,3,4,5,6])

There are 13983816 possible combinations in lottery.
The chance to win the lottery with one ticket is only 0.0000072%.


The function above calculates the probability of winning the big prize with one ticket. It prints the probability in a format meant to show addicted players that winning the lottery is close to impossible.

## Historical Data Check for Canada Lottery

Users should be able to compare their ticket against the historical lottery data in Canada and determine whether they would have ever won by now. To do that, we use the data set from [Kaggle](https://www.kaggle.com/datasets/datascienceai/lottery-dataset).

In [32]:
import pandas as pd
data = pd.read_csv('649.csv')
print(data.shape)
print(data.head(2))
data.tail(2)

(3665, 11)
   PRODUCT  DRAW NUMBER  SEQUENCE NUMBER  DRAW DATE  NUMBER DRAWN 1  \
0      649            1                0  6/12/1982               3   
1      649            2                0  6/19/1982               8   

   NUMBER DRAWN 2  NUMBER DRAWN 3  NUMBER DRAWN 4  NUMBER DRAWN 5  \
0              11              12              14              41   
1              33              36              37              39   

   NUMBER DRAWN 6  BONUS NUMBER  
0              43            13  
1              41             9  


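|      | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|------|---------|-------------|-----------------|-----------|----------------|----------------|----------------|----------------|----------------|----------------|--------------|
| 3663 | 649 | 3590 | 0 | 6/16/2018 | 2 | 15 | 21 | 31 | 38 | 49 | 8 |
| 3664 | 649 | 3591 | 0 | 6/20/2018 | 14 | 24 | 31 | 35 | 37 | 48 | 17 |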


The dataset contains 3665 rows, one per recorded lottery drawing. The first drawing dates from 1982 and the last from 2018. The six numbers drawn are stored in the columns `NUMBER DRAWN 1` to `NUMBER DRAWN 6`. Each row also has a bonus number, which can affect the prize won.
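
The date range can be checked directly from the `DRAW DATE` values. The sketch below uses only the four dates visible in the output above as a stand-in for the full file and parses them with the standard library; on the real DataFrame, `pd.to_datetime` on the `DRAW DATE` column would be the usual route.

```python
from datetime import datetime

# The four draw dates shown in the head and tail output (month/day/year strings)
draw_dates = ["6/12/1982", "6/19/1982", "6/16/2018", "6/20/2018"]

# Parse the strings explicitly so that month and day cannot be swapped
parsed = [datetime.strptime(d, "%m/%d/%Y") for d in draw_dates]

first_draw = min(parsed)
last_draw = max(parsed)
span_years = (last_draw - first_draw).days / 365.25

print(first_draw.date(), last_draw.date())  # 1982-06-12 2018-06-20
print(round(span_years, 1))                 # 36.0
```

The data therefore spans roughly 36 years of drawings.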

## Function for Historical Data Check

Next we are going to write a function that enables users to compare their current ticket against the historical lottery data and determine whether they would have ever won by now.

In [63]:
#This function extracts the six winning numbers from one row of the historical dataset as a set
def extract_numbers(row):
    #NUMBER DRAWN 1-6 sit at positions 4-9 (the fifth to tenth columns)
    #the bonus number at position 10 is left out
    historical_list = set(row.iloc[4:10])
    return historical_list

winning_numbers = data.apply(extract_numbers, axis=1)
winning_numbers.head()

0    {3, 41, 11, 12, 43, 14}
1    {33, 36, 37, 39, 8, 41}
2     {1, 6, 39, 23, 24, 27}
3     {3, 9, 10, 43, 13, 20}
4    {34, 5, 14, 47, 21, 31}
dtype: object

The function above returns the set of winning numbers for every row of the dataset. Let's now write a function that checks the user's numbers against the historical winning numbers.

In [91]:
def check_historical_occurence(user_numbers, historical_numbers):
    user_numbers_set = set(user_numbers)
    wins = 0
    #compare the ticket against every historical draw, not only the first one
    for numbers in historical_numbers:
        if user_numbers_set == numbers:
            wins += 1
    historical_count = historical_numbers.count()
    #the dataset covers drawings from 1982 to 2018
    return f'There have been {historical_count} recorded lottery draws since 1982.\nIn {2018-1982} years, your lottery ticket win count is: {wins}'
    
print(check_historical_occurence([1,2,3,4,5,6], winning_numbers))
print("")
print(check_historical_occurence([3,41,14,12,43,11], winning_numbers))

There have been 3665 recorded lottery draws since 1982.
In 36 years, your lottery ticket win count is: 0

There have been 3665 recorded lottery draws since 1982.
In 36 years, your lottery ticket win count is: 1


The function checks whether the user's numbers would already have won. The returned message tries to be as clear as possible and highlights how many lottery draws and years the ticket has (most likely) already missed.
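
To see the comparison logic in isolation, here is a minimal sketch that runs without the CSV file. It uses the first five historical draws from the output above as plain Python sets; `count_exact_matches` is a hypothetical helper name introduced only for this example.

```python
# The first five historical draws, copied from the output above
sample_draws = [
    {3, 41, 11, 12, 43, 14},
    {33, 36, 37, 39, 8, 41},
    {1, 6, 39, 23, 24, 27},
    {3, 9, 10, 43, 13, 20},
    {34, 5, 14, 47, 21, 31},
]

def count_exact_matches(user_numbers, draws):
    """Count the draws in which all six numbers match the user's ticket."""
    user_set = set(user_numbers)
    return sum(1 for drawn in draws if drawn == user_set)

print(count_exact_matches([1, 2, 3, 4, 5, 6], sample_draws))       # 0
print(count_exact_matches([3, 41, 14, 12, 43, 11], sample_draws))  # 1
print(count_exact_matches([1, 6, 39, 23, 24, 27], sample_draws))   # 1 (matches the third draw)
```

The last call is a useful test case because the matching draw is not the first one in the list. Since iterating over a pandas Series yields its values, the same helper should also accept `winning_numbers` directly.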

## Multi-ticket Probability

Lottery addicts usually play more than one ticket on a single drawing, thinking that this might increase their chances of winning significantly. Our purpose is to help them better estimate their chances of winning. 

Next we are going to write a function that will allow the users to calculate the chances of winning for any number of different tickets.

In [103]:
#returns the probability of winning the big prize for a given number of different tickets
def multi_ticket_probability(n_of_tickets):
    #all the possible outcomes when 6 numbers are drawn from 49
    ttl_outcomes = combinations(49,6)
    #each different ticket covers exactly one of those outcomes
    probability = n_of_tickets/ttl_outcomes
    prob_as_pct = 100*probability
    #fixed-point format instead of scientific notation
    prob_as_pct = "{:.7f}".format(prob_as_pct)
    return f"There are {ttl_outcomes} possible combinations in lottery.\nThe chance to win the lottery with {n_of_tickets} tickets is only {prob_as_pct}%."

test_probabilities = [1, 10, 100, 10000, 1000000, 6991908, 13983816]
for n in test_probabilities:
    print(multi_ticket_probability(n))
    print("")

There are 13983816 possible combinations in lottery.
The chance to win the lottery with 1 tickets is only 0.0000072%.

There are 13983816 possible combinations in lottery.
The chance to win the lottery with 10 tickets is only 0.0000715%.

There are 13983816 possible combinations in lottery.
The chance to win the lottery with 100 tickets is only 0.0007151%.

There are 13983816 possible combinations in lottery.
The chance to win the lottery with 10000 tickets is only 0.0715112%.

There are 13983816 possible combinations in lottery.
The chance to win the lottery with 1000000 tickets is only 7.1511238%.

There are 13983816 possible combinations in lottery.
The chance to win the lottery with 6991908 tickets is only 50.0000000%.

There are 13983816 possible combinations in lottery.
The chance to win the lottery with 13983816 tickets is only 100.0000000%.



The output matches what we expect: 10 tickets give 10 times the chance of a single ticket, and buying all 13983816 possible tickets would guarantee a win (100%).
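
The same relationship can be checked with exact arithmetic. This sketch uses `fractions.Fraction` and `math.comb` (Python 3.8+) instead of the notebook's own helpers, and `multi_ticket_fraction` is a hypothetical name. It assumes that all tickets are different, since duplicate tickets would not add any chance of winning.

```python
from fractions import Fraction
from math import comb

TOTAL_OUTCOMES = comb(49, 6)  # 13983816 possible six-number combinations

def multi_ticket_fraction(n_of_tickets):
    """Exact probability of winning the big prize with n different tickets."""
    if not 0 <= n_of_tickets <= TOTAL_OUTCOMES:
        raise ValueError("number of different tickets must be between 0 and 13983816")
    return Fraction(n_of_tickets, TOTAL_OUTCOMES)

one = multi_ticket_fraction(1)
ten = multi_ticket_fraction(10)

print(ten / one)                             # 10
print(multi_ticket_fraction(6991908))        # 1/2
print(multi_ticket_fraction(TOTAL_OUTCOMES)) # 1
```

Working with fractions avoids rounding, so the "10 times the chance" and "half of all tickets gives 50%" statements hold exactly rather than approximately.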

## Fewer Winning Numbers - Function

Now we are going to write one more function to allow the users to calculate probabilities for two, three, four, or five winning numbers.

For extra context, in most 6/49 lotteries there are smaller prizes if a player's ticket matches two, three, four, or five of the six numbers drawn. As a consequence, the users might be interested in knowing the probability of having two, three, four, or five winning numbers.

In [141]:
def probability_less_6(n):
    #ways to pick exactly n of the 6 winning numbers
    n_combinations_ticket = combinations(6, n)
    #ways to pick the other 6-n numbers from the 43 non-winning numbers
    n_combinations_remaining = combinations(43, 6-n)
    successful_outcomes = n_combinations_ticket * n_combinations_remaining

    ttl_outcomes = combinations(49,6)
    probability = (successful_outcomes / ttl_outcomes)*100
    #easier format
    probability = "{:.6f}".format(probability)
    return f'Your chance to win a smaller prize by matching exactly {n} numbers is {probability}%'

for n in range(2, 6):
    print(probability_less_6(n))

Your chance to win a smaller prize by matching exactly 2 numbers is 13.237803%
Your chance to win a smaller prize by matching exactly 3 numbers is 1.765040%
Your chance to win a smaller prize by matching exactly 4 numbers is 0.096862%
Your chance to win a smaller prize by matching exactly 5 numbers is 0.001845%
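
The project goals also ask about having *at least* a given number of winning numbers. One way to get there, sketched below under the assumption that `math.comb` (Python 3.8+) is available, is to sum the exact-match cases. The names `ways_exactly`, `probability_exactly` and `probability_at_least` are hypothetical and not part of the functions above.

```python
from math import comb

TOTAL_OUTCOMES = comb(49, 6)

def ways_exactly(k):
    """Number of draws in which exactly k of the ticket's six numbers match."""
    # choose k of the 6 winning numbers and 6-k of the 43 non-winning numbers
    return comb(6, k) * comb(43, 6 - k)

def probability_exactly(k):
    return ways_exactly(k) / TOTAL_OUTCOMES

def probability_at_least(k):
    """Probability of matching k or more numbers (up to all six)."""
    return sum(ways_exactly(j) for j in range(k, 7)) / TOTAL_OUTCOMES

# Sanity check: the cases for 0 to 6 matches cover every possible draw exactly once
assert sum(ways_exactly(k) for k in range(0, 7)) == TOTAL_OUTCOMES

for k in (3, 4, 5):
    print(f"at least {k} matching numbers: {100 * probability_at_least(k):.6f}%")
```

Because "at least five" also includes the single outcome where all six numbers match, it is only marginally more likely than "exactly five".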


The main purpose of this project was to practice applying probability and combinatorics (permutations and combinations) concepts in a setting that simulates a real-world scenario. 

We have now written functions that
- calculate factorials and combinations, which the other functions rely on
- calculate the probability of winning the big prize with one ticket
- calculate the probability of winning the big prize with multiple tickets
- check whether the user's numbers have already won in the historical data (1982 to 2018)
- calculate the probability of matching exactly two, three, four, or five winning numbers

All the functions present their results in a way that shows lottery addicts how unlikely a win is.