# Mobile App for Lottery Addiction

## Goal
**Many people become addicted to playing the lottery, which can put a heavy burden on their savings, often because they do not have a clear picture of their chances of winning. This project computes the probability of winning the lottery, which may help discourage addiction to some extent. More specifically, it focuses on the 6/49 lottery and covers the following winning scenarios:**

- One-ticket probability
- Multi-ticket probability
- Fewer winning numbers

## Background of the 6/49 Lottery
In the 6/49 lottery, six numbers are drawn without replacement from a set of 49 numbers ranging from 1 to 49. A player wins the big prize only if the six numbers on their ticket match all six numbers drawn. For example, a player holding a ticket with the numbers {13, 22, 24, 27, 42, 44} wins the big prize only if the numbers drawn are {13, 22, 24, 27, 42, 44}. If even one number differs, they do not win the big prize.

## Dataset
This dataset, which covers drawings from 1982 to 2018, was made public on <a href='https://www.kaggle.com/datascienceai/lottery-dataset'>Kaggle</a>, where it was downloaded.

In [1]:
import pandas as pd
import numpy as np
import random

**Construct functions**

Functions for:
- Factorial
- Permutation
- Combination
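
For reference, the three helpers correspond to the standard counting formulas below, where $n$ is the number of items available and $k$ the number of items selected:

```latex
n! = \prod_{i=1}^{n} i, \qquad
P(n, k) = \frac{n!}{(n-k)!}, \qquad
C(n, k) = \binom{n}{k} = \frac{n!}{k!\,(n-k)!}
```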

In [2]:
# Factorial: n! = n * (n-1) * ... * 1, with 0! = 1
def factorial(n):
    fact = 1
    for i in range(n, 0, -1):
        fact *= i
    return fact

# Permutation: ordered selections of k items from n, n! / (n-k)!
def permutation(n, k):
    numerator = factorial(n)
    denominator = factorial(n - k)
    # The quotient is always a whole number, so integer division keeps it exact
    return numerator // denominator

# Combination: unordered selections of k items from n, n! / (k! * (n-k)!)
def combinations(n, k):
    numerator = factorial(n)
    denominator = factorial(k) * factorial(n - k)
    return numerator // denominator
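
As a possible cross-check, the hand-written helpers can be compared against the standard library. The sketch below is not part of the original notebook; it assumes Python 3.8 or later (where `math.perm` and `math.comb` are available), and the variable names are chosen only for illustration.

```python
import math

# Standard-library counterparts of the helpers (Python 3.8+)
orderings_of_six = math.factorial(6)   # 6! = 720
ordered_draws = math.perm(49, 6)       # ordered selections of 6 numbers from 49
unordered_draws = math.comb(49, 6)     # unordered selections of 6 numbers from 49

# Every unordered draw can be arranged in 6! different orders
assert ordered_draws == unordered_draws * orderings_of_six

print(unordered_draws)  # 13983816
```

The value 13,983,816 is the number of possible 6/49 outcomes that the rest of the analysis relies on.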

## One-Ticket Probability

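For each drawing, six numbers are drawn from a set of 49, and a player wins the big prize if the six numbers on their ticket match all six numbers drawn. Exactly one of the C(49, 6) = 13,983,816 equally likely combinations wins, so a single ticket wins with probability 1/13,983,816.

Function for calculating the winning probability: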

In [3]:
def one_ticket_probability(user_number):
    
    n_combinations= combinations(49,6)
    prob_single_ticket=1/n_combinations
    percentage_single=prob_single_ticket*100
    
    print('''Your chances to win the big prize with the numbers {} are {:.7f}%.'''.format(user_number,
                    percentage_single))
    

Test the function

In [4]:
#Generate 6 unique numbers
user_numbers_list=random.sample(range(1, 50), 6)
user_numbers_list

[14, 1, 48, 37, 26, 10]

In [5]:
#Pass the list to the function
one_ticket_probability(user_numbers_list)

Your chances to win the big prize with the numbers [14, 1, 48, 37, 26, 10] are 0.0000072%.
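
A percentage this small can be hard to interpret, so it may help to also state the result as "1 in N" odds. The following is a minimal sketch, not the notebook's own function; `one_ticket_odds`, `pool_size`, and `picks` are hypothetical names introduced here.

```python
import math

def one_ticket_odds(pool_size=49, picks=6):
    """Return N such that a single ticket wins with probability 1/N."""
    return math.comb(pool_size, picks)

odds = one_ticket_odds()
print("1 in {:,}".format(odds))       # 1 in 13,983,816
print("{:.7f}%".format(100 / odds))   # 0.0000072%
```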


**Import the data**

In [6]:
df=pd.read_csv('649.csv')
df.head()

Unnamed: 0,PRODUCT,DRAW NUMBER,SEQUENCE NUMBER,DRAW DATE,NUMBER DRAWN 1,NUMBER DRAWN 2,NUMBER DRAWN 3,NUMBER DRAWN 4,NUMBER DRAWN 5,NUMBER DRAWN 6,BONUS NUMBER
0,649,1,0,6/12/1982,3,11,12,14,41,43,13
1,649,2,0,6/19/1982,8,33,36,37,39,41,9
2,649,3,0,6/26/1982,1,6,23,24,27,39,34
3,649,4,0,7/3/1982,3,9,10,13,20,43,34
4,649,5,0,7/10/1982,5,14,21,31,34,47,45


In [7]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3665 entries, 0 to 3664
Data columns (total 11 columns):
PRODUCT            3665 non-null int64
DRAW NUMBER        3665 non-null int64
SEQUENCE NUMBER    3665 non-null int64
DRAW DATE          3665 non-null object
NUMBER DRAWN 1     3665 non-null int64
NUMBER DRAWN 2     3665 non-null int64
NUMBER DRAWN 3     3665 non-null int64
NUMBER DRAWN 4     3665 non-null int64
NUMBER DRAWN 5     3665 non-null int64
NUMBER DRAWN 6     3665 non-null int64
BONUS NUMBER       3665 non-null int64
dtypes: int64(10), object(1)
memory usage: 315.0+ KB


In [8]:
#Check null values
df.isnull().sum()

PRODUCT            0
DRAW NUMBER        0
SEQUENCE NUMBER    0
DRAW DATE          0
NUMBER DRAWN 1     0
NUMBER DRAWN 2     0
NUMBER DRAWN 3     0
NUMBER DRAWN 4     0
NUMBER DRAWN 5     0
NUMBER DRAWN 6     0
BONUS NUMBER       0
dtype: int64

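Write a function that extracts the six winning numbers from each row of the dataset as a set.

In [9]:
def extract_number(row):
    # Positions 4 through 9 hold NUMBER DRAWN 1 to NUMBER DRAWN 6
    numbers = row.iloc[4:10]
    return set(numbers.values)

# axis=1 applies the function to each row (one drawing per row)
winning_numbers = df.apply(extract_number, axis=1)
winning_numbers.head()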

0    {3, 41, 11, 12, 43, 14}
1    {33, 36, 37, 39, 8, 41}
2     {1, 6, 39, 23, 24, 27}
3     {3, 9, 10, 43, 13, 20}
4    {34, 5, 14, 47, 21, 31}
dtype: object

**Historical Data Check**

Write a function that compares the user's numbers with the historical sets of winning numbers, then prints how many times the combination has occurred and the probability of winning with it in the next drawing.

In [12]:
def check_historical_occurrence(user_numbers, historical_numbers):

    # Convert the user's numbers to a set so that their order does not matter
    user_numbers = set(user_numbers)

    # Compare the user's set with each historical set, then count the matches
    occurrence = user_numbers == historical_numbers
    total_occurrence = occurrence.sum()

    if total_occurrence == 0:
        print("Your combination {} has never occurred. The chance to win in the next drawing using the combination {} is 0.0000072%.".format(
        user_numbers, user_numbers))

    else:
        print("Your combination {} has occurred {} time(s). The chance to win in the next drawing using the combination {} is 0.0000072%.".format(
        user_numbers, total_occurrence, user_numbers
        ))

In [13]:
#Use the first set of historical data to test the function
test = [3, 41, 11, 12, 43, 14]
check_historical_occurrence(test, winning_numbers)

Your combination {3, 41, 11, 12, 43, 14} has occurred 1 time(s). The chance to win in the next drawing using the combination {3, 41, 11, 12, 43, 14} is 0.0000072%.


**Random Test & Conclusion**

In [15]:
#Use the randomly generated numbers to test
check_historical_occurrence(user_numbers_list, winning_numbers)

Your combination {1, 37, 10, 14, 48, 26} has never occurred. The chance to win in the next drawing using the combination {1, 37, 10, 14, 48, 26} is 0.0000072%.
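
The comparison above depends only on set equality, so the same idea can be sketched without pandas. The example below is an illustration under stated assumptions rather than the notebook's method: `historical_draws` is a hypothetical stand-in holding just the first three drawings from the dataset preview, and `count_occurrences` is a name introduced here.

```python
from collections import Counter

# Hypothetical stand-in: the first three drawings shown in the dataset preview
historical_draws = [
    [3, 11, 12, 14, 41, 43],
    [8, 33, 36, 37, 39, 41],
    [1, 6, 23, 24, 27, 39],
]

# frozenset ignores order and is hashable, so it can be used as a Counter key
draw_counts = Counter(frozenset(draw) for draw in historical_draws)

def count_occurrences(user_numbers):
    """Return how many historical drawings match the user's numbers exactly."""
    return draw_counts[frozenset(user_numbers)]

print(count_occurrences([3, 41, 11, 12, 43, 14]))  # 1
print(count_occurrences([14, 1, 48, 37, 26, 10]))  # 0
```

Whether or not a combination has appeared before, its chance in the next drawing is the same, because each drawing is independent of the previous ones.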


## Multi-ticket Probability

Suppose users want to improve their odds by purchasing multiple tickets instead of betting on a single one. They would also need to know the exact probability of winning. Assuming every ticket carries a different combination, n tickets cover n of the 13,983,816 possible outcomes.

Function for multi-ticket probability calculation

In [20]:
def multi_ticket_probability(tickets):

    # Each ticket is assumed to carry a different combination
    n_combinations = combinations(49, 6)
    prob_multi_ticket = tickets / n_combinations
    percentage_multi = prob_multi_ticket * 100

    print('''Your chances to win the big prize with {} ticket(s) are {:.7f}%.'''.format(tickets,
                    percentage_multi))

**Conclusion**

In [23]:
#Use a loop to run multiple tests
n_tickets=[1,3,10,200,500,1000,5000,10000,1000000, 6991908, 13983816]

for n in n_tickets:
    multi_ticket_probability(n)
    print('--------------------')

Your chances to win the big prize with 1 ticket(s) are 0.0000072%.
--------------------
Your chances to win the big prize with 3 ticket(s) are 0.0000215%.
--------------------
Your chances to win the big prize with 10 ticket(s) are 0.0000715%.
--------------------
Your chances to win the big prize with 200 ticket(s) are 0.0014302%.
--------------------
Your chances to win the big prize with 500 ticket(s) are 0.0035756%.
--------------------
Your chances to win the big prize with 1000 ticket(s) are 0.0071511%.
--------------------
Your chances to win the big prize with 5000 ticket(s) are 0.0357556%.
--------------------
Your chances to win the big prize with 10000 ticket(s) are 0.0715112%.
--------------------
Your chances to win the big prize with 1000000 ticket(s) are 7.1511238%.
--------------------
Your chances to win the big prize with 6991908 ticket(s) are 50.0000000%.
--------------------
Your chances to win the big prize with 13983816 ticket(s) are 100.0000000%.
--------------------
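
The results can also be read in reverse: how many distinct tickets would be needed to reach a given chance of winning? The sketch below is one way to answer that, assuming every ticket carries a different combination; `tickets_needed` and `TOTAL_OUTCOMES` are hypothetical names introduced here.

```python
import math
from fractions import Fraction

TOTAL_OUTCOMES = math.comb(49, 6)  # 13,983,816 possible draws

def tickets_needed(target_probability):
    """Smallest number of distinct tickets whose win chance reaches the target."""
    # Going through str() turns 0.01 into exactly 1/100 and avoids float rounding
    target = Fraction(str(target_probability))
    if not 0 < target <= 1:
        raise ValueError("target_probability must be in the interval (0, 1]")
    return math.ceil(target * TOTAL_OUTCOMES)

print(tickets_needed(0.01))  # 139839
print(tickets_needed(0.5))   # 6991908
print(tickets_needed(1))     # 13983816
```

Even a 1% chance of the big prize requires close to 140,000 different tickets.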

## Fewer winning numbers

Most 6/49 lotteries also award smaller prizes when a player's ticket matches two, three, four, or five of the six numbers drawn. Users may therefore want to know the probability of matching exactly two, three, four, or five winning numbers.
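
The counting argument can be written as a formula, with $k$ (a symbol introduced here) denoting the number of matching numbers. A draw matches exactly $k$ ticket numbers when $k$ of the 6 ticket numbers are drawn and the other $6-k$ drawn numbers come from the 43 numbers not on the ticket:

```latex
P(\text{exactly } k \text{ matches}) = \frac{\binom{6}{k}\,\binom{43}{6-k}}{\binom{49}{6}}, \qquad k = 2, 3, 4, 5
```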

Function for finding the probability of matching fewer winning numbers (from 2 to 5)

In [40]:
def less_winning_probabilities(n_winning):
    # Number of ways to choose which n_winning of the 6 ticket numbers are drawn
    n_combinations_tickets = combinations(6, n_winning)

    # Number of ways to draw the remaining numbers from the 43 not on the ticket
    n_combinations_remaining = combinations(43, 6 - n_winning)

    # Total successful outcomes
    successful_outcomes = n_combinations_tickets * n_combinations_remaining

    # Total possible outcomes
    total_outcomes = combinations(49, 6)

    less_winning_probability = successful_outcomes / total_outcomes
    percentage = less_winning_probability * 100

    print('''Your chances of matching exactly {} winning numbers are {:.6f}%.'''.format(n_winning,
                    percentage))


**Conclusion**

In [41]:
for n in [2,3,4,5]:
    less_winning_probabilities(n)
    print('---------------------------')

Your chances of matching exactly 2 winning numbers are 13.237803%.
---------------------------
Your chances of matching exactly 3 winning numbers are 1.765040%.
---------------------------
Your chances of matching exactly 4 winning numbers are 0.096862%.
---------------------------
Your chances of matching exactly 5 winning numbers are 0.001845%.
---------------------------
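
One way to sanity-check these figures is to compute the probability of every possible match count, from 0 to 6, and confirm that they add up to exactly 1. The sketch below uses the standard library rather than the notebook's helpers; `match_probability` and `TOTAL` are hypothetical names introduced here.

```python
import math
from fractions import Fraction

TOTAL = math.comb(49, 6)  # all possible draws

def match_probability(k):
    """Exact probability that a ticket matches exactly k of the 6 drawn numbers."""
    return Fraction(math.comb(6, k) * math.comb(43, 6 - k), TOTAL)

# Every draw matches some number of ticket numbers between 0 and 6
distribution = {k: match_probability(k) for k in range(7)}
assert sum(distribution.values()) == 1

for k, p in distribution.items():
    print("{} matches: {:.6f}%".format(k, float(p) * 100))
```

Using `Fraction` keeps the arithmetic exact, so the sum can be compared with 1 without any floating-point tolerance.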
