# UK Lotto Results Analysis

Author: Matthew Carter

## Introduction

The UK Lotto has been run by The National Lottery since 1994 and is now drawn twice a week. Players choose six numbers and aim to match the six main balls in the draw to win the jackpot. Seven balls are drawn in total: the six main balls plus a bonus ball. Smaller prizes are won by matching two or more main balls.
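
The matching rule can be sketched as a set intersection. This is a minimal illustration, not code from the project: `count_main_matches` and `wins_prize` are hypothetical helpers, the ticket is made up, the draw is the 10th October 2015 result shown later in this notebook, and the bonus ball is ignored.

```python
def count_main_matches(ticket, main_balls):
    """Return how many of the player's six numbers appear among the six main balls."""
    return len(set(ticket) & set(main_balls))


def wins_prize(ticket, main_balls):
    """Simplified rule: any prize needs two or more matched main balls (bonus ball ignored)."""
    return count_main_matches(ticket, main_balls) >= 2


draw = [2, 3, 16, 32, 53, 54]      # main balls from the 2015-10-10 draw
ticket = [2, 3, 10, 20, 30, 40]    # hypothetical ticket
print(count_main_matches(ticket, draw))  # 2
print(wins_prize(ticket, draw))          # True
```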

In this mini analysis I look at the draws since the new Lotto format was introduced on the 8th October 2015, which increased the pool of numbered balls from 49 to 59.
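
A quick way to see what the larger pool means for the jackpot is to count the possible sets of six main balls, since the order in which balls are drawn does not matter. This sketch assumes a single line of six numbers per ticket and uses Python's `math.comb` (Python 3.8+).

```python
from math import comb

# Number of distinct unordered sets of six balls from the old and new pools.
old_jackpot_combinations = comb(49, 6)
new_jackpot_combinations = comb(59, 6)

print(f"{old_jackpot_combinations:,}")  # 13,983,816
print(f"{new_jackpot_combinations:,}")  # 45,057,474

# How many times more combinations the 59-ball format has.
print(round(new_jackpot_combinations / old_jackpot_combinations, 2))  # 3.22
```

So one ticket's chance of matching all six main balls fell from 1 in 13,983,816 to 1 in 45,057,474.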

## Importing the dataset

Lotto results for this project have been collected using my __[uk_lotto_scraper.py](https://github.com/MatthewCarterIO/uk-lotto-DA/blob/master/uk_lotto_scraper.py)__ file.

In [1]:
```python
# Common Python packages that will be used throughout the project.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
```

In [2]:
```python
# Import the draw results.
results_df = pd.read_csv("lotto_results.csv")
results_df.head()
```

| | draw_date | main_balls | bonus_ball |
|---|---|---|---|
| 0 | 2015-12-30 | [22, 31, 47, 52, 55, 59] | 23 |
| 1 | 2015-12-26 | [17, 21, 31, 38, 44, 58] | 20 |
| 2 | 2015-12-23 | [1, 2, 4, 19, 28, 41] | 32 |
| 3 | 2015-12-19 | [13, 14, 27, 46, 48, 50] | 42 |
| 4 | 2015-12-16 | [10, 12, 35, 46, 48, 57] | 14 |


In [3]:
```python
# Sort the DataFrame into ascending date order and make the draw_date column the index.
# draw_date holds ISO-format strings (YYYY-MM-DD), so a plain string sort is chronological.
results_df.sort_values(by=["draw_date"], inplace=True)
results_df.set_index("draw_date", inplace=True)
results_df.head()
```

| draw_date | main_balls | bonus_ball |
|---|---|---|
| 2015-01-03 | [10, 15, 17, 18, 19, 31] | 24 |
| 2015-01-07 | [4, 5, 10, 12, 20, 23] | 24 |
| 2015-01-10 | [13, 14, 16, 24, 41, 43] | 39 |
| 2015-01-14 | [17, 20, 28, 29, 33, 36] | 23 |
| 2015-01-17 | [29, 37, 42, 46, 47, 49] | 4 |
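
Sorting `draw_date` as text only gives chronological order because the dates are stored as zero-padded `YYYY-MM-DD` strings. The sketch below contrasts that with hypothetical day-first strings (not from the dataset) and shows `pd.to_datetime` as a format-independent alternative; it is an illustration, not part of the original workflow.

```python
import pandas as pd

# ISO 8601 strings sort lexicographically in chronological order.
iso = pd.Series(["2015-12-30", "2015-01-07", "2015-01-03"])
print(iso.sort_values().tolist())
# ['2015-01-03', '2015-01-07', '2015-12-30']

# Hypothetical day-first strings do not: the day digits are compared first,
# so a 2016 date lands ahead of two 2015 dates.
day_first = pd.Series(["30/12/2015", "07/01/2015", "03/01/2016"])
print(day_first.sort_values().tolist())
# ['03/01/2016', '07/01/2015', '30/12/2015']

# Parsing to datetimes makes the ordering independent of the text format.
parsed = pd.to_datetime(day_first, format="%d/%m/%Y")
print(parsed.sort_values().dt.strftime("%Y-%m-%d").tolist())
# ['2015-01-07', '2015-12-30', '2016-01-03']
```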


The draws of interest in this analysis are those held under the new Lotto format, from the 8th October 2015 to the present (the end of January 2020 at the time of writing).

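In [4]:
```python
# Keep the draws from 2015-10-08 to 2020-01-31 (label slices include both endpoints).
# .copy() returns an independent DataFrame, so assigning to a column later
# does not trigger pandas' SettingWithCopyWarning.
results_df = results_df.loc["2015-10-08":"2020-01-31"].copy()
results_df
```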

| draw_date | main_balls | bonus_ball |
|---|---|---|
| 2015-10-10 | [2, 3, 16, 32, 53, 54] | 8 |
| 2015-10-14 | [7, 13, 20, 27, 39, 52] | 35 |
| 2015-10-17 | [8, 30, 37, 40, 46, 50] | 36 |
| 2015-10-21 | [13, 14, 21, 25, 51, 53] | 39 |
| 2015-10-24 | [29, 31, 43, 55, 58, 59] | 11 |
| ... | ... | ... |
| 2020-01-15 | [15, 25, 50, 52, 54, 55] | 28 |
| 2020-01-18 | [14, 29, 35, 41, 42, 55] | 11 |
| 2020-01-22 | [19, 23, 25, 29, 30, 36] | 14 |
| 2020-01-25 | [18, 30, 33, 39, 54, 55] | 21 |
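
The date filter relies on two properties of `.loc` label slicing on a sorted index: both endpoints are included, and the bounds need not be dates that were actually drawn (2015-10-08 and 2020-01-31 are not in the index). The sketch below shows this on a `DatetimeIndex`, which is an assumption on my part since the notebook keeps the dates as strings; the four rows are copied from the tables above with only the bonus ball kept.

```python
import pandas as pd

# Four draws from the tables above (bonus balls only, for brevity).
draws = pd.DataFrame(
    {"bonus_ball": [24, 8, 35, 21]},
    index=pd.to_datetime(["2015-01-03", "2015-10-10", "2015-10-14", "2020-01-25"]),
)
draws.index.name = "draw_date"

# Bounds that are not draw dates still work on a sorted index.
new_format = draws.loc["2015-10-08":"2020-01-31"]
print(new_format.index.strftime("%Y-%m-%d").tolist())
# ['2015-10-10', '2015-10-14', '2020-01-25']

# Both endpoints are inclusive: slicing between two draw dates keeps both draws.
print(len(draws.loc["2015-10-10":"2015-10-14"]))  # 2
```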


Saving the DataFrame to a CSV file in __[uk_lotto_scraper.py](https://github.com/MatthewCarterIO/uk-lotto-DA/blob/master/uk_lotto_scraper.py)__ caused the main_balls column, which holds a list of the main balls for each draw, to be stored as strings instead of lists of integers.

In [5]:
```python
print(type(results_df.loc["2015-10-10", "main_balls"]))
```

```
<class 'str'>
```
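
The string form comes from CSV having no list type: when a cell holds a Python list, `to_csv` writes its text representation, and `read_csv` reads that text back as a plain string. The round trip below reproduces this in memory with one draw copied from the tables above; it assumes the scraper saves with `DataFrame.to_csv`, which the source implies but does not show.

```python
import io

import pandas as pd

# One draw with a genuine list in the main_balls column.
original = pd.DataFrame(
    {"draw_date": ["2015-10-10"], "main_balls": [[2, 3, 16, 32, 53, 54]], "bonus_ball": [8]}
)

# Write to an in-memory CSV; the list is stored as its quoted text form.
buffer = io.StringIO()
original.to_csv(buffer, index=False)
print(buffer.getvalue())
# draw_date,main_balls,bonus_ball
# 2015-10-10,"[2, 3, 16, 32, 53, 54]",8

# Reading it back gives a string, not a list.
buffer.seek(0)
round_trip = pd.read_csv(buffer)
print(type(round_trip.loc[0, "main_balls"]))  # <class 'str'>
```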


In [6]:
```python
from ast import literal_eval

# Convert a string such as "[2, 3, 16, 32, 53, 54]" back into a list of integers.
# literal_eval only evaluates Python literals, so it is safer than eval().
def string_to_int_list(string_list):
    return literal_eval(string_list)

# Apply the function to each row of the main_balls column.
results_df["main_balls"] = results_df["main_balls"].apply(string_to_int_list)
print(type(results_df.loc["2015-10-10", "main_balls"]))
```

```
<class 'list'>
```
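
An alternative, sketched here rather than taken from the project, is to parse the lists while the file is read, using the `converters` argument of `pd.read_csv`. The two CSV rows are copied from the tables above, and `is_valid_draw` is a hypothetical helper that checks each parsed draw has six distinct integers between 1 and 59.

```python
import io
from ast import literal_eval

import pandas as pd

# Two rows in the same layout as lotto_results.csv (values from the tables above).
csv_text = """draw_date,main_balls,bonus_ball
2015-10-10,"[2, 3, 16, 32, 53, 54]",8
2015-10-14,"[7, 13, 20, 27, 39, 52]",35
"""

# converters applies literal_eval to every main_balls cell during parsing.
df = pd.read_csv(
    io.StringIO(csv_text),
    converters={"main_balls": literal_eval},
    index_col="draw_date",
)


def is_valid_draw(balls, highest=59):
    """Hypothetical check: six distinct integers from 1 to highest."""
    return (
        len(balls) == 6
        and len(set(balls)) == 6
        and all(isinstance(b, int) and 1 <= b <= highest for b in balls)
    )


print(type(df.loc["2015-10-10", "main_balls"]))      # <class 'list'>
print(bool(df["main_balls"].apply(is_valid_draw).all()))  # True
```

Converting at read time keeps the fix in one place, so every later cell sees lists without a separate `apply` step.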
