# Table of Contents <a name='home'/>
<ol>
    <li><a href=#imports>Imports</a></li>
    <li><a href=#data>Data</a></li>
    <li><a href=#eda_main>EDA</a></li>
    <ol>
        <li><a href=#eda_one>Visual 1</a></li>
    </ol>
    <li><a href=#histogram>Histogram</a></li>
    <li><a href=#chebyshev>Chebyshev's Theorem</a></li>
</ol> 

### Imports <a name='imports'/>
<a href=#home>Back to Top</a><br>
<a href=#data>Next Section</a>

In [1]:
import math
import pandas as pd
import numpy as np
import plotly.graph_objects as go
import plotly.express as px
import plotly.figure_factory as ff

from statistics import stdev
from ipywidgets import widgets
from config import filename_all, filename_default, day_type, month_type, month_list

### Data <a name='data'/>
<a href=#home>Back to Top</a><br>
<a href=#imports>Previous Section</a><br>
<a href=#eda_main>Next Section</a>

In [2]:
df = pd.read_excel(filename_all)

In [3]:
df = df.rename(columns = {'Winning Numbers': 'Draw'})

In [4]:
df['Year'] = df['Date'].dt.strftime('%Y')

In [5]:
df['Month'] = df['Date'].dt.strftime('%B')

In [6]:
df['Month_num'] = df['Date'].dt.strftime('%m').str.lstrip("0")

In [7]:
df['Day'] = df['Date'].dt.strftime('%d')

In [8]:
df['Day'] = df['Day'].str.lstrip("0")

In [9]:
df.head()

| Unnamed: 0 | Date | Draw | first | second | third | fourth | fifth | sixth | Day_Name | Odd_Even | Odd_Even_Dist | Year | Month | Month_num | Day |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2023-09-14 | 30406132735 | 3 | 4 | 6 | 13 | 27 | 35 | Thursday | odd-even-even-odd-odd-odd | Even: 2, Odd: 4 | 2023 | September | 9 | 14 |
| 1 | 2023-09-12 | 30514172633 | 3 | 5 | 14 | 17 | 26 | 33 | Tuesday | odd-odd-even-odd-even-odd | Even: 2, Odd: 4 | 2023 | September | 9 | 12 |
| 2 | 2023-09-09 | 30911193541 | 3 | 9 | 11 | 19 | 35 | 41 | Saturday | odd-odd-odd-odd-odd-odd | Even: 0, Odd: 6 | 2023 | September | 9 | 9 |
| 3 | 2023-09-05 | 132125273240 | 13 | 21 | 25 | 27 | 32 | 40 | Tuesday | odd-odd-odd-odd-even-even | Even: 2, Odd: 4 | 2023 | September | 9 | 5 |
| 4 | 2023-09-02 | 112223293234 | 11 | 22 | 23 | 29 | 32 | 34 | Saturday | odd-even-odd-odd-even-even | Even: 3, Odd: 3 | 2023 | September | 9 | 2 |
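
The date-derived columns built in cells In [4] through In [8] could also be produced with the `.dt` accessor attributes instead of `strftime` plus `lstrip`. The sketch below is only an illustration on a small stand-in frame (`sample` is a hypothetical name, and the three dates are copied from the output above). Note that this variant yields integers for `Year`, `Month_num`, and `Day`, whereas the cells above produce strings, so it is not a drop-in replacement if later code expects strings.

```python
import pandas as pd

# Stand-in data: three draw dates copied from the df.head() output above
sample = pd.DataFrame(
    {"Date": pd.to_datetime(["2023-09-14", "2023-09-12", "2023-09-09"])}
)

# Integer year, e.g. 2023 (strftime('%Y') would give the string '2023')
sample["Year"] = sample["Date"].dt.year
# Full month name, e.g. 'September'
sample["Month"] = sample["Date"].dt.month_name()
# Integer month and day, so no leading zero needs to be stripped
sample["Month_num"] = sample["Date"].dt.month
sample["Day"] = sample["Date"].dt.day

print(sample)
```
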


### Exploratory Data Analysis <a name='eda_main'/>
<a href=#home>Back to Top</a><br>
<a href=#data>Previous Section</a><br>
<a href=#eda_one>Next Section</a>

#### EDA Total Records <a name='eda_one'/>
<a href=#home>Back to Top</a><br>
<a href=#eda_main>Previous Section</a><br>
<a href=#eda_two>Next Section</a>

#### EDA Column-Wise <a name='eda_one'/>
<a href=#home>Back to Top</a><br>
<a href=#eda_main>Previous Section</a><br>
<a href=#eda_two>Next Section</a>

In [11]:
%run ./Visual_Functions/eda_columns.ipynb
visual_eda_col(df)

VBox(children=(HBox(children=(Checkbox(value=True, description='Date: '), Dropdown(description='Month:', optio…

#### EDA Row-Wise <a name='eda_two'/>
<a href=#home>Back to Top</a><br>
<a href=#eda_one>Previous Section</a><br>
<a href=#histogram>Next Section</a>

In [50]:
fig = go.Figure(data=[go.Table(
    header=dict(values=['Date', 'first', 'second', 'third', 'fourth', 'fifth', 'sixth', 'Average'],
                fill_color='paleturquoise',
                align='left'),
    cells=dict(values=[df['Date'].dt.date, df['first'], df['second'], df['third'], df['fourth'], df['fifth'], df['sixth'], 
                       (df['first'] + df['second'] + df['third'] + df['fourth'] + df['fifth'] + df['sixth'])/6],
               fill_color='lavender',
               align='left'))
])

fig.show()

In [34]:
list(df[['Date', 'first', 'second', 'third', 'fourth', 'fifth', 'sixth']])

['Date', 'first', 'second', 'third', 'fourth', 'fifth', 'sixth']

### Histogram <a name='histogram'/>
<a href=#home>Back to Top</a><br>
<a href=#eda_two>Previous Section</a><br>
<a href=#chebyshev>Next Section</a>

In [24]:
%run ./Visual_Functions/histogram.ipynb
histogram(df)

### Chebyshev's Theorem (Range of Values) 
<p>Chebyshev's Theorem states that at least 75% of a data set falls within 2 standard deviations of the mean, regardless of the shape of its distribution.</p>
<p>Likewise, the theorem guarantees that at least 88.89% of the data falls within 3 standard deviations of the mean.</p>
<p>The minimum percentage of data can be found using the formula:</p>
$$
1-\frac{1}{k^2}, \qquad \text{e.g. for } k = 2: \quad 1-\frac{1}{2^2} = 1-\frac{1}{4} = \frac{3}{4} = 75\%
$$
<p>where <i>k</i> is the number of standard deviations from the mean (<i>k</i> &gt; 1). The computation for <i>k</i> = 2 gives the 75% stated in the first sentence.</p><br>
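
As a minimal sketch of the formula above, the helper below evaluates the bound for a given <i>k</i>. The name `chebyshev_bound` is hypothetical and not part of this notebook's helper modules. It assumes <i>k</i> &gt; 1, since the bound says nothing useful otherwise.

```python
def chebyshev_bound(k: float) -> float:
    """Return the minimum fraction of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is only informative for k > 1")
    return 1 - 1 / k**2


# Reproduce the two percentages quoted in the text
for k in (2, 3):
    print(f"k = {k}: at least {chebyshev_bound(k):.2%} of the data")
```
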
<p>To find the actual range of values covered by that percentage, add and subtract <i>k</i> standard deviations from the mean.</p><br>
<p>In this example, the mean is 70, the standard deviation is 1.5, and <i>k</i> = 2.</p><br>
$$
\begin{aligned}
70 + 2(1.5) &= 70 + 3 = 73\\
70 - 2(1.5) &= 70 - 3 = 67
\end{aligned}
$$
<p>This means that at least 75% of the data (everything within 2 standard deviations of the mean) falls between 67 and 73.</p><br>
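
The range calculation can be sketched in a few lines, assuming a mean of 70, a standard deviation of 1.5, and <i>k</i> = 2, which are the values the calculation above uses. The function name `chebyshev_interval` is hypothetical. The second half applies the same idea to six numbers copied from the second row of the `df.head()` output, purely as an illustration of checking the bound against real values.

```python
from statistics import mean, stdev


def chebyshev_interval(center: float, spread: float, k: float) -> tuple:
    """Return (low, high) = center minus/plus k times spread."""
    return center - k * spread, center + k * spread


# Worked example from the text: mean 70, standard deviation 1.5, k = 2
low, high = chebyshev_interval(70, 1.5, 2)
print(low, high)

# Illustration only: the six numbers of one draw, copied from the table above
values = [3, 5, 14, 17, 26, 33]
v_low, v_high = chebyshev_interval(mean(values), stdev(values), 2)

# Fraction of the values that fall inside the k = 2 interval
share = sum(v_low <= v <= v_high for v in values) / len(values)
print(f"{share:.0%} of the values lie within 2 standard deviations")
```

The theorem only promises a lower bound, so the observed share can be (and usually is) higher than 75%.
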
<a name='chebyshev'/>
<a href=#home>Back to Top</a><br>
<a href=#histogram>Previous Section</a>