## Lagan Baseline Data: Exploratory Analysis

This notebook provides some initial insights into the Lagan baseline data to help you decide whether the data
will be useful to you. Because it is a notebook, it can be re-run against the most up-to-date data.

To run the notebook, press Run All in the toolbar.

In [19]:
# SET UP FOR THE NOTEBOOK TO WORK
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

# Import the data
# Daily totals with precomputed 'day' and '7-day Avg' columns
df1 = pd.read_csv('data/year_a.csv')
df2 = pd.read_csv('data/year_b.csv')
# Daily counts per case type; parse CREATED_DT as datetimes
df3 = pd.read_csv('data/class_covid.csv', parse_dates=['CREATED_DT'])
df4 = pd.read_csv('data/class_pp.csv', parse_dates=['CREATED_DT'])

### Overall Cases Submitted
The chart below shows a significant drop in the number of cases raised in Lagan in 2020 compared with 2019.
This is not solely the result of COVID-related influences.

The first 30-40 days of 2019 have a much higher number of cases than the same period in 2020. Potential factors,
which require additional analysis, are the provision of Garden Waste via GOSS and winter weather.

In [17]:
fig = go.Figure()
fig.add_trace(go.Scatter(x=df1['day'],
                         y=df1['7-day Avg'],
                         mode='lines',
                         name='2019'))
fig.add_trace(go.Scatter(x=df2['day'],
                         y=df2['7-day Avg'],
                         mode='lines',
                         name='2020'))

fig.update_layout(title='Number of cases - 2019 and 2020 Comparison (7-day Rolling Avg)',
                  xaxis_title='Day: 01 January to 31 August',
                  yaxis_title='No. of cases submitted')

fig.show()
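
The `day` and `7-day Avg` columns plotted above come precomputed in the CSV files. As a minimal sketch of how such columns could be derived from one row per case, assuming a `CREATED_DT` timestamp column (the helper name `rolling_daily_cases` and the `cases` column are hypothetical, and the original preprocessing may differ):

```python
import pandas as pd

def rolling_daily_cases(cases: pd.DataFrame, window: int = 7) -> pd.DataFrame:
    """Count cases per calendar day and add a rolling mean (hypothetical helper)."""
    created = pd.to_datetime(cases['CREATED_DT'])
    # Count cases per calendar day, ordered by date
    daily = created.dt.floor('D').value_counts().sort_index()
    # Fill in days with no cases so the rolling window covers calendar days
    full_range = pd.date_range(daily.index.min(), daily.index.max(), freq='D')
    daily = daily.reindex(full_range, fill_value=0)
    out = pd.DataFrame({
        'day': daily.index.dayofyear.to_numpy(),
        'cases': daily.to_numpy(),
    })
    # The first (window - 1) rows are NaN because the window is not yet full
    out[f'{window}-day Avg'] = out['cases'].rolling(window).mean()
    return out
```

With the default `window=7`, the result has a `7-day Avg` column that could be plotted against `day` in the same way as the chart above.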

### Top Case Types

This section highlights the differences in the cases requested during the two periods.

To simplify the visualisation, classifications are included only where a total of 350 or more cases have been
reported.
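
The 350-case threshold appears to have been applied before the CSV files were written. As a rough sketch of how such a filter could look, assuming a wide table with one `CREATED_DT` column and one count column per case type (the helper name `keep_frequent_case_types` is hypothetical):

```python
import pandas as pd

def keep_frequent_case_types(wide: pd.DataFrame,
                             min_total: int = 350,
                             date_col: str = 'CREATED_DT') -> pd.DataFrame:
    """Keep only case-type columns whose total count reaches min_total."""
    # Sum each case-type column over all days
    totals = wide.drop(columns=[date_col]).sum()
    # Names of the case types that meet the threshold
    frequent = totals[totals >= min_total].index
    return wide[[date_col, *frequent]]
```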

In [31]:
# Prepare the data for visualisation
# Melt dataset
cases_a = pd.melt(df3, id_vars='CREATED_DT', var_name='CaseType', value_name='CountCase')
cases_b = pd.melt(df4, id_vars='CREATED_DT', var_name='CaseType', value_name='CountCase')
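
To illustrate what the melt step does, here is a small sketch on toy data (the case type names and counts are made up for illustration). It reshapes a wide table, with one column per case type, into a long table with one row per day and case type, which is the shape `px.line` needs for its `color` argument:

```python
import pandas as pd

# Toy wide table: one row per day, one column per case type (hypothetical values)
wide = pd.DataFrame({
    'CREATED_DT': ['2020-03-01', '2020-03-02'],
    'Type A': [3, 5],
    'Type B': [1, 0],
})

# Long format: each (day, case type) pair becomes its own row
long_df = pd.melt(wide, id_vars='CREATED_DT', var_name='CaseType', value_name='CountCase')
print(long_df)
```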

The chart below shows the case types being reported by day from 1st March 2020 to 31st August 2020.

In [33]:
# Case types by day, 2020 (cases_a)
fig3 = px.line(cases_a, x='CREATED_DT', y='CountCase', color='CaseType')
fig3.update_layout(title='Case Types Submitted - 2020',
                   xaxis_title='March 1 2020 to 29 August 2020',
                   yaxis_title='No. Cases Submitted')
fig3.show()

In comparison, the chart below shows the 2019 case types reported by day from 1st March 2019 to 30 August 2019.

In [41]:
fig4 = px.line(cases_b, x='CREATED_DT', y='CountCase', color='CaseType')
fig4.update_layout(title='Case Types Submitted - 2019',
                   xaxis_title='March 1 2019 to 30 August 2019',
                   yaxis_title='No. Cases Submitted')
fig4.show()
