Project Stage - V (Dashboard)
Goals

The final stage aims at developing a simple interactive dashboard based on the analysis you have done so far. We will use Plotly (https://plotly.com/) together with Dash (https://plotly.com/dash/) as our framework.

Refer here for Plotly: https://github.com/q-tong/CS405-605-Data-Science/tree/main/Fall2023/Lecture/5.Visualization/Visualization

Getting started with Dash: https://www.youtube.com/watch?v=hSPmj7mK6ng

PS: Dash can be run from Jupyter; see: https://medium.com/plotly/introducing-jupyterdash-811f1f57c02e

Tasks for Stage V (team):
Task 1: (70 pts)

- Main graph
    - Allow for selection of date to show the trend of COVID-19 cases and deaths. (30)
    - Allow for linear or log mode selection on the number of cases and deaths. (10)
    - Incorporate your best model prediction trend line - Linear / Non-Linear. (30)
    - Ex: https://ourworldindata.org/coronavirus
    

Task 2: (30 pts)

- Trend
    - Plot the trend line using moving average (https://en.wikipedia.org/wiki/Moving_average). Use 7-day moving average. (15)
    - Allow for selection of multiple states on the same graph. (15)
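
As a minimal sketch of the 7-day moving average asked for in Task 2, assuming the data is a pandas Series of cumulative counts indexed by date (the numbers below are made up for illustration), daily new counts can be derived with `diff()` and then smoothed with `rolling(window=7).mean()`:

```python
import pandas as pd

# Hypothetical cumulative case counts for ten consecutive days (illustrative values only)
dates = pd.date_range("2021-01-01", periods=10)
cumulative = pd.Series([0, 10, 30, 60, 100, 150, 210, 280, 360, 450], index=dates)

# Daily new cases are the day-over-day difference of the cumulative series
daily_new = cumulative.diff().fillna(0)

# Trailing 7-day moving average; the first six entries are NaN
# because the window is not yet complete
moving_avg = daily_new.rolling(window=7).mean()
```

Whether to smooth the cumulative series directly or the daily differences is a design choice; the differences usually make changes in the trend easier to see.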

Deliverable

    Take screenshots of the report and upload them to Canvas.
    Each member creates a separate notebook for member tasks. Upload all notebooks to the GitHub repository.

Deadline: 04/19/2024
Final Presentation: April 22, 2024 & April 24, 2024

The final presentation is on April 22 and April 24. Each group will have a slot of up to 20 minutes to showcase the whole project process (with a summary of each stage) and to share the results and reports achieved as a team, as well as individual contributions. Create your presentation in Microsoft PowerPoint. Within your team, you can either nominate a single presenter or have team members take turns presenting.

Please note that it is crucial for all team members to be present during the presentation. The instructor and audience might ask questions that require input from any team member, so your collective presence is valuable. We look forward to your insightful presentations!

Presentation order (Presentations 1 and 2 are on Monday, April 22; Presentation 3 is on Wednesday, April 24):


In [1]:
# Import necessary libraries
import pandas as pd
import numpy as np
import plotly.graph_objs as go
import dash
from dash import dcc, html
from dash.dependencies import Input, Output

In [2]:
import pandas as pd

# Load datasets
cases_df = pd.read_csv('covid_confirmed_usafacts.csv')
deaths_df = pd.read_csv('covid_deaths_usafacts.csv')

In [3]:
cases_df

Unnamed: 0,countyFIPS,County Name,State,StateFIPS,2020-01-22,2020-01-23,2020-01-24,2020-01-25,2020-01-26,2020-01-27,...,2023-07-14,2023-07-15,2023-07-16,2023-07-17,2023-07-18,2023-07-19,2023-07-20,2023-07-21,2023-07-22,2023-07-23
0,0,Statewide Unallocated,AL,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,1001,Autauga County,AL,1,0,0,0,0,0,0,...,19913,19913,19913,19913,19913,19913,19913,19913,19913,19913
2,1003,Baldwin County,AL,1,0,0,0,0,0,0,...,70521,70521,70521,70521,70521,70521,70521,70521,70521,70521
3,1005,Barbour County,AL,1,0,0,0,0,0,0,...,7582,7582,7582,7582,7582,7582,7582,7582,7582,7582
4,1007,Bibb County,AL,1,0,0,0,0,0,0,...,8149,8149,8149,8149,8149,8149,8149,8149,8149,8149
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3188,56037,Sweetwater County,WY,56,0,0,0,0,0,0,...,12645,12645,12645,12645,12645,12645,12645,12645,12645,12645
3189,56039,Teton County,WY,56,0,0,0,0,0,0,...,12206,12206,12206,12206,12206,12206,12206,12206,12206,12206
3190,56041,Uinta County,WY,56,0,0,0,0,0,0,...,6468,6468,6468,6468,6468,6468,6468,6468,6468,6468
3191,56043,Washakie County,WY,56,0,0,0,0,0,0,...,2640,2640,2640,2640,2640,2640,2640,2640,2640,2640


In [4]:
deaths_df

Unnamed: 0,countyFIPS,County Name,State,StateFIPS,2020-01-22,2020-01-23,2020-01-24,2020-01-25,2020-01-26,2020-01-27,...,2023-07-14,2023-07-15,2023-07-16,2023-07-17,2023-07-18,2023-07-19,2023-07-20,2023-07-21,2023-07-22,2023-07-23
0,0,Statewide Unallocated,AL,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,1001,Autauga County,AL,1,0,0,0,0,0,0,...,235,235,235,235,235,235,235,235,235,235
2,1003,Baldwin County,AL,1,0,0,0,0,0,0,...,731,731,731,731,731,731,731,731,731,731
3,1005,Barbour County,AL,1,0,0,0,0,0,0,...,104,104,104,104,104,104,104,104,104,104
4,1007,Bibb County,AL,1,0,0,0,0,0,0,...,111,111,111,111,111,111,111,111,111,111
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3188,56037,Sweetwater County,WY,56,0,0,0,0,0,0,...,142,142,142,142,142,142,142,142,142,142
3189,56039,Teton County,WY,56,0,0,0,0,0,0,...,16,16,16,16,16,16,16,16,16,16
3190,56041,Uinta County,WY,56,0,0,0,0,0,0,...,43,43,43,43,43,43,43,43,43,43
3191,56043,Washakie County,WY,56,0,0,0,0,0,0,...,51,51,51,51,51,51,51,51,51,51


## Data Preprocessing

The following steps preprocess the COVID-19 cases and deaths datasets:

1. Drop unnecessary columns from the cases dataframe (`countyFIPS`, `County Name`, `StateFIPS`).
2. Transpose the cases dataframe and set the index to the state names.
3. Drop unnecessary columns from the deaths dataframe (`countyFIPS`, `County Name`, `StateFIPS`).
4. Transpose the deaths dataframe and set the index to the state names.
5. Group the columns of the cases dataframe by state and sum the values.
6. Group the columns of the deaths dataframe by state and sum the values.
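
To make the effect of these steps concrete, here is a minimal sketch on a toy table in the same wide layout. The values are made up for illustration, and this variation groups by state before transposing, which should give the same state-by-date result as transposing first:

```python
import pandas as pd

# Toy county-level table in the same wide layout (values are made up for illustration)
toy = pd.DataFrame({
    "countyFIPS": [1001, 1003, 56037],
    "County Name": ["Autauga County", "Baldwin County", "Sweetwater County"],
    "State": ["AL", "AL", "WY"],
    "StateFIPS": [1, 1, 56],
    "2020-01-22": [0, 1, 2],
    "2020-01-23": [3, 4, 5],
})

# Drop identifier columns, sum the counties within each state,
# then transpose so that dates are rows and states are columns
by_state = (
    toy.drop(columns=["countyFIPS", "County Name", "StateFIPS"])
       .groupby("State")
       .sum()
       .T
       .rename_axis(None, axis=1)
)
```

Here the two Alabama counties collapse into a single `AL` column, and each row holds the state totals for one date.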


In [5]:
# Drop unnecessary columns from the cases dataframe
cases_df = cases_df.drop(['countyFIPS', 'County Name', 'StateFIPS'], axis=1)

# Transpose the cases dataframe and set the index to the state names
cases_df = cases_df.set_index('State').T

# Drop unnecessary columns from the deaths dataframe
deaths_df = deaths_df.drop(['countyFIPS', 'County Name', 'StateFIPS'], axis=1)

# Transpose the deaths dataframe and set the index to the state names
deaths_df = deaths_df.set_index('State').T

# Sum the county columns that belong to each state
# (grouping on the transposed frame avoids groupby(axis=1), which newer pandas releases deprecate)
cases_df = cases_df.T.groupby(level=0).sum().T

# Same aggregation for the deaths dataframe
deaths_df = deaths_df.T.groupby(level=0).sum().T

# Remove the leftover 'State' label from the column axis
cases_df = cases_df.rename_axis(None, axis=1)
deaths_df = deaths_df.rename_axis(None, axis=1)

In [6]:
cases_df

Unnamed: 0,AK,AL,AR,AZ,CA,CO,CT,DC,DE,FL,...,SD,TN,TX,UT,VA,VT,WA,WI,WV,WY
2020-01-22,0,0,0,0,722,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0
2020-01-23,0,0,0,0,733,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0
2020-01-24,0,0,0,0,739,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0
2020-01-25,0,0,0,0,749,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0
2020-01-26,0,0,0,1,756,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2023-07-19,287319,1659936,977662,2486671,11300486,1769981,982973,169149,334373,7627999,...,283342,2364399,8508204,1099978,2323255,151477,1968539,2036778,652667,187389
2023-07-20,287319,1659936,977662,2486671,11300486,1769981,982973,169149,334466,7627999,...,283342,2364399,8508204,1099978,2323255,151477,1969833,2036872,652772,187389
2023-07-21,287319,1659936,977662,2486671,11300486,1769981,982973,169149,334466,7627999,...,283342,2364399,8508204,1099978,2323255,151477,1969833,2036872,652772,187389
2023-07-22,287319,1659936,977662,2486671,11300486,1769981,982973,169149,334466,7627999,...,283342,2364399,8508204,1099978,2323255,151477,1969833,2036872,652772,187389


In [7]:
print(cases_df.index)

Index(['2020-01-22', '2020-01-23', '2020-01-24', '2020-01-25', '2020-01-26',
       '2020-01-27', '2020-01-28', '2020-01-29', '2020-01-30', '2020-01-31',
       ...
       '2023-07-14', '2023-07-15', '2023-07-16', '2023-07-17', '2023-07-18',
       '2023-07-19', '2023-07-20', '2023-07-21', '2023-07-22', '2023-07-23'],
      dtype='object', length=1265)


In [8]:
deaths_df

Unnamed: 0,AK,AL,AR,AZ,CA,CO,CT,DC,DE,FL,...,SD,TN,TX,UT,VA,VT,WA,WI,WV,WY
2020-01-22,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0
2020-01-23,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0
2020-01-24,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0
2020-01-25,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0
2020-01-26,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2023-07-19,1457,21138,13062,29852,102356,14522,11034,1392,3440,89075,...,3245,28113,92378,5397,23769,910,15957,16723,8163,2039
2023-07-20,1457,21138,13062,29852,102356,14522,11034,1392,3440,89075,...,3245,28113,92378,5397,23769,910,15972,16723,8163,2039
2023-07-21,1457,21138,13062,29852,102356,14522,11034,1392,3440,89075,...,3245,28113,92378,5397,23769,910,15972,16723,8163,2039
2023-07-22,1457,21138,13062,29852,102356,14522,11034,1392,3440,89075,...,3245,28113,92378,5397,23769,910,15972,16723,8163,2039


In [9]:
print(deaths_df.index)

Index(['2020-01-22', '2020-01-23', '2020-01-24', '2020-01-25', '2020-01-26',
       '2020-01-27', '2020-01-28', '2020-01-29', '2020-01-30', '2020-01-31',
       ...
       '2023-07-14', '2023-07-15', '2023-07-16', '2023-07-17', '2023-07-18',
       '2023-07-19', '2023-07-20', '2023-07-21', '2023-07-22', '2023-07-23'],
      dtype='object', length=1265)


In [10]:
print(cases_df.columns)

Index(['AK', 'AL', 'AR', 'AZ', 'CA', 'CO', 'CT', 'DC', 'DE', 'FL', 'GA', 'HI',
       'IA', 'ID', 'IL', 'IN', 'KS', 'KY', 'LA', 'MA', 'MD', 'ME', 'MI', 'MN',
       'MO', 'MS', 'MT', 'NC', 'ND', 'NE', 'NH', 'NJ', 'NM', 'NV', 'NY', 'OH',
       'OK', 'OR', 'PA', 'RI', 'SC', 'SD', 'TN', 'TX', 'UT', 'VA', 'VT', 'WA',
       'WI', 'WV', 'WY'],
      dtype='object')


In [11]:
print(deaths_df.columns)

Index(['AK', 'AL', 'AR', 'AZ', 'CA', 'CO', 'CT', 'DC', 'DE', 'FL', 'GA', 'HI',
       'IA', 'ID', 'IL', 'IN', 'KS', 'KY', 'LA', 'MA', 'MD', 'ME', 'MI', 'MN',
       'MO', 'MS', 'MT', 'NC', 'ND', 'NE', 'NH', 'NJ', 'NM', 'NV', 'NY', 'OH',
       'OK', 'OR', 'PA', 'RI', 'SC', 'SD', 'TN', 'TX', 'UT', 'VA', 'VT', 'WA',
       'WI', 'WV', 'WY'],
      dtype='object')


### Design of The Dashboard Layout:

The COVID-19 Dashboard layout consists of several interactive components:

- **Start Date:** Select the start date for the data analysis period.
- **End Date:** Select the end date for the data analysis period.
- **Mode Selector:** Choose between Linear and Log scale for the y-axis of the charts.
- **Performance Options for Cases:** Select the performance options for the COVID-19 cases chart, including displaying actual values, trendline, and 7-Day Moving Avg.
- **State Selector for Cases:** Select one or more states to display COVID-19 cases data for.
- **Cases Graph:** Displays the COVID-19 cases data based on the selected options.
- **Performance Options for Deaths:** Select the performance options for the COVID-19 deaths chart, including displaying actual values, trendline, and 7-Day Moving Avg.
- **State Selector for Deaths:** Select one or more states to display COVID-19 deaths data for.
- **Deaths Graph:** Displays the COVID-19 deaths data based on the selected options.
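
Because the dataframes above are indexed by `YYYY-MM-DD` strings, the values coming from the two date pickers need to be in the same format before they are used for slicing. The sketch below is one way to do that. `normalize_date_range` is a hypothetical helper name, not part of Dash, and it assumes the pickers may deliver date strings with or without a time component:

```python
import pandas as pd

def normalize_date_range(start_date, end_date):
    """Return (start, end) as 'YYYY-MM-DD' strings with start <= end.

    Hypothetical helper: accepts date strings with or without a time
    component and swaps the two if they arrive in reverse order.
    """
    start = pd.to_datetime(start_date)
    end = pd.to_datetime(end_date)
    if start > end:
        start, end = end, start
    return start.strftime("%Y-%m-%d"), end.strftime("%Y-%m-%d")

# Example: a reversed range with a time component on one side
start, end = normalize_date_range("2023-07-23T00:00:00", "2020-01-22")
```

Swapping a reversed range is only one possible policy; a dashboard could instead show a message asking the user to correct the dates.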


In [12]:
# Create Dash app
app = dash.Dash(__name__)

# Define app layout
app.layout = html.Div([
    # Title of the dashboard
    html.H1("COVID-19 Dashboard"),
    
    # Date picker for selecting the start date
    html.Label("Start Date:"),
    dcc.DatePickerSingle(
        id='start-date-picker',
        min_date_allowed=pd.to_datetime('2020-01-01'),
        max_date_allowed=pd.to_datetime('2023-07-23'),
        initial_visible_month=pd.to_datetime('2020-01-01'),
        date=pd.to_datetime('2020-01-01')
    ),
    
    # Date picker for selecting the end date
    html.Label("End Date:"),
    dcc.DatePickerSingle(
        id='end-date-picker',
        min_date_allowed=pd.to_datetime('2020-01-01'),
        max_date_allowed=pd.to_datetime('2023-07-23'),
        initial_visible_month=pd.to_datetime('2023-07-23'),
        date=pd.to_datetime('2023-07-23')
    ),
    
    # Radio items for selecting the mode (Linear or Log)
    html.Label("Select Mode:"),
    dcc.RadioItems(
        id='mode-selector',
        options=[
            {'label': 'Linear', 'value': 'linear'},
            {'label': 'Log', 'value': 'log'}
        ],
        value='linear',
        labelStyle={'display': 'inline-block'}
    ),
    
    # Checklist for selecting performance options for cases data
    html.Label("Performance Options for Cases:"),
    dcc.Checklist(
        id='performance-options-cases',
        options=[
            {'label': 'Show Actual Values', 'value': 'actual'},
            {'label': 'Show Trendline', 'value': 'trendline'},
            {'label': 'Show 7-Day Moving Avg', 'value': 'moving-avg'}
        ],
        value=['actual', 'trendline', 'moving-avg'],
        labelStyle={'display': 'inline-block'}
    ),
    
    # Dropdown for selecting states related to cases data
    dcc.Dropdown(
        id='state-selector-cases',
        options=[
            {'label': state, 'value': state} for state in cases_df.columns
        ],
        multi=True,
        value=list(cases_df.columns[:5]),  # Default selection: first five states as a plain list
    ),
    
    # Graph component for displaying cases data
    dcc.Graph(id='cases-graph'),
    
    # Checklist for selecting performance options for deaths data
    html.Label("Performance Options for Deaths:"),
    dcc.Checklist(
        id='performance-options-deaths',
        options=[
            {'label': 'Show Actual Values', 'value': 'actual'},
            {'label': 'Show Trendline', 'value': 'trendline'},
            {'label': 'Show 7-Day Moving Avg', 'value': 'moving-avg'}
        ],
        value=['actual', 'trendline', 'moving-avg'],
        labelStyle={'display': 'inline-block'}
    ),
    
    # Dropdown for selecting states related to deaths data
    dcc.Dropdown(
        id='state-selector-deaths',
        options=[
            {'label': state, 'value': state} for state in deaths_df.columns
        ],
        multi=True,
        value=list(deaths_df.columns[:5]),  # Default selection: first five states as a plain list
    ),
    
    # Graph component for displaying deaths data
    dcc.Graph(id='deaths-graph')
])


### Callback for Updating Cases and Deaths Graph
#### 1. Callback for Updating Cases Graph:

This callback function updates the COVID-19 cases graph based on user input. It takes the following inputs:
- **start_date:** Start date selected by the user.
- **end_date:** End date selected by the user.
- **mode:** Selected mode for the y-axis (Linear or Log).
- **performance_options:** Selected performance options for the cases graph, including Actual Values, Trendline, and 7-Day Moving Avg.
- **selected_states:** States selected by the user to display data for.

The function filters the data based on the selected dates and states, then creates traces for the cases graph. For each selected state, it checks the performance options and adds the corresponding traces: Actual Values, Trendline (a second-degree polynomial fit with a one-week prediction), and 7-Day Moving Avg. Finally, it returns the figure data and layout for the cases graph.
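
The trendline and prediction step can be sketched in isolation as follows. This is a minimal illustration rather than the callback itself: `fit_and_forecast` is a hypothetical helper, and it assumes the observations are equally spaced in time (one per day):

```python
import numpy as np

def fit_and_forecast(values, degree=2, horizon=7):
    """Fit a polynomial to equally spaced observations and extend it.

    Hypothetical helper: returns the fitted values over the observed
    range and the forecast for the following `horizon` positions.
    """
    values = np.asarray(values, dtype=float)
    x = np.arange(len(values))
    coeffs = np.polyfit(x, values, degree)
    fitted = np.polyval(coeffs, x)
    # Positions immediately after the last observation
    future_x = np.arange(len(values), len(values) + horizon)
    forecast = np.polyval(coeffs, future_x)
    return fitted, forecast

# Illustrative data that follows y = x**2 exactly, so a quadratic fit recovers it
observed = [x**2 for x in range(10)]
fitted, forecast = fit_and_forecast(observed)
```

The last element of `forecast` corresponds to the date seven days after the final observation, which is the point a one-week prediction line would end at.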

#### 2. Callback for Updating Deaths Graph:

Similar to the cases graph callback, this function updates the COVID-19 deaths graph based on user input. It also takes inputs such as start_date, end_date, mode, performance_options, and selected_states. The function filters the deaths data, creates traces for the deaths graph, adds traces for Actual Values, Trendline (with one-week prediction), and 7-Day Moving Avg based on the selected performance options, and returns the figure data and layout for the deaths graph.
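
Since the two callbacks differ only in the dataframe and the labels they use, the shared trace-building logic could be factored into one function. The sketch below is an assumption about how that might look, not the notebook's code: `build_traces` is a hypothetical helper, it returns plain dictionaries describing scatter traces, and the trendline branch is omitted for brevity:

```python
import pandas as pd

def build_traces(data, options):
    """Build trace dictionaries for every column of `data`.

    Hypothetical helper: `data` is a date-indexed DataFrame with one column
    per state, and `options` is the list of checked checklist values.
    """
    traces = []
    for state in data.columns:
        if "actual" in options:
            traces.append({"type": "scatter", "mode": "lines", "name": state,
                           "x": list(data.index), "y": data[state].tolist()})
        if "moving-avg" in options:
            # Trailing 7-day mean; the first six values are NaN
            smoothed = data[state].rolling(window=7).mean()
            traces.append({"type": "scatter", "mode": "lines",
                           "name": state + " 7-Day Moving Avg",
                           "x": list(data.index), "y": smoothed.tolist()})
    return traces
```

Each callback would then reduce to filtering its dataframe, calling the helper, and attaching a layout with the appropriate title.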


In [13]:
import numpy as np

# Define callback to update cases graph based on user input
@app.callback(
    Output('cases-graph', 'figure'),
    [Input('start-date-picker', 'date'),
     Input('end-date-picker', 'date'),
     Input('mode-selector', 'value'),
     Input('performance-options-cases', 'value'),
     Input('state-selector-cases', 'value')]
)
def update_cases_graph(start_date, end_date, mode, performance_options, selected_states):
    # Filter data based on selected dates and states
    cases_data = cases_df.loc[start_date:end_date, selected_states]
    
    # Create traces for cases graph
    cases_traces = []
    for state in selected_states:
        if 'actual' in performance_options:
            cases_traces.append(go.Scatter(x=cases_data.index, y=cases_data[state], mode='lines', name=state))
        if 'trendline' in performance_options:
            cases_polyfit = np.polyfit(np.arange(len(cases_data[state])), cases_data[state], 2)
            cases_trendline = np.polyval(cases_polyfit, np.arange(len(cases_data[state])))
            cases_traces.append(go.Scatter(x=cases_data.index, y=cases_trendline, mode='lines', name=state + ' Trendline'))
            # Calculate one-week prediction
            last_date = cases_data.index[-1]
            one_week_ahead = pd.date_range(start=last_date, periods=8)[-1]  # Date seven days after the last observation
            prediction = np.polyval(cases_polyfit, len(cases_data[state]) - 1 + 7)  # Fitted value seven days ahead
            prediction_trace = go.Scatter(x=[last_date, one_week_ahead], y=[cases_data[state].iloc[-1], prediction],
                                          mode='lines', line=dict(dash='dash'), name=state + ' Prediction')
            cases_traces.append(prediction_trace)
        if 'moving-avg' in performance_options:
            cases_moving_avg = cases_data[state].rolling(window=7).mean()
            cases_traces.append(go.Scatter(x=cases_data.index, y=cases_moving_avg, mode='lines', name=state + ' 7-Day Moving Avg'))
    
    # Create layout for cases graph
    cases_layout = go.Layout(title='COVID-19 Cases', xaxis={'title': 'Date'}, yaxis={'title': 'Cases'}, showlegend=True)
    
    # Apply log scale if selected
    if mode == 'log':
        cases_layout.yaxis.type = 'log'
    
    # Return figure
    return {'data': cases_traces, 'layout': cases_layout}

# Define callback to update deaths graph based on user input
@app.callback(
    Output('deaths-graph', 'figure'),
    [Input('start-date-picker', 'date'),
     Input('end-date-picker', 'date'),
     Input('mode-selector', 'value'),
     Input('performance-options-deaths', 'value'),
     Input('state-selector-deaths', 'value')]
)
def update_deaths_graph(start_date, end_date, mode, performance_options, selected_states):
    # Filter data based on selected dates and states
    deaths_data = deaths_df.loc[start_date:end_date, selected_states]
    
    # Create traces for deaths graph
    deaths_traces = []
    for state in selected_states:
        if 'actual' in performance_options:
            deaths_traces.append(go.Scatter(x=deaths_data.index, y=deaths_data[state], mode='lines', name=state))
        if 'trendline' in performance_options:
            deaths_polyfit = np.polyfit(np.arange(len(deaths_data[state])), deaths_data[state], 2)
            deaths_trendline = np.polyval(deaths_polyfit, np.arange(len(deaths_data[state])))
            deaths_traces.append(go.Scatter(x=deaths_data.index, y=deaths_trendline, mode='lines', name=state + ' Trendline'))
            # Calculate one-week prediction
            last_date = deaths_data.index[-1]
            one_week_ahead = pd.date_range(start=last_date, periods=8)[-1]  # Date seven days after the last observation
            prediction = np.polyval(deaths_polyfit, len(deaths_data[state]) - 1 + 7)  # Fitted value seven days ahead
            prediction_trace = go.Scatter(x=[last_date, one_week_ahead], y=[deaths_data[state].iloc[-1], prediction],
                                          mode='lines', line=dict(dash='dash'), name=state + ' Prediction')
            deaths_traces.append(prediction_trace)
        if 'moving-avg' in performance_options:
            deaths_moving_avg = deaths_data[state].rolling(window=7).mean()
            deaths_traces.append(go.Scatter(x=deaths_data.index, y=deaths_moving_avg, mode='lines', name=state + ' 7-Day Moving Avg'))
    
    # Create layout for deaths graph
    deaths_layout = go.Layout(title='COVID-19 Deaths', xaxis={'title': 'Date'}, yaxis={'title': 'Deaths'}, showlegend=True)
    
    # Apply log scale if selected
    if mode == 'log':
        deaths_layout.yaxis.type = 'log'
    
    # Return figure
    return {'data': deaths_traces, 'layout': deaths_layout}
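
Both callbacks switch the y-axis to a log scale when that mode is selected. Zero and negative values have no logarithm, so early dates with zero counts, and any negative values produced by the fitted polynomial, cannot be drawn in that mode. A minimal sketch of one way to handle this, assuming the y-values are a pandas Series (`positive_only` is a hypothetical helper name, not part of Plotly or Dash):

```python
import numpy as np
import pandas as pd

def positive_only(series):
    """Replace zero and negative values with NaN so they are skipped on a log axis."""
    return series.where(series > 0, np.nan)

# Illustrative values only
example = pd.Series([0, 1, 10, -5])
cleaned = positive_only(example)
```

Applying such a filter only when `mode == 'log'` would leave the linear view unchanged.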


In [14]:
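# Run the app
# Inside Jupyter the debug reloader can stop the kernel with SystemExit
# (as in the output recorded from the original run), so it is switched off here
if __name__ == '__main__':
    app.run_server(debug=True, use_reloader=False)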

Dash is running on http://127.0.0.1:8050/

 * Serving Flask app '__main__' (lazy loading)
 * Environment: production
   Use a production WSGI server instead.
 * Debug mode: on


SystemExit: 1

  warn("To exit: use 'exit', 'quit', or Ctrl-D.", stacklevel=1)
