# Final Project, Part 2

The purpose of this assignment is to create a 'Viz for Experts' with an interactive dashboard interface for exploring your data.

For this submission option, you will submit your work through this Workspace.
    
**Please see Homework Prompt in PrairieLearn interface for more details on the requirements for this assignment.**

A rough outline of elements of code and write-up is shown below:

**Outline**

**1. Data Preparation**

1.1. Import necessary libraries.

1.2. Read the dataset and perform any initial data filtering or cleaning.

**2. Data Visualization**

Six visualizations were built.

2.1. max_aqi (2): Calculate the average maximum AQI per state and visualize it using a bar chart and a heatmap. 

2.2. good_days (1): Calculate and visualize the average number of good air quality days per state with a bar chart.

2.3. unhealthy_days (1): Calculate and visualize the average number of unhealthy air quality days per state with a scatter plot and a year selector.

2.4. air_quality_map (1): Create an interactive plot that updates based on the year selected from a dropdown, showing different kinds of day counts.

2.5. aqi_trend (1): Plot the trend of AQI over the years for a selected state using a line plot with a trend line.
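
The trend line in 2.5 is an ordinary least-squares fit of average Max AQI against year. The notebook fits it with scikit-learn's `LinearRegression`; the sketch below swaps in `numpy.polyfit`, which gives the same straight-line fit without the extra dependency. The yearly values are made up for illustration and are not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Made-up yearly averages for one state; not real AQI readings.
yearly_aqi = pd.DataFrame({
    "Year": [2018, 2019, 2020, 2021],
    "Max AQI": [100.0, 110.0, 120.0, 130.0],
})

# Center the years so the fit is numerically well conditioned.
years_centered = yearly_aqi["Year"] - yearly_aqi["Year"].mean()

# Degree-1 polynomial fit: slope is the change in AQI per year,
# intercept is the fitted value at the mean year.
slope, intercept = np.polyfit(years_centered, yearly_aqi["Max AQI"], deg=1)

# Fitted values at each observed year, ready to plot as a dashed line.
trend_line = slope * years_centered + intercept
```

A positive `slope` would suggest the state's average Max AQI is rising over the period; with only six yearly points in the real data, the fit should be read as a rough indication rather than a forecast.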

**3. Building Dashboard and refresh button**

3.1. Organize the individual visualizations into a tabbed layout using ipywidgets.Tab, with each tab corresponding to a different visualization of the air quality data as shown above.

3.2. Build a refresh button to update all visualizations with new data if needed.
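
One possible way to structure the refresh in 3.2 is to keep the data in a single holder object and let each visualization register a redraw callback, so that a reload reaches every plot. This is only a sketch: `DataStore`, `register`, and `reload` are hypothetical names, not part of ipywidgets or of the notebook, and the CSV text is placeholder data.

```python
import io
import pandas as pd


class DataStore:
    """Holds the current DataFrame and re-runs registered callbacks on reload."""

    def __init__(self, source):
        # `source` is a callable returning a file path or a fresh file-like object.
        self._source = source
        self._callbacks = []
        self.df = pd.read_csv(self._source())

    def register(self, callback):
        """Add a function that will be called with the new DataFrame."""
        self._callbacks.append(callback)

    def reload(self):
        """Re-read the data and notify every registered callback."""
        self.df = pd.read_csv(self._source())
        for callback in self._callbacks:
            callback(self.df)


# Placeholder CSV contents standing in for an old and a new report.
csv_versions = [
    "State,Max AQI\nState A,90\n",
    "State,Max AQI\nState A,90\nState B,80\n",
]
current = {"index": 0}


def source():
    return io.StringIO(csv_versions[current["index"]])


store = DataStore(source)
row_counts = []
store.register(lambda df: row_counts.append(len(df)))

current["index"] = 1  # pretend a new report was released
store.reload()
```

In the dashboard, the button's click handler would call `store.reload()`, and each registered callback would redraw one tab from the DataFrame it receives.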

## Code:

 * An interactive dashboard within your Workspace that helps an expert explore your dataset thoroughly.
 * There should be a "dashboard" type aspect to this - i.e. a linked view exploring your dataset in an interactive way (like in Lab \#4) with [bqplot](https://bqplot.github.io/bqplot/).
 * Do not delete any cells, *just comment them out*. Show your work.



In [1]:
import pandas as pd
import numpy as np
import bqplot as bq
import ipywidgets as widgets
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap
from IPython.display import display



In [2]:
df = pd.read_csv("annual_aqi_by_county_2018-2023.csv")

In [3]:
def max_aqi():
    state_max_aqi = df.groupby('State')['Max AQI'].mean().reset_index()
    state_max_aqi.columns = ['State', 'Average Max AQI']

    colors = ['green', 'yellow', 'orange', 'red', 'purple', 'maroon']
    n_bins = [0, 50, 100, 150, 200, 300, 400]
    color_map_name = 'aqi_scale'
    aqi_cmap = LinearSegmentedColormap.from_list(color_map_name, colors, N=400)
    plt.figure(figsize=(25, 6))
    norm = plt.Normalize(min(state_max_aqi['Average Max AQI']), max(state_max_aqi['Average Max AQI']))
    plt.bar(state_max_aqi['State'], state_max_aqi['Average Max AQI'], color=aqi_cmap(norm(state_max_aqi['Average Max AQI'])))
    plt.xlabel('State')
    plt.ylabel('Average Max AQI')
    plt.title('Average Max AQI')
    plt.xticks(rotation=45)
    plt.show()

In [4]:
def good_days():
    state_avg = df.groupby('State')['Good Days'].mean().reset_index()
    state_avg.columns = ['State', 'Average Good Days']
    plt.figure(figsize=(25, 6))
    plt.bar(state_avg['State'], state_avg['Average Good Days'], color='skyblue')
    plt.xlabel('State')
    plt.ylabel('Average Good Days')
    plt.title('Average Good Days per State')
    plt.xticks(rotation=45)
    plt.show()

In [5]:
def update_plot(year):
    df_year = df[df['Year'] == year]

    state_avg = df_year.groupby('State')['Good Days'].mean().reset_index()
    state_avg.columns = ['State', 'Average Good Days']

    plt.figure(figsize=(25, 6))
    plt.bar(state_avg['State'], state_avg['Average Good Days'], color='skyblue')
    plt.xlabel('State')
    plt.ylabel('Average Good Days')
    plt.title(f'Average Good Days per State ({year})')
    plt.xticks(rotation=45)
    plt.show()

year_dropdown = widgets.Dropdown(options=df['Year'].unique(), description='Year:')
good_days_per_state=widgets.interactive(update_plot, year=year_dropdown)


In [6]:
def update_unhealthy_plot(year):
    # Average unhealthy-day count across each state's counties for the selected year.
    df_year = df[df['Year'] == year]

    state_avg = df_year.groupby('State')['Unhealthy Days'].mean().reset_index()
    state_avg.columns = ['State', 'Average Unhealthy Days']

    plt.figure(figsize=(15, 8))
    plt.scatter(state_avg['State'], state_avg['Average Unhealthy Days'], s=100)
    plt.xlabel('State')
    plt.ylabel('Average Unhealthy Days')
    plt.title(f'Average Unhealthy Days per State ({year})')
    plt.xticks(rotation=45)
    plt.grid(True)
    plt.show()

# A separate dropdown keeps this tab's year selection independent of the Good Days tab.
unhealthy_year_dropdown = widgets.Dropdown(options=df['Year'].unique(), description='Year:')
unhealthy_days_per_state = widgets.interactive(update_unhealthy_plot, year=unhealthy_year_dropdown)

In [7]:
state_ids = {'Alabama': '1', 'Alaska': '2', 'Arizona': '4', 'Arkansas': '5', 'California': '6', 'Colorado': '8', 'Connecticut': '9',
             'District Of Columbia': '10', 'Florida': '12', 'Georgia': '13', 'Hawaii': '15', 'Idaho': '16', 'Illinois': '17', 'Indiana': '18',
             'Iowa': '19', 'Kansas': '20', 'Kentucky': '21', 'Louisiana': '22', 'Maine': '23', 'Maryland': '24', 'Massachusetts': '25',
             'Michigan': '26', 'Minnesota': '27', 'Mississippi': '28', 'Missouri': '29', 'Montana': '30', 'Nebraska': '31', 'Nevada': '32',
             'New Hampshire': '33', 'New Jersey': '34', 'New Mexico': '35', 'New York': '36', 'North Carolina': '37', 'North Dakota': '38', 'Ohio': '39',
             'Oklahoma': '40', 'Oregon': '41', 'Pennsylvania': '42', 'Rhode Island': '44', 'South Carolina': '45', 'South Dakota': '46', 'Tennessee': '47',
             'Texas': '48', 'Utah': '49', 'Vermont': '50', 'Virginia': '51', 'Washington': '53', 'West Virginia': '54', 'Wisconsin': '55',
             'Wyoming': '56'}

# Filter out Country Of Mexico and Puerto Rico because the U.S. map loaded below does not include them.
df_filtered = df[(df['State'] != 'Country Of Mexico') & (df['State'] != 'Puerto Rico')]
color_scale = bq.ColorScale(scheme='BuPu')
color_axis = bq.ColorAxis(scale=color_scale, orientation='vertical', side='right')
initial_colors = {state_ids[state]: 0 for state in df_filtered['State'].unique()}
us_map = bq.Map(
    map_data=bq.topo_load('map_data/USStatesMap.json'),
    scales={'projection': bq.AlbersUSA(), 'color': color_scale},
    color=initial_colors
)

# Tooltip
tooltip = bq.Tooltip(fields=['name', 'color'], labels=['State', 'Average Days'])
us_map.tooltip = tooltip

fig = bq.Figure(marks=[us_map], axes=[color_axis], title='2018-2023 U.S. Air Quality Map')


def update_map(*args):
    year = year_selector.value
    # Filter the dataframe for the selected year and valid states
    df_filtered = df[(df['Year'] == year) & (df['State'] != 'Country Of Mexico') & (df['State'] != 'Puerto Rico')]

    column_name = selector.value  # Use the AQI type dropdown value directly
    new_colors = {}
    for state in state_ids.keys():
        state_data = df_filtered[df_filtered['State'] == state]
        # The good/moderate day count representing each state is calculated by the day count mean of all the counties in that state.
        avg_day_count = state_data[column_name].mean() if not state_data.empty else 0
        new_colors[state_ids[state]] = avg_day_count
    us_map.color = new_colors


selector = widgets.Dropdown(
    options=['Good Days', 'Moderate Days', 'Unhealthy for Sensitive Groups Days',
             'Unhealthy Days', 'Very Unhealthy Days', 'Hazardous Days'],
    value='Good Days',
    description='Select AQI Days:',
    disabled=False
)

year_selector = widgets.Dropdown(
    options=[('2018', 2018), ('2019', 2019), ('2020', 2020), ('2021', 2021), ('2022', 2022), ('2023', 2023)],
    value=2023,
    description='Select Year:',
    disabled=False
)

selector.observe(update_map, 'value')
year_selector.observe(update_map, 'value')
vbox = widgets.VBox([year_selector, selector, fig])
display(vbox)
update_map()
air_quality_map = widgets.VBox([year_selector, selector, fig])


In [8]:
import seaborn as sns
def heatmap():
    heatmap_data = df.groupby(['State', 'Year'])['Max AQI'].mean().unstack()

    plt.figure(figsize=(15, 10))
    sns.heatmap(heatmap_data, annot=True, cmap='Purples', fmt=".0f")
    plt.title('Average Max AQI per State Over Years')
    plt.xlabel('Year')
    plt.ylabel('State')
    plt.show()

In [9]:
# pandas, numpy, matplotlib, and seaborn are already imported in earlier cells.
from sklearn.linear_model import LinearRegression

def plot_aqi_trend(state):
    df = pd.read_csv("annual_aqi_by_county_2018-2023.csv")
    state_data = df[df['State'] == state]
    yearly_aqi = state_data.groupby('Year')['Max AQI'].mean().reset_index()

    plt.figure(figsize=(10, 5))
    plt.plot(yearly_aqi['Year'], yearly_aqi['Max AQI'], marker='o', label='Average Max AQI')
    model = LinearRegression()
    model.fit(yearly_aqi['Year'].values.reshape(-1, 1), yearly_aqi['Max AQI'].values.reshape(-1, 1))
    trend_line = model.predict(yearly_aqi['Year'].values.reshape(-1, 1))
    plt.plot(yearly_aqi['Year'], trend_line, label='Trend Line', linestyle='--', color='red')

    # Plot details
    plt.title(f'Time Series of AQI for {state}')
    plt.xlabel('Year')
    plt.ylabel('Average Max AQI')
    plt.legend()
    plt.grid(True)
    plt.show()

states = df['State'].unique()
state_selector = widgets.Dropdown(
    options=states,
    value=states[0],
    description='State:',
    disabled=False,
)

# Interactive widget that lets users pick which state to visualize
aqi_trend = widgets.interactive(plot_aqi_trend, state=state_selector)


In [10]:
tab = widgets.Tab()

tab.children = [
    widgets.interactive_output(max_aqi, {}),
    widgets.interactive_output(heatmap, {}),
    good_days_per_state, 
    unhealthy_days_per_state,
    air_quality_map,
    aqi_trend
]

tab.set_title(0, 'Average Max AQI per State')
tab.set_title(1, 'Heatmap: Average Max AQI per State')
tab.set_title(2, 'Average Good Days per State')
tab.set_title(3, 'Average Unhealthy Days per State')
tab.set_title(4, 'U.S. Air Quality Map')
tab.set_title(5, 'Time Series of AQI trend')

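def refresh_visualizations():
    # Change the file name when a new report is released.
    # Reassign the global df so every plotting function sees the reloaded data.
    global df
    df = pd.read_csv("annual_aqi_by_county_2018-2023.csv")
    tab.children = [
        widgets.interactive_output(max_aqi, {}),
        widgets.interactive_output(heatmap, {}),
        good_days_per_state,
        unhealthy_days_per_state,
        air_quality_map,
        aqi_trend
    ]
    # Re-run the interactive plots and the map with their current selections.
    for interactive_plot in (good_days_per_state, unhealthy_days_per_state, aqi_trend):
        interactive_plot.update()
    update_map()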

# Building a refresh button at the bottom
refresh_button = widgets.Button(description="Refresh Data")
refresh_button.on_click(lambda b: refresh_visualizations())
layout = widgets.VBox([tab, refresh_button])
display(layout)


## Prose:

* One paragraph explaining how to use the dashboard you created, to help someone who is not an expert understand your dataset.
* A list of 1 or more contextual datasets you have identified, links to where they reside, and a sentence about why they might be useful in telling the final story.
  * A "contextual dataset" here means a dataset that would add context to your chosen dataset. For example, if your dataset is the Champaign bus routes, some interesting contextual datasets could be the Chicago bus routes, the Springfield bus routes, or the Amtrak routes in Champaign.
  * you do not have to do anything with this dataset at the moment beyond writing a bit about why it would be useful. Looking forward, you will want to include "contextual visualizations" (which you may or may not generate on your own) in your Final Project, Part 3 and identifying a possibly useful dataset is a great way to start looking for contextual visualizations.
* If you have identified your dataset as a "large one" (i.e. larger than the GitHub file upload limit) comment on if you want to revise your plan for hosting this data or not. If this does not apply to your dataset please explicitly state this.
* Additionally, please note that as of writing, it is not possible to embed images within Starboard. Be sure to address how you plan on including your contextual dataset to add context to your main dataset given that you won't be able to directly embed images if you plan on using Starboard for Part 3.1 of the Final Project.


**One paragraph explaining how to use the dashboard you created, to help someone who is not an expert understand your dataset.**

This interactive dashboard shows air quality across U.S. states from 2018 to 2023. Each tab holds one visualization: the average maximum Air Quality Index (AQI) per state as a bar chart and as a state-by-year heatmap, the average number of good and unhealthy days per state, a U.S. map, and a per-state AQI trend. Dropdown menus filter what is shown. For instance, choosing a year in the "Average Good Days per State" tab shows the average number of good air quality days per state in that year, and choosing a state in the "Time Series of AQI trend" tab plots that state's average maximum AQI by year together with a dashed linear trend line. The map tab colors each state by the average number of days its counties spent in the selected AQI category (Good, Moderate, Unhealthy, and so on) in the selected year; hovering over a state shows the value. A "Refresh Data" button at the bottom of the dashboard reloads the data file and redraws the visualizations, so the dashboard can be kept current when a new report is released.

**A list of 1 or more contextual datasets you have identified, links to where they reside, and a sentence about why they might be useful in telling the final story**

I want to add one contextual dataset: the USGS Earthquake Dataset (https://earthquake.usgs.gov/earthquakes/map/?extent=22.67485,-125.94727&extent=51.34434,-64.07227). Earthquakes, like volcanoes, can release sulfur dioxide into the air as well as stir up dust. It would be interesting to see whether states that have more earthquakes also tend to have worse air quality than states where earthquakes are less frequent.
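
A minimal sketch of how that comparison could be set up, assuming the earthquake catalog has first been reduced to one count per state. The `quake_count` column name and all numbers below are placeholders for illustration, not values from either dataset.

```python
import pandas as pd

# Placeholder county-level AQI rows; not real measurements.
aqi = pd.DataFrame({
    "State": ["State A", "State A", "State B", "State C"],
    "Max AQI": [100, 140, 80, 60],
})

# Placeholder per-state earthquake counts (hypothetical column name).
quakes = pd.DataFrame({
    "State": ["State A", "State B", "State C"],
    "quake_count": [30, 10, 5],
})

# Average the county rows up to one value per state, as the dashboard does.
state_aqi = aqi.groupby("State", as_index=False)["Max AQI"].mean()

# Keep only states present in both tables.
merged = state_aqi.merge(quakes, on="State", how="inner")

# Pearson correlation between average Max AQI and earthquake count.
correlation = merged["Max AQI"].corr(merged["quake_count"])
```

A correlation computed this way would only show whether the two measures move together across states; it would not establish that earthquakes cause worse air quality.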

## Plot Summary

Summarize the characteristics of the dataset in words: what does it represent, what are the fields/columns/rows, what data types are they, etc.

The dataset "annual_aqi_by_county_2018-2023.csv" summarizes air quality for counties across the United States for the years 2018-2023. Each row is one county in one year and summarizes that year's daily Air Quality Index (AQI) values.

The columns and their data types include:

State (String): Name of the state.

County (String): Name of the county.

Year (Integer): Year that the row summarizes.

Good Days, Moderate Days, Unhealthy for Sensitive Groups Days, Unhealthy Days, Very Unhealthy Days, Hazardous Days (Integer): Number of days in the year falling into each AQI category.

Days CO, Days NO2, Days Ozone, Days PM2.5, Days PM10 (Integer): Number of days on which each pollutant was the main pollutant.

Max AQI, 90th percentile AQI, Median AQI (Integer): Summary statistics of the daily AQI over the year.
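
The types listed above can be checked after loading rather than taken on faith. The sketch below runs two such checks on a small stand-in DataFrame with placeholder values; in the notebook the same checks would be applied to `df`.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype

day_columns = [
    "Good Days", "Moderate Days", "Unhealthy for Sensitive Groups Days",
    "Unhealthy Days", "Very Unhealthy Days", "Hazardous Days",
]

# Stand-in rows with placeholder values; not taken from the dataset.
sample = pd.DataFrame(
    [[300, 50, 10, 4, 1, 0],
     [200, 100, 40, 20, 5, 1]],
    columns=day_columns,
)

# Check 1: every day-count column should hold integers.
non_integer = [col for col in day_columns if not is_integer_dtype(sample[col])]

# Check 2: the category counts for a county-year cannot exceed a (leap) year.
totals = sample[day_columns].sum(axis=1)
rows_over_a_year = sample[totals > 366]
```

An empty `non_integer` list and an empty `rows_over_a_year` frame would indicate the day-count columns are consistent with the description.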