## Writeup: Create a Tableau Story
by Dylan Rose

_______

Links to project files: 

[Initial Visualization](https://public.tableau.com/shared/8ZCZ8H9GN?:display_count=n&:origin=viz_share_link)

[Final Submission](https://public.tableau.com/views/AirQualityAnalysis1985-2020Final/Story?:language=en-US&:display_count=n&:origin=viz_share_link)

___

__Summary:__

> My data visualization is a Tableau story that explores the _US Environmental Protection Agency's_ Air Quality Index (__AQI__) from 1985 to 2020. My main findings were that air quality has, on average, _improved_ since the 1980s, and that this improvement coincided with a _reduction_ in national SO2 emissions.

> I have included several maps, charts, and other figures in my visualization to help convey my data story.

__Design:__

> I took an iterative design approach, making my sketches in Tableau itself so I could explore the data and experiment with different visualizations until I got a feel for how I wanted to display my findings.

> Once I started exploring the AQI data and saw the positive trend in air quality, I wanted to see how it broke down by county. My plan was to show a map of the country side by side with a line chart of the national average AQI by year.

> I tried the visualization by county, but a lot of data seemed to be missing. Looking into it, I discovered that the EPA only requires counties with populations above 35,000 to report, so there were a lot of blank spots on the map.

> I switched to averaging by state instead, but found that Wyoming and South Dakota were missing data for 1980-1984, so I filtered the data to include only 1985 onward. I then realized that 2021 had been reported only through March, so I omitted that year as well.

> The final omission I made was to filter out data from Mexico and US territories, confining my report to US states.
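These filtering and averaging steps were done in Tableau, but the logic can be sketched in plain Python for anyone reproducing it outside Tableau. This is a minimal sketch, assuming each daily county reading has already been parsed into a dict with the hypothetical keys `state`, `year`, and `aqi` (the actual column names in the Kaggle dataset may differ), and that `allowed_states` is a set of state names you supply. Averaging every reading within a state-year is analogous to an AVG aggregation on the AQI field; the sample rows are made-up values for illustration only.

```python
from collections import defaultdict
from statistics import mean


def state_year_averages(rows, allowed_states, first_year=1985, last_year=2020):
    """Average daily AQI per (state, year) after applying the writeup's filters.

    rows: iterable of dicts with hypothetical keys "state", "year", "aqi".
    allowed_states: set of state names to keep (drops Mexico and US territories).
    """
    readings = defaultdict(list)
    for row in rows:
        if row["state"] not in allowed_states:
            continue  # outside the chosen states, e.g. Mexico or a territory
        if not first_year <= row["year"] <= last_year:
            continue  # 1980-1984 (incomplete for some states) or partial 2021
        readings[(row["state"], row["year"])].append(row["aqi"])
    # One average per state and year, over all county readings in that state-year
    return {key: mean(values) for key, values in readings.items()}


# Made-up sample readings, not values from the dataset
rows = [
    {"state": "State A", "year": 1984, "aqi": 80},      # dropped: before 1985
    {"state": "State A", "year": 1985, "aqi": 60},
    {"state": "State A", "year": 1985, "aqi": 40},
    {"state": "Territory X", "year": 1990, "aqi": 30},  # dropped: not in allowed_states
    {"state": "State A", "year": 2021, "aqi": 20},      # dropped: partial year
]
result = state_year_averages(rows, allowed_states={"State A"})
print(result)
```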

At this point, I was left with this map for the first step in my visualization:

![fig1](fig1.jpg)

> Now that I had a map that showed the air quality for each state, I wanted to add a line chart that demonstrated the national average AQI over the years. I applied a filter on this chart so the user could look at the air quality in each state over time.

> Once I had the filter working, I added a brief description of my main findings and changed the map to a light color.

> The last thing I created for this first dashboard was a bar chart ranking the states by average AQI. The map already shows this data, but I feel the bar chart makes comparisons between states easier and shows at a glance which states had the best and the worst air quality. It also reinforces the change in air quality over time, as the bars become shorter and less orange.

![fig2](fig2.jpg)
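The ranked bar chart amounts to sorting the state averages for the selected year. A minimal sketch, assuming the averages sit in a dict keyed by hypothetical `(state, year)` tuples; since a lower AQI means cleaner air, an ascending sort puts the best states first. The state names and values below are placeholders, not results from the data.

```python
def rank_states(averages, year):
    """Return (state, average AQI) pairs for one year, cleanest air first.

    averages: dict mapping hypothetical (state, year) tuples to average AQI.
    Lower AQI means better air quality, so an ascending sort puts the best
    states at the top; reverse the result for a worst-first ranking.
    """
    pairs = [(state, aqi) for (state, y), aqi in averages.items() if y == year]
    return sorted(pairs, key=lambda pair: pair[1])


# Placeholder averages for illustration only
averages = {
    ("State A", 2000): 48.2,
    ("State B", 2000): 21.5,
    ("State C", 2000): 61.0,
    ("State A", 2001): 45.0,
}
for rank, (state, aqi) in enumerate(rank_states(averages, 2000), start=1):
    print(rank, state, aqi)
```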

> At this point I felt the dashboard was close enough to its final form to move on to the next visualization in my story.

> I wanted to show specifically that each state had seen an improvement in its air quality. I imagined a bar for each state that changed color over the years to indicate its air quality in that year. After some experimentation, I ended up with this:


![fig3](fig3.jpg)

> I reformatted the years to clean them up, resized the chart, and added the average AQI to the size mark as well.

> I added this chart to a dashboard with a small map of the United States so the user could isolate specific states and compare them over time. I then moved on to the next visualization.

> I envisioned this visualization as a dashboard of charts showing trends over time for each of the _defining parameters_, that is, the pollutant with the greatest impact on air quality on a given day.

> I initially settled on four* charts:
>> Two line charts showing the percentage of days defined by each parameter: one by month and one by year.

>> Two bar charts, one showing the AQI for each month, and another showing the average AQI on days defined by each parameter.

> *I ended up using six charts, because a later revision added data specifically on SO2 emissions.
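The percentage charts plot, for each period, the share of daily records whose defining parameter was a given pollutant. A minimal sketch of that calculation, assuming each record is a dict with the hypothetical keys `year`, `month`, and `defining_parameter`, and that every record counts once; `period` switches between the annual chart and the month-of-year (seasonal) chart. The sample records are invented for illustration.

```python
from collections import Counter, defaultdict


def parameter_share(rows, period="year"):
    """Percent of records in each period attributed to each defining parameter.

    rows: iterable of dicts with hypothetical keys "year", "month",
    and "defining_parameter". period: "year" for the annual chart,
    "month" for the month-of-year (seasonal) chart.
    """
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[period]][row["defining_parameter"]] += 1
    shares = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        # Shares within one period sum to 100
        shares[key] = {param: 100 * n / total for param, n in counter.items()}
    return shares


# Invented sample records
rows = [
    {"year": 1990, "month": 7, "defining_parameter": "Ozone"},
    {"year": 1990, "month": 7, "defining_parameter": "Ozone"},
    {"year": 1990, "month": 1, "defining_parameter": "Ozone"},
    {"year": 1990, "month": 1, "defining_parameter": "SO2"},
]
print(parameter_share(rows))           # annual shares
print(parameter_share(rows, "month"))  # month-of-year shares
```

Because the shares in each period add up to 100, they stack cleanly, which is one reason an area chart suits this data.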

_________________________

In response to the feedback I received, I made several changes for the final version of the visualization.
These changes are outlined in the table below, along with the feedback item that prompted each revision.

Feedback Item |    Revisions Made
:------------:|:------
1   | Use of abbreviations has been minimized in favor of common phrases. Wherever abbreviations are introduced, an explanation is provided. 
2   | Color scales changed from relative (rescaled each year) to a fixed range, so that colors can be compared objectively from year to year.
3   | Annual and Seasonal Defining Parameter line charts changed to area charts. I feel this is an easier-to-read visualization that directly shows each parameter's share of the total.
4   | AVG AQI by Year chart changed to an area chart for visual appeal. *Also renamed to _Air Quality Index by Year (Average)_ in response to Feedback Item #1.
5   | Redundant labels and any labels that do not provide useful information were removed.
6   | Updated explanations and added page titles to focus on the main findings. Some extra details remain where they explain interesting patterns and relationships. Also added a visualization of SO2 levels and highlighted the correlation between reduced SO2 emissions and improved air quality, reinforcing the main point that air quality has improved over time. Included an explanation of the decline in Hawaii's air quality starting in 2008, which runs contrary to the main point.
7 | Added national average Avg AQI and Air Quality by State by Year charts.
8 | Inserted dashboard titles as captions for story boxes.
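Revision 2 is easiest to see with a small example. With a relative scale, the color ramp is stretched to each year's minimum and maximum, so the same AQI can land on different colors in different years; a fixed range maps a given AQI to the same color every year. The sketch below shows only the normalization behind each approach; the bounds 0 and 100 and the sample values are hypothetical, not the settings or data in the final workbook.

```python
def relative_position(value, values_in_year):
    """Position 0..1 on a color ramp stretched to this year's min and max."""
    low, high = min(values_in_year), max(values_in_year)
    if high == low:
        return 0.0  # every state has the same value; avoid dividing by zero
    return (value - low) / (high - low)


def fixed_position(value, low=0.0, high=100.0):
    """Position 0..1 on a ramp with hypothetical fixed bounds, clamped to range."""
    clamped = min(max(value, low), high)
    return (clamped - low) / (high - low)


# Hypothetical state averages for two years; one state sits at AQI 50 in both.
year_1 = [30, 50, 70]
year_2 = [20, 30, 50]
print(relative_position(50, year_1), relative_position(50, year_2))  # 0.5 1.0
print(fixed_position(50), fixed_position(50))                        # 0.5 0.5
```

Under the relative scale, the state at AQI 50 shifts from mid-ramp to the darkest color even though its value never changed, which is exactly what Feedback Item 2 describes.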

Additional revisions:

> Updated all tooltip formatting to display information neatly and correctly

> Added credit footer and URL for the provider of the source data.

> Converted floating components to tiled components to improve the layout across display sizes.



__Feedback:__

I shared my visualization with a few friends in a Facebook group chat and received the following feedback:

Feedback Item |    Comment
:------------:|:------
1   | "You're using too many abbreviations. I don't know what the graphs are saying."  
2   | "Why do the scales change when I click the year? Some states are changing colors even though they have the same value."
3   | "Your charts in the third page are hard to read. There's a lot going on."
4   | "The Avg AQI by Year graph looks really simple. It doesn't seem to fit in with the rest of the graphs."
5   | "You have a lot of redundant labels. You don't need to label 'Year of Date' when you've already told me you're showing me 1985-2020."
6   | "There's a lot of information here, but I don't really get what you're trying to say. What's your main point?"
7   | "Is there a way to see the averages for the whole country?"
8   | "Why are the little boxes on the top blank?"

 


__Resources:__

Source  | Reason for Use | URL |
:------|:-------------|:-----|
1980-2021 Daily Air Quality Index from the EPA | This is the source dataset for my project. I found it by searching through Kaggle datasets with high usability scores until I found something interesting. | https://www.kaggle.com/threnjen/40-years-of-air-quality-index-from-the-epa-daily
AirNow.gov AQI Basics | I referenced this page to help me understand what the AQI actually represents. | https://www.airnow.gov/aqi/aqi-basics/
United States Geological Survey | Referenced to help explain the anomaly in Hawaii's air quality data during the decade following 2008. | https://www.usgs.gov/observatories/hawaiian-volcano-observatory/frequently-asked-questions-about-volcanic-smog-vog