# MATH 3345 Project 5

Perform the tasks below.  Your final result should be an .HTML file of this notebook, with all code cells run and all answers provided in a markdown cell where instructed.

As a guide, remember to consult the examples and lesson notebooks from all chapters that we have covered in _**R for Data Science (2nd Ed.)**_.

## Background

Fine particulate matter (PM2.5) is an ambient air pollutant for which there is strong evidence that it is harmful to human health. In the United States, the Environmental Protection Agency (EPA) is tasked with setting national ambient air quality standards for fine PM and for tracking the emissions of this pollutant into the atmosphere. Approximately every 3 years, the EPA releases its database on emissions of PM2.5. This database is known as the National Emissions Inventory (NEI). You can read more information about the NEI at the EPA National Emissions Inventory web site.

For each year and for each type of PM source, the NEI records how many tons of PM2.5 were emitted from that source over the course of the entire year. The data that you will use for this assignment are for a sample of 10 U.S. counties during 1999, 2002, 2005, and 2008.

Your data set consists of the following files. On JupyterHub, these files are in the **Data** folder. If you are working in RStudio, download the files from D2L and place them on your machine in any location you choose.

#### Sample_PM25.rds
This file contains PM2.5 emissions data in the selected counties for 1999, 2002, 2005, and 2008. For each year, the table contains the number of **tons** of PM2.5 emitted from a specific type of source for the entire year. Below are the columns:
- `fips`: A 5-digit number (represented as a string) indicating the U.S. county
- `SCC`: The code identifying the source, represented as a digit string (see source classification code table below)
- `Emissions`: Amount of PM2.5 emitted, in tons
- `type`: The type of emission source (there are 4 types; explore the data to determine what these are)
- `year`: The year for which the emissions were recorded
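
One way to get oriented with columns like these is to tabulate a categorical column before plotting. The sketch below runs on a small made-up data frame (`toy_pm25`, a hypothetical name) that only mimics the column layout listed above; its values are invented, and the labels in the real `type` column may be spelled differently.

```r
library(dplyr)

# Hypothetical toy data with the same columns as Sample_PM25.rds.
# All values are invented for illustration and do not come from the NEI.
toy_pm25 <- tibble(
  fips      = c("24510", "24510", "06037", "06037"),
  SCC       = c("10100101", "10100101", "10100101", "10100101"),
  Emissions = c(1.5, 2.0, 3.0, 0.5),
  type      = c("point", "nonpoint", "point", "onroad"),
  year      = c(1999, 2002, 1999, 2008)
)

# Count how many rows are recorded for each distinct value of `type`
type_counts <- toy_pm25 |>
  count(type)
type_counts
```

The same `count()` call applied to the real data frame would list whichever source types it actually contains.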

#### Source_Classification_Code.rds
This table provides a mapping from the SCC digit strings in the Emissions table to the actual name of the PM2.5 source. The sources are categorized in a few different ways from more general to more specific and you may choose to explore whatever categories you think are most useful. For example, source “10100101” is known as “Ext Comb /Electric Gen /Anthracite Coal /Pulverized Coal”.
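
A lookup table like this is typically attached to the emissions rows with a join on the shared `SCC` column. The following is a minimal sketch on invented toy tables; the column name `Short.Name` and the code "99999999" are assumptions made for illustration, so check the real table's columns with `names()` before relying on them.

```r
library(dplyr)

# Hypothetical toy tables; `Short.Name` is an assumed column name and
# "99999999" is a made-up code with no entry in the lookup table.
toy_emissions <- tibble(
  SCC       = c("10100101", "10100101", "99999999"),
  Emissions = c(1.5, 2.0, 0.7),
  year      = c(1999, 2002, 1999)
)
toy_sources <- tibble(
  SCC        = "10100101",
  Short.Name = "Ext Comb /Electric Gen /Anthracite Coal /Pulverized Coal"
)

# Keep every emissions row and attach the matching source name;
# codes without a match in the lookup table get NA
toy_named <- toy_emissions |>
  left_join(toy_sources, by = "SCC")
toy_named
```

If the key columns in two real tables are stored with different types (for example character versus factor), converting both with `as.character()` before joining avoids surprises.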


### Set Up 

At the very least, you will need the **dplyr** and **ggplot2** packages. Feel free to load and use other packages to implement your solution. Note that all plots should be produced using the **ggplot2** package.

**NOTE:** _Only run the `install.packages` steps below if the `library` commands in the 3rd code cell generate an error message._

In [None]:
#Remove the comment symbol on the line below, run the line ONE time, then restore the comment symbol
#install.packages("dplyr")

In [None]:
#Remove the comment symbol on the line below, run the line ONE time, then restore the comment symbol
#install.packages("ggplot2")

In [None]:
library(dplyr)
library(ggplot2)

### Data Set

The code below loads each of the dataframes in the dataset and displays the first few rows.

In [None]:
df_PM25 <- readRDS("Data/Sample_PM25.rds")
head(df_PM25)

In [None]:
df_sources <- readRDS("Data/Source_Classification_Code.rds")
head(df_sources, 3)

## Tasks
Each task below poses a question that you should address by creating one or more appropriate plots using the **ggplot2** package. After displaying your plots, give a _**brief**_ (1-2 sentence) summary of your findings that answer the question.

### Task 1: Total Emissions Overall

Based on this cross-section of U.S. data, have total PM2.5 emissions decreased in the U.S. from 1999 to 2008? Use one or more plots to illustrate the total PM2.5 emissions for each of the years 1999, 2002, 2005, and 2008.
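
A question about yearly totals usually comes down to a group-and-sum followed by a simple plot. The sketch below shows that pattern on invented numbers (`toy_pm25` is a hypothetical data frame, not the assignment data); it is one possible approach, not a required one.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data; the values are invented for illustration
toy_pm25 <- tibble(
  year      = c(1999, 1999, 2002, 2002, 2005, 2008),
  Emissions = c(4, 6, 5, 3, 7, 2)
)

# Sum emissions within each year
yearly_totals <- toy_pm25 |>
  group_by(year) |>
  summarise(total = sum(Emissions), .groups = "drop")

# Bar chart of the yearly totals; factor(year) treats the years as
# discrete labels so the bars are evenly spaced
p_totals <- ggplot(yearly_totals, aes(x = factor(year), y = total)) +
  geom_col() +
  labs(x = "Year", y = "Total PM2.5 emissions (tons)")
p_totals
```

A line plot (`geom_line()` with a numeric `year`) would be an equally reasonable choice when the emphasis is on the trend rather than the individual totals.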

Use the cell(s) below to create the plot(s) described. Be sure to run the cell(s) to display all of your results.

In [None]:
#Your solution to Task 1


#### Your Findings for Task 1

(Click into this cell to enter your answer here)

### Task 2: Total Emissions in Baltimore City 

Have total PM2.5 emissions decreased in _**Baltimore City**_, Maryland (`fips == "24510"`) from 1999 to 2008? Use one or more plots to illustrate.

Use the cell(s) below to create the plot(s) described. Be sure to run the cell(s) to display all of your results.

In [None]:
#Your solution to Task 2


#### Your Findings for Task 2

(Click into this cell to enter your answer here)

### Task 3: Different Emission Sources in Baltimore City

Of the four types of sources indicated by the `type` variable (point, nonpoint, onroad, nonroad), which of these four sources have seen decreases in emissions from 1999–2008 for _**Baltimore City**_? Which have seen increases in emissions from 1999–2008? Use a **single plot with _facets_** to illustrate.
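
Faceting splits one plot into a panel per group, which makes it easy to compare the direction of each trend. Here is a hedged sketch on invented numbers: the data frame `toy_pm25`, its values, and the two `type` labels shown are assumptions for illustration only.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data; the values are invented for illustration
toy_pm25 <- tibble(
  fips      = c("24510", "24510", "24510", "24510", "06037"),
  year      = c(1999, 2008, 1999, 2008, 1999),
  type      = c("point", "point", "onroad", "onroad", "point"),
  Emissions = c(10, 12, 8, 3, 100)
)

# Keep one county, then total the emissions per source type and year
by_type <- toy_pm25 |>
  filter(fips == "24510") |>
  group_by(type, year) |>
  summarise(total = sum(Emissions), .groups = "drop")

# One panel per source type; free y scales keep small series readable
p_facets <- ggplot(by_type, aes(x = year, y = total)) +
  geom_line() +
  geom_point() +
  facet_wrap(~ type, scales = "free_y") +
  labs(x = "Year", y = "Total PM2.5 emissions (tons)")
p_facets
```

Using `scales = "free_y"` is a design choice: it shows each trend clearly but makes magnitudes harder to compare across panels, so a shared scale may suit some questions better.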

Use the cell(s) below to create the plot described. Be sure to run the cell(s) to display all of your results.

In [None]:
#Your solution to Task 3


#### Your Findings for Task 3

(Click into this cell to enter your answer here)

### Task 4: Emissions from Coal Combustion

Based on this cross-section of U.S. data, how have U.S. emissions _from coal combustion-related sources_ changed from 1999 to 2008? Use a plot to illustrate.
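
Selecting a category of sources generally means matching text in the classification table and then keeping only the emissions rows with those codes. The sketch below illustrates the idea on invented tables; the column name `Short.Name`, the codes "00000002" and "00000003", and the matching rule itself (names containing both "comb" and "coal") are assumptions, and which real sources count as coal combustion-related is left for you to decide and justify.

```r
library(dplyr)

# Hypothetical lookup table; `Short.Name` is an assumed column name and
# the last two codes and names are invented
toy_sources <- tibble(
  SCC        = c("10100101", "00000002", "00000003"),
  Short.Name = c(
    "Ext Comb /Electric Gen /Anthracite Coal /Pulverized Coal",
    "Highway Vehicles /Gasoline",
    "Charcoal Grilling"
  )
)

# Hypothetical emissions rows with invented values
toy_emissions <- tibble(
  SCC       = c("10100101", "00000002", "10100101"),
  year      = c(1999, 1999, 2008),
  Emissions = c(5, 9, 2)
)

# Require both "comb" and "coal" (case-insensitive) in the name;
# matching "coal" alone would also pick up "Charcoal Grilling"
coal_sources <- toy_sources |>
  filter(
    grepl("comb", Short.Name, ignore.case = TRUE),
    grepl("coal", Short.Name, ignore.case = TRUE)
  )

# Keep only the emissions rows whose SCC appears in coal_sources
coal_emissions <- toy_emissions |>
  semi_join(coal_sources, by = "SCC")
coal_emissions
```

`semi_join()` filters without adding columns, which keeps the emissions table unchanged apart from dropping the non-matching rows.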

Use the cell(s) below to create the plot described. Be sure to run the cell(s) to display all of your results.

In [None]:
#Your solution to Task 4


#### Your Findings for Task 4

(Click into this cell to enter your answer here)

### Task 5: Motor Vehicle Emissions in Baltimore City

How have emissions _from motor vehicle sources_ changed from 1999 to 2008 in **_Baltimore City_**? Use a plot to illustrate.

Use the cell(s) below to create the plot described. Be sure to run the cell(s) to display all of your results.

In [None]:
#Your solution to Task 5


#### Your Findings for Task 5

(Click into this cell to enter your answer here)

### Task 6: Compare Baltimore City and Los Angeles County

Compare emissions from motor vehicle sources in Baltimore City with emissions from motor vehicle sources in **_Los Angeles County_**, California (`fips == "06037"`). Which location has seen greater changes over time in motor vehicle emissions? Consider both the overall _amount_ of change **AND** the _**percentage**_ of change.  Use one or more plots to illustrate.
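
The amount of change and the percentage of change can point in different directions, so it helps to compute both. As a rough sketch on invented totals (the data frame `toy_totals` and the county labels are hypothetical), the absolute change is the last value minus the first, and the percentage change is that difference divided by the first value, times 100.

```r
library(dplyr)

# Hypothetical yearly totals for two made-up counties
toy_totals <- tibble(
  county = rep(c("County A", "County B"), each = 2),
  year   = rep(c(1999, 2008), times = 2),
  total  = c(100, 50, 4000, 3600)
)

# Order by year within each county, take the first and last totals,
# then compute the absolute and percentage change between them
changes <- toy_totals |>
  arrange(county, year) |>
  group_by(county) |>
  summarise(
    first_total = first(total),
    last_total  = last(total),
    .groups = "drop"
  ) |>
  mutate(
    abs_change = last_total - first_total,
    pct_change = 100 * abs_change / first_total
  )
changes
```

In this toy example County B has the larger absolute drop (400 tons versus 50) while County A has the larger percentage drop (50% versus 10%), which is why the two measures are worth reporting separately.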

Use the cell(s) below to create the plot(s) described. Be sure to run the cell(s) to display all of your results.

In [None]:
#Your solution to Task 6


#### Your Findings for Task 6

(Click into this cell to enter your answer here)