# Lesson Three: Introduction to Plotting Data
Congrats, you've made it to week three! This week will be awesome for those of you who are more artistically inclined, as we will be exploring different ways to visualise data. 

![Stonks](https://media.giphy.com/media/YnkMcHgNIMW4Yfmjxr/giphy.gif)

# Section One: Importing Packages
Let's jump right back in and import the necessary packages. Once again, we'll use Pandas, this time alongside **Matplotlib**. Matplotlib's `pyplot` module will let us create all of today's plots.

In [None]:
import matplotlib.pyplot as plt
import pandas as pd 

In [None]:
data = pd.read_csv('../data/dsny_data.csv')

In [None]:
# TODO: Examine the head of the dataset 
data.head()

## Plotting Quantitative Data
We'll start with some of the most common plots for one of the most common data types: quantitative data. This kind of data is usually described in terms of independent and dependent variables; we recommend you check out the video below to learn more about them.

In [None]:
%%html
<iframe width="560" height="315" src="https://www.youtube.com/embed/l0jTMDtX4WY" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

### Line Chart 
A **line chart** plots the relationship between two variables as a collection of lines connecting points. Line charts are very useful when we have data collected over a period of time and want to see how it changes. Below, we'll plot the mean relative humidity for the first 200 values in the dataset.

In [None]:
plt.plot(data['time'][:200], data['relative_humidity_mean'][:200])
plt.show()

### Histogram
A **histogram** tells us how much data falls into each range of values. Think of a histogram as a bar chart in which the height of each bar is the number of values that fall into a certain range, or **bin**. Where a line chart shows how data changes over time, a histogram shows how the data is **distributed**. Let's plot the distribution of the mean temperatures recorded over the course of the two years.

In [None]:
plt.hist(data['temperature_mean'])
plt.show()
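
To make the idea of bins more concrete, here is a minimal sketch using synthetic values rather than the DSNY data (the `temperatures` array is made up for illustration). By default `plt.hist()` uses 10 bins; the `bins` argument changes that, and the function returns the count in each bin along with the bin edges.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in for a temperature column (NOT the DSNY data)
rng = np.random.default_rng(seed=0)
temperatures = rng.normal(loc=25, scale=5, size=1000)

# bins sets how many equal-width ranges the data is split into
counts, bin_edges, _ = plt.hist(temperatures, bins=20)
plt.xlabel('Temperature (°C)')
plt.ylabel('Count')
plt.show()

# Each bar's height is the number of values that fell inside that bin,
# and there is always one more edge than there are bins
print(len(counts), len(bin_edges))
```

More bins show finer detail in the distribution, while fewer bins give a smoother, coarser picture.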

### Scatterplot
A **scatterplot** plots the relationship between two variables as a collection of points. This type of plot is very similar to a line chart and is used for many of the same purposes, so choosing between them is often a matter of preference. Below, we'll plot the CO2 flux storage data for the last 200 samples taken.

In [None]:
plt.scatter(data['time'][-200:], data['CO2_flux_storage'][-200:])
plt.show()

## Making Breathtaking Plots 
There's one thing you may have noticed about the plots above: they all look extremely boring. And awful. There's not a whole lot going on, and the plots themselves don't tell us a lot about the data itself. What's being plotted? What does each **axis** represent?

### Adding Labels
One of the most important things to do when plotting data is to label your plot. The plots that we've worked with so far today have had two dimensions: the **x-axis** and the **y-axis**. We can add a title using the ``plt.title()`` function, an x-axis label using the ``plt.xlabel()`` function, and a y-axis label using the ``plt.ylabel()`` function. Here's the before:

In [None]:
plt.plot(data['time'][:200], data['relative_humidity_mean'][:200])
plt.show()

And here's the after:

In [None]:
plt.plot(data['time'][:200], data['relative_humidity_mean'][:200])
plt.title('Date versus Mean Relative Humidity at the DSNY NEON Site')
plt.xlabel('Date')
plt.ylabel('Relative Humidity (%)')
plt.show()

One last thing we'll do is limit the number of tick labels on the x-axis. As you can see above, there are far too many overlapping labels to read each date. Using the plot from above, we can fix this with the `plt.xticks()` function: we first ask it for the current tick locations, then keep only every 80th one and rotate the remaining labels by 20 degrees.

In [None]:
plt.plot(data['time'][:200], data['relative_humidity_mean'][:200])
plt.title('Date versus Mean Relative Humidity at the DSNY NEON Site')
plt.xlabel('Date')
plt.ylabel('Relative Humidity (%)')
locs, labels = plt.xticks()
plt.xticks(locs[0::80], rotation=20)
plt.show()
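
Another way to tame crowded date labels is to convert the time column to real datetime values, so that Matplotlib picks a readable set of ticks on its own. The sketch below assumes the timestamps are stored as strings that `pd.to_datetime()` can parse; the `sample` DataFrame is a made-up stand-in, not the DSNY data.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the dataset: timestamps stored as strings
timestamps = pd.date_range('2020-07-11', periods=200, freq='30min')
sample = pd.DataFrame({
    'time': timestamps.strftime('%Y-%m-%d %H:%M:%S'),
    'relative_humidity_mean': [60 + (i % 48) * 0.5 for i in range(200)],
})

# Convert the strings to datetime values so the x-axis is a continuous time axis
sample['time'] = pd.to_datetime(sample['time'])

plt.plot(sample['time'], sample['relative_humidity_mean'])
plt.xlabel('Date')
plt.ylabel('Relative Humidity (%)')
plt.xticks(rotation=20)  # rotate the labels without changing their locations
plt.show()
```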

### Adding Plot Styles 
As you can tell, our plots are pretty, well, boring. They let us explore our data pretty well, but they're just not that visually appealing. To fix this, we can use plot styles. Below, we'll apply them with ``plt.style.context()``, which changes the style only for the plots created inside its `with` block.

My personal favorites are the fivethirtyeight and seaborn styles. These styles are nods to two different projects. [FiveThirtyEight](https://fivethirtyeight.com) is a website that discusses statistics for nearly every topic, especially politics, economics, and sports. Their unique plot style can be used in Python by calling `plt.style.use('fivethirtyeight')`, which applies the style to every plot that follows.

![Five Thirty Eight Super Bowl Plot](https://fivethirtyeight.com/wp-content/uploads/2019/01/paine-superbowlduds-1.png?w=575)

[Seaborn](https://seaborn.pydata.org/) is another Python package that is capable of creating advanced data plots. The kinds of plots you can create with the Seaborn library are pretty neat, though pretty tricky if you're just starting out. Luckily, we can use its unique and appealing style without having to use the package itself by calling `plt.style.use('seaborn')`. Note that Matplotlib 3.6 renamed this style to `seaborn-v0_8`, so use that name if `'seaborn'` is not recognized.

![Seaborn Plot](https://seaborn.pydata.org/_images/regression_marginals.png)

For a complete list of the available styles, be sure to check out [this website](https://matplotlib.org/3.2.1/gallery/style_sheets/style_sheets_reference.html). We'll start by changing our plot to the seaborn style.

In [None]:
# On Matplotlib 3.6 and newer, this style is named 'seaborn-v0_8'
with plt.style.context('seaborn'):
    plt.plot(data['time'][:200], data['relative_humidity_mean'][:200])
    plt.show()

Wow, our plots are already looking much better than before! What if we want to use the fivethirtyeight style? 

In [None]:
with plt.style.context('fivethirtyeight'):
    plt.plot(data['time'][:200], data['relative_humidity_mean'][:200])
    plt.show()
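
Style names can differ between Matplotlib versions, so it can help to check which ones your installation provides. The sketch below lists them with `plt.style.available` and picks whichever seaborn-like name is present; the `candidates` list, the fallback to `'default'`, and the plotted values are illustrative assumptions.

```python
import matplotlib.pyplot as plt

# Names of the style sheets available in the installed Matplotlib
available_styles = plt.style.available
print(available_styles)

# Pick whichever seaborn-like name this installation recognizes,
# falling back to Matplotlib's default style if neither is present
candidates = ['seaborn', 'seaborn-v0_8']
style_name = next(
    (name for name in candidates if name in available_styles),
    'default',
)

with plt.style.context(style_name):
    plt.plot([0, 1, 2, 3], [2, 4, 3, 5])  # placeholder values, not the DSNY data
    plt.show()
```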

Last but not least, let's use the ``plt.figure()`` function and its `figsize` argument to tell Matplotlib we want to make our figure bigger. Let's start by making our plot 10 inches wide by 5 inches tall.

In [None]:
# Use 'seaborn-v0_8' instead on Matplotlib 3.6 and newer
with plt.style.context('seaborn'):
    plt.figure(figsize=(10, 5))
    plt.hist(data['temperature_mean'])
    plt.title('Histogram of Mean Temperatures at the DSNY NEON Site')
    plt.xlabel('Mean Temperature (° Celsius)')
    plt.ylabel('Count')
    plt.show()

# Practice
Great job today! We definitely threw a lot of information at you, so make sure to practice to perfect your Python plotting skills.

### **Question One** 
Create a line chart, showing the date on the x-axis and the last 1000 CO2 flux storage values from the dataset on the y-axis.

In [None]:
# TODO: Plot your data


### **Question Two** 
Recreate the plot from above, but this time, label your axes and provide a title. For the date axis, use "Date". For the CO2 flux axis, use "CO2 Flux in Storage". For the plot title, use "CO2 Flux in Storage for the DSNY NEON Site".

In [None]:
# TODO: Plot your data


### **Question Three** 
Let's go ahead and create a histogram, showing the amount of precipitation at the DSNY site.

In [None]:
# TODO: Create a histogram of the DSNY precipitation data


Yikes, only one column in our plot? Let's try making more bins using the `bins` argument; let's create 20 of them. Furthermore, let's limit our range to values between zero and three, [0, 3]. *Remember*: use `shift + tab` to see how to achieve this.

In [None]:
# TODO: Recreate the histogram with 20 bins and a range of [0, 3]


## 🏅 Challenge
Time for the challenge question! This one will require you to investigate the Matplotlib package a bit for yourself. For this, we're going to put several series of our DSNY data on a single plot: the mean, minimum, and maximum temperature values. Furthermore, we'll be plotting this data **only** for the day of July 11, 2020.

Make sure to plot the minimums in the color red, maximums in the color purple, and the mean in the color green. Furthermore, we'll go ahead and include a plot legend using ``plt.legend()``. Be sure to pass in the *label* argument when creating each line plot: for the minimum, maximum, and mean line plots, use "Minimum", "Maximum", and "Mean", respectively.

Finally, make sure to label your axes. Label the date axis with "Date", the temperature axis with "Temperature (° Celsius)", and add the title "Minimum, Maximum, and Mean Temperatures at the DSNY NEON Site for July 11, 2020".

In [None]:
# TODO: Filter the dataset so that only rows from
# July 11, 2020 are present


In [None]:
# TODO: Create the unique data plot


Okay, that's all we have for this week. Please feel free to reach out to us through email or attend our weekly Office Hours for questions or help on the practice problems.