# Project 1: Planet Earth's Global Temperature

In this project, we will download temperature data from NASA’s Goddard Institute for Space Studies (GISS) and analyze it using pandas (and possibly seaborn).

### Introduction

First, we must understand the data collection method. This method was originally documented by Hansen and Lebedeff in [1987](https://pubs.giss.nasa.gov/docs/1987/1987_Hansen_ha00700d.pdf), and the study has been repeated many times, including in [2010](https://pubs.giss.nasa.gov/docs/2010/2010_Hansen_ha00510u.pdf). A summary of the data collection method is as follows.
1. Thousands of meteorological stations around the world (land and sea) measure the surface air temperature. All measurements discussed here will be in Celsius.
1. The world average temperature throughout each month for all stations is calculated. Systematic effects are corrected and uncertainty is estimated.
1. The "**reference monthly average temperature**" is calculated by averaging over all stations and measurements during each month of the three-decade period 1951-1980.
1. The "**temperature anomaly**" is defined as the world monthly average temperature minus the reference monthly average temperature.
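
The anomaly definition above can be illustrated with a small sketch. The temperatures below are made-up values following a simple linear trend, not real GISS measurements, and the variable names are chosen only for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical world-average January temperatures in Celsius (NOT real data):
# a simple linear trend starting at an assumed 14.0 in 1951.
years = np.arange(1951, 2021)
jan_temps = pd.Series(14.0 + 0.01 * (years - 1951), index=years)

# Reference monthly average: mean over the base period 1951-1980 (inclusive).
reference = jan_temps.loc[1951:1980].mean()

# Temperature anomaly: monthly average minus the reference for that month.
anomaly = jan_temps - reference

print(anomaly.tail())
```

By construction, the anomaly averages to zero over the base period, so positive values mean "warmer than the 1951-1980 average for that month".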

### Accessing temperature anomaly data

The GISS temperature anomaly data is available at https://data.giss.nasa.gov/gistemp/

From this link please find and download the CSV file
"Global-mean monthly, seasonal, and annual means, 1880-present, updated through most recent month". The file name should be `GLB.Ts+dSST.csv`.

### Data analysis

#### Import packages

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
```

#### Read in the data

Use the pandas `read_csv()` function with the argument `skiprows=1` to read in the data and set up a `DataFrame`.
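
As a sketch of what `skiprows=1` does, the example below reads a tiny in-memory stand-in for the file. The sample text is invented for illustration and assumes the layout described above (one title line, then a header row); with the real download you would pass the path `GLB.Ts+dSST.csv` instead of the `StringIO` object.

```python
import io
import pandas as pd

# Made-up miniature of the file layout: a title line followed by the header row.
sample = """Land-Ocean: Global Means
Year,Jan,Feb,Mar,J-D
1880,-0.18,-0.24,-0.09,-0.17
1881,-0.19,-0.14,0.03,-0.08
"""

# skiprows=1 discards the title line so the second line becomes the header.
df = pd.read_csv(io.StringIO(sample), skiprows=1)
print(df)
```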

#### Inspect the data

Print the `DataFrame` you created.  You should see a "Year" column and a column for each month ("Jan", "Feb", etc.).  There are also columns for select averages.
* "J-D" = 1 year average temperature (January -> December)
* "D-N" = 1 year average temperature (December -> November)
* "DJF" = 3 month average temperature (Dec, Jan, Feb)
* "MAM" = 3 month average temperature (Mar, Apr, May)
* "JJA" = 3 month average temperature (Jun, Jul, Aug)
* "SON" = 3 month average temperature (Sep, Oct, Nov)       

#### Data preparation 

Set the "Year" column as the `DataFrame` index.

Drop the last six columns ('J-D','D-N','DJF','MAM','JJA', and 'SON'), which we will not use.
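
A minimal sketch of these two steps on a toy `DataFrame` (the numbers are placeholders, and only two month columns are included to keep it short):

```python
import pandas as pd

# Toy frame with two month columns plus the six summary columns.
df = pd.DataFrame({
    "Year": [2000, 2001],
    "Jan": [0.1, 0.2], "Feb": [0.3, 0.4],
    "J-D": [0.0, 0.0], "D-N": [0.0, 0.0], "DJF": [0.0, 0.0],
    "MAM": [0.0, 0.0], "JJA": [0.0, 0.0], "SON": [0.0, 0.0],
})

# Use the year as the row label.
df = df.set_index("Year")

# Drop the summary columns by name.
df = df.drop(columns=["J-D", "D-N", "DJF", "MAM", "JJA", "SON"])
print(df)
```

Dropping by name is more robust than dropping by position if the column order ever changes.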

#### Data cleaning

The data is updated regularly, but some months may not be available yet.  In such cases, the missing data appears as three asterisks (`***`). Replace such entries with something that pandas understands, `NaN`.

Call `info()` on your `DataFrame`.

If the `Dtype` of each column isn't `float64`, convert the `DataFrame` using the `astype()` function.
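
One possible way to do the cleaning, sketched on a toy `DataFrame` whose values are stored as strings (as happens when a column contains `***`). The values are placeholders.

```python
import numpy as np
import pandas as pd

# Toy frame: the "Feb" column has a missing entry marked with three asterisks.
df = pd.DataFrame(
    {"Jan": ["1.20", "1.35"], "Feb": ["1.40", "***"]},
    index=pd.Index([2023, 2024], name="Year"),
)

# Replace the marker with NaN, then convert every column to float64.
df = df.replace("***", np.nan).astype("float64")
df.info()
```

Alternatively, `read_csv()` accepts an `na_values` argument (for example `na_values="***"`) that converts such markers while the file is being read.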

#### Data visualization

Make a plot of the January Temperature Anomaly (column "Jan") versus year. Basically you are plotting the data contained in one column of the `DataFrame` versus the index of the `DataFrame` (the year). Label your axes and include temperature units.

Make the plot again, but with argument `kind='bar'`.
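
A sketch of both plots using synthetic numbers in place of the real "Jan" column; the styling choices are only suggestions.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for the real data (NOT GISS values).
years = pd.Index(np.arange(1990, 2000), name="Year")
df = pd.DataFrame({"Jan": np.linspace(0.2, 0.6, years.size)}, index=years)

# Line plot: one column plotted against the index (the year).
fig, ax = plt.subplots()
df["Jan"].plot(ax=ax)
ax.set_xlabel("Year")
ax.set_ylabel("January temperature anomaly (°C)")

# Same data as a bar chart.
fig_bar, ax_bar = plt.subplots()
df["Jan"].plot(kind="bar", ax=ax_bar)
ax_bar.set_xlabel("Year")
ax_bar.set_ylabel("January temperature anomaly (°C)")
```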

Add a new column to your `DataFrame` containing the average temperature anomaly for each year (with column name "Yearly Avg"). Calculate this average on your own instead of relying on the "J-D" column, which we dropped above. This can be done with the pandas `mean()` function. Check the documentation for this function to ensure that you are averaging across each row (over the 12 months), not down each column.

Make a plot of the yearly average temperature anomaly versus year (as a line graph again). 
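
The row-wise average can be sketched as follows on a toy `DataFrame`. The key detail is `axis=1`, which averages across the columns of each row; the default `axis=0` would instead average down each column.

```python
import numpy as np
import pandas as pd

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Toy values: row 2000 holds 0.0 .. 1.1, row 2001 holds 1.2 .. 2.3.
df = pd.DataFrame(
    np.arange(24, dtype=float).reshape(2, 12) / 10,
    index=pd.Index([2000, 2001], name="Year"),
    columns=months,
)

# axis=1 -> one mean per row (per year). NaN entries are skipped by default.
df["Yearly Avg"] = df[months].mean(axis=1)
print(df["Yearly Avg"])
```

Selecting `df[months]` explicitly keeps the new column out of the average if the cell is run a second time.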

-----------------

Use a "Savitzky–Golay filter" to smooth the data.  The scipy package includes a [`savgol_filter` function](https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.savgol_filter.html) which implements this filter.  Try different values of the `window_length` (odd values only) and `polyorder` arguments to find suitable values.  Overlay the raw (in black) and smoothed (in blue) yearly averages.  Include a legend.
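
A sketch of the smoothing step on a synthetic noisy trend. The values `window_length=21` and `polyorder=3` are arbitrary starting points for experimentation, not recommended settings.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.signal import savgol_filter

# Synthetic yearly averages: linear trend plus noise (NOT real data).
rng = np.random.default_rng(1)
years = np.arange(1880, 2021)
yearly = pd.Series(
    0.008 * (years - 1880) - 0.4 + rng.normal(0, 0.1, years.size),
    index=years,
)

# Fit a cubic polynomial in a sliding 21-point window.
smoothed = savgol_filter(yearly.to_numpy(), window_length=21, polyorder=3)

fig, ax = plt.subplots()
ax.plot(yearly.index, yearly, color="black", label="Raw")
ax.plot(yearly.index, smoothed, color="blue", label="Smoothed")
ax.set_xlabel("Year")
ax.set_ylabel("Temperature anomaly (°C)")
ax.legend()
```

A larger window or a lower polynomial order gives a smoother curve, at the cost of flattening real short-term features.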

Slice the `DataFrame` to obtain a new `DataFrame` with only the last 10 years' worth of data.  Average over these years to find the temperature anomaly for each of the 12 months, and print the result.
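
A sketch of the slice-and-average step, using a toy `DataFrame` in which every month of a given year holds the same placeholder value:

```python
import numpy as np
import pandas as pd

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
years = pd.Index(np.arange(2006, 2021), name="Year")

# Toy values: every month in row i holds the value i (0 .. 14).
df = pd.DataFrame(
    np.repeat(np.arange(15.0)[:, None], 12, axis=1),
    index=years, columns=months,
)

# Positional slice: the last 10 rows, whatever the year labels are.
last_decade = df.iloc[-10:]

# Default axis=0 averages down each column, giving one value per month.
monthly_mean = last_decade.mean()
print(monthly_mean)
```

If your `DataFrame` already contains the "Yearly Avg" column, select the 12 month columns first so that the result covers only the months.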

------------

On a single set of axes, overlay the raw yearly average (in black) with each of the 12 monthly temperature anomalies (in various colors). Draw the yearly average with a thicker line width to make it stand out. Label your axes and draw a legend. 

It may help to draw the legend outside the axes using the following syntax:
```python
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0., prop={'size': 11})
```

You can use matplotlib [`Colormaps`](https://matplotlib.org/stable/api/matplotlib_configuration_api.html#matplotlib.colormaps) to easily vary the color of each of the 12 monthly distributions.
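
One way to draw the overlay is sketched below on synthetic data. The "viridis" colormap is an arbitrary choice, and `matplotlib.colormaps` assumes matplotlib 3.5 or newer.

```python
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Synthetic monthly anomalies: shared trend plus noise (NOT real data).
rng = np.random.default_rng(2)
years = pd.Index(np.arange(1980, 2021), name="Year")
trend = 0.02 * (years.to_numpy() - 1980)
df = pd.DataFrame(
    trend[:, None] + rng.normal(0, 0.1, (years.size, 12)),
    index=years, columns=months,
)
df["Yearly Avg"] = df[months].mean(axis=1)

# Sample 12 evenly spaced colors from the colormap, one per month.
cmap = matplotlib.colormaps["viridis"]
colors = cmap(np.linspace(0, 1, len(months)))

fig, ax = plt.subplots()
for month, color in zip(months, colors):
    ax.plot(df.index, df[month], color=color, label=month)

# Yearly average on top, in black with a thicker line.
ax.plot(df.index, df["Yearly Avg"], color="black", linewidth=3, label="Yearly Avg")
ax.set_xlabel("Year")
ax.set_ylabel("Temperature anomaly (°C)")
ax.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0., prop={"size": 11})
```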

Plot the temperature anomaly versus month. This time, do this for every year (so well over a hundred lines will be drawn). 

Begin by using the `transpose()` function to convert the `DataFrame` indices to columns and vice versa.

Drop the "Yearly Avg" row so that we only have monthly data in our plot.

Finally, plot all the distributions on a single set of axes.  Use magenta for the most recent year, and use the "brg" `Colormap` for all others (by passing the argument `cmap="brg"` to the `plot()` function). This will ensure that one can tell the old years from the more recent years.
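
The transpose, drop, and plot steps might look like the sketch below, shown on synthetic data covering only 21 years. Plotting the most recent year separately is one possible way to give it its own color.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Synthetic monthly anomalies (NOT real data).
rng = np.random.default_rng(3)
years = pd.Index(np.arange(2000, 2021), name="Year")
df = pd.DataFrame(rng.normal(0.5, 0.2, (years.size, 12)), index=years, columns=months)
df["Yearly Avg"] = df[months].mean(axis=1)

# Rows become months (plus "Yearly Avg"), columns become years.
dft = df.transpose()
dft = dft.drop(index="Yearly Avg")

# All years except the latest, colored along the "brg" colormap.
latest = dft.columns[-1]
ax = dft.drop(columns=latest).plot(cmap="brg", legend=False)

# Most recent year on top, in magenta.
dft[latest].plot(ax=ax, color="magenta", linewidth=2.5)
ax.set_xlabel("Month")
ax.set_ylabel("Temperature anomaly (°C)")
```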

It can be difficult to interpret such a busy plot.  You should make a simpler plot which can be more easily interpreted.

First, create a new `DataFrame` with columns for each decade and rows for each month, which holds the average temperature anomaly.

On a single set of axes, plot the temperature anomaly versus month for each decade.
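
One possible approach to the decade averages is to group the years by decade with integer division, as sketched below on toy data. The decade labels such as "1990s" are a naming choice made for this example.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
years = pd.Index(np.arange(1990, 2010), name="Year")

# Toy values: every month of a year holds (year - 1990) / 100.
values = (years.to_numpy() - 1990) / 100
df = pd.DataFrame(np.repeat(values[:, None], 12, axis=1), index=years, columns=months)

# 1990..1999 -> 1990, 2000..2009 -> 2000, and so on.
decade = (df.index // 10) * 10

# Average the years within each decade, then put months on the rows.
by_decade = df.groupby(decade).mean().transpose()
by_decade.columns = [f"{d}s" for d in by_decade.columns]

ax = by_decade.plot(cmap="brg")
ax.set_xlabel("Month")
ax.set_ylabel("Temperature anomaly (°C)")
```

Note that the first and last decades in the real data may be incomplete, so their averages cover fewer than 10 years.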

---------------

#### Bonus 

Use seaborn to plot the `DataFrame` as a heatmap. The $y$-axis should be the month and the $x$-axis should be the year. The $z$-axis (the color of the heatmap) should represent the temperature anomaly. Choose a color palette that you think will be a good way of representing the data: https://matplotlib.org/examples/color/colormaps_reference.html.
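
A sketch of the heatmap on synthetic data. The "coolwarm" palette and `center=0` (which places the neutral color at zero anomaly) are just one option among many.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Synthetic monthly anomalies: trend plus noise (NOT real data).
rng = np.random.default_rng(4)
years = pd.Index(np.arange(1980, 2021), name="Year")
trend = 0.02 * (years.to_numpy() - 2000)
df = pd.DataFrame(
    trend[:, None] + rng.normal(0, 0.1, (years.size, 12)),
    index=years, columns=months,
)

# Transpose so that months run along the y-axis and years along the x-axis.
fig, ax = plt.subplots(figsize=(10, 4))
sns.heatmap(
    df.transpose(), cmap="coolwarm", center=0, ax=ax,
    cbar_kws={"label": "Temperature anomaly (°C)"},
)
ax.set_xlabel("Year")
ax.set_ylabel("Month")
```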