# Introduction to Climate Data Visualizations in Python
*Developed by Computing Fellows Chianna and Bryn for the Climate & History course at Barnard College. Adapted by postbacc fellow Zoë.*

### What are Data Visualizations?
Data visualizations are graphic representations of data. When a dataset is very complicated or overwhelmingly large, a data visualization can act as both an exploratory tool (to make trends in the data legible) and a communication tool (to convey something about the data). Data visualizations are also important for accessibility: a good visualization can be understood by a much wider audience than, for example, a raw Excel spreadsheet.

Data communication plays an important role in the way we interpret climate data and understand climate change, so visualizations of climate data are particularly important.

### Why Use Programming for Data Visualizations?
Scientific research datasets are all slightly different, and need to be handled, cleaned, and analyzed accordingly.

Coding allows you to customize your data analysis to fit individual datasets and research goals more effectively than an off-the-shelf software package.

Python, MATLAB, and R are among the most commonly used programming languages in scientific research. Today we will learn the basics of Python and build up to creating data visualizations of our own using Altair, a data visualization library.


---



## Basics of Python

The **print** function will "print" whatever is inside its parentheses to the console.

In [None]:
# Print "Hello!"


Hello!


Above, we printed a string. Strings are sequences of characters or words bounded by quotation marks.

We can save strings (or any other data type) into variables, using the "=" sign. 

In [None]:
# We assign the string "Hello" to a variable


# Print our variable


Hello
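
One possible way to fill in the two cells above, as a minimal sketch. The variable name `greeting` is our own choice; any valid name would work.

```python
# Print a string directly
print("Hello!")

# Assign the string "Hello" to a variable (the name `greeting` is arbitrary)
greeting = "Hello"

# Print the variable; note there are no quotation marks around its name
print(greeting)
```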


Since we're working with quantitative data, numbers and arithmetic operations are also important. We can save numbers in variables just as we did with the string above.

In [None]:
#Assign numeric values to variables a and b 


#Perform some simple arithmetic



7
3
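
A sketch of what this cell might contain. The values 5 and 2 are assumptions, chosen only because they reproduce the output shown above.

```python
# Assign numeric values to variables a and b (values assumed for illustration)
a = 5
b = 2

# Perform some simple arithmetic
print(a + b)  # addition
print(a - b)  # subtraction
```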


In [None]:
# Some more complicated arithmetic...


A: 2.0
B: 4.0
C: 7
D: 10
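
The original expressions are not shown, so the ones below are guesses that happen to produce the same four results. They illustrate two rules worth knowing: division and fractional exponents return floats, and parentheses change the order of operations.

```python
# Expressions chosen to match the labels above; the original cell may differ
result_a = 10 / 5        # division always returns a float
result_b = 16 ** 0.5     # an exponent of 0.5 is a square root, also a float
result_c = 1 + 2 * 3     # multiplication happens before addition
result_d = (2 + 3) * 2   # parentheses override the default order

print("A:", result_a)
print("B:", result_b)
print("C:", result_c)
print("D:", result_d)
```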


#### Functions
Functions invoke prewritten code behind the scenes to perform a given operation. We can "pass" data (called the `argument`) into the function in the parentheses of the function.

We can make our own functions, use Python's built-in functions, or import extra functions from a library.

Consider our above example using print.  `print()` is a function, where the thing we're printing (`"Hello"`, for example) is the argument.

In [None]:
# Let's write a short function that adds one to our argument


# Print the output


# What number will print out?


3
101
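
A minimal sketch of such a function. The name `add_one` and the inputs 2 and 100 are assumptions that match the output above.

```python
def add_one(number):
    """Return the argument plus one."""
    return number + 1


# Print the output
print(add_one(2))

# What number will print out?
print(add_one(100))
```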


#### Lists, dictionaries, and for-loops
These data structures are the heart of programming! First we will take a look at lists.

Lists are ordered collections of elements enclosed in square brackets and separated by commas.

In [None]:
# Create a list


# Print our list


Our first list: 
[3, 10, 500]
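
One way to write this cell, assuming the list is stored in a variable called `numbers` (our name, not necessarily the original one):

```python
# Create a list
numbers = [3, 10, 500]

# Print our list
print("Our first list: ")
print(numbers)
```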


We'll introduce a few functions you may want to use on a list.  Learning to work with large datasets takes some work, but understanding data types like lists and arrays can come in handy when you try to clean data before visualizing it.

In [None]:
# Get the length of a list

# "Append" or add an item to a list

# Remove an item from a list

# Add two lists


Our list after adding a new element:
[3, 10, 500, 6]
Our list after removing an element:
[3, 500, 6]
Both lists added together: 
[3, 500, 6, 4, 9, 10]
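
A sketch that reproduces the output above. The variable names `numbers`, `more_numbers`, and `combined` are our own.

```python
numbers = [3, 10, 500]

# Get the length of a list
print(len(numbers))

# "Append" or add an item to the end of a list
numbers.append(6)
print("Our list after adding a new element:")
print(numbers)

# Remove an item from a list by value (only the first match is removed)
numbers.remove(10)
print("Our list after removing an element:")
print(numbers)

# Add two lists; + builds a new list and leaves the originals unchanged
more_numbers = [4, 9, 10]
combined = numbers + more_numbers
print("Both lists added together: ")
print(combined)
```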


If we want to access a specific element of a list, we do that by "indexing." Every element of a list has an index, and to identify that element, we use its index. In Python (and many other programming languages), indexing starts at 0, so the first item is the 0th.

```
fruits = ["apple", "banana", "orange"]
             0        1         2
```
The index of apple is 0. The index of orange is 2. Let's say we want to change the third item (the item with index 2). We use the name `fruits` rather than `list` because `list` is already the name of a built-in Python type.


```
fruits[2] = "grape"
```

Now our list looks like this:
```
fruits = ["apple", "banana", "grape"]
             0        1         2
```




When we want to perform an operation on every element of a list, we *iterate* through the list. We can do this using a *for loop*.

In [None]:
# Write a simple for loop


3
500
6
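
A minimal version of this loop, assuming the list from earlier is called `numbers`:

```python
numbers = [3, 500, 6]

# Visit each element in turn; `number` takes on one value per pass
for number in numbers:
    print(number)
```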


In [None]:
# Now, write a for loop that iterates over index


[6, 1000, 12]
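
The output above is consistent with doubling each element in place. The sketch below does that by looping over positions instead of values; the doubling is inferred from the output, not stated in the notebook.

```python
numbers = [3, 500, 6]

# range(len(numbers)) yields the indices 0, 1, 2
for i in range(len(numbers)):
    # Use the index both to read the old value and to store the new one
    numbers[i] = numbers[i] * 2

print(numbers)
```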


# Making Data Visualizations
Now, let's go ahead and work with some data visualizations.

## Import data
First, let's import our data.

The [data](https://www.ncdc.noaa.gov/cag/global/time-series/globe/land_ocean/ytd/12/1880-2019) we are using comes from NOAA (the National Oceanic and Atmospheric Administration). NOAA researchers compiled annual global temperatures going back to 1880 and expressed each year as an *anomaly*: the difference between that year's temperature and the average temperature for 1901-2000. That 1901-2000 average is the baseline of our anomaly plot, so a value of -0.12 means the year was 0.12 °C cooler than the baseline.

To import our dataset in a form that works with Python, we will use Pandas, a widely used library for data analysis.  Using Pandas, we'll put the data in a DataFrame, which is how Pandas represents data tables.

First, let's import Pandas.

In [None]:
# Import Pandas and give it a nickname


Now, read in the data from our online source.  We'll work with the data in JSON form.

In [None]:
# Read in data

# Define data source
data_url = "https://www.ncdc.noaa.gov/cag/global/time-series/globe/land_ocean/ytd/12/1880-2019/data.json"

# Read JSON into a Pandas dataframe


# Display first few rows of DataFrame


Unnamed: 0,description,data
title,"Global Land and Ocean Temperature Anomalies, J...",
units,Degrees Celsius,
base_period,1901-2000,
missing,-999,
1880,,-0.12
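
A sketch of the read step. So that it runs without network access, it parses a small stand-in string (`sample_json`, our own construction) that mimics the two top-level keys visible in the table above. The anomaly values are copied from outputs in this notebook; the title text is a placeholder.

```python
import io

import pandas as pd

# Tiny stand-in with the same two top-level keys as the NOAA file
sample_json = """
{
  "description": {"title": "Placeholder title", "units": "Degrees Celsius",
                  "base_period": "1901-2000", "missing": "-999"},
  "data": {"1880": -0.12, "1899": -0.16, "1900": -0.08}
}
"""

# Read JSON into a Pandas DataFrame; each top-level key becomes a column
df = pd.read_json(io.StringIO(sample_json))

# Display first few rows of DataFrame
print(df.head())

# With network access, the same function accepts the URL directly:
# df = pd.read_json(data_url)
```

Because the `description` and `data` objects have different keys, each column is empty (NaN) wherever the other one has a value, which is why the first four rows carry no temperatures.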


The first four rows don't seem to have useful data for us.  Let's do a little data cleaning and leave them out for now.

In [None]:
# Skip the first four rows and rename the column that holds temperature


# Display first few rows of DataFrame


Unnamed: 0,description,Temperature (°C)
1899,,-0.16
1900,,-0.08
1901,,-0.16
1902,,-0.26
1903,,-0.38
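
One way to do this cleaning step, shown on a hand-built stand-in DataFrame so the block runs on its own. `iloc[4:]` keeps everything from the fifth row onward, and `rename` relabels the `data` column.

```python
import pandas as pd

# Hand-built stand-in for the raw DataFrame (values copied from this notebook)
df = pd.DataFrame(
    {
        "description": ["Placeholder title", "Degrees Celsius",
                        "1901-2000", "-999", None, None],
        "data": [None, None, None, None, -0.12, -0.16],
    },
    index=["title", "units", "base_period", "missing", "1880", "1899"],
)

# Skip the first four rows and rename the column that holds temperature
df = df.iloc[4:].rename(columns={"data": "Temperature (°C)"})

# Display first few rows of DataFrame
print(df.head())
```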


We can ask Pandas for a summary of the data by calling `.describe()` on our DataFrame.  This gives us the count and mean, along with the standard deviation, minimum, maximum, and quartiles.  Let's also go ahead and do some more data cleaning.

In [None]:
# Display information about the DataFrame


# Let's also convert our index column to a column called "Year" and drop the empty description column


  


Unnamed: 0,Temperature (°C),Year
1899,-0.16,1899
1900,-0.08,1900
1901,-0.16,1901
1902,-0.26,1902
1903,-0.38,1903
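
A sketch of these two steps on a stand-in DataFrame built from the rows shown above. Converting the index with `astype(int)` is our assumption; the original may have kept the years as strings.

```python
import pandas as pd

# Stand-in rows copied from the output shown above
df = pd.DataFrame(
    {
        "description": [None] * 5,
        "Temperature (°C)": [-0.16, -0.08, -0.16, -0.26, -0.38],
    },
    index=["1899", "1900", "1901", "1902", "1903"],
)

# Display summary statistics for the numeric columns
print(df.describe())

# Copy the index into a "Year" column and drop the empty description column
df["Year"] = df.index.astype(int)
df = df.drop(columns=["description"])

print(df.head())
```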


## Let's start visualizing!
Today we're going to be using a data visualization library called Altair.  Unlike lower-level plotting libraries for Python, like matplotlib, Altair is built around a visualization grammar (similar in spirit to ggplot2 in R): you describe which columns map to which visual properties, and Altair works out the details. That makes it a good choice for producing clean data visualizations!

It's okay if the syntax we use for this part of the workshop doesn't click immediately.  One of the important skills when you're learning to code is to learn how to search for resources!

### Make a line plot
First, let's make a line plot of the data with Year on the x-axis and Temperature on the y-axis.

The Chart() function of the altair package uses our Pandas DataFrame to make a graph object: 
```
alt.Chart(df)
```
The mark_line() function specifies the type of graph to be created as a line graph: 
```
alt.Chart(df).mark_line()
```
The encode() function identifies which column of our DataFrame holds data for the x axis, and which holds data for the y axis:
```
alt.Chart(df).mark_line().encode(
    x='Year:O',
    y='Temperature (°C)'
)
```
The properties() function allows us to assign a title and the size of the graph:
```
alt.Chart(df).mark_line().encode(
    x='Year:O',
    y='Temperature (°C)'
).properties(
    title='Global Temperature Anomaly (1880-2019)',
    width=700
)
```
Let's try making a line plot with our data!

First, just as we did with Pandas earlier, we need to import Altair.

In [None]:
# Import Altair


Now, plot our data using Altair's `.Chart()`

In [None]:
# Create a line chart where x is year and y is temperature


That looks alright! Let's play around with some of the chart properties so it looks less stretched.

In [None]:
# Adjust the scale by setting a width and height


While a line chart, as seen above, does show that there is a positive correlation between time and temperature, the intensity of the temperature difference is not as well visualized in this plot. An *anomaly plot* might make more sense for visualizing anomaly data, so we will try that next.

To make the anomaly plot, we can use most of the same code as above, but we replace `mark_line()`, which made the graph a line graph, with `mark_area()`, which fills the area between the line and the baseline to create an anomaly plot instead.


In [None]:
# Create an area plot to represent anomaly data


Try treating the year as temporal (date) data instead of ordinal data.

In [None]:
# Encode the year as temporal data


Now, let's add color to show that a more positive temperature anomaly value corresponds to warmer temperatures and a more negative temperature anomaly value corresponds to cooler temperatures.

If you would like to change the colors we have previously picked out, feel free to look through this website: [Color-Hex](https://www.color-hex.com/). Simply replace the '#Color' we have below with the color code that you like best. 

In [None]:
# Add some color to our anomaly plot


We can also change the color of the area.  Let's make it a gradient to highlight differences between increasing and decreasing trends.

In [None]:
# Add gradient scheme to area


In [None]:
# Make a bar chart with conditional coloring


We can also change the width of our bars.

In [None]:
# We can adjust the width to spread the bars out more


# Different Ways of Displaying the Same Data

What if we want to represent data differently?

Below, we have a few different ways to represent the same data as before. Data visualizations can be very helpful, but they can also mislead the viewer and not show the data in the most accessible and intuitive way. 

**Scatter Plot**

Scatter plots are useful if the data should be interpreted as discrete data rather than ordered continuous data.

In [None]:
# Create a scatter plot


**Bubble Plot**

Bubble plots are like scatter plots, but they have an extra dimension (the size of the bubble) that represents a third piece of data.

In [None]:
# Make a bubble plot

# Activity

Get into groups and take 15 minutes to create 2 graphs. One of them should illustrate global warming and one should be misleading.

Examples of other types of graphs in Altair: https://altair-viz.github.io/gallery/index.html

Documentation for encoding adjustments: https://altair-viz.github.io/user_guide/encoding.html