# Plotting with Matplotlib

**Introduction to Python Programming for Earth Scientists**
**Session #10, Oct 2023**

### Today's Schedule

- Student presentations of Python packages/functions
- Walk through code assignment #2
- Classwork: Review Numpy Arrays, Plotting with Matplotlib

### Learning goals for Matplotlib

- Learn about Matplotlib and be able to import the pyplot sub-package.
- Know how to do line plotting basics for lists or arrays.
- Know how to annotate a plot: add axis labels, add a title to a plot, vary the line/marker style and color.
- Be able to create an x,y scatter plot, and know how it differs from a line plot.
- Learn how to plot multiple quantities on the same plot, and use a legend.
- Learn how to plot histograms with a variable number of bins.

### Geoscience topics

- CO2 concentration measurements at Mauna Loa, Hawaii
- Ice cores as a proxy for long-term climate evolution


## <span style="color: green;">REVIEW: numpy arrays</span>

### About arrays

Numpy `ndarrays` vs. Python `lists`:

- A list can contain objects of different types; in an ndarray, all objects have the same type (default `float`)

- An `ndarray` can have multiple dimensions 

- When a mathematical operation is performed on one or more ndarrays, it is (usually) applied to *all* elements of the array at once
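
The last point is easiest to see side by side. A short sketch (the variable names here are only illustrative): multiplying a list by 2 repeats it, while multiplying an array by 2 doubles every element.

```python
import numpy as np

mylist = [1, 3, 2]
myarr = np.array(mylist)

# for a list, * means repetition
print(mylist * 2)      # [1, 3, 2, 1, 3, 2]

# for an ndarray, * is applied to every element
print(myarr * 2)       # [2 6 4]

# an operation between two arrays of the same shape acts element by element
print(myarr + myarr)   # [2 6 4]

# an ndarray stores a single type: mixing ints and floats gives all floats
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)     # float64

# an ndarray can have more than one dimension
grid = np.zeros((2, 3))
print(grid.shape)      # (2, 3)
```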


### Ways to create new arrays

- The `array()` function turns a list into an array. Ex: `myarr = np.array([1, 3, 2])`

- Functions like `zeros()`, `ones()`, and `empty()` create an array of a given size. Ex: `myarr = np.ones(5)`

- The `arange()` function acts like `range()` but generates an array. Ex: `myarr = np.arange(1, 10, 2)`

- The `linspace()` and `logspace()` functions also create arrays with sequences of values. Ex: `myarr = np.linspace(0, 100, 25) # create a sequence of 25 numbers from 0 to 100`
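
The creation functions listed above can be tried out in one cell. This is only a small sketch; the sizes and ranges are arbitrary choices for illustration.

```python
import numpy as np

a = np.array([1, 3, 2])        # from a list
z = np.zeros(4)                # four zeros (floats)
o = np.ones(5)                 # five ones (floats)
r = np.arange(1, 10, 2)        # 1, 3, 5, 7, 9 (the stop value 10 is excluded)
lin = np.linspace(0, 100, 25)  # 25 values from 0 to 100, both ends included
log = np.logspace(0, 3, 4)     # 10**0 to 10**3 in 4 steps: 1, 10, 100, 1000

print(r)
print(len(lin), lin[0], lin[-1])
print(log)
```

Note the difference between `arange()` and `linspace()`: the first takes a step size and excludes the stop value, the second takes the number of points and includes both ends.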



## <span style="color: green;">IN-CLASS PRACTICE: Length of day</span>


From [this wonderful site](https://www.timeanddate.com/sun/usa/boulder) we learn that Boulder's days are about 15 hours long at the summer solstice and 9 hours long at the winter solstice. That's a range of 6 hours and a period of one year (by definition!). Our formula for the length of day ("LOD", here represented in math by the symbol $L$) is therefore:

$$L = L_\text{mean} + \frac{R}{2} \sin(2\pi t / P)$$

where $L_\text{mean} = 12$ hours is the mean day length, $R = 6$ hours is the annual range (so the sine has an amplitude of $R/2 = 3$ hours, giving 15 hours at the summer solstice and 9 hours at the winter solstice), $t$ is the time in days since the vernal equinox ($\sim$March 21), and $P$ is the period in days (= 365, or really 365.25 if you want to be more accurate and factor in leap years).

In [None]:
import numpy as np

# make an array to represent days of the year

# define variables for `period` (= 365 days), `mean_lod` (= 12 hours), 
# and `lod_range` (= 6 hours) 

# use `np.sin()` to calculate the length of day for each day of the year


# get the plotting package!

import matplotlib.pyplot as plt

#plt.plot(NAME_OF_TIME_VARIABLE_HERE, NAME_OF_LOD_VARIABLE_HERE)
#plt.xlabel('Time from vernal equinox (days)')
#plt.ylabel('Length of day (hours)')

## Ha! Perfect transition! What is Matplotlib?

Matplotlib is a plotting library. It allows you to make plots for reports/papers from your data.
We most often use the subpackage `pyplot`.

In [None]:
# Load the Matplotlib plotting package

import matplotlib.pyplot as plt


### Line plots and their annotation: the Keeling curve as an example

Measurements of CO2 concentrations in the atmosphere were started in 1958 by Dr. Keeling (Scripps Oceanography). The carbon dioxide data from Mauna Loa now constitute the longest record of direct measurements of CO2 in the atmosphere. NOAA now maintains a long-term monitoring program that samples CO2 across the globe.

Data are reported as a 'dry air mole fraction' defined as the number of molecules of carbon dioxide divided by the number of all molecules in air after water vapor has been removed. The mole fraction is expressed as parts per million (ppm). Example: 0.000400 is expressed as 400 ppm.

https://gml.noaa.gov/ccgg/trends/data.html
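
The conversion from mole fraction to ppm is a multiplication by one million. A minimal sketch, using the 0.000400 example from the text plus two mole fractions chosen to match the 1960 and 2020 values in the list below (the variable names are illustrative):

```python
import numpy as np

# dry-air mole fractions (molecules of CO2 per molecule of dry air)
mole_fraction = np.array([0.000400, 0.00031691, 0.00041421])

# parts per million: multiply by one million
ppm = mole_fraction * 1e6
print(ppm)
```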

In [None]:
# Line plot using lists
# I downloaded this data on carbon dioxide from NOAA https://gml.noaa.gov/ccgg/trends/data.html

year = [1960, 1970, 1980, 1990, 2000, 2010, 2020]
CO2ppm = [316.91, 325.68, 338.76, 354.45, 369.71, 390.1, 414.21]

print(year, CO2ppm)

In [None]:
# plot the CO2 concentration

plt.plot(CO2ppm)

In [None]:
# with only one list given, the x-axis shows the list indexes and the y-axis shows the list values


In [None]:
# plot the CO2 over time (specific list for x, and specific list for y)

plt.plot(year, CO2ppm)

plt.ylabel('CO2 in ppm')
plt.xlabel('year')


In [None]:
# Annotate the plot with a title and mark where the points were measured

# the '-o' format string draws a line with a circle at each data point
plt.plot(year, CO2ppm, '-o')

plt.ylabel('CO2 in ppm')
plt.xlabel('year')
plt.title('Decadal CO2 concentration at Mauna Loa Observatory, Hawaii')

In [None]:
# here is an arbitrary example 
# evenly sampled time at 200 ms intervals (0 to 5 s in steps of 0.2 s)
t = np.arange(0., 5., 0.2)

# red dashes, blue squares and green triangles
plt.plot(t, t, 'r--', t, t**2, 'bs', t, t**3, 'g^')
plt.show()
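
The format strings above (`'r--'`, `'bs'`, `'g^'`) are shorthand for color, line style, and marker. The same plot can be written with keyword arguments, which also makes it easy to attach a `label` to each curve and show a legend. A sketch of that equivalent form (the label texts are just placeholders):

```python
import numpy as np
import matplotlib.pyplot as plt

t = np.arange(0., 5., 0.2)

# same three curves as above, styled with keyword arguments instead of format strings
plt.plot(t, t, color='red', linestyle='--', label='t')
plt.plot(t, t**2, color='blue', marker='s', linestyle='none', label='t squared')
plt.plot(t, t**3, color='green', marker='^', linestyle='none', label='t cubed')

plt.xlabel('t')
plt.ylabel('value')
plt.legend()   # builds the legend from the label= arguments
```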

### Scatter Plots

In [None]:
# Here we load a more detailed time series of the CO2 concentration from file into an array

import numpy as np
    
# here I use the NumPy function loadtxt(); we'll learn more about it in the next class
# it returns an array holding the data from the file

arr = np.loadtxt("C02_ManuaLoa.csv", delimiter = ",")

# what's in this array?
print(len(arr))
print(arr[0], arr[-1])


In [None]:
# I will use a scatter plot; it draws x versus y as single data points.
# Scatter plots give a better idea of how many data points were used to establish the plot.

plt.scatter(arr[:, 0], arr[:, 1])
plt.ylabel('CO2 in ppm')
plt.xlabel('year')
plt.title('Yearly CO2 concentration at Mauna Loa Observatory, Hawaii')
plt.show()
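
A scatter plot differs from a line plot in that every point is drawn as its own marker, so the marker color and size can carry extra information. A small sketch using the decadal values from the line-plot example (the marker size `s=60` and the `viridis` colormap are arbitrary choices for illustration):

```python
import matplotlib.pyplot as plt

year = [1960, 1970, 1980, 1990, 2000, 2010, 2020]
CO2ppm = [316.91, 325.68, 338.76, 354.45, 369.71, 390.1, 414.21]

# c= colors each point by its CO2 value, s= sets the marker size
points = plt.scatter(year, CO2ppm, c=CO2ppm, s=60, cmap='viridis')

plt.xlabel('year')
plt.ylabel('CO2 in ppm')
plt.title('Decadal CO2 concentration at Mauna Loa Observatory, Hawaii')

# the colorbar explains what the colors mean
plt.colorbar(points, label='CO2 in ppm')
```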

### <span style="color: green;">IN-CLASS PRACTICE</span>

In [None]:
# use help(plt.scatter) to find out what you can change in scatter plots



In [None]:
# Using the example codes above, change the Keeling curve plot

# create a subset array of the years 2000 - 2022



# Challenge: plot a scatter plot of just 2000-2022
# change the markers to show green triangles



In [None]:
# Print the CO2 concentration in 2000 and in 2022.
# How large is the difference over those 22 years?
# Create an array with linspace() that represents a linear increase in CO2 concentration, starting in 2000 and ending in 2022
# Add this data to the same plot as a line


###  Histograms

In [None]:
# here is an arbitrary dataset

mylist = [0.1, 0.01, 0.4, 0.03, 0.05, 0.03, 0.06, 0.02, 0.01, 0.42,
          0.02, 0.05, 0.1, 0.02, 0.055, 0.2, 0.03, 0.05, 0.06, 0.02,
          0.01, 0.42, 0.02, 0.05, 0.1, 0.02]

# let's see how it is distributed

plt.hist(mylist)

In [None]:
# you can also plot the cumulative distribution
plt.hist(mylist, cumulative=True)

# or reverse
#plt.hist(mylist, cumulative=-1)
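
The number of bins is set with the `bins` argument of `plt.hist()` (the default is 10). A minimal sketch with the same arbitrary dataset; the choices of 5 and 20 bins are only examples, and `np.histogram()` is used here to do the same counting without drawing anything:

```python
import numpy as np
import matplotlib.pyplot as plt

mylist = [0.1, 0.01, 0.4, 0.03, 0.05, 0.03, 0.06, 0.02, 0.01, 0.42,
          0.02, 0.05, 0.1, 0.02, 0.055, 0.2, 0.03, 0.05, 0.06, 0.02,
          0.01, 0.42, 0.02, 0.05, 0.1, 0.02]

# bins= sets the number of equal-width bins
# plt.hist returns the counts per bin and the bin edges
counts5, edges5, patches = plt.hist(mylist, bins=5)
plt.xlabel('value')
plt.ylabel('count')
print(counts5)
print(edges5)      # 5 bins have 6 edges

# np.histogram counts in the same way but does not plot
counts20, edges20 = np.histogram(mylist, bins=20)
print(counts20)
```

With few bins, the small values are lumped together; with more bins, the shape of the distribution near zero becomes visible.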

## <span style="color: purple;">Motivating Problem</span>

### <span style="color: purple;">Ice Cores as Climate Data Records</span>

Modern-day observations can show how greenhouse gases have been changing over the last few decades. But how do we know whether this is unusual over Earth's climate history?

This is where the accumulation zone of glaciers comes in handy: if there is no melt in the upper regions of a glacier, it keeps accumulating layers of annual snowfall in those areas. This is even more the case for ice sheets, which have a dome shape and flow most rapidly in the lower regions.

Clues to past climates can be revealed by drilling into glaciers and ice sheets. The extracted ice cylinders, sometimes taken from several kilometers below the surface, show evidence of atmospheric composition, volcanic eruptions, dust storms, even wind patterns.

Since the 1960s scientists have been collecting ice cores from Greenland, Antarctica and mountain glaciers.
The cores are preserved as 1 m long segments, investigated for gas contents, water chemistry and isotopes, dust traces, etc., and then put in freezer storage.

In 1993, after five years of drilling, the Greenland 'GISP2' project penetrated through the ice sheet and 1.55 meters into bedrock, recovering an ice core of 3053 meters depth.

Here we will take a first look at the data from one of the segments of this core.
One important analysis done on an ice core is counting the annual layers of ice: each year leaves a distinct layer (even if these layers get compressed over time).

The data is available here:
https://icecores.org/inventory/gisp2

### <span style="color: green;">IN-CLASS PRACTICE</span>

In [None]:
# load a pre-processed txt file 

import numpy as np
import matplotlib.pyplot as plt
LT = np.loadtxt("greenland_gisp2_LT.txt")

# what's in this array?
# Find out the length of this array and print its first and last values.
# What is the mean of the array values and what is the maximum?



In [None]:
# plot a histogram of the ice core layer thickness with 20 bins
# label the x and y-axis
# add a title


In [None]:
# plot all data as single data points


# what are the two main observations when you look at the thicknesses over the entire core segment?



In [None]:
# construct a new array that creates a depth record, based on the thickness data

# plot the layer thickness as a function of depth


### Documentation on Matplotlib

https://matplotlib.org/stable/