# 6.0: Plotting extended

<img alt="matplotlib topo" align="right" style="width:40%" src="https://matplotlib.org/3.1.1/_images/sphx_glr_topographic_hillshading_001.png">

Not all datasets fit into simple scatter, line or histogram plots. Python still has your back.
Python is also good for plotting maps, but that is a little more advanced and we won't
discuss it here.  The best library for this is [cartopy](https://scitools.org.uk/cartopy/docs/latest/).



Here we will cover:
1. Density plotting with colourmaps - avoid overplotting
2. Multi-axis plotting: scatter plotting geochemical data
3. Circular plotting (rose/wind plots): plotting focal mechanisms from the NZ Moment Tensor database
4. 3D plotting

For more examples of what matplotlib can do, have a look at their
[gallery](https://matplotlib.org/3.1.1/gallery/index.html). The image on the
right is generated with matplotlib, and the [source code is here](https://matplotlib.org/3.1.1/gallery/specialty_plots/topographic_hillshading.html#sphx-glr-gallery-specialty-plots-topographic-hillshading-py).

In [1]:
%matplotlib widget

## 6.1: Density plotting

Overplotting data can lead to misinterpretation.  Overplotting is when you have
many overlapping points on a plot. A simple example is a 2-D normal distribution
sampled at discrete points:

In [2]:
import numpy as np
import matplotlib.pyplot as plt

# Set a random state so that I can test the output
np.random.seed(42)

n = 100000
x = np.random.standard_normal(n)
y = 2.0 + 3.0 * x + 4.0 * np.random.standard_normal(n)

fig, ax = plt.subplots()
ax.scatter(x, y)
plt.show()


There are *many* overlapping points here.  It would be better to represent these data by their
density.  Matplotlib offers a few ways to do this. One of the simplest is `hexbin`, which
grids the data into hexagonal bins and uses colour to represent the number of points within
each bin:

In [3]:
fig, ax = plt.subplots()
collection = ax.hexbin(x, y)
colorbar = fig.colorbar(collection)
colorbar.set_label("Count")


If you find yourself with a dense 2D dataset, plotting the density of points can be really helpful!
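When the counts span several orders of magnitude, a linear colour scale can hide the sparse edges of the distribution. The block below is a minimal sketch of one way to handle that, assuming synthetic data of the same kind as above (regenerated here so the block runs on its own). `gridsize`, `bins="log"` and `mincnt` are standard `hexbin` arguments, but the values chosen are only illustrative:

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic, correlated data (illustrative only)
rng = np.random.default_rng(42)
n = 100_000
x = rng.standard_normal(n)
y = 2.0 + 3.0 * x + 4.0 * rng.standard_normal(n)

fig, ax = plt.subplots()
# gridsize: number of hexagons in the x-direction
# bins="log": colour the counts on a logarithmic scale
# mincnt=1: leave hexagons that contain no points blank
collection = ax.hexbin(x, y, gridsize=40, bins="log", mincnt=1)
colorbar = fig.colorbar(collection)
colorbar.set_label("Count (log scale)")
```

A smaller `gridsize` gives bigger hexagons and a smoother picture, while a larger one shows more detail but noisier counts.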

**Exercise:** Use the [hist2d](https://matplotlib.org/3.1.0/api/_as_gen/matplotlib.axes.Axes.hist2d.html#matplotlib.axes.Axes.hist2d) method to make a similar plot, but with square "bars".

In [4]:
# Your answer here

## 6.2: Multi-axis plotting

### 6.2.1: Subplots

We keep using the:
```python
fig, ax = plt.subplots()
```
syntax for starting a plotting session, but we haven't explored the strength of subplots.
As the name suggests, subplots let you make multiple plots in one.  To explore this we will
use a dataset provided by Colin Wilson that contains a range of geochemical data. These data are 
from the Huckleberry Ridge Tuff and were published in [Swallow et al., 2019](https://academic.oup.com/petrology/article/60/7/1371/5524670). 

We will read it
in using pandas and make some scatter plots (as any *good/bad/indifferent* geochemist would/should). 

*Note:* The author is **not a geochemist** and doesn't know what plots are useful!

In [5]:
import pandas as pd

geochem = pd.read_csv(
    "data/Edited Swallow et al J Petrol data for plotting.csv",
    index_col="Sample")
print(geochem)

        SiO2 (wt%)  TiO2 (wt%)  Al2O3 (wt%)  Fe2O3 (T) (wt%)  MnO (wt%)  \
Sample                                                                    
YP114        72.98        0.26        13.56             2.96       0.05   
YP307        76.50        0.13        11.85             1.70       0.04   
YP359        74.56        0.17        12.83             2.06       0.04   
YP363        76.61        0.12        12.06             1.73       0.04   
YP414        76.08        0.14        12.14             1.79       0.04   
...            ...         ...          ...              ...        ...   
YP564        76.64        0.17        12.23             2.13       0.02   
YP603        76.93        0.19        11.77             2.29       0.04   
YP081        75.68        0.10        12.81             1.84       0.04   
YP133        76.69        0.10        12.04             1.55       0.02   
YP600        76.39        0.10        12.54             1.59       0.03   

        MgO (wt%)  CaO (

Let's make a plot of P2O5 against SiO2:

In [6]:
fig, ax = plt.subplots()
ax.scatter(geochem["SiO2 (wt%)"], geochem["P2O5 (wt%)"])
ax.set_ylabel("$P_2O_5$ (wt%)")
ax.set_xlabel("$SiO_2$ (wt%)")
plt.show()


Let's work towards plotting the first four major elements in the table ($TiO_2$, $Al_2O_3$, $Fe_2O_3 (T)$, $MnO$) against $SiO_2$.

To do this we will need four axes.  We can make four axes using the `plt.subplots` command.  Let's get help for that function.  You can get to the docs for a given function by typing `function?`.  Run the cell below and you should see a pop-up of the docs.

In [7]:
plt.subplots?

The arguments that we care about at the moment are:
- `nrows`, the number of rows of subplots
- `ncols`, the number of columns of subplots
- `sharex`, whether axes should have the same x-axis.  We will be having $SiO_2$ on all the x-axes, so we will set this to be `True`.

Let's make a 2x2 grid of subplots:

In [8]:
fig, ax = plt.subplots(nrows=2, ncols=2, sharex=True)


This returned two things as usual: `fig`, the `Figure` containing all our axes, and `ax`, which is a 2D NumPy array of the subplot axes.  We can index those axes as usual.
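The returned `ax` object can be inspected directly. This is a small sketch, not part of the original exercise, showing its type and shape, how one panel is picked out with `ax[row, column]`, and how `ax.flat` loops over every panel without nested loops:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots(nrows=2, ncols=2, sharex=True)

print(type(ax).__name__, ax.shape)  # ndarray (2, 2)

# The first index is the row, the second the column:
# ax[0, 1] is the top-right panel
top_right = ax[0, 1]
top_right.set_title("ax[0, 1]")

# ax.flat visits the panels row by row
for number, axis in enumerate(ax.flat):
    axis.text(0.5, 0.5, str(number), transform=axis.transAxes)
```

Both `ax[0][1]` and `ax[0, 1]` refer to the same panel; the second form is the usual NumPy style.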

`ax` is indexed by row then column:

In [9]:
fig, ax = plt.subplots(nrows=2, ncols=2, sharex=True)
for row in range(2):
    for column in range(2):
        ax[row][column].set_title(f"ax[{row}][{column}]")


Okay, so we can make some axes, handy. Now let's plot the data:

In [10]:
fig, ax = plt.subplots(nrows=2, ncols=2, sharex=True)

elements = [["TiO2 (wt%)", "Al2O3 (wt%)"], ["Fe2O3 (T) (wt%)", "MnO (wt%)"]]
# The first index of `ax` (and of `elements`) is the row, the second the column
for row in range(2):
    for column in range(2):
        element_name = elements[row][column]
        ax[row][column].scatter(geochem["SiO2 (wt%)"], geochem[element_name])
        ax[row][column].set_ylabel(element_name)
        ax[row][column].set_xlabel("$SiO_2$ (wt%)")


The y-axis labels overlap, so we can shift the ones in column 1 to the right-hand side:

In [11]:
fig, ax = plt.subplots(nrows=2, ncols=2, sharex=True)

elements = [["TiO2 (wt%)", "Al2O3 (wt%)"], ["Fe2O3 (T) (wt%)", "MnO (wt%)"]]
for row in range(2):
    for column in range(2):
        element_name = elements[row][column]
        ax[row][column].scatter(geochem["SiO2 (wt%)"], geochem[element_name])
        ax[row][column].set_ylabel(element_name)
        ax[row][column].set_xlabel("$SiO_2$ (wt%)")
        if column == 1:
            # Move ticks and label of the right-hand column out of the way
            ax[row][column].yaxis.tick_right()
            ax[row][column].yaxis.set_label_position("right")


You can do all your normal customisation to subplots as well.  Note that because we set `sharex=True`, if you zoom in on one axis it will adjust the x-axis of all other plots as well, making it easy to see trends across multivariate data.
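The linked behaviour of shared axes can be checked without the mouse. The sketch below uses made-up numbers (not the Huckleberry Ridge data) and changes the x-limits of one panel in code, which is what zooming does interactively:

```python
import matplotlib.pyplot as plt

# Made-up values, for illustration only
silica = [70, 72, 74, 76, 78]

fig, (top, bottom) = plt.subplots(nrows=2, sharex=True)
top.plot(silica, [0.30, 0.25, 0.18, 0.12, 0.10])
bottom.plot(silica, [14.0, 13.5, 12.9, 12.1, 11.8])

# Changing the limits of one panel moves the other panel too
top.set_xlim(74, 78)
print(bottom.get_xlim())
```

The y-axes stay independent here because only `sharex` was set; `sharey=True` would link them in the same way.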

**Exercise:** Turn off the tick labels for row 1 because they are unnecessary, and make the subplots sit tight next to one another.  Have a look at [this page](https://matplotlib.org/3.1.3/gallery/subplots_axes_and_figures/shared_axis_demo.html) for help on hiding the xticks, and [this page](https://matplotlib.org/3.1.3/tutorials/intermediate/tight_layout_guide.html) for making axes tight together.

In [12]:
# Your answer here

### 6.2.2: Twinning axes

Sometimes it can be good to have multiple different datasets on one set of axes.  We can set up two y-axes on the same `Axes` object to allow us to plot different scales on the one figure.  Let's give this a go by plotting $P_2O_5$ and $K_2O$ both against $SiO_2$:

In [13]:
fig, ax = plt.subplots()
ax.scatter(geochem["SiO2 (wt%)"], geochem["P2O5 (wt%)"],
           label="$P_2O_5$ (wt%)")
ax.set_xlabel("$SiO_2$ (wt%)")
ax.set_ylabel("$P_2O_5$ (wt%)")

ax1 = ax.twinx()
ax1.scatter(
    geochem["SiO2 (wt%)"], geochem["K2O (wt%)"], color="orange",
    label="$K_2O$ (wt%)")
ax1.set_ylabel("$K_2O$ (wt%)")

fig.legend()
plt.show()


Cool, but wouldn't it be nice if the axes labels were the same colour as the data so you could quickly see which scale related to which data?

**Exercise:** Make the scales the same colour as the data.  Check out [this page](https://matplotlib.org/3.1.1/gallery/subplots_axes_and_figures/two_scales.html#sphx-glr-gallery-subplots-axes-and-figures-two-scales-py) for help.

In [14]:
# Your answer here

## 6.3: Circular plotting

So far we have just been plotting nice real numbers. Circular data is a little different, mostly because it is periodic.  This introduces a few more issues, including issues in circular statistics that we will cover in the [next notebook](7-Geoscience-data-wrangling.ipynb).  Let's have a play with plotting some circular data.

### 6.3.1: New Zealand Moment Tensor database

To demonstrate plotting circular values we are
going to play around with the New Zealand Centroid Moment Tensor database, maintained 
by John Ristau of GNS.  This dataset is publicly available
on the [GeoNet github page](https://github.com/GeoNet/data). Centroid Moment Tensors are a little
like focal mechanisms: they are a way of modeling the faulting style of an earthquake.  They are
a little more complex than focal mechanisms because they allow for *non-double couple* forces, and
so can also describe explosions and implosions and any combination thereof.

To start off, we will write a little function to download the data from the website and 
read it into a pandas dataframe. We only care about the column `"strike1"` for this example,
but feel free to explore the database more at your leisure.

In [15]:
import requests
import pandas as pd

def get_geonet_cmt():
    """ Download GeoNet CMT catalogue and save to the data directory. """
    response = requests.get(
        "https://raw.githubusercontent.com/GeoNet/data/master/"
        "moment-tensor/GeoNet_CMT_solutions.csv")
    # Stop here with a clear error if the download failed
    response.raise_for_status()
    with open("data/GeoNet_CMT_solutions.csv", "wb") as f:
        f.write(response.content)
    return pd.read_csv("data/GeoNet_CMT_solutions.csv", parse_dates=["Date"])
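The function above downloads the catalogue every time it is called. If you would rather reuse a copy that is already on disk, something like the following sketch may help. `read_csv_cached` is a hypothetical helper name, not part of pandas or GeoNet, and it uses the standard library's `urllib` rather than `requests` to keep the example dependency-free:

```python
import os
import urllib.request

import pandas as pd


def read_csv_cached(url, path, **read_csv_kwargs):
    """Download `url` to `path` only if `path` is missing, then read it."""
    if not os.path.exists(path):
        # Make sure the target directory exists before downloading
        os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
        urllib.request.urlretrieve(url, path)
    return pd.read_csv(path, **read_csv_kwargs)


# Example call (needs network access the first time it is run):
# cmt_solutions = read_csv_cached(
#     "https://raw.githubusercontent.com/GeoNet/data/master/"
#     "moment-tensor/GeoNet_CMT_solutions.csv",
#     "data/GeoNet_CMT_solutions.csv", parse_dates=["Date"])
```

The trade-off is that a cached file goes stale as new solutions are added, so delete the local copy when you want the latest catalogue.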

In [16]:
cmt_solutions = get_geonet_cmt()
print(cmt_solutions)

         PublicID                Date  Latitude  Longitude  strike1  dip1  \
0         2103645 2003-08-21 12:12:00  -45.1929   166.8300      213    56   
1         2169849 2003-08-21 14:12:00  -45.3592   166.8152      212    68   
2         2206498 2003-08-21 19:56:00  -45.2900   166.8020      252    53   
3         2218435 2003-08-22 00:02:00  -45.0656   166.9658      232    68   
4         2254800 2003-08-22 15:29:00  -45.1861   166.9908      247    48   
...           ...                 ...       ...        ...      ...   ...   
2408  2020p092995 2020-02-04 09:47:00  -37.6295   176.5458      190    57   
2409  2020p118012 2020-02-13 15:57:00  -41.6551   174.2750      339    67   
2410  2020p118387 2020-02-13 19:16:00  -41.6959   174.1826      318    83   
2411  2020p135192 2020-02-20 00:30:00  -38.7424   174.0895       31    88   
2412  2020p147099 2020-02-24 10:14:00  -44.7103   167.4464      175    53   

      rake1  strike2  dip2  rake2  ...  VR         Tva  Tpl  Taz        Nva

To make a circular plot, we will introduce the `projection` argument used when we initialise a figure.  We will make use of this below as well when making 3D plots.

To make a circular plot we use `projection="polar"`, passed to `subplot_kw` as a dictionary.

In [17]:
fig, ax = plt.subplots(subplot_kw=dict(projection="polar"))


Let's plot all the data to start with as a scatter.  To do this, we need to convert our angles to radians:

In [18]:
import numpy as np

strikes = np.radians(cmt_solutions.strike1)

Then we can make a scatter plot of strike against something else - in this case I plot against magnitude:

In [19]:
fig, ax = plt.subplots(subplot_kw=dict(projection="polar"))
ax.scatter(strikes, cmt_solutions.ML, label="Magnitude")
ax.legend()
ax.set_xlabel("Strike")
plt.show()


Overplotting much?  Let's make it a histogram:

In [20]:
fig, ax = plt.subplots(subplot_kw=dict(projection="polar"))
ax.hist(strikes, bins=90)
ax.set_xlabel("Strike")
plt.show()


There appears to be a bias in strikes of our moment tensors! I wonder why that might be...?
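One thing to watch when reading these plots: by default Matplotlib's polar axes put 0° on the right and count anticlockwise, whereas strikes are measured clockwise from north. The sketch below shows how the axes could be re-oriented to the compass convention. It uses synthetic strikes (not the CMT catalogue) so that it runs on its own, and the 10° bin width is an arbitrary choice:

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic strikes in degrees, clustered around 040 (illustrative only)
rng = np.random.default_rng(0)
strikes_deg = rng.normal(loc=40.0, scale=25.0, size=500) % 360
strikes_rad = np.radians(strikes_deg)

fig, ax = plt.subplots(subplot_kw=dict(projection="polar"))
ax.set_theta_zero_location("N")  # put 0 degrees at the top
ax.set_theta_direction(-1)       # make angles increase clockwise

# 36 bins of 10 degrees each, covering the full circle
bin_edges = np.radians(np.arange(0, 361, 10))
counts, _, _ = ax.hist(strikes_rad, bins=bin_edges)
```

Giving explicit bin edges from 0 to 360° also guarantees that the bins cover the whole circle, rather than only the range spanned by the data.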

**Exercise:** Make a circular histogram of the rakes (column `"rake1"`).

In [21]:
# Your answer here

## 6.4: 3D plotting

Some data are best in 3-D! Let's try to visualise where the earthquakes in the CMT catalogue are happening.  We won't bother converting from latitude and longitude to distances in km (although we will learn how to do that in the [next notebook](7-Geoscience-data-wrangling.ipynb)), and we will just make scatter plots of latitude, longitude and depth.

To do this we use the `projection` keyword argument again, this time setting it to `"3d"`.  On older versions of Matplotlib (before 3.2) we also have to import `Axes3D` from the `mplot3d` toolkit before we start plotting; on newer versions the import is harmless:

In [22]:
from mpl_toolkits.mplot3d import Axes3D

fig, ax = plt.subplots(subplot_kw=dict(projection="3d"))


This should have popped up an interactive 3d plot that you can move around (after clicking the four-way arrow button).

Let's plot some data:

In [23]:
fig, ax = plt.subplots(subplot_kw=dict(projection="3d"))
ax.scatter(
    xs=cmt_solutions.Latitude, ys=cmt_solutions.Longitude, 
    zs=cmt_solutions.CD)
ax.set_xlabel("Latitude")
ax.set_ylabel("Longitude")
ax.set_zlabel("Depth (km)")
ax.invert_zaxis()


This looks silly because we have longitudes on both sides of 180°, so some are positive and some are negative. Let's wrap them all into the range 0-360 using the modulo `%` operator:

In [24]:
cmt_solutions.Longitude %= 360

In [25]:
fig, ax = plt.subplots(subplot_kw=dict(projection="3d"))
ax.scatter(
    xs=cmt_solutions.Latitude, ys=cmt_solutions.Longitude, 
    zs=cmt_solutions.CD)
ax.set_xlabel("Latitude")
ax.set_ylabel("Longitude")
ax.set_zlabel("Depth (km)")
ax.invert_zaxis()


Much better.  The modulo operator is really handy, you can read about it [here](https://docs.python.org/3.3/reference/expressions.html#binary-arithmetic-operations). 
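To see what the wrap does to individual values, here is a tiny sketch with a few made-up longitudes. In Python (and NumPy) the result of `%` takes the sign of the divisor, which is why negative longitudes come out positive:

```python
import numpy as np

# Made-up longitudes either side of the 180 degree meridian
longitudes = np.array([166.83, 176.55, -179.50, -175.00])

wrapped = longitudes % 360
# -179.50 becomes 180.50 and -175.00 becomes 185.00;
# values already between 0 and 360 are unchanged
print(wrapped)
```

After wrapping, points just east and just west of the meridian sit next to each other on the axis instead of at opposite ends.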

We can see deep seismicity associated with the two subduction zones.

We can add some more useful information to our plot by scaling our dot-size by magnitude. We can set the `s` argument to one of our magnitude columns to scale them:

In [26]:
fig, ax = plt.subplots(subplot_kw=dict(projection="3d"))
ax.scatter(
    xs=cmt_solutions.Latitude, ys=cmt_solutions.Longitude, 
    zs=cmt_solutions.CD, s=cmt_solutions.ML)
ax.set_xlabel("Latitude")
ax.set_ylabel("Longitude")
ax.set_zlabel("Depth (km)")
ax.invert_zaxis()


The difference in dot-size is not that obvious.  Play around with scaling the magnitude column until you can see the differences.
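As one possible starting point, the sketch below scales the marker size exponentially with magnitude. It uses synthetic hypocentres (not the CMT catalogue) so that it runs on its own, and the base of 2 is an arbitrary choice to experiment with rather than a recommended value:

```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401, needed on older Matplotlib

# Synthetic hypocentres (illustrative only)
rng = np.random.default_rng(1)
n = 200
latitude = rng.uniform(-47, -35, n)
longitude = rng.uniform(166, 180, n)
depth = rng.uniform(5, 200, n)
magnitude = rng.uniform(3.5, 7.0, n)

# `s` is the marker area in points squared, so an exponential scaling
# exaggerates the difference between small and large events
sizes = 2.0 ** magnitude

fig, ax = plt.subplots(subplot_kw=dict(projection="3d"))
ax.scatter(xs=latitude, ys=longitude, zs=depth, s=sizes)
ax.set_xlabel("Latitude")
ax.set_ylabel("Longitude")
ax.set_zlabel("Depth (km)")
ax.invert_zaxis()
```

With this scaling each whole unit of magnitude doubles the marker area; a larger base makes big events stand out more, at the cost of hiding their neighbours.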

**Exercise:** Using your knowledge of [pandas](5-Pandas-introduction.ipynb), select a region of interest and make a 3D scatter plot of earthquake locations, scaled by magnitude.

In [27]:
# Your answer here

## 6.-1: Summary

<img alt="Puysegur Seismicity" align="right" style="width:40%" src="images/Summary_map_raw.png">

That covers some more options for plotting, but we still haven't touched every plotting option, and we haven't touched map plotting.  I'm leaving that for the GMT class, but for keen Pythonistas, check out [cartopy](https://scitools.org.uk/cartopy/docs/latest/#).  The map to the right was made using cartopy, subplots, and various other Python things that we have discussed.  I added fault labels later in Inkscape, and moved the legend to the top layer... If anyone wants the source code, let me know.

Our final notebook comes next, with some [playing around with data](7-Geoscience-data-wrangling.ipynb).