# 7.0: Some more data.

Finn has provided us with some fault observation data from some of his PhD work in Afar.  This work has been published in [Illsley-Kemp et al., 2018](https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2018GC007947).
One of the cool aspects of this paper is how it integrates geological data (fault observations made from satellite imagery) with seismological data.  Here we are just going to look at the fault data.

To cover:
1. Getting azimuths from lat and long
2. Plotting circular histograms
3. Getting mean, median from circular datasets

In [1]:
%matplotlib widget

## 7.1: Reading in the data

You know the drill - use pandas to read the csv file:

In [2]:
import pandas as pd

fault_data = pd.read_csv("data/ErtaAleFaultData.csv")
print(fault_data)

      Fault ID  Relative Fault Length   StartX   StartY     EndX     EndY  \
0          404               0.023976  40.7928  13.3795  40.7869  13.4021   
1          405               0.008062  40.7944  13.3777  40.7985  13.3710   
2          406               0.003065  40.7949  13.3788  40.7965  13.3763   
3          407               0.007603  40.8025  13.3736  40.7980  13.3795   
4          408               0.002570  40.7992  13.3757  40.8006  13.3735   
...        ...                    ...      ...      ...      ...      ...   
1148      2394               0.001679  40.7322  13.4801  40.7312  13.4815   
1149      2395               0.001257  40.7340  13.4756  40.7348  13.4747   
1150      2396               0.000753  40.7476  13.4591  40.7480  13.4585   
1151      2397               0.001192  40.7477  13.4587  40.7482  13.4576   
1152      2398               0.001272  40.7479  13.4580  40.7484  13.4568   

          Strike  
0     165.848768  
1     148.976987  
2     148.018224  

## 7.2: Calculating fault strike

Finn has already provided us with fault strikes in this data.  Let's check that he did it right!

We are going to use a really fast library called [`geographiclib`](https://geographiclib.sourceforge.io/1.50/python/) to do this, which solves geodesic problems (distances and directions between points) on an ellipsoid.  It's **really handy** for spatial datasets.

To start with, we need to define our reference ellipsoid. We are going to assume that Finn measured these points on the [WGS84](https://en.wikipedia.org/wiki/World_Geodetic_System) ellipsoid.

In [3]:
from geographiclib.geodesic import Geodesic

geodesic = Geodesic.WGS84

To get the distance and azimuth of the great-circle between two points, we use the `.Inverse` method of the `Geodesic` we just initialised.

In [7]:
output = geodesic.Inverse(-42, 180, -44, 160)
print(output)

{'lat1': -42, 'lon1': 180, 'lat2': -44, 'lon2': 160.0, 'a12': 14.774397005931048, 's12': 1641750.093999188, 'azi1': -104.53812062231566, 'azi2': -90.8214605033783}


This gives us a load of output as a `dict`. Remember that we can access items in a dictionary using their *key*,
e.g. to get the "`lat1`" value we would write `output["lat1"]`.

The attributes returned to us are:
- lat1 = φ1, latitude of point 1 (degrees)
- lon1 = λ1, longitude of point 1 (degrees)
- lat2 = φ2, latitude of point 2 (degrees)
- lon2 = λ2, longitude of point 2 (degrees)
- azi1 = α1, azimuth of line at point 1 (degrees)
- azi2 = α2, (forward) azimuth of line at point 2 (degrees)
- s12 = s12, distance from 1 to 2 (meters)
- a12 = σ12, arc length on auxiliary sphere from 1 to 2 (degrees)

The attributes we want are the length of the fault segment and the strike.  These are returned as `"s12"` and `"azi1"`. `"azi1"` can be less than 0 degrees, so let's make sure that it stays within 0-360 using the modulo `%` operator and put it all in a function:

In [8]:
def get_distance_azimuth(lat1, lon1, lat2, lon2):
    """
    Get the distance and azimuth between two points on the Earth
    
    Parameters
    ----------
    lat1
        Latitude in degrees of point 1
    lon1
        Longitude in degrees of point 1
    lat2
        Latitude in degrees of point 2
    lon2
        Longitude in degrees of point 2
        
    Returns
    -------
    distance in m
    azimuth in degrees clockwise from North
    """
    result = Geodesic.WGS84.Inverse(lat1, lon1, lat2, lon2)
    azim = result['azi1'] % 360
    return result['s12'], azim
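
The wrap-around step can be checked on its own. This is a minimal sketch using the `azi1` value printed by the `Inverse` call earlier; it relies on Python's `%` returning a result with the sign of the divisor, so a negative azimuth comes back as the equivalent positive angle.

```python
# azi1 from the Inverse call shown earlier (degrees; negative means anticlockwise from North)
azi1 = -104.53812062231566

# Python's % takes the sign of the divisor (360), so negative inputs come back positive
wrapped = azi1 % 360
print(round(wrapped, 4))  # 255.4619

# Values already in range are unchanged, and 360 itself wraps to 0
print(45.0 % 360, 360.0 % 360)  # 45.0 0.0
```

Note that this behaviour is specific to Python (and numpy); in some other languages `%` on a negative number returns a negative result.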

Now let's test that we get something sensible. Let's start with two points on the same meridian.  If we start with a point north of the second point we should get a strike of 180:

In [10]:
get_distance_azimuth(-42, 140, -44, 140)

(222185.4901439271, 180.0)

And if we reverse the order we should get a strike of 0:

In [11]:
get_distance_azimuth(-44, 140, -42, 140)

(222185.4901439271, 0.0)

Now let's try two points at the same latitude. Starting with point 1 west of point 2, we should get a strike of 90:

In [12]:
get_distance_azimuth(-42, 140, -42, 160)

(1653225.6571097176, 96.72911798388331)

That isn't what we expected! The reason is that a line of latitude is a small circle, and we are getting the distance and starting azimuth along the great circle, which leaves the line of latitude.  If we do the same but at 0 degrees latitude (the equator), the line of latitude is a great circle and we should get a strike of 90 degrees:

In [13]:
get_distance_azimuth(0, 140, 0, 160)

(2226389.8158654715, 90.0)

And if we reverse the order of the points we should get a strike of 270:

In [14]:
get_distance_azimuth(0, 160, 0, 140)

(2226389.8158654715, 270.0)

Winner.  This may seem like a silly example (and it is), but you should **always test your code**. Before you trust code to give you the right result, make sure it gives you the right result in as many cases as possible. Writing tests often takes me longer than writing the code, but debugging code without tests is impossible.

### Calum's general rule for running code:
> If it isn't tested, it is **wrong**.

This isn't always true, but statistically... it is pretty close. That isn't to say that well-tested code is correct in every situation, which is why programmers (me included) are constantly adding test-cases for edge-cases that break code.
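
The hand checks above can be written down as assertions so they are easy to re-run. This is only a sketch: `spherical_azimuth` and `check_azimuths` are hypothetical names, and the stand-in uses the initial-bearing formula on a sphere (standard library only) rather than the WGS84 ellipsoid, so it will differ slightly from `geographiclib`.

```python
import math


def spherical_azimuth(lat1, lon1, lat2, lon2):
    """Initial bearing in degrees clockwise from North, on a sphere.

    Hypothetical stand-in for get_distance_azimuth (sphere, not WGS84).
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(y, x)) % 360


def check_azimuths(azimuth_func):
    """Assert the cases that were checked by hand above."""
    # Same meridian: due south, then due north
    assert math.isclose(azimuth_func(-42, 140, -44, 140), 180.0, abs_tol=1e-6)
    assert math.isclose(azimuth_func(-44, 140, -42, 140), 0.0, abs_tol=1e-6)
    # Along the equator: due east, then due west
    assert math.isclose(azimuth_func(0, 140, 0, 160), 90.0, abs_tol=1e-6)
    assert math.isclose(azimuth_func(0, 160, 0, 140), 270.0, abs_tol=1e-6)
    # Along a parallel away from the equator: NOT 90, the great circle leaves the parallel
    assert 96.0 < azimuth_func(-42, 140, -42, 160) < 98.0


check_azimuths(spherical_azimuth)
print("All azimuth checks passed")
```

The same checks could be pointed at the real function by passing something like `lambda *args: get_distance_azimuth(*args)[1]`, which keeps only the azimuth from the returned tuple.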

Right, now we have a function, let's see if we can use some pandas fu to quickly apply it to our dataframe.  To do this we are going to have to make use of a Python feature known as [lambdas](https://docs.python.org/3/tutorial/controlflow.html#lambda-expressions).  Lambdas are used to create small anonymous functions.  For us they mean that we can give multiple arguments to our function from our columns:

In [20]:
fault_data.apply(lambda row: get_distance_azimuth(
    row['StartY'], row['StartX'], row['EndY'], row['EndX']), axis=1)

0        (2580.6981006507976, 345.6637034413693)
1         (864.1066924974543, 149.0719370551293)
2        (326.39711716686503, 147.9282997784391)
3         (814.6539282950362, 323.2498676126684)
4         (286.7712136715904, 148.0747150806057)
                          ...                   
1148     (188.97963057445023, 325.0453104791867)
1149    (131.97495630725822, 138.97898118198674)
1150     (79.26154625429713, 146.87585655718598)
1151      (133.197751808977, 156.01653923690648)
1152     (143.37661158004232, 157.8138424661303)
Length: 1153, dtype: object
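
A lambda is just shorthand for a small named function. The sketch below uses two made-up rows and a hypothetical `offsets` function (not part of the fault dataset or of pandas) to show that the lambda and the `def` version give the same result when passed to `apply`.

```python
import pandas as pd

# Two made-up rows, reusing the column names from the fault data
segments = pd.DataFrame({
    "StartX": [40.0, 41.0], "StartY": [13.0, 13.5],
    "EndX": [40.5, 41.0], "EndY": [13.0, 14.0],
})


def offsets(x1, y1, x2, y2):
    """Hypothetical multi-argument function: returns (dx, dy) in degrees."""
    return x2 - x1, y2 - y1


def offsets_from_row(row):
    """Named equivalent of the lambda used below."""
    return offsets(row["StartX"], row["StartY"], row["EndX"], row["EndY"])


with_lambda = segments.apply(
    lambda row: offsets(row["StartX"], row["StartY"], row["EndX"], row["EndY"]),
    axis=1)
with_def = segments.apply(offsets_from_row, axis=1)

# Both routes give one tuple per row
print(with_lambda.tolist() == with_def.tolist())  # True
```

The lambda saves defining a one-off function; the named version is easier to test and document. Either way, `apply` with `axis=1` hands each row to the function and collects one result per row.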

That returns a `Series` of `tuple`s. Ideally we would unpack those and concatenate them onto our existing fault data.  To do that, we need to convert our `Series` to a `list`, then make a new dataframe from that `list` of `tuple`s:

In [23]:
strike_series = fault_data.apply(lambda row: get_distance_azimuth(
    row['StartY'], row['StartX'], row['EndY'], row['EndX']), axis=1)
strike_df = pd.DataFrame(
    strike_series.tolist(), columns=["Fault Length (m)", "Fault Strike (degrees)"])
print(strike_df)

      Fault Length (m)  Fault Strike (degrees)
0          2580.698101              345.663703
1           864.106692              149.071937
2           326.397117              147.928300
3           814.653928              323.249868
4           286.771214              148.074715
...                ...                     ...
1148        188.979631              325.045310
1149        131.974956              138.978981
1150         79.261546              146.875857
1151        133.197752              156.016539
1152        143.376612              157.813842

[1153 rows x 2 columns]
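
As an aside, pandas can do the unpacking itself: `DataFrame.apply` accepts `result_type="expand"`, which turns list-like return values into columns. A minimal sketch with made-up rows and a hypothetical stand-in function, so that it runs without `geographiclib`:

```python
import pandas as pd

# Two made-up rows, reusing the column names from the fault data
segments = pd.DataFrame({
    "StartX": [40.0, 41.0], "StartY": [13.0, 13.5],
    "EndX": [40.5, 41.0], "EndY": [13.0, 14.0],
})


def offsets_from_row(row):
    """Hypothetical stand-in for get_distance_azimuth: returns a 2-tuple."""
    return row["EndX"] - row["StartX"], row["EndY"] - row["StartY"]


# result_type="expand" turns each returned tuple into one row of a new DataFrame
offsets_df = segments.apply(offsets_from_row, axis=1, result_type="expand")
offsets_df.columns = ["dX (degrees)", "dY (degrees)"]
print(offsets_df)
```

Either route ends with a two-column `DataFrame` that shares its index with the original, like `strike_df` above.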


Now we can concatenate that `DataFrame` onto our original fault `DataFrame`:

In [26]:
fault_data_extended = pd.concat([fault_data, strike_df], axis=1)
print(fault_data_extended)

      Fault ID  Relative Fault Length   StartX   StartY     EndX     EndY  \
0          404               0.023976  40.7928  13.3795  40.7869  13.4021   
1          405               0.008062  40.7944  13.3777  40.7985  13.3710   
2          406               0.003065  40.7949  13.3788  40.7965  13.3763   
3          407               0.007603  40.8025  13.3736  40.7980  13.3795   
4          408               0.002570  40.7992  13.3757  40.8006  13.3735   
...        ...                    ...      ...      ...      ...      ...   
1148      2394               0.001679  40.7322  13.4801  40.7312  13.4815   
1149      2395               0.001257  40.7340  13.4756  40.7348  13.4747   
1150      2396               0.000753  40.7476  13.4591  40.7480  13.4585   
1151      2397               0.001192  40.7477  13.4587  40.7482  13.4576   
1152      2398               0.001272  40.7479  13.4580  40.7484  13.4568   

          Strike  Fault Length (m)  Fault Strike (degrees)  
0     165.8487

We can see that our strikes are similar to Finn's, although some are flipped 180 degrees, which in this instance isn't significant.  Although Finn didn't provide us with the units for `Relative Fault Length` we can see that it definitely isn't in m or km; possibly degrees?
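
To put a number on "similar", we can fold our azimuths into 0-180 (a strike of 345 degrees describes the same line as 165 degrees) and difference them against Finn's values. The sketch below uses only the first three rows of each column, copied from the printed output above; the variable names are illustrative.

```python
import numpy as np

# First three rows of each column, copied from the printed output above
ours = np.array([345.663703, 149.071937, 147.928300])
finn = np.array([165.848768, 148.976987, 148.018224])

# Fold our 0-360 azimuths into the 0-180 range used for strikes
ours_folded = ours % 180

# Smallest angle between the two lines, allowing for wrap-around at 0/180
difference = np.abs(ours_folded - finn)
difference = np.minimum(difference, 180 - difference)
print(difference.round(3))
```

For these three rows the differences are a fraction of a degree. The same two lines applied to the full `"Fault Strike (degrees)"` and `"Strike"` columns would check every fault.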

Enough checking, let's make some plots.

## 7.3: Circular histograms

You know how this works from the [previous notebook](6-More-plotting.ipynb), so without further ado:

In [40]:
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots(subplot_kw=dict(projection="polar"))
ax.hist(np.radians(fault_data_extended["Fault Strike (degrees)"]), bins=90)
ax.set_theta_direction(-1)
ax.set_theta_offset(np.pi/2.0)
plt.show()


Let's plot Finn's strikes just to visually check that they are similar:

In [43]:
fig, ax = plt.subplots(subplot_kw=dict(projection="polar"))
ax.hist(np.radians(fault_data_extended["Fault Strike (degrees)"]),
        bins=90, color="blue", label="Us")
ax.hist(np.radians(fault_data_extended["Strike"]),
        bins=90, color="orange", label="Finn")
ax.set_theta_direction(-1)
ax.set_theta_offset(np.pi/2.0)
ax.legend()
plt.show()


This looks like Finn has fewer observations than us, but we know that isn't true. So what is going on here?

We set the number of bins (`bins=90`), but the bin width is calculated as the range of the data divided by the number of bins. Our calculated strikes range from 0-360, whereas Finn's range from 0-180, so the bins for Finn's data are half the width of ours and each one catches fewer observations.
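
The bin-width effect can be seen without plotting anything. This sketch assumes that matplotlib's `hist` passes the binning to `numpy.histogram` (which, as far as I am aware, it does), and uses evenly spaced synthetic strikes rather than the real columns.

```python
import numpy as np

# Synthetic strikes (degrees) standing in for the two columns
ours = np.linspace(0.0, 360.0, 500)   # spans 0-360, like our azimuths
finn = np.linspace(0.0, 180.0, 500)   # spans 0-180, like Finn's strikes

# Same number of bins, but the edges are stretched over each dataset's own range
_, edges_ours = np.histogram(ours, bins=90)
_, edges_finn = np.histogram(finn, bins=90)

width_ours = edges_ours[1] - edges_ours[0]
width_finn = edges_finn[1] - edges_finn[0]
print(width_ours, width_finn)  # 4.0 2.0
```

With bins half as wide, each of Finn's bars holds roughly half as many observations, which is why his histogram looks smaller.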

What we can do to fix this is set `bins` as a list of bin edges:

In [44]:
bins = np.radians(np.arange(0, 361, 5))  # stop at 361 so the final edge at 360 degrees is included

This produces a numpy array of bin edges in radians every 5 degrees from 0 to 360 degrees. Let's fix our plot:

In [46]:
fig, ax = plt.subplots(subplot_kw=dict(projection="polar"))
ax.hist(np.radians(fault_data_extended["Strike"]),
        bins=bins, color="orange", label="Finn")
ax.hist(np.radians(fault_data_extended["Fault Strike (degrees)"]),
        bins=bins, color="blue", label="Us")
ax.set_theta_direction(-1)
ax.set_theta_offset(np.pi/2.0)
ax.legend()
plt.show()


That makes more sense.  We have fewer observations in the 0-180 range than Finn because some of our strikes are flipped by 180 degrees and so plot in the 180-360 range.

## 7.4: Circular statistics

We mentioned that statistics for circular (periodic) data are a little odd. Thankfully scipy has our back with a [few useful circular statistical functions](https://scipy.github.io/devdocs/stats.html#circular-statistical-functions).
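
To see why ordinary statistics go wrong, consider two made-up azimuths either side of North. The sketch below compares the arithmetic mean with the direction of the mean unit vector, which, to the best of my knowledge, is the quantity that scipy's `circmean` computes; it uses only numpy.

```python
import numpy as np

strikes = np.array([350.0, 20.0])  # made-up azimuths either side of North

# Arithmetic mean points roughly South, which is clearly wrong
arithmetic_mean = strikes.mean()

# Circular mean: average the unit vectors, then take the direction of the result
theta = np.radians(strikes)
circular_mean = np.degrees(
    np.arctan2(np.sin(theta).mean(), np.cos(theta).mean())) % 360

print(arithmetic_mean, round(circular_mean, 6))  # 185.0 5.0
```

Averaging the sine and cosine components avoids the jump at 0/360, so the result lands between the two azimuths as you would expect.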

Let's calculate the mean and standard deviation of the strikes, remembering to convert to and from radians:

In [48]:
from scipy.stats import circmean, circstd

mean = circmean(np.radians(fault_data_extended["Strike"]))
std = circstd(np.radians(fault_data_extended["Strike"]))

mean = np.degrees(mean)
std = np.degrees(std)

print(f"The mean of the strikes is {mean}.  The standard deviation is {std}")

The mean of the strikes is 155.47105348293826.  The standard deviation is 16.069104143668483


In Finn's paper he reports a mean of 157 for Erta Ale. Finn has provided us with a subset of those data, so it isn't surprising that we are a little off, but we are still well within one standard deviation.

**Exercise**: Compute the mean and standard deviation for the strikes we calculated. Do they match?

In [49]:
# Your answer here

**Exercise:** Now consider the New Zealand CMT dataset. Using sections of code from this notebook and the [previous notebook](6-More-plotting.ipynb) compute the mean and standard deviation of the strikes of the moment tensors in two regions:
1. In Fiordland, bounded by (165.4,-46.8,169.2,-43.9)
2. Surrounding Cook Strait, bounded by (173.4,-42.9,175.8,-40.7)

In [50]:
# Your answer here

## 7.-1: Summary

That is it for now.  Hopefully that has introduced you to at least one useful thing, and started you on your way to [zen](https://www.python.org/dev/peps/pep-0020/).

Let me know what you think, either in person, or as an issue on the [github page](https://github.com/calum-chamberlain/ESCI451-Python). Be nice and constructive though!

# -1: Some other interesting things

- [Stereonet plotting: mplstereonet](https://pypi.org/project/mplstereonet/)
- [Map plotting: Cartopy](https://scitools.org.uk/cartopy/docs/latest/)
- [Modeling crustal deformation: Pylith](https://geodynamics.org/cig/software/pylith/)
- [Handling seismic data in Python: Obspy](https://github.com/obspy/obspy/wiki)
- [Quasi-dynamic rate-and-state friction modeling of faults: Qdyn](https://github.com/ydluo/qdyn)
- [Clustering and Machine Learning: Sklearn](https://scikit-learn.org/stable/index.html)
- [Machine learning: Pytorch](https://pytorch.org/)
- ... Many many more useful earth-science Python packages. Search online for what you want and you might be surprised.