## Clustering Fold Planes

- Find out about machine learning
- Learn to use the **scikit-learn** Python package for clustering analysis
- Apply clustering analysis to an Earth science example (clustering fold directions)

This notebook is modified from Lecture 24 from Lisa Tauxe's course [Python for Earth Science Students](https://nbviewer.jupyter.org/github/ltauxe/Python-for-Earth-Science-Students/tree/master/).

### Import the scientific python packages we will need

In [None]:
!pip install mplstereonet
import mplstereonet
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import Image
%matplotlib inline

### The Orocopio Mountains Dataset
The `poles_data` dataset contains poles to bedding planes measured in the Orocopio Mountains in southern California. If a rock is composed of sediments that were laid down flat, one on top of another, we would expect the pole to each bedding plane to be vertical (because the plane itself is horizontal). If the plane has since been tilted, the pole will point in some other direction.
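
To make the plane/pole relationship concrete, here is a small sketch that converts a plane's strike and dip to the trend and plunge of its lower-hemisphere pole. It assumes the right-hand-rule convention (the plane dips toward strike + 90°); `plane_to_pole` is a hypothetical helper written for illustration, not part of the notebook or any library.

```python
import numpy as np

def plane_to_pole(strike, dip):
    """Return (trend, plunge) in degrees of the lower-hemisphere pole to a plane.

    Assumes the right-hand-rule convention: the plane dips toward strike + 90.
    The pole points opposite the dip direction (trend = strike - 90)
    and plunges 90 - dip.
    """
    strike = np.asarray(strike, dtype=float)
    dip = np.asarray(dip, dtype=float)
    trend = (strike - 90.0) % 360.0
    plunge = 90.0 - dip
    return trend, plunge

# Flat-lying bed: the pole is vertical (pole trend 270, plunge 90)
print(plane_to_pole(0, 0))
# Bed striking north and dipping 30 degrees east: pole trends west (270), plunges 60
print(plane_to_pole(0, 30))
```

A flat-lying bed gives a vertical pole, and the steeper the bed, the shallower its pole.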

In [None]:
poles_data=pd.read_csv('data/Orocopio_Poles_Data.csv')
poles_data.head()

Now we'll add new columns to our dataframe with the strike and dip of the bedding planes.

In [None]:
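# Convert each pole (azimuth, plunge) to the strike and dip of its plane (right-hand rule):
# pole trend = strike - 90 and pole plunge = 90 - dip
poles_data['strike']=(poles_data['Pole_Az']+90)%360
poles_data['dip']=90-poles_data['Pole_Plunge']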
poles_data.head()

We'll use the package `mplstereonet` to plot the planes and poles on a stereonet. First we set `projection='stereonet'` when creating the axes, and then we can use `ax.plane` and `ax.pole` to plot.
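
For intuition about where `ax.pole` puts a point, the sketch below uses the standard lower-hemisphere equal-area (Lambert/Schmidt) projection to place a line of given trend and plunge on a net of unit radius. `equal_area_xy` is a hypothetical helper; mplstereonet draws an equal-area net by default, but its internal coordinate conventions may differ, so treat this as an illustration only.

```python
import numpy as np

def equal_area_xy(trend, plunge):
    """Project a line (trend, plunge in degrees) onto a lower-hemisphere
    equal-area net of unit radius. x points east and y points north."""
    trend = np.radians(trend)
    plunge = np.radians(plunge)
    # Distance from the centre depends on the angle from the vertical (90 - plunge)
    r = np.sqrt(2.0) * np.sin((np.pi / 2 - plunge) / 2.0)
    return r * np.sin(trend), r * np.cos(trend)

print(equal_area_xy(0, 90))   # vertical line: plots at the centre of the net
print(equal_area_xy(90, 0))   # horizontal line toward the east: plots on the rim
```

Steep poles (flat-lying beds) plot near the centre of the net, and horizontal poles (vertical beds) plot on the outer circle.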

In [None]:
fig = plt.figure()
ax = fig.add_subplot(111, projection='stereonet')
ax.plane(poles_data.iloc[0]['strike'], poles_data.iloc[0]['dip'], 'g-', linewidth=1, label='plane')
ax.pole(poles_data.iloc[0]['strike'], poles_data.iloc[0]['dip'], 'go', markeredgecolor='k', markersize=8, label='pole')
ax.grid()
plt.legend()
plt.show()

In [None]:
fig = plt.figure()
ax = fig.add_subplot(121, projection='stereonet')
ax.plane(poles_data['strike'], poles_data['dip'], 'g-', linewidth=1)
ax.set_title('Planes')
ax.grid()
ax = fig.add_subplot(122, projection='stereonet')
ax.pole(poles_data['strike'], poles_data['dip'], 'co', markeredgecolor='k', markersize=5)
ax.set_title('Poles')
ax.grid()
plt.show()

This is interesting! There seem to be two 'clusters' of bedding-plane orientations in this dataset, one to the north-east and one to the south-west. We want a way of separating these two clusters, but first let's think about what causes them. Is there a spatial relationship between where the different orientations are found?

#### Quiver Plots and **plt.imshow( )**

To illustrate this we can use a 'quiver plot', which draws an arrow at each measurement location pointing in the horizontal direction of the pole to the plane. To do this, we need to convert the data from azimuth and plunge to Cartesian x, y and z components, using the handy function **dir2cart()** below.

In [None]:
def dir2cart(Az,Pl):
    """
    Converts directions (azimuth and plunge) to Cartesian coordinates.
    Inputs:
        Az: azimuth(s) in degrees
        Pl: plunge(s) in degrees (positive downward)
    Output:
        [X,Y,Z]: Cartesian coordinates (X north, Y east, Z down)
    """
    Az=np.radians(Az)
    Pl=np.radians(Pl)
    return [np.cos(Az)*np.cos(Pl),np.sin(Az)*np.cos(Pl),np.sin(Pl)]

<font color=goldenrod>**_Code for you to write_**</font>

Call the function `dir2cart` to convert `Pole_Az` and `Pole_Plunge` into `u,v,w` Cartesian coordinates.

In our coordinate system, $u$ points north, $v$ east and $w$ vertically (down), so poles with a steeper plunge (that is, more gently dipping beds) will have smaller $u$ and $v$ components and a larger $w$ component, and their arrows on the quiver plot will appear shorter.
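
A quick numerical check of this, reusing the `dir2cart` function defined above (repeated here so the snippet runs on its own) on a few made-up pole directions that all trend north-east:

```python
import numpy as np

def dir2cart(Az, Pl):
    """Same conversion as above: x north, y east, z down for a positive plunge."""
    Az = np.radians(Az)
    Pl = np.radians(Pl)
    return [np.cos(Az) * np.cos(Pl), np.sin(Az) * np.cos(Pl), np.sin(Pl)]

# Hypothetical poles trending 045 with increasingly steep plunges
plunges = np.array([0.0, 30.0, 60.0, 85.0])
u, v, w = dir2cart(np.full_like(plunges, 45.0), plunges)
horizontal_length = np.hypot(u, v)  # length of the arrow a quiver plot would draw

for pl, h, z in zip(plunges, horizontal_length, w):
    print(f"plunge {pl:4.0f}: horizontal length {h:.2f}, vertical component {z:.2f}")
```

The horizontal length is simply $\cos(\text{plunge})$, so near-vertical poles produce very short arrows.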

We will draw the quiver plot on top of a satellite image of the area. **plt.imread( )** reads the image into a NumPy array, and **plt.imshow( )** displays it; its `extent` argument maps the image corners to longitude and latitude so that we can plot our data on top of it.
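
The `extent` argument sets where the image edges fall in data coordinates, so each pixel is stretched linearly onto longitude and latitude. The sketch below reproduces that mapping for pixel centres, assuming `origin='upper'` (row 0 at the top, i.e. the northern edge); `pixel_to_lonlat` is a hypothetical helper, not a matplotlib function, and the image size used with it is made up.

```python
import numpy as np

def pixel_to_lonlat(col, row, n_rows, n_cols, extent):
    """Map pixel-centre indices to (lon, lat) the way imshow places them
    when origin='upper': row 0 sits along the top (northern) edge.

    extent is [left, right, bottom, top] = [lon_min, lon_max, lat_min, lat_max].
    """
    left, right, bottom, top = extent
    lon = left + (np.asarray(col) + 0.5) * (right - left) / n_cols
    lat = top - (np.asarray(row) + 0.5) * (top - bottom) / n_rows
    return lon, lat

extent = [-115.7115, -115.6795, 33.5442, 33.5651]
# Top-left pixel of a hypothetical 100 x 200 image: near the north-west corner
print(pixel_to_lonlat(0, 0, 100, 200, extent))
```

Row 0 lands near the maximum latitude, which is why `origin='upper'` is needed for a north-up image.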

In [None]:
img = plt.imread('data/GoogleEarthImage.png') #Reads in our image as a numpy array
extent = [-115.7115, -115.6795, 33.5442, 33.5651] #Sets the corners of the image in lat/lon for plotting
plt.figure(figsize=(9,13)) #Creates a new figure object to put the image on
plt.imshow(img, origin='upper', extent=extent) #Plots the satellite image

#Now let's plot the quivers onto the image.
#plt.quiver takes 4 arguments: x and y (the arrow locations)
# and U and V (the arrow components in the x and y directions).
#Here x is longitude (east) and y is latitude (north). dir2cart returns
# u as the north component and v as the east component, so v goes first.
# We can also set the color so we can see the vectors better

plt.quiver(poles_data['Lon'],poles_data['Lat'],v,u,color='cyan');

This plot tells us an interesting story. A linear feature runs along the center of the satellite image. To the north of this feature, the arrows point to the north-east; to the south-west of it, they point to the south-west. What could cause this pattern?

One probable cause is a fold such as an anticline; see the illustration below. In an anticline, originally horizontal layers are tilted away from the fold axis, so that the poles to the planes (arrows) point away from the fold axis (dotted line).

In [None]:
Image('images/Fold_Diagram.png',width=300)

#### Clustering our data
Instead of "eyeballing" the data as we did at first, what if we wanted to automatically sort the two directions into two groups? We don't want to _train_ a model on labeled examples, because we don't care which group is which; we just want a sensible way of splitting the data into groups. This calls for an _unsupervised_ machine learning method.

The **scikit-learn** package has a module called **sklearn.cluster** for this kind of problem, offering many algorithms suited to different cluster 'shapes'. Let's convert our data into a format **scikit-learn** understands, then apply the **KMeans** clustering algorithm.
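
To see roughly what **KMeans** does under the hood, here is a bare-bones version of the classic Lloyd's algorithm in NumPy: repeatedly assign each point to its nearest centre, then move each centre to the mean of its assigned points. This is a simplified illustration on synthetic data, not scikit-learn's implementation (which adds smarter initialisation and multiple restarts, among other things); `simple_kmeans` is a hypothetical name.

```python
import numpy as np

def simple_kmeans(X, k, n_iter=100, seed=0):
    """Bare-bones Lloyd's-algorithm k-means, for illustration only."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Start from k distinct data points chosen at random
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: label each point with the index of its nearest centre
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centre to the mean of its points (keep it if empty)
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Two synthetic, well-separated blobs of 20 points each
rng = np.random.default_rng(1)
blob_a = rng.normal([0, 0], 0.3, size=(20, 2))
blob_b = rng.normal([5, 5], 0.3, size=(20, 2))
labels, centers = simple_kmeans(np.vstack([blob_a, blob_b]), k=2)
print(labels)
```

Note that the cluster numbers themselves are arbitrary: only which points share a label matters.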

**scikit-learn** requires our data as a 2-D array with one row per data point and one column per _feature_; the features play a role a bit like coordinates.
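
For example, with a few made-up pole measurements (not the Orocopio data), a feature matrix with one row per measurement and one column per feature can be built like this:

```python
import numpy as np

# Hypothetical pole azimuths and plunges for five measurements
pole_az = np.array([40.0, 55.0, 220.0, 230.0, 35.0])
pole_plunge = np.array([60.0, 50.0, 65.0, 55.0, 70.0])

# Shape (n_samples, n_features): one row per measurement, one column per feature
X = np.column_stack([pole_az, pole_plunge])
print(X.shape)  # (5, 2)
```

`np.column_stack` gives the same array as the `np.array([...]).T` used in the cell below.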

In [None]:
from sklearn.cluster import KMeans

In [None]:
input_data=np.array([poles_data['Pole_Az'],poles_data['Pole_Plunge']]).T
print(input_data[0:5])

Note that with **KMeans** you are not required to choose the number of clusters: if you leave it out, scikit-learn falls back on its default of `n_clusters=8`. However, this rarely works well, because the algorithm simply splits the data into that many compact groups of broadly similar size, whether or not the data really contain that many. If we try it on this example, we get a lot of clusters that don't tell us much. Now let's do the clustering:

In [None]:
kmeans = KMeans() #no n_clusters given, so scikit-learn's default (8) is used
fit=kmeans.fit(input_data) #Fits the kmeans algorithm to our input data
clusternumbers=kmeans.predict(input_data) #Gives the cluster number for each data point

In [None]:
#Plots the equal area with colors for clusters
fig = plt.figure()
ax = fig.add_subplot(111, projection='stereonet')
for clust in np.unique(clusternumbers): #loop over every cluster label, including the largest
    ax.pole(poles_data[clusternumbers==clust]['strike'], 
            poles_data[clusternumbers==clust]['dip'], 'o', 
            markeredgecolor='k', markersize=8)
ax.grid()
plt.show()

In [None]:
extent = [-115.7115, -115.6795, 33.5442, 33.5651]
img = plt.imread('data/GoogleEarthImage.png')
plt.figure(figsize=(9,13))
plt.imshow(img, origin='upper', extent=extent)
plt.quiver(poles_data['Lon'],poles_data['Lat'],v,u,clusternumbers,cmap='tab10'); #east component (v) first, north (u) second; 5th argument sets arrow color
plt.axis('Off'); #Turn off the plotting axes with tick marks, etc.

Lesson learned: unsupervised does not mean just letting **scikit-learn** loose with no guidance! Look at the data first and provide some sensible constraints, such as the number of clusters.

<font color=goldenrod>**_Code for you to write_**</font>

Repeat the cluster analysis, but pass the argument `n_clusters=2` to `KMeans()` when setting up the clustering algorithm to tell it there are two clusters. Then plot a stereonet of the poles color-coded by `clusternumbers`.

Hmm, it seems like this didn't work exactly as expected. Notice how the cluster assignment seems to change across the 0° azimuth line? Let's plot pole azimuth against plunge on an x-y plot to see why.

In [None]:
fig = plt.figure()
plt.plot(poles_data[clusternumbers==0]['Pole_Az'], poles_data[clusternumbers==0]['Pole_Plunge'],
         'o',c='darkblue', markeredgecolor='k', markersize=8)
plt.plot(poles_data[clusternumbers==1]['Pole_Az'], poles_data[clusternumbers==1]['Pole_Plunge'], 
         'o',c='darkred', markeredgecolor='k', markersize=8)
plt.xlabel('Pole azimuth (degrees)')
plt.ylabel('Pole plunge (degrees)');

The **KMeans** algorithm treats the data as if they were Cartesian. But in geology we often use directions that run from 0 to 360°, which wrap around and so don't behave like ordinary Cartesian data. For example, an azimuth of 340 is numerically closer to 200 than to 0 under this scheme, even though 340° is only 20° away from 0°. A simple solution is to convert our azimuths and plunges to Cartesian coordinates (as we did for the quiver plot) before clustering. Let's try again:
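
The sketch below compares three ways of measuring how far apart two azimuths are: the plain numerical gap (what a Cartesian algorithm sees for raw azimuths), the true angular difference, and the Euclidean distance between unit vectors. The helper names are hypothetical. For an angle $\theta$ between two directions, the unit-vector distance is $2\sin(\theta/2)$, so it grows with the true angle and handles the wrap-around at 360° automatically.

```python
import numpy as np

def naive_difference(a, b):
    """The plain numerical gap between two raw azimuths."""
    return abs(a - b)

def angular_difference(a, b):
    """True angle (degrees) between two azimuths, allowing for wrap-around at 360."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def unit_vector_distance(a, b):
    """Euclidean distance between the horizontal unit vectors of two azimuths."""
    ra, rb = np.radians([a, b])
    pa = np.array([np.cos(ra), np.sin(ra)])
    pb = np.array([np.cos(rb), np.sin(rb)])
    return np.linalg.norm(pa - pb)

for a, b in [(340, 0), (340, 200)]:
    print(a, b, naive_difference(a, b), angular_difference(a, b),
          round(unit_vector_distance(a, b), 3))
```

On raw azimuths, 340 looks closer to 200 than to 0; on unit vectors the order is correct.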

In [None]:
kmeans = KMeans(n_clusters=2) #This tells us that we are using a clustering algorithm with 2 clusters
input_data2=np.array([u,v,w]).transpose() # make an array with u,v,w as the first, second and third columns
fit=kmeans.fit(input_data2) #Fits the kmeans algorithm to our Cartesian input data
clusternumbers=kmeans.predict(input_data2) #Cluster number for each data point (same features as the fit)

#Plots the equal area with colors for clusters
fig = plt.figure()
ax = fig.add_subplot(111, projection='stereonet')
ax.pole(poles_data[clusternumbers==0]['strike'], poles_data[clusternumbers==0]['dip'], 
        'o',c='darkblue', markeredgecolor='k', markersize=8)
ax.pole(poles_data[clusternumbers==1]['strike'], poles_data[clusternumbers==1]['dip'], 
        'o',c='darkred', markeredgecolor='k', markersize=8)
ax.grid()
plt.show()

Much better! Let's see how it looks on the satellite image!

In [None]:
extent = [-115.7115, -115.6795, 33.5442, 33.5651]
img = plt.imread('data/GoogleEarthImage.png')
plt.figure(figsize=(9,13))
plt.imshow(img, origin='upper', extent=extent)
plt.quiver(poles_data['Lon'],poles_data['Lat'],v,u,clusternumbers,cmap='RdBu'); #east component (v) first, north (u) second; 5th argument controls arrow color
plt.axis('Off'); #Turn off the plotting axes with tick marks, etc.

It seems that there's something a bit more complicated going on here than just a single fold axis going down the middle, but we can see the broad trend and could probably even draw the axis in a lot of places now.


We can also determine the fold axis quantitatively by finding where the bedding planes of the two fold limbs intersect.
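
For just two planes, the intersection can be computed directly: the shared line lies in both planes, so it is perpendicular to both poles and therefore parallel to the cross product of the two pole vectors. A minimal sketch, assuming right-hand-rule strikes and the same north-east-down axes as `dir2cart`; `cart2dir` and `plane_intersection` are hypothetical helpers.

```python
import numpy as np

def dir2cart(az, pl):
    """Unit vector for (azimuth, plunge) in degrees: x north, y east, z down."""
    az, pl = np.radians(az), np.radians(pl)
    return np.array([np.cos(az) * np.cos(pl), np.sin(az) * np.cos(pl), np.sin(pl)])

def cart2dir(x):
    """Inverse of dir2cart, returning the downward-pointing (lower-hemisphere) end."""
    x = np.asarray(x, dtype=float)
    x = x / np.linalg.norm(x)
    if x[2] < 0:  # flip so the line points downward
        x = -x
    az = np.degrees(np.arctan2(x[1], x[0])) % 360.0
    pl = np.degrees(np.arcsin(np.clip(x[2], -1.0, 1.0)))
    return az, pl

def plane_intersection(strike1, dip1, strike2, dip2):
    """Trend and plunge of the line shared by two planes (right-hand rule)."""
    # Pole of each plane: trend = strike - 90, plunge = 90 - dip
    p1 = dir2cart(strike1 - 90.0, 90.0 - dip1)
    p2 = dir2cart(strike2 - 90.0, 90.0 - dip2)
    # A line lying in both planes is perpendicular to both poles
    return cart2dir(np.cross(p1, p2))

# Two limbs striking east-west and dipping 30 degrees in opposite directions
print(plane_intersection(90, 30, 270, 30))
```

Symmetric limbs like these intersect in a horizontal east-west line, as expected for a non-plunging fold.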

<font color=goldenrod>**_Code for you to write_**</font>

Plot a stereonet of the two clusters of fold-limb planes as planes rather than poles. Make an eyeball prediction of where the planes intersect (on average).

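To do this quantitatively rather than by eye, we'll use the `mplstereonet` function `fit_girdle`. It takes strike and dip measurements of planes, fits a great circle (a 'girdle') through their poles, and returns the strike and dip of that best-fit girdle. The pole to the girdle is the best-fitting common intersection of the planes (the fold, or beta, axis), which is why we plot it with `ax.pole` below.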
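
With many planes, no single line lies in all of them, so a fit is needed. One common approach (a sketch, not necessarily how `fit_girdle` is implemented internally) is to form the orientation matrix $T = \sum_i \mathbf{p}_i \mathbf{p}_i^\top$ of the pole unit vectors $\mathbf{p}_i$ and take the eigenvector with the smallest eigenvalue: that is the direction most nearly perpendicular to every pole, i.e. the pole to the best-fit girdle. `beta_axis` is a hypothetical helper, and the limb orientations in the example are made up.

```python
import numpy as np

def dir2cart(az, pl):
    """Rows of unit vectors for arrays of (azimuth, plunge): x north, y east, z down."""
    az, pl = np.radians(az), np.radians(pl)
    return np.column_stack([np.cos(az) * np.cos(pl), np.sin(az) * np.cos(pl), np.sin(pl)])

def beta_axis(strikes, dips):
    """Estimate the fold (beta) axis from many bedding planes (right-hand rule)."""
    poles = dir2cart(np.asarray(strikes) - 90.0, 90.0 - np.asarray(dips))
    T = poles.T @ poles                    # 3x3 orientation (scatter) matrix
    eigvals, eigvecs = np.linalg.eigh(T)   # eigenvalues in ascending order
    axis = eigvecs[:, 0]                   # direction most perpendicular to all poles
    if axis[2] < 0:                        # report the downward-pointing end
        axis = -axis
    trend = np.degrees(np.arctan2(axis[1], axis[0])) % 360.0
    plunge = np.degrees(np.arcsin(np.clip(axis[2], -1.0, 1.0)))
    return trend, plunge

# Hypothetical limbs of a non-plunging fold with an east-west axis
print(beta_axis([90, 90, 90, 270, 270, 270], [20, 30, 40, 25, 35, 45]))
```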

In [None]:
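# fit_girdle returns the strike and dip of the best-fit girdle (great circle through the poles);
# the pole to that girdle is the fold axis, so we plot it with ax.pole below
fold_axis_strike, fold_axis_dip = mplstereonet.fit_girdle(poles_data['strike'], poles_data['dip'])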

#plot
fig = plt.figure()
ax = fig.add_subplot(111, projection='stereonet')
ax.pole(fold_axis_strike, fold_axis_dip, '^',c='k', label='Beta axis (Intersection of Planes)', markersize=10)
ax.set_title('Fold axis')
ax.grid()
plt.show()

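To help visualize what this means, let's find the best-fit plane for each of the two clusters using `fit_pole`, which fits the mean pole of a cluster and returns the strike and dip of the corresponding plane.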
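
A simple way to get a cluster's mean pole is to add up the pole unit vectors and normalise the result, then convert that mean pole back to strike and dip. This sketch assumes a tight cluster in the lower hemisphere (widely scattered axial data would call for an eigenvector method instead), and it is not necessarily how `fit_pole` works internally; `mean_pole` and `pole_to_plane` are hypothetical helpers.

```python
import numpy as np

def mean_pole(trends, plunges):
    """Mean direction of a tight cluster of lower-hemisphere poles (resultant vector)."""
    t, p = np.radians(trends), np.radians(plunges)
    xyz = np.column_stack([np.cos(t) * np.cos(p), np.sin(t) * np.cos(p), np.sin(p)])
    r = xyz.sum(axis=0)          # resultant of the unit vectors
    r /= np.linalg.norm(r)       # back to a unit vector
    trend = np.degrees(np.arctan2(r[1], r[0])) % 360.0
    plunge = np.degrees(np.arcsin(np.clip(r[2], -1.0, 1.0)))
    return trend, plunge

def pole_to_plane(trend, plunge):
    """Right-hand-rule strike and dip of the plane whose pole is (trend, plunge)."""
    return (trend + 90.0) % 360.0, 90.0 - plunge

# Two hypothetical poles on either side of azimuth 045
trend, plunge = mean_pole([40, 50], [60, 60])
print(trend, plunge, pole_to_plane(trend, plunge))
```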

In [None]:
limb1_strike, limb1_dip=mplstereonet.fit_pole(poles_data[clusternumbers==0]['strike'], poles_data[clusternumbers==0]['dip'],measurement='poles')
limb2_strike, limb2_dip=mplstereonet.fit_pole(poles_data[clusternumbers==1]['strike'], poles_data[clusternumbers==1]['dip'],measurement='poles')

#plot
fig = plt.figure()
ax = fig.add_subplot(111, projection='stereonet')
ax.plane(limb1_strike, limb1_dip,c='darkblue',label='Fold limb 1')
ax.plane(limb2_strike, limb2_dip,c='darkred',label='Fold limb 2')
ax.pole(fold_axis_strike, fold_axis_dip, '^',c='k', markersize=15,label='Fold axis')
ax.grid()
plt.legend()
plt.show()

So our initial interpretation, that there are two 'clusters' of bedding planes in this dataset, one to the north-east and one to the south-west, was correct. But we can be more quantitative than that, using cluster analysis to determine which poles group together and fitting to find the best-fitting fold-limb planes and the fold axis.

### Turn in the Notebook

**Export as HTML and upload to bCourses.**