# Module 2, Week 2 In Class Exercise

Declustering seismic data.


**Last week we:**
- Worked with pandas DataFrames, indexing, and data cleaning.
- Loaded marine geophysical data (bathymetry and marine magnetic anomalies) from two oceanic ridges.
- Selected data and dropped rows with NaNs.
- Plotted bathymetry data and evaluated spreading rate.
- Declared a function to detrend and filter magnetic anomaly data.
- Plotted marine magnetic anomaly data and compared spreading rates.

**Our goals for today:**
- Load a Bay Area seismic catalog.
- Compute the distance and time interval between earthquakes, and use these to identify aftershocks.
- Remove the aftershocks from the catalog (decluster).


## Setup

Run this cell as it is to set up your environment.

In [None]:
import math
import datetime
import numpy as np
from scipy import stats
import matplotlib
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import pandas as pd

## Gutenberg-Richter Earthquake Occurrence Statistics


The frequency of earthquake recurrence as a function of magnitude has been a focus of seismological research since Gutenberg and Richter's pioneering work (Gutenberg and Richter, 1949). The evidence shows that the logarithm of the number of earthquakes in a given time period scales linearly with magnitude. To first order there are 10 times more magnitude 5 earthquakes than magnitude 6 events, and 10 times more magnitude 4 earthquakes than magnitude 5s.

Gutenberg and Richter found that when the logarithm of the number of earthquakes is plotted against magnitude, the distribution may be described by the line $\log_{10}(N)=a+bM$, where $N$ is the number of earthquakes, $M$ is the magnitude, and $a$ and $b$ are the intercept and slope of the line. For the example described above the b-value is equal to -1 (there are 10 times fewer earthquakes for an increase of one magnitude unit). An important point to keep in mind is that these parameters are based on a primary earthquake catalog from which aftershocks have been removed. The process of aftershock removal is called declustering.
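As a quick illustration of the relation above, here is a minimal sketch that evaluates $N$ from the line for a few magnitudes. The function name `gr_count` and the intercept `a_demo` are hypothetical values chosen for the example, not estimates from the Bay Area catalog.

```python
import numpy as np

def gr_count(mag, a, b):
    """Number of earthquakes predicted by log10(N) = a + b*M."""
    return np.power(10.0, a + b * np.asarray(mag, dtype=float))

a_demo = 7.0    # hypothetical intercept, for illustration only
b_demo = -1.0   # slope from the example in the text

mags_demo = np.array([4.0, 5.0, 6.0])
counts_demo = gr_count(mags_demo, a_demo, b_demo)

# Ratio of counts between neighboring magnitude units
ratios_demo = counts_demo[:-1] / counts_demo[1:]
print(counts_demo)
print(ratios_demo)
```

With a slope of -1, each step up of one magnitude unit divides the count by 10, whatever value the intercept takes.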

Why is this important? The a- and b-values are often used to characterize the rates of earthquakes and to identify regional variability. The b-value (slope parameter) is often used to distinguish between 'normal' and 'swarm-like' earthquake behavior. In geothermal areas it has been observed that the earthquake distribution is richer in small earthquakes, indicating a b-value significantly less than -1.

The Gutenberg-Richter relation is also used to characterize seismic hazard in a region by defining the annual rate of earthquake occurrence. In this module you will analyze an earthquake catalog downloaded from the Northern California Earthquake Data Center for a 100 km radius around the Berkeley campus. You will learn how to decluster the seismicity catalog. In subsequent modules you will estimate the Gutenberg-Richter a- and b-values, estimate the annual recurrence rates of large earthquakes in the region, and use the Gutenberg-Richter coefficients and their uncertainty to estimate the strong ground shaking hazard for campus.

## Load the Earthquake Catalog

Load the .csv data file of all the earthquakes 1900 - 2018 in the ANSS (Advanced National Seismic System) catalog from 100 km around Berkeley.

In [None]:
# read data
# This catalog is a M0+ search centered at Berkeley radius=100km. 
# A big enough radius to include Loma Prieta but exclude Geysers.
data=pd.read_csv('anss_catalog_1900to2018all.txt', sep=' ', header=None,
                 names = ['Year','Month','Day','Hour','Min','Sec','Lat','Lon','Mag'])

#  create data arrays
year=data.Year.values
month=data.Month.values
day=data.Day.values
hour=data.Hour.values
mn=data.Min.values
sec=data.Sec.values
lat=data.Lat.values
lon=data.Lon.values
mag=data.Mag.values
nevt=len(year)        #number of events 


In [None]:
data.head()

In [None]:
#Make a Map of the earthquake catalog

#Set Corners of Map
lat0=36.75
lat1=39.0
lon0=-123.75
lon1=-121.0
tickstep=0.5 #for axes
latticks=np.arange(lat0,lat1+tickstep,tickstep)
lonticks=np.arange(lon0,lon1+tickstep,tickstep)


plt.figure(1,(10,10))
ax = plt.axes(projection=ccrs.PlateCarree())
ax.set_extent([lon0, lon1, lat0, lat1], crs=ccrs.PlateCarree())
ax.coastlines(resolution='10m',linewidth=1)
ax.set_aspect('auto')
ax.set_xticks(lonticks)
ax.set_yticks(latticks, crs=ccrs.PlateCarree())
ax.set(xlabel='Longitude', ylabel='Latitude',
       title='Raw Catalog')


x=lon
y=lat
z=mag

#Sort by magnitude so the largest events are plotted last (on top)
indx=...   #determine sort index
x=x[indx]            #apply sort index
y=y[indx]
z=np.exp(z[indx])    #exponent to scale marker size

c = plt.cm.plasma(z/max(z)) # color scales with exp(magnitude), normalized by the largest event
plt.scatter(..., ..., s=(z), facecolors=c, alpha=0.4, edgecolors=c, marker='o', linewidth=2) # plot circles on EQs
plt.plot(-122.2727,37.8716,'rs',markersize=8)  # plot red square on Berkeley


plt.show()

## Declustering - window method

For each earthquake in the catalog with magnitude M, the subsequent earthquakes are determined to be aftershocks if they occur within a distance L(M) in km and time interval T(M) in days. An example of aftershock windows from Gardner and Knopoff (1974) is shown below.

<img src="Figures/aftershock_windows.png" width=600>
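The windowing rule described above can be sketched on a toy catalog. This is a simplified illustration, not the algorithm built later in this exercise: positions are one-dimensional distances in km rather than latitude and longitude, and the names `flag_aftershocks`, `dist_window`, and `time_window` as well as the linear window functions are hypothetical stand-ins for L(M) and T(M).

```python
import numpy as np

def flag_aftershocks(days, x_km, mag, dist_window, time_window):
    """Return sorted indices of events flagged as aftershocks.

    days must be sorted in ascending order. An event j > i is flagged
    when it falls inside both windows of event i and is smaller than it.
    """
    flagged = set()
    for i in range(len(days)):
        L = dist_window(mag[i])              # distance bound in km
        T = time_window(mag[i])              # time bound in days
        dt = days[i + 1:] - days[i]          # time to subsequent events
        dx = np.abs(x_km[i + 1:] - x_km[i])  # distance to subsequent events
        inside = (dt <= T) & (dx <= L) & (mag[i + 1:] < mag[i])
        hits = np.nonzero(inside)[0] + i + 1  # shift back to catalog indices
        flagged.update(hits.tolist())
    return np.array(sorted(flagged), dtype=int)

# Toy catalog: time (days), position along a line (km), magnitude
toy_days = np.array([0.0, 1.0, 2.0, 50.0, 51.0])
toy_x = np.array([0.0, 5.0, 200.0, 0.0, 3.0])
toy_mag = np.array([5.0, 3.0, 3.0, 4.0, 4.5])

toy_af = flag_aftershocks(
    toy_days, toy_x, toy_mag,
    dist_window=lambda m: 10.0 * m,  # hypothetical L(M) in km
    time_window=lambda m: 2.0 * m,   # hypothetical T(M) in days
)
print(toy_af)
```

Only the second event is flagged here: it is close in time and space to the magnitude 5 event and smaller than it. The third event is too far away, and the last event is larger than the one before it, so it is not treated as an aftershock.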

In [None]:
mag_range = np.arange(2.5,8.5,0.5)
dist_win = [19.5, 22.5, 26, 30, 35, 40, 47, 54, 61, 70, 81, 94]
time_win = [6, 11.5, 22, 42, 83, 155, 290, 510, 790, 915, 960, 985]

# Create a DataFrame of these window bounds




Let's plot the window bounds.

In [None]:
fig = plt.figure(1,(20,8))
grid = plt.GridSpec(1, 2, wspace=0.2, hspace=0.3)

ax0=fig.add_subplot(grid[0,0])
ax1=fig.add_subplot(grid[0,1])

ax0.plot(...,...,'ro');
ax0.set_xlabel('Magnitude ', fontsize=16);
ax0.set_ylabel('Distance, km', fontsize=16);
ax0.grid()


ax1.plot(...,...,'ro');
ax1.set_xlabel('Magnitude ', fontsize=16);
ax1.set_ylabel('Time, days', fontsize=16);
ax1.grid()

An approximation of the aftershock window bounds of Gardner and Knopoff (1974) is shown in the equation below. Using this approximation makes programming the windowing algorithm easier. 

<img src="Figures/window_approx.png" >

Use `np.power(base, exponent)` to compute the distance and time interval bounds as functions of `mag_range`.
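For orientation, the sketch below evaluates one commonly quoted power-law fit to the Gardner and Knopoff (1974) windows and compares it with the tabulated values listed earlier. The coefficients are an assumption here, since the equation in this notebook is only given as a figure; check them against that figure before relying on them. The names `mag_demo`, `dist_fit`, and `time_fit` are hypothetical.

```python
import numpy as np

mag_demo = np.arange(2.5, 8.5, 0.5)
dist_table = np.array([19.5, 22.5, 26, 30, 35, 40, 47, 54, 61, 70, 81, 94])
time_table = np.array([6, 11.5, 22, 42, 83, 155, 290, 510, 790, 915, 960, 985])

# Assumed coefficients (commonly quoted fit, not read from the figure)
dist_fit = np.power(10.0, 0.1238 * mag_demo + 0.983)      # km
time_fit = np.where(
    mag_demo >= 6.5,
    np.power(10.0, 0.032 * mag_demo + 2.7389),             # days, M >= 6.5
    np.power(10.0, 0.5409 * mag_demo - 0.547),             # days, M < 6.5
)

# Relative misfit between the fit and the tabulated windows
rel_dist = np.abs(dist_fit - dist_table) / dist_table
rel_time = np.abs(time_fit - time_table) / time_table
print(f'max relative misfit: distance {rel_dist.max():.2f}, time {rel_time.max():.2f}')
```

`np.where` selects between the two time branches element by element, which avoids slicing `mag_demo` by hand at the M = 6.5 boundary.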

In [None]:
d =np.power(...)     # aftershock distance as a function of magnitude
t1=np.power(...) # aftershock time interval as a function of magnitude for M < 6.5
t2=np.power(...) # aftershock time interval as a function of magnitude for M >= 6.5


Add these approximated window bounds to the figure for comparison. 

In [None]:
fig = plt.figure(1,(20,8))
grid = plt.GridSpec(1, 2, wspace=0.2, hspace=0.3)

ax0=fig.add_subplot(grid[0,0])
ax1=fig.add_subplot(grid[0,1])

ax0.plot(mag_range,d,'bs',label='approximation');
ax0.plot(mag_range,dist_win,'ro',label='original window');
ax0.set_xlabel('Magnitude ', fontsize=16);
ax0.set_ylabel('Distance, km', fontsize=16);
ax0.set_title('Aftershock Identification Windows', fontsize=16);
ax0.legend(fontsize=16);
ax0.grid()

ax1.plot(mag_range[:8],t1,'bs');
ax1.plot(mag_range[8:],t2,'bs');
ax1.plot(mag_range,time_win,'ro');
ax1.set_xlabel('Magnitude ', fontsize=16);
ax1.set_ylabel('Time, days', fontsize=16);
ax1.grid()

To build our algorithm to identify aftershocks using these windows, we need to convert the year-month-day format of dates to a timeline in number of days. We'll do this with `datetime.date()`, which for a given year, month, and day returns a date object. Subtracting two date objects gives the time interval between them in days.
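A minimal sketch of this date arithmetic, using the date of the first event in the catalog (7/1/1911) and a made-up later date:

```python
import datetime

first_event = datetime.date(1911, 7, 1)    # first event in the catalog
later_event = datetime.date(1911, 7, 31)   # made-up date for illustration

interval = later_event - first_event       # a datetime.timedelta object
print(interval.days)                       # 30
```

The subtraction returns a `datetime.timedelta`, and its `.days` attribute is the whole number of days between the two dates.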

In [None]:
#Determine the number of days from the first event
days=np.zeros(nevt) # initialize the size of the array days
d0 = datetime.date(year[0], month[0], day[0]) # date of the first event, computed once
for i in range(0,nevt,1):
    d1 = datetime.date(year[i], month[i], day[i])
    delta = d1 - d0
    days[i]=delta.days # fill days in with the number of days since the first event (7/1/1911)


In [None]:
# plot magnitude vs. time
fig, ax = plt.subplots(figsize=(10,10))
ax.plot(days, mag,'o',alpha=...,markersize=5)
ax.set(xlabel='Days', ylabel='Magnitude',
       title='Raw Event Catalog')
ax.grid()

fig.savefig("figure1.png")
plt.show()

print(f'Number={nevt:d} MinMag={min(mag):.2f} MaxMag={max(mag):.2f}')

We also need a function to compute the great circle distance in km between earthquakes. We'll use the haversine formula for the great circle distance, which is well conditioned for small distances.

<img src="Figures/great_circle_eqn.png" >


<img src="Figures/Illustration_of_great-circle_distance.svg" >
Great-circle distance shown in red between two points on a sphere, P and Q. 
Source: https://en.wikipedia.org/wiki/Great-circle_distance
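As a sanity check on the idea, here is a scalar sketch of one common form of the haversine formula using the standard `math` module. The name `haversine_scalar` is hypothetical, and the formula in the figure above may be written in a slightly different but equivalent form.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_scalar(lon1, lat1, lon2, lat2):
    """Great circle distance in km between two points given in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    # haversine of the angular separation
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    # angular separation in radians
    c = 2 * math.asin(math.sqrt(a))
    return EARTH_RADIUS_KM * c

# One degree of latitude along a meridian should be about 111.2 km
one_degree_km = haversine_scalar(-122.0, 37.0, -122.0, 38.0)
print(f'{one_degree_km:.1f} km')
```

Checking a case with a known answer, such as one degree of latitude, is a quick way to catch a missing degree-to-radian conversion or a misplaced factor of 2.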

In [None]:
#This function computes the spherical earth distance between two geographic points and is used in the
#declustering algorithm below
def haversine_np(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees)

    Inputs may be scalars or arrays; array inputs must be of equal length.
    
    The first pair can be a single point and the second pair arrays.

    """
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2]) # convert degrees lat, lon to radians

    dlon = lon2 - lon1
    dlat = lat2 - lat1

    a = ...  # great circle inside sqrt

    c = ...   # great circle angular separation
    km = 6371.0 * c   # great circle distance in km, earth radius = 6371.0 km
    return km

__Declustering Algorithm Time!__

We'll build our `for` loop for identifying aftershocks in the seismic catalog.

In [None]:
#Decluster the Catalog  Note: This cell may take a few minutes to complete
cnt=0 # initialize a counting variable
save=np.zeros((1,10000000),dtype=int) # preallocate an array to store the indices of flagged aftershocks
for i in range(0,nevt,1):   # step through EQ catalog
    # logical if statements to incorporate definitions of Dtest and Ttest aftershock window bounds
    Dtest=...   # distance bounds
    if mag[i] >= 6.5:
        Ttest=...  # aftershock time bounds for M >= 6.5
    else:
        Ttest=...  # aftershock time bounds for M < 6.5
    
    ...    # time interval in days to subsequent earthquakes in catalog
    ...   # magnitudes of subsequent earthquakes in catalog
    ... # distance in km to subsequent EQs in catalog
    
    ...   # counts the number of potential aftershocks, 
                                        # the number of intervals <= Ttest bound
    ...  # if there are potential aftershocks
        ... # indices of potential aftershocks <= Ttest bound
        ...   # loops over the aftershocks         
            ... # test if the event is inside the distance window 
                                                # and that the event is smaller than the current main EQ
                ...  # index value of the aftershock
                ... # increment the counting variable

                
...   # This is an array of indexes that will be used to delete events flagged 
                                      # as aftershocks    


Use `np.delete(array,indices_to_delete)` to delete the aftershock events.
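A small sketch of how `np.delete` behaves, using made-up magnitudes and a made-up list of flagged indices (`mags_toy` and `flagged_toy` are hypothetical names):

```python
import numpy as np

mags_toy = np.array([5.0, 3.0, 3.2, 4.0, 2.8])
flagged_toy = np.array([1, 4])            # indices of events to remove

kept_toy = np.delete(mags_toy, flagged_toy)
print(kept_toy)                           # entries at indices 1 and 4 are gone
print(mags_toy.size, kept_toy.size)       # the original array is unchanged
```

`np.delete` returns a new array and leaves the input untouched, so the same index array can be reused to remove the same events from several parallel arrays.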

In [None]:
# delete the aftershock events
declustered_days=np.delete(days,af_ind)  #The aftershocks are deleted from the days array 
declustered_mag=np.delete(mag,af_ind)    #The aftershocks are deleted from the mag array 
declustered_lon=np.delete(lon,af_ind)    #The aftershocks are deleted from the lon array 
declustered_lat=np.delete(lat,af_ind)    #The aftershocks are deleted from the lat array 
n=len(declustered_days)
  

In [None]:
#Plot DeClustered Catalog
fig, ax = plt.subplots(figsize=(10,10))
ax.plot(..., ...,'o',alpha=0.2,markersize=5)
ax.set(xlabel='days', ylabel='magnitude',
       title='Declustered Event Catalog')
ax.grid()

plt.show()

print(f'Number={n:d} MinMag={min(declustered_mag):.2f} MaxMag={max(declustered_mag):.2f}')

Reminder of the raw catalog:

<img src="figure1.png" >

In [None]:
# Plot comparison between raw and declustered catalogs
fig, ax = plt.subplots(figsize=(10,10))
ax.plot(..., ...,'o',alpha=...,label='Raw events',markersize=5)
ax.plot(..., ...,'o',alpha=...,label='Main shocks',markersize=6)
ax.set(xlabel='days', ylabel='magnitude',
       title='Declustered Event Catalog')
ax.grid()
ax.legend()

plt.show()

At approximately what dates (in days since the first EQ) do you observe aftershock sequences?

_Write your answer here._

When are the aftershock sequences (and events in general) missing from this catalog?

_Write your answer here._

In [None]:
#Make a Map of Main shock events

#Set Corners of Map
lat0=36.75
lat1=39.0
lon0=-123.75
lon1=-121.0
tickstep=0.5 #for axes
latticks=np.arange(lat0,lat1+tickstep,tickstep)
lonticks=np.arange(lon0,lon1+tickstep,tickstep)

plt.figure(1,(10,10))
ax = plt.axes(projection=ccrs.PlateCarree())
ax.set_extent([lon0, lon1, lat0, lat1], crs=ccrs.PlateCarree())
ax.set_aspect('auto')
ax.coastlines(resolution='10m',linewidth=1) #downloaded 10m, 50m
ax.set_xticks(lonticks)
ax.set_yticks(latticks, crs=ccrs.PlateCarree())
ax.set(xlabel='Longitude', ylabel='Latitude',
       title='Declustered Catalog')


x=declustered_lon
y=declustered_lat
z=declustered_mag

#Sort by magnitude (ascending) so the largest events are plotted last (on top)
indx=np.argsort(z)   #determine sort index
x=x[indx]            #apply sort index
y=y[indx]
z=np.exp(z[indx])    #exponent to scale size

c = plt.cm.plasma(z/max(z))
plt.scatter(..., ..., s=(z/2), facecolors=c, alpha=0.4, edgecolors=c, marker='o', linewidth=2)
plt.plot(-122.2727,37.8716,'rs',markersize=8)


plt.show()

<img src="Figures/fault_map.png" width=700>
Map of Bay Area faults. 
Source: https://pubs.er.usgs.gov/publication/fs20163020

Which faults have been active since 1911?

_Write your answer here._