# Module 3, Week 2 In Class Exercise

Least-squares linear regression

**Before class reading: Mathematics in Geology Sections 6.1.1 - 6.2.7 & 7.3.3, Linear Algebra and Its Applications Section 6.5**

**Last week we:**
- Learned how to deal with bivariate data (fitting lines, curves).
- Applied line fitting to determine the spreading rate of various ocean ridges.

**Our goals for today:**
- Least-squares problems in matrix form
- Computational efficiency of solving linearized equations with loops, vectors, and built-in functions
- Least-squares fitting of the Gutenberg-Richter law to the Bay Area catalog


## Setup

Run this cell as it is to set up your environment.

In [2]:
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
from scipy import stats
import math
import datetime
import time

## A Little Background on Fitting Geophysical Models to Data

**What is a model?**

**What is inversion/regression in Earth Science?**
- statistical sampling (Bayesian inference, Monte Carlo, genetic algorithms, particle swarm (flocking), etc.)
- linearized inversion (start with a solution and iteratively update it, solving for derivatives of the model parameters)
- linear inversion

**Examples of inversion/regression**

<center> <h4>Linear Regression</h4> </center>

<img src="./example_linear_regression.png"><br/><br/><br/>

<center> <h4>Fitting a Non-Linear Function and Data</h4> </center>

<img src="./example_gmpe_fit.png"><br/><br/><br/>

<center> <h4>A Case Where the Model is Comprised of Synthetic Seismograms</h4> </center>

<img src="./example_moment_tensor_long_valley.png"><br/><br/><br/>

<center> <h4>Fitting a Variety of Geophysical Data for the Earthquake Source</h4> </center>

<img src="./example_finite_source.png"><br/><br/><br/>

In all of these examples the same basic method is used to fit the models to the data, although the type of data and the complexity of the models vary.

**Least Squares as a method to perform inversion/regression i.e. fit a model to data**

#### a+b*z=y(z)

a + b*1.2= 3.2

a + b*3.0= 5.0

a - b*2.0=-1.0

etc.

#### Am=D

How do we solve this system of linear equations? 

$\mathbf {m=(A^T A)^{-1} A^T D}$

What do the "T"s and "-1"s mean?

## First some basic operations in Matrix Algebra

A matrix is a rectangular array of elements.

$A=\begin{bmatrix}
a_{1} &  a_{2}\\ 
a_{3} &  a_{4}
\end{bmatrix}$



In [2]:
M=np.random.randint(1,9,4).reshape(2,2) #arguments low, high, number - returns an array - reshape makes matrix
M


array([[2, 7],
       [7, 3]])

In [3]:
A=np.random.randint(1,9,4).reshape(2,2)
A

array([[5, 2],
       [6, 7]])

Basic operators:

In [4]:
# addition
M+A

array([[ 7,  9],
       [13, 10]])

In [5]:
# subtraction
M-A

array([[-3,  5],
       [ 1, -4]])

In [6]:
# elementwise multiplication
M*A


array([[10, 14],
       [42, 21]])

The _transpose_ of a matrix is found by swapping the rows and columns. The diagonal elements remain the same (for a square matrix).

$A^{T}=\begin{bmatrix}
a_{1} &  a_{3}\\ 
a_{2} &  a_{4}
\end{bmatrix}$
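
As a quick check on the definition, the sketch below applies NumPy's built-in transpose to a made-up 2x3 matrix (the values are illustrative and not part of the exercise). A non-square matrix is used so that the change in shape is easy to see; you can compare the output of your own loop-based function against `A.T` in the same way.

```python
import numpy as np

# A small non-square example so the change of shape is visible
A = np.array([[1, 2, 3],
              [4, 5, 6]])

A_T = A.T  # same result as np.transpose(A)

print(A.shape, A_T.shape)  # rows and columns swap
print(A_T)
```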

In [7]:
def matrix_transpose(M):
    m,n = np.shape(M) # m is number of rows, n is number of columns
    M_transpose = np.zeros((n,m))
    for i in range(0,m):
        for j in range(0,n):
            M_transpose[j][i] = ...
            
    return M_transpose

In [8]:
matrix_transpose(A)

TypeError: float() argument must be a string or a number, not 'ellipsis'

Multiplication:

In [9]:
# multiplication by a scalar
A*6

array([[30, 12],
       [36, 42]])

Matrix multiplication is a bit more complicated: for $AB=C$, $C_{i,j}=\sum_{k=1}^{n} A_{i,k}B_{k,j}$. Note that the product has the same number of rows as $A$ and the same number of columns as $B$, and that $A$ must have as many columns as $B$ has rows.

$A=\begin{bmatrix}
a_{1} &  a_{2}\\ 
a_{3} &  a_{4}
\end{bmatrix}$

$B=\begin{bmatrix}
b_{1} &  b_{2}\\ 
b_{3} &  b_{4}
\end{bmatrix}$

$C=\begin{bmatrix}
(a_{1} \times b_{1}+a_{2} \times b_{3}) &  (a_{1} \times b_{2}+a_{2} \times b_{4})\\ 
(a_{3} \times b_{1}+a_{4} \times b_{3}) &  (a_{3} \times b_{2}+a_{4} \times b_{4})
\end{bmatrix}$
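
The shape rule can be checked numerically. The sketch below uses two made-up matrices (a 2x3 and a 3x2, chosen only for illustration) and NumPy's `@` operator, then recomputes a single element of the product from the summation formula.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])      # 2 rows, 3 columns
B = np.array([[1, 0],
              [0, 1],
              [2, 2]])         # 3 rows, 2 columns

C = A @ B                      # inner dimensions (3 and 3) match
print(C.shape)                 # rows of A, columns of B
print(C)

# Element C[0, 1] is the sum over k of A[0, k] * B[k, 1]
c01 = sum(A[0, k] * B[k, 1] for k in range(A.shape[1]))
print(c01 == C[0, 1])
```

Swapping the order to `B @ A` gives a 3x3 result instead, a reminder that matrix multiplication is not commutative.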



The following table relates the matrix notation above to coding notation. Let's finish writing the code for the matrix multiply below. Note that there is a triple-nested for loop. The index i increments the slowest, j increments faster, and k increments the fastest (for each j).

<img src="./matrix_multiply_recursion_table.png"><br/><br/><br/>

In [None]:
def matrix_mult(A,B):
    m1,n1 = np.shape(A) # m is number of rows, n is number of columns
    m2,n2 = np.shape(B) # m is number of rows, n is number of columns
    C = np.zeros((m1,n2))
    if n1 == m2:
        for i in range(0,m1):
            for j in range(0,n2):
                for k in range(0,n1):
                    C[i][j] = ...
            
        return C

    else:
        print('Number of columns in matrix 1 does not match number of rows in matrix 2')
        return

In [None]:
matrix_mult(A,M)

A matrix raised to a power ($M^{2}$) follows the rules of matrix multiplication: $(M^{2})_{i,j}=\sum_{k=1}^{n} M_{i,k}M_{k,j}$. This only works for square matrices.
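
A common slip in NumPy is to write `M**2`, which squares each element rather than multiplying the matrix by itself. The sketch below contrasts the two on an illustrative 2x2 matrix, using `np.linalg.matrix_power` for the matrix product.

```python
import numpy as np

M = np.array([[2, 7],
              [7, 3]])

M_squared = np.linalg.matrix_power(M, 2)   # matrix product M @ M
M_elementwise = M**2                        # squares each element separately

print(M_squared)
print(M_elementwise)
```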

The _determinant_ of a 2x2 matrix is equal to $|A| = \begin{vmatrix}
a_{1} &  a_{2}\\ 
a_{3} &  a_{4}
\end{vmatrix} = a_{1}a_{4} - a_{2}a_{3}$.

In [None]:
def matrix_det(M):
    m,n = np.shape(M) # m is number of rows, n is number of columns
    if m == 2 and n == 2:  # use 'and': bitwise & binds tighter than ==, so 'm == 2 & n == 2' also accepts a 2x3 matrix
        det = ...
        return det
    else:
        print('Not a 2x2 matrix')
        return

In [None]:
matrix_det(M)

Division in matrix algebra is $A/M = M^{-1}A$, where $M^{-1}$ is the _inverse_ of $M$. $MM^{-1} = I$, where $I=\begin{bmatrix}
1 &  0\\ 
0 &  1
\end{bmatrix}$ is the _identity_ matrix, $AI = A$. For a 2x2 matrix the inverse is:

<img src="inverse_2x2.png" width=200>
>Source: Mathematics in Geology, J. Ferguson.

Not all matrices are invertible; for example, a matrix with $|A|=0$ has no inverse.
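
The sketch below illustrates both cases with NumPy's built-in routines on two made-up matrices: `M` has a non-zero determinant and an inverse satisfying $MM^{-1}=I$, while `S` has a second row equal to twice the first, so its determinant is zero and `np.linalg.inv` raises `LinAlgError`.

```python
import numpy as np

M = np.array([[2.0, 7.0],
              [7.0, 3.0]])
S = np.array([[1.0, 2.0],
              [2.0, 4.0]])   # second row is twice the first, so |S| = 0

print(np.linalg.det(M))             # 2*3 - 7*7
print(M @ np.linalg.inv(M))         # approximately the identity matrix

try:
    np.linalg.inv(S)
    singular = False
except np.linalg.LinAlgError:
    singular = True                 # NumPy refuses to invert a singular matrix
print(singular)
```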

In [None]:
def matrix_inv(M):
    m,n = np.shape(M) # m is number of rows, n is number of columns
    inv = np.zeros((m,n))
    if m == 2 and n == 2:  # use 'and': bitwise & binds tighter than ==
        inv[0][0] = .../matrix_det(M)
        inv[0][1] = .../matrix_det(M)
        inv[1][0] = .../matrix_det(M)
        inv[1][1] = .../matrix_det(M)
        return inv
    else:
        print('Not a 2x2 matrix')
        return

In [None]:
matrix_inv(M)

A series of linear equations $a + b*{x_1} = d_{1}$, $a + b*{x_2} = d_{2}$, ... $a + b*{x_n} = d_{n}$ where $n$ is the number of data can be described with the series of matrices: $\begin{bmatrix}
1 &  x_{1}\\ 
1 &  x_{2}\\ 
. & . \\
. & . \\
. & . \\
1 &  x_{n}
\end{bmatrix} \begin{bmatrix}
a \\
b
\end{bmatrix} = \begin{bmatrix}
d_{1} \\ d_{2} \\ . \\ . \\ . \\ d_{n} \\
\end{bmatrix}$ = $Am = D$

The least-squares approximation can be formulated as:

$m = (A{^T}A)^{-1}A{^T}D$.

The least-squares approximation minimizes the sum of the squared differences between the observations $D$ and the model predictions $Am$.
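
As a small worked case, the sketch below takes the three example equations $a + bz = y$ listed earlier in this notebook, builds $A$ and $D$, and evaluates the normal-equation formula with NumPy's built-in inverse. The cross-check against `np.linalg.lstsq` is an addition for illustration and is not part of the exercise.

```python
import numpy as np

# The three example equations a + b*z = y from earlier in this notebook
z = np.array([1.2, 3.0, -2.0])
D = np.array([3.2, 5.0, -1.0])

A = np.column_stack((np.ones(len(z)), z))   # columns: [1, z]

# Normal equations: m = (A^T A)^{-1} A^T D
m_normal = np.linalg.inv(A.T @ A) @ A.T @ D

# Cross-check against NumPy's least-squares solver
m_lstsq, *_ = np.linalg.lstsq(A, D, rcond=None)

residual = D - A @ m_normal
print('a = %.3f, b = %.3f' % (m_normal[0], m_normal[1]))
print('sum of squared residuals: %.3f' % np.sum(residual**2))
```

With three equations and only two unknowns the system is overdetermined, so the residuals are generally not zero; the solution is the pair $(a, b)$ that makes their squared sum as small as possible.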


### Let's formulate the least squares inversion for the line fitting example from last week

In [None]:
# make up some randomly scattered linearly related data
x = np.random.randn(100)*5
m = 2
b = np.random.rand(100)*10
y = m*x+b


m, b = np.polyfit(x,y,1)
modelY=np.polyval([m, b],x)

print ('slope: %7.3f, intercept: %4.1f'%\
    (m, b))

# now plot the data and the best-fit line 
plt.figure(1,(5,5)) 
plt.plot(x,y,'o')
plt.plot(x,modelY,'k-') 
plt.xlabel('X', fontsize=16);
plt.ylabel('Y', fontsize=16);
plt.grid()

In [None]:
#Now write out the least squares matrix equation and invert for a and b of the line

D=y.reshape(len(y),1)
tmp=np.ones(len(x))

A=np.column_stack((tmp,x))
#A[1:5,:]

invATA=matrix_inv(matrix_mult(matrix_transpose(A),A))
ATD=matrix_mult(matrix_transpose(A),D)
m=matrix_mult(invATA,ATD)

print ('slope: %7.3f, intercept: %4.1f'%\
    (m[1], m[0]))




### Gutenberg Richter Earthquake Occurrence Statistics


Gutenberg and Richter found that when the logarithm of the number of earthquakes is plotted against magnitude, the distribution is linear, and a suitable model is log(N)=A+Bm, where N is the number of earthquakes, m is the magnitude, and A and B are the intercept and slope of a line. For the example described above the B-value is equal to -1 (there are 10 times fewer earthquakes for an increase of one magnitude unit). An important point to keep in mind is that these parameters are based on a primary earthquake catalog in which aftershocks have been removed. The process of aftershock removal is called declustering.
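
To make the factor of ten concrete, the sketch below evaluates log(N)=A+Bm for a few magnitudes. The values `A_value = 4.0` and `B_value = -1.0` are hypothetical and chosen only for illustration; they are not estimates for the Bay Area.

```python
# Hypothetical values chosen only for illustration
A_value = 4.0
B_value = -1.0

def gr_count(mag):
    """Number of earthquakes predicted by log10(N) = A + B*m."""
    return 10 ** (A_value + B_value * mag)

for mag in (3.0, 4.0, 5.0):
    print(mag, gr_count(mag))

# Ratio of counts one magnitude unit apart
ratio = gr_count(4.0) / gr_count(3.0)
print(ratio)
```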

Why is this important? The A- and B-values are often used to characterize the rates of earthquakes to identify regional variability. The B-value (slope parameter) is often used to distinguish between 'normal' and 'swarm-like' earthquake behavior. In geothermal areas it has been observed that the earthquake distribution is richer in small earthquakes, indicating a B-value significantly less than -1.

Gutenberg Richter is also used to characterize seismic hazard in a region by defining the annual rate of earthquake occurrence. In this module you will analyze an earthquake catalog downloaded from the Northern California Earthquake Data Center for a 100 km radius around the Berkeley Campus. You will estimate the Gutenberg Richter A- and B-values, and estimate the annual recurrence rates of large earthquakes in the region.

### Load and decluster earthquake catalog

In [None]:
# read data
# This catalog is a M0+ search centered at Berkeley radius=100km. 
# A big enough radius to include Loma Prieta but exclude Geysers.
all_events_data=pd.read_csv('anss_catalog_1900to2018all.txt', sep=' ', delimiter=None, header=None,
                 names = ['Year','Month','Day','Hour','Min','Sec','Lat','Lon','Mag'])

#  create data arrays
AE_year=all_events_data.Year.values
AE_month=all_events_data.Month.values
AE_day=all_events_data.Day.values
AE_hour=all_events_data.Hour.values
AE_mn=all_events_data.Min.values
AE_sec=all_events_data.Sec.values
AE_lat=all_events_data.Lat.values
AE_lon=all_events_data.Lon.values
AE_mag=all_events_data.Mag.values
n_tot=len(AE_year)        #number of events 




In [None]:
#Determine the number of days from the first event
AE_days=np.zeros(n_tot) # initialize the size of the array days
for i in range(0,n_tot,1):
    d0 = datetime.date(AE_year[0], AE_month[0], AE_day[0])
    d1 = datetime.date(AE_year[i], AE_month[i], AE_day[i])
    delta = d1 - d0
    AE_days[i]=delta.days # fill days in with the number of days since the first event (7/1/1911)

In [None]:
#This function computes the spherical earth distance between two geographic points and is used in the
#declustering algorithm below
def haversine_np(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees)

    All args must be of equal length.
    
    The first pair can be singular and the second an array

    """
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2]) # convert degrees lat, lon to radians

    dlon = lon2 - lon1
    dlat = lat2 - lat1

    a = np.sin(dlat/2.0)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon/2.0)**2  # great circle inside sqrt

    c = 2 * np.arcsin(np.sqrt(a))   # great circle angular separation
    km = 6371.0 * c   # great circle distance in km, earth radius = 6371.0 km
    return km
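
A simple sanity check for the distance function is to use points on the equator, where one degree of longitude corresponds to $6371\pi/180$ km. The sketch below repeats the same formula in compact form so that it runs independently, and passes a single point together with an array of points; the test coordinates are made up for illustration.

```python
import numpy as np

def haversine_np(lon1, lat1, lon2, lat2):
    # same formula as the function above, repeated so this block runs by itself
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    a = (np.sin((lat2 - lat1) / 2.0)**2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0)**2)
    return 6371.0 * 2 * np.arcsin(np.sqrt(a))

# One point compared against an array of points, all on the equator
lons = np.array([0.0, 1.0, 90.0])
lats = np.array([0.0, 0.0, 0.0])
d = haversine_np(0.0, 0.0, lons, lats)

print(d)                          # distances in km
print(6371.0 * np.pi / 180.0)     # expected distance for one degree
```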

In [None]:
#Decluster the catalog. Note: this cell may take a few minutes to complete
cnt=0 # initialize a counting variable
save=np.zeros((1,10000000),dtype=int) # preallocated array that stores the indices of flagged aftershocks
for i in range(0,n_tot,1):   # step through EQ catalog
    # logical if statements to incorporate definitions of Dtest and Ttest aftershock window bounds
    Dtest=np.power(10,0.1238*AE_mag[i]+0.983)   # distance bounds
    if AE_mag[i] >= 6.5:
        Ttest=np.power(10,0.032*AE_mag[i]+2.7389)  # aftershock time bounds for M >= 6.5
    else:
        Ttest=np.power(10,0.5409*AE_mag[i]-0.547)  # aftershock time bounds for M < 6.5
    
    a=AE_days[i+1:n_tot]-AE_days[i]    # time interval in days to subsequent earthquakes in catalog
    m=AE_mag[i+1:n_tot]   # magnitudes of subsequent earthquakes in catalog
    # distance in km to subsequent EQs in catalog
    b=haversine_np(AE_lon[i],AE_lat[i],AE_lon[i+1:n_tot],AE_lat[i+1:n_tot]) 
    
    icnt=np.count_nonzero(a <= Ttest)   # counts the number of potential aftershocks, 
                                        # the number of intervals <= Ttest bound
    if icnt > 0:  # if there are potential aftershocks
        itime=np.array(np.nonzero(a <= Ttest)) + (i+1) # indices of potential aftershocks <= Ttest bound
        for j in range(0,icnt,1):   # loops over the aftershocks         
            if b[j] <= Dtest and m[j] < AE_mag[i]: # test if the event is inside the distance window 
                                                # and that the event is smaller than the current main EQ
                save[0][cnt]=itime[0][j]  # index value of the aftershock
                cnt += 1 # increment the counting variable

                
AF_ind=np.delete(np.unique(save),0)   # This is an array of indexes that will be used to delete events flagged 
                                      # as aftershocks    
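
To get a feel for the size of the aftershock windows, the sketch below evaluates the same distance and time formulas used in the declustering cell for a few magnitudes. The helper name `aftershock_window` is hypothetical and introduced only for this illustration.

```python
import numpy as np

def aftershock_window(mag):
    """Distance (km) and time (days) windows from the declustering formulas above."""
    d_km = np.power(10, 0.1238 * mag + 0.983)
    if mag >= 6.5:
        t_days = np.power(10, 0.032 * mag + 2.7389)
    else:
        t_days = np.power(10, 0.5409 * mag - 0.547)
    return d_km, t_days

for mag in (3.0, 5.0, 7.0):
    d_km, t_days = aftershock_window(mag)
    print('M%.1f: %.0f km, %.0f days' % (mag, d_km, t_days))
```

Both windows grow with magnitude, so a larger mainshock flags later and more distant events as potential aftershocks.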

In [None]:
# delete the aftershock events
declustered_days=np.delete(AE_days,AF_ind)  #The aftershocks are deleted from the days array 
declustered_mag=np.delete(AE_mag,AF_ind)    #The aftershocks are deleted from the mag array 
n_main=len(declustered_days)

# select the aftershock events
aftershock_days=AE_days[AF_ind]  #The aftershocks are selected from the days array 
aftershock_mag=AE_mag[AF_ind]    #The aftershocks are selected from the mag array 
n_after=len(aftershock_days)

Now that we have three earthquake catalogs (raw, declustered, and aftershocks), we will set up the magnitude and $log_{10}$(number of events per year) arrays that we will be fitting.
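
The quantity being fit is cumulative: for each magnitude threshold we count the events at or above that magnitude and divide by the catalog duration. The sketch below shows the idea on a made-up list of six magnitudes and a made-up two-year duration; none of these numbers come from the catalog.

```python
import numpy as np

mags = np.array([1.0, 1.2, 1.5, 2.0, 2.1, 3.0])  # made-up magnitudes
n_years = 2.0                                     # made-up catalog duration

thresholds = np.array([1.0, 2.0, 3.0])
# number of events with magnitude >= each threshold
counts = np.array([np.count_nonzero(mags >= t) for t in thresholds])
log_rate = np.log10(counts / n_years)             # log10(events per year)

print(counts)
print(log_rate)
```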

In [None]:
# the observed log10 number of events per year as function of magnitude (data)
min_mag=np.min(AE_mag)
max_mag=np.max(AE_mag)
m_all_events=np.arange(...,...,0.1)
N_all_events=np.zeros(len(m_all_events))
numyr=(max(AE_days)-min(AE_days))/365
for i in range(0,len(m_all_events),1):
    N_all_events[i]=np.log10(np.count_nonzero(...)/numyr)
    

min_mag=np.min(declustered_mag)
max_mag=np.max(declustered_mag)
m_declustered=...
N_declustered=np.zeros(len(m_declustered))
numyr=(max(declustered_days)-min(declustered_days))/365
for i in range(0,len(m_declustered),1):
    N_declustered[i]=...    

min_mag=np.min(aftershock_mag)
max_mag=np.max(aftershock_mag)  
m_aftershock=...
N_aftershock=np.zeros(len(m_aftershock))
numyr=(max(aftershock_days)-min(aftershock_days))/365
for i in range(0,len(m_aftershock),1):
    N_aftershock[i]=...

In [None]:
# plot the observed relationship between M and log10N
plt.figure(1,(10,10))
plt.plot(m_all_events,N_all_events,'o',color='red',label='Catalog of All Events');
plt.plot(m_declustered,N_declustered,'o',color='green',label='Declustered Catalog');
plt.plot(m_aftershock,N_aftershock,'o',color='mediumblue',label='Aftershock Catalog');
plt.xlim(0, 7);
plt.ylim(-2.1, 3);
plt.xlabel('Magnitude', fontsize=16);
plt.ylabel('Number of earthquakes, $log_{10}$ N', fontsize=16);
plt.legend(fontsize=16)
plt.grid()

Solve the least squares problem for the a-value and b-value of the Gutenberg Richter law: $log(N)=a+bm$. 

$\begin{bmatrix}
1 & m_{1} \\ 
1 & m_{2} \\ 
. & . \\
. & . \\
. & . \\
1 & m_{n}
\end{bmatrix} \begin{bmatrix}
a \\
b
\end{bmatrix} = \begin{bmatrix}
log(N_{1}) \\ log(N_{2}) \\ . \\ . \\ . \\ log(N_{n}) \\
\end{bmatrix} = A M= log(N)$. 

The least squares solution for the model parameters is $M = (A{^T}A)^{-1}A{^T}log(N)$. We'll also keep track of how long it takes the computer to solve for the model parameters with `time.time()` which returns the current time.
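
One way to do the timing is to record `time.time()` before and after the operation and take the difference. The sketch below applies this to a made-up task (a dot product of two random vectors, computed once with an explicit loop and once with `np.dot`); the array size and random seed are arbitrary choices for illustration.

```python
import time
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(200_000)
y = rng.random(200_000)

t0 = time.time()
total_loop = 0.0
for i in range(len(x)):          # explicit Python loop
    total_loop += x[i] * y[i]
t_loop = time.time() - t0

t0 = time.time()
total_vec = np.dot(x, y)         # vectorized built-in
t_vec = time.time() - t0

print('loop: %.4f s, np.dot: %.6f s' % (t_loop, t_vec))
```

The same pattern can be wrapped around the matrix solution cells below to compare the loop-based functions with the NumPy versions.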

In [None]:
# Solve for Model Parameters
# Catalog of all events
A_all_events = np.column_stack((np.ones(len(m_all_events)), m_all_events))
trans_A_all_events = matrix_transpose(A_all_events)
ATA_all_events = matrix_mult(trans_A_all_events,A_all_events)
ATN_all_events = matrix_mult(trans_A_all_events,matrix_transpose([N_all_events]))
inv_all_events = matrix_inv(ATA_all_events)
soln_all_events = matrix_mult(inv_all_events,ATN_all_events)
x_all_events = m_all_events
y_all_events = matrix_mult(A_all_events,soln_all_events)


print(A_all_events)



In [None]:
# Declustered events
A_declustered = np.column_stack((..., ...))
trans_A_declustered = matrix_transpose(...)
ATA_declustered = matrix_mult(...,...)
ATN_declustered = matrix_mult(...,matrix_transpose([...]))
inv_declustered = matrix_inv(...)
soln_declustered = matrix_mult(...)
x_declustered = m_declustered
y_declustered = matrix_mult(...)


# Aftershock events
A_aftershock = ...
trans_A_aftershock = ...
ATA_aftershock = ...
ATN_aftershock = ...
inv_aftershock = ...
soln_aftershock = ...
x_aftershock = ...
y_aftershock = ...

In [None]:
# Plot results
fig = plt.figure(1,(20,5))
grid = plt.GridSpec(1, 3, wspace=0.4, hspace=0.3)

ax0=fig.add_subplot(grid[0,0])
ax1=fig.add_subplot(grid[0,1])
ax2=fig.add_subplot(grid[0,2])

ax0.plot(m_all_events,N_all_events,'o',color='red');
ax0.plot(x_all_events,y_all_events,'k-');
ax0.set_xlim(0, 7);
ax0.set_ylim(-2.1, 3);
ax0.text(4,2.5,'A value: %7.3f'%(soln_all_events[0]), fontsize=16)
ax0.text(4,2.2,'B value: %7.3f'%(soln_all_events[1]), fontsize=16)
ax0.set_xlabel('Magnitude', fontsize=16);
ax0.set_ylabel('Number of earthquakes, $log_{10}$ N', fontsize=16);
ax0.set_title('Catalog of All Events', fontsize=16);
ax0.grid()

ax1.plot(m_declustered,N_declustered,'o',color='green');
ax1.plot(x_declustered,y_declustered,'k-');
ax1.set_xlim(0, 7);
ax1.set_ylim(-2.1, 3);
ax1.text(4,2.5,'A value: %7.3f'%(soln_declustered[0]), fontsize=16)
ax1.text(4,2.2,'B value: %7.3f'%(soln_declustered[1]), fontsize=16)
ax1.set_xlabel('Magnitude', fontsize=16);
ax1.set_ylabel('Number of earthquakes, $log_{10}$ N', fontsize=16);
ax1.set_title('Declustered Catalog', fontsize=16);
ax1.grid()

ax2.plot(m_aftershock,N_aftershock,'o',color='mediumblue');
ax2.plot(x_aftershock,y_aftershock,'k-');
ax2.set_xlim(0, 7);
ax2.set_ylim(-2.1, 3);
ax2.text(4,2.5,'A value: %7.3f'%(soln_aftershock[0]), fontsize=16)
ax2.text(4,2.2,'B value: %7.3f'%(soln_aftershock[1]), fontsize=16)
ax2.set_xlabel('Magnitude', fontsize=16);
ax2.set_ylabel('Number of earthquakes, $log_{10}$ N', fontsize=16);
ax2.set_title('Aftershock Catalog', fontsize=16);
ax2.grid()

## Numpy Linear Algebra Functions

There are many numpy functions that simplify these matrix operations!

Basic operations: `np.add(A,B)`, `np.subtract(A,B)`, `np.divide(A,B)`

Matrix multiply: `A@B`, `np.dot(A,B)`

Transpose of matrix $A$: `np.transpose(A)`

Reshaping and combining arrays into matrices: `A.flatten()`, `np.hstack((a,b))`, `np.vstack((a,b))`, `np.hsplit(a,2)`, and `np.vsplit(a,2)`

Determinant: `np.linalg.det(A)`

Inverse: `np.linalg.inv(A)`

Solve the linear system $Ax=b$ for $x$: `np.linalg.solve(A,b)`
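
The sketch below strings several of these functions together on a made-up data set that lies exactly on the line $d = 1 + 2x$. It solves the normal equations twice, once with an explicit inverse and once with `np.linalg.solve`, which solves the system without forming the inverse.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
d = np.array([1.0, 3.0, 5.0, 7.0])        # exactly d = 1 + 2x

A = np.hstack((np.ones((4, 1)), x.reshape(4, 1)))   # design matrix [1, x]
ATA = np.transpose(A) @ A
ATd = np.transpose(A) @ d

m_inv = np.linalg.inv(ATA) @ ATd          # explicit inverse
m_solve = np.linalg.solve(ATA, ATd)       # solves ATA m = ATd directly

print(np.linalg.det(ATA))                 # non-zero, so ATA is invertible
print(m_inv, m_solve)
```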

### For the declustered catalog we want to determine the Gutenberg-Richter parameters for the part of the catalog that is considered complete (M>= magnitude of completeness)

#### Regenerate the declustered catalog considering only magnitudes greater than the magnitude of completeness

In [None]:
#Write code here to recompute the A_declustered matrix considering only M greater than the magnitude of completeness
min_mag=...
max_mag=np.max(declustered_mag)
m_declustered=np.arange(min_mag,max_mag,0.1)
N_declustered=np.zeros(len(m_declustered))
numyr=(max(declustered_days)-min(declustered_days))/365
for i in range(0,len(m_declustered),1):
    N_declustered[i]=np.log10(np.count_nonzero(declustered_mag >= m_declustered[i])/numyr) 

A_declustered = np.column_stack((np.ones(len(m_declustered)), m_declustered))

In [None]:
# Solve for Model Parameters
# Catalog of all events
trans_A_all_events = np.transpose(A_all_events)
ATA_all_events = trans_A_all_events@A_all_events
ATN_all_events = trans_A_all_events@np.transpose([N_all_events])
soln_all_events =np.linalg.solve(ATA_all_events,ATN_all_events)
x_all_events = m_all_events
y_all_events = A_all_events@soln_all_events


# Declustered events
trans_A_declustered = ...
ATA_declustered = ...
ATN_declustered = ...
soln_declustered = ...
x_declustered = ...
y_declustered = ...


# Aftershock events
trans_A_aftershock = ...
ATA_aftershock = ...
ATN_aftershock = ...
soln_aftershock = ...
x_aftershock = ...
y_aftershock = ...



In [None]:
# Plot results
fig = plt.figure(1,(20,5))
grid = plt.GridSpec(1, 3, wspace=0.4, hspace=0.3)

ax0=fig.add_subplot(grid[0,0])
ax1=fig.add_subplot(grid[0,1])
ax2=fig.add_subplot(grid[0,2])

ax0.plot(m_all_events,N_all_events,'o',color='red');
ax0.plot(x_all_events,y_all_events,'k-');
ax0.set_xlim(0, 7);
ax0.set_ylim(-2.1, 3);
ax0.text(4,2.5,'A value: %7.3f'%(soln_all_events[0]), fontsize=16)
ax0.text(4,2.2,'B value: %7.3f'%(soln_all_events[1]), fontsize=16)
ax0.set_xlabel('Magnitude', fontsize=16);
ax0.set_ylabel('Number of earthquakes, $log_{10}$ N', fontsize=16);
ax0.set_title('Catalog of All Events', fontsize=16);
ax0.grid()

ax1.plot(m_declustered,N_declustered,'o',color='green');
ax1.plot(x_declustered,y_declustered,'k-');
ax1.set_xlim(0, 7);
ax1.set_ylim(-2.1, 3);
ax1.text(4,2.5,'A value: %7.3f'%(soln_declustered[0]), fontsize=16)
ax1.text(4,2.2,'B value: %7.3f'%(soln_declustered[1]), fontsize=16)
ax1.set_xlabel('Magnitude', fontsize=16);
ax1.set_ylabel('Number of earthquakes, $log_{10}$ N', fontsize=16);
ax1.set_title('Declustered Catalog', fontsize=16);
ax1.grid()

ax2.plot(m_aftershock,N_aftershock,'o',color='mediumblue');
ax2.plot(x_aftershock,y_aftershock,'k-');
ax2.set_xlim(0, 7);
ax2.set_ylim(-2.1, 3);
ax2.text(4,2.5,'A value: %7.3f'%(soln_aftershock[0]), fontsize=16)
ax2.text(4,2.2,'B value: %7.3f'%(soln_aftershock[1]), fontsize=16)
ax2.set_xlabel('Magnitude', fontsize=16);
ax2.set_ylabel('Number of earthquakes, $log_{10}$ N', fontsize=16);
ax2.set_title('Aftershock Catalog', fontsize=16);
ax2.grid()

## Built-in Least-squares solvers

As we learned last week, there are also built-in least-squares solvers: `np.linalg.lstsq(A,B)`, `np.polyfit()`, `np.polyval()`, `stats.linregress`, and `statsmodels.api.OLS()`.
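
One detail to watch is the order of the returned coefficients. The sketch below fits a made-up noise-free line (intercept 3, slope -0.5) with two of these solvers: `np.polyfit` returns the highest power first (slope, then intercept), while `np.linalg.lstsq` returns the parameters in the order of the columns of $A$ (intercept, then slope for a `[1, x]` design matrix).

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 3.0 - 0.5 * x                     # made-up line: intercept 3, slope -0.5

slope, intercept = np.polyfit(x, y, 1)          # highest power first

A = np.column_stack((np.ones(len(x)), x))
coef, *_ = np.linalg.lstsq(A, y, rcond=None)    # ordered like the columns of A

print(slope, intercept)
print(coef)
```

This is why the plotting cell for the `np.polyfit` results reads the A-value from index 1 and the B-value from index 0.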

In [None]:
# Solve for Model Parameters
# Catalog of all events
soln_all_events =np.polyfit(...,...,1) # np.polyfit(x,y,order)
x_all_events = m_all_events
y_all_events = np.polyval(...,...) # np.polyval(model_params,x)


# Declustered events
soln_declustered = ...
x_declustered = ...
y_declustered = ...


# Aftershock events
soln_aftershock = ...
x_aftershock = ...
y_aftershock = ...


In [None]:
# Plot results
fig = plt.figure(1,(20,5))
grid = plt.GridSpec(1, 3, wspace=0.4, hspace=0.3)

ax0=fig.add_subplot(grid[0,0])
ax1=fig.add_subplot(grid[0,1])
ax2=fig.add_subplot(grid[0,2])

ax0.plot(m_all_events,N_all_events,'o',color='red');
ax0.plot(x_all_events,y_all_events,'k-');
ax0.set_xlim(0, 7);
ax0.set_ylim(-2.1, 3);
ax0.text(4,2.5,'A value: %7.3f'%(soln_all_events[1]), fontsize=16)
ax0.text(4,2.2,'B value: %7.3f'%(soln_all_events[0]), fontsize=16)
ax0.set_xlabel('Magnitude', fontsize=16);
ax0.set_ylabel('Number of earthquakes, $log_{10}$ N', fontsize=16);
ax0.set_title('Catalog of All Events', fontsize=16);
ax0.grid()

ax1.plot(m_declustered,N_declustered,'o',color='green');
ax1.plot(x_declustered,y_declustered,'k-');
ax1.set_xlim(0, 7);
ax1.set_ylim(-2.1, 3);
ax1.text(4,2.5,'A value: %7.3f'%(soln_declustered[1]), fontsize=16)
ax1.text(4,2.2,'B value: %7.3f'%(soln_declustered[0]), fontsize=16)
ax1.set_xlabel('Magnitude', fontsize=16);
ax1.set_ylabel('Number of earthquakes, $log_{10}$ N', fontsize=16);
ax1.set_title('Declustered Catalog', fontsize=16);
ax1.grid()

ax2.plot(m_aftershock,N_aftershock,'o',color='mediumblue');
ax2.plot(x_aftershock,y_aftershock,'k-');
ax2.set_xlim(0, 7);
ax2.set_ylim(-2.1, 3);
ax2.text(4,2.5,'A value: %7.3f'%(soln_aftershock[1]), fontsize=16)
ax2.text(4,2.2,'B value: %7.3f'%(soln_aftershock[0]), fontsize=16)
ax2.set_xlabel('Magnitude', fontsize=16);
ax2.set_ylabel('Number of earthquakes, $log_{10}$ N', fontsize=16);
ax2.set_title('Aftershock Catalog', fontsize=16);
ax2.grid()

How did the B-value for the Bay Area change between the raw and declustered catalogs?

_Write your answer here._

Is the slope (B-value) of the model line for aftershocks steeper or shallower than that for the declustered catalog? Does this indicate there are more or fewer small aftershock events relative to larger events? Is this intuitive? 

_Write your answer here._

In the declustered catalog, what is the predicted occurrence rate of M6+ earthquakes for the Bay Area? What is the log(N) value, i.e. `y_declustered`, for M6? Raise 10 to the power of `y_declustered` at M6 to find the number of M6+ earthquakes per year. Divide 1 by this power of 10 to find the number of years per earthquake.
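
The arithmetic can be sketched with placeholder numbers. The intercept and slope below (`a_value = 4.0`, `b_value = -1.0`) are hypothetical and are not the Bay Area result; substitute the values from your own declustered fit.

```python
a_value = 4.0     # hypothetical intercept, not the fitted value
b_value = -1.0    # hypothetical slope, not the fitted value

log_n = a_value + b_value * 6.0       # log10 of the annual number of M6+ events
rate_per_year = 10 ** log_n           # events per year
years_per_event = 1.0 / rate_per_year # average years between events

print(log_n, rate_per_year, years_per_event)
```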

In [None]:
#Compute your answer here.