# Earthquake locations and travel times

**This assignment has two main parts:**

1) Revisit the global earthquake catalog, make a map that shows magnitude and make interpretations about where the largest earthquakes happen. [10 points]

2) Plot the seismogram associated with a 2020 earthquake and make interpretations related to seismic wave travel time. [15 points]

**To do these things, you will need the Python libraries we have used thus far** (and a new one):

In [None]:
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import matplotlib.dates as mdates
import cartopy.crs as ccrs
from cartopy.io.shapereader import Reader

from geopy import distance

## The global earthquake catalog

Let's use the USGS API to import a global earthquake catalog. Update the `start_day` and `end_day` (using year-month-day; e.g. ```'2023-02-06'```) to get a query URL that will return earthquakes that occurred over the past 10 years.

In [None]:
start_day = ''
end_day = ''
standard_url = 'https://earthquake.usgs.gov/fdsnws/event/1/query?format=csv&orderby=magnitude'

query_url = standard_url + '&starttime=' + start_day + '&endtime=' + end_day + '&minmagnitude=5.0'
query_url
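As a side note, the same kind of query string can be assembled from a dictionary of parameters instead of string concatenation. This is only a sketch: the dates and the `example_`-prefixed names are hypothetical placeholders, not the values the assignment asks for, and the notebook's own `query_url` is unaffected.

```python
from urllib.parse import urlencode

# Hypothetical example dates (placeholders, not the assignment's answer)
example_start_day = '2013-02-06'
example_end_day = '2023-02-06'

base_url = 'https://earthquake.usgs.gov/fdsnws/event/1/query'
params = {
    'format': 'csv',
    'orderby': 'magnitude',
    'starttime': example_start_day,
    'endtime': example_end_day,
    'minmagnitude': 5.0,
}

# urlencode joins the key=value pairs with '&'
example_url = base_url + '?' + urlencode(params)
print(example_url)
```

Keeping the parameters in a dictionary makes it easier to see at a glance which filters are applied to the catalog.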

We can now make a DataFrame called `earthquake_data` that brings in data from `query_url`.

In [None]:
earthquake_data = pd.read_csv(query_url)
earthquake_data.head()

### Make a map of these earthquake locations

In addition to plotting the earthquake locations, we can plot the location of plate boundaries. I took the plate boundaries provided by the US Geological Survey (USGS) and split them by their categorization into trenches (subduction zones), ridges (spreading centers) and transform (strike-slip boundaries like the San Andreas fault).

The code below makes a map where these different plate boundaries are represented by different color lines.

In [None]:
plt.figure(1,(15,15)) # make a big figure 

ax = plt.axes(projection=ccrs.Robinson(180))
ax.set_global()

ax.coastlines()
ax.gridlines()

data = Reader('./data/Plate_Boundaries_transform.shp')
ax.add_geometries(data.geometries(), crs=ccrs.PlateCarree(), 
                  edgecolor='orange', facecolor='none',
                  linewidth=3)

data = Reader('./data/Plate_Boundaries_trenches.shp')
ax.add_geometries(data.geometries(), crs=ccrs.PlateCarree(), 
                  edgecolor='darkblue', facecolor='none',
                  linewidth=3)

data = Reader('./data/Plate_Boundaries_ridges.shp')
ax.add_geometries(data.geometries(), crs=ccrs.PlateCarree(), 
                  edgecolor='red', facecolor='none',
                  linewidth=3)

plt.title('Map of plate boundaries (red=ridge; blue=trench; orange=transform)')
# make patches to add to a legend
trans = mpatches.Rectangle((0, 0), 1, 1, facecolor="orange")
con = mpatches.Rectangle((0, 0), 1, 1, facecolor="darkblue")
div = mpatches.Rectangle((0, 0), 1, 1, facecolor="red")
labels = ['Transform','Convergent','Divergent']
plt.legend([trans, con, div], labels)

plt.show()

**Make a map where these plate boundaries are shown and the recent earthquake locations are also plotted.**

Use the code that is in the code cell above and add code that also plots the earthquake locations in the code cell below. Look at previous notebooks for the plotting function and remember that you can access columns in a pandas dataframe using ```Dataframe_name['column_name']```

**Make a map that categorizes earthquakes by magnitude.** 

Here is my suggestion for how this can be done:
- Filter the global earthquake catalog dataframe to make a new dataframe that only has earthquakes between magnitude 5 and 6, another that only has earthquakes between magnitude 6 and 7, etc. Then plot each of these dataframes on the map with its own color and symbol size.

A DataFrame can be filtered by multiple conditions like this, where `dataframe` is the name of the dataframe that has your data and `dataframe_new` is whatever name you want to assign to your new dataframe. Be sure to follow the formatting in terms of the placement of the `()` and `&`:

`dataframe_new = dataframe[(dataframe['column_a'] >= 5) & (dataframe['column_a'] < 6)]`
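To see how that filter behaves, here is a minimal sketch on a made-up table. The `toy_catalog` dataframe and its values are invented for illustration and are not USGS data.

```python
import pandas as pd

# Toy catalog with made-up magnitudes (not real USGS data)
toy_catalog = pd.DataFrame({'mag': [5.0, 5.9, 6.0, 6.4, 7.1],
                            'place': ['a', 'b', 'c', 'd', 'e']})

# Each comparison gives a boolean Series; & combines them element-wise.
# The parentheses are needed because & binds more tightly than >= and <.
in_bin = (toy_catalog['mag'] >= 5) & (toy_catalog['mag'] < 6)

# Indexing with the boolean Series keeps only the rows where it is True
toy_mag5_6 = toy_catalog[in_bin]
print(toy_mag5_6)
```

Using `>=` on the lower bound and `<` on the upper bound means an earthquake of exactly magnitude 6.0 lands in only one bin.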

In the code cell below, create four filtered dataframes that categorize earthquakes by magnitude:

In [None]:
earthquakes_mag5_6 = ...
earthquakes_mag6_7 = ...
earthquakes_mag7_8 = ...
earthquakes_mag8_9 = ...

Make a map where earthquakes of greater magnitude are plotted with larger symbols. In `plt.scatter` the `s=` parameter can be used to set symbol size. I would recommend making each symbol twice as large as the previous one (e.g. `s=8` for `earthquakes_mag5_6` and `s=16` for `earthquakes_mag6_7`). You should label each.

In [None]:
plt.figure(1,(15,15)) # make a big figure 
ax = plt.axes(projection=ccrs.Robinson())
ax.set_global()

ax.coastlines()
ax.gridlines()

plt.scatter(earthquakes_mag5_6['longitude'],earthquakes_mag5_6['latitude'],
            transform=ccrs.PlateCarree(),s=20,label='magnitude 5 to 6')

# Add plt.scatter() for the rest of the magnitude dataframes. 
# Make sure to label them so that they show up in the legend

plt.legend()        
plt.show()

After you have made such a map, answer this question:
- *At what type of plate boundaries do the largest earthquakes occur?*

**WRITE YOUR ANSWER HERE**


## Analyze an Earthquake Seismogram

An interesting earthquake in 2020 occurred 100 km SSE of Perryville, Alaska at 55.0683°N 158.5543°W and was a magnitude 7.8 event.

Below is a map of the earthquake location and the location of the Columbia College, Columbia, CA, USA seismic station that recorded the seismogram we will be analyzing.

Go ahead and **add plate boundaries to this map as well.**

In [None]:
# Earthquake location
Earthquake_lat = 55.0683
Earthquake_lon = -158.5543

# Station Location Columbia College, Columbia, CA, USA
station_lat = 38.03455
station_lon = -120.38651

plt.figure(1,(10,10))

ax = plt.axes(projection=ccrs.Orthographic(central_longitude=-130,central_latitude=60))
ax.set_global()

plt.scatter(Earthquake_lon,Earthquake_lat,s=100,marker='*',
            color='red', edgecolor='black',transform=ccrs.PlateCarree())
plt.text(Earthquake_lon+5,Earthquake_lat,'Earthquake',fontsize=14,color='red',
         transform=ccrs.PlateCarree())

plt.scatter(station_lon,station_lat,s=100,marker='^',
            color='green', edgecolor='black',transform=ccrs.PlateCarree())
plt.text(station_lon+5,station_lat,'Columbia College',fontsize=12,color='green',
         transform=ccrs.PlateCarree())

plt.plot([Earthquake_lon,station_lon],[Earthquake_lat,station_lat],
         color='red',transform=ccrs.Geodetic())

ax.coastlines()
ax.stock_img()
ax.gridlines()

plt.show()

*At what type of plate boundary did this earthquake occur?*

**Write your answer here**


More geologic context about this quake can be found here: https://www.iris.edu/hq/files/programs/education_and_outreach/retm/tm_200722_alaska/200722_Alaska.pdf

### Distance between seismograph and earthquake

We can use the Geopy Python package to calculate the distance between the earthquake and the seismic station. The module `distance` of the `geopy` library was imported at the beginning of this notebook. You can use the `distance.distance(location1, location2)` function, where each location is a (latitude, longitude) pair.

You can read more about this function here: https://geopy.readthedocs.io/en/stable/index.html?highlight=distance#module-geopy.distance

In the code cell below, define the locations, use the `distance.distance()` function to calculate the distance, and then assign the value (in km) to a variable called `earthquake_seismograph_distance_km`.

In [None]:
seismic_station_location = (...,...)  #define the seismic station location (latitude, longitude)
earthquake_location = (...,...) #define the earthquake location (latitude, longitude)
earthquake_seismograph_distance = distance.distance(seismic_station_location, earthquake_location)
earthquake_seismograph_distance_km = earthquake_seismograph_distance.km
print(earthquake_seismograph_distance_km)
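For intuition about what a distance over the Earth's surface means, here is a sketch of the haversine formula, which assumes a perfectly spherical Earth of radius 6371 km. This is not what `geopy` does internally: as far as I know, `distance.distance` computes a geodesic on an ellipsoidal Earth model, so the two approaches should agree only to within a fraction of a percent. The function name `haversine_km` and the example points are made up for illustration.

```python
import math

def haversine_km(point_a, point_b, radius_km=6371.0):
    """Great-circle distance between two (latitude, longitude) points
    given in degrees, assuming a spherical Earth."""
    lat1, lon1 = map(math.radians, point_a)
    lat2, lon2 = map(math.radians, point_b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

# Illustrative points on the equator, a quarter of the way around the globe
quarter_km = haversine_km((0.0, 0.0), (0.0, 90.0))
print(f'{quarter_km:.1f} km')
```

A quarter of the way around a sphere should come out as one quarter of its circumference, which is a handy sanity check for any distance function.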

### Load a Seismogram of this Earthquake

Let's load the .csv (Comma-Separated Values) data file of this seismogram as recorded at the Columbia College, Columbia, CA, USA seismic station. Samples were taken every 0.025 seconds (40 Hz) and the record starts 60 seconds before the arrival of the first wave, which is called the P wave. https://www.iris.edu/app/station_monitor/#2020-07-22T06:12:44/BK-CMB/trace/BK-CMB|11273635

In [None]:
seismogram = pd.read_csv('./data/BK.CMB.00.BHZ.Q.2020-07-22T061756.019538.csv',header=9,names=['Time','Sample'])
seismogram.head()

The `seismogram['Sample']` column is a time series of the velocity of the ground motion at the location of the seismic station due to this earthquake. Assign this column to the variable `velocity` in the code cell below.

In [None]:
velocity = 

Looking at the seismogram dataframe, we can see that the `seismogram['Time']` column is a complicated-looking string. We can use the function `pd.to_datetime()` to convert these strings to pandas datetime objects. We'll see more datetime features in the future; they are pretty great.

In [None]:
time = pd.to_datetime(seismogram['Time'])
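Here is a small sketch of what `pd.to_datetime()` gives you, using made-up ISO-style strings (the exact string format in the real file may differ). Once the strings are datetime objects, subtracting two of them gives the elapsed time directly.

```python
import pandas as pd

# Made-up timestamps spaced 0.025 s apart (illustrative only)
toy_strings = pd.Series(['2020-07-22T06:17:56.000',
                         '2020-07-22T06:17:56.025',
                         '2020-07-22T06:17:56.050'])
toy_time = pd.to_datetime(toy_strings)

# Subtracting two timestamps gives a Timedelta, which can report seconds
toy_interval = (toy_time[1] - toy_time[0]).total_seconds()
print(toy_time.dtype, toy_interval)
```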

Let's plot the seismogram with `time` on the x-axis and `velocity` on the y-axis.  Fill in the `...` in the code cell below to make the plot and label the axes.

In [None]:
import warnings #There is a warning regarding the AutoDateLocator that we don't need to worry about
warnings.filterwarnings("ignore") #This code will suppress the warning

fig = plt.figure(1,(10,5))
ax = fig.add_subplot()
plt.plot(...,...)
plt.xlim(min(time),max(time))

locator = mdates.AutoDateLocator(minticks=20)
formatter = mdates.ConciseDateFormatter(locator)
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(formatter)
plt.xlabel(...)
plt.ylabel(...)
plt.title('Seismogram of 2020-07-22 Alaskan earthquake recorded at Columbia College, CA')

ax.grid(True)
plt.show()

Seismographs record the arrival of multiple types of waves. P (primary) waves are compressional waves that arrive first. S (secondary/shear) waves are shear waves that arrive next, as illustrated in the example seismogram below. Following the P and S waves are the high-amplitude surface waves:

<img src="./figures/seis_wave_travel_time.png">

The seismogram we are looking at starts approximately 1 minute before the arrival of the P wave.

Let's define the P wave arrival time as `p_wave_time` and set it to 1.05 minutes.

In [None]:
p_wave_time = 1.05 #minutes                         

The code cell below contains an **incorrect** S wave arrival time (in minutes relative to the start of the record).
It currently marks the arrival of the surface waves, but you will want it to mark the arrival of the S wave.

**Use this value for the moment and generate the plot below. Then come back and adjust it to be a better S wave pick**

In [None]:
s_wave_time = 9.1

The code below assigns the arrival time of the P wave and the S wave and does conversions related to time. You can run this code without modifications.

In [None]:
#find the sample interval (the time between consecutive samples)
#mdates.date2num converts each timestamp to a number of days
factor=24*60*60                                                   #this is the number of seconds in a day
dt=(mdates.date2num(time[1])-mdates.date2num(time[0]))*factor     #sample interval is the difference between two consecutive sample times
print(f'Sample interval={dt:.3f} seconds/sample')

#convert the ptime, and stime to samples
psamp=np.int64(p_wave_time*60/dt)
ssamp=np.int64(s_wave_time*60/dt)
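The conversion from a time in minutes to a position in the record can be sketched on its own as follows. The helper name `minutes_to_sample` is made up, and this sketch rounds to the nearest sample, whereas the cell above truncates with `np.int64`; the two can differ by at most one sample.

```python
def minutes_to_sample(minutes, sample_interval_s):
    """Index of the sample closest to a time given in minutes
    after the start of the record."""
    return int(round(minutes * 60 / sample_interval_s))

# 1.05 minutes into a record sampled every 0.025 s
example_index = minutes_to_sample(1.05, 0.025)
print(example_index)
```

With 40 samples per second, each minute of record spans 2400 samples, so a pick of 1.05 minutes sits 2520 samples into the record.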

Using annotations, we can indicate when the P wave arrived and when the S wave arrived on the seismogram. Take your code from above that plots the seismogram and add this code to also plot annotations:

```
ax.annotate('P wave', (mdates.date2num(time[psamp]), velocity[psamp]), xytext=(-10, 35), 
            textcoords='offset points', arrowprops=dict(arrowstyle='-|>'))
ax.annotate('S wave', (mdates.date2num(time[ssamp]), velocity[ssamp]), xytext=(-10, 35), 
            textcoords='offset points', arrowprops=dict(arrowstyle='-|>'))
```


In [None]:
# Use this code cell to plot the seismogram with annotated P and S wave arrivals


Once you have made this plot, go back up and adjust the S wave arrival time (which is with respect to the start of the record in fractional minutes). Keep adjusting that value and moving your S wave arrival pick until it is at a spot in the record that you feel happy with. You should be looking for when the record transitions from the amplitude characteristic of the P wave to a higher amplitude (but an amplitude that is still significantly less than that associated with the surface waves). Look at the example above for guidance.

**Keep adjusting and rerunning the code until you are happy with the S wave pick**

Once you have your S wave pick, subtract the `p_wave_time` from the `s_wave_time` and assign that difference to a variable ```seismogram_s_p_difference``` in the code cell below:

## Estimate distance based on S-P time difference

The difference in P and S wave arrival times can be used to determine the distance from the recording station to the earthquake using a travel time curve if we know the velocities of the waves through the Earth. So first we need to know how these two waves behave, particularly their velocities. Check out this short video demonstration:

https://www.iris.edu/hq/inclass/uploads/videos/A_6_seismictraveltimeirisbounc.mp4

https://www.iris.edu/hq/inclass/animation/traveltime_curves_how_they_are_created

Calculated travel times based on a standard Earth model are in the data folder as `arrival_times.csv`. The time unit is minutes. Let's import them as a dataframe.

In [None]:
travel_times = pd.read_csv('./data/arrival_times.csv')
travel_times.head()

We can make a plot of the travel times.

In [None]:
fig = plt.figure(1,(6,6))
plt.plot(travel_times['degrees_from_quake'],travel_times['P_wave_time'],label='P waves')
plt.plot(travel_times['degrees_from_quake'],travel_times['S_wave_time'],label='S waves')
plt.xlabel('Distance (degrees)', fontsize=14)
plt.ylabel('Time (minutes)', fontsize=14)
plt.legend()
plt.grid()
plt.show()

Now we need the S-P time from the model. Make a new column in the travel_times dataframe that is the difference between the two times. In pandas you can make a new column that is a calculation of other columns. So if you had a column called 'column_b' and one called 'column_a' you could make a new column like this:

```travel_times['new_column'] = travel_times['column_b'] - travel_times['column_a']```

Go ahead and make a new column called ```'S-P_difference'``` that is the difference between the S wave time and the P wave time. Then make a plot of it vs. distance from earthquake.

In [None]:
# Calculate the S-P difference


In [None]:
# Make a plot of S-P difference vs distance


Your picks of P and S wave arrival times above imply a distance between the earthquake location and the seismograph location.

Let's print the full `travel_times` DataFrame that now has the `'S-P_difference'` column. Find the row whose S-P difference is closest to the time difference you calculated between your P wave and S wave time picks (the `seismogram_s_p_difference` you calculated above).

In [None]:
print(travel_times.to_string())
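Reading the table by eye works fine, but the same lookup can also be done in code. The sketch below uses a made-up table and a hypothetical pick; none of the numbers come from `arrival_times.csv`.

```python
import pandas as pd

# Made-up table (values are illustrative, not from arrival_times.csv)
toy_times = pd.DataFrame({'degrees_from_quake': [10, 20, 30, 40],
                          'S-P_difference': [1.9, 3.6, 5.0, 6.2]})
toy_observed_difference = 4.8  # hypothetical S-P pick in minutes

# Absolute mismatch between each row and the pick; idxmin gives the
# row label where that mismatch is smallest
mismatch = (toy_times['S-P_difference'] - toy_observed_difference).abs()
closest_row = mismatch.idxmin()
print(toy_times.loc[closest_row])
```

Because a pick will rarely match a table entry exactly, looking for the smallest mismatch is more robust than testing for equality.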

Assign the angular distance implied by the S wave and P wave time difference to a variable called `ang_deg`

In [None]:
ang_deg = 

You can then convert this angular distance between the earthquake and the seismic station to a distance in kilometers with the equation: 

$d = r \theta $ 

where $d$ is the distance between the two points in kilometers, $r$ is the radius (radius of Earth is 6371 kilometers), and $\theta$ is the angular separation between the points in radians. 
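As a quick check of the formula, an angular separation of one degree (converted to radians by multiplying by $\pi/180$) corresponds to roughly 111 km along the Earth's surface:

$d = 6371\ \mathrm{km} \times \left(1 \times \frac{\pi}{180}\right) \approx 111.2\ \mathrm{km}$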

In [None]:
radius = 6371 # Earth's radius in kilometers
ang_rad = np.deg2rad(ang_deg) # convert degrees to radians

### Write code here that calculates d using ang_rad and radius
estimated_distance = 

Now print the `estimated_distance` and compare to the `earthquake_seismograph_distance_km` that you calculated above. Calculate the percentage difference between the `estimated_distance` and the `earthquake_seismograph_distance_km`:
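One common way to define a percentage difference is relative to the reference value, here sketched with made-up distances. The function name `percent_difference` and the numbers are hypothetical, and other definitions (for example, relative to the mean of the two values) are also in use.

```python
def percent_difference(estimate, reference):
    """Difference between estimate and reference, expressed as a
    percentage of the reference value."""
    return (estimate - reference) / reference * 100

# Made-up distances in kilometers
example_percent = percent_difference(2750.0, 2500.0)
print(f'{example_percent:.1f}%')
```

A positive result means the estimate is larger than the reference; a negative result means it is smaller.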

How does the estimate of distance compare to the distance you calculated using the `geopy` `distance.distance()`? 

**Write answer here**

If they are very different, you may want to reconsider your S wave arrival time pick and recompute.

## Locating an Earthquake

The earthquake locations that we used in the first part of the assignment are determined by computing the distance to the earthquake from at least three stations, as illustrated below. 

<img src="./figures/IRIS_eq_tri.png">
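The idea of intersecting three distance circles can be sketched on a flat plane. This is a simplified illustration under stated assumptions, not how earthquake catalogs are actually produced: real location methods work on a spherical Earth and also solve for depth and origin time. The function name `trilaterate`, the station coordinates, and the source position are all made up.

```python
import numpy as np

def trilaterate(stations, distances):
    """Least-squares (x, y) location on a flat plane from station
    coordinates and the distance from each station to the source."""
    stations = np.asarray(stations, dtype=float)
    distances = np.asarray(distances, dtype=float)
    x1, y1 = stations[0]
    # Subtracting the first circle equation from the others cancels the
    # squared unknowns and leaves a linear system A @ [x, y] = b
    A = 2 * (stations[1:] - stations[0])
    b = (distances[0] ** 2 - distances[1:] ** 2
         + np.sum(stations[1:] ** 2, axis=1) - (x1 ** 2 + y1 ** 2))
    solution, *_ = np.linalg.lstsq(A, b, rcond=None)
    return solution

# Made-up stations and a made-up source at (3, 4)
toy_stations = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
toy_source = np.array([3.0, 4.0])
toy_distances = [np.linalg.norm(toy_source - np.array(s)) for s in toy_stations]

estimated_source = trilaterate(toy_stations, toy_distances)
print(estimated_source)
```

With only two stations the circles generally cross at two points, which is why at least three stations are needed to pin down a single location.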

### Turn in this notebook

Export your completed notebook as an HTML file and upload it to bCourses.