# Seismic Data Processing and HVSR

# Introduction

Seismology is one of the oldest and largest branches of geophysics. 
Seismic data is somewhat unique in its breadth of application, which ranges from planet-scale studies to hyper-local site characterization. Largely because of the global nature of seismic phenomena and the highly destructive potential of earthquakes, seismic data is highly organized and shared globally. Countries develop their own seismometer networks, and the data from these networks are often available to download or "stream" online. Seismic data also has many national security and economic applications: it is used to detect nuclear tests, monitor border crossings, and explore for mineral or oil resources.

Seismic data (including HVSR, which we will focus on here) is also one of the few types of geophysical data that have been collected on Earth, the Moon, and Mars.

A basic understanding of seismic data, processing, and terminology is expected in most roles involving geophysics.
Environmental applications of seismic data are not as common as those of, for example, ground-penetrating radar (GPR) and electrical resistivity tomography (ERT).
Understanding how to work with seismic data is still important for those working in environmental geophysics for a number of reasons:
* Seismic data and terminology can be used in many environmental applications
* Many of the processing techniques used in GPR are directly analogous to (if not exactly the same as) techniques developed for seismology
* Seismic data have many near-surface applications that will often overlap with environmental investigations

## Seismic Data

Seismic data is at its core time-series data, and many of the processing techniques used in seismology are analogous to (if not exactly the same as) techniques used in and developed for the broader field of signal processing.

You should have an understanding of basic seismic terminology, but some terms worth emphasizing are included below:

### General seismological terms:

* **Seismometer**: an instrument that measures seismic data
* **Seismic Network**: a collection of seismic stations, often managed by a single entity and with a single purpose
* **Seismic Station**: a seismometer (or multiple seismometers) at a single site, identified by a specific code. Each data channel from a station is often labeled with a code formatted as follows:
    * NETWORK_NAME.STATION_NAME.LOCATION.CHANNEL
* **Channel**: Often, the basic data feed or measurement coming out of a seismometer (or similar instrument). Many seismometers have more than one channel.
    * These channels often consist of data read from a single geophone
    * Among the most common configurations is a three-component seismometer: a vertical geophone, a horizontal geophone oriented east, and an orthogonal horizontal geophone oriented north
* **Geophone**: a device contained within a seismometer that measures ground motion
    * The data from a geophone is often recorded as one channel in a seismometer's data record
* **Component**: Often used interchangeably with "channel", but more specifically refers to directionality of the geophone
    * The term **channel** implies a single set of data from a single geophone with consistent parameters
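As a rough illustration of the identifier format and the channel/component distinction, the sketch below splits a code such as `BW.RJOB..EHZ` (the example station used later in this notebook) into its four parts. The helper name `parse_seed_id` is hypothetical, and treating the last character of the channel code as the component (orientation) follows the common SEED channel-naming convention rather than a rule every network is guaranteed to follow.

```python
from typing import NamedTuple


class SeedId(NamedTuple):
    """The four dot-separated parts of a NETWORK.STATION.LOCATION.CHANNEL code."""
    network: str
    station: str
    location: str
    channel: str

    @property
    def component(self) -> str:
        # By convention, the last character of the channel code gives the
        # orientation (e.g., Z = vertical, N = north, E = east)
        return self.channel[-1]


def parse_seed_id(code: str) -> SeedId:
    """Split an identifier like 'BW.RJOB..EHZ' into its parts (hypothetical helper)."""
    parts = code.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected 4 dot-separated fields, got {len(parts)}: {code!r}")
    # An empty location code (two consecutive dots) is valid and common
    return SeedId(*parts)


for code in ["BW.RJOB..EHZ", "BW.RJOB..EHN", "BW.RJOB..EHE"]:
    sid = parse_seed_id(code)
    print(sid.station, repr(sid.location), sid.channel, sid.component)
```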

### Terms or classes with a specific usage in Obspy
* **Trace**: the basic building block of seismic data in Obspy
    * A trace is a class in obspy that holds a single time series (one channel of data) and its associated metadata
    * The time component of an obspy trace is always in UTC time
    * Data with gaps in time can either be "merged" into a single trace whose data is a "masked" array (with the missing samples masked) or "split" into multiple contiguous traces.
    * Much of a trace's metadata is contained in its "stats" attribute
* **Stream**: a collection of seismic data in Obspy, often the most basic data type read in from a real data source
    * Streams consist of one or more traces. For example, the data from a three-component seismometer ideally consist of three traces collected into a single stream
    * Many of the obspy methods that work on streams actually operate on each individual trace in the stream.
* **UTC**: "Coordinated Universal Time," the international time standard (the scientific equivalent of an official "time zone")
    * For most purposes, it is equivalent to Greenwich Mean Time (GMT), the standard time in the United Kingdom outside of daylight saving time
    * In obspy, these times are implemented as `UTCDateTime` objects
    * `UTCDateTime` is a class that standardizes timing, and it also has various methods and attributes for converting to and from other common Python time objects (such as `datetime` objects from the Python standard library and matplotlib date numbers used for plotting)
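The "merged" (masked) representation of a gap described above can be illustrated with plain NumPy. This is a conceptual sketch, not ObsPy's implementation: the function names `merge_with_gap` and `split_at_gaps` and the zero placeholder values are assumptions for illustration. In ObsPy itself, `Stream.merge()` joins traces and (when no fill value is given) leaves gaps as masked samples, while `Stream.split()` breaks masked traces back into contiguous pieces.

```python
import numpy as np


def merge_with_gap(first, second, gap_samples):
    """Join two segments into one masked array, masking the missing samples (sketch)."""
    first = np.asarray(first, dtype=float)
    second = np.asarray(second, dtype=float)
    # Placeholder values for the gap; they are hidden by the mask below
    filler = np.zeros(gap_samples)
    data = np.concatenate([first, filler, second])
    # True in the mask means "no valid data recorded here"
    mask = np.concatenate([
        np.zeros(first.size, dtype=bool),
        np.ones(gap_samples, dtype=bool),
        np.zeros(second.size, dtype=bool),
    ])
    return np.ma.masked_array(data, mask=mask)


def split_at_gaps(masked):
    """Return each contiguous unmasked run of a masked array as a separate array."""
    return [masked.data[s] for s in np.ma.clump_unmasked(masked)]


merged = merge_with_gap([1.0, 2.0, 3.0], [7.0, 8.0], gap_samples=3)
print(merged)          # masked samples are printed as --
print(merged.mean())   # statistics skip the masked samples
print(split_at_gaps(merged))
```

Masking keeps the sample timing intact (every sample still sits at its correct index), whereas splitting produces pieces that each need their own start time.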

# Seismic Data: Data in time

One of the most important aspects of seismic data is the time dimension. In fact, modern seismology depends entirely on the ability to accurately and precisely measure when ground motion occurred. Timing is perhaps as important as the magnitude of the ground motion itself.

In order to work with time, we should first try to understand the objects used for time in Python.

Python 3.9 added the `zoneinfo` module to the standard library, which substantially changed how Python handles time zones natively, so please ensure you have Python 3.9 or later installed.
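A quick way to confirm that the interpreter meets this requirement, and that `zoneinfo` is importable, is sketched below.

```python
import sys

# zoneinfo was added to the standard library in Python 3.9
if sys.version_info < (3, 9):
    raise RuntimeError(f"Python 3.9+ required, found {sys.version.split()[0]}")

import zoneinfo  # safe to import once the version check has passed

print("Python", sys.version.split()[0], "- zoneinfo module found:", zoneinfo.__name__)
```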

The primary time and date module in python's standard library is datetime. The following is a non-comprehensive overview of python packages that deal with time:

### Standard library (installed with python itself)
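* **datetime**: The primary Python module for creating and working with dates and times. Its main classes are:
    * **datetime**: a combined date and time (the class shares its name with the module)
    * **date**: similar to and compatible with datetime, but only deals with dates
    * **time**: similar to and compatible with datetime, but only deals with times of day
    * **tzinfo**: abstract base class for timezone information, used to make datetime and time objects timezone-aware
        * **timezone**: a concrete tzinfo class for fixed offsets from UTC (for example, `timezone.utc`)
    * **timedelta**: class representing a duration, such as the difference between two datetime values
* **zoneinfo**: module (added in Python 3.9) that provides named timezones from the IANA timezone database through `ZoneInfo` objects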
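A short sketch of the `timedelta` and `timezone` classes listed above: subtracting two datetimes gives a `timedelta`, and `timezone` builds a fixed offset from UTC. Note that a fixed offset such as UTC-6 does not follow daylight saving time; named zones from `zoneinfo` are needed for that. The times used here are arbitrary illustration values.

```python
import datetime

# Fixed offset of 6 hours behind UTC (does NOT adjust for daylight saving time)
utc_minus_6 = datetime.timezone(datetime.timedelta(hours=-6))

start = datetime.datetime(2010, 10, 10, 5, 10, tzinfo=utc_minus_6)
end = datetime.datetime(2010, 10, 10, 7, 40, 30, tzinfo=utc_minus_6)

# Subtracting two datetimes gives a timedelta (a duration)
duration = end - start
print(duration)                  # 2:30:30
print(duration.total_seconds())  # 9030.0

# Adding a timedelta to a datetime gives a new datetime
print(start + datetime.timedelta(minutes=45))

# timezone.utc is a ready-made fixed-offset timezone for UTC
print(start.astimezone(datetime.timezone.utc))
```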


### 3rd-Party Libraries
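* **pytz**: an older package for named timezones, widely used because pandas has historically installed it as a dependency
* **tzdata**: the IANA timezone database packaged for Python; `zoneinfo` falls back on it when the operating system does not provide timezone data (e.g., on Windows)
* **matplotlib.dates**: submodule of matplotlib for converting and formatting dates and times for plotting
* **UTCDateTime**: a class in the obspy package; this is the primary object used to represent time in obspy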


Let's first familiarize ourselves with the datetime module.

The datetime module contains several classes. One of them is also called `datetime` (hence the common `datetime.datetime`). Put simply, the `date` and `time` classes are each one half of the `datetime` class.
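To make the "two halves" idea concrete: `datetime.datetime.combine()` joins a `date` and a `time`, and the `.date()` and `.time()` methods split a `datetime` back apart. The date and time values below are simply the start time of the example data used later in this notebook.

```python
import datetime

# The "date half" and the "time half"
day = datetime.date(2009, 8, 24)
clock = datetime.time(0, 20, 3)

# Join them into a single datetime
stamp = datetime.datetime.combine(day, clock)
print(stamp)  # 2009-08-24 00:20:03

# Split a datetime back into its two halves
print(stamp.date(), stamp.time())  # 2009-08-24 00:20:03
```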

You can create a time object (at midnight: 00:00:00) with the following code. You can add arguments for hours, minutes, seconds, microseconds, and timezone information.

```python
import datetime
# Create a time object at midnight
midnightTime = datetime.time()
# This is the same as datetime.time(hour=0, minute=0, second=0, microsecond=0)
print(midnightTime)
oneSecond5MicrosAfterMidnight = datetime.time(hour=0, minute=0, second=1, microsecond=5)
print(oneSecond5MicrosAfterMidnight)
midnightTime
```

```
00:00:00
00:00:01.000005

datetime.time(0, 0)
```

Now, that may be useful enough, but which midnight do we mean (i.e., where on Earth)? This is not specified by default in native datetime objects, but we can make these objects timezone-aware.

```python
import datetime
import zoneinfo
usc = zoneinfo.ZoneInfo('US/Central')
uscMidnight = datetime.time(tzinfo=usc)
uscMidnight
```

```
datetime.time(0, 0, tzinfo=zoneinfo.ZoneInfo(key='US/Central'))
```
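One practical consequence of timezone-awareness is that Python treats "naive" datetimes (no `tzinfo`) and "aware" datetimes differently. The sketch below shows one way to tell them apart and the error raised when ordering the two kinds are mixed. The helper `is_aware` is a hypothetical name (its test mirrors the definition of "aware" in the Python documentation), and `America/Chicago` is the canonical IANA name for the zone that `US/Central` refers to.

```python
import datetime
import zoneinfo

central = zoneinfo.ZoneInfo("America/Chicago")

naive = datetime.datetime(2010, 10, 10, 5, 10)
aware = datetime.datetime(2010, 10, 10, 5, 10, tzinfo=central)


def is_aware(dt):
    """True if the datetime carries a usable UTC offset (hypothetical helper)."""
    return dt.tzinfo is not None and dt.utcoffset() is not None


print(is_aware(naive), is_aware(aware))  # False True

# Ordering comparisons between naive and aware datetimes raise TypeError
try:
    naive < aware
except TypeError as err:
    print("TypeError:", err)
```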

A list of available timezones can be printed using the following code (this returns a set of the official IANA timezone names available on your system):

```python
import zoneinfo
zoneinfo.available_timezones()
```

There is actually more that can be done on `datetime.datetime` objects with timezones. 

For example, let's say that we acquire data in the field using our local time (e.g., Central Time in the U.S.).

However, our seismic data is likely to be in UTC. If we want to programmatically "translate" this time, we can do so! Let's first define a timezone-aware `datetime` object:

```python
import datetime
import zoneinfo
# First, define the date and timezone
usc = zoneinfo.ZoneInfo('US/Central')

# Specifying tzinfo makes datetimes timezone-aware
oct102010 = datetime.datetime(2010, 10, 10, 5, 10, tzinfo=usc)
# Same as: oct102010 = datetime.datetime(year=2010, month=10, day=10, hour=5, minute=10, tzinfo=usc)

print(oct102010)
oct102010
```

```
2010-10-10 05:10:00-05:00

datetime.datetime(2010, 10, 10, 5, 10, tzinfo=zoneinfo.ZoneInfo(key='US/Central'))
```

Now, let's convert that timezone-aware datetime object to UTC:

```python
oct102010UTC = oct102010.astimezone(zoneinfo.ZoneInfo('UTC'))
print(oct102010UTC)
oct102010UTC
```

```
2010-10-10 10:10:00+00:00

datetime.datetime(2010, 10, 10, 10, 10, tzinfo=zoneinfo.ZoneInfo(key='UTC'))
```

Since the `zoneinfo` module was added in Python 3.9, the standard library alone can convert times in named timezones (such as US/Central) to UTC.
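Named timezones also follow daylight saving time, so the UTC offset for the same local clock time can change during the year. This matters when field notes are kept in local time. The sketch below converts 05:10 local time on two dates in 2010: January falls in Central Standard Time (UTC-6) and October in Central Daylight Time (UTC-5), so the resulting UTC hours differ by one. `America/Chicago` is used as the canonical IANA name for US Central time.

```python
import datetime
import zoneinfo

# Canonical IANA name for the zone that 'US/Central' refers to
central = zoneinfo.ZoneInfo("America/Chicago")

# The same local clock time in January (standard time) and October (daylight time)
for month in (1, 10):
    local = datetime.datetime(2010, month, 10, 5, 10, tzinfo=central)
    as_utc = local.astimezone(datetime.timezone.utc)
    print(local.isoformat(), "->", as_utc.isoformat())
```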

But, python's native datetime module is not the most robust or intuitive for specifying times in UTC, which is very important for seismic data!

So, Obspy has its own class for keeping track of datetimes to avoid this confusion (and to add useful functionality). This class is called `UTCDateTime`.

For example, whereas `datetime.datetime` objects have microsecond precision, `UTCDateTime` has nanosecond precision. There is also no confusion about which timezone the time data is in, since it is always in UTC.

`UTCDateTime` can be called similarly to the native python `datetime.datetime` class (i.e., by specifying year, month, day, etc.), but it also has many more options to maintain compatibility with a variety of seismic systems.

For example, one of the more commonly used alternative date specifiers is the "Julian day," which in this case essentially means the day of the year (e.g., Feb 1 is the 32nd day of the year). The Julian day can be used for both input and output of `UTCDateTime` objects.

Obspy's `UTCDateTime` can easily determine this value:

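```python
import obspy
# UTCDateTime arguments are always interpreted as UTC, so use 10:10 here:
# the same instant as 05:10 US/Central (daylight time) converted above
oct102010UTCDT = obspy.UTCDateTime(2010, 10, 10, 10, 10)
# julday is the day of the year
print(oct102010UTCDT.julday)
```

```
283
```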
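For comparison, the same day of year can be computed with only the standard library: `timetuple().tm_yday` for output, and the `%j` directive of `strptime` for input. This is a sketch of an equivalent calculation, not how ObsPy computes `julday` internally.

```python
import datetime

# Output: date -> day of year
oct10 = datetime.date(2010, 10, 10)
print(oct10.timetuple().tm_yday)  # 283

# Input: "YYYY-DDD" (year and day of year) -> date
parsed = datetime.datetime.strptime("2010-283", "%Y-%j").date()
print(parsed)  # 2010-10-10

# Feb 1 is the 32nd day of the year
print(datetime.date(2010, 2, 1).timetuple().tm_yday)  # 32
```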


# Data in traces and streams

```python
import obspy
# With no arguments, obspy.read() loads a small example Stream bundled with obspy
sampleStream = obspy.read()
sampleStream
```

```
3 Trace(s) in Stream:
BW.RJOB..EHZ | 2009-08-24T00:20:03.000000Z - 2009-08-24T00:20:32.990000Z | 100.0 Hz, 3000 samples
BW.RJOB..EHN | 2009-08-24T00:20:03.000000Z - 2009-08-24T00:20:32.990000Z | 100.0 Hz, 3000 samples
BW.RJOB..EHE | 2009-08-24T00:20:03.000000Z - 2009-08-24T00:20:32.990000Z | 100.0 Hz, 3000 samples
```

```python
# Loop through traces
for trace in sampleStream:
    print('TRACE:', trace)
```

```
TRACE: BW.RJOB..EHZ | 2009-08-24T00:20:03.000000Z - 2009-08-24T00:20:32.990000Z | 100.0 Hz, 3000 samples
TRACE: BW.RJOB..EHN | 2009-08-24T00:20:03.000000Z - 2009-08-24T00:20:32.990000Z | 100.0 Hz, 3000 samples
TRACE: BW.RJOB..EHE | 2009-08-24T00:20:03.000000Z - 2009-08-24T00:20:32.990000Z | 100.0 Hz, 3000 samples
```


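```python
# Indexing a Stream returns a single Trace
firstTrace = sampleStream[0]
zTrace = firstTrace  # in this example, the first trace is the vertical (Z) channel
print(type(zTrace))
print(zTrace)
```

```
<class 'obspy.core.trace.Trace'>
BW.RJOB..EHZ | 2009-08-24T00:20:03.000000Z - 2009-08-24T00:20:32.990000Z | 100.0 Hz, 3000 samples
```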


```python
# select() always returns a Stream, even when only one trace matches
eTrace = sampleStream.select(channel='EHE')
print(type(eTrace))
print(eTrace)
```

```
<class 'obspy.core.stream.Stream'>
1 Trace(s) in Stream:
BW.RJOB..EHE | 2009-08-24T00:20:03.000000Z - 2009-08-24T00:20:32.990000Z | 100.0 Hz, 3000 samples
```


```python
# component matches the last character of the channel code (here, 'EHN')
nTrace = sampleStream.select(component='N')
print(type(nTrace))
print(nTrace)
```

```
<class 'obspy.core.stream.Stream'>
1 Trace(s) in Stream:
BW.RJOB..EHN | 2009-08-24T00:20:03.000000Z - 2009-08-24T00:20:32.990000Z | 100.0 Hz, 3000 samples
```


```python
# Take the Trace out of the one-trace Stream returned by select()
eTraceActual = eTrace[0]
eTraceActual.stats
```

```
         network: BW
         station: RJOB
        location: 
         channel: EHE
       starttime: 2009-08-24T00:20:03.000000Z
         endtime: 2009-08-24T00:20:32.990000Z
   sampling_rate: 100.0
           delta: 0.01
            npts: 3000
           calib: 1.0
    back_azimuth: 100.0
     inclination: 30.0
        response: Channel Response
    From M/S (Velocity in Meters Per Second) to COUNTS (Digital Counts)
    Overall Sensitivity: 2.5168e+09 defined at 0.020 Hz
    4 stages:
        Stage 1: PolesZerosResponseStage from M/S to V, gain: 1500
        Stage 2: CoefficientsTypeResponseStage from V to COUNTS, gain: 1.67785e+06
        Stage 3: FIRResponseStage from COUNTS to COUNTS, gain: 1
        Stage 4: FIRResponseStage from COUNTS to COUNTS, gain: 1
```
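The time-related fields in `stats` are linked: `delta` is `1 / sampling_rate`, and `endtime` is the time of the last sample, `starttime + (npts - 1) * delta`. That is why 3000 samples at 100 Hz starting at 00:20:03 end at 00:20:32.99 rather than 00:20:33. The sketch below reproduces that arithmetic with the standard library, using values copied from the output above; it is an illustration of the relationship, not ObsPy's own code.

```python
import datetime

sampling_rate = 100.0  # Hz
npts = 3000
starttime = datetime.datetime(2009, 8, 24, 0, 20, 3, tzinfo=datetime.timezone.utc)

delta = 1.0 / sampling_rate  # seconds between samples
# The last sample is (npts - 1) intervals after the first one
endtime = starttime + datetime.timedelta(seconds=(npts - 1) * delta)

print(delta)                # 0.01
print(endtime.isoformat())  # 2009-08-24T00:20:32.990000+00:00

# Timestamps of the first few samples
sample_times = [starttime + datetime.timedelta(seconds=i * delta) for i in range(3)]
print([t.time().isoformat() for t in sample_times])
```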