### Raw Data Download Notebook

*Important Note:* This notebook runs in a different environment than this project's computational notebook. ObsPy is required, along with numpy and matplotlib.

Zali et al. (2024) use seismic data from a single horizontal component of a single station, demeaned and detrended but not corrected for instrument response. Their data span March 12 to June 24 (~104 days), beginning 7 days before the Geldingadalir eruption started. The data are also downsampled to the minimum sampling rate needed to observe the local volcanic tremor.

------------------
This notebook will download 100 days of data, beginning November 27, 2004 and ending March 7, 2005, from sensor UW.ELK of the Pacific Northwest Seismic Network (network code UW), near Mount St. Helens. Although the volcano is best known for its 1980 eruption, it experienced renewed activity starting in 2004, including an explosive eruption on January 16, 2005, which this dataset is centered on. https://www.usgs.gov/volcanoes/mount-st.-helens/science/2004-2008-renewed-volcanic-activity

-----------------

This notebook will also calculate the distance between the specified sensor and volcano.

In [1]:
import os

import numpy as np
import obspy
from obspy import UTCDateTime as utc
from obspy.clients.fdsn import Client
from obspy.clients.fdsn.header import FDSNNoDataException
from obspy.core.util import AttribDict

#FDSN client pointed at the IRIS/EarthScope data center
client = Client('IRIS')

### Data Description

The seismic data is downloaded from the IRIS/EarthScope database using ObsPy. Each download arrives as a trace (an array of samples with attached metadata) packaged into a stream, which can contain multiple traces. The daily records are then saved as mseed files, which preserve this data-plus-metadata structure but require ObsPy or other specialized software to open.

In [2]:
utc('2005-01-16T00:00:00') - 50*24*60*60

2004-11-27T00:00:00.000000Z
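
The same window arithmetic can be sketched with the standard library alone, as a quick check that does not need ObsPy. The eruption date and the 100-day length come from this notebook; the variable names `eruption`, `window_start`, and `window_end` are introduced here only for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: eruption date taken from the text above (January 16, 2005, UTC).
eruption = datetime(2005, 1, 16, tzinfo=timezone.utc)
n_days = 100  # total window length used in this notebook

# Put half of the window before the eruption and half after it.
window_start = eruption - timedelta(days=n_days // 2)
window_end = window_start + timedelta(days=n_days)

print(window_start.date(), window_end.date())
```

The two printed dates should agree with the start and end times reported by the download cell.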

In [3]:
#creating variables to download data
starttime = utc('2004-11-27T00:00:00') #altered from original to capture half before/after eruption days
endtime = starttime + 100 * (60*60*24)

#also add a buffer to both ends to chop off once the data has been filtered
#and downsampled, kind of arbitrary length, 5% of a day (default ObsPy taper length)
buffer = 60*60*24*0.05 #seconds

net = 'UW'
sta = 'ELK'
loc = '*' #wildcard, generally don't care about location code
cha = 'EHZ' #vertical component, unlike the horizontal component used in Zali et al. but the only data I could get for Mt. St. Helens

#create folder for the mseed files to go into and initialize filepath
#(os.makedirs creates both levels and does not complain if they already exist)
filepath = os.getcwd() + '/data/raw/'
os.makedirs(filepath, exist_ok=True)

#create arrays to save dates
dates = np.array([])

#arrays for eruption states
states = np.array([])

print(f'Downloading data from {starttime} to {endtime}')

#download the data piecemeal, here by day
for day in range(100):
    tr_length = 24*60*60

    try:
        #actually downloading
        st = client.get_waveforms(network=net,
                        station=sta,
                        location=loc,
                        channel=cha,
                        starttime=starttime-buffer,
                        endtime=starttime+buffer+tr_length)

        #instrument sampling rate (hz)
        freq = st[0].stats.sampling_rate

        #merge traces within stream, linearly interpolating any gaps
        st.merge(fill_value='interpolate')
        
        #print flag if more than one trace still exists
        if len(st) > 1:
            print('Too many traces')
        
        #generate filename, day number in front for convenience of reading in
        name = str(day+1)+'_sthelens.mseed'
        
        #save data as mseed, standard for storing seismic data. Preserves metadata and time series info
        st.write(filepath+name, format='MSEED')

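        #adding date
        dates = np.append(dates, starttime.date)

    except FDSNNoDataException:
        print('No data for Day '+str(day+1))

    #advance to the next day whether or not the download succeeded,
    #so a missing day cannot shift the dates of every later file
    starttime += tr_length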

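#save dates array for future use; np.save writes date_list.npy, and because the
#array holds date objects it must be reloaded with np.load(..., allow_pickle=True)
np.save(filepath+'date_list', dates)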

print(f'data download complete, saved to {filepath}')

Downloading data from 2004-11-27T00:00:00.000000Z to 2005-03-07T00:00:00.000000Z
data download complete, saved to /Users/KatarzynaPerks/Documents/GitHub/kperks_Geldingadalir/notebooks/StHelens_analysis/data/raw/
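
As an illustration of the per-day request windows, the sketch below rebuilds them with the standard library only. It derives each day's start from the loop index, which is one way to make sure a day with no data cannot shift later windows. The helper `day_windows` and the constants `DAY` and `BUFFER` are hypothetical names, not part of the notebook or of ObsPy.

```python
from datetime import datetime, timedelta, timezone

DAY = timedelta(days=1)
BUFFER = 0.05 * DAY  # 5% of a day, the same buffer length used above


def day_windows(first_day, n_days):
    """Yield (day_number, request_start, request_end) for each day."""
    for i in range(n_days):
        # Computed from the index, so it never depends on earlier downloads.
        day_start = first_day + i * DAY
        yield i + 1, day_start - BUFFER, day_start + DAY + BUFFER


first_day = datetime(2004, 11, 27, tzinfo=timezone.utc)
windows = list(day_windows(first_day, 100))

print(windows[0])
print(windows[-1])
```

Each window is 26 hours 24 minutes long: one day plus 1 hour 12 minutes of buffer on each end.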


### Data Modalities and Formats

#### Data Modalities
The dataset consists of seismic data from a single vertical component (EHZ) of the UW.ELK station near Mount St. Helens. The data is recorded continuously over a period of 100 days, capturing seismic activity before, during, and after the January 16, 2005 explosive eruption.

#### Data Formats
1. **MSEED (Mini-SEED) Files**:
    - The seismic data is stored in Mini-SEED format, a compact binary format for time series data. Each file contains one day of seismic data plus a buffer of 5% of a day on each end, along with metadata such as the sampling rate and station information.
    - Example file names: `1_sthelens.mseed`, `2_sthelens.mseed`, ..., `100_sthelens.mseed`.
2. **Numpy Arrays**:
    - The dates corresponding to each day's seismic data are stored in a numpy array and saved with `np.save` as a binary `.npy` file (`date_list.npy`). This array maps the MSEED files to their respective dates.
    - Example: `dates = np.array([datetime.date(2004, 11, 27), datetime.date(2004, 11, 28), ...])`.
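
Reading the saved dates back has one catch worth showing: an array of `datetime.date` objects has dtype `object`, which `np.save` pickles, so `np.load` needs `allow_pickle=True`. The sketch below is a minimal round trip that builds the array the same way the download loop does; the temporary directory is a hypothetical stand-in for `data/raw/`.

```python
import datetime
import os
import tempfile

import numpy as np

# Build a small array the same way the download loop does.
dates = np.array([])
for d in (datetime.date(2004, 11, 27), datetime.date(2004, 11, 28)):
    dates = np.append(dates, d)

# Hypothetical scratch location; the notebook itself writes to data/raw/.
path = os.path.join(tempfile.mkdtemp(), 'date_list')
np.save(path, dates)  # NumPy appends the .npy extension

# Object arrays are pickled on save, so loading needs allow_pickle=True.
loaded = np.load(path + '.npy', allow_pickle=True)
print(loaded.dtype, loaded[0])
```

Only load pickled arrays from files you created yourself, since unpickling can execute arbitrary code.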

#### Data Processing
- The raw seismic data is downloaded using the ObsPy library from the IRIS/Earthscope Database.
- The data is merged and interpolated to fill any gaps, ensuring a continuous time series.
- The processed data is saved in the MSEED format, preserving both the time series and metadata.

This structured approach ensures that the seismic data is well-organized and easily accessible for further analysis and processing.
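
To make the gap-filling step concrete, here is a small numpy illustration of linear interpolation across a gap. It shows the idea behind `st.merge(fill_value='interpolate')` on made-up sample values; it is not ObsPy's internal implementation, and the arrays `t_known` and `x_known` are hypothetical.

```python
import numpy as np

# Hypothetical samples: two segments separated by a 3-sample gap (indices 3-5).
t_known = np.array([0, 1, 2, 6, 7])
x_known = np.array([10.0, 12.0, 14.0, 22.0, 20.0])

# Rebuild the full index range and fill the gap by linear interpolation
# between the last sample before the gap and the first sample after it.
t_full = np.arange(t_known[0], t_known[-1] + 1)
x_full = np.interp(t_full, t_known, x_known)

print(x_full)
```

Interpolated samples are not real measurements, so long gaps filled this way can hide missing data from later processing steps.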

In [8]:
# display some data from the mseed file
print(f'Example data from {name}')
print(st[0].stats)
print()

# display the numpy array
print(f'Example data from {name}')
print(st[0].data)
print()

Example data from 100_sthelens.mseed
               network: UW
               station: ELK
              location: 
               channel: EHZ
             starttime: 2005-03-05T22:48:00.009900Z
               endtime: 2005-03-07T01:11:59.999900Z
         sampling_rate: 100.0
                 delta: 0.01
                  npts: 9504000
                 calib: 1.0
_fdsnws_dataselect_url: http://service.iris.edu/fdsnws/dataselect/1/query
               _format: MSEED
                 mseed: AttribDict({'dataquality': 'M', 'number_of_records': 10368, 'encoding': 'STEIM1', 'byteorder': '>', 'record_length': 512, 'filesize': 11845120})
            processing: ['ObsPy 1.4.1: trim(endtime=UTCDateTime(2005, 3, 7, 1, 11, 59, 999900)::fill_value=None::nearest_sample=True::pad=False::starttime=UTCDateTime(2005, 3, 5, 22, 47, 59, 999900))']

Example data from 100_sthelens.mseed
[-19 -15 -28 ... -45 -43 -31]
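
The `npts` value in the header can be checked against the request length. A quick sketch, assuming the 100 Hz sampling rate shown above and the same one-day-plus-buffer window used in the download cell:

```python
# Values taken from the cells above.
sampling_rate = 100.0   # Hz, from st[0].stats.sampling_rate
day = 24 * 60 * 60      # seconds in one day
buffer = day * 0.05     # seconds, same definition as in the download cell

# One day plus a buffer on each end, converted to a sample count.
expected_npts = round((day + 2 * buffer) * sampling_rate)
print(expected_npts)
```

A trace with noticeably fewer samples than this would suggest that part of the requested window was missing at the start or end of the day.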



### Calculate Distance Between Sensor and Volcano using ObsPy

In [9]:
#Input Volcano Coords (degrees, north and east positive)
v_lat = 46.191387; v_lon = -122.1956 #Mt.St.Helens

In [10]:
#download inventory object of channels that fit specifiers set above, coordinates are contained here
inv = client.get_stations(network=net, station=sta, location=loc, channel=cha, starttime=starttime, 
                    endtime=endtime, level='channel')

s_lat = inv[0][0][0].latitude; s_lon = inv[0][0][0].longitude

In [11]:
#use ObsPy to calculate the distance between the two points on the WGS84 ellipsoid (returned in meters)
distance, _, _ = obspy.geodetics.gps2dist_azimuth(v_lat, v_lon, s_lat, s_lon)
distance /= 1000 #convert from meters to km
print('Volcano to sensor distance: '+str(round(distance, 3))+' km')

Volcano to sensor distance: 17.003 km
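
As a rough cross-check that needs only the standard library, a spherical great-circle (haversine) distance can be computed as sketched below. `gps2dist_azimuth` works on the WGS84 ellipsoid, so small differences from the spherical value are expected. The function `haversine_km`, the 6371 km mean Earth radius, and the station coordinates in the example are assumptions for illustration; the real station coordinates come from the inventory above.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance on a sphere; radius_km is an assumed mean Earth radius."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


# Volcano coordinates from the cell above; the station coordinates are
# hypothetical placeholders, not the real UW.ELK position.
v_lat, v_lon = 46.191387, -122.1956
s_lat, s_lon = 46.30, -122.34

print(round(haversine_km(v_lat, v_lon, s_lat, s_lon), 3))
```

To compare against the ObsPy result, replace the placeholder `s_lat` and `s_lon` with the values read from `inv`.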
