### Raw Data Download Notebook

Zali et al. 2024 pulls horizontal seismic data from a single horizontal component from a single station, demeaned and detrended but not converted from instrument response. The data is pulled from March 12 to June 24, ~104 days, beginning 7 days before the Geldingadalir eruption began. The data is also downsampled to the minimum sampling rate necessary to observe the local volcanic tremor.

*Important Note:* This notebook runs using a different environment than this project's computational notebook. ObsPy is required, along with numpy and matplotlib

This notebook downloads 60 days of data, beginning 09/01/2024, from the HV.STCD sensor at the of the Hawaiian Volcano Observatory's network, located on the island of Hawaii.

> Magma intruded beneath the ground near Makaopuhi Crater—a well-known magma storage region on Kīlauea’s middle East Rift Zone on September 14. HVO published a Status Report alerting the public and partners to the activity, which was accompanied by hundreds of earthquakes and ground deformation. 

https://www.usgs.gov/observatories/hvo/science/eruption-kilauea-middle-east-rift-zone

In [15]:
import obspy
from obspy import UTCDateTime as utc
from obspy.clients.fdsn import Client

import numpy as np
import os

### Data Description

The seismic data is downloaded from the IRIS/Earthscope Database using ObsPy. The data downloads as a trace, an array of data with attached metadata, which is then packaged into a stream, which can contain multiple traces. The daily seismic data records are then saved as [mseed files](https://ds.iris.edu/ds/nodes/dmc/data/formats/miniseed/), which preserve this data+metadata structure, but requires ObsPy or other specialized software to open.

Set all of your variables for the station:

In [16]:
client = Client('IRIS')

#creating variables to download data
#starttime = utc('2024-08-15T00:00:00')
starttime = utc('2024-09-01T00:00:00')
endtime = starttime + 60 * (60*60*24)
day_range = 60


#also add a buffer to both ends to chop off once the data has been filtered
#and downsampled, kind of arbitrary length, 5% of a day (default ObsPy taper length)
buffer = 60*60*24*0.05 #seconds

net = 'HV'
sta = 'STCD'
loc = '*' #wildcard, generally don't care about location code
cha = 'HHE' #horizontal component, as used in Zali et al


In [17]:
# use obspy to display the station and channel information
inv = client.get_stations(network=net, station=sta, location=loc, channel=cha, starttime=starttime, endtime=endtime, level='response')
print(inv)

# display all of the channels in this station
for network in inv:
    for station in network:
        print(channel)


Inventory created at 2024-11-24T03:50:32.207900Z
	Created by: IRIS WEB SERVICE: fdsnws-station | version: 1.1.52
		    http://service.iris.edu/fdsnws/station/1/query?starttime=2024-09-01...
	Sending institution: IRIS-DMC (IRIS-DMC)
	Contains:
		Networks (1):
			HV
		Stations (1):
			HV.STCD (Steam Cracks Hawaii Digital)
		Channels (1):
			HV.STCD..HHE
Channel 'HNZ', Location '' (Upgrade equipment)
	Time range: 2024-08-02T00:00:00.000000Z - --
	Latitude: 19.3849, Longitude: -155.1254, Elevation: 769.0 m, Local Depth: 0.0 m
	Azimuth: 0.00 degrees from north, clockwise
	Dip: -90.00 degrees down from horizontal
	Channel types: CONTINUOUS, GEOPHYSICAL
	Sampling Rate: 100.00 Hz
	Sensor (Description): NANOMETRICS (Accelerometer)
	Response information available


In [18]:
#create folder for numpy streams to go into and initialize filepath
!mkdir data
!mkdir data/raw
filepath = os.getcwd() + '/data/raw/'

#create arrays to save dates
dates = np.array([])

print(f'Downloading data from {starttime} to {endtime}')

#download the data piecemeal, here by day
for day in range(day_range):
    
    tr_length = 24*60*60
    
    # Format the current date as YYYYMMDD for the filename
    date_str = starttime.strftime('%Y%m%d')
    filename = f"{date_str}_{net}{sta}.mseed"
    
    # if file already exists, then skip
    if os.path.exists(filepath + filename,):
        print(f'Data for day {date_str} already exists, skipping')
        starttime += tr_length
    else:
        try:
            # actually downloading
            st = client.get_waveforms(network=net,
                                    station=sta,
                                    location=loc,
                                    channel=cha,
                                    starttime=starttime-buffer,
                                    endtime=starttime+buffer+tr_length)

            # instrument sampling rate (hz)
            freq = st[0].stats.sampling_rate

            # merge traces within stream, linearly interpolating any gaps
            st.merge(fill_value='interpolate')

            ## Save data as MSEED, standard for storing seismic data
            st.write(filepath + filename, format='MSEED')

            # adding date
            dates = np.append(dates, starttime.date)

            print(f'Downloaded day {day + 1}')
            
        except Exception as e:
            print(f'Error downloading data for day {day + 1}: {e}')
        
    starttime += tr_length

#save dates list for future use
np.save(filepath+'date_list.csv', dates)

print(f'data download complete, saved to {filepath}')

Downloading data from 2024-09-01T00:00:00.000000Z to 2024-10-31T00:00:00.000000Z
Downloaded day 1
Downloaded day 2
Downloaded day 3
Downloaded day 4
Downloaded day 5
Downloaded day 6
Downloaded day 7
Downloaded day 8
Downloaded day 9
Downloaded day 10
Downloaded day 11
Downloaded day 12
Downloaded day 13
Downloaded day 14
Downloaded day 15
Downloaded day 16
Downloaded day 17
Downloaded day 18
Downloaded day 19
Downloaded day 20
Downloaded day 21
Downloaded day 22
Downloaded day 23
Downloaded day 24
Downloaded day 25
Downloaded day 26
Downloaded day 27
Downloaded day 28
Downloaded day 29
Downloaded day 30
Downloaded day 31
Downloaded day 32
Downloaded day 33
Downloaded day 34
Downloaded day 35
Downloaded day 36
Downloaded day 37
Downloaded day 38
Downloaded day 39
Downloaded day 40
Downloaded day 41
Downloaded day 42
Downloaded day 43
Downloaded day 44
Downloaded day 45
Downloaded day 46
Downloaded day 47
Downloaded day 48
Downloaded day 49
Downloaded day 50
Downloaded day 51
Downloaded

### Data Modalities and Formats

#### Data Modalities
The dataset consists of seismic data collected from a single horizontal component (HHE) of the HV.UWB. The data is recorded continuously over a period of 60 days, capturing the seismic activity before, during, and after the volcanic eruption.

#### Data Formats
1. **MSEED (Mini-SEED) Files**:
    - The seismic data is stored in Mini-SEED format, which is a compact binary format used for storing time series data. Each file contains a day's worth of seismic data, including metadata such as the sampling rate and station information.
    - Example file name: `1_YYYYMMDD_hvuwb.mseed`, `2_YYYYMMDD_hwuwb.mseed`, ..., `60_YYYYMMDD_hvuwb.mseed`.
2. **Numpy Arrays**:
    - The dates corresponding to each day's seismic data are stored in a numpy array and saved as a CSV file (`date_list.csv`). This array helps in mapping the MSEED files to their respective dates.
    - Example: `dates = np.array([datetime.date(2021, 5, 16), datetime.date(2021, 5, 17), ...])`.

#### Data Processing
- The raw seismic data is downloaded using the ObsPy library from the IRIS/Earthscope Database.
- The data is merged and interpolated to fill any gaps, ensuring a continuous time series.
- The processed data is saved in the MSEED format, preserving both the time series and metadata.

This structured approach ensures that the seismic data is well-organized and easily accessible for further analysis and processing.

In [19]:
from obspy import read
import glob
import os

# Folder containing mseed files
#folder_path = '/path/to/your/folder'

filepath = os.getcwd() + '/data/raw/'

# Get the first mseed file in the folder
first_file = glob.glob(f"{filepath}/*.mseed")[0]

# Read and display the file
st = read(first_file)
print(st)

1 Trace(s) in Stream:
HV.STCD..HHE | 2024-10-19T22:48:00.000000Z - 2024-10-21T01:12:00.000000Z | 100.0 Hz, 9504001 samples
