### Raw Data Download Notebook

*Important Note:* This notebook runs in a different environment than this project's computational notebook. ObsPy is required, along with NumPy and Matplotlib.

Zali et al. (2024) use seismic data from a single horizontal component of a single station, demeaned and detrended but not corrected for instrument response. Their data span March 12 to June 24 (~104 days), beginning 7 days before the Geldingadalir eruption began. The data are also downsampled to the minimum sampling rate necessary to observe the local volcanic tremor.
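
As a rough illustration of the preprocessing described above, the sketch below removes the mean and a linear trend from a synthetic signal with NumPy and derives an integer decimation factor from the Nyquist criterion. The function names, the 100 Hz input rate, and the 5 Hz target frequency are assumptions chosen for illustration, not values taken from Zali et al.; on real traces, ObsPy's `Trace.detrend` and `Trace.decimate` would normally do this work.

```python
import numpy as np

def demean_and_detrend(data):
    """Remove the mean and a least-squares linear trend from a 1-D signal."""
    data = np.asarray(data, dtype=float)
    t = np.arange(data.size)
    # Fitting a line with an intercept removes the mean and the trend together
    slope, intercept = np.polyfit(t, data, 1)
    return data - (slope * t + intercept)

def decimation_factor(sampling_rate, max_target_frequency):
    """Largest integer factor that keeps the new Nyquist frequency
    at or above max_target_frequency (never less than 1)."""
    min_rate = 2.0 * max_target_frequency
    return max(1, int(sampling_rate // min_rate))

# Synthetic example (assumed values): offset + drift + 1 Hz oscillation at 100 Hz
rate = 100.0
t = np.arange(0, 10, 1 / rate)
raw = 5.0 + 0.3 * t + np.sin(2 * np.pi * 1.0 * t)

clean = demean_and_detrend(raw)
factor = decimation_factor(rate, max_target_frequency=5.0)
print(f'decimate by {factor}, new sampling rate {rate / factor} Hz')
```

Decimating without a low-pass filter first would alias energy above the new Nyquist frequency into the band of interest, which is why the filter and downsample steps belong together.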

------------------

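This notebook is designed to download 100 days of data, beginning 06/24/21 and ending 10/02/21, from sensor 9F.HOPS of the Iceland Reykjanes experiment 2021, near the eruptive site at Fagradalsfjall on the Reykjanes peninsula. The per-day download cell below is currently set to `NUM_DAYS = 30`; raise it to 100 to cover the full period.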

![HOPS station information](HOPS_info.png)

In [1]:
import obspy
from obspy import UTCDateTime as utc
from obspy.clients.fdsn import Client

import numpy as np
import os

In [2]:
def process(st, buffer, freq, max_target_frequency): #freq is the original sampling frequency
    st.merge(fill_value='interpolate')
    tr = st[0].copy()

### Data Description

The seismic data were obtained from the [GEOFON data centre of the GFZ German Research Centre for Geosciences](https://geofon.gfz-potsdam.de/) using ObsPy. Each download arrives as a stream, a container that can hold multiple traces; a trace is an array of samples with attached metadata. The daily records are then saved as mseed files, which preserve this data-plus-metadata structure but require ObsPy or other specialized software to open.

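TODO:
wrap the download call in a try/except so that a day with no data does not stop the loop;
ObsPy raises `FDSNNoDataException` when the server has no data for a request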
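
One possible shape for that try/except is sketched below. The helper name `fetch_day` and its parameters are hypothetical, and the `ImportError` fallback exists only so the snippet can be run without ObsPy installed; in this notebook's environment the real `FDSNNoDataException` from `obspy.clients.fdsn.header` would be used.

```python
try:
    from obspy.clients.fdsn.header import FDSNNoDataException
except ImportError:
    # Stand-in so the sketch runs without ObsPy; not part of the real library
    class FDSNNoDataException(Exception):
        pass

def fetch_day(client, day_start, day_seconds=24 * 60 * 60, buffer=0.0, **query):
    """Return the stream for one day, or None if the server has no data.

    `client` is anything with a get_waveforms(...) method, such as an
    ObsPy FDSN Client; `query` carries network, station, location, channel.
    """
    try:
        return client.get_waveforms(
            starttime=day_start - buffer,
            endtime=day_start + day_seconds + buffer,
            **query,
        )
    except FDSNNoDataException:
        # Report and skip the day instead of aborting the whole download
        print(f'no data available for {day_start}, skipping')
        return None
```

In the download loop, a `None` return would be the signal to advance `starttime` and `continue` without writing a file for that day.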

In [3]:
# DOWNLOAD DATA INTO INDIVIDUAL DAY FILES #

client = Client('GEOFON')

NUM_DAYS = 30 #number of days to download

#creating variables to download data
starttime = utc('2021-06-24T00:00:00')
endtime = starttime + NUM_DAYS * (60*60*24)

#also add a buffer to both ends to chop off once the data has been filtered
#and downsampled, kind of arbitrary length, 5% of a day (default ObsPy taper length)
buffer = 60*60*24*0.05 #seconds

net = '9F*'
sta = 'HOPS'
loc = '*' #wildcard, generally don't care about location code
cha = 'HHE' #download east-west component

#create output folders for the daily mseed files (no error if they already exist)
os.makedirs(os.path.join('data', 'raw'), exist_ok=True)
filepath = os.getcwd() + '/data/raw/'

#create arrays to save dates
dates = np.array([])

print(f'Downloading data from {starttime} to {endtime}')

#download the data piecemeal, here by day
for day in range(NUM_DAYS):
    print(f'downloading data for day {day+1}')
    tr_length = 24*60*60

    #actually downloading
    st = client.get_waveforms(
        network=net,
        station=sta,
        location=loc,
        channel=cha,
        starttime=starttime - buffer,
        endtime=starttime + buffer + tr_length
    )

    #instrument sampling rate (hz)
    freq = st[0].stats.sampling_rate

    #merge traces within stream, linearly interpolating any gaps
    st.merge(fill_value='interpolate')
    
    #generate filename, day number in front for convenience of reading in
    name = str(day+1)+'_9fhops.mseed'
    
    #save data as mseed, standard for storing seismic data. Preserves metadata and time series info
    st.write(filepath+name, format='MSEED')

    #adding date
    dates = np.append(dates, starttime.date)

    starttime += tr_length

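#save dates list for future use
#note: np.save appends '.npy', so the file written is date_list.csv.npy (a pickled
#object array, not text); load it with np.load(..., allow_pickle=True)
np.save(filepath+'date_list.csv', dates)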

print(f'data download complete, saved to {filepath}')

mkdir: cannot create directory ‘data’: File exists
mkdir: cannot create directory ‘data/raw’: File exists
Downloading data from 2021-06-24T00:00:00.000000Z to 2021-07-24T00:00:00.000000Z
downloading data for day 1
downloading data for day 2
downloading data for day 3
downloading data for day 4
downloading data for day 5
downloading data for day 6
downloading data for day 7
downloading data for day 8
downloading data for day 9
downloading data for day 10
downloading data for day 11
downloading data for day 12
downloading data for day 13
downloading data for day 14
downloading data for day 15
downloading data for day 16
downloading data for day 17
downloading data for day 18
downloading data for day 19
downloading data for day 20


KeyboardInterrupt: 

In [None]:
# DOWNLOAD DATA INTO SINGLE FILE #

# client = Client('GEOFON')

# NUM_DAYS = 10 #number of days to download

# #creating variables to download data
# starttime = utc('2021-06-24T00:00:00')
# endtime = starttime + NUM_DAYS * (60*60*24)

# #also add a buffer to both ends to chop off once the data has been filtered
# #and downsampled, kind of arbitrary length, 5% of a day (default ObsPy taper length)
# buffer = 60*60*24*0.05 #seconds

# net = '9F*'
# sta = 'HOPS'
# loc = '*' #wildcard, generally don't care about location code
# cha = 'HHE' #download east-west component

# #create folder for numpy streams to go into and initialize filepath
# !mkdir data
# !mkdir data/raw
# filepath = os.getcwd() + '/data/raw/'

# #create arrays to save dates
# dates = np.array([])

# print(f'Downloading data from {starttime} to {endtime}')

# #download the data piecemeal, here by day
# # Initialize an empty stream to hold the entire dataset
# full_stream = obspy.Stream()

# for day in range(NUM_DAYS):
#     print(f'downloading data for day {day+1}')
#     tr_length = 24*60*60

#     # Actually downloading
#     st = client.get_waveforms(
#         network=net,
#         station=sta,
#         location=loc,
#         channel=cha,
#         starttime=starttime - buffer,
#         endtime=starttime + buffer + tr_length
#     )

#     # Merge traces within stream, linearly interpolating any gaps
#     st.merge(fill_value='interpolate')
    
#     # Append the day's stream to the full stream
#     full_stream += st

#     # Adding date
#     dates = np.append(dates, starttime.date)

#     starttime += tr_length

# # Save the full stream as a single mseed file
# full_stream.write(filepath + 'full_9fhops.mseed', format='MSEED')

# #save dates list for future use
# np.save(filepath+'date_list.csv', dates)

# print(f'data download complete, saved to {filepath}')

### Data Modalities and Formats

#### Data Modalities
The dataset consists of seismic data collected from a single horizontal component (HHE) of the 9F.HOPS station located on the Reykjanes Peninsula in southwest Iceland. The data are recorded continuously; the full 100-day window begins on 24 June 2021, after the eruption was already under way, so it does not include pre-eruption activity.

#### Data Formats
1. **MSEED (Mini-SEED) Files**:
    - The seismic data is stored in Mini-SEED format, which is a compact binary format used for storing time series data. Each file contains one day of data plus the 5% buffer (72 minutes) on either end, along with metadata such as the sampling rate and station information.
    - Example file names: `1_9fhops.mseed`, `2_9fhops.mseed`, ..., `100_9fhops.mseed`.
2. **Numpy Arrays**:
    - The dates corresponding to each day's seismic data are collected in a numpy array and saved with `np.save` under the name `date_list.csv`. Because `np.save` appends `.npy`, the file on disk is `date_list.csv.npy`, a binary NumPy file rather than a text CSV. This array helps in mapping the MSEED files to their respective dates.
    - Example: `dates = np.array([datetime.date(2021, 6, 24), datetime.date(2021, 6, 25), ...])`.
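
The behaviour of `np.save` with a name like `date_list.csv` can be checked with a short round trip. The sketch below uses a temporary directory and two example dates as stand-ins for the notebook's `dates` array; the plain-text alternative with `np.savetxt` and the `date_list.txt` name are suggestions for illustration, not what the download cell currently does.

```python
import datetime
import os
import tempfile

import numpy as np

# Two example dates standing in for the notebook's `dates` array
dates = np.array([datetime.date(2021, 6, 24), datetime.date(2021, 6, 25)])

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, 'date_list.csv')
    np.save(target, dates)          # NumPy appends '.npy' to the file name
    saved = os.listdir(tmp)

    # The array holds Python objects, so loading requires allow_pickle=True
    loaded = np.load(target + '.npy', allow_pickle=True)

    # Alternative: write ISO date strings as plain text, one per line
    text_path = os.path.join(tmp, 'date_list.txt')
    np.savetxt(text_path, dates.astype(str), fmt='%s')
    with open(text_path) as f:
        lines = f.read().split()

print(saved)
print(loaded)
print(lines)
```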

#### Data Processing
- The raw seismic data is downloaded using the ObsPy library from the GEOFON data centre.
- The data is merged and interpolated to fill any gaps, ensuring a continuous time series.
- The processed data is saved in the MSEED format, preserving both the time series and metadata.

This structured approach ensures that the seismic data is well-organized and easily accessible for further analysis and processing.
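
To make the file layout concrete, the sketch below maps a day number to its file name and to the time window that file should cover, using the start date and 5% buffer from the download cell. The helper name `file_window` is hypothetical, and plain `datetime` objects stand in for ObsPy's `UTCDateTime`.

```python
from datetime import datetime, timedelta

START = datetime(2021, 6, 24)        # first day downloaded
BUFFER_S = 60 * 60 * 24 * 0.05       # 4320 s (72 min) on each end

def file_window(day_number):
    """Return (file name, window start, window end) for a 1-based day number."""
    day_start = START + timedelta(days=day_number - 1)
    window_start = day_start - timedelta(seconds=BUFFER_S)
    window_end = day_start + timedelta(days=1, seconds=BUFFER_S)
    return f'{day_number}_9fhops.mseed', window_start, window_end

for n in (1, 100):
    name, begin, end = file_window(n)
    print(name, begin.isoformat(), end.isoformat())
```

Because consecutive files overlap by twice the buffer, the buffer should be trimmed from each day before the days are concatenated.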

In [None]:
# display the metadata (stats) of the most recently downloaded trace
print(f'Example metadata from {name}')
print(st[0].stats)
print()

# display the samples of that trace as a numpy array
print(f'Example samples from {name}')
print(st[0].data)
print()