# ORFEUS Webinar

Date: 19th of June 2024

Topic: EIDAWS-WFCatalog and the FDSNWS-Availability.

## Agenda

1. EIDAWS-WFCatalog recap
1. FDSNWS-Availability introduction
1. Example workflow using FDSNWS-Availability
1. FDSNWS-Availability client demo

## EIDAWS-WFCatalog recap

EIDAWS-WFCatalog provides **detailed information on the contents of waveform data** including quality control parameters.

Information is available on sample metrics, record header flags, and timing quality; refer to [orfeus-eu.org/data/eida/webservices/wfcatalog](https://orfeus-eu.org/data/eida/webservices/wfcatalog/) for details.

EIDAWS-WFCatalog can serve as an index for data discovery because it supports range filtering on all available metrics:

- Quadratic mean
- Standard deviation
- Minimum
- Maximum
- Availability
- Gaps
- Sum of gaps
- Overlaps
- Sum of overlaps
- Median
- Mean
- Lower quartile
- Upper quartile
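
A range filter can be expressed directly in the query string. The sketch below only builds the URL and sends nothing. It assumes the comparison-suffix convention (`_eq`, `_ne`, `_gt`, `_ge`, `_lt`, `_le` appended to a metric name such as `percent_availability`), and the helper name `build_wfcatalog_url` is hypothetical. Check the exact parameter names against the WFCatalog documentation before relying on them.

```python
from urllib.parse import urlencode

# Base URL as used in the example request in this document
WFCATALOG_URL = "https://orfeus-eu.org/eidaws/wfcatalog/1/query"


def build_wfcatalog_url(base_url, selection, metric_ranges):
    """Build a query URL from a stream selection and metric range filters.

    metric_ranges maps a metric name to a list of (operator, value) pairs.
    The operator suffixes (eq, ne, gt, ge, lt, le) are an ASSUMPTION here.
    """
    params = dict(selection)
    for metric, bounds in metric_ranges.items():
        for operator, value in bounds:
            params[f"{metric}_{operator}"] = value
    return f"{base_url}?{urlencode(params)}"


url = build_wfcatalog_url(
    WFCATALOG_URL,
    {
        "network": "NL",
        "station": "HGN",
        "channel": "BHZ",
        "start": "2024-01-01",
        "end": "2024-02-01",
    },
    # Hypothetical filter: days with at least 99 % availability and at most one gap
    {"percent_availability": [("ge", 99)], "num_gaps": [("le", 1)]},
)
print(url)
```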

Example (`NL.HGN.02.BHZ.D.2024.001`)

URL: [orfeus-eu.org/eidaws/wfcatalog/1/query?network=NL&station=HGN&channel=BHZ&start=2024-01-01&end=2024-01-02&include=all](https://orfeus-eu.org/eidaws/wfcatalog/1/query?network=NL&station=HGN&channel=BHZ&start=2024-01-01&end=2024-01-02&include=all)

Response:
```json
[
    {
        "version": "1.0.0",
        "producer": {
            "name": "ORFEUS ODC/KNMI",
            "agent": "ObsPy mSEED-QC",
            "created": "2024-01-08T05:59:13.395Z"
        },
        "station": "HGN",
        "network": "NL",
        "location": "02",
        "channel": "BHZ",
        "num_gaps": 1,
        "num_overlaps": 0,
        "sum_gaps": 1.819538,
        "sum_overlaps": 0,
        "max_gap": 1.819538,
        "max_overlap": null,
        "record_length": [
            512
        ],
        "sample_rate": [
            40
        ],
        "percent_availability": 99.99789405324074,
        "encoding": [
            "STEIM2"
        ],
        "num_records": 9476,
        "start_time": "2024-01-01T00:00:00.000Z",
        "end_time": "2024-01-02T00:00:00.000Z",
        "format": "miniSEED",
        "quality": "D",
        "sample_min": -542404,
        "sample_max": 516487,
        "sample_mean": 682.47420721728,
        "sample_median": 655,
        "sample_stdev": 22389.627479927443,
        "sample_rms": 22400.02655653422,
        "sample_lower_quartile": -798,
        "sample_upper_quartile": 2112,
        "miniseed_header_percentages": {
            "timing_quality_mean": 99.93035035880118,
            "timing_quality_median": 100,
            "timing_quality_lower_quartile": 100,
            "timing_quality_upper_quartile": 100,
            "timing_quality_min": 80,
            "timing_quality_max": 100,
            "timing_correction": 0,
            "io_and_clock_flags": {
                "short_record_read": 0,
                "station_volume": 0,
                "start_time_series": 0,
                "end_time_series": 0,
                "clock_locked": 99.99789405310595
            },
            "data_quality_flags": {
                "amplifier_saturation": 0,
                "digitizer_clipping": 0,
                "spikes": 0,
                "glitches": 0,
                "missing_padded_data": 0,
                "telemetry_sync_error": 0,
                "digital_filter_charging": 0,
                "suspect_time_tag": 0
            },
            "activity_flags": {
                "calibration_signal": 0,
                "time_correction_applied": 0,
                "event_begin": 0.5072337940887168,
                "event_end": 0,
                "positive_leap": 0,
                "negative_leap": 0
            }
        }
    }
]
```
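
The gap and availability figures in this response appear to be tied together: the availability looks like the share of the requested window that is not covered by gaps. The sketch below recomputes `percent_availability` from `start_time`, `end_time`, and `sum_gaps`, using values copied from the response. The formula is an assumption that has only been checked against this one record, and `availability_from_gaps` is a hypothetical helper name.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"

# Values copied from the sample response above
record = {
    "start_time": "2024-01-01T00:00:00.000Z",
    "end_time": "2024-01-02T00:00:00.000Z",
    "sum_gaps": 1.819538,
    "percent_availability": 99.99789405324074,
}


def availability_from_gaps(start_time, end_time, sum_gaps):
    """Return the percentage of the window that is not covered by gaps."""
    start = datetime.strptime(start_time, TIME_FORMAT)
    end = datetime.strptime(end_time, TIME_FORMAT)
    window_seconds = (end - start).total_seconds()
    return 100.0 * (window_seconds - sum_gaps) / window_seconds


computed = availability_from_gaps(
    record["start_time"], record["end_time"], record["sum_gaps"]
)
print(computed, record["percent_availability"])
```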

EIDAWS-WFCatalog uses ObsPy's `MSEEDMetadata` at its core:

In [None]:
from obspy.signal.quality_control import MSEEDMetadata

md = MSEEDMetadata(["NL.HGN.02.BHZ.D.2024.001"])
md.meta

## FDSNWS-Availability introduction

Definition from [fdsnws-availability-1.0.pdf](https://fdsn.org/webservices/fdsnws-availability-1.0.pdf):

> web service interface for the exchange of time series data availability

The service exposes the following methods:

- `/query` (full resolution time series listings with an option to merge overlapping time spans)
- `/queryauth` (optional, same as `/query`, but authenticated)
- `/extent` (time series listings with only the earliest and latest data available)
- `/extentauth` (optional, same as `/extent`, but authenticated)
- `/version`

Deployed across EIDA in 2023/2024.

For the complete specification, including the list of available parameters, refer to [fdsnws-availability-1.0.pdf](https://fdsn.org/webservices/fdsnws-availability-1.0.pdf).
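
As a rough illustration of how the `/query` and `/extent` methods are addressed, the sketch below assembles request URLs from keyword parameters. It sends nothing over the network. The helper name `availability_url` is hypothetical, and `merge=overlap` is shown as one way to request merged overlapping time spans; consult the specification for the authoritative parameter list.

```python
from urllib.parse import urlencode

# Service root as used in the examples of this document
BASE_URL = "https://orfeus-eu.org/fdsnws/availability/1"


def availability_url(method, **params):
    """Return the URL for an unauthenticated availability request."""
    if method not in {"query", "extent"}:
        raise ValueError(f"unsupported method: {method}")
    return f"{BASE_URL}/{method}?{urlencode(params)}"


# Full-resolution listing for one channel, with overlapping spans merged
query_url = availability_url(
    "query",
    net="NA",
    sta="SABA",
    cha="BHZ",
    start="2024-01-01",
    end="2024-01-02",
    merge="overlap",
    format="json",
)

# Earliest/latest summary for a whole network
extent_url = availability_url("extent", net="NA", start="2024-01-01")

print(query_url)
print(extent_url)
```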

### Example outputs



#### `/query`

URL: [orfeus-eu.org/fdsnws/availability/1/query?net=NA&start=2024-01-01](https://orfeus-eu.org/fdsnws/availability/1/query?net=NA&start=2024-01-01)

```
#Network Station Location Channel Quality SampleRate Earliest                    Latest                     
NA       SABA             BHE     D       40.0       2024-01-01T00:00:03.994000Z 2024-01-02T00:00:00.000000Z
NA       SABA             BHE     D       40.0       2024-01-02T00:00:01.944000Z 2024-01-03T00:00:00.000000Z
(...)
NA       SABN             BHE     D       40.0       2024-01-01T23:29:23.074000Z 2024-01-02T00:00:00.000000Z
NA       SABN             BHE     D       40.0       2024-01-02T00:00:22.024000Z 2024-01-02T01:58:10.824000Z
(...)
NA       SABQ             BHE     D       40.0       2024-01-01T00:00:02.199000Z 2024-01-02T00:00:00.000000Z
NA       SABQ             BHE     D       40.0       2024-01-02T00:00:00.199000Z 2024-01-03T00:00:00.000000Z
(...)
```
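
The `text` layout is column-aligned, and in the listing above the location code is blank, so a naive whitespace split yields one field fewer than the header has. The sketch below shows one way to cope with that. It assumes that `Location` is the only column that can be empty; `parse_availability_text` is a hypothetical helper, not part of any library.

```python
# Two rows copied from the /query listing above
SAMPLE = """\
#Network Station Location Channel Quality SampleRate Earliest                    Latest
NA       SABA             BHE     D       40.0       2024-01-01T00:00:03.994000Z 2024-01-02T00:00:00.000000Z
NA       SABN             BHE     D       40.0       2024-01-01T23:29:23.074000Z 2024-01-02T00:00:00.000000Z
"""


def parse_availability_text(text):
    """Parse column-aligned availability text into a list of dicts."""
    header = None
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("#"):
            header = line.lstrip("#").split()
            continue
        if header is None:
            raise ValueError("no header line found before the data rows")
        fields = line.split()
        if len(fields) == len(header) - 1:
            # ASSUMPTION: the missing field is the blank location code
            fields.insert(header.index("Location"), "")
        if len(fields) != len(header):
            raise ValueError(f"unexpected number of fields: {line!r}")
        rows.append(dict(zip(header, fields)))
    return rows


rows = parse_availability_text(SAMPLE)
for row in rows:
    print(row["Station"], repr(row["Location"]), row["Channel"], row["Earliest"])
```

Requesting `format=geocsv` or `format=json` avoids the ambiguity altogether, since both delimit fields explicitly.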

#### `/extent`

URL: [orfeus-eu.org/fdsnws/availability/1/extent?net=NA&start=2024-01-01](https://orfeus-eu.org/fdsnws/availability/1/extent?net=NA&start=2024-01-01)

```
#Network Station Location Channel Quality SampleRate Earliest                    Latest                      Updated              TimeSpans Restriction
NA       SABA             BHE     D       40.0       2024-01-01T00:00:03.994000Z 2024-06-16T00:00:00.000000Z 2024-06-16T07:36:02Z 19        OPEN       
NA       SABA             BHN     D       40.0       2024-01-01T00:00:04.119000Z 2024-06-16T00:00:00.000000Z 2024-06-16T07:36:38Z 19        OPEN       
(...)
NA       SABN             BHE     D       40.0       2024-01-01T23:29:23.074000Z 2024-04-09T00:00:54.575000Z 2024-04-10T07:35:27Z 5         OPEN       
NA       SABN             BHN     D       40.0       2024-01-01T23:29:04.449000Z 2024-05-19T00:01:02.500000Z 2024-05-20T07:32:01Z 5         OPEN       
(...)
NA       SABQ             BHE     D       40.0       2024-01-01T00:00:02.199000Z 2024-06-16T00:00:00.000000Z 2024-06-16T07:31:00Z 9         OPEN       
NA       SABQ             BHN     D       40.0       2024-01-01T00:00:05.674000Z 2024-06-16T00:00:00.000000Z 2024-06-16T07:31:55Z 9         OPEN       
(...)
```

The output format defaults to `text`, but data can also be requested as `geocsv` or `json`:

URL: [orfeus-eu.org/fdsnws/availability/1/query?net=NA&sta=SABA&channel=BHZ&start=2024-01-01&end=2024-01-02&format=json](https://orfeus-eu.org/fdsnws/availability/1/query?net=NA&sta=SABA&channel=BHZ&start=2024-01-01&end=2024-01-02&format=json)

```json
{
  "created": "2024-06-17T06:49:47Z",
  "version": 1,
  "datasources": [
    {
      "network": "NA",
      "station": "SABA",
      "location": "",
      "channel": "BHZ",
      "quality": "D",
      "samplerate": 40,
      "timespans": [
        [
          "2024-01-01T00:00:01.944000Z",
          "2024-01-02T00:00:00.000000Z"
        ],
        [
          "2024-01-02T00:00:04.669000Z",
          "2024-01-02T00:00:00.000000Z"
        ]
      ]
    }
  ]
}
```
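
The JSON layout is convenient for computing how much data each stream actually covers. The sketch below sums the length of the time spans per data source, using a trimmed copy of the response above. Note that the second time span in that sample ends before it starts; the sketch skips such spans rather than counting negative time, which is a design choice of this example and not behaviour defined by the service. `covered_seconds` is a hypothetical helper name.

```python
import json
from datetime import datetime

TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"

# Trimmed copy of the JSON response shown above
SAMPLE = """
{
  "datasources": [
    {
      "network": "NA", "station": "SABA", "location": "", "channel": "BHZ",
      "timespans": [
        ["2024-01-01T00:00:01.944000Z", "2024-01-02T00:00:00.000000Z"],
        ["2024-01-02T00:00:04.669000Z", "2024-01-02T00:00:00.000000Z"]
      ]
    }
  ]
}
"""


def covered_seconds(datasource):
    """Sum the duration of all time spans, ignoring non-positive ones."""
    total = 0.0
    for start, end in datasource["timespans"]:
        span = datetime.strptime(end, TIME_FORMAT) - datetime.strptime(
            start, TIME_FORMAT
        )
        if span.total_seconds() > 0:
            total += span.total_seconds()
    return total


document = json.loads(SAMPLE)
coverage = {
    ".".join([d["network"], d["station"], d["location"], d["channel"]]): covered_seconds(d)
    for d in document["datasources"]
}
print(coverage)
```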

## Example workflow using FDSNWS-Availability

1. Retrieve seismic event information from the FDSNWS-Event catalogue offered by KNMI (the values below match the configuration used in the code that follows):
    * Date:
        * 📅 start date = 2024-01-01
        * 📅 end date = 2024-06-01
    * Event characteristics:
        * 🎚️ minimum magnitude = 2.1
    * Coordinates:
        * 🌐 minimum latitude = 52°N
        * 🌐 maximum latitude = 55°N
        * 🌐 minimum longitude = 4°E
        * 🌐 maximum longitude = 8°E
1. Using [FDSNWS-Station](https://www.orfeus-eu.org/data/eida/webservices/station/) web service, list all stations available in the region defined using epicentral distance.
1. Using the [FDSNWS-Availability](https://www.orfeus-eu.org/data/eida/webservices/availability/) web service, exclude stations with missing data.
1. Using the [FDSNWS-Dataselect](https://www.orfeus-eu.org/data/eida/webservices/dataselect/) web service, download miniSEED files containing the waveforms from a time window defined using theoretical seismic wave arrival times.

### Imports

In [None]:
import io
from datetime import timedelta

import pandas as pd
import requests
from obspy import UTCDateTime
from obspy.clients.fdsn.client import Client
from obspy.geodetics import locations2degrees
from obspy.taup import TauPyModel

### Config

In [None]:
# Define start and end dates for the event search
START = "2024-01-01"
END = "2024-06-01"

# Define event minimum magnitude and bounding box for event search
MAG_MIN = 2.1
LAT_MIN = 52
LAT_MAX = 55
LON_MIN = 4
LON_MAX = 8

# Max radius of stations from the epicenter (degrees)
RADIUS_MAX = 0.25

# FDSNWS-Availability URL
FDSNWS_AVAILABILITY_URL = "https://orfeus-eu.org/fdsnws/availability/1/query"
# Encoding for the HTTP requests
ENCODING = "utf-8"

In [None]:
# Global instances
CLIENT_ODC = Client("ODC")
CLIENT_KNMI = Client("KNMI")

# We use the iasp91 reference model
TAUP_MODEL = TauPyModel(model="iasp91")

### Define helping functions

In [None]:
def getPArrival(event, station):
    """Calculate the theoretical P-wave arrival time at a station.

    Args:
        event (object): ObsPy event object
        station (object): ObsPy station object

    Returns:
        UTCDateTime: Theoretical P-arrival time
    """
    # Determine the great-circle distance between epicentre and station;
    # locations2degrees expects (lat1, long1, lat2, long2)
    arcDistanceDegrees = locations2degrees(
        event.origins[0].latitude,
        event.origins[0].longitude,
        station.latitude,
        station.longitude,
    )

    # Calculate the theoretical P-arrival time
    arrivals = TAUP_MODEL.get_travel_times(
        source_depth_in_km=1e-3 * event.origins[0].depth,
        distance_in_degree=arcDistanceDegrees,
        phase_list=["P"],
    )

    # Add the theoretical P-wave travel time to the event origin time
    return UTCDateTime(event.origins[0].time) + arrivals[0].time
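
ObsPy's `locations2degrees` takes its arguments in the order `(lat1, long1, lat2, long2)`, and mixing up latitudes and longitudes silently produces a wrong distance instead of an error. The standalone sketch below uses a plain haversine formula on a spherical Earth (a stand-in for illustration, not ObsPy's implementation) to show how large the difference can be. The coordinates are made-up values roughly inside the search box, and `great_circle_degrees` is a hypothetical helper.

```python
import math


def great_circle_degrees(lat1, lon1, lat2, lon2):
    """Great-circle distance in degrees on a sphere (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    # Clamp to 1.0 to guard against rounding just above the valid range
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


# Made-up event and station coordinates (latitude, longitude)
event_lat, event_lon = 53.0, 6.0
station_lat, station_lon = 52.0, 5.0

correct = great_circle_degrees(event_lat, event_lon, station_lat, station_lon)
# Same numbers, but passed as (lat1, lat2, lon1, lon2)
swapped = great_circle_degrees(event_lat, station_lat, event_lon, station_lon)

print(f"correct order: {correct:.2f} deg, swapped order: {swapped:.2f} deg")
```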

### Event catalogue

In [None]:
catalog = CLIENT_KNMI.get_events(
    starttime=START,
    endtime=END,
    minmagnitude=MAG_MIN,
    minlatitude=LAT_MIN,
    maxlatitude=LAT_MAX,
    minlongitude=LON_MIN,
    maxlongitude=LON_MAX,
)

event = catalog[0]
print(event)

Plot the event:

In [None]:
catalog_plot_ortho = catalog.plot(projection="ortho")
# catalog_plot_local = catalog.plot(projection="local")

### Station inventory

In [None]:
inventory = CLIENT_ODC.get_stations(
    # startbefore=event.origins[0].time,
    # endafter=event.origins[0].time,
    latitude=event.origins[0].latitude,
    longitude=event.origins[0].longitude,
    maxradius=RADIUS_MAX,
)

print(inventory)

Plot the inventory:

In [None]:
inventory_plot = inventory.plot(projection="local")

### Availability information

The FDSNWS-Availability web service was deployed across EIDA in late 2023. The service specification can be found at [fdsn.org/webservices/fdsnws-availability-1.0.pdf](https://fdsn.org/webservices/fdsnws-availability-1.0.pdf).

Unfortunately, FDSNWS-Availability is not yet implemented in ObsPy, so we are going to use the `requests` library to request the information.

We are going to send a POST request to `FDSNWS_AVAILABILITY_URL` with a body that lists the stations discovered in the previous step, one line per stream selection:
```
NET STA LOC CHA START END
```

In response, we expect a list of continuous segments:
```
#Network Station Location Channel Quality SampleRate Earliest Latest
NET      STA     LOC      CHA     M       20.0       START    END
```
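
To make the body layout concrete, the sketch below assembles a POST body from a list of stream selections: an optional `format=` line followed by one `NET STA LOC CHA START END` line per selection. It follows the FDSN convention of writing an empty location code as `--`. The helper name `build_post_body` and the time window are illustrative only.

```python
def build_post_body(selections, output_format="geocsv"):
    """Return a POST body with one selection line per requested stream.

    selections is an iterable of (net, sta, loc, cha, start, end) strings.
    """
    lines = [f"format={output_format}"]
    for net, sta, loc, cha, start, end in selections:
        # An empty location code is written as "--" in POST bodies
        lines.append(" ".join([net, sta, loc or "--", cha, start, end]))
    return "\n".join(lines) + "\n"


body = build_post_body(
    [
        ("NL", "HGN", "*", "HH?", "2024-01-01T00:00:00", "2024-01-01T00:30:00"),
        ("NL", "HGN", "", "HH?", "2024-01-01T00:00:00", "2024-01-01T00:30:00"),
    ]
)
print(body)
```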

In [None]:
POST_DATA = "format=geocsv\n"

# Build availability POST data
for network in inventory.networks:
    for station in network.stations:
        # Get the P-arrival time for the station
        p_arrival = getPArrival(event, station)
        # Define a window of 1000 seconds before and 1000 seconds after the P-arrival
        window_start = p_arrival - timedelta(seconds=1000)
        window_end = p_arrival + timedelta(seconds=1000)
        # We are only interested in broadband high gain data
        POST_DATA += f"{network.code} {station.code} * HH? {window_start.isoformat()} {window_end.isoformat()}\n"

# Send availability request
availability_response = requests.post(
    FDSNWS_AVAILABILITY_URL, data=POST_DATA.encode(ENCODING)
)

# Filter out rows starting with a hash
availability = "\n".join(
    [
        line
        for line in availability_response.text.split("\n")
        if not line.startswith("#")
    ]
)
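
The request above assumes that the service returns data. FDSN web services conventionally answer with HTTP 204 when nothing matches (or 404 if the client asks for that via `nodata=404`), and in that case the body is empty and there is nothing to parse. A minimal sketch of how the status code could be interpreted before processing the response is shown below; `classify_status` and its category names are hypothetical.

```python
def classify_status(status_code):
    """Map an HTTP status code of an FDSN web service to a coarse category."""
    if status_code == 200:
        return "data"
    if status_code in (204, 404):
        # No time series matched the selection
        return "no_data"
    if status_code == 413:
        # Request is larger than the service is willing to process
        return "too_large"
    if 400 <= status_code < 500:
        return "client_error"
    if status_code >= 500:
        return "server_error"
    return "unexpected"


for code in (200, 204, 400, 413, 503):
    print(code, classify_status(code))
```

In the notebook this could be applied as `classify_status(availability_response.status_code)` before the response text is filtered and parsed.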

Content of our request:

In [None]:
# First 10 lines of the availability request
print("\n".join(POST_DATA.split("\n")[:10]))

Content of the response:

In [None]:
# First 10 lines of the availability response
print("\n".join(availability.split("\n")[:10]))

### Process FDSNWS-Availability response

- Convert text to CSV
- Load into Pandas DataFrame
- Group on channel code
- Ensure there are 3 components for each station

In [None]:
# Convert response to CSV:
csv_data = availability.replace("|", ",")

# Load it into a pandas DataFrame
df = pd.read_csv(io.StringIO(csv_data), dtype=str)

# Fill NaN values with empty strings (required for grouping)
df = df.fillna('')

# Truncate the 'Earliest' and 'Latest' columns to whole seconds by dropping the fractional part
df["Earliest"] = df["Earliest"].str[:19]
df["Latest"] = df["Latest"].str[:19]

# Add a column with the first two characters of the 'Channel' column for easier grouping
df["Channel_First_Two"] = df["Channel"].str[:2]

# And aggregate the data to get the number of available channels per station
df = (
    df.groupby(
        ["Network", "Station", "Location", "Channel_First_Two", "Earliest", "Latest"]
    )
    .size()
    .reset_index(name="Count")
)

# Remove rows where 'Count' is less than 3
df = df[df["Count"] >= 3]

# Display the first 10 rows
df.head(10)
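
To see what this grouping does without contacting the service, the sketch below repeats the same steps with the standard library on a small, entirely made-up GeoCSV-style response (network `XX` and stations `STA1` and `STA2` are fictional). It also shows a consequence of grouping on `Earliest` and `Latest`: a station whose components have different segment boundaries is not counted as complete, even though all three components exist.

```python
import csv
import io
from collections import Counter

# Made-up response for illustration only
SAMPLE = """\
#dataset: GeoCSV 2.0
#delimiter: |
Network|Station|Location|Channel|Quality|SampleRate|Earliest|Latest
XX|STA1||HHZ|D|100.0|2024-01-01T00:00:00.000000Z|2024-01-01T00:30:00.000000Z
XX|STA1||HHN|D|100.0|2024-01-01T00:00:00.000000Z|2024-01-01T00:30:00.000000Z
XX|STA1||HHE|D|100.0|2024-01-01T00:00:00.000000Z|2024-01-01T00:30:00.000000Z
XX|STA2||HHZ|D|100.0|2024-01-01T00:00:00.000000Z|2024-01-01T00:30:00.000000Z
XX|STA2||HHN|D|100.0|2024-01-01T00:00:00.000000Z|2024-01-01T00:30:00.000000Z
XX|STA2||HHE|D|100.0|2024-01-01T00:00:05.000000Z|2024-01-01T00:30:00.000000Z
"""

# Drop the comment lines, keeping the column header and the data rows
body = "\n".join(
    line for line in SAMPLE.splitlines() if not line.startswith("#")
)

# Count rows per (network, station, location, band/instrument code, time span)
counts = Counter(
    (
        row["Network"],
        row["Station"],
        row["Location"],
        row["Channel"][:2],
        row["Earliest"][:19],
        row["Latest"][:19],
    )
    for row in csv.DictReader(io.StringIO(body), delimiter="|")
)

# Keep only groups with at least three components
complete = [key for key, count in counts.items() if count >= 3]
print(complete)
```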

### Downloading and plotting the data

In [None]:
# Let's make a summary of our event
print(f"""
Time: {event.origins[0].time}
Latitude: {event.origins[0].latitude}
Longitude: {event.origins[0].longitude}
Depth: {event.origins[0].depth}
Magnitude: {event.magnitudes[0].mag}
""")

In [None]:
for e in df.itertuples():
    # Announce the download before it starts
    print(f"Downloading data for {e.Network}.{e.Station}.{e.Channel_First_Two} between {e.Earliest} and {e.Latest}...")

    st = CLIENT_KNMI.get_waveforms(
        network=e.Network,
        station=e.Station,
        location=e.Location,
        channel=e.Channel_First_Two + "?",
        starttime=UTCDateTime(e.Earliest),
        endtime=UTCDateTime(e.Latest),
    )

    # Apply lowpass filter and plot the data
    st.filter("lowpass", freq=0.5)
    st.plot()


## FDSNWS-Availability client demo