# 1. Introduction
This notebook contains a (slightly theoretical) workflow for the web services exposed by the EIDA (European Integrated Data Archive) Federation. It focuses on the EIDAWS-WFCatalog service, which lets users filter out low-quality and low-coverage data before downloading it to a local machine.

In this example we are going to:

1. Retrieve seismic event information from the FDSNWS-Event catalogue offered by the [GFZ](https://www.gfz-potsdam.de/) EIDA node:
    * Date:
        * 📅 start date = 2020-01-01
        * 📅 end date = 2020-06-01
    * Event characteristics:
        * 🎚️ minimum magnitude = 5
    * Coordinates:
        * 🌐 minimum latitude = 40°N
        * 🌐 maximum latitude = 45°N
        * 🌐 minimum longitude = 17°E
        * 🌐 maximum longitude = 25°E
1. Using the [FDSNWS-Station](https://www.orfeus-eu.org/data/eida/webservices/station/) web service, list all stations available in the same bounding box we used for our event search:
    * 🌐 minimum latitude = 40°N
    * 🌐 maximum latitude = 45°N
    * 🌐 minimum longitude = 17°E
    * 🌐 maximum longitude = 25°E
1. Using the [EIDAWS-WFCatalog](https://www.orfeus-eu.org/data/eida/webservices/wfcatalog/) service, we are going to exclude all stations which do not meet the following criteria:
    * At least 95% data coverage on the day of the event
    * Maximum of 5 gaps
    * Sum of gaps lower than 50 seconds
    * No overlaps
1. Using the [FDSNWS-Dataselect](https://www.orfeus-eu.org/data/eida/webservices/dataselect/) web service, we are going to download miniSEED files containing the waveforms for a time window starting 5 minutes before and ending 15 minutes after the event origin time
1. Using the [FDSNWS-Station](https://www.orfeus-eu.org/data/eida/webservices/station/) web service, we are going to download a StationXML file containing channel-level station and instrumentation metadata
1. At the end we are going to quickly demo the GUI tools built on the [EIDAWS-WFCatalog](https://www.orfeus-eu.org/data/eida/webservices/wfcatalog/) web service, available via the [data quality page](https://www.orfeus-eu.org/data/eida/quality/) on [orfeus-eu.org](https://orfeus-eu.org).

## 1.1. Approach

For downloading metadata (FDSNWS-Station), waveforms (FDSNWS-Dataselect) and metrics (EIDAWS-WFCatalog) we are going to use the [EIDAWS-Federator](http://eida-federator.ethz.ch/) gateway. This means that data will be requested by passing web service URLs directly to ObsPy functions such as:

```python
read_events(
    pathname_or_url=None,
    format=None,
    **kwargs
)
```

and

```python
read(
    pathname_or_url=None,
    format=None,
    headonly=False,
    starttime=None,
    endtime=None,
    nearest_sample=True,
    dtype=None,
    apply_calib=False,
    check_compression=True,
    **kwargs
)
```

rather than using the `RoutingClient` utilizing the [EIDAWS-Routing](https://www.orfeus-eu.org/data/eida/webservices/routing/):

```python
from obspy.clients.fdsn import RoutingClient
from obspy import UTCDateTime

rsClient = RoutingClient("eida-routing")
st = rsClient.get_waveforms(
    network="Z3",
    channel="HHZ",
    starttime=UTCDateTime(2016, 3, 1),
    endtime=UTCDateTime(2016, 3, 1, 0, 2, 0)
)
```

⚠️ Please keep in mind that [EIDAWS-Federator](http://eida-federator.ethz.ch/) only provides access to open data, so if your intention is to download restricted datasets, please refer to the [EIDA/userfeedback](https://github.com/EIDA/userfeedback) GitHub repository for instructions.

⚠️ ObsPy (as of v1.2.2) does not implement methods to work with EIDAWS-WFCatalog, so this service needs to be called directly.
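
Calling the service directly comes down to assembling a query URL and sending a GET request. The snippet below is a minimal sketch, assuming the Federator endpoint and the `start`, `end`, `network`, `station`, `channel` and `include` query parameters that appear in the sample request in section 4. The helper name `build_wfcatalog_url` is hypothetical and not part of any library.

```python
from urllib.parse import urlencode

# Endpoint used elsewhere in this notebook; adjust it to query a single node
WFCATALOG_ENDPOINT = "http://eida-federator.ethz.ch/eidaws/wfcatalog/1/query"


def build_wfcatalog_url(start, end, network, station, channel=None, include=None):
    """Hypothetical helper: assemble a WFCatalog GET URL from query parameters."""
    params = {"start": start, "end": end, "network": network, "station": station}
    # Optional parameters are only added when explicitly requested
    if channel is not None:
        params["channel"] = channel
    if include is not None:
        params["include"] = include
    # safe=":" keeps the colons of ISO 8601 timestamps readable in the URL
    return f"{WFCATALOG_ENDPOINT}?{urlencode(params, safe=':')}"


url = build_wfcatalog_url(
    "2020-01-28T20:10:10", "2020-01-28T20:30:10", "HL", "KZN",
    channel="HHZ", include="all",
)
print(url)
```

The resulting string can then be passed to `requests.get(url, timeout=10)`.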

# 2. Finding an event
Let's start by finding an event. Two different approaches are presented below:
* Downloading the event XML document directly from the FDSNWS-Event web service
* Downloading event information using the ObsPy library


In [None]:
# Import libraries required to process raw web service response
import requests
import xml.etree.ElementTree as ET

In [None]:
# Set the encoding and XML namespace
ENCODING = "utf-8"
NSMAP = {'mw': 'http://quakeml.org/xmlns/bed/1.2'}

# Define start and end dates for event and station searches
start = "2020-01-01"
end = "2020-06-01"

# Define event minimum magnitude
min_mag = 5

# Define the bounding box for event and station searches
min_lat = 40
max_lat = 45
min_lon = 17
max_lon = 25

In [None]:
# Build URL to retrieve event information from event catalogue hosted by GFZ
events_url = (
    f"http://geofon.gfz-potsdam.de/fdsnws/event/1/query?"
    f"start={start}&end={end}&"
    f"minmag={min_mag}&"
    f"minlat={min_lat}&minlon={min_lon}&"
    f"maxlat={max_lat}&maxlon={max_lon}"
)

# With our original parameters, the following URL should be built:
# http://geofon.gfz-potsdam.de/fdsnws/event/1/query?start=2020-01-01&end=2020-06-01&minmag=5&minlat=40&minlon=17&maxlat=45&maxlon=25

In [None]:
# Fire the request
r = requests.get(events_url, timeout=10)
r.encoding = ENCODING

# Check the response
if r.status_code == 200:
    # Parse the XML response
    event_root = ET.fromstring(r.text)
    for event_element in event_root.findall("./mw:eventParameters/*", namespaces=NSMAP):
        # Get event ID from event_element attribute
        event_id = event_element.get("publicID")
        
        # Find event magnitude element
        event_magnitude_element = event_element.find("./mw:magnitude/mw:mag/mw:value", namespaces=NSMAP)
        
        # Get magnitude value from event_magnitude_element
        event_magnitude = event_magnitude_element.text
        
        # Get event origin time element
        event_origin_time_element = event_element.find("./mw:origin/mw:time/mw:value", namespaces=NSMAP)
        
        # Get origin time value from event_origin_time_element
        event_origin_time = event_origin_time_element.text
        
        print(f"Event with ID: {event_id} happened on {event_origin_time} and had magnitude equal to: {event_magnitude}")
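else:
    # Any other status (e.g. 204 when no events match the query) ends up here
    print(f"Something went wrong... (HTTP status code: {r.status_code})")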
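
The namespace handling is the least obvious part of the raw XML approach, so it may help to try it without network access. The snippet below is a minimal sketch that parses a hand-written, heavily trimmed QuakeML document. All identifiers and values in `SAMPLE_QUAKEML` are made up for illustration. It uses `findtext`, which returns `None` instead of raising when an element is missing.

```python
import xml.etree.ElementTree as ET

NSMAP = {"mw": "http://quakeml.org/xmlns/bed/1.2"}

# Illustrative document only: IDs, time and magnitude are invented
SAMPLE_QUAKEML = """<q:quakeml xmlns="http://quakeml.org/xmlns/bed/1.2"
           xmlns:q="http://quakeml.org/xmlns/quakeml/1.2">
  <eventParameters publicID="smi:example.org/catalog">
    <event publicID="smi:example.org/event/1">
      <origin publicID="smi:example.org/origin/1">
        <time><value>2020-01-01T00:00:00.000000Z</value></time>
      </origin>
      <magnitude publicID="smi:example.org/magnitude/1">
        <mag><value>5.1</value></mag>
      </magnitude>
    </event>
  </eventParameters>
</q:quakeml>
"""

root = ET.fromstring(SAMPLE_QUAKEML)
events = []
for event in root.findall("./mw:eventParameters/mw:event", namespaces=NSMAP):
    # The "mw" prefix is resolved through NSMAP, not through the document itself
    magnitude = event.findtext("./mw:magnitude/mw:mag/mw:value", namespaces=NSMAP)
    origin_time = event.findtext("./mw:origin/mw:time/mw:value", namespaces=NSMAP)
    events.append((event.get("publicID"), origin_time, magnitude))

print(events)
```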

And now with ObsPy...

In [None]:
from obspy import read_events

# Read events using the same URL
evts = read_events(events_url)

# Print retrieved events
for e in evts:
    print(e)

ℹ️ Users are free to choose how they download data from EIDA, but for convenience we will continue using ObsPy throughout the rest of this notebook. 🦥

# 3. Finding stations

In [None]:
from obspy import read_inventory

# Build URL using EIDA Federator and our defined parameters
stations_url = (
    f"http://eida-federator.ethz.ch/fdsnws/station/1/query?"
    f"start={start}&end={end}&"
    f"minlat={min_lat}&minlon={min_lon}&"
    f"maxlat={max_lat}&maxlon={max_lon}"
)

inv = read_inventory(stations_url)

print(inv)

Let's try to visualize our event and surrounding stations. Please note that stations in this area are hosted by various EIDA nodes:
- NOA
- NIEP
- GFZ
- INGV
- ODC

![](img/eventAndStations.png)

⚠️ This is a static image and will not change after adjusting the input parameters...

# 4. Filter out stations which do not meet our quality criteria
We want to download data from stations which have:
1. At least 95% data availability on the day of the event
1. Maximum of 5 gaps
1. Sum of gaps lower than 50 seconds
1. No overlaps

First, let's see what an EIDAWS-WFCatalog response offers when all metrics are requested (`include=all`):

http://eida-federator.ethz.ch/eidaws/wfcatalog/1/query?start=2020-01-28T20:10:10.670309&end=2020-01-28T20:30:10.670309&network=HL&station=KZN&channel=HHZ&include=all

```json
[{
    "version": "1.0.0",
    "producer": {
        "name": "EIDA NODE",
        "agent": "ObsPy mSEED-QC",
        "created": "2020-01-29T08:39:49.718Z"
    },
    "station": "KZN",
    "network": "HL",
    "location": "",
    "channel": "HHZ",
    "num_gaps": 3,
    "num_overlaps": 0,
    "sum_gaps": 5,
    "sum_overlaps": 0,
    "max_gap": 2,
    "max_overlap": null,
    "record_length": [512],
    "sample_rate": [100],
    "percent_availability": 99.99421296296296,
    "encoding": ["STEIM2"],
    "num_records": 16260,
    "start_time": "2020-01-28T00:00:00.000Z",
    "end_time": "2020-01-29T00:00:00.000Z",
    "format": "miniSEED",
    "quality": "D",
    "sample_min": -21263,
    "sample_max": 25253,
    "sample_mean": 77.38187557150297,
    "sample_median": 79,
    "sample_stdev": 1157.057209432005,
    "sample_rms": 1159.6419018669262,
    "sample_lower_quartile": 37,
    "sample_upper_quartile": 121,
    "miniseed_header_percentages": {
        "timing_quality_mean": 10,
        "timing_quality_median": 10,
        "timing_quality_lower_quartile": 10,
        "timing_quality_upper_quartile": 10,
        "timing_quality_min": 10,
        "timing_quality_max": 10,
        "timing_correction": 0,
        "io_and_clock_flags": {
            "short_record_read": 0,
            "station_volume": 0,
            "start_time_series": 0,
            "end_time_series": 0,
            "clock_locked": 0
        },
        "data_quality_flags": {
            "amplifier_saturation": 0,
            "digitizer_clipping": 0,
            "spikes": 0,
            "glitches": 0,
            "missing_padded_data": 0,
            "telemetry_sync_error": 0,
            "digital_filter_charging": 0,
            "suspect_time_tag": 0
        },
        "activity_flags": {
            "calibration_signal": 0,
            "time_correction_applied": 0,
            "event_begin": 0,
            "event_end": 0,
            "positive_leap": 0,
            "negative_leap": 0
        }
    }
}]
```
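
To see how the four criteria map onto this response, the check can be tried on a subset of the fields shown above. This is a minimal sketch: the helper name `meets_criteria` is hypothetical, and it assumes, as the rest of this notebook does, that `sum_gaps` is expressed in seconds.

```python
import json

# Subset of the fields from the sample response above
sample_response = json.loads("""
[{
    "network": "HL", "station": "KZN", "location": "", "channel": "HHZ",
    "num_gaps": 3, "num_overlaps": 0, "sum_gaps": 5, "sum_overlaps": 0,
    "percent_availability": 99.99421296296296
}]
""")


def meets_criteria(metrics):
    """Hypothetical helper: True if a metrics document passes all four checks."""
    return (
        metrics["percent_availability"] >= 95  # at least 95% coverage
        and metrics["num_gaps"] <= 5           # at most 5 gaps
        and metrics["sum_gaps"] < 50           # total gap time below 50 s
        and metrics["num_overlaps"] == 0       # no overlaps
    )


for metrics in sample_response:
    seed_id = ".".join(
        metrics[key] for key in ("network", "station", "location", "channel")
    )
    print(seed_id, "passes" if meets_criteria(metrics) else "fails")
```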

In [None]:
import json
from datetime import datetime, timedelta

In [None]:
# Convert string representation of origin time to datetime object
dt_origin = datetime.strptime(event_origin_time, "%Y-%m-%dT%H:%M:%S.%fZ")

# Define relative time windows using event origin time
dt_start = dt_origin - timedelta(minutes=5)
dt_end = dt_origin + timedelta(minutes=15)

# Get ISO8601 representation of our time window
dt_start_iso = dt_start.isoformat()
dt_end_iso = dt_end.isoformat()
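
The `strptime` format above expects fractional seconds, so a timestamp such as `2020-01-28T20:15:10Z` would raise a `ValueError`. The snippet below is a minimal sketch of a more tolerant variant. The helper names `parse_origin_time` and `event_window` are hypothetical, and the origin time used is illustrative, chosen so that the resulting window matches the sample WFCatalog request shown earlier.

```python
from datetime import datetime, timedelta


def parse_origin_time(value):
    """Hypothetical helper: parse a UTC timestamp with or without fractional seconds."""
    for fmt in ("%Y-%m-%dT%H:%M:%S.%fZ", "%Y-%m-%dT%H:%M:%SZ"):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"Unsupported origin time format: {value!r}")


def event_window(origin_time, before=timedelta(minutes=5), after=timedelta(minutes=15)):
    """Return ISO 8601 start and end strings around the origin time."""
    origin = parse_origin_time(origin_time)
    return (origin - before).isoformat(), (origin + after).isoformat()


print(event_window("2020-01-28T20:15:10.670309Z"))
print(event_window("2020-01-28T20:15:10Z"))
```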

In [None]:
# Channels we are interested in
# channels will be downloaded only if all specified components are present and validated
required_channels = {
    "BH?": ["BHE", "BHN", "BHZ"],
    "HH?": ["HHE", "HHN", "HHZ"],
    "LH?": ["LHE", "LHN", "LHZ"],
}

In [None]:
def validate_station(string):
    """Validate a WFCatalog response against our quality criteria.

    Args:
        string (str): JSON response from the EIDAWS-WFCatalog web service

    Returns:
        str: Comma-separated channel patterns (e.g. "BH?,HH?") for which all
            components are present and validated
        None: If no complete set of components has been validated
    """
    # Parse string to json object
    j = json.loads(string)
    
    # Use a set so that each validated channel code is stored only once
    channels_found = set()
    channels_validated = []

    for cha in j:
        # Get network station channel identifiers
        network_code = cha["network"]
        station_code = cha["station"]
        channel_code = cha["channel"]
        
        # Get the quality metrics (fractional values are kept as floats)
        availability = float(cha["percent_availability"])
        gaps = int(cha["num_gaps"])
        sum_gaps = float(cha["sum_gaps"])
        overlaps = int(cha["num_overlaps"])
        
        msg = f"{network_code}.{station_code}.{channel_code}: {availability:.2f}% coverage, {gaps} gaps ({sum_gaps}s), {overlaps} overlaps."
        # Reject channels with <95% coverage, >5 gaps, >=50 s of gaps in total or any overlap
        if availability < 95 or gaps > 5 or sum_gaps >= 50 or overlaps > 0:
            print(f"NOK: {msg}")
        else:
            print(f"OK: {msg}")
            channels_found.add(channel_code.upper())

    # If channel components are present and validated, add them to the channels_validated list
    for c in required_channels.keys():
        if all(e in channels_found for e in required_channels[c]):
            channels_validated.append(c)

    if len(channels_validated) > 0:
        return ",".join(channels_validated)
    else:
        return None
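
The last step of the function, checking that every component of a channel group passed validation, can be looked at in isolation. The snippet below is a minimal sketch of that check only. The helper name `complete_groups` is hypothetical, and the dictionary repeats the `required_channels` definition so that the example is self-contained.

```python
REQUIRED_CHANNELS = {
    "BH?": ["BHE", "BHN", "BHZ"],
    "HH?": ["HHE", "HHN", "HHZ"],
    "LH?": ["LHE", "LHN", "LHZ"],
}


def complete_groups(channels_found, required=REQUIRED_CHANNELS):
    """Hypothetical helper: wildcard patterns whose components were all found."""
    found = set(channels_found)
    return [
        pattern
        for pattern, components in required.items()
        if found.issuperset(components)
    ]


# All three HH components passed, but only one BH component did
print(complete_groups({"HHE", "HHN", "HHZ", "BHZ"}))
# A single component is never enough
print(complete_groups({"BHZ"}))
```

A station is therefore requested with `HH?` only when HHE, HHN and HHZ all met the criteria.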

In [None]:
# List of validated stations, will be filled later
validated_stations = []

# Loop through all networks and stations found in the bounding box surrounding our event
for net in inv:
    for sta in net:
        # Build WFCatalog URL (POST method is not available via Federator)
        wfcatalog_url = (
            f"http://eida-federator.ethz.ch/eidaws/wfcatalog/1/query?"
            f"start={dt_start_iso}&end={dt_end_iso}&"
            f"network={net.code}&station={sta.code}"
        )

        # Request the data...
        r = requests.get(wfcatalog_url, timeout=10)
        r.encoding = ENCODING
        
        if r.status_code == 200:
            # Validate station
            cha = validate_station(r.text)
            if cha:
                # Validation passed, add to validated_stations list
                validated_stations.append([net.code, sta.code, cha])
        else:
            # print(f"No data for {net.code}.{sta.code}")
            pass

print("Done!")

In [None]:
# Import pprint to improve readability
import pprint

# Let's print our validated stations
pp = pprint.PrettyPrinter()
pp.pprint(validated_stations)

# 5. Download data for verified stations

In [None]:
from obspy import read

# Loop through our validated stations, build URL, retrieve and read the data
for s in validated_stations:
    dataselect_url = (
        f"http://eida-federator.ethz.ch/fdsnws/dataselect/1/query?"
        f"start={dt_start_iso}&end={dt_end_iso}&"
        f"network={s[0]}&station={s[1]}&"
        f"channel={s[2]}"
    )
    
    # Print URL used to retrieve given dataset
    print(dataselect_url)
    
    # Request and load the data
    st = read(dataselect_url)
    
    # Print waveform metadata and plot the waveforms
    print(st)
    st.plot()
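
The cell above reads and plots the waveforms but does not keep them on disk. The snippet below is a minimal sketch of one way to store the raw response body of a Dataselect request as a file. The helper names `mseed_path` and `save_bytes` and the file naming scheme are assumptions made for illustration, and the payload is a placeholder rather than real miniSEED data.

```python
import tempfile
from pathlib import Path


def mseed_path(output_dir, network, station, start_iso):
    """Hypothetical naming scheme: <network>.<station>.<start>.mseed"""
    # Colons are not allowed in file names on some platforms
    safe_start = start_iso.replace(":", "")
    return Path(output_dir) / f"{network}.{station}.{safe_start}.mseed"


def save_bytes(path, content):
    """Write a raw response body to disk and return the number of bytes written."""
    path.parent.mkdir(parents=True, exist_ok=True)
    return path.write_bytes(content)


# Placeholder payload; in practice this would be requests.get(...).content
payload = b"placeholder bytes, not real miniSEED"

with tempfile.TemporaryDirectory() as tmp:
    path = mseed_path(tmp, "HL", "KZN", "2020-01-28T20:10:10.670309")
    written = save_bytes(path, payload)
    file_name = path.name
    print(file_name, written)
```

When the data has already been loaded with ObsPy, calling `st.write(str(path), format="MSEED")` on the stream is an alternative.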

# 6. Download metadata for verified stations

In [None]:
from obspy import read_inventory

# Loop through our validated stations, build URL, retrieve and read the metadata
for s in validated_stations:
    station_url = (
        f"http://eida-federator.ethz.ch/fdsnws/station/1/query?"
        f"level=channel&"
        f"start={dt_start_iso}&end={dt_end_iso}&"
        f"network={s[0]}&station={s[1]}&"
        f"channel={s[2]}"
    )
    
    # Print URL used to retrieve the metadata
    print(station_url)
    
    # Request and load the metadata
    inv = read_inventory(station_url)
    
    # Print the metadata
    print(inv)

# 7. Quality tools on the orfeus-eu.org
Overview of the EIDA Quality Tools: https://www.orfeus-eu.org/data/eida/quality/

## 7.1. Availability
Availability information can be easily previewed using the web tool available at https://www.orfeus-eu.org/data/eida/quality/availability/:
![](img/availability.png)

## 7.2. Data Quality Inspector
The data quality inspector, which renders advanced waveform metrics, is available at https://www.orfeus-eu.org/data/eida/quality/metrics/.

The following statistical parameters can be selected and rendered:
* Quadratic mean
* Standard deviation
* Minimum
* Maximum
* Availability
* Gaps
* Sum of gaps
* Overlaps
* Sum of overlaps
* Median
* Mean
* Lower quartile
* Upper quartile
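
These parameters appear to correspond to fields of the WFCatalog response shown in section 4. The mapping below is an assumption based on the field names in that sample response, not something taken from the inspector's documentation. The names `INSPECTOR_TO_WFCATALOG` and `extract_parameters` are hypothetical.

```python
# Assumed mapping from inspector labels to WFCatalog JSON field names
INSPECTOR_TO_WFCATALOG = {
    "Quadratic mean": "sample_rms",
    "Standard deviation": "sample_stdev",
    "Minimum": "sample_min",
    "Maximum": "sample_max",
    "Availability": "percent_availability",
    "Gaps": "num_gaps",
    "Sum of gaps": "sum_gaps",
    "Overlaps": "num_overlaps",
    "Sum of overlaps": "sum_overlaps",
    "Median": "sample_median",
    "Mean": "sample_mean",
    "Lower quartile": "sample_lower_quartile",
    "Upper quartile": "sample_upper_quartile",
}


def extract_parameters(metrics, labels):
    """Hypothetical helper: pick the requested parameters from a metrics document."""
    return {label: metrics.get(INSPECTOR_TO_WFCATALOG[label]) for label in labels}


# Values copied from the sample response in section 4
sample = {
    "sample_rms": 1159.6419018669262,
    "num_gaps": 3,
    "percent_availability": 99.99421296296296,
}
selected = extract_parameters(sample, ["Quadratic mean", "Gaps", "Availability"])
print(selected)
```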

![](img/inspector.png)