# Getting Seismic Data

This section explains how to get seismic data from EarthScope. It assumes a working knowledge of Python: data types and string formatting, building file paths and file names, making web requests, and writing functions. You should also be familiar with miniSEED data as distributed through the International Federation of Digital Seismograph Networks (FDSN).

The notebook material is informational and is useful for completing the seismic and geodetic exercises.

## Getting Seismic Data from EarthScope

Seismic data is available through third-party packages such as [obspy](https://docs.obspy.org/) or through EarthScope's `dataselect` web service.

![](images/Web_Services_Data_Flow.png)

Currently, the simplest way to access EarthScope's seismic data is to use a third-party package such as obspy. If you want to work directly with miniSEED files, EarthScope's `dataselect` service supports selecting and sorting event data and returns it in the miniSEED format.

![](images/cloud_native_data_access.png)

In the future, data will be available from EarthScope's cloud services hosted in Amazon's S3 storage service. We provide an overview of how data is stored in S3 and an example of how to access data from a public S3 bucket.

### Getting Data from Obspy and Third-Party Packages

Obspy is a Python framework (or package) for processing seismic data. It provides parsers for common file formats, clients to access data centers, and seismological signal processing routines for manipulating seismological time series.

We'll start by importing what we need from the obspy package. It's not necessary to import the entire package, just the classes we use, e.g., `Client` from the `obspy.clients.fdsn` module. Next, we create a client that connects to EarthScope's services and sends a query based on a start time, an end time, and the minimum magnitude of an event.

```python
from obspy.clients.fdsn import Client
from obspy import UTCDateTime
import warnings

# suppress warnings to keep the notebook output readable
warnings.filterwarnings('ignore')

# create a client for the EarthScope (formerly IRIS) data center;
# obspy registers it under the "IRIS" key
client = Client("IRIS")

starttime = UTCDateTime("2020-01-01")
endtime = UTCDateTime("2025-12-31")
catalog = client.get_events(starttime=starttime, endtime=endtime, minmagnitude=7)
```

The data is returned in a catalog, which is a list-like container for events. To learn more about working with obspy, see the documentation and tutorials on the obspy [web site](https://docs.obspy.org/).
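Because a catalog behaves like a list, it can be looped over directly. The following is a minimal sketch of pulling a few fields out of each event. `event_rows` is a hypothetical helper written for this illustration; `preferred_origin()`, `preferred_magnitude()`, `origins`, `magnitudes`, and `event_descriptions` are members of obspy's `Event` class.

```python
def event_rows(catalog):
    """Return (origin time, magnitude, description) for each event in a catalog."""
    rows = []
    for event in catalog:
        # Fall back to the first entry when no preferred one is flagged.
        origin = event.preferred_origin()
        if origin is None:
            origin = event.origins[0]
        magnitude = event.preferred_magnitude()
        if magnitude is None:
            magnitude = event.magnitudes[0]
        # Not every event carries a text description.
        descriptions = event.event_descriptions
        description = descriptions[0].text if descriptions else ""
        rows.append((str(origin.time), magnitude.mag, description))
    return rows

# Usage with the catalog returned by client.get_events (requires network access):
# for time, mag, place in event_rows(catalog):
#     print(time, mag, place)
```

Returning plain tuples keeps the result easy to print or to load into a table for later analysis.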


### Getting Seismic Data from DataSelect

The `fdsnws-dataselect` service provides access to time series data for specified channels and time ranges. Dataselect implements the [FDSN web service specification](https://www.fdsn.org/webservices/).

Data queries use SEED time series identifiers (network, station, location & channel) in addition to time ranges. Returned data formats include miniSEED, SAC zip, and GeoCSV.

To create a request, the Dataselect API takes these parameters at a minimum:

| Parameter | Example | Description | Default | Type |
| --------- | ------- | ----------- | ------- | ---- |
| start[time] | 2010-02-27T06:30:00 | Specifies the start time for the miniSEED data. | | date/time |
| end[time] | 2010-02-27T10:30:00 | Specifies the end time for the miniSEED data. | | date/time |
| net[work] | IU | Selects one or more network codes. Accepts wildcards and lists. Can be SEED codes or data center-defined codes. | any | string |
| sta[tion] | ANMO | Selects one or more SEED station codes. Accepts wildcards and lists. | any | string |
| loc[ation] | 00 | Selects one or more SEED location identifiers. Accepts wildcards and lists. Use `--` for blank location IDs (IDs consisting of two spaces). | any | string |
| cha[nnel] | BHZ | Selects one or more SEED channel codes. Accepts wildcards and lists. | any | string |
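As a rough sketch of how these parameters map onto a request, the following builds a dataselect query URL using only the standard library. `build_query_url` is a hypothetical helper, not part of any EarthScope package, and the base URL is assumed to be the same host and path used by the download example in this section.

```python
from urllib.parse import urlencode

# Assumed base URL: same host and path as the download example in this section.
BASE_URL = "https://service.iris.edu/fdsnws/dataselect/1/query"

def build_query_url(net, sta, loc, cha, start, end, base_url=BASE_URL):
    """Return a dataselect query URL for one channel and time window."""
    params = {
        "net": net,
        "sta": sta,
        # A blank location ID is written as "--" in the query string.
        "loc": loc if loc.strip() else "--",
        "cha": cha,
        "start": start,
        "end": end,
    }
    # urlencode percent-encodes characters such as ":" in the timestamps.
    return f"{base_url}?{urlencode(params)}"

url = build_query_url("IU", "ANMO", "00", "BHZ",
                      "2010-02-27T06:30:00", "2010-02-27T10:30:00")
print(url)
```

In practice, passing the same dictionary to `requests.get(..., params=...)` performs this encoding for you; building the URL by hand is mainly useful for checking a query in a browser.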

To download a file, we can use the `requests` package to send the HTTP request to the dataselect web service. As discussed in the previous section, the request must include an authorization token, which we obtain with the `get_token` function.

The `download_data` function takes the query parameters defined by the FDSN web service specification and the directory in which to write the data. The function does three things. First, it requests an authorization token. Next, it creates a file name for the data. Finally, it sends the request to dataselect and writes the response to a file in that directory.

```python
import os
from datetime import datetime
from pathlib import Path

import requests
from earthscope_sdk import EarthScopeClient

# SAGE archive dataselect endpoint (HTTPS, because a Bearer token is sent)
URL = "https://service.iris.edu/fdsnws/dataselect/1/query"


def get_token():
    """Return an access token, refreshing it first if it has expired."""
    client.ctx.auth_flow.refresh_if_necessary()
    return client.ctx.auth_flow.access_token


def download_data(params, data_directory):
    """Request miniSEED data from dataselect and write it to data_directory."""
    # get authorization Bearer token
    token = get_token()

    # get year and day of year from the start time string
    start_date = datetime.strptime(params["start"], "%Y-%m-%dT%H:%M:%S")
    year = start_date.year
    day_of_year = start_date.timetuple().tm_yday

    # file name format: STATION.NETWORK.LOCATION.CHANNEL.YEAR.DAYOFYEAR.mseed
    file_name = ".".join([params["sta"], params["net"], params["loc"], params["cha"],
                          str(year), f"{day_of_year:03d}", "mseed"])

    r = requests.get(URL, params=params,
                     headers={"authorization": f"Bearer {token}"}, stream=True)
    if r.status_code == requests.codes.ok:
        # stream the response body to disk in chunks
        with open(Path(data_directory) / file_name, "wb") as f:
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
    else:
        # a problem occurred
        print(f"failure: {r.status_code}, {r.reason}")


# create client to get token
client = EarthScopeClient()

# create directory for data
data_directory = "./miniseed_data"
os.makedirs(data_directory, exist_ok=True)

# parameters specifying the miniSEED file
params = {"net": "IU",
          "sta": "ANMO",
          "loc": "00",
          "cha": "BHZ",
          "start": "2010-02-27T06:30:00",
          "end": "2010-02-27T10:30:00"}

download_data(params, data_directory)
```
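The file-naming step can be checked on its own, without sending a request. The sketch below uses a hypothetical `build_file_name` helper and derives the day of year with `timetuple().tm_yday`, so that 27 February maps to `058` (31 days of January plus 27) rather than to the day of the month.

```python
from datetime import datetime

def build_file_name(params):
    """Build STATION.NETWORK.LOCATION.CHANNEL.YEAR.DAYOFYEAR.mseed from query parameters."""
    start = datetime.strptime(params["start"], "%Y-%m-%dT%H:%M:%S")
    day_of_year = start.timetuple().tm_yday  # 1-366, not the day of the month
    return ".".join([
        params["sta"], params["net"], params["loc"], params["cha"],
        str(start.year), f"{day_of_year:03d}", "mseed",
    ])

params = {"net": "IU", "sta": "ANMO", "loc": "00", "cha": "BHZ",
          "start": "2010-02-27T06:30:00", "end": "2010-02-27T10:30:00"}
print(build_file_name(params))  # ANMO.IU.00.BHZ.2010.058.mseed
```

Keeping the naming logic in a small pure function makes it easy to test and to reuse when downloading many days of data.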


### Seismic Data in AWS S3

Object storage in the cloud is a cost-effective way to hold and distribute very large collections of data. Objects consist of the data, metadata, and a unique identifier and are accessible through an application programming interface, or API. EarthScope uses Amazon Web Services' (AWS) Simple Storage Service, or S3, to store and distribute seismic and geodetic data.

AWS S3 supports streaming data directly into memory. Streaming is a significant advantage when analyzing large amounts of data because writing data to a drive and reading it back can consume a large share of the analysis time. When data is streamed directly into memory, it is immediately available for processing.

#### Buckets and Keys

Objects in S3 are stored in containers called `buckets`. Each object is identified by a unique object identifier, or `key`. Objects are accessed using a combination of the web service endpoint, a bucket name, and a key. The bucket and key together are commonly written as an S3 URI of the form `s3://bucket/key`; the corresponding Amazon Resource Name (ARN) has the form `arn:aws:s3:::bucket/key`. Unlike a hierarchical file system on your computer, S3 doesn't have directories; instead, it has prefixes, which act as filters that logically group data. Consider the following example, which we can break down into its parts:

> s3://ncedc-pds/continuous_waveforms/BK/2022/2022.231/MERC.BK.HNZ.00.D.2022.231

- s3 - service name
- ncedc-pds - bucket name
- continuous_waveforms - prefix
- BK - (prefix) seismic network name
- 2022 - (prefix) year
- 2022.231 - (prefix) year and day of year
- MERC.BK.HNZ.00.D.2022.231 - (object name) station.network.channel.location.D.year.day of year

The full key is everything after the bucket name, prefixes included. Similar to a web service file URL, the bucket name and key are used to request data.
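The key layout described above can be built and taken apart programmatically. The following is a minimal sketch that assumes every key follows the pattern of the example; `WaveformKey`, `parse_key`, and `build_key` are hypothetical names, and the `D` field is simply copied from the example without interpreting it.

```python
from dataclasses import dataclass

@dataclass
class WaveformKey:
    network: str
    station: str
    channel: str
    location: str
    year: int
    day_of_year: int

def parse_key(key):
    """Split an NCEDC-style continuous waveform key into its components."""
    # The last path segment is the object name; earlier segments are prefixes.
    name = key.rsplit("/", 1)[-1]
    station, network, channel, location, _flag, year, doy = name.split(".")
    return WaveformKey(network, station, channel, location, int(year), int(doy))

def build_key(k, flag="D"):
    """Rebuild the key, including its prefixes, from the components."""
    day = f"{k.year}.{k.day_of_year:03d}"
    name = f"{k.station}.{k.network}.{k.channel}.{k.location}.{flag}.{day}"
    return f"continuous_waveforms/{k.network}/{k.year}/{day}/{name}"

key = "continuous_waveforms/BK/2022/2022.231/MERC.BK.HNZ.00.D.2022.231"
parsed = parse_key(key)
print(parsed)
print(build_key(parsed) == key)  # True
```

Building keys from station and date components is what makes it possible to request a specific day of data without first listing the bucket.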

#### S3 Buckets with Public Read Access

S3 buckets can be configured for public read access, so you can access objects without providing credentials. The [`boto3`](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html) Python package provides libraries for working with AWS services, including S3. Boto3 provides two interfaces for interacting with AWS services. The `client` interface is low-level and fine-grained, and closely follows the AWS API for a service. The `resource` interface is a high-level interface that wraps the `client` interface. AWS stopped adding new features to the `resource` interface in `boto3` in 2023; for this reason, we will use the `client` interface when working with S3 resources.

The following example reads a miniSEED file from the Northern California Earthquake Data Center (NCEDC). The trace data is for the 2014 Napa earthquake. GeoLab's default environment includes both `boto3` and `obspy` packages, and we can import them without installation. We establish the connection to S3 by creating a client that specifies that requests are unsigned. This means that the S3 bucket allows public access and does not require credentials. The client calls the `get_object` method with the bucket name and key for the miniSEED object. 

Instead of writing the data to a file (as we did using web services), the S3 client sends it to an in-memory binary stream that can be read by [`obspy`](https://docs.obspy.org/). Streaming the data to in-memory objects is more efficient than downloading and reading files for analysis.

```python
import io

import boto3
from botocore import UNSIGNED
from botocore.config import Config
from obspy import read

# unsigned requests: the bucket is public, so no credentials are sent
s3 = boto3.client('s3', config=Config(signature_version=UNSIGNED), region_name='us-west-2')

BUCKET_NAME = 'ncedc-pds'
KEY = 'continuous_waveforms/BK/2014/2014.236/PACP.BK.HHN.00.D.2014.236'

# read the object body into an in-memory binary stream
response = s3.get_object(Bucket=BUCKET_NAME, Key=KEY)
data_stream = io.BytesIO(response['Body'].read())

# Parse with ObsPy
st = read(data_stream)

# Print the ObsPy Stream
print(st)
```
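When the exact key is not known in advance, a prefix can be used to discover which objects exist. The sketch below is one way to do this with the `list_objects_v2` paginator of a `boto3` S3 client. `list_keys` is a hypothetical helper, and the example assumes the bucket permits anonymous listing, which is a separate permission from anonymous reads.

```python
def list_keys(s3_client, bucket, prefix, max_keys=10):
    """Return up to max_keys object keys in bucket that start with prefix."""
    keys = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages with no matching objects have no "Contents" entry.
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
            if len(keys) >= max_keys:
                return keys
    return keys

# Usage with the unsigned client `s3` created above (requires network access):
# for key in list_keys(s3, "ncedc-pds", "continuous_waveforms/BK/2014/2014.236/"):
#     print(key)
```

Passing the client in as an argument keeps the helper independent of how the connection was configured, so the same function works with signed or unsigned clients.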

## Summary

There are currently two options to get seismic data from EarthScope. You can use obspy or another third-party package, or use the dataselect service to select and sort event data in miniSEED. In the future, EarthScope will provide access to seismic data through AWS cloud services.

## [< Previous](./3_authorization.ipynb)&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;[Next >](./5_seismic_exercise.ipynb)