# Getting Data from Web Services

This section details how to get seismic data from EarthScope web services. It requires a working knowledge of Python: you should understand data types and string formatting, creating file paths and file names, web requests, and writing Python functions. You should also be familiar with miniSEED data as distributed through the International Federation of Digital Seismograph Networks (FDSN).

The notebook material is informational and is useful for completing the [**web service exercise**](./5_web_services_exercise.ipynb).

![](images/Web_Services_Data_Flow.png)

## Getting Seismic Data from SAGE Web Services

The `fdsnws-dataselect` service provides access to time series data for specified channels and time ranges. Dataselect implements the [FDSN web service specification](https://www.fdsn.org/webservices/).

Data queries use SEED time series identifiers (network, station, location & channel) in addition to time ranges. Returned data formats include miniSEED, SAC zip, and GeoCSV.

A dataselect request takes at least the following parameters:

| parameter | example | description | default | type |
| --------- | ------- | ----------- | ------- | ---- |
| start[time] | 2010-02-27T06:30:00 | Specifies the start time of the requested miniSEED data. | | day/time |
| end[time] | 2010-02-27T10:30:00 | Specifies the end time of the requested miniSEED data. | | day/time |
| net[work] | IU | Select one or more network codes. Accepts wildcards and lists. Can be SEED codes or data center-defined codes. | any | string |
| sta[tion] | ANMO | Select one or more SEED station codes. Accepts wildcards and lists. | any | string |
| loc[ation] | 00 | Select one or more SEED location identifiers. Accepts wildcards and lists. Use `--` for blank location IDs (IDs consisting of two spaces). | any | string |
| cha[nnel] | BHZ | Select one or more SEED channel codes. Accepts wildcards and lists. | any | string |
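The parameters in the table map directly onto the query string of a dataselect URL. The following is a minimal sketch that assembles such a URL without sending it, which can help when checking a request before downloading. The `build_query_url` helper and the `DATASELECT_URL` constant are hypothetical names used only for illustration, and the HTTPS form of the service address is an assumption.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed HTTPS address of the dataselect query endpoint
DATASELECT_URL = "https://service.iris.edu/fdsnws/dataselect/1/query"

def build_query_url(params, base_url=DATASELECT_URL):
    """Return the full dataselect URL for a dict of query parameters."""
    # urlencode percent-encodes reserved characters such as ':' and '?'
    return f"{base_url}?{urlencode(params)}"

# "BH?" uses a single-character wildcard to match BHZ, BHN, BHE, and so on
params = {"net": "IU",
          "sta": "ANMO",
          "loc": "00",
          "cha": "BH?",
          "start": "2010-02-27T06:30:00",
          "end": "2010-02-27T10:30:00"}

url = build_query_url(params)
print(url)

# decoding the query string recovers the original parameter values
decoded = parse_qs(urlsplit(url).query)
print(decoded["cha"])
```

Printing the URL before requesting it makes it easy to paste into a browser or compare against the service documentation.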

To download a file, we can use the `requests` package to send an HTTP request to the dataselect web service. As discussed in the previous section, the request must include an authorization token, which the `get_token` function provides.
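The HTTP status code of the response indicates whether a request produced data. The sketch below maps some of the status codes listed in the FDSN web service specification to short explanations. The descriptions are paraphrased rather than quoted, and `FDSN_STATUS` and `describe_status` are hypothetical names introduced here for illustration.

```python
# Paraphrased meanings of status codes used by FDSN web services (not official wording)
FDSN_STATUS = {
    200: "OK: data returned",
    204: "request was valid, but no data matched the selection",
    400: "bad request: check parameter names and values",
    401: "unauthorized: authentication is required",
    403: "forbidden: authentication failed or access is blocked",
    404: "no data found (used instead of 204 when requested)",
    413: "request too large: ask for less data",
    500: "internal server error",
    503: "service temporarily unavailable",
}

def describe_status(status_code):
    """Return a short explanation for a dataselect HTTP status code."""
    return FDSN_STATUS.get(status_code, f"unexpected status code {status_code}")

print(describe_status(204))
```

A helper like this could be called with `r.status_code` from a `requests` response to produce a more informative failure message. Note that a 204 response is not an error: the request was well formed, but the archive holds no matching data.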

The `download_data` function takes the query parameters defined by the FDSN web service specification and the directory in which to write the data. It does three things: it requests an authorization token, builds a file name for the data, and sends the request to dataselect, writing the response to a file in that directory.
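The file-naming step can be sketched on its own. This example assumes the intended name is `STATION.NETWORK.LOCATION.CHANNEL.YEAR.DAYOFYEAR.mseed`, with the day of the year zero-padded to three digits. The `make_file_name` helper is a hypothetical name used only for illustration.

```python
from datetime import datetime

def make_file_name(params):
    """Build a miniSEED file name from dataselect query parameters."""
    # parse the ISO-style start time string into a datetime object
    start = datetime.strptime(params["start"], "%Y-%m-%dT%H:%M:%S")
    # tm_yday is the day of the year (1 to 366), not the day of the month
    day_of_year = start.timetuple().tm_yday
    return ".".join([params["sta"], params["net"], params["loc"], params["cha"],
                     str(start.year), f"{day_of_year:03d}", "mseed"])

params = {"net": "IU",
          "sta": "ANMO",
          "loc": "00",
          "cha": "BHZ",
          "start": "2010-02-27T06:30:00",
          "end": "2010-02-27T10:30:00"}

file_name = make_file_name(params)
print(file_name)
```

February 27 is day 58 of the year, so the name carries `058`. If the parameters contain wildcards or lists, the resulting name will contain those characters too, so a different naming scheme may be preferable for such requests.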

In [None]:
import requests, os
from pathlib import Path
from datetime import datetime
from earthscope_sdk import EarthScopeClient

# SAGE archive
URL = "https://service.iris.edu/fdsnws/dataselect/1/query"

# function to get authorization token 
def get_token():
    
    # refresh the token if it has expired
    client.ctx.auth_flow.refresh_if_necessary()

    token = client.ctx.auth_flow.access_token
    
    return token

def download_data(params, data_directory):

    # get authorization Bearer token
    token = get_token()

    # get year and day of year from the start time string
    start_date = datetime.strptime(params['start'], '%Y-%m-%dT%H:%M:%S')
    year = start_date.year
    day = start_date.timetuple().tm_yday  # day of year (1-366), not day of month

    # file name format: STATION.NETWORK.LOCATION.CHANNEL.YEAR.DAYOFYEAR.mseed
    file_name = ".".join([params["sta"], params["net"], params['loc'], params['cha'], str(year), "{:03d}".format(day), 'mseed'])
    
    
    # send the request and provide the token in the Authorization header
    r = requests.get(URL, params=params, headers={"Authorization": f"Bearer {token}"}, stream=True)
    if r.status_code == requests.codes.ok:
        # save the file, streaming the response in chunks
        with open(Path(data_directory) / file_name, 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
    else:
        # a problem occurred
        print(f"failure: {r.status_code}, {r.reason}")
        

# create client to get token
client = EarthScopeClient()

# create directory for data
data_directory = "./miniseed_data"
os.makedirs(data_directory, exist_ok=True)

# parameters specifying the miniSEED file
params = {"net" : 'IU',
          "sta" : 'ANMO',
          "loc" : '00',
          "cha" : 'BHZ',
          "start": '2010-02-27T06:30:00',
          "end": '2010-02-27T10:30:00'}

download_data(params, data_directory)


## Getting Geodetic Data

This section details how to get geodetic data from EarthScope services. It requires a working knowledge of Python: you should understand data types and string formatting, creating file paths and file names, web requests, and writing Python functions. You should also be familiar with GNSS data types and formats.


## Getting Geodetic Data from GAGE Web Services

The GAGE archive holds many types of data, ranging from GPS/GNSS data to borehole strain data. We will focus on GPS/GNSS data. Each data type has its own APIs. Unlike dataselect, these API calls return information about the data, or processed data, rather than the collected data itself. The collected data is distributed by a file server and can be downloaded programmatically if you know the URL of the file.

In this example, we will download GNSS data in RINEX format. GAGE data is located on a file server, so a file can be downloaded with a properly formatted URL. The script downloads the file for a station from the parameters that make up its URL.

### Downloading RINEX files

The GAGE base URL for GNSS data in RINEX format is `https://gage-data.earthscope.org/archive/gnss/rinex/obs/`.

Files are organized by year and the day of the year, e.g., `/2025/001/`. File names use this pattern: 

| station | day of year | 0. | two digit year | o.Z or d.Z |
|---------|-------------|----|----------------|-----|
| p034 | 001 |0. | 25 | d.Z |
| p034 | 001 |0. | 25 | o.Z |

The complete URL for this RINEX file:

`https://gage-data.earthscope.org/archive/gnss/rinex/obs/2025/001/p0340010.25d.Z`

> Note: files ending with `d.Z` are [Hatanaka-compressed files](https://www.unavco.org/data/gps-gnss/hatanaka/hatanaka.html) and files ending with `o.Z` are not Hatanaka compressed. Hatanaka-compressed files are much smaller but require additional software to read the data.

The same method used for downloading SAGE data can be used to download GAGE data once the URL is properly constructed.
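The naming pattern can be checked offline before any request is sent. The following is a minimal sketch that starts from a calendar date rather than a day-of-year number and reproduces the example URL shown earlier. The `rinex_url` helper and its `hatanaka` flag are hypothetical names used only for illustration.

```python
from datetime import date

BASE_URL = "https://gage-data.earthscope.org/archive/gnss/rinex/obs/"

def rinex_url(station, day, hatanaka=True):
    """Build the URL of a daily RINEX observation file for a station and date."""
    # tm_yday converts the calendar date to the day of the year (1 to 366)
    doy = day.timetuple().tm_yday
    # 'd.Z' marks a Hatanaka-compressed file, 'o.Z' one that is not
    suffix = "d.Z" if hatanaka else "o.Z"
    two_digit_year = day.year % 100
    return (f"{BASE_URL}{day.year}/{doy:03d}/"
            f"{station}{doy:03d}0.{two_digit_year:02d}{suffix}")

url = rinex_url("p034", date(2025, 1, 1))
print(url)
```

Working from a `date` object avoids computing the day of the year by hand, which is error-prone around leap years.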

In [None]:
import requests, os
from pathlib import Path
from earthscope_sdk import EarthScopeClient

client = EarthScopeClient()

BASE_URL = 'https://gage-data.earthscope.org/archive/gnss/rinex/obs/'

# function to get authorization token 
def get_token():
    
    # refresh the token if it has expired
    client.ctx.auth_flow.refresh_if_necessary()

    token = client.ctx.auth_flow.access_token
    
    return token

# function to download data from GAGE archive
def download_file(url, data_directory):
    
    # get authorization Bearer token
    token = get_token()

    # the pathlib package (https://docs.python.org/3/library/pathlib.html#accessing-individual-parts) 
    # supports extracting the file name from the end of a path
    file_name = Path(url).name
    
    # request a file and provide the token in the Authorization header
    r = requests.get(url, headers={"Authorization": f"Bearer {token}"}, stream=True)
    if r.status_code == requests.codes.ok:
        # save the file, streaming the response in chunks
        with open(Path(data_directory) / file_name, 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
    else:
        # a problem occurred
        print(f"failure: {r.status_code}, {r.reason}")

# function to create the URL to download data
def create_url(year, day, station, compression):
    # using Python string formatting and slicing
    doy = "{:03d}".format(day) # converts day to a three-character, zero-padded string, e.g. '001'
    two_digit_year = str(year)[2:] # converts the integer year to a string and slices the last two characters

    # using the Python join method to concatenate an array or list of strings
    file_path = '/'.join([str(year), doy]) # integer year converted to string for string join
    file_name = ''.join(['/', station, doy, '0.', two_digit_year, compression])
    url = ''.join([BASE_URL,file_path,file_name])

    return url

# create a directory for rinex data
directory_path = "./rinex_data"
os.makedirs(directory_path, exist_ok=True)

# data requested from station p034 on January 1, 2025 hatanaka compressed
year = 2025
day = 1
station = 'p034'
compression = 'd.Z'

# download the RINEX file
url = create_url(year, day, station, compression)
download_file(url, directory_path)

In this example, we added a `create_url` function that builds the URL to the data step by step. It demonstrates how to format parameters using Python string functions and how to join strings to form a URL.

The URL can be built more succinctly with string formatting, for example by adding the following code to the `download_file` function along with the required parameters. However, this is less explicit and less illustrative.

```python
doy = '{:03d}'.format(day)
two_digit_year = str(year)[2:]
url = 'https://gage-data.earthscope.org/archive/gnss/rinex/obs/{}/{}/{}{}0.{}d.Z'.format(year, doy, station, doy, two_digit_year)
```

## Summary

Web services are a convenient way to download data files in known formats. However, downloading adds overhead: each file is first written to local storage and must then be read back for processing and analysis.

## [< Previous](./3_authorization.ipynb)&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;[Next >](./5_web_services_exercise.ipynb)