# Getting Geodetic Data

This section details how to get geodetic data from EarthScope and requires a working knowledge of Python. You should have an understanding of data types and string formatting, creating file paths and file naming, web requests, and creating Python functions. In addition to Python, you should have an understanding of GNSS data. 

The notebook material is informational and is useful for completing the seismic and geodetic exercises.

## Getting Geodetic Data from EarthScope

![](images/cloud_native_data_access.png)

The recommended way to access EarthScope's geodetic data is the [EarthScope SDK](). The SDK lets you request only the GNSS observation data you need instead of downloading whole RINEX files. This reduces the amount of data transferred, and the data is returned in Apache [Arrow](https://arrow.apache.org/), a memory-efficient format that can be used by analysis tools such as Pandas [dataframes](https://pandas.pydata.org/). In the future, other GNSS data types will be added to the SDK.

![](images/Web_Services_Data_Flow.png)

In addition to GNSS observations, EarthScope distributes RINEX files and other GNSS data such as navigation and meteorological data. These data can be downloaded as files in their respective formats. An example for downloading files is provided below.

## Getting RINEX Observations in Apache Arrow from EarthScope

Apache Arrow is an in-memory data format and framework that makes large-scale data analysis faster, more efficient, and compatible across different tools. Arrow is a columnar format, which means that data is held in columns instead of rows. File formats such as RINEX are row-based. The following comparison shows what it takes to calculate the average L1 signal strength (SNR) from a RINEX file and from an Arrow response.

**Row-based format (RINEX)**: 
>Each observation epoch has: (timestamp, satellite, obs_code, range, phase, snr, slip, flags, fcn, system, igs). To get snr, you would need to read every row and skip the other details, such as phase, slip, system, etc.
 
**Columnar format (Arrow)**: 
>Data is stored by column: one column each for snr, satellite, phase, and so on. Because only SNR is of interest, you can read the SNR column directly without touching the other data.
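
The difference between the two layouts can be sketched in plain Python. This is only an illustration of the idea, not how RINEX or Arrow store data internally; the values are made up and the field names are borrowed from the list above.

```python
from statistics import mean

# Row-based layout: one record per observation, all fields travel together.
# (Values are invented for illustration.)
rows = [
    {"satellite": "7", "obs_code": "1C", "phase": 1.10e8, "snr": 41.0},
    {"satellite": "7", "obs_code": "1C", "phase": 1.11e8, "snr": 43.0},
    {"satellite": "7", "obs_code": "1C", "phase": 1.12e8, "snr": 45.0},
]
# Every record has to be visited to pull out its snr value.
row_mean = mean(row["snr"] for row in rows)

# Columnar layout: one sequence per field.
columns = {
    "satellite": ["7", "7", "7"],
    "obs_code": ["1C", "1C", "1C"],
    "phase": [1.10e8, 1.11e8, 1.12e8],
    "snr": [41.0, 43.0, 45.0],
}
# Only the snr column is read; the other columns are never touched.
col_mean = mean(columns["snr"])

print(row_mean, col_mean)
```

Both layouts give the same answer; the columnar one simply avoids reading fields that are not needed.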

Another feature of Arrow is that it is an in-memory format that supports zero-copy sharing. This means that other libraries, such as Pandas, xarray, and NumPy, can often use Arrow data directly in memory without translation. Analyzing data is faster and more efficient without the overhead of translation.

An important feature of Arrow is its vectorized operations, which process many values at the same time. Filtering, aggregating, and joining data are done quickly and in memory. Using the previous example, Arrow can filter snr over a specified period of time and aggregate it into 15-minute intervals, reducing the amount of data to transfer and analyze.
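
To make the 15-minute aggregation concrete, here is a minimal sketch in plain Python. It does not use Arrow and is not vectorized; it only shows what "aggregate into 15-minute intervals" means. The helper names `floor_to_interval` and `aggregate_snr` and the sample values are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

def floor_to_interval(ts, minutes=15):
    """Round a datetime down to the start of its N-minute interval.

    Assumes `minutes` divides 60 evenly.
    """
    return ts - timedelta(
        minutes=ts.minute % minutes,
        seconds=ts.second,
        microseconds=ts.microsecond,
    )

def aggregate_snr(samples, minutes=15):
    """Average (timestamp, snr) pairs within each N-minute interval."""
    buckets = defaultdict(list)
    for ts, snr in samples:
        buckets[floor_to_interval(ts, minutes)].append(snr)
    return {start: mean(values) for start, values in sorted(buckets.items())}

# Invented samples, for illustration only
samples = [
    (datetime(2025, 7, 20, 21, 0, 0), 40.0),
    (datetime(2025, 7, 20, 21, 7, 30), 42.0),
    (datetime(2025, 7, 20, 21, 14, 30), 44.0),
    (datetime(2025, 7, 20, 21, 15, 0), 50.0),
]
binned = aggregate_snr(samples)
for start, value in binned.items():
    print(start, value)
```

Four samples collapse into two interval averages, which is the kind of reduction that keeps transfers small.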

These features make Arrow an ideal way to deliver large amounts of data efficiently.

### Requesting Data in Arrow

The EarthScope SDK supports requesting GNSS observations in Arrow. Because Arrow is an in-memory format, it is commonly converted to a Python data structure for analysis.

[Pandas](https://pandas.pydata.org/) is a popular package for working with tabular data. It enables loading, cleaning, exploring, transforming, and analyzing data efficiently. Arrow tables integrate with other Python analysis and visualization libraries and are commonly used for data exploration.

The following code demonstrates how to use the SDK to request data. As with other methods, an authorization token is needed to make the request. The EarthScope client checks for a token when making a request, so it isn't necessary to include it in the request. If you do not have a token, you can use the EarthScope CLI to get one; see the [**Getting a Token to Access EarthScope Services**](./3_authorization.ipynb) notebook.

The client has a `data.gnss_observations` method for requesting GNSS observation data. The method takes a number of parameters, but at minimum the start and end datetimes and the station name are required.

Note that the request is for 30 hours of data: one full day plus three hours before and after it. If we used a web service request, we would have to download a file for each day the window touches, extract the data for the time period from each file, and join the pieces to get the dataset. This method does that automatically and returns only the data you need.
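
As a rough illustration of the bookkeeping the SDK saves, the sketch below lists the calendar days that a time window touches. It assumes one RINEX file per day, as in the archive layout described later in this notebook; `days_touched` is a hypothetical helper, not part of the SDK.

```python
import datetime as dt

def days_touched(start, end):
    """Return (year, day_of_year) for every calendar day from start to end."""
    days = []
    current = start.date()
    while current <= end.date():
        days.append((current.year, current.timetuple().tm_yday))
        current += dt.timedelta(days=1)
    return days

# Same window as the request in this section
window = days_touched(
    dt.datetime(2025, 7, 20, 21),
    dt.datetime(2025, 7, 22, 3),
)
print(window)
```

Each entry corresponds to a daily file that would have to be downloaded, trimmed, and joined by hand.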

An Arrow table can be converted directly into a Pandas [dataframe](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html), which is a two-dimensional tabular data structure. Evaluating `df` on the last line of the cell displays the first and last five rows of the table.

In [4]:
import datetime as dt
import pandas as pd
from earthscope_sdk import EarthScopeClient

es = EarthScopeClient()

# Request 30 hours of data (1 day + 3 hour arcs on either side)
arrow_table = es.data.gnss_observations(
    start_datetime=dt.datetime(2025, 7, 20, 21),
    end_datetime=dt.datetime(2025, 7, 22, 3),
    station_name="AC60",
).fetch()

df = arrow_table.to_pandas()
df


By specifying values for the columns, you can filter the request so it returns only the data you need. After the Arrow table has been converted to a dataframe, you can apply functions, such as sort, to make it easier to work with. Note that `sort_values` returns a new dataframe, so the result must be assigned.

In [None]:
arrow_table = es.data.gnss_observations(
    start_datetime=dt.datetime(2025, 7, 20),
    end_datetime=dt.datetime(2025, 9, 20),
    station_name="AC60",
    session_name="A",
    system="G",
    obs_code="1C",
    satellite="7",
    field="snr",
).fetch()
df = arrow_table.to_pandas()
# sort_values returns a new dataframe, so assign the result
df = df.sort_values(by="timestamp")
df

The `data.gnss_observations` method is the first of several cloud-native functions planned for the EarthScope SDK. These functions are more efficient than web services and can support large-scale research efforts.

## Getting Geodetic Data from GAGE Web Services

The GAGE archive holds many types of data, ranging from GPS/GNSS data to borehole strain data. We will focus on GPS/GNSS data. Each type of data has API interfaces specific to the data. Unlike dataselect, the API calls return information about data or processed data. The collected data is distributed by a file server and can be programmatically downloaded if you know the URL to the file.

In this example, we will download GNSS data in RINEX. GAGE data is located on a file server and can be downloaded with a properly formatted URL. The script downloads a station's file by providing the parameters that make up the URL to the data.

### Downloading RINEX files

The GAGE base URL for GNSS data in RINEX is `https://gage-data.earthscope.org/archive/gnss/rinex/obs/`.

Files are organized by year and the day of the year, e.g., `/2025/001/`. File names use this pattern: 

| station | day of year | 0. | two digit year | o.Z or d.Z |
|---------|-------------|----|----------------|-----|
| p034 | 001 |0. | 25 | d.Z |
| p034 | 001 |0. | 25 | o.Z |

The complete URL for this RINEX file:

`https://gage-data.earthscope.org/archive/gnss/rinex/obs/2025/001/p0340010.25d.Z`
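
If you start from a calendar date rather than a day of year, the path pieces can be derived with `strftime`. The sketch below is one way to do it, assuming the file name pattern in the table above; `rinex_url` is a hypothetical helper name.

```python
import datetime as dt

BASE_URL = "https://gage-data.earthscope.org/archive/gnss/rinex/obs/"

def rinex_url(station, date, suffix="d.Z"):
    """Build the archive URL for a station and calendar date."""
    doy = date.strftime("%j")  # zero-padded day of year, e.g. '001'
    yy = date.strftime("%y")   # two-digit year, e.g. '25'
    return f"{BASE_URL}{date.year}/{doy}/{station}{doy}0.{yy}{suffix}"

url = rinex_url("p034", dt.date(2025, 1, 1))
print(url)
```

Passing `suffix="o.Z"` would point at the file that is not Hatanaka compressed.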

> Note: files ending with `d.Z` are [Hatanaka compressed files](https://www.unavco.org/data/gps-gnss/hatanaka/hatanaka.html) and files ending with `o.Z` are not Hatanaka compressed. Hatanaka compressed files are much smaller but require additional software to read the data.
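
Going the other way, a file name can be split back into its parts to check whether it is Hatanaka compressed. This is a minimal sketch that assumes the exact pattern shown in the table above (four-character station, three-digit day, a literal `0.`, two-digit year, then `d.Z` or `o.Z`); `describe_rinex_name` is a hypothetical helper.

```python
import re

# Pattern assumed from the file name table in this section
RINEX_NAME = re.compile(
    r"^(?P<station>\w{4})(?P<doy>\d{3})0\.(?P<yy>\d{2})(?P<kind>[do])\.Z$"
)

def describe_rinex_name(file_name):
    """Split a RINEX file name into its parts and flag Hatanaka files."""
    match = RINEX_NAME.match(file_name)
    if match is None:
        raise ValueError(f"unrecognized RINEX file name: {file_name}")
    info = match.groupdict()
    # 'd' marks a Hatanaka compressed file, 'o' a plain observation file
    info["hatanaka"] = info.pop("kind") == "d"
    return info

info = describe_rinex_name("p0340010.25d.Z")
print(info)
```

A check like this can be used to decide whether a downloaded file needs the extra Hatanaka decompression step before it is read.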

The same method used for downloading SAGE data can be used to download GAGE data once the URL is properly constructed.

In [None]:
import requests, os
from pathlib import Path
from earthscope_sdk import EarthScopeClient

client = EarthScopeClient()

BASE_URL= 'https://gage-data.earthscope.org/archive/gnss/rinex/obs/'

# function to get authorization token 
def get_token():
    
    # refresh the token if it has expired
    client.ctx.auth_flow.refresh_if_necessary()

    token = client.ctx.auth_flow.access_token
    
    return token

# function to download data from GAGE archive
def download_file(url, data_directory):
    
    # get authorization Bearer token
    token = get_token()

    # the pathlib package (https://docs.python.org/3/library/pathlib.html#accessing-individual-parts) 
    # supports extracting the file name from the end of a path
    file_name = Path(url).name
    
    # request a file and provide the token in the Authorization header
    r = requests.get(url, headers={"authorization": f"Bearer {token}"}, stream=True)
    if r.status_code == requests.codes.ok:
        # save the file in chunks so large files are not held in memory
        with open(Path(data_directory) / file_name, 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
    else:
        # a problem occurred
        print(f"failure: {r.status_code}, {r.reason}")

# function to create the URL to download data
def create_url(year, day, station, compression):
    # using Python string formatting and slicing
    doy = "{:03d}".format(day) # converts day to a three-character zero-padded string, e.g. '001'
    two_digit_year = str(year)[2:] # converts integer to string and keeps the last two digits, e.g. '25'

    # using the Python join method to concatenate an array or list of strings
    file_path = '/'.join([str(year), doy]) # integer year converted to string for string join
    file_name = ''.join(['/', station, doy, '0.', two_digit_year, compression])
    url = ''.join([BASE_URL,file_path,file_name])

    return url

# create a directory for rinex data
directory_path = "./rinex_data"
os.makedirs(directory_path, exist_ok=True)

# data requested from station p034 on January 1, 2025 hatanaka compressed
year = 2025
day = 1
station = 'p034'
compression = 'd.Z'

# download the RINEX file
url = create_url(year, day, station, compression)
download_file(url, directory_path)

In this example, we’ve added a function to create the URL to the data. It deliberately shows how to format parameters using Python string functions and how to join strings to form a URL. The URL can be built more succinctly with string formatting by adding the following code to the `download_file` function, along with the required parameters, although this is less explicit and illustrative.

```python
doy = '{:03d}'.format(day)  # zero-padded day of year, e.g. '001'
two_digit_year = str(year)[2:]
url = 'https://gage-data.earthscope.org/archive/gnss/rinex/obs/{}/{}/{}{}0.{}d.Z'.format(year, doy, station, doy, two_digit_year)
```

## Summary

The EarthScope SDK supports selectively pulling GNSS observation data instead of pulling everything from the RINEX files that cover the requested time period. The SDK returns the data in Apache Arrow, which is an efficient format for transporting data and converting it to a scientific computing data structure such as a Pandas dataframe.

Other geodetic data are available by downloading their respective files.

## [< Previous](./5_seismic_exercises.ipynb)&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;[Next >](./7_geodetic_exercise.ipynb)