## Purpose

This notebook explores using the `pystac` and `pystac-client` libraries to traverse and extract information from the AWS-hosted STAC catalog of USGS 3DEP point clouds.

In [1]:
from pystac_client import Client

api = Client.open("https://usgs-lidar-stac.s3-us-west-2.amazonaws.com/ept/catalog.json")

In [2]:
item_links = api.get_item_links()
print(f"Items in catalog: {len(item_links)}")

Items in catalog: 1987


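## Issue - The STAC catalog implements no search features

It is a static catalog, not a STAC API, so it cannot be searched by name, spatial extent, content, or anything else. None of the conformance classes checked below is supported.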

In [3]:
from pystac_client import ConformanceClasses

for cc in ConformanceClasses:
    print(
        "Conforms to %s: %s"
        % (
            cc,
            api._conforms_to(cc),
        )
    )

Conforms to ConformanceClasses.stac_prefix: False
Conforms to ConformanceClasses.CORE: False
Conforms to ConformanceClasses.COLLECTIONS: False
Conforms to ConformanceClasses.FEATURES: False
Conforms to ConformanceClasses.ITEM_SEARCH: False
Conforms to ConformanceClasses.CONTEXT: False
Conforms to ConformanceClasses.FIELDS: False
Conforms to ConformanceClasses.SORT: False
Conforms to ConformanceClasses.QUERY: False
Conforms to ConformanceClasses.FILTER: False
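As far as I can tell, `pystac-client` bases these flags on the `conformsTo` array that a STAC API landing page publishes. A static `catalog.json` has no such array, so every check comes back `False`. A minimal stdlib sketch of that check, assuming the STAC API v1.0.0 conformance URIs shown; `conforms_to` and both dictionaries are hypothetical stand-ins, not the real catalog:

```python
# Conformance class URIs published by STAC API v1.0.0 services.
CORE = "https://api.stacspec.org/v1.0.0/core"
ITEM_SEARCH = "https://api.stacspec.org/v1.0.0/item-search"


def conforms_to(root: dict, uri: str) -> bool:
    """Return True if the root document lists `uri` in its `conformsTo` array.

    A static catalog.json normally has no `conformsTo` key, so the lookup
    falls back to an empty list and every check returns False.
    """
    return uri in root.get("conformsTo", [])


# Hypothetical documents: a static catalog and a STAC API landing page.
static_catalog = {
    "type": "Catalog",
    "id": "example-static-catalog",
    "links": [{"rel": "item", "href": "https://example.com/item.json"}],
}
api_landing_page = {**static_catalog, "conformsTo": [CORE, ITEM_SEARCH]}

print(conforms_to(static_catalog, ITEM_SEARCH))    # False
print(conforms_to(api_landing_page, ITEM_SEARCH))  # True
```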


## Implementing an href-contains search

Example item link from the top-level catalog:
```json
{
    "rel": "item", 
    "href": "https://s3-us-west-2.amazonaws.com/usgs-lidar-stac/ept/MN_RainyLake_1_2020.json"
}
```

The workunit (`MN_RainyLake_1_2020` in this case) forms the item href, so a `str in str` conditional can filter the hrefs without requesting them. This prefilters the items before any `Item` objects are constructed.
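One caveat with the substring test: it over-matches when one workunit name is a prefix of another. A minimal sketch of a stricter variant, assuming every item href ends in `<workunit>.json` as in the example link; `filter_links`, `workunit_from_href`, and the `MN_RainyLake_1_2020_B` name are hypothetical:

```python
import posixpath
from urllib.parse import urlparse


def workunit_from_href(href: str) -> str:
    """Derive the workunit from an href such as .../ept/MN_RainyLake_1_2020.json."""
    filename = posixpath.basename(urlparse(href).path)
    stem, _ext = posixpath.splitext(filename)
    return stem


def filter_links(links: list[dict], workunit: str, exact: bool = True) -> list[dict]:
    """Keep `rel == "item"` links that match `workunit`.

    exact=False reproduces the `workunit in href` substring test;
    exact=True compares against the filename stem only.
    """
    matches = []
    for link in links:
        if link.get("rel") != "item":
            continue
        href = link["href"]
        hit = workunit_from_href(href) == workunit if exact else workunit in href
        if hit:
            matches.append(link)
    return matches


links = [
    {"rel": "item", "href": "https://s3-us-west-2.amazonaws.com/usgs-lidar-stac/ept/MN_RainyLake_1_2020.json"},
    # Hypothetical workunit name, used only to show a substring collision.
    {"rel": "item", "href": "https://s3-us-west-2.amazonaws.com/usgs-lidar-stac/ept/MN_RainyLake_1_2020_B.json"},
    {"rel": "parent", "href": "https://s3-us-west-2.amazonaws.com/usgs-lidar-stac/ept/catalog.json"},
]
print(len(filter_links(links, "MN_RainyLake_1_2020", exact=False)))  # 2
print(len(filter_links(links, "MN_RainyLake_1_2020", exact=True)))   # 1
```

Both variants still avoid any network request; the exact match only trades the convenience of partial names for an unambiguous result.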

The alternative is to filter on the attributes of the `Item` objects returned by `Client.get_all_items()`. However, that requests every one of the ~2000 item URLs and takes 2-3 minutes to run.
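If that full scan were ever needed, the item requests are independent and I/O bound, so overlapping them on a thread pool should cut the wall-clock time. This is a swapped-in technique that the notebook did not measure; `fetch_all` is a hypothetical helper and `max_workers=16` an arbitrary choice:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def fetch_all(hrefs: Iterable[str], fetch: Callable[[str], T], max_workers: int = 16) -> list[T]:
    """Call `fetch(href)` for every href using a pool of worker threads.

    `pool.map` yields results in input order, so the output lines up with
    `hrefs` even though the calls run concurrently.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, hrefs))
```

In this notebook it could be called as `fetch_all([link.href for link in api.get_item_links()], Item.from_file)`; whether pystac's default I/O is safe to share across threads is an assumption I have not verified.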

In [4]:
from pystac_client import Client
from pystac import Item


def get_ept_stac_item(stac_catalog_url: str, workunit: str) -> list[Item]:
    """Filter the catalog's item links with a `workunit in link.href`
    conditional, then construct an `Item` for each matching href."""
    client = Client.open(stac_catalog_url)
    item_links = [link for link in client.get_item_links() if workunit in link.href]
    return [Item.from_file(item_link.href) for item_link in item_links]

In [5]:
stac_catalog_url = "https://usgs-lidar-stac.s3-us-west-2.amazonaws.com/ept/catalog.json"
workunit = "MN_RainyLake_1_2020"
stac_item = get_ept_stac_item(stac_catalog_url, workunit)[0]
stac_item

id: MN_RainyLake_1_2020
bbox: [-92.82823301030858, 47.31443916205206, -89.45955069569851, 48.58574228169277]
description: A USGS Lidar pointcloud in Entwine/EPT format
pc:count: 355339247767
pc:type: lidar
pc:encoding: ept
pc:schemas: [{'name': 'X', 'offset': -10145671, 'scale': 0.001, 'size': 4, 'type': 'signed'}, {'name': 'Y', 'offset': 6099307, 'scale': 0.001, 'size': 4, 'type': 'signed'}, {'name': 'Z', 'offset': 763, 'scale': 0.001, 'size': 4, 'type': 'signed'}, {'name': 'Intensity', 'size': 2, 'type': 'unsigned'}, {'name': 'ReturnNumber', 'size': 1, 'type': 'unsigned'}, {'name': 'NumberOfReturns', 'size': 1, 'type': 'unsigned'}, {'name': 'ScanDirectionFlag', 'size': 1, 'type': 'unsigned'}, {'name': 'EdgeOfFlightLine', 'size': 1, 'type': 'unsigned'}, {'name': 'Classification', 'size': 1, 'type': 'unsigned'}, {'name': 'ScanAngleRank', 'size': 4, 'type': 'floating'}, {'name': 'UserData', 'size': 1, 'type': 'unsigned'}, {'name': 'PointSourceId', 'size': 2, 'type': 'unsigned'}, {'name': 'GpsTime', 'size': 8, 'type': 'floating'}, {'name': 'ScanChannel', 'size': 1, 'type': 'unsigned'}, {'name': 'ClassFlags', 'size': 1, 'type': 'unsigned'}]
proj:epsg: 3857
proj:projjson: {'$schema': 'https://proj.org/schemas/v0.6/projjson.schema.json', 'type': 'ProjectedCRS', 'name': 'WGS 84 / Pseudo-Mercator', 'base_crs': {'name': 'WGS 84', 'datum_ensemble': {'name': 'World Geodetic System 1984 ensemble', 'members': [{'name': 'World Geodetic System 1984 (Transit)', 'id': {'authority': 'EPSG', 'code': 1166}}, {'name': 'World Geodetic System 1984 (G730)', 'id': {'authority': 'EPSG', 'code': 1152}}, {'name': 'World Geodetic System 1984 (G873)', 'id': {'authority': 'EPSG', 'code': 1153}}, {'name': 'World Geodetic System 1984 (G1150)', 'id': {'authority': 'EPSG', 'code': 1154}}, {'name': 'World Geodetic System 1984 (G1674)', 'id': {'authority': 'EPSG', 'code': 1155}}, {'name': 'World Geodetic System 1984 (G1762)', 'id': {'authority': 'EPSG', 'code': 1156}}, {'name': 'World Geodetic System 1984 (G2139)', 'id': {'authority': 'EPSG', 'code': 1309}}], 'ellipsoid': {'name': 'WGS 84', 'semi_major_axis': 6378137, 'inverse_flattening': 298.257223563}, 'accuracy': '2.0', 'id': {'authority': 'EPSG', 'code': 6326}}, 'coordinate_system': {'subtype': 'ellipsoidal', 'axis': [{'name': 'Geodetic latitude', 'abbreviation': 'Lat', 'direction': 'north', 'unit': 'degree'}, {'name': 'Geodetic longitude', 'abbreviation': 'Lon', 'direction': 'east', 'unit': 'degree'}]}, 'id': {'authority': 'EPSG', 'code': 4326}}, 'conversion': {'name': 'Popular Visualisation Pseudo-Mercator', 'method': {'name': 'Popular Visualisation Pseudo Mercator', 'id': {'authority': 'EPSG', 'code': 1024}}, 'parameters': [{'name': 'Latitude of natural origin', 'value': 0, 'unit': 'degree', 'id': {'authority': 'EPSG', 'code': 8801}}, {'name': 'Longitude of natural origin', 'value': 0, 'unit': 'degree', 'id': {'authority': 'EPSG', 'code': 8802}}, {'name': 'False easting', 'value': 0, 'unit': 'metre', 'id': {'authority': 'EPSG', 'code': 8806}}, {'name': 'False northing', 'value': 0, 'unit': 'metre', 'id': {'authority': 'EPSG', 'code': 8807}}]}, 'coordinate_system': {'subtype': 'Cartesian', 'axis': [{'name': 'Easting', 'abbreviation': 'X', 'direction': 'east', 'unit': 'metre'}, {'name': 'Northing', 'abbreviation': 'Y', 'direction': 'north', 'unit': 'metre'}]}, 'scope': 'Web mapping and visualisation.', 'area': 'World between 85.06°S and 85.06°N.', 'bbox': {'south_latitude': -85.06, 'west_longitude': -180, 'north_latitude': 85.06, 'east_longitude': 180}, 'id': {'authority': 'EPSG', 'code': 3857}}
datetime: 2023-05-24T16:01:30.033432Z

stac_extensions:
https://stac-extensions.github.io/pointcloud/v1.0.0/schema.json
https://stac-extensions.github.io/projection/v1.1.0/schema.json

assets (ept.json):
href: https://s3-us-west-2.amazonaws.com/usgs-lidar-public/MN_RainyLake_1_2020/ept.json
title: entwine
description: The ept.json for accessing data
owner: MN_RainyLake_1_2020

links:
rel: self
href: https://s3-us-west-2.amazonaws.com/usgs-lidar-stac/ept/MN_RainyLake_1_2020.json
type: application/json

rel: parent
href: https://s3-us-west-2.amazonaws.com/usgs-lidar-stac/ept/catalog.json


In [20]:
from pyproj import CRS

crs = CRS.from_epsg(stac_item.properties["proj:epsg"])
crs

<Projected CRS: EPSG:3857>
Name: WGS 84 / Pseudo-Mercator
Axis Info [cartesian]:
- X[east]: Easting (metre)
- Y[north]: Northing (metre)
Area of Use:
- name: World between 85.06°S and 85.06°N.
- bounds: (-180.0, -85.06, 180.0, 85.06)
Coordinate Operation:
- name: Popular Visualisation Pseudo-Mercator
- method: Popular Visualisation Pseudo Mercator
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

In [26]:
crs.to_epsg()

3857
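Note that the STAC item specification defines `bbox` in WGS 84 longitude/latitude, which is why the item's bbox is in degrees even though `proj:epsg` is 3857. Comparing that bbox with coordinates in the point cloud's native CRS therefore needs a projection. The usual route would be pyproj's `Transformer.from_crs(4326, 3857, always_xy=True)`; the sketch below writes out the spherical Web Mercator formulas by hand instead, assuming the `semi_major_axis` of 6378137 m listed in the item's `proj:projjson`. The helper names are hypothetical:

```python
import math

R = 6378137.0  # WGS 84 semi-major axis in metres, used as the sphere radius by EPSG:3857


def lonlat_to_web_mercator(lon: float, lat: float) -> tuple[float, float]:
    """Spherical Web Mercator forward projection: x = R*lon, y = R*ln(tan(pi/4 + lat/2))."""
    x = R * math.radians(lon)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def bbox_to_web_mercator(bbox: list[float]) -> list[float]:
    """Project a STAC [west, south, east, north] bbox corner by corner."""
    west, south, east, north = bbox
    minx, miny = lonlat_to_web_mercator(west, south)
    maxx, maxy = lonlat_to_web_mercator(east, north)
    return [minx, miny, maxx, maxy]


bbox = [-92.82823301030858, 47.31443916205206, -89.45955069569851, 48.58574228169277]
print(bbox_to_web_mercator(bbox))
```

Because x depends only on longitude and y only on latitude, projecting the two corners yields the exact projected bbox; that shortcut does not hold for projections that mix the two.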

In [9]:
epsg = stac_item.properties["proj:epsg"]
print(f"Value: {epsg} Type: {type(epsg)}")

Value: 3857 Type: <class 'int'>


In [18]:
ept_json_url = stac_item.assets["ept.json"].href
print(f"Value: {ept_json_url} Type: {type(ept_json_url)}")

Value: https://s3-us-west-2.amazonaws.com/usgs-lidar-public/MN_RainyLake_1_2020/ept.json Type: <class 'str'>


## Plotting

The item is a GeoJSON Feature, so geopandas can read it directly from its `self_href` and plot it.

In [6]:
import geopandas
from geopandas import GeoDataFrame


gdf: GeoDataFrame = geopandas.read_file(stac_item.self_href, parse_dates=True)
# gdf.explore() raises an exception on the datetime column, and the
# column isn't needed for this exploration, so drop it.
gdf.drop(columns=["datetime"], inplace=True)
gdf

| | id | description | pc:count | pc:type | pc:encoding | pc:schemas | proj:epsg | proj:projjson | geometry |
|---|---|---|---|---|---|---|---|---|---|
| 0 | MN_RainyLake_1_2020 | A USGS Lidar pointcloud in Entwine/EPT format | 355339247767 | lidar | ept | `[ { "name": "X", "offset": -10145671, "scale":...` | 3857 | `{'$schema': 'https://proj.org/schemas/v0.6/pro...` | `MULTIPOLYGON (((-91.25147 47.31435, -91.19757 ...` |


In [7]:
gdf.filter(["id", "proj:epsg", "pc:count", "geometry"], axis="columns").explore()

## Conclusion

I'm unsure whether `pystac-client` is worth adding as a dependency, since the STAC catalog of interest is static and implements none of the API features (search, filtering) that would make the client genuinely helpful. I can extract the minimal information I need with the `requests` and `json` libraries instead.
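A minimal sketch of that dependency-light route, using the standard library's `urllib.request` in place of `requests`. `ept_info`, `fetch_json`, and the stubbed responses are hypothetical; it assumes every item carries an `ept.json` asset and a `proj:epsg` property, as `MN_RainyLake_1_2020` does, and resolves item hrefs against the catalog URL in case any are relative:

```python
import json
import posixpath
from typing import Callable
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

CATALOG_URL = "https://usgs-lidar-stac.s3-us-west-2.amazonaws.com/ept/catalog.json"


def fetch_json(url: str) -> dict:
    """GET a URL and decode its JSON body (stdlib stand-in for requests)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def ept_info(workunit: str, fetch: Callable[[str], dict] = fetch_json,
             catalog_url: str = CATALOG_URL) -> dict:
    """Return the ept.json href and EPSG code for one workunit.

    Only two documents are fetched: the catalog and the matching item.
    """
    catalog = fetch(catalog_url)
    for link in catalog.get("links", []):
        if link.get("rel") != "item":
            continue
        href = urljoin(catalog_url, link["href"])
        stem = posixpath.splitext(posixpath.basename(urlparse(href).path))[0]
        if stem == workunit:
            item = fetch(href)
            return {
                "ept_json": item["assets"]["ept.json"]["href"],
                "epsg": item["properties"]["proj:epsg"],
            }
    raise LookupError(f"no item link for workunit {workunit!r}")


# Offline check with a stubbed fetch built from the values shown earlier.
ITEM_HREF = "https://s3-us-west-2.amazonaws.com/usgs-lidar-stac/ept/MN_RainyLake_1_2020.json"
EPT_HREF = "https://s3-us-west-2.amazonaws.com/usgs-lidar-public/MN_RainyLake_1_2020/ept.json"
fake_responses = {
    CATALOG_URL: {"links": [{"rel": "item", "href": ITEM_HREF}]},
    ITEM_HREF: {"assets": {"ept.json": {"href": EPT_HREF}}, "properties": {"proj:epsg": 3857}},
}
info = ept_info("MN_RainyLake_1_2020", fetch=fake_responses.__getitem__)
print(info)
```

Injecting `fetch` keeps the parsing testable offline; calling `ept_info("MN_RainyLake_1_2020")` with the default would make two real requests.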

It may still be useful in a `make_ept_index` interim data process that simplifies adding tile_index data for a workunit.