This notebook shows how to use a SpatioTemporal Asset Catalog (STAC) to find and access a large collection of vector-based, multi-temporal data. It walks through querying the catalog, accessing the data assets, and visualizing the results.

We connect to the Microsoft Planetary Computer STAC catalog, which holds metadata for many spatiotemporal assets, including vector data.

Dataset overview: https://planetarycomputer.microsoft.com/dataset/ms-buildings#overview

In [1]:
# Import necessary libraries
import json
import pystac_client
import geopandas as gpd
import shapely
from shapely.geometry import mapping, GeometryCollection, shape
import dask.dataframe
import dask_geopandas
import dask.distributed

import matplotlib.pyplot as plt
import leafmap.foliumap as leafmap


#import contextily
#import mercantile

# Define the STAC API endpoint for Microsoft Planetary Computer
STAC_URL = 'https://planetarycomputer.microsoft.com/api/stac/v1'

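# Path to the GeoJSON file that defines the Melbourne municipal boundary
geojson_file_path = 'data/city-of-Melbourne/MunicipalBoundary.geojson'

# Alternative (disabled): read the boundary with geopandas and take its bounding box
# gdf = gpd.read_file(geojson_file_path)
# bbox = gdf.geometry.bounds.iloc[0].tolist()

# Area of interest for the STAC search, as a (minx, miny, maxx, maxy) lon/lat box.
# NOTE: this small box goes with the "Germany" region queried below; it is not the
# Melbourne boundary loaded from the GeoJSON file.
area_of_interest = shapely.geometry.box(14.11, 53.73, 14.13, 53.75)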

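# Load the boundary features from the GeoJSON file
with open(geojson_file_path) as f:
    features = json.load(f)["features"]

# NOTE: buffer(0) is a common trick for repairing invalid polygons, such as rings that overlap or self-intersect
geometry_collection = GeometryCollection([shape(feature["geometry"]).buffer(0) for feature in features])

# One row per boundary geometry. GeoJSON coordinates are WGS 84 (EPSG:4326) per RFC 7946,
# so the CRS is set explicitly on the assumption that this file follows the spec.
geometries = list(geometry_collection.geoms)
gdf = gpd.GeoDataFrame(geometry=geometries, crs="EPSG:4326")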
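
The hard-coded `area_of_interest` box does not cover the Melbourne boundary read from the GeoJSON file. One way to derive a search box from the file itself is to take the minimum and maximum of its coordinates. The following is a minimal sketch using only the standard library; `iter_positions` and `geojson_bbox` are hypothetical helper names, and the sample rectangle uses made-up coordinates for illustration, not the real boundary.

```python
def iter_positions(coords):
    """Yield (x, y) pairs from arbitrarily nested GeoJSON coordinate arrays."""
    if coords and isinstance(coords[0], (int, float)):
        # A single position: [x, y] or [x, y, z]
        yield coords[0], coords[1]
    else:
        for part in coords:
            yield from iter_positions(part)


def geojson_bbox(features):
    """Return [minx, miny, maxx, maxy] over all feature geometries.

    Handles geometry types that carry a "coordinates" member
    (Point, LineString, Polygon and their Multi* variants).
    """
    positions = [
        pos
        for feature in features
        for pos in iter_positions(feature["geometry"]["coordinates"])
    ]
    xs, ys = zip(*positions)
    return [min(xs), min(ys), max(xs), max(ys)]


# Made-up rectangle for illustration only (not the real municipal boundary)
sample_features = [
    {
        "type": "Feature",
        "properties": {},
        "geometry": {
            "type": "Polygon",
            "coordinates": [
                [
                    [144.90, -37.85],
                    [144.99, -37.85],
                    [144.99, -37.78],
                    [144.90, -37.78],
                    [144.90, -37.85],
                ]
            ],
        },
    }
]

bbox = geojson_bbox(sample_features)
print(bbox)
```

A list in this order can be passed to `catalog.search(...)` as `bbox=` in place of `intersects=`.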


We will query the catalog for a dataset containing building footprints. This dataset is a vector-based representation of buildings, useful for urban planning and analysis.


In [2]:
# Connect to the STAC catalog
catalog = pystac_client.Client.open(STAC_URL)
collection = catalog.get_collection("ms-buildings")

# Optionally, explore the catalog
# print(list(catalog.get_collections()))


After retrieving the relevant items, we can load the vector data with `dask_geopandas` and visualize the building footprints.

From the documentation for this catalog: "The assets are a set of geoparquet files grouped by a processing date. Newer files (since April 25th, 2023) are stored in Delta Format. This is a layer on top of parquet files offering scalable metadata handling, which is useful for this dataset."

In [3]:
search = catalog.search(
    collections=["ms-buildings"],
    intersects=area_of_interest,
    query={
        "msbuildings:region": {"eq": "Germany"},
        "msbuildings:processing-date": {"eq": "2023-04-25"},
    },
)

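# item_collection() replaces get_all_items(), which is deprecated in recent pystac-client releases
ic = search.item_collection()
len(ic)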


2

In [4]:
prefixes = [item.assets["data"].href for item in ic]
print(prefixes[:5])

# The abfs:// hrefs need Azure credentials; without storage_options, read_parquet raised
# "ValueError: ... Must provide either a connection_string or account_name with credentials!!".
# Assumption: the planetary_computer package is installed, and signing the item adds a
# short-lived SAS token to the asset's "table:storage_options".
import planetary_computer
asset = planetary_computer.sign(ic[0]).assets["data"]
test_df = dask_geopandas.read_parquet(asset.href, storage_options=asset.extra_fields["table:storage_options"])
print(test_df.head())

['abfs://footprints/delta/2023-04-25/ml-buildings.parquet/RegionName=Germany/quadkey=120210302', 'abfs://footprints/delta/2023-04-25/ml-buildings.parquet/RegionName=Germany/quadkey=120210300']
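
The two hrefs printed above differ only in their `quadkey=` partition, which suggests the files are split by map tile. The sketch below applies the standard Web Mercator tile and quadkey calculation to the south and north edges of `area_of_interest`. The zoom level of 9 is an assumption inferred from the nine-digit keys, and `lonlat_to_tile` and `tile_to_quadkey` are hypothetical helper names. It illustrates why a box that straddles a tile edge comes back as two items.

```python
import math

ZOOM = 9  # assumption: inferred from the nine-digit quadkeys in the hrefs


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) Web Mercator tile containing a lon/lat point."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y


def tile_to_quadkey(x, y, zoom):
    """Interleave the bits of x and y into a base-4 quadkey string."""
    digits = []
    for i in range(zoom, 0, -1):
        mask = 1 << (i - 1)
        digit = 0
        if x & mask:
            digit += 1
        if y & mask:
            digit += 2
        digits.append(str(digit))
    return "".join(digits)


# South and north edges of the search box, at its central longitude
south = tile_to_quadkey(*lonlat_to_tile(14.12, 53.73, ZOOM), ZOOM)
north = tile_to_quadkey(*lonlat_to_tile(14.12, 53.75, ZOOM), ZOOM)
print(south, north)
```

Under these assumptions the two edges fall in different tiles, and the resulting keys correspond to the two partitions listed in the hrefs.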

In [None]:

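# Disabled: each read_parquet call below also needs storage_options with credentials for these abfs:// paths
# df = dask.dataframe.concat(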
#     [
#         dask_geopandas.read_parquet(prefix)
#         for prefix in prefixes
#     ]
# )
# df.head()

In [None]:
m = leafmap.Map()
m.add_gdf(gdf, layer_name="AOI")


m