## Search a STAC catalog

A number of satellites take snapshots of the Earth’s surface from space. The images recorded by these remote sensors are a valuable data source for any activity that involves monitoring changes on Earth. Satellite imagery is typically provided as geospatial raster data, with the measurements in each grid cell (“pixel”) associated with accurate geographic coordinates.

In this episode we will explore how to access open satellite data using Python. In particular, we will consider the Sentinel-2 data collection that is hosted on Amazon Web Services (AWS). This dataset consists of multi-band optical images acquired by the constellation of two satellites from the Sentinel-2 mission and it is continuously updated with new images.

# Search for satellite imagery

## The SpatioTemporal Asset Catalog (STAC) specification

Current sensor resolutions and satellite revisit periods are such that terabytes of data products are added daily to the corresponding collections. Such datasets cannot be made accessible to users via full-catalog download. Therefore, space agencies and other data providers often offer access to their data catalogs through interactive Graphical User Interfaces (GUIs), see for instance the [Copernicus Browser](https://browser.dataspace.copernicus.eu/) for the Sentinel missions. Accessing data via a GUI is a nice way to explore a catalog and get familiar with its content, but it becomes a tedious and error-prone task when used systematically to retrieve data.

A service that offers programmatic access to the data enables users to reach the desired data in a more reliable, scalable and reproducible manner. An important element of the software interface exposed to users, generally called the Application Programming Interface (API), is the use of standards: they make it much easier to reuse tools and scripts across datasets and applications.

The SpatioTemporal Asset Catalog (STAC) specification is an emerging standard for describing geospatial data. By organizing metadata in a form that adheres to the STAC specification, data providers make it possible for users to access data from different missions, instruments and collections using the same set of tools.
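In STAC, each scene is described by an *Item*, which is a GeoJSON Feature with a few required fields. The sketch below checks a hand-written item for the top-level fields that the STAC Item specification requires. It is simplified: for example, `bbox` is only required when `geometry` is not null. The helper `missing_item_fields` and all the example values are hypothetical and only illustrate the structure.

```python
# Top-level fields that every STAC Item must contain
ALWAYS_REQUIRED = ("type", "stac_version", "id", "geometry",
                   "properties", "links", "assets")


def missing_item_fields(item: dict) -> list:
    """Return the required STAC Item fields that are absent from `item`."""
    missing = [key for key in ALWAYS_REQUIRED if key not in item]
    # A bounding box is required whenever a geometry is given
    if item.get("geometry") is not None and "bbox" not in item:
        missing.append("bbox")
    # The acquisition time lives inside the properties
    if "datetime" not in item.get("properties", {}):
        missing.append("properties.datetime")
    return missing


# Hypothetical, minimal item (values invented for illustration)
example_item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "hypothetical-scene-id",
    "geometry": {"type": "Point", "coordinates": [27.95, 36.20]},
    "bbox": [27.95, 36.20, 27.95, 36.20],
    "properties": {"datetime": "2023-07-01T09:00:00Z", "eo:cloud_cover": 0.5},
    "links": [],
    "assets": {"red": {"href": "https://example.com/red.tif"}},
}

print(missing_item_fields(example_item))  # []
```

Because every provider fills the same fields, tools such as pystac_client can read items from any STAC catalog in the same way.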

## Search a STAC catalog

The STAC browser is a good starting point to discover available datasets, as it provides an [up-to-date list of existing STAC catalogs](https://stacindex.org/catalogs). From the list, let’s click on the [“Earth Search”](https://stacindex.org/catalogs/earth-search#/) catalog, i.e. the access point to search the archive of Sentinel-2 images hosted on AWS.

When opening a catalog with the STAC browser, you can access the API URL by clicking on the “Source” button at the top right of the page. Using this URL, you can access the catalog content and, if the catalog supports it, search its items. For the Earth Search STAC catalog the API URL is:

In [None]:
api_url = 'https://earth-search.aws.element84.com/v1'

You can query a STAC API endpoint from Python using the pystac_client library. To do so, we first import Client from pystac_client and then call its open method with the API URL:

In [None]:
from pystac_client import Client

client = Client.open(api_url)

For this episode we will focus on scenes belonging to the sentinel-2-l2a collection. This dataset includes Sentinel-2 data products pre-processed at level 2A (bottom-of-atmosphere reflectance), which suits our case study.

To see which collections are available at the provided api_url, the get_collections method can be called on the Client object. Here we simply store the ID of the collection we are interested in:

In [None]:
collection = 'sentinel-2-l2a'  # Sentinel-2, Level 2A
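The get_collections method returns collection objects, each with an `id`. A minimal sketch for listing those IDs could look like this. `list_collection_ids` is a hypothetical helper that works with any object exposing `get_collections()`, such as the pystac_client `Client` opened above. The real call needs network access, so it is only shown as a comment.

```python
def list_collection_ids(client) -> list:
    """Return the sorted IDs of all collections exposed by a STAC API client.

    `client` is assumed to be a pystac_client Client, or any object whose
    get_collections() method yields objects with an `id` attribute.
    """
    return sorted(collection.id for collection in client.get_collections())


# Usage with the Earth Search endpoint (requires network access):
# from pystac_client import Client
# client = Client.open("https://earth-search.aws.element84.com/v1")
# print(list_collection_ids(client))  # should include 'sentinel-2-l2a'
```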

To get data for a specific location, you can add longitude/latitude coordinates (World Geodetic System 1984, EPSG:4326) to your request. To do so, we use the shapely library to define a geometric point. Below we define a point on the island of Rhodes, the location of interest for our case study (longitude 27.95°, latitude 36.20°). Note that shapely expects the longitude (x) first and the latitude (y) second.

In [None]:
from shapely.geometry import Point
point = Point(27.95, 36.20)  # Rhodes
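When the search is sent to the API, a point like this is encoded as a GeoJSON geometry. For the shapely point above, `shapely.geometry.mapping(point)` gives the equivalent dictionary. The stdlib-only sketch below builds the same structure by hand. `point_to_geojson` is a hypothetical helper.

Its range check only catches swapped coordinates when the latitude value would fall outside ±90°. For Rhodes, both orders would pass, so the argument order still has to be right.

```python
def point_to_geojson(lon: float, lat: float) -> dict:
    """Build a GeoJSON Point from WGS 84 coordinates, longitude first."""
    # Reject values outside the valid WGS 84 ranges
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    # GeoJSON (RFC 7946) orders positions as [longitude, latitude]
    return {"type": "Point", "coordinates": [lon, lat]}


print(point_to_geojson(27.95, 36.20))
# {'type': 'Point', 'coordinates': [27.95, 36.2]}
```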

Note: at this stage we are only dealing with metadata, so no image is downloaded yet. But even metadata can be quite bulky if a large number of scenes match our search! For this reason, we restrict the search to scenes that intersect the point (parameter intersects) and belong to the chosen collection (parameter collections). More information about the other available parameters can be found in the pystac_client documentation for the Client’s search method.

We now set up our search of satellite images in the following way:

In [None]:
search = client.search(
    collections=[collection],
    intersects=point,
)
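pystac_client translates the arguments of client.search into a request to the API's search endpoint. As a rough sketch, here is a request body that follows the STAC API Item Search specification for the search above. This is not pystac_client's own code, and `build_search_body` is a hypothetical helper.

```python
import json


def build_search_body(collections, geometry, datetime_interval=None) -> dict:
    """Assemble a STAC API Item Search request body (illustrative sketch)."""
    body = {"collections": list(collections), "intersects": geometry}
    if datetime_interval is not None:
        # A single date/time or a "start/end" interval
        body["datetime"] = datetime_interval
    return body


rhodes = {"type": "Point", "coordinates": [27.95, 36.20]}
print(json.dumps(build_search_body(["sentinel-2-l2a"], rhodes), indent=2))
```

Keeping the filters in the request lets the server do the filtering, so only matching metadata travels over the network.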

Now we submit the query to find out how many scenes match the search criteria defined above (note that this number may differ from the one obtained when this episode was written, as more data are continuously added to the catalog):

In [None]:
search.matched()

### Exercise: searching satellite scenes with a time filter

Add a time filter to the search in order to select only the scenes recorded between 1 July and 31 August 2023. You can find the input argument and the required syntax in the documentation of `client.search` (which you can access from Python or [online](https://pystac-client.readthedocs.io/en/stable/api.html#pystac_client.Client.search)). How many scenes match our search now?

With the time filter added, the search looks as follows. We then retrieve the metadata of the search results by calling the method item_collection:

In [None]:
search = client.search(
    collections=[collection],
    intersects=point,
    datetime='2023-07-01/2023-08-31',
)
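The datetime value above is an interval written as `start/end`, using RFC 3339 dates. The STAC API specification also allows `..` for an open start or end. The sketch below formats such intervals from Python `date` objects. `make_interval` is a hypothetical helper, not part of pystac_client.

```python
from datetime import date


def make_interval(start=None, end=None) -> str:
    """Format a STAC datetime interval; None stands for an open end ('..')."""
    if start is None and end is None:
        raise ValueError("at least one end of the interval must be given")
    start_str = start.isoformat() if start is not None else ".."
    end_str = end.isoformat() if end is not None else ".."
    return f"{start_str}/{end_str}"


print(make_interval(date(2023, 7, 1), date(2023, 8, 31)))  # 2023-07-01/2023-08-31
print(make_interval(None, date(2023, 8, 31)))              # ../2023-08-31
```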

In [None]:
search.matched()

In [None]:
items = search.item_collection()

Now let us check the size using len:

In [None]:
len(items)

This is consistent with the number of scenes matching our search, as found with search.matched(). We can iterate over the returned items and print them to show their IDs:

In [None]:
for item in items:
    print(item)

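We can also inspect the metadata of a single item, such as its acquisition time, footprint geometry and properties:

In [None]: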
item = items[-1]
print(item.datetime)
print(item.geometry)
print(item.properties)

### Exercise: searching satellite scenes using metadata filters

Let's add a filter on the cloud cover to select only the scenes with less than 1% cloud coverage. How many scenes match our search now?

Hint: generic metadata filters can be implemented via the `query` input argument of `client.search`, which requires the following syntax (see [docs](https://pystac-client.readthedocs.io/en/stable/usage.html#query-extension)): `query=['<property><operator><value>']`.

In [None]:
search = client.search(
    collections=[collection],
    intersects=point,
    datetime='2023-07-01/2023-08-31',
    query=['eo:cloud_cover<1']
)
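Strings such as `'eo:cloud_cover<1'` end up in the JSON form defined by the STAC API Query Extension, e.g. `{"eo:cloud_cover": {"lt": 1}}`. The following is a hypothetical re-implementation of that conversion for illustration only. It is not pystac_client's code, and the exact set of operators supported here is an assumption.

```python
import re

# Comparison operators of the '<property><operator><value>' syntax mapped to
# the operator names of the STAC API Query Extension
OPERATORS = {"<=": "lte", ">=": "gte", "<>": "neq", "=": "eq", "<": "lt", ">": "gt"}
# Two-character operators come first so that '<=' is not read as '<'
PATTERN = re.compile(r"^([^<>=]+)(<=|>=|<>|=|<|>)(.+)$")


def _parse_value(text):
    """Convert the right-hand side to int or float when possible."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def parse_query(expressions):
    """Turn ['eo:cloud_cover<1', ...] into a Query Extension dictionary."""
    query = {}
    for expression in expressions:
        match = PATTERN.match(expression.strip())
        if match is None:
            raise ValueError(f"cannot parse query expression: {expression!r}")
        prop, op, value = (part.strip() for part in match.groups())
        # Several conditions on the same property are merged
        query.setdefault(prop, {})[OPERATORS[op]] = _parse_value(value)
    return query


print(parse_query(["eo:cloud_cover<1"]))  # {'eo:cloud_cover': {'lt': 1}}
```

pystac_client's search method also accepts the query in this dictionary form, which can be handy when a condition is built programmatically.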

In [None]:
print(search.matched())

In [None]:
items = search.item_collection()

Finally, we save the metadata of the search results to a JSON file for later use:

In [None]:
items.save_object('../data/output_data/rhodes_sentinel-2.json')

## Access the assets

So far we have only discussed metadata - but how can one get to the actual images of a satellite scene (the “assets” in the STAC nomenclature)? These can be reached via links that are made available through the item’s attribute assets. Let’s focus on the last item in the collection: this is the oldest in time, and it thus corresponds to an image taken before the wildfires.

In [None]:
assets = items[-1].assets  # asset dictionary of the last (oldest) item
print(assets.keys())

We can print a minimal description of the available assets:

In [None]:
for key, asset in assets.items():
    print(f"{key}: {asset.title}")
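Each asset also declares a media type, which tells us how the file is stored. The sketch below works on an item's `"assets"` mapping as plain JSON and keeps only the Cloud-Optimized GeoTIFFs (COGs). It matches the standard COG media type string exactly, so spelling variations would be missed. `cog_asset_keys` and the example mapping are hypothetical. With the pystac objects used in this episode, the same information is available as `asset.media_type`.

```python
COG_MEDIA_TYPE = "image/tiff; application=geotiff; profile=cloud-optimized"


def cog_asset_keys(assets: dict) -> list:
    """Return the keys of the assets whose media type marks them as COGs.

    `assets` is assumed to be the "assets" mapping of a STAC Item as plain
    JSON, i.e. {key: {"href": ..., "type": ..., ...}}.
    """
    return sorted(
        key for key, asset in assets.items() if asset.get("type") == COG_MEDIA_TYPE
    )


# Hypothetical, abbreviated assets mapping (values invented for illustration)
example_assets = {
    "red": {"href": "https://example.com/B04.tif", "type": COG_MEDIA_TYPE},
    "visual": {"href": "https://example.com/TCI.tif", "type": COG_MEDIA_TYPE},
    "thumbnail": {"href": "https://example.com/thumb.jpg", "type": "image/jpeg"},
}
print(cog_asset_keys(example_assets))  # ['red', 'visual']
```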

Among other files, the assets include multiple raster files (one per optical band, as acquired by the multi-spectral instrument), a thumbnail, a true-color image (“visual”), instrument metadata and scene-classification information (“SCL”). Let’s get the URL of the thumbnail, which gives us a glimpse of the Sentinel-2 scene:

In [None]:
assets["thumbnail"].href

For comparison, we can check out the thumbnail of the most recent scene of the sequence considered (i.e. the first item in the item collection), which has been taken after the wildfires:

In [None]:
items[0].assets['thumbnail'].href  # first (most recent) item

From the thumbnails alone we can already observe some dark spots on the island of Rhodes at the bottom right of the image!

To open the high-resolution satellite images and investigate the scenes in more detail, we will use the rioxarray library. Note that this library can work with both local and remote raster data. For now, we only take a quick look at its functionality; we will learn more about it in the next episode.

Let us focus on the red band: we access the red entry of the assets dictionary and get its Hypertext Reference (href, i.e. its URL) using the .href attribute. We then use rioxarray to open the raster file:

In [None]:
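import rioxarray

# URL of the red-band Cloud-Optimized GeoTIFF (COG)
red_href = assets['red'].href
# Open the remote raster; pixel values are only read when needed
red = rioxarray.open_rasterio(red_href)
# Writing the full raster to disk downloads all of its pixels
red.rio.to_raster('../data/output_data/red.tif', driver='COG')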

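That might take a while, given that there are over 10000 x 10000 = a hundred million pixels in the 10-meter red band. But we can take a smaller subset before downloading it: because the raster is a COG, we can download just what we need!

To do that, we use rioxarray’s clip_box, which lets you set a bounding box defining the area you want. Note that the bounding-box coordinates are expressed in the raster’s own coordinate reference system (a UTM projection in metres, which can be checked with red.rio.crs), not in longitude/latitude.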
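The estimate of a hundred million pixels can be made more precise with a quick back-of-the-envelope calculation. It assumes that a Sentinel-2 tile covers 109.8 km × 109.8 km, that the red band has a 10 m pixel size, and that values are stored as 16-bit integers (2 bytes). COG compression makes the actual download smaller, so the last figure is only an upper-bound estimate.

```python
# Assumptions: 109.8 km tile side, 10 m pixels, 2 bytes per pixel (uint16)
TILE_SIZE_M = 109_800
PIXEL_SIZE_M = 10
BYTES_PER_PIXEL = 2

pixels_per_side = TILE_SIZE_M // PIXEL_SIZE_M  # pixels along one side
total_pixels = pixels_per_side ** 2
uncompressed_mb = total_pixels * BYTES_PER_PIXEL / 1e6

print(f"{pixels_per_side} x {pixels_per_side} = {total_pixels:,} pixels")
print(f"~{uncompressed_mb:.0f} MB uncompressed")
# 10980 x 10980 = 120,560,400 pixels
# ~241 MB uncompressed
```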

In [None]:
red_subset = red.rio.clip_box(
    minx=560900,
    miny=3995000,
    maxx=570900,
    maxy=4015000
)
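Roughly speaking, clip_box maps the bounding box onto a window of rows and columns of the raster grid. For a COG, only the internal tiles overlapping that window need to be fetched.

The sketch below performs that mapping for a north-up grid. `bbox_to_window` is a hypothetical helper. The grid origin used in the example is invented, because the actual origin depends on the tile; it can be read from `red.rio.transform()`. The full-tile size of 10980 × 10980 pixels is likewise an assumption.

```python
def bbox_to_window(minx, miny, maxx, maxy, x_origin, y_origin, res):
    """Convert a bounding box in projected metres to a pixel window.

    Assumes a north-up grid whose upper-left corner is (x_origin, y_origin)
    and whose square pixels are `res` metres wide.
    """
    col_off = round((minx - x_origin) / res)
    row_off = round((y_origin - maxy) / res)  # rows count downwards from the top
    width = round((maxx - minx) / res)
    height = round((maxy - miny) / res)
    return col_off, row_off, width, height


# The box used with clip_box above; the origin values are hypothetical
window = bbox_to_window(560900, 3995000, 570900, 4015000,
                        x_origin=500000, y_origin=4100040, res=10)
print(window)  # (6090, 8504, 1000, 2000)

full_tile_pixels = 10980 * 10980  # assumed size of the full 10 m tile
subset_pixels = window[2] * window[3]
print(f"subset: {subset_pixels:,} pixels "
      f"({subset_pixels / full_tile_pixels:.1%} of the tile)")
# subset: 2,000,000 pixels (1.7% of the tile)
```

Whatever the actual origin, a 10 km × 20 km box at 10 m resolution is 1000 × 2000 pixels, a small fraction of the full scene.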

Next, we save the subset using to_raster again:

In [None]:
red_subset.rio.to_raster("../data/output_data/red_subset.tif")

### Exercise: searching the Copernicus DEM

Let's put into practice what we have learned in this episode, and search for the tile of the global [Copernicus Digital Elevation Model (DEM)](https://spacedata.copernicus.eu/collections/copernicus-digital-elevation-model) that includes the island of Rhodes. Use the same STAC endpoint and coordinates as in the previous search, but modify the searched collection (the Copernicus DEM collection has ID: "cop-dem-glo-30"). Extract the URL pointing to the data from the item matching your search. How does it differ from the URL you got for the satellite images?

In [None]:
search = client.search(
    collections=['cop-dem-glo-30'],
    intersects=point,
)

In [None]:
items = search.item_collection()
items[0].assets['data'].href
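To answer the question systematically, it can help to split each URL into its components with Python's standard `urllib.parse`. `describe_href` is a hypothetical helper, and the URLs in the example are invented. Applied to `red_href` and to the DEM href found above, it shows at a glance whether the two differ in protocol, host or file format.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def describe_href(href: str) -> dict:
    """Split an asset URL into scheme, host (or bucket) and file extension."""
    parsed = urlparse(href)
    return {
        "scheme": parsed.scheme,  # e.g. 'https' or 's3'
        "host": parsed.netloc,    # web host, or bucket name for s3:// URLs
        "suffix": PurePosixPath(parsed.path).suffix,
    }


# Hypothetical URLs, for illustration only
print(describe_href("https://example.com/tiles/B04.tif"))
print(describe_href("s3://example-bucket/dem/tile_DEM.tif"))
```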