# terrabyte STAC API - Discovery and Access of data using STAC


## Table of Contents

* [1. Introduction](#intro)
* [2. What is STAC?](#stac)
* [3. How to use the catalogue](#catalogue)
    * [3.1 General Settings](#settings)
    * [3.2 Get general information about the Catalog](#information)
    * [3.3 Discover Collections](#collections)
    * [3.4 Discover Items](#items)
* [4. Next Steps](#tutorials)
* [5. Helpful Links](#links)




## <a id ='intro'></a> 1. Introduction

The terrabyte STAC catalog provides interoperable access to the metadata of Earth Observation (EO) collections and geospatial products. Most of the data is available via the terrabyte STAC catalog or can be accessed directly on the terrabyte Data Science Storage (DSS) geodata file system. Using STAC makes it more efficient to discover data for further processing.


## <a id ='stac'></a> 2. What is STAC?

STAC (SpatioTemporal Asset Catalog) is a standardized data model for cataloging and exchanging geospatial and temporal data, with a particular focus on satellite-based Earth Observation (EO) data.
At its core, the STAC specification provides a common structure for describing and cataloging spatiotemporal assets.
A spatiotemporal asset is any file that represents information about the Earth captured at a certain place and time.

### STAC Catalogue
A STAC Catalog is a simple, flexible JSON file of links that provides a structure to organize and browse STAC Items. [STAC Catalog Specification](https://github.com/radiantearth/stac-spec/blob/master/catalog-spec/catalog-spec.md)
The terrabyte STAC catalog URL is https://stac.terrabyte.lrz.de/public/api
(or https://stac-test.terrabyte.lrz.de/public/api for the test catalog).

### STAC Collection
A STAC Collection provides additional information about a spatio-temporal collection of data. [STAC Collection Specification](https://github.com/radiantearth/stac-spec/blob/master/collection-spec/collection-spec.md)

### STAC Items
Fundamental to any STAC, a STAC Item represents an atomic collection of inseparable data and metadata. [STAC Item Specification](https://github.com/radiantearth/stac-spec/blob/master/item-spec/item-spec.md)
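
To make the structure of an Item more concrete, the following sketch builds a minimal Item as a plain Python dictionary and checks that the top-level fields expected for an Item with a geometry are present. The `id`, the geometry, and the asset `href` are made-up placeholders and do not refer to a real terrabyte product.

```python
import json

# Illustrative STAC Item; id, geometry, and asset href are placeholders.
item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-item",
    "geometry": {
        "type": "Polygon",
        "coordinates": [
            [[11.0, 48.0], [12.0, 48.0], [12.0, 49.0], [11.0, 49.0], [11.0, 48.0]]
        ],
    },
    "bbox": [11.0, 48.0, 12.0, 49.0],
    "properties": {"datetime": "2024-01-31T10:22:51Z"},
    "links": [],
    "assets": {
        "data": {
            "href": "https://example.com/example-item.tif",
            "type": "image/tiff; application=geotiff",
        }
    },
}

# Top-level fields of a STAC Item that has a geometry
expected = ["type", "stac_version", "id", "geometry", "bbox", "properties", "links", "assets"]
missing = [key for key in expected if key not in item]
print(f"Missing fields: {missing}")
print(json.dumps(item["properties"], indent=2))
```

Because an Item is a GeoJSON feature, a dictionary like this can be read by any GeoJSON-aware library.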

### STAC Extensions
While the core specification says nothing about particular types of data, the extensions folder is where one can find domain-specific fields that can be easily added to any STAC Item. [STAC Extensions](https://github.com/radiantearth/stac-spec/tree/master/extensions)

To get more information about each component, follow the links below:
- [About STAC - The STAC Specification](https://stacspec.org/en/about/stac-spec)
- [STAC on Github](https://github.com/radiantearth/stac-spec)
- [STAC Extensions on GitHub](https://stac-extensions.github.io/)


![STAC Logo](https://stacindex.org/img/logo.32c921b9.png)

## <a id ='catalogue'></a> 3. How to use the catalogue
This chapter shows how to work with the catalogue.
It covers importing the Python modules required for STAC and accessing the collections and items.
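
Before running the cells below, it can help to check that the required modules are importable. The following is a minimal sketch; the helper `missing_modules` is hypothetical, and the assumption is that the modules are provided by the PyPI packages `pystac-client`, `pystac`, and `odc-stac`.

```python
from importlib.util import find_spec

# Import names used in this notebook
REQUIRED_MODULES = ["pystac_client", "pystac", "odc.stac", "IPython"]


def missing_modules(names):
    """Return the subset of `names` that cannot be found in the current environment."""
    missing = []
    for name in names:
        try:
            found = find_spec(name) is not None
        except ModuleNotFoundError:
            # Raised when the parent package of a dotted name (e.g. "odc") is absent
            found = False
        if not found:
            missing.append(name)
    return missing


print(f"Missing modules: {missing_modules(REQUIRED_MODULES)}")
```

An empty list means that all imports in the next cell should succeed.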

### <a id ='settings'></a> 3.1 General Settings

In [1]:
# import STAC libraries
from pystac_client import Client
from pystac import Collection
from odc.stac import load
from IPython.display import HTML, display
from timeit import default_timer as timer

# define Catalogue Connections
terrabyte_Catalog = Client.open(url="https://stac.terrabyte.lrz.de/public/api/")

### <a id ='information'></a> 3.2 Get general information about the Catalog

In [2]:
print(f"ID: {terrabyte_Catalog.id}")
print(f"Title: {terrabyte_Catalog.title}")
print(f"Description: {terrabyte_Catalog.description}")

ID: terrabyte-public-stac-api
Title: terrabyte STAC API
Description: Curated data catalog of the terrabyte platform available at https://stac.terrabyte.lrz.de/public/api


### <a id ='collections'></a> 3.3 Discover Collections

##### "A STAC Collection provides additional information about a spatio-temporal collection of data. It extends Catalog directly, layering on additional fields to enable description of things like the spatial and temporal extent of the data, the license, keywords, providers, etc. It in turn can easily be extended for additional collection level metadata. It is used standalone by parts of the STAC community, as a lightweight way to describe data holdings." [(STAC Spec)](https://stacspec.org/en/about/stac-spec/)

First we query the catalogue for the number of available collections. The same information is available visually at the [terrabyte STAC browser](https://stac.terrabyte.lrz.de/browser/).

In [3]:
collections = list(terrabyte_Catalog.get_collections())
print(f"Number of collections: {len(collections)}")

Number of collections: 22


Next, we print an overview of all collections available in the catalog.
For each collection, the ID, the title, and the temporal extent are listed.
The ID of a collection is used to obtain further information about that collection.
In the future, an overview of the availability of all datasets will also be provided at https://docs.terrabyte.lrz.de/datasets/terrabyte-datasets/inventory/
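
The lookup of the temporal extent can also be wrapped in a small helper. This is a sketch only: `temporal_extent` is a hypothetical function, and the sample dictionary merely mimics the shape of `collection.to_dict()` with values taken from the overview in this section. The STAC specification allows `null` for an open-ended interval, which the helper prints as `open`.

```python
def temporal_extent(collection_dict):
    """Format the first temporal interval of a STAC Collection dictionary."""
    start, end = collection_dict["extent"]["temporal"]["interval"][0]
    # A null (None) bound marks an open-ended interval
    return f"{start or 'open'} to {end or 'open'}"


# Sample dictionary shaped like the result of `collection.to_dict()`
sample = {
    "id": "cop-dem-glo-30",
    "title": "Copernicus DEM GLO-30",
    "extent": {
        "temporal": {"interval": [["2021-04-22T00:00:00Z", "2021-04-22T00:00:00Z"]]}
    },
}
print(sample["id"], "|", temporal_extent(sample))
```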

In [4]:
print('[["Collection IDs","Collection Title","Temporal Extent"],')

for collection in collections:
    # first temporal interval of the collection: [start, end]
    t_start, t_end = collection.to_dict()["extent"]["temporal"]["interval"][0]
    print(f'["{collection.id}","{collection.title}","{t_start} to {t_end}"],')
print("]")

[["Collection IDs","Collection Title","Temporal Extent"],
["modis-09gq-061","MOD09GQ.061 & MYD09GQ.061: MODIS Surface Reflectance Daily (250m)","2000-02-24T00:00:00Z to 2024-04-14T00:00:00Z"],
["viirs-13a1-001","VNP13A1.001: VIIRS Vegetation Indices 16-Day (500m)","2012-12-18T00:00:00Z to 2024-03-29T00:00:00Z"],
["modis-13q1-061","MOD13Q1.061 & MYD13Q1.061: MODIS Vegetation Indices 16-Day (250m)","2000-02-18T00:00:00Z to 2024-09-21T00:00:00Z"],
["sentinel-2-c1-l1c","Sentinel-2 Collection 1 Level-1C","2017-03-21T10:30:11.026000Z to 2024-07-09T14:27:49.024000Z"],
["sentinel-1-grd","Sentinel-1 GRD Level-1","2014-10-04T00:58:44.620029Z to 2024-10-21T22:10:48.421448Z"],
["sentinel-1-slc","Sentinel-1 SLC Level-1","2016-05-01T02:01:56.036162Z to 2024-06-12T04:09:19.030071Z"],
["modis-13a2-061","MOD13A2.061 & MYD13A2.061: MODIS Vegetation Indices 16-Day (1km)","2000-02-18T00:00:00Z to 2024-09-21T00:00:00Z"],
["sentinel-3-olci-l1-efr","Sentinel-3 OLCI Level-1 EFR","2016-04-25T11:33:13.837140Z t

In [7]:
data = [["Collection IDs","Collection Title","Temporal Extent"],
["sentinel-1-grd","Sentinel-1 GRD Level-1","2014-10-04T00:58:44.620029Z to 2024-07-15T20:53:12.040299Z"],
["modis-09gq-061","MOD09GQ.061 & MYD09GQ.061: MODIS Surface Reflectance Daily (250m)","2000-02-24T00:00:00Z to 2024-04-14T00:00:00Z"],
["sentinel-1-slc","Sentinel-1 SLC Level-1","2016-05-01T02:01:56.036162Z to 2024-06-12T04:09:19.030071Z"],
["modis-13a2-061","MOD13A2.061 & MYD13A2.061: MODIS Vegetation Indices 16-Day (1km)","2000-02-18T00:00:00Z to 2024-03-29T00:00:00Z"],
["modis-13q1-061","MOD13Q1.061 & MYD13Q1.061: MODIS Vegetation Indices 16-Day (250m)","2000-02-18T00:00:00Z to 2024-04-14T00:00:00Z"],
["sentinel-2-c1-l1c","Sentinel-2 Collection 1 Level-1C","2017-03-21T10:30:11.026000Z to 2024-07-09T14:27:49.024000Z"],
["cop-dem-glo-30","Copernicus DEM GLO-30","2021-04-22T00:00:00Z to 2021-04-22T00:00:00Z"],
["viirs-13a1-001","VNP13A1.001: VIIRS Vegetation Indices 16-Day (500m)","2012-12-18T00:00:00Z to 2024-03-29T00:00:00Z"],
["sentinel-3-olci-l1-efr","Sentinel-3 OLCI Level-1 EFR","2016-04-25T11:33:13.837140Z to 2024-07-15T20:44:44.052277Z"],
["sentinel-2-l1c","Sentinel-2 Level-1C","2015-07-04T10:10:06.027000Z to 2024-04-28T10:25:59.024000Z"],
["cop-dem-glo-90","Copernicus DEM GLO-90","2021-04-22T00:00:00Z to 2021-04-22T00:00:00Z"],
["viirs-09ga-001","VNP09GA.001: VIIRS/NPP Surface Reflectance Daily L2GD 500m and 1km","2013-01-01T00:00:00Z to 2024-04-13T00:00:00Z"],
["landsat-tm-c2-l2","Landsat 4-5 TM Collection 2 Level-2","1982-08-22T14:18:20.392044Z to 2012-05-05T17:54:06.781038Z"],
["landsat-etm-c2-l2","Landsat 7 ETM+ Collection 2 Level-2","1999-06-30T06:52:25.853967Z to 2024-01-17T00:27:18.147088Z"],
["sentinel-1-nrb","Sentinel-1 Normalized Radar Backscatter (NRB)","2015-12-06T13:06:16Z to 2023-09-02T05:11:51Z"],
["landsat-ot-c2-l2","Landsat 8-9 OLI/TIRS Collection 2 Level-2","2013-03-19T06:54:31.129217Z to 2024-07-14T23:57:56.593788Z"],
["sentinel-2-c1-l2a","Sentinel-2 Collection 1 Level-2A","2015-07-08T08:10:16.027000Z to 2024-07-15T21:30:39.024000Z"],
["viirs-15a2h-001","VNP15A2H.001: VIIRS Leaf Area Index/FPAR 8-Day","2012-12-26T00:00:00Z to 2024-03-29T00:00:00Z"],
["modis-10a1-061","MOD10A1.061 & MYD10A1.061: MODIS Snow Cover Daily","2000-02-24T00:00:00Z to 2024-04-14T00:00:00Z"],
["sentinel-2-l2a","Sentinel-2 Level-2A","2022-01-25T02:29:59.024000Z to 2022-12-06T06:22:21.024000Z"],
["modis-13a3-061","MOD13A3.061 & MYD13A3.061: MODIS Vegetation Indices Monthly (1km)","2000-02-01T00:00:00Z to 2024-03-01T00:00:00Z"],
["modis-09ga-061","MOD09GA.061 & MYD09GA.061: MODIS Surface Reflectance Daily (1km and 500m)","2000-02-24T00:00:00Z to 2024-04-14T00:00:00Z"],
]

display(HTML(
   '<table><tr>{}</tr></table>'.format(
       '</tr><tr>'.join(
           '<td>{}</td>'.format('</td><td>'.join(str(_) for _ in row)) for row in data)
       )
))

| Collection IDs | Collection Title | Temporal Extent |
|---|---|---|
| sentinel-1-grd | Sentinel-1 GRD Level-1 | 2014-10-04T00:58:44.620029Z to 2024-07-15T20:53:12.040299Z |
| modis-09gq-061 | MOD09GQ.061 & MYD09GQ.061: MODIS Surface Reflectance Daily (250m) | 2000-02-24T00:00:00Z to 2024-04-14T00:00:00Z |
| sentinel-1-slc | Sentinel-1 SLC Level-1 | 2016-05-01T02:01:56.036162Z to 2024-06-12T04:09:19.030071Z |
| modis-13a2-061 | MOD13A2.061 & MYD13A2.061: MODIS Vegetation Indices 16-Day (1km) | 2000-02-18T00:00:00Z to 2024-03-29T00:00:00Z |
| modis-13q1-061 | MOD13Q1.061 & MYD13Q1.061: MODIS Vegetation Indices 16-Day (250m) | 2000-02-18T00:00:00Z to 2024-04-14T00:00:00Z |
| sentinel-2-c1-l1c | Sentinel-2 Collection 1 Level-1C | 2017-03-21T10:30:11.026000Z to 2024-07-09T14:27:49.024000Z |
| cop-dem-glo-30 | Copernicus DEM GLO-30 | 2021-04-22T00:00:00Z to 2021-04-22T00:00:00Z |
| viirs-13a1-001 | VNP13A1.001: VIIRS Vegetation Indices 16-Day (500m) | 2012-12-18T00:00:00Z to 2024-03-29T00:00:00Z |
| sentinel-3-olci-l1-efr | Sentinel-3 OLCI Level-1 EFR | 2016-04-25T11:33:13.837140Z to 2024-07-15T20:44:44.052277Z |
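
Several collection titles contain an ampersand, which should be escaped when the text is embedded in HTML. The following sketch shows one way to build the table with escaping and a header row. `rows_to_html_table` is a hypothetical helper, not part of any STAC library, and the sample row is copied from the list above.

```python
from html import escape


def rows_to_html_table(rows):
    """Render a list of rows as an HTML table; the first row becomes the header."""
    header, *body = rows
    parts = ["<table>"]
    parts.append(
        "<tr>" + "".join(f"<th>{escape(str(cell))}</th>" for cell in header) + "</tr>"
    )
    for row in body:
        parts.append(
            "<tr>" + "".join(f"<td>{escape(str(cell))}</td>" for cell in row) + "</tr>"
        )
    parts.append("</table>")
    return "\n".join(parts)


rows = [
    ["Collection IDs", "Collection Title", "Temporal Extent"],
    [
        "modis-10a1-061",
        "MOD10A1.061 & MYD10A1.061: MODIS Snow Cover Daily",
        "2000-02-24T00:00:00Z to 2024-04-14T00:00:00Z",
    ],
]
table_html = rows_to_html_table(rows)
print(table_html)
```

In a notebook, the resulting string can be passed to `display(HTML(table_html))`.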


### <a id ='items'></a> 3.4 Discover Items 

##### "Fundamental to any STAC, a STAC Item represents an atomic collection of inseparable data and metadata. A STAC Item is a GeoJSON feature and can be easily read by any modern GIS or geospatial library. The STAC Item JSON specification includes additional fields for the time the asset represents, a thumbnail for quick browsing, asset links, links to the described data and relationship links, allowing users to traverse other related STAC Items." [(STAC Spec)](https://stacspec.org/en/about/stac-spec/)


#### Filter your query

To avoid long loading times, restrict the search results, for example to a specific region, a specific time range, or specific attributes such as cloud cover.
The `limit` parameter defines the number of items to return per page of results and can affect the duration of your query.

Extensions help to describe data completely. To find out which extensions are available for your collection, check the STAC browser or print the item information (as done below).
In the following example, we refine our search by filtering for a cloud cover of at most 20%.
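
The search parameters can also be assembled and sanity-checked in one place before they are sent to the API. The sketch below is only an illustration: `build_search_kwargs` is a hypothetical helper, and it assumes that the resulting dictionary is passed to `Client.search` as keyword arguments (`collections`, `bbox`, `datetime`, `query`, `limit`).

```python
def build_search_kwargs(collections, bbox, start, end, max_cloud_cover, limit=100):
    """Assemble keyword arguments for `Client.search` (hypothetical helper)."""
    west, south, east, north = bbox
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitudes must be within [-180, 180]")
    if not (-90 <= south <= north <= 90):
        raise ValueError("latitudes must satisfy -90 <= south <= north <= 90")
    return {
        "collections": list(collections),
        "bbox": [west, south, east, north],
        # closed time range written as "start/end"
        "datetime": f"{start}/{end}",
        "query": {"eo:cloud_cover": {"gte": 0, "lte": max_cloud_cover}},
        "limit": limit,
    }


kwargs = build_search_kwargs(
    collections=["sentinel-2-c1-l2a", "landsat-ot-c2-l2"],
    bbox=[5.98865807458, 47.3024876979, 15.0169958839, 54.983104153],
    start="2023-10-01T00:00:00Z",
    end="2024-01-31T23:59:59Z",
    max_cloud_cover=20,
)
print(kwargs)
```

With an open catalog connection, the search would then be started with `terrabyte_Catalog.search(**kwargs)`.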

In [15]:
# Note: this example connects to the test catalog
terrabyte_Catalog = Client.open(url="https://stac-test.terrabyte.lrz.de/public/api/")

query = {
    'eo:cloud_cover': {
        "gte": 0,
        "lte": 20
    }
}

bbox_germany=[5.98865807458,47.3024876979,15.0169958839,54.983104153]

start = '2023-10-01T00:00:00Z'
end = '2024-01-31T23:59:59Z'

# A recommendation to the service as to the number of items to return per page of results. Defaults to 100.
limit_list = [10, 100, 500, 1000, 2500, 5000, 7500, 10000]

In [16]:
for limit in limit_list:
    start_timer = timer()
    search_germany = terrabyte_Catalog.search(collections=["sentinel-2-c1-l2a", "landsat-ot-c2-l2"], 
                                              datetime=[start, end], 
                                              query=query,
                                              limit=limit,
                                              bbox=bbox_germany)
    items_germany = search_germany.item_collection() 
    end_timer = timer()
    print(f"Search returned {len(items_germany)} items. The limit variable was set to {limit} and it took {round(end_timer - start_timer,4)} seconds.")

Search returned 474 items. The limit variable was set to 10 and it took 30.1783 seconds.
Search returned 474 items. The limit variable was set to 100 and it took 2.9303 seconds.
Search returned 474 items. The limit variable was set to 500 and it took 3.3208 seconds.
Search returned 474 items. The limit variable was set to 1000 and it took 3.0483 seconds.
Search returned 474 items. The limit variable was set to 2500 and it took 3.0555 seconds.
Search returned 474 items. The limit variable was set to 5000 and it took 2.5706 seconds.
Search returned 474 items. The limit variable was set to 7500 and it took 3.0398 seconds.
Search returned 474 items. The limit variable was set to 10000 and it took 3.0252 seconds.
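
One plausible reading of these timings, stated here as an assumption rather than a measured fact, is that a small `limit` forces the client to request many pages. The following sketch computes the minimum number of pages needed to transfer the 474 items for each limit. The server may cap the page size, so very large limits are not necessarily honoured as requested.

```python
import math

total_items = 474  # number of items returned by the search above
limits = [10, 100, 500, 1000, 2500, 5000, 7500, 10000]

# Minimum number of result pages needed to transfer all items
pages = {limit: math.ceil(total_items / limit) for limit in limits}
for limit, count in pages.items():
    print(f"limit={limit:>5}: at least {count} page(s)")
```

With `limit=10`, at least 48 pages are needed, compared with 5 pages for the default of 100 and a single page for limits of 500 and above.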


Find your bounding box at [bboxfinder](http://bboxfinder.com/) or look up the bounding boxes of countries [in this list](https://gist.github.com/graydon/11198540).

#### Get information about Items

In [17]:
# one item
items_germany[0]

In [18]:
for item in items_germany:
    print(f"- {item.properties['platform']} - {item.properties['datetime']} - {item.properties['grid:code']} - {item.properties['eo:cloud_cover']}")

- sentinel-2a - 2024-01-31T10:22:51.024000Z - MGRS-33UVA - 14.849807
- sentinel-2a - 2024-01-31T10:22:51.024000Z - MGRS-33UUB - 20.52241
- sentinel-2a - 2024-01-31T10:22:51.024000Z - MGRS-33UUA - 16.107471
- sentinel-2a - 2024-01-31T10:22:51.024000Z - MGRS-32UPF - 12.082554
- sentinel-2b - 2024-01-30T10:01:49.024000Z - MGRS-33UWR - 15.669832
- sentinel-2b - 2024-01-30T10:01:49.024000Z - MGRS-33UUQ - 7.288156
- sentinel-2b - 2024-01-30T10:01:49.024000Z - MGRS-33UUP - 15.002677
- sentinel-2b - 2024-01-30T10:01:49.024000Z - MGRS-33TWN - 3.070894
- sentinel-2b - 2024-01-30T10:01:49.024000Z - MGRS-33TVN - 0.004854
- sentinel-2b - 2024-01-30T10:01:49.024000Z - MGRS-33TUN - 0.041154
- sentinel-2b - 2024-01-30T10:01:49.024000Z - MGRS-32TQT - 0.263332
- sentinel-2b - 2024-01-29T10:32:09.024000Z - MGRS-32UNU - 10.193405
- sentinel-2a - 2024-01-28T10:13:01.024000Z - MGRS-33UWS - 3.712707
- sentinel-2a - 2024-01-28T10:13:01.024000Z - MGRS-33UWR - 0.138388
- sentinel-2a - 2024-01-28T10:13:01.024000
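
Item properties are plain dictionaries, so they can be summarised with standard Python tools. The sketch below uses a few property values copied from the first lines of the output above instead of a live search result. With real results, the list could be built from `item.properties` for each item in `items_germany`.

```python
from collections import Counter

# Property dictionaries copied from the first lines of the output above
properties = [
    {"platform": "sentinel-2a", "grid:code": "MGRS-33UVA", "eo:cloud_cover": 14.849807},
    {"platform": "sentinel-2a", "grid:code": "MGRS-33UUB", "eo:cloud_cover": 20.52241},
    {"platform": "sentinel-2a", "grid:code": "MGRS-33UUA", "eo:cloud_cover": 16.107471},
    {"platform": "sentinel-2a", "grid:code": "MGRS-32UPF", "eo:cloud_cover": 12.082554},
    {"platform": "sentinel-2b", "grid:code": "MGRS-33UWR", "eo:cloud_cover": 15.669832},
    {"platform": "sentinel-2b", "grid:code": "MGRS-33UUQ", "eo:cloud_cover": 7.288156},
]

# Number of items per platform
per_platform = Counter(p["platform"] for p in properties)

# Item with the lowest cloud cover
clearest = min(properties, key=lambda p: p["eo:cloud_cover"])

print(dict(per_platform))
print(f"Lowest cloud cover: {clearest['grid:code']} ({clearest['eo:cloud_cover']} %)")
```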

## <a id ='tutorials'></a> 4. Next Steps

For further insights into how to use STAC on terrabyte, the tutorials show, for example, how to
- visualize the footprints of a STAC search
- generate a data cube from a STAC search for further processing

If you have any use cases in mind, let us know!

You can find the tutorials here:

- [terrabyte examples](https://github.com/DLR-terrabyte/eo-examples/tree/main)


## <a id ='links'></a> 5. Helpful Links

To delve deeper into the subject, we recommend the following links.

- [STAC Specification](https://stacspec.org/en/)
- [STAC Tools - Overview](https://stacspec.org/en/about/tools-resources/#Data-Processing)
- [STAC Tutorials](https://stacspec.org/en/tutorials/)

In [19]:
# extract footprints from STAC items into a geopandas data frame
import geopandas as gpd
# reuse the items fetched above instead of querying the API again
df_s2_ls = gpd.GeoDataFrame.from_features(items_germany)


In [20]:
len(df_s2_ls['grid:code'].unique())

117

In [21]:
# prepare bbox
from shapely.geometry import box
bbox_map = box(*bbox_germany)
bbox_map = gpd.GeoDataFrame({"geometry": [bbox_map]})
bbox_map = bbox_map.to_json()

In [22]:
import folium
import folium.plugins as folium_plugins

# named `footprint_map` to avoid shadowing the built-in `map`
footprint_map = folium.Map()
layer_control = folium.LayerControl(position='topright', collapsed=True)
fullscreen = folium_plugins.Fullscreen()
style = {'fillColor': '#00000000', "color": "#0000ff", "weight": 1}

# footprints of all items returned by the search
footprints = folium.GeoJson(
    df_s2_ls.to_json(),
    name='S2 and Landsat 8/9 footprints',
    style_function=lambda x: style,
    control=True
)

# bounding box used for the search
bbox_layer = folium.GeoJson(
    bbox_map,
    name='Bbox Search',
    control=True
)

bbox_layer.add_to(footprint_map)
footprints.add_to(footprint_map)
layer_control.add_to(footprint_map)
fullscreen.add_to(footprint_map)
footprint_map.fit_bounds(footprint_map.get_bounds())
footprint_map
