# Getting Started with STAC and eoAPI

## A Foundational Guide for the IFRC's Geospatial Data Analysis

### Overview

Welcome to this guide, which introduces the SpatioTemporal Asset Catalog (STAC) specification, Cloud Optimized GeoTIFFs (COGs), and eoAPI. Tailored for the International Federation of Red Cross and Red Crescent Societies (IFRC), this Jupyter Notebook provides the foundational knowledge and practical skills needed to use these tools effectively for geospatial data management and analysis.

### Context

The notebook covers the essentials of using eoAPI and offers insights into STAC and COGs. We will explore how eoAPI facilitates querying metadata and visualizing the underlying referenced data. All notebooks in the repository run within a specially configured cluster that integrates a deployed instance of eoAPI with JupyterHub to provide a seamless, hands-on experience.

### STAC, COG, and eoAPI

- **STAC**: We begin with an introduction to STAC, explaining its structure and critical role in damage assessment and risk management.
- **COG**: Next, we explore COGs and why they are pivotal in modern geospatial data handling.
- **eoAPI**: Finally, we discuss eoAPI, detailing its functionalities and importance in this ecosystem.

In this notebook, we leverage Maxar's high-resolution satellite imagery, acquired explicitly for the M7.8 and M7.5 Kahramanmaras earthquakes in Turkey on February 6, 2023. This data, pivotal for emergency response and risk assessment, has been ingested into our instance of eoAPI deployed on the cluster. The imagery, available as Cloud Optimized GeoTIFFs (COGs) with accompanying STAC metadata, enables streamlined analysis and visualization of the earthquake's impact. Maxar's data, being analysis-ready and cloud-optimized, provides a rich resource for demonstrating the capabilities of eoAPI in handling large-scale geospatial datasets in a crisis context.

This notebook is the foundational guide for the IFRC's use of eoAPI and associated resources. It is structured to guide you through these concepts, culminating in a hands-on demonstration of using eoAPI to query and visualize earth observation data.

## STAC: What is it? Why is it important? How can I learn more?

### What is STAC (SpatioTemporal Asset Catalog)?

The SpatioTemporal Asset Catalog (STAC) is a specification designed to standardize how geospatial data is organized and accessed. It's a community-driven standard developed collaboratively by experts and stakeholders in the geospatial data field. STAC provides a common language and structure for describing geospatial information, making it easier to index, discover, and manage spatial data across various platforms and systems. One of the critical features of STAC is its format: simple JSON objects, which are both human-readable and machine-processable, ensuring ease of use and broad accessibility.

A fundamental aspect of STAC's design is its hierarchical organization, comprising Catalogs, Collections, and Items. At the highest level, **Catalogs** serve as containers that organize and provide access to Collections and Items. **Collections** represent a group of Items that share common properties, metadata, and structure, typically representing a dataset or a series of datasets. Within Collections, **Items** are individual pieces of data, each containing specific geospatial information, such as location and time. Crucially, each Item includes links to its Assets, which are the files associated with the data, such as imagery, thumbnails, or other related resources. These **Assets** are integral to the utility of each Item, providing the data required for analysis and visualization. This structured approach ensures a consistent and intuitive way to access and manage vast amounts of geospatial data, making STAC an efficient tool for data providers and users.
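To make the hierarchy more concrete, here is a minimal sketch of what a STAC Item can look like when written as a Python dictionary. The identifier, collection name, coordinates, date, and asset URL are hypothetical placeholders rather than real Maxar records; only the field names follow the STAC Item specification.

```python
import json

# A hypothetical STAC Item: a GeoJSON Feature with STAC-specific fields.
item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-item",                # hypothetical identifier
    "collection": "example-collection",  # the Collection this Item belongs to
    "bbox": [36.0, 37.0, 36.1, 37.1],    # [west, south, east, north]
    "geometry": {
        "type": "Polygon",
        "coordinates": [[
            [36.0, 37.0], [36.1, 37.0], [36.1, 37.1], [36.0, 37.1], [36.0, 37.0],
        ]],
    },
    "properties": {"datetime": "2023-02-07T08:00:00Z"},  # placeholder date
    "links": [],
    "assets": {
        "visual": {
            "href": "https://example.com/example-item/visual.tif",  # placeholder URL
            "type": "image/tiff; application=geotiff; profile=cloud-optimized",
            "roles": ["data"],
        }
    },
}

# Assets are keyed by name; each one points to a file through its "href".
asset_hrefs = {name: asset["href"] for name, asset in item["assets"].items()}
print(json.dumps(asset_hrefs, indent=2))
```

The Item carries the "where" (`bbox`, `geometry`) and the "when" (`properties.datetime`), while the `assets` mapping points to the actual files.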

### Why is STAC Important?

STAC is critical in the modern earth observation stack, particularly for large-scale environmental monitoring, disaster response, and risk assessment scenarios. Its standardized, community-backed approach facilitates:

- **Efficient Data Discovery**: Simplifying the finding of relevant geospatial data across vast datasets.
- **Interoperability**: Enabling different systems and tools to access and use geospatial data seamlessly.
- **Scalability**: Supporting the management of increasingly large and complex datasets, a common challenge in geospatial analysis.
- **Inclusivity and Collaboration**: STAC evolves through collective insights as a community-driven standard, ensuring it remains relevant and effective for various use cases.

For organizations like the IFRC, STAC's capabilities are invaluable in rapidly accessing and analyzing data for humanitarian aid, disaster relief, and risk management purposes.

### How Can I Learn More?

To deepen your understanding of STAC and its applications:

- **STAC Specification**: Dive into the official [STAC Specification](https://stacspec.org/) for detailed documentation and standards.
- **STAC Index**: Explore the [STAC Index](https://stacindex.org/), a comprehensive resource listing STAC implementations, tools, and data catalogs, to get a better grasp of the ecosystem and its applications.
- **Developer Best Practices**: For a more technical perspective, review the [STAC Best Practices](https://github.com/radiantearth/stac-spec/blob/master/best-practices.md) on GitHub, offering in-depth guidance for developers working with STAC.

As we progress through this notebook, we'll explore applying STAC principles using eoAPI in practical scenarios.

## COG: What is it? Why is it important? How can I learn more?

### What is a COG (Cloud Optimized GeoTIFF)?

A Cloud Optimized GeoTIFF (COG) is a variant of the TIFF image format tailored for efficient access over a network. It is a raster format that follows a particular internal data layout permitted by the GeoTIFF specification, allowing efficient subsetted or aggregated access. This layout enables efficient workflows in the cloud because clients can use HTTP GET range requests to fetch just the parts of the file they need. COGs are regular GeoTIFF files, so they remain backward compatible with existing geospatial software, but their internal organization enables more efficient data access and processing.
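As a rough illustration of the range-request idea, the sketch below builds the `Range` header a client could send to fetch only part of a remote file. The helper name `range_header` and the 16 KiB read size are illustrative assumptions, not part of any COG library.

```python
def range_header(offset: int, length: int) -> dict:
    """Build an HTTP Range header for `length` bytes starting at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # HTTP byte ranges are inclusive on both ends, hence the "- 1".
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


# Ask for the first 16 KiB of a file (an assumed, illustrative read size).
headers = range_header(0, 16 * 1024)
print(headers)  # {'Range': 'bytes=0-16383'}
```

Such a dictionary could be passed as the `headers` argument of `httpx.get`. A server that supports range requests answers with status `206 Partial Content` and only the requested bytes.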

### Why is COG Important?

COGs represent a significant advancement in geospatial data handling, primarily due to:

- **Efficient Imagery Data Access**: COG-aware software can stream just the portion of data it needs, significantly improving processing times and enabling real-time workflows that were previously not possible.
- **Internal Compression and Optimized Read Performance**: COGs are internally compressed, meaning the inner blocks in a GeoTIFF are already compressed. This internal compression allows COG readers to decompress only the specific portion of the file requested rather than the entire file.
- **Reduced Duplication of Data**: COGs enable diverse software to access a single file online, reducing the need to copy and cache data.
- **Legacy Compatibility**: Traditional GIS software can treat Cloud Optimized GeoTIFFs just like normal GeoTIFFs, simplifying data management.
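The partial-read benefit comes from the fact that COG pixels are stored in internal tiles. The following sketch works out which tiles a reader would need for a given pixel window. The function name and the 512-pixel tile size are assumptions for illustration; a real COG declares its own block size in its header.

```python
def tiles_for_window(col_off, row_off, width, height, tile_size=512):
    """Return (tile_col, tile_row) indices of the internal tiles a pixel window touches."""
    first_col = col_off // tile_size
    last_col = (col_off + width - 1) // tile_size
    first_row = row_off // tile_size
    last_row = (row_off + height - 1) // tile_size
    return [
        (tile_col, tile_row)
        for tile_row in range(first_row, last_row + 1)
        for tile_col in range(first_col, last_col + 1)
    ]


# A 500 x 500 pixel window starting at column 600, row 100.
tiles = tiles_for_window(col_off=600, row_off=100, width=500, height=500)
print(tiles)  # [(1, 0), (2, 0), (1, 1), (2, 1)]
```

Only these four tiles need to be fetched and decompressed, no matter how large the full image is.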

### How Can I Learn More?

For a deeper dive into COGs and to understand their implementation and usage:

- **Official Documentation**: Visit the [COG website](https://www.cogeo.org/) for a comprehensive understanding of COGs and their implementation details.
- **Cloud Native Geospatial Formats Guide**: The [Cloud Native Geospatial Formats Guide](https://guide.cloudnativegeo.org/) offers detailed insights into COGs, including advanced details and working examples to help you grasp the practical applications of this format.

In the following sections, we'll explore the practical application of COGs in conjunction with eoAPI, focusing on how they can be utilized for efficient geospatial data management and visualization.

## eoAPI: What is it? Why is it important? How can I learn more?

### What is eoAPI?

eoAPI is an open-source framework for accessing and utilizing Earth Observation (EO) data. It simplifies constructing a cloud-native EO infrastructure by providing sensible defaults for most EO and geospatial infrastructure needs. eoAPI is modular, allowing for easy customization to specific requirements. Its key features include:

- **STAC Powered**: Utilizes a suite of STAC-focused technologies.
- **Sensible Defaults**: Facilitates seamless deployment, configuration, and customization.
- **Cloud Agnostic**: Capable of quick deployment and scaling of EO services anywhere.

### Why is eoAPI Important?

eoAPI addresses the challenge of making EO data easily discoverable, interoperable, ingestible, and optimized for integration into modern applications and decision-making tools. Simply put, it lowers the barrier of entry to earth observation data to a broader range of developers, scientists, and the general public. This is crucial for understanding our changing planet and maximizing the societal impact of EO data. eoAPI's framework supports end-to-end EO infrastructure, including data cataloging, searching, visualization, and access, making it a powerful tool for organizations like the IFRC in their mission-driven work.

### How Can I Learn More?

To explore eoAPI further:

- **Official Documentation and Guides**: Visit [eoAPI's website](https://eoapi.dev/) for comprehensive documentation and guides on deploying and using the framework.
- **GitHub Repository**: Check out the [eoAPI GitHub repository](https://github.com/developmentseed/eoapi) for source code, updates, and community contributions.

We'll delve into effectively utilizing eoAPI with STAC and COGs as we proceed.

## eoAPI in Action - Kahramanmaras Earthquakes

### Background

This section will review the different [eoAPI](https://github.com/developmentseed/eoAPI) services using the latest open data from Maxar, acquired for the M7.8 and M7.5 Kahramanmaras earthquakes in Turkey on February 6, 2023. For more information on the event, see the [USGS article](https://www.usgs.gov/news/featured-story/m78-and-m75-kahramanmaras-earthquake-sequence-near-nurdagi-turkey-turkiye).

Maxar provides pre- and post-event high-resolution satellite imagery in support of emergency planning, risk assessment, monitoring of staging areas and emergency response, damage assessment, and recovery. These images are generated using the Maxar ARD pipeline, tiled on an organized grid in analysis-ready cloud-optimized formats.

Maxar releases open data for select sudden-onset major crisis events. In addition to making the formatted COG data freely available on AWS, they also publish static STAC metadata alongside the images. Having the STAC items already created makes ingestion into the PgSTAC database easy because we don't have to read the images and produce the items ourselves.

To learn more about ingesting the Maxar OpenData STAC catalog into PgSTAC, see https://github.com/vincentsarago/MAXAR_opendata_to_pgstac. For IFRC, data ingestion to the Kubernetes cluster can be triggered via [GitHub Actions](https://github.com/developmentseed/eoapi-risk/tree/main/.github/workflows) directly on the repository.

### Structure of eoAPI in Action

0. Setting Up
1. How to query STAC
    - How to query Collections
    - How to query Items
2. Find Items within a date range
3. Visualizing an Asset
4. Visualizing multiple Assets with Mosaics
5. Comparing pre- and post-event Mosaics

### Setting Up

Before we start querying the STAC endpoint, there are a few setup steps to ensure that our environment has all the necessary tools and libraries. This section will guide you through installing the required packages and importing essential modules.

We must install two packages: **`httpx`** for making HTTP requests and **`ipyleaflet`** for interactive mapping in Jupyter notebooks.

Run the following command in your Jupyter Notebook to install these packages:

In [11]:
!python -m pip install httpx ipyleaflet


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.1.2[0m[39;49m -> [0m[32;49m23.3.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


This command uses pip, Python's package installer, to download and install the **`httpx`** and **`ipyleaflet`** packages from the Python Package Index (PyPI).

### Importing Modules

After installing the necessary packages, we need to import some standard and installed modules that will be used throughout our queries. The following Python code imports these modules:

In [12]:
from datetime import datetime

import json
import httpx

import ipyleaflet

- **`datetime`**: This module from Python's standard library works with dates and times. It's especially useful when dealing with time-based geospatial data.
- **`json`**: This module is used for JSON data handling. Since responses from STAC endpoints are typically in JSON format, this module is crucial for parsing and processing these responses.
- **`httpx`**: A powerful and user-friendly HTTP client for Python. We will use it to send requests to the STAC endpoint.
- **`ipyleaflet`**: A library for creating interactive maps in Jupyter notebooks. It's beneficial for visualizing geospatial data on a map.

Ensure that all imports run successfully. Double-check that the required packages are installed correctly if there are any issues.

### How to query STAC

The SpatioTemporal Asset Catalog (STAC) provides a standardized way to expose geospatial data and metadata. This section will explore how to query the STAC endpoint, specifically for the International Federation of Red Cross and Red Crescent Societies (IFRC).

The eoAPI STAC endpoint for IFRC is available at:

In [13]:
stac_endpoint = "https://eoapi.ifrc-risk.k8s.labs.ds.io/stac"

This endpoint is your gateway to accessing various geospatial datasets and metadata associated with IFRC's initiatives.
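The routes used in the rest of this notebook follow the standard STAC API layout. As a small sketch, the helper below (a hypothetical `stac_url` function, not part of eoAPI or `httpx`) assembles those routes from the endpoint. Whether a given collection exists on a particular deployment is an assumption you would need to verify by querying it.

```python
stac_endpoint = "https://eoapi.ifrc-risk.k8s.labs.ds.io/stac"


def stac_url(endpoint: str, *parts: str) -> str:
    """Join path segments onto a STAC API root, avoiding duplicate slashes."""
    return "/".join([endpoint.rstrip("/"), *(part.strip("/") for part in parts)])


# Collection id taken from the Maxar Open Data catalog; its presence on this
# endpoint is assumed here, not checked.
collection_id = "MAXAR_Kahramanmaras_turkey_earthquake_23"

collections_url = stac_url(stac_endpoint, "collections")                   # list all Collections
collection_url = stac_url(stac_endpoint, "collections", collection_id)     # one Collection
items_url = stac_url(stac_endpoint, "collections", collection_id, "items") # Items of a Collection
search_url = stac_url(stac_endpoint, "search")                             # cross-collection search

for url in (collections_url, collection_url, items_url, search_url):
    print(url)
```

Each of these URLs can be passed to `httpx.get(...)` and decoded with `.json()`.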

#### How to query Collections

Collections in STAC represent higher-level groupings under which items (geospatial datasets) are associated. You can think of collections as folders containing related items. To start querying, we'll focus on these collections to understand the types of data available.

To query collections, we can use `httpx` to send an HTTP GET request to the **`/collections`** endpoint. To give you an overview of the STAC specification, here's what the first Collection in the response looks like in its JSON format:

In [14]:
collections = httpx.get("https://stac.eoapi.dev/collections").json()
collections["collections"][0]

{'id': 'MAXAR_BayofBengal_Cyclone_Mocha_May_23',
 'type': 'Collection',
 'links': [{'rel': 'items',
   'type': 'application/geo+json',
   'href': 'https://stac.eoapi.dev/collections/MAXAR_BayofBengal_Cyclone_Mocha_May_23/items'},
  {'rel': 'parent',
   'type': 'application/json',
   'href': 'https://stac.eoapi.dev/'},
  {'rel': 'root',
   'type': 'application/json',
   'href': 'https://stac.eoapi.dev/'},
  {'rel': 'self',
   'type': 'application/json',
   'href': 'https://stac.eoapi.dev/collections/MAXAR_BayofBengal_Cyclone_Mocha_May_23'}],
 'extent': {'spatial': {'bbox': [[91.831615,
     19.984656587012033,
     92.97426268500965,
     21.666101],
    [92.75855246040959,
     19.982078842323997,
     92.89682495377032,
     20.514473160464657],
    [91.831615, 21.518411, 91.957078, 21.666101]]},
  'temporal': {'interval': [['2023-01-03T04:30:17Z',
     '2023-03-14T04:30:25Z']]}},
 'license': 'proprietary',
 'description': 'Maxar OpenData | BayofBengal Cyclone Mocha May 23',
 'item_as

However, there are many more collections available through the STAC endpoint. We can list the available collections by their `id`:

In [15]:
[c['id'] for c in collections["collections"]]

['MAXAR_BayofBengal_Cyclone_Mocha_May_23',
 'MAXAR_Emilia_Romagna_Italy_flooding_may23',
 'MAXAR_Gambia_flooding_8_11_2022',
 'MAXAR_Marshall_Fire_21_Update',
 'MAXAR_Hurricane_Fiona_9_19_2022',
 'MAXAR_Hurricane_Ian_9_26_2022',
 'MAXAR_Indonesia_Earthquake22',
 'MAXAR_Kahramanmaras_turkey_earthquake_23',
 'MAXAR_Kalehe_DRC_Flooding_5_8_23',
 'MAXAR_volcano_indonesia21',
 'MAXAR_New_Zealand_Flooding22',
 'MAXAR_New_Zealand_Flooding23',
 'MAXAR_Sudan_flooding_8_22_2022',
 'MAXAR_afghanistan_earthquake22',
 'MAXAR_cyclone_emnati22',
 'MAXAR_ghana_explosion22',
 'MAXAR_kentucky_flooding_7_29_2022',
 'MAXAR_pakistan_flooding22',
 'MAXAR_southafrica_flooding22',
 'MAXAR_tonga_volcano21',
 'MAXAR_yellowstone_flooding22',
 'MAXAR_Maui_Hawaii_fires_Aug_23',
 'MAXAR_NWT_Canada_Aug_23',
 'MAXAR_shovi_georgia_landslide_8Aug23',
 'MAXAR_Hurricane_Idalia_Florida_Aug23',
 'MAXAR_Libya_Floods_Sept_2023',
 'MAXAR_McDougallCreekWildfire_BC_Canada_Aug_23',
 'MAXAR_Morocco_Earthquake_Sept_2023']
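With this many collections, a small helper makes it easier to narrow the list by event type. The sketch below filters ids by a case-insensitive keyword. To stay self-contained it uses a hand-copied subset of the ids listed above; in the notebook you would pass the full list instead. The function name `find_collections` is illustrative.

```python
# A subset of the collection ids shown above, copied here so the example
# runs without a network request.
collection_ids = [
    "MAXAR_BayofBengal_Cyclone_Mocha_May_23",
    "MAXAR_Indonesia_Earthquake22",
    "MAXAR_Kahramanmaras_turkey_earthquake_23",
    "MAXAR_afghanistan_earthquake22",
    "MAXAR_Libya_Floods_Sept_2023",
    "MAXAR_Morocco_Earthquake_Sept_2023",
]


def find_collections(ids, keyword):
    """Return the ids that contain `keyword`, ignoring case."""
    needle = keyword.lower()
    return [cid for cid in ids if needle in cid.lower()]


earthquakes = find_collections(collection_ids, "earthquake")
print(earthquakes)
```

Matching on the id is only a convenience. It relies on the naming habits of this particular catalog rather than on anything guaranteed by STAC.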

For the following steps, we will be using the Kahramanmaras Earthquake collection. A key feature of STAC is its spatial and temporal extents. It is slightly intricate, but a collection can have multiple spatial and temporal extents.

**TODO: is multiple extents because of different acquisition dates?**

First, let's illustrate the structure for spatial extents, and display them on a map:


In [21]:
collection_id = "MAXAR_Kahramanmaras_turkey_earthquake_23"

collection_info = httpx.get(f"https://stac.eoapi.dev/collections/{collection_id}").json()
bboxes = collection_info["extent"]["spatial"]["bbox"]
print(f"Number of spatial extents: {len(bboxes)}")
print(f"First spatial extent bounding box: \n {bboxes[0]}")

Number of spatial extents: 72
First spatial extent bounding box: 
 [35.32861203895262, 36.06630343440598, 38.45685512435119, 37.90150133428409]
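When a Collection lists several bounding boxes, the STAC specification treats the first one as the overall extent and the others as sub-extents. As a sketch, the helper below computes the box that covers a list of bounding boxes, using the three boxes from the Cyclone Mocha collection printed earlier. The function name `union_bbox` is illustrative, and the simple min/max approach assumes no box crosses the antimeridian.

```python
def union_bbox(bboxes):
    """Return the smallest [west, south, east, north] box covering all inputs."""
    wests, souths, easts, norths = zip(*bboxes)
    return [min(wests), min(souths), max(easts), max(norths)]


# Bounding boxes copied from the Cyclone Mocha collection output above.
sample_bboxes = [
    [91.831615, 19.984656587012033, 92.97426268500965, 21.666101],
    [92.75855246040959, 19.982078842323997, 92.89682495377032, 20.514473160464657],
    [91.831615, 21.518411, 91.957078, 21.666101],
]

covering = union_bbox(sample_bboxes)
print(covering)
```

A box computed this way can be compared with the first entry of a Collection's `bbox` list to check how closely the published overall extent matches its sub-extents.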


In [18]:
geojson = {
    "type": "FeatureCollection",
    "features": [
        {
            'type': 'Feature',
            'geometry': {
                'type': 'Polygon',
                'coordinates': [[
                    [bbox[0], bbox[1]],
                    [bbox[2], bbox[1]],
                    [bbox[2], bbox[3]],
                    [bbox[0], bbox[3]],
                    [bbox[0], bbox[1]],
                ]]
            },
            'properties': {}
        }
        for bbox in bboxes
    ]
}

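# Use the first bounding box (the overall extent) to center the map
mainbbox = collection_info["extent"]["spatial"]["bbox"][0]

m = ipyleaflet.leaflet.Map(
    # The map center is (latitude, longitude); a bbox is [west, south, east, north]
    center=((mainbbox[1] + mainbbox[3]) / 2, (mainbbox[0] + mainbbox[2]) / 2),
    zoom=7
)

# Draw every spatial extent as a GeoJSON layer on top of the base map
geo_json = ipyleaflet.leaflet.GeoJSON(data=geojson)
m.add_layer(geo_json)
m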

Map(center=[36.983902384345036, 36.89273358165191], controls=(ZoomControl(options=['position', 'zoom_in_text',…

Finally, a collection can also have multiple temporal extents. The convention is for the first temporal extent to encapsulate the entire range of dates of the sub-extents. In the case of Maxar's collections, there is a single temporal extent:

In [28]:
temporal_extents = collection_info["extent"]["temporal"]['interval']
print(f"Number of temporal extents: {len(temporal_extents)}")
print(f"Temporal extent: \n {temporal_extents[0]}")

Number of temporal extents: 1
Temporal extent: 
 ['2021-02-28T08:10:22Z', '2023-03-11T08:29:15Z']
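The interval values are ISO 8601 strings, so the `datetime` module imported earlier can turn them into objects that support comparison and arithmetic. The sketch below parses the interval printed above and checks that it covers the earthquake date. The helper name `parse_utc` is illustrative, and the code assumes both ends of the interval are present (STAC also allows `null` for an open-ended interval).

```python
from datetime import datetime, timezone


def parse_utc(timestamp: str) -> datetime:
    """Parse an ISO 8601 timestamp that ends in 'Z' into an aware datetime."""
    # Replacing "Z" keeps this working on Python versions before 3.11.
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


# Interval copied from the output above.
interval = ["2021-02-28T08:10:22Z", "2023-03-11T08:29:15Z"]
start, end = (parse_utc(value) for value in interval)

# Date of the Kahramanmaras earthquakes, as stated in this notebook.
event = datetime(2023, 2, 6, tzinfo=timezone.utc)

span_days = (end - start).days
covers_event = start <= event <= end
print(f"Interval spans {span_days} days")            # Interval spans 741 days
print(f"Covers the earthquake date: {covers_event}")  # Covers the earthquake date: True
```

Because the interval starts well before the event, the collection contains both pre-event and post-event imagery.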
