# STAC Item Search and Submission to Data Pipeline

This notebook allows operators to:

1. Define an area of interest (AOI) and time range
2. Search for STAC items from the EOPF STAC catalog
3. Submit selected items to the data pipeline for processing via HTTP webhook

## Setup and Imports


In [23]:
import json

import requests
from pystac import Item
from pystac_client import Client

# Optionally load credentials from a .env file (disabled by default)
# try:
#     from pathlib import Path
#
#     from dotenv import load_dotenv
#
#     dotenv_path = Path(".env")
#     if dotenv_path.exists():
#         load_dotenv(dotenv_path)
#         print("✅ Loaded credentials from .env file")
#     else:
#         print("ℹ️  No .env file found, will prompt for credentials")
# except ImportError:
#     print("ℹ️  python-dotenv not installed, will prompt for credentials")
#     print("   Install with: pip install python-dotenv")

## Configuration


In [24]:
# STAC API Configuration
SOURCE_STAC_API_URL = "https://stac.core.eopf.eodc.eu/"
TARGET_STAC_API_URL = "https://api.explorer.eopf.copernicus.eu/stac"

# Webhook Configuration
WEBHOOK_URL = "http://localhost:12000/samples"

print("✅ Configuration loaded")

✅ Configuration loaded


## Define Area and Time of Interest


In [None]:
# Area of Interest (AOI) - Bounding box: [min_lon, min_lat, max_lon, max_lat]
# Example: Rome area
# aoi_bbox = [12.4, 41.8, 12.6, 42.0]
# Example 2: Majorca area (2.1697998046875004, 39.21097520599528, 3.8177490234375004)
# aoi_bbox = [2.16, 39.21, 3.82, 39.78]
# Example 3: France Full
# aoi_bbox = [-5.14, 41.33, 9.56, 51.09]
# Example 4: Lagoon From Venice to Trieste
# aoi_bbox = [12.0, 44.4, 14.0, 46.0]
# La Palma Island
# aoi_bbox = [-18, 27.4, -13.70, 29.5]
# Italy Full
# aoi_bbox = [6.627265, 35.492537, 18.513648, 47.092146]
# 2025 Corbières Massif wildfire area
# aoi_bbox = [2.4, 42.8, 3.2, 43.1]
# Piódão, Portugal wildfire area
aoi_bbox = [-7.866, 40.316, -7.633, 40.483]

# Time range
start_date = "2025-07-01T00:00:00Z"
end_date = "2025-10-31T23:59:59Z"

print(f"Area of Interest: {aoi_bbox}")
print(f"Time Range: {start_date} to {end_date}")

Area of Interest: [-7.866, 40.316, -7.633, 40.483]
Time Range: 2025-07-01T00:00:00Z to 2025-10-31T23:59:59Z
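Before searching, it can help to check the bounding box and time range, since a swapped coordinate or reversed interval tends to produce an empty result rather than an error. The sketch below is one way to do that. The helper names `validate_bbox` and `validate_time_range` are hypothetical and not part of the notebook, and the check rejects antimeridian-crossing boxes for simplicity.

```python
from datetime import datetime


def validate_bbox(bbox):
    """Return bbox unchanged if it is a plausible [min_lon, min_lat, max_lon, max_lat]."""
    if len(bbox) != 4:
        raise ValueError("bbox must have exactly four values")
    min_lon, min_lat, max_lon, max_lat = bbox
    if not (-180 <= min_lon <= 180 and -180 <= max_lon <= 180):
        raise ValueError("longitudes must be within [-180, 180]")
    if not (-90 <= min_lat <= max_lat <= 90):
        raise ValueError("latitudes must satisfy -90 <= min_lat <= max_lat <= 90")
    if min_lon > max_lon:
        # Assumption: boxes crossing the antimeridian are not needed here
        raise ValueError("min_lon must not exceed max_lon")
    return bbox


def validate_time_range(start, end):
    """Check two UTC timestamps and return the 'start/end' interval string."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    if datetime.strptime(start, fmt) > datetime.strptime(end, fmt):
        raise ValueError("start_date must not be after end_date")
    return f"{start}/{end}"


bbox_ok = validate_bbox([-7.866, 40.316, -7.633, 40.483])
interval = validate_time_range("2025-07-01T00:00:00Z", "2025-10-31T23:59:59Z")
```

The returned `interval` string has the same `start/end` form that the search cell passes as `datetime`.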


## Browse Available Collections


In [26]:
# Connect to STAC API
catalog = Client.open(SOURCE_STAC_API_URL)

# List available collections
collections = list(catalog.get_collections())

print(f"\n📚 Available Collections ({len(collections)} total):\n")
for col in collections:
    print(f"  - {col.id}")
    if col.description:
        print(
            f"    {col.description[:100]}..."
            if len(col.description) > 100
            else f"    {col.description}"
        )
    print()


📚 Available Collections (12 total):

  - sentinel-2-l2a
    The Sentinel-2 Level-2A Collection 1 product provides orthorectified Surface Reflectance (Bottom-Of-...

  - sentinel-3-slstr-l1-rbt
    The Sentinel-3 SLSTR Level-1B RBT product provides radiances and brightness temperatures for each pi...

  - sentinel-2-l1c
    The Sentinel-2 Level-1C product is composed of 110x110 km2 tiles (ortho-images in UTM/WGS84 projecti...

  - sentinel-3-olci-l2-lrr
    The Sentinel-3 OLCI L2 LRR product provides land and atmospheric geophysical parameters computed for...

  - sentinel-3-olci-l2-lfr
    The Sentinel-3 OLCI L2 LFR product provides land and atmospheric geophysical parameters computed for...

  - sentinel-3-olci-l1-efr
    The Sentinel-3 OLCI L1 EFR product provides TOA radiances at full resolution for each pixel in the i...

  - sentinel-1-l1-slc
    The Sentinel-1 Level-1 Single Look Complex (SLC) products consist of focused SAR data, geo-reference...

  - sentinel-1-l1-grd
    T

## Select Collection and Search for Items


In [27]:
# Choose the source collection to search
source_collection = "sentinel-2-l2a"  # Change this to your desired collection

# Choose the target collection for processing
target_collection = "sentinel-2-l2a"  # Change this to your target collection

print(f"🔍 Searching collection: {source_collection}")
print(f"🎯 Target collection for processing: {target_collection}")

🔍 Searching collection: sentinel-2-l2a
🎯 Target collection for processing: sentinel-2-l2a


In [28]:
# Search for items
search = catalog.search(
    collections=[source_collection],
    bbox=aoi_bbox,
    datetime=f"{start_date}/{end_date}",  # Adjust as needed
    limit=200,  # Adjust limit as needed
)

# Collect items from the paginated results and clean them (workaround for issue #26)
# Use pages_as_dicts() to get the raw JSON before PySTAC parses it
items = []

for page_dict in search.pages_as_dicts():
    for feature in page_dict.get("features", []):
        # Skip deprecated items
        properties = feature.get("properties", {})
        if properties.get("deprecated", False):
            item_id = feature.get("id", "unknown")
            print(f"⚠️  Skipping deprecated item: {item_id}")
            continue

        # Clean assets with missing href before parsing
        if "assets" in feature:
            original_count = len(feature["assets"])
            feature["assets"] = {
                key: asset for key, asset in feature["assets"].items() if "href" in asset
            }
            removed_count = original_count - len(feature["assets"])
            if removed_count > 0:
                item_id = feature.get("id", "unknown")
                # print(f"⚠️  Item {item_id}: Removed {removed_count} asset(s) with missing href")

        # Now parse the cleaned item
        try:
            item = Item.from_dict(feature)
            items.append(item)
        except Exception as e:
            item_id = feature.get("id", "unknown")
            print(f"⚠️  Skipping item {item_id}: {e}")
            continue

print(f"\n✅ Found {len(items)} items (after filtering).\n")

⚠️  Skipping deprecated item: S2C_MSIL2A_20251031T105221_N0511_R051_T31TDH_20251031T144221
⚠️  Skipping deprecated item: S2A_MSIL2A_20250930T104041_N0511_R008_T31TEH_20250930T143113
⚠️  Skipping deprecated item: S2A_MSIL2A_20250930T104041_N0511_R008_T31TDH_20250930T143113
⚠️  Skipping deprecated item: S2A_MSIL2A_20250920T103741_N0511_R008_T31TEH_20250920T144020
⚠️  Skipping deprecated item: S2A_MSIL2A_20250920T103741_N0511_R008_T31TDH_20250920T144020
⚠️  Skipping deprecated item: S2C_MSIL2A_20250918T104031_N0511_R008_T31TEH_20250918T161413
⚠️  Skipping deprecated item: S2C_MSIL2A_20250918T104031_N0511_R008_T31TDH_20250918T161413
⚠️  Skipping deprecated item: S2B_MSIL2A_20250916T104619_N0511_R051_T31TDH_20250916T133243
⚠️  Skipping deprecated item: S2A_MSIL2A_20250913T104651_N0511_R051_T31TDH_20250913T161813
⚠️  Skipping deprecated item: S2B_MSIL2A_20250913T103619_N0511_R008_T31TEH_20250913T131830
⚠️  Skipping deprecated item: S2B_MSIL2A_20250
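
The cleaning rules in the search cell (skip deprecated items, drop assets without an `href`) can also be written as a small pure function that works on the raw feature dictionaries, which makes them easy to check without contacting the catalog. This is a minimal sketch. The name `clean_feature` and the sample features are hypothetical and only illustrate the two rules.

```python
def clean_feature(feature):
    """Return a cleaned copy of a raw STAC feature dict, or None if it is deprecated."""
    if feature.get("properties", {}).get("deprecated", False):
        return None

    cleaned = dict(feature)  # shallow copy so the input is left untouched
    if "assets" in feature:
        # Keep only assets that carry an href
        cleaned["assets"] = {
            key: asset for key, asset in feature["assets"].items() if "href" in asset
        }
    return cleaned


# Hypothetical sample features, for illustration only
sample_ok = {
    "id": "item-a",
    "properties": {},
    "assets": {
        "b02": {"href": "https://example.com/b02.zarr"},
        "broken": {"title": "asset without href"},
    },
}
sample_deprecated = {"id": "item-b", "properties": {"deprecated": True}, "assets": {}}

cleaned_ok = clean_feature(sample_ok)
cleaned_deprecated = clean_feature(sample_deprecated)
```

A cleaned dictionary can then be passed to `Item.from_dict`, as the search cell does.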

## Submit Items to Pipeline


In [29]:
def check_item_exists_in_target(item_id: str, target_collection: str) -> bool:
    """
    Check if a STAC item already exists in the target catalog.

    Args:
        item_id: The ID of the STAC item to check
        target_collection: The target collection to check in

    Returns:
        True if item exists, False otherwise
    """
    try:
        # Request the specific item from the target catalog
        item_url = f"{TARGET_STAC_API_URL}/collections/{target_collection}/items/{item_id}"
        response = requests.get(item_url, timeout=30)

        # A 200 response means the item exists
        return response.status_code == 200

    except requests.RequestException as e:
        # On any request error, assume the item doesn't exist so that
        # a valid item is never skipped because of a failed check
        print(f"⚠️  Error checking if item {item_id} exists: {e}")
        return False
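
The existence check relies on the item URL being well formed. A doubled slash from a trailing `/` in the base URL, or an unescaped character in an ID, may return a 404 and make an existing item look missing. The sketch below builds the URL defensively. The helper name `build_item_url` is hypothetical, and percent-encoding the path segments is an added precaution rather than something the notebook does.

```python
from urllib.parse import quote


def build_item_url(api_url, collection, item_id):
    """Build the item URL under /collections/{collection}/items/{item_id}."""
    base = api_url.rstrip("/")  # tolerate a trailing slash in the configured URL
    return (
        f"{base}/collections/{quote(collection, safe='')}"
        f"/items/{quote(item_id, safe='')}"
    )


url = build_item_url(
    "https://api.explorer.eopf.copernicus.eu/stac/",
    "sentinel-2-l2a",
    "S2A_MSIL2A_20251030T104211_N0511_R008_T31TEH_20251030T144716",
)
```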

In [30]:
def submit_item_to_pipeline(item_url: str, target_collection: str) -> bool:
    """
    Submit a single STAC item to the data pipeline via HTTP webhook.

    Args:
        item_url: The self-link URL of the STAC item
        target_collection: The target collection for processing

    Returns:
        True if successful, False otherwise
    """
    try:
        # Create payload
        payload = {
            "source_url": item_url,
            "collection": target_collection,
            "action": "convert-v1-s2",  # specify the action to use the V1 S2 trigger
        }

        # Submit via HTTP webhook endpoint
        message = json.dumps(payload)
        response = requests.post(
            WEBHOOK_URL,
            data=message,
            headers={"Content-Type": "application/json"},
            timeout=30,  # avoid hanging indefinitely if the webhook is unreachable
        )

        # Treat any 4xx/5xx response as a failure
        response.raise_for_status()
        return True

    except requests.RequestException as e:
        print(f"❌ Error submitting item: {e}")
        return False
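
To inspect what would be sent to the webhook without posting anything, the payload can be built and serialized separately. This is a sketch of such a dry run. The helper name `build_payload` and the example URL are hypothetical, while the three payload keys and the `convert-v1-s2` action come from the function above.

```python
import json


def build_payload(item_url, target_collection, action="convert-v1-s2"):
    """Return the webhook payload as a dict, without sending it."""
    return {
        "source_url": item_url,
        "collection": target_collection,
        "action": action,
    }


# Hypothetical item URL, for illustration only
body = json.dumps(build_payload("https://example.com/items/abc", "sentinel-2-l2a"))
print(body)
```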

In [31]:
# Submit all found items to the pipeline (skip items already in target catalog)
if items:
    print(f"\n📤 Processing {len(items)} items...\n")

    success_count = 0
    fail_count = 0
    skipped_count = 0

    for item in items:
        # Check if item already exists in target catalog
        if check_item_exists_in_target(item.id, target_collection):
            print(f"⏭️  Skipping {item.id}: Already exists in target catalog")
            skipped_count += 1
            continue

        # Get the self link (canonical URL for the item)
        item_url = next((link.href for link in item.links if link.rel == "self"), None)

        if not item_url:
            print(f"⚠️  Skipping {item.id}: No self link found")
            fail_count += 1
            continue

        # Submit to pipeline
        if submit_item_to_pipeline(item_url, target_collection):
            print(f"✅ Submitted: {item.id}")
            success_count += 1
        else:
            print(f"❌ Failed: {item.id}")
            fail_count += 1

    print("\n📊 Summary:")
    print(f"  - Successfully submitted: {success_count}")
    print(f"  - Already existed (skipped): {skipped_count}")
    print(f"  - Failed: {fail_count}")
    print(f"  - Total processed: {len(items)}")
else:
    print("No items to submit.")


📤 Processing 34 items...

✅ Submitted: S2A_MSIL2A_20251030T104211_N0511_R008_T31TEH_20251030T144716
✅ Submitted: S2A_MSIL2A_20251030T104211_N0511_R008_T31TDH_20251030T144716
✅ Submitted: S2C_MSIL2A_20251028T104151_N0511_R008_T31TEH_20251028T145122
✅ Submitted: S2C_MSIL2A_20251028T104151_N0511_R008_T31TDH_20251028T145122
✅ Submitted: S2B_MSIL2A_20251026T105039_N0511_R051_T31TDH_20251026T131435
✅ Submitted: S2A_MSIL2A_20251023T105131_N0511_R051_T31TDH_20251023T145710
✅ Submitted: S2B_MSIL2A_20251023T104009_N0511_R008_T31TEH_20251023T144134
✅ Submitted: S2B_MSIL2A_20251023T104009_N0511_R008_T31TDH_20251023T144134
✅ Submitted: S2C_MSIL2A_20251021T105111_N0511_R051_T31TDH_20251021T150514
✅ Submitted: S2A_MSIL2A_20251020T104101_N0511_R008_T31TEH_20251020T142609
✅ Submitted: S2A_MSIL2A_20251020T104101_N0511_R008_T31TDH_20251020T142609
✅ Submitted: S2C_MSIL2A_20251018T104051_N0511_R008_T31TEH_20251018T205314
✅ Submitted: S2C_MSIL2A_20251018T104051_N0511_R008_T31TD
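
The branching in the submission loop (skip if already present, fail if there is no self link, otherwise submit) can be exercised without any network access by passing the existence check and the submit step in as callables. The sketch below does this. The function name `submit_all` and the stand-in lambdas are hypothetical; in the notebook the callables would be `check_item_exists_in_target` and `submit_item_to_pipeline` with the collection already bound.

```python
from collections import Counter


def submit_all(items, exists, submit):
    """Tally outcomes for (item_id, self_url) pairs using the injected callables."""
    counts = Counter()
    for item_id, self_url in items:
        if exists(item_id):
            counts["skipped"] += 1
        elif not self_url:
            counts["failed"] += 1  # no self link to submit
        elif submit(self_url):
            counts["submitted"] += 1
        else:
            counts["failed"] += 1
    return counts


# Stand-in data and callables, for illustration only
pairs = [
    ("item-a", "https://example.com/items/item-a"),
    ("item-b", None),
    ("item-c", "https://example.com/items/item-c"),
    ("item-d", "https://example.com/items/item-d"),
]
counts = submit_all(
    pairs,
    exists=lambda item_id: item_id == "item-a",
    submit=lambda url: not url.endswith("item-d"),
)
```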

## Submit Specific Items (Optional)

To submit only specific items rather than every item found, select them manually:


In [32]:
# Example: Submit only specific items by index
# Uncomment and modify as needed

# selected_indices = [0, 1, 2]  # Select first 3 items
#
# for idx in selected_indices:
#     if idx < len(items):
#         item = items[idx]
#         item_url = next((link.href for link in item.links if link.rel == "self"), None)
#
#         if item_url:
#             if submit_item_to_pipeline(item_url, target_collection):
#                 print(f"✅ Submitted: {item.id}")
#             else:
#                 print(f"❌ Failed: {item.id}")
#     else:
#         print(f"⚠️  Index {idx} out of range")
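
Selecting by index depends on the order of the search results. An alternative is to select by tile, assuming the item IDs follow the pattern seen in the output above, where one underscore-separated field is `T` followed by a five-character tile code such as `31TEH`. The helper `tile_of` below is a hypothetical sketch built on that assumption.

```python
def tile_of(item_id):
    """Return the tile code (e.g. '31TEH') from a Sentinel-2 item ID, or None."""
    for part in item_id.split("_"):
        # The tile field looks like 'T' + two digits + three letters
        if (
            len(part) == 6
            and part[0] == "T"
            and part[1:3].isdigit()
            and part[3:].isalpha()
        ):
            return part[1:]
    return None


ids = [
    "S2A_MSIL2A_20251030T104211_N0511_R008_T31TEH_20251030T144716",
    "S2A_MSIL2A_20251030T104211_N0511_R008_T31TDH_20251030T144716",
    "S2B_MSIL2A_20251026T105039_N0511_R051_T31TDH_20251026T131435",
]
selected = [item_id for item_id in ids if tile_of(item_id) == "31TDH"]
```

In the notebook, the same filter could be applied to `items` with `tile_of(item.id)` before calling `submit_item_to_pipeline`.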