# Storage Tier Management Workflow

This notebook demonstrates how to use the `change_storage_tier.py` and `update_stac_storage_tier.py` scripts for automated storage tier management.

## Purpose

These scripts are designed to be integrated into **automated cron workflows** that optimize storage costs by moving older datasets to cheaper storage tiers while keeping metadata synchronized.

### Automated Workflow Process:

1. **Inspect current storage tier metadata** in STAC items
2. **Change S3 object storage classes** based on dataset age and access patterns  
3. **Update STAC catalog** with new storage tier information
4. **Verify changes** were applied correctly

**Use Case**: Implement automated data lifecycle management to reduce storage costs as datasets age, moving them from immediate-access storage (STANDARD) to cheaper infrequent-access storage (STANDARD_IA) over time.
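
As an illustration of an age-based policy like the one described above, the following minimal sketch maps a dataset's acquisition date to a target storage class. The 90-day threshold, the `TIER_POLICY` table, and the `target_storage_class` function are hypothetical and are not taken from the project's scripts.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical policy table (assumption): minimum age -> target storage class,
# ordered from the oldest threshold to the newest.
TIER_POLICY = [
    (timedelta(days=90), "STANDARD_IA"),
    (timedelta(days=0), "STANDARD"),
]


def target_storage_class(acquired: datetime, now: Optional[datetime] = None) -> str:
    """Return the storage class the policy assigns to a dataset of this age."""
    now = now or datetime.now(timezone.utc)
    age = now - acquired
    for min_age, storage_class in TIER_POLICY:
        if age >= min_age:
            return storage_class
    # Acquisition dates in the future fall through to the default class.
    return "STANDARD"
```

A cron job could call such a function for each item's `datetime` property and only invoke the scripts when the result differs from the tier currently recorded in the STAC item.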

### Background Context

This workflow addresses the need for automated storage optimization in large-scale Earth observation data archives:

- **Issue #178**: [Storage tier optimization strategy](https://github.com/EOPF-Explorer/coordination/issues/178)
- **Issue #182**: [Automated storage class transitions](https://github.com/EOPF-Explorer/coordination/issues/182)

## Two-Step Process Overview

### Step A: Change S3 Storage Classes
Use `change_storage_tier.py` to change the actual storage class of objects in S3.

### Step B: Update STAC Metadata  
Use `update_stac_storage_tier.py` to update STAC items with current storage tier information.
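
The two steps can be chained from Python. The sketch below only assembles the command lines, using the flags that appear later in this notebook; `build_tier_commands` is a hypothetical helper, and running the commands (for example with `subprocess.run(cmd, check=True)`) is left to the caller.

```python
import shlex


def build_tier_commands(
    stac_item_url: str,
    storage_class: str,
    stac_api_url: str,
    s3_endpoint: str,
    dry_run: bool = True,
) -> list:
    """Return the argv lists for Step A and Step B, in execution order."""
    change_cmd = [
        "uv", "run", "python", "scripts/change_storage_tier.py",
        "--stac-item-url", stac_item_url,
        "--storage-class", storage_class,
        "--s3-endpoint", s3_endpoint,
    ]
    update_cmd = [
        "uv", "run", "python", "scripts/update_stac_storage_tier.py",
        "--stac-item-url", stac_item_url,
        "--stac-api-url", stac_api_url,
        "--s3-endpoint", s3_endpoint,
    ]
    if dry_run:
        # Default to a preview so nothing is modified by accident.
        change_cmd.append("--dry-run")
        update_cmd.append("--dry-run")
    return [change_cmd, update_cmd]


if __name__ == "__main__":
    # Placeholder URLs for illustration only.
    for cmd in build_tier_commands(
        "https://stac.example/collections/c/items/i",
        "STANDARD_IA",
        "https://stac.example",
        "https://s3.example",
    ):
        print(shlex.join(cmd))
```

Step B comes second so that the catalog records the storage classes as they are after Step A has run.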

## Setup and Imports

In [None]:
import json
import sys
from pathlib import Path

from pystac import Item
from pystac_client import Client

# Add scripts directory to path to import storage_tier_utils
# Notebook is in: operator-tools/example_notebooks/
# Scripts are in: scripts/
# So we need to go up 2 levels then into scripts
scripts_path = Path.cwd().parent.parent / "scripts"
sys.path.insert(0, str(scripts_path))

try:
    import storage_tier_utils  # Import module to verify availability  # noqa: F401

    print("✅ Storage tier utilities imported successfully")
except ImportError as e:
    print(f"⚠️  Could not import storage_tier_utils: {e}")
    print("   Make sure you're running from the correct directory")

✅ Storage tier utilities imported successfully


## Configuration

In [281]:
# STAC API Configuration - using staging collection
STAC_API_URL = "https://api.explorer.eopf.copernicus.eu/stac"
COLLECTION = "sentinel-2-l2a-staging"

# S3 Configuration (if needed for direct queries)
S3_ENDPOINT = "https://s3.de.io.cloud.ovh.net"  # Example OVHcloud endpoint

print("✅ Configuration loaded")
print(f"   STAC API:  {STAC_API_URL}")
print(f"   Collection: {COLLECTION}")

✅ Configuration loaded
   STAC API:  https://api.explorer.eopf.copernicus.eu/stac
   Collection: sentinel-2-l2a-staging


## Helper Functions

In [282]:
from typing import Any


def extract_storage_tier_info(item: Item) -> dict[str, Any]:
    """
    Extract storage tier information from a STAC item's assets.

    Args:
        item: PySTAC Item object

    Returns:
        Dictionary with storage tier information per asset
    """
    storage_info = {"item_id": item.id, "assets": {}}

    for asset_key, asset in item.assets.items():
        asset_info = {
            "href": asset.href,
            "storage_tier": None,
            "has_alternate_s3": False,
            "storage_tier_distribution": None,
        }

        # Check for storage tier in alternate.s3 metadata
        if (
            hasattr(asset, "extra_fields")
            and asset.extra_fields
            and "alternate" in asset.extra_fields
        ):
            alternate = asset.extra_fields.get("alternate", {})
            if isinstance(alternate, dict) and "s3" in alternate:
                s3_info = alternate.get("s3", {})
                if isinstance(s3_info, dict):
                    asset_info["has_alternate_s3"] = True

                    # Get storage tier
                    if "ovh:storage_tier" in s3_info:
                        asset_info["storage_tier"] = s3_info["ovh:storage_tier"]

                    # Get storage tier distribution (for mixed storage)
                    if "ovh:storage_tier_distribution" in s3_info:
                        asset_info["storage_tier_distribution"] = s3_info[
                            "ovh:storage_tier_distribution"
                        ]

        storage_info["assets"][asset_key] = asset_info

    return storage_info


def display_storage_tier_summary(storage_info: dict[str, Any]) -> None:
    """
    Display a formatted summary of storage tier information.

    Args:
        storage_info: Dictionary from extract_storage_tier_info()
    """
    print(f"\n📦 Item: {storage_info['item_id']}")
    print(f"   Assets: {len(storage_info['assets'])} total")

    # Count storage tiers
    tier_counts = {}
    assets_with_tier = 0
    assets_with_s3_alternate = 0

    for _asset_key, asset_info in storage_info["assets"].items():
        if asset_info.get("has_alternate_s3", False):
            assets_with_s3_alternate += 1

        tier = asset_info.get("storage_tier")
        if tier:
            assets_with_tier += 1
            tier_counts[tier] = tier_counts.get(tier, 0) + 1

    # Display summary
    print(f"   Assets with S3 alternate: {assets_with_s3_alternate}/{len(storage_info['assets'])}")

    if assets_with_tier > 0:
        print(
            f"   ✅ Assets with storage tier info: {assets_with_tier}/{len(storage_info['assets'])}"
        )
        print("   Storage tier distribution:")
        for tier, count in sorted(tier_counts.items()):
            print(f"      - {tier}: {count} asset(s)")
    else:
        print("   ⚠️  No storage tier information found in any assets")


def display_detailed_asset_info(storage_info: dict[str, Any], max_assets: int = 5) -> None:
    """
    Display detailed storage tier information for each asset.

    Args:
        storage_info: Dictionary from extract_storage_tier_info()
        max_assets: Maximum number of assets to display in detail
    """
    print(f"\n📄 Detailed Asset Information (showing first {max_assets} assets):\n")

    for assets_shown, (asset_key, asset_info) in enumerate(storage_info["assets"].items()):
        if assets_shown >= max_assets:
            remaining = len(storage_info["assets"]) - max_assets
            print(f"\n   ... and {remaining} more asset(s)")
            break

        print(f"   Asset: {asset_key}")
        print(f"      Storage Tier: {asset_info.get('storage_tier') or '❌ Not set'}")
        print(f"      Has S3 Alternate: {asset_info.get('has_alternate_s3', False)}")

        # Show distribution if available
        distribution = asset_info.get("storage_tier_distribution")
        if distribution and isinstance(distribution, dict):
            print(f"      Tier Distribution: {distribution}")

        href = asset_info.get("href", "")
        if len(href) > 80:
            print(f"      HREF: {href[:80]}...")
        else:
            print(f"      HREF: {href}")
        print()


print("✅ Helper functions loaded")

✅ Helper functions loaded
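
The per-item dictionaries returned by `extract_storage_tier_info()` can also be aggregated across a whole search result. The following is a minimal sketch under that assumption; `count_tiers` and the `"UNKNOWN"` bucket are hypothetical additions, not part of `storage_tier_utils`.

```python
from collections import Counter


def count_tiers(storage_infos: list) -> Counter:
    """Count assets per storage tier across many items.

    Assets without tier metadata are counted under "UNKNOWN".
    """
    counts = Counter()
    for info in storage_infos:
        for asset_info in info["assets"].values():
            counts[asset_info.get("storage_tier") or "UNKNOWN"] += 1
    return counts
```

Called on a list of such dictionaries, this gives a collection-wide view of how many assets sit in each tier, which is a useful baseline before planning transitions.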


## Step 1: Search for STAC Items

In [283]:
# Define search parameters
# Area of Interest (AOI) - Bounding box: [min_lon, min_lat, max_lon, max_lat]
# Example 1: Rome area
# aoi_bbox = [12.4, 41.8, 12.6, 42.0]
# Example 2: Majorca area
# aoi_bbox = [2.16, 39.21, 3.82, 39.78]
# Example 3: France (full extent)
aoi_bbox = [-5.14, 41.33, 9.56, 51.09]
# Example 4: Lagoon coast from Venice to Trieste
# aoi_bbox = [12.0, 44.4, 14.0, 46.0]
# Example 5: La Palma Island
# aoi_bbox = [-18, 27.4, -13.70, 29.5]
# Example 6: Italy (full extent)
# aoi_bbox = [6.627265, 35.492537, 18.513648, 47.092146]
# Example 7: 2025 Corbières Massif wildfire area
# aoi_bbox = [2.4, 42.8, 3.2, 43.1]
# Example 8: Piódão, Portugal wildfire area
# aoi_bbox = [-7.866, 40.316, -7.633, 40.483]

# Time range
start_date = "2025-07-30T00:00:00Z"
end_date = "2025-07-31T23:59:59Z"

print("🔍 Searching for items... ")
print(f"   Collection: {COLLECTION}")
print(f"   AOI: {aoi_bbox}")
print(f"   Time Range: {start_date} to {end_date}")

🔍 Searching for items... 
   Collection: sentinel-2-l2a-staging
   AOI: [-5.14, 41.33, 9.56, 51.09]
   Time Range: 2025-07-30T00:00:00Z to 2025-07-31T23:59:59Z


In [284]:
# Connect to STAC API and search
print("🔗 Connecting to STAC API...")
catalog = Client.open(STAC_API_URL)
print(f"✅ Connected to: {catalog}")

print("🔍 Executing search...")
search = catalog.search(
    collections=[COLLECTION],
    bbox=aoi_bbox,
    datetime=f"{start_date}/{end_date}",
    limit=10,  # Page size (items per request); pass max_items to cap the total returned
)

print("📄 Processing search results...")
# Collect items
items = []
total_features = 0
skipped_deprecated = 0
skipped_errors = 0

for page_dict in search.pages_as_dicts():
    features_in_page = page_dict.get("features", [])
    total_features += len(features_in_page)
    print(f"   Processing page with {len(features_in_page)} features...")

    for feature in features_in_page:
        # Skip deprecated items
        properties = feature.get("properties", {})
        if properties.get("deprecated", False):
            skipped_deprecated += 1
            continue

        # Clean assets with missing href
        if "assets" in feature:
            original_assets = len(feature["assets"])
            feature["assets"] = {
                key: asset for key, asset in feature["assets"].items() if "href" in asset
            }
            cleaned_assets = len(feature["assets"])
            if original_assets != cleaned_assets:
                print(
                    f"   Cleaned {original_assets - cleaned_assets} assets without href from {feature.get('id', 'unknown')}"
                )

        try:
            item = Item.from_dict(feature)
            items.append(item)
        except Exception as e:
            item_id = feature.get("id", "unknown")
            print(f"⚠️  Skipping item {item_id}: {e}")
            skipped_errors += 1
            continue

print("\n📊 Search Results Summary:")
print(f"   Total features found: {total_features}")
print(f"   Deprecated items skipped: {skipped_deprecated}")
print(f"   Items with errors skipped: {skipped_errors}")
print(f"   ✅ Valid items collected: {len(items)}")

if items:
    print("\n📋 Sample items:")
    for i, item in enumerate(items[:5], 1):
        print(f"  {i}. {item.id}")
        # Check if this item has any alternate.s3 information
        has_s3_alternate = False
        for _asset_key, asset in item.assets.items():
            if (
                hasattr(asset, "extra_fields")
                and "alternate" in asset.extra_fields
                and "s3" in asset.extra_fields.get("alternate", {})
            ):
                has_s3_alternate = True
                break
        print(f"      Has S3 alternate: {has_s3_alternate}")
else:
    print("⚠️  No items found! Check your search parameters:")
    print(f"   - Collection: {COLLECTION}")
    print(f"   - Bbox: {aoi_bbox}")
    print(f"   - Date range: {start_date} to {end_date}")
    print("   Try adjusting the date range or bounding box.")

🔗 Connecting to STAC API...
✅ Connected to: <Client id=eopf-sentinel-explorer>
🔍 Executing search...
📄 Processing search results...
   Processing page with 10 features...
   Processing page with 1 features...

📊 Search Results Summary:
   Total features found: 11
   Deprecated items skipped: 0
   Items with errors skipped: 0
   ✅ Valid items collected: 11

📋 Sample items:
  1. S2B_MSIL2A_20250730T113319_N0511_R080_T30UUU_20250730T135754
      Has S3 alternate: True
  2. S2B_MSIL2A_20250730T113319_N0511_R080_T29UQR_20250730T135754
      Has S3 alternate: True
  3. S2C_MSIL2A_20250730T104041_N0511_R008_T32TLS_20250730T160714
      Has S3 alternate: True
  4. S2C_MSIL2A_20250730T104041_N0511_R008_T31UGQ_20250730T160714
      Has S3 alternate: True
  5. S2C_MSIL2A_20250730T104041_N0511_R008_T31UFS_20250730T160714
      Has S3 alternate: True


## Step 2: Inspect BEFORE - Current Storage Tier Metadata

In [285]:
# Analyze storage tier information for all found items
print("=" * 80)
print("BEFORE:  Current Storage Tier Metadata")
print("=" * 80)

storage_info_before = []

if not items:
    print("❌ No items to analyze! The items list is empty.")
    print("   This might happen if:")
    print("   1. No items match your search criteria")
    print("   2. All items were filtered out (deprecated, errors, etc.)")
    print("   3. There was an issue with the STAC API connection")
else:
    for item in items:
        info = extract_storage_tier_info(item)
        storage_info_before.append(info)
        display_storage_tier_summary(info)

# Overall statistics
total_items = len(items)
items_with_tier = sum(
    1
    for info in storage_info_before
    if any(asset.get("storage_tier") for asset in info["assets"].values())
)

print("\n" + "=" * 80)
print("📊 Overall Statistics (BEFORE):")
print(f"   Total items analyzed: {total_items}")
print(f"   Items with storage tier metadata: {items_with_tier}/{total_items}")
print(f"   Items missing storage tier metadata: {total_items - items_with_tier}/{total_items}")
print("=" * 80)

BEFORE:  Current Storage Tier Metadata

📦 Item: S2B_MSIL2A_20250730T113319_N0511_R080_T30UUU_20250730T135754
   Assets: 5 total
   Assets with S3 alternate: 4/5
   ✅ Assets with storage tier info: 4/5
   Storage tier distribution:
      - EXPRESS_ONEZONE: 1 asset(s)
      - STANDARD: 3 asset(s)

📦 Item: S2B_MSIL2A_20250730T113319_N0511_R080_T29UQR_20250730T135754
   Assets: 5 total
   Assets with S3 alternate: 4/5
   ⚠️  No storage tier information found in any assets

📦 Item: S2C_MSIL2A_20250730T104041_N0511_R008_T32TLS_20250730T160714
   Assets: 5 total
   Assets with S3 alternate: 4/5
   ⚠️  No storage tier information found in any assets

📦 Item: S2C_MSIL2A_20250730T104041_N0511_R008_T31UGQ_20250730T160714
   Assets: 5 total
   Assets with S3 alternate: 4/5
   ⚠️  No storage tier information found in any assets

📦 Item: S2C_MSIL2A_20250730T104041_N0511_R008_T31UFS_20250730T160714
   Assets: 5 total
   Assets with S3 alternate: 4/5
   ⚠️  No storage tier information found in any assets

## Step 3: Display Detailed Asset Information (Sample Item)

In [286]:
# Show detailed information for the first item
if storage_info_before:
    print("\n" + "=" * 80)
    print("Detailed View: First Item")
    print("=" * 80)
    display_detailed_asset_info(storage_info_before[0], max_assets=10)
else:
    print("No items to display")


Detailed View: First Item

📄 Detailed Asset Information (showing first 10 assets):

   Asset: AOT_10m
      Storage Tier: STANDARD
      Has S3 Alternate: True
      Tier Distribution: {'STANDARD': 2}
      HREF: https://s3.explorer.eopf.copernicus.eu/esa-zarr-sentinel-explorer-fra/tests-outp...

   Asset: SCL_20m
      Storage Tier: STANDARD
      Has S3 Alternate: True
      Tier Distribution: {'STANDARD': 2}
      HREF: https://s3.explorer.eopf.copernicus.eu/esa-zarr-sentinel-explorer-fra/tests-outp...

   Asset: WVP_10m
      Storage Tier: STANDARD
      Has S3 Alternate: True
      Tier Distribution: {'STANDARD': 2}
      HREF: https://s3.explorer.eopf.copernicus.eu/esa-zarr-sentinel-explorer-fra/tests-outp...

   Asset: thumbnail
      Storage Tier: ❌ Not set
      Has S3 Alternate: False
      HREF: https://api.explorer.eopf.copernicus.eu/raster/collections/sentinel-2-l2a-stagin...

   Asset: reflectance
      Storage Tier: EXPRESS_ONEZONE
      Has S3 Alternate: True
    

## Step 4: Change S3 Storage Classes

**This is the first step of the two-step process to change storage tiers.**

The `change_storage_tier.py` script changes the actual storage class of S3 objects. This affects storage costs and access patterns.

### Available Storage Classes:
- **STANDARD**: Immediate access, higher storage cost
- **STANDARD_IA**: Infrequent-access storage, lower storage cost, intended for data that is rarely read
- **EXPRESS_ONEZONE**: High-performance storage in a single availability zone

### Prerequisites:
1. S3 credentials (AWS-style access key and secret key) configured for the target endpoint
2. STAC item with `alternate.s3` metadata
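
For background, a common way to change the storage class of an existing S3 object is to copy the object onto itself with a new `StorageClass`. The sketch below shows that pattern with a boto3-style client. It is an assumption about how such a change can be made, not a description of what `change_storage_tier.py` does internally; `split_s3_href` and `change_prefix_storage_class` are hypothetical names, and the sketch assumes the endpoint accepts a self-copy when only the storage class changes.

```python
from urllib.parse import urlparse


def split_s3_href(href: str) -> tuple:
    """Split an s3://bucket/key URL into (bucket, key_or_prefix)."""
    parsed = urlparse(href)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an s3:// URL: {href}")
    return parsed.netloc, parsed.path.lstrip("/")


def change_prefix_storage_class(
    s3_client, bucket: str, prefix: str, storage_class: str, dry_run: bool = True
) -> list:
    """Return the keys under `prefix` that need a change; copy them unless dry_run."""
    changed = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            # Treat a missing StorageClass as STANDARD (assumption).
            if obj.get("StorageClass", "STANDARD") == storage_class:
                continue
            if not dry_run:
                # Self-copy: same bucket and key, new storage class.
                s3_client.copy_object(
                    Bucket=bucket,
                    Key=obj["Key"],
                    CopySource={"Bucket": bucket, "Key": obj["Key"]},
                    StorageClass=storage_class,
                    MetadataDirective="COPY",
                )
            changed.append(obj["Key"])
    return changed
```

A client for the endpoint configured earlier could be created with `boto3.client("s3", endpoint_url=S3_ENDPOINT)`. A Zarr asset is a prefix containing many objects, which is why the sketch lists and copies per key; on AWS S3 a single copy call handles objects up to 5 GB.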

In [287]:
# STEP A: Change S3 Storage Classes
# This step changes the actual storage class of objects in S3

if items:
    # Get the first item for demonstration
    sample_item = items[0]
    item_id = sample_item.id

    print("🔄 STEP A: Changing S3 Storage Classes")
    print("=" * 50)
    print(f"Sample Item: {item_id}")
    print("")
    print("💡 To change storage class from STANDARD to STANDARD_IA, run:")
    print("")
    print("# 1. Preview changes (dry run)")
    print("uv run python scripts/change_storage_tier.py \\")
    print(
        f'    --stac-item-url "https://api.explorer.eopf.copernicus.eu/stac/collections/{COLLECTION}/items/{item_id}" \\'
    )
    print("    --storage-class STANDARD_IA \\")
    print(f'    --s3-endpoint "{S3_ENDPOINT}" \\')
    print("    --dry-run")
    print("")
    print("# 2. Apply changes")
    print("uv run python scripts/change_storage_tier.py \\")
    print(
        f'    --stac-item-url "https://api.explorer.eopf.copernicus.eu/stac/collections/{COLLECTION}/items/{item_id}" \\'
    )
    print("    --storage-class STANDARD_IA \\")
    print(f'    --s3-endpoint "{S3_ENDPOINT}"')
    print("")
    print("📋 Optional filtering examples:")
    print("# Only change reflectance data:")
    print("# Add: --include-pattern 'measurements/reflectance/*'")
    print("# Only 60m resolution data:")
    print("# Add: --include-pattern '*/r60m/*'")
    print("# Exclude metadata files:")
    print("# Add: --exclude-pattern '*.zmetadata'")
else:
    print("❌ No items available for demonstration")

🔄 STEP A: Changing S3 Storage Classes
Sample Item: S2B_MSIL2A_20250730T113319_N0511_R080_T30UUU_20250730T135754

💡 To change storage class from STANDARD to STANDARD_IA, run:

# 1. Preview changes (dry run)
uv run python scripts/change_storage_tier.py \
    --stac-item-url "https://api.explorer.eopf.copernicus.eu/stac/collections/sentinel-2-l2a-staging/items/S2B_MSIL2A_20250730T113319_N0511_R080_T30UUU_20250730T135754" \
    --storage-class STANDARD_IA \
    --s3-endpoint "https://s3.de.io.cloud.ovh.net" \
    --dry-run

# 2. Apply changes
uv run python scripts/change_storage_tier.py \
    --stac-item-url "https://api.explorer.eopf.copernicus.eu/stac/collections/sentinel-2-l2a-staging/items/S2B_MSIL2A_20250730T113319_N0511_R080_T30UUU_20250730T135754" \
    --storage-class STANDARD_IA \
    --s3-endpoint "https://s3.de.io.cloud.ovh.net"

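📋 Optional filtering examples:
# Only change reflectance data:
# Add: --include-pattern 'measurements/reflectance/*'
# Only 60m resolution data:
# Add: --include-pattern '*/r60m/*'
# Exclude metadata files:
# Add: --exclude-pattern '*.zmetadata'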
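
To build intuition for the `--include-pattern` and `--exclude-pattern` options, the sketch below filters object keys with shell-style globs. It assumes the patterns behave like Python's `fnmatch` and are matched against keys relative to the Zarr root; the script's actual matching rules may differ, and `select_keys` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from typing import Iterable, Optional, Sequence


def select_keys(
    keys: Iterable[str],
    include: Optional[Sequence[str]] = None,
    exclude: Optional[Sequence[str]] = None,
) -> list:
    """Keep keys matching any include pattern (if given) and no exclude pattern."""
    selected = []
    for key in keys:
        if include and not any(fnmatchcase(key, pattern) for pattern in include):
            continue
        if exclude and any(fnmatchcase(key, pattern) for pattern in exclude):
            continue
        selected.append(key)
    return selected


if __name__ == "__main__":
    # Illustrative keys only; real Zarr layouts may differ.
    sample_keys = [
        "measurements/reflectance/r10m/b02/0.0",
        "measurements/reflectance/r60m/b01/0.0",
        "quality/mask/r60m/0.0",
        ".zmetadata",
    ]
    print(select_keys(sample_keys, include=["*/r60m/*"]))
    print(select_keys(sample_keys, exclude=["*.zmetadata"]))
```

Note that in `fnmatch`, `*` also matches `/`, so `*/r60m/*` selects keys at any depth that contain an `r60m` path segment.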

## Step 5: Update STAC Catalog Metadata

**This is the second step of the two-step process.**

After changing S3 storage classes, update the STAC catalog to reflect the current storage tier information.

In [288]:
# STEP B: Update STAC Catalog with Storage Tier Information
# This step updates the STAC catalog with current storage tier metadata

if items:
    sample_item = items[0]
    item_id = sample_item.id

    print("📝 STEP B: Updating STAC Catalog Metadata")
    print("=" * 50)
    print(f"Sample Item: {item_id}")
    print("")
    print("💡 To update STAC catalog with current storage tier information, run:")
    print("")
    print("# 1. Preview changes (dry run)")
    print("uv run python scripts/update_stac_storage_tier.py \\")
    print(
        f'    --stac-item-url "https://api.explorer.eopf.copernicus.eu/stac/collections/{COLLECTION}/items/{item_id}" \\'
    )
    print(f'    --stac-api-url "{STAC_API_URL}" \\')
    print(f'    --s3-endpoint "{S3_ENDPOINT}" \\')
    print("    --dry-run")
    print("")
    print("# 2. Apply updates")
    print("uv run python scripts/update_stac_storage_tier.py \\")
    print(
        f'    --stac-item-url "https://api.explorer.eopf.copernicus.eu/stac/collections/{COLLECTION}/items/{item_id}" \\'
    )
    print(f'    --stac-api-url "{STAC_API_URL}" \\')
    print(f'    --s3-endpoint "{S3_ENDPOINT}"')
    print("")
    print("📋 For legacy items without alternate.s3 metadata:")
    print("# Add: --add-missing")
    print("")
    print("ℹ️  This script will:")
    print("   - Query S3 for current storage classes")
    print("   - Update 'ovh:storage_tier' field in alternate.s3")
    print("   - Handle Zarr directories with mixed storage classes")
    print("   - Add storage tier distribution for mixed storage")
else:
    print("❌ No items available for demonstration")

📝 STEP B: Updating STAC Catalog Metadata
Sample Item: S2B_MSIL2A_20250730T113319_N0511_R080_T30UUU_20250730T135754

💡 To update STAC catalog with current storage tier information, run:

# 1. Preview changes (dry run)
uv run python scripts/update_stac_storage_tier.py \
    --stac-item-url "https://api.explorer.eopf.copernicus.eu/stac/collections/sentinel-2-l2a-staging/items/S2B_MSIL2A_20250730T113319_N0511_R080_T30UUU_20250730T135754" \
    --stac-api-url "https://api.explorer.eopf.copernicus.eu/stac" \
    --s3-endpoint "https://s3.de.io.cloud.ovh.net" \
    --dry-run

# 2. Apply updates
uv run python scripts/update_stac_storage_tier.py \
    --stac-item-url "https://api.explorer.eopf.copernicus.eu/stac/collections/sentinel-2-l2a-staging/items/S2B_MSIL2A_20250730T113319_N0511_R080_T30UUU_20250730T135754" \
    --stac-api-url "https://api.explorer.eopf.copernicus.eu/stac" \
    --s3-endpoint "https://s3.de.io.cloud.ovh.net"

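📋 For legacy items without alternate.s3 metadata:
# Add: --add-missing

ℹ️  This script will:
   - Query S3 for current storage classes
   - Update 'ovh:storage_tier' field in alternate.s3
   - Handle Zarr directories with mixed storage classes
   - Add storage tier distribution for mixed storage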
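
A Zarr asset spans many S3 objects, so its tier metadata has to summarize many storage classes. The sketch below shows one way to derive the two `ovh:` fields from a list of per-object storage classes. Reporting the most common class as `ovh:storage_tier` is an assumption; it matches the example structure shown later in this notebook but is not confirmed as the script's rule, and `summarize_storage_classes` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable


def summarize_storage_classes(storage_classes: Iterable[str]) -> dict:
    """Build tier metadata from the storage class of each object in an asset."""
    distribution = Counter(storage_classes)
    if not distribution:
        return {"ovh:storage_tier": None, "ovh:storage_tier_distribution": {}}
    # Assumption: the most common class represents the asset as a whole.
    dominant_tier, _count = distribution.most_common(1)[0]
    return {
        "ovh:storage_tier": dominant_tier,
        "ovh:storage_tier_distribution": dict(distribution),
    }
```

The distribution keeps the per-class object counts, so a partially migrated asset remains visible even though a single tier label is reported.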

## Step 6: Verify Changes

After completing both steps, re-run the analysis to confirm changes were applied.

**Verify that both storage class changes and STAC updates were successful.**

In [289]:
# Re-analyze the items to verify changes (re-run the search cell first to fetch fresh metadata)
print("=" * 80)
print("VERIFICATION: Current Storage Tier Metadata (After Changes)")
print("=" * 80)
print("")
print("📋 To verify changes were applied, re-run the STAC search and analysis:")
print("   1. Re-execute the search cell to fetch fresh data")
print("   2. Re-run this analysis to see updated storage tier metadata")
print("")
print("🔄 For immediate verification, you could also run:")
if items:
    sample_item = items[0]
    item_id = sample_item.id
    print("   # Check storage tier metadata in STAC:")
    print(
        f"   curl \"https://api.explorer.eopf.copernicus.eu/stac/collections/{COLLECTION}/items/{item_id}\" | jq '.assets[].alternate.s3.\"ovh:storage_tier\"'"
    )
    print("   ")
    print("   # Check actual S3 storage classes:")
    print(
        '   aws s3api list-objects-v2 --bucket BUCKET --prefix PREFIX --query "Contents[0:5].[Key,StorageClass]"'
    )
print("")

# For demonstration, re-analyze the items already in memory; they reflect the catalog state at search time
storage_info_verification = []

for item in items:
    info = extract_storage_tier_info(item)
    storage_info_verification.append(info)
    display_storage_tier_summary(info)

print("\n" + "=" * 80)

# Overall statistics
total_items = len(items)
items_with_tier_verification = sum(
    1
    for info in storage_info_verification
    if any(asset.get("storage_tier") for asset in info["assets"].values())
)

print("📊 Verification Statistics:")
print(f"   Total items analyzed: {total_items}")
print(f"   Items with storage tier metadata: {items_with_tier_verification}/{total_items}")
print(
    f"   Items missing storage tier metadata: {total_items - items_with_tier_verification}/{total_items}"
)
print("=" * 80)

VERIFICATION: Current Storage Tier Metadata (After Changes)

📋 To verify changes were applied, re-run the STAC search and analysis:
   1. Re-execute the search cell to fetch fresh data
   2. Re-run this analysis to see updated storage tier metadata

🔄 For immediate verification, you could also run:
   # Check storage tier metadata in STAC:
   curl "https://api.explorer.eopf.copernicus.eu/stac/collections/sentinel-2-l2a-staging/items/S2B_MSIL2A_20250730T113319_N0511_R080_T30UUU_20250730T135754" | jq '.assets[].alternate.s3."ovh:storage_tier"'
   
   # Check actual S3 storage classes:
   aws s3api list-objects-v2 --bucket BUCKET --prefix PREFIX --query "Contents[0:5].[Key,StorageClass]"


📦 Item: S2B_MSIL2A_20250730T113319_N0511_R080_T30UUU_20250730T135754
   Assets: 5 total
   Assets with S3 alternate: 4/5
   ✅ Assets with storage tier info: 4/5
   Storage tier distribution:
      - EXPRESS_ONEZONE: 1 asset(s)
      - STANDARD: 3 asset(s)

📦 Item: S2B_MSIL2A_20250730T11331
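
Verification can also be expressed as a check rather than read from printed summaries. The sketch below takes one dictionary in the shape returned by `extract_storage_tier_info()` and lists the S3-backed assets whose recorded tier differs from the expected one; `assets_not_in_tier` is a hypothetical helper.

```python
def assets_not_in_tier(storage_info: dict, expected_tier: str) -> dict:
    """Map asset key -> recorded tier for S3-backed assets not in expected_tier."""
    mismatches = {}
    for asset_key, asset_info in storage_info["assets"].items():
        if not asset_info.get("has_alternate_s3"):
            # Assets without alternate.s3 (e.g. thumbnails) carry no tier to check.
            continue
        tier = asset_info.get("storage_tier")
        if tier != expected_tier:
            mismatches[asset_key] = tier
    return mismatches
```

An empty result for every item means the catalog metadata agrees with the target tier; it does not by itself prove that the S3 objects were changed, which is what the `aws s3api` check above is for.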

## Step 7: Compare Before vs After

In [290]:
# COMPARISON: Before vs After Workflow
print("\n" + "=" * 80)
print("📊 WORKFLOW COMPARISON TEMPLATE")
print("=" * 80)
print("")
print("This section shows how to compare results before and after the workflow.")
print("To see actual before/after comparison:")
print("")
print("1. 🔍 BEFORE: Run this notebook to capture initial state")
print("2. 🔄 EXECUTE: Run both change_storage_tier.py and update_stac_storage_tier.py")
print("3. 🔍 AFTER: Re-run this notebook to see updated state")
print("")

if storage_info_before:
    print("📋 Current Analysis Summary:")
    print(f"   Total items analyzed: {total_items}")
    print(f"   Items with storage tier metadata: {items_with_tier}/{total_items}")

    if items_with_tier > 0:
        print("\n📈 Cost Optimization Opportunities:")
        print(f"   - Items already have storage tier metadata: {items_with_tier}")
        print("   - Ready for storage class optimization")
        print("   - Can use filtering to target specific data types")
    else:
        print("\n⚙️  Setup Required:")
        print("   - No storage tier metadata found")
        print("   - Need to run update_stac_storage_tier.py first")
        print("   - Use --add-missing flag for legacy items")

    # Show example workflow for first item
    if items:
        sample_item = items[0]
        print(f"\n💡 Example Workflow for Item: {sample_item.id}")
        print("   # Step A: Change storage classes")
        print("   uv run python scripts/change_storage_tier.py \\")
        print(f'       --stac-item-url ".../items/{sample_item.id}" \\')
        print("       --storage-class STANDARD_IA --dry-run")
        print("")
        print("   # Step B: Update STAC metadata")
        print("   uv run python scripts/update_stac_storage_tier.py \\")
        print(f'       --stac-item-url ".../items/{sample_item.id}" \\')
        print(f'       --stac-api-url "{STAC_API_URL}" \\')
        print(f'       --s3-endpoint "{S3_ENDPOINT}"')

print("\n" + "=" * 80)


📊 WORKFLOW COMPARISON TEMPLATE

This section shows how to compare results before and after the workflow.
To see actual before/after comparison:

1. 🔍 BEFORE: Run this notebook to capture initial state
2. 🔄 EXECUTE: Run both change_storage_tier.py and update_stac_storage_tier.py
3. 🔍 AFTER: Re-run this notebook to see updated state

📋 Current Analysis Summary:
   Total items analyzed: 11
   Items with storage tier metadata: 1/11

📈 Cost Optimization Opportunities:
   - Items already have storage tier metadata: 1
   - Ready for storage class optimization
   - Can use filtering to target specific data types

💡 Example Workflow for Item: S2B_MSIL2A_20250730T113319_N0511_R080_T30UUU_20250730T135754
   # Step A: Change storage classes
   uv run python scripts/change_storage_tier.py \
       --stac-item-url ".../items/S2B_MSIL2A_20250730T113319_N0511_R080_T30UUU_20250730T135754" \
       --storage-class STANDARD_IA --dry-run

   # Step B: Update STAC metadata
   uv run python scripts/update_stac_storage_tier.py \
       --stac-item-url ".../items/S2B_MSIL2A_20250730T113319_N0511_R080_T30UUU_20250730T135754" \
       --stac-api-url "https://api.explorer.eopf.copernicus.eu/stac" \
       --s3-endpoint "https://s3.de.io.cloud.ovh.net"
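
An actual before/after comparison can be computed once both snapshots exist. The sketch below assumes two lists of dictionaries in the shape returned by `extract_storage_tier_info()`, one captured before the scripts ran and one after re-running the search; `diff_storage_tiers` is a hypothetical helper.

```python
def diff_storage_tiers(before: list, after: list) -> list:
    """Return (item_id, asset_key, old_tier, new_tier) for every changed asset."""
    before_by_id = {info["item_id"]: info for info in before}
    changes = []
    for info in after:
        # Items or assets missing from the "before" snapshot count as having no tier.
        old_assets = before_by_id.get(info["item_id"], {}).get("assets", {})
        for asset_key, asset_info in info["assets"].items():
            old_tier = old_assets.get(asset_key, {}).get("storage_tier")
            new_tier = asset_info.get("storage_tier")
            if old_tier != new_tier:
                changes.append((info["item_id"], asset_key, old_tier, new_tier))
    return changes
```

With the variables used in this notebook, the call would look like `diff_storage_tiers(storage_info_before, storage_info_verification)`, provided the second list was built from freshly fetched items.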

## Step 8: Expected STAC Asset Structure with Storage Tier Metadata

In [291]:
# Expected STAC Item Structure with Storage Tier Metadata
print("\n" + "=" * 80)
print("📋 Expected STAC Asset Structure (After Updates)")
print("=" * 80)

print("\nAfter running the complete workflow, STAC assets should contain:")
print("")

# Show the expected structure
expected_structure = {
    "href": "https://example.com/path/to/asset",
    "type": "application/x-zarr",
    "roles": ["data"],
    "alternate": {
        "s3": {
            "href": "s3://bucket/path/to/asset.zarr",
            "storage:platform": "OVHcloud",
            "storage:region": "de",
            "storage:requester_pays": False,
            "ovh:storage_tier": "STANDARD_IA",
            "ovh:storage_tier_distribution": {"STANDARD": 450, "STANDARD_IA": 608},
        }
    },
}

print(json.dumps(expected_structure, indent=2))

print("\n🔑 Key Fields Explanation:")
print("   📁 href: Original HTTPS URL to the asset")
print("   🗃️  alternate.s3.href: S3 URL for direct access")
print("   🏷️  ovh:storage_tier: Current storage class (STANDARD, STANDARD_IA, etc.)")
print("   📊 ovh:storage_tier_distribution: File count per tier (for Zarr with mixed storage)")
print("   🌍 storage:region: OVH Cloud region (de, gra, sbg, etc.)")

if items:
    sample_item = items[0]
    print(f"\n🔍 Current Sample Asset Structure for: {sample_item.id}")

    if sample_item.assets:
        asset_key = list(sample_item.assets.keys())[0]
        asset = sample_item.assets[asset_key]

        print(f"\nAsset: {asset_key}")

        # Create actual representation
        actual_structure = {
            "href": asset.href,
            "type": asset.media_type,
            "roles": asset.roles,
        }

        if (
            hasattr(asset, "extra_fields")
            and asset.extra_fields
            and "alternate" in asset.extra_fields
        ):
            actual_structure["alternate"] = asset.extra_fields["alternate"]
        else:
            actual_structure["alternate"] = "❌ Not present - run update_stac_storage_tier.py"

        print(json.dumps(actual_structure, indent=2))

    print("\n💡 To add/update storage tier metadata for this item:")
    print("   uv run python scripts/update_stac_storage_tier.py \\")
    print(
        f'       --stac-item-url "https://api.explorer.eopf.copernicus.eu/stac/collections/{COLLECTION}/items/{sample_item.id}" \\'
    )
    print(f'       --stac-api-url "{STAC_API_URL}" \\')
    print(f'       --s3-endpoint "{S3_ENDPOINT}"')
else:
    print("\n❌ No items available to show current structure")


📋 Expected STAC Asset Structure (After Updates)

After running the complete workflow, STAC assets should contain:

{
  "href": "https://example.com/path/to/asset",
  "type": "application/x-zarr",
  "roles": [
    "data"
  ],
  "alternate": {
    "s3": {
      "href": "s3://bucket/path/to/asset.zarr",
      "storage:platform": "OVHcloud",
      "storage:region": "de",
      "storage:requester_pays": false,
      "ovh:storage_tier": "STANDARD_IA",
      "ovh:storage_tier_distribution": {
        "STANDARD": 450,
        "STANDARD_IA": 608
      }
    }
  }
}

🔑 Key Fields Explanation:
   📁 href: Original HTTPS URL to the asset
   🗃️  alternate.s3.href: S3 URL for direct access
   🏷️  ovh:storage_tier: Current storage class (STANDARD, STANDARD_IA, etc.)
   📊 ovh:storage_tier_distribution: File count per tier (for Zarr with mixed storage)
   🌍 storage:region: OVH Cloud region (de, gra, sbg, etc.)

🔍 Current Sample Asset Structure for: S2B_MSIL2A_20250730T113319

## Summary: Automated Storage Tier Management

This notebook demonstrated how to use the storage tier management scripts that are designed for **automated cron workflows**.

### 🔄 Production Automation

In production, these scripts will be integrated into automated workflows that:

1. **Monitor dataset age** and access patterns
2. **Automatically trigger storage class changes** based on predefined policies
3. **Keep STAC metadata synchronized** with actual S3 storage classes
4. **Generate reports** on cost savings and storage optimization

### ✅ Manual Workflow (Demonstrated in this Notebook)

#### Step 1: Inspect Current State
1. Query STAC items from the catalog
2. Extract and analyze current storage tier metadata
3. Identify optimization opportunities

#### Step 2: Change S3 Storage Classes
Use `change_storage_tier.py` to modify actual S3 object storage classes:
```bash
# Preview changes
uv run python scripts/change_storage_tier.py \
    --stac-item-url "STAC_ITEM_URL" \
    --storage-class STANDARD_IA \
    --dry-run

# Apply changes  
uv run python scripts/change_storage_tier.py \
    --stac-item-url "STAC_ITEM_URL" \
    --storage-class STANDARD_IA
```

#### Step 3: Update STAC Catalog
Use `update_stac_storage_tier.py` to sync STAC metadata with S3:
```bash
# Preview updates
uv run python scripts/update_stac_storage_tier.py \
    --stac-item-url "STAC_ITEM_URL" \
    --stac-api-url "STAC_API_URL" \
    --s3-endpoint "S3_ENDPOINT" \
    --dry-run

# Apply updates
uv run python scripts/update_stac_storage_tier.py \
    --stac-item-url "STAC_ITEM_URL" \
    --stac-api-url "STAC_API_URL" \
    --s3-endpoint "S3_ENDPOINT"
```

#### Step 4: Verify Results
1. Re-run analysis to confirm changes
2. Check storage tier distribution
3. Verify that cost optimization goals were achieved

### 🎯 Key Benefits of Automation

- **Cost Optimization**: Automatic transition of older datasets to cheaper storage tiers
- **Hands-off Management**: Reduces manual intervention for large-scale data archives
- **Policy-Driven**: Configure rules based on dataset age, access patterns, or other criteria
- **Metadata Consistency**: Automated synchronization between S3 and STAC catalog
- **Audit Trail**: Complete tracking of storage transitions and cost savings
- **Safe Operations**: Dry-run capabilities and validation checks

### 🔗 Integration Context

These scripts support the automated data lifecycle management strategy outlined in:

- **[Issue #178](https://github.com/EOPF-Explorer/coordination/issues/178)**: Storage tier optimization strategy
- **[Issue #182](https://github.com/EOPF-Explorer/coordination/issues/182)**: Automated storage class transitions

### 📚 Related Documentation

- [README_change_storage_tier.md](../../scripts/README_change_storage_tier.md) - Detailed usage guide for changing storage classes
- [README_update_storage_tier.md](../../scripts/README_update_storage_tier.md) - Guide for updating STAC metadata
- [storage_tier_utils.py](../../scripts/storage_tier_utils.py) - Utility functions for storage operations