# EO-Lab Tutorial: Query and Download of EnMAP Data from the EOC Geoservice

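This notebook demonstrates how to query the EOC Geoservice STAC catalogue for EnMAP data and download the matching files. Downloads are authenticated with the HTTP headers, including the session cookie, taken from a cURL command that you copy from your browser. The notebook is divided into several cells for clarity.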

## Session Cookie Instructions

To download data from the EOC Geoservice using `curl`, you must first obtain the session cookie from the EOC Geoservice UMS. Follow these steps:

1. **Access the Login Page**: Open the following link in your browser:
   
   [EO-LAB (ENMAP) Login](https://sso.eoc.dlr.de/eoc/auth/login?service=https://download.geoservice.dlr.de/ENMAP/files/L2A/)
   
   On the right side under **External Identity Providers**, click **EO-LAB (ENMAP)**.

2. **Log in**: Enter your EO-Lab credentials when prompted.

3. **Open Developer Tools**: Press **F12** (or open your browser's Web Developer Tools). If the request was sent before the tools were open and is therefore not listed, reload the page with **F5**.

4. **Locate the GET Request**: In the **Network** tab, look for a GET request resembling the following URL:
   
   `https://download.geoservice.dlr.de/ENMAP/files/L2A/?ticket=ST-11848-nWxfdPAbhjmUoQdgPeS8uTr-gps-auth`

5. **Copy the cURL Command**: Right-click that request and select **Copy as cURL**. If your browser offers separate POSIX and Windows variants, either one can be used, because the notebook is written to handle both.

6. **Update the Notebook**: Replace the example `CURL_COMMAND` in the user configuration cell with your own cURL command containing your session cookie.

Note: Session cookies expire over time. For long downloads or additional files, repeat this procedure to obtain a new session cookie.

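Before pasting the copied command into the notebook, it can help to confirm that it actually carries a cookie. The following is a minimal sketch, not part of the original notebook: `extract_cookie` is a hypothetical helper, and the example command only uses the placeholder values from the configuration comment further below.

```python
import re
from typing import Optional


def extract_cookie(curl_command: str) -> Optional[str]:
    """Return the Cookie header value from a copied cURL command, or None."""
    # Accept both -H "Cookie: ..." and -H 'Cookie: ...' (header name is case-insensitive).
    match = re.search(r"""-H\s*(["'])Cookie:\s*(.*?)\1""", curl_command, flags=re.IGNORECASE)
    if match is None:
        return None
    # The Windows variant escapes '|' as '^|'; undo that escaping.
    return match.group(2).replace("^|", "|")


# Placeholder command (assumed values, not a real session).
example = '''curl "https://download.geoservice.dlr.de/ENMAP/files/L2A/?ticket=YOUR_TICKET" -H "User-Agent: YourAgent" -H "Cookie: session=abc^|123^|xyz"'''

print(extract_cookie(example))  # → session=abc|123|xyz
```

If the function returns `None`, the copied request most likely did not include a cookie, and the steps above should be repeated.
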
### Imports

The following cell imports the libraries the notebook needs.

In [None]:
import os              # For operating system interactions (file paths, directories)
import re              # For regular expressions (pattern matching and extraction)
import requests        # For making HTTP requests (downloading files)
from datetime import datetime  # For date/time manipulation if needed
from pystac_client import Client  # For accessing and querying the STAC catalogue
from pathlib import Path  # For high-level file system path operations
from math import cos, radians  # For geographic calculations (used to compute bounding boxes)

### User Configuration

The following cell contains user-configurable parameters. Update the `CURL_COMMAND` with your session cookie, set filtering options, define which collections and assets to download, and specify search parameters (area of interest, time range, etc.).

In [None]:
# ---------------------------
#      USER CONFIGURATION
# ---------------------------

# CURL command simulating a browser request; used to extract necessary HTTP headers.
# IMPORTANT: Replace this example with your own cURL command (obtained as explained above).
# You can paste your cURL command directly as a single line within the triple quotes.
# For example:
# CURL_COMMAND = """curl "https://download.geoservice.dlr.de/ENMAP/files/L2A/?ticket=YOUR_TICKET" -H "User-Agent: YourAgent" -H "Cookie: session=YOUR_SESSION" """ 
CURL_COMMAND = """
curl "https://download.geoservice.dlr.de/ENMAP/files/L2A/?ticket=ST-11848-nWxfdPAbhjmUoQdgPeS8uTr-gps-auth" \
  -H "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0" \
  -H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/png,image/svg+xml,*/*;q=0.8" \
  -H "Accept-Language: de-DE,de;q=0.8,en-US;q=0.5,en;q=0.3" \
  -H "Accept-Encoding: gzip, deflate, br, zstd" \
  -H "DNT: 1" \
  -H "Connection: keep-alive" \
  -H "Cookie: session=Gpmbm3K-UAvKO0UmTO49OQ^|1739786884^|9zxCUa2Pz8OLhoe4zRQSYX515svNoPyKsLa93-7PevePOm7SkQBhBtKkWtOczQDe_v7f36KBsDIvQ9JIeJynkJntLlgpB3RKfsyqZV7fblipH2uMRJE3n4BDurIwMymM9KL4I8dEOruktKRpLCdgxyjurRXAQH80bK97bXq5Y2k^|cPfahrZ_q1kpCsKub_5j4QiiPlI" \
  -H "Upgrade-Insecure-Requests: 1" \
  -H "Sec-Fetch-Dest: document" \
  -H "Sec-Fetch-Mode: navigate" \
  -H "Sec-Fetch-Site: same-site" \
  -H "Priority: u=0, i"
"""

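# Extract the base URL from CURL_COMMAND (kept for potential modifications).
# A regex tolerates leading newlines, repeated spaces and single or double quotes.
_url_match = re.search(r"""curl\s+["']?([^\s"']+)""", CURL_COMMAND)
BASE_URL = _url_match.group(1) if _url_match else None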

# Cloud cover filter: only process items with cloud cover below the specified maximum.
CLOUD_COVER_FILTER = {
    "enabled": True,         # Set to True to enable filtering.
    "max_coverage": 5.0        # Maximum allowed cloud cover percentage.
}

# Flag to print all properties (metadata) for each item.
PRINT_PROPERTIES = True

# Define downloads for each collection.

# L2A Collection - uses lowercase asset names
DOWNLOADS = {
    "ENMAP_HSI_L2A": {
        "enabled": True,  # Set to True to download this collection
        "assets": [
            "image",                # Main spectral image
            #"metadata",           # Metadata file
            #"vnir",              # VNIR sensor data
            #"swir",              # SWIR sensor data
            #"thumbnail",         # Small preview image
            #"quality_classes",   # Quality classification
            #"quality_cloud",     # Cloud mask
            #"quality_cloud_shadow", # Cloud shadow mask
            #"quality_haze",      # Haze mask
            #"quality_cirrus",    # Cirrus mask
            #"quality_snow",      # Snow mask
            #"quality_testflags", # Quality test flags
            #"defective_pixel_mask" # Mask of defective pixels
        ]
    },
    
    # L0 Quicklook Collection - uses UPPERCASE asset names
    "ENMAP_HSI_L0_QL": {
        "enabled": False,  # Set to True to download this collection
        "assets": [
            "THUMBNAIL",          # Small preview image
            "OVERVIEW",           # Larger preview image
            #"VNIR",             # VNIR sensor quicklook
            #"SWIR",             # SWIR sensor quicklook
            #"QUALITY_CLOUD",    # Cloud mask
            #"QUALITY_CLOUDSHADOW", # Cloud shadow mask
            #"QUALITY_CIRRUS",   # Cirrus mask
            #"QUALITY_CLASSES",  # Quality classification
            #"QUALITY_SNOW",     # Snow mask
            #"QUALITY_HAZE",     # Haze mask
            #"PIXELMASK_VNIR",   # VNIR pixel mask
            #"PIXELMASK_SWIR",   # SWIR pixel mask
            #"TESTFLAGS_SWIR",   # SWIR test flags
            #"TESTFLAGS_VNIR"    # VNIR test flags
        ]
    }
}

# Area of interest configuration:
# Option 1: Define a bounding box [west, south, east, north]
BBOX = [11.230259, 48.051808, 11.337891, 48.117059]  # Example: DLR Oberpfaffenhofen

# Option 2: Use a center coordinate with a specified box size (km)
USE_CENTER_COORD = True  # Set to True to use center coordinate instead of a bounding box.
CENTER_COORD = {
    "lat": 50.71868778231684,  # Center latitude.
    "lon": 7.158329088235492,  # Center longitude.
    "size_km": 3              # Size of the square area in kilometers.
}

# Time range for the search (format: YYYY-MM-DD)
START_DATE = "2024-01-01"  # Start date.
END_DATE = "2024-12-31"    # End date.

# Maximum number of items to process per collection (None means all found items)
MAX_ITEMS = None

# Download directory settings:
CUSTOM_DOWNLOAD_PATH = None  # e.g., "/home/user/my_enmap_data" or "C:/EnMAP_Data"
BASE_DIR = "EnMAP_downloads"  # Default directory if no custom path is provided.

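A typo in `START_DATE` or `END_DATE` only surfaces once the search request is sent. As a hedged sketch, the dates could be validated up front and turned into the interval string that the downloader passes to the STAC search. `build_datetime_interval` is a hypothetical helper and not part of the original notebook.

```python
from datetime import datetime


def build_datetime_interval(start_date: str, end_date: str) -> str:
    """Validate two YYYY-MM-DD dates and return a STAC datetime interval."""
    # strptime raises ValueError for malformed or impossible dates.
    start = datetime.strptime(start_date, "%Y-%m-%d")
    end = datetime.strptime(end_date, "%Y-%m-%d")
    if start > end:
        raise ValueError(f"START_DATE {start_date} is after END_DATE {end_date}")
    # Same layout as the interval built in the downloader class.
    return f"{start:%Y-%m-%d}T00:00:00Z/{end:%Y-%m-%d}T23:59:59Z"


print(build_datetime_interval("2024-01-01", "2024-12-31"))
# → 2024-01-01T00:00:00Z/2024-12-31T23:59:59Z
```
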

### Helper Functions

The following cell defines helper functions used to process the cURL command and calculate the bounding box from a center coordinate.

In [None]:
def clean_curl_command(curl_command):
    """Normalize a cURL command copied from the browser.

    - Collapses newlines and repeated whitespace into single spaces.
    - Undoes the Windows escaping in the cookie ('^|' becomes '|').
    """
    curl_command = ' '.join(curl_command.split())
    cookie_start = curl_command.find('Cookie:')
    if cookie_start != -1:
        # The cookie value ends at the closing quote of its -H argument.
        cookie_end = curl_command.find('" -H', cookie_start)
        if cookie_end == -1:
            cookie_end = curl_command.find('"', cookie_start)
        if cookie_end == -1:
            cookie_end = len(curl_command)
        cookie_part = curl_command[cookie_start:cookie_end]
        cleaned_cookie = cookie_part.replace('^|', '|')
        curl_command = curl_command[:cookie_start] + cleaned_cookie + curl_command[cookie_end:]
    return curl_command

def parse_curl_command(curl_command):
    """Extract the HTTP headers from a cURL command.

    Returns a dictionary mapping header names to their values. Both
    double-quoted (-H "Name: value") and single-quoted (-H 'Name: value')
    header arguments are recognized.
    """
    curl_command = clean_curl_command(curl_command)
    headers = {}
    header_pattern = r"""-H\s*(["'])([^:"']+):\s*(.*?)\1"""
    for _quote, header, value in re.findall(header_pattern, curl_command):
        headers[header] = value
    return headers

def create_bbox_from_center(lat, lon, size_km):
    # Create a bounding box from a center coordinate and box size.
    # Returns a list [west, south, east, north] representing the bounding box.
    km_per_degree_lat = 111.0
    km_per_degree_lon = 111.0 * cos(radians(lat))
    lat_offset = (size_km / 2) / km_per_degree_lat
    lon_offset = (size_km / 2) / km_per_degree_lon
    west = lon - lon_offset
    east = lon + lon_offset
    south = lat - lat_offset
    north = lat + lat_offset
    return [west, south, east, north]

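`create_bbox_from_center` relies on a flat-earth approximation: roughly 111 km per degree of latitude, scaled by the cosine of the latitude for longitude. As a quick sanity check, the sketch below repeats the function from the cell above and converts the resulting box back to kilometres for the example center coordinate from the configuration.

```python
from math import cos, radians


def create_bbox_from_center(lat, lon, size_km):
    """Same logic as the helper above: returns [west, south, east, north]."""
    km_per_degree_lat = 111.0
    km_per_degree_lon = 111.0 * cos(radians(lat))
    lat_offset = (size_km / 2) / km_per_degree_lat
    lon_offset = (size_km / 2) / km_per_degree_lon
    return [lon - lon_offset, lat - lat_offset, lon + lon_offset, lat + lat_offset]


lat, lon = 50.71868778231684, 7.158329088235492
bbox = create_bbox_from_center(lat, lon, size_km=3)
west, south, east, north = bbox

# Convert the box edges back to kilometres with the same approximation.
width_km = (east - west) * 111.0 * cos(radians(lat))
height_km = (north - south) * 111.0
print(round(width_km, 6), round(height_km, 6))  # → 3.0 3.0
```

The approximation is adequate for boxes of a few kilometres at mid latitudes. It degrades close to the poles, where `cos(radians(lat))` approaches zero.
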

### EnmapDownloader Class

The following cell defines the `EnmapDownloader` class, which searches the STAC catalogue, filters items (e.g., by cloud cover), and downloads the specified asset types. It also optionally prints item properties.

In [None]:
class EnmapDownloader:
    def __init__(self):
        self.catalog = Client.open("https://geoservice.dlr.de/eoc/ogc/stac/v1/")
        self.headers = parse_curl_command(CURL_COMMAND)
        
    def setup_download_directory(self, collection_name):
        if CUSTOM_DOWNLOAD_PATH:
            base_path = os.path.expanduser(CUSTOM_DOWNLOAD_PATH)
            target_directory = os.path.join(base_path, collection_name)
        else:
            target_directory = os.path.join(BASE_DIR, collection_name)
        try:
            if not os.path.exists(target_directory):
                os.makedirs(target_directory)
                print(f"\nCreated directory at: {target_directory}")
            else:
                print(f"\nUsing existing directory at: {target_directory}")
        except Exception as e:
            print(f"Error creating directory: {str(e)}")
            print("Falling back to default directory")
            target_directory = os.path.join(BASE_DIR, collection_name)
            if not os.path.exists(target_directory):
                os.makedirs(target_directory)
        return target_directory

    def download_file(self, url, output_path):
        if 'Referer' in self.headers:
            self.headers['Referer'] = os.path.dirname(url) + '/'
        try:
            # A timeout keeps the call from hanging indefinitely if the server stops responding.
            response = requests.get(url, headers=self.headers, stream=True, allow_redirects=True, timeout=60)
            response.raise_for_status()
            total_size = int(response.headers.get('content-length', 0))
            with open(output_path, 'wb') as f:
                if total_size == 0:
                    f.write(response.content)
                else:
                    downloaded = 0
                    for chunk in response.iter_content(chunk_size=8192):
                        if chunk:
                            f.write(chunk)
                            downloaded += len(chunk)
                            progress = int(50 * downloaded / total_size)
                            print(f"\rProgress: [{'=' * progress}{' ' * (50 - progress)}] {downloaded}/{total_size} bytes", end='')
            print(f"\nSuccessfully downloaded: {output_path}")
            return True
        except Exception as e:
            print(f"Error downloading file: {str(e)}")
            if os.path.exists(output_path):
                os.remove(output_path)
            return False

    def search_and_download_collection(self, collection, asset_types):
        print(f"\nSearching for {collection} data...")
        search_params = {
            "collections": [collection],
            "bbox": create_bbox_from_center(CENTER_COORD["lat"], CENTER_COORD["lon"], CENTER_COORD["size_km"]) if USE_CENTER_COORD else BBOX
        }
        if START_DATE and END_DATE:
            search_params["datetime"] = f"{START_DATE}T00:00:00Z/{END_DATE}T23:59:59Z"
        try:
            search = self.catalog.search(**search_params)
            total_matches = search.matched()
            if total_matches == 0:
                print("No items found matching your criteria.")
                return
            print(f"Found {total_matches} items.")
            items = list(search.items())
            if CLOUD_COVER_FILTER["enabled"]:
                filtered_items = []
                print("\nFiltering by cloud cover...")
                for item in items:
                    cloud_cover = float(item.properties.get("eo:cloud_cover", 100.0))
                    if cloud_cover <= CLOUD_COVER_FILTER["max_coverage"]:
                        filtered_items.append(item)
                items = filtered_items
                print(f"After cloud cover filtering ({CLOUD_COVER_FILTER['max_coverage']}% max): {len(items)} items")
            items_to_process = min(MAX_ITEMS, len(items)) if MAX_ITEMS else len(items)
            print(f"Will download {items_to_process} items.")
            download_dir = self.setup_download_directory(collection.split('_')[-1])
            items_processed = 0
            for item in items:
                if MAX_ITEMS and items_processed >= MAX_ITEMS:
                    break
                print(f"\nProcessing item {items_processed + 1}/{items_to_process}")
                print(f"Cloud cover: {item.properties.get('eo:cloud_cover', 'N/A')}%")
                if PRINT_PROPERTIES:
                    print("Item properties:")
                    for key, value in item.properties.items():
                        print(f"  {key}: {value}")
                assets = item.get_assets()
                if items_processed == 0:
                    print("\nAvailable assets in first item:")
                    for asset_name in assets.keys():
                        print(f"- {asset_name}")
                    print()
                for asset_type in asset_types:
                    if asset_type in assets:
                        asset = assets[asset_type]
                        filename = os.path.basename(asset.href)
                        output_path = os.path.join(download_dir, filename)
                        print(f"\nDownloading {asset_type}: {filename}")
                        self.download_file(asset.href, output_path)
                    else:
                        print(f"\nAsset type {asset_type} not found in item")
                items_processed += 1
            print(f"\nDownload complete for {collection}!")
        except Exception as e:
            print(f"Error during search and download: {str(e)}")

    def download_all(self):
        for collection, config in DOWNLOADS.items():
            if config["enabled"]:
                print(f"\n{'=' * 50}")
                print(f"Processing collection: {collection}")
                print(f"{'=' * 50}")
                self.search_and_download_collection(collection, config["assets"])

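The cloud cover filter in `search_and_download_collection` treats a missing `eo:cloud_cover` property as 100 %, so items without that property are dropped whenever the filter is enabled. The sketch below isolates that logic and runs it on mock items. `filter_by_cloud_cover` and the mock objects are illustrative names, not part of the original notebook.

```python
from types import SimpleNamespace


def filter_by_cloud_cover(items, max_coverage):
    """Keep items whose eo:cloud_cover is at most max_coverage."""
    kept = []
    for item in items:
        # A missing property defaults to 100.0, which excludes the item.
        cloud_cover = float(item.properties.get("eo:cloud_cover", 100.0))
        if cloud_cover <= max_coverage:
            kept.append(item)
    return kept


# Stand-ins for STAC items: only the .properties attribute is used here.
mock_items = [
    SimpleNamespace(id="clear", properties={"eo:cloud_cover": 2.0}),
    SimpleNamespace(id="cloudy", properties={"eo:cloud_cover": 40.0}),
    SimpleNamespace(id="unknown", properties={}),
]

kept = filter_by_cloud_cover(mock_items, max_coverage=5.0)
print([item.id for item in kept])  # → ['clear']
```

If items without a cloud cover value should be kept instead, the default passed to `get` would need to be changed, for example to `0.0`.
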

### Main Script Execution

The final cell initiates the download process by creating an instance of `EnmapDownloader` and calling its `download_all()` method.

In [None]:
if __name__ == "__main__":
    downloader = EnmapDownloader()
    downloader.download_all()

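Because session cookies expire, it can be worth checking the downloaded files afterwards. The sketch below rests on an assumption that the source does not confirm: that a request with an expired session may return an HTML login page with a success status, which would then be saved under the asset's file name. `looks_like_html` and `find_html_downloads` are hypothetical helpers, and the demonstration uses a temporary directory with made-up file names.

```python
import tempfile
from pathlib import Path


def looks_like_html(path, probe_bytes=512):
    """Return True if the file starts like an HTML document."""
    with open(path, "rb") as f:
        head = f.read(probe_bytes).lstrip().lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")


def find_html_downloads(directory):
    """List files below `directory` that appear to be HTML pages."""
    return sorted(
        p for p in Path(directory).rglob("*") if p.is_file() and looks_like_html(p)
    )


# Demonstration with two fake downloads in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "scene.TIF").write_bytes(b"II*\x00" + b"\x00" * 64)
    (Path(tmp) / "expired.TIF").write_bytes(b"<!DOCTYPE html><html><body>Login</body></html>")
    print([p.name for p in find_html_downloads(tmp)])  # → ['expired.TIF']
```

In the notebook, calling `find_html_downloads("EnMAP_downloads")` after a run would list any such files. If it returns entries, obtain a fresh session cookie and repeat the download.
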