# S5CMD Utilities for Copernicus Data Download

This notebook provides utilities for downloading Sentinel-1 SAR data from the Copernicus Data Space Ecosystem using s5cmd.

## Overview

The s5cmd tool is a high-performance S3 client that enables fast downloads from S3-compatible storage services. This notebook contains functions to:
- Configure s5cmd with Copernicus Data Space credentials
- Download individual files or entire directories
- Handle authentication and endpoint configuration

## Prerequisites

1. Install s5cmd: `pip install s5cmd`
2. Configure credentials in a `.s5cfg` file
3. Ensure network access to Copernicus Data Space Ecosystem

## Configuration File Format

The `.s5cfg` file should contain:
```ini
[default]
aws_access_key_id = 'your_access_key'
aws_secret_access_key = 'your_secret_key'
aws_region = 'us-east-1'
host_base = 'eodata.dataspace.copernicus.eu'
use_https = 'true'
```
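As a minimal sketch, a file in the format above could be generated from Python rather than written by hand. The helper name `write_s5cfg` and its parameters are hypothetical and not part of s5cmd or any library used here. The sketch only reproduces the keys and quoting shown above and makes no assumption about how the file is later consumed.

```python
from pathlib import Path


def write_s5cfg(path, access_key, secret_key,
                host_base='eodata.dataspace.copernicus.eu',
                region='us-east-1', use_https=True):
    """Write a config file with the same keys and quoting as the example above.

    Hypothetical helper for illustration only.
    """
    lines = [
        '[default]',
        f"aws_access_key_id = '{access_key}'",
        f"aws_secret_access_key = '{secret_key}'",
        f"aws_region = '{region}'",
        f"host_base = '{host_base}'",
        f"use_https = '{str(use_https).lower()}'",
    ]
    target = Path(path)
    target.write_text('\n'.join(lines) + '\n', encoding='utf-8')
    # Restrict the file to its owner, since it holds secrets (POSIX permissions).
    target.chmod(0o600)
    return target
```

Since the file contains secret keys, it is usually best kept out of version control.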

In [1]:
import subprocess
import sys

def test_s5cmd_availability():
    """Test if s5cmd is available and properly installed.
    
    Returns:
        bool: True if s5cmd is available, False otherwise
    """
    try:
        result = subprocess.run(['s5cmd', '--help'], 
                              capture_output=True, 
                              text=True, 
                              check=True)
        print(f'✓ s5cmd is available: {result.stdout.strip()}')
        return True
        
    except FileNotFoundError:
        print('✗ s5cmd is not installed or not in PATH')
        print('Install with: pip install s5cmd')
        return False
        
    except subprocess.CalledProcessError as e:
        print(f'✗ s5cmd command failed: {e.stderr}')
        return False

# Test s5cmd availability
if not test_s5cmd_availability():
    print('Please install s5cmd before proceeding')
    sys.exit(1)

✓ s5cmd is available: NAME:
   s5cmd - Blazing fast S3 and local filesystem execution tool

USAGE:
   s5cmd [global options] command [command options] [arguments...]

COMMANDS:
   ls              list buckets and objects
   cp              copy objects
   rm              remove objects
   mv              move/rename objects
   mb              make bucket
   rb              remove bucket
   select          run SQL queries on objects
   du              show object size usage
   cat             print remote object content
   pipe            stream to remote from stdin
   run             run commands in batch
   sync            sync objects
   version         print version
   bucket-version  configure bucket versioning
   presign         print remote object presign url
   help, h         Shows a list of commands or help for one command

GLOBAL OPTIONS:
   --credentials-file value       use the specified credentials file instead of the default credentials file
   --dry-run                  
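A quieter variant of the availability check is sketched below. It looks up the executable with `shutil.which` and then runs the `version` subcommand listed in the help output above, so the full help text is not printed. The function name `s5cmd_version` is hypothetical.

```python
import shutil
import subprocess
from typing import Optional


def s5cmd_version(executable: str = 's5cmd') -> Optional[str]:
    """Return the output of `<executable> version`, or None if it is unavailable.

    Hypothetical helper for illustration only.
    """
    # shutil.which returns None when the executable is not on PATH.
    path = shutil.which(executable)
    if path is None:
        return None
    try:
        result = subprocess.run(
            [path, 'version'],
            capture_output=True,
            text=True,
            check=True,
            timeout=10,
        )
    except (subprocess.SubprocessError, OSError):
        # Covers non-zero exit codes, timeouts, and failures to launch.
        return None
    return result.stdout.strip()
```

A return value of `None` means the tool is missing or not working, which can replace the boolean check used above.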

## Core S5CMD Functions

The `download` function in `phidown.s5cmd_utils` provides a Python interface to s5cmd for downloading Copernicus data.

In [None]:
from phidown.s5cmd_utils import download


s3_path = '/eodata/Sentinel-1/SAR/IW_RAW__0S/2024/05/03/S1A_IW_RAW__0SDV_20240503T031926_20240503T031942_053701_0685FB_E003.SAFE'
output_dir = '/Data_large/marine/PythonProjects/SAR/sarpyx/data'
download_successful = download(s3_path, output_dir)



INFO:s5cmd_utils:Created configuration file: .s5cfg
INFO:s5cmd_utils:Downloading from: s3://eodata/Sentinel-1/SAR/IW_RAW__0S/2024/05/03/S1A_IW_RAW__0SDV_20240503T031926_20240503T031942_053701_0685FB_E003.SAFE/*
INFO:s5cmd_utils:Output directory: /Data_large/marine/PythonProjects/SAR/sarpyx/data/S1A_IW_RAW__0SDV_20240503T031926_20240503T031942_053701_0685FB_E003.SAFE
INFO:s5cmd_utils:Running command: s5cmd --endpoint-url https://eodata.dataspace.copernicus.eu cp s3://eodata/Sentinel-1/SAR/IW_RAW__0S/2024/05/03/S1A_IW_RAW__0SDV_20240503T031926_20240503T031942_053701_0685FB_E003.SAFE/* /Data_large/marine/PythonProjects/SAR/sarpyx/data/S1A_IW_RAW__0SDV_20240503T031926_20240503T031942_053701_0685FB_E003.SAFE/


'cp s3://eodata/Sentinel-1/SAR/IW_RAW__0S/2024/05/03/S1A_IW_RAW__0SDV_20240503T031926_20240503T031942_053701_0685FB_E003.SAFE/S1A_IW_RAW__0SDV_20240503T031926_20240503T031942_053701_0685FB_E003.SAFE-report-20240503T052153.pdf /Data_large/marine/PythonProjects/SAR/sarpyx/data/S1A_IW_RAW__0SDV_20240503T031926_20240503T031942_053701_0685FB_E003.SAFE/S1A_IW_RAW__0SDV_20240503T031926_20240503T031942_053701_0685FB_E003.SAFE-report-20240503T052153.pdf\ncp s3://eodata/Sentinel-1/SAR/IW_RAW__0S/2024/05/03/S1A_IW_RAW__0SDV_20240503T031926_20240503T031942_053701_0685FB_E003.SAFE/s1a-iw-raw-s-vh-20240503t031926-20240503t031942-053701-0685fb-index.dat /Data_large/marine/PythonProjects/SAR/sarpyx/data/S1A_IW_RAW__0SDV_20240503T031926_20240503T031942_053701_0685FB_E003.SAFE/s1a-iw-raw-s-vh-20240503t031926-20240503t031942-053701-0685fb-index.dat\ncp s3://eodata/Sentinel-1/SAR/IW_RAW__0S/2024/05/03/S1A_IW_RAW__0SDV_20240503T031926_20240503T031942_053701_0685FB_E003.SAFE/support/s1-level-0-annot.xsd /Da
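The `Running command` line in the log above can be reproduced with a small amount of path handling. The sketch below is not the `phidown` implementation. The names `to_s3_url` and `build_cp_command` are hypothetical, and the logic is inferred only from the logged command: an `s3://` source ending in `/*` and a destination directory named after the product.

```python
from typing import List


def to_s3_url(path: str) -> str:
    """Normalize '/eodata/...' or 's3://eodata/...' to 's3://eodata/...' without a trailing slash."""
    if path.startswith('s3://'):
        path = path[len('s3://'):]
    return 's3://' + path.strip('/')


def build_cp_command(
    s3_path: str,
    output_dir: str,
    endpoint_url: str = 'https://eodata.dataspace.copernicus.eu',
) -> List[str]:
    """Build the argument list for copying every object under a product prefix."""
    source = to_s3_url(s3_path)
    # The product name is the last path component, e.g. '<name>.SAFE'.
    product_name = source.rsplit('/', 1)[-1]
    destination = f"{output_dir.rstrip('/')}/{product_name}/"
    return [
        's5cmd', '--endpoint-url', endpoint_url,
        'cp', f'{source}/*', destination,
    ]
```

Passing the command to `subprocess.run` as a list avoids shell expansion of the `*`, so the wildcard reaches s5cmd unchanged and s5cmd expands it itself.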

## Usage Examples

The following cells demonstrate how to use the s5cmd utilities for different scenarios.

In [4]:
# Example 1: Download entire Sentinel-1 SAFE directory
s3_path = '/eodata/Sentinel-1/SAR/IW_RAW__0S/2024/05/03/S1A_IW_RAW__0SDV_20240503T031926_20240503T031942_053701_0685FB_E003.SAFE'

output_dir = '/Data_large/marine/PythonProjects/SAR/sarpyx/data'

try:
    output = download_sentinel_safe(
        s3_path=s3_path,
        output_dir=output_dir,
        config_file='.s5cfg',
        endpoint_url='https://eodata.dataspace.copernicus.eu'
    )
    print('Download completed successfully!')
    print(f'Output: {output}')
except subprocess.CalledProcessError as e:
    print(f'Download failed: {e}')
except Exception as e:
    print(f'Error: {e}')

INFO:__main__:Downloading from: s3://eodata/Sentinel-1/SAR/IW_RAW__0S/2024/05/03/S1A_IW_RAW__0SDV_20240503T031926_20240503T031942_053701_0685FB_E003.SAFE/*
INFO:__main__:Output directory: /Data_large/marine/PythonProjects/SAR/sarpyx/data/S1A_IW_RAW__0SDV_20240503T031926_20240503T031942_053701_0685FB_E003.SAFE
INFO:__main__:Running command: s5cmd --endpoint-url https://eodata.dataspace.copernicus.eu cp s3://eodata/Sentinel-1/SAR/IW_RAW__0S/2024/05/03/S1A_IW_RAW__0SDV_20240503T031926_20240503T031942_053701_0685FB_E003.SAFE/* /Data_large/marine/PythonProjects/SAR/sarpyx/data/S1A_IW_RAW__0SDV_20240503T031926_20240503T031942_053701_0685FB_E003.SAFE/

Download completed successfully!
Output: cp s3://eodata/Sentinel-1/SAR/IW_RAW__0S/2024/05/03/S1A_IW_RAW__0SDV_20240503T031926_20240503T031942_053701_0685FB_E003.SAFE/manifest.safe /Data_large/marine/PythonProjects/SAR/sarpyx/data/S1A_IW_RAW__0SDV_20240503T031926_20240503T031942_053701_0685FB_E003.SAFE/manifest.safe
cp s3://eodata/Sentinel-1/SAR/IW_RAW__0S/2024/05/03/S1A_IW_RAW__0SDV_20240503T031926_20240503T031942_053701_0685FB_E003.SAFE/S1A_IW_RAW__0SDV_20240503T031926_20240503T031942_053701_0685FB_E003.SAFE-report-20240503T052153.pdf /Data_large/marine/PythonProjects/SAR/sarpyx/data/S1A_IW_RAW__0SDV_20240503T031926_20240503T031942_053701_0685FB_E003.SAFE/S1A_IW_RAW__0SDV_20240503T031926_20240503T031942_053701_0685FB_E003.SAFE-report-20240503T052153.pdf
cp s3://eodata/Sentinel-1/SAR/IW_RAW__0S/2024/05/03/S1A_IW_RAW__0SDV_20240503T031926_20240503T031942_053701_0685FB_E003.SAFE/s1a-iw-raw-s-vv-20240503t031926-20240503t031942-053701-0685fb-index.dat /Data_large/marine/PythonProjects/SAR/

## Directory Listing Example

List available Sentinel-1 data to explore the directory structure.

In [5]:
# Example 2: List Sentinel-1 directories
try:
    output = run_s5cmd_with_config(
        'ls s3://eodata/Sentinel-1/SAR/',
        config_file='.s5cfg',
        endpoint_url='https://eodata.dataspace.copernicus.eu'
    )
    print('Available Sentinel-1 SAR products:')
    print(output)
except subprocess.CalledProcessError as e:
    print(f'Listing failed: {e}')
    if e.stderr:
        print(f'Error details: {e.stderr}')
except Exception as e:
    print(f'Error: {e}')

INFO:__main__:Running command: s5cmd --endpoint-url https://eodata.dataspace.copernicus.eu ls s3://eodata/Sentinel-1/SAR/


Available Sentinel-1 SAR products:
                                  DIR  AISAUX_PRIVATE/
                                  DIR  AI_RAW__0__PRIVATE/
                                  DIR  CARD-BS/
                                  DIR  CARD-COH/
                                  DIR  CARD-COH12/
                                  DIR  CARD-COH6/
                                  DIR  EN_RAW__0S_PRIVATE/
                                  DIR  EW_ETA__AX/
                                  DIR  EW_GRDH_1S-COG/
                                  DIR  EW_GRDM_1A/
                                  DIR  EW_GRDM_1A_PRIVATE/
                                  DIR  EW_GRDM_1S/
                                  DIR  EW_GRDM_1S-COG/
                                  DIR  EW_GRDM_1S-COG_PRIVATE/
                                  DIR  EW_GRDM_1S_PRIVATE/
                                  DIR  EW_OCN__2A/
                                  DIR  EW_OCN__2A_PRIVATE/
                                  DIR  E
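Each line of the listing above carries a `DIR` marker before the directory name. A minimal sketch for extracting only the names follows. The function name `parse_ls_directories` is hypothetical, and the line format is assumed from the output shown here rather than from a documented guarantee.

```python
from typing import List


def parse_ls_directories(listing: str) -> List[str]:
    """Return the directory names from `s5cmd ls` output lines of the form 'DIR  name/'."""
    names = []
    for line in listing.splitlines():
        # Split into the marker and the remainder; leading whitespace is ignored.
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] == 'DIR':
            names.append(parts[1].strip())
    return names
```

Lines that do not start with `DIR`, such as object entries or blank lines, are skipped.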

## Advanced Usage

The function below lists `IW_RAW__0S` products under a base path, optionally restricted to a date prefix such as `2024/05`.

In [6]:
import logging
from typing import List, Optional

logger = logging.getLogger(__name__)


def list_sentinel_products(
    base_path: str = 's3://eodata/Sentinel-1/SAR/',
    date_filter: Optional[str] = None,
    config_file: str = '.s5cfg',
    endpoint_url: str = 'https://eodata.dataspace.copernicus.eu'
) -> List[str]:
    """List available Sentinel-1 products with optional date filtering.
    
    Args:
        base_path: Base S3 path for Sentinel-1 data
        date_filter: Optional date filter (e.g., '2024/05')
        config_file: Path to s5cmd configuration file
        endpoint_url: Copernicus Data Space endpoint URL
        
    Returns:
        List[str]: List of available product paths
        
    Example:
        >>> products = list_sentinel_products(date_filter='2024/05')
    """
    if date_filter:
        search_path = f'{base_path.rstrip("/")}/IW_RAW__0S/{date_filter}/'
    else:
        search_path = f'{base_path.rstrip("/")}/IW_RAW__0S/'
    
    try:
        output = run_s5cmd_with_config(
            f'ls {search_path}',
            config_file=config_file,
            endpoint_url=endpoint_url,
            verbose=False
        )
        
        # Keep every non-empty line of the listing (markers such as DIR are not stripped)
        lines = output.strip().split('\n')
        products = [line.strip() for line in lines if line.strip()]
        return products
        
    except Exception as e:
        logger.error(f'Failed to list products: {e}')
        return []

# Check S3 path construction
print('Testing S3 path construction...')
test_path = '/eodata/Sentinel-1/SAR/IW_RAW__0S/2024/05/03/S1A_IW_RAW__0SDV_20240503T031926_20240503T031942_053701_0685FB_E003.SAFE'
s3_url = f's3://{test_path.lstrip("/")}/*'
print(f'Constructed S3 URL: {s3_url}')

# Example usage (commented out for safety)
# print('Listing Sentinel-1 products for May 2024:')
# products = list_sentinel_products(date_filter='2024/05')
# for i, product in enumerate(products[:5]):  # Show first 5
#     print(f'{i+1}. {product}')
# if len(products) > 5:
#     print(f'... and {len(products) - 5} more products')

Testing S3 path construction...
Constructed S3 URL: s3://eodata/Sentinel-1/SAR/IW_RAW__0S/2024/05/03/S1A_IW_RAW__0SDV_20240503T031926_20240503T031942_053701_0685FB_E003.SAFE/*
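The `date_filter` argument follows the `year/month/day` layout visible in the product paths above. As a small sketch, such a prefix can be built from a `datetime.date` instead of being typed by hand. The helper name `date_prefix` and its `depth` parameter are hypothetical.

```python
from datetime import date


def date_prefix(day: date, depth: str = 'day') -> str:
    """Format a date as 'YYYY', 'YYYY/MM', or 'YYYY/MM/DD' for use as a path prefix."""
    formats = {
        'year': '%Y',
        'month': '%Y/%m',
        'day': '%Y/%m/%d',
    }
    if depth not in formats:
        raise ValueError(f"depth must be one of {sorted(formats)}, got {depth!r}")
    return day.strftime(formats[depth])
```

For example, `date_prefix(date(2024, 5, 3), depth='month')` produces the `'2024/05'` value used in the docstring example.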
