# 🌍 Simple Disaster COG Processing

This simplified notebook converts disaster satellite imagery to Cloud Optimized GeoTIFFs (COGs) with just a few cells.

## ✨ Features
- **See files first** - List S3 files before configuring
- **Smart configuration** - Define filename functions after seeing actual files
- **Auto-discovery** - Automatically categorizes your files
- **Simple processing** - Just run the cells in order

---

## 📋 Step 1: Basic Configuration

Set your event details and S3 paths:

In [1]:
# ========================================
# BASIC CONFIGURATION
# ========================================

# Event Details
EVENT_NAME = 'temp-planet'  # Your disaster event name
PRODUCT_NAME = ''                   # Product type (sentinel1, sentinel2, landsat, etc.)

# S3 Paths
BUCKET = 'nasa-disasters'                                          # S3 bucket
SOURCE_PATH = f'{EVENT_NAME}/{PRODUCT_NAME}'      # Where your files are
DESTINATION_BASE = 'temp_planet_cog'                          # Where to save COGs

# Processing Options
OVERWRITE = False      # Set to True to replace existing files
VERIFY = True          # Set to True to verify results after processing
SAVE_RESULTS = True    # Set to False to skip saving results CSV to /output directory

print("✅ Basic configuration loaded")
print(f"Event: {EVENT_NAME}")
print(f"Source: s3://{BUCKET}/{SOURCE_PATH}")
print(f"Destination: s3://{BUCKET}/{DESTINATION_BASE}/")

✅ Basic configuration loaded
Event: temp-planet
Source: s3://nasa-disasters/temp-planet/
Destination: s3://nasa-disasters/temp_planet_cog/


In [None]:
# to add... satellite name within the above


## 🔍 Step 2: Connect to S3 and List Files

Let's see what files are available before configuring filename transformations:

In [2]:
# Import necessary modules
import sys
import os
from pathlib import Path

# Add parent directory to path
sys.path.insert(0, str(Path('..').resolve()))

# Import S3 operations
from core.s3_operations import (
    initialize_s3_client,
    list_s3_files,
    get_file_size_from_s3
)

# Initialize S3 client
print("🌐 Connecting to S3...")
s3_client, _ = initialize_s3_client(bucket_name=BUCKET, verbose=False)

if s3_client:
    print("✅ Connected to S3\n")
    
    # List all TIF files
    print(f"📂 Files in s3://{BUCKET}/{SOURCE_PATH}:")
    print("="*60)
    
    files = list_s3_files(s3_client, BUCKET, SOURCE_PATH, suffix='.tif')
    
    if files:
        print(f"Found {len(files)} .tif files:\n")
        for i, file_path in enumerate(files[:10], 1):  # Show first 10
            filename = os.path.basename(file_path)
            try:
                size_gb = get_file_size_from_s3(s3_client, BUCKET, file_path)
                print(f"{i:2}. {filename:<60} ({size_gb:.2f} GB)")
            except:
                print(f"{i:2}. {filename}")
        
        if len(files) > 10:
            print(f"\n... and {len(files) - 10} more files")
        
        print("\n" + "="*60)
        print("\n💡 Use this information to create filename functions in Step 3")
    else:
        print("⚠️ No .tif files found in the specified path.")
        print("   Check your SOURCE_PATH configuration.")
else:
    print("❌ Could not connect to S3. Check your AWS credentials.")
    files = []

🌐 Connecting to S3...
✅ Connected to S3

📂 Files in s3://nasa-disasters/temp-planet/:
Found 16 .tif files:

 1. Planet_NDVI_20250602_170712_43_8106770.tif                   (0.26 GB)
 2. Planet_NDVI_20250602_170714_26_8106770.tif                   (0.26 GB)
 3. Planet_NDVI_20250602_170716_08_8106770.tif                   (0.26 GB)
 4. Planet_NDVI_20250602_170717_90_8106770.tif                   (0.26 GB)
 5. Planet_NDVI_20250602_172810_91_8106814.tif                   (0.37 GB)
 6. Planet_NDVI_20250602_172813_10_8106814.tif                   (0.37 GB)
 7. Planet_NDVI_20250602_172815_30_8106814.tif                   (0.37 GB)
 8. Planet_NDVI_20250602_173839_64_8106856.tif                   (0.35 GB)
 9. Planet_NDVI_20250602_173841_74_8106856.tif                   (0.35 GB)
10. Planet_NDVI_20250602_173843_85_8106856.tif                   (0.35 GB)

... and 6 more files


💡 Use this information to create filename functions in Step 3


In [3]:
a='Planet_NDVI_20250602_170712_43_8106770.tif'

print (f'{EVENT_NAME}_{a}')

temp-planet_Planet_NDVI_20250602_170712_43_8106770.tif


## 🏷️ Step 3: Define Categorization and Filename Transformations

Based on the files you see above, configure:
1. **Categorization patterns** - Regex patterns to identify file types
2. **Filename functions** - How to transform filenames
3. **Output directories** - Where each category should be saved

In [4]:
# ========================================
# CATEGORIZATION AND OUTPUT CONFIGURATION
# ========================================

import re

# STEP 1: Define how to extract dates from filenames
def extract_date_from_filename(filename):
    """Extract date from filename in YYYYMMDD format."""
    dates = re.findall(r'\d{8}', filename)
    if dates:
        date_str = dates[0]
        return f"{date_str[0:4]}-{date_str[4:6]}-{date_str[6:8]}"
    return None

# STEP 2: Define filename transformation functions for each category
def create_truecolor_filename(original_path, event_name):
    """Create filename for trueColor products."""
    filename = os.path.basename(original_path)
    stem = os.path.splitext(filename)[0]
    date = extract_date_from_filename(stem)
    
    if date:
        stem_clean = re.sub(r'_\d{8}', '', stem)
        return f"{event_name}_{stem_clean}_{date}_day.tif"
    return f"{event_name}_{stem}_day.tif"

def create_colorinfrared_filename(original_path, event_name):
    """Create filename for colorInfrared products."""
    filename = os.path.basename(original_path)
    stem = os.path.splitext(filename)[0]
    date = extract_date_from_filename(stem)
    
    if date:
        stem_clean = re.sub(r'_\d{8}', '', stem)
        return f"{event_name}_{stem_clean}_{date}_day.tif"
    return f"{event_name}_{stem}_day.tif"

def create_naturalcolor_filename(original_path, event_name):
    """Create filename for naturalColor products."""
    filename = os.path.basename(original_path)
    stem = os.path.splitext(filename)[0]
    date = extract_date_from_filename(stem)
    
    if date:
        stem_clean = re.sub(r'_\d{8}', '', stem)
        return f"{event_name}_{stem_clean}_{date}_day.tif"
    return f"{event_name}_{stem}_day.tif"

def planet_test(original_path, event_name):
    return (f'{original_path}')

# STEP 3: Configure categorization patterns (REQUIRED)
# These regex patterns determine which files belong to which category
CATEGORIZATION_PATTERNS = {
    'trueColor': r'trueColor|truecolor|true_color',
    'colorInfrared': r'colorInfrared|colorIR|color_infrared',
    'naturalColor': r'naturalColor|natural_color',
    'NDVI': r'NDVI|ndvi'
    # Add patterns for ALL file types you want to process
    # Files not matching any pattern will be skipped with a warning
}

# STEP 4: Map categories to filename transformation functions
FILENAME_CREATORS = {
    'trueColor': create_truecolor_filename,
    'colorInfrared': create_colorinfrared_filename,
    'naturalColor': create_naturalcolor_filename,
    'NDVI': planet_test
    # Must have an entry for each category in CATEGORIZATION_PATTERNS
}

# STEP 5: Specify output directories for each category
OUTPUT_DIRS = {
    'trueColor': 'Landsat/trueColor',
    'colorInfrared': 'Landsat/colorIR',
    'naturalColor': 'Landsat/naturalColor',
    'NDVI': 'cog',
    # Must have an entry for each category in CATEGORIZATION_PATTERNS
}

# OPTIONAL: Specify no-data values (None = auto-detect)
NODATA_VALUES = {
    'NDVI': -9999.0,
    'MNDWI': -9999.0,
    # Leave empty or set to None for auto-detection
}

print("✅ Configuration defined")
print(f"\n📂 Categories and output paths:")
for category, path in OUTPUT_DIRS.items():
    pattern = CATEGORIZATION_PATTERNS.get(category, 'No pattern defined')
    print(f"   • {category}:")
    print(f"     Pattern: {pattern}")
    print(f"     Output:  {DESTINATION_BASE}/{path}")

# Test with sample filename if files exist
if files:
    sample_file = files[0]
    sample_name = os.path.basename(sample_file)
    
    # Check which category it would match
    matched_category = None
    for cat, pattern in CATEGORIZATION_PATTERNS.items():
        if re.search(pattern, sample_name, re.IGNORECASE):
            matched_category = cat
            break
    
    if matched_category:
        new_name = FILENAME_CREATORS[matched_category](sample_file, EVENT_NAME)
        print(f"\n📝 Example transformation:")
        print(f"   Original: {sample_name}")
        print(f"   Category: {matched_category}")
        print(f"   → New:    {new_name}")
        print(f"   → Output: {DESTINATION_BASE}/{OUTPUT_DIRS[matched_category]}/{new_name}")
    else:
        print(f"\n⚠️ Warning: Sample file doesn't match any category pattern:")
        print(f"   File: {sample_name}")
        print(f"   Add a pattern to CATEGORIZATION_PATTERNS to process this file type")

✅ Configuration defined

📂 Categories and output paths:
   • trueColor:
     Pattern: trueColor|truecolor|true_color
     Output:  temp_planet_cog/Landsat/trueColor
   • colorInfrared:
     Pattern: colorInfrared|colorIR|color_infrared
     Output:  temp_planet_cog/Landsat/colorIR
   • naturalColor:
     Pattern: naturalColor|natural_color
     Output:  temp_planet_cog/Landsat/naturalColor
   • NDVI:
     Pattern: NDVI|ndvi
     Output:  temp_planet_cog/cog

📝 Example transformation:
   Original: Planet_NDVI_20250602_170712_43_8106770.tif
   Category: NDVI
   → New:    temp-planet/Planet_NDVI_20250602_170712_43_8106770.tif
   → Output: temp_planet_cog/cog/temp-planet/Planet_NDVI_20250602_170712_43_8106770.tif


## 🚀 Step 4: Initialize Processor and Preview

Now let's set up the processor and preview all transformations:

## ⚙️ Step 5: Process Files

Run this cell to start processing all files:

In [5]:
# Import our simplified helper
from notebooks.notebook_helpers import SimpleProcessor

# Create full configuration with categorization patterns
config = {
    'event_name': EVENT_NAME,
    'bucket': BUCKET,
    'source_path': SOURCE_PATH,
    'destination_base': DESTINATION_BASE,
    'overwrite': OVERWRITE,
    'verify': VERIFY,
    'save_results': SAVE_RESULTS,  # Add save results flag
    'categorization_patterns': CATEGORIZATION_PATTERNS,  # IMPORTANT: Include patterns
    'filename_creators': FILENAME_CREATORS,
    'output_dirs': OUTPUT_DIRS,
    'nodata_values': NODATA_VALUES
}

# Initialize processor
processor = SimpleProcessor(config)

# Connect to S3 (already connected, but needed for processor)
if processor.connect_to_s3():
    print("✅ Processor ready\n")
    
    # Discover and categorize files
    num_files = processor.discover_files()
    
    if num_files > 0:
        # Show preview of transformations
        processor.preview_processing()
        
        print("\n📌 Review the transformations above.")
        print("   • Files will be saved to the directories specified in OUTPUT_DIRS")
        print("   • If files appear as 'uncategorized', add patterns to CATEGORIZATION_PATTERNS")
        print("   • When ready, proceed to Step 5 to process the files.")
    else:
        print("⚠️ No files found to process.")
else:
    print("❌ Could not initialize processor.")

✅ All modules loaded successfully

🌐 Connecting to S3...
✅ Connected to S3 successfully
✅ Processor ready


🔍 Searching for files in: temp-planet/
✅ Found 16 files

📊 File Categories:
  • NDVI: 16 files

📋 PROCESSING PREVIEW

Total files to process: 16
Event: temp-planet
Source: s3://nasa-disasters/temp-planet/
Destination: s3://nasa-disasters/temp_planet_cog/

File categories:
  • NDVI: 16 files
    Example: Planet_NDVI_20250602_170712_43_8106770.tif
    → temp-planet/Planet_NDVI_20250602_170712_43_8106770.tif

Settings:
  • Compression: ZSTD level 22
  • Overwrite existing: False
  • Verify results: True

📌 Review the transformations above.
   • Files will be saved to the directories specified in OUTPUT_DIRS
   • If files appear as 'uncategorized', add patterns to CATEGORIZATION_PATTERNS
   • When ready, proceed to Step 5 to process the files.


In [None]:
# Process all files
if 'num_files' in locals() and num_files > 0:
    print("🚀 Starting processing...")
    print("This may take several minutes depending on file sizes.\n")
    
    # Process everything
    results = processor.process_all()
    
    # Display results
    if not results.empty:
        print("\n📊 Processing Complete!")
        display(results) if 'display' in dir() else print(results)
else:
    print("⚠️ No files to process. Complete Steps 1-4 first.")

🚀 Starting processing...
This may take several minutes depending on file sizes.


🚀 Starting processing...

📦 Processing NDVI (16 files)
  ⚙️ Processing: Planet_NDVI_20250602_170712_43_8106770.tif (0.3GB)
   [CHECK] Checking if file already exists in S3: s3://nasa-disasters/temp_planet_cog/cog/temp-planet/Planet_NDVI_20250602_170712_43_8106770.tif
   [INFO] File size: 0.3 GB
   [CONFIG] Using fixed chunks
   [TEMP] Using temp directory: /home/jovyan/disasters-aws-conversion/templates/temp_cog
   [MEMORY] Initial: 228.6 MB, Available: 29459.7 MB
   [DOWNLOAD] Downloading from S3...
   [DOWNLOAD] Downloading from S3: s3://nasa-disasters/temp-planet/Planet_NDVI_20250602_170712_43_8106770.tif
   [DOWNLOAD] ✅ Downloaded 266.4 MB to data_download/temp-planet/Planet_NDVI_20250602_170712_43_8106770.tif
   [OPTIMIZED] Using GDAL COG driver for maximum performance
   [NODATA] Using manual no-data value: -9999.0
   [GDAL-COG] Creating COG with native GDAL driver...
   [GDAL-COG] Data type: float3

ERROR 4: Attempt to create new tiff file `cog_temp-planet/Planet_NDVI_20250602_170712_43_8106770.tif' failed: cog_temp-planet/Planet_NDVI_20250602_170712_43_8106770.tif: No such file or directory

Reading input: <open WarpedVRT name='WarpedVRT(data_download/temp-planet/Planet_NDVI_20250602_170712_43_8106770.tif)' mode='r'>



ERROR 4: Attempt to create new tiff file `cog_temp-planet/Planet_NDVI_20250602_170712_43_8106770.tif' failed: cog_temp-planet/Planet_NDVI_20250602_170712_43_8106770.tif: No such file or directory

   [GDAL-COG] Failed, trying rio-cogeo fallback...
   [OPTIMIZED] Using rio-cogeo for single-pass COG creation
   [NODATA] Using manual no-data value: -9999.0
   [COG] Creating COG with reprojection in single pass...


Adding overviews...
Updating dataset tags...
Writing output to: cog_temp-planet/Planet_NDVI_20250602_170712_43_8106770.tif


   [ERROR] Attempt to create new tiff file 'cog_temp-planet/Planet_NDVI_20250602_170712_43_8106770.tif' failed: cog_temp-planet/Planet_NDVI_20250602_170712_43_8106770.tif: No such file or directory
   [CLEANUP] Removed: data_download/temp-planet/Planet_NDVI_20250602_170712_43_8106770.tif
  ❌ Failed: Planet_NDVI_20250602_170712_43_8106770.tif - Attempt to create new tiff file 'cog_temp-planet/Planet_NDVI_20250602_170712_43_8106770.tif' failed: cog_temp-planet/Planet_NDVI_20250602_170712_43_8106770.tif: No such file or directory
  ⚙️ Processing: Planet_NDVI_20250602_170714_26_8106770.tif (0.3GB)
   [CHECK] Checking if file already exists in S3: s3://nasa-disasters/temp_planet_cog/cog/temp-planet/Planet_NDVI_20250602_170714_26_8106770.tif
   [INFO] File size: 0.3 GB
   [CONFIG] Using fixed chunks
   [TEMP] Using temp directory: /home/jovyan/disasters-aws-conversion/templates/temp_cog
   [MEMORY] Initial: 748.3 MB, Available: 28950.0 MB
   [DOWNLOAD] Downloading from S3...
   [DOWNLOAD] Do

ERROR 4: Attempt to create new tiff file `cog_temp-planet/Planet_NDVI_20250602_170714_26_8106770.tif' failed: cog_temp-planet/Planet_NDVI_20250602_170714_26_8106770.tif: No such file or directory

Reading input: <open WarpedVRT name='WarpedVRT(data_download/temp-planet/Planet_NDVI_20250602_170714_26_8106770.tif)' mode='r'>



ERROR 4: Attempt to create new tiff file `cog_temp-planet/Planet_NDVI_20250602_170714_26_8106770.tif' failed: cog_temp-planet/Planet_NDVI_20250602_170714_26_8106770.tif: No such file or directory

   [GDAL-COG] Failed, trying rio-cogeo fallback...
   [OPTIMIZED] Using rio-cogeo for single-pass COG creation
   [NODATA] Using manual no-data value: -9999.0
   [COG] Creating COG with reprojection in single pass...


Adding overviews...
Updating dataset tags...
Writing output to: cog_temp-planet/Planet_NDVI_20250602_170714_26_8106770.tif


   [ERROR] Attempt to create new tiff file 'cog_temp-planet/Planet_NDVI_20250602_170714_26_8106770.tif' failed: cog_temp-planet/Planet_NDVI_20250602_170714_26_8106770.tif: No such file or directory
   [CLEANUP] Removed: data_download/temp-planet/Planet_NDVI_20250602_170714_26_8106770.tif
  ❌ Failed: Planet_NDVI_20250602_170714_26_8106770.tif - Attempt to create new tiff file 'cog_temp-planet/Planet_NDVI_20250602_170714_26_8106770.tif' failed: cog_temp-planet/Planet_NDVI_20250602_170714_26_8106770.tif: No such file or directory
  ⚙️ Processing: Planet_NDVI_20250602_170716_08_8106770.tif (0.3GB)
   [CHECK] Checking if file already exists in S3: s3://nasa-disasters/temp_planet_cog/cog/temp-planet/Planet_NDVI_20250602_170716_08_8106770.tif
   [INFO] File size: 0.3 GB
   [CONFIG] Using fixed chunks
   [TEMP] Using temp directory: /home/jovyan/disasters-aws-conversion/templates/temp_cog
   [MEMORY] Initial: 760.4 MB, Available: 28937.2 MB
   [DOWNLOAD] Downloading from S3...
   [DOWNLOAD] Do

In [None]:
# Analyze results
if 'results' in locals() and not results.empty:
    print("📊 PROCESSING STATISTICS")
    print("="*40)
    
    # Success rate
    total = len(results)
    success = len(results[results['status'] == 'success'])
    failed = len(results[results['status'] == 'failed'])
    skipped = len(results[results['status'] == 'skipped'])
    
    print(f"Total files: {total}")
    print(f"✅ Success: {success}")
    print(f"❌ Failed: {failed}")
    print(f"⏭️ Skipped: {skipped}")
    print(f"\nSuccess rate: {(success/total*100):.1f}%")
    
    # Failed files
    if failed > 0:
        print("\n❌ Failed files:")
        failed_df = results[results['status'] == 'failed']
        for idx, row in failed_df.iterrows():
            print(f"  - {row['source_file']}: {row.get('error', 'Unknown error')}")
    
    # Processing times
    if 'time_seconds' in results.columns:
        success_df = results[results['status'] == 'success']
        if not success_df.empty:
            avg_time = success_df['time_seconds'].mean()
            max_time = success_df['time_seconds'].max()
            print(f"\n⏱️ Timing:")
            print(f"Average: {avg_time:.1f} seconds per file")
            print(f"Slowest: {max_time:.1f} seconds")
else:
    print("No results to analyze. Run Step 5 first.")

In [None]:
# Analyze results
if 'results' in locals() and not results.empty:
    print("📊 PROCESSING STATISTICS")
    print("="*40)
    
    # Success rate
    total = len(results)
    success = len(results[results['status'] == 'success'])
    failed = len(results[results['status'] == 'failed'])
    skipped = len(results[results['status'] == 'skipped'])
    
    print(f"Total files: {total}")
    print(f"✅ Success: {success}")
    print(f"❌ Failed: {failed}")
    print(f"⏭️ Skipped: {skipped}")
    print(f"\nSuccess rate: {(success/total*100):.1f}%")
    
    # Failed files
    if failed > 0:
        print("\n❌ Failed files:")
        failed_df = results[results['status'] == 'failed']
        for idx, row in failed_df.iterrows():
            print(f"  - {row['file']}: {row.get('error', 'Unknown error')}")
    
    # Processing times
    if 'time_seconds' in results.columns:
        success_df = results[results['status'] == 'success']
        if not success_df.empty:
            avg_time = success_df['time_seconds'].mean()
            max_time = success_df['time_seconds'].max()
            print(f"\n⏱️ Timing:")
            print(f"Average: {avg_time:.1f} seconds per file")
            print(f"Slowest: {max_time:.1f} seconds")
else:
    print("No results to analyze. Run Step 5 first.")

## 💡 Tips & Troubleshooting

### Workflow Summary:
1. **Configure** basic settings (Step 1)
2. **List files** from S3 to see naming patterns (Step 2)
3. **Define functions** to transform filenames (Step 3)
4. **Preview** transformations (Step 4)
5. **Process** all files (Step 5)
6. **Review** results (Step 6)

### Common Issues:

1. **"No files found"**
   - Check `SOURCE_PATH` in Step 1
   - Verify bucket permissions
   - Ensure files have `.tif` extension

2. **Wrong filename transformations**
   - Review actual filenames in Step 2
   - Adjust functions in Step 3
   - Re-run Step 4 to preview

3. **Files being skipped**
   - Files already exist in destination
   - Set `OVERWRITE = True` in Step 1

4. **Processing errors**
   - Check AWS credentials
   - Verify S3 write permissions
   - Check available disk space for temp files

### Need More Control?

Use the full template at `disaster_processing_template.ipynb` for:
- Manual chunk configuration
- Custom compression settings
- Detailed memory management
- Advanced processing options