# 🌍 Disaster COG Processing - VIIRS Flood

This notebook converts VIIRS flood imagery to Cloud Optimized GeoTIFFs (COGs).

## Workflow:
1. Configure basic settings
2. **List S3 files to see naming patterns**
3. Define filename transformations based on what you see
4. Preview and process

---

## 📋 Step 1: Basic Configuration

Set your event details and S3 paths:

In [19]:
# ========================================
# BASIC CONFIGURATION
# ========================================

# Event Details
EVENT_NAME = '202504_SevereWx_US'  # Your disaster event name
PRODUCT_NAME = 'viirs_flood'       # Product type

# S3 Paths
BUCKET = 'nasa-disasters'                                          # S3 bucket
SOURCE_PATH = f'drcs_activations/{EVENT_NAME}/{PRODUCT_NAME}'      # Where your files are
DESTINATION_BASE = 'drcs_activations_new'                          # Where to save COGs

# Processing Options
OVERWRITE = True  # Set to True to replace existing files
VERIFY = True      # Set to True to verify results after processing

print("✅ Basic configuration loaded")
print(f"Event: {EVENT_NAME}")
print(f"Source: s3://{BUCKET}/{SOURCE_PATH}")
print(f"Destination: s3://{BUCKET}/{DESTINATION_BASE}/")

✅ Basic configuration loaded
Event: 202504_SevereWx_US
Source: s3://nasa-disasters/drcs_activations/202504_SevereWx_US/viirs_flood
Destination: s3://nasa-disasters/drcs_activations_new/


## 🔍 Step 2: Connect to S3 and List Files

Let's see what files are available before configuring filename transformations:

In [20]:
# Import necessary modules
import sys
import os
from pathlib import Path

# Add parent directory to path
sys.path.insert(0, str(Path('..').resolve()))

# Import S3 operations
from core.s3_operations import (
    initialize_s3_client,
    list_s3_files,
    get_file_size_from_s3
)

# Initialize S3 client
print("🌐 Connecting to S3...")
s3_client, _ = initialize_s3_client(bucket_name=BUCKET, verbose=False)

if s3_client:
    print("✅ Connected to S3\n")
    
    # List all TIF files
    print(f"📂 Files in s3://{BUCKET}/{SOURCE_PATH}:")
    print("="*60)
    
    files = list_s3_files(s3_client, BUCKET, SOURCE_PATH, suffix='.tif')
    
    if files:
        print(f"Found {len(files)} .tif files:\n")
        for i, file_path in enumerate(files[:20], 1):  # Show first 20 for VIIRS
            filename = os.path.basename(file_path)
            try:
                size_gb = get_file_size_from_s3(s3_client, BUCKET, file_path)
                print(f"{i:2}. {filename:<60} ({size_gb:.2f} GB)")
            except Exception:  # size lookup failed; list the file without a size
                print(f"{i:2}. {filename}")
        
        if len(files) > 20:
            print(f"\n... and {len(files) - 20} more files")
        
        print("\n" + "="*60)
        print("\n💡 Use this information to create filename functions in Step 3")
    else:
        print("⚠️ No .tif files found in the specified path.")
        print("   Check your SOURCE_PATH configuration.")
else:
    print("❌ Could not connect to S3. Check your AWS credentials.")
    files = []

🌐 Connecting to S3...
✅ Connected to S3

📂 Files in s3://nasa-disasters/drcs_activations/202504_SevereWx_US/viirs_flood:
Found 1 .tif files:

 1. day_WATER_VIIRS_2025_04_07.tif                               (0.10 GB)


💡 Use this information to create filename functions in Step 3
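
The implementation of `list_s3_files` in `core.s3_operations` is not shown in this notebook. As a rough sketch of what a suffix-filtered listing could look like, the function below uses boto3's `list_objects_v2` paginator. The name `list_keys_with_suffix` and its return shape are assumptions for illustration only, not the project's API.

```python
def list_keys_with_suffix(s3_client, bucket, prefix, suffix=".tif"):
    """Return (key, size_in_bytes) pairs under `prefix` whose key ends with `suffix`.

    Hypothetical helper: `s3_client` is expected to be a boto3 S3 client
    (or any object exposing the same `get_paginator` interface).
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    matches = []
    # list_objects_v2 returns at most 1000 keys per call; the paginator
    # follows continuation tokens so large prefixes are fully listed.
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages for an empty prefix carry no "Contents" entry.
        for obj in page.get("Contents", []):
            if obj["Key"].lower().endswith(suffix.lower()):
                matches.append((obj["Key"], obj["Size"]))
    return matches


# Example usage (needs boto3 and AWS credentials, so it is left commented out):
# import boto3
# client = boto3.client("s3")
# prefix = "drcs_activations/202504_SevereWx_US/viirs_flood"
# for key, size in list_keys_with_suffix(client, "nasa-disasters", prefix):
#     print(f"{key} ({size / 1024**3:.2f} GB)")
```

Returning the size alongside each key avoids a second request per file, since every listing entry already carries `Size` in bytes.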


## 🏷️ Step 3: Define Filename Transformations

Based on the VIIRS files above, define how to transform the filenames:

In [21]:
# ========================================
# FILENAME GENERATION FUNCTIONS FOR VIIRS
# ========================================

import re

def extract_date_from_viirs(filename):
    """Extract date from VIIRS filename.
    VIIRS files often have format: day_WATER_VIIRS_2025_04_07.tif
    """
    # Try to find YYYY_MM_DD pattern
    pattern = r'(\d{4})_(\d{2})_(\d{2})'
    match = re.search(pattern, filename)
    if match:
        year, month, day = match.groups()
        return f"{year}-{month}-{day}"
    
    # Try YYYYMMDD format as fallback
    dates = re.findall(r'\d{8}', filename)
    if dates:
        date_str = dates[0]
        return f"{date_str[0:4]}-{date_str[4:6]}-{date_str[6:8]}"
    
    return None

def create_viirs_flood_filename(original_path, event_name):
    """Create filename for VIIRS flood products.
    Example: day_WATER_VIIRS_2025_04_07.tif → 202504_SevereWx_US_WATER_VIIRS_2025-04-07_day.tif
    """
    filename = os.path.basename(original_path)
    stem = os.path.splitext(filename)[0]
    
    # Extract date
    date = extract_date_from_viirs(stem)
    
    # Determine day or night from the filename (defaults to 'day' if neither appears)
    time_of_day = 'day' if 'day' in stem.lower() else 'night' if 'night' in stem.lower() else 'day'
    
    # Product label (hard-coded for VIIRS water products)
    product = 'WATER_VIIRS'
    
    if date:
        return f"{event_name}_{product}_{date}_{time_of_day}.tif"
    else:
        # Fallback - keep original structure
        return f"{event_name}_{stem}.tif"




In [22]:
# Map product categories to filename creators (the keys are regex patterns)
# For VIIRS, we use the same function for every category
# IMPORTANT: add an 'other' entry to handle uncategorized files;
# without one, those files fall back to the default naming
FILENAME_CREATORS = {
    'water': create_viirs_flood_filename,
}

# Output directories for VIIRS products
# IMPORTANT: Include 'other' to specify where uncategorized files go
OUTPUT_DIRS = {
    'water': 'VIIRS/water',
}

# No-data values for VIIRS (typically use auto-detect)
NODATA_VALUES = {
    # VIIRS flood products often use 0 or -9999
    # Leave empty for auto-detection
}

print("✅ VIIRS filename functions defined")
print("📂 Output directories configured:")
for category, path in OUTPUT_DIRS.items():
    print(f"   • {category} → {path}")

# Test with sample filenames
if files:
    print("\n📝 Example transformations:")
    for i, file_path in enumerate(files[:3], 1):
        sample_name = os.path.basename(file_path)
        new_name = create_viirs_flood_filename(file_path, EVENT_NAME)
        print(f"\n{i}. Original: {sample_name}")
        print(f"   → New:    {new_name}")

✅ VIIRS filename functions defined
📂 Output directories configured:
   • water → VIIRS/water

📝 Example transformations:

1. Original: day_WATER_VIIRS_2025_04_07.tif
   → New:    202504_SevereWx_US_WATER_VIIRS_2025-04-07_day.tif
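
Judging from the `[CHECK]` log line in Step 5, the destination key appears to follow the layout `<DESTINATION_BASE>/<output dir>/<new filename>`. The sketch below assembles a key that way. `build_destination_key` is a hypothetical helper written for illustration, and the layout is inferred from the log, not taken from the `SimpleProcessor` source.

```python
import posixpath


def build_destination_key(destination_base, output_dir, new_filename):
    """Join the parts of an S3 key with forward slashes (hypothetical helper).

    Leading and trailing slashes are stripped so that 'base/' and '/dir/'
    do not produce empty path segments or an absolute-looking key.
    """
    return posixpath.join(
        destination_base.strip("/"),
        output_dir.strip("/"),
        new_filename,
    )


key = build_destination_key(
    "drcs_activations_new",
    "VIIRS/water",
    "202504_SevereWx_US_WATER_VIIRS_2025-04-07_day.tif",
)
print(f"s3://nasa-disasters/{key}")
```

`posixpath` is used instead of `os.path` because S3 keys always use `/`, regardless of the operating system running the notebook.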


## 🚀 Step 4: Initialize Processor and Preview

Set up the processor and preview all transformations:

In [23]:
# Import our simplified helper
from notebooks.notebook_helpers import SimpleProcessor

# Create full configuration
config = {
    'event_name': EVENT_NAME,
    'bucket': BUCKET,
    'source_path': SOURCE_PATH,
    'destination_base': DESTINATION_BASE,
    'overwrite': OVERWRITE,
    'verify': VERIFY,
    'filename_creators': FILENAME_CREATORS,
    'output_dirs': OUTPUT_DIRS,
    'nodata_values': NODATA_VALUES
}

# Initialize processor
processor = SimpleProcessor(config)

# Connect to S3
if processor.connect_to_s3():
    print("✅ Processor ready\n")
    
    # Discover and categorize files
    num_files = processor.discover_files()
    
    if num_files > 0:
        # Show preview
        processor.preview_processing()
        
        print("\n📌 Review the transformations above.")
        print("   If incorrect, adjust the filename functions in Step 3.")
        print("   When ready, proceed to Step 5 to process.")
    else:
        print("⚠️ No files found to process.")
else:
    print("❌ Could not initialize processor.")

✅ All modules loaded successfully

🌐 Connecting to S3...
✅ Connected to S3 successfully
✅ Processor ready


🔍 Searching for files in: drcs_activations/202504_SevereWx_US/viirs_flood
✅ Found 1 files

📊 File Categories:
  • other: 1 files

📋 PROCESSING PREVIEW

Total files to process: 1
Event: 202504_SevereWx_US
Source: s3://nasa-disasters/drcs_activations/202504_SevereWx_US/viirs_flood
Destination: s3://nasa-disasters/drcs_activations_new/

File categories:
  • other: 1 files
    Example: day_WATER_VIIRS_2025_04_07.tif
    → 202504_SevereWx_US_day_WATER_VIIRS_2025_04_07_day.tif

Settings:
  • Compression: ZSTD level 22
  • Overwrite existing: True
  • Verify results: True

📌 Review the transformations above.
   If incorrect, adjust the filename functions in Step 3.
   When ready, proceed to Step 5 to process.
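
Note that the preview lists the sample file under `other`, not `water`, so the Step 3 function is not applied and the default name is used. One possible cause, assuming the category keys are matched as case-sensitive regexes (the Step 3 comment says the keys are regex patterns, but the matching code is not shown here), is that `water` does not match the upper-case `WATER` in the filename. The `categorize` function below is a hypothetical stand-in for that matching step, useful for testing a pattern before re-running the preview.

```python
import re


def categorize(filename, patterns, ignore_case=False):
    """Return the first category whose regex matches `filename`, else 'other'.

    Hypothetical stand-in for the processor's categorization step;
    `patterns` maps a category name to a regex string.
    """
    flags = re.IGNORECASE if ignore_case else 0
    for category, pattern in patterns.items():
        if re.search(pattern, filename, flags):
            return category
    return "other"


name = "day_WATER_VIIRS_2025_04_07.tif"
print(categorize(name, {"water": "water"}))                    # other
print(categorize(name, {"water": "water"}, ignore_case=True))  # water
print(categorize(name, {"water": "WATER"}))                    # water
```

If the real matcher behaves this way, changing the key to a pattern that matches the actual filenames, or adding an `other` entry as the Step 3 comments suggest, would route the file through `create_viirs_flood_filename`.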


## ⚙️ Step 5: Process Files

Process all VIIRS files:

In [24]:
# Process all files
if 'num_files' in locals() and num_files > 0:
    print("🚀 Starting VIIRS flood data processing...")
    print("This may take several minutes depending on file sizes.\n")
    
    # Process everything
    results = processor.process_all()
    
    # Display results
    if not results.empty:
        print("\n✅ VIIRS Processing Complete!")
        try:
            display(results)  # rich table when running in Jupyter/IPython
        except NameError:
            print(results)    # plain-text fallback outside a notebook
else:
    print("⚠️ No files to process. Complete Steps 1-4 first.")

🚀 Starting VIIRS flood data processing...
This may take several minutes depending on file sizes.


🚀 Starting processing...

📦 Processing other (1 files)
⚠️ No filename creator for other, using default
  ⚙️ Processing: day_WATER_VIIRS_2025_04_07.tif (0.1GB)
   [CHECK] Checking if file already exists in S3: s3://nasa-disasters/drcs_activations_new/misc/202504_SevereWx_US_day_WATER_VIIRS_2025_04_07_day.tif
   [OVERWRITE] File exists but overwrite=True, reprocessing: 202504_SevereWx_US_day_WATER_VIIRS_2025_04_07_day.tif
   [INFO] File size: 0.1 GB
   [CONFIG] Using fixed chunks
   [TEMP] Using temp directory: /home/jovyan/disasters-aws-conversion/templates/temp_cog
   [MEMORY] Initial: 264.1 MB, Available: 27718.9 MB
   [DOWNLOAD] Downloading from S3...
   [DOWNLOAD] Downloading from S3: s3://nasa-disasters/drcs_activations/202504_SevereWx_US/viirs_flood/day_WATER_VIIRS_2025_04_07.tif
   [DOWNLOAD] ✅ Downloaded 104.0 MB to data_download/drcs_activations/202504_SevereWx_US/viirs_flood/day_

## 📈 Step 6: Review Results

Analyze processing results:

In [None]:
# Analyze results
if 'results' in locals() and not results.empty:
    print("📊 VIIRS PROCESSING STATISTICS")
    print("="*40)
    
    # Success rate
    total = len(results)
    success = len(results[results['status'] == 'success'])
    failed = len(results[results['status'] == 'failed'])
    skipped = len(results[results['status'] == 'skipped'])
    
    print(f"Total VIIRS files: {total}")
    print(f"✅ Success: {success}")
    print(f"❌ Failed: {failed}")
    print(f"⏭️ Skipped: {skipped}")
    print(f"\nSuccess rate: {(success/total*100):.1f}%")
    
    # Failed files
    if failed > 0:
        print("\n❌ Failed files:")
        failed_df = results[results['status'] == 'failed']
        for idx, row in failed_df.iterrows():
            print(f"  - {row['file']}: {row.get('error', 'Unknown error')}")
    
    # Processing times
    if 'time_seconds' in results.columns:
        success_df = results[results['status'] == 'success']
        if not success_df.empty:
            avg_time = success_df['time_seconds'].mean()
            max_time = success_df['time_seconds'].max()
            total_time = success_df['time_seconds'].sum()
            print("\n⏱️ Timing:")
            print(f"Total time: {total_time/60:.1f} minutes")
            print(f"Average: {avg_time:.1f} seconds per file")
            print(f"Slowest: {max_time:.1f} seconds")
else:
    print("No results to analyze. Run Step 5 first.")

## 💡 VIIRS-Specific Tips

### VIIRS Flood Data Characteristics:
- **Temporal coverage**: Daily observations (day and night)
- **Spatial resolution**: ~375 m
- **Data values**: Often binary or categorical (water/no-water)
- **No-data**: Usually 0 or -9999

### Common VIIRS File Patterns:
- `day_WATER_VIIRS_YYYY_MM_DD.tif`
- `night_WATER_VIIRS_YYYY_MM_DD.tif`
- `VIIRS_flood_YYYYMMDD.tif`
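
To check a date pattern against these naming conventions before processing, a small standalone test can help. The sketch below is an illustrative variant of the Step 3 logic, not a replacement for `extract_date_from_viirs()`. It additionally rejects impossible dates by building a `datetime.date`, and `find_date` is a name made up for this example.

```python
import re
from datetime import date


def find_date(filename):
    """Return the first valid date in `filename` as 'YYYY-MM-DD', or None."""
    # Try YYYY_MM_DD first, then a standalone run of exactly eight digits.
    match = re.search(r"(\d{4})_(\d{2})_(\d{2})", filename) or re.search(
        r"(?<!\d)(\d{4})(\d{2})(\d{2})(?!\d)", filename
    )
    if not match:
        return None
    year, month, day = map(int, match.groups())
    try:
        # date() raises ValueError for values such as month 13 or day 40.
        return date(year, month, day).isoformat()
    except ValueError:
        return None


samples = [
    "day_WATER_VIIRS_2025_04_07.tif",
    "night_WATER_VIIRS_2025_04_08.tif",
    "VIIRS_flood_20250407.tif",
    "VIIRS_flood_20251340.tif",  # not a real date
]
for name in samples:
    print(f"{name:<36} -> {find_date(name)}")
```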

### Processing Notes:
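1. VIIRS files are typically smaller than high-resolution optical imagery (the sample file here is about 0.1 GB)
2. Categorical data uses nearest-neighbor resampling, which keeps the original class values intact
3. COG compression (ZSTD level 22 in this workflow) is very effective on binary data, because long runs of identical values compress well
4. Processing is usually faster than for high-resolution optical imagery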
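
The point about nearest-neighbor resampling can be shown on a toy water mask. The sketch below is pure Python and only illustrative: it is not the resampling code used by the processor, and the "nearest" variant is simplified to picking the top-left cell of each block. It shows that picking an existing cell keeps the values in {0, 1}, while averaging produces fractions that are no longer valid classes.

```python
def downsample_pick(grid, factor):
    """Simplified nearest-neighbor: keep the top-left cell of each block."""
    return [row[::factor] for row in grid[::factor]]


def downsample_mean(grid, factor):
    """Average each factor x factor block; may create values between classes."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            block = [
                grid[i][j]
                for i in range(r, min(r + factor, rows))
                for j in range(c, min(c + factor, cols))
            ]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


# 1 = water, 0 = no water
mask = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
print(downsample_pick(mask, 2))  # [[0, 1], [0, 0]]
print(downsample_mean(mask, 2))  # [[0.25, 1.0], [0.0, 0.75]]
```

Values such as 0.25 have no meaning in a water/no-water product, which is why averaging methods are generally avoided when building overviews for categorical rasters.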

### Troubleshooting:
- If dates aren't extracted correctly, check the pattern in Step 3
- VIIRS flood products may have different naming conventions
- Adjust the `extract_date_from_viirs()` function as needed