# 🎯 Thin Cloud Detection with Reinforcement Learning

**Research Goal**: Improve the CNN's detection of thin/cirrus clouds, which is its main weakness

**Approach**: Multi-Feature RL with thin cloud boost action

---

## Background

**CNN Weakness**: Thin clouds have low reflectance → low CNN probability → missed detection

**Our Solution**: RL agent learns to:
- Identify thin cloud patterns (blue/red ratio, moderate reflectance)
- Apply probability boost specifically for thin clouds
- Filter false positives on shadows using spectral indices

**Key Innovation**: a `thin_cloud_boost` action in [0, 0.4] that raises the CNN cloud probability for pixels identified as thin cloud
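
The boost idea can be sketched in a few lines of NumPy. The thresholds below (reflectance between 1000 and 4000, blue/red ratio above 1.05) are the ones used in the visualization cell later in this notebook. The function name `apply_thin_cloud_boost` and the toy arrays are illustrative assumptions, not the repository's implementation.

```python
import numpy as np

def apply_thin_cloud_boost(cnn_prob, reflectance, blue_red_ratio, boost,
                           refl_range=(1000.0, 4000.0), ratio_min=1.05):
    """Add `boost` to the CNN probability of pixels that look like thin cloud."""
    boost = float(np.clip(boost, 0.0, 0.4))  # action bounds stated above
    is_thin = (
        (reflectance > refl_range[0])
        & (reflectance < refl_range[1])
        & (blue_red_ratio > ratio_min)
    )
    boosted = cnn_prob.copy()
    boosted[is_thin] += boost
    return np.clip(boosted, 0.0, 1.0), is_thin

# Toy 1x3 scene (hypothetical values): thin cloud, thick cloud, clear ground
cnn_prob = np.array([[0.35, 0.90, 0.10]])
reflectance = np.array([[2500.0, 6000.0, 500.0]])
blue_red_ratio = np.array([[1.20, 1.00, 0.90]])

boosted, is_thin = apply_thin_cloud_boost(cnn_prob, reflectance, blue_red_ratio, boost=0.2)
detected = boosted > 0.5  # the thin cloud pixel now crosses the 0.5 threshold
```

In this toy case the thin cloud pixel starts at 0.35, below the 0.5 threshold, and is only detected once the boost is added. The thick cloud and clear pixels are left unchanged.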

## 1️⃣ Setup Environment

In [None]:
# Clone repository
!git clone https://github.com/Usernamenisiya/thesis-cloud-rl.git
%cd thesis-cloud-rl

# Verify
!pwd
!ls -la | head -15

In [None]:
# Install dependencies
!pip install -r requirements.txt
!pip install gymnasium scikit-image

import torch
import stable_baselines3
import rasterio

print("✅ Dependencies installed")
print(f"PyTorch: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

In [None]:
# Check GPU
!nvidia-smi

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"\n✅ Using device: {device}")

## 2️⃣ Load CloudSEN12 Data

In [None]:
# Mount Google Drive
from google.colab import drive
import os
from pathlib import Path

drive.mount('/content/drive')

# Verify CloudSEN12 data exists
cloudsen_path = '/content/drive/MyDrive/Colab_Data/cloudsen12_subset'

if os.path.exists(cloudsen_path):
    num_patches = len([d for d in Path(cloudsen_path).iterdir() if d.is_dir()])
    print(f"✅ CloudSEN12 data found: {num_patches} patches")
    print(f"📂 Location: {cloudsen_path}")

    # Process CloudSEN12 data
    print("\n🔧 Processing CloudSEN12 patches...")
    !python cloudsen12_loader.py
else:
    print(f"❌ CloudSEN12 data not found at: {cloudsen_path}")

In [None]:
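# Verify processed data
import glob

processed_dir = 'data/cloudsen12_processed'
# glob returns files in arbitrary order; sort both lists so that images and
# masks pair up when zipped later (assumes each pair shares a filename prefix)
image_files = sorted(glob.glob(f'{processed_dir}/*_image.tif'))
mask_files = sorted(glob.glob(f'{processed_dir}/*_mask.tif'))

print(f"✅ Found {len(image_files)} image patches")
print(f"✅ Found {len(mask_files)} mask patches")
assert len(image_files) == len(mask_files), "Image/mask counts differ; check the processed data"
print("\n🎯 Ready for thin cloud detection training!")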

## 3️⃣ Baseline CNN Performance

In [None]:
# Evaluate baseline CNN on all patches
from cnn_inference import load_sentinel2_image, get_cloud_mask
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
import numpy as np

print("🧠 Evaluating CNN Baseline")
print("="*60)

all_gt = []
all_cnn = []

for img_path, mask_path in zip(image_files, mask_files):
    image = load_sentinel2_image(img_path)
    cnn_prob = get_cloud_mask(image)
    
    with rasterio.open(mask_path) as src:
        ground_truth = src.read(1)
    
    gt_binary = (ground_truth > 0).astype(np.uint8)
    cnn_binary = (cnn_prob > 0.5).astype(np.uint8)
    
    all_gt.append(gt_binary.flatten())
    all_cnn.append(cnn_binary.flatten())

all_gt = np.concatenate(all_gt)
all_cnn = np.concatenate(all_cnn)

print("\n📊 CNN Baseline (threshold=0.5):")
print(f"  Accuracy:  {accuracy_score(all_gt, all_cnn):.4f}")
print(f"  Precision: {precision_score(all_gt, all_cnn, zero_division=0):.4f}")
print(f"  Recall:    {recall_score(all_gt, all_cnn, zero_division=0):.4f}")
print(f"  F1-Score:  {f1_score(all_gt, all_cnn, zero_division=0):.4f}")
print("\n⚠️ Hypothesis: Poor performance on THIN clouds specifically")

## 4️⃣ Pull Latest Code

In [None]:
# Get latest thin cloud detection code
!git pull origin master
print("✅ Repository updated with thin cloud detection features")

## 🎯 Train Thin Cloud Detection Agent

**Features**:
- Optical thickness indicators (blue/red ratio, reflectance levels)
- Thin vs thick cloud classification
- Spectral indices (NDSI, NDVI)
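
As a rough illustration of these features, the sketch below computes NDVI, NDSI, the blue/red ratio, and the mean reflectance from individual bands. The index formulas are the standard ones: NDVI = (NIR - Red) / (NIR + Red) and NDSI = (Green - SWIR) / (Green + SWIR). The mean over blue, green, red, and NIR mirrors the reflectance calculation in the visualization cell of this notebook. The function names and toy values are assumptions, and the bands are passed separately because the band order in the processed patches is not stated here.

```python
import numpy as np

def normalized_difference(a, b, eps=1e-6):
    """Generic (a - b) / (a + b) index, with eps guarding against division by zero."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return (a - b) / (a + b + eps)

def spectral_features(blue, green, red, nir, swir, eps=1e-6):
    """Per-pixel features of the kind listed above (hypothetical helper)."""
    return {
        "ndvi": normalized_difference(nir, red, eps),
        "ndsi": normalized_difference(green, swir, eps),
        "blue_red_ratio": blue / (red + eps),
        "mean_reflectance": (blue + green + red + nir) / 4.0,
    }

# Two toy pixels (hypothetical values): a bright, bluish pixel and a vegetated one
blue = np.array([1200.0, 300.0])
green = np.array([1100.0, 500.0])
red = np.array([1000.0, 500.0])
nir = np.array([1300.0, 3000.0])
swir = np.array([900.0, 1500.0])

feats = spectral_features(blue, green, red, nir, swir)
```

The first pixel has a blue/red ratio of about 1.2 and a low NDVI. The second has a high NDVI, which is the kind of signal a spectral filter could use to reject non-cloud pixels.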

**Actions**:
- `threshold_delta` [-0.3, +0.3]: Base threshold adjustment
- `thin_cloud_boost` [0, 0.4]: **Extra boost for thin clouds** (KEY!)
- `spectral_weight` [0, 1]: Filter false positives
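
A minimal sketch of how a raw policy output might be mapped onto these three bounded actions. The constant and function names are hypothetical. How the environment actually consumes `spectral_weight` is not shown in this notebook, so it is decoded here but not applied.

```python
import numpy as np

ACTION_NAMES = ("threshold_delta", "thin_cloud_boost", "spectral_weight")
ACTION_LOW = np.array([-0.3, 0.0, 0.0])   # lower bounds listed above
ACTION_HIGH = np.array([0.3, 0.4, 1.0])   # upper bounds listed above

def decode_action(raw_action):
    """Clip a raw 3-vector to the action bounds and label each component."""
    clipped = np.clip(np.asarray(raw_action, dtype=np.float64), ACTION_LOW, ACTION_HIGH)
    return dict(zip(ACTION_NAMES, clipped.tolist()))

# An out-of-range raw action is pulled back into the valid box
decoded = decode_action([0.5, -0.1, 0.7])

# Assumption: threshold_delta shifts the 0.5 baseline threshold
threshold = 0.5 + decoded["threshold_delta"]
```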

**Reward**:
- **BIG BONUS** for detecting thin clouds (5x multiplier)
- Extra rewards for high thin cloud recall (>50%, >70%)
- Penalties for false positives on shadows
- Penalties for missing thick clouds
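
The reward terms above could be combined roughly as follows. Only the 5x thin cloud multiplier and the 50% / 70% recall levels come from the description. The penalty weights, the bonus sizes, and the function name are assumptions made for illustration, and the actual reward in `train_ppo_multifeature.py` may differ.

```python
import numpy as np

def thin_cloud_reward(pred, gt, thin_gt, shadow_mask,
                      thin_multiplier=5.0,      # 5x bonus from the description
                      shadow_fp_penalty=2.0,    # assumed weight
                      thick_miss_penalty=2.0,   # assumed weight
                      recall_bonus=0.5):        # assumed bonus per recall level
    pred, gt = pred.astype(bool), gt.astype(bool)
    thin_gt, shadow_mask = thin_gt.astype(bool), shadow_mask.astype(bool)
    thick_gt = gt & ~thin_gt

    thin_hits = (pred & thin_gt).sum()
    thick_hits = (pred & thick_gt).sum()
    shadow_fp = (pred & ~gt & shadow_mask).sum()   # false positives on shadows
    thick_missed = (~pred & thick_gt).sum()

    reward = (thin_multiplier * thin_hits + thick_hits
              - shadow_fp_penalty * shadow_fp
              - thick_miss_penalty * thick_missed) / pred.size

    thin_total = thin_gt.sum()
    thin_recall = thin_hits / thin_total if thin_total else 0.0
    if thin_recall > 0.5:
        reward += recall_bonus
    if thin_recall > 0.7:
        reward += recall_bonus
    return float(reward), float(thin_recall)

# Toy 10-pixel example (hypothetical values)
gt          = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0])
thin_gt     = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
shadow_mask = np.array([0, 0, 0, 0, 0, 0, 1, 1, 0, 0])
pred        = np.array([1, 1, 1, 0, 1, 0, 1, 0, 0, 0])

reward, thin_recall = thin_cloud_reward(pred, gt, thin_gt, shadow_mask)
```

Weighting thin cloud hits more heavily than thick ones pushes the agent toward recall on the hard class. The two penalties discourage it from simply boosting everything.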

**Training Time**: 2-3 hours

In [None]:
# Train thin cloud detection agent
print("🚀 Training Thin Cloud Detection RL Agent...")
print("This will take approximately 2-3 hours")
print("="*80)

!python train_ppo_multifeature.py

## 📊 Results Analysis

**Key Metrics to Check**:
1. **Thin Cloud Recall**: Did we detect more thin clouds?
2. **Thin Cloud F1-Score**: Overall thin cloud performance
3. **Thick Cloud Recall**: Did we maintain performance on thick clouds?
4. **Overall Improvement**: Total F1-score gain
5. **Thin Cloud Boost Action**: How much boost did the agent learn?
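
For reference, the per-type recall in metrics 1 and 3 can be computed from a prediction and separate thin/thick ground-truth masks, as in the sketch below. The helper name and toy arrays are assumptions. The values stored in `results/multifeature_rl_results.json` are produced by the training script, which may compute them differently.

```python
import numpy as np

def recall_by_cloud_type(pred, thin_gt, thick_gt):
    """Recall of `pred` restricted to thin and to thick ground-truth pixels."""
    pred = np.asarray(pred).astype(bool)
    out = {}
    for name, mask in (("thin", thin_gt), ("thick", thick_gt)):
        mask = np.asarray(mask).astype(bool)
        total = int(mask.sum())
        detected = int((pred & mask).sum())
        out[name] = {
            "pixels_total": total,
            "pixels_detected": detected,
            "recall": detected / total if total else 0.0,
        }
    return out

# Toy example (hypothetical values): one of two thin pixels and both thick pixels are detected
pred     = np.array([1, 0, 1, 1, 0, 0])
thin_gt  = np.array([1, 1, 0, 0, 0, 0])
thick_gt = np.array([0, 0, 1, 1, 0, 0])

per_type = recall_by_cloud_type(pred, thin_gt, thick_gt)
```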

In [None]:
# Load and display results
import json

print("\n" + "="*80)
print("📊 THIN CLOUD DETECTION RESULTS")
print("="*80)

# Load results
with open('results/multifeature_rl_results.json') as f:
    results = json.load(f)

baseline = results['baseline_cnn']
rl_model = results['multifeature_rl']
thin = results['thin_cloud_metrics']
thick = results['thick_cloud_metrics']
actions = results['action_statistics']

print("\n🧠 Baseline CNN:")
print(f"  F1-Score: {baseline['f1_score']:.4f}")
print(f"  Recall:   {baseline['recall']:.4f}")

print("\n🎯 Thin Cloud Detection RL:")
print(f"  Overall F1: {rl_model['f1_score']:.4f} ({results['improvement_percent']:+.2f}% improvement)")

print("\n💡 THIN CLOUD PERFORMANCE (Key Goal!):")
print(f"  Total Thin Clouds: {thin['thin_pixels_total']:,} pixels")
print(f"  Detected: {thin['thin_pixels_detected']:,} pixels")
print(f"  Recall: {thin['recall']:.4f} ({thin['recall']*100:.1f}%)")
print(f"  Precision: {thin['precision']:.4f}")
print(f"  F1-Score: {thin['f1_score']:.4f}")

print("\n☁️ THICK CLOUD PERFORMANCE (Baseline):")
print(f"  Recall: {thick['recall']:.4f} ({thick['recall']*100:.1f}%)")

print("\n📊 Learned Actions:")
print(f"  Threshold Delta: {actions['threshold_delta']['mean']:+.4f}")
print(f"  Thin Cloud Boost: {actions['thin_cloud_boost']['mean']:.4f} (How much extra boost for thin clouds)")
print(f"  Spectral Weight: {actions['spectral_weight']['mean']:.4f}")

print("\n" + "="*80)

# Key insight
if thin['recall'] > 0.5:
    print("✅ SUCCESS: Detected >50% of thin clouds!")
else:
    print("⚠️ Needs improvement: Thin cloud recall <= 50%")

if actions['thin_cloud_boost']['mean'] > 0.1:
    print("✅ Agent learned to use thin cloud boost effectively!")
else:
    print("⚠️ Agent didn't learn to use thin cloud boost much")

## 📸 Visual Comparison: Thin vs Thick Clouds

In [None]:
# Visualize thin vs thick cloud detection
import matplotlib.pyplot as plt
from stable_baselines3 import PPO
from rl_multifeature_environment import MultiFeatureRefinementEnv

# Load model
model_path = sorted(glob.glob('models/ppo_multifeature_*'))[-1]
model = PPO.load(f"{model_path}/model")
print(f"✅ Loaded model: {os.path.basename(model_path)}")

# Select test patches with thin and thick clouds
split_idx = int(0.8 * len(image_files))
test_images = image_files[split_idx:]
test_masks = mask_files[split_idx:]

# Find patches with different cloud types
thin_patch_idx = None
thick_patch_idx = None

for idx, (img_path, mask_path) in enumerate(zip(test_images, test_masks)):
    image = load_sentinel2_image(img_path)
    with rasterio.open(mask_path) as src:
        gt = src.read(1)
    
    # Calculate reflectance
    reflectance = (image[:, :, 1] + image[:, :, 2] + image[:, :, 3] + image[:, :, 7]) / 4.0
    cloud_mask = gt > 0
    
    if cloud_mask.sum() > 1000:  # Has clouds
        mean_reflectance = reflectance[cloud_mask].mean()
        
        if mean_reflectance < 4000 and thin_patch_idx is None:
            thin_patch_idx = idx
        elif mean_reflectance >= 4000 and thick_patch_idx is None:
            thick_patch_idx = idx
    
    if thin_patch_idx is not None and thick_patch_idx is not None:
        break

print(f"Selected patches: Thin={thin_patch_idx}, Thick={thick_patch_idx}")

# Visualize both patches
fig, axes = plt.subplots(2, 5, figsize=(20, 8))

for row, patch_idx in enumerate([thin_patch_idx, thick_patch_idx]):
    if patch_idx is None:
        continue
        
    img_path = test_images[patch_idx]
    mask_path = test_masks[patch_idx]
    
    # Load data
    image = load_sentinel2_image(img_path)
    cnn_prob = get_cloud_mask(image)
    with rasterio.open(mask_path) as src:
        ground_truth = src.read(1)
    
    # Baseline
    baseline_pred = (cnn_prob > 0.5).astype(np.uint8)
    
    # RL prediction
    env = MultiFeatureRefinementEnv(image, cnn_prob, ground_truth, patch_size=64)
    rl_pred = np.zeros_like(ground_truth, dtype=np.uint8)
    
    obs, _ = env.reset()
    for _ in range(env.num_patches):
        i, j = env.current_pos
        action, _ = model.predict(obs, deterministic=True)
        
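        # Apply the thin cloud boost only; threshold_delta (action[0]) and
        # spectral_weight (action[2]) are not applied in this visualization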
        thin_cloud_boost = np.clip(action[1], 0.0, 0.4)
        cnn_patch = cnn_prob[i:i+64, j:j+64].copy()
        
        is_thin = np.logical_and(
            np.logical_and(env.normalized_reflectance[i:i+64, j:j+64] > 1000,
                          env.normalized_reflectance[i:i+64, j:j+64] < 4000),
            env.blue_red_ratio[i:i+64, j:j+64] > 1.05
        )
        cnn_patch[is_thin] += thin_cloud_boost
        rl_pred[i:i+64, j:j+64] = (cnn_patch > 0.5).astype(np.uint8)
        
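        # Gymnasium's step() returns (obs, reward, terminated, truncated, info)
        obs, _, terminated, truncated, _ = env.step(action)
        if terminated or truncated:
            break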
    
    # Ground truth
    gt_binary = (ground_truth > 0).astype(np.uint8)
    
    # Classify thin vs thick in GT
    thin_gt = env.thin_clouds_gt
    thick_gt = env.thick_clouds_gt
    
    # RGB
    rgb = image[:, :, [3, 2, 1]]
    rgb = np.clip((rgb - np.percentile(rgb, 2)) / (np.percentile(rgb, 98) - np.percentile(rgb, 2)), 0, 1)
    
    cloud_type = "THIN" if row == 0 else "THICK"
    
    axes[row, 0].imshow(rgb)
    axes[row, 0].set_title(f'{cloud_type} Cloud Patch\nRGB Image', fontsize=10)
    axes[row, 0].axis('off')
    
    axes[row, 1].imshow(gt_binary, cmap='gray')
    axes[row, 1].set_title(f'Ground Truth\n{thin_gt.sum():,} thin, {thick_gt.sum():,} thick', fontsize=10)
    axes[row, 1].axis('off')
    
    axes[row, 2].imshow(baseline_pred, cmap='gray')
    baseline_f1 = f1_score(gt_binary.flatten(), baseline_pred.flatten())
    axes[row, 2].set_title(f'Baseline CNN\nF1: {baseline_f1:.3f}', fontsize=10)
    axes[row, 2].axis('off')
    
    axes[row, 3].imshow(rl_pred, cmap='gray')
    rl_f1 = f1_score(gt_binary.flatten(), rl_pred.flatten())
    axes[row, 3].set_title(f'Thin Cloud Detection RL\nF1: {rl_f1:.3f}', fontsize=10)
    axes[row, 3].axis('off')
    
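    # Difference: +1 where RL adds a cloud pixel, -1 where it removes one
    # (not checked against ground truth, so green is not always a correct fix)
    diff = rl_pred.astype(int) - baseline_pred.astype(int)
    axes[row, 4].imshow(diff, cmap='RdYlGn', vmin=-1, vmax=1)
    axes[row, 4].set_title('Change vs Baseline\nGreen=Added, Red=Removed', fontsize=10)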
    axes[row, 4].axis('off')

plt.tight_layout()
plt.savefig('results/thin_cloud_detection_visual.png', dpi=150, bbox_inches='tight')
print("\n✅ Visualization saved to: results/thin_cloud_detection_visual.png")
plt.show()

## 🎓 Thesis Conclusions

### Key Findings:

1. **Thin Cloud Detection**:
   - Baseline CNN recall on thin clouds: ~___%
   - Our RL approach: ~___% (improvement: ___)
   
2. **Novel Contribution**:
   - Introduced `thin_cloud_boost` action
   - Agent learns to identify and boost confidence for thin cloud patterns
   - Uses optical thickness indicators (blue/red ratio, reflectance)
   
3. **Maintains Performance on Thick Clouds**:
   - Thick cloud recall: ~___% (should be >90%)
   - No degradation on easy cases
   
4. **Addresses Original Research Question**:
   - CNN weakness on thin clouds: **Confirmed**
   - RL can improve thin cloud detection: **[Validated/Needs more work]**
   - Methodology applicable to other CNN weaknesses: **Yes**

### Next Steps:
- Phase 2: Shadow detection and removal
- Phase 3: Hierarchical refinement for cloud boundaries
- Phase 4: Ensemble and comprehensive validation