## 1️⃣ Clone & Setup

In [3]:
# Clone repository
!git clone https://github.com/Usernamenisiya/thesis-cloud-rl.git
%cd thesis-cloud-rl

# Verify
!pwd
!ls -la | head -15

Cloning into 'thesis-cloud-rl'...
remote: Enumerating objects: 1765, done.
remote: Counting objects: 100% (23/23), done.
remote: Compressing objects: 100% (17/17), done.
remote: Total 1765 (delta 8), reused 16 (delta 6), pack-reused 1742 (from 2)
Receiving objects: 100% (1765/1765), 654.03 MiB | 16.78 MiB/s, done.
Resolving deltas: 100% (244/244), done.
Updating files: 100% (2632/2632), done.
/content/thesis-cloud-rl/thesis-cloud-rl
/content/thesis-cloud-rl/thesis-cloud-rl
total 40064
drwxr-xr-x 4 root root     4096 Jan 12 14:24 .
drwxr-xr-x 5 root root     4096 Jan 12 14:24 ..
-rw-r--r-- 1 root root     3567 Jan 12 14:24 analyze_data_distribution.py
-rw-r--r-- 1 root root     5945 Jan 12 14:24 cloudsen12_loader.py
-rw-r--r-- 1 root root     1661 Jan 12 14:24 cnn_inference.py
-rw-r--r-- 1 root root   200302 Jan 12 14:24 colab_training.ipynb
drwxr-xr-x 3 root root     4096 Jan 12 14:24 data
-rw-r--r-- 1 root root     4697 Jan 12 14:24 data_download.py
-rw-r--r-- 1 root root 

In [4]:
# Install dependencies
!pip install -r requirements.txt
!pip install gymnasium  # Updated from deprecated gym

import torch
import stable_baselines3
import rasterio

print("✅ Dependencies installed")
print(f"PyTorch: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

✅ Dependencies installed
PyTorch: 2.9.0+cu126
CUDA available: True


In [5]:
# Check GPU
!nvidia-smi

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"\n✅ Using device: {device}")

Mon Jan 12 14:25:40 2026       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA A100-SXM4-40GB          Off |   00000000:00:04.0 Off |                    0 |
| N/A   34C    P0             42W /  400W |       5MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
                                                

## 2️⃣ Setup CloudSEN12 Real Ground Truth Data

**Using CloudSEN12 expert-labeled dataset:**
- Already downloaded: 100 patches in `Colab_Data/cloudsen12_subset/`
- Process with `cloudsen12_loader.py` to extract 10 bands
- **26M pixels** for robust evaluation
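
The 26M-pixel figure follows directly from the patch count and patch size. A quick check, assuming 512 × 512 patches (the mask shape the loader reports in this run):

```python
# Patch count and patch size as reported by cloudsen12_loader.py in this run.
num_patches = 100
height, width = 512, 512

total_pixels = num_patches * height * width
print(f"{total_pixels:,} pixels")
```

This matches the total pixel count printed by the baseline evaluation further down.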

In [6]:
# Mount Google Drive
from google.colab import drive
import os
from pathlib import Path

drive.mount('/content/drive')

# Verify CloudSEN12 data exists
cloudsen_path = '/content/drive/MyDrive/Colab_Data/cloudsen12_subset'

if os.path.exists(cloudsen_path):
    num_patches = len([d for d in Path(cloudsen_path).iterdir() if d.is_dir()])
    print(f"✅ CloudSEN12 data found: {num_patches} patches")
    print(f"📂 Location: {cloudsen_path}")

    # Process CloudSEN12 data with loader (extracts 10 bands, converts masks)
    print("\n🔧 Processing CloudSEN12 patches...")
    !python cloudsen12_loader.py
else:
    print(f"❌ CloudSEN12 data not found at: {cloudsen_path}")
    print("Please run CloudSEN12 download notebook first")

Mounted at /content/drive
✅ CloudSEN12 data found: 100 patches
📂 Location: /content/drive/MyDrive/Colab_Data/cloudsen12_subset

🔧 Processing CloudSEN12 patches...

🔧 Preparing CloudSEN12 for Training
📦 Loading CloudSEN12 Data

✅ Found 100 patches to load

✅ Successfully loaded 100 patches
📊 Image shape: (512, 512, 10)
📊 Mask shape: (512, 512)
📊 Image bands: 10
📊 Cloud coverage: 16.0%

💾 Saving 100 patches to data/cloudsen12_processed
  Saved: patch_000
  Saved: patch_001
  Saved: patch_002
  Saved: patch_003
  Saved: patch_004
  Saved: patch_005
  Saved: patch_006
  Saved: patch_007
  Saved: patch_008
  Saved: patch_009
  Saved: patch_010
  Saved: patch_011
  Saved: patch_012
  Saved: patch_013
  Saved: patch_014
  Saved: patch_015
  Saved: patch_016
  Saved: patch_017
  Saved: patch_018
  Saved: patch_019
  Saved: patch_020
  Saved: patch_021
  Saved: patch_022
  Saved: patch_023
  Saved: patch_024
  Saved: patch_025
  Saved: patch

In [7]:
# Verify processed CloudSEN12 data
import os
from pathlib import Path
import glob

processed_dir = 'data/cloudsen12_processed'
image_files = glob.glob(f'{processed_dir}/*_image.tif')
mask_files = glob.glob(f'{processed_dir}/*_mask.tif')

if len(image_files) > 0 and len(mask_files) > 0:
    print("✅ CloudSEN12 data processed successfully!")
    print(f"📊 Found {len(image_files)} image patches")
    print(f"📊 Found {len(mask_files)} mask patches")
    print("\n🎯 Ready for training with real ground truth!")
else:
    print("❌ Processed data not found")
    print("Please check cloudsen12_loader.py output for errors")

✅ CloudSEN12 data processed successfully!
📊 Found 100 image patches
📊 Found 100 mask patches

🎯 Ready for training with real ground truth!


## 3️⃣ Check CNN Baseline

In [9]:
# Load and test CNN baseline on CloudSEN12 patches
from cnn_inference import load_sentinel2_image, get_cloud_mask
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
import rasterio
import numpy as np
import glob

print("🧠 Evaluating CNN Baseline on CloudSEN12 Real Ground Truth")
print("="*60)

# Load all processed CloudSEN12 patches
image_files = sorted(glob.glob('data/cloudsen12_processed/*_image.tif'))
mask_files = sorted(glob.glob('data/cloudsen12_processed/*_mask.tif'))

all_gt = []
all_cnn = []

print(f"Processing {len(image_files)} patches...\n")

for img_path, mask_path in zip(image_files, mask_files):  # Use ALL patches
    # Load image and get CNN prediction
    image = load_sentinel2_image(img_path)
    cnn_prob = get_cloud_mask(image)

    # Load real ground truth
    with rasterio.open(mask_path) as src:
        ground_truth = src.read(1)

    # Binary conversion
    gt_binary = (ground_truth > 0).astype(np.uint8)
    cnn_binary = (cnn_prob > 0.5).astype(np.uint8)

    all_gt.append(gt_binary.flatten())
    all_cnn.append(cnn_binary.flatten())

# Combine all patches
all_gt = np.concatenate(all_gt)
all_cnn = np.concatenate(all_cnn)

# Calculate metrics
accuracy = accuracy_score(all_gt, all_cnn)
precision = precision_score(all_gt, all_cnn, zero_division=0)
recall = recall_score(all_gt, all_cnn, zero_division=0)
f1 = f1_score(all_gt, all_cnn, zero_division=0)

print(f"\n📊 Evaluated on {len(image_files)} CloudSEN12 patches")
print(f"📊 Total pixels: {len(all_gt):,}")
print("\n🧠 CNN Baseline (Real Ground Truth):")
print(f"  Accuracy:  {accuracy:.4f}")
print(f"  Precision: {precision:.4f}")
print(f"  Recall:    {recall:.4f}")
print(f"  F1-Score:  {f1:.4f}")
print(f"📊 CNN predicted: {all_cnn.sum():,} cloud pixels ({all_cnn.mean()*100:.1f}%)")
print(f"\n📊 Ground truth: {all_gt.sum():,} cloud pixels ({all_gt.mean()*100:.1f}%)")

🧠 Evaluating CNN Baseline on CloudSEN12 Real Ground Truth
Processing 100 patches...






📊 Evaluated on 100 CloudSEN12 patches
📊 Total pixels: 26,214,400

🧠 CNN Baseline (Real Ground Truth):
  Accuracy:  0.6652
  Precision: 0.1313
  Recall:    0.1935
  F1-Score:  0.1564
📊 CNN predicted: 6,198,343 cloud pixels (23.6%)

📊 Ground truth: 4,205,740 cloud pixels (16.0%)
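
To put these numbers in context, the short check below recomputes F1 from the reported precision and recall, and compares the CNN's accuracy with a trivial predictor that labels every pixel as clear. It is only a sanity check on the printed values, not part of the evaluation pipeline.

```python
# Values copied from the output above (threshold 0.5, all 100 patches).
precision, recall = 0.1313, 0.1935
cloud_fraction = 4_205_740 / 26_214_400

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# Accuracy of a predictor that outputs "no cloud" for every pixel.
all_clear_accuracy = 1 - cloud_fraction

print(f"F1 from precision and recall: {f1:.4f}")
print(f"All-clear accuracy:           {all_clear_accuracy:.4f}")
```

The recomputed F1 agrees with the reported 0.1564. The all-clear accuracy comes out at roughly 0.84, above the CNN's 0.6652, so F1 is the more informative metric on this class-imbalanced data.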




## 4️⃣ Pull Latest Code & Train PPO

In [10]:
# Get latest code with PPO improvements
!git pull origin master
print("✅ Repository updated")

From https://github.com/Usernamenisiya/thesis-cloud-rl
 * branch            master     -> FETCH_HEAD
Already up to date.
✅ Repository updated


In [11]:
# Optimize CNN threshold for fair baseline comparison
print("🔍 Finding optimal CNN threshold...")
!python optimize_cnn_threshold.py

🔍 Finding optimal CNN threshold...
🔍 Optimizing CNN Threshold on CloudSEN12

📂 Loading 100 CloudSEN12 patches...
  dataset = DatasetReader(path, driver=driver, sharing=sharing, **kwargs)
✅ Data loaded

🔬 Testing thresholds...
  Threshold 0.10: F1=0.2435, Acc=0.3680, Prec=0.1507, Rec=0.6341
  Threshold 0.15: F1=0.2293, Acc=0.4220, Prec=0.1459, Rec=0.5360
  Threshold 0.20: F1=0.2216, Acc=0.4737, Prec=0.1453, Rec=0.4670
  Threshold 0.25: F1=0.2122, Acc=0.5077, Prec=0.1427, Rec=0.4132
  Threshold 0.30: F1=0.2046, Acc=0.5424, Prec=0.1418, Rec=0.3668
  Threshold 0.35: F1=0.1950, Acc=0.5765, Prec=0.1403, Rec=0.3198
  Threshold 0.40: F1=0.1829, Acc=0.6090, Prec=0.1376, Rec=0.2728
  Threshold 0.45: F1=0.1709, Acc=0.6414, Prec=0.1359, Rec=0.2304
  Threshold 0.50: F1=0.1564, Acc=0.6652, Prec=0.1313, Rec=0.1935
  Threshold 0.55: F1=0.1411, Acc=0.6822, Prec=0.1246, Rec=0.1627
  Threshold 0.60: F1=0.1285, Acc=0.6935, Prec=0.1181, Rec=0.1409
  Threshold 0.65: F1=0.1160, Acc=0.7014, Prec
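
In the recorded sweep, F1 falls steadily as the threshold rises, from 0.2435 at 0.10 to 0.1564 at 0.50. The code of `optimize_cnn_threshold.py` is not shown in this notebook, so the following is only a minimal sketch of such a search. The helper names `f1_at_threshold` and `sweep_thresholds` are hypothetical, and the data is synthetic rather than CloudSEN12.

```python
import numpy as np

def f1_at_threshold(prob, gt, threshold):
    """F1 for the cloud class when `prob > threshold` is labeled cloud."""
    pred = prob > threshold
    tp = np.count_nonzero(pred & gt)
    fp = np.count_nonzero(pred & ~gt)
    fn = np.count_nonzero(~pred & gt)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def sweep_thresholds(prob, gt, thresholds):
    """Return (best_threshold, best_f1) over the candidate thresholds."""
    scores = [f1_at_threshold(prob, gt, t) for t in thresholds]
    best = int(np.argmax(scores))
    return float(thresholds[best]), float(scores[best])

# Synthetic stand-in data (NOT CloudSEN12): about 16% cloud pixels,
# with cloud pixels receiving somewhat higher probabilities.
rng = np.random.default_rng(0)
gt = rng.random(10_000) < 0.16
prob = np.clip(0.3 * gt + rng.normal(0.4, 0.15, gt.shape), 0.0, 1.0)

thresholds = np.round(np.linspace(0.10, 0.90, 17), 2)  # 0.10, 0.15, ..., 0.90
best_t, best_f1 = sweep_thresholds(prob, gt, thresholds)
print(f"best threshold={best_t:.2f}, F1={best_f1:.4f}")
```

If the threshold is selected on the same pixels used for reporting, the resulting F1 is optimistic; selecting on the train split and reporting on the test split avoids that.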

## 🎯 Comprehensive Approach: Three Methods

We'll implement three approaches, each building on the previous one:

1. **Optimal Threshold (Classical)** - Grid search, about 5 minutes
2. **CNN Fine-Tuning (Transfer Learning)** - Domain adaptation, about 30 minutes
3. **RL Threshold Refinement (Novel)** - Spatially-adaptive thresholds, about 1 hour

This provides:
- ✅ Multiple baselines for comparison
- ✅ Progressive improvement narrative
- ✅ A novel RL contribution that is designed to improve on the fixed-threshold results

### 📊 Approach 1: Optimal Threshold (Grid Search)

In [None]:
# Optimal threshold grid search (fast, no training)
print("🔍 Finding optimal CNN threshold via grid search...")
print("Testing thresholds from 0.1 to 0.9 on train set")
print("="*60)

!python optimize_threshold_grid_search.py

### 🔥 Approach 2: CNN Fine-Tuning (Transfer Learning)

In [None]:
# Fine-tune CNN on CloudSEN12 train set (30 minutes)
print("🔥 Fine-tuning CNN on CloudSEN12 with transfer learning...")
print("Low learning rate (1e-5) for 10 epochs")
print("="*60)

!python finetune_cnn_cloudsen12.py

### 🎯 Approach 3: RL Adaptive Threshold Refinement (Novel Contribution)

In [None]:
# RL-based adaptive threshold refinement (1 hour)
print("🎯 Training RL agent for spatially-adaptive thresholds...")
print("Agent learns to adjust CNN threshold per patch based on local context")
print("Action: continuous threshold delta [-0.3, +0.3]")
print("Reward: F1-score improvement over baseline")
print("="*60)

!python train_ppo_threshold_refinement.py
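
The action and reward described in the cell above can be sketched as follows. This is an illustration under stated assumptions, not the code in `train_ppo_threshold_refinement.py`: the function names are hypothetical, the base threshold of 0.5 is assumed from the baseline evaluation, and the reward is taken to be the per-patch F1 difference.

```python
import numpy as np

def f1_binary(pred, gt):
    """F1 for the positive (cloud) class on boolean arrays."""
    tp = np.count_nonzero(pred & gt)
    fp = np.count_nonzero(pred & ~gt)
    fn = np.count_nonzero(~pred & gt)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def threshold_step(prob_patch, gt_patch, delta, base_threshold=0.5, max_delta=0.3):
    """Apply a clipped threshold adjustment to one patch.

    Returns (refined_mask, reward) where
    reward = F1(adjusted threshold) - F1(base threshold).
    """
    delta = float(np.clip(delta, -max_delta, max_delta))
    refined = prob_patch > (base_threshold + delta)
    baseline = prob_patch > base_threshold
    reward = f1_binary(refined, gt_patch) - f1_binary(baseline, gt_patch)
    return refined, reward

# Toy 2x2 patch: the CNN is under-confident on two true cloud pixels.
prob_patch = np.array([[0.45, 0.40], [0.55, 0.10]])
gt_patch = np.array([[True, True], [True, False]])

mask, reward = threshold_step(prob_patch, gt_patch, delta=-0.15)
print(mask.astype(int))
print(f"reward = {reward:+.2f}")
```

In this toy patch, lowering the threshold to 0.35 recovers the two missed cloud pixels, so F1 rises from 0.5 to 1.0 and the reward is positive.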

### 📊 Compare All Approaches

In [None]:
# Compare all three approaches
import json
from pathlib import Path

print("\n" + "="*60)
print("📊 COMPREHENSIVE COMPARISON - ALL APPROACHES")
print("="*60)

# Baseline CNN (threshold=0.5), hardcoded from the 20-patch test-split evaluation
# (the 100-patch evaluation above gave F1=0.1564, Acc=0.6652)
baseline_f1 = 0.2571
baseline_acc = 0.6719

print("\n🧠 Baseline CNN (threshold=0.5):")
print(f"  Accuracy:  {baseline_acc:.4f}")
print(f"  F1-Score:  {baseline_f1:.4f}")

# Load optimal threshold results
# Note: these are train-split metrics, while the hardcoded baseline is from the
# test split, so the improvement printed here is indicative only
if Path('results/optimal_threshold_results.json').exists():
    with open('results/optimal_threshold_results.json') as f:
        opt_results = json.load(f)
    opt_threshold = opt_results['best_threshold']
    opt_f1 = opt_results['train_metrics']['f1_score']
    opt_acc = opt_results['train_metrics']['accuracy']

    print(f"\n📊 Approach 1: Optimal Threshold ({opt_threshold:.2f}):")
    print(f"  Accuracy:  {opt_acc:.4f}")
    print(f"  F1-Score:  {opt_f1:.4f}")
    print(f"  Improvement: {(opt_f1 - baseline_f1) / baseline_f1 * 100:+.2f}%")

# Load fine-tuned CNN results
if Path('results/cnn_finetuning_results.json').exists():
    with open('results/cnn_finetuning_results.json') as f:
        finetune_results = json.load(f)
    ft_f1 = finetune_results['finetuned_metrics']['f1_score']
    ft_acc = finetune_results['finetuned_metrics']['accuracy']
    
    print("\n🔥 Approach 2: Fine-Tuned CNN:")
    print(f"  Accuracy:  {ft_acc:.4f}")
    print(f"  F1-Score:  {ft_f1:.4f}")
    print(f"  Improvement: {(ft_f1 - baseline_f1) / baseline_f1 * 100:+.2f}%")

# Load RL threshold refinement results
if Path('results/threshold_refinement_results.json').exists():
    with open('results/threshold_refinement_results.json') as f:
        rl_results = json.load(f)
    rl_f1 = rl_results['rl_threshold_refinement']['f1_score']
    rl_acc = rl_results['rl_threshold_refinement']['accuracy']
    
    print("\n🎯 Approach 3: RL Adaptive Threshold:")
    print(f"  Accuracy:  {rl_acc:.4f}")
    print(f"  F1-Score:  {rl_f1:.4f}")
    print(f"  Improvement: {(rl_f1 - baseline_f1) / baseline_f1 * 100:+.2f}%")
    
    mean_delta = rl_results['threshold_statistics']['mean_delta']
    print(f"  Mean threshold adjustment: {mean_delta:+.4f}")

print("\n" + "="*60)
print("✅ All approaches evaluated!")
print("="*60)

In [12]:
# Run PPO training (main step - takes 1-2 hours)
print("🚀 Starting PPO training...")
print("This will take 1-2 hours with GPU")
print("="*60)

!python train_ppo.py

🚀 Starting PPO training...
This will take 1-2 hours with GPU
2026-01-12 14:36:50.150236: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1768228610.171002    8622 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1768228610.177290    8622 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1768228610.193083    8622 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1768228610.193108    8622 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1768228610.

### 🔄 Train PPO with Multiple Patches (Better Generalization)

In [None]:
# ⚡ FAST Multi-Patch PPO training (pre-loads all patches into RAM)
print("🚀 Starting FAST Multi-Patch PPO training...")
print("Pre-loading 80 patches into RAM to avoid I/O overhead")
print("="*60)

!python train_ppo_multipatch_fast.py

In [None]:
# Save multi-patch model to Google Drive
import glob
import os
import shutil

# Timestamped directory names sort chronologically, so the last one is the newest
model_dirs = sorted(glob.glob("models/ppo_multipatch_model_*"))
if model_dirs:
    latest_model = model_dirs[-1]
    drive_model_path = "/content/drive/MyDrive/Colab_Data/ppo_multipatch_model_final"

    print(f"📦 Copying multi-patch model to Drive: {drive_model_path}")
    # Replace any previous copy so the Drive folder holds only the latest model
    if os.path.exists(drive_model_path):
        shutil.rmtree(drive_model_path)
    shutil.copytree(latest_model, drive_model_path)
    print("✅ Multi-patch model saved to Google Drive!")
else:
    print("❌ No multi-patch model found!")

In [None]:
!python evaluate_saved_model.py models/ppo_cloud_refinement_model_20260112_062116

2026-01-12 07:23:24.410071: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1768202604.431626   31076 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1768202604.438035   31076 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1768202604.454205   31076 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1768202604.454240   31076 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1768202604.454243   31076 computation_placer.cc:177] computation placer alr

### 📦 Backup Current Results (Single-Patch Training)

In [None]:
# Preserve the single-patch training results for manuscript
import shutil
import os

# Backup paths
drive_backup = "/content/drive/MyDrive/Colab_Data/thesis_results_backup_single_patch"
os.makedirs(drive_backup, exist_ok=True)

# Copy current model to backup location
current_model = "models/ppo_cloud_refinement_model_20260112_150456"
if os.path.exists(current_model):
    backup_model = f"{drive_backup}/ppo_single_patch_model"
    if os.path.exists(backup_model):
        shutil.rmtree(backup_model)
    shutil.copytree(current_model, backup_model)
    print(f"✅ Model backed up to: {backup_model}")

# Copy results JSON
if os.path.exists("results/ppo_training_results.json"):
    shutil.copy("results/ppo_training_results.json", f"{drive_backup}/single_patch_results.json")
    print(f"✅ Results backed up to: {drive_backup}/single_patch_results.json")

# Copy refined mask
if os.path.exists("data/ppo_refined_cloud_mask.tif"):
    shutil.copy("data/ppo_refined_cloud_mask.tif", f"{drive_backup}/single_patch_refined_mask.tif")
    print("✅ Refined mask backed up")

print(f"\n📂 Backup location: {drive_backup}")
print("✅ Single-patch training results preserved for manuscript!")

In [None]:
!python evaluate_saved_model.py models/ppo_cloud_refinement_model_20260112_065746

2026-01-12 07:14:10.610183: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1768202050.631174   28746 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1768202050.637559   28746 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1768202050.653443   28746 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1768202050.653470   28746 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1768202050.653473   28746 computation_placer.cc:177] computation placer alr

## 5️⃣ Results & Download

In [13]:
# Display training results
import json
from pathlib import Path

results_file = Path('results/ppo_training_results.json')

if results_file.exists():
    with open(results_file) as f:
        results = json.load(f)

    cnn = results['cnn_baseline']
    ppo = results['ppo_refined']
    imp = results['improvements']

    print("\n" + "="*60)
    print("📈 PPO TRAINING RESULTS")
    print("="*60)

    print("\n🧠 CNN Baseline:")
    print(f"  Accuracy:  {cnn['accuracy']:.4f}")
    print(f"  Precision: {cnn['precision']:.4f}")
    print(f"  Recall:    {cnn['recall']:.4f}")
    print(f"  F1-Score:  {cnn['f1_score']:.4f}")

    print("\n🤖 PPO Refined:")
    print(f"  Accuracy:  {ppo['accuracy']:.4f}")
    print(f"  Precision: {ppo['precision']:.4f}")
    print(f"  Recall:    {ppo['recall']:.4f}")
    print(f"  F1-Score:  {ppo['f1_score']:.4f}")

    print("\n🎯 Improvements:")
    print(f"  F1-Score:  {imp['f1_score_percent']:+.2f}%")
    print(f"  Accuracy:  {imp['accuracy_percent']:+.2f}%")
    print(f"  Precision: {imp['precision_delta']:+.4f}")
    print(f"  Recall:    {imp['recall_delta']:+.4f}")
    print("\n" + "="*60)
else:
    print("❌ Results file not found")
    print("Make sure PPO training completed successfully")


📈 PPO TRAINING RESULTS

🧠 CNN Baseline:
  Accuracy:  0.6719
  Precision: 0.1918
  Recall:    0.3898
  F1-Score:  0.2571

🤖 PPO Refined:
  Accuracy:  0.5011
  Precision: 0.1347
  Recall:    0.4473
  F1-Score:  0.2071

🎯 Improvements:
  F1-Score:  -19.45%
  Accuracy:  -25.42%
  Precision: -0.0571
  Recall:    +0.0575
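
The "Improvements" block mixes two kinds of change: F1 and accuracy are reported as relative percentages, while precision and recall are absolute differences. The sketch below reproduces the printed values from the rounded metrics above; the helper name `relative_change_percent` is hypothetical.

```python
def relative_change_percent(new, old):
    """Relative change of `new` with respect to `old`, in percent."""
    return (new - old) / old * 100 if old > 0 else 0.0

# Rounded metrics copied from the output above.
cnn = {"accuracy": 0.6719, "precision": 0.1918, "recall": 0.3898, "f1_score": 0.2571}
ppo = {"accuracy": 0.5011, "precision": 0.1347, "recall": 0.4473, "f1_score": 0.2071}

f1_change = relative_change_percent(ppo["f1_score"], cnn["f1_score"])
acc_change = relative_change_percent(ppo["accuracy"], cnn["accuracy"])
precision_delta = ppo["precision"] - cnn["precision"]
recall_delta = ppo["recall"] - cnn["recall"]

print(f"F1-Score:  {f1_change:+.2f}%")
print(f"Accuracy:  {acc_change:+.2f}%")
print(f"Precision: {precision_delta:+.4f}")
print(f"Recall:    {recall_delta:+.4f}")
```

A negative F1 change means the PPO-refined mask scored below the CNN baseline in this run; the recall gain did not offset the precision loss.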



In [14]:
# Save to Google Drive
import shutil
from pathlib import Path

gdrive_results = '/content/drive/MyDrive/Colab_Data/thesis_results'
Path(gdrive_results).mkdir(parents=True, exist_ok=True)

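# Copy results
try:
    shutil.copy('results/ppo_training_results.json', f'{gdrive_results}/ppo_results.json')
    print("✅ Results saved to Google Drive")
except OSError as e:
    print(f"⚠️  Could not save results to Google Drive: {e}")

# Copy model artifacts (.zip files, or directories if a model was saved as one)
import glob
model_files = glob.glob('models/ppo_cloud_refinement_model*')
try:
    for f in model_files:
        dest = f'{gdrive_results}/{Path(f).name}'
        if Path(f).is_dir():
            shutil.copytree(f, dest, dirs_exist_ok=True)
        else:
            shutil.copy(f, dest)
    if model_files:
        print("✅ Model saved to Google Drive")
    else:
        print("⚠️  No model files found under models/")
except OSError as e:
    print(f"⚠️  Could not save model: {e}")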

print(f"\n📂 Results at: {gdrive_results}")

✅ Results saved to Google Drive
✅ Model saved to Google Drive

📂 Results at: /content/drive/MyDrive/Colab_Data/thesis_results


## ✅ Summary

**Done!** Your PPO agent has been trained.

**What happened:**
1. ✅ Loaded CNN baseline performance
2. ✅ Trained PPO with balanced reward structure
3. ✅ Evaluated on test data
4. ✅ Saved results and model

**Key changes in this PPO setup:**
- Entropy coefficient to encourage exploration
- Policy-gradient updates, intended to cope better with the shaped reward
- Larger patch size (64×64) for more context
- 100k timesteps to allow more time for convergence

**Result of the recorded run:** PPO refinement lowered F1 from 0.2571 to 0.2071 (-19.45%) and accuracy from 0.6719 to 0.5011, while recall rose from 0.3898 to 0.4473.

**Next steps:**
1. Download results from Google Drive
2. Analyze the refined cloud mask
3. Tune hyperparameters and the reward, since the recorded run did not beat the CNN baseline

**For thesis writing:**
- See `thesis_recommendations.md` for advanced techniques
- Check `training_results.json` for detailed metrics

## 🎯 Evaluate Saved Model on Test Set (80/20 split)

In [16]:
# Load the saved model
import glob
import os
from pathlib import Path
from stable_baselines3 import PPO

# Define the Google Drive results directory (this should be consistent)
gdrive_results = '/content/drive/MyDrive/Colab_Data/thesis_results'

# Find the latest PPO model saved in that directory
# Models are saved as 'ppo_cloud_refinement_model_<timestamp>.zip'
model_files_in_drive = glob.glob(os.path.join(gdrive_results, 'ppo_cloud_refinement_model*.zip'))

if not model_files_in_drive:
    print(f"❌ No PPO model found in {gdrive_results}. Please ensure the training step completed and saved the model correctly.")
    # You might want to exit or raise an error here depending on desired behavior
    raise FileNotFoundError(f"No PPO model found in {gdrive_results}")

# Sort by modification time to get the latest model
model_files_in_drive.sort(key=os.path.getmtime, reverse=True)
model_path = model_files_in_drive[0]

print(f"🤖 Loading model from: {model_path}")

model = PPO.load(model_path)
print("✅ Model loaded successfully")

🤖 Loading model from: /content/drive/MyDrive/Colab_Data/thesis_results/ppo_cloud_refinement_model_20260112_150456.zip
✅ Model loaded successfully




In [19]:
# Load test data (20 patches: indices 80-99)
import glob
import os

data_dir = "data/cloudsen12_processed"
image_files = sorted(glob.glob(os.path.join(data_dir, "*_image.tif")))
mask_files = sorted(glob.glob(os.path.join(data_dir, "*_mask.tif")))

# 80/20 split
split_idx = int(0.8 * len(image_files))
test_image_files = image_files[split_idx:]
test_mask_files = mask_files[split_idx:]

print(f"📊 Total patches: {len(image_files)}")
print(f"📊 Train patches: {split_idx} (indices 0-{split_idx-1})")
print(f"📊 Test patches: {len(test_image_files)} (indices {split_idx}-{len(image_files)-1})")
print(f"✅ Test set loaded: {len(test_image_files)} patches")

📊 Total patches: 100
📊 Train patches: 80 (indices 0-79)
📊 Test patches: 20 (indices 80-99)
✅ Test set loaded: 20 patches
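
The split above relies on the sorted image and mask lists lining up index by index. A defensive variant pairs files by their shared prefix before splitting. This is a sketch only: the function name `paired_split` is hypothetical, and the `patch_000_image.tif` / `patch_000_mask.tif` naming is an assumption based on the glob patterns and the loader's "Saved: patch_000" messages.

```python
from pathlib import Path

def paired_split(image_files, mask_files, train_fraction=0.8):
    """Pair images and masks by shared prefix, then split in sorted order.

    Raises ValueError if any image lacks a mask or vice versa.
    """
    images = {Path(p).name.removesuffix("_image.tif"): p for p in image_files}
    masks = {Path(p).name.removesuffix("_mask.tif"): p for p in mask_files}
    if images.keys() != masks.keys():
        unpaired = sorted(images.keys() ^ masks.keys())
        raise ValueError(f"Unpaired patches: {unpaired}")
    pairs = [(images[k], masks[k]) for k in sorted(images)]
    split_idx = int(train_fraction * len(pairs))
    return pairs[:split_idx], pairs[split_idx:]

# Hypothetical file names standing in for data/cloudsen12_processed.
image_files = [f"patch_{i:03d}_image.tif" for i in range(100)]
mask_files = [f"patch_{i:03d}_mask.tif" for i in range(100)]

train_pairs, test_pairs = paired_split(image_files, mask_files)
print(len(train_pairs), len(test_pairs))
print(test_pairs[0])
```

With 100 patches this yields the same 80/20 partition as the cell above, but fails loudly if a mask or image is missing instead of silently misaligning the pairs.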


In [20]:
# Evaluate on all test patches
import numpy as np
import rasterio
from cnn_inference import load_sentinel2_image, get_cloud_mask
from rl_environment import CloudMaskRefinementEnv
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

print(f"\n📊 Evaluating on {len(test_image_files)} test patches...")

all_gt = []
all_cnn = []
all_ppo = []

for idx, (img_path, mask_path) in enumerate(zip(test_image_files, test_mask_files)):
    print(f"  Processing test patch {idx+1}/{len(test_image_files)}", end='\r')

    # Load test patch
    test_image = load_sentinel2_image(img_path)
    test_cnn_prob = get_cloud_mask(test_image)

    with rasterio.open(mask_path) as src:
        test_gt = src.read(1)

    # Create evaluation environment for this patch
    eval_env = CloudMaskRefinementEnv(test_image, test_cnn_prob, test_gt, patch_size=64)
    rl_predictions = np.zeros_like(test_gt, dtype=np.uint8)

    # Evaluate all patches (each is a separate episode)
    num_patches = len(eval_env.all_positions)

    for patch_idx in range(num_patches):
        obs, _ = eval_env.reset()
        i, j = eval_env.current_pos
        patch_size = eval_env.patch_size

        action, _ = model.predict(obs, deterministic=True)
        rl_predictions[i:i+patch_size, j:j+patch_size] = action

        obs, reward, done, truncated, info = eval_env.step(action)

    # Collect predictions
    gt_binary = (test_gt > 0).astype(np.uint8)
    cnn_binary = (test_cnn_prob > 0.5).astype(np.uint8)
    rl_binary = (rl_predictions > 0).astype(np.uint8)

    all_gt.append(gt_binary.flatten())
    all_cnn.append(cnn_binary.flatten())
    all_ppo.append(rl_binary.flatten())

print(f"\n✅ Evaluation completed on {len(test_image_files)} test patches")

# Combine all test patches
all_gt = np.concatenate(all_gt)
all_cnn = np.concatenate(all_cnn)
all_ppo = np.concatenate(all_ppo)

print(f"📊 Total test pixels: {len(all_gt):,}")


📊 Evaluating on 20 test patches...
  Processing test patch 1/20

  dataset = DatasetReader(path, driver=driver, sharing=sharing, **kwargs)


🔧 Initializing CloudMaskRefinementEnv - Episode-per-Patch Design
📊 Total patches: 49

In [22]:
# Calculate metrics on test set
cnn_accuracy = accuracy_score(all_gt, all_cnn)
cnn_precision = precision_score(all_gt, all_cnn, zero_division=0)
cnn_recall = recall_score(all_gt, all_cnn, zero_division=0)
cnn_f1 = f1_score(all_gt, all_cnn, zero_division=0)

ppo_accuracy = accuracy_score(all_gt, all_ppo)
ppo_precision = precision_score(all_gt, all_ppo, zero_division=0)
ppo_recall = recall_score(all_gt, all_ppo, zero_division=0)
ppo_f1 = f1_score(all_gt, all_ppo, zero_division=0)

# Calculate improvements
f1_improvement = ((ppo_f1 - cnn_f1) / cnn_f1 * 100) if cnn_f1 > 0 else 0
accuracy_improvement = ((ppo_accuracy - cnn_accuracy) / cnn_accuracy * 100) if cnn_accuracy > 0 else 0

print("\n" + "=" * 60)
print(f"📈 TEST SET RESULTS ({len(test_image_files)} patches, {len(all_gt):,} pixels)")
print("=" * 60)

print("\n🧠 CNN Baseline:")
print(f"  Accuracy:  {cnn_accuracy:.4f} ({cnn_accuracy*100:.2f}%)")
print(f"  Precision: {cnn_precision:.4f}")
print(f"  Recall:    {cnn_recall:.4f}")
print(f"  F1-Score:  {cnn_f1:.4f}")

print("\n🤖 PPO Refined:")
print(f"  Accuracy:  {ppo_accuracy:.4f} ({ppo_accuracy*100:.2f}%)")
print(f"  Precision: {ppo_precision:.4f}")
print(f"  Recall:    {ppo_recall:.4f}")
print(f"  F1-Score:  {ppo_f1:.4f}")

print("\n🎯 Improvements:")
print(f"  F1-Score:  {f1_improvement:+.2f}%")
print(f"  Accuracy:  {accuracy_improvement:+.2f}%")
print(f"  Precision: {ppo_precision - cnn_precision:+.4f}")
print(f"  Recall:    {ppo_recall - cnn_recall:+.4f}")

print("\n" + "=" * 60)


📈 TEST SET RESULTS (20 patches, 5,242,880 pixels)

🧠 CNN Baseline:
  Accuracy:  0.6719 (67.19%)
  Precision: 0.1918
  Recall:    0.3898
  F1-Score:  0.2571

🤖 PPO Refined:
  Accuracy:  0.5011 (50.11%)
  Precision: 0.1347
  Recall:    0.4473
  F1-Score:  0.2071

🎯 Improvements:
  F1-Score:  -19.45%
  Accuracy:  -25.42%
  Precision: -0.0571
  Recall:    +0.0575



