## 📊 Satellite Water Detection: Evaluation & Visualization Notebook Overview

This notebook loads model outputs and generates all evaluation metrics and visuals needed to analyze the water-detection system.

### 1. Download Evaluation Files  
Files (bands, masks, samples, metadata, metrics, model) are pulled from the Hugging Face dataset repo and stored locally.

### 2. Load & Validate Data  
Predicted masks, true masks, and sample arrays are loaded, and their shapes are verified before analysis.

### 3. Pixel-Level Evaluation  
A confusion matrix is computed from flattened masks and saved as a heatmap, giving a global view of classification accuracy.

### 4. Tile-Level IoU Analysis  
IoU is calculated for each tile and visualized as a histogram, with mean/median values captured for summary statistics.

### 5. Precision/Recall/F1 Per Tile + Tile Ranking  
Tile-level precision, recall, and F1 are computed, then tiles are ranked to select the best, median, and worst performers.

### 6. Visual Panels for Selected Tiles  
For each representative tile, RGB, true mask, predicted mask, difference map, and overlay images are generated and saved.

### 7. Feature Importance Plot  
Feature importance is extracted from the trained model and visualized, showing which spectral or texture features matter most.

### 8. Automated Evaluation Report  
A Markdown report is generated summarizing metrics, IoU statistics, key visuals, and recommended improvements.

### 9. Upload Results to Hugging Face  
All evaluation artifacts (plots, reports, sample visuals) are uploaded back to the dataset repository for sharing and documentation.

---

This notebook provides a complete evaluation toolkit, turning raw predictions into clear, actionable insights.


In [1]:
!pip install huggingface_hub --quiet


In [2]:
from huggingface_hub import login
login()


### Step 0: Download All Required Project Files from Hugging Face  
This cell pulls all necessary artifacts (datasets, metadata, models, predictions, and metrics) from the Hugging Face dataset repository.  
Each file is downloaded using `hf_hub_download` and stored in a local project directory for easy access during visualization.

In [3]:
from huggingface_hub import hf_hub_download
import os

# Your dataset repo ID
REPO_ID = "mishhkaa/satellite-water-detection"  # dataset repo

# Folder where we will store the files locally in Kaggle
OUT_DIR = "/kaggle/working/project_files"
os.makedirs(OUT_DIR, exist_ok=True)

# List of required files from Member 1, 2, 3
files_to_download = [
    "final_bands_merged.npz",
    "final_masks_merged.npz",
    "preprocessing_metadata.json",
    "X_sample.npy",
    "y_sample.npy",
    "feature_metadata.json",
    "best_model.pkl",
    "predicted_masks.npy",
    "true_masks.npy",
    "metrics.json"
]

import shutil  # used to copy files out of the Hugging Face cache

print("Downloading files...\n")

for file in files_to_download:
    try:
        downloaded_path = hf_hub_download(
            repo_id=REPO_ID,
            filename=file,
            repo_type="dataset"  # important: this repo is a dataset, not a model
        )

        # Copy the cached file into OUT_DIR for convenience
        # (shutil.copy is portable and handles unusual characters in paths)
        dest_path = os.path.join(OUT_DIR, file)
        shutil.copy(downloaded_path, dest_path)

        print(f"✓ Downloaded: {file}")
    except Exception as e:
        print(f"✗ Failed to download {file}: {e}")

print("\nAll files stored inside:", OUT_DIR)
print("Contents:", os.listdir(OUT_DIR))


Downloading files...

final_bands_merged.npz:   0%|          | 0.00/3.85G [00:00<?, ?B/s]
✓ Downloaded: final_bands_merged.npz

final_masks_merged.npz:   0%|          | 0.00/12.0M [00:00<?, ?B/s]
✓ Downloaded: final_masks_merged.npz

preprocessing_metadata.json: 0.00B [00:00, ?B/s]
✓ Downloaded: preprocessing_metadata.json

X_sample.npy:   0%|          | 0.00/160M [00:00<?, ?B/s]
✓ Downloaded: X_sample.npy

y_sample.npy:   0%|          | 0.00/4.00M [00:00<?, ?B/s]
✓ Downloaded: y_sample.npy

feature_metadata.json:   0%|          | 0.00/666 [00:00<?, ?B/s]
✓ Downloaded: feature_metadata.json

best_model.pkl:   0%|          | 0.00/1.44M [00:00<?, ?B/s]
✓ Downloaded: best_model.pkl

predicted_masks.npy:   0%|          | 0.00/4.78M [00:00<?, ?B/s]
✓ Downloaded: predicted_masks.npy

true_masks.npy:   0%|          | 0.00/598k [00:00<?, ?B/s]
✓ Downloaded: true_masks.npy

metrics.json:   0%|          | 0.00/178 [00:00<?, ?B/s]
✓ Downloaded: metrics.json

All files stored inside: /kaggle/working/project_files
Contents: ['best_model.pkl', 'feature_metadata.json', 'X_sample.npy', 'final_bands_merged.npz', 'true_masks.npy', 'predicted_masks.npy', 'final_masks_merged.npz', 'preprocessing_metadata.json', 'y_sample.npy', 'metrics.json']


### Step 1: Load Downloaded Data & Validate Shapes  
This cell loads all key arrays (predicted masks, true masks, X_sample, y_sample) along with the stored metrics file.  
It prints their shapes and contents to confirm that the downloaded files are intact and ready for visualization.


In [4]:
import numpy as np, json, os

OUT_DIR = "/kaggle/working/project_files"

print("Files found:", os.listdir(OUT_DIR))

# Try loading the core files
preds = np.load(f"{OUT_DIR}/predicted_masks.npy")
trues = np.load(f"{OUT_DIR}/true_masks.npy")
X = np.load(f"{OUT_DIR}/X_sample.npy")
y = np.load(f"{OUT_DIR}/y_sample.npy")

with open(f"{OUT_DIR}/metrics.json","r") as f:
    metrics = json.load(f)

print("\nShapes:")
print("predicted_masks:", preds.shape)
print("true_masks:", trues.shape)
print("X_sample:", X.shape)
print("y_sample:", y.shape)

print("\nmetrics.json:")
print(metrics)


Files found: ['best_model.pkl', 'feature_metadata.json', 'X_sample.npy', 'final_bands_merged.npz', 'true_masks.npy', 'predicted_masks.npy', 'final_masks_merged.npz', 'preprocessing_metadata.json', 'y_sample.npy', 'metrics.json']

Shapes:
predicted_masks: (146, 64, 64)
true_masks: (146, 64, 64)
X_sample: (4000000, 10)
y_sample: (4000000,)

metrics.json:
{'xgboost_accuracy': 0.949235, 'xgboost_f1': 0.7089691283119465, 'lightgbm_accuracy': 0.953795, 'lightgbm_f1': 0.7474285506044842, 'best_model': 'lightgbm'}
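
The cell above prints the array shapes but does not assert anything about them. A minimal sketch of an explicit check is shown below; `validate_masks` is a hypothetical helper (not part of the notebook), and the 64×64 tile size is taken from the shapes printed above.

```python
import numpy as np

def validate_masks(preds, trues, tile_size=64):
    """Raise ValueError if the two mask stacks cannot be compared tile by tile."""
    if preds.shape != trues.shape:
        raise ValueError(f"Shape mismatch: {preds.shape} vs {trues.shape}")
    if preds.ndim != 3 or preds.shape[1:] != (tile_size, tile_size):
        raise ValueError(f"Expected (N, {tile_size}, {tile_size}), got {preds.shape}")
    return preds.shape[0]  # number of tiles

# Synthetic stand-ins with the shapes reported above (not the real data)
demo_preds = np.zeros((146, 64, 64), dtype=np.uint8)
demo_trues = np.zeros((146, 64, 64), dtype=np.uint8)
n_demo_tiles = validate_masks(demo_preds, demo_trues)
```

Failing fast here avoids silently computing metrics on misaligned arrays in the later steps.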


### Step 2: Pixel-Level Confusion Matrix Visualization  
This cell computes a pixel-wise confusion matrix between predicted masks and ground-truth masks.  
It plots and saves a visual matrix showing true vs. predicted land/water classifications, helping evaluate model accuracy at the pixel level.


In [5]:
# STEP 2: Confusion Matrix Visualization

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix
import os

OUT_DIR = "/kaggle/working/project_files"
SAVE_DIR = "/kaggle/working/evaluation_results"
os.makedirs(SAVE_DIR, exist_ok=True)

# Load masks
preds = np.load(f"{OUT_DIR}/predicted_masks.npy") > 0
trues = np.load(f"{OUT_DIR}/true_masks.npy") > 0

# Flatten for pixel-level confusion matrix
preds_flat = preds.ravel().astype(int)
trues_flat = trues.ravel().astype(int)

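# Compute confusion matrix; labels=[0, 1] guarantees a 2x2 result
# even if one class is missing from the masks
cm = confusion_matrix(trues_flat, preds_flat, labels=[0, 1])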

# Plot
plt.figure(figsize=(5,4))
plt.imshow(cm, cmap='Blues')
plt.title("Confusion Matrix (Pixel-Level)")
plt.xlabel("Predicted")
plt.ylabel("True")
plt.xticks([0,1], ["Land (0)", "Water (1)"])
plt.yticks([0,1], ["Land (0)", "Water (1)"])

# Annotate values
for i in range(2):
    for j in range(2):
        plt.text(j, i, cm[i,j], ha='center', va='center', color='black')

plt.colorbar()
plt.tight_layout()

# Save
save_path = f"{SAVE_DIR}/confusion_matrix.png"
plt.savefig(save_path, dpi=200)
plt.close()

print("Confusion Matrix saved at:", save_path)


Confusion Matrix saved at: /kaggle/working/evaluation_results/confusion_matrix.png
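
The confusion matrix already contains everything needed for the water-class metrics reported later. As a rough sketch, the hypothetical helper `water_metrics_from_cm` below derives them from the four counts, assuming scikit-learn's layout (rows are true labels, columns are predictions, label order land then water). The counts in the demo are made up for illustration and are not the notebook's results.

```python
import numpy as np

def water_metrics_from_cm(cm):
    """Derive water-class metrics from a 2x2 confusion matrix.

    Assumes rows = true labels, columns = predictions, label order [land, water].
    """
    cm = np.asarray(cm, dtype=float)
    tn, fp = cm[0]
    fn, tp = cm[1]
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    iou = tp / (tp + fp + fn) if (tp + fp + fn) > 0 else float("nan")
    accuracy = (tp + tn) / cm.sum()
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "iou": iou}

# Made-up counts for illustration only
demo_cm = [[90, 2], [3, 5]]
demo_metrics = water_metrics_from_cm(demo_cm)
```

With a large land majority, accuracy can stay high while water recall and IoU remain modest, which is why the notebook reports both.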


### Step 3: Per-Tile IoU Distribution  
This cell computes Intersection-over-Union (IoU) for each 64×64 tile by comparing predicted and true masks.  
It plots a histogram of IoU scores, marks the mean and median, saves the visualization, and stores the IoU values for later analysis.

In [6]:
# STEP 3: IoU histogram (per-tile)
import numpy as np
import matplotlib.pyplot as plt
import os

OUT_DIR = "/kaggle/working/project_files"
SAVE_DIR = "/kaggle/working/evaluation_results"
os.makedirs(SAVE_DIR, exist_ok=True)

preds = np.load(f"{OUT_DIR}/predicted_masks.npy") > 0
trues = np.load(f"{OUT_DIR}/true_masks.npy") > 0

n_tiles = preds.shape[0]
ious = []
for i in range(n_tiles):
    p = preds[i].ravel()
    t = trues[i].ravel()
    inter = np.logical_and(p, t).sum()
    union = np.logical_or(p, t).sum()
    iou = inter / union if union > 0 else np.nan
    ious.append(iou)
ious = np.array(ious, dtype=float)

# Stats
valid = ious[~np.isnan(ious)]
mean_iou = np.nanmean(ious)
median_iou = np.nanmedian(valid) if valid.size>0 else float('nan')

# Plot histogram
plt.figure(figsize=(6,4))
plt.hist(valid, bins=30, edgecolor='k', alpha=0.8)
plt.axvline(mean_iou, color='red', linestyle='--', label=f"mean={mean_iou:.3f}")
plt.axvline(median_iou, color='green', linestyle='--', label=f"median={median_iou:.3f}")
plt.title("Per-tile IoU Distribution")
plt.xlabel("IoU")
plt.ylabel("Number of tiles")
plt.legend()
plt.tight_layout()

save_path = os.path.join(SAVE_DIR, "iou_histogram.png")
plt.savefig(save_path, dpi=200)
plt.close()

# Also save numeric ious for later steps
np.save(os.path.join(SAVE_DIR, "tile_ious.npy"), ious)

print("Saved IoU histogram at:", save_path)
print(f"Mean IoU: {mean_iou:.4f}, Median IoU: {median_iou:.4f}, Tiles: {n_tiles}")


Saved IoU histogram at: /kaggle/working/evaluation_results/iou_histogram.png
Mean IoU: 0.5966, Median IoU: 0.5945, Tiles: 146
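
The per-tile loop above can also be written without an explicit Python loop. The following is a minimal vectorized sketch under the same convention (IoU is NaN when both masks are empty); `per_tile_iou` is a hypothetical name, and the tiny 2×2 tiles are synthetic.

```python
import numpy as np

def per_tile_iou(preds, trues):
    """IoU per tile for stacks of shape (N, H, W); NaN where the union is empty."""
    preds = np.asarray(preds, dtype=bool)
    trues = np.asarray(trues, dtype=bool)
    inter = np.logical_and(preds, trues).sum(axis=(1, 2)).astype(float)
    union = np.logical_or(preds, trues).sum(axis=(1, 2)).astype(float)
    ious = np.full(preds.shape[0], np.nan)
    # Divide only where the union is non-empty; other entries stay NaN
    np.divide(inter, union, out=ious, where=union > 0)
    return ious

# Three synthetic 2x2 tiles: partial overlap, both empty, perfect match
demo_preds = np.array([[[1, 1], [0, 0]], [[0, 0], [0, 0]], [[1, 0], [0, 1]]])
demo_trues = np.array([[[1, 0], [0, 0]], [[0, 0], [0, 0]], [[1, 0], [0, 1]]])
demo_ious = per_tile_iou(demo_preds, demo_trues)
```

Keeping empty tiles as NaN, rather than 0 or 1, stops them from skewing the mean in either direction.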


### Step 4: Compute Tile-Level Precision, Recall, F1 & Select Representative Tiles  
This cell calculates precision, recall, and F1 score for every tile by comparing predicted vs. true masks.  
It saves these metrics, then uses IoU values to identify the **best**, **median**, and **worst** performing tiles, which is useful for qualitative visualization and error analysis.


In [7]:
# STEP 4: Per-tile precision/recall/F1 and select sample tiles
import numpy as np, json, os
from sklearn.metrics import precision_score, recall_score, f1_score

OUT_DIR = "/kaggle/working/project_files"
SAVE_DIR = "/kaggle/working/evaluation_results"
os.makedirs(SAVE_DIR, exist_ok=True)

preds = np.load(f"{OUT_DIR}/predicted_masks.npy") > 0
trues = np.load(f"{OUT_DIR}/true_masks.npy") > 0

n_tiles = preds.shape[0]
tile_precisions = np.zeros(n_tiles, dtype=float)
tile_recalls = np.zeros(n_tiles, dtype=float)
tile_f1s = np.zeros(n_tiles, dtype=float)
tile_ious = np.load(f"{SAVE_DIR}/tile_ious.npy")

for i in range(n_tiles):
    p = preds[i].ravel().astype(int)
    t = trues[i].ravel().astype(int)

    tile_precisions[i] = precision_score(t, p, zero_division=0)
    tile_recalls[i] = recall_score(t, p, zero_division=0)
    tile_f1s[i] = f1_score(t, p, zero_division=0)

# Save arrays
np.save(os.path.join(SAVE_DIR, "tile_precisions.npy"), tile_precisions)
np.save(os.path.join(SAVE_DIR, "tile_recalls.npy"), tile_recalls)
np.save(os.path.join(SAVE_DIR, "tile_f1s.npy"), tile_f1s)

print("Saved precision/recall/F1 arrays.")

# Use IoU for sorting
metric = tile_ious.copy()

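# Sort tile indices by IoU, skipping tiles whose IoU is undefined (NaN).
# np.argsort places NaN last, which would otherwise rank those tiles as "best".
valid_idx = np.flatnonzero(~np.isnan(metric))
sorted_idx = valid_idx[np.argsort(metric[valid_idx])]

worst = sorted_idx[:3].tolist()
best = sorted_idx[-3:].tolist()

# Tiles whose IoU is closest to the median
median_val = np.nanmedian(metric)
median_idx = valid_idx[np.argsort(np.abs(metric[valid_idx] - median_val))[:3]].tolist()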

sample_indices = {
    "best_tiles": best,
    "median_tiles": median_idx,
    "worst_tiles": worst
}

with open(os.path.join(SAVE_DIR, "sample_tile_indices.json"), "w") as f:
    json.dump(sample_indices, f, indent=2)

print("\nTile selection saved to sample_tile_indices.json")
print("Best:", best)
print("Median:", median_idx)
print("Worst:", worst)


Saved precision/recall/F1 arrays.

Tile selection saved to sample_tile_indices.json
Best: [80, 128, 58]
Median: [17, 75, 87]
Worst: [16, 0, 115]
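
For reference, the per-tile precision, recall, and F1 computed above can be expressed directly in terms of TP, FP, and FN counts. This is a sketch only: `per_tile_prf` is a hypothetical helper that returns 0 when a denominator is zero, which is meant to mirror the `zero_division=0` setting used in the cell; the demo tiles are synthetic.

```python
import numpy as np

def per_tile_prf(preds, trues):
    """Per-tile precision, recall and F1 for stacks of shape (N, H, W)."""
    preds = np.asarray(preds, dtype=bool)
    trues = np.asarray(trues, dtype=bool)
    tp = (preds & trues).sum(axis=(1, 2)).astype(float)
    fp = (preds & ~trues).sum(axis=(1, 2)).astype(float)
    fn = (~preds & trues).sum(axis=(1, 2)).astype(float)

    def safe_div(num, den):
        # Return 0 where the denominator is 0 instead of dividing
        out = np.zeros_like(num)
        np.divide(num, den, out=out, where=den > 0)
        return out

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    f1 = safe_div(2 * tp, 2 * tp + fp + fn)
    return precision, recall, f1

# Two synthetic 2x2 tiles: one with TP=FP=FN=1, one with no water at all
demo_preds = np.array([[[1, 1], [0, 0]], [[0, 0], [0, 0]]])
demo_trues = np.array([[[1, 0], [1, 0]], [[0, 0], [0, 0]]])
demo_p, demo_r, demo_f1 = per_tile_prf(demo_preds, demo_trues)
```

Note that a tile with no water in either mask scores 0 here even though the prediction is correct, so tile-level F1 should be read alongside the IoU values, which mark such tiles as NaN.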


### Step 5: Generate Best/Median/Worst Tile Visualizations  
This cell loads band data, predicted masks, true masks, and the selected tile indices.  
For each chosen tile, it builds RGB imagery, computes overlay and difference maps (TP/TN/FP/FN), and saves a 4-panel diagnostic figure along with a separate overlay image.  
These visuals help interpret model behavior on strong, typical, and poor predictions.
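
The cell below divides the RGB values by 255 whenever the maximum exceeds 1. The notebook does not state the value range of the bands; if they are stored on a larger scale (an assumption, for example reflectance-like values in the thousands), most pixels would clip to white. A percentile stretch is a common alternative, sketched here with a hypothetical helper `stretch_to_unit` on synthetic data. It is a different technique from the one the notebook uses.

```python
import numpy as np

def stretch_to_unit(rgb, low_pct=2, high_pct=98):
    """Linearly rescale an (H, W, 3) array to [0, 1] using percentile limits."""
    rgb = np.asarray(rgb, dtype=float)
    lo, hi = np.nanpercentile(rgb, [low_pct, high_pct])
    if hi <= lo:  # flat tile: avoid dividing by zero
        return np.zeros_like(rgb)
    return np.clip((rgb - lo) / (hi - lo), 0.0, 1.0)

# Synthetic tile with values on an assumed 0..3000 scale
demo_tile = np.random.default_rng(0).uniform(0, 3000, size=(64, 64, 3))
demo_rgb = stretch_to_unit(demo_tile)
```

Stretching per tile improves contrast but makes brightness incomparable between tiles, so a fixed global scale may be preferable for side-by-side comparisons.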


In [8]:
# Step 5: Create side-by-side visuals (RGB + true + pred + diff) and overlays
import numpy as np
import json, os
import matplotlib.pyplot as plt

OUT_DIR = "/kaggle/working/project_files"
SAVE_DIR = "/kaggle/working/evaluation_results/sample_masks"
os.makedirs(SAVE_DIR, exist_ok=True)

# Try reading band order; fallback to default
preproc_meta_path = os.path.join(OUT_DIR, "preprocessing_metadata.json")
band_order = None
try:
    with open(preproc_meta_path, 'r') as f:
        pm = json.load(f)
    band_order = pm.get("bands", None)
    print("Preprocessing metadata bands:", band_order)
except Exception:
    print("No preprocessing metadata or failed to read it. Assuming band order [B,G,R,NIR].")
    band_order = ["B2","B3","B4","B8"]

# Load bands archive. Note: np.load ignores mmap_mode for .npz archives,
# so the 'bands' array is read fully into memory when it is accessed.
bands_npz = np.load(os.path.join(OUT_DIR, "final_bands_merged.npz"))
bands_arr = bands_npz['bands']  # shape (N, H, W, C)
print("Bands array shape:", bands_arr.shape)

# Load masks & samples
preds = np.load(os.path.join(OUT_DIR, "predicted_masks.npy")) > 0
trues = np.load(os.path.join(OUT_DIR, "true_masks.npy")) > 0

with open(os.path.join("/kaggle/working/evaluation_results", "sample_tile_indices.json"), 'r') as f:
    samples = json.load(f)

all_indices = samples["best_tiles"] + samples["median_tiles"] + samples["worst_tiles"]
labels_map = (["best"] * len(samples["best_tiles"])
              + ["median"] * len(samples["median_tiles"])
              + ["worst"] * len(samples["worst_tiles"]))

def make_rgb_from_tile(tile):
    # tile shape: (H, W, C) with C>=3, assumed order [B,G,R,NIR] if present
    c = tile.shape[-1]
    if c >= 3:
        B = tile[:,:,0]
        G = tile[:,:,1]
        R = tile[:,:,2]
    else:
        # fallback: duplicate channel
        R = G = B = tile[:,:,0]
    rgb = np.stack([R, G, B], axis=-1).astype(float)
    maxv = np.nanmax(rgb)
    if maxv > 1.0:
        rgb = rgb / 255.0
    rgb = np.clip(rgb, 0.0, 1.0)
    return rgb

def create_diff_map(true_mask, pred_mask):
    H,W = true_mask.shape
    diff = np.zeros((H,W,3), dtype=np.uint8)
    TP = (true_mask == 1) & (pred_mask == 1)
    TN = (true_mask == 0) & (pred_mask == 0)
    FN = (true_mask == 1) & (pred_mask == 0)  # missed water
    FP = (true_mask == 0) & (pred_mask == 1)  # wrong water
    diff[TP] = [0, 0, 255]      # blue
    diff[TN] = [0, 255, 0]      # green
    diff[FN] = [255, 0, 0]      # red
    diff[FP] = [255, 255, 0]    # yellow
    return diff

def overlay_mask_on_rgb(rgb, mask, color=[0,1,1], alpha=0.4):
    # rgb: float [0,1], mask: boolean 2D, color: 3 floats 0..1
    overlay = rgb.copy()
    # Create color vector and convert to same scale as rgb
    color_vec = np.array(color).reshape(1,3)
    # Find pixel indices where mask is True
    mask_inds = np.where(mask)
    if mask_inds[0].size == 0:
        return overlay
    # Extract the pixels at the mask positions
    pixels = overlay[mask_inds]               # shape (num_mask_pixels, 3)
    # Blend
    blended = (1 - alpha) * pixels + alpha * color_vec
    # Assign back
    overlay[mask_inds] = blended
    overlay = np.clip(overlay, 0.0, 1.0)
    return overlay

# Iterate and save visuals
for idx, lab in zip(all_indices, labels_map):
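    # NOTE: idx refers to the 146 evaluation tiles, while bands_arr holds 205259 tiles.
    # This lookup is only correct if the evaluation tiles are the first entries of
    # bands_arr in the same order; confirm the mapping before trusting the RGB panel.
    tile = bands_arr[idx]  # (H,W,C)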
    rgb = make_rgb_from_tile(tile)
    true_mask = trues[idx].astype(bool)
    pred_mask = preds[idx].astype(bool)
    diff_map = create_diff_map(true_mask, pred_mask)
    overlay = overlay_mask_on_rgb(rgb, pred_mask, color=[0,1,1], alpha=0.4)  # cyan overlay

    # Plot 4-panel figure: RGB | true | pred | diff
    fig, axes = plt.subplots(1,4, figsize=(16,4))
    axes[0].imshow(rgb)
    axes[0].set_title(f"Tile {idx} RGB ({lab})")
    axes[0].axis('off')

    axes[1].imshow(true_mask, cmap='gray', vmin=0, vmax=1)
    axes[1].set_title("True Mask")
    axes[1].axis('off')

    axes[2].imshow(pred_mask, cmap='gray', vmin=0, vmax=1)
    axes[2].set_title("Predicted Mask")
    axes[2].axis('off')

    axes[3].imshow(diff_map)
    axes[3].set_title("Difference (TP/TN/FP/FN)")
    axes[3].axis('off')

    plt.tight_layout()
    out_path = os.path.join(SAVE_DIR, f"tile_{idx:03d}_{lab}_4panel.png")
    plt.savefig(out_path, dpi=200)
    plt.close()

    # Save overlay separately
    overlay_path = os.path.join(SAVE_DIR, f"tile_{idx:03d}_{lab}_overlay.png")
    plt.imsave(overlay_path, overlay)

    print("Saved visuals for tile", idx, "->", out_path, "and overlay ->", overlay_path)

print("\nAll sample tile visuals saved to:", SAVE_DIR)


Preprocessing metadata bands: None
Bands array shape: (205259, 64, 64, 4)
Saved visuals for tile 80 -> /kaggle/working/evaluation_results/sample_masks/tile_080_best_4panel.png and overlay -> /kaggle/working/evaluation_results/sample_masks/tile_080_best_overlay.png
Saved visuals for tile 128 -> /kaggle/working/evaluation_results/sample_masks/tile_128_best_4panel.png and overlay -> /kaggle/working/evaluation_results/sample_masks/tile_128_best_overlay.png
Saved visuals for tile 58 -> /kaggle/working/evaluation_results/sample_masks/tile_058_best_4panel.png and overlay -> /kaggle/working/evaluation_results/sample_masks/tile_058_best_overlay.png
Saved visuals for tile 17 -> /kaggle/working/evaluation_results/sample_masks/tile_017_median_4panel.png and overlay -> /kaggle/working/evaluation_results/sample_masks/tile_017_median_overlay.png
Saved visuals for tile 75 -> /kaggle/working/evaluation_results/sample_masks/tile_075_median_4panel.png and overlay -> /kaggle/working/evaluation_results/sam

### Step 6: Feature Importance Analysis  
This cell loads the trained model and retrieves feature importance scores using multiple fallback methods (sklearn, LightGBM, XGBoost).  
It aligns the importances with feature names, normalizes them, and plots a horizontal bar chart showing which features contribute most to the model's decisions.  
The final plot is saved for reporting.

In [9]:
# Step 6: Robust Feature Importance extraction & plot
import numpy as np
import json, pickle, os
import matplotlib.pyplot as plt

OUT_DIR = "/kaggle/working/project_files"
SAVE_DIR = "/kaggle/working/evaluation_results"
os.makedirs(SAVE_DIR, exist_ok=True)

# Load feature names
feat_meta_path = f"{OUT_DIR}/feature_metadata.json"
with open(feat_meta_path, "r") as f:
    feat_meta = json.load(f)
feature_names = feat_meta.get("features") or feat_meta.get("feature_names")
print("Feature names:", feature_names)

# Load the model
model_path = f"{OUT_DIR}/best_model.pkl"
with open(model_path, "rb") as f:
    model = pickle.load(f)

print("Loaded model type:", type(model))

# Try multiple ways to get importances
importances = None

# 1) sklearn-style (.feature_importances_)
if hasattr(model, "feature_importances_"):
    try:
        importances = model.feature_importances_
        print("Used sklearn-style feature_importances_.")
    except Exception:
        importances = None

# 2) LightGBM Booster (.feature_importance() or .dump_model())
if importances is None:
    try:
        # LightGBM Booster or sklearn API
        if hasattr(model, "feature_importance"):
            importances = model.feature_importance(importance_type="gain")
            print("Used model.feature_importance(importance_type='gain').")
    except Exception:
        importances = None

# 3) XGBoost Booster (get_score)
if importances is None:
    try:
        # xgboost.Booster
        if hasattr(model, "get_score"):
            score_dict = model.get_score(importance_type="gain")
            # score_dict keys look like 'f0', 'f1'; map them to feature_names order
            # create array
            importances = np.zeros(len(feature_names), dtype=float)
            for fname, sc in score_dict.items():
                # fname may be 'f3' -> index 3
                if fname.startswith("f"):
                    idx = int(fname[1:])
                    if idx < len(importances):
                        importances[idx] = sc
            print("Used xgboost get_score(importance_type='gain').")
    except Exception:
        importances = None

# 4) Try model.dump_model() for structured info
if importances is None:
    try:
        dumped = None
        if hasattr(model, "dump_model"):
            dumped = model.dump_model()
        elif hasattr(model, "booster_") and hasattr(model.booster_, "dump_model"):
            dumped = model.booster_.dump_model()
        if dumped and "feature_importance" in dumped:
            # fallback; try to parse if present
            print("Found dump_model output; attempting to parse.")
    except Exception:
        pass

# Final check
if importances is None:
    raise RuntimeError("Could not extract feature importances from this model object. "
                       "You can still present feature names in report, or use SHAP for per-feature explanations.")
else:
    importances = np.array(importances, dtype=float)

# If the number of importances does not match the number of feature names, trim or pad
if importances.size != len(feature_names):
    print("Warning: importance length", importances.size, "!= feature names", len(feature_names))
    # Try to handle simple case: if importances longer, take first N; if shorter, pad with zeros
    if importances.size > len(feature_names):
        importances = importances[:len(feature_names)]
    else:
        padded = np.zeros(len(feature_names), dtype=float)
        padded[:importances.size] = importances
        importances = padded

# Normalize for better display (optional)
if importances.sum() > 0:
    norm_imp = (importances / (importances.sum())) * 100.0
else:
    norm_imp = importances

# Plot horizontal bar chart sorted by importance
order = np.argsort(norm_imp)
sorted_names = [feature_names[i] for i in order]
sorted_vals = norm_imp[order]

plt.figure(figsize=(8,5))
plt.barh(sorted_names, sorted_vals, color="steelblue")
plt.xlabel("Relative importance (% of total)")
plt.title("Feature importance (best_model)")
plt.tight_layout()

save_path = os.path.join(SAVE_DIR, "feature_importance.png")
plt.savefig(save_path, dpi=200)
plt.close()

print("Saved feature importance plot at:", save_path)
print("Feature importance values (percent):")
for name, val in zip(sorted_names[::-1], sorted_vals[::-1]):  # print descending
    print(f"  {name}: {val:.2f}%")


Feature names: ['blue_mean', 'green_mean', 'red_mean', 'nir_mean', 'ndwi_mean', 'ndvi_mean', 'glcm_contrast', 'glcm_homogeneity', 'glcm_energy', 'glcm_dissimilarity']
Loaded model type: <class 'lightgbm.basic.Booster'>
Used model.feature_importance(importance_type='gain').
Saved feature importance plot at: /kaggle/working/evaluation_results/feature_importance.png
Feature importance values (percent):
  blue_mean: 32.68%
  green_mean: 14.55%
  red_mean: 13.95%
  ndvi_mean: 11.97%
  nir_mean: 9.83%
  ndwi_mean: 6.73%
  glcm_energy: 4.45%
  glcm_contrast: 3.16%
  glcm_homogeneity: 1.84%
  glcm_dissimilarity: 0.84%
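
To read the ranking at a coarser level, the percentages printed above can be summed per feature family. The grouping into families below is an illustrative assumption based only on the feature names; the numbers are copied from the output above.

```python
# Percentages copied from the feature-importance output above
importance_pct = {
    "blue_mean": 32.68, "green_mean": 14.55, "red_mean": 13.95, "nir_mean": 9.83,
    "ndvi_mean": 11.97, "ndwi_mean": 6.73,
    "glcm_energy": 4.45, "glcm_contrast": 3.16,
    "glcm_homogeneity": 1.84, "glcm_dissimilarity": 0.84,
}

# Assumed grouping, inferred from the feature names
families = {
    "raw band means": ["blue_mean", "green_mean", "red_mean", "nir_mean"],
    "spectral indices": ["ndvi_mean", "ndwi_mean"],
    "GLCM texture": ["glcm_energy", "glcm_contrast",
                     "glcm_homogeneity", "glcm_dissimilarity"],
}

family_totals = {
    name: round(sum(importance_pct[f] for f in feats), 2)
    for name, feats in families.items()
}

for name, total in family_totals.items():
    print(f"{name}: {total:.2f}%")
```

With these numbers, raw band means account for roughly 71% of the total gain, spectral indices for about 19%, and texture features for about 10%.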


### Step 7: Generate Final Evaluation Summary & Markdown Report  
This cell compiles all computed metrics (pixel accuracy, precision, recall, F1, IoU statistics) and organizes them into a clean `evaluation_summary.json`.  
It then auto-generates a Markdown report listing key metrics, plots, sample visual folders, observations, and recommended next steps, saving everything inside the evaluation directory for easy sharing or presentation.


In [10]:
# STEP 7: Generate evaluation_summary.json and evaluation_report.md
import json, os, numpy as np
from pathlib import Path

OUT_DIR = Path("/kaggle/working/project_files")
SAVE_DIR = Path("/kaggle/working/evaluation_results")
SAVE_DIR.mkdir(parents=True, exist_ok=True)

# Load metrics previously computed
summary_path = OUT_DIR / "evaluation_summary.json"
if summary_path.exists():
    with open(summary_path, 'r') as f:
        eval_summary = json.load(f)
else:
    # compute quickly from preds/trues if missing
    preds = np.load(OUT_DIR / "predicted_masks.npy") > 0
    trues = np.load(OUT_DIR / "true_masks.npy") > 0
    preds_flat = preds.ravel().astype(int)
    trues_flat = trues.ravel().astype(int)
    from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
    eval_summary = {
        "pixel_accuracy": float(accuracy_score(trues_flat, preds_flat)),
        "pixel_precision": float(precision_score(trues_flat, preds_flat, zero_division=0)),
        "pixel_recall": float(recall_score(trues_flat, preds_flat, zero_division=0)),
        "pixel_f1": float(f1_score(trues_flat, preds_flat, zero_division=0)),
        "n_tiles": int(preds.shape[0])
    }

# Load tile IoU stats if available
tile_ious_path = SAVE_DIR / "tile_ious.npy"
if tile_ious_path.exists():
    tile_ious = np.load(tile_ious_path)
    valid = tile_ious[~np.isnan(tile_ious)]
    eval_summary.update({
        "mean_tile_iou": float(np.nanmean(tile_ious)),
        "median_tile_iou": float(np.nanmedian(valid)) if valid.size>0 else None,
        "std_tile_iou": float(np.nanstd(valid)) if valid.size>0 else None
    })

# Feature importances (if saved)
feat_imp_path = SAVE_DIR / "feature_importance.png"
fi_available = feat_imp_path.exists()

# sample images folder
sample_dir = SAVE_DIR / "sample_masks"

# Save evaluation_summary.json
with open(SAVE_DIR / "evaluation_summary.json", "w") as f:
    json.dump(eval_summary, f, indent=2)

# Create a Markdown report
def fmt(value):
    # Format a number to 4 decimals; return "N/A" for missing or non-numeric values
    # (applying ':.4f' directly to the 'N/A' default or to None would raise)
    return f"{value:.4f}" if isinstance(value, (int, float)) else "N/A"

md_lines = []
md_lines.append("# Evaluation Report: Satellite Water Detection")
md_lines.append("")
md_lines.append("## Key Metrics (pixel-level)")
md_lines.append("")
md_lines.append("| Metric | Value |")
md_lines.append("|---|---:|")
md_lines.append(f"| Accuracy | {fmt(eval_summary.get('pixel_accuracy'))} |")
md_lines.append(f"| Precision | {fmt(eval_summary.get('pixel_precision'))} |")
md_lines.append(f"| Recall | {fmt(eval_summary.get('pixel_recall'))} |")
md_lines.append(f"| F1 score | {fmt(eval_summary.get('pixel_f1'))} |")
md_lines.append("")
if "mean_tile_iou" in eval_summary:
    md_lines.append("## Tile-level IoU")
    md_lines.append("")
    md_lines.append(f"- Mean IoU: **{fmt(eval_summary['mean_tile_iou'])}**")
    md_lines.append(f"- Median IoU: **{fmt(eval_summary.get('median_tile_iou'))}**")
    md_lines.append(f"- Std IoU: **{fmt(eval_summary.get('std_tile_iou'))}**")
    md_lines.append("")

md_lines.append("## Plots & Visuals")
md_lines.append("")
md_lines.append(f"- Confusion matrix: `evaluation_results/confusion_matrix.png`")
md_lines.append(f"- Per-tile IoU histogram: `evaluation_results/iou_histogram.png`")
if fi_available:
    md_lines.append(f"- Feature importance: `evaluation_results/feature_importance.png`")
md_lines.append(f"- Sample visuals (best/median/worst): `evaluation_results/sample_masks/`")
md_lines.append("")

md_lines.append("## Observations (short)")
md_lines.append("")
md_lines.append("- Model shows high overall pixel accuracy due to strong land detection.")
md_lines.append("- Water-class F1 (see the Key Metrics table) indicates good but improvable performance, particularly on small or thin water features.")
md_lines.append("- Feature importances indicate strong reliance on blue/visible bands; NDVI/NIR also useful; NDWI contributed but less than raw band means.")
md_lines.append("")
md_lines.append("## Recommended next steps")
md_lines.append("")
md_lines.append("- Consider adding per-pixel context (small CNN/U-Net) if higher IoU is required.")
md_lines.append("- Augment training with edge-focused examples (thin rivers, turbid water).")
md_lines.append("- Experiment with adjusting NDWI threshold or calibrating model output probabilities to optimize IoU.")
md_lines.append("")
md_lines.append("## Files produced")
md_lines.append("")
md_lines.append("- `evaluation_results/`: all saved PNGs and summary JSON")
md_lines.append("")

# Write the markdown file
with open(SAVE_DIR / "evaluation_report.md", "w") as f:
    f.write("\n".join(md_lines))

print("Saved evaluation_summary.json and evaluation_report.md to:", SAVE_DIR)


Saved evaluation_summary.json and evaluation_report.md to: /kaggle/working/evaluation_results


### Upload Evaluation Outputs to Hugging Face  
This cell collects all generated evaluation files (confusion matrix, IoU plots, feature importances, markdown report, sample visuals) and uploads them to your Hugging Face dataset repository.  
It preserves folder structure, retries failed uploads, and ensures all evaluation artifacts are stored online for sharing and reproducibility.


In [11]:
# Upload evaluation_results/ files into existing dataset repo on Hugging Face
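# Option A: add files to the existing repo "mishhkaa/satellite-water-detection"

from huggingface_hub import HfApi, get_token
import os, time
from pathlib import Path

OUT_DIR = Path("/kaggle/working/evaluation_results")
REPO_ID = "mishhkaa/satellite-water-detection"   # existing dataset repo
REPO_TYPE = "dataset"  # since this is a dataset

# Use HF_TOKEN from the environment (e.g. Kaggle Secrets) if set; otherwise fall back
# to the token cached by login(). get_token() is assumed to be available, which
# requires a reasonably recent huggingface_hub release.
token = os.environ.get("HF_TOKEN") or get_token()

if token is None:
    raise RuntimeError("HF_TOKEN not found. Add token to Kaggle Secrets or run login() again.")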

api = HfApi(token=token)

# List local files to upload
files = []
for root, _, filenames in os.walk(OUT_DIR):
    for fn in filenames:
        local_path = Path(root) / fn
        # remote path inside the repo (preserve folder)
        rel_path = str(local_path.relative_to(OUT_DIR))
        remote_path = f"evaluation_results/{rel_path}"
        files.append((local_path, remote_path))

print(f"Found {len(files)} files to upload from {OUT_DIR}")

# Upload each file
for local_path, remote_path in files:
    print("Uploading", local_path, "->", remote_path)
    # API call
    retry = 0
    while True:
        try:
            api.upload_file(
                path_or_fileobj=str(local_path),
                path_in_repo=remote_path,
                repo_id=REPO_ID,
                repo_type=REPO_TYPE,
                commit_message=f"Add evaluation outputs: {remote_path}",
                token=token,
            )
            print("  ✓ uploaded")
            break
        except Exception as e:
            retry += 1
            print("  upload failed:", e)
            if retry > 3:
                print("  giving up after 3 retries.")
                break
            print("  retrying in 3s...")
            time.sleep(3)

print("All uploads attempted. Check your dataset repo on Hugging Face to confirm.")


RuntimeError: HF_TOKEN not found. Add token to Kaggle Secrets or run login() again.
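
The retry loop in the upload cell can be factored into a small reusable helper. The sketch below is one possible shape for it: `with_retries` and `flaky_upload` are hypothetical names, and the demo uses a stand-in function instead of a real Hugging Face call so it runs without network access. Unlike the notebook's loop, this version re-raises the last error instead of printing it and moving on.

```python
import time

def with_retries(fn, retries=3, delay=3.0, sleep=time.sleep):
    """Call fn(); on failure, retry up to `retries` more times, waiting `delay` seconds."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except Exception as exc:
            if attempt == retries:
                raise  # out of retries: surface the last error
            print(f"  attempt {attempt + 1} failed: {exc}; retrying in {delay}s...")
            sleep(delay)

# Stand-in for an upload call that fails twice, then succeeds
calls = {"n": 0}

def flaky_upload():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated network error")
    return "uploaded"

# sleep is replaced by a no-op so the demo does not actually wait
result = with_retries(flaky_upload, retries=3, delay=0.0, sleep=lambda s: None)
```

In the notebook, `fn` could be a `lambda` wrapping the existing `api.upload_file(...)` call for one file.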

In [None]:
from huggingface_hub import login
login()