# 🌍 World Discovery Engine (WDE) — Kaggle Starter Notebook

This reproducible starter demonstrates the WDE pipeline stages (`ingest → detect → evaluate → report`) with **open data principles** and Kaggle-friendly scaffolding.

- Prints environment/runtime info
- Lists inputs in `/kaggle/input`
- Reads `configs/kaggle.yaml` if available, otherwise falls back to defaults
- Generates a demo candidate CSV + manifest
- Embeds CARE & reproducibility notes

➡️ Replace placeholders with real data fetchers, anomaly detection, ADE fingerprints, and dossier builders as you develop.
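
The four stages named above (`ingest → detect → evaluate → report`) can be chained as plain functions that pass a shared context dict along. This is a minimal sketch under that assumption: the stage bodies, the `ctx` keys, `run_pipeline`, and the `min_score` threshold are hypothetical placeholders, not the WDE implementation.

```python
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]

# Hypothetical placeholder stages; each takes and returns the shared context dict.
def ingest(ctx: Context) -> Context:
    ctx["tiles"] = ["tile_a", "tile_b"]  # stand-in for real data fetchers
    return ctx

def detect(ctx: Context) -> Context:
    # Stand-in detector: one dummy candidate per tile with a fixed score.
    scores = [0.2, 0.8]
    ctx["candidates"] = [{"tile": t, "score": s} for t, s in zip(ctx["tiles"], scores)]
    return ctx

def evaluate(ctx: Context) -> Context:
    # Keep candidates at or above the (hypothetical) min_score threshold.
    ctx["kept"] = [c for c in ctx["candidates"] if c["score"] >= ctx["min_score"]]
    return ctx

def report(ctx: Context) -> Context:
    ctx["report"] = {"n_candidates": len(ctx["candidates"]), "n_kept": len(ctx["kept"])}
    return ctx

STAGES: List[Callable[[Context], Context]] = [ingest, detect, evaluate, report]

def run_pipeline(ctx: Context, stages: List[Callable[[Context], Context]] = STAGES) -> Context:
    """Run each stage in order and record its name in ctx['log']."""
    for stage in stages:
        ctx = stage(ctx)
        ctx.setdefault("log", []).append(stage.__name__)
    return ctx

result = run_pipeline({"min_score": 0.5})
print(result["log"])     # ['ingest', 'detect', 'evaluate', 'report']
print(result["report"])  # {'n_candidates': 2, 'n_kept': 1}
```

Keeping stages as interchangeable callables means a placeholder such as `detect` can be swapped for a real implementation without touching the other stages.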

In [None]:
import os, sys, platform, json, csv, random
from pathlib import Path
from datetime import datetime

INPUT_DIR = Path('/kaggle/input')
WORK_DIR = Path('/kaggle/working') if Path('/kaggle/working').exists() else Path('.')
OUT_DIR = WORK_DIR / 'wde_outputs'
OUT_DIR.mkdir(parents=True, exist_ok=True)

print('Python:', sys.version.split()[0])
print('Platform:', platform.platform())
print('INPUT_DIR exists:', INPUT_DIR.exists())
print('WORK_DIR:', WORK_DIR)
print('OUT_DIR:', OUT_DIR)
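
Printing versions helps, but saving them with the outputs makes a run easier to reproduce later. A minimal sketch using the standard-library `importlib.metadata`; the `runtime_fingerprint` helper and the package list are illustrative assumptions, so adjust them to whatever the pipeline actually imports.

```python
import json
import platform
import sys
from importlib import metadata

def runtime_fingerprint(packages=("numpy", "pandas", "PyYAML")):
    """Collect interpreter/platform info plus installed versions of selected distributions."""
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": {},
    }
    for name in packages:
        try:
            info["packages"][name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info["packages"][name] = None  # not installed in this environment
    return info

fp = runtime_fingerprint()
print(json.dumps(fp, indent=2))
```

The resulting dict could be written next to `run_manifest.json`, for example as a hypothetical `runtime.json`.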

## 1) Inspect inputs

In [None]:
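def list_tree(root, max_depth=2, max_files=10):
    """Print directories under `root` down to `max_depth`, with up to `max_files` file names each."""
    root = Path(root)
    if not root.exists():
        print('No input directory found:', root)
        return
    base_depth = len(root.parts)
    for p, dnames, fnames in os.walk(root):
        dnames.sort()  # deterministic traversal order
        depth = len(Path(p).parts) - base_depth
        print(p, f'(dirs={len(dnames)} files={len(fnames)})')
        for fn in sorted(fnames)[:max_files]:
            print('  -', fn)
        if len(fnames) > max_files:
            print(f'  ... ({len(fnames) - max_files} more)')
        if depth >= max_depth:
            dnames[:] = []  # prune: os.walk will not descend below max_depth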

list_tree(INPUT_DIR)
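
`list_tree` shows the layout; when many datasets are attached it can also help to tally files by type, for example to confirm that raster tiles are present before running detection. A small sketch; `count_extensions` and the extension set are assumptions, not part of the original notebook.

```python
import tempfile
from collections import Counter
from pathlib import Path

def count_extensions(root, exts=None):
    """Count files under `root` by lowercase suffix; optionally keep only suffixes in `exts`."""
    root = Path(root)
    counts = Counter()
    if not root.exists():
        return counts
    for p in root.rglob("*"):
        if p.is_file():
            suffix = p.suffix.lower()
            if exts is None or suffix in exts:
                counts[suffix] += 1
    return counts

# Self-contained demo on a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    for name in ["a.TIF", "b.tif", "c.csv", "notes.txt"]:
        (Path(tmp) / name).write_text("x")
    demo_counts = count_extensions(tmp, exts={".tif", ".csv"})
print(demo_counts)

# On Kaggle, the same call can be pointed at the attached inputs
print(count_extensions("/kaggle/input", exts={".tif", ".tiff", ".csv", ".geojson"}))
```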

## 2) Load config (`configs/kaggle.yaml` if available)

In [None]:
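CFG_PATH = Path('./configs/kaggle.yaml')
DEFAULTS = {
    'random_seed': 42,
    'sample_rows': 1000,
    'outputs_dir': 'wde_outputs'
}
CFG = dict(DEFAULTS)  # start from defaults so every expected key exists
if CFG_PATH.exists():
    try:
        import yaml  # PyYAML
        with CFG_PATH.open('r', encoding='utf-8') as f:
            loaded = yaml.safe_load(f) or {}  # an empty file loads as None
        if not isinstance(loaded, dict):
            raise ValueError('top-level YAML must be a mapping')
        CFG.update(loaded)  # values from the file override defaults
    except Exception as e:
        print('Config parse error, using defaults:', e)
        CFG = dict(DEFAULTS)
print('Config:', json.dumps(CFG, indent=2, default=str))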
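
If the YAML config grows nested sections, overriding defaults only at the top level is not enough: a file that sets part of a section would drop the remaining default sub-keys. A recursive merge avoids that. A minimal sketch; `deep_merge` and the nested `aoi` section are hypothetical examples, not keys the notebook currently reads.

```python
import copy

def deep_merge(defaults, overrides):
    """Return a new dict: `defaults` updated recursively with `overrides` (inputs are not mutated)."""
    merged = copy.deepcopy(defaults)
    for key, value in (overrides or {}).items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged

defaults = {"random_seed": 42, "sample_rows": 1000,
            "aoi": {"lat": [-15.0, -2.0], "lon": [-75.0, -45.0]}}
user = {"random_seed": 7, "aoi": {"lat": [-10.0, -5.0]}}
cfg = deep_merge(defaults, user)
print(cfg)
```

Here `aoi.lon` survives from the defaults even though the override only touched `aoi.lat`.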

## 3) Generate demo candidates (top 50 synthetic)

In [None]:
random.seed(CFG.get('random_seed', 42))
N = 50
lat_min, lat_max = -15.0, -2.0
lon_min, lon_max = -75.0, -45.0
rows = []
for i in range(N):
    lat = random.uniform(lat_min, lat_max)
    lon = random.uniform(lon_min, lon_max)
    rows.append({
        'id': f'cand_{i+1}',
        'lat': round(lat, 6),
        'lon': round(lon, 6),
        'score': round(random.random(), 4),
        'confidence': round(0.6 + 0.4*random.random(), 3),
        'notes': 'demo'
    })

# Rank by score (descending) so the file matches its "top 50" name
rows.sort(key=lambda r: r['score'], reverse=True)
cand_path = OUT_DIR / 'demo_candidates_top50.csv'
with cand_path.open('w', newline='', encoding='utf-8') as f:
    w = csv.DictWriter(f, fieldnames=['id','lat','lon','score','confidence','notes'])
    w.writeheader()
    w.writerows(rows)

print('Saved demo candidates to:', cand_path)
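
Before handing the CSV to later stages, a quick read-back check can catch missing columns or out-of-range coordinates and scores. A sketch under stated assumptions: `validate_candidates` is a hypothetical helper, and treating `score` and `confidence` as values in [0, 1] matches the demo generator above but may not hold for real scorers.

```python
import csv
import io

REQUIRED_COLUMNS = ("id", "lat", "lon", "score", "confidence", "notes")

def validate_candidates(fh, lat_range=(-90.0, 90.0), lon_range=(-180.0, 180.0)):
    """Return a list of human-readable problems found in a candidate CSV file handle."""
    reader = csv.DictReader(fh)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing columns: {missing}"]
    problems = []
    for lineno, row in enumerate(reader, start=2):  # line 1 is the header
        try:
            lat, lon = float(row["lat"]), float(row["lon"])
            score, conf = float(row["score"]), float(row["confidence"])
        except (TypeError, ValueError) as e:  # short rows yield None; bad text raises ValueError
            problems.append(f"line {lineno}: unreadable value ({e})")
            continue
        if not lat_range[0] <= lat <= lat_range[1]:
            problems.append(f"line {lineno}: lat {lat} outside {lat_range}")
        if not lon_range[0] <= lon <= lon_range[1]:
            problems.append(f"line {lineno}: lon {lon} outside {lon_range}")
        for name, value in (("score", score), ("confidence", conf)):
            if not 0.0 <= value <= 1.0:
                problems.append(f"line {lineno}: {name} {value} outside [0, 1]")
    return problems

sample = (
    "id,lat,lon,score,confidence,notes\n"
    "cand_1,-5.0,-60.0,0.9,0.8,demo\n"
    "cand_2,-20.0,-60.0,1.2,0.7,demo\n"
)
problems = validate_candidates(io.StringIO(sample), lat_range=(-15.0, -2.0), lon_range=(-75.0, -45.0))
print(problems)
```

On the real file, open `cand_path` with `newline=''` and pass the notebook's `(lat_min, lat_max)` and `(lon_min, lon_max)` bounds.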

## 4) Save run manifest

In [None]:
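from datetime import timezone

# Inventory output files; skip the manifest itself, which may exist from an earlier run
artifacts = []
for p in sorted(OUT_DIR.rglob('*')):
    if p.is_file() and p.name != 'run_manifest.json':
        artifacts.append({'path': p.relative_to(OUT_DIR).as_posix(), 'size_bytes': p.stat().st_size})

manifest = {
    # timezone-aware UTC timestamp (datetime.utcnow() is deprecated since Python 3.12)
    'generated': datetime.now(timezone.utc).isoformat().replace('+00:00', 'Z'),
    'outputs_dir': str(OUT_DIR),
    'artifacts': artifacts
}
with (OUT_DIR / 'run_manifest.json').open('w', encoding='utf-8') as f:
    json.dump(manifest, f, indent=2)
print('Manifest entries:', len(artifacts))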
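
The manifest records file sizes; adding a SHA-256 checksum per file lets a later run verify that artifacts are byte-identical, not just the same size. A minimal sketch using `hashlib`; `sha256_of`, `build_artifacts`, and the `sha256` field name are hypothetical additions. Files are read in 1 MiB chunks so large rasters need not fit in memory.

```python
import hashlib
import json
import tempfile
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Hex SHA-256 digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def build_artifacts(out_dir, exclude=("run_manifest.json",)):
    """List files under out_dir (sorted) with size and checksum, skipping excluded relative paths."""
    out_dir = Path(out_dir)
    entries = []
    for p in sorted(out_dir.rglob("*")):
        rel = p.relative_to(out_dir).as_posix()
        if p.is_file() and rel not in exclude:
            entries.append({"path": rel, "size_bytes": p.stat().st_size, "sha256": sha256_of(p)})
    return entries

# Self-contained demo: the manifest file itself is excluded
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "demo.csv").write_bytes(b"abc")
    (Path(tmp) / "run_manifest.json").write_text("{}")
    demo_artifacts = build_artifacts(tmp)
print(json.dumps(demo_artifacts, indent=2))
```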

## 5) Next Steps
- Replace synthetic demo with full **WDE pipeline** (`ade_discovery_pipeline.ipynb`).
- Attach open geospatial datasets (Sentinel-1/2, Landsat, DEM, SoilGrids, GEDI).
- Implement anomaly detection (CV, VLM, ADE fingerprints, fractal analysis).
- Integrate causal graphs + Bayesian uncertainty.
- Generate site dossiers (maps, overlays, narratives).
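
As a stand-in for the detectors listed above (it is none of them), a robust outlier score on a single per-cell feature shows the shape of the `detect` step: score every cell, then flag the outliers. This sketch uses the modified z-score of Iglewicz and Hoaglin, 0.6745 · (x − median) / MAD, with their commonly cited 3.5 cutoff; the feature values are made up for illustration.

```python
from statistics import median

def robust_zscores(values):
    """Modified z-scores: 0.6745 * (x - median) / MAD, where MAD is the median absolute deviation."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return [0.0 for _ in values]  # no spread: nothing can be flagged
    return [0.6745 * (v - med) / mad for v in values]

def flag_anomalies(values, threshold=3.5):
    """Indices whose absolute modified z-score exceeds the threshold."""
    return [i for i, z in enumerate(robust_zscores(values)) if abs(z) > threshold]

# Made-up per-cell feature values; index 5 is the planted outlier
signal = [0.10, 0.12, 0.11, 0.09, 0.10, 0.55, 0.11]
print(flag_anomalies(signal))
```

Median and MAD are used instead of mean and standard deviation so that the outlier itself does not inflate the spread it is measured against.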

⚠️ **Ethics Reminder**: All discoveries must respect CARE Principles and Indigenous sovereignty. Do not release precise coordinates without community/authority consent.
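
One common way to avoid publishing precise locations is to snap coordinates to a coarse grid before anything leaves the notebook. A minimal sketch; the `coarsen` and `public_rows` helpers and the 0.1° cell size (roughly 11 km in latitude) are assumptions, and coarsening is only a technical safeguard, not a substitute for the consent described above.

```python
import math

def coarsen(lat, lon, cell_deg=0.1):
    """Snap (lat, lon) to the center of its cell_deg x cell_deg grid cell."""
    def snap(x):
        # floor() keeps cell boundaries consistent on both sides of zero
        return (math.floor(x / cell_deg) + 0.5) * cell_deg
    return round(snap(lat), 6), round(snap(lon), 6)

def public_rows(rows, cell_deg=0.1):
    """Copy candidate rows with lat/lon replaced by coarsened values (inputs are not modified)."""
    out = []
    for r in rows:
        lat, lon = coarsen(r["lat"], r["lon"], cell_deg)
        out.append({**r, "lat": lat, "lon": lon})
    return out

demo = [{"id": "cand_1", "lat": -5.4321, "lon": -60.9876, "score": 0.9}]
print(public_rows(demo))
```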