# 01: PDAL pipelines (reproducible setup)

This notebook **creates and saves** the PDAL JSON files used by the project.
It does not run PDAL (the other notebooks do that); it only leaves the pipelines ready in `pipelines/`.

**Generates:**
- `pipelines/pipeline_make_hag.json` → ground + HAG  
- `pipelines/pipeline_filter_veg.json` → keep HAG ≥ 2 m  
- `pipelines/pipeline_dbscan.json` → DBSCAN (keeps noise, `ClusterID = -1`)  
- `pipelines/cluster_dbscan.json` → DBSCAN + noise removal inside the pipeline  
- (optional) `pipelines/pipeline_ground_hag_pmf.json` → ground + HAG variant

In [8]:
# === Cell 1: Paths and parameters ===
import json
from pathlib import Path

ROOT = Path(".").resolve()
PIPES = ROOT / "pipelines"
DATA  = ROOT / "data"
RAW   = DATA / "raw" / "sample.laz"
PROC  = DATA / "processed"

PIPES.mkdir(parents=True, exist_ok=True)
PROC.mkdir(parents=True, exist_ok=True)

# DBSCAN parameters (adjustable)
DBSCAN_EPS = 0.8
DBSCAN_MINPTS = 20

# Outputs used in the workflow
LAS_HAG = PROC / "sample_hag.las"
LAS_VEG = PROC / "veg_gt2m.las"
LAS_DB  = PROC / "veg_gt2m_dbscan.las"
CSV_DB  = PROC / "veg_gt2m_dbscan.csv"
CSV_DB_CLEAN = PROC / "veg_gt2m_dbscan_clean.csv"

print("Pipelines →", PIPES)
print("Expected RAW →", RAW)

Pipelines → /Users/cecilialedesma/Library/Mobile Documents/com~apple~CloudDocs/projects_2025/lidar_vegetation_classification/notebooks/pipelines
Expected RAW → /Users/cecilialedesma/Library/Mobile Documents/com~apple~CloudDocs/projects_2025/lidar_vegetation_classification/notebooks/data/raw/sample.laz


## Paths and parameters

Defines the base paths (`data/raw`, `data/processed`, `pipelines`) and the DBSCAN
hyperparameters that are injected into the JSON files:

- `DBSCAN_EPS` → neighborhood radius (m)  
- `DBSCAN_MINPTS` → minimum points per cluster

> You can adjust these values here and **regenerate** the JSON files to re-run the pipeline with new parameters.
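
To make the injection step concrete, here is a minimal sketch of how the two parameters end up in a `filters.dbscan` stage. The helper `dbscan_stage` and its validation rules are hypothetical (they are not part of this notebook); the stage keys mirror the ones used in the DBSCAN cells.

```python
import json

def dbscan_stage(eps: float, min_points: int) -> dict:
    """Build a filters.dbscan stage dict (hypothetical helper, for illustration)."""
    if eps <= 0:
        raise ValueError("eps must be positive")
    if min_points < 1:
        raise ValueError("min_points must be at least 1")
    return {
        "type": "filters.dbscan",
        "min_points": int(min_points),
        "eps": float(eps),
        "dimensions": "X,Y,Z",
    }

# Same values as DBSCAN_EPS and DBSCAN_MINPTS in Cell 1.
stage = dbscan_stage(eps=0.8, min_points=20)
stage_json = json.dumps(stage, indent=2)
print(stage_json)
```

Changing the two arguments and dumping the stage again is all that "regenerating" amounts to; the JSON on disk only changes when the writing cells are re-run.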

In [9]:
# === Cell 2: pipeline_make_hag.json ===
# Reads sample.laz → classifies ground (PMF) → computes HAG → writes LAS 1.4, point format 6, with extra_dims
pipeline_make_hag = {
  "pipeline": [
    {"type": "readers.las", "filename": str(RAW)},
    {"type": "filters.pmf",
     "max_window_size": 18, "slope": 0.15,
     "initial_distance": 0.5, "cell_size": 1.0},
    {"type": "filters.hag_delaunay"},
    {"type": "writers.las",
     "filename": str(LAS_HAG),
     "minor_version": 4, "dataformat_id": 6,
     "extra_dims": "all"}
  ]
}

PIPE_MAKE_HAG = PIPES / "pipeline_make_hag.json"
PIPE_MAKE_HAG.write_text(json.dumps(pipeline_make_hag, indent=2))
print("✅ Written:", PIPE_MAKE_HAG)

✅ Written: /Users/cecilialedesma/Library/Mobile Documents/com~apple~CloudDocs/projects_2025/lidar_vegetation_classification/notebooks/pipelines/pipeline_make_hag.json


## `pipeline_make_hag.json`: ground + Height Above Ground (HAG)

**Input:** `data/raw/sample.laz`  
**Process:**  
1) `filters.pmf` (classifies ground points)  
2) `filters.hag_delaunay` (computes `HeightAboveGround`)  
**Output:** `data/processed/sample_hag.las` (LAS 1.4, point format 6, `extra_dims=all`)

> This step is the foundation: it adds the `HeightAboveGround` dimension that all later steps use.
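
The order of the two filters matters, because the HAG filter relies on points already classified as ground. A small sanity check along these lines could be run on a pipeline dict before writing it. The helpers `stage_types` and `check_hag_order` are hypothetical, and the example dict is a stand-in with placeholder paths, not the notebook's actual object.

```python
def stage_types(pipeline: dict) -> list:
    """Return the 'type' of each stage; bare filename strings are reported as 'file'."""
    return [s["type"] if isinstance(s, dict) else "file" for s in pipeline["pipeline"]]

def check_hag_order(pipeline: dict) -> bool:
    """True when a PMF ground stage appears before the HAG stage (illustrative check)."""
    types = stage_types(pipeline)
    ground = [i for i, t in enumerate(types) if t == "filters.pmf"]
    hag = [i for i, t in enumerate(types) if t.startswith("filters.hag")]
    return bool(ground) and bool(hag) and min(ground) < min(hag)

# Stand-in with the same stage sequence as pipeline_make_hag (paths are placeholders).
example = {
    "pipeline": [
        {"type": "readers.las", "filename": "data/raw/sample.laz"},
        {"type": "filters.pmf"},
        {"type": "filters.hag_delaunay"},
        {"type": "writers.las", "filename": "data/processed/sample_hag.las"},
    ]
}
order_ok = check_hag_order(example)
print(stage_types(example), order_ok)
```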

In [10]:
# === Cell 3: pipeline_filter_veg.json (HAG ≥ 2 m) ===
# Keeps points with HeightAboveGround >= 2 and preserves extra_dims.
pipeline_filter_veg = {
  "pipeline": [
    str(LAS_HAG),
    {"type": "filters.range",
     "limits": "HeightAboveGround[2:]"},
    {"type": "writers.las",
     "filename": str(LAS_VEG),
     "minor_version": 4, "dataformat_id": 6,
     "extra_dims": "all"}
  ]
}

PIPE_FILTER_VEG = PIPES / "pipeline_filter_veg.json"
PIPE_FILTER_VEG.write_text(json.dumps(pipeline_filter_veg, indent=2))
print("✅ Written:", PIPE_FILTER_VEG)

✅ Written: /Users/cecilialedesma/Library/Mobile Documents/com~apple~CloudDocs/projects_2025/lidar_vegetation_classification/notebooks/pipelines/pipeline_filter_veg.json


## `pipeline_filter_veg.json`: vegetation with HAG ≥ 2 m

**Input:** `data/processed/sample_hag.las`  
**Process:** `filters.range` with `HeightAboveGround[2:]`  
**Output:** `data/processed/veg_gt2m.las`

> Keeps only points with relative height ≥ 2 m (canopies / tall shrubs).
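
In PDAL range syntax a square bracket marks an inclusive bound and an omitted upper bound means "no upper limit", so `HeightAboveGround[2:]` keeps a point sitting at exactly 2 m. The pure-Python stand-in below only illustrates that rule; `keep_hag_at_least` and the sample points are hypothetical and do not call PDAL.

```python
def keep_hag_at_least(points, min_hag=2.0):
    """Keep points whose HeightAboveGround is >= min_hag (inclusive, like '[2:]')."""
    return [p for p in points if p["HeightAboveGround"] >= min_hag]

# Hypothetical points: one below, one exactly at, and one above the threshold.
sample = [
    {"id": "a", "HeightAboveGround": 0.3},
    {"id": "b", "HeightAboveGround": 2.0},
    {"id": "c", "HeightAboveGround": 7.5},
]
kept_ids = [p["id"] for p in keep_hag_at_least(sample)]
print(kept_ids)  # ['b', 'c']
```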

In [11]:
# === Cell 4: pipeline_dbscan.json (keeps noise; CSV + LAS) ===
# Runs DBSCAN and exports LAS + CSV with ClusterID (includes noise, -1; it is cleaned later in pandas).
pipeline_dbscan = {
  "pipeline": [
    str(LAS_VEG),
    {"type": "filters.dbscan",
     "min_points": DBSCAN_MINPTS,
     "eps": DBSCAN_EPS,
     "dimensions": "X,Y,Z"},
    {"type": "writers.las",
     "filename": str(LAS_DB),
     "minor_version": 4, "dataformat_id": 6,
     "extra_dims": "all"},
    {"type": "writers.text",
     "filename": str(CSV_DB),
     "format": "csv",
     "order": "X,Y,Z,HeightAboveGround,ClusterID",
     "keep_unspecified": False,
     "quote_header": False}
  ]
}

PIPE_DBSCAN = PIPES / "pipeline_dbscan.json"
PIPE_DBSCAN.write_text(json.dumps(pipeline_dbscan, indent=2))
print("✅ Written:", PIPE_DBSCAN)

✅ Written: /Users/cecilialedesma/Library/Mobile Documents/com~apple~CloudDocs/projects_2025/lidar_vegetation_classification/notebooks/pipelines/pipeline_dbscan.json


## `pipeline_dbscan.json`: DBSCAN (with noise)

**Input:** `data/processed/veg_gt2m.las`  
**Process:** `filters.dbscan` with the `eps` and `min_points` defined above  
**Output:**  
- `data/processed/veg_gt2m_dbscan.las`  
- `data/processed/veg_gt2m_dbscan.csv` (`X,Y,Z,HeightAboveGround,ClusterID`)

> **Includes noise**: unassigned points keep `ClusterID = -1`.
  We clean them later in pandas (e.g. in `05_exploration.ipynb` or `06_end_to_end.ipynb`).
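
As a rough picture of that later cleaning step, the sketch below drops the noise rows from a CSV with the same header. It uses only the standard library so it runs anywhere; the rows are made up for illustration and `drop_noise` is a hypothetical helper, not code from the other notebooks.

```python
import csv
import io

# Hypothetical rows in the column order requested from writers.text.
raw_csv = """X,Y,Z,HeightAboveGround,ClusterID
1.0,2.0,10.0,3.1,0
1.2,2.1,10.4,3.5,0
9.0,9.0,12.0,4.0,-1
5.0,5.5,11.0,2.2,1
"""

def drop_noise(rows, noise_id=-1):
    """Return the rows whose ClusterID differs from noise_id."""
    # int(float(...)) tolerates IDs written as '-1' or '-1.000'.
    return [r for r in rows if int(float(r["ClusterID"])) != noise_id]

rows = list(csv.DictReader(io.StringIO(raw_csv)))
clean = drop_noise(rows)
cluster_ids = sorted({int(float(r["ClusterID"])) for r in clean})
print(len(rows), "→", len(clean), "rows; clusters:", cluster_ids)
```

With pandas, the equivalent filter on a DataFrame `df` read from the CSV is `df[df["ClusterID"] != -1]`.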

In [12]:
# === Cell 5: cluster_dbscan.json (DBSCAN + noise filtering inside the pipeline) ===
# Variant that also removes noise (ClusterID = -1) inside the pipeline.
cluster_dbscan = {
  "pipeline": [
    str(LAS_VEG),
    {"type": "filters.dbscan",
     "min_points": DBSCAN_MINPTS,
     "eps": DBSCAN_EPS,
     "dimensions": "X,Y,Z"},
    # Negated range: PDAL ranges need the ':' separator, so -1 is written as [-1:-1].
    {"type": "filters.range",
     "limits": "ClusterID![-1:-1]"},
    {"type": "writers.las",
     "filename": str(PROC / "veg_gt2m_dbscan_clean.las"),
     "minor_version": 4, "dataformat_id": 6,
     "extra_dims": "all"},
    {"type": "writers.text",
     "filename": str(CSV_DB_CLEAN),
     "format": "csv",
     "order": "X,Y,Z,HeightAboveGround,ClusterID",
     "keep_unspecified": False,
     "quote_header": False}
  ]
}

PIPE_CLUSTER_DBSCAN = PIPES / "cluster_dbscan.json"
PIPE_CLUSTER_DBSCAN.write_text(json.dumps(cluster_dbscan, indent=2))
print("✅ Written:", PIPE_CLUSTER_DBSCAN)

✅ Written: /Users/cecilialedesma/Library/Mobile Documents/com~apple~CloudDocs/projects_2025/lidar_vegetation_classification/notebooks/pipelines/cluster_dbscan.json


## `cluster_dbscan.json`: DBSCAN + noise removal inside the pipeline

DBSCAN variant that **filters the noise inside the JSON** using `filters.range` with `ClusterID![-1:-1]` (a negated range; PDAL range syntax requires the `:` separator between the bounds).

**Outputs:**  
- `data/processed/veg_gt2m_dbscan_clean.las`  
- `data/processed/veg_gt2m_dbscan_clean.csv`

> Useful if you prefer to get noise-free data directly from PDAL (instead of cleaning in pandas).

In [13]:
# === Cell 6: (Optional) ground+HAG variants, in case you want more JSON files at hand ===
pipeline_ground_hag_pmf = {
  "pipeline": [
    {"type": "readers.las", "filename": str(RAW)},
    {"type": "filters.pmf",
     "max_window_size": 18, "slope": 0.15,
     "initial_distance": 0.5, "cell_size": 1.0},
    {"type": "filters.hag_delaunay"},
    {"type": "writers.las",
     "filename": str(PROC / "sample_hag_pmf.las"),
     "minor_version": 4, "dataformat_id": 6,
     "extra_dims": "all"}
  ]
}

for name, obj in [
    ("pipeline_ground_hag_pmf.json", pipeline_ground_hag_pmf),
]:
    (PIPES / name).write_text(json.dumps(obj, indent=2))
    print("✅ Written:", PIPES / name)

✅ Written: /Users/cecilialedesma/Library/Mobile Documents/com~apple~CloudDocs/projects_2025/lidar_vegetation_classification/notebooks/pipelines/pipeline_ground_hag_pmf.json


## How to run the pipelines (manual/CLI)

When you want to run them (from a terminal):

```bash
pdal pipeline pipelines/pipeline_make_hag.json
pdal pipeline pipelines/pipeline_filter_veg.json
pdal pipeline pipelines/pipeline_dbscan.json       # or:
# pdal pipeline pipelines/cluster_dbscan.json
```

In [14]:
# === Cell 7: Reminder for running PDAL (manual) ===
print("""
Run in order (from the terminal, or with subprocess if you prefer):
1) pdal pipeline pipelines/pipeline_make_hag.json
2) pdal pipeline pipelines/pipeline_filter_veg.json
3) pdal pipeline pipelines/pipeline_dbscan.json       (or)   pdal pipeline pipelines/cluster_dbscan.json
""")


Run in order (from the terminal, or with subprocess if you prefer):
1) pdal pipeline pipelines/pipeline_make_hag.json
2) pdal pipeline pipelines/pipeline_filter_veg.json
3) pdal pipeline pipelines/pipeline_dbscan.json       (or)   pdal pipeline pipelines/cluster_dbscan.json
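
For the `subprocess` route, one possible shape is sketched below. The functions `pdal_commands` and `run_all` are hypothetical helpers, not part of this notebook; the sketch assumes the `pdal` executable is on `PATH` and that the JSON files live in `pipelines/` relative to the working directory.

```python
import shutil
import subprocess
from pathlib import Path

def pdal_commands(pipes_dir, clean_noise=False):
    """Build the three `pdal pipeline` commands in execution order."""
    last = "cluster_dbscan.json" if clean_noise else "pipeline_dbscan.json"
    names = ["pipeline_make_hag.json", "pipeline_filter_veg.json", last]
    return [["pdal", "pipeline", str(Path(pipes_dir) / n)] for n in names]

def run_all(pipes_dir, clean_noise=False):
    """Run the commands one by one; check=True stops at the first failure."""
    if shutil.which("pdal") is None:
        print("pdal executable not found on PATH; nothing was run")
        return False
    for cmd in pdal_commands(pipes_dir, clean_noise):
        subprocess.run(cmd, check=True)
    return True

# Only list the commands here; call run_all("pipelines") to actually execute them.
commands = pdal_commands("pipelines")
for cmd in commands:
    print(" ".join(cmd))
```

Passing `clean_noise=True` swaps the last step for `cluster_dbscan.json`, matching the "(or)" alternative in the reminder.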



---
