# 02_feature_engineering
- Goal: review the distributions of slope, distance to roads, distance to the power grid, and landcover in order to pick thresholds.
- Base grid: `data/interim/irradiance_reproj.tif` (~4.4 km); all feature rasters are aligned to it.
- Env: `ai_renewable` (geopandas/rasterio/matplotlib).

Step note: import libraries and set the plotting style.
Inputs: none.
Outputs: plotting style setup.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

plt.style.use("ggplot")  # simple style
plt.rcParams["figure.figsize"] = (8, 5)

Step note: add project root to sys.path to import src.* modules.
Inputs: none (infer from cwd).
Outputs: updated sys.path.

In [None]:
from pathlib import Path
import sys

ROOT = Path.cwd().resolve()
if not (ROOT / "src").exists():
    ROOT = ROOT.parent  # if src missing, move up one level
if str(ROOT) not in sys.path:
    sys.path.append(str(ROOT))
print("Project root:", ROOT)

Step note: load aligned slope/distance/landcover rasters.
Inputs: `data/interim/slope_resampled_to_irradiance.tif`, `dist_roads.tif`, `dist_grid.tif`, `landcover_resampled_to_irradiance.tif`.
Outputs: masked arrays (numpy.ma) and profiles for plotting/QA.

In [None]:
from src.features import (
    load_dist_grid,
    load_dist_roads,
    load_landcover,
    load_slope,
)

slope, slope_profile = load_slope()
dist_roads, dist_roads_profile = load_dist_roads()
dist_grid, dist_grid_profile = load_dist_grid()
landcover, landcover_profile = load_landcover()
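
Step note (optional sketch): confirm the loaded layers share one grid before comparing them cell by cell.
The helper `check_alignment` below is hypothetical and not part of `src.features`; it only compares array shapes, and the demo arrays are synthetic stand-ins for the loaded rasters. If the profiles are rasterio-style dicts (an assumption), their `transform` and `crs` entries could be compared in the same way.

```python
import numpy as np


def check_alignment(layers):
    """Return the names of layers whose shape differs from the first layer.

    `layers` maps a layer name to a 2-D array (plain or masked).
    Hypothetical helper for illustration only.
    """
    names = list(layers)
    ref_shape = layers[names[0]].shape  # the first layer acts as the reference grid
    return [name for name in names[1:] if layers[name].shape != ref_shape]


# Synthetic stand-ins; in the notebook these would be slope, dist_roads, etc.
demo_layers = {
    "slope": np.ma.masked_all((4, 5)),
    "dist_roads": np.ma.masked_all((4, 5)),
    "landcover": np.ma.masked_all((3, 5)),  # deliberately mismatched
}
mismatched = check_alignment(demo_layers)
print("Misaligned layers:", mismatched)
```

In the notebook, the call would look like `check_alignment({"slope": slope, "dist_roads": dist_roads, "dist_grid": dist_grid, "landcover": landcover})`; an empty list means all shapes match.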

Step note: simple histogram helper.
Inputs: masked array, title, bin count, x-axis label.
Outputs: one histogram.

In [None]:
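def plot_hist(masked_arr, title, bins=20, xlabel=None):
    """Plot a histogram of the valid (unmasked) cells of a masked array."""
    data = np.ma.compressed(masked_arr)  # 1-D array of unmasked values only
    plt.hist(data, bins=bins, color="steelblue", alpha=0.8)
    plt.title(title)
    plt.xlabel(xlabel or title)  # fall back to the title when no label is given
    plt.ylabel("Count")
    plt.show()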

Step note: inspect slope and distance distributions for threshold intuition.
Inputs: `slope`, `dist_roads`, `dist_grid` masked arrays.
Outputs: three histograms.

In [None]:
plot_hist(slope, "Slope (degrees)", bins=20, xlabel="Degrees")
plot_hist(dist_roads, "Road Distance (m)", bins=20, xlabel="Meters")
plot_hist(dist_grid, "Grid Distance (m)", bins=20, xlabel="Meters")
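
Step note (optional sketch): put numbers next to the histograms.
Percentiles and the share of valid cells under a candidate cutoff can make threshold choices easier to compare. The helpers `summarize` and `share_within` are hypothetical, the demo data is synthetic, and the cutoff of 4.0 is purely illustrative, not a recommended threshold.

```python
import numpy as np


def summarize(masked_arr, percentiles=(5, 25, 50, 75, 95)):
    """Map each requested percentile to its value, ignoring masked cells."""
    data = np.ma.compressed(masked_arr)
    values = np.percentile(data, percentiles)
    return dict(zip(percentiles, values.tolist()))


def share_within(masked_arr, threshold):
    """Fraction of valid cells whose value is <= threshold."""
    data = np.ma.compressed(masked_arr)
    return float(np.mean(data <= threshold))


# Synthetic example: values 0..10 with the last cell masked out
demo = np.ma.masked_array(np.arange(11, dtype=float), mask=[False] * 10 + [True])
stats = summarize(demo)
share = share_within(demo, 4.0)  # illustrative cutoff only
print("Percentiles:", stats)
print("Share of valid cells <= 4.0:", share)
```

The same calls would apply to `slope`, `dist_roads`, and `dist_grid`, for example `share_within(slope, candidate)` for each candidate slope limit under review.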

Step note: landcover class counts to aid whitelist selection.
Inputs: `landcover` masked array.
Outputs: class values with counts.

In [None]:
vals, counts = np.unique(np.ma.compressed(landcover), return_counts=True)
print("Landcover class counts:")
for v, c in zip(vals, counts):
    print(f"class {int(v)}: {c}")
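
Step note (optional sketch): check how much of the grid a candidate whitelist would keep.
Once candidate classes are chosen from the counts, `np.isin` can turn them into a boolean suitability mask. The helper `whitelist_mask` and the class codes `{2, 3}` are hypothetical; the real codes depend on the landcover product in use.

```python
import numpy as np


def whitelist_mask(landcover, allowed):
    """Boolean array: True where a cell is valid and its class is in `allowed`."""
    in_list = np.isin(np.ma.getdata(landcover), list(allowed))
    valid = ~np.ma.getmaskarray(landcover)  # masked (nodata) cells never qualify
    return in_list & valid


# Synthetic landcover grid with one masked (nodata) cell
demo = np.ma.masked_array(
    [[1, 2, 2], [3, 3, 0]],
    mask=[[False, False, False], [False, False, True]],
)
suitable = whitelist_mask(demo, {2, 3})  # hypothetical class codes
kept_share = suitable.sum() / demo.count()  # share of valid cells kept
print("Suitable cells:", int(suitable.sum()))
print("Share of valid cells kept:", float(kept_share))
```

Dividing by `count()` rather than the array size keeps nodata cells out of the denominator, so the share is comparable to the class counts printed above.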