# Quick local verification (conda)

This notebook contains a short conda-friendly verification workflow for the sweep-1 changes.

Recommended shell steps (run in a terminal before using this notebook):

1. Pull the latest `main`:

       git checkout main
       git pull origin main

2. Create and activate a conda environment (example):

       conda create -n dea-env python=3.9 -y
       conda activate dea-env

3. Install the key geospatial dependencies from conda-forge (recommended), then the remaining pip requirements:

       conda install -c conda-forge geopandas rasterio xarray numpy shapely fiona imageio pyyaml -y
       pip install -r requirements.txt

4. Open this notebook:

       jupyter lab  # or: jupyter notebook
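Jupyter can start a kernel from a different environment than the shell you activated, so it can help to confirm that the kernel itself runs inside `dea-env`. The helper below is a minimal sketch and not part of the repository; it relies on two conda conventions: every environment has a `conda-meta` directory at its root, and a named environment lives at `<conda root>/envs/<name>`.

```python
import sys
from pathlib import Path
from typing import Optional


def conda_env_name(prefix: str = sys.prefix) -> Optional[str]:
    """Return the name of the conda environment at `prefix`, or None if it is not one.

    Every conda environment has a `conda-meta` directory at its root, and a named
    environment lives at `<conda root>/envs/<name>`, so its name is the last path
    component. For the base environment this is the conda install directory name.
    """
    root = Path(prefix)
    if not (root / "conda-meta").is_dir():
        return None
    return root.name


# Check the environment the *kernel* runs in, which may differ from the shell's.
name = conda_env_name()
print("Kernel conda environment:", name)
if name != "dea-env":
    print("Warning: kernel is not running in 'dea-env'; select that environment's kernel.")
```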

Notes:
- The DEA fetching backend is intentionally a template and will raise NotImplementedError when called. The notebook checks for that expected behaviour.
- Running the NSW + QLD processing end to end requires either implementing the fetch backend (datacube or STAC) or using the GEE export route.


In [None]:
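# Verify the Python environment and check that the core libraries import
import sys
from pathlib import Path
print('Python executable:', sys.executable)
print('Working dir:', Path.cwd())
print('Python version:', sys.version.split()[0])
for name in ('numpy', 'xarray', 'geopandas', 'rasterio', 'shapely', 'yaml'):
    try:
        print(f'{name}:', getattr(__import__(name), '__version__', 'unknown'))
    except ImportError as exc:
        print(f'{name}: NOT importable ({exc})')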


## Create AOI GeoJSONs (run as a shell command or from this notebook)

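This runs the repository script `scripts/fetch_australian_state_geojson.py`, which fetches Australian state boundaries and writes `data/nsw.geojson` and `data/qld.geojson`.
From a terminal, the equivalent command is `python scripts/fetch_australian_state_geojson.py --out-dir data`; from the notebook, run the cell below. The script tries to download GADM shapefiles, so it needs network access.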
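After the script has run, the outputs can be sanity-checked without loading geopandas. This is a minimal sketch using only the standard library, assuming the script writes standard GeoJSON FeatureCollections; `summarize_geojson` is a hypothetical helper, not part of the repository.

```python
import json
from pathlib import Path


def summarize_geojson(path: Path) -> dict:
    """Light sanity check of a GeoJSON file using only the standard library.

    Returns the feature count and the set of geometry types; raises ValueError
    if the file is not a non-empty FeatureCollection.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if data.get("type") != "FeatureCollection":
        raise ValueError(f"{path}: expected a FeatureCollection, got {data.get('type')!r}")
    features = data.get("features", [])
    if not features:
        raise ValueError(f"{path}: FeatureCollection has no features")
    geom_types = {(f.get("geometry") or {}).get("type") for f in features}
    return {"n_features": len(features), "geometry_types": geom_types}


# Check the two AOI files the fetch script is expected to produce.
for state in ("nsw", "qld"):
    out = Path("data") / f"{state}.geojson"
    if out.exists():
        print(state, summarize_geojson(out))
    else:
        print(state, "not found at", out)
```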

In [None]:
import subprocess
import sys
from pathlib import Path

repo_root = Path.cwd()
script = repo_root / 'scripts' / 'fetch_australian_state_geojson.py'
if script.exists():
    print('Running fetch script...')
    # Run the script in a subprocess with the kernel's Python, so this kernel's sys.path and state stay clean
    cmd = [sys.executable, str(script), '--out-dir', str(repo_root / 'data')]
    res = subprocess.run(cmd, capture_output=True, text=True)
    print('returncode:', res.returncode)
    print(res.stdout)
    if res.stderr:
        print('stderr:\n', res.stderr)
else:
    print('fetch_australian_state_geojson.py not found at', script)


## Unit test converted to notebook cells

This runs the same simple reclassification test that was in tests/test_dea_processor.py.
It also demonstrates importing the processing module from src/ (the notebook ensures src/ is on sys.path).
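For context on what the test expects, here is a hypothetical minimal reclassifier with the same call shape as `reclassify_array(arr, classes_map, default=0)`. It is only a sketch; the repository's `reclassify_array` may be implemented differently (for example with a lookup table).

```python
import numpy as np


def reclassify_sketch(arr: np.ndarray, classes_map: dict, default: int = 0) -> np.ndarray:
    """Map each value in `arr` through `classes_map`; unmapped values get `default`."""
    out = np.full(arr.shape, default, dtype=np.int64)
    for src_value, dst_value in classes_map.items():
        # Masks are computed on the *input*, so mappings never cascade (1->2, 2->3 stays 2).
        out[arr == src_value] = dst_value
    return out


arr = np.array([[10, 20, 30], [30, 50, 60]])
print(reclassify_sketch(arr, {10: 0, 20: 2, 30: 1, 50: 2}, default=0))
# [[0 2 1]
#  [1 2 0]]
```

Building the output from masks on the original array, rather than editing the array in place, is what keeps chained mappings such as `{1: 2, 2: 3}` from being applied twice.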


In [None]:
# Ensure the 'src' directory is importable
import sys
from pathlib import Path
repo_root = Path.cwd()
src_dir = repo_root / 'src'
if str(src_dir) not in sys.path:
    sys.path.insert(0, str(src_dir))
print('Added to sys.path:', src_dir)

import numpy as np
try:
    from aus_land_clearing.dea_processor import reclassify_array
except Exception as e:
    raise ImportError('Failed to import dea_processor. Check that src/aus_land_clearing/dea_processor.py exists and src is on sys.path') from e

# Test data
arr = np.array([[10, 20, 30], [30, 50, 60]])
classes_map = {10: 0, 20: 2, 30: 1, 50: 2}
out = reclassify_array(arr, classes_map, default=0)
assert out.shape == arr.shape, 'Shape mismatch'
assert out[0,0] == 0
assert out[0,1] == 2
assert out[0,2] == 1
assert out[1,1] == 2
print('Reclassification test passed — reclassify_array behaves as expected')


## Check the fetch backend behaviour (expected NotImplementedError)

The cell below calls the template `fetch_dea_raster_for_year` for 1988. In the sweep-1 codebase this function is a stub that raises `NotImplementedError`, so the cell checks that the pipeline is wired up and fails with the expected exception, which marks where the backend needs to be implemented next.
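The cell below reads `dea_profile.aoi_paths.nsw` and `dea_profile.product_id` from `config.yaml`. A small helper like the one sketched here (hypothetical, not part of the repository) can report which of those nested keys are missing before anything else runs. Usage would be `missing_config_keys(load_config('config.yaml'))`, assuming `load_config` returns a plain dict.

```python
from collections.abc import Mapping, Sequence
from typing import Any, List, Tuple

# Nested key paths the verification cell reads from config.yaml.
REQUIRED_KEYS: List[Tuple[str, ...]] = [
    ("dea_profile", "product_id"),
    ("dea_profile", "aoi_paths", "nsw"),
]


def missing_config_keys(cfg: Mapping, required: Sequence = REQUIRED_KEYS) -> List[str]:
    """Return the dotted names of required nested keys that are absent from `cfg`."""
    missing = []
    for path in required:
        node: Any = cfg
        for key in path:
            # Stop at the first level that is not a mapping or lacks the key.
            if not isinstance(node, Mapping) or key not in node:
                missing.append(".".join(path))
                break
            node = node[key]
    return missing


example = {"dea_profile": {"product_id": "example-product", "aoi_paths": {}}}
print(missing_config_keys(example))  # ['dea_profile.aoi_paths.nsw']
```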

In [None]:
from aus_land_clearing.dea_processor import fetch_dea_raster_for_year, load_config, load_aoi
cfg = load_config('config.yaml')
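# Look up the AOI path defensively so a missing section prints the hint below instead of raising KeyError
aoi_path = ((cfg.get('dea_profile') or {}).get('aoi_paths') or {}).get('nsw')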
if aoi_path is None:
    print('No AOI path for nsw in config.yaml — ensure config.yaml contains dea_profile.aoi_paths.nsw')
else:
    try:
        aoi_gdf = load_aoi(aoi_path)
    except Exception as e:
        print('Could not load AOI file (expected if not created).', e)
        aoi_gdf = None

    try:
        print('Attempting to fetch DEA raster for 1988 (this should raise NotImplementedError until a backend is implemented)')
        arr, profile = fetch_dea_raster_for_year(1988, cfg['dea_profile']['product_id'], aoi_gdf)
        print('fetch_dea_raster_for_year returned array with shape', arr.shape)
    except NotImplementedError as nie:
        print('Expected behavior: fetch backend is a template and is not implemented yet.')
        print('Exception message:', nie)
    except Exception as e:
        print('fetch_dea_raster_for_year raised an unexpected exception:', type(e).__name__, e)


## (Optional) Run the processing driver in dry-run mode

The `scripts/run_dea_processing.py` driver iterates over the years and calls `process_year` for each. Because the backend is still a template, a run surfaces the `NotImplementedError` quickly. You can run the driver from the terminal:

```bash
python scripts/run_dea_processing.py --config config.yaml --states nsw qld
```
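The driver's control flow can be sketched as a loop over years and states that stops at the first `NotImplementedError`. This is a hypothetical illustration: `run_years` and `template_process_year` are invented names, and the real `process_year` signature in the repository may differ.

```python
from typing import Callable, Iterable, List, Tuple


def run_years(
    years: Iterable[int],
    states: Iterable[str],
    process_year: Callable[[int, str], None],
) -> List[Tuple[int, str]]:
    """Call process_year(year, state) for every pair; stop at the first NotImplementedError.

    Returns the (year, state) pairs that completed before stopping.
    """
    states = list(states)  # materialize so it can be iterated once per year
    done = []
    for year in years:
        for state in states:
            try:
                process_year(year, state)
            except NotImplementedError as exc:
                print(f"Stopping at {year}/{state}: backend not implemented ({exc})")
                return done
            done.append((year, state))
    return done


def template_process_year(year: int, state: str) -> None:
    # Hypothetical stand-in for a template backend, like the sweep-1 fetch stub.
    raise NotImplementedError("fetch backend not implemented")


print(run_years(range(1988, 1990), ["nsw", "qld"], template_process_year))
```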
To invoke the driver from the notebook instead, uncomment the lines in the cell below.

In [None]:
# Example: call the driver from the notebook (commented out by default; uncomment both lines to run)
# import subprocess, sys
# subprocess.run([sys.executable, 'scripts/run_dea_processing.py', '--config', 'config.yaml', '--states', 'nsw', 'qld'])
print('Driver example is commented out above. Run it from a terminal to watch the pipeline iterate over the years.')


## Next steps after this verification
- Implement a fetch backend (preferred: the Open Data Cube `datacube` API, if you have DEA products indexed). I can add an initial datacube-based implementation if you want; tell me whether your environment uses ODC and which version.
- Alternatively, implement a STAC client (e.g. with odc-stac) to stream the DEA annual land cover assets.
- For large runs across NSW and QLD, process the data in tiles or dask chunks on a machine with enough RAM and disk space, or use GEE exports instead.
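The tiling option above can be sketched as splitting the AOI bounding box into fixed-size tiles that are fetched and processed one at a time. This is a minimal sketch, assuming bounds in a projected CRS measured in metres; `iter_tiles` is a hypothetical helper, not part of the repository.

```python
import math
from typing import Iterator, Tuple

Bounds = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy) in CRS units


def iter_tiles(bounds: Bounds, tile_size: float) -> Iterator[Bounds]:
    """Yield non-overlapping tiles of at most `tile_size` that exactly cover `bounds`.

    Tiles are generated row by row from (minx, miny); edge tiles are clipped to the bounds.
    """
    if tile_size <= 0:
        raise ValueError("tile_size must be positive")
    minx, miny, maxx, maxy = bounds
    nx = math.ceil((maxx - minx) / tile_size)
    ny = math.ceil((maxy - miny) / tile_size)
    for j in range(ny):
        for i in range(nx):
            # Compute offsets from the index to avoid accumulating float error.
            x0 = minx + i * tile_size
            y0 = miny + j * tile_size
            yield (x0, y0, min(x0 + tile_size, maxx), min(y0 + tile_size, maxy))


tiles = list(iter_tiles((0, 0, 250, 100), 100))
print(len(tiles), tiles[-1])  # 3 (200, 0, 250, 100)
```

With geopandas, the bounds could come from something like `tuple(aoi_gdf.to_crs("EPSG:3577").total_bounds)` (EPSG:3577 is Australian Albers) with, say, 100 km tiles. Tiles of an irregular state's bounding box that do not intersect the AOI polygon can be skipped before fetching.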
If you want, I can now turn this notebook into a lightweight CI job that runs the reclassification test on each push (no large downloads), which helps catch regressions quickly.