# Configuration and Setup

## Overview

Before running geospatial analyses in PyGILE-Plus, you need to set the data paths, tool locations, and processing parameters for your environment and data. This notebook generates a JSON configuration file that all subsequent analysis workflows read.

## What You Will Configure

This notebook walks you through setting up data paths for your DEM and output directories, tool paths for SAGA, GRASS, OTB, and WhiteboxTools executables, processing parameters like thresholds and algorithm settings, and visualization settings for map styles and output formats.

## Why Use JSON Configuration?

Using a JSON configuration file offers several advantages. You can save your settings once and reuse them across multiple analyses without reconfiguring each time. Sharing configurations with colleagues becomes straightforward, making your work reproducible. When you need to adjust parameters, you simply edit the JSON file rather than hunting through code. Finally, the configuration file serves as clear documentation of exactly what parameters were used in your analysis.

## DEM Acquisition Options

You have two options for obtaining a DEM:

**Option A**: Use a local DEM file (set `use_cloud_dem = False` in the cell below, run it, then continue to Step 1; the flag is read again there)

**Option B**: Download DEM from cloud sources using STAC (run the cell below)

In [2]:
# Option B: Download DEM from Microsoft Planetary Computer
# Set use_cloud_dem = True to download, False to use local file

use_cloud_dem = True  # Set to False to use a local DEM file instead

if use_cloud_dem:
    import pystac_client
    import planetary_computer
    import rasterio
    from rasterio.merge import merge
    from rasterio.mask import mask
    from shapely.geometry import box
    import os
    
    # Define your study area bounding box
    # Format: [min_longitude, min_latitude, max_longitude, max_latitude]
    bbox = [80.5, 29.2, 80.7, 29.4]  # Example: Nepal region
    
    print(f"Downloading DEM for bbox: {bbox}")
    
    # Connect to Microsoft Planetary Computer
    catalog = pystac_client.Client.open(
        "https://planetarycomputer.microsoft.com/api/stac/v1",
        modifier=planetary_computer.sign_inplace
    )
    
    # Search for Copernicus DEM 30m
    search = catalog.search(
        collections=["cop-dem-glo-30"],
        bbox=bbox
    )
    
    items = list(search.items())
    print(f"Found {len(items)} DEM tiles")
    
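    if not items:
        raise RuntimeError("No DEM tiles found for the requested bounding box")
    
    # Open every tile (hrefs are already signed by the catalog modifier)
    datasets = []
    for item in items:
        signed_href = item.assets["data"].href
        datasets.append(rasterio.open(signed_href))
    
    # Merge tiles and clip the mosaic to the bounding box in one step
    # (bbox and the tiles are both in EPSG:4326)
    mosaic, mosaic_transform = merge(datasets, bounds=tuple(bbox))
    
    # Close datasets
    for ds in datasets:
        ds.close()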
    
    # Save clipped DEM
    output_dem = "clipped_dem.tif"
    
    # Create output with proper metadata
    with rasterio.open(
        output_dem,
        'w',
        driver='GTiff',
        height=mosaic.shape[1],
        width=mosaic.shape[2],
        count=1,
        dtype=mosaic.dtype,
        crs='EPSG:4326',
        transform=mosaic_transform,
        compress='lzw'
    ) as dst:
        dst.write(mosaic[0], 1)
    
    print(f"DEM downloaded and saved to: {output_dem}")
    dem_path = output_dem
else:
    print("Using local DEM file option")

Downloading DEM for bbox: [80.5, 29.2, 80.7, 29.4]
Found 1 DEM tiles
DEM downloaded and saved to: clipped_dem.tif
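
A malformed bounding box tends to produce an empty or unexpected search result, so it can help to check the box before querying the catalog. The sketch below assumes the `[min_longitude, min_latitude, max_longitude, max_latitude]` ordering used in the download cell. `validate_bbox` is a hypothetical helper, not part of PyGILE-Plus, and it does not handle boxes that cross the antimeridian.

```python
def validate_bbox(bbox):
    """Return bbox as a tuple of floats, or raise ValueError if it is malformed."""
    if len(bbox) != 4:
        raise ValueError("bbox must contain exactly four values")
    min_lon, min_lat, max_lon, max_lat = (float(v) for v in bbox)
    # Longitudes must be ordered west to east and lie within [-180, 180]
    if not (-180 <= min_lon < max_lon <= 180):
        raise ValueError(f"invalid longitude range: {min_lon}, {max_lon}")
    # Latitudes must be ordered south to north and lie within [-90, 90]
    if not (-90 <= min_lat < max_lat <= 90):
        raise ValueError(f"invalid latitude range: {min_lat}, {max_lat}")
    return (min_lon, min_lat, max_lon, max_lat)


# The example study area from the download cell passes the check
print(validate_bbox([80.5, 29.2, 80.7, 29.4]))
```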


## Step 1: Define Your Configuration

Modify the variables below to match your environment and requirements. The default values are set for a standard PyGILE-Plus Docker installation.

In [3]:
import json

# Path to your Digital Elevation Model
# If you used cloud download above, use clipped_dem.tif
# Otherwise, provide path to your local DEM file

if use_cloud_dem:
    dem_path = "clipped_dem.tif"
else:
    dem_path = "/workspace/Watershed/Orig_Dem/Orig_Dem.tif"  # Change this to your DEM path

# Temporary directory for intermediate processing files
temp_dir = "temp_watershed"

# Output directory for final results
output_dir = "results"

# Target coordinate reference system (CRS) for outputs
target_crs = "EPSG:4326"

print(f"DEM path configured: {dem_path}")

DEM path configured: clipped_dem.tif


### Understanding Data Paths

The DEM path points to your elevation raster file, which can be in GeoTIFF, IMG, or other GDAL-compatible formats. The temp directory stores intermediate files during processing and can be safely deleted after your analysis completes. The output directory is where all final results including maps, statistics, and shapefiles are saved. The target CRS defines the coordinate reference system for your outputs, and data will be reprojected to this system if needed.
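
As a minimal sketch of how these paths might be prepared before an analysis, the helper below checks that the DEM exists and creates the temp and output directories. `prepare_paths` is a hypothetical name used for illustration; PyGILE-Plus may handle this step differently.

```python
from pathlib import Path


def prepare_paths(dem_path, temp_dir, output_dir):
    """Check that the DEM file exists and create the working directories."""
    dem = Path(dem_path)
    if not dem.is_file():
        raise FileNotFoundError(f"DEM not found: {dem}")
    # Create both directories; existing ones are left untouched
    for directory in (temp_dir, output_dir):
        Path(directory).mkdir(parents=True, exist_ok=True)
    return dem.resolve()
```

Calling `prepare_paths(dem_path, temp_dir, output_dir)` with the variables defined above returns the absolute DEM path, or raises `FileNotFoundError` early instead of failing midway through processing.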

In [4]:
# Tool Paths (Default PyGILE-Plus Installation)

# Conda environment path
conda_env_path = "/opt/conda/envs/pygile"

# SAGA GIS command-line tool
saga_cmd = "/opt/saga/bin/saga_cmd"
saga_lib = "/opt/saga/lib/saga"

# GRASS GIS binary directory
grass_bin = "/opt/grass/grass84/bin"

# Orfeo Toolbox (OTB) binary directory
otb_bin = "/opt/otb/bin"

# WhiteboxTools executable
whitebox_tools = "/opt/conda/envs/pygile/bin/whitebox_tools"

print("Tool paths configured")

Tool paths configured


### Tool Paths Explained

These paths point to the executables for each GIS tool in PyGILE-Plus. If you're using the standard Docker container, these defaults should work. For custom installations, verify paths using:

```bash
which saga_cmd
which whitebox_tools
```
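
The same check can be sketched in Python so that it runs inside the notebook. `find_missing_tools` is a hypothetical helper. It relies on `shutil.which`, which accepts either a bare command name or a full path and returns `None` when no executable is found.

```python
import shutil


def find_missing_tools(tool_paths):
    """Return the names of tools whose configured path is not an executable."""
    missing = []
    for name, path in tool_paths.items():
        # shutil.which returns None if the path is absent or not executable
        if shutil.which(path) is None:
            missing.append(name)
    return missing


# Paths copied from the defaults above; the result depends on your installation
tool_paths = {
    "saga_cmd": "/opt/saga/bin/saga_cmd",
    "whitebox_tools": "/opt/conda/envs/pygile/bin/whitebox_tools",
}
print("Missing tools:", find_missing_tools(tool_paths))
```

Settings such as `grass_bin` and `otb_bin` point to directories rather than executables, so they would need a directory check such as `os.path.isdir` instead.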

In [5]:
# Hydrological Processing Parameters

# Flow accumulation threshold for stream network extraction
# Higher values = fewer, larger streams
# Lower values = more detailed stream network
flow_accumulation_threshold = 500

# Distance multiplier for snapping pour points to streams (in cells)
# Larger values allow snapping from further away
snap_distance_multiplier = 20

# Fill depressions in DEM before processing?
# Recommended: True (prevents false sinks in watersheds)
depression_filling = True

# Enable verbose output during processing?
verbose_mode = False

print("Hydrological parameters configured")

Hydrological parameters configured


### Hydrological Parameters Guide

The flow accumulation threshold controls how detailed your stream network will be. For small watersheds, use values between 100 and 500. Medium-sized watersheds work well with thresholds of 500 to 2000, while large watersheds typically need values above 2000.

Depression filling is essential for realistic flow routing. Natural DEMs often contain small depressions that act as artificial sinks where water appears to disappear. Filling these ensures continuous flow paths.

The snap distance parameter helps ensure your pour points connect to actual stream channels, compensating for small positional mismatches between the pour-point coordinates and the stream network derived from the DEM.
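
To get a feel for these numbers, the sketch below converts them to ground units. It assumes the threshold counts upstream cells, that the snap multiplier is expressed in cells (as the code comment states), and that the cells are roughly 30 m square, the nominal resolution of the Copernicus 30 m DEM. Both function names are illustrative.

```python
def threshold_to_area_km2(threshold_cells, cell_size_m):
    """Contributing area (km^2) implied by a flow accumulation threshold in cells."""
    return threshold_cells * cell_size_m ** 2 / 1_000_000


def snap_distance_m(multiplier, cell_size_m):
    """Maximum snapping distance (m) for a multiplier expressed in cells."""
    return multiplier * cell_size_m


# Defaults from the cell above with an assumed 30 m cell size
print(threshold_to_area_km2(500, 30))  # 0.45
print(snap_distance_m(20, 30))         # 600
```

With the defaults, a stream starts wherever about 0.45 km² drains to a cell, and pour points can move up to about 600 m to reach a stream.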

In [6]:
# Map Visualization Settings

# Default zoom level for interactive maps
default_zoom = 12

# Boundary and point colors
boundary_color = "red"
pour_point_color = "red"
pour_point_size = 100
boundary_weight = 2

# Base map tile layer (OpenStreetMap)
tile_layer = "https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png"
tile_attribution = "© OpenStreetMap contributors"

print("Map visualization settings configured")

Map visualization settings configured


### Visualization Options

These settings control how your results appear in interactive Folium maps. The zoom level determines the initial view, with values of 8 to 10 working well for large regions, 12 to 14 for watersheds, and 15 or higher for detailed views.

Colors can be specified using any valid CSS color name like red or blue, or hex codes such as #FF0000. The tile layer setting allows you to use different base maps including OpenStreetMap, Esri imagery, CartoDB, or custom tile servers.
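
As a rough illustration of how these settings could reach Folium, the helper below maps configuration keys onto the `location`, `zoom_start`, `tiles`, and `attr` keyword arguments of `folium.Map`. The helper name and the `center` argument are assumptions made for this sketch; how PyGILE-Plus actually builds its maps is not shown in this notebook.

```python
def folium_map_kwargs(config, center):
    """Translate configuration keys into keyword arguments for folium.Map."""
    return {
        "location": list(center),              # [latitude, longitude]
        "zoom_start": config["default_zoom"],  # initial zoom level
        "tiles": config["tile_layer"],         # tile URL template
        "attr": config["tile_attribution"],    # required for custom tile URLs
    }


example_config = {
    "default_zoom": 12,
    "tile_layer": "https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png",
    "tile_attribution": "© OpenStreetMap contributors",
}
# Center of the example bounding box used earlier (latitude, longitude)
print(folium_map_kwargs(example_config, center=(29.3, 80.6)))
```

With Folium installed, the result can be passed on as `folium.Map(**folium_map_kwargs(config, center))`.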

In [7]:
# Environment Settings

# GDAL data and projection library paths
gdal_data_path = "/opt/conda/envs/pygile/share/gdal"
proj_lib_path = "/opt/conda/envs/pygile/share/proj"

# Locale setting (important for consistent number formatting)
locale = "C"

print("Environment settings configured")

Environment settings configured


### Environment Variables

These paths are required for GDAL and PROJ to find their data files (coordinate systems, datum transformations, etc.). The locale setting ensures consistent behavior across different systems.
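
One way these settings might be applied is through environment variables, sketched below. `GDAL_DATA` and `PROJ_LIB` are the standard variable names read by GDAL and PROJ, and newer PROJ releases (9.1 and later) use `PROJ_DATA`. Mapping the locale setting to `LC_ALL` is an assumption, and `apply_environment` is a hypothetical helper.

```python
import os


def apply_environment(config, env=None):
    """Copy GDAL/PROJ/locale settings from the configuration into an environment mapping."""
    env = os.environ if env is None else env
    env["GDAL_DATA"] = config["gdal_data_path"]
    env["PROJ_LIB"] = config["proj_lib_path"]
    env["PROJ_DATA"] = config["proj_lib_path"]  # name used by PROJ 9.1 and later
    env["LC_ALL"] = config["locale"]            # assumed mapping for the locale setting
    return env


# Demonstrate on a plain dict so the real environment is left untouched
demo = apply_environment(
    {
        "gdal_data_path": "/opt/conda/envs/pygile/share/gdal",
        "proj_lib_path": "/opt/conda/envs/pygile/share/proj",
        "locale": "C",
    },
    env={},
)
print(demo)
```

When applied to `os.environ`, these variables generally need to be set before GDAL-based libraries are imported.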

In [8]:
# Terrain Analysis Parameters

# Slope calculation settings
slope_units = "degrees"  # Options: "degrees", "radians", "percent"
z_factor = 1.0  # Vertical exaggeration (1.0 = no exaggeration)
slope_method = "ta_morphometry"  # SAGA library for slope calculation
slope_unit_saga = 1  # SAGA slope unit: 0=radians, 1=degrees, 2=percent
format_grass = "degrees"  # GRASS slope format

# Edge detection parameters (for OTB)
edge_filter_type = "sobel"  # Edge detection algorithm
edge_channel = 1  # Band/channel to process
edge_x_radius = 1  # Kernel size in X direction
edge_y_radius = 1  # Kernel size in Y direction

# Hillshade parameters
hillshade_azimuth = 315  # Sun azimuth angle (0-360 degrees)
hillshade_altitude = 45  # Sun altitude angle (0-90 degrees)
hillshade_scale = 1  # Scale factor for hillshade

print("Terrain analysis parameters configured")

Terrain analysis parameters configured


### Terrain Analysis Settings

Slope can be expressed in three different units. Degrees ranging from 0 to 90 are the most intuitive for human interpretation. Percent values, commonly used in engineering applications, can exceed 100 for steep terrain. Radians are primarily useful for mathematical calculations.

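The z-factor scales elevation values relative to horizontal distances. A value of 1.0 means no exaggeration, while higher values help emphasize subtle terrain features that might otherwise be hard to see. Keep in mind that a DEM in a geographic CRS such as EPSG:4326 has horizontal units of degrees and vertical units of meters, so slope values are only physically meaningful after reprojecting to a projected CRS or applying a matching scale factor.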

Hillshade settings control the simulated illumination of your terrain. An azimuth of 315 degrees places the light source in the northwest, which is the standard convention. An altitude of 45 degrees creates balanced shadows that reveal terrain structure without being too harsh or too flat.
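
The relationship between the slope units can be made concrete with a short conversion sketch. Percent slope is 100 times the tangent of the slope angle, which is why it exceeds 100 on terrain steeper than 45 degrees. The function names are illustrative.

```python
import math


def slope_degrees_to_percent(degrees):
    """Convert a slope angle in degrees to percent rise (100 * rise / run)."""
    return 100.0 * math.tan(math.radians(degrees))


def slope_percent_to_degrees(percent):
    """Convert percent rise back to a slope angle in degrees."""
    return math.degrees(math.atan(percent / 100.0))


for angle in (10, 30, 45, 60):
    print(f"{angle} degrees = {slope_degrees_to_percent(angle):.1f} percent")
```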

In [9]:
# Output Visualization Settings

# Figure dimensions and quality
figure_width = 15  # Width in inches
figure_height = 10  # Height in inches
dpi = 300  # Resolution (dots per inch) - 300 is publication quality

# Color schemes for different analyses
elevation_colormap = "terrain"  # Good for elevation: terrain, gist_earth, viridis
slope_colormap = "Reds"  # Good for slope: Reds, YlOrRd, hot
edge_colormap = "viridis"  # Good for edges: viridis, plasma, inferno
hillshade_colormap = "gray"  # Grayscale for hillshade

# Slope visualization limits
slope_min_value = 0  # Minimum slope to display
slope_max_absolute = 45  # Maximum slope (degrees) to display
slope_max_percentile = 95  # Use 95th percentile if less than max_absolute

print("Visualization settings configured")

Visualization settings configured


### Choosing Color Maps

Different types of analysis benefit from different color schemes. For elevation data, Matplotlib color maps like terrain or gist_earth provide natural earth tones that intuitively represent topography, while viridis is a perceptually uniform alternative.

When displaying slope or intensity values, sequential color ramps such as Reds, YlOrRd, or hot work well because they show a clear progression from low to high values.

For data that diverges around a central value like zero, consider using RdBu or coolwarm, which show both positive and negative values distinctly. Categorical data with distinct classes is best displayed using tab10 or Set3.

The DPI setting controls output resolution. For screen viewing, 72 to 150 DPI is sufficient. Web graphics typically use around 150 DPI, while print and publication quality requires 300 DPI or higher.
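
Two of the visualization settings reduce to simple arithmetic, sketched below with hypothetical helper names. The first converts figure size and DPI into pixel dimensions. The second reflects one reading of the slope limits configured above, in which the display maximum is the smaller of the chosen percentile value and `slope_max_absolute`. That reading is an assumption based on the code comment.

```python
def figure_size_pixels(width_in, height_in, dpi):
    """Pixel dimensions of a figure saved at the given size (inches) and DPI."""
    return round(width_in * dpi), round(height_in * dpi)


def slope_display_max(percentile_value, max_absolute):
    """Upper color limit: the percentile value, capped at max_absolute."""
    return min(percentile_value, max_absolute)


print(figure_size_pixels(15, 10, 300))  # (4500, 3000)
print(slope_display_max(38.2, 45))      # 38.2
print(slope_display_max(61.0, 45))      # 45
```

At the default 15 by 10 inches and 300 DPI, each saved figure is 4500 by 3000 pixels, which is worth knowing when output size matters.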

## Step 2: Generate Configuration JSON

Now we'll compile all settings into a single configuration dictionary and save it as a JSON file.

In [10]:
# Create configuration dictionary with all parameters
config = {
    # Data paths
    "dem_path": dem_path,
    "temp_dir": temp_dir,
    "output_dir": output_dir,
    "target_crs": target_crs,
    
    # Tool paths
    "conda_env_path": conda_env_path,
    "saga_cmd": saga_cmd,
    "saga_lib": saga_lib,
    "grass_bin": grass_bin,
    "otb_bin": otb_bin,
    "whitebox_tools": whitebox_tools,
    
    # Hydrological processing
    "flow_accumulation_threshold": flow_accumulation_threshold,
    "snap_distance_multiplier": snap_distance_multiplier,
    "depression_filling": depression_filling,
    "verbose_mode": verbose_mode,
    
    # Map visualization
    "default_zoom": default_zoom,
    "boundary_color": boundary_color,
    "pour_point_color": pour_point_color,
    "pour_point_size": pour_point_size,
    "boundary_weight": boundary_weight,
    "tile_layer": tile_layer,
    "tile_attribution": tile_attribution,
    
    # Environment
    "gdal_data_path": gdal_data_path,
    "proj_lib_path": proj_lib_path,
    "locale": locale,
    
    # Terrain analysis
    "slope_units": slope_units,
    "z_factor": z_factor,
    "slope_method": slope_method,
    "slope_unit_saga": slope_unit_saga,
    "format_grass": format_grass,
    "edge_filter_type": edge_filter_type,
    "edge_channel": edge_channel,
    "edge_x_radius": edge_x_radius,
    "edge_y_radius": edge_y_radius,
    "hillshade_azimuth": hillshade_azimuth,
    "hillshade_altitude": hillshade_altitude,
    "hillshade_scale": hillshade_scale,
    
    # Visualization
    "figure_width": figure_width,
    "figure_height": figure_height,
    "dpi": dpi,
    "elevation_colormap": elevation_colormap,
    "slope_colormap": slope_colormap,
    "edge_colormap": edge_colormap,
    "hillshade_colormap": hillshade_colormap,
    "slope_min_value": slope_min_value,
    "slope_max_absolute": slope_max_absolute,
    "slope_max_percentile": slope_max_percentile
}

# Save configuration to JSON file
config_filename = 'watershed_analysis.json'
with open(config_filename, 'w') as f:
    json.dump(config, f, indent=4)

print(f"Configuration saved to {config_filename}")
print(f"Configuration contains {len(config)} parameters")
print("You can now use this configuration file in other notebooks.")

Configuration saved to watershed_analysis.json
Configuration contains 46 parameters
You can now use this configuration file in other notebooks.


## Step 3: View Your Configuration

Let's display a formatted view of your configuration to verify everything is correct.

In [11]:
# Display configuration summary
print("Configuration Summary")
print()

sections = {
    "Data Paths": ["dem_path", "temp_dir", "output_dir", "target_crs"],
    "Tool Paths": ["saga_cmd", "grass_bin", "otb_bin", "whitebox_tools"],
    "Hydrological": ["flow_accumulation_threshold", "snap_distance_multiplier",
                     "depression_filling"],
    "Terrain Analysis": ["slope_units", "z_factor", "hillshade_azimuth", "hillshade_altitude"],
    "Visualization": ["figure_width", "figure_height", "dpi", "elevation_colormap"]
}

for section_name, keys in sections.items():
    print(f"{section_name}:")
    for key in keys:
        if key in config:
            print(f"  {key}: {config[key]}")
    print()

Configuration Summary

Data Paths:
  dem_path: clipped_dem.tif
  temp_dir: temp_watershed
  output_dir: results
  target_crs: EPSG:4326

Tool Paths:
  saga_cmd: /opt/saga/bin/saga_cmd
  grass_bin: /opt/grass/grass84/bin
  otb_bin: /opt/otb/bin
  whitebox_tools: /opt/conda/envs/pygile/bin/whitebox_tools

Hydrological:
  flow_accumulation_threshold: 500
  snap_distance_multiplier: 20
  depression_filling: True

Terrain Analysis:
  slope_units: degrees
  z_factor: 1.0
  hillshade_azimuth: 315
  hillshade_altitude: 45

Visualization:
  figure_width: 15
  figure_height: 10
  dpi: 300
  elevation_colormap: terrain



## Step 4: Test Configuration Loading

Let's verify that the configuration file can be loaded correctly in other scripts.

In [12]:
# Test loading the configuration
with open(config_filename, 'r') as f:
    loaded_config = json.load(f)

print(f"Successfully loaded {config_filename}")
print(f"Contains {len(loaded_config)} parameters")
print(f"DEM path: {loaded_config['dem_path']}")
print(f"Output directory: {loaded_config['output_dir']}")
print(f"Slope units: {loaded_config['slope_units']}")
print("Configuration is ready to use.")

Successfully loaded watershed_analysis.json
Contains 46 parameters
DEM path: clipped_dem.tif
Output directory: results
Slope units: degrees
Configuration is ready to use.


## Using Your Configuration

In subsequent notebooks and scripts, you can load this configuration with:

```python
import json

with open('watershed_analysis.json', 'r') as f:
    config = json.load(f)

# Access any parameter
dem_path = config['dem_path']
slope_units = config['slope_units']
```
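
Because a missing key otherwise surfaces only as a `KeyError` deep inside an analysis, a loader that checks for required keys up front can be useful. The sketch below is one possible approach. `load_config` and the `REQUIRED_KEYS` subset are illustrative choices, not part of PyGILE-Plus.

```python
import json

# Illustrative subset; extend with the keys your workflow depends on
REQUIRED_KEYS = ("dem_path", "output_dir", "target_crs")


def load_config(path, required=REQUIRED_KEYS):
    """Load a JSON configuration and fail early if required keys are missing."""
    with open(path, "r", encoding="utf-8") as f:
        config = json.load(f)
    missing = [key for key in required if key not in config]
    if missing:
        raise KeyError(f"Missing configuration keys: {missing}")
    return config
```

A call such as `config = load_config('watershed_analysis.json')` then either returns the full dictionary or reports every missing key at once.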

## Updating Configuration

To modify your configuration, edit the variables in the cells above and re-run all cells to regenerate the JSON file. Your changes will be reflected in all subsequent analyses.
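
For a quick change to one or two values, the JSON file can also be updated programmatically instead of re-running the notebook. The sketch below uses a hypothetical `update_config` helper that rejects unknown keys, so a typo does not silently add a new parameter.

```python
import json


def update_config(path, **overrides):
    """Overwrite existing keys in a JSON configuration file and save it."""
    with open(path, "r", encoding="utf-8") as f:
        config = json.load(f)
    # Refuse keys that are not already present, which usually indicates a typo
    unknown = sorted(set(overrides) - set(config))
    if unknown:
        raise KeyError(f"Unknown configuration keys: {unknown}")
    config.update(overrides)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=4)
    return config
```

For example, `update_config('watershed_analysis.json', flow_accumulation_threshold=1000)` changes only that threshold and leaves every other parameter as saved.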

## Next Steps

Now that you have your configuration set up, proceed to the next notebook in the series. The Python Geospatial Ecosystem notebook will help you verify your environment is working correctly. After that, the Watershed Delineation notebook will use your DEM to create a watershed boundary, which will then be used for terrain analysis tutorials with SAGA GIS, GRASS GIS, OTB, and WhiteboxTools.