# 5. Basin Attribute Extraction 

```{figure} img/nalcms_VI.gif
---
width: 600px
---
North American Land Change Monitoring System {cite}`latifovic2010north` rasters covering Vancouver Island for 2010, 2015, and 2020. 
```

The final step is to capture geospatial information describing the soil, land cover, and terrain of each basin using the polygons we developed in the previous notebook.  First we need to get the data and trim it to the Vancouver Island polygon.

In [None]:
# import the libraries used throughout this notebook
import os
import time
import multiprocessing as mp
from utilities import retrieve_raster, clip_raster_to_basin
import geopandas as gpd
import numpy as np
import pandas as pd
from shapely.geometry import box
import xarray as xr
import rioxarray as rxr

from numba import jit

from scipy.stats.mstats import gmean


## Land Cover

### Download and Clip NALCMS Data

North American Coverage of the NALCMS can be downloaded from [North American Land Change Monitoring System (NALCMS)](http://www.cec.org/north-american-land-change-monitoring-system/).  

Download the files you want to work with from the link above, and use the steps below to crop the dataset to the region of interest (Vancouver Island).

Note there are NALCMS datasets for 2010, 2015, and 2020 but we only process the 2010 dataset in this example.

In [None]:
base_dir = os.path.dirname(os.getcwd())
# set the path to the downloaded NALCMS file
nalcms_fpath = '/home/danbot/Documents/code/23/bcub/input_data/NALCMS/NA_NALCMS_2010_v2_land_cover_30m.tif'
nalcms_raster, nalcms_crs, nalcms_affine = retrieve_raster(nalcms_fpath)
if nalcms_crs is None:
    nalcms_crs = nalcms_raster.rio.crs.wkt

Import the region polygon

In [None]:
# set the path to the polygon mask for clipping the NALCMS raster
year = 2010
region = 'Vancouver_Island'
polygon_path = os.path.join(os.getcwd(), f'data/region_polygons/{region}.geojson')
region_polygon = gpd.read_file(polygon_path)

In [None]:
# this should be the same path as Notebook 2
dem_dir = os.path.join(base_dir, 'notebooks/data/processed_dem/')
dem_fpath = os.path.join(dem_dir, f'{region}_3005.tif')
region_dem, dem_crs, dem_affine = retrieve_raster(dem_fpath)

We need the polygon mask used to clip the raster to have the same CRS as the raster.

```{note}
The mask must be saved as a shapefile (`.shp`); the GeoJSON version did not work as a cutline with `gdalwarp`.
```

In [None]:
reproj_mask_path = polygon_path.replace('.geojson', '_clipping_mask.shp')
if not os.path.exists(reproj_mask_path):
    mask = region_polygon.to_crs(nalcms_raster.rio.crs.wkt)
    mask.to_file(reproj_mask_path)

The DEM path set above points to the DEM created in Notebook 2.  Use the original DEM, not the (pit/depression) filled DEM.

In [None]:
# create a folder for the output geospatial layers
output_folder = os.path.join(os.getcwd(), 'data/geospatial_layers')
os.makedirs(output_folder, exist_ok=True)

In [None]:
clipped_path = os.path.join(output_folder, f'NALCMS_{year}_{region}.tif')
command = f"gdalwarp -s_srs '{nalcms_crs}' -cutline {reproj_mask_path} -crop_to_cutline -multi -of gtiff {nalcms_fpath} {clipped_path} -wo NUM_THREADS=ALL_CPUS"
if not os.path.exists(clipped_path):
    os.system(command)
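
Because a WKT CRS string contains double quotes and spaces, interpolating it into a shell string can be fragile. As a hedged alternative (not part of the original workflow), the same `gdalwarp` options can be passed as an argument list to `subprocess.run`, which bypasses the shell entirely. The helper names `build_gdalwarp_args` and `run_gdalwarp`, the file names, and the placeholder CRS string are hypothetical; the flags are the ones used in the command above.

```python
import subprocess


def build_gdalwarp_args(src_path, dst_path, cutline_path, src_crs):
    """Assemble the gdalwarp call as a list, so no shell quoting is needed."""
    return [
        "gdalwarp",
        "-s_srs", src_crs,
        "-cutline", cutline_path,
        "-crop_to_cutline",
        "-multi",
        "-of", "GTiff",
        "-wo", "NUM_THREADS=ALL_CPUS",
        src_path,
        dst_path,
    ]


def run_gdalwarp(args):
    """Run the command and raise CalledProcessError if gdalwarp fails."""
    subprocess.run(args, check=True)


# placeholder inputs, for illustration only (the CRS string is not valid WKT)
args = build_gdalwarp_args(
    "NALCMS_2010.tif", "NALCMS_2010_clipped.tif", "mask.shp",
    'PROJCS["Example CRS"]',
)
print(args)
```

Calling `run_gdalwarp(args)` would then execute the clip, provided GDAL is installed and the paths exist.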

## Soil Permeability and Porosity

### Download and Clip GLHYMPS Data

```{figure} img/perm_porosity.png
---
width: 600px
---
The GLobal HYdrogeology MaPS (GLHYMPS) {cite}`SP2_TTJNIU_2018` provides global coverage of permeability and porosity in vector format.
```

Download the file from [here](https://aquaknow.jrc.ec.europa.eu/en/content/global-hydrogeology-maps-glhymps-permeability-and-porosity), and use the steps below to clip the data to the region of interest.  

Note that this file is large, and it's necessary to use the `mask` feature when opening the file using geopandas. Expect the opening and masking to take several minutes.

In [None]:
# glhymps is in EPSG 4326, ensure the polygon is the same CRS
region_polygon = region_polygon.to_crs(4326)

In [None]:
glhymps_path = 'data/geospatial_layers/GLHYMPS_Vancouver_Island.gpkg'
gldf = gpd.read_file(glhymps_path, mask=region_polygon)

## Climate Data

### Accessing and Registering for NASA DAYMET Data

NASA's DAYMET provides daily surface weather and climatological summaries for North America.  To access and automate the download of DAYMET data, follow these steps:

1. **Register**: Before you can download data, you need to [register for a NASA Earthdata Login account](https://urs.earthdata.nasa.gov/), which is used to access ORNL DAAC data.
   
2. **Access the Data**: Once registered, navigate to the [DAYMET Data Collection page](https://daymet.ornl.gov/) where you can explore available data sets.
   
3. **Automated Download**: For automated data downloads, you can use the DAYMET web services. Detailed instructions and examples for using these services can be found in the [DAYMET documentation](https://daymet.ornl.gov/web_services.html).

A listing of all available daily DAYMET spatial time series can be found [here](https://thredds.daac.ornl.gov/thredds/catalog/ornldaac/2129/catalog.html).

Available climate variables are as follows:

| Variable | Description (units) |
|---|---|
| tmax | Daily maximum 2-meter air temperature (°C) |
| tmin | Daily minimum 2-meter air temperature (°C) |
| prcp | Daily total precipitation (mm/day) |
| srad | Incident shortwave radiation flux density ($W/m^2$) |
| vp | Water vapor pressure (Pa) |
| swe | Snow water equivalent ($kg/m^2$) |
| dayl | Duration of the daylight period (seconds/day) |


In [None]:
# import the daymet tile index
tile_fpath = os.path.join(base_dir, 'notebooks/data/daymet_data/Daymet_v4_Tiles.geojson')
dm_tiles = gpd.read_file(tile_fpath)

# get the intersection with the region polygon
tiles_df = dm_tiles.sjoin(region_polygon)
tiles_df = tiles_df.sort_values(by=['Latitude (Min)', 'Longitude (Min)'])
tile_rows = tiles_df.groupby('Latitude (Min)')['TileID'].apply(list).tolist()
tile_ids = tiles_df['TileID'].values

In [None]:
daymet_url_base = 'https://thredds.daac.ornl.gov/thredds/catalog/ornldaac/2130/catalog.html?'
daymet_params = ['tmax', 'tmin', 'prcp', 'srad', 'swe', 'vp']
years = list(range(1980,2023))
daymet_save_path ='/media/danbot/Samsung_T51/large_sample_hydrology/common_data/DAYMET'

base_command = 'wget -q --show-progress --progress=bar:force --limit-rate=3m https://thredds.daac.ornl.gov/thredds/fileServer/ornldaac/2129/tiles/'

# build the full list of download commands across all years
batch_commands = []
for yr in years:
    for param in daymet_params:
        for tile in tile_ids:
            file = f'{daymet_save_path}/{tile}_{yr}_{param}.nc'
            if not os.path.exists(file):
                cmd = base_command + f'{yr}/{tile}_{yr}/{param}.nc -O {file}'
                batch_commands.append(cmd)

# download the files in parallel
with mp.Pool() as pl:
    pl.map(os.system, batch_commands)

### Process NASA DAYMET Data

[CDO (Climate Data Operators)](https://code.mpimet.mpg.de/projects/cdo/wiki) is one option for processing spatial time series of several climate parameters.  The cells below use GDAL and xarray instead.

To process the NASA DAYMET data spanning from 1980 to 2022 for specific parameters ('tmax', 'tmin', 'prcp', 'srad', 'swe', 'vp') within a set of polygons, we will use the following approach:

1. **Data Preparation**: Begin by organizing the .nc (NetCDF) files in an xarray dataset.

2. **Polygon Masking**: Convert your polygons into a raster format that matches the spatial resolution and extent of the DAYMET data. Each pixel within the polygon should have a value of 1 (or true), while those outside the polygon should be 0 (or false).

3. **Data Extraction**: For each year and parameter:
    * Load the .nc file using a spatial data processing library like GDAL in Python.
    * Multiply the DAYMET raster data by the polygon mask raster. This operation will effectively 'zero out' the data outside of your polygons, leaving you with data values only within the desired regions.
    * Compute desired statistics for each masked region, such as mean, sum, max, or min, depending on your research objectives.

4. **Aggregation**: Once you've extracted the data for all parameters across the years, you can aggregate or analyze the time series data as per your needs, e.g., trend analysis, anomaly detection, or temporal summarization.
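
One caveat about step 3: multiplying by a 0/1 mask leaves zeros outside the polygon, and those zeros bias a mean or minimum unless they are excluded. The dependency-free sketch below illustrates the masking idea by selecting the cells inside the mask rather than zeroing the rest. The function `masked_stats` and the toy grids are assumptions made for illustration; the notebook itself clips with rioxarray.

```python
import math


def masked_stats(grid, mask):
    """Return (mean, min, max) over cells where mask is 1, ignoring NaN."""
    vals = [
        v
        for grid_row, mask_row in zip(grid, mask)
        for v, m in zip(grid_row, mask_row)
        if m and not math.isnan(v)
    ]
    if not vals:
        return math.nan, math.nan, math.nan
    return sum(vals) / len(vals), min(vals), max(vals)


# toy 3x3 "raster" and a polygon mask covering the two right-hand columns
grid = [[1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0],
        [7.0, 8.0, 9.0]]
mask = [[0, 1, 1],
        [0, 1, 1],
        [0, 1, 1]]

mean_in, min_in, max_in = masked_stats(grid, mask)

# for comparison: zero the outside cells and average over the whole grid
zeroed = [v * m for g_row, m_row in zip(grid, mask) for v, m in zip(g_row, m_row)]
mean_zeroed = sum(zeroed) / len(zeroed)

print(mean_in, min_in, max_in)  # statistics over the six masked cells
print(mean_zeroed)              # lower, because three zeros are averaged in
```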
    

In [None]:
assert tiles_df.crs == region_polygon.crs
reproj_mask_path = polygon_path.replace('.geojson', '_daymet_mask.shp')
if not os.path.exists(reproj_mask_path):
    mask = region_polygon.to_crs(tiles_df.crs)
    mask.to_file(reproj_mask_path)

In [None]:
def merge_netcdf_data(fpath, param, region_mask):
    ds = rxr.open_rasterio(fpath, engine="rasterio", mask=region_mask.geometry[0])
    crs = ds.rio.crs.to_wkt()
    # region_polygon = gpd.read_file(reproj_mask_path).to_crs(crs)
    # ds = ds.rio.clip(region_polygon.geometry)
    ds = ds.to_dataset(name=param)    
    ds = ds.rename_dims({'band': 'dayofyear'}).rename_vars({'band': 'day'})
    return ds
    

For each daymet parameter:
* for each year, build a vrt from the tiled .nc files
* open the vrt as an xarray dataset and identify the time coordinate as dayofyear
* for each year, generate a summary statistic (sum for precip, max for swe, mean for remainder)
* concatenate the summarized annual values into a new xarray dataset where the time coordinate is now 'years'
* save the resulting dataset as an annual spatial time series in .nc format  
    * these new files will be masked with each basin polygon, and some index computed for each parameter
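
The per-parameter choice of annual statistic can be sketched as a small dispatch function. This is a simplified, single-pixel illustration using plain Python lists; the function name `annual_summary` and the sample values are assumptions made for the example, while the notebook applies the same reductions along the `dayofyear` dimension of an xarray dataset.

```python
def annual_summary(param, daily_values):
    """Reduce one year of daily values to a single annual value."""
    if param == "prcp":
        return sum(daily_values)  # annual total precipitation
    if param == "swe":
        return max(daily_values)  # annual peak snow water equivalent
    return sum(daily_values) / len(daily_values)  # annual mean otherwise


# three "days" of made-up data for two hypothetical years
daily = {1980: [1.0, 2.0, 3.0], 1981: [0.0, 4.0, 2.0]}

for param in ("prcp", "swe", "tmax"):
    series = {yr: annual_summary(param, vals) for yr, vals in daily.items()}
    print(param, series)
```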

In [None]:
all_data = []
for param in daymet_params:
    param_folder = f'data/daymet_data/{param}'

    output_fpath = f'data/daymet_data/{param}_{min(years)}_to_{max(years)}.nc'
    if os.path.exists(output_fpath):
        continue
        
    region_mask = gpd.read_file(reproj_mask_path)
    
    if not os.path.exists(param_folder):
        os.mkdir(param_folder)
        
    data_arrays = []
    temp_files = []
    
    for year in years:
        print(f'   ...processing {year} {param}')
            
        tile_mosaic = []
        # Specify the pattern for your file paths.
        file_pattern = f'*_{year}_{param}.nc'
        vrt_fname = f'{year}_{param}.vrt'
        vrt_fpath = os.path.join(param_folder, vrt_fname)
        
        # assemble the mosaic
        cmd = f'gdalbuildvrt {vrt_fpath} {daymet_save_path}/{file_pattern}'
        if not os.path.exists(vrt_fpath):
            os.system(cmd)
        temp_files.append(vrt_fpath)

        # clipped_path = os.path.join(f'data/daymet_data', f'{year}_{param}_clipped.tif')
        data = merge_netcdf_data(vrt_fpath, param, region_mask)
        if param in ['prcp']:
            spatial_result = data.sum(dim='dayofyear')
        elif param == 'swe':
            spatial_result = data.max(dim='dayofyear')
        else:
            spatial_result = data.mean(dim='dayofyear')
        
        
        data_arrays.append(spatial_result[param])

    ts_dataset = xr.concat(data_arrays, dim=pd.Index(years, name='time'))
    ts_dataset.to_netcdf(output_fpath)
    print('')
    if os.path.exists(output_fpath):
        for f in temp_files:
            os.remove(f)

## Direct Attribute Retrieval

Here we load the basin polygon batch files produced in the previous notebook, and iterate through polygons to extract attributes as we go.  This method is not the most performant, but the details of the process are hopefully clear.

In [None]:
basins_folder = os.path.join(os.getcwd(), 'data/basins/')
file = 'Vancouver_Island_basins.gpkg'
basins_df = gpd.read_file(os.path.join(basins_folder, file))


**Table: Set of metadata and catchment attributes in the BCUB database derived from USGS 3DEP (DEM), NALCMS (land cover), and GLHYMPS (soil) datasets.**

| **Group** | **Description (BCUB label)** | **Aggregation** | **Units** |
|---|---|---|---|
| Metadata | Pour point (geom) | - | decimal deg.$^1$ |
| | Basin centroid point (centroid) | - | decimal deg. |
| | Land Cover Flag (lulc_check) | - | binary (0/1) |
| Terrain | Drainage Area (drainage_area_km2) | at pour point | $km^2$ |
| | Elevation (elevation_m) | spatial mean | $m$ above sea level |
| | Terrain Slope (slope_deg) | spatial mean | $^\circ$ (degrees) |
| | Terrain Aspect (aspect_deg) | circular mean$^2$ | $^\circ$ (degrees) |
| Land Cover$^3$ | Cropland (`land_use_crops_frac_<year>`) | spatial mean | $\%$ cover |
| | Forest (`land_use_forest_frac_<year>`) | spatial mean | $\%$ cover |
| | Grassland (`land_use_grass_frac_<year>`) | spatial mean | $\%$ cover |
| | Shrubs (`land_use_shrubs_frac_<year>`) | spatial mean | $\%$ cover |
| | Snow & Ice (`land_use_snow_ice_frac_<year>`) | spatial mean | $\%$ cover |
| | Urban (`land_use_urban_frac_<year>`) | spatial mean | $\%$ cover |
| | Water (`land_use_water_frac_<year>`) | spatial mean | $\%$ cover |
| | Wetland (`land_use_wetland_frac_<year>`) | spatial mean | $\%$ cover |
| Soil | Permeability (permeability_logk_m2) | geometric mean | $m^2$ |
| | Porosity (porosity_frac) | spatial mean | $\%$ cover |
| Climate$^{4,5}$ | Precipitation (mean_precip_mm) | sum | $mm$ |
| | Shortwave Radiation (solar_rad_Wm2) | spatial mean | $W/m^2$ |
| | Water Vapour Pressure (water_vap_Pa) | spatial mean | $Pa$ |
| | Snow Water Equivalent (swe_kgm2) | spatial mean | $kg/m^2$ |
| | Min Temperature (min_temp_C) | spatial mean | $^\circ C$ |
| | Max Temperature (max_temp_C) | spatial mean | $^\circ C$ |

**Notes**:
1.  Geometries are formatted in the WGS84 coordinate reference system.
2.  Spatial aspect is expressed in degrees counter-clockwise from the east direction.
3.  The `<year>` suffix specifies the land cover dataset (2010, 2015, or 2020).
4.  Specifications on DAYMET data can be found [here](https://daac.ornl.gov/DAYMET/guides/Daymet_Daily_V4.html#:~:text=Daymet%20variables%20include%20the%20following,water%20equivalent%2C%20and%20day%20length.).
5.  All climate parameters are mean annual values over 1980-2022.


## Land Cover Data

Land use land cover (LULC) classes are grouped as in Arsenault (2020).

In [None]:
def check_lulc_sum(data):
    """
    Check whether the land cover fractions sum to 1.
    The return value is 1 - sum, so that it reads like
    a data quality flag: 0 if no flag is raised (fractions
    sum to 1), non-zero otherwise.  A message is printed
    when the discrepancy is 0.05 or more.
    """
    checksum = sum(list(data.values())) 
    lulc_check = 1-checksum
    if abs(lulc_check) >= 0.05:
        print(f'   ...checksum failed: {checksum:.3f}')   
    return lulc_check


def recategorize_lulc(data):    
    forest = ('Land_Use_Forest_frac', [1, 2, 3, 4, 5, 6])
    shrub = ('Land_Use_Shrubs_frac', [7, 8, 11])
    grass = ('Land_Use_Grass_frac', [9, 10, 12, 13, 16])
    wetland = ('Land_Use_Wetland_frac', [14])
    crop = ('Land_Use_Crops_frac', [15])
    urban = ('Land_Use_Urban_frac', [17])
    water = ('Land_Use_Water_frac', [18])
    snow_ice = ('Land_Use_Snow_Ice_frac', [19])
    lulc_dict = {}
    for label, p in [forest, shrub, grass, wetland, crop, urban, water, snow_ice]:
        prop_vals = round(sum([data[e] if e in data.keys() else 0.0 for e in p]), 2)
        lulc_dict[label] = prop_vals
    return lulc_dict
    

def get_value_proportions(data):
    # create a dictionary of land cover values by coverage proportion
    # assuming raster pixels are equally sized, we can keep the
    # raster in geographic coordinates and just count pixel ratios
    all_vals = data.data.flatten()
    vals = all_vals[~np.isnan(all_vals)]
    n_pts = len(vals)
    unique, counts = np.unique(vals, return_counts=True)    
    prop_dict = {k: 1.0*v/n_pts for k, v in zip(unique, counts)}

    # 15 represents cropland
    # if 15 in prop_dict.keys():
    #     if prop_dict[15] > 0.01:
    #         print(f'Land cover category 15 is found: {prop_dict[15]}%')
    #         print(prop_dict)
            
    prop_dict = recategorize_lulc(prop_dict)
    return prop_dict    


def process_lulc(i, basin_geom, nalcms_raster):
    # polygon = basin_polygon.to_crs(nalcms_crs)
    # assert polygon.crs == nalcms.rio.crs
    if nalcms_raster.rio.crs != basin_geom.crs:
        basin_geom = basin_geom.to_crs(nalcms_raster.rio.crs.to_wkt())
    raster_loaded, lu_raster_clipped = clip_raster_to_basin(basin_geom, nalcms_raster)
    # checksum verifies proportions sum to 1
    prop_dict = get_value_proportions(lu_raster_clipped)
    lulc_check = check_lulc_sum(prop_dict)
    prop_dict['lulc_check'] = lulc_check
    return pd.DataFrame(prop_dict, index=[i])
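
To make the pixel-counting logic concrete, here is a dependency-free sketch on a toy list of class codes. It assumes equally sized pixels, as noted in `get_value_proportions`, and uses only a subset of the class groupings from `recategorize_lulc`. The names `class_fractions` and `group_fractions` and the sample pixels are hypothetical.

```python
from collections import Counter

# subset of the NALCMS class groupings used in recategorize_lulc
GROUPS = {
    "forest": [1, 2, 3, 4, 5, 6],
    "shrubs": [7, 8, 11],
    "water": [18],
}


def class_fractions(pixels):
    """Fraction of valid pixels in each class (None marks no-data)."""
    valid = [p for p in pixels if p is not None]
    counts = Counter(valid)
    return {cls: n / len(valid) for cls, n in counts.items()}


def group_fractions(fractions, groups=GROUPS):
    """Sum the class fractions within each group."""
    return {
        name: sum(fractions.get(code, 0.0) for code in codes)
        for name, codes in groups.items()
    }


# ten toy entries: five forest, one shrub, two water, two no-data
pixels = [1, 1, 2, 5, 6, 8, 18, 18, None, None]
grouped = group_fractions(class_fractions(pixels))
print(grouped)
print(1 - sum(grouped.values()))  # checksum residual, as in check_lulc_sum
```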

In [None]:
from shapely.validation import make_valid


def check_and_repair_geometries(in_feature):
    # avoid changing the original GeoDataFrame
    in_feature = in_feature.copy(deep=True)

    # drop any empty geometries
    in_feature = in_feature[~(in_feature.is_empty)]

    # repair invalid geometries, one polygon at a time
    for index, row in in_feature.iterrows():
        if row['geometry'].is_valid:
            continue
        fix = make_valid(row['geometry'])
        try:
            in_feature.loc[[index], 'geometry'] = fix  # issue with Poly > Multipolygon
        except ValueError:
            in_feature.loc[[index], 'geometry'] = in_feature.loc[[index], 'geometry'].buffer(0)
    return in_feature


In [None]:
def process_basin_elevation(clipped_raster):
    # evaluate masked raster data
    values = clipped_raster.data.flatten()
    mean_val = np.nanmean(values)
    median_val = np.nanmedian(values)
    min_val = np.nanmin(values)
    max_val = np.nanmax(values)
    return mean_val, median_val, min_val, max_val

In [None]:
def get_soil_properties(merged, col):
    # dissolve polygons by unique parameter values
    geometries = check_and_repair_geometries(merged)

    df = geometries[[col, 'geometry']].copy().dissolve(by=col, aggfunc='first')
    df[col] = df.index.values
    # re-sum all shape areas
    df['Shape_Area'] = df.geometry.area
    # calculate area fractions of each unique parameter value
    df['area_frac'] = df['Shape_Area'] / df['Shape_Area'].sum()
    # check that the total area fraction = 1
    total = round(df['area_frac'].sum(), 1)
    sum_check = total == 1.0
    if not sum_check:
        print(f'    Area proportions do not sum to 1: {total:.2f}')
        if np.isnan(total):
            return np.nan
        elif total < 0.9:
            return np.nan
    
    # area_weighted_vals = df['area_frac'] * df[col]
    if 'Permeability' in col:
        # calculate geometric mean
        # here we change the sign (all permeability values are negative)
        # and add it back at the end by multiplying by -1 
        # otherwise the function tries to take the log of negative values
        return gmean(np.abs(df[col]), weights=df['area_frac']) * -1
    else:
        # calculate area-weighted arithmetic mean
        return (df['area_frac'] * df[col]).sum()
    

def process_glhymps(basin_geom, fpath):
    # import soil layer with polygon mask (both in 4326)
    basin_geom = basin_geom.to_crs(4326)
    # returns INTERSECTION
    gdf = gpd.read_file(fpath, mask=basin_geom)
    # now clip precisely to the basin polygon bounds
    merged = gpd.clip(gdf, mask=basin_geom)
    # now reproject to minimize spatial distortion
    merged = merged.to_crs(3005)
    return merged
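
The two aggregation rules in `get_soil_properties` can be illustrated without any geometry. The sketch below computes an area-weighted arithmetic mean and a weighted geometric mean, the latter as $\exp(\sum_i w_i \ln x_i / \sum_i w_i)$. The function names and sample values are assumptions for illustration; the notebook itself relies on `gmean` from SciPy.

```python
import math


def weighted_mean(values, weights):
    """Area-weighted arithmetic mean."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_gmean(values, weights):
    """Weighted geometric mean; all values must be positive."""
    log_sum = sum(w * math.log(v) for v, w in zip(values, weights))
    return math.exp(log_sum / sum(weights))


# made-up example: two soil units covering 75% and 25% of a basin
area_frac = [0.75, 0.25]
porosity = [0.10, 0.30]
abs_log_perm = [12.0, 14.0]  # absolute values; the sign is restored afterwards

print(weighted_mean(porosity, area_frac))
print(-1 * weighted_gmean(abs_log_perm, area_frac))
```

As in the notebook, the permeability values are made positive before taking logarithms and the sign is restored at the end.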


In [None]:
@jit(nopython=True)
def process_slope_and_aspect(E, el_px, resolution, shape):
    # note: distances are not meaningful in EPSG 4326, so the DEM
    # passed in should be in a projected CRS (here EPSG 3005)
    dx, dy = resolution
    S, A = np.empty_like(E), np.empty_like(E)
    S[:] = np.nan # track slope (in degrees)
    A[:] = np.nan # track aspect (in degrees)
    for i, j in el_px:
        # skip edge pixels, which lack a complete 3x3 neighbourhood
        if (i == 0) | (j == 0) | (i == shape[0] - 1) | (j == shape[1] - 1):
            continue
            
        E_w = E[i-1:i+2, j-1:j+2]

        if E_w.shape != (3,3):
            continue

        a = E_w[0,0]
        b = E_w[1,0]
        c = E_w[2,0]
        d = E_w[0,1]
        f = E_w[2,1]
        g = E_w[0,2]
        h = E_w[1,2]
        # skip i and j because they're already used
        k = E_w[2,2]  

        all_vals = np.array([a, b, c, d, f, g, h, k])

        val_check = np.isfinite(all_vals)

        if np.all(val_check):
            p = ((c + 2*f + k) - (a + 2*d + g)) / (8 * abs(dx))
            q = ((c + 2*b + a) - (k + 2*h + g)) / (8 * abs(dy))
            cell_slope = np.sqrt(p*p + q*q)
            S[i, j] = (180 / np.pi) * np.arctan(cell_slope)
            A[i, j] = (180.0 / np.pi) * np.arctan2(q, p)

    return S, A


def calculate_circular_mean_aspect(a):
    """
    From RavenPy:
    https://github.com/CSHS-CWRA/RavenPy/blob/1b167749cdf5984545f8f79ef7d31246418a3b54/ravenpy/utilities/analysis.py#L118
    """
    angles = a[~np.isnan(a)]
    n = len(angles)
    sine_mean = np.divide(np.sum(np.sin(np.radians(angles))), n)
    cosine_mean = np.divide(np.sum(np.cos(np.radians(angles))), n)
    vector_mean = np.arctan2(sine_mean, cosine_mean)
    degrees = np.degrees(vector_mean)
    if degrees < 0:
        return degrees + 360
    else:
        return degrees


def calculate_slope_and_aspect(raster):  
    """Calculate mean basin slope and aspect
    according to Horn (1981).

    Args:
        raster (xarray.DataArray): DEM raster clipped to the basin

    Returns:
        slope, aspect: scalar mean values in degrees
    """

    resolution = raster.rio.resolution()
    raster_shape = raster[0].shape

    el_px = np.argwhere(np.isfinite(raster.data[0]))

    S, A = process_slope_and_aspect(raster.data[0], el_px, resolution, raster_shape)

    mean_slope_deg = np.nanmean(S)
    # should be within a hundredth of a degree or so.
    # print(f'my slope: {mean_slope_deg:.4f}, rdem: {np.nanmean(slope):.4f}')
    mean_aspect_deg = calculate_circular_mean_aspect(A)

    return mean_slope_deg, mean_aspect_deg
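
As a worked check of the finite-difference scheme in `process_slope_and_aspect`, the sketch below applies the same weighted differences to a single 3x3 window using only the standard library, and shows why aspect needs a circular rather than an arithmetic mean. The function names and the test window are assumptions chosen for illustration.

```python
import math


def window_slope_aspect(w, dx, dy):
    """Slope and aspect (degrees) at the centre of a 3x3 window w[row][col]."""
    a, d, g = w[0]  # top row
    b, _, h = w[1]  # middle row (the centre value is not used)
    c, f, k = w[2]  # bottom row
    p = ((c + 2 * f + k) - (a + 2 * d + g)) / (8 * abs(dx))
    q = ((c + 2 * b + a) - (k + 2 * h + g)) / (8 * abs(dy))
    slope = math.degrees(math.atan(math.hypot(p, q)))
    aspect = math.degrees(math.atan2(q, p))
    return slope, aspect


def circular_mean_deg(angles):
    """Mean direction of angles in degrees, returned in the range [0, 360]."""
    s = sum(math.sin(math.radians(x)) for x in angles)
    c = sum(math.cos(math.radians(x)) for x in angles)
    return math.degrees(math.atan2(s, c)) % 360.0


# a plane rising by one unit per row on a grid with unit spacing
plane = [[0.0, 0.0, 0.0],
         [1.0, 1.0, 1.0],
         [2.0, 2.0, 2.0]]
slope, aspect = window_slope_aspect(plane, dx=1.0, dy=1.0)
print(slope, aspect)  # unit gradient, so the slope is 45 degrees

# 350 and 10 degrees straddle the wrap-around point
print(sum([350.0, 10.0]) / 2)            # arithmetic mean points the opposite way
print(circular_mean_deg([350.0, 10.0]))  # circular mean sits at the wrap-around
```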

In [None]:
def process_climate_data_by_basin(param, basin, data):
    """
    Clip the daymet data by parameter for each basin polygon,
    and calculate the annual spatial mean. 
    Return an array of annual spatial mean values, 1980-2022.
    """
    basin = basin.to_crs(data.rio.crs)
    clipped_data = data.rio.clip(basin.geometry, all_touched=True)        
    spatial_means = clipped_data.mean(dim=['y', 'x'])
    return (param, spatial_means[param].values)

In [None]:
year = 2010
nalcms_fpath = 'data/geospatial_layers/NALCMS_2010_Vancouver_Island.tif'
nalcms, nalcms_crs, nalcms_affine = retrieve_raster(nalcms_fpath)
if not nalcms_crs:
    nalcms_crs = nalcms.rio.crs.to_wkt()

In [None]:
daymet_data = {}
for param in daymet_params:
    daymet_data[param] = xr.open_dataset(f'data/daymet_data/{param}_1980_to_2022.nc', decode_coords='all')

In [None]:
# to test, try on a few samples, note how basins_df is sliced in the for .. statement next 
# to run the whole batch, remove the [:n_samples] slice
# n_samples = 11

In [None]:
basins_df

In [None]:
all_basin_data = []
t0 = time.time()
# for i, row in basins_df[:n_samples].iterrows():
for i, row in basins_df.iterrows():
    if i % 100 == 0:
        t_int = time.time()
        ii = i
        if ii == 0:
            ii = 1
        ut = (t_int - t0) / ii
        print(f'    ...Processing {i}/{len(basins_df)} ({ut:.2f} s/basin)')
    basin_data = {}
    basin_data['region'] = region
    basin_data['id'] = i

    basin_polygon = basins_df.iloc[[i]].copy()
    
    clip_ok, clipped_dem = clip_raster_to_basin(basin_polygon, region_dem)    
    
    land_cover = process_lulc(i, basin_polygon, nalcms)
    land_cover = land_cover.to_dict('records')[0]
    # print('     lulc complate')
    basin_data.update(land_cover)
    
    soil = process_glhymps(basin_polygon, glhymps_path)
    porosity = get_soil_properties(soil, 'Porosity')
    permeability = get_soil_properties(soil, 'Permeability_no_permafrost')
    basin_data['Permeability_logk_m2'] = round(permeability, 2)
    basin_data['Porosity_frac'] = round(porosity, 5)
    
    slope, aspect = calculate_slope_and_aspect(clipped_dem)
    # print(f'aspect, slope: {aspect:.1f} {slope:.2f} ')
    basin_data['Slope_deg'] = slope
    basin_data['Aspect_deg'] = aspect


    mean_el, median_el, min_el, max_el = process_basin_elevation(clipped_dem)
    basin_data['median_el'] = median_el
    basin_data['mean_el'] = mean_el
    basin_data['max_el'] = max_el
    basin_data['min_el'] = min_el
    
    # geojson only supports one geometry column
    
    basin_data['geometry'] = row.geometry
    basin_data['ppt_x'] = row['ppt_x']
    basin_data['ppt_y'] = row['ppt_y']
    basin_data['centroid_x'] = row['centroid_x']
    basin_data['centroid_y'] = row['centroid_y']

    for climate_param in daymet_params:
        basin_data[climate_param] = process_climate_data_by_basin(climate_param, basin_polygon, daymet_data[climate_param])
    
    all_basin_data.append(basin_data)


In [None]:
output = gpd.GeoDataFrame(all_basin_data, crs=3005)
output.head()
for param in daymet_params:
    print(param)
    param_vals = output.apply(lambda r: ','.join([str(round(e, 1)) for e in r[param][1]]), axis=1)
    output[param] = param_vals

output_fname = file.replace('_basins.gpkg', '_properties.gpkg')
output.to_file(os.path.join('data', output_fname))
print(f'Processed {len(output)} basins ({output_fname}).')
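
Since each climate column is stored as a comma-separated string of annual values, a reader of the output file needs to decode it before use. The helpers below are a hypothetical sketch of that round trip and of reducing a series to a mean annual value; the function names and sample numbers are assumptions, not part of the notebook.

```python
def encode_series(values):
    """Mirror the notebook's encoding: round to 0.1 and join with commas."""
    return ",".join(str(round(v, 1)) for v in values)


def decode_series(text):
    """Parse a comma-separated string back into a list of floats."""
    return [float(tok) for tok in text.split(",")]


annual_prcp = [2010.26, 1875.0, 2240.74]  # made-up annual totals (mm)
encoded = encode_series(annual_prcp)
decoded = decode_series(encoded)
mean_annual = sum(decoded) / len(decoded)
print(encoded)
print(round(mean_annual, 1))
```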



Processing time is roughly 1.3 s/basin on a ~2018 Intel i7-8850H CPU @ 2.60GHz.  Processing time is proportional to the size of the clipped DEM, and using JIT in the `process_slope_and_aspect` function yields a ~3X performance improvement.

## References

```{bibliography} 
```