Problem
spk_join() (spacehakr) joins a target sf object against a mask layer from a geopackage (path_gpkg) or another sf object. It does not support PostgreSQL tables as the mask source.
For pipelines that already have the spatial data in PostgreSQL (loaded via bcdata, fwapg, or other sources), this means either:
- Pulling geometry across the wire into R via sf
- Writing the data to a temporary geopackage
- Building a custom SQL
ST_Intersects query
PostGIS handles spatial joins much faster than R/sf for large polygon layers. A direct DB-side spatial join would avoid the geometry transfer cost.
Use case
The Wedzin Kwa restoration prioritization (NewGraphEnvironment/restoration_wedzin_kwa_2024#144) loops over land tenure, parks, house groups, fire history, and other polygon layers to enrich a sites table. Currently runs from geopackages copied locally. Migrating to a fresh/link pipeline would have all the spatial data already in PostgreSQL on local Docker fwapg.
Proposed Solution
Add a conn parameter (DBI connection) so mask_tbl can be a schema-qualified PostgreSQL table:
spk_join(
target_tbl = my_sites,
mask_tbl = "whse_tantalis.ta_park_ecores_pa_svw",
mask_col_return = "protected_lands_name",
conn = lnk_db_conn(),
join_fun = sf::st_intersects
)
When conn is provided:
mask_tbl is a schema-qualified PostgreSQL table name
- The spatial join runs server-side via PostGIS
ST_Intersects (or whatever predicate matches join_fun)
- Only the requested columns plus matching geometry are returned
target_tbl is uploaded to a temporary table for the join, or rows are processed in batches
When conn is NULL (current behavior), mask_tbl is read from path_gpkg or used as an sf object directly.
Predicate mapping
join_fun from sf maps to PostGIS predicates:
sf::st_intersects → ST_Intersects
sf::st_within → ST_Within
sf::st_contains → ST_Contains
sf::st_overlaps → ST_Overlaps
nngeo::st_nn (k-nearest) → no direct equivalent, use ST_Distance ordering with LIMIT
Existing parameters to support
mask_col_return — SELECT clause filter
mask_col_filter / mask_col_filter_values / mask_col_filter_values_negate — WHERE clause on mask
target_col_return — SELECT clause on target
collapse / target_col_collapse — same aggregation behavior, applied after join
References
- spacehakr
R/spk_join.R — current geopackage/sf implementation
- restoration_wedzin_kwa_2024#144 — migration target that needs this
Problem
spk_join()(spacehakr) joins a target sf object against a mask layer from a geopackage (path_gpkg) or another sf object. It does not support PostgreSQL tables as the mask source.For pipelines that already have the spatial data in PostgreSQL (loaded via
bcdata, fwapg, or other sources), this means either:ST_IntersectsqueryPostGIS handles spatial joins much faster than R/sf for large polygon layers. A direct DB-side spatial join would avoid the geometry transfer cost.
Use case
The Wedzin Kwa restoration prioritization (NewGraphEnvironment/restoration_wedzin_kwa_2024#144) loops over land tenure, parks, house groups, fire history, and other polygon layers to enrich a sites table. Currently runs from geopackages copied locally. Migrating to a fresh/link pipeline would have all the spatial data already in PostgreSQL on local Docker fwapg.
Proposed Solution
Add a
connparameter (DBI connection) somask_tblcan be a schema-qualified PostgreSQL table:When
connis provided:mask_tblis a schema-qualified PostgreSQL table nameST_Intersects(or whatever predicate matchesjoin_fun)target_tblis uploaded to a temporary table for the join, or rows are processed in batchesWhen
connis NULL (current behavior),mask_tblis read frompath_gpkgor used as an sf object directly.Predicate mapping
join_funfrom sf maps to PostGIS predicates:sf::st_intersects→ST_Intersectssf::st_within→ST_Withinsf::st_contains→ST_Containssf::st_overlaps→ST_Overlapsnngeo::st_nn(k-nearest) → no direct equivalent, useST_Distanceordering with LIMITExisting parameters to support
mask_col_return— SELECT clause filtermask_col_filter/mask_col_filter_values/mask_col_filter_values_negate— WHERE clause on masktarget_col_return— SELECT clause on targetcollapse/target_col_collapse— same aggregation behavior, applied after joinReferences
R/spk_join.R— current geopackage/sf implementation