Skip to content

Add PostgreSQL support to spk_join() via conn parameter #16

@NewGraphEnvironment

Description

@NewGraphEnvironment

Problem

spk_join() (spacehakr) joins a target sf object against a mask layer from a geopackage (path_gpkg) or another sf object. It does not support PostgreSQL tables as the mask source.

For pipelines that already have the spatial data in PostgreSQL (loaded via bcdata, fwapg, or other sources), this means either:

  • Pulling geometry across the wire into R via sf
  • Writing the data to a temporary geopackage
  • Building a custom SQL ST_Intersects query

PostGIS handles spatial joins much faster than R/sf for large polygon layers. A direct DB-side spatial join would avoid the geometry transfer cost.

Use case

The Wedzin Kwa restoration prioritization (NewGraphEnvironment/restoration_wedzin_kwa_2024#144) loops over land tenure, parks, house groups, fire history, and other polygon layers to enrich a sites table. Currently runs from geopackages copied locally. Migrating to a fresh/link pipeline would have all the spatial data already in PostgreSQL on local Docker fwapg.

Proposed Solution

Add a conn parameter (DBI connection) so mask_tbl can be a schema-qualified PostgreSQL table:

spk_join(
  target_tbl = my_sites,
  mask_tbl = "whse_tantalis.ta_park_ecores_pa_svw",
  mask_col_return = "protected_lands_name",
  conn = lnk_db_conn(),
  join_fun = sf::st_intersects
)

When conn is provided:

  • mask_tbl is a schema-qualified PostgreSQL table name
  • The spatial join runs server-side via PostGIS ST_Intersects (or whatever predicate matches join_fun)
  • Only the requested columns plus matching geometry are returned
  • target_tbl is uploaded to a temporary table for the join, or rows are processed in batches

When conn is NULL (current behavior), mask_tbl is read from path_gpkg or used as an sf object directly.

Predicate mapping

join_fun from sf maps to PostGIS predicates:

  • sf::st_intersectsST_Intersects
  • sf::st_withinST_Within
  • sf::st_containsST_Contains
  • sf::st_overlapsST_Overlaps
  • nngeo::st_nn (k-nearest) → no direct equivalent, use ST_Distance ordering with LIMIT

Existing parameters to support

  • mask_col_return — SELECT clause filter
  • mask_col_filter / mask_col_filter_values / mask_col_filter_values_negate — WHERE clause on mask
  • target_col_return — SELECT clause on target
  • collapse / target_col_collapse — same aggregation behavior, applied after join

References

  • spacehakr R/spk_join.R — current geopackage/sf implementation
  • restoration_wedzin_kwa_2024#144 — migration target that needs this

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions