# Chapter 43: Geospatial Data (PostGIS)

PostgreSQL's PostGIS extension turns the database into a spatial database management system with support for geographic objects, spatial indexing, and geodetic calculations. Unlike latitude/longitude stored in plain numeric columns, PostGIS types support geometric validity checks, coordinate system transformations, and spatial relationship queries (intersection, containment, distance). This chapter covers the operational patterns required for production geospatial applications, from nearest-neighbor searches to coordinate system architecture decisions.

## 43.1 PostGIS Architecture and Type System

PostGIS extends PostgreSQL with spatial types, functions, and index support, so a geometry is a first-class value rather than a pair of numeric columns.

### 43.1.1 Extension Setup and Version Verification

```sql
-- Installation (requires superuser or specific privileges)
CREATE EXTENSION IF NOT EXISTS postgis;

-- Verify installation and compile-time options
SELECT PostGIS_Version();
-- Returns something like: 3.4 USE_GEOS=1 USE_PROJ=1 USE_STATS=1

-- Check available spatial reference systems
SELECT COUNT(*) FROM spatial_ref_sys;
-- A stock installation ships several thousand EPSG definitions

-- Report the PostGIS build together with the versions of its linked libraries
SELECT postgis_full_version();
```

**Critical Dependencies:**
PostGIS relies on GEOS (geometry operations), PROJ (coordinate transformations), and GDAL (raster/vector import). Version mismatches between PostGIS and these libraries can cause calculation errors or crashes. Compare the output of `postgis_full_version()` across development, staging, and production so that every environment runs the same library versions.
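One way to compare environments is to query the individual version functions and the extension catalog. This is a minimal sketch using built-in PostGIS functions and the standard `pg_available_extensions` view. The upgrade call is left commented out because it changes the database, and it assumes a recent PostGIS release.

```sql
-- Library versions PostGIS is running against
SELECT postgis_lib_version()  AS postgis,
       postgis_geos_version() AS geos,
       postgis_proj_version() AS proj;

-- Installed vs. packaged extension versions
SELECT name, default_version, installed_version
FROM pg_available_extensions
WHERE name LIKE 'postgis%';

-- After upgrading the OS packages, bring the extension objects up to date.
-- Assumption: a recent PostGIS release; older ones use ALTER EXTENSION postgis UPDATE.
-- SELECT postgis_extensions_upgrade();
```

If `installed_version` lags `default_version`, the packages were upgraded but the extension objects inside the database were not.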

### 43.1.2 Geometry vs. Geography Types

PostGIS provides two primary spatial types with distinct performance and precision characteristics:

```sql
-- GEOMETRY: Cartesian coordinates on a flat plane (fast; results are in the SRID's units)
CREATE TABLE local_buildings (
    id serial PRIMARY KEY,
    name text,
    geom geometry(Point, 4326)  -- SRID 4326 = WGS84 lon/lat, so planar results are in degrees
);

-- GEOGRAPHY: Geodetic coordinates; calculations follow the spheroid (slower, accurate globally)
CREATE TABLE global_cities (
    id serial PRIMARY KEY,
    name text,
    geog geography(Point, 4326)  -- Lon/lat on WGS84 (the default); distances and areas are in meters
);
```

**Decision Matrix:**

| Factor | Geometry | Geography |
|--------|----------|-----------|
| **Coordinate System** | Any SRID; a projected CRS (meters/feet) is preferred for measurement | Geodetic lon/lat only (WGS84 by default) |
| **Performance** | Fast (Cartesian math) | Slower (spheroidal math) |
| **Accuracy** | Depends on the projection; distorts over large extents | Accurate for global distances |
| **Index Support** | GiST, SP-GiST, BRIN | GiST (typical); BRIN and SP-GiST operator classes exist in recent PostGIS releases |
| **Use Case** | Local/regional (city, state) | Global (international routing) |

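**Common Practice:**
For regional workloads, store data as `geometry` in a projected coordinate system suited to the operational region (e.g., a UTM zone, or State Plane for local government) so that distances and areas come out in meters or feet. SRID 3857 (Web Mercator) is the convention for web map display, but its units stretch by roughly 1/cos(latitude), which makes it a poor basis for measurement. Keep WGS84 (4326) for interoperability, and use it with `geography` for global datasets that span more than one UTM zone.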
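The difference between the types is easiest to see by measuring the same pair of points several ways. The query below is an illustrative sketch with approximate city-center coordinates chosen for the example. No result values are shown because they depend on the PROJ and PostGIS versions in use.

```sql
-- Same two points measured three ways (coordinates are approximate city centers)
WITH pts AS (
    SELECT ST_SetSRID(ST_MakePoint(-74.006, 40.7128), 4326) AS nyc,
           ST_SetSRID(ST_MakePoint(-0.1276, 51.5072), 4326) AS london
)
SELECT
    -- geometry in 4326: planar math on degrees, not a usable ground distance
    ST_Distance(nyc, london) AS planar_degrees,
    -- geography: meters along the spheroid
    ST_Distance(nyc::geography, london::geography) AS geodesic_meters,
    -- Web Mercator: nominal meters, inflated away from the equator
    ST_Distance(ST_Transform(nyc, 3857), ST_Transform(london, 3857)) AS web_mercator_units
FROM pts;
```

The first column is a number of degrees, the second a distance in meters, and the third noticeably exceeds the second at these latitudes.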

### 43.1.3 Spatial Reference Systems (SRID)

Coordinate reference systems define how latitude/longitude or X/Y values map to Earth's surface.

```sql
-- View specific CRS definition
SELECT srid, auth_name, srtext, proj4text 
FROM spatial_ref_sys 
WHERE srid = 4326;
-- srtext: GEOGCS["WGS 84", DATUM["WGS_1984", ...], PRIMEM["Greenwich",0], ...]

-- Common SRIDs:
-- 4326: WGS84 (GPS standard, degrees)
-- 3857: Web Mercator (Google Maps, OpenStreetMap; nominal meters)
-- 4269: NAD83 (North American datum, degrees)
-- 2263: NAD83 / New York Long Island State Plane (US survey feet)

-- Verify CRS handling in your database
SELECT ST_AsText(ST_Transform(ST_SetSRID(ST_MakePoint(-74.006, 40.7128), 4326), 3857));
-- Transforms NYC coordinates from WGS84 to Web Mercator
```

**Critical Rule:**
Never mix SRIDs in operations without transformation. PostGIS raises errors when comparing geometries with different SRIDs:

```sql
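-- ERROR: Operation on mixed SRID geometries
SELECT ST_Distance(
    ST_SetSRID(ST_MakePoint(0, 0), 4326),
    ST_SetSRID(ST_MakePoint(1, 1), 3857)
);

-- Fix: transform one side so both share an SRID (the result is in that SRID's units, here degrees)
SELECT ST_Distance(
    ST_SetSRID(ST_MakePoint(0, 0), 4326),
    ST_Transform(ST_SetSRID(ST_MakePoint(1, 1), 3857), 4326)
);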
```
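A quick audit can catch mixed SRIDs before they surface as runtime errors. The sketch below uses the `geometry_columns` view and `ST_SRID`, with the `local_buildings` table from Section 43.1.2 standing in for your own table. The last query shows that `ST_SetSRID` only relabels coordinates while `ST_Transform` reprojects them, a common source of misplaced data.

```sql
-- Declared SRID and type of every geometry column in the database
SELECT f_table_schema, f_table_name, f_geometry_column, srid, type
FROM geometry_columns
ORDER BY f_table_schema, f_table_name;

-- SRIDs actually present in the data (useful for untyped geometry columns)
SELECT ST_SRID(geom) AS srid, count(*) AS row_count
FROM local_buildings
GROUP BY ST_SRID(geom);

-- ST_SetSRID changes only the label; ST_Transform converts the coordinates
SELECT ST_AsText(ST_SetSRID(p, 3857))   AS relabeled,    -- coordinates unchanged
       ST_AsText(ST_Transform(p, 3857)) AS reprojected   -- coordinates in Web Mercator
FROM (SELECT ST_SetSRID(ST_MakePoint(-74.006, 40.7128), 4326) AS p) AS t;
```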

## 43.2 Spatial Indexing (GiST and SP-GiST)

Without an index, a spatial predicate forces a sequential scan that runs an expensive geometric calculation on every row, which becomes impractical well before a table reaches millions of rows.

### 43.2.1 GiST Indexing Strategy

Generalized Search Tree (GiST) is the default and most versatile spatial index, implementing R-Tree variants optimized for multidimensional data.

```sql
-- Standard 2D GiST index (most common)
CREATE INDEX idx_buildings_geom ON buildings 
USING GiST (geom);

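-- Geography index (needed for index-assisted queries on a geography column)
CREATE INDEX idx_cities_geog ON global_cities
USING GiST (geog);

-- Multi-column spatial index (less common, specific use cases)
-- Scalar columns such as timestamps need btree_gist to provide a GiST operator class
CREATE EXTENSION IF NOT EXISTS btree_gist;
CREATE INDEX idx_events_location_time ON events
USING GiST (geom, event_time);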
```

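**Index Selectivity:**
A GiST index stores the bounding box of each geometry. The index scan discards rows whose boxes cannot match, and the exact geometry test then removes the remaining false positives:

```sql
-- Query plan shows index usage
-- Assumes buildings.geom uses a meter-based projected SRID (32618, UTM zone 18N, is illustrative);
-- against a 4326 geometry column the radius 1000 would be read as 1000 degrees
EXPLAIN ANALYZE
SELECT * FROM buildings
WHERE ST_DWithin(geom,
    ST_Transform(ST_SetSRID(ST_MakePoint(-74.006, 40.7128), 4326), 32618), 1000);
-- Index Cond: (geom && st_expand(...))  -- Bounding box check first
-- Filter: st_dwithin(...)               -- Exact distance calculation second
```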

### 43.2.2 SP-GiST for Quadtree Partitioning

Space-Partitioned GiST (SP-GiST) uses quadtrees for point data, offering faster lookups for certain distributions:

```sql
-- Candidate for large point tables (the access method is named spgist)
CREATE INDEX idx_crime_locations ON crime_incidents
USING SPGIST (location);

-- Benefits: non-overlapping space partitions can speed up bounding box searches
-- Trade-off: gains depend on the data distribution, so benchmark against GiST before switching
```

### 43.2.3 BRIN for Massive Static Datasets

Block Range Indexes suit append-only spatial data (IoT telemetry, GPS tracks) where data naturally clusters by time and location:

```sql
-- Assumes rows are physically stored in spatial order (e.g., time-series GPS loaded in sequence)
CREATE INDEX idx_gps_tracks_brin ON gps_logs
USING BRIN (geom)
WITH (pages_per_range = 128);

-- The index is far smaller than a GiST index on the same table
-- Effective only if physically adjacent rows are also spatially close
-- CLUSTER cannot use a BRIN index; to reorder, CLUSTER on a GiST index or reload the table sorted
```
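If a table was not loaded in spatial order, one option is to rewrite it sorted by a space-filling key before building the BRIN index. The sketch below assumes `gps_logs.geom` holds lon/lat points in SRID 4326, which `ST_GeoHash` expects. The `gps_logs_sorted` table and its index name are hypothetical.

```sql
-- Rewrite the table so spatially close rows are physically adjacent
CREATE TABLE gps_logs_sorted AS
SELECT *
FROM gps_logs
ORDER BY ST_GeoHash(geom, 10);

-- BRIN summarizes the bounding box of each block range
CREATE INDEX idx_gps_logs_sorted_brin ON gps_logs_sorted
USING BRIN (geom);

ANALYZE gps_logs_sorted;

-- BRIN serves bounding box operators such as &&; it does not provide KNN ordering
SELECT count(*)
FROM gps_logs_sorted
WHERE geom && ST_MakeEnvelope(-74.1, 40.7, -73.9, 40.8, 4326);
```

New rows appended later keep the index effective only if they continue to arrive in roughly spatial order.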

## 43.3 Spatial Query Patterns

### 43.3.1 Proximity Searches (Nearest Neighbor)

Finding closest locations efficiently requires K-Nearest Neighbor (KNN) operators:

```sql
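-- Nearest 10 buildings to a point (efficient index usage)
-- ref_geom is a placeholder for a point in the same meter-based SRID as buildings.geom
SELECT
    id,
    name,
    ST_Distance(geom, ref_geom) as distance_meters
FROM buildings
WHERE ST_DWithin(geom, ref_geom, 10000)  -- Optional cap: ignore candidates beyond 10 km
ORDER BY geom <-> ref_geom  -- KNN operator walks the GiST index in distance order
LIMIT 10;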
```

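**The `<->` Operator:**
This distance operator lets a GiST index return rows in order of increasing distance, so the query can stop after `LIMIT` rows instead of calculating the distance to every row. (The related `<#>` operator orders by bounding box distance.) The index ordering applies only when `<->` appears in `ORDER BY`. This is critical for high-performance location services.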

```sql
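-- Find nearest hospital with emergency services
-- user_location is a placeholder for a 4326 point supplied by the application
SELECT
    h.name,
    h.phone,
    ST_Distance(
        h.geom::geography,
        user_location::geography
    ) / 1000 as distance_km
FROM hospitals h
WHERE h.has_emergency = true
ORDER BY h.geom <-> user_location  -- Planar ordering in degrees; approximates geodesic order
LIMIT 1;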
```
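The same operator extends to "nearest neighbor for every row" queries through a `LATERAL` join, which re-runs the index-ordered scan once per outer row. The sketch below assumes `incidents` has `id` and `geom` columns and that `hospitals.geom` shares the same SRID.

```sql
-- For every incident, find the single nearest hospital
SELECT
    i.id         AS incident_id,
    nearest.name AS hospital_name,
    ST_Distance(i.geom, nearest.geom) AS distance   -- in the units of the shared SRID
FROM incidents AS i
CROSS JOIN LATERAL (
    SELECT h.name, h.geom
    FROM hospitals AS h
    ORDER BY h.geom <-> i.geom   -- index-ordered scan, executed per incident
    LIMIT 1
) AS nearest;
```

Raising the inner `LIMIT` returns the k nearest hospitals per incident instead of one.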

### 43.3.2 Containment and Intersection

```sql
-- Points within polygon (e.g., neighborhoods)
SELECT count(*) 
FROM incidents 
WHERE ST_Within(
    geom, 
    (SELECT boundary FROM neighborhoods WHERE name = 'Downtown')
);

-- Polygons intersecting bounding box (viewport queries for mapping)
SELECT id, name, geom 
FROM parcels 
WHERE geom && ST_MakeEnvelope(-74.1, 40.7, -73.9, 40.8, 4326);
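-- && operator: bounding box overlap, answered from the index without an exact geometry test

-- Exact intersection (index pre-filter, then a precise geometry check on the candidates)
-- construction_zone_boundary is a placeholder for a geometry in the same SRID as roads.geom
SELECT id
FROM roads
WHERE ST_Intersects(geom, construction_zone_boundary);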
```

### 43.3.3 Distance Calculations and Buffering

```sql
-- Create search radius (buffer) around point
SELECT ST_Buffer(
    ST_SetSRID(ST_MakePoint(-74.006, 40.7128), 4326)::geography, 
    5000  -- meters
) as search_area;

-- Important: Buffer size units depend on CRS
-- Geometry in 4326 (degrees): buffer size in degrees (avoid)
-- Geometry in a projected CRS such as UTM (meters): buffer size in meters (good for local data)
-- Geometry in 3857: nominal meters that overstate ground distance away from the equator
-- Geography: buffer size always in meters (safest for variable distances)

-- Find customers within delivery radius
SELECT c.* 
FROM customers c
JOIN stores s ON ST_DWithin(
    c.location::geography, 
    s.location::geography, 
    s.delivery_radius_meters
)
WHERE s.store_id = 123;
```

**Performance Note:**
Casting to `geography` gives accurate distances in meters, but an index on the geometry column cannot serve the cast expression. The query needs either a native `geography` column with its own GiST index or an expression index on the cast. For geometry columns in a projected CRS (meters), call `ST_DWithin` directly without casting.
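When the columns must stay as `geometry`, an expression index on the cast is one way to keep the delivery-radius join index-assisted. This sketch assumes `customers.location` and `stores.location` are lon/lat geometries in SRID 4326. The index names are hypothetical.

```sql
-- Expression indexes matching the cast used in the join (note the double parentheses)
CREATE INDEX idx_customers_location_geog ON customers
USING GiST ((location::geography));

CREATE INDEX idx_stores_location_geog ON stores
USING GiST ((location::geography));

ANALYZE customers;
ANALYZE stores;

-- The predicate must use the same expression for the planner to match the index
EXPLAIN
SELECT c.*
FROM customers c
JOIN stores s ON ST_DWithin(
    c.location::geography,
    s.location::geography,
    s.delivery_radius_meters
)
WHERE s.store_id = 123;
```

If most queries need geodesic distances, converting the column to `geography` outright may be simpler than maintaining the expression index.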

## 43.4 Geometric Validity and Data Quality

Invalid geometries can cause query failures and silently incorrect results.

### 43.4.1 Validity Checking and Repair

```sql
-- Check for invalid geometries
SELECT id, ST_IsValidReason(geom) 
FROM parcels 
WHERE NOT ST_IsValid(geom);

-- Repair strategies
UPDATE parcels 
SET geom = ST_MakeValid(geom)
WHERE NOT ST_IsValid(geom);

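-- Or repair during load, skipping rows that are NULL or collapse to an empty geometry
-- Note: ST_MakeValid can change the geometry type (e.g., Polygon to MultiPolygon)
INSERT INTO clean_parcels (geom)
SELECT fixed
FROM (SELECT ST_MakeValid(geom) AS fixed FROM raw_parcels) AS repaired
WHERE fixed IS NOT NULL AND NOT ST_IsEmpty(fixed);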
```

**Common Validity Issues:**
- Self-intersecting polygons ("bowtie" shapes)
- Unclosed rings (first point ≠ last point)
- Holes outside the shell (interior rings that fall outside the exterior ring)
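A small self-contained case makes the first issue concrete. The polygon below is an illustrative "bowtie" whose ring crosses itself. The `CHECK` constraint is one possible way to reject invalid shapes at write time, and its name is hypothetical.

```sql
-- A "bowtie": the edges (0 0)-(1 1) and (1 0)-(0 1) cross at (0.5, 0.5)
WITH sample AS (
    SELECT 'POLYGON((0 0, 1 1, 1 0, 0 1, 0 0))'::geometry AS g
)
SELECT
    ST_IsValid(g)                    AS is_valid,       -- false for this ring
    ST_IsValidReason(g)              AS reason,         -- describes the self-intersection
    ST_AsText(ST_MakeValid(g))       AS repaired,       -- the repaired shape
    ST_GeometryType(ST_MakeValid(g)) AS repaired_type   -- may differ from the input type
FROM sample;

-- Reject invalid shapes on insert and update; existing rows must be repaired first
ALTER TABLE parcels
ADD CONSTRAINT parcels_geom_valid CHECK (ST_IsValid(geom));
```

The constraint adds a validity check to every write, so bulk loads may prefer to validate in a staging table instead.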

### 43.4.2 Simplification and Precision

```sql
-- Simplify geometries for display (reduce vertices)
SELECT ST_Simplify(geom, 0.0001)  -- Tolerance in CRS units
FROM countries 
WHERE name = 'USA';

-- Reduce coordinate precision (drops spurious digits from noisy GPS)
-- ST_ReducePrecision requires PostGIS 3.1+; ST_SnapToGrid is the older alternative
SELECT ST_ReducePrecision(geom, 0.000001)  -- 6 decimal places ~ 0.1m
FROM gps_tracks;

-- Calculate area (returns square meters for geography)
SELECT ST_Area(boundary::geography) 
FROM properties;
```

## 43.5 Operational Patterns and Performance

### 43.5.1 Coordinate Transformation Strategy

Transform once on insert, not per query:

```sql
-- Store both WGS84 (for display) and local projection (for calculations)
ALTER TABLE assets 
ADD COLUMN geom_4326 geometry(Point, 4326),
ADD COLUMN geom_local geometry(Point, 2263);  -- NY State Plane (feet)

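-- Trigger function keeps the local projection in sync with the WGS84 column
CREATE OR REPLACE FUNCTION transform_coords()
RETURNS trigger AS $$
BEGIN
    NEW.geom_local = ST_Transform(NEW.geom_4326, 2263);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Attach the function to the table (EXECUTE FUNCTION requires PostgreSQL 11+)
CREATE TRIGGER trg_assets_transform_coords
BEFORE INSERT OR UPDATE OF geom_4326 ON assets
FOR EACH ROW EXECUTE FUNCTION transform_coords();

-- Query using local projection for distance (feet), return WGS84 for mapping
-- Assumes facilities also carries a geom_local column in SRID 2263
SELECT
    a.asset_id,
    a.name,
    ST_Distance(a.geom_local, b.geom_local) as distance_feet,
    ST_AsGeoJSON(a.geom_4326) as geojson
FROM assets a
JOIN facilities b ON ST_DWithin(a.geom_local, b.geom_local, 5280);  -- 1 mile in feet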
```
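On PostgreSQL 12 or later, a stored generated column is a possible alternative to the trigger, since the database then keeps the projected copy in sync on its own. This is a sketch under that version assumption. The column name `geom_local_gen` and the index name are hypothetical.

```sql
-- Derived projected column, recomputed whenever geom_4326 changes
ALTER TABLE assets
ADD COLUMN geom_local_gen geometry(Point, 2263)
    GENERATED ALWAYS AS (ST_Transform(geom_4326, 2263)) STORED;

-- Index the derived column for distance queries in feet
CREATE INDEX idx_assets_geom_local_gen ON assets
USING GiST (geom_local_gen);
```

A generated column cannot be written directly, which prevents the two representations from drifting apart. The trigger remains the option when the derived value needs custom logic.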

### 43.5.2 Vacuum and Spatial Index Maintenance

Spatial indexes on frequently updated tables can bloat over time, so track their size and usage:

```sql
-- Track spatial index size and usage; an index growing faster than its table suggests bloat
SELECT
    schemaname,
    relname as table_name,
    indexrelname as index_name,
    pg_size_pretty(pg_relation_size(indexrelid)) as index_size,
    idx_scan
FROM pg_stat_user_indexes
WHERE indexrelname LIKE 'idx_%geom%'  -- relies on the index naming convention used in this chapter
ORDER BY pg_relation_size(indexrelid) DESC;

-- Rebuild without blocking writes (PostgreSQL 12+; slower than a plain REINDEX)
REINDEX INDEX CONCURRENTLY idx_buildings_geom;
```
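Filtering by index name only works when a naming convention is followed. A sketch that selects indexes by access method from the system catalogs instead is shown below. It also returns non-spatial GiST indexes (for example on range types), so treat the output as a candidate list.

```sql
-- List GiST, SP-GiST, and BRIN indexes with their size next to the table size
SELECT
    n.nspname                               AS schema_name,
    t.relname                               AS table_name,
    i.relname                               AS index_name,
    am.amname                               AS access_method,
    pg_size_pretty(pg_relation_size(i.oid)) AS index_size,
    pg_size_pretty(pg_relation_size(t.oid)) AS table_size,
    s.idx_scan
FROM pg_index AS x
JOIN pg_class AS i     ON i.oid = x.indexrelid
JOIN pg_class AS t     ON t.oid = x.indrelid
JOIN pg_namespace AS n ON n.oid = i.relnamespace
JOIN pg_am AS am       ON am.oid = i.relam
LEFT JOIN pg_stat_user_indexes AS s ON s.indexrelid = x.indexrelid
WHERE am.amname IN ('gist', 'spgist', 'brin')
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_relation_size(i.oid) DESC;
```

Recording this output on a schedule makes growth trends visible, and an index that grows while its table does not is a candidate for `REINDEX`.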

### 43.5.3 Partitioning for Spatial Data

For billion-row spatial datasets, partition by spatial quadkeys or geohashes:

```sql
-- Partition by quadkey prefix (spatial subdivision)
CREATE TABLE global_points (
    id bigint,
    geom geometry(Point, 4326),
    quadkey text
) PARTITION BY LIST (substring(quadkey from 1 for 2));

-- Create partitions for major regions
CREATE TABLE global_points_qk_00 PARTITION OF global_points 
FOR VALUES IN ('00', '01', '02', '03');
-- ... additional partitions

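-- Pruning requires a filter on the partition key expression itself
SELECT * FROM global_points
WHERE substring(quadkey from 1 for 2) = '01'  -- Matches the partition key; prunes other partitions
  AND quadkey BETWEEN '0123' AND '0124'
  AND geom && search_box;  -- search_box: placeholder for the query envelope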
```
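Because the partition key here is an expression, it is worth confirming with `EXPLAIN` that the planner prunes as intended. The two statements below are a sketch against the `global_points` table defined above. The plan output depends on which partitions exist.

```sql
-- Filter on the key expression: the plan should list only the partition holding prefix '01'
EXPLAIN (COSTS OFF)
SELECT count(*)
FROM global_points
WHERE substring(quadkey from 1 for 2) = '01';

-- Filter on the raw column only: this does not match the key expression,
-- so the plan is expected to scan every partition
EXPLAIN (COSTS OFF)
SELECT count(*)
FROM global_points
WHERE quadkey BETWEEN '0123' AND '0124';
```

Storing the prefix in its own column and partitioning on that column is an alternative that lets ordinary equality filters prune.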

## Chapter Summary

In this chapter, you learned:

1. **Type Architecture**: Use `geometry` with projected coordinate systems (SRID) for local/regional data (performance), `geography` only for global datasets requiring geodetic calculations. Always verify SRID compatibility before spatial operations.

2. **Spatial Indexing**: GiST indexes are the default choice for production spatial queries; they filter on bounding boxes before running expensive exact calculations. SP-GiST can help with large point sets; BRIN suits append-only spatial time-series stored in natural spatial order.

3. **Query Optimization**: Use the `<->` KNN operator in `ORDER BY` with a `LIMIT` clause for nearest-neighbor searches. Filter with `ST_DWithin` before computing distances. Cast to `geography` only when accuracy requires it, because an index on the geometry column cannot serve the cast unless a matching expression index exists.

4. **Data Quality**: Validate geometries with `ST_IsValid()` during ETL; repair with `ST_MakeValid()`. Simplify complex geometries for display using `ST_Simplify()` to reduce transfer overhead.

5. **Coordinate Strategy**: Store data in both WGS84 (4326) for interoperability and a local projected CRS (e.g., State Plane or UTM) for calculations; reserve Web Mercator (3857) for map display. Transform on insert via triggers, not per-query.

6. **Operational Management**: Monitor spatial index size and usage. Frequent updates to spatial data may call for autovacuum tuning or periodic `REINDEX CONCURRENTLY`. Consider spatial partitioning (quadkey/geohash) for billion-row datasets, and filter on the partition key so the planner can prune partitions.

---

**Next:** In Chapter 44, we will explore Eventing and Asynchronous Work—covering the `LISTEN`/`NOTIFY` pub/sub mechanism, the Outbox Pattern for reliable message publishing, job queue implementations using PostgreSQL, and architectural considerations for work delegation beyond the synchronous request/response cycle.