# Spatial clustering

## Clustering Methods in Geographical Analysis

### DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
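DBSCAN is a density-based clustering algorithm that groups points that are closely packed and marks points in sparse regions as noise. Clusters are defined as areas of high density separated by areas of low density, controlled by two parameters: a neighborhood radius (eps) and the minimum number of points needed to form a dense region (minPoints). This makes DBSCAN particularly useful for spatial data, since it can discover clusters of arbitrary shape and identify outliers.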

Key characteristics:
- Does not require specifying the number of clusters in advance
- Can find arbitrarily shaped clusters
- Robust to outliers
- Works well with spatial data where proximity is important
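
The density rule described above can be illustrated with a minimal sketch in plain Python. This is not how PostGIS or any library implements DBSCAN; the function name `dbscan`, the brute-force neighbor search, and the choice to count a point as its own neighbor are assumptions made for illustration. The coordinates are the seven sample points used in the PostGIS example later in this document.

```python
import math


def dbscan(points, eps, min_points):
    """Return one label per point: 0, 1, ... for clusters, None for noise."""

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    visited = set()
    cluster_id = 0
    for i in range(len(points)):
        if i in visited:
            continue
        visited.add(i)
        seeds = neighbors(i)
        if len(seeds) < min_points:
            continue  # not a core point; stays noise unless a cluster reaches it
        labels[i] = cluster_id
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] is None:
                labels[j] = cluster_id  # core or border point joins the cluster
            if j in visited:
                continue
            visited.add(j)
            reachable = neighbors(j)
            if len(reachable) >= min_points:
                queue.extend(reachable)  # j is a core point, so keep expanding
        cluster_id += 1
    return labels


sample_points = [
    (10.0, 59.0), (10.01, 59.01), (10.02, 59.02),
    (10.1, 59.1), (10.12, 59.12),
    (10.5, 59.5), (10.51, 59.51),
]
labels = dbscan(sample_points, eps=0.05, min_points=2)
print(labels)  # [0, 0, 0, 1, 1, 2, 2]
```

The brute-force neighbor search is quadratic in the number of points; real implementations typically rely on a spatial index for that step.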

- [DBSCAN on Wikipedia](https://en.wikipedia.org/wiki/DBSCAN)
- [PostGIS ST_ClusterDBSCAN Documentation](https://postgis.net/docs/ST_ClusterDBSCAN.html)

### K-means Clustering
K-means clustering is a partitioning method that divides the space into Voronoi cells. In spatial analysis, it groups geographical points into k clusters where each point belongs to the cluster with the nearest centroid.

Key characteristics:
- Requires specifying the number of clusters (k) beforehand
- Creates compact, spherical clusters
- Minimizes within-cluster variance
- Computationally efficient for large datasets
- Available in PostGIS via ST_ClusterKMeans function

The algorithm iteratively:
1. Places k centroids randomly in the space
2. Assigns each point to the nearest centroid
3. Recalculates centroids based on the current cluster assignments
4. Repeats until convergence (minimal centroid movement)
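
The four steps above can be sketched in plain Python. This is a minimal illustration, not the implementation behind ST_ClusterKMeans; the function name `kmeans`, the `tol` and `seed` parameters, and the choice to pick the initial centroids from the input points (rather than anywhere in the space) are assumptions.

```python
import math
import random


def kmeans(points, k, max_iter=100, tol=1e-9, seed=0):
    """Return (labels, centroids) after iterating assignment and update steps."""
    rng = random.Random(seed)
    # Step 1: choose k initial centroids (here: k distinct input points).
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Step 2: assign each point to the nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Step 3: recompute each centroid as the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        # Step 4: stop once no centroid moves more than tol.
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return labels, centroids


sample_points = [
    (10.0, 59.0), (10.01, 59.01), (10.02, 59.02),
    (10.1, 59.1), (10.12, 59.12),
    (10.5, 59.5), (10.51, 59.51),
]
labels, centroids = kmeans(sample_points, k=3)
print(labels)
print(centroids)
```

Because the starting centroids are random, a different `seed` can produce a different partition, which is the sensitivity to initial placement noted in the limitations.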

Limitations in spatial contexts include:
- Sensitivity to initial centroid placement
- Difficulty handling clusters of different sizes and densities
- Assumption of isotropic cluster shapes

- [PostGIS ST_ClusterKMeans Documentation](https://postgis.net/docs/ST_ClusterKMeans.html)
- [QGIS Processing Algorithms: K-means clustering](https://docs.qgis.org/latest/en/docs/user_manual/processing_algs/qgis/vectoranalysis.html#k-means-clustering)



### K-Nearest Neighbors (KNN)
K-nearest neighbors (KNN) is not a clustering algorithm in the strict sense. It is a supervised method that assigns each point the majority class among its k nearest labeled neighbors. It is listed alongside clustering methods because it relies on the same notion of proximity, which makes it useful for classification, regionalization, and pattern detection in geographic data.

Key characteristics:
- Simple implementation
- Distance-based classification
- Adaptable to different distance metrics (Euclidean, Manhattan, etc.)
- Useful for spatial interpolation
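
The majority vote among the k nearest neighbors can be sketched as follows. The function name `knn_classify`, the Euclidean metric, and the "west"/"east" labels are hypothetical and only serve the illustration.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Return the most common label among the k training points nearest to query.

    train is a list of ((x, y), label) pairs.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labeled points: a western group and an eastern group.
train = [
    ((10.0, 59.0), "west"),
    ((10.01, 59.01), "west"),
    ((10.02, 59.02), "west"),
    ((10.5, 59.5), "east"),
    ((10.51, 59.51), "east"),
]
print(knn_classify(train, (10.03, 59.03)))  # west
print(knn_classify(train, (10.49, 59.49)))  # east
```

Swapping `math.dist` for another distance function is all it takes to use a different metric, such as Manhattan distance.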


### Other Common Spatial Clustering Methods
- Hierarchical clustering: Builds nested clusters either bottom-up (agglomerative) or top-down (divisive)
- OPTICS: An extension of DBSCAN that orders points by reachability distance so that clusters of varying density can be extracted
- Mean-shift: A centroid-based algorithm that repeatedly moves each candidate centroid to the mean of the points within a given radius until it settles on a density peak
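
As an illustration of the mean-shift update, here is a minimal sketch with a flat (uniform) kernel. The function name `mean_shift`, the `bandwidth` parameter, and the rule of merging modes that end up closer than one bandwidth are assumptions; library implementations differ in kernel choice and in how they merge modes.

```python
import math


def mean_shift(points, bandwidth, max_iter=100, tol=1e-6):
    """Return (labels, centers) found by shifting each point to a local mean."""
    modes = []
    for start in points:
        center = start
        for _ in range(max_iter):
            # All points inside the window around the current candidate centroid.
            window = [p for p in points if math.dist(p, center) <= bandwidth]
            new_center = tuple(sum(axis) / len(window) for axis in zip(*window))
            moved = math.dist(center, new_center)
            center = new_center
            if moved <= tol:
                break
        modes.append(center)

    # Merge modes that landed within one bandwidth of each other.
    centers, labels = [], []
    for mode in modes:
        for idx, existing in enumerate(centers):
            if math.dist(mode, existing) <= bandwidth:
                labels.append(idx)
                break
        else:
            centers.append(mode)
            labels.append(len(centers) - 1)
    return labels, centers


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centers = mean_shift(points, bandwidth=3.0)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Unlike K-means, the number of clusters is not given in advance; it follows from the bandwidth.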

## Further Reading
- [Turf.js Spatial Analysis Library](https://turfjs.org/) - A JavaScript library for spatial analysis and clustering in web applications
- [H3 Hexagonal Hierarchical Geospatial Indexing System](https://h3geo.org/) - Hexagonal grid system that can be used for efficient spatial clustering
- [PySAL (Python Spatial Analysis Library)](https://pysal.org/) - Comprehensive library for spatial analysis including various clustering methods
- [GeoDa Software for Spatial Data Analysis](https://geodacenter.github.io/) - Popular tool for exploratory spatial data analysis
- [Scikit-learn Clustering Documentation](https://scikit-learn.org/stable/modules/clustering.html)
- [ArcGIS Spatial Clustering Methods](https://pro.arcgis.com/en/pro-app/latest/tool-reference/spatial-statistics/how-spatial-clustering-works.htm)


## Example of DBSCAN clustering in PostGIS

Reference: [PostGIS ST_ClusterDBSCAN Documentation](https://postgis.net/docs/ST_ClusterDBSCAN.html)

```sql
-- Create a sample table with point data (SRID 4326: longitude/latitude in degrees)
CREATE TABLE sample_points (
    id SERIAL PRIMARY KEY,
    name VARCHAR(50),
    geom GEOMETRY(POINT, 4326)
);

-- Insert some sample data
INSERT INTO sample_points (name, geom)
VALUES
    ('Point 1', ST_SetSRID(ST_MakePoint(10.0, 59.0), 4326)),
    ('Point 2', ST_SetSRID(ST_MakePoint(10.01, 59.01), 4326)),
    ('Point 3', ST_SetSRID(ST_MakePoint(10.02, 59.02), 4326)),
    ('Point 4', ST_SetSRID(ST_MakePoint(10.1, 59.1), 4326)),
    ('Point 5', ST_SetSRID(ST_MakePoint(10.12, 59.12), 4326)),
    ('Point 6', ST_SetSRID(ST_MakePoint(10.5, 59.5), 4326)),
    ('Point 7', ST_SetSRID(ST_MakePoint(10.51, 59.51), 4326));

-- Run DBSCAN clustering as a window function over all rows
-- Parameters:
-- 1. The geometry column
-- 2. eps: maximum distance between neighboring points in the same cluster,
--    in the units of the SRID (degrees for 4326)
-- 3. minpoints: minimum number of points needed to form a cluster
-- Points that do not belong to any cluster get a NULL cluster_id (noise)
SELECT
    id,
    name,
    ST_ClusterDBSCAN(geom, eps := 0.05, minpoints := 2) OVER() AS cluster_id,
    geom
FROM
    sample_points;

-- Summarize each cluster: point count, centroid, and convex hull
SELECT
    cluster_id,
    COUNT(*) AS point_count,
    ST_Centroid(ST_Collect(geom)) AS cluster_centroid,
    ST_ConvexHull(ST_Collect(geom)) AS cluster_hull
FROM (
    SELECT
        id,
        ST_ClusterDBSCAN(geom, eps := 0.05, minpoints := 2) OVER() AS cluster_id,
        geom
    FROM
        sample_points
) clusters
WHERE
    cluster_id IS NOT NULL  -- exclude noise points
GROUP BY
    cluster_id;
```
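
Because the geometries use SRID 4326, the eps value of 0.05 is measured in degrees, and a degree of longitude covers less ground than a degree of latitude away from the equator. The following sketch estimates what 0.05 degrees corresponds to at the latitude of the sample points. The helper name `degrees_to_km` and the spherical Earth radius are assumptions, and the result is only an approximation.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; a spherical approximation


def degrees_to_km(eps_degrees, latitude_degrees):
    """Return the approximate (north-south, east-west) extent of eps_degrees in km."""
    km_per_degree = math.pi / 180 * EARTH_RADIUS_KM
    north_south = eps_degrees * km_per_degree
    # Meridians converge toward the poles, so east-west distances shrink by cos(latitude).
    east_west = north_south * math.cos(math.radians(latitude_degrees))
    return north_south, east_west


ns, ew = degrees_to_km(0.05, 59.0)
print(f"{ns:.1f} km north-south, {ew:.1f} km east-west")
# 5.6 km north-south, 2.9 km east-west
```

If a uniform search radius in meters is needed, reprojecting the geometries with ST_Transform to a projected coordinate system before clustering is a common alternative.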