# Polygon Matching Walkthrough

In this notebook we show how to use our `PolygonMatcher` class to find areas where existing conservation area boundaries do not match polygons obtained from other sources, such as OpenStreetMap (OSM) or OS Zoomstack.

In [None]:
import urllib.parse

import geopandas as gpd
import osmnx as ox
import polars as pl
import requests
from brdr.enums import OpenbaarDomeinStrategy
from shapely.wkt import loads

from data_quality_utils.polygon_matching.polygon_matching import PolygonMatcher
from data_quality_utils.polygon_matching.polygon_plotting import (
    get_plotting_polygons,
    plot_area_with_sliders,
)

In [None]:
datasette_base_url = "https://datasette.planning.data.gov.uk/conservation-area.csv"

query = """
select * 
from entity
"""
encoded_query = urllib.parse.urlencode({"sql": query})

r = requests.get(f"{datasette_base_url}?{encoded_query}", auth=("user", "pass"))
# Fail early on an HTTP error instead of writing an error page to disk.
r.raise_for_status()

filename = "datasette_data.csv"
with open(filename, "wb") as f_out:
    f_out.write(r.content)

data = pl.read_csv(filename)
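
The request URL above is simply the CSV endpoint with a URL-encoded `sql` parameter attached. As a minimal sketch of that step, the encoding can be checked without touching the network. The `build_datasette_url` helper and the `limit 5` query are illustrative additions, not part of the notebook's code.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

DATASETTE_BASE_URL = "https://datasette.planning.data.gov.uk/conservation-area.csv"


def build_datasette_url(sql: str, base_url: str = DATASETTE_BASE_URL) -> str:
    """Return the base URL with `sql` attached as an encoded query string.

    Hypothetical helper, shown only to illustrate the encoding step.
    """
    return f"{base_url}?{urlencode({'sql': sql})}"


# Illustrative smaller query, handy while experimenting.
sample_sql = "select * from entity limit 5"
sample_url = build_datasette_url(sample_sql)

# Decoding the query string recovers the original SQL unchanged.
decoded_sql = parse_qs(urlsplit(sample_url).query)["sql"][0]
print(sample_url)
print(decoded_sql == sample_sql)
```

Round-tripping the query like this is a cheap way to confirm that spaces, `*` and newlines in a multi-line SQL string survive the encoding.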

## Polygon Matcher class

We initialise the class with parameters that control how sensitive and how far-reaching the matcher is. We also specify the coordinate reference systems (CRS) we want to work in.

In [None]:
polygon_snap_distance = 20
input_brdr_threshold = 1
snapping_strategy = OpenbaarDomeinStrategy.SNAP_PREFER_VERTICES
base_crs = "EPSG:4326"
mercator_crs = "EPSG:3857"
used_osm_indices = None
line_buffer = 10
polygon_detection_buffer = 1

polygon_matcher = PolygonMatcher(
    base_crs=base_crs,
    polygon_snap_distance=polygon_snap_distance,
    brdr_threshold=input_brdr_threshold,
    snapping_strategy=snapping_strategy,
    mercator_crs=mercator_crs,
    polygon_detection_buffer=polygon_detection_buffer,
    line_buffer=line_buffer,
)
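
The distance-like parameters (`polygon_snap_distance`, `line_buffer`, `polygon_detection_buffer`) only make sense in a projected CRS, which is presumably why a Mercator CRS is passed alongside `EPSG:4326`. That these values are interpreted in EPSG:3857 units is an assumption here; the notebook does not state it. The sketch below uses the standard spherical Web Mercator formulas to show how degrees turn into projected metres. The helper name and the sample coordinates are hypothetical, and in practice `GeoDataFrame.to_crs` would do this conversion.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by EPSG:3857


def lonlat_to_web_mercator(lon_deg: float, lat_deg: float) -> tuple[float, float]:
    """Project WGS84 longitude/latitude (degrees) to EPSG:3857 x/y (metres)."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# Two points 0.001 degrees of longitude apart at an illustrative latitude.
x0, y0 = lonlat_to_web_mercator(-0.3, 51.75)
x1, y1 = lonlat_to_web_mercator(-0.299, 51.75)
print(f"0.001 deg of longitude = {x1 - x0:.1f} projected metres")
```

Note that Web Mercator stretches distances by roughly `1 / cos(latitude)` away from the equator, so a buffer of 10 projected units covers noticeably less than 10 m on the ground at UK latitudes. This is worth keeping in mind when tuning the thresholds.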

## Basic Usage

The cells below show the function calls needed to obtain a new boundary. The new boundary is stored in `aligned_df`, and the areas where the new boundary disagrees with the old one are stored in `diff_df`.

In [None]:
data_index = 0
original_wkt = data["geometry"][data_index]
original_geom = loads(original_wkt)
original_df = gpd.GeoDataFrame([1], geometry=[original_geom], crs=base_crs)

input_tags = {"landuse": ["residential"]}

In [None]:
base_features_df = polygon_matcher.download_osm_polygons(original_df, input_tags)

In [None]:
aligned_df, diff_df = polygon_matcher.get_new_aligned_areas(
    original_df, base_features_df
)
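
One natural reading of "areas where the new boundary disagrees with the old" is the symmetric difference of the two polygons: everything covered by exactly one of them. Whether `get_new_aligned_areas` builds `diff_df` in exactly this way is an assumption. The dependency-free toy below uses axis-aligned rectangles to show the idea; `Rect` and both functions are hypothetical, and real boundaries are arbitrary polygons that would be handled by shapely or geopandas.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle standing in for a boundary polygon."""

    min_x: float
    min_y: float
    max_x: float
    max_y: float

    @property
    def area(self) -> float:
        return max(0.0, self.max_x - self.min_x) * max(0.0, self.max_y - self.min_y)


def intersection_area(a: Rect, b: Rect) -> float:
    """Area shared by both rectangles (0 if they do not overlap)."""
    width = min(a.max_x, b.max_x) - max(a.min_x, b.min_x)
    height = min(a.max_y, b.max_y) - max(a.min_y, b.min_y)
    return max(0.0, width) * max(0.0, height)


def disagreement_area(a: Rect, b: Rect) -> float:
    """Area of the symmetric difference: covered by exactly one rectangle."""
    return a.area + b.area - 2 * intersection_area(a, b)


original = Rect(0.0, 0.0, 10.0, 10.0)
aligned = Rect(0.0, 0.0, 12.0, 10.0)  # east edge moved 2 units outward
print(disagreement_area(original, aligned))
```

A disagreement area of zero means the two boundaries coincide; larger values flag boundaries that deserve a closer look.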

## Case Study

To demonstrate practical usage, we picked the Sleapshyde conservation area. Here we expand the set of features to consider and compute the areas where the boundaries disagree. We can then plot the results and inspect the areas highlighted as potentially incorrect.

In [None]:
data_index = 4
original_wkt = data["geometry"][data_index]
original_geom = loads(original_wkt)
original_df = gpd.GeoDataFrame([1], geometry=[original_geom], crs=base_crs)

input_tags = {
    # "landuse": ["residential", "farmyard", "cemetery", "allotments"],
    "landuse": ["farmyard"],
    # "natural": ["wood", "grassland", "meadow"],
}
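
The shape of `input_tags` follows the tag dictionaries used by OSMnx: each key is an OSM tag, each value is `True`, a single string, or a list of accepted values, and a feature is returned if it matches any of the entries. That `download_osm_polygons` forwards the dictionary to OSMnx unchanged is an assumption. The hypothetical `matches_tags` function below mimics that selection rule locally, which can help when deciding which of the commented-out entries to enable.

```python
def matches_tags(feature_tags: dict, wanted: dict) -> bool:
    """Return True if a feature's tags satisfy at least one wanted entry.

    Illustrative re-implementation of the selection rule, not OSMnx code.
    """
    for key, allowed in wanted.items():
        if key not in feature_tags:
            continue
        if allowed is True:  # any value for this key is accepted
            return True
        if isinstance(allowed, str):  # a single accepted value
            allowed = [allowed]
        if feature_tags[key] in allowed:
            return True
    return False


wanted_tags = {
    "landuse": ["residential", "farmyard"],
    "natural": ["wood", "grassland", "meadow"],
}

print(matches_tags({"landuse": "farmyard"}, wanted_tags))
print(matches_tags({"natural": "water"}, wanted_tags))
print(matches_tags({"landuse": "retail", "natural": "wood"}, wanted_tags))
```

Because the entries are combined with "or", adding more keys or values only ever widens the set of base features that the matcher can snap to.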

In [None]:
base_features_df = polygon_matcher.download_osm_polygons(original_df, input_tags)

In [None]:
aligned_df, diff_df = polygon_matcher.get_new_aligned_areas(
    original_df, base_features_df
)

In [None]:
results_tuple = get_plotting_polygons(
    original_df, base_features_df, aligned_df, diff_df, base_crs
)

original_border, base_features, new_border, difference_area = results_tuple

In [None]:
plot_area_with_sliders(
    original_border,
    base_features,
    new_border,
    difference_area,
    (255, 0, 0),
    (0, 0, 255),
    0.3,
    data["name"][data_index],
)