# Walkthrough Notebook

Notes:

data["geometry"][10] is a good counterexample: on the left-hand side the boundary needs to go OUT (outwards), but the current method does not do this.

data["geometry"][257] (Heyroyd) is weird and shows issues with OSM. There is something at the top that SHOULD be a residential area, but the line goes straight through it. In OSM it is just grey "nothing", with no tags. This is the base layer, so it has small borders around everything, which makes it hard to select 'everything that does not have a tag' as a polygon in its own right. Maybe SAM could work, but this is getting difficult now.

In [None]:
import urllib.parse

import polars as pl
import requests

from data_quality_utils.polygon_matching.polygon_matching import *
from data_quality_utils.polygon_matching.polygon_plotting import *

In [None]:
datasette_base_url = "https://datasette.planning.data.gov.uk/conservation-area.csv"

query = """
select * 
from entity
"""
encoded_query = urllib.parse.urlencode({"sql": query})

r = requests.get(f"{datasette_base_url}?{encoded_query}", auth=("user", "pass"))
r.raise_for_status()  # Fail early rather than saving an error page as CSV

filename = "datasette_data.csv"
with open(filename, "wb") as f_out:
    f_out.write(r.content)

data = pl.read_csv(filename)
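
To make the request above easier to debug, it can help to look at the URL that `urlencode` produces before sending it. The following is a minimal sketch that makes no network call; the `limit 5` clause is an assumption added only to keep the example query short, and `request_url` is a name introduced here for illustration.

```python
import urllib.parse

base_url = "https://datasette.planning.data.gov.uk/conservation-area.csv"

# Hypothetical shortened query; the notebook's own query has no limit clause
query = "select * from entity limit 5"

# urlencode percent-encodes reserved characters and turns spaces into "+"
encoded_query = urllib.parse.urlencode({"sql": query})
request_url = f"{base_url}?{encoded_query}"

print(request_url)
```

Pasting the printed URL into a browser is a quick way to check that the query itself is valid before involving `requests` or polars.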

In [None]:
# Customise the OSM tags to choose which kinds of area to match against
# The most prominent and commonly useful tags are listed below
# For best results, you may want to try them one at a time
input_tags = {
    "landuse": ["residential", "farmyard", "cemetery", "allotments"],
    "natural": ["wood", "grassland", "meadow"],
    # 'landuse': ['farmyard'],
    # 'landuse': ['farmland'],
    # 'landuse': ['residential'],
    # 'waterway': ['drain'],
    # "highway": ['primary', 'secondary', 'unclassified', 'track']
}

input_brdr_distance = 20  # distance that we can snap to polygons within
input_brdr_threshold = 1  # parameter to tune sensitivity - lower is more sensitive
input_brdr_strategy = (
    OpenbaarDomeinStrategy.SNAP_PREFER_VERTICES
)  # strategies for how to snap to polygons
# input_brdr_strategy = OpenbaarDomeinStrategy.SNAP_ONLY_VERTICES
# input_brdr_strategy = OpenbaarDomeinStrategy.SNAP_ALL_SIDE
initial_crs = "EPSG:4326"  # CRS of the datasette geometries
# CRS for brdr. Units are meters, but Web Mercator stretches distances away from
# the equator (by roughly 1.6x at UK latitudes), so these are not true ground meters
brdr_crs = "EPSG:3857"
target_osm_crs = "EPSG:4326"  # CRS for the OSM input polygon
data_index = 4  # Index of the conservation area for this test run
used_osm_indices = None  # Restrict to specific OSM polygon indices (mainly for testing)
# Buffer for the width of the line in meters: the radius of the circle generated at each point
line_buffer = 10
# Detection distance for polygons on either side of the actual boundary, in meters
polygon_detection_buffer = 1

areas_tuple = process_areas(
    original_wkt=data["geometry"][data_index],
    initial_crs=initial_crs,
    osm_tags=input_tags,
    brdr_distance=input_brdr_distance,
    brdr_threshold=input_brdr_threshold,
    brdr_strategy=input_brdr_strategy,
    brdr_crs=brdr_crs,
    osm_query_crs=target_osm_crs,
    used_osm_indices=used_osm_indices,
    line_buffer=line_buffer,
    polygon_detection_buffer=polygon_detection_buffer,
)

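if areas_tuple:
    original_border, new_border, difference_area, base_features = areas_tuple
else:
    # Without a result, the plot cell below would fail or reuse stale variables
    raise RuntimeError("process_areas returned no result for this data_index")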

A really good example is Sleapshyde. Note that if the distance is increased, say to 100, the long area at the bottom is highlighted when it probably shouldn't be. That may still be a useful highlight, as the line through the wood seems somewhat arbitrary, but setting the distance to, say, 20 gives nice clean highlights for the two main issues.
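
One thing to keep in mind when comparing distances such as 20 and 100: they are measured in the brdr CRS (EPSG:3857), and Web Mercator units only match true meters at the equator. As a rough sketch, ground distance is approximately the projected distance multiplied by cos(latitude). The latitude of 51.7°N below is an assumption (roughly southern England), and `ground_distance` is a hypothetical helper, not part of brdr or `data_quality_utils`.

```python
import math


def ground_distance(projected_distance: float, latitude_deg: float) -> float:
    """Approximate ground distance for a length measured in EPSG:3857 units.

    Uses the spherical Mercator scale factor 1 / cos(latitude), so this is
    an approximation rather than an exact geodesic measurement.
    """
    return projected_distance * math.cos(math.radians(latitude_deg))


# Assumed latitude; replace with the latitude of the area being checked
ASSUMED_LATITUDE = 51.7

for snap_distance in (20, 100, 200):
    approx_meters = ground_distance(snap_distance, ASSUMED_LATITUDE)
    print(f"{snap_distance} projected units ~ {approx_meters:.1f} m on the ground")
```

If true ground distances matter, a local projected CRS would likely be a better fit than Web Mercator, but that is a separate choice from the comparison made here.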

In [None]:
plot_area_with_sliders(
    original_border,
    new_border,
    difference_area,
    (255, 0, 0),  # Red
    base_features,
    (0, 0, 255),  # Blue
    0.3,  # Initial alpha value
    data["name"][data_index],
)

In [None]:
input_tags = {
    "landuse": ["residential", "farmyard", "cemetery", "allotments"],
    "natural": ["wood", "grassland", "meadow"],
}

input_brdr_distance = 20  # distance that we can snap to polygons within
input_brdr_threshold = 1  # parameter to tune sensitivity - lower is more sensitive
input_brdr_strategy = (
    OpenbaarDomeinStrategy.SNAP_PREFER_VERTICES
)  # strategies for how to snap to polygons
initial_crs = "EPSG:4326"  # CRS of the datasette geometries
# CRS for brdr. Units are meters, but Web Mercator stretches distances away from
# the equator (by roughly 1.6x at UK latitudes), so these are not true ground meters
brdr_crs = "EPSG:3857"
target_osm_crs = "EPSG:4326"  # CRS for the OSM input polygon
data_index = 257  # Index of the conservation area for this test run (Heyroyd)
used_osm_indices = None  # Restrict to specific OSM polygon indices (mainly for testing)
# Buffer for the width of the line in meters: the radius of the circle generated at each point
line_buffer = 10
# Detection distance for polygons on either side of the actual boundary, in meters
polygon_detection_buffer = 1

heyroyd_areas_tuple = process_areas(
    original_wkt=data["geometry"][data_index],
    initial_crs=initial_crs,
    osm_tags=input_tags,
    brdr_distance=input_brdr_distance,
    brdr_threshold=input_brdr_threshold,
    brdr_strategy=input_brdr_strategy,
    brdr_crs=brdr_crs,
    osm_query_crs=target_osm_crs,
    used_osm_indices=used_osm_indices,
    line_buffer=line_buffer,
    polygon_detection_buffer=polygon_detection_buffer,
)

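if heyroyd_areas_tuple:
    original_border, new_border, difference_area, base_features = heyroyd_areas_tuple
else:
    # Without a result, the plot cell below would reuse variables from the previous run
    raise RuntimeError("process_areas returned no result for this data_index")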

The example below shows a clear issue in OSM: at the top corner on the left-hand side, the boundary clearly passes through houses in the satellite imagery, but the area is generic with no tags. Although there is a residential area below that we would ideally snap to, there is no polygon on <i>both sides</i> of the line, so the model struggles to highlight the area: there is no overlap with an area of interest. The result also looks very wrong. Setting the distance to e.g. 200 snaps to part of the top of the residential area, but again, because the field is not a polygon, it is not highlighted as added area.

This could potentially be solved with SAM: generate polygons with SAM, check how much each one overlaps the existing polygons, and where there is little to no overlap (as we would expect for the grey 'nothing' area), create a new polygon with a tag such as 'generic polygon'.
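
The overlap check in that idea could look something like the sketch below. It is deliberately simplified: axis-aligned boxes stand in for real polygons so the example has no dependencies, no SAM model is involved, and `Box`, `untagged_candidates` and the 10% threshold are all hypothetical choices rather than anything from brdr or `data_quality_utils`.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned box used as a stand-in for a real polygon."""

    min_x: float
    min_y: float
    max_x: float
    max_y: float

    @property
    def area(self) -> float:
        return max(0.0, self.max_x - self.min_x) * max(0.0, self.max_y - self.min_y)


def intersection_area(a: Box, b: Box) -> float:
    """Area shared by two boxes (0.0 if they do not overlap)."""
    width = min(a.max_x, b.max_x) - max(a.min_x, b.min_x)
    height = min(a.max_y, b.max_y) - max(a.min_y, b.min_y)
    return max(0.0, width) * max(0.0, height)


def untagged_candidates(
    candidates: list[Box], tagged: list[Box], max_overlap: float = 0.1
) -> list[Box]:
    """Keep candidates whose largest overlap with any single tagged box
    covers at most max_overlap of the candidate's own area."""
    kept = []
    for candidate in candidates:
        if candidate.area == 0:
            continue  # Degenerate shape, nothing to tag
        best = max(
            (intersection_area(candidate, t) / candidate.area for t in tagged),
            default=0.0,
        )
        if best <= max_overlap:
            kept.append(candidate)
    return kept


# Toy data: one tagged area and three generated candidates
tagged_areas = [Box(0, 0, 10, 10)]
generated = [Box(1, 1, 5, 5), Box(20, 20, 30, 30), Box(9.5, 0, 19.5, 10)]
generic_polygons = untagged_candidates(generated, tagged_areas)
print(generic_polygons)
```

With real geometries the same comparison would be done on polygon intersections, and the overlap would probably need to be measured against the union of all tagged polygons rather than the single best match.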

In [None]:
plot_area_with_sliders(
    original_border,
    new_border,
    difference_area,
    (255, 0, 0),  # Red
    base_features,
    (0, 0, 255),  # Blue
    0.3,  # Initial alpha value
    data["name"][data_index],
)

In [None]:
input_tags = {
    "landuse": ["residential", "farmyard", "cemetery", "allotments"],
    "natural": ["wood", "grassland", "meadow"],
}

input_brdr_distance = 200  # distance that we can snap to polygons within
input_brdr_threshold = 10  # parameter to tune sensitivity - lower is more sensitive
input_brdr_strategy = (
    OpenbaarDomeinStrategy.SNAP_PREFER_VERTICES
)  # strategies for how to snap to polygons
initial_crs = "EPSG:4326"  # CRS of the datasette geometries
# CRS for brdr. Units are meters, but Web Mercator stretches distances away from
# the equator (by roughly 1.6x at UK latitudes), so these are not true ground meters
brdr_crs = "EPSG:3857"
target_osm_crs = "EPSG:4326"  # CRS for the OSM input polygon
data_index = 0  # Index of the conservation area for this test run (Napsbury)
used_osm_indices = None  # Restrict to specific OSM polygon indices (mainly for testing)
# Buffer for the width of the line in meters: the radius of the circle generated at each point
line_buffer = 10
# Detection distance for polygons on either side of the actual boundary, in meters
polygon_detection_buffer = 1

napsbury_areas_tuple = process_areas(
    original_wkt=data["geometry"][data_index],
    initial_crs=initial_crs,
    osm_tags=input_tags,
    brdr_distance=input_brdr_distance,
    brdr_threshold=input_brdr_threshold,
    brdr_strategy=input_brdr_strategy,
    brdr_crs=brdr_crs,
    osm_query_crs=target_osm_crs,
    used_osm_indices=used_osm_indices,
    line_buffer=line_buffer,
    polygon_detection_buffer=polygon_detection_buffer,
)

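if napsbury_areas_tuple:
    original_border, new_border, difference_area, base_features = napsbury_areas_tuple
else:
    # Without a result, the plot cell below would reuse variables from the previous run
    raise RuntimeError("process_areas returned no result for this data_index")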

Napsbury is a weird one. Setting the distance relatively high (200 m) highlights some real issues. According to the document_url, there should be no conservation area lines through the housing areas. It is odd that brdr has chosen to align on the inside in this case, but it still highlights that something is wrong: when brdr is confused, even if it doesn't get it right, it is a useful signal that things are not cut and dried. Setting the distance lower, say to 20, results in worse detection because of the scale of the error in this conservation area.

In [None]:
plot_area_with_sliders(
    original_border,
    new_border,
    difference_area,
    (255, 0, 0),  # Red
    base_features,
    (0, 0, 255),  # Blue
    0.3,  # Initial alpha value
    data["name"][data_index],
)
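
The three runs in this notebook repeat the same settings block and differ only in `data_index`, the brdr distance and the brdr threshold. One possible way to keep the shared values in a single place is sketched below. `RunSettings` and `changed_fields` are hypothetical helpers, not part of `data_quality_utils`; the field names are chosen to match the keyword arguments passed to `process_areas` in the cells above.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class RunSettings:
    """Hypothetical container for the settings shared by every run."""

    brdr_distance: float = 20
    brdr_threshold: float = 1
    line_buffer: float = 10
    polygon_detection_buffer: float = 1
    initial_crs: str = "EPSG:4326"
    brdr_crs: str = "EPSG:3857"
    osm_query_crs: str = "EPSG:4326"


def changed_fields(base: RunSettings, other: RunSettings) -> dict:
    """Return only the settings where `other` differs from `base`."""
    base_values, other_values = asdict(base), asdict(other)
    return {k: other_values[k] for k in base_values if base_values[k] != other_values[k]}


default_settings = RunSettings()
# The Napsbury run only overrides the distance and threshold
napsbury_settings = replace(default_settings, brdr_distance=200, brdr_threshold=10)

print(changed_fields(default_settings, napsbury_settings))

# In the notebook, the settings could then be unpacked into the existing call:
# process_areas(
#     original_wkt=data["geometry"][0],
#     osm_tags=input_tags,
#     brdr_strategy=input_brdr_strategy,
#     used_osm_indices=None,
#     **asdict(napsbury_settings),
# )
```

This keeps each run's cell down to the values that actually change, which makes differences like 20 versus 200 easier to spot.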