# Tree identification challenge

This notebook shows a solution to the challenge of verifying whether a coordinate point in tree protection order data corresponds to a real tree.

### 1. Satellite images

The first part of the challenge is getting satellite images of the area of interest. This notebook shows how to download high resolution satellite images using 3 different sources: a WMS service, Google map tiles service and Google static maps service.

### 2. Tree detection

Once the satellite image is downloaded we use a model to identify all trees in the image. The model will create a box around every tree and return pixel values bounding the box. This notebook shows what that looks like for different images.

### 3. Flagging issues

We calculate the distances between every tree box and the original point, and sort them. If the point marks a real tree, the closest box should be within a pre-defined distance of the point. This distance is configurable; 10 metres seems to strike a good balance between tolerating noise from factors such as shadows and dense tree cover and still catching points that do not mark a tree. If the closest box is further away, we flag the data point for review.

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
import os

import polars as pl
import requests
from dotenv import load_dotenv
from IPython.display import Image, display
from matplotlib import pyplot as plt

from data_quality_utils.map_extractor import (
    GoogleMapTilesExtractor,
    GoogleStaticMapsExtractor,
    WMSExtractor,
)
from data_quality_utils.tree_finder import TreeFinder, show_image, show_stats

In [None]:
# load environment variables
load_dotenv()
api_key = os.environ.get("GOOGLE_MAPS_API_KEY")
wms_url = os.environ.get("WMS_URL")

In [None]:
# define global variables
IMG_SIZE = (500, 500)
SCALE = 2
ZOOM = 18
OFFSET = 0.0005

### Get tree data from `planning.data.gov.uk`

In [None]:
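planning_base_csv_url = "https://files.planning.data.gov.uk/"
dataset = "tree"
r = requests.get(f"{planning_base_csv_url}dataset/{dataset}.csv")
r.raise_for_status()  # stop early if the download failed

filename = "data/trees.csv"
os.makedirs(os.path.dirname(filename), exist_ok=True)  # make sure data/ exists
with open(filename, "wb") as f_out:
    f_out.write(r.content)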

In [None]:
data = pl.read_csv(filename).select(["name", "point", "address-text"])
data

In [None]:
# a single example to demonstrate functionality
example_tree = data[765]

# 1. Satellite images

The section below shows 3 ways to download satellite images and describes their pros and cons.

Each `Extractor` requires specific parameters described in the sections below. When downloading the map image, these will be saved as `json` metadata under the same name as the image to make tree detection easier.
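As a rough illustration of the metadata idea, the sketch below writes the download parameters to a `json` file next to the image and reads them back. The helper names (`metadata_path`, `save_metadata`, `load_metadata`) and the stored keys are assumptions made for this example; the actual `Extractor` classes may store different fields.

```python
import json
from pathlib import Path


def metadata_path(image_path):
    """Return the JSON path sharing the image's name (hypothetical convention)."""
    return Path(image_path).with_suffix(".json")


def save_metadata(image_path, **params):
    """Write the download parameters next to the image."""
    path = metadata_path(image_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(params, indent=2))
    return path


def load_metadata(image_path):
    """Read back the parameters that were saved for an image."""
    return json.loads(metadata_path(image_path).read_text())
```

With this layout, `data/wms_images/example_tree.png` would be paired with `data/wms_images/example_tree.json`, so a later step only needs the image path to recover the coordinates used for the download.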

## Download images from WMS

This cell shows how to use the `WMSExtractor` to download images.

WMS stands for Web Map Service. A WMS server exposes georeferenced map imagery, including satellite and aerial imagery, from various providers (e.g. OpenStreetMap, Airbus, ...) and is accessible via a link to the server. These servers can be public or private and the specific service you use will determine what types of imagery you can access. Users can send requests to a WMS server via a GIS (geographic information system) client - in this case the `owslib` library in Python.

The map layer used in this project uses the WGS84 / EPSG4326 coordinate reference system. EPSG4326 is a geographic coordinate system that represents locations on Earth using latitude and longitude in degrees, based on the WGS84 ellipsoidal model of the planet. It's the standard used by GPS and most web maps for referencing geographic positions.

The WMS link for this project was provided by Paul but any valid link will work. For a different link you might need to update the map layer based on what's available. In this project we use "APGB_Latest_UK_125mm". Make sure to create the `.env` file (based on the `env.example` template) with a WMS link so the extractor can be loaded properly while keeping the link secret.

#### Params

The extractor needs:

- `lat`: latitude coordinate
- `lon`: longitude coordinate
- `offset`: how far away in each direction to go from the point
- `img_size`: image size, default is [500, 500]
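
To give a feel for what `offset` means on the ground, here is a minimal sketch. It assumes the offset is given in degrees and that the requested box simply extends `offset` in each direction from the point; both are assumptions about how the `WMSExtractor` works, and the function names are made up for this example.

```python
import math


def bbox_from_offset(lat, lon, offset):
    """Bounding box (min_lon, min_lat, max_lon, max_lat) in degrees."""
    return (lon - offset, lat - offset, lon + offset, lat + offset)


def bbox_size_metres(lat, offset):
    """Approximate ground width and height of the box in metres."""
    metres_per_degree = 111_320  # rough length of one degree of latitude
    height = 2 * offset * metres_per_degree
    # a degree of longitude shrinks with the cosine of the latitude
    width = height * math.cos(math.radians(lat))
    return width, height
```

Under these assumptions, an offset of 0.0005 around latitude 51.5 covers roughly 70 m east to west and 110 m north to south, so the image is not square on the ground even though it is square in pixels.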

#### Pros

- good quality images
- no usage limit

#### Cons

- image quality sometimes distorted due to shadows and other issues

In [None]:
point = example_tree.get_column("point")[0].split("(")[1].split(")")[0]
lon, lat = map(float, point.split(" "))
filename = "data/wms_images/example_tree.png"

map_extractor = WMSExtractor(wms_url)
map_extractor.download_image(lat, lon, offset=OFFSET, filename=filename)
display(Image(filename, width=500))

## Download images from Google map tiles

This cell shows how to use the `GoogleMapTilesExtractor` to download images.

Google Map Tiles API provides access to map imagery by exposing pre-rendered tiles that can be retrieved using standard web requests. Each tile represents a fixed-size image (typically 256×256 pixels) corresponding to a specific zoom level and map coordinate (X, Y). Map tiles can be retrieved and assembled to create composite images.

These tiles follow a global tiling scheme based on the Web Mercator projection (EPSG3857). It represents the Earth as a flat, square map using meters as units, distorting areas near the poles but preserving shape and direction for ease of visualization and tile-based rendering.
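
The sketch below uses the standard Web Mercator tiling formula to find which tile contains a point and where the point sits inside that tile. It is only meant to illustrate why the point of interest is usually not centred in a tile; the function name is made up for this example and the `GoogleMapTilesExtractor` may do this differently.

```python
import math


def latlon_to_tile(lat, lon, zoom):
    """Return (tile_x, tile_y, frac_x, frac_y) for a point at a zoom level.

    frac_x and frac_y give the position inside the tile, from 0 (left/top
    edge) to 1 (right/bottom edge).
    """
    n = 2**zoom  # number of tiles along each axis at this zoom level
    x = (lon + 180.0) / 360.0 * n
    y = (1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n
    tile_x, tile_y = math.floor(x), math.floor(y)
    return tile_x, tile_y, x - tile_x, y - tile_y
```

A point only lands in the middle of its tile when both fractions happen to be 0.5. In every other case the neighbouring tiles have to be fetched and stitched, or the image cropped, to centre the point.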

#### Usage limits

You need a Google Maps API key to use the `GoogleMapTilesExtractor`. You can find the key in the Google Cloud console under APIs and services/Credentials. Detailed instructions can be found [here](https://developers.google.com/maps/documentation/tile/get-api-key-v2?_gl=1*n3ilib*_up*MQ..*_ga*MTM5OTgxMDYxNC4xNzQzNjg2OTg5*_ga_NRWSTWS78N*MTc0MzY4Njk4OC4xLjEuMTc0MzY4Njk5Ny4wLjAuMA..&hl=en&setupProd=prerequisites). Make sure to create the `.env` file (based on the `env.example` template) with a valid Google Maps API key.

The free usage limit for this API is 100,000 requests per month. After that, your GCP project will be charged.

Do not run this on the whole dataset.
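
One simple way to keep the request count down is to reuse images that are already on disk. The helper below is a hypothetical sketch, not part of `data_quality_utils`: `download` stands for any callable that takes a file path and saves the image there.

```python
from pathlib import Path


def download_if_missing(path, download):
    """Call download(path) only when the file does not exist yet.

    Returns True if a download was triggered, False if the cached file
    was reused.
    """
    path = Path(path)
    if path.exists():
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    download(str(path))
    return True
```

For example, `download_if_missing(filename, lambda p: map_extractor.download_image(lat, lon, zoom=ZOOM, filename=p))` would only call the API the first time the cell is run.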

#### Params

The extractor needs:

- `lat`: latitude coordinate
- `lon`: longitude coordinate
- `zoom`: zoom value, higher values are closer up

#### Pros

- relatively high free usage limit

#### Cons

- image quality worse than WMS
- paid service
- coordinates must be converted to the EPSG3857 projection
- point of interest not centered by default, requires more processing

In [None]:
point = example_tree.get_column("point")[0].split("(")[1].split(")")[0]
lon, lat = map(float, point.split(" "))
filename = "data/google_tiles_images/example_tree.png"

map_extractor = GoogleMapTilesExtractor(api_key)
map_extractor.download_image(lat, lon, zoom=ZOOM, filename=filename)
display(Image(filename, width=500))

## Download images from Google static maps

This cell shows how to use the `GoogleStaticMapsExtractor` to download images.

Google Static Maps API provides an interface for retrieving static map images via HTTP requests. Unlike the tile-based system, this API returns a single map image of customizable size and scale, centered around a specific latitude and longitude. The imagery can include different map types such as satellite, roadmap, terrain, or hybrid.

The Static Maps API takes locations in the WGS84 geographic coordinate system (EPSG4326) but renders images in EPSG3857, so some conversion is needed when calculating distances between points on the image.
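
To get a sense of the scale of these images, the sketch below estimates how many metres one image pixel covers. It uses the standard Web Mercator ground resolution formula with 256-pixel tiles; the function name is made up for this example and this is not necessarily how the `TreeFinder` does its conversion.

```python
import math

# circumference of the sphere used by Web Mercator (radius 6378137 m)
EARTH_CIRCUMFERENCE_M = 2 * math.pi * 6378137


def metres_per_pixel(lat, zoom, scale=1):
    """Approximate ground size of one image pixel at a given latitude."""
    # width of the whole world in pixels at this zoom and scale
    world_px = 256 * 2**zoom * scale
    return EARTH_CIRCUMFERENCE_M * math.cos(math.radians(lat)) / world_px
```

With `zoom=18` and `scale=2` at a latitude of 51.5 this comes to roughly 0.19 m per pixel, so a 10 metre threshold corresponds to a little over 50 pixels in the image.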

#### Usage limits

You need a Google Maps API key to use the `GoogleStaticMapsExtractor`. You can find the key in the Google Cloud console under APIs and services/Credentials. Detailed instructions can be found [here](https://developers.google.com/maps/documentation/tile/get-api-key-v2?_gl=1*n3ilib*_up*MQ..*_ga*MTM5OTgxMDYxNC4xNzQzNjg2OTg5*_ga_NRWSTWS78N*MTc0MzY4Njk4OC4xLjEuMTc0MzY4Njk5Ny4wLjAuMA..&hl=en&setupProd=prerequisites). Make sure to create the `.env` file (based on the `env.example` template) with a valid Google Maps API key.

The free usage limit for this API is 10,000 requests per month. After that, your GCP project will be charged.

Do not run this on the whole dataset.

#### Params

The extractor needs:

- `lat`: latitude coordinate
- `lon`: longitude coordinate
- `zoom`: zoom value, higher values are closer up
- `scale`: (1 or 2) image quality - 2 doubles the number of pixels along each side of the image
- `img_size`: image size, default is [500, 500]

#### Pros

- best quality images

#### Cons

- low free usage limit, can be expensive

In [None]:
point = example_tree.get_column("point")[0].split("(")[1].split(")")[0]
lon, lat = map(float, point.split(" "))
filename = "data/google_static_images/example_tree.png"

map_extractor = GoogleStaticMapsExtractor(api_key)
map_extractor.download_image(
    lat, lon, zoom=ZOOM, scale=SCALE, img_size=IMG_SIZE, filename=filename
)
display(Image(filename, width=500))

# 2. Tree detection

This section shows how to use the `TreeFinder` to detect all trees in an image. 

We use a pre-trained `deepforest` model to detect all trees in an image. The specific checkpoint was fine-tuned on trees in urban areas of Berlin, making it much better at recognising the trees from our tree protection order data. The model will put a bounding box around each identified tree, shown in orange.

The `find_all_trees` function accepts a filename and returns an image as a numpy array. The resulting image will have all the trees marked by boxes. You can use the `show_image` function to display the resulting image and give it a name if you wish.

This is not a full solution, but it gives an idea of which trees are being detected.

In [None]:
# create a Tree finder
tree_finder = TreeFinder()

#### Trees detected in WMS images

In [None]:
filename = "data/wms_images/example_tree.png"
image = tree_finder.find_all_trees(filename)
show_image(image)

#### Trees detected in Google tiles images

In [None]:
filename = "data/google_tiles_images/example_tree.png"
image = tree_finder.find_all_trees(filename)
show_image(image)

#### Trees detected in Google static images

In [None]:
filename = "data/google_static_images/example_tree.png"
image = tree_finder.find_all_trees(filename)
show_image(image)

# 3. Flagging issues

This section shows how to use the `TreeFinder` to find the closest tree to a data point. The `TreeFinder` executes these steps:

1. It uses the `deepforest` model to identify all trees, as described in section 2, and takes the centre of each box as the tree's position.

2. Since the model returns the tree boxes as pixel values, it converts the box centres back into the EPSG4326 coordinate system. This is done differently depending on which projection the underlying map provider uses.

3. It calculates the geodesic distance between each box centre and the original tree point. This gives a precise "as the crow flies" distance. It then sorts the distances and picks the shortest one.

4. It applies a flag threshold to determine whether the closest box is far enough from the point to require manual review. The flag threshold is configurable; we found 10m to work quite well.
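
The steps above can be sketched in a few lines for the simplest case, an image that already uses EPSG4326 (as with WMS). This is an illustration under stated assumptions rather than the `TreeFinder` implementation: all function names are made up, pixel positions are mapped to coordinates by linear interpolation inside the bounding box, and the haversine formula stands in for the geodesic distance, which is a close approximation at these short ranges.

```python
import math


def box_centre(box):
    """Centre (x, y) in pixels of a box given as (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = box
    return (xmin + xmax) / 2, (ymin + ymax) / 2


def wms_pixel_to_latlon(px, py, bbox, img_size):
    """Linear interpolation inside a (min_lon, min_lat, max_lon, max_lat) box."""
    min_lon, min_lat, max_lon, max_lat = bbox
    width, height = img_size
    lon = min_lon + px / width * (max_lon - min_lon)
    lat = max_lat - py / height * (max_lat - min_lat)  # pixel rows grow downwards
    return lat, lon


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres (stand-in for the geodesic distance)."""
    r = 6371008.8  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    a = math.sin((p2 - p1) / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def closest_tree(boxes, point, bbox, img_size, flag_threshold=10.0):
    """Return (shortest distance in metres, flagged) for a (lat, lon) point."""
    lat, lon = point
    distances = sorted(
        haversine_m(lat, lon, *wms_pixel_to_latlon(*box_centre(b), bbox, img_size))
        for b in boxes
    )
    if not distances:
        return None, True  # no tree detected at all: flag for review
    return distances[0], distances[0] > flag_threshold
```

Flagging an image with no detected boxes is a design choice of this sketch; it treats "no tree found" the same as "closest tree too far away".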

The `find_closest_tree` function accepts these parameters:

- `filename`: the path to an image
- `convert_coords`: whether a conversion between projection systems is needed. `False` for WMSExtractor, `True` for Google extractors
- `flag_threshold`: distance in metres beyond which the data point is flagged

It returns 3 values:

- `distance`: the distance from the data point to the closest tree box
- `flagged`: whether this tree was flagged given some `flag_threshold`
- `image`: the resulting image as a numpy array with the closest box and original tree point marked
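
For images from the Google extractors, the conversion from pixels to coordinates has to go through Web Mercator. The sketch below shows one way this could work, using the standard Web Mercator "world coordinate" formulas with 256-pixel tiles. The function names and the signature are assumptions made for this example and may not match what `convert_coords=True` does internally. `img_px` is the actual pixel size of the downloaded image.

```python
import math

TILE = 256  # size of the Web Mercator world at zoom 0, in pixels


def latlon_to_world(lat, lon):
    """Project latitude/longitude to Web Mercator world coordinates."""
    x = (lon + 180.0) / 360.0 * TILE
    y = (0.5 - math.asinh(math.tan(math.radians(lat))) / (2 * math.pi)) * TILE
    return x, y


def world_to_latlon(x, y):
    """Inverse of latlon_to_world."""
    lon = x / TILE * 360.0 - 180.0
    lat = math.degrees(math.atan(math.sinh(math.pi - 2 * math.pi * y / TILE)))
    return lat, lon


def mercator_pixel_to_latlon(px, py, center_lat, center_lon, zoom, scale, img_px):
    """Convert an image pixel to (lat, lon) for an image centred on a point."""
    width, height = img_px
    cx, cy = latlon_to_world(center_lat, center_lon)
    world_per_px = 1.0 / (2**zoom * scale)  # world units covered by one image pixel
    wx = cx + (px - width / 2) * world_per_px
    wy = cy + (py - height / 2) * world_per_px
    return world_to_latlon(wx, wy)
```

The centre pixel maps back to the requested coordinates, and pixels above the centre map to higher latitudes because image rows grow downwards.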

Since the `Extractors` also save metadata about every image, you do not need to provide any coordinates or other parameters to the `TreeFinder`. It will automatically load the settings used to extract each image and use them where needed.

The section below shows the results with WMS and Google static images. The Google map tiles image is skipped because of its lower quality and the extra processing needed to centre the point of interest.

#### Closest tree with WMS image

In [None]:
filename = "data/wms_images/example_tree.png"

dist, flagged, image = tree_finder.find_closest_tree(
    filename=filename,
    convert_coords=False,
    flag_threshold=10,
)
title = f"Distance = {dist:.2f}m\nFlagged for review = {flagged}"
show_image(
    image, title, save=True, save_path="data/result_images/wms_example_tree.png"
)

#### Closest tree with Google static image

In [None]:
filename = "data/google_static_images/example_tree.png"

dist, flagged, image = tree_finder.find_closest_tree(
    filename=filename,
    convert_coords=True,
    flag_threshold=10,
)
title = f"Distance = {dist:.2f}m\nFlagged for review = {flagged}"
show_image(
    image, title, save=True, save_path="data/result_images/static_example_tree.png"
)

## Compare multiple examples side by side

Finally, the section below shows the results for a few selected examples.

The performance varies with image quality - one extractor clearly outperforms the other in some cases but fails in others. Most trees are still detected correctly, and cases where the data point doesn't seem to mark a tree or is very far off are correctly flagged.

In [None]:
# pick some example trees
example_indices = [
    765,
    7865,
    8030,
    12458,
    14120,
    17346,
    20745,
    21555,
    22483,
    24920,
    26152,
    31622,
    35518,
    41236,
    46031,
    56861,
    57073,
    63443,
    71083,
    80506,
    81889,
    84328,
]

In [None]:
for example_index in example_indices:
    point = data.get_column("point")[example_index].split("(")[1].split(")")[0]
    lon, lat = map(float, point.split(" "))

    wms_filename = f"data/wms_images/tree_{example_index}.png"
    # wms_extractor = WMSExtractor(wms_url)
    # wms_extractor.download_image(lat, lon, offset=OFFSET, img_size=IMG_SIZE, filename=wms_filename)

    wms_dist, wms_flagged, wms_image = tree_finder.find_closest_tree(
        filename=wms_filename,
        convert_coords=False,
        flag_threshold=10,
    )

    static_filename = f"data/google_static_images/tree_{example_index}.png"
    # static_extractor = GoogleStaticMapsExtractor(api_key)
    # static_extractor.download_image(lat, lon, zoom=ZOOM, scale=SCALE, img_size=IMG_SIZE, filename=static_filename)

    static_dist, static_flagged, static_image = tree_finder.find_closest_tree(
        filename=static_filename,
        convert_coords=True,
        flag_threshold=10,
    )

    images = [wms_image, static_image]
    wms_title = f"Tree {example_index}\nDistance = {wms_dist:.2f}m\nFlagged for review = {wms_flagged}"
    static_title = f"Tree {example_index}\nDistance = {static_dist:.2f}m\nFlagged for review = {static_flagged}"
    titles = [wms_title, static_title]
    show_image(
        images,
        titles,
        save=True,
        save_path=f"data/result_images/tree_{example_index}.png",
    )

## Plot distances from true coordinates

This section plots a histogram of the distances obtained over the example images. The performance of the two extractors is comparable.
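
Alongside the histogram, a few summary numbers can make the comparison easier to read. The helper below is a hypothetical sketch that is independent of `show_stats`, and the distances in the usage note are made-up values for illustration, not results from this notebook.

```python
from statistics import median


def summarise(distances, flag_threshold=10.0):
    """Summarise a list of distances in metres for one extractor."""
    return {
        "count": len(distances),
        "median_m": median(distances),
        "max_m": max(distances),
        # number of points whose closest tree is beyond the threshold
        "flagged": sum(d > flag_threshold for d in distances),
    }
```

For instance, `summarise([1.0, 3.0, 12.0])` reports three points, a median of 3.0 m, a maximum of 12.0 m and one flagged point. Calling it on `wms_stats` and `static_stats` would give one such summary per extractor.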

In [None]:
wms_stats = []
static_stats = []
for example_index in example_indices:
    wms_filename = f"data/wms_images/tree_{example_index}.png"
    wms_dist, wms_flagged, wms_image = tree_finder.find_closest_tree(
        filename=wms_filename,
        convert_coords=False,
        flag_threshold=10,
    )
    wms_stats.append(wms_dist)

    static_filename = f"data/google_static_images/tree_{example_index}.png"
    static_dist, static_flagged, static_image = tree_finder.find_closest_tree(
        filename=static_filename,
        convert_coords=True,
        flag_threshold=10,
    )
    static_stats.append(static_dist)

In [None]:
show_stats({"wms": wms_stats, "static": static_stats})