# Interactive Database Mapping #

This Notebook contains code to create an interactive, searchable map database of onshore wind turbine projects across the UK. 

The Notebook is split into two main sections:
- Part 1: Data Preparation
- Part 2: Making an Interactive Map

## Part 1: Data Preparation ##

We begin by importing the necessary packages for the analysis and map making:

In [None]:
import os
import datetime
import pandas as pd
import geopandas as gpd
import numpy as np
import folium
from folium.plugins import Search
import requests
import webbrowser

#### Clean the raw data to prepare for map making: ####

##### 1. Load in the Renewable Energy Planning Database (REPD) quarterly extract for January 2025*: 

The data is currently in CSV format.

*Note: As long as the data columns are the same, the following code should be compatible with future quarterly extracts, allowing maintenance of the database.

In [None]:
# Load with cp1252 to avoid a UnicodeDecodeError, as the file is not UTF-8 encoded.
repd_data = pd.read_csv('Data/repd-q4-jan-2025.csv', encoding='cp1252')

# Then replace non-breaking spaces (\xa0) with ordinary spaces so text fields compare and display correctly.
# Note: DataFrame.map requires pandas >= 2.1; on older versions use DataFrame.applymap instead.
repd_data = repd_data.map(lambda x: x.replace('\xa0', ' ') if isinstance(x, str) else x)

repd_data.head() # Show raw data to get familiar with the content.
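
To illustrate why the encoding matters, here is a minimal standard-library sketch. The sample text is made up for illustration and is not taken from the REPD file. It shows bytes written as cp1252 failing to decode as UTF-8, and the non-breaking space being normalised afterwards.

```python
# Hypothetical sample text containing a non-breaking space (\xa0) and a pound sign.
sample = "Site\xa0Name (£)"
raw_bytes = sample.encode("cp1252")  # how the text would be stored in a cp1252 file

try:
    raw_bytes.decode("utf-8")
    decoded_as_utf8 = True
except UnicodeDecodeError:
    decoded_as_utf8 = False  # the single bytes 0xA0 and 0xA3 are not valid UTF-8

text = raw_bytes.decode("cp1252")       # decoding with the matching codec succeeds
normalised = text.replace("\xa0", " ")  # same replacement as applied to the DataFrame

print(decoded_as_utf8)
print(normalised)
```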

##### 2. Filter the REPD dataset to focus only on onshore wind turbine projects: #####

In [None]:
repd_data = repd_data[repd_data['Technology Type'] == 'Wind Onshore']

##### 3. Filter out any unnecessary data columns: ##### 
(e.g. data related to other technologies such as solar, or reference numbers used only by REPD)

In [None]:
# 3.1. First get a list of all the available data columns and their indices to help identify useful information:
index_columns = [(index, column) for index, column in enumerate(repd_data.columns)]

# Print the list
for index, column in index_columns:
    print(f"{index}: {column}")

In [None]:
# 3.2. Now filter the columns by dropping irrelevant columns by name:
repd_wind = repd_data.drop(columns=['Old Ref ID', 'Ref ID', 'Technology Type', 
                                    'Storage Type', 'CHP Enabled', 'Storage Co-location REPD Ref ID', 
                                    'Share Community Scheme', 'CfD Allocation Round', 'RO Banding (ROC/MWh)', 
                                    'CfD Capacity (MW)', 'Mounting Type for Solar', 'Are they re-applying (New REPD Ref)', 
                                    'Are they re-applying (Old REPD Ref) ', 'Development Status', 'Offshore Wind Round', 
                                    'Heat Network Ref', 'Solar Site Area (sqm)'])

repd_wind.head() # View updated repd_wind dataset
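
Because `DataFrame.drop` raises a `KeyError` by default when a named column is missing, a renamed column in a future extract would stop the cell above. The names must match exactly, including any trailing spaces. The following is a minimal sketch of a pre-check; the helper `missing_columns` and the sample header are hypothetical and only illustrate the idea.

```python
def missing_columns(available, wanted):
    """Return the names in `wanted` that are absent from `available`, in order."""
    available_set = set(available)
    return [name for name in wanted if name not in available_set]

# Hypothetical header from a future extract in which one column was renamed.
header = ["Ref ID", "Site Name", "Technology Type", "Storage Type (new)"]
to_drop = ["Ref ID", "Technology Type", "Storage Type"]

print(missing_columns(header, to_drop))
```

In the notebook, `available` would be `repd_data.columns` and `wanted` the list passed to `drop`.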

##### 4. Perform checks and cleaning operations on the remaining data to ensure full compatibility with folium mapping: #####

In [None]:
# 4.1. Check for suspicious characters in remaining columns which may stop the map from generating correctly:
for col in repd_wind.columns:
    if repd_wind[col].dtype == object: # check columns of type object
        # Search for a backslash followed by a digit (e.g. "\1"). Such sequences can be
        # misread as escape sequences and cause errors or render incorrectly in HTML popups.
        if repd_wind[col].str.contains(r'\\[0-9]', na=False).any():
            print(f"Suspicious escape sequence in column: {col}")

In [None]:
# 4.1.2. Fix suspicious characters by escaping backslashes in all string/object columns:
for col in repd_wind.select_dtypes(include='object').columns:
    # Replace each single backslash with a double backslash, so that it is not treated as
    # the start of an escape sequence when the text is embedded in the generated map.
    repd_wind[col] = repd_wind[col].str.replace(r'\\', r'\\\\', regex=True)
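
The two regular expressions used in step 4.1 can be tried on a few strings with Python's `re` module directly. The sample strings below are invented for illustration. Note that the detection pattern only flags a backslash followed by a digit, while the replacement doubles every backslash.

```python
import re

pattern = r'\\[0-9]'  # one literal backslash followed by one digit
samples = ["Plot\\1 Farm", "Line one\\nLine two", "Plain text"]  # hypothetical values

# Only the first sample contains a backslash directly followed by a digit.
flagged = [s for s in samples if re.search(pattern, s)]

# Every single backslash becomes two backslashes; text without backslashes is unchanged.
escaped = [re.sub(r'\\', r'\\\\', s) for s in samples]

print(flagged)
print(escaped)
```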

In [None]:
# 4.2. Check for non-numeric entries in the coordinate columns and drop these data rows:
# Coerce both columns to numbers; anything that cannot be parsed becomes NaN.
x_numeric = pd.to_numeric(repd_wind['X-coordinate'], errors='coerce')
y_numeric = pd.to_numeric(repd_wind['Y-coordinate'], errors='coerce')

# Display rows with non-numeric (or missing) coordinates
print("Non-numeric X-coordinates:")
print(repd_wind.loc[x_numeric.isna(), ['X-coordinate', 'Y-coordinate']])

print("Non-numeric Y-coordinates:")
print(repd_wind.loc[y_numeric.isna(), ['X-coordinate', 'Y-coordinate']])

# Store the coerced values, then drop rows where either coordinate is missing or non-numeric.
# (dropna alone would keep non-numeric strings, because they are not NaN until coerced.)
repd_wind = repd_wind.assign(**{'X-coordinate': x_numeric, 'Y-coordinate': y_numeric})
repd_wind = repd_wind.dropna(subset=['X-coordinate', 'Y-coordinate'])

##### 5. Convert the tabular data into a GeoDataFrame using the 'X-coordinate' and 'Y-coordinate' columns:

In [None]:
# Create the GeoDataFrame
wind_turbines_27700 = gpd.GeoDataFrame(repd_wind,
    geometry=gpd.points_from_xy(repd_wind['X-coordinate'], repd_wind['Y-coordinate']), # Create geometry from X, Y coordinate columns
    crs='epsg:27700') # Set EPSG to British National Grid

wind_turbines_27700.head() # Check if successful

Folium expects geographic coordinates (latitude and longitude), such as WGS 84 (epsg:4326). We will make a copy of the GeoDataFrame with WGS 84 coordinates for use in the folium map. 
We will also keep a copy in British National Grid coordinates, whose metre units are more useful for precise measurements of distances or any other analyses of interest.
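
As a small illustration of why the British National Grid copy is convenient for measurement: its eastings and northings are in metres, so a straight-line distance is plain Pythagoras. The helper name and the coordinates below are hypothetical.

```python
import math

def bng_distance_m(easting_a, northing_a, easting_b, northing_b):
    """Straight-line distance in metres between two British National Grid points."""
    return math.hypot(easting_b - easting_a, northing_b - northing_a)

# Hypothetical points 3 km apart east-west and 4 km apart north-south.
print(bng_distance_m(300000, 700000, 303000, 704000))  # 5000.0
```

The same calculation on WGS 84 longitude/latitude values would give a result in degrees, which is not a meaningful distance.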

In [None]:
# Make a copy of the GeoDataFrame with WGS 84:
wind_turbines = wind_turbines_27700.to_crs(epsg=4326)
print(wind_turbines.crs) # Check if successful

# Drop the now unnecessary coordinate columns:
wind_turbines = wind_turbines.drop(columns=['X-coordinate', 'Y-coordinate'])

For the purposes of the interactive database, it is more useful to replace NaN values in text columns with the string 'N/A'. This signals to the user that the information is not available from the REPD dataset, but may be available elsewhere.

In [None]:
# Change NaN values for string columns to string N/A values:
str_columns = wind_turbines.select_dtypes(include=['object'])

# Fill NaN values in string columns with 'N/A'
wind_turbines[str_columns.columns] = str_columns.fillna('N/A')
wind_turbines

##### 6. Perform last checks for invalid coordinates and invalid geometries, and drop any invalid results:

In [None]:
# 6.1. Check for any data outside valid longitude (-180 to 180) or latitude (-90 to 90):
invalid_coords = wind_turbines[
    (wind_turbines.geometry.x < -180) |
    (wind_turbines.geometry.x > 180) |
    (wind_turbines.geometry.y < -90) |
    (wind_turbines.geometry.y > 90)]

print(f"Out-of-bounds coordinates: {len(invalid_coords)}")
invalid_coords[['Site Name', 'geometry']]

In [None]:
# 6.2. Check for invalid geometries:
invalid_geom = wind_turbines[
    wind_turbines.geometry.is_empty | wind_turbines.geometry.isna()]

print(f"Invalid geometries: {len(invalid_geom)}")
invalid_geom[['Site Name', 'geometry']]

In [None]:
# 6.3. Filter out invalid geometries and coordinates:
wind_turbines = wind_turbines[
    wind_turbines.geometry.notnull() &
    wind_turbines.geometry.is_valid &
    ~wind_turbines.geometry.is_empty &
    np.isfinite(wind_turbines.geometry.x) &  # rejects NaN, +inf and -inf
    np.isfinite(wind_turbines.geometry.y)].copy()  # copy so new columns can be added safely
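
The coordinate checks in step 6 can be summarised for a single point as follows. This is a standard-library sketch with a hypothetical helper name, not part of the notebook's pipeline. It uses `math.isfinite`, which rejects NaN and negative infinity as well as positive infinity.

```python
import math

def is_valid_lonlat(lon, lat):
    """True if both values are finite and inside the WGS 84 longitude/latitude ranges."""
    if not (math.isfinite(lon) and math.isfinite(lat)):
        return False  # rejects NaN, +inf and -inf
    return -180 <= lon <= 180 and -90 <= lat <= 90

print(is_valid_lonlat(-3.2, 55.9))           # ordinary UK-like coordinates
print(is_valid_lonlat(float("-inf"), 55.9))  # negative infinity
print(is_valid_lonlat(float("nan"), 55.9))   # NaN
print(is_valid_lonlat(-3.2, 95.0))           # latitude out of range
```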

In [None]:
# Create a new column with nicely formatted coordinates for use in folium map popups:
wind_turbines['Coordinates'] = wind_turbines.geometry.apply(lambda geom: f"{geom.y:.5f}, {geom.x:.5f}")

Now that the data is fully cleaned and compatible with Folium, create an additional column that combines all fields for use in the search function. This will allow map users to search the map by any keyword from the dataset.

In [None]:
# Combine all fields into a new column which will be used in the search function:
wind_turbines['search_all'] = wind_turbines.apply(
    lambda row: " | ".join([str(val) for val in row.values]), axis=1)

# Check if successful
wind_turbines
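
For a single row, the combined search field looks like the sketch below. The record and the `matches` helper are hypothetical. The helper only imitates a case-insensitive keyword lookup and does not reproduce how the map's search control matches text.

```python
# Hypothetical record; the field names only mirror the kind of columns in the dataset.
record = {"Site Name": "Example Hill", "County": "Fife", "No. of Turbines": 12}

# Same joining rule as the 'search_all' column: every value as text, separated by " | ".
search_all = " | ".join(str(value) for value in record.values())

def matches(search_text, keyword):
    """Case-insensitive substring match against the combined field."""
    return keyword.lower() in search_text.lower()

print(search_all)
print(matches(search_all, "fife"))
print(matches(search_all, "solar"))
```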



## Part 2: Making an Interactive Map

##### 1. Define Functions for use in map making steps

1.1. Functions for marker radius and marker colour assignment:

In [None]:
# Function to assign a radius to each marker based on turbine height

def marker_radius(height):
    """
    Assigns marker radius for CircleMarker based on turbine height.

    Inputs:
        height (float or str): Height of turbine (m). If invalid or missing, default is used.

    Outputs:
        radius (int): Circle marker size in pixels.
    
    Rules:
        - <50m: 6px
        - 50–100m: 8px
        - 100–150m: 10px
        - >150m: 12px
        - N/A or invalid: 2px (default)
    """
    try:
        height = float(height)
    except (TypeError, ValueError):
        return 2  # default if missing/invalid height
    if np.isnan(height):
        return 2  # NaN passes float(), so treat it as missing too
    if height < 50: # under 50m return radius 6
        return 6
    elif height < 100: # 50-100m return radius 8
        return 8
    elif height < 150: # 100-150m return radius 10
        return 10
    else:
        return 12 # 150m+ return radius 12

# ---------------------------------------------------------------------------------------------------------------------

# Function to define colours of markers for each stage in the planning process

def marker_colour(gdf, column_name):
    """
    Assigns fixed, meaningful colours to markers depending on development status.

    Inputs:
        gdf (GeoDataFrame): DataFrame with status info.
        column_name (str): Must be 'Development Status (short)'.

    Outputs:
        colour_dict (dict): Mapping of present statuses to hex colours.

    Notes:
        - If column_name is not 'Development Status (short)', raises ValueError.
        - Only statuses present in the dataset are included in the output.
    """
    if column_name != "Development Status (short)":
        raise ValueError("This function only supports 'Development Status (short)'.")

    # Define logical order and fixed colours
    status_colours = {
        'Operational': '#39ff14',                  # neon green - accepted
        'Under Construction': '#006400',           # dark green - in process
        'Awaiting Construction': '#3cb371',        # medium green - in process
        'Application Submitted': '#90ee90',        # light green - in planning process
        'Appeal Lodged': '#ff7f0e',                # orange - in planning process
        'Appeal Withdrawn': '#9467bd',             # violet - voluntary change in later stages
        'Appeal Refused': '#8c564b',               # dark red - refused in later stages
        'Application Refused': '#d62728',          # red - refused
        'Application Withdrawn': '#e377c2',        # pink - voluntary change
        'Planning Permission Expired': '#7f7f7f',  # grey - inactive
        'Decommissioned': '#393b79',               # navy - no longer there
        'Abandoned': '#636363',                    # dark grey - inactive
        'Revised': '#c49c94',                      # beige - change in application
        'No Application Required': '#9edae5'       # mint blue
    }

    # Filter to only those statuses present in the data
    present_statuses = gdf[column_name].dropna().unique()
    colour_dict = {status: colour for status, colour in status_colours.items() if status in present_statuses}
    return colour_dict
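
The height bands used for marker sizes can also be written as a lookup table, which keeps the thresholds and radii in one place. The sketch below is an alternative formulation using the standard-library `bisect` module, not the notebook's own function. The names `THRESHOLDS_M`, `RADII_PX`, `DEFAULT_RADIUS_PX` and `radius_from_table` are hypothetical.

```python
import bisect
import math

THRESHOLDS_M = [50, 100, 150]  # upper bounds of the first three height bands (m)
RADII_PX = [6, 8, 10, 12]      # one radius per band, the last for 150 m and above
DEFAULT_RADIUS_PX = 2          # used when the height is missing or invalid

def radius_from_table(height):
    """Return the marker radius for a turbine height using a threshold table."""
    try:
        value = float(height)
    except (TypeError, ValueError):
        return DEFAULT_RADIUS_PX
    if math.isnan(value):
        return DEFAULT_RADIUS_PX
    # bisect_right counts how many thresholds are <= value, which is the band index.
    return RADII_PX[bisect.bisect_right(THRESHOLDS_M, value)]

for example in [49.9, 50, 125, 150, "N/A", float("nan")]:
    print(example, radius_from_table(example))
```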

1.2. Function for conditional popups for turbine data

In [None]:
# Function to clean up popups, such that only available information appears for columns 16+ 
# covering the planning process stage which the application is in (unnecessary to include all info).

def conditional_popups(gdf):
    """
    Cleans up popups to only show relevant planning information for wind turbine applications.
    
    Inputs:
        gdf (GeoDataFrame): A GeoDataFrame with full turbine metadata.

    Outputs:
        popups (list of str): List of HTML strings used as popups for each row.

    Notes:
        - Columns 0–15 are always included in the popup (basic turbine metadata).
        - Columns 16–33 and 35 are included **only if value ≠ 'N/A'** (planning process details).
        - Columns 'geometry' (index 34) and 'search_all' (index 36) are excluded entirely.
    """
    popups = []

    for _, row in gdf.iterrows():
        popup_cond = []

        # Always include first 16 columns (basic turbine info)
        for column in row.index[:16]:
            popup_cond.append(f"<b>{column}:</b> {row[column]}")

        # Conditionally include planning-related columns (16–33 and 35)
        for i in np.r_[16:34, 35]:
            column = row.index[i]
            value = str(row[column])
            if value != 'N/A':
                popup_cond.append(f"<b>{column}:</b> {value}")

        # Join all lines into one HTML string per popup
        popups.append("<br>".join(popup_cond))

    return popups
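
The popup logic depends on column positions, so it needs updating if columns are added or removed. A possible variation, sketched below under assumptions, selects fields by name and escapes values with the standard-library `html.escape` so that characters such as `<` or `&` display literally. The function `build_popup`, the field names and the record are all hypothetical.

```python
import html

def build_popup(record, always_show, optional):
    """Build one popup HTML string from a dict-like record, selecting fields by name."""
    lines = []
    for name in always_show:
        lines.append(f"<b>{html.escape(name)}:</b> {html.escape(str(record[name]))}")
    for name in optional:
        value = str(record.get(name, "N/A"))
        if value != "N/A":  # skip planning fields with no information
            lines.append(f"<b>{html.escape(name)}:</b> {html.escape(value)}")
    return "<br>".join(lines)

# Hypothetical record with characters that need escaping in HTML.
record = {
    "Site Name": "Example <Hill>",
    "Operator": "A & B Wind",
    "Appeal Lodged": "N/A",
    "Under Construction": "01/02/2020",
}
popup = build_popup(record, ["Site Name", "Operator"], ["Appeal Lodged", "Under Construction"])
print(popup)
```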


1.3. Function to add a legend to the map

In [None]:
# Function to create a legend based on the colours and sizes defined in previous function

def add_legend(map, colour_dict):
    """
    Adds a custom HTML legend to a folium map, showing turbine height and development status.

    Inputs:
        map (folium.Map): The map object to add the legend to.
        colour_dict (dict): Dictionary of status:colour, from marker_colour().

    Outputs:
        Adds HTML block to the map for display.

    Notes:
        - Height ranges are represented by varying circle sizes.
        - Statuses are displayed with coloured squares.
    """
    size_dict = { # specify labels and radius sizes to display in legend
        '0-50 meters': 6, 
        '50-100 meters': 8,
        '100-150 meters': 10,
        '150+ meters': 12,
        'undefined': 2
    }

    # Start building the legend HTML
    legend_html = '''
    <div style="position: fixed; 
            bottom: 0px; left: 0px; width: 200px; height: auto; 
            border:2px solid grey; z-index:9999; font-size:10px; 
            background-color: white; opacity: 0.8; padding: 5px;">
            <b>Wind Turbine Legend</b><br>
            <b>Development Status</b><br>
    '''
    
    # Add each category and its colour to the legend
    for category, colour in colour_dict.items():
        legend_html += f'<i style="background: {colour}; width: 15px; height: 15px; display: inline-block; margin-right: 8px;"></i>{category}<br>'
    
    # Add turbine height sizing information to the legend
    legend_html += '<br><b>Turbine Height (m)</b><br>'
    for height_range, radius in size_dict.items():
        # Add circular markers in the legend that match the size of the radius
        legend_html += f'<i style="background: gray; border-radius: 50%; width: {radius * 2}px; height: {radius * 2}px; display: inline-block; margin-right: 8px;"></i>{height_range}<br>'
  
    legend_html += '</div>'
    
    # Add the legend to the map as a fixed HTML element
    map.get_root().html.add_child(folium.Element(legend_html))

1.4. Functions to add overlay data to map from folder

In [None]:
# Function to open layers (for map overlays) from folder and add to map:

def layers_from_folder(folder_path, map_object, default_color='gray'):
    """
    Loads '.geojson' and '.shp' files from folder, assigns colours based on filename,
    simplifies geometry, and adds them to the Folium map.

    Inputs:
        folder_path (str): Path to the folder containing spatial files.
        map_object (folium.Map): The Folium map to which layers will be added.
        default_color (str): Colour to assign to layers without a specific match.

    Output:
        Prints summary of added layers. Layers are added directly to the map object.
    """

    def determine_colour(filename):
        """
        Determines colour coding for each layer based on its filename.

        Input:
            filename (str): The name of the spatial file (e.g. .shp or .geojson).

        Output:
            colour (str): Colour assigned to the layer for display on the map.

        Rules:
            - Returns green for ecological/nature designations (e.g. AONB, National Parks, NNR, SAC, SPA)
            - Returns blue for sites of scientific interest (e.g. SSSI, ASSI)
            - Returns brown for archaeology/history designations (e.g. Scheduled Monuments)
            - Returns the default colour (usually gray) for peatland or other unspecified layers
        """
        fname = filename.lower()
        if any(keyword in fname for keyword in ['aonb', 'national_parks', 'nnr', 'sac', 'spa']):
            return 'green'  # return green for ecological/nature designations
        elif any(keyword in fname for keyword in ['sssi', 'assi']):
            return 'blue'   # return blue for sites of scientific interest
        elif 'scheduled_monuments' in fname:
            return 'brown'  # return brown for archaeology/history
        elif 'peatland' in fname:
            return default_color  # return default for peat/non-peat land
        else:
            return default_color

    for file in os.listdir(folder_path):
        if file.endswith(('.geojson', '.shp')):  # search for files ending in .geojson or .shp
            filepath = os.path.join(folder_path, file)
            try:
                gdf = gpd.read_file(filepath)  # read file into GeoDataFrame

                # Reproject to WGS84 for compatibility with Folium
                if gdf.crs and gdf.crs.to_epsg() != 4326:
                    gdf = gdf.to_crs(epsg=4326)

                # Simplify geometries to reduce memory use (preserve topology).
                # The tolerance is in degrees after reprojection; 0.01 degrees is roughly 1 km.
                gdf['geometry'] = gdf['geometry'].simplify(tolerance=0.01, preserve_topology=True)

                layer_name = os.path.splitext(file)[0]  # name layer based on filename
                color = determine_colour(file)           # assign colour based on filename
                add_overlay_layer(gdf, layer_name, color=color, map_object=map_object)  # add to map

                print(f" Added: {layer_name} (color: {color})")
            except Exception as e:
                print(f" Failed to load {file}: {e}")

# ---------------------------------------------------------------------------------------------------------------------

# Function to define display properties and add individual overlay layers to Folium map

def add_overlay_layer(gdf, layer_name, color='green', map_object=None):
    """
    Converts datetime-like and object-type columns to strings,
    defines display properties for a GeoDataFrame layer,
    and adds it to the Folium map with a tooltip showing all attributes.

    Inputs:
        gdf (GeoDataFrame): The spatial data to display as a map layer.
        layer_name (str): Name for the layer in the Folium LayerControl.
        color (str): Fill and border colour for the layer polygons or lines.
        map_object (folium.Map): Folium map object to add the layer to.

    Output:
        layer (folium.FeatureGroup): The styled and tooltip-enabled layer added to the map.
    """

    # Convert datetime-like or object-type columns to strings for JSON serialization
    for col in gdf.columns:
        if pd.api.types.is_datetime64_any_dtype(gdf[col]) or pd.api.types.is_object_dtype(gdf[col]):
            if gdf[col].apply(lambda x: isinstance(x, pd.Timestamp)).any():
                gdf[col] = gdf[col].astype(str)

    # Create the Folium FeatureGroup (layer toggled off by default)
    layer = folium.FeatureGroup(name=layer_name, show=False)

    # Build the GeoJson layer with style and tooltip
    geojson = folium.GeoJson(
        gdf,
        name=layer_name,
        style_function=lambda x: {
            'fillColor': color,
            'color': color,
            'weight': 2,
            'fillOpacity': 0.5,
        },
        tooltip=folium.GeoJsonTooltip(
            fields=[col for col in gdf.columns if col != 'geometry'],  # include all non-geometry fields
            aliases=[f"{col}:" for col in gdf.columns if col != 'geometry'],
            sticky=True  # tooltip follows mouse pointer
        )
    )

    layer.add_child(geojson)  # attach GeoJson to layer
    if map_object is not None:
        layer.add_to(map_object)  # add layer to map

    return layer
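
One caveat of assigning colours by filename substring is that short keywords such as 'spa' or 'sac' also match inside longer words. The sketch below compares substring matching with matching on whole filename tokens. The helper `filename_tokens` and both filenames are hypothetical. Token matching would need extra handling for multi-word keywords such as 'national_parks', which are split at the underscore.

```python
import re

def filename_tokens(filename):
    """Split a filename into lowercase tokens on any non-alphanumeric character."""
    return {token for token in re.split(r'[^a-z0-9]+', filename.lower()) if token}

unrelated = "open_space_sites.shp"  # hypothetical file that is not an SPA layer
intended = "spa_england.geojson"    # hypothetical file that is an SPA layer

substring_hit = 'spa' in unrelated.lower()       # True: 'spa' occurs inside 'space'
token_hit = 'spa' in filename_tokens(unrelated)  # False: no token equals 'spa'
intended_hit = 'spa' in filename_tokens(intended)

print(substring_hit, token_hit, intended_hit)
```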

##### 2. Create Folium Map

Create map centred on turbines' mean coordinates:

In [None]:
m = folium.Map(
    location=[wind_turbines.geometry.y.mean(), wind_turbines.geometry.x.mean()],
    zoom_start=6,
    width='100%',  # Full width and height
    height='100%')

Generate marker colours and popups for turbines dataset and add to map:

In [None]:
# Generate colours and popups:
colours = marker_colour(wind_turbines, 'Development Status (short)') 
popups = conditional_popups(wind_turbines)

# Add wind turbine markers with conditional popups to map object:
for i, (idx, row) in enumerate(wind_turbines.iterrows()):
    status = row['Development Status (short)']
    color = colours.get(status, 'black')
    radius = marker_radius(row['Height of Turbines (m)']) 
    if pd.notna(row.geometry.y) and pd.notna(row.geometry.x):
        folium.CircleMarker(
            location=[row.geometry.y, row.geometry.x],
            radius=radius,
            color=color,
            fill=True,
            fill_color=color,
            fill_opacity=0.8,
            popup=folium.Popup(popups[i], max_width=300)
        ).add_to(m)

Create a search group of invisible markers and add a search control to the map. The markers exist only so that the search tool works; they are not shown on the folium map as a separate layer.

In [None]:
# Invisible Search Layer
search_group = folium.FeatureGroup(show=False, control=False)

# Add invisible markers for search only
for idx, row in wind_turbines.iterrows():
    folium.CircleMarker(
        location=[row.geometry.y, row.geometry.x],
        radius=0.01,
        color='transparent',
        fill_color='transparent',
        fill_opacity=0
    ).add_to(search_group)

search_group.add_to(m)

# Add the Search control
search = Search(
    layer=search_group,
    search_label='search_all',
    placeholder='Search...',
    collapsed=False
).add_to(m)

Add basemaps to the map as tile overlays from ArcGIS.

Note: if the ArcGIS server is down for maintenance or due to unexpected issues, these basemaps will not load.

In [None]:
# Add a topography basemap
folium.TileLayer(
    # URL to map server
    tiles='https://server.arcgisonline.com/ArcGIS/rest/services/World_Topo_Map/MapServer/tile/{z}/{y}/{x}', 
    attr='Esri',
    name='Esri Topographic', # Layer name for layer control panel on map
    overlay=True,
    control=True # enables users to control whether layer is visible or not
).add_to(m)

# Add satellite basemap
folium.TileLayer(
    tiles='https://server.arcgisonline.com/ArcGIS/rest/services/World_Imagery/MapServer/tile/{z}/{y}/{x}',
    attr='Esri',
    name='Esri Satellite',
    overlay=True,
    control=True
).add_to(m)
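
The `{z}/{y}/{x}` placeholders in the tile URLs are filled in by the map for each tile it requests. The sketch below shows how such a URL could be built for one location. It assumes the service follows the standard Web Mercator tiling scheme. The function `lonlat_to_tile` and the example point are hypothetical.

```python
import math

def lonlat_to_tile(lon, lat, zoom):
    """Standard Web Mercator (slippy map) tile indices for a longitude/latitude."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y

template = ('https://server.arcgisonline.com/ArcGIS/rest/services/'
            'World_Topo_Map/MapServer/tile/{z}/{y}/{x}')

x, y = lonlat_to_tile(-3.0, 54.5, 6)  # hypothetical point in northern England
url = template.format(z=6, y=y, x=x)
print(x, y)
print(url)
```

Note that this template places the row (`y`) before the column (`x`).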

Call the previously defined 'layers_from_folder' function to add other overlays to the map:

In [None]:
# Load and add additional layers from 'Data' folder provided
layers_from_folder('Data', m, default_color='gray')

Finally, add layer control and the legend, then view the map:

Note: uncomment 'm.save()' and 'webbrowser.open_new_tab()' to save the map HTML and/or open it in a separate browser tab.

In [None]:
# Add layer control and legend
folium.LayerControl().add_to(m)
add_legend(m, colours) # call on previously defined legend function to add to map

# Save the map to an HTML file
# m.save('my_map.html')

# Open in new browser tab
# webbrowser.open_new_tab('my_map.html')

# View map in Jupyter notebook:
m