---

# Simplified access to Veluwe ecological data through OGC Web Services

**Author:** Hudson Passos  
**Internship host:** Netherlands Institute of Ecology (NIOO-KNAW)  
**Host supervisor:** Stefan Vriend (NIOO-KNAW)  
**WUR supervisor:** Liesbeth Bakker (WUR, NIOO-KNAW)  
**Repository:** [research-project-internship-nioo](https://github.com/hudsonpassos/research-project-internship-nioo)  
**Date:** July 18, 2025  
**Python version:** 3.11.9  
**License:** MIT  
**Description:**  
This notebook is part of a research internship project. It focuses on the automated selection, filtering, 
and preprocessing of open ecological geospatial datasets for the Veluwe region using OGC Web Services (WCS and WFS).


---

# Part 2: Get coverage and feature names

### 1.1. Initialization: packages, paths, and spatial inputs

In [18]:
import re
import unicodedata
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher
from functools import lru_cache
from urllib.parse import urlencode, urlparse, urlunparse

import pandas as pd
import requests
from lxml import etree
from tqdm.notebook import tqdm

**Checkpoint 01:**

In [2]:
# Loading
df = pd.read_csv("checkpoint01_ngr_all_metadata.csv")

---

**Separating datasets in WCS and WFS:**

In [3]:
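# .copy() makes each subset an independent DataFrame, so adding columns later
# does not raise pandas' SettingWithCopyWarning
df_wcs = df[df['ogc_web_services'].str.contains(r'\bWCS\b', na=False)].copy()
df_wfs = df[df['ogc_web_services'].str.contains(r'\bWFS\b', na=False)].copy()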
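
The split relies on a word-boundary regex so that only whole `WCS` / `WFS` tokens match, and on `na=False` so that rows without a service list are excluded. A minimal sketch on a made-up table (the `demo` frame and its values are hypothetical, not taken from the checkpoint file). Taking an explicit `.copy()` of each filtered frame, as the sketch does, is one way to avoid pandas' SettingWithCopyWarning when columns are added later.

```python
import pandas as pd

# Hypothetical stand-in for the checkpoint table (values are made up).
demo = pd.DataFrame({
    "identifier": ["a", "b", "c", "d"],
    "ogc_web_services": ["WMS; WCS", "WMS; WFS", None, "WMS"],
})

# \b keeps the match on the whole token; na=False maps missing values to False.
# .copy() returns an independent frame that can be modified safely.
demo_wcs = demo[demo["ogc_web_services"].str.contains(r"\bWCS\b", na=False)].copy()
demo_wfs = demo[demo["ogc_web_services"].str.contains(r"\bWFS\b", na=False)].copy()

# Adding a column to the copy leaves the original frame untouched.
demo_wcs["layer"] = "placeholder"

print(demo_wcs["identifier"].tolist(), demo_wfs["identifier"].tolist())
```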

### 2.1. Fetch coverage (WCS):
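
As orientation for this section, the sketch below shows how coverage identifiers can be read from a WCS 2.0 GetCapabilities response. The XML is a trimmed, hypothetical document (not a real server response), and `list_coverages` is an illustrative helper, not part of the notebook. It assumes the WCS 2.0 and OWS 2.0 namespaces that the notebook's own code uses.

```python
import xml.etree.ElementTree as ET

WCS2_NS = {
    "wcs": "http://www.opengis.net/wcs/2.0",
    "ows": "http://www.opengis.net/ows/2.0",
}

# Hypothetical, heavily trimmed capabilities document for illustration only.
SAMPLE_CAPABILITIES = """
<wcs:Capabilities xmlns:wcs="http://www.opengis.net/wcs/2.0"
                  xmlns:ows="http://www.opengis.net/ows/2.0" version="2.0.1">
  <wcs:Contents>
    <wcs:CoverageSummary>
      <ows:Title>Example elevation</ows:Title>
      <wcs:CoverageId>demo__elevation</wcs:CoverageId>
    </wcs:CoverageSummary>
    <wcs:CoverageSummary>
      <wcs:CoverageId>demo__landcover</wcs:CoverageId>
    </wcs:CoverageSummary>
  </wcs:Contents>
</wcs:Capabilities>
"""


def list_coverages(xml_text):
    """Return (coverage_id, title) pairs; title is None when the element is absent."""
    root = ET.fromstring(xml_text.strip())
    coverages = []
    for summary in root.findall(".//wcs:CoverageSummary", WCS2_NS):
        cov_id = summary.findtext("wcs:CoverageId", default="", namespaces=WCS2_NS).strip()
        title = summary.findtext("ows:Title", default=None, namespaces=WCS2_NS)
        if cov_id:
            coverages.append((cov_id, title))
    return coverages


coverages = list_coverages(SAMPLE_CAPABILITIES)
print(coverages)
```

Note that the second coverage has no title, which is why title-based matching should not be the only strategy.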

In [4]:
# Namespaces for XML parsing
NS = {
    'gmd':   'http://www.isotc211.org/2005/gmd',
    'gco':   'http://www.isotc211.org/2005/gco',
    'gmx':   'http://www.isotc211.org/2005/gmx',
    'xlink': 'http://www.w3.org/1999/xlink',
}

@lru_cache(maxsize=128)
def fetch_root(identifier: str) -> ET.Element | None:
    url = f"https://www.nationaalgeoregister.nl/geonetwork/srv/api/records/{identifier}/formatters/xml"
    r = requests.get(url, timeout=20)
    if r.status_code == 404:
        return None
    r.raise_for_status()
    return ET.fromstring(r.content)

@lru_cache(maxsize=128)
def fetch_json(identifier: str) -> dict:
    """
    Try to fetch the JSON metadata for a given identifier.
    If the server returns 400 or 404, assume no JSON exists → return {}.
    Otherwise raise for other errors.
    """
    url = f"https://www.nationaalgeoregister.nl/geonetwork/srv/api/records/{identifier}/formatters/json"
    r = requests.get(url, timeout=20)
    if r.status_code in (400, 404):
        return {}
    r.raise_for_status()
    return r.json()

def first_text(elem: ET.Element, path: str) -> str | None:
    node = elem.find(path, NS)
    return node.text.strip() if node is not None and node.text else None

def extract_wcs_layers_from_metadata_or_url(identifier: str, full_url: str) -> list[str]:
    """
    Tries to extract WCS layer names from CSW metadata using the identifier first.
    If no layer names are found, falls back to using the GetCapabilities URL.
    Returns a list of WCS layer names (strings).
    """
    # === First attempt: CSW metadata (function 1) ===
    try:
        root = fetch_root(identifier)
        if root is not None:
            layer_names = []
            for ci in root.findall(
                './/gmd:distributionInfo//gmd:MD_DigitalTransferOptions//gmd:onLine//gmd:CI_OnlineResource',
                NS
            ):
                proto = first_text(ci, 'gmd:protocol/gco:CharacterString') or ''
                url = first_text(ci, 'gmd:linkage/gmd:URL') or ''
                is_wcs = 'wcs' in proto.lower() or 'wcs' in url.lower()
                if is_wcs:
                    name = first_text(ci, 'gmd:name/gco:CharacterString')
                    if name:
                        layer_names.append(name.strip().strip("[]"))
            if layer_names:
                return layer_names
    except Exception as e:
        print(f"⚠️ Error parsing CSW metadata for identifier {identifier}: {e}")

    # === Fallback: Live WCS GetCapabilities (function 2) ===
    try:
        response = requests.get(full_url, timeout=20)
        response.raise_for_status()
        xml_root = ET.fromstring(response.content)

        # Detect WCS version
        root_tag = xml_root.tag.lower()
        if "wcs/2.0" in root_tag or ("capabilities" in root_tag and "2.0" in response.text):
            namespaces = {'wcs': 'http://www.opengis.net/wcs/2.0'}
            coverage_elements = xml_root.findall('.//wcs:CoverageSummary', namespaces)
            return [
                el.findtext('wcs:CoverageId', namespaces=namespaces).strip()
                for el in coverage_elements
                if el.findtext('wcs:CoverageId', namespaces=namespaces)
            ]
        elif "wcs" in root_tag:
            namespaces = {'wcs': 'http://www.opengis.net/wcs'}
            coverage_elements = xml_root.findall('.//wcs:CoverageOfferingBrief', namespaces)
            return [
                el.findtext('wcs:name', namespaces=namespaces).strip()
                for el in coverage_elements
                if el.findtext('wcs:name', namespaces=namespaces)
            ]
    except Exception as e:
        print(f"⚠️ Error fetching WCS GetCapabilities from {full_url}: {e}")

    # No layers found
    return []

def mining_coverage(df, extract_wcs_layers_from_metadata_or_url):
    """
    Resolve a WCS layer name for every row of df:
    1. Extract the layer name from CSW metadata (df['identifier']), falling back to
       GetCapabilities, using extract_wcs_layers_from_metadata_or_url()
    2. Clean those names for matching
    3. Fetch WCS CoverageIds from GetCapabilities (per row)
    4. Match based on cleaned names
    Adds the columns 'csw_metadata_name', 'csw_metadata_name_clean', 'coverage_id',
    'clean_coverage_id' and 'layer' to df.
    """
    
    def get_csw_metadata_name(df, extract_func):
        """
        Returns a list of csw_metadata_name values (one per row), extracted using the given function.
        To be assigned like: df["csw_metadata_name"] = get_csw_metadata_name(df, extract_func)
        """
        return [
            extract_func(row["identifier"], row["wcs_getcapabilities_url"])
            #for _, row in df.iterrows()
            for _, row in tqdm(df.iterrows(), total=len(df), desc="Extracting CSW metadata name")
        ] 
    
    def clean_csw_metadata_name(name):
        if isinstance(name, str) and name:
            parts = re.split(r'__|[:\[\]]', name)
            cleaned = parts[-1] if parts else None
            return cleaned if cleaned else name
        return None
    
    def clean_coverage_id(cov_id):
        """
        Splits coverage_id on '__' and returns the second part if present.
        """
        if cov_id and isinstance(cov_id, str) and "__" in cov_id:
            return cov_id.split("__")[1]
        return cov_id
    
    def ensure_getcapabilities_url(url: str) -> str:
        """
        Ensures the URL includes the GetCapabilities request for a WCS service.
        If it already includes 'request=GetCapabilities' (case-insensitive), it returns the URL unchanged.
        Otherwise, it appends 'service=WCS&request=GetCapabilities' appropriately.
        """
        url_lower = url.lower()
        if "request=getcapabilities" in url_lower:
            return url

        # Strip trailing ? or & to avoid malformed URLs
        url = url.rstrip("?&")

        # Add separator based on whether URL already has query parameters
        separator = "&" if "?" in url else "?"
        return f"{url}{separator}service=WCS&request=GetCapabilities"

    def find_matching_coverage_id(layer_name: str, wcs_url: str) -> str | None:
        """
        Attempts to find a matching WCS CoverageId from the given GetCapabilities URL.
        Returns None if the request fails or nothing matches.

        Matching strategy:
        1. Exact match with <ows:Title>
        2. Exact match with last part of <wcs:CoverageId> (split by '__' or ':')
        3. Exact match with full <wcs:CoverageId>
        4. Fuzzy match fallback using normalized strings (similarity > 0.7)
        """

        def normalize(text):
            if not isinstance(text, str):
                return ''
            text = unicodedata.normalize('NFKD', text).encode('ASCII', 'ignore').decode('utf-8')
            return re.sub(r'[^a-z0-9]', '', text.lower().strip())

        wcs_url = ensure_getcapabilities_url(wcs_url)
        
        try:
            response = requests.get(wcs_url, timeout=30)
            response.raise_for_status()
            root = etree.fromstring(response.content)
        except Exception as e:
            print(f"❌ Error fetching WCS capabilities from {wcs_url}: {e}")
            return None

        ns = {
            "wcs": "http://www.opengis.net/wcs/2.0",
            "ows": "http://www.opengis.net/ows/2.0"
        }

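        target = normalize(layer_name)
        if not target:
            # No usable layer name: an empty target would "exactly" match any
            # coverage that lacks a title, so give up early.
            return None
        best_match = None
        best_score = 0.0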

        for coverage in root.findall(".//wcs:CoverageSummary", namespaces=ns):
            cov_id = coverage.findtext("wcs:CoverageId", namespaces=ns)
            title = coverage.findtext("ows:Title", namespaces=ns)

            cov_id_norm = normalize(cov_id)
            title_norm = normalize(title)
            last_part = normalize(re.split(r"__|:", cov_id)[-1]) if cov_id else ''

            # Exact matches
            if title_norm == target:
                return cov_id
            if last_part == target:
                return cov_id
            if cov_id_norm == target:
                return cov_id

            # Fuzzy match
            for candidate_label, candidate in [
                ("title", title_norm),
                ("last_part", last_part),
                ("cov_id", cov_id_norm)
            ]:
                ratio = SequenceMatcher(None, candidate, target).ratio()
                if ratio > best_score and ratio > 0.7:
                    best_match = cov_id
                    best_score = ratio

        return best_match

    def get_wcs_coverage_table_rowwise(df):
        """
        For each row in df, fetches the WCS coverage IDs and tries to match the clean layer name
        with the value in 'csw_metadata_name_clean'. If matched, sets the full coverage ID
        (e.g., groupname__layername) into 'coverage_id' and the cleaned name into 'clean_coverage_id'.
        """
        coverage_ids = []
        clean_coverage_ids = []

        #for idx, row in df.iterrows():
        for idx, row in tqdm(df.iterrows(), total=len(df), desc="Matching WCS coverage IDs"):
            csw_name_clean = row.get("csw_metadata_name_clean")
            wcs_url = row.get("wcs_getcapabilities_url")
            matching_coverage_id = find_matching_coverage_id(csw_name_clean, wcs_url)
            coverage_ids.append(matching_coverage_id)
            clean_coverage_ids.append(csw_name_clean if matching_coverage_id else None)

        df["coverage_id"] = coverage_ids
        df["clean_coverage_id"] = clean_coverage_ids

        return df 
    
    # Step 1:
    df["csw_metadata_name"] = get_csw_metadata_name(df, extract_wcs_layers_from_metadata_or_url)
    df["csw_metadata_name"] = df["csw_metadata_name"].apply(lambda x: x[0] if isinstance(x, list) and x else None)
    
    # Step 2
    df["csw_metadata_name_clean"] = df["csw_metadata_name"].apply(clean_csw_metadata_name)

    # Step 3a
    df = get_wcs_coverage_table_rowwise(df)
    #df["coverage_id"], df["clean_coverage_id"] = get_wcs_coverage_table_rowwise(df)
   
    # Step 4
    df["layer"] = df.apply(
        lambda row: row["coverage_id"]
        if pd.notna(row["coverage_id"])
        else row["csw_metadata_name"],
        axis=1
    )
    
    # Step 5: if no coverage ID was matched, fall back to the CSW metadata name
    # (already reduced to a single string in Step 1)
    df["coverage_id"] = df.apply(
        lambda row: row["csw_metadata_name"]
        if pd.isna(row["coverage_id"]) and isinstance(row["csw_metadata_name"], str)
        else row["coverage_id"],
        axis=1
    )
  
    return df
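
The matching strategy inside `find_matching_coverage_id` (normalize, try exact matches, then fall back to a similarity ratio) can be sketched without any network access. The `best_match` helper and the candidate list below are illustrative; the threshold of 0.7 mirrors the value used in the code above, and the sketch compares only the suffix and full identifier, not titles.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(text):
    """Lower-case, strip accents, and drop everything except a-z and 0-9."""
    if not isinstance(text, str):
        return ""
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]", "", ascii_text.lower())


def best_match(target_name, candidates, threshold=0.7):
    """Return the candidate ID that best matches target_name, or None."""
    target = normalize(target_name)
    if not target:
        return None  # nothing to match against
    best, best_score = None, 0.0
    for cand in candidates:
        suffix = normalize(re.split(r"__|:", cand)[-1])
        # An exact match on the suffix or the full identifier wins immediately.
        if target in (suffix, normalize(cand)):
            return cand
        # Otherwise remember the most similar suffix above the threshold.
        score = SequenceMatcher(None, suffix, target).ratio()
        if score > threshold and score > best_score:
            best, best_score = cand, score
    return best


candidates = ["rasters__GEOGWD_WTRKNSKRTGSHBBWRST", "luchtfoto_divers__a27"]
print(best_match("GEOGWD_WTRKNSKRTGSHBBWRST", candidates))
print(best_match("A-27", candidates))
print(best_match(None, candidates))
```

Because normalization removes punctuation and case, `"A-27"` and `"a27"` compare as equal; whether that is desirable depends on how distinct the layer names of a given service are.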

**Executing function:**

In [5]:
df_wcs = mining_coverage(df_wcs, extract_wcs_layers_from_metadata_or_url)

Extracting CSW metadata name:   0%|          | 0/928 [00:00<?, ?it/s]

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df["csw_metadata_name"] = get_csw_metadata_name(df, extract_wcs_layers_from_metadata_or_url)

Matching WCS coverage IDs:   0%|          | 0/928 [00:00<?, ?it/s]

❌ Error fetching WCS capabilities from https://data.rivm.nl/geo/gcn/wcs/GetCapabilities?request=GetCapabilities: 400 Client Error:  for url: https://data.rivm.nl/geo/gcn/wcs/GetCapabilities?request=GetCapabilities
❌ Error fetching WCS capabilities from https://data.rivm.nl/geo/gcn/wcs/GetCapabilities?request=GetCapabilities: 400 Client Error:  for url: https://data.rivm.nl/geo/gcn/wcs/GetCapabilities?request=GetCapabilities
❌ Error fetching WCS capabilities from http://geodata.rivm.nl/geoserver/wcs?service=WCS&request=GetCapabilities: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
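
The first failing URL already contains `request=GetCapabilities`, so `ensure_getcapabilities_url` returns it unchanged and no `service=WCS` parameter is sent. Whether that is what caused the 400 response has not been verified here. As a sketch of a stricter alternative, the hypothetical helper `with_capabilities_params` below rebuilds the query string with `urllib.parse` so that both parameters are always present exactly once; the `example.org` URLs are placeholders.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_capabilities_params(url, service="WCS"):
    """Return url with exactly one service=... and request=GetCapabilities pair."""
    parts = urlsplit(url)
    # Keep unrelated parameters; drop any existing service/request (case-insensitive).
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in {"service", "request"}
    ]
    query = urlencode(kept + [("service", service), ("request", "GetCapabilities")])
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))


print(with_capabilities_params("https://example.org/wcs?request=GetCapabilities"))
print(with_capabilities_params("https://example.org/ows?map=a.map&SERVICE=wcs"))
```

This only normalizes the request; a server that is offline or that closes the connection, as in the third error above, will still fail.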



### 2.2. Get CRS and resolution (via DescribeCoverage)

In [6]:
def get_crs_and_resolution(wcs_url: str, layer: str, fallback_layer: str | None = None) -> tuple[str, tuple[float, float] | str]:
    """
    Extract the EPSG CRS and spatial resolution from DescribeCoverage.
    Tries WCS version 2.0.1 first, falls back to 1.0.0 if needed.
    If the first identifier fails, retries using fallback_layer.
    
    Returns:
        (epsg_code, (res_x, res_y)) or ("unavailable", "unavailable")
    """
    def parse_v2_0_1(response_content):
        ns = {
            'wcs': "http://www.opengis.net/wcs/2.0",
            'gml': "http://www.opengis.net/gml/3.2"
        }
        root = ET.fromstring(response_content)

        envelope = root.find(".//gml:Envelope", ns)
        crs_uri = envelope.attrib.get("srsName") if envelope is not None else None
        crs_epsg = crs_uri.split('/')[-1] if crs_uri and crs_uri.startswith("http") else "unavailable"

        offset_vectors = root.findall(".//gml:offsetVector", ns)
        if len(offset_vectors) >= 2:
            try:
                vec1 = [float(v) for v in offset_vectors[0].text.strip().split()]
                vec2 = [float(v) for v in offset_vectors[1].text.strip().split()]
                dx = max(abs(vec1[0]), abs(vec2[0]))
                dy = max(abs(vec1[1]), abs(vec2[1]))
                return crs_epsg, (dx, dy)
            except (ValueError, IndexError, AttributeError):
                return crs_epsg, "unavailable"

        return crs_epsg, "unavailable"

    def parse_v1_0_0(response_content):
        ns = {
            'wcs': "http://www.opengis.net/wcs",
            'gml': "http://www.opengis.net/gml"
        }
        root = ET.fromstring(response_content)

        envelope = root.find(".//gml:Envelope", ns)
        crs_uri = envelope.attrib.get("srsName") if envelope is not None else None
        crs_epsg = crs_uri.split(':')[-1] if crs_uri and "EPSG" in crs_uri else "unavailable"

        low = root.find(".//gml:low", ns)
        high = root.find(".//gml:high", ns)
        if low is not None and high is not None:
            try:
                low_coords = [int(c) for c in low.text.strip().split()]
                high_coords = [int(c) for c in high.text.strip().split()]
                size_x = abs(high_coords[0] - low_coords[0]) + 1
                size_y = abs(high_coords[1] - low_coords[1]) + 1

                lower_corner = root.find(".//gml:pos[1]", ns)
                upper_corner = root.find(".//gml:pos[2]", ns)
                if lower_corner is not None and upper_corner is not None:
                    lc = [float(v) for v in lower_corner.text.strip().split()]
                    uc = [float(v) for v in upper_corner.text.strip().split()]
                    res_x = abs((uc[0] - lc[0]) / size_x)
                    res_y = abs((uc[1] - lc[1]) / size_y)
                    return crs_epsg, (res_x, res_y)
            except (ValueError, IndexError, AttributeError):
                return crs_epsg, "unavailable"

        return crs_epsg, "unavailable"

    def try_with_identifier(identifier):
        if not wcs_url or not identifier:
            return "unavailable", "unavailable"

        base_url = wcs_url.split('?')[0]

        # Try WCS 2.0.1
        url_v2 = f"{base_url}?service=WCS&request=DescribeCoverage&version=2.0.1&coverageId={identifier}"
        try:
            response = requests.get(url_v2, timeout=20)
            response.raise_for_status()
            return parse_v2_0_1(response.content)
        except Exception:
            pass

        # Fallback to WCS 1.0.0
        url_v1 = f"{base_url}?service=WCS&request=DescribeCoverage&version=1.0.0&coverage={identifier}"
        try:
            response = requests.get(url_v1, timeout=20)
            response.raise_for_status()
            return parse_v1_0_0(response.content)
        except Exception:
            return "unavailable", "unavailable"

    # First try with `layer`, fallback to `coverage_id` if needed
    crs, res = try_with_identifier(layer)
    if crs == "unavailable" and fallback_layer:
        return try_with_identifier(fallback_layer)
    return crs, res
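
To make the WCS 2.0.1 branch concrete, the sketch below applies the same idea (EPSG code from the envelope's `srsName`, cell size from the two `gml:offsetVector` elements) to a hypothetical, trimmed DescribeCoverage response. The XML and the helper name `crs_and_resolution` are illustrative; real responses carry more elements and attributes.

```python
import xml.etree.ElementTree as ET

GML_NS = {"gml": "http://www.opengis.net/gml/3.2"}

# Hypothetical, trimmed DescribeCoverage response for illustration only.
SAMPLE_DESCRIPTION = """
<wcs:CoverageDescriptions xmlns:wcs="http://www.opengis.net/wcs/2.0"
                          xmlns:gml="http://www.opengis.net/gml/3.2">
  <wcs:CoverageDescription>
    <gml:boundedBy>
      <gml:Envelope srsName="http://www.opengis.net/def/crs/EPSG/0/28992">
        <gml:lowerCorner>0 0</gml:lowerCorner>
        <gml:upperCorner>100 100</gml:upperCorner>
      </gml:Envelope>
    </gml:boundedBy>
    <gml:domainSet>
      <gml:RectifiedGrid dimension="2">
        <gml:offsetVector>25.0 0.0</gml:offsetVector>
        <gml:offsetVector>0.0 -25.0</gml:offsetVector>
      </gml:RectifiedGrid>
    </gml:domainSet>
  </wcs:CoverageDescription>
</wcs:CoverageDescriptions>
"""


def crs_and_resolution(xml_text):
    """Return (epsg_code, (dx, dy)); either part is "unavailable" if not found."""
    root = ET.fromstring(xml_text.strip())

    envelope = root.find(".//gml:Envelope", GML_NS)
    srs = envelope.get("srsName", "") if envelope is not None else ""
    epsg = srs.rsplit("/", 1)[-1] if srs.startswith("http") else "unavailable"

    vectors = [
        [float(v) for v in el.text.split()]
        for el in root.findall(".//gml:offsetVector", GML_NS)[:2]
    ]
    if len(vectors) < 2:
        return epsg, "unavailable"

    # Each axis step may sit in either vector, so take the larger magnitude per axis.
    dx = max(abs(vectors[0][0]), abs(vectors[1][0]))
    dy = max(abs(vectors[0][1]), abs(vectors[1][1]))
    return epsg, (dx, dy)


result = crs_and_resolution(SAMPLE_DESCRIPTION)
print(result)
```

Taking the absolute value matters because the y offset is usually negative for north-up rasters.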


**Executing function:**

In [7]:
tqdm.pandas()

# Apply the function once and store the result
df_wcs[["crs_epsg", "spatial_resolution"]] = df_wcs.progress_apply(
    lambda row: pd.Series(get_crs_and_resolution(row["wcs_getcapabilities_url"], row["layer"], row["coverage_id"])),
    axis=1
)

  0%|          | 0/928 [00:00<?, ?it/s]

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_wcs[["crs_epsg", "spatial_resolution"]] = df_wcs.progress_apply(


In [8]:
df_wcs

| Unnamed: 0 | identifier | resource_type | md_standard | ogc_web_services | md_date | language | crs_epsg_codes | title | keywords | abstract | ... | access_rights | wcs_getcapabilities_url | wfs_getcapabilities_url | csw_metadata_name | csw_metadata_name_clean | coverage_id | clean_coverage_id | layer | crs_epsg | spatial_resolution |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 87 | 9d973c4a-ef03-4785-b7f6-942e86b385f8 | dataset | ISO 19115 | WMS; WCS | 2024-12-15 | dut |  | Bathymetrie Nederland - kust | Bathymetrie; Bodemhoogte; loding; multibeam; S... | Nederlands deel van de Noordzee kust ondieper ... | ... | otherRestrictions | https://geo.rijkswaterstaat.nl/services/ogc/gd... |  | bodemhoogte_20mtr:bodemhoogte_20mtr | bodemhoogte_20mtr | bodemhoogte_20mtr__bodemhoogte_20mtr | bodemhoogte_20mtr | bodemhoogte_20mtr__bodemhoogte_20mtr | 28992 | (20.0, 20.0) |
| 99 | {417EC886-0DB7-4362-ADAE-1AA0849769F1} | dataset | ISO 19115 | WMS; WCS | 2024-12-10 | dut | 28992; 5709 | Stafkaarten (omgeving: Krabbendijke,Rilland,Ho... | Risico's en externe veiligheid; Mileu | Kaart met locaties waar op basis van historisc... | ... | otherRestrictions | https://opengeodata.zeeland.nl/geoserver/raste... |  | GEORMA_STFKRTWO2MRK_ZTRST | GEORMA_STFKRTWO2MRK_ZTRST | rasters_stafkaarten__georma_stfkrtwo2mrk_ztrst | GEORMA_STFKRTWO2MRK_ZTRST | rasters_stafkaarten__georma_stfkrtwo2mrk_ztrst | 28992 | (2.231777438774114, 2.2317774311926573) |
| 123 | {FE1D7765-83F6-410C-AEF9-1D1A6DA41226} | dataset | ISO 19115 | WMS; WCS | 2024-12-10 | dut | 28992; 5709 | Waterkansenkaart Stedelijk gebied – Inspanning... | Waterkansenkaart; Waterhuishouding | Om aan te geven in welke richting stedelijke u... | ... | otherRestrictions | https://opengeodata.zeeland.nl/geoserver/raste... |  | GEOGWD_WTRKNSKRTGSHBBWRST | GEOGWD_WTRKNSKRTGSHBBWRST | rasters__GEOGWD_WTRKNSKRTGSHBBWRST | GEOGWD_WTRKNSKRTGSHBBWRST | rasters__GEOGWD_WTRKNSKRTGSHBBWRST | 28992 | (25.0, 25.0) |
| 126 | {B046F51C-DEAE-4148-88F6-996B92493E3D} | dataset | ISO 19115 | WMS; WCS | 2024-03-26 | dut | 28992; 5709 | Stafkaarten ( omgeving: Westkapelle ) | Risico's en externe veiligheid; Milieu | Kaart met locaties waar op basis van historisc... | ... | otherRestrictions | https://opengeodata.zeeland.nl/geoserver/raste... |  | GEORMA_STFKRTWO2KOPWLHRST | GEORMA_STFKRTWO2KOPWLHRST | rasters_stafkaarten__GEORMA_STFKRTWO2KOPWLHRST | GEORMA_STFKRTWO2KOPWLHRST | rasters_stafkaarten__GEORMA_STFKRTWO2KOPWLHRST | 28992 | (2.263138023382273, 2.26313801420783) |
| 140 | {C70C0460-80D8-47C6-A245-177A0A7D98B1} | dataset | ISO 19115 | WMS; WCS | 2024-12-10 | dut | 28992; 5709 | Stafkaarten (omgeving: Wemeldinge, Yerseke ) | Risico's en externe veiligheid; Milieu | Kaart met locaties waar op basis van historisc... | ... | otherRestrictions | https://opengeodata.zeeland.nl/geoserver/raste... |  | GEORMA_STFKRTWO2WMLRST | GEORMA_STFKRTWO2WMLRST | rasters_stafkaarten__GEORMA_STFKRTWO2WMLRST | GEORMA_STFKRTWO2WMLRST | rasters_stafkaarten__GEORMA_STFKRTWO2WMLRST | 28992 | (2.2342986297652967, 2.234298638778217) |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9314 | 121bdaa7-2fea-48a8-ae29-299a816cd73d | dataset | ISO 19115 | WMS; WCS | 2024-08-06 | dut | 28992; 5709 | Droogvalduurkaart, basiskaart voor de Zoute Ec... |  | Van de Oosterschelde is in 2021 een ecotopenka... | ... | otherRestrictions | https://geo.rijkswaterstaat.nl/services/ogc/gd... |  | ecotopen_zout_raster:edroogvalduur_os_2021 | edroogvalduur_os_2021 | ecotopen_zout_raster__edroogvalduur_os_2021 | edroogvalduur_os_2021 | ecotopen_zout_raster__edroogvalduur_os_2021 | 28992 | (20.000000000000142, 20.00000000000575) |
| 9315 | 3293e9bf-6299-4b49-b73d-a6c8c76fc7f4 | dataset | ISO 19115 | WMS; WCS | 2024-07-31 | dut | 28992; 5709 | Luchtfoto A27 Lunetten-Hooipolder |  | Luchtfoto gevlogen t.b.v. het maken van een DT... | ... | otherRestrictions | https://geo.rijkswaterstaat.nl/services/ogc/gd... |  | luchtfoto_divers:a27 | a27 | luchtfoto_divers__a27 | a27 | luchtfoto_divers__a27 | 28992 | (0.05, 0.05) |
| 9321 | 43b001c7-29d2-40c5-89af-fa455620b177 | dataset | ISO 19115 | WMS; WCS | 2025-07-15 | dut | 28992; 5709 | Bathymetrie Nederland - binnenwateren 1 mtr. R... | hvd; bathymetrie; bodemhoogte; multibeam | Dit bodemhoogtebestand bevat een grid van de l... | ... | otherRestrictions | https://geo.rijkswaterstaat.nl/services/ogc/gd... |  | bodemhoogte_1mtr:ZN_west_NAP | ZN_west_NAP | bodemhoogte_1mtr__ZN_west_NAP | ZN_west_NAP | bodemhoogte_1mtr__ZN_west_NAP | 28992 | (1.0, 1.0) |
| 9322 | 07377c38-d000-431f-872c-3febe630e6da | dataset | ISO 19115 | WMS; WCS | 2024-05-28 | dut | 28992; 5709 | Satellietbeeld Houtribdijk 2004 |  | SAT1 opname van 16 april 2004 / 10:43 GMT, van... | ... | otherRestrictions | https://geo.rijkswaterstaat.nl/services/ogc/gd... |  | luchtfoto_divers:sat1_houtkribdijk | sat1_houtkribdijk | luchtfoto_divers__sat1_houtkribdijk | sat1_houtkribdijk | luchtfoto_divers__sat1_houtkribdijk | 28992 | (0.5999999999999995, 0.6000000000000013) |


**Removing layer duplicates**

In [24]:
df_wcs = df_wcs.drop_duplicates(subset="layer", keep="first")
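
One detail worth keeping in mind: `drop_duplicates` treats missing values as equal to each other, so every row whose `layer` is missing collapses into a single row. The sketch below uses a made-up `demo` frame to show the effect and one possible alternative that deduplicates only rows with a resolved layer. Whether unresolved rows should be kept at all is a judgement call for the workflow.

```python
import pandas as pd

# Hypothetical table: two rows share a layer, two rows have no layer at all.
demo = pd.DataFrame({
    "identifier": ["a", "b", "c", "d"],
    "layer": ["x__dem", None, None, "x__dem"],
})

# Plain deduplication: the second missing-layer row ("c") is dropped as a duplicate.
naive = demo.drop_duplicates(subset="layer", keep="first")

# Alternative: deduplicate only rows that have a layer, keep all unresolved rows.
has_layer = demo["layer"].notna()
safe = pd.concat([
    demo[has_layer].drop_duplicates(subset="layer", keep="first"),
    demo[~has_layer],
]).sort_index()

print(naive["identifier"].tolist())
print(safe["identifier"].tolist())
```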

**Checkpoint 02a:**

In [25]:
# Saving
df_wcs.to_csv("checkpoint02_ngr_WCS_metadata.csv", index=False)

---

### 2.3. Get feature collection name (WFS):
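
As orientation for this section, the sketch below reads feature type names and titles from a WFS 2.0 GetCapabilities response. The XML is a trimmed, hypothetical document, and `list_feature_types` is an illustrative helper. It assumes the server follows the WFS 2.0 schema, in which both `Name` and `Title` of a `FeatureType` live in the `wfs` namespace.

```python
import xml.etree.ElementTree as ET

WFS2_NS = {"wfs": "http://www.opengis.net/wfs/2.0"}

# Hypothetical, heavily trimmed capabilities document for illustration only.
SAMPLE_WFS_CAPABILITIES = """
<wfs:WFS_Capabilities xmlns:wfs="http://www.opengis.net/wfs/2.0" version="2.0.0">
  <wfs:FeatureTypeList>
    <wfs:FeatureType>
      <wfs:Name>demo:habitat_types</wfs:Name>
      <wfs:Title>Habitat types</wfs:Title>
    </wfs:FeatureType>
    <wfs:FeatureType>
      <wfs:Name>demo:monitoring_plots</wfs:Name>
    </wfs:FeatureType>
  </wfs:FeatureTypeList>
</wfs:WFS_Capabilities>
"""


def list_feature_types(xml_text):
    """Return (name, title) pairs; title is None when the element is absent."""
    root = ET.fromstring(xml_text.strip())
    feature_types = []
    for ft in root.findall(".//wfs:FeatureTypeList/wfs:FeatureType", WFS2_NS):
        name = ft.findtext("wfs:Name", default="", namespaces=WFS2_NS).strip()
        title = ft.findtext("wfs:Title", default=None, namespaces=WFS2_NS)
        if name:
            feature_types.append((name, title))
    return feature_types


feature_types = list_feature_types(SAMPLE_WFS_CAPABILITIES)
print(feature_types)
```

The prefix before the colon (`demo` here) is a workspace or namespace prefix, which is why the matching code compares the part after `:` or `__` separately.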

In [10]:
def extract_wfs_layers_from_metadata_or_url(identifier: str, full_url: str) -> list[str]:
    """
    Tries to extract WFS layer names from CSW metadata using the identifier first.
    If no layer names are found, falls back to using the GetCapabilities URL.
    Returns a list of WFS layer names (strings).
    """
    # === First attempt: CSW metadata (function 1) ===
    try:
        root = fetch_root(identifier)
        if root is not None:
            layer_names = []
            for ci in root.findall(
                './/gmd:distributionInfo//gmd:MD_DigitalTransferOptions//gmd:onLine//gmd:CI_OnlineResource',
                NS
            ):
                proto = first_text(ci, 'gmd:protocol/gco:CharacterString') or ''
                url = first_text(ci, 'gmd:linkage/gmd:URL') or ''
                is_wfs = 'wfs' in proto.lower() or 'wfs' in url.lower()
                if is_wfs:
                    name = first_text(ci, 'gmd:name/gco:CharacterString')
                    if name:
                        layer_names.append(name.strip().strip("[]"))
            if layer_names:
                return layer_names
    except Exception as e:
        print(f"⚠️ Error parsing CSW metadata for identifier {identifier}: {e}")

    # === Fallback: Live WFS GetCapabilities (function 2) ===
    try:
        response = requests.get(full_url, timeout=20)
        response.raise_for_status()
        xml_root = ET.fromstring(response.content)

        # Detect WFS version
        root_tag = xml_root.tag.lower()
        if "wfs/2.0" in root_tag or ("capabilities" in root_tag and "2.0" in response.text):
            namespaces = {'wfs': 'http://www.opengis.net/wfs/2.0'}
            feature_elements = xml_root.findall('.//wfs:FeatureTypeList/wfs:FeatureType', namespaces)
            return [
                el.findtext('wfs:Name', namespaces=namespaces).strip()
                for el in feature_elements
                if el.findtext('wfs:Name', namespaces=namespaces)
            ]
        elif "wfs" in root_tag:
            namespaces = {'wfs': 'http://www.opengis.net/wfs'}
            feature_elements = xml_root.findall('.//wfs:FeatureTypeList/wfs:FeatureType', namespaces)
            return [
                el.findtext('wfs:Name', namespaces=namespaces).strip()
                for el in feature_elements
                if el.findtext('wfs:Name', namespaces=namespaces)
            ]
    except Exception as e:
        print(f"⚠️ Error fetching WFS GetCapabilities from {full_url}: {e}")

    # No layers found
    return []

def mining_featnames(df, extract_wfs_layers_from_metadata_or_url):
    """
    Resolve a WFS feature type name for every row of df:
    1. Extract the layer name from CSW metadata (df['identifier']), falling back to
       GetCapabilities, using extract_wfs_layers_from_metadata_or_url()
    2. Clean those names for matching
    3. Fetch WFS FeatureType names from GetCapabilities (per row)
    4. Match based on cleaned names
    Adds the columns 'csw_metadata_name', 'csw_metadata_name_clean', 'feature_id',
    'clean_feature_ids' and 'layer' to df.
    """
 
    def get_csw_metadata_name(df, extract_func):
        """
        Returns a list of csw_metadata_name values (one per row), extracted using the given function.
        To be assigned like: df["csw_metadata_name"] = get_csw_metadata_name(df, extract_func)
        """
        return [
            extract_func(row["identifier"], row["wfs_getcapabilities_url"])
            for _, row in tqdm(df.iterrows(), total=len(df), desc="Extracting CSW metadata feature name")
        ]

    def clean_csw_metadata_name(name):
        if isinstance(name, str) and name:
            parts = re.split(r'__|[:\[\]]', name)
            cleaned = parts[-1] if parts else None
            return cleaned if cleaned else name
        return None
    
    def clean_coverage_id(cov_id):
        """
        Splits coverage_id on '__' and returns the second part if present.
        """
        if cov_id and isinstance(cov_id, str) and "__" in cov_id:
            return cov_id.split("__")[1]
        return cov_id

    def ensure_getcapabilities_url(url: str) -> str:
        """
        Ensures the URL includes the GetCapabilities request for a WFS service.
        If it already includes 'request=GetCapabilities' (case-insensitive), it returns the URL unchanged.
        Otherwise, it appends 'service=WFS&request=GetCapabilities' appropriately.
        If the URL is empty or nan, raises ValueError.
        """
        if not isinstance(url, str) or url.strip() == "" or pd.isna(url):
            raise ValueError("Invalid or missing WFS URL")

        url_lower = url.lower()
        if "request=getcapabilities" in url_lower:
            return url

        # Strip trailing ? or & to avoid malformed URLs
        url = url.rstrip("?&")

        # Add separator based on whether URL already has query parameters
        separator = "&" if "?" in url else "?"
        return f"{url}{separator}service=WFS&request=GetCapabilities"

    def find_matching_coverage_id(layer_name: str, wfs_url: str) -> str | None:
        """
        Attempts to find a matching WFS FeatureType Name from the given GetCapabilities URL.
        Returns None if the URL is unusable, the request fails, or nothing matches.

        Matching strategy:
        1. Exact match with the FeatureType title
        2. Exact match with last part of <wfs:Name> (split by '__' or ':')
        3. Exact match with full <wfs:Name>
        4. Fuzzy match fallback using normalized strings (similarity > 0.7)
        """
        def normalize(text):
            if not isinstance(text, str):
                return ''
            text = unicodedata.normalize('NFKD', text).encode('ASCII', 'ignore').decode('utf-8')
            return re.sub(r'[^a-z0-9]', '', text.lower().strip())

        #wfs_url = ensure_getcapabilities_url(wfs_url)
        
        try:
            wfs_url = ensure_getcapabilities_url(wfs_url)
        except ValueError:
            print(f"⚠️ Skipping layer '{layer_name}' because WFS URL is missing or invalid.")
            return None     
        
        try:
            response = requests.get(wfs_url, timeout=30)
            response.raise_for_status()
            root = etree.fromstring(response.content)
        except Exception as e:
            print(f"❌ Error fetching WFS capabilities from {wfs_url}: {e}")
            return None

        ns = {
            "wfs": "http://www.opengis.net/wfs/2.0",
            "ows": "http://www.opengis.net/ows/2.0"
        }

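        target = normalize(layer_name)
        if not target:
            # No usable layer name: an empty target would "exactly" match any
            # feature type that lacks a title, so give up early.
            return None
        best_match = None
        best_score = 0.0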

        for feature in root.findall(".//wfs:FeatureTypeList/wfs:FeatureType", namespaces=ns):
            feature_id = feature.findtext("wfs:Name", namespaces=ns)
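            # WFS 2.0 places Title in the wfs namespace; ows:Title is kept as a fallback
            title = (feature.findtext("wfs:Title", namespaces=ns)
                     or feature.findtext("ows:Title", namespaces=ns))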

            feature_id_norm = normalize(feature_id)
            title_norm = normalize(title)
            last_part = normalize(re.split(r"__|:", feature_id)[-1]) if feature_id else ''

            # Exact match on the title, the name suffix, or the full name
            if target in (title_norm, last_part, feature_id_norm):
                return feature_id

            # Fuzzy match: keep the highest similarity ratio above 0.7
            for candidate in (title_norm, last_part, feature_id_norm):
                ratio = SequenceMatcher(None, candidate, target).ratio()
                if ratio > best_score and ratio > 0.7:
                    best_match = feature_id
                    best_score = ratio

        return best_match

    def get_wfs_feature_table_rowwise(df):
        """
        Matches CSW metadata names to WFS FeatureType names from GetCapabilities.

        For each row:
        1. Clean the CSW metadata name.
        2. Fetch WFS capabilities XML.
        3. Find the best-matching <wfs:Name> based on:
           - Exact match with <ows:Title>, <wfs:Name>, or its suffix.
           - Fuzzy match (similarity > 0.7) as fallback.
        4. Add the matched name to 'feature_id' and the cleaned name to 'clean_feature_ids'.

        Returns the updated DataFrame.
        """
        feature_id = []
        clean_feature_ids = []

        for _, row in tqdm(df.iterrows(), total=len(df), desc="Matching WFS FeatureTypes"):
            csw_name_clean = row.get("csw_metadata_name_clean")
            wfs_url = row.get("wfs_getcapabilities_url")
            matching_feature_id = find_matching_coverage_id(csw_name_clean, wfs_url)
            feature_id.append(matching_feature_id)
            clean_feature_ids.append(csw_name_clean if matching_feature_id else None)

        df["feature_id"] = feature_id
        df["clean_feature_ids"] = clean_feature_ids

        return df 
    
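    # Step 1: get the CSW metadata layer names, keeping the first entry of each list
    df["csw_metadata_name"] = get_csw_metadata_name(df, extract_wfs_layers_from_metadata_or_url)
    df["csw_metadata_name"] = df["csw_metadata_name"].apply(lambda x: x[0] if isinstance(x, list) and x else None)

    # Step 2: clean the metadata name
    df["csw_metadata_name_clean"] = df["csw_metadata_name"].apply(clean_csw_metadata_name)

    # Step 3: match each cleaned name against the FeatureTypes in GetCapabilities
    df = get_wfs_feature_table_rowwise(df)

    # Step 4: use the matched FeatureType name as layer, else the metadata name
    df["layer"] = df.apply(
        lambda row: row["feature_id"]
        if pd.notna(row["feature_id"])
        else row["csw_metadata_name"],
        axis=1
    )

    # Step 5: intended fallback for unmatched rows. Step 1 already reduced
    # 'csw_metadata_name' to a single value, so the list check below is not
    # expected to succeed and 'feature_id' stays unchanged.
    df["feature_id"] = df.apply(
        lambda row: row["csw_metadata_name"][0]
        if pd.isna(row["feature_id"]) and isinstance(row["csw_metadata_name"], list) and row["csw_metadata_name"]
        else row["feature_id"],
        axis=1
    )

    return df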
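
The matching rule used above (normalize, try exact matches, then fall back to a similarity ratio above 0.7) can be tried offline on a few names. The sketch below is a simplified stand-in, not the notebook's function: `match_feature_name` and the sample FeatureType names are hypothetical, and it compares only the full name and its suffix, without titles or HTTP requests.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, strip accents, and keep only a-z and 0-9."""
    if not isinstance(text, str):
        return ''
    text = unicodedata.normalize('NFKD', text).encode('ascii', 'ignore').decode('ascii')
    return re.sub(r'[^a-z0-9]', '', text.lower())


def match_feature_name(layer_name, feature_names, threshold=0.7):
    """Return the best-matching entry of feature_names, or None (hypothetical helper)."""
    target = normalize(layer_name)
    if not target:
        return None
    best_match, best_score = None, 0.0
    for name in feature_names:
        full = normalize(name)
        suffix = normalize(re.split(r'__|:', name)[-1])
        # Exact match on the full name or on the part after the prefix
        if target in (full, suffix):
            return name
        # Fuzzy fallback: remember the highest ratio above the threshold
        for candidate in (full, suffix):
            ratio = SequenceMatcher(None, candidate, target).ratio()
            if ratio > threshold and ratio > best_score:
                best_match, best_score = name, ratio
    return best_match


# Made-up FeatureType names for illustration
names = ['Landschap:HoutsingelgebiedWesterwolde', 'natuur:bos_en_heide']
exact = match_feature_name('Houtsingelgebied Westerwolde', names)
fuzzy = match_feature_name('bos en heiden', names)
missing = match_feature_name('windturbines', names)
print(exact, fuzzy, missing)
```

The first lookup matches exactly on the suffix, the second only through the fuzzy fallback, and the third finds no candidate above the threshold and yields `None`.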

**Executing function:**

In [11]:
df_wfs = mining_featnames(df_wfs, extract_wfs_layers_from_metadata_or_url) 

Extracting CSW metadata feature name:   0%|          | 0/4480 [00:00<?, ?it/s]

⚠️ Error fetching WFS GetCapabilities from https://nedglobe.cadac.com/services/koggenland/geoserver/wfs: 400 Client Error: Bad Request for url: https://nedglobe.cadac.com/services/koggenland/geoserver/wfs
⚠️ Error fetching WFS GetCapabilities from https://www.wibon-inspire.nl/geoserver/Brondata/wfs?: HTTPSConnectionPool(host='www.wibon-inspire.nl', port=443): Read timed out. (read timeout=20)
⚠️ Error fetching WFS GetCapabilities from https://www.wion-inspire.nl/geoserver/wfs?: 404 Client Error: Not Found for url: https://www.wion-inspire.nl/geoserver/wfs
⚠️ Error fetching WFS GetCapabilities from https://opendata.hunzeenaas.nl/geoserver/wfs: 400 Client Error: 400 for url: https://opendata.hunzeenaas.nl/geoserver/wfs
⚠️ Error fetching WFS GetCapabilities from https://services.geodataoverijssel.nl/geoserver/B22_wegen/wfs: 40

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df["csw_metadata_name"] = get_csw_metadata_name(df, extract_wfs_layers_from_metadata_or_url)
  df["csw_metadata_name"] = df["csw_metadata_name"].apply(lambda x: x[0] if isinstance(x, list) and x else None)

Matching WFS FeatureTypes:   0%|          | 0/4480 [00:00<?, ?it/s]

❌ Error fetching WFS capabilities from https://geoservices.provinciegroningen.nl/server/services/Beleidsplannen/Omgevingsverordening/MapServer/WFSServer?request=GetCapabilities: 400 Client Error: Bad Request for url: https://geoservices.provinciegroningen.nl/server/services/Beleidsplannen/Omgevingsverordening/MapServer/WFSServer?request=GetCapabilities
❌ Error fetching WFS capabilities from https://geoserver.gelderland.nl/geoserver/ngr_b/wfs?service=WFS&request=GetCapabilities: 403 Client Error: Forbidden for url: https://geoserver.gelderland.nl/geoserver/ngr_b/wfs?service=WFS&request=GetCapabilities
❌ Error fetching WFS capabilities from https://geoservices.provinciegroningen.nl/server/services/Beleidsplannen/Omgevingsvisie/MapServer/WFSServer?request=GetCapabilities: 400 Client Error: Bad Request for url: https://geoservices.provinciegroningen.nl/server/services/Beleidsplannen/Omgevingsvisie/MapServer/WFSServer?request=GetCapabilities
❌ Error fetching WFS capabilities from https://geodata.nationaalgeoregister.nl/omgevingswarmte/wfs?request=GetCapabilities: HTTPSConnectionPool(host='geodata.nationaalgeoregister.nl', port=443): Max retries exceeded with url: /omgevingswarmte/wfs?request=GetCapabilities (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x0000028E350FD510>: Failed to resolve 'geodata.nationaalgeoregister.nl' ([Errno 11001] getaddrinfo failed)"))
❌ Error fetching WFS capabilities from https://geoserver.gelderland.nl/geoserver/ngr_a/wfs?service=WFS&request=GetCapabilities: 403 Client Error: Forbidden for url: https://geoserver.gelderland.nl/geoserver/ngr_a/wfs?service=WFS&request=GetCapabilities
❌ Error fetching WFS capabilities from https://geoserver.gelderland.nl/geoserver/ngr_d/wfs?request=GetCapabilities: 403 Client Error: Forbidden for url: https://geoserver.gelderland.nl/geoserver/ngr_d/wfs?request=GetCapabilities
❌ Error fetching WFS capabilities from https://atlas.brabant.nl/arcgis/services/Atlas_Leefomgeving/MapServer/WFSServer?service=WFS&request=GetCapabilities: 400 Client Error: Bad Request for url: https://atlas.brabant.nl/arcgis/services/Atlas_Leefomgeving/MapServer/WFSServer?service=WFS&request=GetCapabilities
❌ Error fetching WFS capabilities from https://geoservices.provinciegroningen.nl/server/services/Milieu/ToezichtHandhaving/MapServer/WFSServer?request=GetCapabilities: 400 Client Error: Bad Request for url: https://geoservices.provinciegroningen.nl/server/services/Milieu/ToezichtHandhaving/MapServer/WFSServer?request=GetCapabilities
❌ Error fetching WFS capabilities from https://geoserver.gelderland.nl/geoserver/ngr_visie/wfs?service=WFS&request=GetCapabilities: 403 Client Error: Forbidden for url: https://geoserver.gelderland.nl/geoserver/ngr_visie/wfs?service=WFS&request=GetCapabilities
❌ Error fetching WFS capabilities from https://geoservices.provinciegroningen.nl/server/services/LandelijkGebied/Natuur/MapServer/WFSServer?request=GetCapabilities: 400 Client Error: Bad Request for url: https://geoservices.provinciegroningen.nl/server/services/LandelijkGebied/Natuur/MapServer/WFSServer?request=GetCapabilities
❌ Error fetching WFS capabilities from https://geoserver.gelderland.nl/geoserver/ngr_verordening/wfs?service=WFS&request=GetCapabilities: 403 Client Error: Forbidden for url: https://geoserver.gelderland.nl/geoserver/ngr_verordening/wfs?service=WFS&request=GetCapabilities
❌ Error fetching WFS capabilities from https://geoservices.provinciegroningen.nl/server/services/LandelijkGebied/Cultuurhistorie/MapServer/WFSServer?request=GetCapabilities: 400 Client Error: Bad Request for url: https://geoservices.provinciegroningen.nl/server/services/LandelijkGebied/Cultuurhistorie/MapServer/WFSServer?request=GetCapabilities
❌ Error fetching WFS capabilities from https://geoserver.gelderland.nl/geoserver/ngr_bow/wfs?service=WFS&request=GetCapabilities: 403 Client Error: Forbidden for url: https://geoserver.gelderland.nl/geoserver/ngr_bow/wfs?service=WFS&request=GetCapabilities
❌ Error fetching WFS capabilities from https://deltaresdata.openearth.nl/geoserver/DANK/wfs?service=WFS&request=GetCapabilities: 404 Client Error:  for url: https://deltaresdata.openearth.eu/geoserver/DANK/wfs?service=WFS&request=GetCapabilities
❌ Error fetching WFS capabilities from https://geoservices.provinciegroningen.nl/server/services/LandelijkGebied/Landschap/MapServer/WFSServer?request=GetCapabilities: 400 Client Error: Bad Request for url: https://geoservices.provinciegroningen.nl/server/services/LandelijkGebied/Landschap/MapServer/WFSServer?request=GetCapabilities
❌ Error fetching WFS capabilities from https://geoservices.provinciegroningen.nl/server/services/LandelijkGebied/EcologieFloraFauna/MapServer/WFSServer?request=GetCapabilities: 400 Client Error: Bad Request for url: https://geoservices.provinciegroningen.nl/server/services/LandelijkGebied/EcologieFloraFauna/MapServer/WFSServer?request=GetCapabilities
❌ Error fetching WFS capabilities from https://geoservices.provinciegroningen.nl/server/services/Milieu/Bodem/MapServer/WFSServer?request=GetCapabilities: 400 Client Error: Bad Request for url: https://geoservices.provinciegroningen.nl/server/services/Milieu/Bodem/MapServer/WFSServer?request=GetCapabilities
❌ Error fetching WFS capabilities from https://geoservices.provinciegroningen.nl/server/services/Milieu/EnergieKlimaatLucht/MapServer/WFSServer?request=GetCapabilities: 400 Client Error: Bad Request for url: https://geoservices.provinciegroningen.nl/server/services/Milieu/EnergieKlimaatLucht/MapServer/WFSServer?request=GetCapabilities
❌ Error fetching WFS capabilities from https://arcgisp.enschede.nl/ArcGIS/services/BodemEnOndergrond/BKK2008_Functieklassen/MapServer/WFSServer?request=GetCapabilities&service=WFS: HTTPSConnectionPool(host='arcgisp.enschede.nl', port=443): Max retries exceeded with url: /ArcGIS/services/BodemEnOndergrond/BKK2008_Functieklassen/MapServer/WFSServer?request=GetCapabilities&service=WFS (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x0000028E34ECDE90>: Failed to resolve 'arcgisp.enschede.nl' ([Errno 11001] getaddrinfo failed)"))
❌ Error fetching WFS capabilities from https://kaartportaal.drenthe.nl/server/services/GDB_actueel/GBI_AOV18_DEELGEB_CHK_V/MapServer/WFSServer?service=WFS&request=GetCapabilities: 499 Client Error:  for url: https://kaartportaal.drenthe.nl/server/services/GDB_actueel/GBI_AOV18_DEELGEB_CHK_V/MapServer/WFSServer?service=WFS&request=GetCapabilities
❌ Error fetching WFS capabilities from https://geoserver.gelderland.nl/geoserver/ngr_c/wfs?service=WFS&request=GetCapabilities: 403 Client Error: Forbidden for url: https://geoserver.gelderland.nl/geoserver/ngr_c/wfs?service=WFS&request=GetCapabilities
❌ Error fetching WFS capabilities from https://geodata.nationaalgeoregister.nl/omgevingswarmte/wfs?service=WFS&request=GetCapabilities: HTTPSConnectionPool(host='geodata.nationaalgeoregister.nl', port=443): Max retries exceeded with url: /omgevingswarmte/wfs?service=WFS&request=GetCapabilities (Ca
❌ Error fetching WFS capabilities from https://geodata.nationaalgeoregister.nl/ienw/geluidskaartspoorwegennacht/v1/wfs?request=GetCapabilities&service=WFS: HTTPSConnectionPool(host='geodata.nationaalgeoregister.nl', port=443): Max retries exceeded with url: /ienw/geluidskaartspoorwegennacht/v1/wfs?request=GetCapabilities&service=WFS (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x0
❌ Error fetching WFS capabilities from https://kaartportaal.drenthe.nl/server/services/GDB_actueel/GBI_IKN_NNN_2022_V/MapServer/WFSServer?SERVICE=WFS&REQUEST=GetCapabilities: 499 Client Error:  for url: https://kaartportaal.drenthe.nl/server/services/GDB_actueel/GBI_IKN_NNN_2022_V/MapServer/WFSServer?SERVICE=WFS&REQUEST=GetCapabilities
❌ Error fetching WFS capabilities from https://kaartportaal.drenthe.nl/server/services/GDB_actueel/GBI_POV18_VERWACHTING_L/
❌ Error fetching WFS capabilities from https://schagen.nedgraphicscs.nl:443/geoserver/wfs?service=WFS&request=GetCapabilities: HTTPSConnectionPool(host='schagen.nedgraphicscs.nl', port=443): Max retries exceeded with 

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df["feature_id"] = feature_id
  df["clean_feature_ids"] = clean_feature_ids
  df["layer"] = df.apply(
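
The pandas messages in the output above are `SettingWithCopyWarning`s: `df_wfs` was created by filtering `df`, and new columns were then assigned to that filtered frame. A common way to avoid the warning is to take an explicit copy before assigning. The sketch below uses a small made-up frame; only the column name `ogc_web_services` and the filter pattern come from the notebook.

```python
import pandas as pd

# Made-up metadata table; only the column name mirrors the notebook
df = pd.DataFrame({
    'identifier': ['a', 'b', 'c'],
    'ogc_web_services': ['WFS', 'WCS', 'WMS; WFS'],
})

# .copy() makes the filtered frame independent of df, so later column
# assignments are unambiguous and do not trigger the warning
df_wfs = df[df['ogc_web_services'].str.contains(r'\bWFS\b', na=False)].copy()
df_wfs['feature_id'] = ['ns:layer_a', None]

print(df_wfs)
print('feature_id' in df.columns)  # the original frame is left untouched
```

Applied to this notebook, the same idea would mean copying the filtered frame when `df_wfs` is created, or at the start of the function that adds the columns.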

### 2.4. Get the 'geometry type' and 'number of features':

**Functions**

In [12]:
def get_base_wfs_url(full_url):
    """Return the WFS endpoint URL without query parameters or fragment."""
    parsed = urlparse(full_url)
    # Keep scheme, host, and path only
    return urlunparse((parsed.scheme, parsed.netloc, parsed.path, '', '', ''))

# Function to capture the 'geometry field name'

def get_geometry_field_name_from_xsd(wfs_base_url, feature_type_name, version='1.1.0'):
    """
    Return the name of the geometry field of a WFS layer, or None if not found.

    Reads the DescribeFeatureType schema and collects every element whose type
    looks like 'gml:*PropertyType'. Common names (geom, geometry, shape) are
    preferred; otherwise the first candidate is returned.
    """
    params = {
        'service': 'WFS',
        'version': version,
        'request': 'DescribeFeatureType',
        'typeName': feature_type_name
    }
    url = f"{wfs_base_url}?{urlencode(params)}"

    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    }

    try:
        r = requests.get(url, timeout=30, headers=headers)
        r.raise_for_status()
        tree = ET.fromstring(r.content)

        # Preferred geometry field names, compared case-insensitively
        preferred_geom_names = ['geom', 'geometry', 'shape']

        # Collect the names of all elements with a GML property type
        candidates = []
        for elem in tree.iter():
            t = elem.attrib.get('type', '')
            name = elem.attrib.get('name')
            if name and t.startswith('gml:') and t.endswith('PropertyType'):
                candidates.append(name)

        # Prioritize preferred names if any match
        for preferred_name in preferred_geom_names:
            for candidate in candidates:
                if candidate.lower() == preferred_name:
                    return candidate

        # Fallback: return first found, if any
        return candidates[0] if candidates else None

    except Exception as e:
        print(f"Error in DescribeFeatureType for layer {feature_type_name}: {e}")
        return None

# Function to get the 'geometry type'

def get_wfs_geometry_type(wfs_base_url, feature_type_name, version='1.1.0'):
    """
    Return the geometry type of a WFS layer from its DescribeFeatureType schema.

    Returns e.g. 'Polygon' for 'gml:PolygonPropertyType', 'Geometry' for the
    generic 'gml:GeometryPropertyType', 'no_geometry_field' if no GML property
    type is present, or 'describe_feature_type_error' if the request fails.
    """
    # Request DescribeFeatureType
    params = {
        'service': 'WFS',
        'version': version,
        'request': 'DescribeFeatureType',
        'typeName': feature_type_name
    }
    url = f"{wfs_base_url}?{urlencode(params)}"

    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    }

    try:
        r = requests.get(url, timeout=30, headers=headers)
        r.raise_for_status()
        tree = ET.fromstring(r.content)

        # The first element typed 'gml:*PropertyType' is taken as the geometry field
        for elem in tree.iter():
            t = elem.attrib.get('type', '')
            if t.startswith('gml:') and t.endswith('PropertyType'):
                # Strip prefix and suffix, e.g. 'gml:PolygonPropertyType' -> 'Polygon'
                return t[len('gml:'):-len('PropertyType')]

        print(f"No geometry field found in DescribeFeatureType for layer {feature_type_name}.")
        return 'no_geometry_field'

    except Exception as e:
        print(f"Error in DescribeFeatureType for layer {feature_type_name}: {e}")
        return 'describe_feature_type_error'

    
# Function to get the features count:
    
def get_wfs_feature_count(wfs_base_url, feature_type_name, version='2.0.0', verbose=False):
    """
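    Return the total number of features in a WFS layer.

    Sends a GetFeature request with resultType=hits. If the requested version is
    2.0.0 and no count is obtained, the request is retried with version 1.1.0.
    Intended for GeoServer, ArcGIS WFSServer, and PDOK endpoints.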

    Args:
        wfs_base_url (str): WFS base URL.
        feature_type_name (str): Layer name (typeName).
        version (str): WFS version: '1.1.0' or '2.0.0' (default '2.0.0' → best for modern servers like PDOK).
        verbose (bool): If True, prints debug info.

    Returns:
        int or None: total number of features, or None if not found or error.
    """
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    }

    def do_request(version_to_try):
        params = {
            'service': 'WFS',
            'version': version_to_try,
            'request': 'GetFeature',
            'typeNames' if version_to_try == '2.0.0' else 'typeName': feature_type_name,
            'resultType': 'hits'
        }

        url = f"{wfs_base_url}?{urlencode(params)}"
        if verbose:
            print(f"Requesting: {url}")

        try:
            r = requests.get(url, timeout=60, headers=headers)
            r.raise_for_status()
            tree = ET.fromstring(r.content)

            if verbose:
                print(f"Root element: {tree.tag}")
                print(f"Attributes of root element: {tree.attrib}")
                print("----")
                print(ET.tostring(tree, encoding='unicode')[:1000])

            if version_to_try == '2.0.0':
                # A non-numeric value such as "unknown" raises ValueError in int()
                # and is handled by the except clause below
                number_matched = tree.attrib.get('numberMatched')
                if number_matched is not None:
                    if verbose:
                        print(f"Total numberMatched = {number_matched}")
                    return int(number_matched)
                else:
                    if verbose:
                        print("Could not find numberMatched attribute in response.")
                    return None

            elif version_to_try == '1.1.0':
                number_of_features = tree.attrib.get('numberOfFeatures')
                if number_of_features is not None:
                    if verbose:
                        print(f"Total numberOfFeatures = {number_of_features}")
                    return int(number_of_features)
                else:
                    if verbose:
                        print("Could not find numberOfFeatures attribute in response.")
                    return None

            else:
                if verbose:
                    print(f"Unsupported WFS version: {version_to_try}")
                return None

        except Exception as e:
            if verbose:
                print(f"Error querying WFS for feature count (version {version_to_try}): {e}")
            return None

    # First try requested version
    count = do_request(version)

    # If there is no result and the first version was 2.0.0, fall back to 1.1.0
    if count is None and version == '2.0.0':
        if verbose:
            print("Falling back to WFS version 1.1.0...")
        count = do_request('1.1.0')

    return count
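
The count extraction above can be checked without a network call. The sketch below is a minimal illustration, not the notebook's own code: `parse_hits_count` is a hypothetical helper, and the three XML snippets are hand-written samples rather than output from a real server. It also covers the case where a WFS 2.0.0 server reports `numberMatched="unknown"`, which the specification allows and which cannot be converted with `int()`.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hand-written sample responses (assumptions for illustration, not real server output).
SAMPLE_WFS_200 = (
    b'<wfs:FeatureCollection xmlns:wfs="http://www.opengis.net/wfs/2.0" '
    b'numberMatched="1234" numberReturned="0"/>'
)
SAMPLE_WFS_110 = (
    b'<wfs:FeatureCollection xmlns:wfs="http://www.opengis.net/wfs" '
    b'numberOfFeatures="56"/>'
)
SAMPLE_UNKNOWN = (
    b'<wfs:FeatureCollection xmlns:wfs="http://www.opengis.net/wfs/2.0" '
    b'numberMatched="unknown" numberReturned="0"/>'
)


def parse_hits_count(xml_bytes: bytes) -> Optional[int]:
    """Return the feature count from a resultType=hits response, or None.

    Hypothetical helper: checks the WFS 2.0.0 attribute first, then the
    WFS 1.1.0 attribute, and ignores non-numeric values such as "unknown".
    """
    root = ET.fromstring(xml_bytes)
    for attr in ("numberMatched", "numberOfFeatures"):
        value = root.attrib.get(attr)
        if value is not None and value.isdigit():
            return int(value)
    return None


for sample in (SAMPLE_WFS_200, SAMPLE_WFS_110, SAMPLE_UNKNOWN):
    print(parse_hits_count(sample))
```

In the function above, an `"unknown"` value raises a `ValueError` inside `int()`, which the broad `except` turns into `None` and which then triggers the 1.1.0 fallback, so the behaviour is comparable; the sketch only makes that path explicit.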


**Executing functions**

In [14]:
df_wfs

Unnamed: 0,identifier,resource_type,md_standard,ogc_web_services,md_date,language,crs_epsg_codes,title,keywords,abstract,...,bounding_box,license,access_rights,wcs_getcapabilities_url,wfs_getcapabilities_url,csw_metadata_name,csw_metadata_name_clean,feature_id,clean_feature_ids,layer
2,fdbb1ab4-57cb-4393-bea7-5cbf260c24d6,service,ISO 19115,WFS,2024-08-08,dut,,CBS Gebiedsindelingen 2010 WFS,Statistische eenheden; Gebiedsindeling; Gemeen...,Deze service bevat de CBS Gebiedsindelingen va...,...,"('1.086', '50.5622', '8.4677', '55.8094')","Naamsvermelding verplicht, organisatienaam (ht...",otherRestrictions,,https://service.pdok.nl/cbs/gebiedsindelingen/...,gebiedsindelingen:arrondissementsgebied_gegene...,arrondissementsgebied_gegeneraliseerd,gebiedsindelingen:arrondissementsgebied_gegene...,arrondissementsgebied_gegeneraliseerd,gebiedsindelingen:arrondissementsgebied_gegene...
3,c922dbee-ed75-47d2-93d1-92ad8eeba29c,dataset,ISO 19115,WMS; WFS,2021-08-23,dut,28992,Houtsingelgebied Westerwolde,LANDSCHAPSBEHEER; LANDSCHAPSBESCHERMING; CULTU...,Dit bestand bevat de houtsingelgebieden die zi...,...,"('6.988', '52.852', '7.164', '53.106')",geen beperkingen,otherRestrictions,,https://geoservices.provinciegroningen.nl/serv...,Landschap:HoutsingelgebiedWesterwolde,HoutsingelgebiedWesterwolde,Landschap:HoutsingelgebiedWesterwolde,HoutsingelgebiedWesterwolde,Landschap:HoutsingelgebiedWesterwolde
6,6066d10b-d573-4aee-9ebc-5ace76b31a6f,dataset,ISO 19115,WMS; WFS,2023-12-28,dut,,Testveld onderzoeksturbines (Omgevingsverorden...,WINDENERGIE; PROVINCIALE VERORDENINGEN; RUIMTE...,Dit bestand bevat het testveld onderzoeksturbi...,...,"('6.70685', '53.455908', '6.747081', '53.466439')",Geen beperkingen (https://creativecommons.org/...,otherRestrictions,,https://geoservices.provinciegroningen.nl/serv...,Beleidsplannen_Omgevingsverordening:TestveldOn...,TestveldOnderzoeksturbines,,,Beleidsplannen_Omgevingsverordening:TestveldOn...
8,caf3d5a5-e7e0-44bc-a4e8-d954884e8696,dataset,ISO 19115,WMS; WFS,2019-01-29,dut,28992,Grootschalig open landschap (Omgevingsvisie 20...,STRUCTUURPLANNEN; LANDSCHAPSBESCHERMING; NATUU...,Dit bestand bevat gebieden behorend tot het gr...,...,"('6.276', '53.014', '7.232', '53.471')",geen beperkingen,otherRestrictions,,https://geoservices.provinciegroningen.nl/serv...,Beleidsplannen_Omgevingsvisie:GrootschaligOpen...,GrootschaligOpenLandschap,Beleidsplannen_Omgevingsvisie:GrootschaligOpen...,GrootschaligOpenLandschap,Beleidsplannen_Omgevingsvisie:GrootschaligOpen...
9,8665fd45-d7d3-4341-a177-1788be9ec571,service,ISO 19115,WFS,2025-01-21,dut,4258,Hydrografie: Netwerk (INSPIRE geharmoniseerd) WFS,Hydrografie; Nationaal; Netwerkschematisatie; ...,Naar INSPIRE thema Hydrography - Network gehar...,...,"('-5.9061', '49.1673', '13.6282', '57.0701')",Geen beperkingen (https://creativecommons.org/...,otherRestrictions,,https://service.pdok.nl/rws/hydrografie/netwer...,hydrografie:hydro_node,hydro_node,hydrografie:hydro_node,hydro_node,hydrografie:hydro_node
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9344,8623bf6e-4e83-485a-9945-c7bd974f95b4,dataset,ISO 19115,WMS; WFS,2019-09-06,dut,28992,Deelgebieden Provincie Fryslân,Bodem; Cultuurhistorie; Landschap; Streekplan,Ruimtelijke begrenzing van homogene gebieden m...,...,"('4.846', '52.761', '6.441', '53.52')",geen beperkingen,otherRestrictions,,https://geoportaal.fryslan.nl/arcgis/services/...,Landschapstypen_-_deelgebieden,Landschapstypen_-_deelgebieden,PGR:Landschapstypen_-_deelgebieden,Landschapstypen_-_deelgebieden,PGR:Landschapstypen_-_deelgebieden
9345,{2FA2ACE7-F99F-44E4-9B59-234A50A5235C},dataset,ISO 19115,WMS; WFS,2024-12-10,dut,28992; 5709,Mijnenvelden,munitie,Kaart met locaties waar op basis van historisc...,...,"('3.427', '51.44', '3.807', '51.744')",Geen beperkingen (http://creativecommons.org/p...,otherRestrictions,,https://opengeodata.zeeland.nl/geoserver/bodem...,geonam_xplwo2mvlvlk,geonam_xplwo2mvlvlk,bodem:geonam_xplwo2mvlvlk,geonam_xplwo2mvlvlk,bodem:geonam_xplwo2mvlvlk
9348,94ba9207-c73d-4e3e-ba26-4262a9ef2e42,dataset,ISO 19115,WFS; WMS,2024-09-24,dut,,Sloepenroutenetwerk - Routes,VAARRECREATIE; Informatief,Routes van het sloepenroutenetwerk in west Utr...,...,"('4.671', '52.003', '5.168', '52.345')",Open data (publiek)|https://creativecommons.or...,otherRestrictions,,https://services.geodata-utrecht.nl/geoserver/...,Sloepenroutenetwerk_Routes,Sloepenroutenetwerk_Routes,s01_4_toerisme_recreatie:Sloepenroutenetwerk_R...,Sloepenroutenetwerk_Routes,s01_4_toerisme_recreatie:Sloepenroutenetwerk_R...
9351,eac77cd5-31b5-4160-b3cc-b7d677c3d3b3,dataset,ISO 19115,WMS; WFS,2025-04-29,dut,28992,Bodemkaart 2021,Bodemkaart; Bodem; Kaart; veen; bodemkaarten; ...,De bodemkundige informatie op de bodemkaart he...,...,"('3.729447', '51.627352', '5.039446', '52.3463...",Geen beperkingen (http://creativecommons.org/p...,otherRestrictions,,https://geodata.zuid-holland.nl/geoserver/bode...,BODEMKAART_2021,BODEMKAART_2021,bodem:BODEMKAART_2021,BODEMKAART_2021,bodem:BODEMKAART_2021


**Removing feature duplicates**

In [33]:
df_wfs = df_wfs.drop_duplicates(subset="layer", keep="first")
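
One point to keep in mind with `drop_duplicates(subset="layer", keep="first")`: pandas treats missing values as equal to each other, so if several rows had no `layer` value, only the first of them would survive. The sketch below illustrates this on a made-up toy table (the identifiers and layer names are hypothetical) and shows one possible variant that keeps all rows with a missing layer. Whether `df_wfs` actually contains missing `layer` values at this point is not something the notebook states, so this is a precaution rather than a correction.

```python
import pandas as pd

# Hypothetical toy table; identifiers and layer names are invented for illustration.
toy = pd.DataFrame({
    "identifier": ["a", "b", "c", "d", "e"],
    "layer": ["ns:roads", "ns:roads", "ns:soil", None, None],
})

# Same call as in the notebook: rows with a missing layer collapse to one row.
deduped = toy.drop_duplicates(subset="layer", keep="first")
print(deduped["identifier"].tolist())

# Variant: deduplicate only rows that have a layer, keep all rows without one.
has_layer = toy["layer"].notna()
safe = pd.concat([
    toy[has_layer].drop_duplicates(subset="layer", keep="first"),
    toy[~has_layer],
]).sort_index()
print(safe["identifier"].tolist())
```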

**Checkpoint 02b:**

In [32]:
# Saving
df_wfs.to_csv("checkpoint02_ngr_WFS_metadata.csv", index=False)

---