## Access to the World Database on Protected Areas (WDPA) historical data and harmonization

This block is dedicated to refining initial land-use/land-cover (LULC) data with additional data on protected areas (PA) from [the World Database on Protected Areas (WDPA)](https://www.protectedplanet.net/en/thematic-areas/wdpa).
As soon as protected areas may significantly increase the suitability of landscapes and reduce landscape "impedance" for species migration, landscapes intersected with PAs should be considered as different from those with no protected status. This workflow describes the process of updating LULC data needed to compute functional landscape connectivity. It provides two main outputs:
- LULC data enriched with protected areas (recorded as updated LULC value) for wide usage.
- For habitat connectivity calculations, impedance and affinity values for calculations in specific  software (Miramon and Graphab).

Current limitations:
- WDPA API is accessed through personal credentials, while granting access to the API is not automatic and reviewed by the Protected Planet team.
- WDPA API does not support getting data by bounding box, only by unique IDs of protected areas and countries.
- Temporary server outage has been experienced with WDPA API (returning 'status code 500').
- If a protected area is deestablished ('degazetted'), it is removed from the database and its ID cannot be reused (for further details, see the [manual on WDPA API](https://wdpa.s3-eu-west-1.amazonaws.com/WDPA_Manual/English/WDPA_WDOECM_Manual_1_6.pdf)). If it is the case, all historical transformations of these protected areas will be not accessible to request.
- [Overpass API](https://wiki.openstreetmap.org/wiki/Overpass_API) is used as an ancillary tool to perform reverse geocoding and find countries intersecting with the input raster dataset to query for data through WDPA API. At the same time, boundaries of countries include the exclusive economic zones in seas and can cover not only terrestrial protected areas.
- [Overpass API](https://wiki.openstreetmap.org/wiki/Overpass_API) does not fetch countries if bounding box of input raster dataset is within the spatial feature (country), but does not intersect with her borderline.

#### 1. Extracting data through WDPA API

Spatial data on protected areas in GeoJSON and GeoPackage formats for countries needed (on our case, Spain, France and Andorra) are obtained through WDPA API using a personal access token and [official docimentation](https://api.protectedplanet.net/documentation). Most meaningful attributes have been chosen (IDs, designation status, IUCN category, year of establishment etc.)

Let's import libraries neeeded:

In [None]:
import requests
from shapely.geometry import shape
import json
import subprocess
import os
import sys
from datetime import datetime
from itertools import product

#local import
from utils import load_yaml

if os.getcwd().endswith("1_protected_areas") == False:
    # NOTE working from docker container
    os.chdir('./1_protected_areas')

# define own modules from the root directory (at level above)
# define current directory
current_dir = os.getcwd()
# define parent directory (level above)
parent_dir = os.path.abspath(os.path.join(current_dir, '..'))
# add the parent directory to sys.path
sys.path.append(parent_dir)

import timing
import warnings

Input variables are stored in the configuration file (eg input raster dataset, timestamp). Let's read them:

In [2]:
from reprojection import RasterTransform

##### 1.1. Reverse geocoding
To run WDPA API it is requred to list countries for query on protected areas. Currently this is implemented through ohsome API fetching codes of countries (according to ISO3 standard).
Other ways attempted:
- [Nominatim API](https://nominatim.org/release-docs/latest/api/Overview/) is unstable when quering with multiple filters to fetch the borderlines from the Open Street Map portal (does not bring features needed).
- [Overpass API](https://wiki.openstreetmap.org/wiki/Overpass_API) fetches features only if they intersect with the bounding box, but does not supply with countries if the bounding box is located within one country and does not intersect its boundaries.
- [geopandas built-in dataset from the Natural Earth](https://www.naturalearthdata.com/downloads/50m-cultural-vectors/50m-admin-0-countries-2/), but the dataset with the boundaries of countries is not curently available there.

In [None]:
class WDPA_PreProcessor():

    def __init__(self, config_path:str) -> None:
        self.config = load_yaml(config_path)

        # read year 
        self.years = self.config.get('year', None)
        if self.years is None:
            warnings.warn("Year variable is null or not found in the configuration file.")
            self.years = []
        elif isinstance(self.years, int):
            self.years = [self.years]
        else:
            # cast to list
            self.years = [int(year) for year in self.years]

        #read lulc
        self.lulc_templates = self.config.get('lulc', None)
        if self.lulc_templates is None:
            raise ValueError("LULC variable is null or not found in the configuration file.")
        elif isinstance(self.lulc_templates, str):
            self.lulc_templates = [self.lulc_templates]
        else:
            # cast to list
            self.lulc_templates = [lulc for lulc in self.lulc_templates]

        # read lulc_dir
        self.lulc_dir = self.config.get('lulc_dir', None)
        if self.lulc_dir is None:
            raise ValueError("LULC directory is null or not found in the configuration file.")
        
        # get all existing files
        self.lulc_s = self.get_all_existing_files(self.lulc_templates, self.years)

    def get_all_existing_files(self, lulc_templates: list, years: list) -> list[str]:
        """
        Get all existing files based on the list of years and the LULC templates

        Args:
            lulc_templates (list): list of LULC templates (e.g. ['lulc_{year}.tif', 'lulc_{year}_v2.tif'])
            years (list): list of years (e.g. [2015, 2016, 2017])
        Returns:
            list: list of existing files to process (e.g. ['lulc_2015.tif', 'lulc_2016.tif'])
        """

        # generate all possible filenames based on the list of years
        lulc_s = []
        # use itertools,product to create combination of lulc filename and year
        for lulc_template, year in product(lulc_templates, years): 
            try:
                # Substitute year in the template
                lulc_file = lulc_template.format(year=year)
                # Construct the full path to the input raster dataset
                lulc_path = os.path.join(current_dir, '..', self.lulc_dir, lulc_file)
                # Normalize the path to ensure it is correctly formatted
                lulc_path = os.path.normpath(lulc_path)
                lulc_s.append(lulc_path)
            except KeyError as e:
                raise ValueError(f"Placeholder {e.args[0]} not found in 'lulc_template'") from e
            
        # Check if files exist and collect existing files
        existing_lulc_s = []
        for lulc_templates in lulc_s:
            if os.path.exists(lulc_templates):
                print(f"Input raster to be used for processing is {lulc_templates}")
                existing_lulc_s.append(lulc_templates)
            else:
                print(f"File does not exist: {lulc_templates}")

        # list all existing filenames to process
        print("\nList of available input raster datasets to process:")
        for lulc_templates in existing_lulc_s:
            print(f"Processing file: {lulc_templates}")

        # update lulc_s with files that exist
        return existing_lulc_s
        
      #NOTE Ohsome API is using openstreetmap data, which may not be the best source to fetch country codes from bounding box with. The GAUL dataset provided by FAO (UN) is a better source for this.
    def get_country_code_from_bbox(self, bbox:str, save_geojson:bool=True) -> set:
        """
        This function sends a request to the ohsome API to get the country code from a given bounding box

        Args:
            bbox (str): bounding box in the format 'x_min,y_min,x_max,y_max'

        Returns:
            set: set of unique country codes
        """
        url = 'https://api.ohsome.org/v1/elements/geometry'
        data = {"bboxes": {bbox}, "filter": "boundary=administrative and admin_level=2", "properties": 'tags'}
        response = requests.post(url, data=data)

        # check if the request was successful
        if response.status_code == 200:
            response_json = response.json()
            print("Request was successful")
            # extract unique country names, filtering out None values
            # create set to handle only unique names
            unique_country_names = {
                feature['properties'].get('ISO3166-1:alpha3') 
                for feature in response_json.get('features', []) # filter out none values
                if feature['properties'].get('ISO3166-1:alpha3')
            }
    
            # print unique country names
            print(f"Countries covered by the bounding box are (ISO-3 codes): \n{'\n'.join(unique_country_names)}")
            print("-" * 40)

            # save JSON response to GeoJSON
            if save_geojson:
                with open('countries.geojson', 'w') as f:
                    json.dump(response_json, f, indent=4)
        else:
            print(f"Error: {response.status_code}")
            print("-" * 40)

        return unique_country_names
        
    def fetch_lulc_country_codes(self, save_geojson:bool=True) -> dict[set]:
        """
        Fetch the country codes for the LULC rasters

        Args:
            save_geojson (bool): save the geojson file

        Returns:
            dict: dictionary containing the country codes for each LULC raster
        """
        lulc_country_codes = {}
        for lulc in self.lulc_s:
            x_min, y_min, x_max, y_max = RasterTransform(lulc).bbox_to_WGS84()
            bbox = f"{x_min},{y_min},{x_max},{y_max}"
            lulc_country_codes[lulc] = self.get_country_code_from_bbox(bbox, save_geojson)
        return lulc_country_codes

In [4]:
wdpa_preprocessor = WDPA_PreProcessor(os.path.join(parent_dir, 'config.yaml'))
config = wdpa_preprocessor.config
lulc_country_codes = wdpa_preprocessor.fetch_lulc_country_codes()
# lulc_country_codes["test"] = {"KEN", "UG"}
#get all the values of the dictionary as a set of unique country codes
unique_country_names = set().union(*lulc_country_codes.values())

Input raster to be used for processing is /data/data/input/lulc/lulc_ukceh_25m_2018.tif

List of available input raster datasets to process:
Processing file: /data/data/input/lulc/lulc_ukceh_25m_2018.tif
Input raster dataset /data/data/input/lulc/lulc_ukceh_25m_2018.tif was opened successfully.
Coordinate reference system of the input raster dataset is EPSG:27700




Spatial resolution (pixel size) is 25.0 meters
Before reprojection:
x_min: 347225.0
x_max: 452300.0
y_min: 343800.0
y_max: 540325.0
After reprojection:
x_min: -2.7876218653524014
x_max: -1.1888887126830572
y_min: 52.98892120067396
y_max: 54.75515692785134
Bounding box: -2.7876218653524014,52.98892120067396,-1.1888887126830572,54.75515692785134
Request was successful
Countries covered by the bounding box are (ISO-3 codes): 
GBR
----------------------------------------


##### 1.2. Looping over countries

In [8]:
class PA_Processor:
    """
    This protected area (PA) processor class is used to convert the json responses from the protected planet API to a single GeoJSON file per country.
    """
    def __init__(self, country:str) -> None:
        """
        Initialize the PA_Processor class

        Args:
            country (str): The country name.
        """
        self.country = country
        self.feature_collection = {
            "type": "FeatureCollection",
            "features": []
        }

    def add_PA_to_feature_collection(self, protected_areas:list[dict], exclude_redundant_ids:bool=True) -> dict:
        """
        Adds protected areas from the API response to the feature collection of the class.

        Args:
            protected_areas (list): A list of protected areas dictionaries.

        Returns:
            feature_collection: The feature collection with protected areas.
        """
        # loop over protected areas        
        for pa in protected_areas:

            # convert date string to datetime object
            date_str = pa['legal_status_updated_at']

            # filter out protected areas if no date of establishment year is recorded
            if date_str is None:
                continue
            # format to YYYY-MM-DD
            else:
                try:
                    date = datetime.strptime(date_str, '%Y-%m-%d')
                except ValueError:
                    # handle cases where the date is in a different format
                    try:
                        date = datetime.strptime(date_str, '%d/%m/%Y')
                    except ValueError:
                        # handle cases where the date is in a different format
                        date = datetime.strptime(date_str, '%m/%d/%Y')
                    
                # format to YYYY-MM-DD
                date_str = date.strftime('%Y-%m-%d')
              
            # extract geometry
            geometry = pa['geojson']['geometry']
            pa.get('geojson', {}).get('geometry')

            # debugging, print the geometry data
            if geometry is None:
                print(f"Warning: No geometry found for protected area {pa.get('name')} with ID {pa.get('id')}")
            else:
                print(f"Geometry found for protected area {pa.get('name')} with ID {pa.get('id')}")    

            if exclude_redundant_ids:
                pa['designation'].pop('id', None)
                pa['designation']['jurisdiction'] = pa['designation']['jurisdiction']["name"]
                pa['iucn_category'] = pa['iucn_category']['name']
                pa['legal_status'] = pa['legal_status']['name']
               

            # create feature with geometry and properties
            feature = {
                "type": "Feature",
                "geometry": geometry,
                "properties": {
                    "id": pa['id'],
                    "name": pa['name'],
                    "original_name": pa['name'],
                    "wdpa_id": pa['id'],
                    "management_plan": pa['management_plan'],
                    "is_green_list": pa['is_green_list'],
                    "iucn_category": pa['iucn_category'],
                    "designation": pa['designation'],
                    "legal_status": pa['legal_status'],
                    "year": date_str,
                }
            }
            # append the feature to the feature collection
            self.feature_collection["features"].append(feature) 

        return self.feature_collection

    def save_to_file(self, file_path:str) -> str:
        """
        Saves a country feature collection to a single GeoJSON file.

        Args:
            file_path (str): The path to the file.

        Returns:
            geojson_filepath (str): The path to the saved GeoJSON file.
        """
        # define filename for GeoJSON file
        geojson_filepath = os.path.join(file_path, f"{self.country}_protected_areas.geojson")
        # convert GeoJSON data to a string
        geojson_string = json.dumps(self.feature_collection, indent=4) 
        # write GeoJSON string to a file
        with open(geojson_filepath, 'w') as f:
            f.write(geojson_string)
        
        return geojson_filepath
        
        

In [9]:
class PA_Processor_Wrapper:
    """
    This class retrieves and processes protected areas for multiple countries and utilizes the PA processor class to merge them into individual GeoJSON files for each country.
    """

    def __init__(self, countries:list[str], api_url:str, token:str, marine:str, output_dir:str) -> None:
        """
        Initialize the PA_Processor_Wrapper class.

        Args:
            countries (list): A list of country codes.
            api_url (str): The API endpoint URL.
            token (str): The API token.
            marine (str): The marine area boolean value.
            output_dir (str): The path to the directory where the GeoJSON files will be saved.
        """
        self.api_url = api_url
        self.token = token
        self.marine = marine
        self.countries = countries
        self.output_dir = output_dir
        self.processors = {country: PA_Processor(country) for country in countries}

    def process_all_countries(self) -> None:
        """
        Fetches all PAs for each country and processes them into a single GeoJSON file.
        """

        for country in self.countries:
            all_protected_area_geojson = []
            page = 0
            url = self.api_url.format(country=country, token=self.token, marine=self.marine)
            while True:
                url += f"&page={page}"
                response = requests.get(url)
                if response.status_code != 200:
                    print(f"Error: {response.status_code}")
                    break
                data = response.json()
                protected_areas = data["protected_areas"]
                if len(protected_areas) == 0:
                    break
                else:
                    all_protected_area_geojson.append(data)
                    page += 1

            # combine all the protected areas into a single feature collection / GeoJSON
            for data in all_protected_area_geojson:
                self.processors[country].add_PA_to_feature_collection(data["protected_areas"]) 

    def save_all_country_geoJSON(self) -> list[str]:
        """
        Saves all country GeoJSON files to the export directory.

        Returns:
            geojson_filepaths (list): A list of file paths to the saved GeoJSON files.
        """
        
        geojson_filepaths = []
        for country in self.countries:
            geojson_filepaths.append(self.processors[country].save_to_file(self.output_dir))
        return geojson_filepaths
    

    def export_all_to_geopackage(self, geojson_filepaths:list[str], output_file:str = "merged_protected_areas.gpkg") -> str:
        """
        Merges all GeoJSON files into a single GeoPackage file with different layers for each country.

        Args:
            geojson_filepaths (list): A list of GeoJSON file paths.
            output_file (str): The name of the output GeoPackage file.
        
        Returns:
            str: The path to the merged GeoPackage file.
        """
        # define the output merged GeoPackage file
        gpkg = os.path.join(self.output_dir, output_file)
        # remove GeoPackage if it already exists
        if os.path.exists(gpkg):
            os.remove(gpkg)

       # loop through the GeoJSON files and convert them to a geopackage
        for geojson_file in geojson_filepaths:
            # writes layer name as the first name from geojson files
            layer_name = os.path.splitext(os.path.basename(geojson_file))[0]
            # use ogr2ogr to convert GeoJSON to GeoPackage
            subprocess.run([
                "ogr2ogr", "-f", "GPKG", "-append", "-nln", layer_name, gpkg, geojson_file
            ]) 

        return gpkg

In [10]:
# NOTE FOR TESTING ONLY (delete comments in the final version) 
# countries = {'AND'}
# api_url = "https://api.protectedplanet.net/v3/protected_areas/search?token={token}&country={country}&marine={marine}&with_geometry=true&per_page=50"

# getting variables from the configuration file
marine = config.get('marine') # fetch boolean value (false or true)

# define the API endpoint - include filter by country, avoid marine areas, maximum values of protected areas per page (50)
api_url = "https://api.protectedplanet.net/v3/protected_areas/search?token={token}&country={country}&marine={marine}&with_geometry=true&per_page=50"
# define token - replace by own
token = "968cef6f0c37b925225fb60ac8deaca6" 
# define country codes from the previous block
countries = unique_country_names

# directory to save GeoJSON files
response_dir = "response"
os.makedirs(response_dir, exist_ok=True)
# list to store the names of the GeoJSON files
geojson_filepaths = []
# TODO - country codes should derive from the extent of buffered LULC data - see section 2. It would be better to unify it, to create a separate function and apply it for all Notebooks

Pa_processor = PA_Processor_Wrapper(countries, api_url, token, marine, response_dir)
Pa_processor.process_all_countries()
geojson_filepaths = Pa_processor.save_all_country_geoJSON()
print(geojson_filepaths)

# 1.3 exporting to geoPackage
output_file = "merged_protected_areas.gpkg"
gpkg = Pa_processor.export_all_to_geopackage(geojson_filepaths, output_file)
print(f"GeoPackage file created: {gpkg}")

Geometry found for protected area Lake District with ID 959
Geometry found for protected area Eryri with ID 960
Geometry found for protected area Yorkshire Dales with ID 961
Geometry found for protected area North York Moors with ID 962
Geometry found for protected area Peak District with ID 963
Geometry found for protected area Bannau Brycheiniog with ID 964
Geometry found for protected area Northumberland with ID 965
Geometry found for protected area Dartmoor with ID 966
Geometry found for protected area Exmoor with ID 967
Geometry found for protected area Arfordir Penfro with ID 968
Geometry found for protected area Cairngorms with ID 1448
Geometry found for protected area Rum with ID 1450
Geometry found for protected area Beinn Eighe with ID 1453
Geometry found for protected area Moor House-Upper Teesdale with ID 1454
Geometry found for protected area Ben Lawers with ID 1460
Geometry found for protected area Loch Leven with ID 1469
Geometry found for protected area Glen Roy with ID



GeoPackage file created: response/merged_protected_areas.gpkg


#### 2. Processing of protected areas

Data downloaded from WDPA as geopackage are processed in 4 steps:
1. Extract extent and spatial resolution of LULC data.
Redefine no data values as 0 for input LULC data.
2. Extract protected areas filtered by LULC timestamp and year of PAs establishment.
3. Rasterize protected areas (there is no way to read geodataframes by gdal_rasterize except from writing files on the disc) based on step 1.
4. Compress protected areas.

In [24]:
import geopandas as gpd
import rasterio
import os
import subprocess
import numpy as np

class Rasterizer_Processor:

    def __init__(self, gpkg_filepath:str, input_dir:str,output_dir:str) -> None:
        self.gdf = gpd.read_file(gpkg_filepath)
        self.input_folder = input_dir
        self.output_dir = output_dir
        # create output directory if it does not exist
        os.makedirs(output_dir, exist_ok=True)

        tiff_files = [f for f in os.listdir(input_dir) if f.endswith('.tif')]

        # choose the first TIFF file (it shouldn't matter which LULC file to extract extent because they must have the same extent)
        if tiff_files:
            file_path = os.path.join(input_dir, tiff_files[0])  
            extent, self.res = self.extract_ext_res(file_path)
            self.min_x, self.max_x, self.min_y, self.max_y = extent.left, extent.right, extent.bottom, extent.top
            print("Extent of LULC files")
            print("Minimum X Coordinate:", self.min_x, 
                "\n Maximum X Coordinate:", self.max_x, 
                "\n Minimum Y Coordinate:", self.min_y, 
                "\n Maximum Y Coordinate:", self.max_y)
            print("Spatial resolution (pixel size):", self.res)
        else:
            raise ValueError("No LULC files found in the input folder.")

        # extract the year from the filename
        self.year_stamps = [int(f.split('_')[1].split('.')[0]) for f in tiff_files]
        print("Considered timestamps of LULC data are:","".join(str(self.year_stamps)))

            
    # define function
    def extract_ext_res(self, file_path:str) -> tuple[any,float]:
        """
        Extracts the extent and resolution of a raster file.

        Args:
            file_path (str): The path to the raster file.

        Returns:
            tuple: The extent and resolution of the raster file.
        """
        with rasterio.open(file_path) as src:
            extent = src.bounds
            res = src.transform[0]  # assuming the res is the same for longitude and latitude
        return extent, res
    

    def filter_pa_by_year(self) -> None:
        # create an empty dictionary to store subsets
        subsets_dict = {}
        # loop through each year_stamp and create subsets
        for year_stamp in self.year_stamps:
            # filter Geodataframe based on the year_stamp
            subset = self.gdf[self.gdf['year'] <= np.datetime64(str(year_stamp))]

            # store subset in the dictionary with year_stamp as key
            subsets_dict[year_stamp] = subset

            # print key-value pairs of subsets 
            print(f"Protected areas are filtered according to year stamps of LULC and PAs' establishment year: {year_stamp}")

            # ADDITIONAL BLOCK IF EXPORT TO GEOPACKAGE IS NEEDED (currently needed as rasterizing vector data is not possible with geodataframes)
            ## save filtered subset to a new GeoPackage
            subset.to_file(os.path.join(self.output_dir,f"pas_{year_stamp}.gpkg"), driver='GPKG')
            print(f"Filtered protected areas are written to:",os.path.join(self.output_dir,f"pas_{year_stamp}.gpkg"))

        print ("---------------------------")

    def rasterize_pas_by_year(self, keep_intermediate_gpkg:bool=False) -> None:
        # list all subsets of protected areas by the year of establishment
        pas_yearstamps = [f for f in os.listdir(self.output_dir) if f.endswith('.gpkg')]
        pas_yearstamp_rasters = [f.replace('.gpkg', '.tif') for f in pas_yearstamps]

        # loop through each input file
        for pas_yearstamp, pas_yearstamp_raster in zip(pas_yearstamps, pas_yearstamp_rasters):
            pas_yearstamp_path = os.path.join(self.output_dir, pas_yearstamp)
            pas_yearstamp_raster_path = os.path.join(self.output_dir, pas_yearstamp_raster)
            # TODO - to make paths more clear and straightforward
            print(f"Rasterizing protected areas for {pas_yearstamp}")
            # rasterize
            pas_rasterize = [
                "gdal_rasterize",
                ##"-l", "pas__merged", if you need to specify the layer
                "-burn", "100", ## assign code starting from "100" to all LULC types
                "-init", "0",
                "-tr", str(self.res), str(self.res), #spatial res from LULC data
                "-a_nodata", "-2147483647", # !DO NOT ASSIGN 0 values with non-data values as it will mask them out in raster calculator
                "-te", str(self.min_x), str(self.min_y), str(self.max_x), str(self.max_y), # minimum x, minimum y, maximum x, maximum y coordinates of LULC raster
                "-ot", "Int32",
                "-of", "GTiff",
                "-co", "COMPRESS=LZW",
                pas_yearstamp_path,
                pas_yearstamp_raster_path
                ]

            # execute rasterize command
            try:
                subprocess.run(pas_rasterize, check=True)
                print("Rasterizing of protected areas has been successfully completed for", pas_yearstamp)
            except subprocess.CalledProcessError as e:
                print(f"Error rasterizing protected areas: {e}")
            finally:
                if not keep_intermediate_gpkg:
                    os.remove(pas_yearstamp_path)
                    print(f"Intermediate GeoPackage {pas_yearstamp} has been removed.")

It is important to extract year stamps.

Then, extent of LULC files (minimum and maximum coordinates) is extracted.

Protected areas should be filtered by year stamp according to the PA's establishment year.

Rasterization function based on yearstamps of protected areas is launched.

In [25]:
#TODO remove this for testing
response_dir = "response"
os.makedirs(response_dir, exist_ok=True)
gpkg = os.path.join(response_dir, "merged_protected_areas.gpkg")


rp = Rasterizer_Processor(gpkg, os.path.join(current_dir,"lulc"),os.path.join(current_dir,"pas_timeseries"))
rp.filter_pa_by_year()
rp.rasterize_pas_by_year()
print("Rasterizing of protected areas has been successfully completed for all years")

Extent of LULC files
Minimum X Coordinate: 230205.0 
 Maximum X Coordinate: 556485.0 
 Minimum Y Coordinate: 4459725.0 
 Maximum Y Coordinate: 4777335.0
Spatial resolution (pixel size): 30.0
Considered timestamps of LULC data are: [1987, 1992, 1997, 2002, 2007, 2012, 2017, 2022]
Protected areas are filtered according to year stamps of LULC and PAs' establishment year: 1987
Filtered protected areas are written to: /data/1_protected_areas/pas_timeseries/pas_1987.gpkg
Protected areas are filtered according to year stamps of LULC and PAs' establishment year: 1992
Filtered protected areas are written to: /data/1_protected_areas/pas_timeseries/pas_1992.gpkg
Protected areas are filtered according to year stamps of LULC and PAs' establishment year: 1997
Filtered protected areas are written to: /data/1_protected_areas/pas_timeseries/pas_1997.gpkg
Protected areas are filtered according to year stamps of LULC and PAs' establishment year: 2002
Filtered protected areas are written to: /data/1_prote

##### 3. Raster calculation

LULC [enriched](/raster_sum_loop.sh) through the raster calculator (currently, external shell script):
1. Rearranging no data values as they must be considered as 0 to run raster calcualtions.
2. To sum initial LULC raster and protected areas (according to the timestamp).
3. Writing the new updated LULC map with the doubled amount of LULC codes for each timestamp (loop based on year matching in filenames).
4. Compression and assignment of null values.

In [4]:
# call raster_sum_loop.sh using wrapped subprocess.run
import os
import sys
import subprocess
from subprocess import PIPE, Popen


def run_shell_command(path_to_script:str) -> None:
    """
    Run a shell script command using subprocess.run

    Args:
        (path_to_script(str): The path to the shell script.
    """
    # run the shell script
    command = f"bash {path_to_script}"

    proc = Popen(command, shell=True, stdout=PIPE, stderr=PIPE)
    stdout, stderr = proc.communicate()
    print("Shell script executing ...")

    if proc.returncode != 0:
        #check if the output has syntax error
        if b"syntax error" in stderr:
            print("Syntax error in the shell script. \n Attempting to convert the shell script to Unix format.")
            # convert the shell script to unix format
            subprocess.run(f"dos2unix {path_to_script}", shell=True, text=True)
            # run the command again
            run_shell_command(path_to_script)
        else:
            # output the error message
            print(stderr.decode('utf-8'))
            raise subprocess.CalledProcessError(proc.returncode, command, output=stdout, stderr=stderr)
        
    else:
        print(stdout.decode('utf-8'))



if os.getcwd().endswith("1_protected_areas") == False:
    # NOTE working from docker container
    os.chdir('./1_protected_areas')
    
# define own modules from the root directory (at level above)
# define current directory
current_dir = os.getcwd()
print(current_dir)
# define parent directory (level above)
parent_dir = os.path.abspath(os.path.join(current_dir, '..'))
# add the parent directory to sys.path
sys.path.append(parent_dir)


# call the shell script
run_shell_command('raster_sum_loop.sh')


/data/1_protected_areas
Shell script executing ...
Input filename: lulc/lulc_1987.tif
Output filename: lulc_temp/lulc_1987_0.tif
Input file size is 10876, 10587
0...10...20...30...40...50...60...70...80...90...100 - done.
Input filename: lulc/lulc_1992.tif
Output filename: lulc_temp/lulc_1992_0.tif
Input file size is 10876, 10587
0...10...20...30...40...50...60...70...80...90...100 - done.
Input filename: lulc/lulc_1997.tif
Output filename: lulc_temp/lulc_1997_0.tif
Input file size is 10876, 10587
0...10...20...30...40...50...60...70...80...90...100 - done.
Input filename: lulc/lulc_2002.tif
Output filename: lulc_temp/lulc_2002_0.tif
Input file size is 10876, 10587
0...10...20...30...40...50...60...70...80...90...100 - done.
Input filename: lulc/lulc_2007.tif
Output filename: lulc_temp/lulc_2007_0.tif
Input file size is 10876, 10587
0...10...20...30...40...50...60...70...80...90...100 - done.
Input filename: lulc/lulc_2012.tif
Output filename: lulc_temp/lulc_2012_0.tif
Input file size 

##### 4. Updating landscape impedance
Impedance is reclassified by [CSV table](/reclassification.csv) and compressed (through LZW compression, not Cloud Optimised Geotiff standard to avoid any further issues in processing). Landscape impedance is required by Miramon ICT and Graphab tools both.

Let's import another set of libraries needed.

In [1]:
from osgeo import gdal
gdal.UseExceptions()
import numpy as np
import csv
import os
import subprocess
import pandas as pd

class Update_land_impedance():

    def __init__(self, input_folder, output_folder, reclass_table) -> None:
        self.input_folder = input_folder
        self.output_folder = output_folder
        self.reclass_table = reclass_table

        self.tiff_files = [f for f in os.listdir(input_folder) if f.endswith('.tif')]
        os.makedirs(output_folder, exist_ok=True)

        for tiff_file in self.tiff_files:
            input_raster_path = os.path.join(input_folder, tiff_file)
            print (tiff_file)
            # modify the output raster filename to ensure it's different from the input raster filename
            output_filename = "impedance_" + tiff_file
            output_raster_path = os.path.join(output_folder, output_filename)

            # call function and capture data_type for compression - Float32 or Int32
            data_type = self.reclassify_raster(input_raster_path, output_raster_path, reclass_table)
            print ("Data type used to reclassify LULC as impedance is",data_type)

            # compression using 9999 as nodata
            compressed_raster_path = os.path.splitext(output_raster_path)[0] + '_compr.tif'
            print("path to compressed raster is:", compressed_raster_path)
            subprocess.run(['gdal_translate', output_raster_path, compressed_raster_path,'-a_nodata', '9999', '-ot', data_type, '-co', 'COMPRESS=LZW'])

            # as soon as gdal_translate doesn't support rewriting, we should delete non-compressed GeoTIFFs...
            os.remove(output_raster_path)
            # ...and rename compressed file in the same way as the original GeoTIFF
            os.rename(compressed_raster_path, output_raster_path)

            print("Reclassification complete for:", input_raster_path + "\n------------------------------------")
        

    def lulc_impedance_mapper(self, reclass_table:str) -> dict:

        has_decimal = False
        # read into pandas dataframe and conver to numeric
        df = pd.read_csv(reclass_table, encoding='utf-8-sig')
        df = df.apply(pd.to_numeric, errors='coerce')
        # check if there are decimal values in the dataframe
        if df['impedance'].dtype == 'float64':
            has_decimal = True
            # convert lulc to float too
            df['lulc'] = df['lulc'].astype(float)

        # create a dictionary from the dataframe reclass_dict[lulc] = impedance
        reclass_dict = df.set_index('lulc')['impedance'].to_dict()
        
        if has_decimal:
            print("LULC impedance is characterized by decimal values.")
            # update reclassification dictionary to align nodata values with one positive value (Graphab requires positive value as no_data value)
            # assuming nodata value is 9999 (or 9999.00 if estimating decimal values)
            reclass_dict.update({-2147483647: 9999.00, -32768: 9999.00, 0: 9999.00}) # minimum value for int16, int32 and 0 are assigned with 9999.00 (nodata)
        else:
            print("LULC impedance is characterized by integer values only.")
            # update dictionary again
            reclass_dict.update({-2147483647: 9999, -32768: 9999, 0: 9999}) # minimum value for int16, int32 and 0 are assigned with 9999.00 (nodata)
        
        return reclass_dict , has_decimal , "Int64" if has_decimal == False else "Float64"


    def reclassify_raster(self, input_raster:str, output_raster:str, reclass_table:str) -> str:
        """
        Reclassifies a raster based on a reclassification table.

        Args:
            input_raster (str): The path to the input raster.
            output_raster (str): The path to the output raster.
            reclass_table (str): The path to the reclassification table.

        Returns:
            str: The data type of the output raster.
        """
        # read the reclassification table
        reclass_dict = {}
        # map lulc with impedance values from the reclassification table
        reclass_dict,has_decimal,data_type = self.lulc_impedance_mapper(reclass_table)
           
        print ("Mapping dictionary used to classify impedance is:", reclass_dict)

        # open input raster
        dataset = gdal.Open(input_raster)
        if dataset is None:
            print("Could not open input raster.")
            return

        # get raster info
        cols = dataset.RasterXSize
        rows = dataset.RasterYSize

        # initialize output raster
        driver = gdal.GetDriverByName("GTiff")
        if has_decimal:
            output_dataset = driver.Create(output_raster, cols, rows, 1, gdal.GDT_Float32)
        else:
            output_dataset = driver.Create(output_raster, cols, rows, 1, gdal.GDT_Int32)
        #TODO - to add condition on Int32 if integer values are revealed
        output_dataset.SetProjection(dataset.GetProjection())
        output_dataset.SetGeoTransform(dataset.GetGeoTransform())

        # reclassify each pixel value
        input_band = dataset.GetRasterBand(1)
        output_band = output_dataset.GetRasterBand(1)
        # read the entire raster as a NumPy array
        input_data = input_band.ReadAsArray()

        if input_data is None:
            print("Could not read input raster.")
            return
        elif reclass_dict is None:
            print("Reclassification dictionary is empty.")
            return
        # apply reclassification using dictionary mapping
        output_data = np.vectorize(reclass_dict.get)(input_data)
        output_band.WriteArray(output_data)

        '''FOR CHECKS
        print (f"input_data_shape is': {input_data.shape}")
        print (f"output_data_shape is': {output_data.shape}")
        '''
        
        # close datasets
        dataset = None
        output_dataset = None

        return (data_type)
    # TODO - define a multiplier (effect of protected areas), cast it to yaml function and apply to estimate impedance and affinity


In [2]:
if os.getcwd().endswith("1_protected_areas") == False:
    # NOTE working from docker container
    os.chdir('./1_protected_areas')

input_dir = r'lulc_pa'
output_folder = r'impedance_pa'
reclass_table = "reclassification.csv"
Update_land_impedance(input_dir, output_folder, reclass_table)

lulc_1987_pa.tif
LULC impedance is characterized by decimal values.
Mapping dictionary used to classify impedance is: {1.0: 4.0, 2.0: 1000.0, 3.0: 5.7, 4.0: 3.4, 5.0: 2.7, 6.0: 1.0, 7.0: 2.7, 101.0: 2.0, 102.0: 500.0, 103.0: 2.85, 104.0: 1.7, 105.0: 1.35, 106.0: 0.5, 107.0: 1.35, -2147483647: 9999.0, -32768: 9999.0, 0: 9999.0}
input_data_shape is': (10587, 10876)
output_data_shape is': (10587, 10876)
Data type used to reclassify LULC as impedance is Float64
path to compressed raster is: impedance_pa/impedance_lulc_1987_pa_compr.tif
Input file size is 10876, 10587
0...10...20...30...40...50...60...70...80...90...100 - done.
Reclassification complete for: lulc_pa/lulc_1987_pa.tif
------------------------------------
lulc_1992_pa.tif
LULC impedance is characterized by decimal values.
Mapping dictionary used to classify impedance is: {1.0: 4.0, 2.0: 1000.0, 3.0: 5.7, 4.0: 3.4, 5.0: 2.7, 6.0: 1.0, 7.0: 2.7, 101.0: 2.0, 102.0: 500.0, 103.0: 2.85, 104.0: 1.7, 105.0: 1.35, 106.0: 0.5, 107.0: 1

: 

: 

: 

##### 5. Updating landscape affinity 
Landscape affinity is computed and compressed based on the math expression processing landscape impedance. By now (04/06/2024), landscape affinity is computed as a reversed value of landscape impedance but it is planned to develop it as a more flexible input to compute connectivity further. This output is required by Miramon ICT software, not Graphab.

In [1]:
import os
import subprocess
import numpy as np
from osgeo import gdal

class Landscape_Affinity_Estimator:

    def __init__(self, impedance_dir:str, affinity_dir:str) -> None:
        self.impedance_dir = impedance_dir
        self.affinity_dir = affinity_dir
        # create output directory if it doesn't exist
        os.makedirs(affinity_dir, exist_ok=True)

        # list all impedance files in the directory
        impedance_files = os.listdir(impedance_dir)
        print(impedance_files)
        pass

    def compute_affinity(self,impedance_files) -> None:
        # loop through each TIFF file in impedance_dir
        for impedance_file in impedance_files:
            if impedance_file.endswith('.tif'):
                # construct full paths for impedance and affinity files
                impedance_path = os.path.join(self.impedance_dir, impedance_file)
                affinity_path = os.path.join(self.affinity_dir, impedance_file.replace('impedance', 'affinity'))

                # open impedance file
                ds = gdal.Open(impedance_path)

                if ds is None:
                    print(f"Failed to open impedance file: {impedance_file}")
                    continue

                # get raster band
                band = ds.GetRasterBand(1)
                # read raster band as a NumPy array
                data = band.ReadAsArray()
                # reverse values with condition (if it is 9999
                # or 0 leave it, otherwise make it reversed)
                reversed_data = np.where((data == 9999) | (data == 0), data, 1 / data)

                # write reversed data to affinity file
                driver = gdal.GetDriverByName("GTiff")
                out_ds = driver.Create(affinity_path, ds.RasterXSize, ds.RasterYSize, 1, gdal.GDT_Float32)
                out_ds.GetRasterBand(1).WriteArray(reversed_data)

                # copy georeferencing info
                out_ds.SetGeoTransform(ds.GetGeoTransform())
                out_ds.SetProjection(ds.GetProjection())

                # close files
                ds = None
                out_ds = None

                print(f"Affinity computed for: {impedance_file}")

                # compression
                compressed_raster_path = os.path.splitext(affinity_path)[0] + '_compr.tif'
                subprocess.run(['gdal_translate', affinity_path, compressed_raster_path,'-a_nodata', '9999', '-ot', 'Float32', '-co', 'COMPRESS=LZW'])
            
                # as soon as gdal_translate doesn't support rewriting, we should delete non-compressed GeoTIFFs...
                os.remove(affinity_path)
                # ...and rename COG in the same way as the original GeoTIFF
                os.rename(compressed_raster_path, affinity_path)
                print(f"Affinity file is successfully compressed.", end="\n------------------------------------------\n")

        print("All LULC affinities have been successfully computed.")

In [4]:
if os.getcwd().endswith("1_protected_areas") == False:
    # NOTE working from docker container
    os.chdir('./1_protected_areas')


impedance_dir = 'impedance_pa'
affinity_dir = 'affinity'
lae = Landscape_Affinity_Estimator(impedance_dir, affinity_dir)
lae.compute_affinity(os.listdir(impedance_dir))

['impedance_lulc_1987_pa.tif', 'impedance_lulc_1992_pa.tif', 'impedance_lulc_1997_pa.tif', 'impedance_lulc_2002_pa.tif', 'impedance_lulc_2007_pa.tif', 'impedance_lulc_2012_pa.tif', 'impedance_lulc_2017_pa.tif', 'impedance_lulc_2022_pa.tif']




Affinity computed for: impedance_lulc_1987_pa.tif
Input file size is 10876, 10587
0...10...20...30...40...50...60...70...80...90...100 - done.
Affinity file is successfully compressed.
------------------------------------------
Affinity computed for: impedance_lulc_1992_pa.tif
Input file size is 10876, 10587
0...10...20...30...40...50...60...70...80...90...100 - done.
Affinity file is successfully compressed.
------------------------------------------
Affinity computed for: impedance_lulc_1997_pa.tif
Input file size is 10876, 10587
0...10...20...30...40...50...60...70...80...90...100 - done.
Affinity file is successfully compressed.
------------------------------------------
Affinity computed for: impedance_lulc_2002_pa.tif
Input file size is 10876, 10587
0...10...20...30...40...50...60...70...80...90...100 - done.
Affinity file is successfully compressed.
------------------------------------------
Affinity computed for: impedance_lulc_2007_pa.tif
Input file size is 10876, 10587
0...10

ERROR 4: `impedance_pa/impedance_lulc_2017_pa.tif' not recognized as being in a supported file format.


Affinity computed for: impedance_lulc_2022_pa.tif
Input file size is 10876, 10587
0...10...20...30...40...50...60...70...80...90...100 - done.
Affinity file is successfully compressed.
------------------------------------------
All LULC affinities have been successfully computed.


Stop calculating time:

In [None]:
# call own module and sfinish calculating time
timing.stop()