# Data Gathering and Processing: Predictors

This notebook serves as an overview of the environmental predictors relevant to amphibian populations in Scotland. Understanding these predictors is essential for assessing habitat connectivity and suitability for various amphibian species, especially in the context of urbanization and habitat fragmentation.

## Table of Contents
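1. [Data Overview](#Data-Overview)
2. [Slope](#1.-Slope)
3. [Vegetation Height](#2.-Vegetation-Height)
4. [Distance to Water](#3.-Distance-to-Water)
5. [NDVI (Normalized Difference Vegetation Index)](#4.-NDVI-(Normalized-Difference-Vegetation-Index))
6. [Vegetation Height](#5.-Vegetation-Height)
7. [Grasslands in Surrounding 250m](#6.-Grasslands-in-Surrounding-250m)
8. [Distance to Forest](#7.-Distance-to-Forest)
9. [Urbanization Proxy](#8.-Urbanization-Proxy)
10. [Runoff Coefficient](#9.-Runoff-Coefficient)
11. [Distance to Rock-Gravel-Sand](#10.-Distance-to-Rock-Gravel-Sand)
12. [Distance to Road](#11.-Distance-to-Road)
13. [Traffic Intensity](#12.-Traffic-Intensity)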


## Data Overview

The data presented in this notebook include various environmental predictors that influence the life cycles and habitats of amphibians. Each predictor was selected for its biological significance and relevance to amphibian ecology. The following table, adapted from [Donati et al. (2022)](https://www.sciencedirect.com/science/article/pii/S0301479722008271), summarizes the key predictors, their descriptions, biological interpretations, sources, and selection criteria.

| Predictor Code | Predictor Description | Biological Interpretation | Source | Selection |
|---|---|---|---|---|
| Forest Dist. | Nearest distance to forest | Important for providing shelter and breeding sites for amphibians. | Local woodland datasets or forest inventory data | Proximity to forests influences habitat choice |
| Water Dist. | Nearest distance to water | Essential for breeding, feeding, and aquatic habitat availability for amphibians. | OS Open Rivers; Open Water; hydrological datasets | Affects reproductive success |
| Soil Hum. Var. | Soil moisture variability | Influences amphibian survival and affects physiological processes. | Soil moisture datasets or interpolated data | Critical for both aquatic and terrestrial habitats |
| NDVI Med | NDVI median (2016–2019, April–October) | Indicates vegetation health and density, impacting habitat quality. | Sentinel-2 NDVI data | Reflects primary productivity |
| NDVI SD | NDVI standard deviation (2016–2019) | Variability in vegetation health can indicate habitat stability and resilience. | Sentinel-2 NDVI data | Important for assessing habitat heterogeneity |
| Road Dist. | Nearest distance to road | Roads can fragment habitats and pose barriers to amphibian movement. | Transport datasets or OS Open Roads | Assessing impacts on connectivity |
| Runoff Coefficient | Runoff coefficient | Indicates potential runoff and its effects on aquatic habitats, influencing survival and reproduction. | Hydrological models or datasets | Impacts habitat quality |
| Slope | Terrain slope | Steeper slopes may obstruct movement and access to breeding sites for certain species. | ESRI Living Atlas (Terrain); OS | Influences movement patterns |
| Traffic Volume | Average daily traffic volume | Traffic can pose direct risks to amphibians through road mortality and habitat degradation. | Scottish Transport Statistics | Assessing urban impacts |
| Urbanization | Urbanization proxy (density of buildings) | Urbanization can lead to habitat loss, fragmentation, and increased mortality risks. | Scotland Land Use Database | Important for habitat connectivity |
| Veg. Height | Vegetation height model median | Taller vegetation can provide better shelter and foraging opportunities for amphibians. | ESRI Living Atlas canopy / vegetation height data | Affects habitat suitability |
| Grassland Density | Grassland density in 250 m × 250 m | Influences habitat availability and diversity of niches for amphibians. | Land cover datasets | Relevant for terrestrial habitats |

## Connection to Amphibian Populations

Amphibians are sensitive to environmental changes, making it crucial to understand how various factors influence their populations and habitat availability. The predictors outlined in the table play a significant role in shaping amphibian distributions and behaviors. By analyzing these predictors, we can identify areas of suitable habitat, assess the impacts of urbanization, and develop strategies for conservation and habitat enhancement.

In the following sections of this notebook, we will explore the data in more detail, examine its sources, and visualize the relationships between these predictors.

# Environmental Predictors for Amphibian Movement and Habitat Suitability

## 1. Slope
### **Impact on Amphibians**
Slope can influence amphibian movement and habitat selection by affecting moisture retention and temperature regulation. Steeper slopes may hinder movement and increase the risk of desiccation.

### **Data Processing**
The slope data for this study was derived from the Terrain layer available in the ArcGIS Living Atlas of the World. This dataset has a variable native resolution ranging from 300 metres to 1 metre across the study area. To ensure consistency and relevance for ensemble modelling, the data was resampled to 30 metres before slope calculation.

### Workflow
1. **Resampling Terrain to 30 Metres**:  
   The Terrain raster was first resampled to 30-metre resolution using bilinear interpolation. Bilinear interpolation was selected because it preserves the continuous nature of elevation data while limiting artefacts. It produces smooth transitions between elevation values, which is critical for calculating accurate slope values (Kennedy & Leigh, 1998; Smith et al., 2019).

2. **Slope Calculation**:  
   Following resampling, the slope was computed using the terrain analysis tools in ArcGIS Pro. Calculating slope on the resampled raster ensures that the slope output aligns directly with the desired resolution for subsequent analyses, reducing computational inconsistencies and errors in derived gradient values (Evans, 1972; Wilson & Gallant, 2000).

3. **Export for Analysis**:  
   The processed slope raster was exported at 30-metre resolution, ready for integration into the species distribution modelling workflow.
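The slope calculation can be illustrated with a common 3×3 finite-difference estimator (Horn's method). The sketch below assumes a square-celled elevation grid held in plain Python lists; it demonstrates the technique only and is not the ArcGIS Pro implementation, whose exact algorithm and options should be checked in its documentation.

```python
import math


def slope_degrees(window, cell_size):
    """Slope in degrees at the centre of a 3x3 elevation window (Horn's weights).

    window: [[a, b, c], [d, e, f], [g, h, i]] with rows ordered north to south.
    cell_size: width of a square cell in metres (30 m in this workflow).
    """
    (a, b, c), (d, _e, f), (g, h, i) = window
    # East-west gradient: weighted right column minus weighted left column
    dz_dx = ((c + 2 * f + i) - (a + 2 * d + g)) / (8 * cell_size)
    # North-south gradient: weighted bottom row minus weighted top row
    dz_dy = ((g + 2 * h + i) - (a + 2 * b + c)) / (8 * cell_size)
    # Combine into rise over run and convert to degrees
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))


# Hypothetical 30 m window on a plane rising 3 m per cell towards the east
plane = [[0, 3, 6], [0, 3, 6], [0, 3, 6]]
print(round(slope_degrees(plane, 30.0), 2))  # 5.71
```

The centre cell does not enter the calculation; the weights emphasise the four direct neighbours, which makes the estimate less sensitive to noise in a single cell.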

### **Justification for Methodology**
- **Resampling Terrain First**:  
   Resampling the elevation raster before calculating slope helps maintain the fidelity of elevation gradients over varying terrain. Computing slope directly on the bilinearly resampled 30-metre elevation surface avoids the artefacts that could arise from resampling a slope raster after it has been computed (Wilson & Gallant, 2000).

- **Selection of 30-Metre Resolution**:  
   A 30-metre resolution was chosen to balance computational efficiency with ecological relevance. At this resolution, amphibian habitat analysis and Blue-Green Infrastructure (BGI) planning retain useful spatial detail without unnecessary computational overhead. Higher resolutions, such as 10 metres, would require nine times as many cells per layer and increase computational demands significantly, without proportionate gains in modelling accuracy at the scale of the study area (~16,624 km²) (Riley et al., 1999; Guisan & Thuiller, 2005).

- **Use of Bilinear Interpolation**:  
   Bilinear interpolation is suitable for continuous data such as elevation because it preserves smooth transitions between cell values, which are critical for accurate slope derivation (Kennedy & Leigh, 1998). Alternative methods, such as nearest neighbour, may introduce abrupt changes in slope values, reducing ecological validity.
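The computational trade-off between 30 m and 10 m can be made concrete with simple arithmetic. The sketch below counts the cells needed to cover roughly 16,624 km²; it ignores the irregular study-area boundary (a rectangular extent would contain more cells) and is an illustration, not a measured runtime.

```python
AREA_KM2 = 16_624  # approximate study-area size quoted in the text


def cell_count(area_km2, resolution_m):
    """Number of square cells of side resolution_m needed to cover area_km2."""
    return area_km2 * 1_000_000 / resolution_m ** 2


for resolution in (30, 10):
    millions = cell_count(AREA_KM2, resolution) / 1e6
    print(f"{resolution} m: {millions:.1f} million cells per predictor layer")
# 30 m: 18.5 million cells per predictor layer
# 10 m: 166.2 million cells per predictor layer
```

Because cell count scales with the inverse square of the cell size, every predictor layer, and every model prediction over those layers, grows ninefold at 10 m.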

### References
- Evans, I. S. (1972). "General geomorphometry, derivatives of altitude, and descriptive statistics." *Spatial Analysis in Geomorphology*, 17–90.
- Guisan, A., & Thuiller, W. (2005). "Predicting species distribution: offering more than simple habitat models." *Ecology Letters*, 8(9), 993–1009.
- Kennedy, M., & Leigh, M. (1998). *The Global Positioning System and GIS: An Introduction*. CRC Press.
- Riley, S. J., et al. (1999). "A terrain ruggedness index that quantifies topographic heterogeneity." *Intermountain Journal of Sciences*, 5(1–4), 23–27.
- Smith, M. J., et al. (2019). "Interpolating elevation data for geomorphological applications." *Journal of Geographical Systems*, 21(4), 545–567.
- Wilson, J. P., & Gallant, J. C. (2000). *Terrain Analysis: Principles and Applications*. John Wiley & Sons.

______________________

## 2. Vegetation Height
### **Impact on Amphibians**
Vegetation height can influence the microclimate and shelter availability for amphibians. Taller vegetation may provide more cover from predators and environmental extremes.

### **Data Source and Description**
The vegetation height data used in this study was sourced from the **2020 Global Vegetation Height Map** developed by the EcoVision Lab at ETH Zurich. This dataset provides global canopy height estimates at a **10 m spatial resolution**, derived from Sentinel-2 imagery and from LiDAR data collected by the Global Ecosystem Dynamics Investigation (GEDI) instrument on board the International Space Station. It was generated using a deep convolutional neural network trained with GEDI LiDAR observations as ground truth, with a reported accuracy of **±5 m**. The data is designed to support applications in biodiversity monitoring, ecosystem function analysis, and sustainable land-use planning.

### **Data Processing**
### Workflow
1. **Clipping to Study Area**:
The global vegetation height data was clipped to the extent of the study area to focus on Central Scotland. This ensures that only the relevant geographical extent is included in the analysis, reducing computational overhead and maintaining ecological relevance.

2. **Resampling to 30m Resolution**:
The original 10m resolution data was resampled to a **30m resolution** to match the spatial scale of other environmental predictors in the study. Consistency in resolution is critical for ensuring accurate and unbiased inputs for species distribution modelling (SDM).

3. **Resampling Method**:
**Bilinear interpolation** was used as the resampling method. This approach is well-suited for continuous data like vegetation height, as it calculates the value of each new cell based on a weighted average of the four nearest cells. Bilinear interpolation smooths transitions between neighbouring pixels, preserving gradual changes in vegetation height while avoiding artefacts introduced by simpler methods like nearest neighbour resampling (Chen et al., 2007; Hijmans et al., 2005).
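The weighted average of the four nearest cells described above can be written out directly. This is a minimal sketch of bilinear interpolation on a small in-memory grid, assuming cell centres sit at integer (column, row) coordinates; it illustrates the technique rather than reproducing the GIS resampling tool. Because only the four nearest input cells contribute, a 30 m output value is not an average of all nine 10 m cells beneath it.

```python
def bilinear(grid, x, y):
    """Bilinearly interpolate a grid at fractional column x and row y (both >= 0).

    grid is a list of equal-length rows; cell centres lie at integer coordinates.
    """
    x0, y0 = int(x), int(y)
    # Clamp the right/lower neighbours so points on the last row/column stay in range
    x1 = min(x0 + 1, len(grid[0]) - 1)
    y1 = min(y0 + 1, len(grid) - 1)
    tx, ty = x - x0, y - y0  # fractional offsets act as the interpolation weights
    top = grid[y0][x0] * (1 - tx) + grid[y0][x1] * tx
    bottom = grid[y1][x0] * (1 - tx) + grid[y1][x1] * tx
    return top * (1 - ty) + bottom * ty


# Hypothetical vegetation heights (m) at four neighbouring 10 m cell centres
heights = [[2.0, 4.0],
           [6.0, 8.0]]
print(bilinear(heights, 0.5, 0.5))   # 5.0: equal weights on all four cells
print(bilinear(heights, 0.25, 0.0))  # 2.5: weighted towards the top-left cell
```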

### Justification for Resampling Method
- **Suitability for Continuous Variables**:  
  Vegetation height represents a continuous variable where abrupt transitions between pixels are not ecologically meaningful. Bilinear interpolation reduces artificial boundaries and ensures smoother transitions, aligning with standard practices for processing environmental data (Chen et al., 2007).

- **Consistency with Predictor Integration**:  
  Ensuring a uniform resolution across all predictors avoids artefacts during model development and enhances comparability among variables (Dormann et al., 2013).

#### References
- Chen, X., Vierling, L., & Deering, D. (2007). "A simple and effective method for detecting specular reflection in airborne LiDAR intensity data." *Remote Sensing of Environment*, 109(2), 273–282. https://doi.org/10.1016/j.rse.2007.01.002
- Dormann, C. F., Elith, J., Bacher, S., Buchmann, C., Carl, G., Carré, G., ... & Münkemüller, T. (2013). "Collinearity: a review of methods to deal with it and a simulation study evaluating their performance." *Ecography*, 36(1), 27–46. https://doi.org/10.1111/j.1600-0587.2012.07348.x
- Hijmans, R. J., Cameron, S. E., Parra, J. L., Jones, P. G., & Jarvis, A. (2005). "Very high resolution interpolated climate surfaces for global land areas." *International Journal of Climatology: A Journal of the Royal Meteorological Society*, 25(15), 1965–1978. https://doi.org/10.1002/joc.1276

___________


## 3. Distance to Water

### **Impact on Amphibians**
Proximity to water bodies is critical for amphibian survival as it provides breeding sites and essential moisture. Increased distance may limit access to these resources.

### **Data Source and Context**
The **distance to water predictor layer** was developed to incorporate the proximity of habitats to water bodies into the species distribution modelling (SDM). Water availability is a critical environmental factor for amphibians, influencing their habitat suitability. The predictor was derived from the **Global Surface Water Occurrence 1984–2021 dataset** (Pekel et al., 2016), provided by the European Commission Joint Research Centre. This dataset identifies surface water dynamics globally over a 37-year period and was accessed in raster format.

### **Data Preparation and Processing**
1. **Resampling the Water Bodies Raster**: The original dataset was resampled from its native resolution to **30 m** to maintain consistency with other environmental predictors. The **nearest neighbour resampling method** was employed to preserve the categorical integrity of the water body classification, ensuring that water and non-water cells remained distinct.

2. **Reclassification for Euclidean Distance Calculation**: The resampled raster was reclassified to create a binary dataset:
   * Cells representing water bodies were assigned a value of `1`.
   * Non-water cells were set to `NoData`.

   This format was necessary for the **Euclidean Distance** tool, which treats every cell holding a value (including zero) as a source; setting non-water cells to `NoData` rather than `0` ensures that distances are measured from water body cells only.

3. **Calculating Euclidean Distance**: The reclassified raster was used as the input for the **Euclidean Distance** tool in ArcGIS Pro. This process calculated the straight-line distance from each cell to the nearest water body, resulting in a continuous raster layer representing proximity to water.
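The reclassification and distance steps above can be sketched on a toy grid. This assumes water is identified by thresholding the occurrence percentage; the threshold value is hypothetical, since the source does not state how water cells were defined. `None` stands in for `NoData`, and the brute-force search is for illustration only, not the algorithm ArcGIS Pro uses.

```python
import math

WATER_THRESHOLD = 50  # hypothetical: % occurrence treated as water; not stated in the source


def reclassify_water(occurrence, threshold=WATER_THRESHOLD):
    """Water cells become 1; all other cells become None, standing in for NoData."""
    return [[1 if value >= threshold else None for value in row] for row in occurrence]


def euclidean_distance(binary, cell_size):
    """Straight-line distance from every cell centre to the nearest water cell.

    Brute force over all source cells: fine for a toy grid, far too slow for a real raster.
    """
    sources = [(r, c) for r, row in enumerate(binary) for c, v in enumerate(row) if v == 1]
    if not sources:
        raise ValueError("no water cells to measure distance from")
    return [
        [cell_size * min(math.hypot(r - sr, c - sc) for sr, sc in sources)
         for c in range(len(binary[0]))]
        for r in range(len(binary))
    ]


# Hypothetical occurrence percentages on a 30 m grid
occurrence = [[0, 0, 0, 90],
              [0, 0, 0, 75],
              [0, 10, 0, 0]]
water = reclassify_water(occurrence)
distance = euclidean_distance(water, cell_size=30)
print(distance[0][0])  # 90.0: three cells west of the nearest water cell
```

The same pattern (binary source layer, then straight-line distance) would apply to the forest, rock-gravel-sand, and road distance predictors.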

### Output
The resulting distance to water raster was exported in **GeoTIFF format** for integration into the SDM. This predictor layer provides valuable spatial information on habitat accessibility to water bodies, aiding in the identification of suitable areas for amphibian conservation and planning.

#### References
- Pekel, J.-F., Cottam, A., Gorelick, N., & Belward, A. S. (2016). High-resolution mapping of global surface water and its long-term changes. *Nature*, 540, 418–422. https://doi.org/10.1038/nature20584
- ArcGIS Pro Documentation: Euclidean Distance Tool.

_____________

## 4. NDVI (Normalized Difference Vegetation Index)
### **Impact on Amphibians**
NDVI serves as an indicator of vegetation health and density, which can affect habitat quality and availability of food resources for amphibians.

To enhance the environmental predictors used in the species distribution modelling, NDVI (Normalized Difference Vegetation Index) data were processed to represent both the median values and the standard deviation over the study period. NDVI is a critical indicator of vegetation health and distribution, which significantly impacts habitat suitability for amphibians.

### **Data Source and Preprocessing**
The Sentinel-2 satellite imagery from the Copernicus programme was used as the source dataset, covering the period from April 2019 to October 2024. Sentinel-2 Level-2A data provides surface reflectance values at high spatial resolution (10 m for the visible bands and the near-infrared Band 8). The dataset was filtered to include only images with less than 10% cloud cover and clipped to the study area.

A cloud masking algorithm was applied using the Sentinel-2 Scene Classification Layer (SCL) to remove cloud, shadow, and snow pixels, ensuring the integrity of the data. This step minimized noise and preserved only reliable reflectance values for NDVI calculations.
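The masking step above amounts to discarding pixels whose SCL class falls in a set of unwanted classes. The sketch below uses the Sentinel-2 L2A SCL codes for cloud shadows (3), medium- and high-probability cloud (8, 9), thin cirrus (10), and snow or ice (11); the source names only cloud, shadow, and snow, so the exact set, including cirrus, is an assumption. Plain lists stand in for raster bands.

```python
# Assumed set of Sentinel-2 L2A Scene Classification (SCL) classes to discard:
# 3 = cloud shadows, 8 = cloud (medium probability), 9 = cloud (high probability),
# 10 = thin cirrus, 11 = snow or ice.
MASKED_SCL_CLASSES = frozenset({3, 8, 9, 10, 11})


def mask_with_scl(pixels, scl):
    """Replace pixels whose SCL class is in MASKED_SCL_CLASSES with None (masked)."""
    if len(pixels) != len(scl):
        raise ValueError("pixel and SCL arrays must be the same length")
    return [None if s in MASKED_SCL_CLASSES else p for p, s in zip(pixels, scl)]


# Hypothetical red-band reflectances with their SCL classes
red = [0.05, 0.30, 0.02, 0.08]
scl = [4, 9, 3, 5]  # vegetation, high-probability cloud, cloud shadow, not vegetated
print(mask_with_scl(red, scl))  # [0.05, None, None, 0.08]
```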

### **NDVI Calculation and Metrics**
NDVI was calculated for each image in the filtered collection using the formula:

$$ \text{NDVI} = \frac{\text{NIR} - \text{Red}}{\text{NIR} + \text{Red}} $$

where the Near Infrared (NIR) band corresponds to Band 8, and the Red band corresponds to Band 4. 

The following two metrics were derived from the NDVI collection:
1. **NDVI Median:** The median NDVI value was computed to represent the central tendency of vegetation greenness during the study period, reflecting typical vegetation conditions.
2. **NDVI Standard Deviation:** The standard deviation of NDVI values was calculated to capture temporal variability in vegetation, highlighting areas with dynamic vegetation patterns.
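For a single pixel, the formula and the two metrics above reduce to a few lines. This sketch assumes reflectances already scaled to 0–1 and uses `None` for observations removed by cloud masking. It uses the sample standard deviation (`statistics.stdev`); the source does not say which form was used, so the population form (`statistics.pstdev`) would be an equally plausible choice.

```python
import statistics


def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red); None if either band is masked or the sum is zero."""
    if nir is None or red is None or nir + red == 0:
        return None
    return (nir - red) / (nir + red)


def ndvi_median_and_sd(b8_series, b4_series):
    """Median and sample standard deviation of NDVI over one pixel's valid observations."""
    values = [v for v in map(ndvi, b8_series, b4_series) if v is not None]
    if len(values) < 2:
        raise ValueError("at least two valid observations are needed")
    return statistics.median(values), statistics.stdev(values)


# Hypothetical time series for one pixel: Band 8 (NIR) and Band 4 (Red) reflectance;
# None marks an observation removed by cloud masking.
b8 = [0.40, 0.50, None, 0.60]
b4 = [0.10, 0.10, 0.20, 0.20]
median, sd = ndvi_median_and_sd(b8, b4)
print(round(median, 3), round(sd, 3))  # 0.6 0.084
```

Applied to every pixel of the filtered image stack, these two reductions yield the NDVI median and NDVI standard deviation layers.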

### **Output and Use**
Both NDVI metrics were exported as raster layers with a spatial resolution of 30m, using bilinear resampling to align with the resolution of other predictors in the model. The rasters were re-projected to the British National Grid (OSGB 1936), ensuring compatibility with the modelling environment.

These layers will serve as key predictors in the ensemble modelling framework, providing insights into the role of vegetation distribution and variability in shaping amphibian habitat suitability. This step is critical for linking vegetation dynamics to ecological processes relevant to Blue-Green Infrastructure (BGI) planning in Central Scotland.



---

## 5. Vegetation Height
### Impact on Amphibians
- Vegetation height can influence the microclimate and shelter availability for amphibians. Taller vegetation may provide more cover from predators and environmental extremes.

### Data Processing
- **Acquisition**: Obtain vegetation height data from LiDAR or remote sensing sources.
- **Processing**: Create a raster layer reflecting vegetation height, and categorize height ranges to identify suitable habitats.

---

## 6. Grasslands in Surrounding 250m
### Impact on Amphibians
- The presence of grasslands can facilitate movement and provide suitable foraging habitats for many amphibian species.

### Data Processing
- **Acquisition**: Map grassland areas using land cover datasets.
- **Processing**: Analyze surrounding areas within a 250m buffer around target habitats, creating a binary layer indicating the presence or absence of grasslands.
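The neighbourhood step above can be sketched as a moving window over a grassland mask. Because the overview table lists grassland density while this step describes presence or absence, the sketch returns both. The circular 250 m window, measured between 30 m cell centres and clipped at the raster edge, is an assumption; a 250 m square window would be an alternative reading.

```python
import math


def grassland_within_radius(grass, cell_size=30.0, radius_m=250.0):
    """Per-cell presence (0/1) and fraction of grassland cells within radius_m.

    grass: 2D list of 0/1 values (1 = grassland). Distances are measured between
    cell centres, and the window is clipped at the raster edge.
    """
    n_rows, n_cols = len(grass), len(grass[0])
    reach = int(radius_m // cell_size)  # furthest whole-cell offset worth checking
    presence = [[0] * n_cols for _ in range(n_rows)]
    fraction = [[0.0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            inside = hits = 0
            for dr in range(-reach, reach + 1):
                for dc in range(-reach, reach + 1):
                    rr, cc = r + dr, c + dc
                    if (0 <= rr < n_rows and 0 <= cc < n_cols
                            and math.hypot(dr, dc) * cell_size <= radius_m):
                        inside += 1
                        hits += grass[rr][cc]
            presence[r][c] = 1 if hits else 0
            fraction[r][c] = hits / inside
    return presence, fraction


# Hypothetical 12 x 12 grid at 30 m with a single grassland cell in the top-left corner
grid = [[0] * 12 for _ in range(12)]
grid[0][0] = 1
presence, fraction = grassland_within_radius(grid)
print(presence[0][8], presence[0][9])  # 1 0: 240 m is within 250 m, 270 m is not
```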

---

## 7. Distance to Forest
### Impact on Amphibians
- Forests can serve as essential corridors for amphibian movement, providing shelter and moisture retention, thus influencing habitat selection.

### Data Processing
- **Acquisition**: Identify forested areas from land cover datasets.
- **Processing**: Calculate distance to forest for each raster cell, creating a distance raster layer.

---

## 8. Urbanization Proxy
### Impact on Amphibians
- Urbanization can fragment habitats, increase pollution, and alter hydrology, negatively impacting amphibian populations.

### Data Processing
- **Acquisition**: Use urban land cover datasets to create a proxy for urbanization.
- **Processing**: Create a raster layer indicating urban areas and categorize the level of urbanization (e.g., low, medium, high).
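Categorizing urbanization is a reclassification of a continuous proxy into ordered classes. The sketch assumes the proxy is the fraction of each cell covered by buildings (in line with the "density of buildings" proxy in the overview table); the class breaks are hypothetical placeholders, as the source gives no thresholds.

```python
# Hypothetical class breaks on building-cover fraction; the source does not give thresholds.
URBAN_BREAKS = (0.10, 0.40)


def classify_urbanization(building_fraction, breaks=URBAN_BREAKS):
    """Map the fraction of a cell covered by buildings (0-1) to 'low', 'medium' or 'high'."""
    if not 0.0 <= building_fraction <= 1.0:
        raise ValueError("building_fraction must lie between 0 and 1")
    low_upper, medium_upper = breaks
    if building_fraction < low_upper:
        return "low"
    if building_fraction < medium_upper:
        return "medium"
    return "high"


# Hypothetical building-cover fractions for a row of 30 m cells
row = [0.0, 0.05, 0.25, 0.60]
print([classify_urbanization(f) for f in row])  # ['low', 'low', 'medium', 'high']
```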

---

## 9. Runoff Coefficient
### Impact on Amphibians
- The runoff coefficient, the fraction of rainfall that leaves a surface as runoff, is closely tied to surface imperviousness and influences hydrological processes and habitat availability for amphibians.

### Data Processing
- **Acquisition**: Gather runoff coefficient data from land use studies or models.
- **Processing**: Create a raster layer representing the runoff coefficients for various land uses, ensuring values are standardized for analysis.
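One common way to assign a single runoff coefficient to a cell containing several land uses is an area-weighted (composite) coefficient, as used in the rational method. The sketch assumes per-cell land-cover shares are already known; the coefficient values are hypothetical placeholders, not values from this study.

```python
import math

# Hypothetical runoff coefficients per land-cover class; placeholders, not values from the study.
RUNOFF_COEFFICIENTS = {"roof": 0.90, "paved": 0.85, "grassland": 0.25, "woodland": 0.15}


def composite_runoff_coefficient(cover_shares):
    """Area-weighted runoff coefficient for one cell from land-cover shares summing to 1."""
    if not math.isclose(sum(cover_shares.values()), 1.0):
        raise ValueError("land-cover shares must sum to 1")
    return sum(RUNOFF_COEFFICIENTS[cover] * share for cover, share in cover_shares.items())


# Hypothetical 30 m cell: half paved, half grassland
print(round(composite_runoff_coefficient({"paved": 0.5, "grassland": 0.5}), 2))  # 0.55
```

Since every coefficient lies between 0 and 1, the composite value stays on the same 0–1 scale, which keeps the layer comparable across cells.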

---

## 10. Distance to Rock-Gravel-Sand
### Impact on Amphibians
- The availability of rock, gravel, and sand can affect breeding habitats and the microhabitat conditions necessary for different amphibian species.

### Data Processing
- **Acquisition**: Map the locations of rock, gravel, and sand areas from geological surveys.
- **Processing**: Calculate distance to these areas, creating a distance raster layer to include in habitat models.

---

## 11. Distance to Road
### Impact on Amphibians
- Roads can pose barriers to amphibian movement and increase mortality rates due to traffic, affecting population connectivity.

### Data Processing
- **Acquisition**: Gather road network data from transport infrastructure datasets.
- **Processing**: Calculate distance to roads, creating a distance raster layer indicating the proximity of habitats to roadways.

---

## 12. Traffic Intensity
### Impact on Amphibians
- High traffic intensity can increase mortality risks and habitat fragmentation, affecting the survival of amphibian populations.

### Data Processing
- **Acquisition**: Obtain traffic data from transportation authorities or urban planning datasets.
- **Processing**: Create a raster layer indicating traffic intensity around habitats.

---

# Conclusion
These steps are designed to process each environmental predictor consistently and integrate it into the habitat suitability models, helping to clarify how these factors influence amphibian movement and population dynamics.
