## Addressing Sampling Bias in Species Distribution Models (SDMs)

Species distribution models (SDMs) predict where a species could occur from occurrence records and environmental variables. They inform conservation planning, biodiversity management, and the study of ecological processes. Their accuracy and reliability, however, can be significantly compromised by **sampling bias**: survey effort that is unevenly distributed across the study area, often because of easier access, proximity to research institutions, or observer preferences.

### The Problem of Sampling Bias

Sampling bias arises when some parts of a study region are surveyed more intensively than others, so that recorded presences are overrepresented there. Models fitted to such data can end up predicting survey effort rather than true ecological patterns. For example, urban areas and sites near research facilities tend to accumulate records simply because they are accessible and heavily visited, while remote or rural areas remain underrepresented. This is especially problematic for studies meant to inform conservation strategies, because critical habitats in under-surveyed areas may be overlooked.

### Strategies to Mitigate Sampling Bias

To improve the robustness of SDMs and ensure that model outputs more accurately reflect true species distributions, several methods can be employed to address sampling bias:

1. **Geographic and Environmental Thinning**: Thinning filters occurrence records to reduce clustering. Geographic thinning retains only records that are spatially separated, so that data points are spread more evenly across the study area, while environmental thinning retains records that together span a broad range of environmental conditions. Both approaches have been shown to reduce the impact of sampling bias (Inman et al., 2021).

2. **Using Bias Files in Modeling Algorithms**: Supplying a bias file to an algorithm such as Maxent (Phillips et al., 2006) lets the model account for uneven survey effort by weighting background points according to survey intensity. Presences and background then carry a similar bias, which adjusts predictions and can improve model performance and the accuracy of the resulting distribution maps.

3. **Pooling Presence-Only and Presence-Absence Data**: Combining different types of data can help correct sampling bias by leveraging the strengths of each dataset. Probabilistic models that integrate presence-only and presence-absence data across multiple species can jointly analyze the data, adjusting for bias and improving estimation efficiency (Fithian et al., 2015).

4. **Model-Based Approaches to Sampling Bias**: Explicitly modeling the sampling process by including survey effort as a covariate can improve SDM accuracy. This approach acknowledges the non-random nature of data collection and adjusts predictions based on modeled sampling patterns (Robinson et al., 2017).
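
Geographic thinning (strategy 1) can be illustrated with a minimal sketch. It assumes coordinates are already in a projected system, so that Euclidean distance is meaningful, and it uses a simple greedy rule with a random visiting order. The function name `thin_by_distance` and the simulated records are hypothetical, and this is not the algorithm of any particular thinning package.

```python
import numpy as np


def thin_by_distance(coords, min_dist, seed=0):
    """Greedy spatial thinning (illustrative sketch, hypothetical helper).

    coords   : (n, 2) array of projected x/y coordinates
    min_dist : minimum allowed distance between retained records,
               in the same units as coords
    Returns the sorted indices of the retained records.
    """
    coords = np.asarray(coords, dtype=float)
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(coords))  # visit records in random order
    kept = []
    for i in order:
        if kept:
            # Distance from candidate i to every record kept so far
            dists = np.linalg.norm(coords[kept] - coords[i], axis=1)
            if dists.min() < min_dist:
                continue  # too close to a retained record: drop it
        kept.append(i)
    return np.sort(np.array(kept, dtype=int))


# Simulated records: one heavily surveyed cluster plus scattered records
rng = np.random.default_rng(0)
cluster = rng.normal(0.0, 1.0, size=(200, 2))
scattered = rng.uniform(-50.0, 50.0, size=(20, 2))
records = np.vstack([cluster, scattered])

kept = thin_by_distance(records, min_dist=5.0)
thinned = records[kept]
```

Because the visiting order is random, different seeds retain different records from the same cluster. Repeating the thinning with several seeds and keeping the largest retained set is one possible refinement.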

By implementing these strategies, researchers can enhance the reliability of SDMs, ensuring that predictions are driven by ecological factors rather than artifacts of data collection.
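
The model-based approach (strategy 4) can be sketched on simulated data. Everything here is an assumption made for illustration: the variables `habitat` and `effort`, the simulated detection process, and the small Newton-Raphson helper `fit_logistic`. It is not the method of any cited study. The sketch includes survey effort as a covariate when fitting, then predicts along a habitat gradient with effort held at one common reference level.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Logistic regression by Newton-Raphson (hypothetical helper).

    Returns coefficients as [intercept, slope_1, slope_2, ...].
    """
    X = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                         # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])  # observed information
        beta += np.linalg.solve(hess, grad)
    return beta


rng = np.random.default_rng(1)
n = 5000
habitat = rng.normal(size=n)  # hypothetical environmental covariate
effort = rng.normal(size=n)   # hypothetical standardized survey effort

# Simulated truth: occupancy depends on habitat only, but an occupied
# site yields a record only if the survey there detects the species.
occupied = rng.random(n) < 1.0 / (1.0 + np.exp(-habitat))
detected = rng.random(n) < 1.0 / (1.0 + np.exp(-2.0 * effort))
observed = (occupied & detected).astype(float)

# Fit with effort as an explicit covariate alongside habitat
b0, b_habitat, b_effort = fit_logistic(
    np.column_stack([habitat, effort]), observed
)

# Predict along a habitat gradient with effort fixed at a reference level,
# so differences between sites reflect habitat rather than survey effort
habitat_grid = np.linspace(-2.0, 2.0, 5)
reference_effort = 1.0
linear = b0 + b_habitat * habitat_grid + b_effort * reference_effort
pred = 1.0 / (1.0 + np.exp(-linear))
```

Holding effort constant at prediction time is one common way to use such a model. The choice of reference level is a modeling decision and only rescales the predictions.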



```python
import geopandas as gpd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
from scipy.stats import gaussian_kde
import rasterio
from rasterio.transform import from_origin
import os
```
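
Using the `gaussian_kde` import above, a bias surface for strategy 2 might be sketched as follows. This is a minimal illustration under stated assumptions: the occurrence records are simulated, the grid extent and resolution are arbitrary, and kernel density of records is used as a stand-in for survey effort. Other proxies for effort are possible.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(42)

# Simulated (x, y) records standing in for real occurrence data;
# they cluster around one heavily surveyed location.
records = np.column_stack([
    rng.normal(-0.5, 0.3, 300),
    rng.normal(0.2, 0.3, 300),
])

# Regular grid over an arbitrary study extent
xs = np.linspace(-2.0, 2.0, 40)
ys = np.linspace(-2.0, 2.0, 40)
gx, gy = np.meshgrid(xs, ys)
cells = np.vstack([gx.ravel(), gy.ravel()])  # shape (2, n_cells)

# gaussian_kde expects data with shape (n_dims, n_points)
kde = gaussian_kde(records.T)
density = kde(cells)

# Rescale so the surface sums to 1 and can serve as sampling probabilities
bias = density / density.sum()
bias_grid = bias.reshape(gx.shape)

# Draw background points in proportion to estimated survey effort
bg_idx = rng.choice(cells.shape[1], size=1000, replace=True, p=bias)
background = cells[:, bg_idx].T  # shape (1000, 2)
```

The `bias_grid` array could then be written to a raster and supplied as a bias file, or `background` could be used directly as the background sample for a presence-background model.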

### References

Fithian, W., Elith, J., Hastie, T., & Keith, D. A. (2015). Bias correction in species distribution models: pooling survey and collection data for multiple species. *Methods in Ecology and Evolution*, 6(4), 424–438. https://doi.org/10.1111/2041-210X.12242

Inman, R., Franklin, J., Esque, T., & Nussear, K. (2021). Comparing sample bias correction methods for species distribution modeling using virtual species. *Ecosphere*, 12(3), e03422. https://doi.org/10.1002/ecs2.3422

Phillips, S. J., Anderson, R. P., & Schapire, R. E. (2006). Maximum entropy modelling of species geographic distributions. *Ecological Modelling*, 190(3–4), 231–259. https://doi.org/10.1016/j.ecolmodel.2005.03.026

Robinson, O. J., Ruiz-Gutierrez, V., & Fink, D. (2017). Correcting for bias in distribution modeling for rare species using citizen science data. *Diversity and Distributions*, 23(1), 1–12. https://doi.org/10.1111/ddi.12698
