Connor Hughes, Mariana Castaneda-Guzman, Luis E. Escobar
- Background
- Package description
- Installing the package
- Workflow in TemporalModelR
- Citation
- Note on AI usage
- Contributing
- Getting help
Ecological niche modeling (ENM) and species distribution modeling (SDM) are widely used to reconstruct and predict species environmental tolerances and geographic distributions, expanding ecological and biogeographic theory. Despite their widespread use, one aspect that has received relatively little attention is the temporally dynamic environment.
Environmental conditions shaping species distributions vary over time, including climate, resource availability, and habitat availability. As a result, the suitability of any given location may change as key environmental conditions become more or less favorable for a species. Most existing implementations of ENM assume a static relationship between species and environment: occurrences are pooled across years, predictors are time-averaged, and a single model is fit to the combined dataset. This works well when relationships are genuinely stable, but it carries three costs when they aren’t:
- Temporal mismatch. Pairing each occurrence with long-term average conditions can misrepresent the environment the species actually experienced at the moment it was recorded, leading to over-generalized niche estimates and higher rates of both extrapolation and omission.
- Lost training data. Standard spatial rarefaction retains one point per pixel, discarding temporally independent observations from the same location. When environments at a location have changed over time, those repeat observations carry unique information about the species niche.
- No temporal dynamics in the output. Static models produce a single predictive surface and cannot directly answer questions about how suitable conditions for a species have changed over time, or what environmental drivers are responsible for those changes.
Temporally-explicit ENM offer an alternative practice which preserve the time stamp on each observation. Each occurrence is paired with the environmental conditions it experienced at the time step it was recorded, rather than against a long-term average or other static variable. The resulting models are sensitive to year-to-year (or season-to-season) variation in the species-environment relationship, and predictions are projected as time series rather than as a single static surface.
TemporalModelR is an R package for building temporally explicit SDMs end-to-end, from raw occurrence records to time-resolved trend analyses. It provides a consistent and replicable workflow for implementing temporally explicit ENM.
TemporalModelR operates across two complementary dimensions to support temporally explicit modeling:
- E-space (Environmental Space) represents a location in terms of its environmental conditions, independent of geography. A species niche in E-space is its tolerance for a given set of environmental variables.
- G-space (Geographic Space) represents a location in terms of where it physically sits on the landscape. Every point in G-space has a corresponding location in E-space.
Because environmental conditions change over time, the same point in G-space may move through E-space across years, decades, or seasons. Traditional, temporally-static workflows assume both G-space and E-space to be stable in time. TemporalModelR assumes only that the niche (i.e., the species tolerance in E-space) is stable across the study period, while the E-space coordinates of any G-space location may change over time. Species observations are therefore matched to E-space data correct in both space and time, the niche is modeled in time-independent E-space. The resulting niche estimate is projected back onto G-space at explicit time periods to produce temporally dynamic predictions across time rather than a single static surface.
The package structures the SDM workflow around three phases:
- Preprocessing: aligning environmental rasters to a common grid, rarefying occurrence data in space and time, extracting environmental values matched to observation time, scaling rasters to a standardized range, partitioning into spatially and/or temporally independent cross-validation folds, and generating fold-stratified pseudoabsences for presence/absence models.
- Modeling: fitting one of four supported algorithms (GLM, GAM, random forest, hypervolume) per cross-validation fold, with automatic threshold selection and per-fold evaluation metrics in both E-space (time-independent) and G-space (time-specific).
- Postprocessing: combining per-fold predictions into a consensus binary stack for more straightforward assessments, classifying each pixel’s temporal trajectory via changepoint detection into never-suitable, always-suitable, no pattern, increasing, decreasing, or fluctuating in suitability, and aggregating patterns across user-defined spatial units like administrative regions or watersheds.
Every function in the package is built around the concept of time
placeholders in raster filenames (forest_cover_2005.tif,
pr_seas_2003_Spring.tif). Patterns map clean predictor names to
filenames; time_cols defines which columns in your occurrence data
correspond to those placeholders. For example, a column in species
occurrence data which labeled year with subsequent values 2001,
2002, 2005, and a user defined variable name as forest_cover_year,
would match to rasters forest_cover_2001, forest_cover_2002, and
forest_cover_2005. Likewise, a databse with both columns year and
month and a user defined variable name of pr_m_year_month would
match to rasters like pr_m_1997_09.
This makes the workflow flexible to whether your time scale is annual, monthly, daily, or anything in between, so long as you have rasters data at the same temporal level. It also allows different predictors to vary at different time scales within the same model, flexibly allowing temporally static variables (e.g. elevation) to be processed in the same suite and used in the same models as temporally dynamic variables, which may themselves be measured at different time steps (e.g. annual forest cover estimates, monthly precipitation).
Note: Internet connection is required to install the package.
The development version of TemporalModelR can be installed using the code below.
# install.packages("devtools")
devtools::install_github("CJHughes926/TemporalModelR")A typical TemporalModelR workflow follows these steps:
- Preprocess — align rasters, rarefy occurrences, extract and optionally scale environmental values, partition into folds, and generate pseudoabsences (if needed)
- Model — fit a GLM, GAM, random forest, or hypervolume model, assess time-independent model performance. Project the fitted model across space and time, assess temporally-explicit model performance
- Postprocess — summarize predictions to consensus surfaces, identify temporal patterns, and aggregate by spatial unit
A brief description of each step is presented below. For full walkthrough of each phase, see the vignettes hosted on the TemporalModelR site under the Articles menu, also linked inline throughout this README.
The package ships a small synthetic dataset that every vignette runs
against, so the full workflow can be reproduced end-to-end without
downloading any external data. The dataset includes raw and aligned
environmental rasters, an occurrence point database, intermediate point
files, and pre-computed data() objects from each phase of the
workflow. Reading the data set vignette first is recommended, as it
provides the shared context every other vignette refers back to.
For a description of the synthetic data and the bundled
data()objects used in the workflow vignettes, see the About the Example Dataset vignette.
The preprocessing pipeline transforms raw occurrence records and
environmental rasters into the structured inputs that downstream models
expect. Six functions cover this phase. raster_align() standardizes
rasters to a common projection, extent, and resolution to prevent
misalignment in future analyses. spatiotemporal_rarefaction() subsets
data to one point per pixel per time step, reducing pseudoreplication
from your data but preserving truly unique temporally independent
observations from the same location. temporally_explicit_extraction()
extracts environmental values from the raster that matches each point’s
observation time step, producing a table of observations with their
relevant E-space conditions correct in both space and time, along with
the means and standard deviations of each variable across the data set.
scale_rasters() optionally uses those scaling parameters to z-score
every raster, ensuring variables share a comparable range and contribute
evenly to subsequent modeling algorithms sensitive to that.
spatiotemporal_partition() builds cross-validation folds via one of
four methods: purely spatial, purely temporal, spatiotemporal (combining
both), or random as a baseline. Finally, generate_absences() produces
fold-stratified pseudoabsences relevant for presence/absence models.
For a detailed workflow showcasing each preprocessing function, see the Preprocessing temporally explicit data vignette.
TemporalModelR supports four modeling algorithms, all of which share the
same data requirements (partition_result, pseudoabsence_result for
presnce/absence models, a model_formula or model_vars). One model is
fit per cross-validation fold; each is evaluated on its held-out test
set, and for any continuous models a threshold is selected and applied
to produce only binary results.
build_temporal_glm()fits one generalized linear model per fold with a binomial link function. Supports any combination of linear, polynomial, and interactive terms via standard R formula syntax.build_temporal_gam()fits one generalized additive model per fold viamgcv::gam(). Allows for complex nonlinear relationships between environmental gradients and species presence.build_temporal_rf()fits one random forest classifier per fold viarandomForest::randomForest(). Uses ensembles of decision trees to capture complex nonlinear relationships and interactions.build_temporal_hv()constructs n-dimensional hypervolumes per fold via Gaussian kernel density estimation or one-class support vector machine. The presence-only option; no pseudoabsences required.
Each model builder produces evaluation metrics across folds in E-space (time-independent), capturing overall niche estimation quality regardless of when the test points were observed. E-space evaluation gives a stable picture of model performance even in years where occurrence records are sparse.
Once a model is fit, generate_spatiotemporal_predictions() projects it
across user-defined time steps. The function accepts any of the four
model types via the model_result argument and writes one prediction
raster per fold per time step alongside per-time-step evaluation
metrics. The same call handles single-timestep and multi-timestep
workflows given the relevant raster data is available: for example
time_steps = data.frame(year = 1:15) produces 15 annual maps;
expand.grid(year = 1:15, season = "Spring") produces 15 spring
snapshots; expand.grid(year = 1:15, season = c("Spring", "Summer"))
produces 30 maps spanning two seasons.
Once predictions are projected into G-space, time-specific assessment metrics become available, including cumulative binomial probability (CBP) tests assessing whether predictions in a given time step are better than random. G-space evaluation complements the E-space metrics from model fitting and both may be considered in model evaluation: G-space is robust and specific in years with substantial sample sizes but loses meaning when few records exist for a given time step, while E-space gives a more stable overall picture independent of time. Reporting both gives the most complete view of model performance.
For a full walkthrough of each modeling algorithm, including formula syntax, parameter choices, G-space projection, and diagnostic interpretation, see the four modeling vignettes: Modeling with a GLM, Modeling with a GAM, Modeling with a Random Forest, and Modeling with a Hypervolume.
The postprocessing phase converts the per-fold prediction rasters into
interpretable summaries. summarize_raster_outputs() collapses
fold-level predictions into a binary consensus stack (where a
user-specified number of folds agree on suitability) and a frequency
raster (proportion of time steps each pixel was suitable) which explored
the stability of G-space suitability projections across time.
analyze_temporal_patterns() uses the fastcpd package to detect
structured changes in each pixel’s suitability across the time series,
classifying trajectories as never-suitable, always-suitable, no pattern,
increasing, decreasing, or fluctuating. The changepoint analysis
accounts for temporal autocorrelation by including each pixel’s
previous-time-step value as a predictor in the change point detection
model; users can optionally enable spatial autocorrelation, which adds
the proportion of a pixel’s first-order (queen’s case) neighbors
predicted as suitable as an additional covariate.
analyze_trends_by_spatial_unit() aggregates pixel-level patterns
across user-defined polygons (administrative units, watersheds,
ecoregions), producing useful tables summarizing losses and gains in
suitability over time across a these regions, and is useful for
identifying regional variation in suitability trends.
For the full postprocessing workflow including consensus threshold selection, changepoint diagnostics, and zone aggregation, see the Postprocessing predictions vignette.
If you use TemporalModelR in your research, please cite:
Hughes, C., Castaneda-Guzman, M., & Escobar, L.E. (2026). TemporalModelR: An R Package for temporally-explicit species distribution modeling. [Journal information to be added]
To maintain high standards of code quality and documentation, we have used AI LLM tools while constructing code for this package. We used these tools for grammatical polishing and exploring technical implementation strategies for specialized functions. We manually checked and tested all code and documentation refined with these tools.
We welcome contributions to improve TemporalModelR. To maintain the
integrity and performance of the package, we follow a few core
principles:
- Quality over quantity: we prioritize well-thought-out, stable improvements over frequent, minor changes. Please ensure your code is well-documented and follows the existing style of the package.
- Minimal dependencies: one of the goals of TemporalModelR is to remain efficient. We prefer solutions that use base R or existing dependencies. Proposals that introduce new package dependencies will be evaluated for their necessity.
- AI-assisted code: if you use AI tools to generate code alternatives or improvements, please manually verify the logic and accuracy of the output and demonstrate the benefit in your Pull Request.
- Testing: new features should include examples, and tests should be performed to ensure they work as intended and do not break existing workflows.
If you have an idea for a major change, please open an Issue first to discuss it with the maintainers.
- Report bugs or request features: GitHub Issues
- Ask questions: GitHub Discussions