This repository contains functions and analysis scripts required to reproduce results reported by Anderegg et al. (2019).
Jonas Anderegg
Crop Science Group
ETH Zürich
Analysis of variance is performed using the R package asreml. Computationally demanding steps in the analysis are designed to run on a cluster, requiring the R packages doParallel, future, and furrr. Scripts build primarily on functions from the R packages prospectr, caret, and tidyverse.
Folder Utils contains functions for the pre-processing of spectra, calculation and evaluation of spectral indices, training and evaluation of full-spectrum models, recursive feature elimination and analysis of variance.
Folder Analysis contains scripts to implement the analysis and obtain results contained in the study.
1.-3. Helper functions to remove incomplete datasets (Lots, Years, measurement series).
- f_spc_smooth applies the Savitzky-Golay smoothing filter to raw spectra (wrapper for prospectr::savitzkyGolay).
- f_spc_avg averages replicate measurements.
- f_calc_si calculates spectral indices.
- f_calc_sen_si calculates all spectral indices of the same form as the PSRI, to enable the wavelength sensitivity analysis.
- f_scale_si scales spectral indices and predictions obtained from full-spectrum models to range from 0 to 10.
- f_spc_bin computes average values of a signal in pre-determined bins (wrapper for prospectr::binning).
- f_spc_trim removes noisy parts of the signal in pre-determined ranges.
- f_cont_rem performs continuum removal (wrapper for prospectr::continuumRemoval).
- f_match_join joins spectral and scoring datasets by assessment/measurement dates, using a supplied template of matching dates. This function also adds growing degree days for scoring and measurement time points.
- get_dynpars_SI extracts dynamics parameters from spectral indices.
- get_errors_and_dynpars performs linear interpolation of scorings and index values (data in growing degree days after heading), extracts dynamics parameters and calculates the overall error between the two fitted curves. Parametric models can be used instead of linear interpolation.
- Several data wrangling helper functions.
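The interpolation step described for get_errors_and_dynpars can be illustrated with a minimal base R sketch. It is not the repository's implementation: the data, the helper names interp_on_grid and gdd_at_value, the choice of mean absolute difference as the error measure, and the choice of "GDD at which the curve reaches 5" as a dynamics parameter are all assumptions made for illustration.

```r
# Hypothetical data: values on a 0-10 scale against growing degree days (GDD)
# after heading. None of these numbers come from the study.
scoring <- data.frame(gdd = c(0, 200, 400, 600, 800),
                      value = c(10, 9, 6, 2, 0))
si <- data.frame(gdd = c(0, 150, 450, 650, 800),
                 value = c(10, 9.5, 5, 1.5, 0))

# Linearly interpolate a curve onto a common GDD grid.
interp_on_grid <- function(gdd, value, grid) {
  stats::approx(x = gdd, y = value, xout = grid, method = "linear")$y
}

grid <- seq(0, 800, by = 10)
scoring_fit <- interp_on_grid(scoring$gdd, scoring$value, grid)
si_fit <- interp_on_grid(si$gdd, si$value, grid)

# Assumed error measure: mean absolute difference between the two curves.
curve_error <- mean(abs(scoring_fit - si_fit))

# Assumed dynamics parameter: GDD at which a monotonically declining curve
# reaches a given value, found by interpolating GDD as a function of value.
gdd_at_value <- function(gdd, value, target = 5) {
  stats::approx(x = value, y = gdd, xout = target)$y
}

t50_scoring <- gdd_at_value(scoring$gdd, scoring$value)
t50_si <- gdd_at_value(si$gdd, si$value)
```

Interpolating both curves onto the same grid makes them directly comparable even when scorings and spectral measurements were taken on different dates. The inverse interpolation in gdd_at_value only makes sense for curves that decline monotonically.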
- get_rmse calculates the root mean square error (RMSE).
- perform_sampling performs up- or down-sampling to create balanced evaluation datasets (wrapper for caret::upSample or caret::downSample).
- get_RMSE_sample calculates the root mean square error (RMSE) of up- or down-samples.
- extract_predobs creates predictions using a fitted model and extracts the corresponding real observations.
- eval_full_spc_cross fits full-spectrum models, extracts performance metrics, performs sampling (up- or down-sampling of both training and test data), extracts predictions and real observations for these samples, and performs all possible variants of leave-year(s)-out cross-validation. This is essentially a wrapper function for pls::plsr and Cubist::cubist via the caret::train interface. This function is designed to run in parallel on the high performance computing cluster of ETH Zürich. It must be run sequentially if no access to such a facility is available. Very long computation times should be expected.
- Several data wrangling helper functions.
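To make "all possible variants of leave-year(s)-out cross-validation" concrete, the following base R sketch enumerates the train/test year combinations and defines an RMSE helper. The function names leave_years_out_splits and rmse are hypothetical and not taken from the repository, and the sketch assumes that at least one year must always remain for training.

```r
# Enumerate every way of holding out one or more years for testing while
# keeping at least one year for training (an assumption of this sketch).
leave_years_out_splits <- function(years) {
  years <- sort(unique(years))
  sizes <- seq_len(length(years) - 1)
  test_sets <- unlist(
    lapply(sizes, function(k) utils::combn(years, k, simplify = FALSE)),
    recursive = FALSE
  )
  lapply(test_sets, function(test) {
    list(train = setdiff(years, test), test = test)
  })
}

# Root mean square error between observed and predicted values.
rmse <- function(obs, pred) {
  sqrt(mean((obs - pred)^2))
}

# Usage with hypothetical experiment years
splits <- leave_years_out_splits(c(2016, 2017, 2018))
for (s in splits) {
  cat("train:", paste(s$train, collapse = ", "),
      "| test:", paste(s$test, collapse = ", "), "\n")
}
```

With three years, this produces three leave-one-year-out and three leave-two-years-out splits. Each split would then be passed to model fitting and evaluated on the held-out years.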
- perform_rfe performs recursive feature elimination using random forest or cubist regression as base learners. This is essentially a wrapper function for ranger::ranger and Cubist::cubist using the caret::train interface.
- tidy_rfe_output gathers results of resamples and creates summary statistics of performance and feature ranks.
- plot_perf_profile creates simple performance profile plots.
- f_spats fits the SpATS model to raw plot-based data (wrapper for SpATS::SpATS).
- get_h2 reports repeatability, or NA if no replicate measurements are available. This is a wrapper for SpATS::getHeritability.
- spat_corr_spats computes spatially corrected values for each experimental plot from the spats object (object$intercept + object$coeff + object$residuals).
- get_BLUE_spats computes the best linear unbiased estimator (BLUE) for each genotype.
- get_h2_asreml calculates heritability across years using spatially corrected plot values and the formula proposed by Cullis.
- get_h2_asreml2 calculates heritability across years using best linear unbiased estimators.
- Data wrangling helper functions.
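For orientation, the heritability measure proposed by Cullis et al. (2006) is commonly written as shown below. The notation is a conventional choice and is not taken from the repository; how get_h2_asreml obtains the individual terms from the fitted model is not shown here.

```latex
H^2_{\mathrm{Cullis}} = 1 - \frac{\bar{v}_{\Delta}^{\mathrm{BLUP}}}{2\,\sigma^2_g}
```

Here $\bar{v}_{\Delta}^{\mathrm{BLUP}}$ denotes the mean variance of a difference between two genotypic BLUPs and $\sigma^2_g$ the genotypic variance.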
- 01_prep_spcdat.R: Prepare spectral datasets for evaluation of spectral indices and full-spectrum models.
- 02_full_spc_cv.R: Train and evaluate full-spectrum models.
- 03_perform_rfe.R: An example script that performs recursive feature elimination (RFE) using cubist as base learner to select the most important wavelengths for predicting visual senescence scorings in the 2016 experiment. This script is designed to run on the high performance computing cluster of ETH Zürich. It must be run sequentially if no access to such a facility is available. Expect very long computation times.
- 04_get_dynpars.R: Extract dynamics parameters from visual senescence scorings.
- 05_get_dynpars_SI.R: Extract dynamics parameters from spectral indices.
- 06_data_prep_aov.R: Assemble all data for correction of spatial heterogeneity.
- 07_spatial_w2_h2.R: Correct for spatial heterogeneity, calculate repeatability (w2) and across-year heritability (h2). Calculate best linear unbiased estimators.
- 08_SI_perf.R: Evaluate performance of spectral indices in tracking visually observed senescence dynamics.
- 09_full_spc_perf.R: Evaluate performance of full-spectrum models in tracking visually observed senescence dynamics.
- 10_data_prep_spctemp_rfe.R: Assemble datasets for feature selection.
- 11_perform_rfe_feats.R: Perform feature selection.