ShanKothari/CABO-trait-models

README: CABO Trait Models

This repository is meant to support two separate manuscripts:

  1. Plant spectra as integrative measures of plant phenotypes by Kothari & Schweiger (2022), Journal of Ecology DOI: 10.1111/1365-2745.13972. The relevant components of this manuscript are the analyses of the major dimensions of variation in leaf reflectance spectra (Section 4.2), as well as a toy example (Figure 3) of how trait covariance can aid leaf trait estimation using reflectance spectra.
  2. Predicting leaf traits across functional groups using reflectance spectroscopy by Kothari et al. (2022), bioRxiv DOI: 10.1101/2022.07.01.498461 (in press at New Phytologist). In this manuscript, we build and validate partial least-squares regression (PLSR) models to estimate leaf traits across a wide variety of functional groups and ecosystems.

The repository is maintained by Shan Kothari (shan.kothari [at] umontreal [dot] ca). A previous version of the repository is archived at Zenodo (DOI: 10.5281/zenodo.6820487) and represents the analyses carried out in the final version of paper (1). This version represents the analyses in manuscript (2) and will shortly be archived as a new release of the same repository at Zenodo (DOI forthcoming).

Components

The repository contains R scripts numbered as belonging to 10 distinct 'stages' of analysis, as well as stage 00 (useful functions that are called in later scripts).

The stages are:

  1. Processing reflectance and transmittance spectra from the raw form they were downloaded in (see https://data.caboscience.org/leaf/). The spectra have already been resampled to 1 nm resolution, but here I interpolate over the sensor overlap region, average leaves for a sample, and apply a Savitzky-Golay filter. These steps are done per project.
  2. Compiling spectral data from across projects.
  3. Attaching trait data to the spectral data.
  4. Dividing up training and testing data for PLSR analyses.
  5. Calibrating models using different kinds of data ([A + D + G] reflectance, [B] transmittance, [C] absorptance, [E] brightness-normalized reflectance, and [F] continuum-removed reflectance) and traits ([D] area-based vs [all others] mass-based chemical traits). I also tried various transformations of trait data ([G]) to see whether they improved model performance. Here I mostly follow the approach laid out by Burnett et al. (2021) Journal of Experimental Botany.
  6. Plotting (A) model predictions from reflectance alone, (B) model predictions compared across reflectance, transmittance, and absorptance, and (C) model coefficients.
  7. Comparing trait distributions in the data to TRY data.
  8. Evaluating transferability of models across functional groups.
  9. Externally validating the models built in step 5 across LOPEX, ANGERS, and Dessain data.
  10. Identifying the dimensionality and visualizing major dimensions of the spectral data, along with some miscellaneous analyses described in manuscript (1) above.
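
The second and third operations of stage 1 (averaging leaves within a sample and applying a Savitzky-Golay filter) can be sketched in base R as below. This is only an illustration and not the code in the repository's scripts: the toy data, the object names, the window size of 5 bands, and the polynomial order of 2 are all assumptions, and interpolation over the sensor overlap region is not shown.

```r
## Toy data: 6 leaf spectra (rows) from 2 samples, 21 bands (columns).
## All names and values are invented for illustration.
set.seed(1)
bands <- 400:420
sample_id <- rep(c("sampleA", "sampleB"), each = 3)
leaf_spec <- matrix(0.4 + rnorm(6 * length(bands), sd = 0.01),
                    nrow = 6, dimnames = list(NULL, bands))

## Average the leaves belonging to each sample:
## sum the rows per sample, then divide each row by its leaf count
sums <- rowsum(leaf_spec, group = sample_id)
sample_spec <- sums / as.vector(table(sample_id)[rownames(sums)])

## Savitzky-Golay smoothing weights for an odd window and a polynomial order.
## A local polynomial is fit by least squares within the window; the weights
## below give the fitted value at the window centre.
sg_coefs <- function(window, order) {
  stopifnot(window %% 2 == 1, order < window)
  half <- (window - 1) / 2
  A <- outer(-half:half, 0:order, `^`)
  solve(crossprod(A), t(A))[1, ]
}

## Apply the weights as a centred moving filter.
## The first and last (window - 1) / 2 bands are returned as NA.
sg_smooth <- function(y, window = 5, order = 2) {
  as.numeric(stats::filter(y, sg_coefs(window, order), sides = 2))
}

## Smooth each sample spectrum (rows = samples, columns = bands)
smoothed <- t(apply(sample_spec, 1, sg_smooth))
```

Because the edge bands come back as NA in this sketch, a real workflow would need to trim them or handle them some other way.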

Manuscript (1) involves only scripts 1-3 and 10; manuscript (2) involves all scripts. The processed spectral and trait data produced at the end of script 3, and read in at the beginning of script 4, are archived elsewhere (see Associated data products below). To reproduce analyses from manuscript (2), I recommend starting at script 4 by reading in the archived data and proceeding from there, clearing your environment between scripts. To reproduce analyses from manuscript (1), proceed directly to script 10.
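
For readers starting at script 4, the idea of dividing samples into training and testing sets can be sketched as follows. This README does not state how the split is done in the scripts, so everything here is an assumption for illustration: the 80/20 proportion, the stratification by functional group, and the column names `sample_id` and `functional_group`.

```r
## Toy metadata: 40 samples in 4 functional groups (invented for illustration)
set.seed(42)
traits <- data.frame(
  sample_id = sprintf("s%03d", 1:40),
  functional_group = rep(c("broadleaf", "conifer", "graminoid", "forb"), each = 10)
)

train_frac <- 0.8  ## assumed proportion, not taken from the scripts

## Within each functional group, randomly draw row indices for training
rows_by_group <- split(seq_len(nrow(traits)), traits$functional_group)
train_idx <- unlist(lapply(rows_by_group, function(idx) {
  idx[sample.int(length(idx), round(train_frac * length(idx)))]
}))

train <- traits[train_idx, ]
test <- traits[-train_idx, ]
```

Stratifying in this way keeps every group represented in both sets, which matters when models are later evaluated across functional groups.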

How to use

This repository is not a software package or user-oriented product that can be used without modification. It is meant to be a reasonably well-documented and faithful record of the analyses carried out in the two manuscripts listed above. Some analyses should be easily reproducible, with some modification, given the scripts and the archived data (see Associated data products below). However, we do not include (for example) the TRY data, the LOPEX and ANGERS data, or the model coefficients from the Serbin et al. (2019) paper that both of our papers cite. These can all be downloaded from the original sources cited in those papers. Users will also have to (for example) change paths to point to the directories where they have saved the files locally.

Associated data products

There are a few associated data products:

  1. The main CABO dataset, including traits and spectra, is found here at EcoSIS. Reflectance, transmittance, and absorptance spectra are available.
  2. Our own Dessain data used in the external validation are available at that EcoSIS link.
  3. LOPEX and ANGERS data used in the external validation are available at those respective links.
  4. In script 10, I read in fresh-, pressed-, and ground-leaf spectra from another manuscript (Kothari et al. 2022 Methods in Ecology and Evolution) available at those respective links.
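
The relationships among the kinds of spectral data mentioned above and in stage 5 can be sketched in base R. The sketch assumes the standard definition of absorptance as 1 minus reflectance minus transmittance, and brightness normalization as division of each spectrum by its Euclidean norm; the scripts may implement these differently, and the toy values and the function name `brightness_normalize` are invented for illustration.

```r
## Toy reflectance and transmittance: 2 samples x 5 bands (invented values)
band_labels <- as.character(c(450, 550, 850, 1650, 2200))
refl <- matrix(c(0.05, 0.10, 0.45, 0.40, 0.20,
                 0.06, 0.12, 0.50, 0.42, 0.22),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("s1", "s2"), band_labels))
trans <- matrix(c(0.02, 0.08, 0.48, 0.42, 0.25,
                  0.03, 0.09, 0.44, 0.40, 0.24),
                nrow = 2, byrow = TRUE,
                dimnames = dimnames(refl))

## Absorptance: the fraction of light neither reflected nor transmitted
absorp <- 1 - refl - trans

## Brightness normalization: scale each spectrum (row) to unit Euclidean length
brightness_normalize <- function(x) x / sqrt(rowSums(x^2))
refl_bn <- brightness_normalize(refl)
```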

Even more raw data can be queried from the CABO Data Portal.

In most of the scripts, I use the CRAN-hosted package spectrolab v. 0.0.10 to handle the spectral data. In this package, the class spectra allows users to attach metadata to spectral data and retrieve it using the function meta(). Below, you can find an example script that reads a .csv file, like our archived data, and turns it into an R spectra object like the ones I work with in the scripts.

library(spectrolab)

spec_df<-read.csv("mydata.csv")
name_var<-1 ## index for the column that contains sample names
meta_vars<-2:20 ## adjust as needed: indices for columns that contain metadata (including traits)
band_names<-400:2400 ## wavelengths of spectral bands corresponding to remaining columns

## you can also use the as_spectra command, but it's a bit more finicky 
## with data frames because the column names of bands must contain a letter
spec<-spectra(value = spec_df[,-c(name_var,meta_vars)],
              bands = band_names,
              names = spec_df[,name_var],
              meta = spec_df[,meta_vars])
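
Before building the spectra object, it can help to confirm that the columns left over after removing the name and metadata columns line up with the band vector. The base R sketch below shows one way to check this on a toy data frame standing in for the output of read.csv(); the column names, species, and values are invented for illustration. By default, read.csv() prepends an "X" to column names that start with a digit, which is why the check strips that prefix.

```r
## Toy stand-in for read.csv("mydata.csv"):
## 3 samples, 2 metadata columns, 5 band columns (all values invented)
band_names <- 400:404
spec_df <- data.frame(
  sample_id = c("s1", "s2", "s3"),
  species = c("Acer rubrum", "Acer rubrum", "Betula papyrifera"),
  LMA = c(0.071, 0.065, 0.058),
  matrix(runif(15, 0.05, 0.5), nrow = 3,
         dimnames = list(NULL, paste0("X", band_names)))
)
name_var <- 1   ## index for the column that contains sample names
meta_vars <- 2:3 ## indices for columns that contain metadata

## Columns that would be passed as the spectral values
value_cols <- spec_df[, -c(name_var, meta_vars)]

## Recover numeric wavelengths from the column names
parsed_bands <- as.numeric(sub("^X", "", colnames(value_cols)))

stopifnot(
  ncol(value_cols) == length(band_names),          ## one column per band
  all(vapply(value_cols, is.numeric, logical(1))), ## no stray text columns
  all(parsed_bands == band_names)                  ## columns in band order
)
```

If any of these checks fails, the most likely cause is that meta_vars does not cover all of the metadata columns in the file.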

Questions?

Please feel free to reach me at shan.kothari [at] umontreal [dot] ca with questions about the code or the data. I'm glad to help you adapt any of the approaches I use for your own purposes. If you draw heavily from my code, I would ask that (purely as a courtesy) you cite one of the two papers above, whichever is more relevant.