Identifying the influence of past range shifts of the rare butternut tree using genetic diversity, geographic locations, and species distribution modeling.

HobanLab/Butternut

Project Description

This repository contains R scripts created for a project on Juglans cinerea, also known as butternut. We are interested in determining how butternut's range has shifted in response to modern and past climate change. Hoban et al. (2010) determined that butternut's genetic patterns were most influenced by post-glacial migration northward, so this study added individuals sampled by the Jeanne Romero-Severson and Martin Williams labs to test whether these results hold with more comprehensive sampling in the species' northern range. We also wanted to examine past sources of data more specifically to identify the range shifts of this taxon. The main focuses of this project are therefore listed here:

[Figure: diagram listing the main focuses of this project]

The code for performing two of the steps is in this GitHub repository; the code for downloading, projecting, and creating pollen and glacier maps was created by Alissa Brown (https://github.com/alissab/juglans). This repository is split into multiple folders: Archive, Images, SDMs, and Genetic_Analyses.

Folder Descriptions

Archive: This folder holds code that was used in the initial steps of this analysis but did not end up in the final manuscript. Most of this code was run on more individuals than ended up in the final results (some were removed due to isolation or genetic relatedness) or on loci that differed based on scoring year.

Images: This folder contains images generated for organizing the Github and population maps.

SDMs: This folder holds the code to run species distribution models (SDMs), specifically boosted regression tree models as described in Elith et al. (2008). The code published here is based on code from Peter Breslin (ASU) and Fabio Suzart de Albuquerque. Part of this project aimed to identify butternut's ecological preferences using species distribution modeling and then to model habitat requirements into the past. This folder contains the code for all of these analyses. Here is a conceptual diagram of the steps and file names:

[Figure: conceptual diagram of the SDM steps and file names]

The order in which these steps were performed is indicated by the number in each script's name. All R scripts used for SDM analysis have "SDM" written next to the number and are briefly summarized below:

  1. First, occurrence records were downloaded from herbarium databases (in this case, Global Biodiversity Information Facility (GBIF), Integrated Digitized Biocollections (iDigBio), South East Regional Network of Expertise and Collections (SERNEC), Botanical Information and Ecology Network (BIEN), and the national network of forest survey plots managed by the Forest Inventory and Analysis Program (FIA) of the USDA Forest Service) using code based on code created by Emily Beckman (https://github.com/esbeckman/IMLS_Beckman). These occurrences were then used to create a range polygon buffered out 100 km from the range edges (which was eventually used for genetic diversity analyses).
  2. Occurrence records were thinned to reduce spatial autocorrelation: where points fell within 1 km of one another, only one point was retained.
  3. Pseudo-absence points were generated in equal number to presence points.
  4. Following this, 19 WorldClim variables were extracted at every point. Variables that were most correlated with presence and least correlated with one another (correlation coefficient < 0.5) were selected to model species distribution. This was done by generating a dissimilarity matrix and choosing, within each cluster of highly correlated climate variables, the variable that had the highest correlation with presence.
  5. The final boosted regression tree model was run with five variables. From these predictors, a model of suitable habitat was built and then hindcast onto past climate scenarios for eight time periods representing notable periods in climatic history from the last interglacial onward, available from the PaleoClim database (Brown et al., 2018): last interglacial (ca. 130 ka); LGM (ca. 22 ka); Heinrich Stadial (17-14.7 ka); Bølling-Allerød (14.7-12.9 ka); Younger Dryas (12.9-11.7 ka); early Holocene (11.7-8.326 ka); mid-Holocene (8.326-4.2 ka); and late Holocene (4.2-0.3 ka), where ka = thousand years before present.
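Steps 2 and 3 above can be sketched in base R as follows. This is a minimal illustration, not the repository's script: the column names (`lon`, `lat`, `pres`), the helper functions `haversine_km` and `thin_points`, the toy coordinates, and the choice to draw pseudo-absences uniformly from the bounding box of the occurrences are all assumptions made for the example.

```r
# Great-circle (haversine) distance in km between one point and a vector of points.
# Assumes unprojected longitude/latitude in decimal degrees.
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Greedy thinning: keep a point only if it lies at least `min_km`
# from every point that has already been kept.
thin_points <- function(pts, min_km = 1) {
  keep <- integer(0)
  for (i in seq_len(nrow(pts))) {
    d <- haversine_km(pts$lon[i], pts$lat[i], pts$lon[keep], pts$lat[keep])
    if (all(d >= min_km)) keep <- c(keep, i)
  }
  out <- pts[keep, , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Toy occurrence records (invented coordinates); two pairs lie well within 1 km.
occ <- data.frame(
  lon = c(-85.0000, -85.0010, -84.0000, -83.5000, -83.5005),
  lat = c( 40.0000,  40.0010,  41.0000,  42.0000,  42.0000)
)
presence <- thin_points(occ, min_km = 1)

# Pseudo-absences in equal number to the thinned presences, drawn uniformly
# from the bounding box of the occurrences (an assumption for this sketch).
set.seed(42)
n <- nrow(presence)
pseudo_abs <- data.frame(
  lon = runif(n, min(occ$lon), max(occ$lon)),
  lat = runif(n, min(occ$lat), max(occ$lat))
)

# Combined table with presence coded 1 and pseudo-absence coded 0.
pa <- rbind(cbind(presence, pres = 1), cbind(pseudo_abs, pres = 0))
```

The greedy rule depends on the order of the input rows, so a different ordering can retain a different (but equally valid) subset of points.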

Files within SDMs folder

  • InputFiles
    • Description: All input files utilized for generating butternut's species distribution model, habitat suitability maps, and projecting the habitat suitability maps into the past.
    • bio_2-5m_bil
      • Description: Folder containing all 19 current (averages over 1970 - 2013) bioclimatic layers used to generate butternut's species distribution model, downloaded at 2.5 arc-minute resolution.
    • bound_p
      • Description: Folder containing the shapefile used to limit the area of extent to make butternut range maps.
    • occurrence_records
      • Description: Folder containing all of the raw records used to create the full occurrence record file for this species, titled butternut_complete_occurrence.
    • Paleo_Files
      • Description: This folder contains the values of all 19 bioclimatic variables during the last 130,000 years. These variables were used to project butternut's distribution model into the past and are separated into 8 time points. These bioclimatic values were all downloaded at 2.5 arc-minute resolution from PaleoClim.
    • butternut_abs: CSV of pseudo-absence points used in generating the boosted regression trees that are used to predict butternut's suitable habitat.
    • butternut_pa: CSV of all presence and pseudo-absence points used in the BRT model generating butternut's species distribution, with an additional column indicating whether each point is a presence point (1) or an absence point (0).
    • butternut_var: CSV of all presence and pseudo-absence points (indicated by a 1 and 0, respectively) with the values of all 19 bioclimatic variables at the location of the occurrence record extracted to each point.
    • elevation_extent: TIF file of North American elevation limited to the extent of the analysis.
    • extent_project: TIF file of North American elevation limited to the extent of this analysis, projected to the Albers Equal Area Conic projection.
    • occurrence_noauto_noproj: Occurrence records used in this analysis, cleaned for spatial autocorrelation and not projected.
  • OutputFiles
    • Description: All of the result files generated when creating the species distribution model of butternut.
    • PaleoClim_HSMs
      • Description: PDF and images of the habitat suitability maps for butternut from 130,000 years ago to the present.
    • biseral_cor_matrix: CSV of the biserial correlation coefficients between presence points and each bioclimatic variable.
    • contribution_bp_allvar: PDF of selected bioclim variables used to generate butternut's SDM and their percent contribution to the final SDM.
    • contribution_bp_allvar: PDF and image of the neighbor-joining tree of dissimilarity among the bioclim variables. Variables are joined from least dissimilar to most dissimilar. The threshold for non-significant autocorrelation is marked with a red line at 0.5 ranked dissimilarity.
    • extent_pointsnoauto: PDF of butternut's habitat suitability map with occurrence records plotted on it.
    • hsm: PDF of butternut's habitat suitability map.
    • hsm_a: PDF of butternut's habitat suitability map with pseudo-absence points plotted on top of it.
    • hsm_pa: PDF of butternut's habitat suitability map with presence and pseudo-absence points plotted on top of it.
    • hsm_worldclim_only_raster: Raster of butternut's habitat suitability map.
    • worldclim_stack: RData file, TIF file, and GRD file of all selected worldclim variables stacked.
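The variable selection behind the biserial correlation matrix and the dissimilarity tree listed above can be sketched in base R. This is a hedged illustration on simulated data, not the repository's code: the variable names (`bio1`, `bio5`, `bio12`, `bio15`) and simulated values are invented, dissimilarity is taken as 1 - |r|, and average-linkage `hclust` is swapped in for the neighbor-joining tree named in the file description. It relies on the fact that the point-biserial correlation equals Pearson's r computed against a 0/1 variable.

```r
set.seed(1)

# Simulated presence (1) / pseudo-absence (0) indicator.
n <- 200
pres <- rep(c(1, 0), each = n / 2)

# Simulated climate values: bio1 and bio5 are two noisy copies of the same
# temperature signal, bio12 is weakly related to presence, bio15 is pure noise.
temp  <- rnorm(n, mean = 10 + 3 * pres)
clim <- data.frame(
  bio1  = temp + rnorm(n, sd = 0.2),
  bio5  = temp + rnorm(n, sd = 0.2),
  bio12 = rnorm(n, mean = 800 + 40 * pres, sd = 100),
  bio15 = rnorm(n)
)

# Point-biserial correlation of each variable with presence.
pres_cor <- sapply(clim, function(v) cor(v, pres))

# Dissimilarity among variables, then a tree cut at 0.5 so that variables
# closer than the threshold fall into the same cluster.
dissim   <- as.dist(1 - abs(cor(clim)))
tree     <- hclust(dissim, method = "average")
clusters <- cutree(tree, h = 0.5)

# Within each cluster, keep the variable most strongly correlated with presence.
selected <- sapply(split(names(clusters), clusters), function(vars) {
  vars[which.max(abs(pres_cor[vars]))]
})
selected
```

Only one of the two redundant temperature variables survives the selection, which is the purpose of clustering before ranking by correlation with presence.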

Genetic_Analyses: The genetic diversity portion of this project is contained in the Genetic_Analyses folder, which holds the R scripts to run genetic diversity and structure analyses, along with the regressions between genetic diversity and geographic location. We used genetic data from Hoban et al. (2010) and from newer sampling efforts on butternut from 2011 - 2015. These individuals were collected by Jeanne Romero-Severson, Sean Hoban (https://github.com/smhoban), and Martin Williams over the course of nearly ten years, with a major sampling effort around 2009 followed by another round of sampling in 2012 - 2015. The initial individuals collected were genotyped by Sean Hoban, and subsequent individuals were genotyped non-consecutively in the Romero-Severson lab at Notre Dame. The order in which these analyses were performed is indicated by a preceding number. The script labeled "comparison_barplot" was used to determine whether there were scoring differences between researchers, which led to some re-binning analyses to ensure consistent allele scoring where researchers differed. Next, the code for removing individuals based on missing data and relatedness was written so that PCoA and Structure could be run. The individuals were also plotted on a map following removal for missing data. Mean latitude, longitude, allelic richness, and heterozygosity were then calculated for all populations. Finally, a loop was written to compare the mean latitude and distance to range edge of each population to genetic diversity.
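The final regression loop described above might look like the following base R sketch. The data frame and its column names (`mean_lat`, `dist_edge_km`, `all_rich`, `hexp`) are hypothetical, and the values are invented purely so the example runs; they are not results from this project.

```r
# Invented per-population summary table (NOT real butternut results).
pop_df <- data.frame(
  mean_lat     = c(35.2, 36.8, 38.1, 39.5, 41.0, 42.3, 43.9, 45.4),
  dist_edge_km = c(120, 310, 450, 520, 480, 330, 210, 90),
  all_rich     = c(8.9, 8.6, 8.1, 7.9, 7.4, 7.2, 6.8, 6.5),
  hexp         = c(0.82, 0.81, 0.80, 0.79, 0.77, 0.76, 0.75, 0.73)
)

# Every combination of diversity statistic (response) and geographic predictor.
results <- expand.grid(
  response  = c("all_rich", "hexp"),
  predictor = c("mean_lat", "dist_edge_km"),
  stringsAsFactors = FALSE
)
results$r2      <- NA_real_
results$p_value <- NA_real_

# Fit one simple linear regression per combination and store R2 and the
# p-value of the slope.
for (i in seq_len(nrow(results))) {
  model_formula <- reformulate(results$predictor[i], response = results$response[i])
  fit <- summary(lm(model_formula, data = pop_df))
  results$r2[i]      <- fit$r.squared
  results$p_value[i] <- fit$coefficients[2, "Pr(>|t|)"]
}
results
```

Storing the output as one row per response and predictor pair gives a table with R2 and p-value columns, the same kind of summary the `*_rp_df` result files are described as holding.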

Folders within Genetic_Analyses

data_files: Data files generated during some of the analyses are stored here. before_reorg contains the 3-population genind object used for the comparison barplot code; after_reorg contains all of the genind objects and data frames generated following the re-binning process. geographic_files contains many of the geographic files used in analyses, including the range shapefile with the 100 km buffer (butternut_buffer) and many of the mean population longitude and latitude data frames.

genetic_analyses_results: All of the outputs from genetic analyses that were used in the final publication, including figures and data frames, are stored here.

Files within Genetic_Analyses

  • data_files
    • after_reorg
      • Description: Following geographic reorganization into populations and rebinning analysis, these are the files used in the final genetic analyses included in the manuscript.
      • butternut_24pop.gen: Genepop file of 1721 butternut individuals following geographic reorganization into 24 populations.
      • butternut_24pop_nomd: Genepop, Genalex, CSV, and Arlequin files of genotypes of butternut individuals used for the main genetic analyses (cleaned for clones and missing data). A file name that includes "lonlat" contains coordinates; a file name that includes "relate" is used for relatedness analysis.
      • butternut_24pop_relate_red: CSV, Genepop, and Arlequin files of genotypes of butternut individuals, reduced by removing individuals with missing data or high relatedness, leaving 993 individuals.
      • butternut_26pop_nomd: Genepop and Arlequin files of genotypes of butternut individuals including Quebec population individuals. CSV file titled "lonlat" includes coordinate information for all individuals.
      • butternut_26pop_relate_red: Genepop, Arlequin, and CSV files of genotypes of butternut individuals reduced by relatedness but with Quebec butternut individuals.
    • before_reorg
      • Description: Genepop files from initial analyses on butternut individuals; rebinning and geographic analyses were performed on these individuals to yield the data files in the "after_reorg" folder.
      • butternut_3pops: Genepop file of all 1761 initial butternut individuals, sorted by the person who scored the data files. This file was used in the initial rebinning analysis to determine whether scoring differed among individual scorers.
      • butternut_44pops: Genepop file of all 1761 initial butternut individuals, which was used for initial genetic and geographic analyses.
    • geographic_files
      • Description: Files used in generating maps and all geographic analyses for butternut.
      • butternut_buffer: Shapefile of butternut's total range, generated from occurrence records and buffered out by 100 km and cleaned to have smooth edges.
      • buffernut_buffer_map: Map of mean population coordinates plotted over the butternut range buffer.
      • butternut_coord_df: CSV of the mean coordinates of the main 24 populations used in this analysis with raw names and color-coded by population.
      • butternut_dist_edge_df: CSV of the distance to range edge for each of the 24 populations.
      • max_min_lonlat: CSV of the minimum and maximum coordinates of the butternut range.
  • genetic_analyses_results
    • Clustering_Analyses
      • STR
        • Butternut_Structure_24pops: Folder containing results of Structure runs on 993 individuals.
        • Butternut_Structure_26pops: Folder containing results of Structure runs on 1005 individuals.
      • Butternut_24pop_PCoA: PDF, PNG, and Inkscape file of a PCoA of 993 butternut individuals.
      • Butternut_24pop_Structure: Structure diagram of 993 individuals at the best-supported K value.
      • Butternut_24pop_Structure_alldeltaK_Supplement.PNG: Image file of several delta K values for 993 butternut individuals, presented in the supplement.
      • Butternut_26pop_PCoA_Structure: PDF, PNG, and Inkscape file of a PCoA of the 1005 butternut individuals including Quebec individuals.
    • Diversity_Analyses
      • butternut_24pop_allrich_rp_df.csv: R2 and p-values for regressions between the geographic statistics of the 24 butternut populations (mean population latitude; distance to range edge) and allelic richness.
      • butternut_24pop_gendiv_stat_df.csv: Genetic diversity summary statistics for the 24 butternut populations.
      • butternut_24pop_hexp_rp_df.csv: R2 and p-values for regressions between the geographic statistics of the 24 butternut populations (mean population latitude; distance to range edge) and expected heterozygosity.
      • butternut_24pop_hwe.csv: Deviation from Hardy-Weinberg equilibrium for each of the 24 populations.
      • butternut_24pop_ld_loci.csv: Linkage disequilibrium between each of the 11 loci.
      • butternut_26pop_allrich_rp_df
      • butternut_26pop_hexp_rp_df
    • Rebinning_Analyses
      • PCoA
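As a reminder of what the expected heterozygosity values in Diversity_Analyses measure, here is a minimal base R sketch computing the number of alleles and expected heterozygosity (1 minus the sum of squared allele frequencies) per population at a single locus. The toy genotypes, the column names (`pop`, `a1`, `a2`), and the helper `expected_het` are invented for illustration. The project itself lists packages such as adegenet and hierfstat, which this sketch does not use, and the allelic richness reported in the result files may involve rarefaction that is not shown here.

```r
# Toy diploid genotypes at one microsatellite locus (invented allele sizes):
# one row per individual, one column per allele copy.
geno <- data.frame(
  pop = c("A", "A", "A", "A", "B", "B", "B", "B"),
  a1  = c(150, 150, 152, 154, 150, 150, 150, 152),
  a2  = c(152, 154, 154, 156, 150, 150, 152, 152)
)

# Expected heterozygosity: 1 - sum of squared allele frequencies
# (no small-sample correction applied).
expected_het <- function(alleles) {
  p <- table(alleles) / length(alleles)
  1 - sum(p^2)
}

# Summarise each population: pool both allele copies, count distinct alleles,
# and compute expected heterozygosity.
div <- do.call(rbind, lapply(split(geno, geno$pop), function(g) {
  alleles <- c(g$a1, g$a2)
  data.frame(
    pop       = g$pop[1],
    n_alleles = length(unique(alleles)),
    hexp      = expected_het(alleles)
  )
}))
div
```

Population A carries four alleles at fairly even frequencies and therefore has a higher expected heterozygosity than population B, which carries only two.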

R Information

R version 4.0.5 (2021-03-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] poppr_2.9.2 Demerelate_0.9-3 adegenet_2.1.3 ade4_1.7-17 diveRsity_1.9.90 rgdal_1.5-23 sp_1.4-5 pegas_1.0-1 ape_5.5 PopGenReport_3.0.4 hierfstat_0.5-7

References

Brown, J. L., Hill, D. J., Dolan, A. M., Carnaval, A. C., & Haywood, A. M. (2018). PaleoClim, high spatial resolution paleoclimate surfaces for global land areas. Scientific Data, 5(1), 1-9.

Elith, J., Leathwick, J. R., & Hastie, T. (2008). A working guide to boosted regression trees. Journal of Animal Ecology, 77(4), 802-813.

Fick, S. E., & Hijmans, R. J. (2017). WorldClim 2: new 1‐km spatial resolution climate surfaces for global land areas. International Journal of Climatology, 37(12), 4302-4315.

Hoban, S. M., Borkowski, D. S., Brosi, S. L., McCleary, T. S., Thompson, L. M., McLachlan, J. S., ... Romero-Severson, J. (2010). Range‐wide distribution of genetic diversity in the North American tree Juglans cinerea: A product of range shifts, not ecological marginality or recent population decline. Molecular Ecology, 19(22), 4876-4891.
