
Generating confidence intervals around minimum sample size estimates


HobanLab/Salas_RaMP_ResamplingPredictionIntervals


Overview

This repository contains code that builds upon Quercus_IUCN_samp_sims, a previous simulation project by Kaylee Rosenberger. The goal of this subproject is to assess the variation in minimum sample size estimates (MSSEs) required to maintain genetic diversity of various ex situ oak collections. In this subproject, we calculate prediction intervals around the MSSEs required for 95% allelic representation. We also explore how the 95% MSSE changes as few to many loci are sampled, for alleles of different frequency categories, using microsatellite (MSAT) and single nucleotide polymorphism (SNP) genetic marker datasets.
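As a rough illustration of what a 95% MSSE is, the sketch below computes one from a toy resampling array in base R. It is not taken from the repository's scripts: the array layout (rows are sample sizes, columns are replicates), the simulated saturating curve, and the helper name `get_msse` are all assumptions made for this example.

```r
# Toy resampling array (simulated, NOT repository data):
# rows = number of individuals sampled (1..n_max), columns = replicates,
# values = percent of alleles captured by that sample.
set.seed(42)
n_max <- 200
n_reps <- 25
sample_sizes <- seq_len(n_max)
expected_rep <- 100 * (1 - exp(-0.03 * sample_sizes))  # invented saturating curve
resamp <- sapply(seq_len(n_reps), function(i) {
  pmin(100, pmax(0, expected_rep + rnorm(n_max, sd = 1)))
})

# Hypothetical helper: smallest sample size whose representation reaches the threshold
get_msse <- function(rep_values, threshold = 95) {
  hits <- which(rep_values >= threshold)
  if (length(hits) == 0) NA_integer_ else min(hits)
}

# MSSE based on the mean representation across replicates
msse_mean <- get_msse(rowMeans(resamp))

# One MSSE per replicate, summarized with quantiles to show its spread
msse_by_rep <- apply(resamp, 2, get_msse)
msse_quantiles <- quantile(msse_by_rep, probs = c(0.025, 0.5, 0.975))
print(msse_mean)
print(msse_quantiles)
```

The per-replicate quantiles give one simple view of how much an MSSE can vary between resampling runs, which is the kind of variation the interval calculations in this repository are meant to describe.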

Repository structure

Scripts

  1. MSSE_confidenceIntervals.R
  • Script calculates the confidence interval (CI) values around the 95% MSSE using the IUCN 14 oaks dataset, and builds a matrix that stores the CI values. It also draws scatterplots of allelic representation against the number of randomly sampled individuals for each species, and saves the images in .pdf format.
    • Inputs
      • IUCN 14 oaks: quercus_final_results_orig.Rdata--a resampling array containing the total allelic representation values for oaks simulated by Kaylee Rosenberger; source code found in the Quercus_IUCN_samp_sims repo
    • Outputs
      • Quercus14_CI_values.csv
      • 14CIplots.pdf
      • 14CIWidthplots.pdf
      • 14CIWidthplotshigh.pdf
      • 14CIWidthplotslow.pdf
  1. QUAC_MSSE_Quantiles.R
  • Script calculates MSSE means and quantiles, and generates plots of total allelic representation (and of other allele frequency categories), in order to build confidence intervals around 95% minimum sample size estimates. This approach to calculating intervals is superseded by the one based on the predict function (see MSSE_PredictionIntervals.R).
    • Inputs
      • QUAC_Subset_resampArrs folder--resampling arrays built from Quercus acerifolia (QUAC) microsatellite (MSAT) and single nucleotide polymorphism (SNP) genetic data (for SNPs, R0 and R80). These datasets are all subset to the same number of samples, to allow for greater comparability between marker types and missing data levels.
  1. MSSE_PredictionIntervals.R
  • Script calculates the prediction interval (PI) values around the 95% MSSE using two different datasets (QUAC and IUCN 14 oaks), and builds a matrix that stores the PI values.
    • Inputs
      • QUAC: QUAC_Subset_resampArrs folder--resampling arrays built from Quercus acerifolia (QUAC) microsatellite (MSAT) and single nucleotide polymorphism (SNP) genetic data (for SNPs, R0 and R80). These datasets are all subset to the same number of samples, to allow for greater comparability between marker types and missing data levels.
      • IUCN 14 oaks: quercus_final_results_orig.Rdata--a resampling array containing the total allelic representation values for oaks simulated by Kaylee Rosenberger; source code found in the Quercus_IUCN_samp_sims repo
    • Outputs
      • QUAC: QUAC_PI_values.csv
      • IUCN 14 oaks: Quercus14_PI_values.csv
  1. QUAC_QUBO_loci_bootstrapping.R
  • Script builds resampling arrays based on different ranges of randomly sampled loci, calculates the prediction intervals around the 95% MSSEs, and builds a matrix that stores the PI values.
    • Inputs
      • LociBootstrapping_Datasets folder--genpop objects for wild populations of Q. acerifolia (QUAC) and Q. boyntonii (QUBO), saved as R objects.
    • Outputs
      • QUAC_MSSE_Quantiles.csv
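
The prediction-interval idea described for the scripts above can be sketched with base R's `lm` and `predict`. This is a minimal example under stated assumptions, not the repository's implementation: the data are simulated, the log-linear model form is chosen only for illustration, and the names `sim`, `first_reaching`, and `pi_matrix` are hypothetical.

```r
# Simulated long-format data (NOT repository data): percent allelic
# representation observed at each sample size, 10 replicates per size.
set.seed(1)
sim <- data.frame(n = rep(seq(10, 120, by = 5), each = 10))
sim$rep_pct <- 49 + 10 * log(sim$n) + rnorm(nrow(sim), sd = 1)

# Assumed model form: representation is linear in log(sample size)
fit <- lm(rep_pct ~ log(n), data = sim)

# Predict over a grid of candidate sample sizes, with 95% prediction intervals
grid <- data.frame(n = 1:300)
pred <- predict(fit, newdata = grid, interval = "prediction", level = 0.95)

# Hypothetical helper: first sample size at which a curve reaches the threshold
first_reaching <- function(values, sizes, threshold = 95) {
  hits <- which(values >= threshold)
  if (length(hits) == 0) NA_integer_ else sizes[min(hits)]
}

# The fitted curve gives the MSSE; the upper PI curve crosses 95% earliest
# (lower bound on the MSSE) and the lower PI curve crosses latest (upper bound).
pi_values <- c(
  MSSE  = first_reaching(pred[, "fit"], grid$n),
  lower = first_reaching(pred[, "upr"], grid$n),
  upper = first_reaching(pred[, "lwr"], grid$n)
)

# Store the values in a one-row matrix, one row per dataset
pi_matrix <- matrix(pi_values, nrow = 1,
                    dimnames = list("toy_dataset", names(pi_values)))
print(pi_matrix)
```

A matrix like this could then be passed to `write.csv` to produce a table of estimates and bounds, one row per dataset or species.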

Datasets

This folder contains the input files read in by the analyses in the Scripts folder (see outline of Inputs above). These files are typically either resampling arrays (sets of allelic representation values, for a given number of randomly drawn samples) or genpop objects (read in using the adegenet library) from which resampling arrays are built.
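
To make the idea of a resampling array concrete, here is a small base R sketch that builds one from a toy presence/absence genotype matrix after randomly keeping a subset of loci. It is an illustration only: the repository's scripts read genpop objects with adegenet, whereas this example uses a simulated matrix, and `resample_representation`, `locus_id`, and the other names are hypothetical.

```r
# Simulated genotype data (NOT repository data): rows = individuals,
# columns = alleles, 1 = individual carries the allele.
set.seed(7)
n_ind <- 60
n_loci <- 20
alleles_per_locus <- 4
locus_id <- rep(seq_len(n_loci), each = alleles_per_locus)  # locus of each allele
freqs <- runif(length(locus_id), min = 0.02, max = 0.6)     # invented allele frequencies
geno <- sapply(freqs, function(p) rbinom(n_ind, size = 1, prob = p))

# Hypothetical helper: for each sample size k, draw k individuals at random
# and record the percent of alleles (present in the full set) they capture.
resample_representation <- function(geno, n_reps = 10) {
  geno <- geno[, colSums(geno) > 0, drop = FALSE]  # drop alleles never observed
  n_ind <- nrow(geno)
  sapply(seq_len(n_reps), function(r) {
    vapply(seq_len(n_ind), function(k) {
      picked <- sample(n_ind, k)  # k individuals, without replacement
      100 * mean(colSums(geno[picked, , drop = FALSE]) > 0)
    }, numeric(1))
  })
}

# Randomly keep 5 of the 20 loci, then build the array from their alleles only
loci_kept <- sample(n_loci, 5)
geno_subset <- geno[, locus_id %in% loci_kept, drop = FALSE]
resamp_subset <- resample_representation(geno_subset)
print(dim(resamp_subset))  # rows = sample sizes, columns = replicates
```

Each sample size here is drawn independently rather than by adding individuals to the previous draw; that is a simplification for the example, and either design yields an array of the same shape.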

Outputs

This folder contains the CSV outputs generated by the analyses in the Scripts folder (see outline of Outputs above). Generally, these CSVs contain minimum sample size estimates and the upper and lower bounds of the confidence intervals (CIs) or prediction intervals (PIs) around them.

Archive

This folder contains one archived R script and one archived .csv file. The archived script stores the original for loop used to bootstrap loci and analyze a resampling array; it calculates prediction intervals around the 95% MSSE and builds a matrix that stores the PI values.
