Skip to content

Bioinformatic workflow to choose the optimal genotyping strategy using RADs (ddRADseq-GBS) from individual-replicates pilot studies

License

Notifications You must be signed in to change notification settings

GGFHF/RADdesigner

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 

Repository files navigation

README

/bin/

Contains the workflow scripts for the sections Data post-filtering and error rates of the manuscript "RADdesigner: a workflow to select the optimal sequencing methodology in genotyping experiments on woody plant species".

The scripts in /bin should be run in the order they are numbered. R functions used by some of these scripts are not numbered. The parameters that need to be modified in each script are indicated at the top of the script. Comments with capital letters indicate the command lines that need to be modified to include path to specific applications or to the user's working directory.

Scripts content:

  • 0.Packages_dependencies/0.R_dependendies_installation.R R scritp to install and call the packages required to run subsequent R scripts.
  • 0.Packages_dependencies/VCFtools_installation_instructions.txt Text file with the command line instructions to install VCFtools.
  • 1.Filtering_multiallelicSNPs/1.Filtering_multiallelicSNPs.sh Bash script to filter multi-allelicSNPs from .vcf files, inside this script we call two R scripts:
  • 1.Filtering_multiallelicSNPs/multi_bi_allelic_filtered.R R script to separate multi-allelic from bi-allelic SNPs in each vcf.file
  • 1.Filtering_multiallelicSNPs/plot_percentage_multi_allelicSNPs.R R script to plot the percentage of multi-allelic SNPs filtered for each combination. Also this script plots the number of filtered loci/SNPs after filtering.
  • 2.Depth_locusSNPerror/2.Depth_locusSNPerror.R R script that contains a custom workflow to quantify and plot read-depth and both locus/SNP error rates. Also this script quantifies and plots the dendrograms of similarity for each sample and for each combination.
  • 2.Depth_locusSNPerror/Mastretta_functions/LociAllele_error_pyrad.R Mastretta-Yanes et al. (2014), R function to quantify locus-error-rate.
  • 2.Depth_locusSNPerrorMastretta_functions/SNPs_error.R Mastretta-Yanes et al. (2014), R function to quantify SNP-error-rate.
  • 3.Prepare_vcf/3.Prepare_vcf.shcontains a custom bash script to organize the .vcf files to ease file management in the next step.
  • 4.Filtering_locusSNPerror/4.Filtering_locusSNPerror.Rquantifies and filter the locus/SNP error.

These scripts use the vcf files data recovered from the IpyRAD software. Each script uses the output of the previous script as an input.

/Input_files/

Contains the required information files to run the scripts placed in /bin directory.

  • Input_files/INPUT_VCF_DATA is a directory to stage the input .vcf files. Be sure to store your duplicate input vcf data using the string "_d".
  • Input_files/Ch_combinations.csv is a file with the characteristics of the combinations.
  • Input_files/Samples_names.txtis a file with the names of your individuals and the duplicates placed in a column.

About

Bioinformatic workflow to choose the optimal genotyping strategy using RADs (ddRADseq-GBS) from individual-replicates pilot studies

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages