Analyzing population structure, demographic history, and regions of selection using whole-genome data in bottlenose whales

edegreef/NBW-resequencing

This repository contains the scripts used to analyze northern bottlenose whale (Hyperoodon ampullatus) resequencing data. The scripts were run on computing resources provided by the University of Manitoba and Compute Canada. The project is a collaboration between the Marine Gene Probe Lab and the Whitehead Lab at Dalhousie University, the University of Manitoba, and the University of St Andrews.

Reference genome folder: 📁

  • Evaluated genome quality with QUAST, Assemblathon, BUSCO
  • Mapped chromosome alignment through synteny with Satsuma
  • Annotated genome with MAKER and added functional annotations with BLAST+ and Interproscan

Resequencing data folder: 📁

Processing sequencing data

  1. Trimmed FASTQs with Trimmomatic
  2. Mapped FASTQs to the reference genome with BWA
  3. Removed duplicate reads with Picard
  4. Added read groups with Picard
  5. Checked the modal coverage of all samples
  6. Downsampled some BAMs with GATK to keep modal coverage consistent across samples before calling SNPs
  7. Merged sequence files for two individuals to increase coverage
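
Steps 5 and 6 can be illustrated with a minimal Python sketch. It is not one of the repository's scripts: it assumes per-site depths have already been extracted (for example, the third column of `samtools depth` output), and the function names `modal_coverage` and `downsample_fraction` are hypothetical. It finds the most common depth for a sample and the fraction of reads to keep so that the sample roughly matches a target modal coverage.

```python
from collections import Counter


def modal_coverage(depths, ignore_zero=True):
    """Return the most common per-site depth (the mode).

    Sites with zero depth are skipped by default, since uncovered
    regions would otherwise dominate the count.
    """
    counts = Counter(d for d in depths if not (ignore_zero and d == 0))
    if not counts:
        raise ValueError("no depth values to summarize")
    best = max(counts.values())
    # On ties, return the lowest depth so the result is deterministic.
    return min(d for d, c in counts.items() if c == best)


def downsample_fraction(sample_mode, target_mode):
    """Fraction of reads to keep so the sample's mode is near the target.

    Samples already at or below the target are left untouched (1.0).
    """
    if sample_mode <= 0 or target_mode <= 0:
        raise ValueError("modal coverage values must be positive")
    return min(1.0, target_mode / sample_mode)


if __name__ == "__main__":
    depths = [0, 0, 0, 10, 10, 12, 9]  # made-up depths for illustration
    mode = modal_coverage(depths)
    print(mode, downsample_fraction(mode, 5))
```

The keep-fraction is the kind of value a downsampling tool takes as its sampling probability; the actual tool options used in this project are in the linked scripts.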

Initial SNP prep

  1. Called variants from bams and reference genome using Platypus
  2. Removed indels, then extracted SNP metrics with bcftools and vcftools
  3. Looked at SNP metrics in R
  4. Filtered SNPs based on quality with GATK and vcftools
  5. Used bcftools to edit the sample ID labels (removing the path from each sample ID) and to add the word "contig" to each scaffold name, so the data can be used in downstream analyses that do not accept an integer as a scaffold name.
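
The renaming in step 5 was done with bcftools; the sketch below only illustrates the same transformation in plain Python on individual VCF lines. The helper names (`clean_sample_id`, `prefix_contig`, `rewrite_vcf_line`) and the example sample path are hypothetical, and the code assumes an uncompressed, tab-delimited VCF.

```python
import posixpath


def clean_sample_id(sample_id):
    """Drop any directory path from a sample ID, keeping the file name part."""
    return posixpath.basename(sample_id)


def prefix_contig(name, prefix="contig"):
    """Add the prefix to a scaffold name unless it is already there."""
    return name if name.startswith(prefix) else prefix + name


def rewrite_vcf_line(line):
    """Rewrite one VCF line with cleaned sample IDs and prefixed scaffolds."""
    line = line.rstrip("\n")
    tag = "##contig=<ID="
    if line.startswith(tag):
        # Header contig definitions must match the renamed records.
        return tag + prefix_contig(line[len(tag):])
    if line.startswith("##"):
        return line  # other meta-information lines are left unchanged
    fields = line.split("\t")
    if line.startswith("#CHROM"):
        # The first nine columns are fixed; sample IDs follow them.
        return "\t".join(fields[:9] + [clean_sample_id(s) for s in fields[9:]])
    fields[0] = prefix_contig(fields[0])
    return "\t".join(fields)


if __name__ == "__main__":
    print(rewrite_vcf_line("12\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1"))
```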

Sex chromosomes

  1. Notes on using DifCover for coverage comparisons between males and females
  2. Examined DifCover results in R
  3. Used bedtools to make annotated bed file with X and Y regions
  4. Finalized list of sex scaffolds in R
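
The idea behind a male versus female coverage comparison can be sketched as follows. This is an illustration only, not DifCover's algorithm or output format: it assumes mean coverage per window has already been computed for a male and a female sample of similar overall depth, and the thresholds and the function name `classify_window` are hypothetical. In a species with XY males, X-linked windows are expected at about half the female coverage in males (log2 ratio near -1), while Y-linked windows have coverage in males only.

```python
import math


def classify_window(male_cov, female_cov, x_range=(-1.5, -0.5), min_cov=1.0):
    """Label a window from male and female coverage.

    x_range is the accepted log2(male/female) interval for X-like windows
    and min_cov is the coverage below which a sample counts as uncovered.
    Both are illustrative values, not ones taken from the project.
    """
    if male_cov < min_cov:
        return "unclassified"  # too little male coverage to decide
    if female_cov < min_cov:
        return "Y-like"  # covered in the male only
    ratio = math.log2(male_cov / female_cov)
    if x_range[0] <= ratio <= x_range[1]:
        return "X-like"  # about half the female coverage
    return "autosomal-like"


if __name__ == "__main__":
    for male, female in [(15, 30), (14, 0.2), (30, 29)]:
        print(male, female, classify_window(male, female))
```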

More SNP prep

  1. Filtered out X- and Y-linked SNPs. Used bcftools to create VCFs of the X and Y SNPs, then used the CHROM and POS information from those VCFs to remove them from the full SNP set with vcftools. Also made a separate file of LD-pruned SNPs using LD_pruning.sh.
  2. Note on running BreakDancer to detect structural variants.
  3. a) Examined structural variant (SV) regions in R and created a list of VCF positions for filtering. b) Filtered the SVs out.
  4. Filtered out small scaffolds (< 50 kb).
  5. a) Filtered out individuals with very high missing data and ran pairwise kinship estimates using PLINK. b) Examined kinship results in R to identify kin pairs.
  6. Filtered out SNPs that deviated from Hardy-Weinberg equilibrium.
  7. Removed one individual from each kin pair, and prepped SNP files for population analyses.
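
As an illustration of step 4, the sketch below picks scaffolds of at least 50 kb from a FASTA index and keeps only VCF records on those scaffolds. It is a plain-Python stand-in, not the repository's script: it assumes a `samtools faidx` style `.fai` index (scaffold name in the first column, length in the second), treats 50 kb as 50,000 bp, and uses hypothetical function names.

```python
def scaffolds_to_keep(fai_lines, min_length=50_000):
    """Return names of scaffolds whose length is at least min_length bp."""
    keep = []
    for line in fai_lines:
        if not line.strip():
            continue  # skip blank lines
        name, length = line.rstrip("\n").split("\t")[:2]
        if int(length) >= min_length:
            keep.append(name)
    return keep


def filter_vcf_records(vcf_lines, keep):
    """Yield header lines plus records located on the kept scaffolds."""
    keep = set(keep)
    for line in vcf_lines:
        if line.startswith("#") or line.split("\t", 1)[0] in keep:
            yield line


if __name__ == "__main__":
    # Made-up index and records for illustration.
    fai = ["contig1\t120000\t9\t60\t61", "contig2\t4200\t122018\t60\t61"]
    vcf = ["##fileformat=VCFv4.2", "contig1\t100\t.\tA\tG", "contig2\t55\t.\tC\tT"]
    print(list(filter_vcf_records(vcf, scaffolds_to_keep(fai))))
```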

Population analyses folder: 📁

  • Made site map including ocean depth using a variety of R packages
  • Evaluated PCA with pcadapt and created heatmap matrix of PC1 and PC2 distances
  • Evaluated t-SNE with Rtsne
  • Estimated FST and heterozygosity with hierfstat, created FST heatmap, and also analyzed isolation-by-distance
  • Identified private and shared SNPs with bcftools isec
  • Ran sNMF with LEA, looked at results, and plotted a pie chart admixture map
  • Ran ADMIXTURE with a shell script
  • Filtered the beagle file to match SNP sites and then ran NGSadmix
  • Estimated changes in effective population sizes with SMC++ and PSMC
  • Estimated ROH with PLINK and plotted in R
  • Identified regions under selection using rehh
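
To make the heterozygosity estimate above concrete, here is a minimal sketch of observed heterozygosity for one population. It is not hierfstat's code: it assumes diploid genotypes given as VCF-style GT strings, skips missing calls, and averages the per-locus proportion of heterozygous individuals across loci. The function name and example genotypes are hypothetical.

```python
def observed_heterozygosity(loci):
    """Mean across loci of the proportion of heterozygous individuals.

    loci is a list of loci, each a list of GT strings such as "0/1" or
    "1|1" for the individuals of one population. Genotypes containing "."
    are treated as missing, and loci with no called genotypes are skipped.
    """
    per_locus = []
    for genotypes in loci:
        called = [gt.replace("|", "/").split("/") for gt in genotypes if "." not in gt]
        if not called:
            continue
        het = sum(1 for alleles in called if alleles[0] != alleles[1])
        per_locus.append(het / len(called))
    if not per_locus:
        raise ValueError("no called genotypes")
    return sum(per_locus) / len(per_locus)


if __name__ == "__main__":
    example = [["0/0", "0/1", "1/1", "0/1"], ["0|1", "./.", "0/0"]]
    print(observed_heterozygosity(example))
```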
