GitHub - edegreef/NBW-resequencing: Analyzing population structure, demographic history, and regions of selection using whole-genome data in bottlenose whales

This is a repository for scripts used in analyzing northern bottlenose whale (Hyperoodon ampullatus) resequencing data. These scripts were executed on computing resources through the University of Manitoba and Compute Canada. The project is a collaboration between the Marine Gene Probe Lab and Whitehead Lab at Dalhousie University, the University of Manitoba, and the University of St Andrews.

Reference genome folder: 📁

Evaluated genome quality with QUAST, Assemblathon, and BUSCO
Mapped chromosome alignment through synteny with Satsuma
Annotated genome with MAKER and added functional annotations with BLAST+ and InterProScan
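
As a small illustration of the kind of contiguity metric that assembly-evaluation tools report, the sketch below computes N50 from a list of scaffold lengths. It is not taken from the repository's scripts; the function name `n50` and the example lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    N50 is the length L such that scaffolds of length >= L together
    contain at least half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no scaffold lengths given")


# Hypothetical scaffold lengths in bp, not from the NBW assembly.
example_lengths = [100, 80, 60, 40, 20]
print(n50(example_lengths))
```

Sorting from longest to shortest and stopping at the first scaffold that pushes the running total past half of the assembly keeps the calculation to a single pass.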

Resequencing data folder: 📁

Processing sequencing data

Trimmed fastqs with trimmomatic
Mapped fastqs to reference genome with bwa
Removed duplicate reads with picard
Added readgroups with picard
Checked modal coverage of all the samples
Downsampled some bams with GATK to maintain a consistent modal coverage before calling snps
Merged sequence files for two individuals to increase coverage
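
The modal-coverage check and downsampling steps above can be sketched roughly as follows. This is an assumption about the general approach, not the repository's code: `modal_coverage` and `downsample_fraction` are hypothetical helpers, and the depths are made-up values. Read downsamplers generally take a keep-fraction of this kind; the exact option used in the scripts is not shown here.

```python
from collections import Counter


def modal_coverage(depths):
    """Return the most common non-zero depth among per-site depths."""
    counts = Counter(d for d in depths if d > 0)
    if not counts:
        raise ValueError("no covered sites")
    # Break ties toward the lower depth so the result is deterministic.
    return min(counts, key=lambda d: (-counts[d], d))


def downsample_fraction(sample_mode, target_mode):
    """Fraction of reads to keep so the mode drops to roughly target_mode.

    Samples already at or below the target are left untouched (1.0).
    """
    if sample_mode <= target_mode:
        return 1.0
    return target_mode / sample_mode


# Hypothetical per-site depths for one sample.
depths = [0, 0, 12, 12, 12, 11, 13, 13, 30]
mode = modal_coverage(depths)
print(mode, downsample_fraction(mode, target_mode=6))
```

Zero-depth sites are excluded before taking the mode so that uncovered regions do not dominate the estimate.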

Initial SNP prep

Called variants from bams and reference genome using Platypus
Removed indels then extracted snp metrics with bcftools and vcftools
Looked at snp metrics in R
Filtered snps based on quality with GATK and vcftools
Used bcftools to edit sample ID labels (removing the path from each sample ID) and to add the word "contig" to each scaffold name, so the data can be used in downstream analyses that won't accept an integer as a scaffold name.
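
The relabelling described in the last step was done with bcftools in this repository. Purely to illustrate what the transformation does to a VCF, here is a Python sketch under stated assumptions: the helper names are hypothetical, the prefix is assumed to be joined directly to the scaffold number (for example `contig12`), and the input lines are invented.

```python
import os


def clean_sample_id(sample_id):
    """Drop any directory path, keeping only the final component."""
    return os.path.basename(sample_id)


def prefix_scaffold(name, prefix="contig"):
    """Prepend a word so the scaffold name is no longer a bare integer."""
    return f"{prefix}{name}"


def rewrite_vcf_line(line):
    """Apply both edits to a single line of VCF text."""
    tag = "##contig=<ID="
    if line.startswith(tag):
        return tag + prefix_scaffold(line[len(tag):])
    if line.startswith("#CHROM"):
        newline = "\n" if line.endswith("\n") else ""
        fields = line.rstrip("\n").split("\t")
        # Sample columns start after the nine fixed VCF columns.
        fields[9:] = [clean_sample_id(s) for s in fields[9:]]
        return "\t".join(fields) + newline
    if line.startswith("#"):
        return line  # other header lines are left unchanged
    chrom, sep, rest = line.partition("\t")
    return prefix_scaffold(chrom) + sep + rest


# Invented example lines.
print(rewrite_vcf_line("12\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\n"), end="")
```

The contig header lines and the CHROM column are rewritten together so that the header and records stay consistent.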

Sex chromosomes

Notes on using DifCover for coverage comparisons between males and females
Examined DifCover results in R
Used bedtools to make annotated bed file with X and Y regions
Finalized list of sex scaffolds in R
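
The reasoning behind a male-versus-female coverage comparison can be sketched as below. In mammals, males carry one X and one Y, so X-linked scaffolds are expected at about half the female coverage in males, and Y-linked scaffolds at close to zero coverage in females. This is a simplified illustration and not the DifCover procedure or the repository's R code; the function name, thresholds (`min_cov`, `tol`), and coverage values are all hypothetical.

```python
import math


def classify_scaffold(male_cov, female_cov, min_cov=1.0, tol=0.4):
    """Label a scaffold from mean male and female coverage.

    min_cov and tol are illustrative thresholds, not values from the study.
    """
    if male_cov < min_cov and female_cov < min_cov:
        return "low_coverage"
    if female_cov < min_cov:
        return "Y-like"  # covered in males, essentially absent in females
    if male_cov < min_cov:
        return "unclassified"
    log2_ratio = math.log2(male_cov / female_cov)
    if abs(log2_ratio + 1) <= tol:
        return "X-like"  # males at roughly half the female coverage
    if abs(log2_ratio) <= tol:
        return "autosomal-like"  # similar coverage in both sexes
    return "unclassified"


# Made-up coverage values for three scaffolds.
for male, female in [(15.0, 30.0), (30.0, 30.0), (14.0, 0.2)]:
    print(male, female, classify_scaffold(male, female))
```

Working on the log2 scale makes the X-like and autosomal-like expectations symmetric targets (-1 and 0), which is why a single tolerance is used for both.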

More SNP prep

Filtered X- and Y-linked SNPs. Used bcftools to create vcfs of X and Y snps, then used CHROM and POS information from those vcfs to filter out X and Y snps in the full snp set using vcftools. Also made a separate file with LD-pruned snps using LD_pruning.sh.
Note on running breakdancer to detect structural variants.
a) Examined structural variant regions in R and created a list from vcf positions for filtering. b) Filtered SVs out.
Filtered out small scaffolds (< 50kb).
a) Filtered out individuals with very high missing data and ran pairwise kinship estimates using plink. b) Examined kinship results in R to identify kin pairs.
Filtered out snps that deviated from Hardy-Weinberg equilibrium.
Removed an individual from each kin pair, and prepped snp files for pop analyses.
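
The README does not state how the individual removed from each kin pair was chosen. One possible approach, shown here only as a sketch, is a greedy rule that repeatedly drops whichever individual is involved in the most remaining kin pairs, so that as few samples as possible are lost. The function name and the sample labels are hypothetical.

```python
from collections import defaultdict


def individuals_to_drop(kin_pairs):
    """Greedily choose individuals so that no kin pair remains intact."""
    neighbours = defaultdict(set)
    for a, b in kin_pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)

    dropped = []
    while any(neighbours.values()):
        # Individual with the most remaining kin; ties go to the first name
        # in sorted order so the result is reproducible.
        worst = max(sorted(neighbours), key=lambda ind: len(neighbours[ind]))
        for other in neighbours.pop(worst):
            neighbours[other].discard(worst)
        dropped.append(worst)
    return dropped


# Invented kin pairs: B is related to both A and C, and D is related to E.
pairs = [("A", "B"), ("B", "C"), ("D", "E")]
print(individuals_to_drop(pairs))
```

In this example a single removal resolves both pairs involving B, which is the benefit of ranking by number of kin rather than dropping one member of each pair independently.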

Population analyses folder: 📁

Made site map including ocean depth using a variety of R packages
Evaluated PCA with pcadapt and created heatmap matrix of PC1 and PC2 distances
Evaluated t-SNE with Rtsne
Estimated FST and heterozygosity with hierfstat, created FST heatmap, and also analyzed isolation-by-distance
Identified private and shared snps with bcftools isec
Ran sNMF with LEA, looked at results, and plotted pie chart admixture map
Ran ADMIXTURE with shell script
Filtered beagle file to match snp sites and then ran NGSadmix
Estimated changes in effective population sizes with SMC++ and PSMC
Estimated ROH with PLINK and plotted in R
Identified regions under selection using rehh
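
To illustrate the idea behind the private and shared snp step above, the sketch below does the same kind of comparison with plain Python sets. The repository used bcftools isec for this; the version here matches sites on scaffold and position only, which is a simplification, since bcftools isec has its own rules for when two records count as the same site. The function name, population labels, and sites are hypothetical.

```python
def private_and_shared(sites_by_pop):
    """Split sites into those private to one population and those in all.

    sites_by_pop maps a population label to an iterable of (chrom, pos).
    Returns (private, shared), where private maps each label to a set.
    """
    site_sets = {pop: set(sites) for pop, sites in sites_by_pop.items()}
    shared = set.intersection(*site_sets.values()) if site_sets else set()
    private = {}
    for pop, sites in site_sets.items():
        others = set().union(*(s for p, s in site_sets.items() if p != pop))
        private[pop] = sites - others
    return private, shared


# Invented sites for two hypothetical populations.
example = {
    "pop1": [("contig1", 100), ("contig1", 250), ("contig2", 40)],
    "pop2": [("contig1", 250), ("contig2", 40), ("contig3", 7)],
}
private, shared = private_and_shared(example)
print(private)
print(sorted(shared))
```

With more than two populations, a site present in some but not all populations is neither private nor shared under these definitions and appears in neither output.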
