Exoreads_treasure

This repo contains scripts to extract exogenous reads (reads that do not map to the target species) from large sequencing datasets.

We used them on a previously published sequencing dataset of more than 200 Thlaspi arvense lines.

extract_unmapped.sh
Extract exogenous (unmapped) reads from mapping BAM files and recover them from the original FASTQ files. For mapping, refer to my previous script. For large datasets, parallelization can be implemented by submitting each sample as a separate job.
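The two steps described above (list the unmapped read names, then pull those records from the FASTQ) can be sketched as follows. This is a minimal illustration, not the content of extract_unmapped.sh: the function names and file names are hypothetical, and it assumes uncompressed FASTQ files with exactly four lines per record.

```shell
#!/usr/bin/env bash
# Sketch only: function and file names are hypothetical.

# Print the names of reads flagged as unmapped (SAM flag 4) in a BAM file.
unmapped_names() {
    samtools view -f 4 "$1" | cut -f 1 | LC_ALL=C sort -u
}

# Print the FASTQ records whose read name appears in a name list.
# Assumes 4-line FASTQ records; a trailing /1 or /2 on the name is ignored.
filter_fastq_by_name() {
    local names="$1" fastq="$2"
    awk 'FILENAME == ARGV[1] { keep[$1]; next }
         FNR % 4 == 1 {
             name = substr($1, 2)          # strip the leading "@"
             sub(/\/[12]$/, "", name)      # strip a mate suffix, if any
             hit = (name in keep)
         }
         hit' "$names" "$fastq"
}

# Example usage (hypothetical file names):
#   unmapped_names sample.bam > sample.unmapped.txt
#   filter_fastq_by_name sample.unmapped.txt sample_1.fastq > sample_1.exo.fastq
```

Running the two functions once per sample keeps each job independent, which fits the one-job-per-sample parallelization mentioned above.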

BWA_multi_accurate_align_BinAC.sh
Perform high-confidence alignment of multiple samples in parallel. We used it to map non-target reads (those not mapping to the T. arvense genome) to the genomes of the aphid Myzus persicae and its symbiont Buchnera aphidicola, in order to quantify aphid infestation of our T. arvense collection.

AmbigReads_cleanup_BinAC.sh
Remove ambiguous reads from the target species alignments (T. arvense in our case). These are reads that map to the target (T. arvense) but also to a contaminant (the aphid, Buchnera or mildew genomes). If not removed, these ambiguous reads can create false positive SNPs that are strongly associated with the number of aphid, Buchnera and mildew reads. Before running this script, raw reads should be mapped to the target and to any other suspected contaminant. To save space, we suggest removing unmapped reads when mapping to the contaminants (samtools view -F 4).
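The idea can be illustrated with a short sketch: reads whose names occur in both the target and a contaminant alignment are treated as ambiguous and dropped from the target. This is an assumption about one way to do it, not the logic of AmbigReads_cleanup_BinAC.sh; all function and file names are hypothetical, and matching is done on read name only.

```shell
#!/usr/bin/env bash
# Sketch only: function and file names are hypothetical.

# Print the sorted, unique read names found in a BAM file.
read_names() {
    samtools view "$1" | cut -f 1 | LC_ALL=C sort -u
}

# Print names present in both sorted name lists (the ambiguous reads).
shared_names() {
    LC_ALL=C comm -12 "$1" "$2"
}

# Read SAM text on stdin and drop records whose read name is listed
# in the file given as $1; header lines (starting with "@") pass through.
drop_listed_reads() {
    awk -F '\t' 'FILENAME == ARGV[1] { drop[$1]; next }
                 /^@/ || !($1 in drop)' "$1" -
}

# Example usage (hypothetical file names):
#   read_names target.bam > target.names
#   read_names aphid.bam  > aphid.names
#   shared_names target.names aphid.names > ambiguous.names
#   samtools view -h target.bam | drop_listed_reads ambiguous.names \
#       | samtools view -b -o target.clean.bam -
```

Filtering the contaminant BAM files with samtools view -F 4 beforehand keeps the contaminant name lists limited to reads that actually mapped there.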

Region_bedcov.sh
Calculate coverage of a set of bam files in a specific region, divided into bins, using samtools bedcov.
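A minimal sketch of the binning step, assuming fixed-width bins written as a BED file (0-based, half-open intervals) that is then passed to samtools bedcov. The function name, region and file names are hypothetical and not taken from Region_bedcov.sh.

```shell
#!/usr/bin/env bash
# Sketch only: function, region and file names are hypothetical.

# Print BED lines splitting chr:start-end into bins of a fixed size.
# The last bin is truncated at the region end.
make_bins() {
    awk -v chr="$1" -v start="$2" -v end="$3" -v size="$4" 'BEGIN {
        OFS = "\t"
        start += 0; end += 0; size += 0    # force numeric comparison
        for (s = start; s < end; s += size) {
            e = (s + size < end) ? s + size : end
            print chr, s, e
        }
    }'
}

# Example usage (hypothetical file names):
#   make_bins chr1 0 250 100 > bins.bed
#   samtools bedcov bins.bed sample1.bam sample2.bam > bins.cov.tsv
```

samtools bedcov reports, for each bin and each BAM file, the sum of per-base read depths, so dividing by the bin length gives the mean depth per bin.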
