Decoy alignment sequences


Genome assembly decoy sequences

This repository contains FASTA files for pre-alignment steps. They are useful for studying repeats or mtDNA, or for use as decoy sequences.

Contents:

Repeats

  • human_alu.fa - Manually curated sequences from GenBank for Alu elements.
  • human_alphasat.fa - Manually curated sequences from GenBank annotated as centromeric or alpha-satellite DNA repeats.
  • human_rDNA.fa - Complete human ribosomal repeat unit.
  • human_repeats.fa - A combination of the three files above (Alu, alpha-satellite, and rDNA), produced with cat human_alu.fa human_alphasat.fa human_rDNA.fa > human_repeats.fa.
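
A quick sanity check after concatenating FASTA files is to compare record counts: the combined file should contain as many header lines as the inputs together. The sketch below uses small made-up files in a temporary directory rather than the real ones; the helper name count_records and the file names are illustrative only. If an input lacks a trailing newline, cat fuses its last sequence line with the next header and the counts will disagree.

```shell
# Hypothetical helper: count FASTA records by counting header lines.
count_records() { grep -c '^>' "$1"; }

# Made-up demo inputs (stand-ins for the alu, alphasat, and rDNA files).
repeats_dir=$(mktemp -d)
printf '>a1\nACGT\n' > "$repeats_dir/alu.fa"
printf '>s1\nGGCC\n>s2\nTTAA\n' > "$repeats_dir/alphasat.fa"
printf '>r1\nCCGG\n' > "$repeats_dir/rDNA.fa"

# Same concatenation pattern as described above.
cat "$repeats_dir/alu.fa" "$repeats_dir/alphasat.fa" "$repeats_dir/rDNA.fa" \
  > "$repeats_dir/repeats.fa"

# Sum of input record counts versus the combined file.
expected=$(( $(count_records "$repeats_dir/alu.fa") \
           + $(count_records "$repeats_dir/alphasat.fa") \
           + $(count_records "$repeats_dir/rDNA.fa") ))
actual=$(count_records "$repeats_dir/repeats.fa")
echo "expected=$expected actual=$actual"
```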

mtDNA

  • rCRSd.fa is the Revised Cambridge Reference Sequence (rCRS) of human mitochondrial DNA, obtained from NC_012920. The sequence is duplicated (two copies concatenated end to end) to simulate circular DNA for aligners that do not have a circular setting, hence the d appended to the name. This is the mitochondrial sequence used in the hg38 assembly.
  • RSRS.fa is from this paper.
  • eve1.fa is from this paper.
  • human_chrM2x.fa - Doubled chrM derived from AF347015, the African Yoruban sequence used in the hg19 assembly.
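
The doubling described above can be sketched with awk. This is only an illustration of the idea, not necessarily how the files in this repository were produced. The function name double_fasta and the demo file names are hypothetical, and the output puts each doubled sequence on a single unwrapped line.

```shell
# Hypothetical helper: emit every FASTA record with its sequence repeated
# twice, so reads spanning the circular junction can align to a linear reference.
double_fasta() {
  awk '
    /^>/ { if (seq != "") print seq seq; print; seq = ""; next }  # new record: flush the previous one
         { seq = seq $0 }                                         # join wrapped sequence lines
    END  { if (seq != "") print seq seq }                         # flush the last record
  ' "$1"
}

# Tiny made-up input (not a real mitochondrial sequence).
mt_dir=$(mktemp -d)
printf '>demo\nACGT\nTTGA\n' > "$mt_dir/demo.fa"

double_fasta "$mt_dir/demo.fa" > "$mt_dir/demod.fa"
cat "$mt_dir/demod.fa"
```

With the demo input, the output keeps the single >demo header and the sequence line becomes ACGTTTGAACGTTTGA.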

Pre-built indexes

You can download pre-built refgenie reference genome indexes for use in pipelines here:

To use these with pypiper pipelines, unzip the archive and place the resulting folder in your genomes folder (e.g. $GENOMES).

Build it yourself

Build genome references for these decoy sequences using refgenie like this:

refgenie.py -i human_alphasat.fa

A complete setup:

export GENOMES=decoy_genomes

pip install --user --upgrade https://github.com/epigen/pypiper/zipball/master
git clone https://github.com/databio/ref_decoy.git
git clone https://github.com/databio/refgenie.git

for fa_file in ref_decoy/*.fa; do python refgenie/src/refgenie.py -i "$fa_file"; done

Or, using the refgenie docker image (adds -d):

for fa_file in ref_decoy/*.fa; do python refgenie/src/refgenie.py -d -i "$fa_file"; done