
ThorntonLab/simyakCNVSI


#Supplemental files for Rogers et al., "Landscape of standing variation for tandem duplications in Drosophila yakuba and Drosophila simulans"

Strain names and numbers are found in FlyLines.txt. Strain prefixes: CY = Cameroon D. yakuba, NY = Nairobi D. yakuba, MD = Madagascar D. simulans, NS = Nairobi D. simulans.

Chromosome names and numbers are found in DsimChromNames.txt and DyakChromNames.txt.

Genes captured by duplications are found in DupGenes.dsim.txt and DupGenes.dyak.txt.

BED format files compatible with the UCSC browser are found in DsimTandemDups.bed and DyakTandemDups.bed. Format: chrom start stop identifier frequency
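
As a rough illustration of the five-column layout above, the BED files could be read with a small Python sketch like the following. The names `TandemDup` and `parse_bed` are hypothetical and not part of this repository. The sketch assumes whitespace-delimited fields, assumes the frequency column is numeric, and skips any `track`, `browser`, or `#` lines that may or may not be present.

```python
from typing import Iterable, Iterator, NamedTuple


class TandemDup(NamedTuple):
    """One record from a tandem duplication BED file (hypothetical container)."""
    chrom: str
    start: int
    stop: int
    identifier: str
    frequency: float


def parse_bed(lines: Iterable[str]) -> Iterator[TandemDup]:
    """Yield one TandemDup per data line: chrom start stop identifier frequency."""
    for line in lines:
        line = line.strip()
        # Assumption: header lines, if any, follow the usual UCSC conventions.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, stop, identifier, freq = line.split()[:5]
        yield TandemDup(chrom, int(start), int(stop), identifier, float(freq))


if __name__ == "__main__":
    with open("DyakTandemDups.bed") as handle:
        for dup in parse_bed(handle):
            print(dup.identifier, dup.chrom, dup.stop - dup.start, dup.frequency)
```

Parsing the frequency as a float accepts both integer counts and fractional values, since the description above does not say which is used.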

##CNV calls

Raw calls prior to all filtering (including the filters for ancestral duplications, reference duplications, and divergent reads spanning more than 25kb) are in RawCallsByLineSim/*.div.cov3.mm2.rearr and RawCallsByLineYak/*.div.cov3.mm2.rearr.

The format of these files is 10 columns:

  1. id = Event identification number (arb. integer)
  2. chrom1 = Chromosome number in reference where the first read cluster is
  3. coverage = Number of read pairs supporting the event
  4. strand1 = Strand of first read cluster. 0 = plus, 1 = minus
  5. start1 = Start position of first read cluster.
  6. stop1 = Stop position of first read cluster.
  7. chrom2 = Chromosome number in reference where the second read cluster is
  8. start2 = Start position of second read cluster.
  9. stop2 = Stop position of second read cluster.
  10. reads = Pipe-separated (the pipe is the | character) list of the read pairs supporting the event. Each entry has the format readPairName;start,stop,strand,start,stop,strand, where the two start,stop,strand triples are for the two reads in the pair.
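
The ten columns above can be unpacked with a short Python sketch such as the one below. It is only an illustration: the names `ReadPair`, `RawCall`, `parse_read_pair`, and `parse_raw_call` are hypothetical, the columns are assumed to be whitespace-delimited, chromosome identifiers are kept as strings, and the per-read strand is assumed to use the same 0/1 integer coding as strand1.

```python
from typing import List, NamedTuple, Tuple

Read = Tuple[int, int, int]  # (start, stop, strand)


class ReadPair(NamedTuple):
    name: str
    read1: Read
    read2: Read


class RawCall(NamedTuple):
    id: int
    chrom1: str
    coverage: int
    strand1: int
    start1: int
    stop1: int
    chrom2: str
    start2: int
    stop2: int
    reads: List[ReadPair]


def parse_read_pair(text: str) -> ReadPair:
    """Parse one 'readPairName;start,stop,strand,start,stop,strand' entry."""
    # Split on the last ';' so a ';' inside the read name would not break parsing.
    name, _, coords = text.rpartition(";")
    values = [int(v) for v in coords.split(",") if v]
    if len(values) != 6:
        raise ValueError(f"expected 6 coordinate values, got {len(values)}: {text!r}")
    return ReadPair(name, tuple(values[:3]), tuple(values[3:]))


def parse_raw_call(line: str) -> RawCall:
    """Parse one line of a *.div.cov3.mm2.rearr file into a RawCall."""
    fields = line.split(None, 9)  # at most 10 fields; the last holds the read list
    if len(fields) != 10:
        raise ValueError(f"expected 10 columns, got {len(fields)}")
    reads = [parse_read_pair(p) for p in fields[9].strip().split("|") if p]
    return RawCall(
        int(fields[0]), fields[1], int(fields[2]), int(fields[3]),
        int(fields[4]), int(fields[5]), fields[6],
        int(fields[7]), int(fields[8]), reads,
    )
```

With this layout, `len(call.reads)` can be compared against `call.coverage` as a quick consistency check on each event, assuming every supporting read pair is listed in column 10.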

##Transposable element (TE) calls

The TE calls are in the TEcalls directories. The files are named according to the line numbers used in the paper. Thus, there are 20 files per species. Each file has the following columns:

  1. Chromosome
  2. Insert site position 1
  3. Insert site position 2
  4. Annotation information
  5. shared or novel

    The second and third columns represent the range of possible insert site positions. These positions are zero-offset (i.e., the first position on each chromosome is 0). Column 4 is a crude attempt at annotating what the TE is. (Note that the annotation scheme differs from Cridland et al. doi: 10.1093/molbev/mst129. Further, our purpose in the paper was to define presence/absence, and thus we ignore the annotation information for our own analyses. The issue of accurate TE annotation of the reference genomes needs to be addressed in the future in order to improve the annotation of TE presence/absence polymorphism.)

    The final column simply records whether or not the TE is found in the relevant reference genome sequence.
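
Because the insert site positions are zero-offset, a small Python sketch like the following may help when comparing TE calls against one-based coordinates. The names `TECall`, `parse_te_call`, and `one_based_range` are hypothetical. The sketch assumes whitespace-delimited columns, treats everything between the third and last fields as the annotation (in case it contains spaces), and assumes both positions are inclusive.

```python
from typing import NamedTuple, Tuple


class TECall(NamedTuple):
    chrom: str
    pos1: int        # zero-offset start of the possible insert site range
    pos2: int        # zero-offset end of the possible insert site range
    annotation: str  # crude TE annotation (column 4)
    status: str      # final column: "shared" or "novel"


def parse_te_call(line: str) -> TECall:
    """Parse one line of a TE call file into a TECall."""
    fields = line.split()
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 columns, got {len(fields)}")
    # Assumption: the annotation may span several whitespace-separated tokens.
    annotation = " ".join(fields[3:-1])
    return TECall(fields[0], int(fields[1]), int(fields[2]), annotation, fields[-1])


def one_based_range(call: TECall) -> Tuple[int, int]:
    """Convert the zero-offset insert site range to one-based positions."""
    return call.pos1 + 1, call.pos2 + 1
```

For example, a call with positions 99 and 104 would correspond to one-based positions 100 and 105 under these assumptions.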
