GitHub - ThorntonLab/simyakCNVSI: Supplementary material related to Rogers et al.

# Supplemental files for Rogers et al. Landscape of standing variation for tandem duplications in Drosophila yakuba and Drosophila simulans.

Strain names and numbers are found in FlyLines.txt. Strain prefixes:

CY = Cameroon D. yakuba
NY = Nairobi D. yakuba
MD = Madagascar D. simulans
NS = Nairobi D. simulans
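The prefix scheme above can be captured in a small lookup table. The following is only a sketch: it assumes that every strain name begins with one of the four two-letter codes (as in names such as CY17C or NY66-2 seen in this repository), and the names `STRAIN_PREFIXES` and `describe_strain` are hypothetical helpers, not part of the supplement.

```python
from typing import Dict, Tuple

# Two-letter prefix -> (collection site, species), as listed in this README.
STRAIN_PREFIXES: Dict[str, Tuple[str, str]] = {
    "CY": ("Cameroon", "D. yakuba"),
    "NY": ("Nairobi", "D. yakuba"),
    "MD": ("Madagascar", "D. simulans"),
    "NS": ("Nairobi", "D. simulans"),
}


def describe_strain(name: str) -> Tuple[str, str]:
    """Return (site, species) for a strain name, assuming a two-letter prefix."""
    prefix = name[:2].upper()
    if prefix not in STRAIN_PREFIXES:
        raise ValueError(f"unrecognized strain prefix in {name!r}")
    return STRAIN_PREFIXES[prefix]
```

For example, `describe_strain("CY17C")` looks up the `CY` entry. FlyLines.txt remains the authoritative source for strain names and numbers.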

Chromosome names and numbers are found in DsimChromNames.txt and DyakChromNames.txt.

Genes captured by duplications are found in DupGenes.dsim.txt and DupGenes.dyak.txt.

BED format files compatible with the UCSC browser are found in DsimTandemDups.bed and DyakTandemDups.bed. Format: chrom start stop identifier frequency
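As an illustration of the five-field layout, here is a minimal parsing sketch. It assumes the fields are whitespace-delimited, that the frequency column is numeric, and that any `track`, `browser`, or `#` lines are headers to be skipped. The names `TandemDup`, `parse_bed_line`, and `read_bed` are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple


class TandemDup(NamedTuple):
    chrom: str
    start: int
    stop: int
    identifier: str
    frequency: float  # assumption: numeric; may be a count or a proportion


def parse_bed_line(line: str) -> TandemDup:
    """Parse one data line: chrom start stop identifier frequency."""
    fields = line.split()
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 fields, got {len(fields)}: {line!r}")
    chrom, start, stop, identifier, frequency = fields[:5]
    return TandemDup(chrom, int(start), int(stop), identifier, float(frequency))


def read_bed(lines: Iterable[str]) -> Iterator[TandemDup]:
    """Yield records, skipping blank lines and UCSC-style header lines."""
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith(("#", "track", "browser")):
            continue
        yield parse_bed_line(stripped)
```

Passing an open file handle for DsimTandemDups.bed or DyakTandemDups.bed to `read_bed` would yield one record per duplication, provided the assumptions above hold for those files.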

## CNV calls

Raw calls, prior to all filtering (including filtering of ancestral duplications, reference duplications, and divergent reads spanning more than 25 kb), are in RawCallsByLineSim/*.div.cov3.mm2.rearr and RawCallsByLineYak/*.div.cov3.mm2.rearr.

The format of these files is 10 columns:

id = Event identification number (arb. integer)
chrom1 = Chromosome number in reference where the first read cluster is
coverage = Number of read pairs supporting the event
strand1 = Strand of first read cluster. 0 = plus, 1 = minus
start1 = Start position of first read cluster.
stop1 = Stop position of first read cluster.
chrom2 = Chromosome number in reference where the second read cluster is
start2 = Start position of second read cluster.
stop2 = Stop position of second read cluster.
reads = Pipe-separated (the pipe is the | character) list of the read pairs supporting the event. The format of each entry is readPairName;start,stop,strand,start,stop,strand, where the two start,stop,strand triples describe the two reads in the pair.
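The ten columns described above can be read with a short parser such as the sketch below. It assumes whitespace-delimited columns, integer values in the first nine columns, and integer start, stop, and strand values inside each read-pair entry. The names `ReadPair`, `RearrCall`, `parse_read_pair`, and `parse_rearr_line` are hypothetical and not part of the original pipeline.

```python
from dataclasses import dataclass
from typing import List, Tuple

Read = Tuple[int, ...]  # (start, stop, strand) for one read


@dataclass
class ReadPair:
    name: str
    first: Read
    second: Read


@dataclass
class RearrCall:
    id: int
    chrom1: int
    coverage: int
    strand1: int
    start1: int
    stop1: int
    chrom2: int
    start2: int
    stop2: int
    reads: List[ReadPair]


def parse_read_pair(token: str) -> ReadPair:
    """Parse 'readPairName;start,stop,strand,start,stop,strand'."""
    name, _, coords = token.rpartition(";")
    values = [int(v) for v in coords.split(",") if v]
    if len(values) != 6:
        raise ValueError(f"expected 6 values after ';' in {token!r}")
    return ReadPair(name, tuple(values[:3]), tuple(values[3:]))


def parse_rearr_line(line: str) -> RearrCall:
    """Parse one 10-column line of a *.rearr file."""
    fields = line.split(None, 9)  # keep the reads column in one piece
    if len(fields) != 10:
        raise ValueError(f"expected 10 columns, got {len(fields)}")
    numbers = [int(x) for x in fields[:9]]
    pairs = [parse_read_pair(t) for t in fields[9].strip().split("|") if t]
    return RearrCall(*numbers, pairs)
```

One sanity check that follows from the column descriptions is that `len(call.reads)` should match `call.coverage`, since both count the read pairs supporting the event.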

## Transposable element (TE) calls

The TE calls are under the TECalls directory. The file names are based on the line numbers used in the paper. Thus, there are 20 files per species. Each file has the following columns:

Chromosome
Insert site position 1
Insert site position 2
Annotation information
shared or novel

The second and third columns give the range of possible insert site positions. These positions are zero-offset (i.e., position 1 on each chromosome is 0). Column 4 is a crude attempt at annotating what the TE is. (Note that the annotation scheme differs from Cridland et al., doi: 10.1093/molbev/mst129. Further, our purpose in the paper was to define presence/absence, and thus we ignore the annotation information in our own analyses. The issue of accurate TE annotation of the reference genomes needs to be addressed in the future in order to improve the annotation of TE presence/absence polymorphism.)

The final column records whether the TE is found in the relevant reference genome sequence.
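A minimal sketch of reading one TE call line and converting the zero-offset positions to one-based coordinates follows. It assumes whitespace-delimited columns, and because the annotation column might itself contain spaces, it takes the first three and the last field as fixed and treats everything in between as the annotation. The names `TECall` and `parse_te_line` are hypothetical.

```python
from typing import NamedTuple, Tuple


class TECall(NamedTuple):
    chrom: str
    pos1: int        # zero-offset, as stored in the file
    pos2: int        # zero-offset, as stored in the file
    annotation: str
    status: str      # final column, e.g. "shared" or "novel"

    def one_based(self) -> Tuple[int, int]:
        """Return the insert-site range with position 1 as the first base."""
        return (self.pos1 + 1, self.pos2 + 1)


def parse_te_line(line: str) -> TECall:
    """Parse one line: chromosome, position 1, position 2, annotation, status."""
    fields = line.split()
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 fields, got {len(fields)}: {line!r}")
    annotation = " ".join(fields[3:-1])
    return TECall(fields[0], int(fields[1]), int(fields[2]), annotation, fields[-1])
```

The sketch keeps the final column as a raw string rather than interpreting it, since only the two labels are documented here.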

Repository contents:

RawCallsByLineSim
RawCallsByLineYak
TECalls
CY17C.pacbioconf.txt
CY21B3.pacbioconf.txt
DsimChromNames.txt
DsimTandemDups.bed
DupGenes.dsim.txt
DupGenes.dyak.txt
DyakChromNames.txt
DyakTandemDups.bed
FlyLines.txt
NY66-2.pacbioconf.txt
NY73.pacbioconf.txt
README.md
RearrFreq.sim.txt
RearrFreq.yak.txt
