code and scripts relevant to the Drosophila Laboratory Pangenome Database

chakrabortymlab/DLPD

Drosophila Laboratory Pangenome Database (DLPD)

Since T.H. Morgan and his associates in the famous Fly Room began their foundational work in genetics using the mighty fruit fly Drosophila melanogaster, numerous strains of D. melanogaster with diverse genetic backgrounds have been used in laboratories worldwide. These include transgenic, deficiency, RNAi, genome-editing, and balancer strains, as well as wild-type strains (e.g., Oregon-R, w1118, Canton-S). The genetic backgrounds of these strains differ from that of the reference strain ISO1, and the differences are largely uncharacterized. These uncharacterized differences confound the interpretation of experiments that investigate genotype-phenotype relationships in non-reference laboratory strains. To solve this problem, we introduce the Drosophila Laboratory Pangenome Database (DLPD), an ever-growing collection of reference genome assemblies of popular D. melanogaster laboratory strains. We will release eleven genome assemblies initially, and more will be added in the future, depending on feedback from the community.

Submit a Sequencing Request for Your Strain of Interest

Do you work with a popular strain of D. melanogaster that doesn't have a high-quality reference genome assembly? Please submit the Google Forms DLPD Request form to ask us to sequence your strain of interest for inclusion in our database. Strains requested by multiple labs will be prioritized.

Data Access

You can find the genome assemblies in the following Google Drive folder: https://drive.google.com/drive/folders/1NiBAB0Nvd9a2Wd0-d5jWRBmXSUuGFpvj

We plan to host these assemblies on a genome browser soon for easier use and access. Meanwhile, please send any requests for the raw reads to either tdmillar@tamu.edu or mahul@tamu.edu.

Assembly Statistics for Strains in DLPD to Date

| Strain | N50* (Mb) | L50* | Significance |
|---|---|---|---|
| ISO1 (v6.53) | 21.4 | 3 | The primary reference assembly for D. melanogaster |
| BL5905 | 24.21 | 3 | w1118 wild-type strain |
| BL3605 | 24.21 | 4 | w1118 wild-type strain |
| BL5 | 22.97 | 4 | Oregon-R-C wild-type strain |
| BL64349 | 24.18 | 3 | Canton-S wild-type strain |
| BL36303 | 24.16 | 3 | phiC31 integrase-mediated transformation |
| BL36304 | 23.93 | 4 | phiC31 integrase-mediated transformation |
| BL54591 | 23.63 | 3 | Expresses Cas9 protein under control of nanos regulatory sequences |
| BL25211 | 24.46 | 3 | Used in modENCODE functional genomics experiments |
| BL8765** | 24.58 | 3 | GAL4 expression in the nervous system and CyO balancer |
| BL3954** | 22.26 | 4 | GAL4 expression driven by Act5C promoter, TM6B balancer |
| BL36283** | 22.91 | 4 | piggyBac mobilization, FRT site, balancers FM7a and TM3 |
| BL4737 | 24.31 | 3 | D. simulans strain. Produces fertile female offspring when crossed with D. melanogaster |

*N50 and L50 are measures used to evaluate the contiguity of genome assemblies. The contig N50 is a length in megabase pairs (Mb) that quantifies how well the assembly process has pieced together the genome: 50% of the assembly is contained in contigs (pieces) that are at least as long as the N50 value. The L50 is the smallest number of contigs whose combined length covers that same 50% of the assembly. A higher N50 and a lower L50 indicate a more contiguous assembly. ISO1 reference assembly statistics are included for comparison.

**Hi-C contact data are being used to phase and improve the de novo genome assemblies of the balancer chromosomes.
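
As an illustration of the N50 and L50 definitions in the footnote above, here is a minimal Python sketch. It is not the script used to produce the table; the function name `n50_l50` and the toy contig lengths are hypothetical and chosen only for illustration.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths in base pairs.

    N50: length of the contig at which the running total, taken from the
         longest contig downward, first reaches half of the assembly size.
    L50: number of contigs needed to reach that point.
    """
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")
    lengths = sorted(contig_lengths, reverse=True)  # longest first
    half_total = sum(lengths) / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= half_total:
            return length, count


# Toy contig lengths (hypothetical, not taken from any DLPD assembly).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
n50, l50 = n50_l50(toy_contigs)
print(f"N50 = {n50}, L50 = {l50}")  # prints "N50 = 70, L50 = 2"
```

For a real assembly, the contig lengths would first be collected from the FASTA file (one length per sequence record), and the resulting N50 divided by 1,000,000 to express it in Mb as in the table.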

Citation

We are writing a manuscript describing DLPD, and we'll post the citation here once the paper is ready. Meanwhile, you are welcome to use this resource for your work. Please let us know if you want to publish results that use this resource before the manuscript is available.
