Genomic Resources

Sam White edited this page Apr 4, 2019 · 76 revisions

This page compiles genomic resources so that they are readily available and at least briefly described. Where possible, the corresponding index files are kept alongside each resource so the files can be used directly in IGV and similar tools.

Next Gen Sequencing Database

Crassostrea gigas

Genome:

Bisulfite Genome:

Genome Feature Tracks


Crassostrea virginica

NCBI FTP

Genomes:

  • Cvirginica_v300.fa : http://owl.fish.washington.edu/halfshell/genomic-databank/Cvirginica_v300.fa

    • MD5 = f9135e323583dc77fc726e9df2677a32

    • FastA index (samtools faidx)

  • GCF_002022765.2_C_virginica-3.0_genomic.fna : ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/002/022/765/GCF_002022765.2_C_virginica-3.0/GCF_002022765.2_C_virginica-3.0_genomic.fna.gz

    • Gzip-compressed NCBI copy of Cvirginica_v300.fa (same file contents)

Bisulfite Genomes:

  • Cvirginica_v300_bisulfite.tar.gz : http://owl.fish.washington.edu/halfshell/genomic-databank/Cvirginica_v300_bisulfite.tar.gz
    • Gzipped tarball of bisulfite genome for use with Bismark
    • Creation details here

Genome Feature Tracks


Ostrea lurida

Genome:

  • Olurida_v081.fa : http://owl.fish.washington.edu/halfshell/genomic-databank/Olurida_v081.fa

    • MD5 = 3ac56372bd62038f264d27eef0883bd3

    • This is v080 restricted to contigs > 1000 bp. Details of how v080 was reduced can be found here.

    • FastA index (samtools faidx)

      • Olurida_v081.fa.fai : http://owl.fish.washington.edu/halfshell/genomic-databank/Olurida_v081.fa.fai
  • Olurida_v080.fa : http://owl.fish.washington.edu/halfshell/genomic-databank/Olurida_v080.fa

    • MD5 = 9258398f554493e08fdc30e9c1409864

    • FastA index (samtools faidx)

      • Olurida_v080.fa.fai : http://owl.fish.washington.edu/halfshell/genomic-databank/Olurida_v080.fa.fai
    • Also known as pbjelly_sjw_01. Details found here, though confirmation would be good.

Bisulfite Genomes:

  • Olurida_v080_bisulfite.tar.gz : http://owl.fish.washington.edu/halfshell/genomic-databank/Olurida_v080_bisulfite.tar.gz

    • Gzipped tarball of bisulfite genome for use with Bismark
    • Creation details here
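
The FastA index files (.fai) listed for the genomes above are plain tab-separated text with five columns (name, length, offset, bases per line, bytes per line), so they can be summarized without loading the genome itself. The sketch below is one way to confirm, for example, that Olurida_v081.fa contains no contigs of 1000 bp or shorter; the helper name `fai_summary` is hypothetical.

```shell
# fai_summary FAI [MIN_LEN]
# Hypothetical helper: reports the number of sequences, the total length, and how many
# sequences are at or below MIN_LEN (default 1000), using column 2 (length) of the .fai.
fai_summary() {
  awk -F'\t' -v min="${2:-1000}" '
    { n++; total += $2; if ($2 + 0 <= min + 0) short++ }
    END { printf "sequences=%d total_bp=%.0f at_or_below_min=%d\n", n, total, short }
  ' "$1"
}

# Example (requires samtools and the downloaded FastA):
# samtools faidx Olurida_v081.fa      # writes Olurida_v081.fa.fai
# fai_summary Olurida_v081.fa.fai 1000
```

For a genome filtered to contigs > 1000 bp, the last field would be expected to be 0.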

Transcriptome:

Genome Feature Tracks

  • Olurida_v081.mRNA.gff : http://owl.fish.washington.edu/halfshell/genomic-databank/Olurida_v081.mRNA.gff

  • Olurida_v081-ks-blastn : http://owl.fish.washington.edu/halfshell/genomic-databank/Olurida_v081-ks-blastn.gff

    • MD5 = 9f8f0e3e7a69ba15c14eef9b74849a34

    • Gene track developed by running blastn with the Silliman big transcriptome against the genome.

  • Olurida_v081_TE-Cg.gff : http://owl.fish.washington.edu/halfshell/genomic-databank/Olurida_v081_TE-Cg.gff

  • Olurida_v081.all.gff : http://owl.fish.washington.edu/Athaliana/20180807_wqmaker_run_oly_02/Olurida_v081.all.gff

    • MD5 = 2116e8a52b522a498fce88e82be3326b

    • Output from running Maker. Details in Sam's Notebook

  • Olurida_v081.all.sorted.gff : http://owl.fish.washington.edu/halfshell/genomic-databank/Olurida_v081.all.sorted.gff

  • Olurida_v081.protein2genome.sorted.gff : http://owl.fish.washington.edu/halfshell/genomic-databank/Olurida_v081.protein2genome.sorted.gff

    • MD5 = a152bf7401fa1a5158c0c6a14c2add9

    • Derived from Olurida_v081.all.gff

  • Olurida_v081.blastx.sorted.gff : http://owl.fish.washington.edu/halfshell/genomic-databank/Olurida_v081.blastx.sorted.gff

    • MD5 = 35596fa126fbd234c0b1e9311d5d8abc

    • Derived from Olurida_v081.all.gff

  • Olurida_v081.blastn.sorted.gff : http://owl.fish.washington.edu/halfshell/genomic-databank/Olurida_v081.blastn.sorted.gff

    • MD5 = 518bfb0b00a63cdf1a37b49472d746ce

    • Derived from Olurida_v081.all.gff

  • Olurida_v081.est2genome.sorted.gff : http://owl.fish.washington.edu/halfshell/genomic-databank/Olurida_v081.est2genome.sorted.gff

    • MD5 = fc829d1b22b649ca7692f77b57b4833b

    • Derived from Olurida_v081.all.gff

  • Olurida_v081.repeatrunner.sorted.gff : http://owl.fish.washington.edu/halfshell/genomic-databank/Olurida_v081.repeatrunner.sorted.gff

    • MD5 = 0ea4dfb59c3a8b81144f330e06ddf117

    • Derived from Olurida_v081.all.gff

MD5 = 69b72a27bd565c0de732fec7dc8c2e4e

Derived from Olurida_v081.all.gff
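
Several tracks above are described as derived from Olurida_v081.all.gff. The exact commands are not recorded on this page; the sketch below shows one plausible way to split a Maker GFF by its source column (column 2) and sort the result by sequence and start position. It assumes that the source labels in column 2 match the names in the file names (e.g. blastx, est2genome), and the helper name `gff_by_source` is hypothetical.

```shell
# gff_by_source ALL_GFF SOURCE
# Hypothetical helper: keeps 9-column feature lines whose source (column 2) equals SOURCE,
# skipping comment lines and any trailing FASTA section, then sorts by sequence and start.
gff_by_source() {
  awk -F'\t' -v src="$2" '!/^#/ && NF >= 9 && $2 == src' "$1" |
    LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k4,4n
}

# Example (assumed usage, not the recorded procedure):
# gff_by_source Olurida_v081.all.gff blastx > Olurida_v081.blastx.sorted.gff
```

Comparing the MD5 of the output with the value listed above is a quick way to tell whether this reproduces the published file.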

IGV Sessions

https://raw.githubusercontent.com/sr320/nb-2018/master/O_lurida/analyses/0827_igv_session.xml comment: split full Maker out


Panopea generosa

Genome:

Bisulfite Genome:

  • Pgenerosa_v070_bisulfite.tar.gz : http://owl.fish.washington.edu/halfshell/genomic-databank/Pgenerosa_v070_bisulfite.tar.gz

    • Gzipped tarball of bisulfite genome for use with Bismark
    • Creation details here
  • Pgenerosa_v071_bisulfite.tar.gz : http://owl.fish.washington.edu/halfshell/genomic-databank/Pgenerosa_v071_bisulfite.tar.gz

    • Gzipped tarball of bisulfite genome for use with Bismark
    • Creation details here
  • Pgenerosa_v073_bisulfite.tar.gz : http://owl.fish.washington.edu/halfshell/genomic-databank/Pgenerosa_v073_bisulfite.tar.gz

    • Gzipped tarball of bisulfite genome for use with Bismark
    • Creation details here
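
Before aligning with Bismark, it can help to confirm that an extracted tarball contains a prepared genome folder. Bismark's genome preparation step writes a `Bisulfite_Genome` directory with `CT_conversion` and `GA_conversion` subfolders; the sketch below checks for those. The internal layout of the tarballs listed here is not described on this page, so the extracted directory name is left as a placeholder, and the helper name `check_bismark_genome` is hypothetical.

```shell
# check_bismark_genome DIR
# Hypothetical helper: verifies that DIR holds the folders created by bismark_genome_preparation.
check_bismark_genome() {
  for cbg_sub in CT_conversion GA_conversion; do
    if [ ! -d "$1/Bisulfite_Genome/$cbg_sub" ]; then
      echo "missing: $1/Bisulfite_Genome/$cbg_sub" >&2
      return 1
    fi
  done
  echo "Bismark genome folder looks complete: $1"
}

# Example (requires network access and Bismark; <extracted_dir> and reads.fastq.gz are placeholders):
# wget http://owl.fish.washington.edu/halfshell/genomic-databank/Pgenerosa_v073_bisulfite.tar.gz
# tar -xzf Pgenerosa_v073_bisulfite.tar.gz
# check_bismark_genome <extracted_dir>
# bismark --genome <extracted_dir> reads.fastq.gz
```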

Genome Feature Tracks

  • Pgenerosa_v071_snap02.renamed.mRNA.gff : http://owl.fish.washington.edu/halfshell/genomic-databank/Pgenerosa_v071_snap02.renamed.mRNA.gff

    • MD5 = 2778a3f268040a0b533fa086c6979fd2

    • awk '$3 == "mRNA" {print}' Pgenerosa_v071_snap02.all.renamed.gff > Pgenerosa_v071_snap02.renamed.mRNA.gff

Transcriptome:

  • Pgenerosa_transcriptome_v5.fasta : http://owl.fish.washington.edu/halfshell/genomic-databank/Pgenerosa_transcriptome_v5.fasta

    • MD5 = 5a21424ecbc88c3b01daefe56bed79da

    • Transcriptome generated from various libraries - details here


QPX

Genome:

  • QPX_v017.fasta : http://eagle.fish.washington.edu/QPX_genome/QPX_v017.fasta

CLC v5.1 Mismatch cost = 2; Perform scaffolding = Yes; Mapping mode = Map reads back to contigs (slow); Deletion cost = 3; Similarity fraction = 0.9; Length fraction = 0.8; Insertion cost = 3; Update contigs = Yes; Automatic word size = Yes; Minimum contig length = 10000; Automatic bubble size = Yes; input: filtered_QPX_DNA_GTGAAA_L001_R1 trimmed.

De novo assembly was performed with Genomics Workbench v. 5.0 (CLC Bio, Germany) on quality trimmed sequences with the following parameters: mismatch cost = 2, deletion cost = 3, similarity fraction = 0.9, insertion cost = 3, length fraction = 0.8 and minimum contig size of 100 bp for genomic data and 200 bp for transcriptomic data. In order to remove ribosomal RNA sequences from the transcriptome data, consensus sequences were compared to the NCBI nt database using the BLASTn algorithm [59]. Sequences with significant matches (9) were removed and not considered in subsequent analyses.

Manuscript: https://doi.org/10.1371/journal.pone.0074196

Transcriptome:

QPX_transcriptome_v1_clean.fasta

QPX_Transcriptome v2.1

Subset of version 1 (v1) that includes only sequences with an e-value < 1E-20. Based on Swiss-Prot blastx output, all sequences are oriented 5' to 3'. Nucleotides between stop codons; minimum size 200.
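
The e-value cutoff described above can be illustrated with a small filter. This is a sketch only: it assumes the Swiss-Prot blastx results are in BLAST tabular format (outfmt 6), where column 1 is the query ID and column 11 is the e-value, which this page does not confirm. The helper name `ids_below_evalue` and the file names in the example are hypothetical.

```shell
# ids_below_evalue BLAST_TAB [MAX_EVALUE]
# Hypothetical helper: prints each query ID once if it has at least one hit
# with an e-value below MAX_EVALUE (default 1e-20).
ids_below_evalue() {
  awk -F'\t' -v max="${2:-1e-20}" '
    ($11 + 0) < (max + 0) && !seen[$1]++ { print $1 }
  ' "$1"
}

# Example (file names are placeholders):
# ids_below_evalue QPX_v1_blastx_sprot.tab 1e-20 > QPX_v2.1_ids.txt
```

The resulting ID list could then be used to pull the matching sequences from the v1 FastA with a tool of choice.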
