VGP specimen naming scheme

Individual specimens to be sequenced as part of the Vertebrate Genomes Project (VGP) will be assigned a VGP ID according to the following scheme. The ID will take the form:

[abfmrs]AbcXyz{#}

where

  • The one letter prefix [abfmrs] corresponds to one of:

    prefix  class
    a       amphibians
    b       birds
    f       fishes
    m       mammals
    r       reptiles
    s       sharks and relatives
  • The six letter combination AbcXyz is a species/strain designator. In most cases, this will be GenSpe for Genus/Species but that is not required (see below for how to resolve clashes).

  • {#} is an incremental number per individual specimen from the same species.
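
The format above can be checked mechanically. The following is a minimal Python sketch, not part of the VGP specification: the names `VGP_ID_RE`, `VgpId` and `parse_vgp_id` are illustrative, and it assumes the specimen number is a positive integer with no leading zeros, which the scheme does not state explicitly. Any mix of upper and lower case is accepted in the designator, since modified capitalisation is one of the clash-resolution options.

```python
import re
from typing import NamedTuple

# Class prefixes copied from the list above.
PREFIX_CLASSES = {
    "a": "amphibians",
    "b": "birds",
    "f": "fishes",
    "m": "mammals",
    "r": "reptiles",
    "s": "sharks and relatives",
}

# One class letter, a six-letter designator, then the specimen number.
# Assumption: the number is a positive integer without leading zeros.
VGP_ID_RE = re.compile(r"([abfmrs])([A-Za-z]{6})([1-9][0-9]*)")


class VgpId(NamedTuple):
    prefix: str      # one-letter class prefix
    designator: str  # six-letter species/strain designator
    number: int      # per-specimen number


def parse_vgp_id(text: str) -> VgpId:
    """Split a VGP ID into its three parts, or raise ValueError."""
    match = VGP_ID_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid VGP ID: {text!r}")
    prefix, designator, number = match.groups()
    return VgpId(prefix, designator, int(number))
```

For example, `parse_vgp_id("fGouWil2")` yields the prefix `f`, the designator `GouWil` and the number `2`.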

For each species in the VGP ordinal project, the 7-letter prefix [abfmrs]AbcXyz will be pre-assigned to avoid conflicts. Across all species, there will be clashes for this 7-letter prefix. There are a few options for dealing with these:

  1. Allow the clashes, with individuals/species disambiguated by the final incremental number.
  2. Allow variation within the six letter species designator, e.g. a 2-4 split (GeSpec), or modified capitalisation (GENSpe).
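
Option 2 can be sketched as a function that proposes designators in order of preference. This is only an illustration under assumptions: the function names, the order of the fallbacks and the `taken` set of already assigned 7-letter prefixes are hypothetical, and the scheme does not prescribe any particular order.

```python
from typing import List, Optional, Set


def candidate_designators(genus: str, species: str) -> List[str]:
    """Return six-letter designators to try, most preferred first."""
    g, s = genus.lower(), species.lower()
    candidates = []
    # Default 3-3 split (GenSpe), then the 2-4 split (GeSpec).
    for g_len in (3, 2):
        s_len = 6 - g_len
        if len(g) >= g_len and len(s) >= s_len:
            candidates.append(g[:g_len].capitalize() + s[:s_len].capitalize())
    # Modified capitalisation of the 3-3 split (GENSpe).
    if len(g) >= 3 and len(s) >= 3:
        candidates.append(g[:3].upper() + s[:3].capitalize())
    return candidates


def pick_prefix(class_prefix: str, genus: str, species: str,
                taken: Set[str]) -> Optional[str]:
    """Return the first 7-letter prefix not in `taken`, or None if all clash."""
    for designator in candidate_designators(genus, species):
        prefix = class_prefix + designator
        if prefix not in taken:
            return prefix
    return None
```

With an empty `taken` set, `pick_prefix("f", "Gouania", "willdenowi", set())` returns `fGouWil`; if that prefix is already taken, the next candidate is `fGoWill`.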

The EBI are in the process of setting up a registry where VGP IDs can be assigned, so that individual IDs do not clash between centres.
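
To illustrate what central assignment means for the incremental number, here is a small in-memory sketch. It is not the EBI registry or its interface: the class `VgpRegistry` and its `register` method are hypothetical stand-ins. The counter is kept per 7-letter prefix, so two species that share a prefix (option 1 above) still receive distinct IDs.

```python
from collections import defaultdict


class VgpRegistry:
    """In-memory stand-in for a central ID registry (hypothetical)."""

    def __init__(self):
        # Last number issued for each 7-letter prefix, e.g. "fGouWil" -> 2.
        self._last_issued = defaultdict(int)
        # Species recorded for each issued ID.
        self.species_by_id = {}

    def register(self, prefix: str, species: str) -> str:
        """Issue the next free ID under a 7-letter prefix for one specimen."""
        self._last_issued[prefix] += 1
        vgp_id = f"{prefix}{self._last_issued[prefix]}"
        self.species_by_id[vgp_id] = species
        return vgp_id
```

Because every centre would request IDs from the same counter, no two specimens can end up with the same ID, whichever species they belong to.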

Examples

VGP ID    Species and common name
fGouWil2  Gouania willdenowi; blunt-snouted clingfish
mLemCat1  Lemur catta; ring-tailed lemur
aRhiDar3  Rhinoderma darwinii; Darwin's frog
bCalAnn1  Calypte anna; Anna's hummingbird
rDerCor1  Dermochelys coriacea; leatherback sea turtle
sCarCar1  Carcharodon carcharias; great white shark

Tissue samples

For a single individual, there may be multiple tissue samples used for transcriptome sequencing. The proposed scheme to distinguish these samples is:

[abfmrs]AbcXyz{#}.tissue{#}

where tissue should come from an agreed list of terms (to be decided). Examples: fGouWil2.brain1, fGouWil2.eye2.
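
A tissue sample ID in this format can be split into its individual, tissue term and sample number. The sketch below is illustrative only: `SAMPLE_ID_RE` and `parse_sample_id` are hypothetical names, and because the list of tissue terms is still to be decided, it assumes a term is one or more lowercase letters rather than checking against a vocabulary.

```python
import re
from typing import Tuple

# Individual VGP ID, a dot, a tissue term, then the per-tissue sample number.
# Assumption: tissue terms are lowercase letters only; the agreed list of
# terms does not exist yet, so no vocabulary check is made here.
SAMPLE_ID_RE = re.compile(
    r"([abfmrs][A-Za-z]{6}[1-9][0-9]*)\.([a-z]+)([1-9][0-9]*)"
)


def parse_sample_id(text: str) -> Tuple[str, str, int]:
    """Return (individual VGP ID, tissue term, sample number)."""
    match = SAMPLE_ID_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid tissue sample ID: {text!r}")
    individual, tissue, number = match.groups()
    return individual, tissue, int(number)
```

Applied to the examples above, `fGouWil2.brain1` splits into the individual `fGouWil2`, the tissue `brain` and the number `1`.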

If the tissue used for transcriptome sequencing is from a different individual than the one sequenced to produce the assembly, then a new individual VGP ID should be registered.

BioSamples

Once a VGP ID has been assigned, a BioSamples accession ID should also be generated for the individual. Agreed metadata (to be decided) should be attached to the BioSamples entries. Tissue samples should be assigned metadata based on an agreed ontology such as Uberon and should use the "Derived from" linking facility in BioSamples to indicate the individual source of that tissue sample.

Extension beyond vertebrates

If this scheme were to extend beyond vertebrates in the VGP, the proposal below would use all the letters of the alphabet to cover the Tree of Life. This is meant as a pragmatic division rather than a strict taxonomic one.

prefix  class                         count   group             notes
a       amphibians                    6439    chordates
b       birds                         10301   chordates
c       non-vascular plants           14222   plants            mosses, liverworts, hornworts; not monophyletic
d       dicotyledons                  200000  plants            not monophyletic
e       echinoderms                   6753    other animals
f       fishes                        31862   chordates         lobe-finned and ray-finned = Osteichthyes = Teleostomi (excluding tetrapods)
g       fungi                         123126  other eukaryotes
h       platyhelminths                9164    other animals
i       insects                       795000  other animals
j       jellyfish and other cnidaria  9747    other animals
k       other chordates               1926    chordates         cephalochordates, urochordates (tunicates), jawless fish; not monophyletic
l       monocotyledons (lilies etc.)  51595   plants            'l' for lily
m       mammals                       4863    chordates
n       nematodes                     3455    other animals
o       sponges                       8499    other animals
p       protists                      12695   other eukaryotes  defined here as eukaryotes not animals or plants or fungi; not monophyletic
q       other arthropods              120000  other animals     not insects; not monophyletic
r       reptiles                      9789    chordates         excluding birds
s       sharks and relatives          1149    chordates         Chondrichthyes = Elasmobranchs and Chimaeras
t       other animal phyla            165     other animals
u       algae                         2056    plants            not monophyletic
v       other vascular plants         66717   plants            ferns, cycads, conifers, ginkgo etc.; not monophyletic
w       annelids (worms)              12738   other animals
x       molluscs                      41646   other animals     the "scs" in "molluscs" sounds a bit like it contains an 'x'
y       bacteria                      6468    prokaryotes
z       archaea                       281     prokaryotes
-       viruses                                                 '-' for missing

Equivalently, presented by group:

group                              prefix  class
chordates (including vertebrates)  m       mammals
                                   b       birds
                                   r       reptiles
                                   a       amphibians
                                   f       fishes
                                   s       sharks
                                   k       other chordates
other animals                      e       echinoderms
                                   x       molluscs
                                   i       insects
                                   q       other arthropods
                                   w       annelids (worms)
                                   n       nematodes
                                   h       platyhelminths
                                   j       jellyfish and other cnidaria
                                   o       sponges
                                   t       other animal phyla
plants                             d       dicotyledons
                                   l       monocotyledons
                                   v       other vascular plants
                                   c       non-vascular plants
                                   u       algae
other eukaryotes                   g       fungi
                                   p       protists
prokaryotes                        y       bacteria
                                   z       archaea
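
The two views of the proposal carry the same information, which a short Python sketch can make explicit. The names `ROWS`, `uses_whole_alphabet` and `prefixes_by_group` are illustrative only. The rows are copied from the proposal above, leaving out the counts and notes, and also the `-` row for viruses, which has no group.

```python
import string
from collections import defaultdict

# (prefix, class, group) rows copied from the proposal above.
ROWS = [
    ("a", "amphibians", "chordates"),
    ("b", "birds", "chordates"),
    ("c", "non-vascular plants", "plants"),
    ("d", "dicotyledons", "plants"),
    ("e", "echinoderms", "other animals"),
    ("f", "fishes", "chordates"),
    ("g", "fungi", "other eukaryotes"),
    ("h", "platyhelminths", "other animals"),
    ("i", "insects", "other animals"),
    ("j", "jellyfish and other cnidaria", "other animals"),
    ("k", "other chordates", "chordates"),
    ("l", "monocotyledons", "plants"),
    ("m", "mammals", "chordates"),
    ("n", "nematodes", "other animals"),
    ("o", "sponges", "other animals"),
    ("p", "protists", "other eukaryotes"),
    ("q", "other arthropods", "other animals"),
    ("r", "reptiles", "chordates"),
    ("s", "sharks and relatives", "chordates"),
    ("t", "other animal phyla", "other animals"),
    ("u", "algae", "plants"),
    ("v", "other vascular plants", "plants"),
    ("w", "annelids (worms)", "other animals"),
    ("x", "molluscs", "other animals"),
    ("y", "bacteria", "prokaryotes"),
    ("z", "archaea", "prokaryotes"),
]


def uses_whole_alphabet(rows):
    """True if every letter a-z is used as a prefix exactly once."""
    return sorted(prefix for prefix, _, _ in rows) == list(string.ascii_lowercase)


def prefixes_by_group(rows):
    """Regroup the per-prefix rows into a {group: [prefix, ...]} mapping."""
    grouped = defaultdict(list)
    for prefix, _, group in rows:
        grouped[group].append(prefix)
    return dict(grouped)
```

`uses_whole_alphabet(ROWS)` confirms that each of the 26 letters is assigned to exactly one class, and `prefixes_by_group(ROWS)` reproduces the grouping shown in the second table.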