Microbial transcriptome indices #1722

@georgiadoing

Context

For many microbes, it is not clear which transcriptome indices are best for building compendia.

Problem or idea

The first hurdle is that microbial genome assemblies are not in the main Ensembl database. The second is that many microbes have many genome assemblies, with no standard naming scheme to identify the 'best' reference. Part of this comes from the multitude of strains for each microbial species and, in turn, the multitude of assemblies for each strain.

Solution or next step

  1. A first step would be for transcriptome indices to be callable for EnsemblBacteria, in a manner similar to the Surveyor Jobs instructions in the README, like so:

./foreman/run_surveyor.sh survey_all --accession "Pseudomonas aeruginosa, EnsemblBacteria"

  2. Next, we would need to decide and specify which strain to use. This is harder to solve, and my only short-term solution is to manually specify the strain for each microbe of interest based on expert knowledge. For example, our preferred reference strain of Pseudomonas aeruginosa, PAO1, could perhaps be specified like so:

./foreman/run_surveyor.sh survey_all --accession "Pseudomonas aeruginosa PAO1, EnsemblBacteria"

  3. Third, we would need to make sure we are getting the best assembly for that strain. For example, for Pseudomonas aeruginosa PAO1 there are 7 genomes in EnsemblBacteria, and we would like to specifically get ASM676v1, the most complete reference genome and the best choice for building a compendium.

In summary, if there were a way to specify the strain and assembly in the accession call for a given microbial organism, and if we curated a list of preferred strains and assemblies, we could begin to tackle this.
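
To make the idea concrete, here is a rough Python sketch of how an accession string in the proposed "organism strain, source" form might be split into its parts and matched against a curated list. Everything in it is hypothetical: the `Accession` type, the `parse_accession` and `preferred_assembly` helpers, and the `PREFERRED_ASSEMBLIES` table are illustrative names, not part of the existing surveyor code. It also assumes the species name is always the first two words and anything after that is the strain, which will not hold for every organism.

```python
from typing import NamedTuple, Optional


class Accession(NamedTuple):
    """Parts of a proposed 'Genus species [strain], Source' accession string."""
    species: str
    strain: Optional[str]
    source: str


# Hypothetical curated table: (species, strain) -> preferred assembly name.
# The single entry comes from the example in this issue.
PREFERRED_ASSEMBLIES = {
    ("Pseudomonas aeruginosa", "PAO1"): "ASM676v1",
}


def parse_accession(accession: str) -> Accession:
    """Split an accession string into species, optional strain, and source.

    Assumption: the species is the first two words before the last comma,
    and any remaining words are the strain.
    """
    organism, sep, source = accession.rpartition(",")
    if not sep:
        raise ValueError(f"expected 'organism, source' but got: {accession!r}")
    words = organism.split()
    if len(words) < 2:
        raise ValueError(f"expected a 'Genus species' name in: {accession!r}")
    species = " ".join(words[:2])
    strain = " ".join(words[2:]) or None
    return Accession(species, strain, source.strip())


def preferred_assembly(acc: Accession) -> Optional[str]:
    """Return the curated assembly for this species and strain, if any."""
    if acc.strain is None:
        return None
    return PREFERRED_ASSEMBLIES.get((acc.species, acc.strain))


acc = parse_accession("Pseudomonas aeruginosa PAO1, EnsemblBacteria")
print(acc, preferred_assembly(acc))
```

An accession without a strain, or a strain missing from the table, returns `None` here, so the caller would still have to decide whether to fall back to some default or refuse to build the index.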

Below is a link to a working Google Sheet of organisms, strains, and assemblies that I will curate, in case it is useful.

https://docs.google.com/spreadsheets/d/1Lbi68UP2dQtfp-KoxtXpE7jhCOxgP_FweGznbbOiMkw/edit?usp=sharing
