use prodigal results on eukaryotic contigs rather than refseq eukaryotic proteins #4

Open
4 tasks
jvollme opened this issue Nov 30, 2021 · 0 comments

jvollme commented Nov 30, 2021

ORF calling works very differently for eukaryotes than for bacteria. Instead of using eukaryotic proteins as references, run prodigal with prokaryotic and metagenomic settings on reference eukaryotic genomes (not necessary for viral genomes) and use the resulting predictions as references. This is more likely to mimic what happens to eukaryotic contigs in metagenome analysis pipelines.

To do this:

  • download the eukaryotic genomes of the current refseq release (nucleotide sequences)
  • randomly cut them into chunks of ~5 kb, but also cut at stretches of "N"s (discard chunks that end up smaller than 200 bp)
  • run prodigal on the chunks, then dereplicate the predicted proteins (95% identity? or 90% identity?) to reduce database size. Always keep the largest representative --> protein diamond db: eukaryotic-refprotein-db
  • extract all remaining chunks without any predicted CDS (non-coding reference chunks) and dereplicate them (95% or 90% identity?). Always keep the largest representative --> nucleotide blastn-db: eukaryotic-noncoding-chunks-db
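
A minimal sketch of the chunking step, to make the intended behaviour concrete. It is not an agreed implementation: the function names, the `seed` and `jitter` parameters, and the choice to treat even a single "N" as a cut point (`min_run=1`) are assumptions made for illustration. Only the ~5 kb target and the 200 bp minimum come from the task list above.

```python
import random
import re

TARGET_LEN = 5000  # from the task list: chunks of ~5 kb
MIN_LEN = 200      # from the task list: discard chunks smaller than 200 bp


def split_at_n_runs(seq, min_run=1):
    """Split a sequence at every stretch of at least `min_run` N characters.

    `min_run` is a hypothetical knob; the issue only says 'stretches of "N"s'.
    """
    pattern = re.compile(r"[Nn]{%d,}" % min_run)
    return [piece for piece in pattern.split(seq) if piece]


def random_chunks(seq, rng, target=TARGET_LEN, jitter=0.2):
    """Cut `seq` into consecutive chunks whose lengths vary randomly
    within +/- `jitter` (assumed 20%) of `target`."""
    chunks = []
    pos = 0
    while pos < len(seq):
        size = max(1, int(target * rng.uniform(1 - jitter, 1 + jitter)))
        chunks.append(seq[pos:pos + size])
        pos += size
    return chunks


def chunk_genome(seq, seed=0, target=TARGET_LEN, min_len=MIN_LEN):
    """Cut at N stretches first, then randomly into ~target-sized chunks,
    and drop every chunk shorter than `min_len`."""
    rng = random.Random(seed)  # seeded so a database build is reproducible
    kept = []
    for piece in split_at_n_runs(seq):
        for chunk in random_chunks(piece, rng, target):
            if len(chunk) >= min_len:
                kept.append(chunk)
    return kept


if __name__ == "__main__":
    # Toy sequence: 12 kb, a gap of Ns, a 100 bp fragment, another gap, 400 bp.
    toy = "ACGT" * 3000 + "N" * 50 + "ACGT" * 25 + "NNN" + "ACGT" * 100
    for i, chunk in enumerate(chunk_genome(toy, seed=1)):
        print(f">toy_chunk{i} length={len(chunk)}")
```

The kept chunks would then be written to a FASTA file and passed to prodigal, for example with `-p meta` for the metagenomic mode in Prodigal 2.6.x. Splitting at N stretches before the random cut means no chunk spans an assembly gap.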
@jvollme jvollme added the enhancement New feature or request label Nov 30, 2021
@jvollme jvollme added this to the Improvements A milestone Nov 30, 2021
@jvollme jvollme self-assigned this Nov 30, 2021