Include NCBI TaxID alongside the participants names #16

Open
cpauvert opened this issue Jul 2, 2021 · 2 comments
cpauvert commented Jul 2, 2021

No description provided.

@cpauvert cpauvert added the enhancement New feature or request label Jul 2, 2021
@cpauvert cpauvert self-assigned this Jul 7, 2021

cpauvert commented Jul 7, 2021

This seems feasible with the taxize R package from rOpenSci. For example:

library(taxize) # may need an Entrez API key
get_uid("Zygotorulaspora florentina", rank_query = "species")
# Returns 48255, or NA if the name is not found

@cpauvert
Owner Author

With issue #18 in mind, I started working on this some time ago, and a few problems arose.

The search for the NCBI TaxID failed for some participant names (e.g. row 1 in the table below), probably because of a trailing "sp." or "spp." (which seems easy to handle).

Moreover, some participant names are fuzzy (e.g. rows 5 and 10), which has two consequences:

  1. They will need renaming, but in a traceable way
  2. The taxonomic resolution might also change (Participant_1 in row 1 would be resolved only at the genus level instead of species)

I am still not sure how to deal with these problems, nor how to properly sanitize the names and TaxIDs without manual corrections.

                Participant_1                   Participant_2     TR1     TR2
1           Acanthamoeba spp.          Candidatus Procabacter species   genus
2       Acetobacterium woodii         Pelobacter acidigallici species species
3               Acinetobacter              Pseudomonas putida   genus species
4       Alteromonas macleodii                 Prochlorococcus species   genus
5  Ammonia-oxidizing bacteria      Nitrite-oxidizing bacteria   class   class
6            Archaea (ANME-2)              Desulfosarcina sp.  phylum   genus
7        Aspergillus nidulans      Streptomyces rapamycinicus species species
8             Azotobacter sp.                  Alternaria sp. species species
9                Bacillus sp.            Debaryomyces vanriji species species
10         Bacteroides ovatus Bacteroides vulgatus and others species species
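
The trailing "sp." / "spp." problem could be handled with something like the sketch below. It uses only base R; `sanitize_participant` is a hypothetical helper (it is not part of taxize), and the `genus_only` flag is an assumed way of recording that the taxonomic resolution drops to genus once the suffix is removed.

```r
# Hypothetical helper (not part of taxize): strip a trailing "sp." or "spp."
# so that only the genus name is used for the NCBI lookup.
sanitize_participant <- function(x) {
  x <- trimws(x)
  suffix <- "[[:space:]]+spp?\\.?$"
  has_suffix <- grepl(suffix, x)
  data.frame(
    original = x,
    query = sub(suffix, "", x),
    # A name that lost its suffix can only be resolved at the genus level
    genus_only = has_suffix,
    stringsAsFactors = FALSE
  )
}

participants <- c("Acanthamoeba spp.", "Azotobacter sp.",
                  "Acetobacterium woodii", "Acinetobacter")
cleaned <- sanitize_participant(participants)
print(cleaned)

# The cleaned names could then be passed to taxize (needs network access):
# uids <- taxize::get_uid(cleaned$query)
```

Keeping the original name next to the cleaned query makes the renaming traceable. Fuzzy names such as rows 5 and 10 are not covered by this sketch and would probably still need manual curation.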
