Maybe use a 98% amino acid identity cut-off?
Proteins that are unique to one species in a genus would still be attributed to that individual species (but only one copy would be kept in the case of multi-copy entries).
"Redundant" proteins that occur identically in multiple species of a genus would be attributed to the genus instead of the species (again represented by only one copy).
--> reduces dataset size
--> increases diamond/blast search speeds
--> increases speed of LCA-classifications (a little bit)?
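
A rough sketch of the attribution rule described above, just to make the idea concrete. It is not an implementation of the 98% cut-off: it groups only exactly identical sequences, and a real run would need a sequence clustering tool to produce the 98%-identity groups first. The function name `dereplicate_by_genus` and the record layout `(protein_id, genus, species, sequence)` are made up for illustration.

```python
from collections import defaultdict


def dereplicate_by_genus(records):
    """Collapse identical proteins within each genus (hypothetical helper).

    records: iterable of (protein_id, genus, species, sequence) tuples.
    Returns a list of (representative_id, rank, taxon_name, sequence).
    Assumption: exact sequence identity stands in for a 98% identity cluster.
    """
    # Group all copies of the same sequence that occur within one genus.
    groups = defaultdict(list)
    for protein_id, genus, species, sequence in records:
        groups[(genus, sequence)].append((protein_id, species))

    result = []
    for (genus, sequence), members in groups.items():
        species_seen = {species for _, species in members}
        representative_id = members[0][0]  # keep only the first copy
        if len(species_seen) == 1:
            # Unique to one species: stays attributed to that species.
            result.append((representative_id, "species", next(iter(species_seen)), sequence))
        else:
            # Shared by several species: attributed to the genus instead.
            result.append((representative_id, "genus", genus, sequence))
    return result


# Toy input; IDs and sequences are invented placeholders.
example_records = [
    ("p1", "Escherichia", "Escherichia coli", "MKT"),
    ("p2", "Escherichia", "Escherichia coli", "MKT"),      # multi-copy in one species
    ("p3", "Escherichia", "Escherichia coli", "MAA"),
    ("p4", "Escherichia", "Escherichia albertii", "MAA"),  # shared across species
    ("p5", "Escherichia", "Escherichia albertii", "MGG"),
]

for entry in dereplicate_by_genus(example_records):
    print(entry)
```

In this toy input, five entries collapse to three: `p1` and `p5` stay at species level, and `p3` becomes the single genus-level representative for the shared sequence.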