New command 'make label': assign a label to a query draft assembly based on its best hits from COBS #263

Open
jorgeavilacartes wants to merge 8 commits into main


jorgeavilacartes

Hello,

In this pull request, I included:

  1. A modification of the Snakefile to support a larger number of input files. With ~400 files the pipeline crashed because the file name built by concatenating all of their names was too long, so I changed the get_filename_for_all_queries() function to return a fixed string. See here
  2. "fna" was added to the list of accepted extensions, since this is the default format of assemblies downloaded from NCBI (with ncbi-datasets).
  3. Scripts and files to assign a species-level label to a query draft assembly.
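A rough illustration of why item 1 matters, not the actual Snakefile code: the joining scheme in `joined_name`, the `NAME_MAX` value of 255 (the usual per-file-name limit on Linux filesystems), the fixed string `"all_queries"`, and the query paths are all assumptions made for this sketch. Only the function name get_filename_for_all_queries() comes from the PR.

```python
from pathlib import Path

# Typical limit on the length of a single file name on Linux filesystems
# (assumption for this sketch; the PR does not state which limit was hit).
NAME_MAX = 255


def joined_name(query_files):
    """Assumed old-style behaviour: build one name from all query stems."""
    return "__".join(Path(f).stem for f in query_files)


def get_filename_for_all_queries():
    """Behaviour described in the PR: return a fixed string.

    The value "all_queries" is hypothetical, not taken from the Snakefile.
    """
    return "all_queries"


# 400 made-up query assemblies, mimicking the ~400 files mentioned above.
query_files = [f"queries/assembly_{i:04d}.fna" for i in range(400)]

too_long = len(joined_name(query_files)) > NAME_MAX
print("joined name exceeds NAME_MAX:", too_long)
print("fixed name:", get_filename_for_all_queries())
```

The fixed name keeps the output path length independent of how many queries are supplied, at the cost of no longer encoding the query set in the file name.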

How are labels assigned to a query draft assembly?
Since each contig in a draft assembly is considered as a query, I parsed the output file from intermediate/04_filter to collect all hits of each assembly (i.e. the collection of hits of its contigs).
Each hit (represented by the sampleID of an assembly) is mapped to its label, and the label assigned to the query assembly corresponds to the most common label of its hits.
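The voting step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the script included in the PR: it takes hits that have already been parsed into a dictionary (the format of the intermediate/04_filter output is not reproduced here), the function name `assign_label` is hypothetical, and the sample IDs and species are made up. The PR does not say how ties are handled; in this sketch the label seen first among the tied ones wins.

```python
from collections import Counter


def assign_label(hits_by_contig, label_by_sample_id):
    """Return the most common label among all hits of one query assembly.

    hits_by_contig: contig name -> list of sampleIDs hit by that contig.
    label_by_sample_id: sampleID -> species label.
    Returns None when no hit has a known label.
    """
    # Pool the hits of every contig belonging to the assembly.
    pooled_hits = [s for hits in hits_by_contig.values() for s in hits]
    # Map each hit to its label, skipping sampleIDs without a label.
    labels = [label_by_sample_id[s] for s in pooled_hits if s in label_by_sample_id]
    if not labels:
        return None
    # Majority vote; ties go to the label encountered first.
    return Counter(labels).most_common(1)[0][0]


# Made-up example data (not real sampleIDs).
label_by_sample_id = {
    "SAMPLE_A": "Escherichia coli",
    "SAMPLE_B": "Escherichia coli",
    "SAMPLE_C": "Salmonella enterica",
}
hits_by_contig = {
    "contig_1": ["SAMPLE_A", "SAMPLE_B"],
    "contig_2": ["SAMPLE_A", "SAMPLE_C"],
}

label = assign_label(hits_by_contig, label_by_sample_id)
print(label)
```

Here three of the four pooled hits map to Escherichia coli, so that label is assigned to the assembly.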

The labels come from the second column of the Kraken/Bracken (most abundant species) file that was used to create the clusters. The file data/labels_krakenbracken_by_sampleid.txt was added to the repository.

NOTE: these modifications do not interfere with the main pipeline, since the new command is run separately after make match. See the updated README.
