Functionality to join two kraken output files together #130

Open
tayabsoomro opened this issue Oct 23, 2018 · 6 comments
Comments

@tayabsoomro

tayabsoomro commented Oct 23, 2018

Hi, I am wondering whether there is any functionality to join two kraken output files together?

Thanks,
Tayab Soomro.

@jenniferlu717
Collaborator

What kind of outputs are you trying to combine? Are they from a single sample, and you want a report for the sum of the two? Or are you trying to compare them?

@tayabsoomro
Author

tayabsoomro commented Oct 30, 2018

So, I have two kraken-style output files generated by two DNA classification runs. Each file shows the proportion of reads assigned to each taxon in the sample; an example is shown just below:

 6.00  600   600   U  0          unclassified
94.00  9400  0     -  1          root
94.00  9400  0     -  131567       cellular organisms
94.00  9400  0     D  2              Bacteria
94.00  9400  0     -  1783272          Terrabacteria group
94.00  9400  0     P  1239               Firmicutes
94.00  9400  0     C  91061                Bacilli
94.00  9400  0     O  1385                   Bacillales
94.00  9400  0     F  186817                   Bacillaceae
94.00  9400  0     G  1386                       Bacillus
94.00  9400  0     S  86661                        Bacillus cereus group
94.00  9400  3463  S  1392                           Bacillus anthracis
58.71  5871  5871  S  198094200                        B.anthracis Ames
 0.66  66    66    S  191218100                        B.anthracis A2012

Now, imagine that the second kraken-style report file contains some of the same species and some different ones. I would like to generate a final kraken-style report that merges the data from the two previous ones.

So, for example, if B. anthracis Ames is in the 2nd kraken-style report as well, the final kraken-style report would show it only once, with the proportions increased. But if the 2nd kraken-style report has another strain under Bacillus anthracis that is not present in the 1st, the final kraken-style report would add that strain under Bacillus anthracis and update the proportions accordingly.
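
The merge described above could be sketched roughly as follows. This is only an illustration, not an existing Kraken feature: the function names `parse_report` and `merge_reports` are made up, and the sketch assumes tab-separated report lines with six columns (percentage, clade reads, direct reads, rank code, taxid, name) where the name is indented by two spaces per taxonomy level. Counts for the same taxid are summed, taxa seen only in a later report are added under their parent, and percentages are recomputed against the combined read total.

```python
def parse_report(text):
    """Parse a kraken-style report into (clade, direct, rank, taxid, depth, name) rows.

    Assumes tab-separated columns and two spaces of name indentation per level.
    """
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        _pct, clade, direct, rank, taxid, name = line.split("\t")
        depth = (len(name) - len(name.lstrip(" "))) // 2
        rows.append((int(clade), int(direct), rank.strip(),
                     taxid.strip(), depth, name.strip()))
    return rows


def merge_reports(*texts):
    """Merge several kraken-style reports into one report string."""
    nodes = {}     # taxid -> accumulated record
    children = {}  # parent taxid (None at top level) -> child taxids in order
    for text in texts:
        stack = []  # taxids of the ancestors of the current line
        for clade, direct, rank, taxid, depth, name in parse_report(text):
            del stack[depth:]  # climb back up to this line's parent
            parent = stack[-1] if stack else None
            stack.append(taxid)
            if taxid not in nodes:
                # First sighting decides the taxon's position in the tree.
                nodes[taxid] = {"rank": rank, "name": name, "depth": depth,
                                "clade": 0, "direct": 0}
                children.setdefault(parent, []).append(taxid)
            nodes[taxid]["clade"] += clade
            nodes[taxid]["direct"] += direct

    # Total reads = clade counts of the top-level rows (unclassified + root).
    total = sum(nodes[t]["clade"] for t in children.get(None, []))
    lines = []

    def emit(taxid):
        n = nodes[taxid]
        pct = 100.0 * n["clade"] / total if total else 0.0
        lines.append("{:6.2f}\t{}\t{}\t{}\t{}\t{}{}".format(
            pct, n["clade"], n["direct"], n["rank"], taxid,
            "  " * n["depth"], n["name"]))
        for child in children.get(taxid, []):
            emit(child)

    for top in children.get(None, []):
        emit(top)
    return "\n".join(lines)
```

Calling `merge_reports(open("a.report").read(), open("b.report").read())` (file names are placeholders) would return the merged report as a string. If the two reports place the same taxid under different parents, this sketch keeps the position from the first report it sees.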

@jenniferlu717
Collaborator

@tayabsoomro I know this is very late to say, but we are working on a set of "Kraken-Tools" that can/will provide additional support for projects such as this.

@tayabsoomro
Author

That is good to hear! Although I ended up creating such a tool myself, it would be great if it were added to Kraken. Thanks.

@susheelbhanu

Hey @tayabsoomro. I'm interested in doing something similar, so could you please share this tool you're referring to?

Thank you!

@tayabsoomro
Author

tayabsoomro commented Jun 13, 2019

I ended up using the Centrifuge tool and its command centrifuge-kreport to generate the kraken-style report.

I combined the multiple Centrifuge reports by appending them to a single file in Python, and once I had the accumulated Centrifuge report file, I generated a kraken-style report file from it.

Here is the snippet of code that I wrote; I hope it helps:

https://github.com/coadunate/MICAS/blob/24db33140419219320ebf6d230e4519894f1bc2d/server/app/main/utils/FASTQFileHandler.py#L58-L83
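
The linked code is not reproduced here. As a rough sketch of the append step described above (not the linked implementation), something like the following could work. The function name `append_centrifuge_output` is hypothetical, and the sketch assumes each input file begins with a single header line that should appear only once in the accumulated file.

```python
import os


def append_centrifuge_output(src_path, combined_path):
    """Append the rows of src_path to combined_path using file append mode.

    Assumption: the first line of src_path is a header. It is written only
    when the combined file is new or empty, so the header is not repeated.
    """
    write_header = (not os.path.exists(combined_path)
                    or os.path.getsize(combined_path) == 0)
    with open(src_path) as src, open(combined_path, "a") as out:
        header = src.readline()
        if write_header and header:
            out.write(header if header.endswith("\n") else header + "\n")
        for line in src:
            # Make sure every row ends with a newline so that rows from
            # the next appended file do not run into this one.
            out.write(line if line.endswith("\n") else line + "\n")
```

The accumulated file would then be handed to `centrifuge-kreport` (typically together with the Centrifuge index via `-x`) to produce the kraken-style report; the exact invocation depends on the local setup.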
