KeyError: 'Bacteria' #38

Open
chassenr opened this issue Sep 7, 2022 · 4 comments
@chassenr

chassenr commented Sep 7, 2022

Hi,

I have been running MDMcleaner on 2 different bin sets. In one run, some bins could not be processed without error (see #37). Using the same reference data, I now get the following error for the last bin to be processed:

--> writing to output files
        writing detailed contig infos to ./T4-48_bin.49.orig/fullcontiginfos_beforecleanup.tsv
        appending overview data to overview_all_before_cleanup.tsv
        creating output fastas
        creating krona input-table
reference-database contaminations detected during this run: 69
blasting 13 entries with blastx against reference proteins (another 5 entries were too long to blastx efficiently
Traceback (most recent call last):
  File "/bio/Software/anaconda3/envs/mdmcleaner-0.8.3/bin/mdmcleaner", line 10, in <module>
    sys.exit(main())
  File "/bio/Software/anaconda3/envs/mdmcleaner-0.8.3/lib/python3.10/site-packages/mdmcleaner/mdmcleaner.py", line 217, in main
    blacklist_additions = clean.main(args, configs)
  File "/bio/Software/anaconda3/envs/mdmcleaner-0.8.3/lib/python3.10/site-packages/mdmcleaner/clean.py", line 230, in main
    if "contamination" in db_suspects.collective_diamondblast():
  File "/bio/Software/anaconda3/envs/mdmcleaner-0.8.3/lib/python3.10/site-packages/mdmcleaner/review_refdbcontams.py", line 319, in collective_diamondblast
    eval_list.append(self.evaluateornot(self.blastxjobs[x], blastxdone = True))
  File "/bio/Software/anaconda3/envs/mdmcleaner-0.8.3/lib/python3.10/site-packages/mdmcleaner/review_refdbcontams.py", line 283, in evaluateornot
    return_category, return_note = comp.count_contradictions() #todo: redundant. streamline blastcontigs() and countcontradictions() more
  File "/bio/Software/anaconda3/envs/mdmcleaner-0.8.3/lib/python3.10/site-packages/mdmcleaner/review_refdbcontams.py", line 123, in count_contradictions
    domain_counts_expected = domain_counts[comparison_domain] #todo: only in try_except statement for debugging
KeyError: 'Bacteria'

Any advice would be appreciated. Thanks!
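
A note on the final frame of the traceback above: in Python, `domain_counts[comparison_domain]` raises `KeyError` whenever the key (here `'Bacteria'`) is absent from the mapping. The sketch below only illustrates that general behaviour and one defensive lookup. It assumes `domain_counts` is a plain dict of per-domain hit counts, which this thread does not confirm; the helper names are hypothetical and are not part of MDMcleaner, and whether 0 is the right fallback for the tool is an open question.

```python
from collections import Counter


def count_domains(hit_domains):
    """Hypothetical helper: tally how many hits fall into each domain."""
    return dict(Counter(hit_domains))


def expected_domain_count(domain_counts, comparison_domain):
    """Look up the count for the expected domain without raising KeyError.

    dict.get() returns the supplied default (0) when the key is missing,
    e.g. when no hit at all was assigned to the expected domain.
    """
    return domain_counts.get(comparison_domain, 0)


# Assumed scenario: none of the hits belongs to the expected domain.
domain_counts = count_domains(["Archaea", "Archaea", "Eukaryota"])

# Plain indexing, domain_counts["Bacteria"], would raise KeyError: 'Bacteria' here.
print(expected_domain_count(domain_counts, "Bacteria"))
```

If this assumption holds, the error would depend on the hits of the particular contigs being reviewed rather than on the database or environment, which would be consistent with it appearing for one bin set but not another.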

@jvollme jvollme self-assigned this Sep 7, 2022
@jvollme
Collaborator

jvollme commented Sep 7, 2022

I will look into this error. It would be great if you could send me the problematic bin FASTA files, if possible. (I can provide an upload link for that soon, once a current problem on our file server has been solved.)

Until then, a temporary workaround may be to use the --fast_run argument here as well, as already suggested for #37 (preferably only for the problematic bins, though).

@chassenr
Author

I re-ran MDMcleaner for this bin set, omitting the bin that seemed to have caused the error in the previous run. However, I received the same error message for the last bin to be run. Maybe the error is related to a step that runs after the individual bins have been processed? The bin_name_filtered_*.gz files are written for all bins and the error list file is empty. However, despite the potential reference DB contamination that was detected during the run (according to the log file), no new_blacklist_additions.tsv is produced. I have now re-run the complete bin set with the --fast_run option. That worked without a problem. I find it curious, though, that I only received the KeyError: 'Bacteria' when running MDMcleaner for this bin set, and not for the one that I refer to in #37. I used the same database and conda environment.

@jvollme
Collaborator

jvollme commented Jan 14, 2023

For some reason I could not reproduce this particular error with the test bins provided, but I added a temporary fix for the related "silva_conflict" problem. Please contact me if this KeyError still persists with the new mdmcleaner version and database.

@chassenr
Author

chassenr commented May 8, 2023

Hi @jvollme,

I am very sorry that it took me so long to get back to you. I updated MDMcleaner to the latest version and all my previous issues were resolved. Thank you so much. Everything seems to have worked fine. However, I get the following error message at the end of the run:


------------------------------finished------------------------------
Traceback (most recent call last):
  File "/bio/Software/anaconda3/envs/mdmcleaner-0.8.7/bin/mdmcleaner", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/bio/Software/anaconda3/envs/mdmcleaner-0.8.7/lib/python3.11/site-packages/mdmcleaner/mdmcleaner.py", line 214, in main
    blacklist_additions = clean.main(args, configs)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/bio/Software/anaconda3/envs/mdmcleaner-0.8.7/lib/python3.11/site-packages/mdmcleaner/clean.py", line 243, in main
    os.remove(f)
FileNotFoundError: [Errno 2] No such file or directory: './tempdir_dnlaipkf_mdmcleaner_refdbcontam/refblast_GCA_001443455.1_LDXO01000018.1__000blastn_auxblasts_vs_concat_refgenomes.tsv'

I assume that this is just related to post-run cleaning steps and not affecting the results for the individual bins?

Thanks!

Cheers,
Christiane
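
For reference, the second traceback ends in `os.remove(f)`, which raises `FileNotFoundError` when the path no longer exists. Whether the results are affected is not confirmed in this thread. The sketch below shows one general way to make such a cleanup loop tolerate files that are already gone; `remove_if_present` is a hypothetical helper for illustration, not MDMcleaner's code.

```python
import os
import tempfile


def remove_if_present(paths):
    """Delete each path; return the paths that were already missing.

    Hypothetical helper: a missing file is recorded instead of aborting
    the whole cleanup with FileNotFoundError.
    """
    missing = []
    for path in paths:
        try:
            os.remove(path)
        except FileNotFoundError:
            missing.append(path)
    return missing


# Small demonstration with one existing and one non-existing temp file.
with tempfile.TemporaryDirectory() as tmpdir:
    existing = os.path.join(tmpdir, "present.tsv")
    ghost = os.path.join(tmpdir, "already_deleted.tsv")
    with open(existing, "w") as handle:
        handle.write("placeholder\n")

    skipped = remove_if_present([existing, ghost])
    print("removed:", not os.path.exists(existing))
    print("already missing:", [os.path.basename(p) for p in skipped])
```

Returning the skipped paths rather than silently ignoring them keeps the information available for a log message, which can help when tracking down why a temporary file disappeared early.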
