Error during pplacer #170

Closed
morgvevans opened this issue Jul 24, 2019 · 24 comments

Comments

@morgvevans

Having issues with pplacer step - it seems everything else is working, and the check install step works just fine so I don't think it's an install issue...

I have tried giving the command different numbers of CPUs, so I don't think it's a memory issue. As you can see, it runs for about 38 minutes before the error message; this doesn't change regardless of how many CPUs I give it.

(gtdb_env_2) -bash-4.2$ gtdbtk classify_wf --genome_dir ./metawrap/Shale/Bin_classify_GTDB/bin_one/ --out_dir ./metawrap/Shale/binone_output_072319_2 -d --force --cpus 64 --force
[2019-07-23 21:49:26] INFO: GTDB-Tk v0.3.2
[2019-07-23 21:49:26] INFO: gtdbtk classify_wf --genome_dir ./metawrap/Shale/Bin_classify_GTDB/bin_one/ --out_dir ./metawrap/Shale/binone_output_072319_2 -d --force --cpus 64 --force
[2019-07-23 21:49:26] INFO: Using GTDB-Tk reference data version r89: /users/PAS1331/osu7930/miniconda3/envs/gtdb_env_2/share/gtdbtk-0.3.2/db/
[2019-07-23 21:49:26] INFO: Identifying markers in 1 genomes with 64 threads.
[2019-07-23 21:49:26] INFO: Running Prodigal to identify genes.
==> Finished processing 1 of 1 (100.0%) genomes.
[2019-07-23 21:49:37] INFO: Identifying TIGRFAM protein families.
==> Finished processing 1 of 1 (100.0%) genomes.
[2019-07-23 21:49:43] INFO: Identifying Pfam protein families.
==> Finished processing 1 of 1 (100.0%) genomes.
[2019-07-23 21:49:44] INFO: Done.
[2019-07-23 21:49:48] INFO: Aligning markers in 1 genomes with 64 threads.
[2019-07-23 21:49:48] INFO: Processing 1 genomes identified as bacterial.
[2019-07-23 21:49:57] INFO: Read concatenated alignment for 23458 GTDB genomes.
[2019-07-23 21:50:15] INFO: Masking columns of multiple sequence alignment using canonical mask.
[2019-07-23 21:51:18] INFO: Masked alignment from 41155 to 5040 AAs.
[2019-07-23 21:51:18] INFO: 0 user genomes have amino acids in <10.0% of columns in filtered MSA.
[2019-07-23 21:51:18] INFO: Creating concatenated alignment for 23459 GTDB and user genomes.
[2019-07-23 21:51:19] INFO: Creating concatenated alignment for 1 user genomes.
[2019-07-23 21:51:19] INFO: Done.
[2019-07-23 21:51:19] INFO: Placing 1 bacterial genomes into reference tree with pplacer (be patient).
[2019-07-23 22:29:25] ERROR: An error was encountered while running pplacer.
[2019-07-23 22:29:25] ERROR: Controlled exit resulting from an unrecoverable error or warning.

================================================================================
EXCEPTION: PplacerException
MESSAGE:


Traceback (most recent call last):
File "/users/PAS1331/osu7930/miniconda3/envs/gtdb_env_2/bin/gtdbtk", line 452, in
gt_parser.parse_options(args)
File "/users/PAS1331/osu7930/miniconda3/envs/gtdb_env_2/lib/python2.7/site-packages/gtdbtk/main.py", line 602, in parse_options
self.classify(options)
File "/users/PAS1331/osu7930/miniconda3/envs/gtdb_env_2/lib/python2.7/site-packages/gtdbtk/main.py", line 415, in classify
options.debug)
File "/users/PAS1331/osu7930/miniconda3/envs/gtdb_env_2/lib/python2.7/site-packages/gtdbtk/classify.py", line 320, in run
scratch_dir)
File "/users/PAS1331/osu7930/miniconda3/envs/gtdb_env_2/lib/python2.7/site-packages/gtdbtk/classify.py", line 146, in place_genomes
pplacer.run(self.cpus, 'WAG', pplacer_ref_pkg, pplacer_json_out, user_msa_file, pplacer_out, pplacer_mmap_file)
File "/users/PAS1331/osu7930/miniconda3/envs/gtdb_env_2/lib/python2.7/site-packages/gtdbtk/external/pplacer.py", line 61, in run
raise PplacerException(proc_err)
PplacerException

@aaronmussig
Member

Hello,

There is a known issue with pplacer where using a high number of threads can cause problems. These are primarily related to the host thinking more memory is being used than actually is. I'd recommend running it again with a smaller number of threads (~10-30).

There may be more information available in the pplacer log file which you can find in the output directory under: classify/intermediate_results/pplacer/pplacer.[marker_set].out
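As a rough sketch of how to pull up the end of those log files quickly: the helper below globs for `pplacer.*.out` under the directory named above and returns the last few lines of each. The function name `tail_pplacer_logs` and the default of 20 lines are illustrative assumptions, not part of GTDB-Tk.

```python
from pathlib import Path


def tail_pplacer_logs(out_dir, n_lines=20):
    """Return {log path: last n_lines lines} for each pplacer log under out_dir.

    Hypothetical helper: it only assumes the directory layout quoted in the
    comment above (classify/intermediate_results/pplacer/pplacer.*.out).
    """
    log_dir = Path(out_dir) / "classify" / "intermediate_results" / "pplacer"
    tails = {}
    for log in sorted(log_dir.glob("pplacer.*.out")):
        # errors="replace" keeps going even if the log has odd bytes in it
        lines = log.read_text(errors="replace").splitlines()
        tails[str(log)] = lines[-n_lines:]
    return tails
```

Calling `tail_pplacer_logs("./metawrap/Shale/binone_output_072319_2")` would return one entry per marker set that pplacer was run on; an empty dictionary means no log was written.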

@morgvevans
Author

Thank you for the quick reply.

I'll try running with fewer threads and let you know whether it works.

Here's the output of the file you listed in the meantime -

Running pplacer v1.1.alpha19-0-g807f6f3 analysis on ./metawrap/Shale/binone_output_072319_2/align/gtdbtk.bac120.user_msa.fasta...
Didn't find any reference sequences in given alignment file. Using supplied reference alignment.
Pre-masking sequences... sequence length cut from 5040 to 4646.
Determining figs... figs disabled.
Allocating memory for internal nodes... done.
Caching likelihood information on reference tree...

@morgvevans
Author

I am still getting the same error when I use --cpus 1

@morgvevans
Author

OK, I ran a test genome that is in a slightly different format (an annotated .fna file) and it worked with no problem!
The genomes I have been trying to run are genomic bins in .fna format, and I can't quite figure out why I'm having a hard time with them.
I have included one of the files here if anyone can take a look and help. The only thing I can think of is that there is either a weird character or some kind of formatting issue.
bin.1.orig.zip

@morgvevans
Author

More info to add to the last message:
The bins were generated using MetaWRAP and annotated using PROKKA.
I was originally using the raw, unannotated file (see above).
I've also tried the annotated files, with no luck.
bin.1.orig.annotated.zip

Thanks !

@morgvevans
Author

I was able to get all these bins to run in GTDB in KBase, so it must be an issue with my installation (which was done through bioconda).
I will try to get the command-line version working for future jobs that require it, maybe by installing from pip instead of bioconda. If anyone has any ideas, please let me know.

@vinisalazar

vinisalazar commented Jul 25, 2019

I'm having this same problem with 1 thread. So far I couldn't get it to work.

The gtdbtk test command runs smoothly, though.

@aaronmussig
Member

Thanks for the detailed feedback and the genomes, they were very useful for testing. Unfortunately, I haven't been able to replicate the issue with any of: a pip install, a conda environment, or a manual build.

What I've observed is:

  • Allocating memory for internal nodes...
    • pplacer maps ~104 GB of data (visible as VIRT). Note that only ~2 GB of physical memory (RES) is actually allocated.
  • Caching likelihood information on reference tree...
    • pplacer then eventually consumes the full ~104 GB in physical memory as those values are cached.

Given pplacer is failing on the caching step, I would be inclined to say that the server has insufficient memory. Can I ask how much RAM the server has?

If you do have sufficient memory, then I'll do a bit of digging into whether conda imposes any sort of upper memory limit.
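A quick way to answer the RAM question on a Linux server is to read the MemTotal line of /proc/meminfo. The sketch below parses that text and compares it to the ~104 GB figure observed above; the helper names and the default threshold are illustrative assumptions, and the comparison treats GB and GiB loosely.

```python
def mem_total_gib(meminfo_text):
    """Parse the MemTotal line of /proc/meminfo (reported in kB) into GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kib = int(line.split()[1])
            return kib / (1024 ** 2)
    raise ValueError("MemTotal line not found")


def has_enough_ram(meminfo_text, required_gib=104):
    """True if total physical memory meets the (assumed) requirement.

    required_gib=104 is taken from the ~104 GB observed in this thread;
    it is not an official GTDB-Tk threshold.
    """
    return mem_total_gib(meminfo_text) >= required_gib
```

On Linux this could be called as `has_enough_ram(open("/proc/meminfo").read())`. Note that on a queueing system the memory granted to the job may be lower than the node total reported here.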

@aaronmussig
Member

aaronmussig commented Jul 26, 2019

I tried this again while capping the maximum amount of memory using ulimit, and this time the exception thrown did contain a message:

EXCEPTION: PplacerException
  MESSAGE: Uncaught exception: Out of memory
Fatal error: exception Out_of_memory

Perhaps ulimit works differently and so I get more detailed output, or perhaps memory isn't the issue; either way, it's still worth testing.

I would also try running the command manually. You will need to adapt the paths, but it should be something like this:
pplacer -m WAG -j 1 -c $GTDBTK_DATA_PATH/pplacer/gtdb_r89_bac120.refpkg -o /tmp/pplacer.bac120.json ~/classify_wf/align/gtdbtk.bac120.user_msa.fasta

That way we can at least validate that it's pplacer which is the issue and maybe capture any additional output which is being missed.
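For anyone who wants to script that manual run and keep everything pplacer prints, here is a minimal sketch. It assembles the same flags as the command above (`-m`, `-j`, `-c`, `-o`) and writes stdout and stderr to one log file. The function names `build_pplacer_cmd` and `run_and_log` are made up for illustration, and the paths are placeholders to be replaced.

```python
import subprocess


def build_pplacer_cmd(refpkg, msa, json_out, threads=1):
    """Assemble the manual pplacer invocation quoted above as an argument list."""
    return [
        "pplacer",
        "-m", "WAG",          # substitution model used in the command above
        "-j", str(threads),   # number of threads
        "-c", str(refpkg),    # reference package directory
        "-o", str(json_out),  # placement output file
        str(msa),             # user MSA produced by the align step
    ]


def run_and_log(cmd, log_path):
    """Run cmd, sending stdout and stderr to log_path; return the exit code."""
    with open(log_path, "w") as log:
        proc = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return proc.returncode
```

On POSIX systems a negative return code from `subprocess.run` means the process was terminated by a signal (for example -9 for SIGKILL), which can help tell a kill by the OS or scheduler apart from an error raised by pplacer itself.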

@Handymanalan

Handymanalan commented Aug 8, 2019

I am having the same problem, the test genomes worked but not my own MAGs.

Edit: It seems to work only with archaeal MAGs but not with bacterial MAGs.

@aaronmussig
Member

@Handymanalan are you able to get this to run manually using the pplacer command listed above?
Additionally, how much memory is on the server?

@xwu35

xwu35 commented Aug 26, 2019

I also got the same error from both the pip and bioconda installed versions. I then tried to run pplacer manually as indicated above, and it also did not work.

@aaronmussig
Member

Hi @xwu35, what did pplacer output when it failed? It sounds like this is a problem specific to pplacer; I just need to confirm that the input isn't malformed. Additionally, how much memory is on the server?

@xwu35

xwu35 commented Aug 27, 2019

Hi, the command I used was: pplacer -m WAG -j 6 -c /lustre/haven/user/xwu35/database/release89/pplacer/gtdb_r89_bac120.refpkg -o /tmp/pplacer.bac120.json gtdbtk_good_bins_output/align/gtdbtk.bac120.user_msa.fasta

and the output was like this:
Pre-masking sequences... sequence length cut from 5040 to 5040.
Determining figs... figs disabled.
Allocating memory for internal nodes... done
Caching likelihood information on reference tree... kill

But the weird thing is that I just ran pplacer and gtdbtk from both versions again and they all worked. We have 192 GB of RAM per node on the server. Thanks

@aaronmussig
Member

@xwu35 Given that the "Caching likelihood information" step is when pplacer loads 100+ GB of data into memory, it sounds like it ran out of memory. Unfortunately, pplacer isn't awfully descriptive with its error messages.

192 GB of RAM is sufficient to run GTDB-Tk. However, there may be some insights into how pplacer operates on HPCs in issue #124.

Thanks,
Aaron

@micro-phia

I'd like to record an instance of this same issue:

I used an HPC Slurm system to run the following gtdbtk command within a for-loop, in which ${COM} represents a different community directory containing the input genome folder ('MAG-genomes'). (There are 11 ${COM} folders, each with ~30 MAGs in .fa format. So, essentially, gtdbtk should be operating on ~30 MAGs at a time.)

gtdbtk classify_wf --genome_dir ./${COM}/CHECKM/MAG-genomes/ --out_dir ./${COM}/GTDB -x fa

I tried several iterations of this code with the following HPC parameters to accommodate the memory requirements (after finding this forum):

  • 1 node (64GB)
  • 3 nodes (192GB)
  • 4 nodes (256GB)

I also tried varying the assigned CPUs using the --cpus flag in the gtdbtk command.

In all instances, the gtdbtk command completed everything except the 'classify' step (before moving on to the next community in the for-loop). At the classify step, pplacer was only able to place and classify the archaeal genomes. When pplacer attempted to run on the bacterial genomes, it crashed with the following output messages (which show that archaeal genomes could be placed but bacterial ones could not).

[2019-10-23 14:05:44] INFO: Placing 2 archaeal genomes into reference tree with pplacer (be patient).
[2019-10-23 14:06:39] INFO: Calculating average nucleotide identity using FastANI.
[2019-10-23 14:06:43] INFO: 0 genomes have been classify using FastANI and pplacer.
[2019-10-23 14:06:43] INFO: Calculating RED values based on reference tree.
[2019-10-23 14:06:43] INFO: Placing 16 bacterial genomes into reference tree with pplacer (be patient).
[2019-10-23 14:16:05] ERROR: An error was encountered while running pplacer.
[2019-10-23 14:16:06] ERROR: Controlled exit resulting from an unrecoverable error or warning.

================================================================================
EXCEPTION: PplacerException
MESSAGE:

This error was corroborated by the presence of only the archaeal final output files, but not the bacterial ones.

I have not yet found a valid solution, since my HPC allotment is capped at 4 nodes. Running this in KBase is hardly an option, since the KBase app seems to require a separate run for each genome and I have almost 300 MAGs to work with.

Any updates or bug-fixes would be HUGELY appreciated.

Best,
Phia

@donovan-h-parks
Collaborator

Hi Phia. As best we can tell, this is an issue with how pplacer interacts with some HPC environments. The end result is that pplacer appears to request more memory for each additional CPU, though in reality this isn't the case. Unfortunately, we do not have a solution to this. I think your options are either to run GTDB-Tk with only a few CPUs in order to keep the apparent memory request within reason or to use KBase. You can create a genome set at KBase and process multiple genomes at once through GTDB-Tk.

A major part of the time required by GTDB-Tk is loading in the reference tree. As such, it is far more efficient to process all MAGs at once (i.e. combining the MAGs from all your communities into a single job for processing).
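One way to combine the MAGs, sketched under the assumption that each community follows the `${COM}/CHECKM/MAG-genomes/` layout described earlier in this thread: symlink every `.fa` file into a single directory, prefixing each name with its community so that identically named bins do not collide. The helper name `collect_mags` and the `__` separator are illustrative choices, not a GTDB-Tk feature.

```python
import os
from pathlib import Path


def collect_mags(community_dirs, combined_dir, extension="fa"):
    """Symlink all MAGs from several community folders into one directory.

    Assumes the <community>/CHECKM/MAG-genomes/ layout from this thread.
    Returns the list of link paths in the combined directory.
    """
    combined = Path(combined_dir)
    combined.mkdir(parents=True, exist_ok=True)
    links = []
    for com in map(Path, community_dirs):
        mag_dir = com / "CHECKM" / "MAG-genomes"
        for mag in sorted(mag_dir.glob(f"*.{extension}")):
            # Prefix with the community name to avoid clashes such as bin.1.fa
            dest = combined / f"{com.name}__{mag.name}"
            if not os.path.lexists(dest):
                os.symlink(mag.resolve(), dest)
            links.append(dest)
    return links
```

The combined directory could then be passed to a single run, e.g. `gtdbtk classify_wf --genome_dir ./ALL_MAGS/ --out_dir ./GTDB_ALL -x fa`, where `ALL_MAGS` and `GTDB_ALL` are placeholder names. If the filesystem does not support symlinks, copying the files instead would be the obvious fallback.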

@micro-phia

Thank you for your quick response, dparks. I have two follow-up questions:

  1. When running this remotely on an HPC system, I still cannot get past the pplacer memory issue when I run with only 1 CPU on 1 node (64 GB), and I run into the same issue with 1 assigned CPU on 2-4 nodes (128-256 GB). I now understand that the program requires ~100 GB but cannot use multiple nodes/CPUs to get that extra memory, so the additional nodes don't really help. I plan to request access to two alternative HPC systems where I can run on a single 256 GB node or a single 512 GB node. Do you expect that either of these single, big-memory nodes will solve the problem?

  2. Thank you for the tip on creating a genome set at KBase - however, it appears there is no method to transfer genomes in batch from the "staging area" to the "narrative" of KBase. This means that each of my 300 genomes would still need to be imported individually, even if the "upload" and "gtdbtk" apps can be performed on batch files. Do you know of a work-around?

Thank you in advance,
Phia

@donovan-h-parks
Collaborator

Hi Phia. GTDB-Tk requires a machine with access to 128 GB of RAM. Sounds like each of your nodes only has 64 GB of RAM which is not sufficient. I am not familiar with KBase. Perhaps you can send a help request to them regarding how to upload multiple genomes.

@aaronmussig
Member

Closing this issue as I believe it's been resolved; the FAQ page has been updated with a summary of what we've learned. Feel free to re-open if that's not the case.

Two main issues were identified in this thread:

  1. pplacer failed because it was run on a server with insufficient memory.
  2. pplacer failed because it was run on an HPC/queueing system with multiple threads, which misled the OS into thinking it was out of memory.

Issue 1 should be easily identifiable on GTDB-Tk 1.0.2, as a warning will be displayed if the server has insufficient memory.

Issue 2 is documented in the FAQ. Additionally, overriding the number of threads pplacer can use will be available as a feature in the next release (#195).

@micro-phia

micro-phia commented Dec 19, 2019 via email

@marcomeola

On a side note, it would be really helpful if the standard conda installation were set to version 1.2.0 of gtdbtk.

@saad272

saad272 commented Oct 21, 2022

Hi,
I got the following error while running classify_wf and I don't understand why. Please help.
Thank you!

[2022-10-14 19:20:59] INFO: GTDB-Tk v2.1.0
[2022-10-14 19:20:59] INFO: gtdbtk classify_wf --extension fa --cpus 14 --genome_dir . --out_dir gtdb/
[2022-10-14 19:20:59] INFO: Using GTDB-Tk reference data version r207: /home/IAME/db/gtdbtk-2.1.0/release207_v2
[2022-10-14 19:21:01] INFO: Identifying markers in 26 genomes with 14 threads.
[2022-10-14 19:21:01] TASK: Running Prodigal V2.6.3 to identify genes.
[2022-10-14 19:21:36] INFO: Completed 26 genomes in 34.98 seconds (1.35 seconds/genome).
[2022-10-14 19:21:36] TASK: Identifying TIGRFAM protein families.
[2022-10-14 19:21:45] INFO: Completed 26 genomes in 9.09 seconds (2.86 genomes/second).
[2022-10-14 19:21:45] TASK: Identifying Pfam protein families.
[2022-10-14 19:21:46] INFO: Completed 26 genomes in 0.86 seconds (30.35 genomes/second).
[2022-10-14 19:21:46] INFO: Annotations done using HMMER 3.1b2 (February 2015).
[2022-10-14 19:21:46] TASK: Summarising identified marker genes.
[2022-10-14 19:21:47] INFO: Completed 26 genomes in 0.78 seconds (33.44 genomes/second).
[2022-10-14 19:21:47] ERROR: Uncontrolled exit resulting from an unexpected error.

================================================================================
EXCEPTION: OSError
  MESSAGE: [Errno 95] Operation not supported: 'identify/gtdbtk.failed_genomes.tsv' -> 'gtdb/gtdbtk.failed_genomes.tsv'
________________________________________________________________________________

Traceback (most recent call last):
  File "/usr/bin/miniconda2/envs/gtdbtk-2.1.0/lib/python3.8/site-packages/gtdbtk/__main__.py", line 98, in main
    gt_parser.parse_options(args)
  File "/usr/bin/miniconda2/envs/gtdbtk-2.1.0/lib/python3.8/site-packages/gtdbtk/main.py", line 816, in parse_options
    self.identify(options)
  File "/usr/bin/miniconda2/envs/gtdbtk-2.1.0/lib/python3.8/site-packages/gtdbtk/main.py", line 271, in identify
    markers.identify(genomes,
  File "/usr/bin/miniconda2/envs/gtdbtk-2.1.0/lib/python3.8/site-packages/gtdbtk/markers.py", line 243, in identify
    self._report_identified_marker_genes(genome_dictionary, out_dir, prefix,
  File "/usr/bin/miniconda2/envs/gtdbtk-2.1.0/lib/python3.8/site-packages/gtdbtk/markers.py", line 117, in _report_identified_marker_genes
    symlink_f(PATH_FAILS.format(prefix=prefix),
  File "/usr/bin/miniconda2/envs/gtdbtk-2.1.0/lib/python3.8/site-packages/gtdbtk/tools.py", line 245, in symlink_f
    os.symlink(src, dst)
OSError: [Errno 95] Operation not supported: 'identify/gtdbtk.failed_genomes.tsv' -> 'gtdb/gtdbtk.failed_genomes.tsv'
================================================================================
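The traceback above ends in `os.symlink` raising Errno 95 (operation not supported). One possible explanation, which nobody in this thread has confirmed, is that the filesystem holding the output directory does not allow symbolic links. A minimal sketch for checking that follows; `supports_symlinks` is a hypothetical helper, not part of GTDB-Tk.

```python
import os
import tempfile


def supports_symlinks(directory):
    """Return True if a symlink can be created inside directory.

    Creates and removes a throwaway subdirectory, so directory must exist
    and be writable.
    """
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        target = os.path.join(tmp, "target")
        link = os.path.join(tmp, "link")
        open(target, "w").close()
        try:
            os.symlink(target, link)
        except OSError:
            # e.g. Errno 95 on filesystems without symlink support
            return False
        return True
```

If `supports_symlinks(".")` returns False in the directory where `--out_dir` is created, pointing `--out_dir` at a different filesystem would be the first thing to try.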

@caizhangbin

Hi, I got the same issue. I tested with pplacer -m WAG -j 1 -c $GTDBTK_DATA_PATH/pplacer/gtdb_r89_bac120.refpkg -o /tmp/pplacer.bac120.json ~/classify_wf/align/gtdbtk.bac120.user_msa.fasta, and it showed:
Didn't find any reference sequences in given alignment file. Using supplied reference alignment.
Pre-masking sequences... sequence length cut from 5035 to 4865.
Determining figs... figs disabled.
Allocating memory for internal nodes... Uncaught exception: Out of memory
Fatal error: exception Out_of_memory
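One thing that may be worth checking here, offered as an assumption rather than a confirmed diagnosis: a per-process virtual memory cap (as set by `ulimit -v` or by some schedulers) could make the allocation step fail, since pplacer was observed earlier in this thread to map ~104 GB of virtual memory at that point. The sketch below reports the current address-space limit using Python's standard `resource` module (Unix only); the helper names are made up for illustration.

```python
import resource


def describe_limit(value):
    """Format an rlimit value given in bytes as a human-readable string."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value / 1024 ** 3:.1f} GiB"


def address_space_limits():
    """Return the (soft, hard) virtual memory limits of this process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    return describe_limit(soft), describe_limit(hard)
```

Running `print(address_space_limits())` inside the same shell or batch job that launches pplacer shows whether such a cap is in place. A result of `('unlimited', 'unlimited')` would point back to physical memory on the machine instead.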
