
Caliper doesn't work on Summit for me #392

Closed
yaoyi92 opened this issue Oct 29, 2021 · 2 comments


yaoyi92 commented Oct 29, 2021

Hello, I have used Caliper on regular Intel CPU machines and it worked perfectly.

However, when I try to use it on Summit (IBM + GPU), I get the error below. I don't understand what difference in the MPI setup between the machines could cause this. I have also pasted my submission script.

===
== CALIPER: (7): runtime-report: mpireport: MPI is already finalized. Cannot aggregate output.
== CALIPER: (6): runtime-report: mpireport: MPI is already finalized. Cannot aggregate output.
== CALIPER: (31): runtime-report: mpireport: MPI is already finalized. Cannot aggregate output.
[... the identical message is printed by each of the 36 ranks (0-35) ...]
== CALIPER: (24): runtime-report: mpireport: MPI is already finalized. Cannot aggregate output.
===

Submission script:

===

#!/bin/bash

#BSUB -P MAT240
#BSUB -W 2:00
#BSUB -nnodes 1
#BSUB -alloc_flags gpumps
#BSUB -J aims-gw
#BSUB -o aims.%J
#BSUB -u yy244@duke.edu
#BSUB -N
#BSUB -q debug

module purge
#module load gcc/7.4.0 spectrum-mpi/10.3.1.2-20200121  cuda/10.1.243 essl/6.1.0-2 netlib-lapack/3.8.0 netlib-scalapack/2.0.2
module load gcc/7.5.0 spectrum-mpi/10.4.0.3-20210112 cuda/10.1.243 essl/6.1.0-2 netlib-lapack/3.8.0 netlib-scalapack/2.1.0
module load nsight-systems/2021.3.1.54

bin=/ccs/home/yaoyi92/fhiaims/FHIaims/build_gpu_caliper_fft/aims.211010.scalapack.mpi.x

export OMP_NUM_THREADS=1

ulimit -s unlimited
export CALI_CONFIG=runtime-report
jsrun -n 2 -a 18 -c 18 -g 3 -r 2 $bin > aims.out
===

daboehme commented Nov 1, 2021

Hi @yaoyi92,

Is this a Fortran code? In that case Caliper's automatic wrapping of MPI_Finalize() unfortunately doesn't work, so it can't trigger its output aggregation at the right time (i.e., before MPI is torn down).

The best way to get around this is Caliper's ConfigManager control API, which lets you configure and start/stop profiling from within the code. There's a modified TeaLeaf_CUDA implementation (https://github.com/daboehme/TeaLeaf_CUDA/tree/dev/caliper-support) as an example; look specifically at https://github.com/daboehme/TeaLeaf_CUDA/blob/dev/caliper-support/tea_caliper.f90.
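For reference, here is a minimal sketch of that pattern (assuming Caliper was built with Fortran support, e.g. -DWITH_FORTRAN=On; program and region names are just placeholders). The essential part is flushing the ConfigManager before MPI_Finalize:

===
program caliper_example
  use caliper_mod
  use mpi
  implicit none

  type(ConfigManager) :: mgr
  integer :: ierr

  call MPI_Init(ierr)

  ! Build and start a profiling configuration from within the code
  mgr = ConfigManager_new()
  call mgr%add('runtime-report')
  call mgr%start

  call cali_begin_region('work')
  ! ... application work ...
  call cali_end_region('work')

  ! Flush Caliper's aggregated output explicitly, while MPI is still initialized
  call mgr%flush
  call ConfigManager_delete(mgr)

  call MPI_Finalize(ierr)
end program caliper_example
===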

As another workaround, you can skip the across-rank aggregation and write a separate report for each MPI rank. This config should produce report-0.txt, report-1.txt, and so on, one file per rank:

CALI_CONFIG="runtime-report,aggregate_across_ranks=false,profile.mpi,output=report-%mpi.rank%.txt"
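In the submission script above, this would replace the existing export CALI_CONFIG=runtime-report line.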


yaoyi92 commented Nov 1, 2021

Thank you @daboehme, it works for me! Yes, I am working on a Fortran code.
