Clarify help message for "lenkf.j -debug" #591
@weiyuan-jiang, thanks for the proposed change. I don't think it's a good idea, though. Totalview may not be available on some compute systems. Also, the one-line command isn't onerous given that the user still has to go through the totalview option windows and select the MPI stack etc. It's also good to have the message pointing to the Wiki page with instructions. The message could, of course, be added back, but if lenkf.j launches totalview, the user is almost certainly going to miss the message because they are then presented with a totalview window.
I think the wiki is obsolete. The debugger cannot be launched that way. I will update the wiki.
@weiyuan-jiang, @gmao-qliu: For the record, I'm adding an excerpt from an off-github email by @gmao-qliu on 13-Oct-2022: So, can we launch totalview manually from the command line after "source g5_modules"? That would be a minimal addition to the Wiki and would make the code less specific on what is available on Discover.
Yes, we can. So you want "lenkf.j -debug" to just echo help information?
Yes, that'll work. But can we skip the "source g5_modules" part when running with "-debug"? It outputs "g5_modules: Setting BASEDIR and modules for .." which is confusing.
an attempt to get back to the original, system-agnostic approach of stopping lenkf.j before manually launching the debugger; probably needs further tweaking
if ( $debug_flag == 0 ) then
   source $GEOSBIN/g5_modules
endif
This is meant to take care of @gmao-qliu's suggestion to avoid "source g5_modules" when in debug mode.
If this works, then I'm not sure why we would need the "unset argv; setenv argv" above, which was introduced to avoid g5_modules tripping up when lenkf.j is called with the -debug flag. Maybe the suggestion to omit "source g5_modules" here won't work? I'm getting really confused.
Since the "-debug" option only prints out information, I would like to change it to "--help". It prints the information and exits before sourcing g5_modules.
@weiyuan-jiang: I don't think --help is good here. Doesn't -debug run all of the preprocessing and stop only when it's time to run GEOSldas.x? In other words, -debug does not just print out info.
Having said that, does the preprocessing possibly need "source g5_modules"? @gmao-qliu: Did you verify that we can skip "source g5_modules" when running with -debug?
Actually, the preprocessing is not necessary for debugging. Users only need to change to the scratch directory and launch totalview.
Having said that, does the preprocessing possibly need "source g5_modules"? @gmao-qliu: Did you verify that we can skip "source g5_modules" when running with -debug?
Actually, "source g5_modules" is needed with -debug because otherwise $BASEDIR is undefined. I take my suggestion back.
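The dependency described above (preprocessing fails when $BASEDIR is undefined because g5_modules was not sourced) could be made explicit with a small guard. The following is only a sketch in bash, not code from lenkf.j (which is a csh script); the function name `check_basedir` and the message wording are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical guard, not part of lenkf.j: report whether BASEDIR is set
# before any step that relies on it.
check_basedir() {
    if [ -z "${BASEDIR:-}" ]; then
        # BASEDIR is normally set by sourcing g5_modules (per the thread).
        echo "BASEDIR is undefined; source g5_modules first" >&2
        return 1
    fi
    echo "BASEDIR=${BASEDIR}"
}
```

A caller could run `check_basedir || exit 1` before the preprocessing steps, so that a skipped "source g5_modules" fails early with a clear message.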
After users finish ldas_setup and get the interactive nodes, they can go to the "run" directory, source ../build/bin/g5_modules.sh, then load and launch totalview in the run directory.
@weiyuan-jiang: Are you seriously saying that the first 200-300 lines in lenkf.j are not needed? If that's the case, why can't we just delete them altogether?
@weiyuan-jiang, maybe what you are suggesting is to run in debug mode from a crashed experiment directory? Otherwise, how can we do without preprocess_ldas.x?
I think we should keep -debug such that it exactly replicates what sbatch lenkf.j would do.
We can discard my changes. But we need to add more information to the wiki: 1) launch totalview in the scratch directory; 2) configure the same number of processors in totalview as in lenkf.j.
@weiyuan-jiang, I'm still not sure that for debugging we can simply skip the first 200-300 lines of lenkf.j in all cases. Let's keep everything as is for now and discuss at our next tag-up.
After lenkf.j exits, SCRDIR and GEOSBIN are no longer defined. Maybe it is easier to say: go to the scratch directory, then run
source ../build/bin/g5_modules.sh
module load tview
totalview
But the "echo" commands that use $SCRDIR and $GEOSBIN are within lenkf.j, so shouldn't the user get the correct dir names printed in the help text on the screen?
Oh, you are right. The echo should print the right information. Two more suggestions: 1) Users should choose the same number of processors as in lenkf.j. 2) Just launch totalview without an executable, because we usually debug in parallel mode. Users will have to open a new parallel session.
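To illustrate why the printed help text stays correct even though the variables vanish once the script exits: the variables are expanded at print time, so the user sees literal paths. Below is a minimal bash sketch of that idea. It is not the actual lenkf.j message (lenkf.j is csh); the function name `print_debug_help`, the message wording, and the assumption that g5_modules.sh sits in the directory held by $GEOSBIN are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print manual debugging instructions with the
# directory names already expanded, so they remain valid after exit.
print_debug_help() {
    local scrdir="$1"    # stand-in for $SCRDIR as set inside lenkf.j
    local geosbin="$2"   # stand-in for $GEOSBIN as set inside lenkf.j
    # Unquoted heredoc delimiter: ${scrdir} and ${geosbin} expand here.
    cat <<EOF
To debug, run the following on an interactive compute node:
  cd ${scrdir}
  source ${geosbin}/g5_modules.sh
  module load tview
  totalview
EOF
}

# Example call with placeholder paths.
print_debug_help "/path/to/scratch" "/path/to/build/bin"
```

The commands listed in the message mirror the ones suggested earlier in this thread; the processor count chosen in totalview still has to match the one in lenkf.j.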
@weiyuan-jiang @gmao-rreichle's suggestion to start totalview with the executable and edit the parallel settings afterwards using Ctrl+A works. The current instructions on the wiki page look good to me overall. Someone who tries to run the debugger for the first time might have more insight into their clarity. One minor edit suggestion: add the highlighted word to the line "To use debugging tools at NCCS, you must be on an interactive compute node, which can be obtained with the following sample command:"
FYI, I further edited the Wiki page to reflect the latest changes by @weiyuan-jiang and @gmao-qliu.
Clarify help message that users see when running "lenkf.j -debug".