Debug gsi.x aborts in deter_sfc_gmi with invalid array index #778
Comments
Attention: @xincjin-NOAA, @emilyhcliu, @azadeh-gh, @CatherineThomas-NOAA
I saw this error before. It went away when I ran again. The failing point was indeed from GMI while performing spatial averaging. We need to fix this.
@xincjin-NOAA Could you take a look at the failure from GMI spatial averaging? The initial conditions that Russ used in his test can be found in the following location on Hera:
@emilyhcliu @RussTreadon-NOAA Can you give me some guidance on how to reproduce this error? Or should I just clone GSI, build develop at e82365d, and then run the ctests or a one-cycle test? I believe I found the issue in the code:
klonn < 0 should be replaced with klonn < 1
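A minimal sketch of why the lower-bound test matters, assuming the GMI spatial averaging steps over neighboring longitude columns on a periodic 1-based grid (indices 1 to nlon) and wraps any index that falls off either edge. The helper wrap_lon_index, the grid size nlon = 360, and the buggy flag are hypothetical; only the klonn < 0 versus klonn < 1 test comes from this thread.

```python
def wrap_lon_index(klonn: int, nlon: int, buggy: bool = False) -> int:
    """Map a 1-based longitude index back onto 1..nlon on a periodic grid.

    Hypothetical helper: only the lower-bound test (klonn < 0 vs. klonn < 1)
    comes from the issue. Assumes |offset| < nlon, so one wrap suffices.
    """
    too_low = klonn < 0 if buggy else klonn < 1  # buggy test misses klonn == 0
    if too_low:
        klonn += nlon
    elif klonn > nlon:
        klonn -= nlon
    return klonn


nlon = 360  # hypothetical number of longitude columns
for offset in (-2, -1, 0, 1):
    klonn = 1 + offset  # neighbor of the first longitude column
    print(offset, wrap_lon_index(klonn, nlon, buggy=True), wrap_lon_index(klonn, nlon))
```

Stepping one column west of the first column gives klonn = 0: the klonn < 0 test leaves it at 0, an invalid 1-based index, while the klonn < 1 test wraps it to nlon. Offsets of two or more columns are wrapped correctly by both tests, which may be why the bug only shows up for points right next to the grid edge.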
I am going to run an experiment on Hera to see if this solves the issue.
I suspect I did not set the paths for the initial conditions and/or observation data correctly. Can you take a look when you have time and give some advice? @emilyhcliu @RussTreadon-NOAA
@xincjin-NOAA, the paths to Emily's data are not quite correct. The script I used to run Emily's case is
@RussTreadon-NOAA Thank you so much for sharing the run script with me. I used the modified run script to reproduce the issue:
After changing the code in deter_sfc_mod.f90 as stated above, rebuilding, and rerunning the script, the output shows:
Therefore this issue is solved. Two notes:
@xincjin-NOAA, I added your bug fix to a working copy of the code from PR #779. GSI was built in debug mode and run on Hera using all data for the 2023060712 case. The code fails in the same way as you report
@RussTreadon-NOAA, I cloned GSI from https://github.com/RussTreadon-NOAA/GSI.git and checked out branch feature/thompson_reff. However, I got different output when it failed:
I am not sure what I missed. My run script is /scratch1/NCEPDEV/da/Xin.C.Jin/debug_gsi/run_script/rungsi_dev_russ.sh, which is almost the same as the one you mentioned above. The run directory is /scratch2/NCEPDEV/stmp1/Xin.C.Jin/tmp382/debug_778.2023060712
@xincjin-NOAA, the rest of the traceback in
The code aborts due to a floating-point exception in the MPI reduction operation
Dan Kokron comments on the behavior in GSI PR #772. He found that initializing My working copy of thompson_reff on Hera initializes
This change is not needed on WCOSS2 for the debug
FYI @xincjin-NOAA. PR #779 was merged into
@RussTreadon-NOAA Thank you so much for the information. I made the changes you suggested above. However, the job was killed because it hit the 30-minute time limit of the debug queue. What wall-clock limit do you set for the run? Thanks,
@xincjin-NOAA, I arbitrarily bumped the wall-clock limit up to 3 hours, 30 minutes. Doing so requires the queue to be changed from debug to batch. The debug queue has a maximum wall-clock limit of 30 minutes.
@RussTreadon-NOAA and @emilyhcliu I extended the time limit to 6.5 hours, ran a few experiments, and found that the cause of the issue is NaNs in the bias file: /scratch2/NCEPDEV/stmp3/Emily.Liu/ROTDIRS/v17allskyens/gdas.20230607/06/analysis/atmos/gdas.t06z.abias
After I removed these NaNs and reran the experiment, it ran normally until it was canceled upon reaching the 6.5-hour time limit.
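A minimal sketch for locating NaNs like these before rerunning, assuming the abias file is plain text and the NaNs were written out as NaN (or -NaN) tokens. find_nan_lines is a hypothetical helper, not part of GSI; it reports the line number and content of every line containing a token that parses as NaN, and skips non-numeric tokens such as instrument names.

```python
import math


def find_nan_lines(lines):
    """Return (line_number, text) for every line containing a NaN token.

    Hypothetical helper, not part of GSI. Assumes a plain-text file in which
    NaNs appear as tokens such as "NaN" or "-NaN".
    """
    hits = []
    for lineno, line in enumerate(lines, start=1):
        for token in line.split():
            try:
                value = float(token)  # float() accepts "NaN", "nan", "-NaN"
            except ValueError:
                continue  # non-numeric token, e.g. an instrument name
            if math.isnan(value):
                hits.append((lineno, line.rstrip("\n")))
                break  # one hit per line is enough
    return hits
```

Because it accepts any iterable of strings, it can be pointed straight at an open file, e.g. with open(path) as f: find_nan_lines(f), to list the records that need to be removed or reset.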
Good detective work, @xincjin-NOAA. You should open a PR to get your fix to subroutine deter_sfc_gmi into @emilyhcliu, the operational
@RussTreadon-NOAA In v16.3, GMI is not assimilated, so we set it to monitoring mode. Later, NESDIS upgraded their satellite ingest for GMI and a few other data types, and our obsproc team modified their processing accordingly. However, the GMI data volume doubled because the data were duplicated. This caused a memory problem that produced NaNs in the bias correction. The obsproc team fixed the problem, and the GMI data volume is back to normal.
Thank you, @emilyhcliu, for sharing how
We can reset the bias correction in our observation upgrade before the v17 implementation.
@RussTreadon-NOAA I have created a pull request (#781). Do you know how to remove the fix directory from the changed files? Thanks, Xin
Encountered an unexpected error while working on issue #777.
Build debug gsi.x on Hera from develop at e82365d. Run the 2023060712 case using files from /scratch2/NCEPDEV/stmp3/Emily.Liu/ROTDIRS/v17allskyens.
The debug gsi.x aborts in deter_sfc_mod.f90 with the message
The code in question is the isli_full(i,j) line below
This code is in subroutine deter_sfc_gmi in file deter_sfc_mod.f90.
Prints added to the code confirm that klonn is 0. j=0 is not a valid index value for array isli_full.
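A minimal sketch of the runtime bounds checking that makes the debug gsi.x abort, assuming the debug build has array bounds checking enabled (for example, ifort -check bounds or gfortran -fcheck=bounds; the actual GSI debug flags are not shown in this issue). CheckedArray2D is a hypothetical stand-in for a 1-based array such as isli_full, not GSI code: any index outside the declared range raises immediately, so j=0 is rejected instead of being used.

```python
class CheckedArray2D:
    """1-based 2-D array that raises on out-of-range indices.

    Hypothetical illustration of debug-build bounds checking; not GSI code.
    """

    def __init__(self, ni: int, nj: int, fill: int = 0) -> None:
        self.ni = ni
        self.nj = nj
        # Stored 0-based internally; the public interface is 1-based like Fortran.
        self._data = [[fill] * nj for _ in range(ni)]

    def __getitem__(self, idx: tuple[int, int]) -> int:
        i, j = idx
        # Reject anything outside 1..ni, 1..nj, as a bounds-checked build would.
        if not (1 <= i <= self.ni and 1 <= j <= self.nj):
            raise IndexError(
                f"index ({i},{j}) outside bounds (1:{self.ni},1:{self.nj})"
            )
        return self._data[i - 1][j - 1]


isli = CheckedArray2D(ni=4, nj=8)  # hypothetical small grid
print(isli[2, 8])  # last valid column
try:
    isli[2, 0]  # j=0, as in the failing deter_sfc_gmi reference
except IndexError as err:
    print("abort:", err)
```

Without the explicit check, self._data[i - 1][j - 1] with j=0 would quietly return the last column, because Python accepts negative indices. An unchecked build likewise does not stop at the bad index, and the value it reads is undefined.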