gdasgldas task fails with Restart Tile Space Mismatch #622

Closed · BrettHoover-NOAA opened this issue Jan 31, 2022 · 106 comments · Fixed by #1018
Labels: bug (Something isn't working)

@BrettHoover-NOAA

Expected behavior
The gdasgldas task should complete successfully and global-workflow should continue cycling fv3gdas.

Current behavior
The gdasgldas task fails on the first cycle in which it is not skipped (the first 00z analysis period once enough data has been produced to trigger the task).

Machines affected
This error has been observed on Orion.

To Reproduce
I am seeing this bug in a test of global-workflow being conducted on Orion in the following directories:
expid: /work/noaa/da/bhoover/para/bth_test
code: /work/noaa/da/bhoover/global-workflow
ROTDIR: /work/noaa/stmp/bhoover/ROTDIRS/bth_test
RUNDIR: /work/noaa/stmp/bhoover/RUNDIRS/bth_test

This run is initialized on 2020082200, and designed to terminate 2 weeks later on 2020090500.

Experiment setup:
/work/noaa/da/bhoover/global-workflow/ush/rocoto/setup_expt.py --pslot bth_test --configdir /work/noaa/da/bhoover/global-workflow/parm/config --idate 2020082200 --edate 2020090500 --comrot /work/noaa/stmp/bhoover/ROTDIRS --expdir /work/noaa/da/bhoover/para --resdet 384 --resens 192 --nens 80 --gfs_cyc 1

Workflow setup:
/work/noaa/da/bhoover/global-workflow/ush/rocoto/setup_workflow.py --expdir /work/noaa/da/bhoover/para/bth_test

Initial conditions:
/work/noaa/da/cthomas/ICS/2020082200/

The error is found in the gdasgldas task on 2020082600.

Log file:
/work/noaa/stmp/bhoover/ROTDIRS/bth_test/logs/2020082600/gdasgldas.log

Context
I am a new Orion user and a member of the satellite DA group; this run is only to familiarize myself with the process of carrying out an experiment. No code changes have been made for this run. I followed the directions for cloning and building the global-workflow, and for setting up a cycled experiment, from the available wiki:

https://github.com/NOAA-EMC/global-workflow/wiki/

I did not create the initial condition files; they were produced for me. The global-workflow repository was cloned on January 25, 2022 (d3028b9).

The task fails with the following error in the log-file:

0: NOAH Restart File Used: noah.rst
0: 1 1536 768 389408
0: Restart Tile Space Mismatch, Halting..
0: endrun is being called
0: application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0

The dimension size of 389408 is suspicious, since earlier in the log a different dimension size is referenced, e.g.:

0: MSG: maketiles -- Size of Grid Dimension: 398658 ( 0 )

When I search for "389408" in the log file, it appears in only two places: one is the Restart Tile Space Mismatch error, and the other is in the output of exec/gldas_rst, when reporting the results of a FAST_BYTESWAP:

216.121 + /work/noaa/da/bhoover/global-workflow/exec/gldas_rst
216.121 + 1>& 1 2>& 2
FAST_BYTESWAP ALGORITHM HAS BEEN USED AND DATA ALIGNMENT IS CORRECT FOR 4 )
1536 768 4 9440776
2 tmp0_10cmdown GLDAS STC1
3 tmp10_40cmdown GLDAS STC2
4 tmp40_100cmdown GLDAS STC3
5 tmp100_200cmdown GLDAS STC4
6 soill0_10cmdown GLDAS SLC1
7 soill10_40cmdown GLDAS SLC2
8 soill40_100cmdown GLDAS SLC3
9 soill100_200cmdown GLDAS SLC4
10 soilw0_10cmdown GLDAS SMC1
11 soilw10_40cmdown GLDAS SMC2
12 soilw40_100cmdown GLDAS SMC3
13 soilw100_200cmdown GLDAS SMC4
15 landsfc
18 vtypesfc
71 tmpsfc GLDAS SKNT
72 weasdsfc GLDAS SWE
79 cnwatsfc GLDAS CMC
88 snodsfc GLDAS SNOD
1 1536 768 389408
216.602 + err=0

I believe that the error is related to the difference in tile-size between these two values.
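
For reference, the search described above amounts to something like:

# Locate the mismatched tile dimension in the failed job's log (log path as listed above)
grep -n "389408" /work/noaa/stmp/bhoover/ROTDIRS/bth_test/logs/2020082600/gdasgldas.log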

Detailed Description
I have proposed no change or addition to the code for this run.

Additional Information
Prior gdasgldas tasks in the run, from initialization up to 2020082600, completed successfully, but only because they were skipped: either the analysis was for a non-00z period or the requisite number of cycles had not yet been completed to allow the task to trigger. There are no successful gdasgldas tasks in this run that I can use to compare against the one that failed. I have conferred with more experienced EMC users of fv3gdas and the cause of the problem is not obvious.

Possible Implementation
I have no implementation plan to offer.

BrettHoover-NOAA added the bug label on Jan 31, 2022
@CatherineThomas-NOAA
Contributor

@HelinWei-NOAA

@HelinWei-NOAA
Contributor

see the email from Dave

Hi Helin and Jun,

I am finding with a recent upgrade of the global workflow that there is a mismatch between the land-sea mask algorithms of the UFS and the GLDAS. This is resulting in a failure of the gdasgldas job, specifically when the Land Information System (LIS) executable is run. The error reported is 'Restart Tile Space Mismatch, Halting..', which can be found in sorce/gldas_model.fd/lsms/noah.igbp/noahrst.F:104. I have only verified this error for a C192/C96 run on 2020080500, where NCH = 97296 and LIS%D%NCH = 99582.

With the recent upgrade, the UFS is modifying the input land sea mask during the first gdasfcst and outputting this modified mask to the tiled surface restart files. Thus, all future forecasts, analyses, etc use this modified mask until the GLDAS reads its own from $FIX/fix_gldas/FIX_T382/lmask_gfs_T382.bfsa.

So I have a few questions. First, the UFS's modification of the land-sea mask is expected, correct? Secondly, should a new fix file be created for the GLDAS with the modified land-sea mask or is the UFS-modified land-sea mask time dependent and thus not fixed? Lastly, should I expect this to be an issue at all resolutions?


Since GLDAS will not be included in the next operational implementation, we need to have someone to decide if we should spend more time on this task.

@yangfanglin
Contributor

Is the cycling working without gdasgldas ? Have the user tried at the operational resolutions (C768/C384) ?

@BrettHoover-NOAA
Author

BrettHoover-NOAA commented Feb 1, 2022 via email

@CatherineThomas-NOAA
Contributor

@yangfanglin The experiment had no errors when running in the first few days before GLDAS turns on, so I assume that it would run if GLDAS was turned off altogether.

If as @HelinWei-NOAA says that GLDAS will not be used going forward, should we turn it off now in our experiments? The DA team is still running atmosphere-only cases since we have non-coupling data upgrades to worry about.

@yangfanglin
Contributor

@CatherineThomas-NOAA If the cycling at C384/C192 resolutions with gdasgldas turned on is working on WCOSS, there are likely issues related to the setup on Orion. I had some discussion with Daryl about the use of gdasgldas in future systems. We can discuss this issue more with Daryl offline.

@CatherineThomas-NOAA
Contributor

@AndrewEichmann-NOAA Has your WCOSS experiment progressed to the point where the gldas step is run? If so, did it fail or run without issue?

@AndrewEichmann-NOAA
Contributor

@CatherineThomas-NOAA No - I ran into rstprod access issues and am waiting for a response from helpdesk

@CatherineThomas-NOAA
Contributor

Thanks @AndrewEichmann-NOAA. I can run a short warm start test on WCOSS.

@CatherineThomas-NOAA
Contributor

The warm start test on WCOSS ran the gldas step without failure. Now that Hera is back, I can try a quick test there as well.

@CatherineThomas-NOAA
Contributor

Now that Hera is back, I can try a quick test there as well.

Global-workflow develop does not build on Hera (#561), so I can't run this test at this time.

@DavidHuber-NOAA
Contributor

I ran into this issue with the S4 and Jet ports, which I reported to Helin. I have since turned off GLDAS altogether and everything has run OK out to 10 days.

Below is more of the thread between Helin, @junwang-noaa, and myself:

David,

GLDAS should use the same land-sea mask as UFS. If the land-sea mask can be changed during the forecast, that certainly will bring up some issues for GLDAS.

Helin

Dave,

The model does check/modify the land-sea mask according to the input land/lake fraction and soil type values; this is applied to both the non-fractional and fractional grids. The changes are required to make sure the model has a consistent land-sea mask and soil types. The new land-sea mask is output in the model sfc file. I expect you do not change the oro data or the soil type data during your run, so this land-sea mask won't change. In other words, I think you need to create lmask_gfs_T382.bfsa once, using the land-sea mask in the history file.

Jun

@arunchawla-NOAA
Contributor

@HelinWei-NOAA @barlage and @yangfanglin, should Jun's suggestion be followed and the mask be generated for GLDAS from the history file? It would be good to know if the same issue is seen on other platforms.

@yangfanglin
Contributor

@arunchawla-NOAA Cathy reported that "the warm start test on WCOSS ran the gldas step without failure". So the failures on other platforms might be a porting issue. @CatherineThomas-NOAA Cathy, can you confirm? What is the resolution you tested on WCOSS? I assume you were using the GFS.v16* tag instead of the UFS, right?

@WalterKolczynski-NOAA
Contributor

@BrettHoover-NOAA I'm unable to access either the code or experiment directories.

I ran a test on Orion a few days ago and didn't have any issue, so this probably isn't a port issue. I'm setting up another test just to be sure.

@CatherineThomas-NOAA
Contributor

My test on WCOSS was C384/C192 with warm start initial conditions from the PR #500 test. This was using the head of develop global-workflow at the time (d3028b9), compiling and running with atmosphere only. Since then, I've run new tests on Hera and Orion with the recent update to develop (97ebc4d) and ran into no issues with gldas.

@BrettHoover-NOAA It's possible this issue got fixed inadvertently with the recent develop update. It could also be related to the set of ICs that you started from. How about you try to replicate the test I ran first? I'll point you to my initial conditions offline.

@WalterKolczynski-NOAA
Contributor

WalterKolczynski-NOAA commented Feb 4, 2022

I was able to run a 6½-cycle warm-start from a fresh clone overnight without issue. I'm also using the C384 ICs Cathy produced for PR #500.

@BrettHoover-NOAA
Author

BrettHoover-NOAA commented Feb 7, 2022

@CatherineThomas-NOAA I was able to complete your test with the new develop (97ebc4d) and warm-start ICs on Orion, and I ran into no problems; gdasgldas appears to finish successfully.

@CatherineThomas-NOAA
Contributor

@BrettHoover-NOAA Great to hear. It looks like your Orion environment is working properly. Maybe to round out this set of tests you could try the warm-start 2020083000 ICs but with the original workflow that you cloned, assuming you still have it.

@BrettHoover-NOAA
Author

@CatherineThomas-NOAA I have that test running right now, I'll report back ASAP

@BrettHoover-NOAA
Author

@CatherineThomas-NOAA The warm-start test with the original workflow also finished successfully.

@CatherineThomas-NOAA
Contributor

@BrettHoover-NOAA Great! There may have been an incompatibility with the other ICs then.

@HelinWei-NOAA @DavidHuber-NOAA Is the land-sea mask problem that you mentioned early documented elsewhere? Can this issue be closed?

@HelinWei-NOAA
Contributor

@CatherineThomas-NOAA No, it hasn't been documented elsewhere, but I have let Fanglin and Mike know about this issue. IMO this issue can be closed now.

@DavidHuber-NOAA
Contributor

@CatherineThomas-NOAA I'm also OK with this issue being closed.

@DavidHuber-NOAA
Contributor

I gave this a fresh cold-start test (C192/C96) on Orion over the weekend and received the same error. Initial conditions were generated on Hera (/scratch1/NESDIS/nesdis-rdo2/David.Huber/ufs_utils/util/gdas_init) and output here: /scratch1/NESDIS/nesdis-rdo2/David.Huber/output/192. UFS_Utils was checked out using the same hash as the global workflow checkout script (04ad17e2).

These were then transferred to Orion, where a test ran from 2020073118 through 2020080500, at which point gdasgldas failed with the same message ("Restart Tile Space Mismatch, Halting.."). The global workflow hash used was 64b1c1e and can be found here: /work/noaa/nesdis-rdo2/dhuber/gw_dev. Logs from the run can be found here: /work/noaa/nesdis-rdo2/dhuber/para/com/test_gldas/logs.

A comparison of the land surface mask (lsmsk) between the IC tile1 surface file and the tile1 restart file shows a difference.

I also created initial conditions for C384/C192 and compared the land surface mask against Cathy's tile 1 restart surface file, which shows no difference.

Lastly, I copied the C384/C192 ICs over to Orion and executed just the gdasfcst job, then compared the lsmsk field as before and there was a difference.

It is this modification by the UFS that triggers problems for GLDAS, and I think any further tests could be limited to just a single half-cycle run of gdasfcst. I tracked this modification to the addLsmask2grid subroutine, which is tied to the ocean fraction, which in turn is set in the orographic fix files, which are identical on Orion and Hera. So I am at a loss as to why these differ between warm and cold starts. Is this expected behavior, and if so, should GLDAS be turned off for cold starts?
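
For reference, a minimal sketch of the kind of tile-by-tile mask comparison described above, assuming the nccmp utility is available (directory and file names are illustrative; the variable is called lsmsk here, following this thread, though a given surface file may use a different name such as slmsk):

# Compare the land-sea mask in the cold-start IC surface tiles against the
# corresponding tiles of the first restart written by gdasfcst.
IC_DIR=/path/to/cold_start_ICs        # placeholder
RST_DIR=/path/to/RUNDIR/RESTART       # placeholder
for tile in 1 2 3 4 5 6; do
  if nccmp -d -v lsmsk "${IC_DIR}/sfc_data.tile${tile}.nc" "${RST_DIR}/sfc_data.tile${tile}.nc"; then
    echo "tile${tile}: lsmsk identical"
  else
    echo "tile${tile}: lsmsk differs"
  fi
done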

@KateFriedman-NOAA
Member

All, is there guidance on whether users should turn off GLDAS when running cold-started experiments for now? @jkhender just hit the same error on Orion with a cold-started experiment using global-workflow develop as of 2/2/22. Thanks!

@jkhender
Contributor

correction - my experiment is running on Hera

@DavidHuber-NOAA
Contributor

This is what I would suggest. I don't think that a fix file update will work for everyone since warm starts seem to be using the current fix files without a problem, implying that an update would result in a mismatch for those users (testing could confirm this). Alternatively, two sets of fix files could be created, one for warm starts and one for cold, but that would require some scripting to know which to link to.

@DavidHuber-NOAA
Contributor

@KateFriedman-NOAA @HelinWei-NOAA The new fix files should only be used for cold starts. Warm starts should continue to use the old fix files. The reason for this is that warm starts continue to use the same version of the GFS, which uses an older algorithm for the land mask and thus does not change the land mask to match what is in the new fix files. My suggestion was to create two sets of fix files -- one for cold starts and one for warm starts. This will require changes to the gldas scripts to point to the correct fix dataset.

See my more detailed explanation above.
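
A rough sketch of the kind of branching that proposal implies for the gldas script (hypothetical: the EXP_WARM_START switch and the coldstart/warmstart directory split below are assumptions for illustration; FIXgldas, JCAP, and RUNDIR mirror the script excerpt quoted later in this thread):

# Hypothetical selection of a GLDAS fix set based on how the experiment was started.
if [[ "${EXP_WARM_START:-.false.}" = ".true." ]]; then
  # warm starts keep the original (pre-UFS-mask) fix files
  ln -fs "${FIXgldas}/warmstart/FIX_T${JCAP}" "${RUNDIR}/FIX"
else
  # cold starts use fix files rebuilt from the UFS-modified land-sea mask
  ln -fs "${FIXgldas}/coldstart/FIX_T${JCAP}" "${RUNDIR}/FIX"
fi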

@KateFriedman-NOAA
Member

@DavidHuber-NOAA Ok I see. So I should use the following in these situations then (please check my understanding):

  • cold-start:
    • frac_grid on: /scratch1/NCEPDEV/global/glopara/fix/gldas/20220920/frac_grid/FIX_T1534
    • frac_grid off: /scratch1/NCEPDEV/global/glopara/fix/gldas/20220920/nofrac_grid/FIX_T1534
  • warm-start (use older/existing set):
    • frac_grid on (?): /scratch1/NCEPDEV/global/glopara/fix/gldas/20220805/FIX_T1534
    • frac_grid off: ?

What about frac_grid off in a warm-start? Do warm-starts use the same set regardless of frac_grid?

What still bugs me about this is that the cold-vs-warm distinction is carried through the entire run even though both kinds of run are warm after the initial cycle. It seems to me that the cold vs. warm difference should only exist in the first cycle, and the same fix files should be used in either kind of run after that cycle. I'm not a forecast model expert, though, so this is just my view from the workflow end of things. :)

@DavidHuber-NOAA
Contributor

@KateFriedman-NOAA I believe the current fix files should work for both nofrac_grid and frac_grid warm starts since production does not modify the land-sea mask.

@DavidHuber-NOAA
Contributor

I agree with your point that the new fix files should be used after the first cycle. Why the land sea mask does not change in future warm-start cycles does not make sense to me, either.

@KateFriedman-NOAA
Member

KateFriedman-NOAA commented Sep 21, 2022

I believe the current fix files should work for both nofrac_grid and frac_grid warm starts since production does not modify the land-sea mask.

Ok, thanks @DavidHuber-NOAA for confirming. Will likely need to run nofrac_grid tests at some point.

I agree with your point that the new fix files should be used after the first cycle. Why the land sea mask does not change in future warm-start cycles does not make sense to me, either.

@HelinWei-NOAA @junwang-noaa @yangfanglin Is there a particular reason that a different land-sea mask file is used depending on how the run was started? If a cycle is using warm-starts (after the initial cold-start cycle), why is it different from a warm-started run that is also using warm-starts in its cycles? Is it possible to only use the special cold-start GLDAS fix file set in the first gldas job of a cold-started run and then use the existing fix set for later cycles (and in all warm-started run cycles)? Hope my questions make sense. Thanks!

@yangfanglin
Contributor

@KateFriedman-NOAA @DavidHuber-NOAA The same set of fix files should be used for all cases discussed here, no matter whether it is a warm or cold start, the first cycle or the cycles after, fractional or non-fractional grid. If you are using the current operational GFS then you do need the "old" fix files. Any UFS-based application should use the new fix files. If the new files are not working then there are probably still issues.

@KateFriedman-NOAA
Member

The same set of fix files should be used for all cases discussed here, no matter whether it is a warm or cold start, the first cycle or the cycles after, fractional or non-fractional grid. If you are using the current operational GFS then you do need the "old" fix files. Any UFS-based application should use the new fix files.

@yangfanglin Noted. Ok, trying to wrap my head around the different IC scenarios we have here then, which need to be supported in the workflow since we will have GFSv16 warm-starts for a while still. Clarification questions below.

Here are more details on my ICs and runs:

  1. C192C96L127 run cold-started with ICs generated earlier this year using ops GFSv16 warm-start restarts. GLDAS job now works with updated fix files (only tested frac_grid=.true. scenario) after failing with old set.
  2. C768C384L127 run with ops GFSv16 warm-start restarts (which were run through ncatted (ncatted -a checksum,,d,, $FILE) to strip checksum attributes for use by the updated UFS model; see the sketch below). GLDAS jobs work with old fix files and fail with new ones (frac_grid=.true.).

Both sets of ICs were originally from GFSv16 ops but differed based on whether they were cold (run through chgres_cube) or warm. I ran with the Prototype-P8 tag of ufs-weather-model in both tests.
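
As a sketch of the checksum-stripping step mentioned in item 2 above (the ncatted command is quoted from that item; the directory variable and file glob are placeholders):

# Delete the 'checksum' attribute from every variable in each GFSv16 warm-start
# restart file so the updated UFS model will accept them.
ICS_DIR=/path/to/gfsv16_warm_start_restarts   # placeholder
for FILE in "${ICS_DIR}"/*.nc; do
  ncatted -a checksum,,d,, "${FILE}"
done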

Question 1) All warm-started runs of the UFS that use warm-start ICs from GFSv16 need to use the old GLDAS fix files, correct? Regardless of the checksum-attribute removal run on them?

Question 2) Any run of the UFS that uses either cold-start ICs (regardless of source) or warm-start ICs from a different GFSv17 run should use the new GLDAS fix files? Is that correct?

Question 3) Is there an aspect of the ICs that I can check at the start of a run to help decide which set of fix files the workflow will use?

Question 4) Is there a way to process GFSv16 warm-starts for use with the new GLDAS fix files?

Thanks!

@yangfanglin
Contributor

yangfanglin commented Sep 22, 2022

@KateFriedman-NOAA We cannot run the Prototype-P8 tag of ufs-weather-model using GFS.v16 warm start files because of changes in a few physics schemes including land and microphysics etc. GFS.v16 warm start ICs need to be converted to cold start ICs using CHGRES even if you are running at the C768L127 resolution. We have been doing this for a while. You can find a set of cold start ICs at the C768L127 resolution converted from GFS.v16 warm start ICs archived on HPSS at /NCEPDEV/emc-global/5year/Fanglin.Yang/ICs/C768L127

@KateFriedman-NOAA
Member

We cannot run the Prototype-P8 tag of ufs-weather-model using GFS.v16 warm start files because of changes in a few physics schemes including land and microphysics etc.

Got it, thanks for explaining!

GFS.v16 warm start ICs need to be converted to cold start ICs using CHGRES even if you are running at the C768L127 resolution.

Ok, I was wondering if that would be the case. Thanks!

You can find a set of cold start ICs at the C768L127 resolution converted from GFS.v16 warm start ICs archived on HPSS at /NCEPDEV/emc-global/5year/Fanglin.Yang/ICs/C768L127

Good to know, thanks!

KateFriedman-NOAA added a commit that referenced this issue Sep 22, 2022
Absorb GLDAS scripts into global-workflow and fix GLDAS job by updating scripts to use new GLDAS fix file set.

* Remove GLDAS scripts from .gitignore
* Remove GLDAS script symlinks from link_workflow.sh
* Add GLDAS scripts to global-workflow
* Updates to GLDAS scripts, includes converting GLDAS script to replace machine checks with CFP variables
* Address linter warnings and remove obsolete platforms

Refs #622 #1014
@KateFriedman-NOAA
Member

@BrettHoover-NOAA @DavidHuber-NOAA @HelinWei-NOAA @yangfanglin PR #1018 mostly fixed this issue; however, PR #1009 will bring in changes to have the GLDAS job use the updated fix set. We will announce that the GLDAS job is working again after #1009 goes into global-workflow develop.

Thanks for your help on resolving this issue!

WalterKolczynski-NOAA added a commit to WalterKolczynski-NOAA/global-workflow that referenced this issue Sep 26, 2022
GLDAS scripts were recently moved into the workflow repo and need to
be updated for the new fix structure like other components.

Refs: NOAA-EMC#622, NOAA-EMC#966
WalterKolczynski-NOAA added a commit to WalterKolczynski-NOAA/global-workflow that referenced this issue Oct 4, 2022
GLDAS scripts were recently moved into the workflow repo and need to
be updated for the new fix structure like other components.

Refs: NOAA-EMC#622, NOAA-EMC#966
@RussTreadon-NOAA
Contributor

Restart Tile Space Mismatch for C96 parallel

Running a C96L127 parallel on Orion. The 2021122500 gdasgldas job is the first job in the parallel with sufficient sfluxgrb files to run gldas. Executable ./LIS fails with the following error:

 0:  noah_scatter()
 0:  NOAH Restart File Used: noah.rst
 0:            1         384         192       24793
 0:  Restart Tile Space Mismatch, Halting..
 0:  endrun is being called
 0: application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
 0: In: PMI_Abort(1, application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0)
 0: slurmstepd: error: *** STEP 7161891.1 ON Orion-11-36 CANCELLED AT 2022-10-11T09:48:05 ***
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.

Here are details of the parallel

  • EXPDIR=/work/noaa/da/Russ.Treadon/para_gfs/prgsida4
  • HOMEgfs=/work/noaa/da/Russ.Treadon/git/global_workflow/develop @ e915eb6
  • ROTDIR=/work/noaa/stmp/rtreadon/comrot/prgsida4
  • job log file: /work/noaa/stmp/rtreadon/comrot/prgsida4/logs/2021122500/gdasgldas.log
  • job run directory: /work/noaa/stmp/rtreadon/RUNDIRS/prgsida4/2021122500/gdas/gldas.63881

Tagging @KateFriedman-NOAA , @WalterKolczynski-NOAA , @HelinWei-NOAA

Have we successfully run gldas on Orion for C96 after #1009 was merged into develop?

@DavidHuber-NOAA
Contributor

DavidHuber-NOAA commented Oct 11, 2022

@HelinWei-NOAA @KateFriedman-NOAA @WalterKolczynski-NOAA Looking through the history, it seems we never generated the sfc.gaussian.nemsio file for C96 and the associated fix file for T192. Helin, it looks like the file you would need is /work/noaa/stmp/rtreadon/RUNDIRS/prgsida4/2021122500/gdas/gldas.63881/sfc.gaussian.nemsio.20211222.

@HelinWei-NOAA
Contributor

@DavidHuber-NOAA You are right. We haven't updated the gldas fixed fields for the C96 resolution. Now we have one for the fractional grid (/work/noaa/stmp/rtreadon/RUNDIRS/prgsida4/2021122500/gdas/gldas.63881/sfc.gaussian.nemsio.20211222). Would you mind running another C96 case but turning off the fractional grid? Thanks.

@DavidHuber-NOAA
Contributor

@HelinWei-NOAA Will do.

@DavidHuber-NOAA
Contributor

@HelinWei-NOAA OK, the C96, non-fractional grid sfc file is here: /work/noaa/nesdis-rdo2/dhuber/for_helin/96_nofrac/sfc.gaussian.nemsio.20220401.

@HelinWei-NOAA
Contributor

@DavidHuber-NOAA This nemsio file is for the C192 resolution even though you ran the model at C96. My program is not robust enough to fail on a wrong-resolution input file. I checked the previous nemsio tarball you created; the data is always one level higher in resolution than it should be. @KateFriedman-NOAA I need to recreate some GLDAS fixed fields because of this mismatch.

@KateFriedman-NOAA
Member

I need to recreate some GLDAS fixed fields because of this mismatch.

@HelinWei-NOAA Okie dokie. Pass me the updated/new files when ready and I'll copy them into the fix set. Thanks!

@HelinWei-NOAA
Contributor

@DavidHuber-NOAA When you ran gdas at C96, gfs was always run at one level higher resolution (C192 in this case). Was the nemsio file created by gfs?

@DavidHuber-NOAA
Contributor

DavidHuber-NOAA commented Oct 14, 2022

@HelinWei-NOAA The resolution is correct for C96. The GLDAS runs at T190 for a C96 case and that is the resolution of the nemsio file I provided. The GLDAS resolution is determined by the global workflow script scripts/exgdas_atmos_gldas.sh at line 58:
JCAP=$((2*res-2))

This is confirmed on lines 90 through 93 where the linking to the correct fix files is written as

if [[ "${FRAC_GRID:-".true."}" = ".true." ]] ; then
  ln -fs "${FIXgldas}/frac_grid/FIX_T${JCAP}" "${RUNDIR}/FIX"
else
  ln -fs "${FIXgldas}/nonfrac_grid/FIX_T${JCAP}" "${RUNDIR}/FIX"
fi

So I believe you have correctly derived the fix files for C192 - C768 and that between Russ' and my nemsio files, you will have what you need for C96.

If T190 is supposed to correspond to C192, then the gldas scripts need to be rewritten.
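
For reference, the C-to-T mapping implied by that formula (a sketch; T190, T382, and T1534 appear explicitly in this thread, while T766 simply follows from the arithmetic):

# Gaussian/spectral resolution implied by JCAP=$((2*res-2)) for the cubed-sphere
# resolutions discussed in this issue.
for res in 96 192 384 768; do
  echo "C${res} -> T$((2*res-2))"
done
# Output: C96 -> T190, C192 -> T382, C384 -> T766, C768 -> T1534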

@HelinWei-NOAA
Contributor

@DavidHuber-NOAA You are absolutely right. Thank you for the clarification. However, my program interpreted them incorrectly, so I need to recreate the whole set of gldas fixed field data. Do you have the nemsio files for C192 and C384 without the fractional grid? Thanks.

@HelinWei-NOAA
Contributor

@DavidHuber-NOAA I found those two nemsio files. Thanks.

@HelinWei-NOAA
Contributor

@KateFriedman-NOAA @DavidHuber-NOAA It turns out only C96 (T190) has a problem. Please copy the updated data for this resolution FIX_T190 from /scratch1/NCEPDEV/global/Helin.Wei/save/fix_gldas for both nonfrac_grid and frac_grid. Thanks.

@KateFriedman-NOAA
Member

@HelinWei-NOAA @DavidHuber-NOAA @RussTreadon-NOAA I have rsync'd the updated FIX_T190 nonfrac_grid and frac_grid files into the fix/gldas/20220920 subfolder on all supported platforms. Please retest the GLDAS job that failed and let me know if further updates are needed. Thanks!

kayeekayee pushed a commit to kayeekayee/global-workflow that referenced this issue May 30, 2024