
Refactor EMC_post decomposition from 1D to 2D as part of EMC_post refactoring #274

Closed
GeorgeVandenberghe-NOAA opened this issue Mar 5, 2021 · 149 comments · Fixed by #339
Labels: enhancement (New feature or request)

Comments

@GeorgeVandenberghe-NOAA
Contributor

GeorgeVandenberghe-NOAA commented Mar 5, 2021

EMC_post is currently decomposed over latitude (J) only. This is adequate for several more years, but since the post is being generally refactored, now is a good time to make the jump to 2D. A second goal is to make the 2D decomposition either flexible, or have it mimic the ufs-weather-model decomposition so developers working on both codes can exploit commonality. This will be a modestly difficult project, with most of the effort going into figuring out the plumbing of the code (in progress). This issue is being created for management and project-leader tracking; per EMC management directives and best practices, results should be tracked through this GitHub issue or Slack, NOT email.

There are many OTHER scaling issues in the post that are not affected by the decomposition; most are orthogonal to it and can be worked independently. The most salient is input I/O of model state fields in the standalone post.

By 03/01/2021:

  • The offline post testing procedure provided by Jesse can be found here

  • The inline post testing procedure provided by Bo can be found here

  • Jesse's FV3 branch can be found here

@GeorgeVandenberghe-NOAA
Contributor Author

Just noting that input I/O of model state fields IS affected by the decomposition.

@WenMeng-NOAA added the enhancement (New feature or request) label Mar 8, 2021
@GeorgeVandenberghe-NOAA
Contributor Author

Wading through the code. A large fraction of the work will be modifying the I/O either to scatter 2D subdomains rather than 1D contiguous slices (the serial option), or to modify the parallel I/O to get the subdomains. The rest looks like bookkeeping with loop indices, but I have not yet looked for stencil operators that need halo exchanges. I also need to learn much more about the NetCDF API. That's the status so far. Working on the standalone FV3 portion first.
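For readers following along, here is a minimal sketch of the kind of bookkeeping a 2D block layout implies: each rank owns a rectangle of the im x jm grid determined by its position in a numx x numy layout. This is illustrative only and is not the actual MPI_FIRST.f logic; the routine name and layout rule are assumptions, though ista/iend/jsta/jend and numx mirror names used later in this thread.

  subroutine subdomain_bounds(me, numx, numy, im, jm, ista, iend, jsta, jend)
    ! Illustrative sketch: compute one rank's 2D subdomain bounds from a
    ! numx x numy block layout of an im x jm grid, giving one extra point
    ! to the low-numbered columns/rows when the division has a remainder.
    implicit none
    integer, intent(in)  :: me, numx, numy, im, jm
    integer, intent(out) :: ista, iend, jsta, jend
    integer :: ix, jx                  ! rank's 0-based column and row in the layout
    ix = mod(me, numx)
    jx = me / numx
    ista = ix*(im/numx) + min(ix, mod(im, numx)) + 1
    iend = ista + im/numx - 1
    if (ix < mod(im, numx)) iend = iend + 1
    jsta = jx*(jm/numy) + min(jx, mod(jm, numy)) + 1
    jend = jsta + jm/numy - 1
    if (jx < mod(jm, numy)) jend = jend + 1
  end subroutine subdomain_bounds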

@HuiyaChuang-NOAA
Collaborator

@GeorgeVandenberghe-NOAA Agreed. My plan is to have @JesseMeng-NOAA and @BoCui-NOAA do the bookkeeping parts of changing I loop indices and take care of halo exchanges when necessary.

@GeorgeVandenberghe-NOAA
Contributor Author

Work on the standalone post was promising. There were many issues just assembling a testcase for the inline post in ufs-weather-model. I am trying to find where in the model this is called from and how the model history files are assembled on the I/O group side. It took several days to get a working testcase and then isolate a UPP library from the build so I could work with it; that's where I am now, on Jet, since WCOSS is down for a week. This process has taken much more time than expected. GWV 3/17

@GeorgeVandenberghe-NOAA
Contributor Author

For what it's worth, the Intel tracebackqq('string ', iret) call issues a traceback from wherever it's called and can then keep going. I tried that to get the call tree; it segfaults in ESMF itself, but still provides enough information for me.

If iret is zero the program terminates;
if iret is -1 a traceback is written to stderr and the program continues running.

Using this, it looks like PROCESS(), a major post routine, is called directly from something in ESMF, and there are at least thirty ESMF routines in the call chain above it. Intel work on Jet is currently frozen by a transient system issue there.
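For reference, a minimal sketch of the Intel-only traceback trick described above; it relies on TRACEBACKQQ from Intel's IFCORE module, and the wrapper name here is made up.

  subroutine where_am_i(tag)
    ! Intel Fortran only. USER_EXIT_CODE = -1 writes a traceback to stderr
    ! and returns control to the caller so the program keeps running;
    ! a USER_EXIT_CODE of 0 terminates the program after the traceback.
    use ifcore, only: tracebackqq
    implicit none
    character(len=*), intent(in) :: tag
    call tracebackqq(string='UPP call-tree probe: '//tag, user_exit_code=-1)
  end subroutine where_am_i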

@HuiyaChuang-NOAA
Collaborator

Thank you @GeorgeVandenberghe-NOAA for the update. It sounds like you're testing the stand-alone post and the in-line post at the same time? Could you come to next Tuesday's UPP re-engineering tag-up?

@GeorgeVandenberghe-NOAA
Contributor Author

After syncing with the current EMC_post develop head, I can no longer reproduce the results from that code when I apply my changes to SURFCE.f (found this while checking Bo's changes, which DO reproduce... not Bo's problem, MINE). So far the changes consist of changing all arrays dimensioned 1:im to isx:iex, but setting isx to 1 and iex to im STILL produces differences from when the im or 1:im dimension is left in. The arrays should be EXACTLY the same shape, so... figuring it out. I was about to submit a PR with the changes for inspection only (not for incorporation), but now I have this issue.
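For clarity, the redimensioning pattern under discussion looks roughly like this (hypothetical array names); with isx=1 and iex=im the two declarations have identical shape and extent, which is why any difference in results was unexpected:

  subroutine dim_example(im, isx, iex)
    implicit none
    integer, intent(in) :: im, isx, iex
    real :: work_old(1:im)       ! original full-I dimensioning
    real :: work_new(isx:iex)    ! subdomain dimensioning used by the 2D work
    work_old = 0.0               ! with isx=1 and iex=im both arrays
    work_new = 0.0               ! span exactly the same index range
  end subroutine dim_example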

@GeorgeVandenberghe-NOAA
Contributor Author

Also, the differences are small.

From cmp -l (byte offset, then the differing octal byte values from each file):
1468035867 0 1
1469293425 0 1
NATLEV.GrbF06
1240962334 0 1
1242219892 0 1
PRSLEV.GrbF06

@HuiyaChuang-NOAA
Collaborator


Look at the SURFCE.f history on GitHub: the latest update was Jim's fix to a threading violation 7 days ago. The commit prior to that was back in December. I believe you started your fork after December, right? @WenMeng-NOAA Did the latest threading fix change UPP regression test results?

@WenMeng-NOAA
Collaborator

@HuiyaChuang-NOAA There are no changed results from Jim's fixes in UPP regression tests.

@GeorgeVandenberghe-NOAA
Contributor Author

GeorgeVandenberghe-NOAA commented Apr 1, 2021 via email

@WenMeng-NOAA
Collaborator

Sometimes changes will alter the UPP grib2 file size. In the UPP regression tests we added field-by-field value comparison, so it is fine as long as there are no unexpected changes in results.

@GeorgeVandenberghe-NOAA
Contributor Author

GeorgeVandenberghe-NOAA commented Apr 1, 2021 via email

@HuiyaChuang-NOAA
Collaborator

@GeorgeVandenberghe-NOAA Can you point me to your regression test output directory? I will take a look.

@GeorgeVandenberghe-NOAA
Contributor Author

GeorgeVandenberghe-NOAA commented Apr 1, 2021 via email

@GeorgeVandenberghe-NOAA
Contributor Author

I found the following line in CLDRAD.f

  real    FULL_CLD(IM,JM)   !-- Must be dimensioned for the full domain

Why do we need the full domain? I'm concerned I may miss others that need the full domain, although so far I am only redimensioning partial-J-domain arrays, replacing IM with isx:iex.

@WenMeng-NOAA
Collaborator

@GeorgeVandenberghe-NOAA My understanding is that full_cld is used for calling routine AllGETHERV for the halo exchange? See line 938. @HuiyaChuang-NOAA may chime in with details.

@BoCui-NOAA
Contributor

Wen is right. FULL_CLD(IM,JM) must be dimensioned for the full domain because of subroutine ALLGETHERV (mpi_allgather): mpi_allgather is called there and GRID1 must have dimension (im,jm).

I took a note at document https://docs.google.com/spreadsheets/d/10jlqaBHlcg8xHHc4kH1JWJbTPGMcZeZLNbMCszdza2c/edit#gid=0

@WenMeng-NOAA @GeorgeVandenberghe-NOAA
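As a hedged illustration of why the full-domain array is needed: an ALLGETHERV-style gather assembles every rank's latitude strip into a single (im,jm) array on all ranks, along the lines sketched below. The routine name, interface, and the icnt/idsp count/displacement setup are illustrative assumptions, not the actual UPP code.

  subroutine gather_full(strip, full, im, jm, jsta, jend, icnt, idsp, comm)
    ! Each rank contributes its jsta:jend latitude strip and receives the
    ! assembled full-domain field; icnt/idsp are the per-rank element counts
    ! and displacements, precomputed from every rank's jsta/jend.
    use mpi
    implicit none
    integer, intent(in)  :: im, jm, jsta, jend, comm
    integer, intent(in)  :: icnt(:), idsp(:)
    real,    intent(in)  :: strip(im, jsta:jend)
    real,    intent(out) :: full(im, jm)
    integer :: ierr
    call mpi_allgatherv(strip, im*(jend-jsta+1), mpi_real, &
                        full, icnt, idsp, mpi_real, comm, ierr)
  end subroutine gather_full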

@GeorgeVandenberghe-NOAA
Contributor Author

Has anyone looked at the "inspection only" PR submitted late last week (April 1 or so) for second opinions and comments?

@WenMeng-NOAA
Collaborator

@GeorgeVandenberghe-NOAA I haven't gotten a chance to look at it yet. I might get to it this week.

@GeorgeVandenberghe-NOAA
Contributor Author

GeorgeVandenberghe-NOAA commented Apr 7, 2021 via email

@BoCui-NOAA
Contributor

I will start to look at it this week.

@GeorgeVandenberghe-NOAA
Contributor Author

There is a data structure datapd. The following line in CLDRAD.f
datapd(1:im,1:jend-jsta+1,cfld)=GRID1(1:im,jsta:jend)

suggests it's used as some kind of halo pad. Could someone describe this in more detail before I figure out how it should be decomposed in the I direction? Will it need the full I dimension, a superset of the rank's I domain of isx:iex, or just the rank's I domain?

@WenMeng-NOAA
Collaborator

My understanding is that this array is for writing field values to GRIB2 over the full domain. You may see it in a lot of routines. I would defer this question to @HuiyaChuang-NOAA or @junwang-noaa for details.

@GeorgeVandenberghe-NOAA
Contributor Author

GeorgeVandenberghe-NOAA commented Apr 7, 2021 via email

@WenMeng-NOAA
Collaborator

@GeorgeVandenberghe-NOAA I have reviewed your inspection PR, which makes sense to me. I sent you my comments on specific places. Thanks!

@GeorgeVandenberghe-NOAA
Contributor Author

GeorgeVandenberghe-NOAA commented Apr 8, 2021 via email

@HuiyaChuang-NOAA
Collaborator


Yes, Wen was right. Jun created this array to store all data to be written to GRIB2 output; thus its dimensions cannot be changed.

@JesseMeng-NOAA
Contributor

One more thing I mentioned in my comment above but not in the conversation today:
there is also the subroutine POLEAVG.f, which computes averages around the entire i circle at j=1 and j=jm.
@GeorgeVandenberghe-NOAA could you please also look into this? Thanks!
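One hedged sketch of how a pole average could be done once the I dimension is decomposed (this follows the gather/average/broadcast idea discussed later in this thread, and is not the actual POLEAVG.f code; the routine and variable names are illustrative):

  subroutine pole_average(field, ista, iend, jsta, jend, im, jpole, comm)
    ! Each rank sums its piece of the polar circle (if it owns row jpole),
    ! the partial sums are combined across all ranks, and the mean over the
    ! full circle of im points is written back on the ranks that own it.
    use mpi
    implicit none
    integer, intent(in)    :: ista, iend, jsta, jend, im, jpole, comm
    real,    intent(inout) :: field(ista:iend, jsta:jend)
    real    :: psum, ptot
    integer :: ierr
    psum = 0.0
    if (jpole >= jsta .and. jpole <= jend) psum = sum(field(ista:iend, jpole))
    call mpi_allreduce(psum, ptot, 1, mpi_real, mpi_sum, comm, ierr)
    if (jpole >= jsta .and. jpole <= jend) field(ista:iend, jpole) = ptot / real(im)
  end subroutine pole_average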

@WenMeng-NOAA
Collaborator

@GeorgeVandenberghe-NOAA The itag for GFS is created in the post workflow script. The new GFS itag format looks like:
&model_inputs
fileName='nemsfile'
IOFORM='netcdfpara'
grib='grib2'
DateStr='2021-02-16_06:00:00'
MODELNAME='GFS'
fileNameFlux='flxfile'
/
&NAMPGB
KPO=57,PO=1000.,975.,950.,925.,900.,875.,850.,825.,800.,775.,750.,725.,700.,675.,650.,625.,600.,575.,550.,525.,500.,475.,450.,425.,400.,375.,350.,325.,300.,275.,250.,225.,200.,175.,150.,125.,100.,70.,50.,40.,30.,20.,15.,10.,7.,5.,3.,2.,1.,0.7,0.4,0.2,0.1,0.07,0.04,0.02,0.01,
/
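(For the 2D decomposition testing that follows in this thread, a numx entry is added to the itag namelist: numx is the number of subdomains in the X (longitude) direction, and numx=1 reproduces the original Y-only decomposition. A sketch follows, with the level list truncated and the placement inside &NAMPGB being an assumption here:)

  &NAMPGB
    KPO=57, PO=1000.,975.,950.,
    numx=1
  /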

@JesseMeng-NOAA
Contributor

Done with all my share of the 2D decomposition subroutines, including those I took over from Bo.
Wen's regression test in venus:/u/Wen.Meng/noscrubd/ncep_post/post_regression_test_new
still works, and I added numx=1 to the itag namelist. numx=1 passes the regression test for most of the models except for NMMB RH on just a few sigma levels, not all levels. I will look into this.
1125:462230579:RH:0.47-1 sigma layer:rpn_corr=0.992464:rpn_rms=2.55337
1126:462562119:RH:0.47-0.96 sigma layer:rpn_corr=0.993236:rpn_rms=2.52673
1127:462897214:RH:0.18-0.47 sigma layer:rpn_corr=0.996433:rpn_rms=1.58959
1128:463197281:RH:0.84-0.98 sigma layer:rpn_corr=0.980175:rpn_rms=3.61572
1129:463563671:MCONV:0.85-1 sigma layer:rpn_corr=0.999514:rpn_rms=6.62663e-09
1482:487727642:RH:0.47-1 sigma layer:rpn_corr=0.992464:rpn_rms=2.55337

@BoCui-NOAA
Contributor

BoCui-NOAA commented Nov 6, 2021 via email

@GeorgeVandenberghe-NOAA
Contributor Author

So...
How do we want to handle that IMx2 structure that is needed for full polar I domains? Should we change the exch binding, just document something extra that is available in anything that uses CTLBLK_MOD.f after exch is called, or generate an exch_pole routine that fills this structure when called with the subdomain field, similar to what exch does with the boundaries except that what's returned is a new, filled data structure? None of this is hard.

Note that exch.f itself remains full of debug code that should be removed as soon as the 2D decomposition is debugged for all domains and all reasonable decompositions.

@GeorgeVandenberghe-NOAA
Contributor Author

I have run into a snag with the boundary halos. The subdomains are dimensioned with the lowest I set to the greater of ista-2 and 1. The clamp at 1 causes issues at I=1: there is no place to put the cyclic I=IM value to the left. An analogous situation occurs at I=im.

Pondering how to cleanly fix this.
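One possible way around the clamp, assuming the global grid is cyclic in I, is to wrap the neighbor index rather than storing halo columns below 1 or above im; a tiny sketch (the actual fix adopted in EXCH.f may differ):

  pure integer function iwrap(i, im) result(iw)
    ! Cyclic wrap in I: iwrap(0,im) = im and iwrap(im+1,im) = 1, so the
    ! point to the "left" of i=1 is i=im and to the "right" of i=im is i=1.
    implicit none
    integer, intent(in) :: i, im
    iw = 1 + modulo(i - 1, im)
  end function iwrap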

@HuiyaChuang-NOAA
Collaborator

Summary from 11/23/2021 2D decomposition meeting today:

  • The team discussed whether to decompose INITPOST for nemsio and wrfio and decided not to. These are legacy IO to be phased out in the near future. Additionally, the re-engineered version reproduces these two options for numx=1
  • Jesse will update WRFPOST to set numx=1 for nemsio and wrfio, and print out a warning message that these two IO options only work with Y only decomposition
  • Jesse and Bo reported all UPP subroutines have been decomposed. The only remaining issue is dealing with computing vorticity across poles and doing pole averaging at poles.
  • George will continue to work on updating UPP to compute Y derivative across poles.
  • For the pole average, George suggested first doing a gather to a local array, then averaging, then broadcasting the averaged value. He will work with Jesse to make this update.

@BoCui-NOAA
Contributor

@JesseMeng-NOAA Please check the following two routines at the lines below and see whether (i,j,jj) should be replaced by (i,ii,j,jj).

MISCLN.f:4177:! $omp parallel do private(i,j,jj)

SURFCE.f:463:!$omp parallel do private(i,j,jj)
SURFCE.f:913:!$omp parallel do private(i,j,jj)
SURFCE.f:1738:!$omp parallel do private(i,j,jj)
SURFCE.f:1966:!$omp parallel do private(i,j,jj)
SURFCE.f:2334:!$omp parallel do private(i,j,jj)
SURFCE.f:2607:!$omp parallel do private(i,j,jj)
SURFCE.f:2724:!$omp parallel do private(i,j,jj)
SURFCE.f:6083:!$omp parallel do private(i,j,jj)

@JesseMeng-NOAA
Contributor

JesseMeng-NOAA commented Dec 7, 2021

Yes, all should be (i,ii,j,jj). Thanks for double-checking. I will update those.
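A hedged sketch of the loop pattern behind this fix (array and bound names are illustrative, not the exact UPP code): the 2D decomposition introduces a local longitude index ii alongside the local latitude index jj, so ii must be added to the OpenMP private list.

  subroutine copy_block(gridin, outbuf, ista, iend, jsta, jend)
    implicit none
    integer, intent(in)  :: ista, iend, jsta, jend
    real,    intent(in)  :: gridin(ista:iend, jsta:jend)
    real,    intent(out) :: outbuf(iend-ista+1, jend-jsta+1)
    integer :: i, ii, j, jj
!$omp parallel do private(i,ii,j,jj)
    do j = jsta, jend
      jj = j - jsta + 1
      do i = ista, iend
        ii = i - ista + 1            ! new local I index; must be private
        outbuf(ii, jj) = gridin(i, j)
      end do
    end do
!$omp end parallel do
  end subroutine copy_block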


@HuiyaChuang-NOAA
Collaborator

@JesseMeng-NOAA @BoCui-NOAA @GeorgeVandenberghe-NOAA @WenMeng-NOAA @fossell
Summary from 12/21/2021 2D decomposition tag-up:

  1. Jesse reported the team is done with the 2D decomposition work, thanks to George's recent work updating EXCH to carry out derivatives across the poles.
  2. Jesse updated UPP so that only GFS runs with the 2D decomposition.
  3. Wen is planning to unify initialization for GFS and RRFS when using parallel NetCDF read. When that's completed, both RRFS and GFS can run with the 2D decomposition. However, she needs someone from the modeling side to unify the diag table first.
  4. Jesse and Bo both ran UPP regression tests with numx=1 (Y decomposition only) and the 2D decomposition branch reproduced all regression test output. They also ran with numx>1 for GFS and confirmed the output reproduced as well.

Action items:

  1. George will work on cleaning up EXCH.f and document his work
  2. Jesse and Bo will work on a user guide on how to run UPP with 2D decomposition
  3. Kate mentioned Tracy is working on adding Doxygen to UPP and can work with George to document his work
  4. Huiya will add to agenda for 1/4/2022 2D decomposition meeting to discuss release plan for 2D decomposed UPP

@JesseMeng-NOAA
Contributor

Tested George's fix for EXCH.f and identical results were reproduced. Code pushed to GitHub.
A developer's guide for testing UPP GFS 2D decomposition has been drafted:
https://docs.google.com/document/d/1jvky8T3S7uo7z_Y_hHDj9Ak0DgmkPI75Q8RcqAazeno

@HuiyaChuang-NOAA
Collaborator

Summary from 1/4/2022 decomposition meeting:
. DTC will run their version of EMC's UPP regression tests on the 2D decomposition branch and report their findings at the next tag-up
. George will do a quick clean-up and hand off his code to Jesse to commit (done)
. Jesse will work on a quick user guide on running UPP with 2D decomposition (done)
. Wen will work on syncing 2D decomposition branch to develop
. George will continue to clean up and document.
. Wen would like 2D decomposition branch to be tested on Jet and Orion as GSL runs on Jet and HAFS runs on Orion. The plan is:
a) Wen will provide Jesse and Bo a script to run UPP on Orion.
b) Jesse and Bo will test 2D decomposition branch on Orion.
c) After DTC verified their testing is positive, Huiya will reach out to GSL to test 2D decomposition branch on Jet.
. The group discussed areas where EPIC infrastructure team can help enhance UPP:
a) Short-term: updating inline post to use and test 2D decomposed UPP and unification of RRFS and GFS UPP interfaces
b) Long term: adding bufr sounding to inline post

@HuiyaChuang-NOAA
Collaborator

Summary from 1/18 2D decomposition tag-up

  1. George cleaned up EXCH and MPI_FIRST subroutines and will send them to Bo and Jesse to run UPP regression tests with.
  2. Bo and Jesse will run the above-mentioned tests and record timing and will send their findings to the group.
  3. DTC tested 2D decomposition branch in their own regression tests and received bit identical results. However, since they only use serial NetCDF read for GFS, their tests only involve Y only decomposition.
  4. DTC is working on adding GFS parallel NetCDF read into their regression tests and will update EMC on their progress later this week.
  5. George will continue to work on documentation
  6. Huiya will email Jun about the possibility of her team testing the 2D decomposition branch with the in-line post.

@HuiyaChuang-NOAA
Collaborator

Summary from 2/15 2D decomposition meeting:

  1. Jesse and Bo have been updating UPP inline interfaces to do 2D decomposition. They encountered issues running inline post using 2D decomposition branch with Y decomposition only. They're getting Jun's help to debug.
  2. George was working on WCOSS2 issues.

@JesseMeng-NOAA
Contributor

The interface between UPP/post_2d_decomp and the ufs-weather-model/FV3 inline post has been developed for both the GFS and regional models.

To test this functionality:
First, build the regression test baselines with the ufs-weather-model/develop branch by running ufs-weather-model/tests/rt.sh with only the two control tests defined in rt.conf:
control
regional_control

Then run the UPP/post_2d_decomp regression test.
Under the ufs-weather-model test directory, check out Jesse's fv3atm/upp_2d_decomp branch
https://github.com/JesseMeng-NOAA/fv3atm/tree/upp_2d_decomp
and Wen's UPP/post_2d_decomp branch
https://github.com/WenMeng-NOAA/UPP/tree/post_2d_decomp

Turn on the WRITE_DOPOST flag in
ufs-weather-model/tests/tests/control_2dwrtdecomp
ufs-weather-model/tests/tests/regional_control_2dwrtdecomp
export WRITE_DOPOST=.true.
and run ufs-weather-model/tests/rt.sh with only the two 2dwrtdecomp experiments defined in rt.conf
control_2dwrtdecomp
regional_control_2dwrtdecomp

More details can be found in this document:
https://docs.google.com/document/d/15whXaBCQzUmJUJa1TBCTrFUsZYpebSmqVwexbFcAfaw/edit

@GeorgeVandenberghe-NOAA
Contributor Author

GeorgeVandenberghe-NOAA commented Mar 1, 2022 via email

@HuiyaChuang-NOAA
Collaborator

I started the documentation after breaking from continuing WCOSS crises. Below is a very high-level schematic. Information on the actual variables changed will follow later.

The 1D decomposition can read state from a model forecast file either by reading on rank 0 and scattering, or by doing MPI_IO on the model history file using nemsio, sigio, or NetCDF serial or parallel I/O. Very old post tags also implement the more primitive full-state broadcast or (a performance bug rectified 10/17) read the entire state on all tasks; this is mentioned in case a very old tag is encountered. The 2D decomposition only supports MPI_IO for the general 2D case, but all I/O methods remain supported for the 1D special case of the 2D code. This 1D special case works for all cases currently supported by older 1D tags and branches.

To repeat, ONLY 2D NETCDF PARALLEL I/O WILL BE SUPPORTED FOR THE GENERAL CASE OF 2D DECOMPOSITION.

**************************** 2D design enhancements ************************

The 2D decomposition operates on subdomains with some latitudes and some longitudes. The subdomains are lon-lat rectangles rather than strips. This means state must be chopped into pieces in any scatter operation and the pieces reassembled in any gather operation that requires a contiguous in-memory state. I/O and halo exchanges both require significantly more bookkeeping.

The structural changes needed for the 2D decomposition are implemented in MPI_FIRST.f and CTLBLK.f. CTLBLK.f contains numerous additional variables describing left and right domain boundaries. Many additional changes are also implemented in EXCH.f to support 2D halos. Many additional routines required the addition of the longitude subdomain limits, but changes to the layouts are handled in CTLBLK.f; the "many additional routines" do not require additional changes when subdomain shapes are changed and have not been a trouble point.

Both MPI_FIRST and EXCH.f contain significant additional test code to exchange arrays containing grid coordinates and ensure EXACT matches for all exchanges before the domain exchanges are performed. This is intended to trap errors in the larger variety of 2D decomposition layouts that are possible; most of it can eventually be removed or made conditional at build and run time.


Thank you, George. This high-level documentation looks good. Looking forward to your documentation on the variables you added/updated and their descriptions.
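To make the "2D halos" bookkeeping described above concrete, here is a hedged sketch of the new east/west part of a halo exchange (one-column halos, blocking sendrecv); the routine name, interface, and tags are illustrative assumptions, not the EXCH.f implementation.

  subroutine exch_west_east(a, ista, iend, jsta, jend, left, right, comm)
    ! Exchange one-column I halos with the west (left) and east (right)
    ! neighbor ranks; a neighbor may be MPI_PROC_NULL at a non-cyclic edge.
    use mpi
    implicit none
    integer, intent(in)    :: ista, iend, jsta, jend, left, right, comm
    real,    intent(inout) :: a(ista-1:iend+1, jsta:jend)
    real    :: sendw(jsta:jend), sende(jsta:jend), recvw(jsta:jend), recve(jsta:jend)
    integer :: n, ierr, stat(mpi_status_size)
    n = jend - jsta + 1
    sendw = a(ista, jsta:jend)                       ! my westmost owned column
    sende = a(iend, jsta:jend)                       ! my eastmost owned column
    call mpi_sendrecv(sendw, n, mpi_real, left,  1, &
                      recve, n, mpi_real, right, 1, comm, stat, ierr)
    call mpi_sendrecv(sende, n, mpi_real, right, 2, &
                      recvw, n, mpi_real, left,  2, comm, stat, ierr)
    if (right /= mpi_proc_null) a(iend+1, jsta:jend) = recve
    if (left  /= mpi_proc_null) a(ista-1, jsta:jend) = recvw
  end subroutine exch_west_east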

@fossell
Contributor

fossell commented Mar 15, 2022

@GeorgeVandenberghe-NOAA - Is this second piece of documentation you posted going to go into the same overview document as the previous schematic text you provided? I'm mocking up some pages to be included in the formal documentation, so just wanted to check.

@HuiyaChuang-NOAA
Collaborator

Summary from 3/15/2022 2D decomposition meeting:

  1. Bo ran some more tests of the 2D decomposition branch with extreme-case scenarios, such as no Y decomposition, and post failed. Jesse added a condition to reset numx to 1 if numx > num_proc/2.
  2. Jesse fixed the issues Bo discovered. He also opened an issue and submitted a PR on the UFS GitHub. He's waiting for Jun to let him know what's next.
  3. Wen said Jun will ask people to test Jesse's PR. She ran Jesse's UFS PR but it did not generate bit-identical results. Jesse will help her figure out why.
  4. George is working on documenting the variables he added to do the 2D decomposition.
  5. Kate will help George use doxygen to add George's documentation.
  6. George would like someone not as familiar with 2D decomposition code to read his documentation. Huiya volunteered.
  7. Huiya will email Jun to get timeline on when her team will test Jesse's PR as UPP re-engineering needs to finish by the end of May.
  8. Huiya will ask Shelley to add UPP re-engineering tasks for Jesse and Bo for Q3FY22

@WenMeng-NOAA
Collaborator

@HuiyaChuang-NOAA Since Jesse's UFS PR is available in the UFS repository, you might invite model developers from GFS, RRFS, and HAFS to test this UPP 2D decomposition capability. @junwang-noaa may chime in.

@WenMeng-NOAA
Collaborator

The UPP standalone regression tests failed for RRFS and HAFS after syncing PR #441: the regional FV3 read interface INITPOST_NETCDF.f wasn't updated for the 2D decomposition. The workaround is to turn off RRFS and HAFS in your UPP RT tests. I will fix the issue after PR #453 (Unify regional and global FV3 read interfaces) is committed.

@HuiyaChuang-NOAA
Collaborator

Summary from 3/29/2022 meeting:

  1. Wen is planning to commit her PR to unify stand-alone UPP interfaces to develop this or next week.
  2. Bo will then merge this change to 2D decomposition branch and resolve the conflicts.
  3. Jesse will continue to resolve issues with 2D decomposed UPP interface in regional UFS regression tests.
  4. Kate has a PR to add George's 2D decomposition documentation using doxygen
  5. George will review Kate's PR
  6. Huiya will review George's documentation

@HuiyaChuang-NOAA
Collaborator

@JesseMeng-NOAA @BoCui-NOAA @WenMeng-NOAA @GeorgeVandenberghe-NOAA @fossell
Summary from 4/11/2022 tag-up:

  1. Jesse fixed a bug in the regional UPP inline interface which contributed to issues with his earlier regional UFS RTs.
  2. Jesse would like to have someone else conduct RTs using the 2D decomposition branch to confirm that all RTs pass.
  3. Bo is working on merging all of the latest UPP develop changes into the 2D decomposition branch. She's a bit concerned about just accepting "auto merge" and is working on resolving the conflicts.
  4. Wen thinks the "auto merge" mentioned by Bo above should be safe. She suggested starting with her latest unified FV3 interface and applying the 2D decomposition to that subroutine, instead of trying to resolve all conflicts.
  5. Wen is working on a unified inline UPP interface.
  6. Huiya spoke to the NCAR GTG group about running GTG inline, but the consequence is that more people would be able to access the GTG code. NCAR didn't like the idea, so GTG will still be run in stand-alone mode in GFS V17.

Action items

  1. Huiya will ask the GFS, RRFS, and HAFS implementation managers to test the 2D decomposition branch per Wen's suggestions
  2. Jesse will send Huiya instructions on how to test 2D decomposition branch when running UPP inline
  3. George will review Huiya's comments
  4. Kate will provide George with doxygen output
  5. Huiya will email Rahul that GTG still needs to be generated in stand-alone post in GFS V17.

EricJames-NOAA pushed a commit to EricJames-NOAA/UPP that referenced this issue Dec 14, 2022
…es and add support for global/regional grib2 functionality in chgres_cube (NOAA-EMC#274)
