-
Notifications
You must be signed in to change notification settings - Fork 142
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cam6_3_129: Update externals to match cesm2_3_alpha16d and bring in test_driver which supports derecho #888
cam6_3_129: Update externals to match cesm2_3_alpha16d and bring in test_driver which supports derecho #888
Conversation
…am_update_ext_test_driver
…am_update_ext_test_driver
…pdate_ext_test_driver
@nusbaume - This should have all of Chris' updates |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one change requested based off a previous conversation.
test/system/test_driver.sh
Outdated
@@ -551,6 +618,10 @@ if [ "${cesm_test_suite}" != "none" -a -n "${cesm_test_mach}" ]; then | |||
chey* | r* ) | |||
testargs="${testargs} --queue ${CAM_BATCHQ} --test-root ${cesm_testdir} --output-root ${cesm_testdir}" | |||
;; | |||
# derecho | |||
derc* | dec* ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be derec*
based off the statement here:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Committing that change bit both Chris and I. It is now committed!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great now, thanks!
@jtruesdal - I have a failing test: /glade/scratch/cacraig/ERP_Ln9_Vnuopc.C96_C96_mg17.F2000climo.cheyenne_intel.cam-outfrq9s_mg3.20230925_121636_y2r02w/bld/atm.bldlog.230925-123421 This test is failing with an error message about no rule to make libfms.a which is needed by libfv3core.a. Up until this update of externals, we have been holding the FMS external frozen, but it appears that that is no longer going to work. Any ideas on how to solve this? Do we just break FV3 with CAM checkouts with this tag until FV3 is brought in, or do you have a solution I should apply? |
I think this can be fixed with a change to buildlib under cime_config and am looking into it. |
@cacraigucar It turns out the fix is a little more involved since the external updates for this PR include changes to the way the fms library is built. To allow the older FMS/FV3 code that cam is currently using to compile under the latest set of externals would require changes to 4 files. cime - Makefile The cime and CAM changes would also have been applied when we bring FV3/fms versions up to what CESM is using. We could apply this part of the fms/FV3 update now, the only caveat is that we need a new cime tag for the cime/Makefile change.
|
@jtruesdal - I vote for #3. The main purpose of this PR was to bring in all of the externals. I am willing to wait for a new cime tag and try this PR again at that point. |
@cacraigucar I found a workaround that just requires changes to cime_config/buildlib which will allow the older version of fms/fv3 to be built and the failing regression tests to pass. I can update the cime component Makefile and cime_config/buildlib files when bringing fms/FV3 up to the current cesm version. Can I commit the mods to your cam_update_ext_test_driver branch for review or would you like the file update another way? |
Yes, please update my branch with the change. If you don’t have permission,
go ahead and open a PR to my branch and I can bring it in that way as well.
On Wed, Sep 27, 2023 at 8:19 AM jtruesdal ***@***.***> wrote:
@cacraigucar <https://github.com/cacraigucar> I found a workaround that
just requires changes to cime_config/buildlib which will allow the older
version of fms/fv3 to be built and the failing regression tests to pass. I
can update the cime component Makefile and cime_config/buildlib files when
bringing fms/FV3 up to the current cesm version. Can I commit the mods to
your cam_update_ext_test_driver branch for review or would you like the
file update another way?
—
Reply to this email directly, view it on GitHub
<#888 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACRX2XZWEXMLRFLYNC5YJKTX4QYVNANCNFSM6AAAAAA5BXAVSQ>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
--
Cheryl Craig ***@***.***
NCAR Voice: (303) 497-2901
P.O. Box 3000 Fax: (303) 497-1324
Boulder, CO 80307
|
… cam was responsible for building libfms
import sys, os, filecmp, shutil, imp | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jtruesdal - I am getting the following error:
File "/glade/u/home/cacraig/cam_update_ext_test_driver/cime_config/buildlib", line 128, in
_main_func()
File "/glade/u/home/cacraig/cam_update_ext_test_driver/cime_config/buildlib", line 122, in _main_func
_build_cam(caseroot, libroot, bldroot)
File "/glade/u/home/cacraig/cam_update_ext_test_driver/cime_config/buildlib", line 92, in _build_cam
libfms = os.path.join(fmsbuilddir, "libfms.a")
UnboundLocalError: local variable 'fmsbuilddir' referenced before assignment
cime_config/buildlib
Outdated
libfms = os.path.join(fmsbuilddir, "libfms.a") | ||
expect(os.path.isfile(libfms), | ||
"err: Missing the FMS library libfms.a - cannot complete CAM build") | ||
shutil.copy(libfms, libroot) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this section be put inside an if cam_dycore == "fv3":
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jtruesdal - I went ahead and put this in an if block and it appears to be working. I have asked for a review from you to confirm this change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cacraigucar Yes. I forgot this is run for all dycores but only needed for fv3.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching and fixing that error.
@jtruesdal - This regression is still failing: /glade/scratch/cacraig/aux_cam_20230927145521/ERP_Ln9_Vnuopc.C96_C96_mg17.F2000climo.cheyenne_intel.cam-outfrq9s_mg3.GC.aux_cam_20230927145521 The error message is: |
@jedwards4b when determining whether to use 'threads' or 'nothreads' in the in the shared library path, the code you provided for PR #808 checked the case environment variable BUILD_THREADED to decide between the two options. In the failing regression test mentioned by Cheryl, the library path being built includes 'threaded' even though BUILD_THREADED is false. I do see that SMP_PRESENT is set TRUE for this case denoting that at least one component is multithreaded although as far as I can tell all components seem to be using 1 thread. Should BUILD_THREADED and SMP_PRESENT should be consistent? In the ERP test failure mentioned above they are not. I've verified that this is the case in older regression test suites for aux_cam so this behavior has been present well before this PR. If this is a bug I could commit a workaround for this PR that checks for BUILD_THREADED or SMP_PRESENT to determine whether to use 'threads' or 'nothreads' in the path name. |
@jtruesdal This is a feature of the ERP test. https://github.com/ESMCI/cime/blob/master/CIME/data/config/config_tests.xml#L263 |
@cacraigucar I have a fix for cams buildlib - can I push it to your branch? |
Thanks Jim! |
That change did not impact the failing test. I still get the error message about the FMS library not being found. I also have a couple of other failing tests that I am still looking into with the error message: MPI error (MPI_File_write_al_all) : I/O error |
@cacraigucar can you point me to the case that can't find libfms.a |
The case is: The source code (with your change) is at: |
@cacraigucar I tried again here: /glade/scratch/jedwards/ERP_Ln9_Vnuopc.C96_C96_mg17.F2000climo.cheyenne_intel.cam-outfrq9s_mg3.20230929_102509_ua4p8l it works for me and I can't spot the difference - can you? My source is in /glade/work/jedwards/sandboxes/CAM |
@jedwards4b - I tried a single test and it worked fine for me as well. I have started up a the full suite testing again, but it will be awhile until it gets to this point. I'm wondering if there could be an incompatibility between the SHAREDLIB_BUILD and this test? I won't know until my testing gets to this point again. |
@jedwards4b and @jtruesdal - I did confirm that when I run with the full regression test suite, it fails to find the compiled FMS library (which appears to be there, but I guess it can't find it). When I run the single test (test_cac in the testlist_cam.xml) it runs fine. The two case directories to compare: /glade/scratch/cacraig/aux_cam_20230929122120/ERP_Ln9_Vnuopc.C96_C96_mg17.F2000climo.cheyenne_intel.cam-outfrq9s_mg3.GC.aux_cam_20230929122120 /glade/scratch/cacraig/test_cac_20230929113325/ERP_Ln9_Vnuopc.C96_C96_mg17.F2000climo.cheyenne_intel.cam-outfrq9s_mg3.GC.test_cac_20230929113325/ |
I've made a change to print the missing file and path so I should be able to see what is going on. I'll report back. |
The test which is failing with the missing FMS library is the only one which is holding back this tag. (That and awaiting review by @nusbaume). The tests with the MPI I/O error appear to have just been a transitory failure as they are now passing without any change to the source code. |
@JEdwards @cacraig I see what the problem is. There is a common sharedlibroot for the aux_cam regression testing where the fms library should be built but it is being built in the sharedrootlib directory of the ERP case. Thats why it works when executing that one case directly and not via a set of regression tests that is supposed to use a common sharedlibroot directory. Is this an effect of us using an older FMS version with its buildlib and Makefile.cesm or is it an issue with the regression test harness not building FMS in the correct directory? |
This error just arose when I updated the externals and the changes that were provided in this PR. It also is only this one test which has this "feature". |
@jtruesdal I think that the error is https://github.com/ESCOMP/CAM/pull/888/files#diff-7ce3095aab695d30cc02c4ee2a2d78290dfa34060661daade519474eb91beac0R93 - the directory where it is looking for libfms.a is wrong. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! Just a minor typo and a question related to the ChangeLog
.
… cam_update_ext_test_driver
Merge pull request ESCOMP#888 from cacraigucar/cam_update_ext_test_driver cam6_3_129: Update externals to match cesm2_3_alpha16d and bring in test_driver which supports derecho ESCOMP commit: ce55d98
Closes #879 (test_driver.sh supports derecho)
Closes #886 (fix line too long in HEMCO external)
Updates externals to match cesm2_3_alpha16d (except ccs_config is newer)
Limited review as reviews were performed in #884 before it was broken into a more limited PR