Global workflow v16.3.7 will not build on WCOSS2 #1812
Changing the order of module loads in […] yields an rc=0 execution of […]. I do not know why moving the w3nco load makes a difference. Is the above behavior known and expected? @WalterKolczynski-NOAA, who is the tropcy code manager? We should assign this issue to him/her for resolution. |
I think it is @JiayiPeng-NOAA |
Also, w3nco may not even be needed anymore. |
Thanks, @WalterKolczynski-NOAA. @JiayiPeng-NOAA needs to fix the tropcy build and test it before g-w staff cut a gfs.v16.3.8 tag for NCO to pick up. |
(Email reply to Walter Kolczynski - NOAA, Mon, Aug 21, 2023, 10:58 AM.) |
@JiayiPeng-NOAA at the bottom of every email is a link that says "view it on GitHub". |
Hi Russ,
What is "tropcy"?
Thanks,
Jiayi
(Email reply to RussTreadon-NOAA, Mon, Aug 21, 2023, 11:03 AM.) |
@JiayiPeng-NOAA, @ADCollard found a problem with the operational build for tropcy. This issue (#1812) documents the problem. We had to move the w3nco load after nemsiogfs in order to get tropcy to build. Is this known behavior? |
@RussTreadon-NOAA @ADCollard […] |
@aerorahul That gets sourced in sorc/machine-setup.sh |
@WalterKolczynski-NOAA @aerorahul I can confirm that Russ's solution allows the build to proceed. If I put in a PR for this change to be added to https://github.com/NOAA-EMC/global-workflow/tree/release/gfs.v16.3.8, can we get this version tagged? This is getting time-critical, and I don't think we are going to find a definitive solution soon. We can advise NCO not to do a full rebuild, as none of the changes are related to code (two fix files, one script). |
So, I commented out the […]
…/sorc [EMC-v16.3.7|✚ 2]
11:15 $ git diff
diff --git i/modulefiles/modulefile.storm_reloc_v6.0.0.wcoss2.lua w/modulefiles/modulefile.storm_reloc_v6.0.0.wcoss2.lua
index 33cd59f0..7a03b195 100755
--- i/modulefiles/modulefile.storm_reloc_v6.0.0.wcoss2.lua
+++ w/modulefiles/modulefile.storm_reloc_v6.0.0.wcoss2.lua
@@ -12,7 +12,7 @@ load(pathJoin("libpng", os.getenv("libpng_ver")))
 load(pathJoin("zlib", os.getenv("zlib_ver")))
 load(pathJoin("bacio", os.getenv("bacio_ver")))
-load(pathJoin("w3nco", os.getenv("w3nco_ver")))
+--load(pathJoin("w3nco", os.getenv("w3nco_ver")))
 load(pathJoin("nemsio", os.getenv("nemsio_ver")))
 load(pathJoin("nemsiogfs", os.getenv("nemsiogfs_ver")))
 load(pathJoin("sigio", os.getenv("sigio_ver")))
diff --git i/sorc/build_tropcy_NEMS.sh w/sorc/build_tropcy_NEMS.sh
index 0e96cfcc..08d2cd13 100755
--- i/sorc/build_tropcy_NEMS.sh
+++ w/sorc/build_tropcy_NEMS.sh
@@ -13,7 +13,9 @@
 #
 set -eux
+set +x
 source ./machine-setup.sh > /dev/null 2>&1
+set -x
 cwd=`pwd`
 # Check final exec folder exists
@@ -21,23 +23,30 @@ if [ ! -d "../exec" ]; then
 mkdir ../exec
 fi
+set +x
 module use ${cwd}/../modulefiles
 module load modulefile.storm_reloc_v6.0.0.$target
+module list
+#module show w3nco/2.4.1
+#module show nemsio
+#module avail
+set -x
+exit
And then calling this script:
…/sorc [EMC-v16.3.7|✚ 2]
11:15 $ ./build_tropcy_NEMS.sh
+ set +x
++ pwd
+ cwd=/lfs/h2/emc/eib/noscrub/rahul.mahajan/ops/global-workflow/sorc
+ '[' '!' -d ../exec ']'
+ set +x
Currently Loaded Modules:
1) craype-x86-rome (H) 5) PrgEnv-intel/8.1.0 9) jasper/2.0.25 13) nemsio/2.5.4 17) sp/2.3.3
2) libfabric/1.11.0.0. (H) 6) craype/2.7.10 10) libpng/1.6.37 14) nemsiogfs/2.5.3 18) g2/3.4.5
3) craype-network-ofi (H) 7) intel/19.1.3.304 11) zlib/1.2.11 15) sigio/2.3.2 19) modulefile.storm_reloc_v6.0.0.wcoss2
4) envvar/1.0 8) cray-mpich/8.1.9 12) bacio/2.4.1 16) w3emc/2.9.2
Where:
H: Hidden Module
+ exit |
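The `module list` output above shows which library versions the storm_reloc modulefile actually pulled in (no w3nco entry appears, since its load was commented out). When comparing environments, it can help to pull a single `name/version` pair out of that output. A minimal sketch, assuming the Lmod column layout shown above (entries like `13) nemsio/2.5.4`); the function name `loaded_version` is hypothetical and not part of global-workflow:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the version of one module from `module list`
# output read on stdin. Assumes Lmod-style entries such as "13) nemsio/2.5.4";
# hidden-module markers like "(H)" are ignored.
loaded_version() {
  local name="$1"
  # grep -o puts each "N) name/version" entry on its own line; awk splits the
  # name from the version and prints the version only for an exact name match.
  grep -oE '[0-9]+\) [^ ]+' \
    | awk -v n="$name" '{ split($2, a, "/"); if (a[1] == n) print a[2] }'
}

# Usage (Lmod normally writes `module list` to stderr, hence 2>&1):
# module list 2>&1 | loaded_version nemsio
```

An exact match on the name matters here: a prefix match on `nemsio` would also pick up `nemsiogfs`.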
I can confirm that loading of […]. I can also confirm that removing the […] |
This is the diff that goes with the above comment:
✔ /lfs/h2/emc/eib/noscrub/rahul.mahajan/ops/global-workflow/sorc [EMC-v16.3.7|✚ 2]
11:41 $ git diff
diff --git i/modulefiles/modulefile.storm_reloc_v6.0.0.wcoss2.lua w/modulefiles/modulefile.storm_reloc_v6.0.0.wcoss2.lua
index 33cd59f0..e03d8994 100755
--- i/modulefiles/modulefile.storm_reloc_v6.0.0.wcoss2.lua
+++ w/modulefiles/modulefile.storm_reloc_v6.0.0.wcoss2.lua
@@ -14,7 +14,7 @@ load(pathJoin("zlib", os.getenv("zlib_ver")))
 load(pathJoin("bacio", os.getenv("bacio_ver")))
 load(pathJoin("w3nco", os.getenv("w3nco_ver")))
 load(pathJoin("nemsio", os.getenv("nemsio_ver")))
-load(pathJoin("nemsiogfs", os.getenv("nemsiogfs_ver")))
+--load(pathJoin("nemsiogfs", os.getenv("nemsiogfs_ver")))
 load(pathJoin("sigio", os.getenv("sigio_ver")))
 load(pathJoin("w3emc", os.getenv("w3emc_ver")))
 load(pathJoin("sp", os.getenv("sp_ver")))
diff --git i/sorc/build_tropcy_NEMS.sh w/sorc/build_tropcy_NEMS.sh
index 0e96cfcc..0091cc1f 100755
--- i/sorc/build_tropcy_NEMS.sh
+++ w/sorc/build_tropcy_NEMS.sh
@@ -13,7 +13,9 @@
 #
 set -eux
+set +x
 source ./machine-setup.sh > /dev/null 2>&1
+set -x
 cwd=`pwd`
 # Check final exec folder exists
@@ -21,8 +23,11 @@ if [ ! -d "../exec" ]; then
 mkdir ../exec
 fi
+set +x
 module use ${cwd}/../modulefiles
 module load modulefile.storm_reloc_v6.0.0.$target
+module list
+set -x
 export FC=$myFC
 export JASPER_LIB=${JASPER_LIB:-$JASPER_LIBRARIES/libjasper.a}
@@ -33,7 +38,8 @@ export LIBS_SUP="${W3EMC_LIBd} ${W3NCO_LIBd}"
 echo lset
 echo lset
 export LIBS_REL="${W3NCO_LIB4}"
-export LIBS_REL="${NEMSIOGFS_LIB} ${NEMSIO_LIB} ${LIBS_REL} ${SIGIO_LIB} ${BACIO_LIB4} ${SP_LIBd}"
+#export LIBS_REL="${NEMSIOGFS_LIB} ${NEMSIO_LIB} ${LIBS_REL} ${SIGIO_LIB} ${BACIO_LIB4} ${SP_LIBd}"
+export LIBS_REL="${NEMSIO_LIB} ${LIBS_REL} ${SIGIO_LIB} ${BACIO_LIB4} ${SP_LIBd}"
 export LIBS_SIG="${SIGIO_INC}"
 export LIBS_SYN_GET="${W3NCO_LIB4}"
 export LIBS_SYN_MAK="${W3NCO_LIB4} ${BACIO_LIB4}" |
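The diff above drops `${NEMSIOGFS_LIB}` from `LIBS_REL` by editing the link line by hand. If one script had to work both where the nemsiogfs module is loaded and where it is not, one option would be to append each library only when its variable is non-empty. This is a sketch under that assumption; the helper `append_lib` is hypothetical and is not part of build_tropcy_NEMS.sh:

```shell
#!/usr/bin/env bash
# Hypothetical helper: append each non-empty argument to the variable named
# by $1, separated by single spaces. Empty or unset library variables are
# skipped, so a module that is not loaded leaves no blank entry behind.
append_lib() {
  local var="$1"; shift
  local lib
  for lib in "$@"; do
    [ -n "$lib" ] || continue
    if [ -n "${!var}" ]; then
      printf -v "$var" '%s %s' "${!var}" "$lib"
    else
      printf -v "$var" '%s' "$lib"
    fi
  done
}

# Same order as the original link line. With static archives, a library
# should come before the libraries it depends on (nemsiogfs wraps nemsio).
LIBS_REL=""
append_lib LIBS_REL "${NEMSIOGFS_LIB:-}" "${NEMSIO_LIB:-}" "${W3NCO_LIB4:-}" \
  "${SIGIO_LIB:-}" "${BACIO_LIB4:-}" "${SP_LIBd:-}"
export LIBS_REL
```

The `${VAR:-}` forms keep the calls safe under the script's `set -u`.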
Hi Walter,
Issue #1812 is related to "storm_reloc_v6.0.0", most likely for the update of
TC-vitals.
Qingfu is the POC for issue #1812.
Thanks,
Jiayi
(Email reply to Rahul Mahajan, Mon, Aug 21, 2023, 11:42 AM.) |
@Qingfu-Liu Can you please take a look at the comments from @RussTreadon-NOAA and @aerorahul to address build issues with the tropcy programs? Specifically […] Please let us know which solution you would prefer. We need to provide a tag to NCO asap. |
The nemsio version thing isn't something new for this release, so let's not pull on that thread here. Let's just fix the w3nco blocker for this release. @ADCollard Once the release branch is finalized and tested to work, I will create a tag. |
nemsiogfs is a wrapper that calls nemsio, tailored for the I/O of the GFS NEMSIO files. It is probably the right time to clean up the storm_reloc code, and any other code, to remove dependencies on nemsio and nemsiogfs. (Sorry, Walter, for pulling on this.) |
@yangfanglin This needs to get to NCO to be implemented in 8 days. It can be fixed in develop, or even a future v16 release. Unless something is actually broken, we should leave it. |
@WalterKolczynski-NOAA OK. Once @Qingfu-Liu has replied to @aerorahul regarding which solution we should follow, I will issue the new PR. Testing will obviously have to wait until this evening because Cactus is unavailable, unless @Qingfu-Liu can suggest a standalone test we can run on Acorn, for example. |
@aerorahul I do not have much background on the computing libraries; I mainly worked on the FORTRAN code and the related
scripts. There have been no changes to the FORTRAN code or the related scripts for the last several years. I searched my recent checkouts of the
workflow (different versions) and did not find the
script build_tropcy_NEMS.sh. So I am following this thread, checking out
a new version on HERA, and will run the build to see what happens.
(Email reply to Andrew Collard, Mon, Aug 21, 2023, 1:13 PM.) |
@Qingfu-Liu there is probably no reason to test this on Hera as the build does not fail there. This is a WCOSS2 issue. |
I checked out the workflow on HERA using the following commands:
git clone -b EMC-v16.3.7 https://github.com/NOAA-EMC/global-workflow.git global_workflow_v16.3.7
cd global_workflow_v16.3.7/sorc
./checkout.sh
I was able to compile the code successfully using the command:
./build_tropcy_NEMS.sh
Six executables were created:
-rwxr-xr-x 1 Qingfu.Liu global 1093904 Aug 21 13:44 supvit
-rwxr-xr-x 1 Qingfu.Liu global 910688 Aug 21 13:44 syndat_getjtbul
-rwxr-xr-x 1 Qingfu.Liu global 977008 Aug 21 13:44 syndat_maksynrc
-rwxr-xr-x 1 Qingfu.Liu global 1868632 Aug 21 13:44 syndat_qctropcy
-rwxr-xr-x 1 Qingfu.Liu global 2666496 Aug 21 13:44 tave.x
-rwxr-xr-x 1 Qingfu.Liu global 2713728 Aug 21 13:44 vint.x
(Email reply to Qingfu Liu - NOAA Federal, Mon, Aug 21, 2023, 1:31 PM.) |
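Qingfu's Hera build produced six executables. A quick post-build check could confirm that each expected executable exists and is executable before anything is handed to NCO. A sketch only: the list of names is copied from the Hera listing in this thread, and the default directory `../exec` (relative to `sorc/`) is an assumption based on the exec-folder check in build_tropcy_NEMS.sh; the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical post-build check: report any expected tropcy executable that is
# missing or not executable, and return non-zero if any are.
# Names come from the Hera build listing; ../exec is an assumed location.
check_tropcy_execs() {
  local dir="${1:-../exec}"
  local missing=0 exe
  for exe in supvit syndat_getjtbul syndat_maksynrc syndat_qctropcy tave.x vint.x; do
    if [ ! -x "$dir/$exe" ]; then
      echo "missing or not executable: $dir/$exe" >&2
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}

# Usage, from sorc/ after ./build_tropcy_NEMS.sh:
# check_tropcy_execs ../exec && echo "all six tropcy executables present"
```

This only checks that the build produced its outputs; as the PR notes, the executables still need functional testing.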
I am not able to log in to WCOSS2 to test this now. My guess is that the failure is related to new software or libraries on WCOSS2. I do not have any idea about the change and need help from experts who know about the software changes on WCOSS2. |
Qingfu, you have devonprod access.
Are devonprod users permitted to log onto the production machine to compile code? |
I logged onto Dogwood and cloned g-w branch […] @Qingfu-Liu, would you please identify the approach to pass to NCO and confirm that the modified build does not alter gfs/gdas tropcy results? |
@RussTreadon-NOAA I am looking at the changes, but I am not sure what the differences between the two are. I feel that nemsiogfs should be loaded before w3nco. I am doing more research on this. |
I might have data on Cactus, but I can't log in now. I will try later today to see if I have files on Cactus to compare. |
@RussTreadon-NOAA I looked at the script change and the executables produced; the generated executables are the same for both cases. The NEMSIOGFS library is used for the relocation program, so it is no longer necessary here. So both changes are good. |
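One way to check the claim that both fixes generate the same executables would be to build each variant into its own directory and compare the files byte for byte with `cmp`. A minimal sketch; the two directory arguments and the function name are assumptions. A byte-level difference would not by itself prove different behavior (embedded build paths, for example, can differ), so identical gfs/gdas tropcy output from a test case remains the stronger check:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: compare same-named files in two build directories.
# Prints one line per file found in the first directory and returns non-zero
# if any file differs or is missing from the second directory.
compare_builds() {
  local a="$1" b="$2" status=0 f name
  for f in "$a"/*; do
    [ -f "$f" ] || continue
    name=$(basename "$f")
    if [ ! -f "$b/$name" ]; then
      echo "only in $a: $name"; status=1
    elif cmp -s "$f" "$b/$name"; then
      echo "identical: $name"
    else
      echo "differs: $name"; status=1
    fi
  done
  return "$status"
}

# Usage (directory names are illustrative):
# compare_builds exec_w3nco_reordered exec_no_nemsiogfs
```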
The […] If GFSv16.3.x depends on […] |
@aerorahul The easiest way to fix this is to change the library order, as @RussTreadon-NOAA suggested. |
It may be easy, but it alters the library versions of […] |
So the root of this question is why does […]
load("nemsio")
prereq("nemsio")
These lines force […]
15:26 $ ls -lrt /apps/ops/prod/libs/intel/19.1.3.304/cray-mpich/8.1.4/nemsio
total 8
drwxr-sr-x 5 ops.prod prod 2048 Oct 14 2021 2.5.2
lrwxrwxrwx 1 ops.prod prod 24 Aug 15 11:44 2.5.4 -> ../../8.1.9/nemsio/2.5.4
On Aug 15, 2023, NCO (or someone) installed a new version of […]
If we want to keep the package the same, NCO can remove the two lines from […] |
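The `ls -lrt` listing shows that `nemsio/2.5.4` under the `cray-mpich/8.1.4` tree is a symlink into the `8.1.9` tree, created on Aug 15. To audit a library tree for other version entries that resolve outside their own tree, something like the following could be used. This is a sketch that assumes `find` with `-mindepth`/`-maxdepth` and `readlink -f` (both standard on Linux); the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: list <root>/<package>/<version> entries that are
# symlinks resolving outside <root>, e.g. nemsio/2.5.4 -> ../../8.1.9/nemsio/2.5.4.
find_external_links() {
  local root target
  root=$(readlink -f "$1")
  # Depth 2 below root corresponds to <package>/<version>.
  find "$root" -mindepth 2 -maxdepth 2 -type l -print | while read -r link; do
    target=$(readlink -f "$link")
    case "$target" in
      "$root"/*) ;;                      # resolves inside the tree: ignore
      *) echo "$link -> $target" ;;      # resolves elsewhere: report it
    esac
  done
}

# Usage (path from this thread):
# find_external_links /apps/ops/prod/libs/intel/19.1.3.304/cray-mpich/8.1.4
```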
To make sure that there is no […]
✔ /lfs/h2/emc/eib/noscrub/rahul.mahajan/ops/global-workflow/sorc [EMC-v16.3.7|✚ 2]
17:45 $ grep -ir nemsio_gfs *.fd/*
✘-1 /lfs/h2/emc/eib/noscrub/rahul.mahajan/ops/global-workflow/sorc [EMC-v16.3.7|✚ 2]
The public interfaces for […]
There was no hit, confirming that nothing in the tropcy programs depends on […] |
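The check above relies on `grep`'s exit status; the `✘-1` in the prompt appears to report exit status 1, which means no match. Because `grep` also returns a status greater than 1 on errors (for example, a path that does not exist), a scripted version of the check should distinguish "no references" from "the search failed". A sketch; the function name and messages are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: search source files for a symbol, case-insensitively
# (Fortran is case-insensitive), and interpret grep's exit status:
# 0 = match found, 1 = no match, >1 = grep error.
check_no_refs() {
  local pattern="$1"; shift
  local rc=0
  grep -ril -- "$pattern" "$@" || rc=$?
  case "$rc" in
    0) echo "references to '$pattern' found (files listed above)"; return 1 ;;
    1) echo "no references to '$pattern'"; return 0 ;;
    *) echo "grep failed with status $rc" >&2; return 2 ;;
  esac
}

# Usage, from sorc/ as in the thread:
# check_no_refs nemsio_gfs *.fd/*
```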
Here is my suggestion:
|
For some reason, sorc/build_tropcy_NEMS.sh no longer works on WCOSS2 machines. @RussTreadon-NOAA and @aerorahul found fixes that allow the build to complete. This PR implements one of these fixes: removing the references to the NEMSIOGFS module, which is being loaded but does not have a version number attached. Six executables result from this build, as expected. They should still be tested for functionality. This references #1812.
The current operational version of the GFS workflow is not building correctly.
Results in: […]
The modulefile.storm_reloc_v6.0.0.wcoss2 module will load in a clean login environment, just not in the ./build_tropcy_NEMS.sh script or when ./machine-setup.sh is run first.
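Since the modulefile loads in a clean login environment but not after `./machine-setup.sh`, one way to see what differs is to save `module list` from each environment and compare the `name/version` entries. A sketch, assuming both listings were saved to files (Lmod normally prints `module list` to stderr, so capture it with `2>&1`); the function and file names are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: compare two saved `module list` outputs.
# "< name/version" = only in the first file, "> name/version" = only in the second.
_module_entries() {
  # Extract "name/version" from Lmod entries such as "13) nemsio/2.5.4".
  grep -oE '[0-9]+\) [^ ]+' "$1" | cut -d' ' -f2 | LC_ALL=C sort -u
}

module_list_diff() {
  # comm needs both inputs sorted in the same collation; -3 drops common lines.
  # comm prefixes second-file-only lines with a tab, which awk turns into ">".
  LC_ALL=C comm -3 <(_module_entries "$1") <(_module_entries "$2") \
    | awk -F'\t' '{ if ($1 == "") print "> " $2; else print "< " $1 }'
}

# Usage (file names are illustrative):
# module list 2> clean.txt                 # in a fresh login shell
# source ./machine-setup.sh; module list 2> after_setup.txt
# module_list_diff clean.txt after_setup.txt
```

Version swaps, such as nemsio moving between releases, show up as a `<`/`>` pair with the same module name.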