input-mesh fails to compile on queenbee2 #1119

Closed
wwlwpd opened this issue May 5, 2023 · 7 comments
@wwlwpd
Collaborator

wwlwpd commented May 5, 2023

make -j 1 NETCDFPATH=/work/asgs/asgs-prod/opt NETCDF=enable NETCDF4=enable NETCDF4_COMPRESSION=enable MACHINE_NAME=queenbee compiler=intel
ifort -fpp -DINTEL -DASGSNETCDF -DASGSNETCDF -DHAVE_NETCDF4 -DNETCDF_CAN_DEFLATE -I. -I/work/asgs/asgs-prod/opt/include -L/work/asgs/asgs-prod/opt/lib -c ../../../output/logging.f90 -lnetcdff
ifort -fpp -DINTEL -DASGSNETCDF -DASGSNETCDF -DHAVE_NETCDF4 -DNETCDF_CAN_DEFLATE -I. -I/work/asgs/asgs-prod/opt/include -L/work/asgs/asgs-prod/opt/lib -c ../../../output/adcmesh.f90 -lnetcdff
ifort -fpp -DINTEL -DASGSNETCDF -DASGSNETCDF -DHAVE_NETCDF4 -DNETCDF_CAN_DEFLATE -I. -I/work/asgs/asgs-prod/opt/include -L/work/asgs/asgs-prod/opt/lib -c ../nodalattr/nodalattr.f90 -lnetcdff
ifort -fpp -DINTEL -DASGSNETCDF -DASGSNETCDF -DHAVE_NETCDF4 -DNETCDF_CAN_DEFLATE -I. -I/work/asgs/asgs-prod/opt/include -L/work/asgs/asgs-prod/opt/lib -c ../../../output/asgsio.f90 -lnetcdff
ifort -fpp -DINTEL -DASGSNETCDF -DASGSNETCDF -DHAVE_NETCDF4 -DNETCDF_CAN_DEFLATE -I. -I/work/asgs/asgs-prod/opt/include -L/work/asgs/asgs-prod/opt/lib -o boundaryFinder.x boundaryFinder.f90 adcmesh.o asgsio.o logging.o -lnetcdff
asgsio.o: In function `asgsio_mp_determinenetcdffilecharacteristics_':
asgsio.f90:(.text+0x4d2a): undefined reference to `nodalattr_mp_initnodalattributenames_'
asgsio.f90:(.text+0x4df4): undefined reference to `nodalattr_mp_nodalattributenames_'
asgsio.f90:(.text+0x51f1): undefined reference to `nodalattr_mp_nodalattributenames_'
make: *** [boundaryFinder] Error 1
command for "Step for in util/input/mesh" exited with an error code of 2, stopping. Please fix and rerun.
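The undefined symbols in the log follow Intel Fortran's <module>_mp_<procedure>_ naming for module procedures, so the module whose object file is missing from the link line can be read straight off the error text. A minimal sketch, assuming the linker output is piped in on stdin; the function name missing_modules is made up for illustration and is not part of ASGS:

```shell
# Hypothetical helper (not part of ASGS): list the Fortran modules named in
# ifort-style "undefined reference" linker errors read from stdin.
missing_modules() {
  # ifort mangles a module procedure as <module>_mp_<procedure>_,
  # so capture everything between the backtick and "_mp_".
  sed -n 's/.*undefined reference to `\([A-Za-z0-9_]*\)_mp_.*/\1/p' | sort -u
}
```

Fed the three errors above, it reports nodalattr: asgsio.o depends on the nodalattr module, yet only adcmesh.o, asgsio.o, and logging.o appear on the boundaryFinder.x link line.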
@wwlwpd wwlwpd changed the title from "wgrid2 fails to compile on queenbee2" to "input-mesh fails to compile on queenbee2" May 5, 2023
@wwlwpd
Collaborator Author

wwlwpd commented May 5, 2023

The basic troubleshooting approach is:

  • build with ./install-asgs.sh -b -x "--skip-steps input-mesh"
  • get into the shell with ./asgsh
  • start by reproducing the build error by running the failed command.
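Reproducing the failure by hand can be sketched as follows. The link line is taken from the log above (with the repeated -DASGSNETCDF dropped). Appending nodalattr.o is an assumption based on the undefined nodalattr_mp_* symbols, and is not necessarily the change made in the eventual fix. The function name relink_cmd is hypothetical. It only prints the command, so it can be inspected before being run from util/input/mesh inside ./asgsh:

```shell
# Hypothetical helper: print (do not run) the boundaryFinder.x link command
# from the log, with any extra object files appended to the object list.
relink_cmd() {
  opt=/work/asgs/asgs-prod/opt                      # NETCDFPATH used in the failing build
  objs="adcmesh.o asgsio.o logging.o${*:+ $*}"      # logged objects plus optional extras
  echo "ifort -fpp -DINTEL -DASGSNETCDF -DHAVE_NETCDF4 -DNETCDF_CAN_DEFLATE" \
       "-I. -I$opt/include -L$opt/lib" \
       "-o boundaryFinder.x boundaryFinder.f90 $objs -lnetcdff"
}

relink_cmd              # equivalent to the failing command in the log
relink_cmd nodalattr.o  # assumption: also link the object that defines module nodalattr
```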

wwlwpd added a commit that referenced this issue May 6, 2023
…linking on QB.

Issue 1119: Added explicitly the required libraries during the linking phase, seems
everything was correct except this final step. It's a mystery why this was only being
exposed on queenbee2 (LONI), but this changes fixes it.

Resolves #1119.
wwlwpd added a commit that referenced this issue May 6, 2023
…linking on QB.

Issue 1119: Added explicitly the required libraries during the linking phase, seems
everything was correct except this final step. It's a mystery why this was only being
exposed on queenbee2 (LONI), but this changes fixes it.

Resolves #1119.
Resolves #1121 (PR).
@wwlwpd wwlwpd self-assigned this May 6, 2023
wwlwpd added a commit that referenced this issue May 8, 2023
@wwlwpd wwlwpd closed this as completed in 19fef6c May 8, 2023
wwlwpd added a commit that referenced this issue May 16, 2023
…linking on QB.

Issue 1119: Added explicitly the required libraries during the linking phase, seems
everything was correct except this final step. It's a mystery why this was only being
exposed on queenbee2 (LONI), but this changes fixes it.

Resolves #1119.
Resolves #1121 (PR).

(cherry picked from commit 7b0b5d6)
Signed-off-by: wwlwpd <46434714+wwlwpd@users.noreply.github.com>
wwlwpd added a commit that referenced this issue May 25, 2023
@wwlwpd
Collaborator Author

wwlwpd commented Aug 11, 2023

I just tried to log in to hatteras, and I don't think my login works anymore. Here's how I debug everything.

Given,

prompt> ./init-asgs.sh -b -x "--list-steps"
           setup-env - Updates current environment with variables needed for subsequent steps. It only affects the environment within the asgs-brew.pl environment.
             openmpi - Downloads and builds OpenMPI on all platforms for ASGS. Note: gfortran is required, so any compiler option causes this step to be skipped.
                hdf5 - Downloads and builds the version of HDF5 that has been tested to work on all platforms for ASGS.
             netcdf4 - Downloads and builds the versions of NetCDF and NetCFD-Fortran that have been tested to work on all platforms for ASGS.
              wgrib2 - Downloads and builds wgrib2 on all platforms for ASGS. Note: gfortran is required, so any compiler option passed is overridden.
       cpra-postproc - Runs the makefile and builds associated utilities in the output/cpra_postproc directory
              output - Runs the makefile and builds associated utilities in the output/ directory.
                util - Runs the makefile and builds all associated utilities in the util/ directory.
          input-mesh - Runs the makefile and builds all associated util/input/mesh in the input-mesh/ directory.
     input-nodalattr - Runs the makefile and builds associated utilities in the util/input/nodalattr directory.
                perl - Install local Perl version used for ASGS.
        perl-modules - Install Perl modules used for ASGS.
        image-magick - Install local ImageMagick tools and Perl module Image::Magick.
             python3 - install python 3 locally and install required modules
              ffmpeg - Install ffmpeg and required libraries (nasm)
             gnuplot - Install gnuplot (commandline only)
               units - Install GNU Units utility
                 nco - Install The netCDF Operators (NCO) Toolkit
                pigz - Install pigz, unpigz - parallel gzip
              adcirc - Builds ADCIRC and SWAN if $HOME/adcirc-cg exists.

You can straight up skip breaking steps for debugging purposes. For example,

./init-asgs.sh -x "--skip-steps input-mesh"

You can skip multiple steps, via:

./init-asgs.sh -x "--skip-steps input-mesh,image-magick,..."
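When several steps have to be skipped, the comma-separated list can be assembled from separate names. A small sketch; skip_steps_arg is a made-up helper, and only the --skip-steps flag and the step names come from the listing above:

```shell
# Hypothetical helper: join step names with commas into the string
# that is passed to init-asgs.sh via -x.
skip_steps_arg() {
  steps=$(IFS=,; echo "$*")   # "$*" joins arguments with the first character of IFS
  echo "--skip-steps $steps"
}

# Print the resulting command line for inspection instead of running it.
echo ./init-asgs.sh -x "\"$(skip_steps_arg input-mesh image-magick)\""
```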

This is how I debug. Basically, I skip whatever is breaking and let the rest of the build complete; then you can run ./asgsh and manually debug the "build" commands found in ./cloud/general/asgs-brew.pl from inside the proper environment.

Also, look at the generated ./update-asgs script. It contains the full asgs-brew.pl command, so you can run it instead of ./init-asgs.sh:

./update-asgs "flags passed to asgs-brew.pl command ..."

@wwlwpd wwlwpd reopened this Aug 11, 2023
@wwlwpd
Collaborator Author

wwlwpd commented Aug 11, 2023

An extreme command to get to the shell immediately, with nothing built but the environment set up, would be:

prompt> ./init-asgs.sh -b -x "--update-shell"
...
prompt> ./asgsh

@notstarboard
Collaborator

notstarboard commented Aug 11, 2023

All the debugging tips are very useful. It turns out that the issue I was hitting when I ran with --force and the other TMPDIR was actually before the perl creation step, so I may have closed the other issue too early! At any rate, I'm now hitting a similar error message to the above even without the updated TMPDIR and the --force flag, so it is something I'll also have to deal with. Maybe --force works by uninstalling and reinstalling everything and something that it tried to reinstall failed? Unsure. Here's the error I'm hitting:

Writing wrapper ASGSH Shell command wrapper 'update-asgs' for use later...
/usr/share/Modules/software/compiled_w_intel/mvapich2/2.3.7/bin/mpif90

make -j 1 NETCDFPATH=/home/joshua_p/asgs/opt NETCDF=enable NETCDF4=enable NETCDF4_COMPRESSION=enable MACHINE_NAME=hatteras compiler=intel
ifort -cpp -shared-intel -DASGSNETCDF -DHAVE_NETCDF4 -DNETCDF_CAN_DEFLATE -I. -I../output -I. -I/home/joshua_p/asgs/opt/include -L../output -L/home/joshua_p/asgs/opt/lib -o makeMax.x makeMax.f90 ../output/adcmesh.o ../output/asgsio.o ../output/logging.o -L/home/joshua_p/asgs/opt/lib -lnetcdff
../output/asgsio.o: In function `asgsio_mp_determinenetcdffilecharacteristics_':
asgsio.f90:(.text+0x4e20): undefined reference to `nodalattr_mp_initnodalattributenames_'
asgsio.f90:(.text+0x4eea): undefined reference to `nodalattr_mp_nodalattributenames_'
asgsio.f90:(.text+0x52fc): undefined reference to `nodalattr_mp_nodalattributenames_'
make: *** [makeMax] Error 1
command for "Step for in util/" exited with an error code of 2, stopping. Please fix and rerun.

At any rate, when I run with "--skip-steps util,perl-modules" I'm able to get past both errors (the one like #766 and this one); the full install isn't done yet, but it's been chugging along for another 10+ minutes thus far. I'm going to try to debug more later from within ASGSH as you suggested.

@notstarboard
Collaborator

I feel like I should be able to figure these out, but so far it's just been a lot of unsuccessful trial and error. I'm attaching the output I'm getting from running these commands individually from within ASGSH in the hope that you may have thoughts.

For util, I submitted the following command, and received the output in the attached make_util.txt.

make -j 1 NETCDFPATH=/home/joshua_p/asgs NETCDF=enable NETCDF4=enable NETCDF4_COMPRESSION=enable MACHINE_NAME=hatteras compiler=intel

The error looks like it's having trouble finding the NetCDF module, and the "NETCDFPATH := /usr" here feels sketchy, but given the lines below it it could be just fine; I just don't know what we actually need to include.
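One thing that can be checked mechanically: the failing install passed NETCDFPATH=/home/joshua_p/asgs/opt, while the manual command above passes NETCDFPATH=/home/joshua_p/asgs. The compile lines in this thread add $NETCDFPATH/include and $NETCDFPATH/lib and link with -lnetcdff, so a prefix can be sanity-checked before running make. A rough sketch; check_netcdf_prefix is a made-up name, and the assumption that netcdf.mod sits in include/ reflects the usual NetCDF-Fortran install layout rather than anything confirmed here:

```shell
# Hypothetical check (not part of ASGS): does a prefix look like a usable NETCDFPATH?
check_netcdf_prefix() {
  prefix=$1
  # Fortran module file needed at compile time by "use netcdf"
  [ -f "$prefix/include/netcdf.mod" ] || { echo "no netcdf.mod under $prefix/include"; return 1; }
  # any libnetcdff.* (static or shared) that -lnetcdff can resolve at link time
  ls "$prefix"/lib/libnetcdff.* >/dev/null 2>&1 || { echo "no libnetcdff under $prefix/lib"; return 1; }
  echo "$prefix looks usable"
}
```

Running it against both /home/joshua_p/asgs and /home/joshua_p/asgs/opt would show which of the two actually holds the NetCDF install.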


For perl-modules, I submitted the following command, and received the output in the attached perl-modules.txt.

bash ./cloud/general/init-perl-modules.sh /home/joshua_p/asgs/perl5

Its output specifically references a build.log file, which should have more details on the failure, and I've also attached this. Per build.log, it looks like the build process errored out after the following command:

icc -Ic -I/home/joshua_p/asgs/opt/perl5/perls/perl-5.38.0/lib/5.38.0/x86_64-linux/CORE '-DVERSION="1.31"' '-DXS_VERSION="1.31"' -fPIC -c '-std=gnu99' -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -D_LARGEFILE_SOURCE '-D_FILE_OFFSET_BITS=64' '-D_FORTIFY_SOURCE=2' -O2 -o lib/Params/Validate/XS.o lib/Params/Validate/XS.c

It looks like this needs to be run from within asgs/opt/perl5/perls/perl-5.38.0/lib/5.38.0 given the lib paths near the end of the command, but when I try this from the shell I get:

icc: error #10236: File not found:  'lib/Params/Validate/XS.c'
icc: command line error: no files specified; for help type "icc -help"

And, sure enough, lib/Params only has a single file in it: Check.pm. There isn't even a Validate folder. So, I'm not sure if that should all be created during the compilation process / elsewhere in init-perl-modules.sh, or if I'm running this in the wrong place, or what.

Would appreciate any tips or answers you may have!
build.log
make_util.txt
perl-modules.txt

@wwlwpd
Collaborator Author

wwlwpd commented Aug 14, 2023

@notstarboard I am closing this issue because it was an already solved one. I know it's a little extra work, but please open this up as either another issue or in the discussions. I will converse with you there and will add my debugging steps to the wiki.

@wwlwpd wwlwpd closed this as completed Aug 14, 2023
@wwlwpd
Collaborator Author

wwlwpd commented Aug 14, 2023

Troubleshooting steps outlined above:

#1179
