
Segmentation fault when running more than one turbine #1068

Closed
3 of 13 tasks
justhawk98 opened this issue May 21, 2024 · 11 comments
Labels
bug:amr-wind Something isn't working

Comments

@justhawk98

Bug description

I'm running into a segmentation fault when trying to simulate multiple turbines. If I run just one, everything works fine and the program runs to completion. The error occurs during initialization of the 2nd instance of OpenFAST. Is there something I'm missing in my input file or .fst files? I'm using OpenFAST 3.2.1. Could the OpenFAST version be the issue?

Steps to reproduce

I've included a .zip file which contains the AMR-Wind input file, the FAST files for the turbines used, and the terminal output which includes the segmentation fault. The simulation uses only the FreeStream and Actuator physics models so anyone should be able to run my case without any precursor ABL simulations.

Steps to reproduce the behavior:

  1. Compiler used

    • GCC
    • LLVM
    • oneapi (Intel)
    • nvcc (NVIDIA)
    • rocm (AMD)
    • with MPI
    • other:
  2. Operating system

    • Linux
    • OSX
    • Windows
    • other (do tell ;)):
  3. Hardware:

    • CPU
    • GPU
  4. Machine details:
    It's a slurm system. Spack was used for package management of both AMR-Wind and OpenFAST. Ran on the Brigham Young University supercomputer.

  5. Input file attachments and .txt file of the terminal output

error_report.zip

Expected behavior

AMR-Wind information

==============================================================================
                AMR-Wind (https://github.com/exawind/amr-wind)

  AMR-Wind version :: v0.0.1
  AMR-Wind Git SHA :: UNKNOWM
  AMReX version    :: 24.01

  Exec. time       :: Tue May 21 08:05:12 2024
  Build time       :: Mar 22 2024 11:09:34
  C++ compiler     :: GNU 11.4.0

  MPI              :: ON    (Num. ranks = 1)
  GPU              :: OFF
  OpenMP           :: ON    (Num. threads = 1)

  Enabled third-party libraries: 
    NetCDF    4.9.0
    HYPRE     2.26.0
    OpenFAST  

           This software is released under the BSD 3-clause license.           
 See https://github.com/Exawind/amr-wind/blob/development/LICENSE for details. 
------------------------------------------------------------------------------

Additional Content

I mentioned I'm using AMR-Wind and OpenFAST via spack. Here's the "spack find -lv" for AMR-Wind and OpenFAST which shows the flags used during compilation. Is there a flag I need to compile with to enable multiple turbines?

spack find -lv amr-wind
-- linux-rhel7-haswell / gcc@11.4.0 -----------------------------
jewntlv amr-wind@main~ascent~cuda+hypre~ipo~masa+mpi+netcdf+openfast+openmp~rocm+shared+tests build_system=cmake build_type=RelWithDebInfo

spack find -lv openfast
-- linux-rhel7-haswell / gcc@11.4.0 -----------------------------
2ohvfco openfast@3.2.1+cxx+dll-interface+double-precision~ipo~netcdf+openmp+pic+shared build_system=cmake build_type=RelWithDebInfo

@justhawk98 justhawk98 added the bug:amr-wind Something isn't working label May 21, 2024
@marchdf
Contributor

marchdf commented May 21, 2024

This definitely feels like it could be due to using old versions of the stack. You mentioned using spack, and I am not sure of the best way to do that here. We tend to use cmake or exawind-manager (which uses spack under the hood). I would definitely prioritize updating the stack and then seeing if the segfault remains.

From the attached files you shared, it definitely looks like the error is in openfast.

@psakievich, @jrood-nrel is there a way they can use vanilla spack here? I am a bit surprised it went with an old version of openfast. Or should they try the exawind-manager route?

@psakievich
Contributor

psakievich commented May 21, 2024

@justhawk98 what spack version are you using? I would prefer we fix this in spack if that is the issue rather than pushing more people to exawind-manager.

Using spack@develop I get this:

$ spack solve amr-wind@main ^openfast@3.2.1
==> Error: concretization failed for the following reasons:

   1. Cannot select a single "version" for package "openfast"
   2. Cannot satisfy 'openfast@3.5:'
   3. Cannot satisfy 'openfast@3.2.1'
   4. Cannot satisfy 'openfast@2.6.0:3.4.1'
   5. Cannot satisfy 'openfast@3.5:'
        required because amr-wind depends on openfast@3.5: when @2:+openfast
          required because amr-wind@main ^openfast@3.2.1 requested explicitly
   6. Cannot satisfy 'openfast@3.2.1'
        required because amr-wind@main ^openfast@3.2.1 requested explicitly
   7. Cannot satisfy 'openfast@3.5:' and 'openfast@3.2.1
        required because amr-wind depends on openfast@3.5: when @2:+openfast
          required because amr-wind@main ^openfast@3.2.1 requested explicitly
        required because amr-wind@main ^openfast@3.2.1 requested explicitly
   8. Cannot satisfy 'openfast@3.2.1' and 'openfast@3.5:
        required because amr-wind depends on openfast@3.5: when @2:+openfast
          required because amr-wind@main ^openfast@3.2.1 requested explicitly
        required because amr-wind@main ^openfast@3.2.1 requested explicitly

So I suspect this is an older version of spack before we updated all the package requirements.

@psakievich
Contributor

@justhawk98 you need to use openfast@3.5; your binaries must also be old, because amr-wind won't compile with openfast@3.2.1 anymore. Several releases have been added and updated in the latest spack. Those latest changes didn't make it into spack@0.22.0, so going to develop would be best. Many fixes for the package are in spack@0.22.0, but we know many openfast bugs were fixed as part of the 3.5 release, so that would be better to use.

@rybchuk
Contributor

rybchuk commented May 21, 2024

For what it's worth, Justin messaged me a few weeks back before AMR-Wind 2.0 and the OpenFAST 3.5 changeover, and he was having this same problem at that time

@lawrenceccheung
Contributor

Hi @justhawk98, I noticed that you're running two openfast turbines but have only 1 rank/mpi process for the entire simulation. I believe that amr-wind will let this happen (both openfast turbine instances will end up on the same processor), but it is probably not good practice.

Lawrence

@justhawk98
Author

Thanks for the help and suggestions everyone. Currently, we are using spack 0.19.2. It's looking like the best option is to recompile with an updated OpenFAST. I'll reach out to our research computing office and see what they think is the best way to do that. Also I wanted to ask, are there any flags we're missing that would be good to turn on? Or any that would be better to turn off?

@psakievich
Contributor

@justhawk98 if you are stuck on Spack 0.19.2 then you won't have the newest versions registered in spack. You can pull down the amr-wind and openfast package.py files from spack and create a custom repo in spack to use them, but you might have to do some editing.

Alternatively for openfast you could give the spec openfast@git.3.5.2=3.4.1 to have spack treat the 3.5.2 git tag like a known version in spack (in this example it was 3.4.1) when it builds. That might be sufficient.
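A minimal sketch of what that spec could look like, assuming your Spack accepts the `@git.<ref>=<version>` form suggested above and that you also want `~openmp` on both packages (see the OpenMP discussion in this thread). The helper name `make_spec` and the `+mpi+openfast` variant choice are illustrative. OpenFAST release tags appear to carry a `v` prefix (e.g. `v3.5.2`), so pass the ref that actually exists in the repository. Whether `amr-wind@main` concretizes against it still depends on the package files in your Spack.

```shell
#!/usr/bin/env bash
# Sketch only: compose a Spack spec that builds AMR-Wind against an OpenFAST
# git tag treated as a known version, with OpenMP disabled on both packages.
# make_spec is a hypothetical helper, not part of Spack.
make_spec() {
  local of_ref="${1:-3.5.2}"      # git ref (tag) to build
  local of_alias="${2:-3.4.1}"    # version Spack already knows, used for package logic
  printf 'amr-wind@main+mpi+openfast~openmp ^openfast@git.%s=%s~openmp\n' \
    "$of_ref" "$of_alias"
}

spec="$(make_spec "$@")"
echo "$spec"

# Inspect the concretized result before installing; only runs if spack is on PATH.
if command -v spack >/dev/null 2>&1; then
  # Word splitting is intended: the spec has two space-separated parts.
  # shellcheck disable=SC2086
  spack spec $spec
fi
```

Running `spack spec` first shows which OpenFAST version and variants Spack would actually pick before anything is built.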

@psakievich
Contributor

@justhawk98 in terms of flags, I don't think we typically run with openmp. I would try turning it off on both. We recently had to turn it off in openfast when it was accidentally turned on. I believe the issue was segfaults, so that could be your issue...

@marchdf
Contributor

marchdf commented May 21, 2024

And even if it is off, it could still be picking it up: OpenFAST/openfast#2229. We've asked that it be completely disabled, so (very) recent commits of openfast should be safe(r).
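One quick way to see whether a build actually picked up OpenMP is to look for an OpenMP runtime among the shared libraries a binary loads. This sketch assumes dynamic linking; `links_openmp` and the script usage line are hypothetical, and the runtime names listed are the usual ones for GCC (`libgomp`), LLVM (`libomp`) and Intel (`libiomp5`).

```shell
#!/usr/bin/env bash
# Sketch: does a dynamically linked binary pull in an OpenMP runtime?
# ldd lists the full dependency tree, so checking amr_wind also covers a
# shared OpenFAST library it loads. A statically linked runtime is not detected.
links_openmp() {
  # libgomp = GCC, libomp = LLVM, libiomp5 = Intel OpenMP runtimes.
  ldd "$1" 2>/dev/null | grep -Eq 'lib(gomp|omp|iomp5)\.so'
}

# Usage (hypothetical script name): ./check_openmp.sh /path/to/amr_wind [more files]
for f in "$@"; do
  if links_openmp "$f"; then
    echo "$f: links an OpenMP runtime"
  else
    echo "$f: no OpenMP runtime found"
  fi
done
```

If a build configured with OpenMP off still reports a runtime here, that points at the kind of leak described in the linked OpenFAST issue.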

@bscarmo

bscarmo commented May 28, 2024

I was having the same issue when running small tests on my laptop with OpenMP. I sorted it out by running with MPI. For instance:
mpirun -np 2 amr_wind <inputfile.inp>
I guess when you compile with MPI you must have at least one MPI rank per turbine?
By the way, I did not have to change anything in the code, nor recompile with a different version of OpenFAST.
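Following the guess above (and Lawrence's note about running two turbines on a single rank), a launcher could make sure there are never fewer MPI ranks than turbines. This is a sketch under stated assumptions: it assumes the turbines are listed on one line such as `Actuator.labels = T0 T1` in the AMR-Wind input file (check the key against your own input), and `count_turbines`, `run_amr_wind` and the script name are hypothetical. Whether one rank per turbine is strictly required is not established in this thread.

```shell
#!/usr/bin/env bash
# Sketch: start AMR-Wind with at least one MPI rank per turbine.
# Assumption: turbines are listed on one line such as "Actuator.labels = T0 T1".

# Print the number of turbine labels found in an AMR-Wind input file (0 if none).
count_turbines() {
  # Keep only the text after '=' on the Actuator.labels line,
  # strip any trailing '#' comment, then count the remaining words.
  sed -n 's/^[[:space:]]*Actuator\.labels[[:space:]]*=\(.*\)$/\1/p' "$1" \
    | sed 's/#.*//' \
    | wc -w | tr -d ' '
}

# Launch with the requested rank count, raised to the turbine count if needed.
run_amr_wind() {
  local input="$1" ranks="${2:-1}" nturb
  nturb="$(count_turbines "$input")"
  if [ "$ranks" -lt "$nturb" ]; then
    echo "Raising MPI ranks from $ranks to $nturb (one per turbine)" >&2
    ranks="$nturb"
  fi
  mpirun -np "$ranks" amr_wind "$input"
}

# Usage (hypothetical script name): ./run_amr_wind.sh inputfile.inp [ranks]
if [ "$#" -ge 1 ]; then
  run_amr_wind "$@"
fi
```

For the two-turbine case in this issue, `./run_amr_wind.sh inputfile.inp` would end up running the same `mpirun -np 2 amr_wind inputfile.inp` shown above.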

@justhawk98
Author

justhawk98 commented Jun 3, 2024

Hi everyone, thank you for all your help. The issue has been resolved. Here's a report on how I fixed the issue in case anyone comes across this in the future.

TL;DR : I had to recompile OpenFAST and AMR-Wind from source with OpenMP off for both and MPI on for AMR-Wind.

REPORT:
I compiled everything from source, including all dependencies such as BLAS and LAPACK. I had to get a little bit hacky and copy some files that were installed elsewhere on the BYU supercomputer, but in the end it worked. Specifically, I had to copy libgfortran.so.5 to my openfast/lib directory and libmpi.so.40 to my amrwind/lib directory. My build uses OpenFAST-v3.5.2 and AMR-Wind-v2.1.0. I should note that compiling with these two was enough to get past the two initializations of OpenFAST (which was the original issue), but the run would still crash after the initial pressure iterations, right after writing the first plot file, with the following error:

terminate called after throwing an instance of 'std::runtime_error'
what(): FastIface: Error calling OpenFAST function:
FAST_Solution:FAST_AdvanceStates:ED_ABM4:ED_AB4:ED_Input_ExtrapInterp:ED_Input_ExtrapInterp2:t(1) must not equal t(2) to avoid a division-by-zero error.

But even with this build, a single turbine run would still finish without issue.
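Before copying libraries such as libgfortran.so.5 or libmpi.so.40 by hand, it can help to ask the dynamic loader which dependencies it cannot resolve. A minimal sketch using `ldd`; the helper name `missing_libs` and the `LD_LIBRARY_PATH` directory in the comment are hypothetical. Pointing `LD_LIBRARY_PATH` at the directory that already holds a library is a common alternative to copying it, though whether that fits a given cluster's module setup is an assumption.

```shell
#!/usr/bin/env bash
# Sketch: print the shared libraries that ldd reports as "not found".
missing_libs() {
  # ldd prints lines like "libgfortran.so.5 => not found" for unresolved deps;
  # the first field is the library name.
  ldd "$1" 2>/dev/null | awk '/not found/ {print $1}'
}

# Usage (paths illustrative): list what a binary cannot resolve, e.g.
#   ./missing_libs.sh "$HOME/exawind/amr-wind/myamr7/bin/amr_wind"
# then, instead of copying files, try an extra search directory:
#   export LD_LIBRARY_PATH=/path/to/gcc/lib64:$LD_LIBRARY_PATH
if [ "$#" -ge 1 ]; then
  missing_libs "$1"
fi
```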
Anyway, at @psakievich's suggestion, I recompiled with OpenMP off and MPI on for AMR-Wind. This fixed the issue, and I am now able to run multiple turbines without the program crashing. Just in case anybody is curious about the exact cmake commands I used to build the programs, here they are:

OpenFAST compilation from openfast/build (As you can see the only additional flag was one to install to a specific directory):
cmake -DCMAKE_INSTALL_PREFIX=$HOME/exawind/openfast/my_openfast ..

AMR-Wind:
cmake .. -DCMAKE_INSTALL_PREFIX=$HOME/exawind/amr-wind/myamr7 -DAMR_WIND_ENABLE_OPENFAST=ON -DOpenFAST_DIR=$HOME/exawind/openfast/my_openfast/lib/cmake/OpenFAST -DCMAKE_EXE_LINKER_FLAGS="-static-libstdc++" -DAMR_WIND_ENABLE_MPI=ON
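The same two builds, with OpenMP turned off explicitly rather than left to each project's default, might be scripted as below. `-DOPENMP=OFF` (OpenFAST) and `-DAMR_WIND_ENABLE_OPENMP=OFF` (AMR-Wind) are the option names we believe each project uses, but check them against the `CMakeLists.txt` of the versions you build. The source directories, the `DRY_RUN` switch and the function names are illustrative; the install prefixes and remaining flags are taken from the commands above.

```shell
#!/usr/bin/env bash
# Sketch: build OpenFAST, then AMR-Wind against it, with OpenMP off in both.
# Set DRY_RUN=1 to print the commands instead of running them.
set -e

OF_SRC="${OF_SRC:-$HOME/exawind/openfast}"              # assumed source location
OF_PREFIX="${OF_PREFIX:-$HOME/exawind/openfast/my_openfast}"
AW_SRC="${AW_SRC:-$HOME/exawind/amr-wind}"              # assumed source location
AW_PREFIX="${AW_PREFIX:-$HOME/exawind/amr-wind/myamr7}"

# Run a command, or just echo it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

build_openfast() {
  run cmake -S "$OF_SRC" -B "$OF_SRC/build" \
    -DCMAKE_INSTALL_PREFIX="$OF_PREFIX" \
    -DOPENMP=OFF
  run cmake --build "$OF_SRC/build" --target install
}

build_amr_wind() {
  run cmake -S "$AW_SRC" -B "$AW_SRC/build" \
    -DCMAKE_INSTALL_PREFIX="$AW_PREFIX" \
    -DAMR_WIND_ENABLE_MPI=ON \
    -DAMR_WIND_ENABLE_OPENMP=OFF \
    -DAMR_WIND_ENABLE_OPENFAST=ON \
    -DOpenFAST_DIR="$OF_PREFIX/lib/cmake/OpenFAST" \
    -DCMAKE_EXE_LINKER_FLAGS="-static-libstdc++"
  run cmake --build "$AW_SRC/build" --target install
}

# Usage (hypothetical script name): DRY_RUN=1 ./build_exawind.sh all
if [ "${1:-}" = all ]; then
  build_openfast
  build_amr_wind
fi
```

Building OpenFAST first matters because AMR-Wind locates it through `OpenFAST_DIR` at configure time.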

Once again, thank you everybody for your help. I couldn't have fixed this without you.

p.s.
Built with cmake-v3.27.8 and GCC-v13.2.0


6 participants