NOMAD segfaults on Manjaro Linux when Julia is started with "-t n" with n>1 #39

@simonp0420

Firstly, thanks for making this great optimizer available to Julia users!

I have an expensive objective function that takes about 20 seconds to evaluate with threading enabled in Julia. When I try to optimize it with NOMAD on Manjaro Linux, starting Julia with -t2, -t3, etc. on my 8-core machine, I get the following error (-t1 works fine, though slowly):

julia> include("cpssopt.jl")
All variables are granular. MAX_EVAL is set to 1000000 to prevent algorithm from circling around best solution indefinetely
Caught seg fault in thread 0
terminate called after throwing an instance of 'NOMAD_4_0_0::Exception'
  what():  NOMAD::Exception thrown (/workspace/srcdir/nomad/src/Algos/Step.cpp, 103) Caught seg fault

signal (6): Aborted
in expression starting at /run/timeshift/backup/simonp_win/julia/packages/PSSFSS/sandbox/sjoberg_cpss/nomad1/threadtest/cpssopt.jl:97
gsignal at /usr/lib/libc.so.6 (unknown line)
abort at /usr/lib/libc.so.6 (unknown line)
__verbose_terminate_handler at /workspace/srcdir/gcc-9.1.0/libstdc++-v3/libsupc++/vterminate.cc:95
__terminate at /workspace/srcdir/gcc-9.1.0/libstdc++-v3/libsupc++/eh_terminate.cc:47
terminate at /workspace/srcdir/gcc-9.1.0/libstdc++-v3/libsupc++/eh_terminate.cc:57
__cxa_throw at /workspace/srcdir/gcc-9.1.0/libstdc++-v3/libsupc++/eh_throw.cc:95
_ZN11NOMAD_4_0_04Step13debugSegFaultEi.cold.119 at /home/simonp/.julia/artifacts/c8c50bbe7723f08c41d066d6269a774dde10aa5e/lib/libnomadAlgos.so (unknown line)
killpg at /usr/lib/libc.so.6 (unknown line)
jl_mutex_wait at /buildworker/worker/package_linux64/build/src/locks.h:37 [inlined]
jl_mutex_lock at /buildworker/worker/package_linux64/build/src/locks.h:88 [inlined]
jl_generate_fptr at /buildworker/worker/package_linux64/build/src/jitlayers.cpp:318
jl_compile_method_internal at /buildworker/worker/package_linux64/build/src/gf.c:1970
jl_compile_method_internal at /buildworker/worker/package_linux64/build/src/gf.c:2236 [inlined]
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2229 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1703 [inlined]
start_task at /buildworker/worker/package_linux64/build/src/task.c:839
unknown function (ip: (nil))
Allocations: 114487065 (Pool: 114449704; Big: 37361); GC: 46

This error does not occur on my Windows machine. Here is my configuration:

julia> versioninfo(verbose=true)
Julia Version 1.6.1
Commit 6aaedecc44 (2021-04-23 05:59 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      "Manjaro Linux"
  uname: Linux 5.10.30-1-MANJARO #1 SMP Wed Apr 14 08:07:27 UTC 2021 x86_64 unknown
  CPU: Intel(R) Core(TM) i7-9700 CPU @ 3.00GHz: 
              speed         user         nice          sys         idle          irq
       #1  4598 MHz    1142208 s        652 s    1037768 s   13116888 s      32771 s
       #2  4552 MHz    1142082 s        328 s    1033468 s   13123801 s      31878 s
       #3  4589 MHz    1265023 s        638 s     998944 s   12975603 s      66731 s
       #4  4580 MHz    1134830 s         88 s    1046800 s   13110539 s      34344 s
       #5  4588 MHz    1134600 s        893 s    1041140 s   13120661 s      32812 s
       #6  4576 MHz    1132240 s       1030 s    1036202 s   13128789 s      32403 s
       #7  4561 MHz    1133090 s        125 s    1059432 s   13062803 s      46101 s
       #8  4566 MHz    1139859 s        868 s    1050228 s   13101772 s      34894 s
       
  Memory: 62.53139114379883 GB (12211.14453125 MB free)
  Uptime: 1.537912e6 sec
  Load Avg:  0.35  0.66  0.7
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake)
Environment:
  JULIA_NUM_THREADS = 8
  DRAWHOME = /usr/share/opencascade/resources/DrawResources
  PATH = /home/simonp/.local/bin:/usr/local/bin:/usr/bin:/var/lib/snapd/snap/bin:/usr/local/sbin:/usr/lib/jvm/default/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl:/home/simonp/scuff-em-installation/bin
  HOME = /home/simonp
  TERM = xterm-256color
  WINDOWPATH = 2

I'm using NOMAD.jl v. 2.1.0.

Judging from the information my objective function writes out, the segfault appears to occur inside the objective function, presumably when the first Threads.@threads statement is encountered. But, as noted above, this error doesn't occur on my Windows machine, where I'm using 8 threads.
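
For readers trying to picture the pattern described above, here is a minimal sketch of an objective that uses Threads.@threads internally. It is a hypothetical stand-in, not the code in cpssopt.jl: the name `threaded_objective` and the toy quadratic are assumptions made purely for illustration. It does not call NOMAD, so on its own it is not expected to reproduce the crash.

```julia
using Base.Threads

# Hypothetical stand-in for the expensive objective; NOT the code from cpssopt.jl.
function threaded_objective(x::Vector{Float64})
    partial = zeros(Float64, length(x))
    # Each iteration writes only to its own slot, so no locking is needed.
    @threads for i in eachindex(x)
        partial[i] = (x[i] - 1.0)^2
    end
    return sum(partial)
end

# Report how many threads Julia was started with (e.g. via -t2), then evaluate once.
println("threads = ", nthreads())
println(threaded_objective([1.0, 2.0, 3.0]))
```

Running this file with `julia -t1` and again with `julia -t2` shows whether the threaded loop itself behaves on a given machine, independently of the optimizer.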
