Sporadic julia error #12

Closed
JPDarby opened this issue Mar 2, 2023 · 38 comments

Comments

@JPDarby

JPDarby commented Mar 2, 2023

log.txt

Use square brackets [] for indexing an Array.' occurred while calling julia code:
using ACE1x

            elements = basis_info["elements"]
            cor_order = basis_info["cor_order"]
            maxdeg = basis_info["maxdeg"]
            r_cut = basis_info["r_cut"]
            smoothness_prior_param = basis_info["smoothness_prior"]
            
            B = ACE1x.ace_basis(elements = Symbol.(elements), 
                        order = cor_order, 
                        totaldegree = maxdeg, 
                        rcut = r_cut)

            B_length = length(B)
            if isnothing(smoothness_prior_param)
                P_diag = nothing
            elseif smoothness_prior_param[1] isa String && smoothness_prior_param[2] isa Number && lowercase(smoothness_prior_param[1]) == "algebraic"
                P_diag = diag(smoothness_prior(B; p = smoothness_prior_param[2]))
            else
                throw(ArgumentError("Unknown smoothness_prior"))
            end

I have seen this error twice now and have attached the full log. It doesn't seem specific to the basis chosen and when I restarted HAL from the same configurations I couldn't reproduce it.

@casv2
Collaborator

casv2 commented Mar 2, 2023

I'm worried this bug is buried deep inside Julia. Aren't we evaluating the same code snippet with the same inputs dozens of times before it fails?

@bernstei
Collaborator

bernstei commented Mar 2, 2023

Sadly, I haven't been able to make this code deterministic no matter what I do (at least with BayesianRidge, which appears to be non-deterministic because of the SVD), so the fact that it isn't reproducible isn't surprising.

[edited - Cas pointed out where the attachment is]

@bernstei
Collaborator

bernstei commented Mar 2, 2023

I saw a bug like this before when I accidentally redefined one of the functions as a variable that contained a vector (namely smoothness_prior, before I renamed the variable to smoothness_prior_param). However, I'm looking at everything that uses "calling" notation, i.e. symbol(..., and I don't see any symbols that have plausibly been redefined, and especially not after such a large number of iterations. Can we ask one of the julia experts whether it's possible to extract the julia line number on which this error is occurring?

@JPDarby how often is this happening?
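The manual check described above (look at every name used in calling notation and ask whether the same name is also assigned) can be roughly automated. This is only a minimal Python sketch, assuming a regex heuristic is good enough for a short snippet like the one in this issue; `shadowed_calls` and both patterns are hypothetical helpers, not part of ACEHAL or pyjulia, and they do not parse Julia properly.

```python
import re

# Heuristic patterns (assumptions, not a real Julia parser):
# a name at the start of a line followed by a single "=" counts as assigned,
# a name directly followed by "(" or ".(" counts as called.
ASSIGNED = re.compile(r"^\s*(?:global\s+)?([A-Za-z_]\w*)\s*=(?![=>])", re.MULTILINE)
CALLED = re.compile(r"\b([A-Za-z_]\w*)\s*\.?\(")


def shadowed_calls(julia_source):
    """Return the names that are both assigned and called in julia_source."""
    assigned = set(ASSIGNED.findall(julia_source))
    called = set(CALLED.findall(julia_source))
    return sorted(assigned & called)


# The earlier naming mistake: the parameter shares its name with the function.
buggy = (
    'smoothness_prior = basis_info["smoothness_prior"]\n'
    "P_diag = diag(smoothness_prior(B; p = smoothness_prior[2]))\n"
)
# The renamed variable no longer collides with the function.
fixed = (
    'smoothness_prior_param = basis_info["smoothness_prior"]\n'
    "P_diag = diag(smoothness_prior(B; p = smoothness_prior_param[2]))\n"
)

print(shadowed_calls(buggy))  # → ['smoothness_prior']
print(shadowed_calls(fixed))  # → []
```

A clean result only rules out this one kind of collision within the snippet itself; it says nothing about names redefined elsewhere in the Julia session.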

@casv2
Collaborator

casv2 commented Mar 2, 2023

It happened on two separate HAL runs, both around 40-50 HAL iterations in. Annoyingly, restarting from the same database and selected basis params did not raise the bug.

@JPDarby
Author

JPDarby commented Mar 2, 2023

Yeah exactly as Cas said

@casv2
Collaborator

casv2 commented Mar 2, 2023

@cortner Do you have any thoughts on this issue? We evaluate this exact code snippet 40-50 times, and then the next iteration leads to this error?

@bernstei
Collaborator

bernstei commented Mar 2, 2023

2 out of how many? If I run it 5 times, can I expect it to happen once? 10 times? 100 times?

@JPDarby
Author

JPDarby commented Mar 2, 2023

I did 2 runs, both 40-50ish iterations and they both ended with this error. I've restarted one of them and will see if it happens a 3rd time...

@cortner
Member

cortner commented Mar 3, 2023

The only explanation I can think of is what Noam said above: that some function we are trying to call has been overwritten by a variable that is an array.

@cortner
Member

cortner commented Mar 3, 2023

Unfortunately the LOG.txt doesn't give the Julia stack trace so I don't have a way of tracking down where the exception was thrown. Is it possible to reproduce this in pure Julia? I'm afraid I don't have the time and energy to start digging into how it is called from Python and how that might affect the results ...

@bernstei
Collaborator

bernstei commented Mar 3, 2023

I have a hard time imagining how we can reproduce this in pure julia, since it's deep into a long run. @cortner do you know where that log message is generated? julip?

@cortner
Member

cortner commented Mar 3, 2023

definitely not julip. First time I've seen such a message.

@bernstei
Collaborator

bernstei commented Mar 3, 2023

I guess we can tell from the python stack trace that it's just python's julia module. I'll try to see if I can find a way to add more details. I may follow up here with questions about julia's exception objects, but probably I'll be able to find the docs.

@bernstei
Collaborator

bernstei commented Mar 3, 2023

I have a simpler idea for debugging, at least for now. @JPDarby if it's at all reproducible, I'll send you the patch so you can test it and we can get more info about what's happening.

@bernstei
Collaborator

bernstei commented Mar 4, 2023

Yes - I figured out how to extract the julia line number where the error happens by catching the exception inside the julia code block. It just requires a patch to bases/default.py. If this issue is reproducible (even if not deterministic), I'll create a branch where we can apply it. @JPDarby let me know.

@bernstei
Collaborator

bernstei commented Mar 6, 2023

See also JuliaPy/pyjulia#525

@wcwitt wcwitt mentioned this issue Mar 7, 2023
@bernstei
Collaborator

Basically, you just need to add try as the 1st line of the julia source for the basis, and then end it with

catch e
    throw(error(string(e) * " in julia code location " * string(stacktrace(catch_backtrace()))))
end

The julia line number will be reported as part of the python exception message. Keep in mind that the line numbers are relative to the source code including the added "try" line, that they also depend on where the julia source starts relative to the opening python """ (which can be confusing as well), and that julia line numbers are probably 1-based, not 0-based.
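The wrapping described above could be done on the Python side before the source string is evaluated. A minimal sketch, assuming the Julia source is held in a Python string: `wrap_in_try` and `original_line` are hypothetical helpers, not the actual patch to bases/default.py, and the catch block is copied verbatim from the snippet above.

```python
# Catch block copied from the snippet above; it appends the Julia stack trace
# to the message that ends up in the Python exception.
CATCH_BLOCK = (
    "catch e\n"
    '    throw(error(string(e) * " in julia code location " * '
    "string(stacktrace(catch_backtrace()))))\n"
    "end\n"
)


def wrap_in_try(julia_source):
    """Put `try` on a new first line and the catch block after the source."""
    return "try\n" + julia_source.rstrip("\n") + "\n" + CATCH_BLOCK


def original_line(reported_line):
    """Map a 1-based line number in the wrapped source back to the original.

    The added `try` occupies line 1, so every original line shifts down by one.
    """
    return reported_line - 1


source = "a = 1\nb = f(a)\n"
wrapped = wrap_in_try(source)
print(wrapped)
```

Because only one line is added in front, a location reported as `none:N` in the wrapped source corresponds to line `N - 1` of the original string, counting from the first character after the opening `"""`. Assignments made inside the `try` block may also need `global` to remain visible afterwards.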

@casv2
Collaborator

casv2 commented Mar 10, 2023

Thank you, this is the entire stacktrace (I had to add some globals to get outside the try/catch scope)

Internal error: encountered unexpected error in runtime:
UndefRefError()
getindex at ./array.jl:924 [inlined]
copy_exprargs at ./expr.jl:64
copy at ./expr.jl:37
copy_exprs at ./expr.jl:42
copy_exprargs at ./expr.jl:64
inflate_ir at ./compiler/ssair/legacy.jl:14
inflate_ir at ./compiler/ssair/legacy.jl:10
InliningTodo at ./compiler/ssair/inlining.jl:873 [inlined]
resolve_todo at ./compiler/ssair/inlining.jl:804
analyze_method! at ./compiler/ssair/inlining.jl:861
handle_match! at ./compiler/ssair/inlining.jl:1293
analyze_single_call! at ./compiler/ssair/inlining.jl:1210
assemble_inline_todo! at ./compiler/ssair/inlining.jl:1425
ssa_inlining_pass! at ./compiler/ssair/inlining.jl:82
jfptr_ssa_inlining_passNOT._13086.clone_1 at /home/casv2/julia-1.8.5/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/gf.c:2377 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/gf.c:2559
run_passes at ./compiler/optimize.jl:539
optimize at ./compiler/optimize.jl:504 [inlined]
_typeinf at ./compiler/typeinfer.jl:257
typeinf at ./compiler/typeinfer.jl:213
typeinf_edge at ./compiler/typeinfer.jl:877
abstract_call_method at ./compiler/abstractinterpretation.jl:647
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:139
abstract_call_known at ./compiler/abstractinterpretation.jl:1716
abstract_call at ./compiler/abstractinterpretation.jl:1786
abstract_call at ./compiler/abstractinterpretation.jl:1753
abstract_eval_statement at ./compiler/abstractinterpretation.jl:1910
typeinf_local at ./compiler/abstractinterpretation.jl:2386
typeinf_nocycle at ./compiler/abstractinterpretation.jl:2482
_typeinf at ./compiler/typeinfer.jl:230
typeinf at ./compiler/typeinfer.jl:213
typeinf_edge at ./compiler/typeinfer.jl:877
abstract_call_method at ./compiler/abstractinterpretation.jl:647
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:139
abstract_call_known at ./compiler/abstractinterpretation.jl:1716
abstract_call at ./compiler/abstractinterpretation.jl:1786
abstract_call at ./compiler/abstractinterpretation.jl:1753
abstract_eval_statement at ./compiler/abstractinterpretation.jl:1910
typeinf_local at ./compiler/abstractinterpretation.jl:2360
typeinf_nocycle at ./compiler/abstractinterpretation.jl:2482
_typeinf at ./compiler/typeinfer.jl:230
typeinf at ./compiler/typeinfer.jl:213
typeinf_ext at ./compiler/typeinfer.jl:967
typeinf_ext_toplevel at ./compiler/typeinfer.jl:1000
typeinf_ext_toplevel at ./compiler/typeinfer.jl:996
jfptr_typeinf_ext_toplevel_17539.clone_1 at /home/casv2/julia-1.8.5/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/gf.c:2377 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/gf.c:2559
jl_apply at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/julia.h:1843 [inlined]
jl_type_infer at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/gf.c:315
jl_generate_fptr_impl at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/jitlayers.cpp:319
jl_compile_method_internal at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/gf.c:2091 [inlined]
jl_compile_method_internal at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/gf.c:2035
_jl_invoke at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/gf.c:2369 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/gf.c:2559
#AnalyticTransform#6 at /home/casv2/.julia/packages/ACE1/G18CB/src/polynomials/transforms.jl:277
AnalyticTransform at /home/casv2/.julia/packages/ACE1/G18CB/src/polynomials/transforms.jl:266 [inlined]
#agnesi_transform#5 at /home/casv2/.julia/packages/ACE1/G18CB/src/polynomials/transforms.jl:225 [inlined]
agnesi_transform at /home/casv2/.julia/packages/ACE1/G18CB/src/polynomials/transforms.jl:211
#10 at ./array.jl:0 [inlined]
iterate at ./generator.jl:47 [inlined]
collect_to! at ./array.jl:845
collect_to_with_first! at ./array.jl:823 [inlined]
collect at ./array.jl:797
unknown function (ip: 0x1505b7c82984)
_jl_invoke at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/gf.c:2377 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/gf.c:2559
#_transform#9 at /home/casv2/.julia/packages/ACE1x/2WWoB/src/defaults.jl:168
_transform##kw at /home/casv2/.julia/packages/ACE1x/2WWoB/src/defaults.jl:159 [inlined]
_pair_basis at /home/casv2/.julia/packages/ACE1x/2WWoB/src/defaults.jl:235
unknown function (ip: 0x1505b7d05654)
_jl_invoke at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/gf.c:2377 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/gf.c:2559
#ace_basis#23 at /home/casv2/.julia/packages/ACE1x/2WWoB/src/defaults.jl:291
ace_basis##kw at /home/casv2/.julia/packages/ACE1x/2WWoB/src/defaults.jl:288
unknown function (ip: 0x1505b7c7c634)
_jl_invoke at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/gf.c:2377 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/gf.c:2559
jl_apply at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/julia.h:1843 [inlined]
do_call at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/interpreter.c:126
eval_value at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/interpreter.c:215
eval_stmt_value at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/interpreter.c:166 [inlined]
eval_body at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/interpreter.c:612
eval_body at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/interpreter.c:522
jl_interpret_toplevel_thunk at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/interpreter.c:750
top-level scope at none:10
jl_toplevel_eval_flex at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/toplevel.c:906
jl_toplevel_eval_flex at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/toplevel.c:850
ijl_toplevel_eval_in at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/toplevel.c:965
ijl_eval_string at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/jlapi.c:115
ffi_call_unix64 at /home/casv2/miniconda3/lib/python3.9/lib-dynload/../../libffi.so.7 (unknown line)
ffi_call_int at /home/casv2/miniconda3/lib/python3.9/lib-dynload/../../libffi.so.7 (unknown line)
_call_function_pointer at /usr/local/src/conda/python-3.9.5/Modules/_ctypes/callproc.c:920 [inlined]
_ctypes_callproc at /usr/local/src/conda/python-3.9.5/Modules/_ctypes/callproc.c:1263
PyCFuncPtr_call at /usr/local/src/conda/python-3.9.5/Modules/_ctypes/_ctypes.c:4201
_PyObject_MakeTpCall at python (unknown line)
_PyEval_EvalFrameDefault at python (unknown line)
_PyFunction_Vectorcall at python (unknown line)
unknown function (ip: 0x55f3839081a3)
unknown function (ip: 0x55f3839ba2e3)
unknown function (ip: 0x55f3839081c9)
unknown function (ip: 0x55f38398fb31)
_PyFunction_Vectorcall at python (unknown line)
_PyObject_Call at python (unknown line)
_PyEval_EvalFrameDefault at python (unknown line)
unknown function (ip: 0x55f38398fd2a)
_PyFunction_Vectorcall at python (unknown line)
_PyObject_Call at python (unknown line)
_PyEval_EvalFrameDefault at python (unknown line)
unknown function (ip: 0x55f38398fd2a)
_PyFunction_Vectorcall at python (unknown line)
unknown function (ip: 0x55f3839083bd)
_PyFunction_Vectorcall at python (unknown line)
unknown function (ip: 0x55f3839083bd)
unknown function (ip: 0x55f38398fb31)
_PyFunction_Vectorcall at python (unknown line)
unknown function (ip: 0x55f383907eff)
unknown function (ip: 0x55f38398fb31)
_PyFunction_Vectorcall at python (unknown line)
unknown function (ip: 0x55f383907eff)
unknown function (ip: 0x55f3839bb3fc)
unknown function (ip: 0x55f383907eff)
unknown function (ip: 0x55f38398fd2a)
_PyFunction_Vectorcall at python (unknown line)
_PyObject_Call at python (unknown line)
_PyEval_EvalFrameDefault at python (unknown line)
_PyFunction_Vectorcall at python (unknown line)
unknown function (ip: 0x55f3839083bd)
unknown function (ip: 0x55f38398fd2a)
_PyFunction_Vectorcall at python (unknown line)
unknown function (ip: 0x55f383907eff)
unknown function (ip: 0x55f38398fb31)
PyEval_EvalCodeEx at python (unknown line)
PyEval_EvalCode at python (unknown line)
unknown function (ip: 0x55f383a3fe8a)
unknown function (ip: 0x55f383a70214)
unknown function (ip: 0x55f38391b676)
PyRun_SimpleFileExFlags at python (unknown line)
Py_RunMain at python (unknown line)
Py_BytesMain at python (unknown line)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x55f3839fda63)
Internal error: encountered unexpected error in runtime:
UndefRefError()
getindex at ./array.jl:924 [inlined]
copy_exprargs at ./expr.jl:64
copy at ./expr.jl:37
copy_exprs at ./expr.jl:42
copy_exprargs at ./expr.jl:64
inflate_ir at ./compiler/ssair/legacy.jl:14
inflate_ir at ./compiler/ssair/legacy.jl:10
InliningTodo at ./compiler/ssair/inlining.jl:873 [inlined]
resolve_todo at ./compiler/ssair/inlining.jl:804
analyze_method! at ./compiler/ssair/inlining.jl:861
handle_match! at ./compiler/ssair/inlining.jl:1293
analyze_single_call! at ./compiler/ssair/inlining.jl:1210
assemble_inline_todo! at ./compiler/ssair/inlining.jl:1425
ssa_inlining_pass! at ./compiler/ssair/inlining.jl:82
jfptr_ssa_inlining_passNOT._13086.clone_1 at /home/casv2/julia-1.8.5/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/gf.c:2377 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/gf.c:2559
run_passes at ./compiler/optimize.jl:539
optimize at ./compiler/optimize.jl:504 [inlined]
_typeinf at ./compiler/typeinfer.jl:257
typeinf at ./compiler/typeinfer.jl:213
typeinf_edge at ./compiler/typeinfer.jl:877
abstract_call_method at ./compiler/abstractinterpretation.jl:647
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:139
abstract_call_known at ./compiler/abstractinterpretation.jl:1716
abstract_call at ./compiler/abstractinterpretation.jl:1786
abstract_call at ./compiler/abstractinterpretation.jl:1753
abstract_eval_statement at ./compiler/abstractinterpretation.jl:1910
typeinf_local at ./compiler/abstractinterpretation.jl:2386
typeinf_nocycle at ./compiler/abstractinterpretation.jl:2482
_typeinf at ./compiler/typeinfer.jl:230
typeinf at ./compiler/typeinfer.jl:213
typeinf_edge at ./compiler/typeinfer.jl:877
abstract_call_method at ./compiler/abstractinterpretation.jl:647
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:139
abstract_call_known at ./compiler/abstractinterpretation.jl:1716
abstract_call at ./compiler/abstractinterpretation.jl:1786
abstract_call at ./compiler/abstractinterpretation.jl:1753
abstract_eval_statement at ./compiler/abstractinterpretation.jl:1910
typeinf_local at ./compiler/abstractinterpretation.jl:2386
typeinf_nocycle at ./compiler/abstractinterpretation.jl:2482
_typeinf at ./compiler/typeinfer.jl:230
typeinf at ./compiler/typeinfer.jl:213
typeinf_edge at ./compiler/typeinfer.jl:877
abstract_call_method at ./compiler/abstractinterpretation.jl:647
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:139
abstract_call_known at ./compiler/abstractinterpretation.jl:1716
abstract_call at ./compiler/abstractinterpretation.jl:1786
abstract_call at ./compiler/abstractinterpretation.jl:1753
abstract_eval_statement at ./compiler/abstractinterpretation.jl:1910
typeinf_local at ./compiler/abstractinterpretation.jl:2386
typeinf_nocycle at ./compiler/abstractinterpretation.jl:2482
_typeinf at ./compiler/typeinfer.jl:230
typeinf at ./compiler/typeinfer.jl:213
typeinf_edge at ./compiler/typeinfer.jl:877
abstract_call_method at ./compiler/abstractinterpretation.jl:647
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:139
abstract_call_known at ./compiler/abstractinterpretation.jl:1716
abstract_call at ./compiler/abstractinterpretation.jl:1786
abstract_call at ./compiler/abstractinterpretation.jl:1753
abstract_eval_statement at ./compiler/abstractinterpretation.jl:1910
typeinf_local at ./compiler/abstractinterpretation.jl:2386
typeinf_nocycle at ./compiler/abstractinterpretation.jl:2482
_typeinf at ./compiler/typeinfer.jl:230
typeinf at ./compiler/typeinfer.jl:213
typeinf_edge at ./compiler/typeinfer.jl:877
abstract_call_method at ./compiler/abstractinterpretation.jl:647
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:139
abstract_call_known at ./compiler/abstractinterpretation.jl:1716
abstract_call at ./compiler/abstractinterpretation.jl:1786
abstract_call at ./compiler/abstractinterpretation.jl:1753
abstract_eval_statement at ./compiler/abstractinterpretation.jl:1910
typeinf_local at ./compiler/abstractinterpretation.jl:2386
typeinf_nocycle at ./compiler/abstractinterpretation.jl:2482
_typeinf at ./compiler/typeinfer.jl:230
typeinf at ./compiler/typeinfer.jl:213
typeinf_edge at ./compiler/typeinfer.jl:877
abstract_call_method at ./compiler/abstractinterpretation.jl:647
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:139
abstract_call_known at ./compiler/abstractinterpretation.jl:1716
abstract_call at ./compiler/abstractinterpretation.jl:1786
abstract_call at ./compiler/abstractinterpretation.jl:1753
abstract_eval_statement at ./compiler/abstractinterpretation.jl:1910
typeinf_local at ./compiler/abstractinterpretation.jl:2386
typeinf_nocycle at ./compiler/abstractinterpretation.jl:2482
_typeinf at ./compiler/typeinfer.jl:230
typeinf at ./compiler/typeinfer.jl:213
typeinf_ext at ./compiler/typeinfer.jl:967
typeinf_ext_toplevel at ./compiler/typeinfer.jl:1000
typeinf_ext_toplevel at ./compiler/typeinfer.jl:996
jfptr_typeinf_ext_toplevel_17539.clone_1 at /home/casv2/julia-1.8.5/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/gf.c:2377 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/gf.c:2559
jl_apply at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/julia.h:1843 [inlined]
jl_type_infer at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/gf.c:315
jl_generate_fptr_impl at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/jitlayers.cpp:319
jl_compile_method_internal at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/gf.c:2091 [inlined]
jl_compile_method_internal at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/gf.c:2035
_jl_invoke at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/gf.c:2369 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/gf.c:2559
_show_default at ./show.jl:413
show_default at ./show.jl:396 [inlined]
show at ./show.jl:391 [inlined]
print at ./strings/io.jl:35
print_to_string at ./strings/io.jl:144
string at ./strings/io.jl:185
unknown function (ip: 0x1504e844f504)
_jl_invoke at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/gf.c:2377 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/gf.c:2559
jl_apply at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/julia.h:1843 [inlined]
do_call at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/interpreter.c:126
eval_value at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/interpreter.c:215
eval_stmt_value at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/interpreter.c:166 [inlined]
eval_body at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/interpreter.c:612
jl_interpret_toplevel_thunk at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/interpreter.c:750
top-level scope at none:24
jl_toplevel_eval_flex at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/toplevel.c:906
jl_toplevel_eval_flex at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/toplevel.c:850
ijl_toplevel_eval_in at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/toplevel.c:965
ijl_eval_string at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/jlapi.c:115
ffi_call_unix64 at /home/casv2/miniconda3/lib/python3.9/lib-dynload/../../libffi.so.7 (unknown line)
ffi_call_int at /home/casv2/miniconda3/lib/python3.9/lib-dynload/../../libffi.so.7 (unknown line)
_call_function_pointer at /usr/local/src/conda/python-3.9.5/Modules/_ctypes/callproc.c:920 [inlined]
_ctypes_callproc at /usr/local/src/conda/python-3.9.5/Modules/_ctypes/callproc.c:1263
PyCFuncPtr_call at /usr/local/src/conda/python-3.9.5/Modules/_ctypes/_ctypes.c:4201
_PyObject_MakeTpCall at python (unknown line)
_PyEval_EvalFrameDefault at python (unknown line)
_PyFunction_Vectorcall at python (unknown line)
unknown function (ip: 0x55f3839081a3)
unknown function (ip: 0x55f3839ba2e3)
unknown function (ip: 0x55f3839081c9)
unknown function (ip: 0x55f38398fb31)
_PyFunction_Vectorcall at python (unknown line)
_PyObject_Call at python (unknown line)
_PyEval_EvalFrameDefault at python (unknown line)
unknown function (ip: 0x55f38398fd2a)
_PyFunction_Vectorcall at python (unknown line)
_PyObject_Call at python (unknown line)
_PyEval_EvalFrameDefault at python (unknown line)
unknown function (ip: 0x55f38398fd2a)
_PyFunction_Vectorcall at python (unknown line)
unknown function (ip: 0x55f3839083bd)
_PyFunction_Vectorcall at python (unknown line)
unknown function (ip: 0x55f3839083bd)
unknown function (ip: 0x55f38398fb31)
_PyFunction_Vectorcall at python (unknown line)
unknown function (ip: 0x55f383907eff)
unknown function (ip: 0x55f38398fb31)
_PyFunction_Vectorcall at python (unknown line)
unknown function (ip: 0x55f383907eff)
unknown function (ip: 0x55f3839bb3fc)
unknown function (ip: 0x55f383907eff)
unknown function (ip: 0x55f38398fd2a)
_PyFunction_Vectorcall at python (unknown line)
_PyObject_Call at python (unknown line)
_PyEval_EvalFrameDefault at python (unknown line)
_PyFunction_Vectorcall at python (unknown line)
unknown function (ip: 0x55f3839083bd)
unknown function (ip: 0x55f38398fd2a)
_PyFunction_Vectorcall at python (unknown line)
unknown function (ip: 0x55f383907eff)
unknown function (ip: 0x55f38398fb31)
PyEval_EvalCodeEx at python (unknown line)
PyEval_EvalCode at python (unknown line)
unknown function (ip: 0x55f383a3fe8a)
unknown function (ip: 0x55f383a70214)
unknown function (ip: 0x55f38391b676)
PyRun_SimpleFileExFlags at python (unknown line)
Py_RunMain at python (unknown line)
Py_BytesMain at python (unknown line)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x55f3839fda63)
[W 2023-03-10 13:48:39,082] Trial 12 failed because of the following error: JuliaError('Exception \'MethodError(Any[MethodInstance for Core.check_top_bit(::Type{UInt64}, ::Int64)], (Base.Meta.var"#65345#65346"(), Base.RefValue{Any}), 0x0000000000016516) in julia code location Base.StackTraces.StackFrame[Base.RefValue{Base.Meta.var"#65345#65346"}(x::Function) at refvalue.jl:8, convert(#unused#::Type{Ref{Base.Meta.var"#65345#65346"}}, x::Function) at refpointer.jl:104, cconvert(T::Type, x::Function) at essentials.jl:412, FunctionWrappers.FunctionWrapper{Ret, Args}(obj::objT) where {Ret, Args, objT} at FunctionWrappers.jl:106, ACE1.Transforms.AnalyticTransform(str_f::String, str_finv::String; T::Type) at transforms.jl:277, AnalyticTransform at transforms.jl:266 [inlined], #agnesi_transform#5 at transforms.jl:225 [inlined], agnesi_transform(r0::Float64, p::Int64, q::Int64) at transforms.jl:211, #10 at array.jl:0 [inlined], iterate at generator.jl:47 [inlined], collect_to!(dest::Matrix{Pair{Tuple{Symbol, Symbol}, ACE1.Transforms.AnalyticTransform{Float64}}}, itr::Base.Generator{Base.Iterators.ProductIterator{Tuple{Vector{Symbol}, Vector{Symbol}}}, ACE1x.var"#10#11"{Dict{Tuple{Symbol, Symbol}, Float64}, Int64, Int64}}, offs::Int64, st::Tuple{Tuple{Symbol, Int64}, Tuple{Symbol, Int64}}) at array.jl:845, collect_to_with_first! 
at array.jl:823 [inlined], collect(itr::Base.Generator{Base.Iterators.ProductIterator{Tuple{Vector{Symbol}, Vector{Symbol}}}, ACE1x.var"#10#11"{Dict{Tuple{Symbol, Symbol}, Float64}, Int64, Int64}}) at array.jl:797, _transform(kwargs::NamedTuple{(:wL, :rbasis, :Eref, :order, :elements, :delete2b, :pair_transform, :rcut, :totaldegree, :pair_degree, :transform, :r0, :pure2b, :pair_rcut, :pair_basis, :pair_envelope, :envelope), Tuple{Float64, Symbol, Missing, Int64, Vector{Symbol}, Bool, Tuple{Symbol, Int64, Int64}, Float64, Int64, Symbol, Tuple{Symbol, Int64, Int64}, Symbol, Bool, Float64, Symbol, Tuple{Symbol, Int64}, Tuple{Symbol, Int64, Int64}}}; transform::Tuple{Symbol, Int64, Int64}) at defaults.jl:168, _transform at defaults.jl:159 [inlined], _pair_basis(kwargs::NamedTuple{(:wL, :rbasis, :Eref, :order, :elements, :delete2b, :pair_transform, :rcut, :totaldegree, :pair_degree, :transform, :r0, :pure2b, :pair_rcut, :pair_basis, :pair_envelope, :envelope), Tuple{Float64, Symbol, Missing, Int64, Vector{Symbol}, Bool, Tuple{Symbol, Int64, Int64}, Float64, Int64, Symbol, Tuple{Symbol, Int64, Int64}, Symbol, Bool, Float64, Symbol, Tuple{Symbol, Int64}, Tuple{Symbol, Int64, Int64}}}) at defaults.jl:235, ace_basis(; kwargs::Base.Pairs{Symbol, Any, NTuple{4, Symbol}, NamedTuple{(:elements, :order, :totaldegree, :rcut), Tuple{Vector{Symbol}, Int64, Int64, Float64}}}) at defaults.jl:291, (::ACE1x.var"#ace_basis##kw")(::NamedTuple{(:elements, :order, :totaldegree, :rcut), Tuple{Vector{Symbol}, Int64, Int64, Float64}}, ::typeof(ACE1x.ace_basis)) at defaults.jl:288, top-level scope at none:10]\' occurred while calling julia code:\nusing ACE1x\n            \n            elements = basis_info["elements"]\n            cor_order = basis_info["cor_order"]\n            maxdeg = basis_info["maxdeg"]\n            r_cut = basis_info["r_cut"]\n            smoothness_prior_param = basis_info["smoothness_prior"]\n            \n            try\n                global B = 
ACE1x.ace_basis(elements = Symbol.(elements), \n                            order = cor_order, \n                            totaldegree = maxdeg, \n                            rcut = r_cut)\n\n                global B_length = length(B)\n                if isnothing(smoothness_prior_param)\n                    global P_diag = nothing\n                elseif smoothness_prior_param[1] isa String && smoothness_prior_param[2] isa Number && lowercase(smoothness_prior_param[1]) == "algebraic"\n                    global P_diag = diag(smoothness_prior(B; p = smoothness_prior_param[2]))\n                else\n                    throw(ArgumentError("Unknown smoothness_prior"))\n                end\n            catch e\n               throw(error(string(e) * " in julia code location " * string(stacktrace(catch_backtrace()))))\n            end\n            ')
Traceback (most recent call last):
  File "/home/casv2/miniconda3/lib/python3.9/site-packages/optuna/study/_optimize.py", line 196, in _run_trial
    value_or_values = func(trial)
  File "/home/casv2/miniconda3/lib/python3.9/site-packages/timeout_decorator/timeout_decorator.py", line 82, in new_function
    return function(*args, **kwargs)
  File "/home/casv2/miniconda3/lib/python3.9/site-packages/ACEHAL-0.0.1-py3.9.egg/ACEHAL/optimize_basis.py", line 166, in objective
    B_len_norm = define_basis(basis_info=basis_info, **basis_kwargs)
  File "/home/casv2/miniconda3/lib/python3.9/site-packages/ACEHAL-0.0.1-py3.9.egg/ACEHAL/basis.py", line 50, in define_basis
    Main.eval(julia_source)
  File "/home/casv2/miniconda3/lib/python3.9/site-packages/julia/core.py", line 627, in eval
    ans = self._call(src)
  File "/home/casv2/miniconda3/lib/python3.9/site-packages/julia/core.py", line 555, in _call
    self.check_exception(src)
  File "/home/casv2/miniconda3/lib/python3.9/site-packages/julia/core.py", line 609, in check_exception
    raise JuliaError(u'Exception \'{}\' occurred while calling julia code:\n{}'
julia.core.JuliaError: Exception 'MethodError(Any[MethodInstance for Core.check_top_bit(::Type{UInt64}, ::Int64)], (Base.Meta.var"#65345#65346"(), Base.RefValue{Any}), 0x0000000000016516) in julia code location Base.StackTraces.StackFrame[Base.RefValue{Base.Meta.var"#65345#65346"}(x::Function) at refvalue.jl:8, convert(#unused#::Type{Ref{Base.Meta.var"#65345#65346"}}, x::Function) at refpointer.jl:104, cconvert(T::Type, x::Function) at essentials.jl:412, FunctionWrappers.FunctionWrapper{Ret, Args}(obj::objT) where {Ret, Args, objT} at FunctionWrappers.jl:106, ACE1.Transforms.AnalyticTransform(str_f::String, str_finv::String; T::Type) at transforms.jl:277, AnalyticTransform at transforms.jl:266 [inlined], #agnesi_transform#5 at transforms.jl:225 [inlined], agnesi_transform(r0::Float64, p::Int64, q::Int64) at transforms.jl:211, #10 at array.jl:0 [inlined], iterate at generator.jl:47 [inlined], collect_to!(dest::Matrix{Pair{Tuple{Symbol, Symbol}, ACE1.Transforms.AnalyticTransform{Float64}}}, itr::Base.Generator{Base.Iterators.ProductIterator{Tuple{Vector{Symbol}, Vector{Symbol}}}, ACE1x.var"#10#11"{Dict{Tuple{Symbol, Symbol}, Float64}, Int64, Int64}}, offs::Int64, st::Tuple{Tuple{Symbol, Int64}, Tuple{Symbol, Int64}}) at array.jl:845, collect_to_with_first! 
at array.jl:823 [inlined], collect(itr::Base.Generator{Base.Iterators.ProductIterator{Tuple{Vector{Symbol}, Vector{Symbol}}}, ACE1x.var"#10#11"{Dict{Tuple{Symbol, Symbol}, Float64}, Int64, Int64}}) at array.jl:797, _transform(kwargs::NamedTuple{(:wL, :rbasis, :Eref, :order, :elements, :delete2b, :pair_transform, :rcut, :totaldegree, :pair_degree, :transform, :r0, :pure2b, :pair_rcut, :pair_basis, :pair_envelope, :envelope), Tuple{Float64, Symbol, Missing, Int64, Vector{Symbol}, Bool, Tuple{Symbol, Int64, Int64}, Float64, Int64, Symbol, Tuple{Symbol, Int64, Int64}, Symbol, Bool, Float64, Symbol, Tuple{Symbol, Int64}, Tuple{Symbol, Int64, Int64}}}; transform::Tuple{Symbol, Int64, Int64}) at defaults.jl:168, _transform at defaults.jl:159 [inlined], _pair_basis(kwargs::NamedTuple{(:wL, :rbasis, :Eref, :order, :elements, :delete2b, :pair_transform, :rcut, :totaldegree, :pair_degree, :transform, :r0, :pure2b, :pair_rcut, :pair_basis, :pair_envelope, :envelope), Tuple{Float64, Symbol, Missing, Int64, Vector{Symbol}, Bool, Tuple{Symbol, Int64, Int64}, Float64, Int64, Symbol, Tuple{Symbol, Int64, Int64}, Symbol, Bool, Float64, Symbol, Tuple{Symbol, Int64}, Tuple{Symbol, Int64, Int64}}}) at defaults.jl:235, ace_basis(; kwargs::Base.Pairs{Symbol, Any, NTuple{4, Symbol}, NamedTuple{(:elements, :order, :totaldegree, :rcut), Tuple{Vector{Symbol}, Int64, Int64, Float64}}}) at defaults.jl:291, (::ACE1x.var"#ace_basis##kw")(::NamedTuple{(:elements, :order, :totaldegree, :rcut), Tuple{Vector{Symbol}, Int64, Int64, Float64}}, ::typeof(ACE1x.ace_basis)) at defaults.jl:288, top-level scope at none:10]' occurred while calling julia code:
using ACE1x
            
            elements = basis_info["elements"]
            cor_order = basis_info["cor_order"]
            maxdeg = basis_info["maxdeg"]
            r_cut = basis_info["r_cut"]
            smoothness_prior_param = basis_info["smoothness_prior"]
            
            try
                global B = ACE1x.ace_basis(elements = Symbol.(elements), 
                            order = cor_order, 
                            totaldegree = maxdeg, 
                            rcut = r_cut)

                global B_length = length(B)
                if isnothing(smoothness_prior_param)
                    global P_diag = nothing
                elseif smoothness_prior_param[1] isa String && smoothness_prior_param[2] isa Number && lowercase(smoothness_prior_param[1]) == "algebraic"
                    global P_diag = diag(smoothness_prior(B; p = smoothness_prior_param[2]))
                else
                    throw(ArgumentError("Unknown smoothness_prior"))
                end
            catch e
               throw(error(string(e) * " in julia code location " * string(stacktrace(catch_backtrace()))))
            end
            
TIMING reference_calc 102.42197036743164
Traceback (most recent call last):
  File "/home/casv2/ACEHAL/peg/run4b-2/run.py", line 30, in <module>
    HAL(fit_configs, fit_configs, None, solver,
  File "/home/casv2/miniconda3/lib/python3.9/site-packages/ACEHAL-0.0.1-py3.9.egg/ACEHAL/HAL.py", line 322, in HAL
  File "/home/casv2/miniconda3/lib/python3.9/site-packages/ACEHAL-0.0.1-py3.9.egg/ACEHAL/HAL.py", line 370, in _optimize_basis
  File "/home/casv2/miniconda3/lib/python3.9/site-packages/ACEHAL-0.0.1-py3.9.egg/ACEHAL/optimize_basis.py", line 214, in optimize
  File "/home/casv2/miniconda3/lib/python3.9/site-packages/optuna/study/study.py", line 419, in optimize
    _optimize(
  File "/home/casv2/miniconda3/lib/python3.9/site-packages/optuna/study/_optimize.py", line 66, in _optimize
    _optimize_sequential(
  File "/home/casv2/miniconda3/lib/python3.9/site-packages/optuna/study/_optimize.py", line 160, in _optimize_sequential
    frozen_trial = _run_trial(study, func, catch)
  File "/home/casv2/miniconda3/lib/python3.9/site-packages/optuna/study/_optimize.py", line 234, in _run_trial
    raise func_err
  File "/home/casv2/miniconda3/lib/python3.9/site-packages/optuna/study/_optimize.py", line 196, in _run_trial
    value_or_values = func(trial)
  File "/home/casv2/miniconda3/lib/python3.9/site-packages/timeout_decorator/timeout_decorator.py", line 82, in new_function
    return function(*args, **kwargs)
  File "/home/casv2/miniconda3/lib/python3.9/site-packages/ACEHAL-0.0.1-py3.9.egg/ACEHAL/optimize_basis.py", line 166, in objective
  File "/home/casv2/miniconda3/lib/python3.9/site-packages/ACEHAL-0.0.1-py3.9.egg/ACEHAL/basis.py", line 50, in define_basis
  File "/home/casv2/miniconda3/lib/python3.9/site-packages/julia/core.py", line 627, in eval
    ans = self._call(src)
  File "/home/casv2/miniconda3/lib/python3.9/site-packages/julia/core.py", line 555, in _call
    self.check_exception(src)
  File "/home/casv2/miniconda3/lib/python3.9/site-packages/julia/core.py", line 609, in check_exception
    raise JuliaError(u'Exception \'{}\' occurred while calling julia code:\n{}'
julia.core.JuliaError: Exception 'MethodError(Any[MethodInstance for Core.check_top_bit(::Type{UInt64}, ::Int64)], (Base.Meta.var"#65345#65346"(), Base.RefValue{Any}), 0x0000000000016516) in julia code location Base.StackTraces.StackFrame[Base.RefValue{Base.Meta.var"#65345#65346"}(x::Function) at refvalue.jl:8, convert(#unused#::Type{Ref{Base.Meta.var"#65345#65346"}}, x::Function) at refpointer.jl:104, cconvert(T::Type, x::Function) at essentials.jl:412, FunctionWrappers.FunctionWrapper{Ret, Args}(obj::objT) where {Ret, Args, objT} at FunctionWrappers.jl:106, ACE1.Transforms.AnalyticTransform(str_f::String, str_finv::String; T::Type) at transforms.jl:277, AnalyticTransform at transforms.jl:266 [inlined], #agnesi_transform#5 at transforms.jl:225 [inlined], agnesi_transform(r0::Float64, p::Int64, q::Int64) at transforms.jl:211, #10 at array.jl:0 [inlined], iterate at generator.jl:47 [inlined], collect_to!(dest::Matrix{Pair{Tuple{Symbol, Symbol}, ACE1.Transforms.AnalyticTransform{Float64}}}, itr::Base.Generator{Base.Iterators.ProductIterator{Tuple{Vector{Symbol}, Vector{Symbol}}}, ACE1x.var"#10#11"{Dict{Tuple{Symbol, Symbol}, Float64}, Int64, Int64}}, offs::Int64, st::Tuple{Tuple{Symbol, Int64}, Tuple{Symbol, Int64}}) at array.jl:845, collect_to_with_first! 
at array.jl:823 [inlined], collect(itr::Base.Generator{Base.Iterators.ProductIterator{Tuple{Vector{Symbol}, Vector{Symbol}}}, ACE1x.var"#10#11"{Dict{Tuple{Symbol, Symbol}, Float64}, Int64, Int64}}) at array.jl:797, _transform(kwargs::NamedTuple{(:wL, :rbasis, :Eref, :order, :elements, :delete2b, :pair_transform, :rcut, :totaldegree, :pair_degree, :transform, :r0, :pure2b, :pair_rcut, :pair_basis, :pair_envelope, :envelope), Tuple{Float64, Symbol, Missing, Int64, Vector{Symbol}, Bool, Tuple{Symbol, Int64, Int64}, Float64, Int64, Symbol, Tuple{Symbol, Int64, Int64}, Symbol, Bool, Float64, Symbol, Tuple{Symbol, Int64}, Tuple{Symbol, Int64, Int64}}}; transform::Tuple{Symbol, Int64, Int64}) at defaults.jl:168, _transform at defaults.jl:159 [inlined], _pair_basis(kwargs::NamedTuple{(:wL, :rbasis, :Eref, :order, :elements, :delete2b, :pair_transform, :rcut, :totaldegree, :pair_degree, :transform, :r0, :pure2b, :pair_rcut, :pair_basis, :pair_envelope, :envelope), Tuple{Float64, Symbol, Missing, Int64, Vector{Symbol}, Bool, Tuple{Symbol, Int64, Int64}, Float64, Int64, Symbol, Tuple{Symbol, Int64, Int64}, Symbol, Bool, Float64, Symbol, Tuple{Symbol, Int64}, Tuple{Symbol, Int64, Int64}}}) at defaults.jl:235, ace_basis(; kwargs::Base.Pairs{Symbol, Any, NTuple{4, Symbol}, NamedTuple{(:elements, :order, :totaldegree, :rcut), Tuple{Vector{Symbol}, Int64, Int64, Float64}}}) at defaults.jl:291, (::ACE1x.var"#ace_basis##kw")(::NamedTuple{(:elements, :order, :totaldegree, :rcut), Tuple{Vector{Symbol}, Int64, Int64, Float64}}, ::typeof(ACE1x.ace_basis)) at defaults.jl:288, top-level scope at none:10]' occurred while calling julia code:
using ACE1x
            
            elements = basis_info["elements"]
            cor_order = basis_info["cor_order"]
            maxdeg = basis_info["maxdeg"]
            r_cut = basis_info["r_cut"]
            smoothness_prior_param = basis_info["smoothness_prior"]
            
            try
                global B = ACE1x.ace_basis(elements = Symbol.(elements), 
                            order = cor_order, 
                            totaldegree = maxdeg, 
                            rcut = r_cut)

                global B_length = length(B)
                if isnothing(smoothness_prior_param)
                    global P_diag = nothing
                elseif smoothness_prior_param[1] isa String && smoothness_prior_param[2] isa Number && lowercase(smoothness_prior_param[1]) == "algebraic"
                    global P_diag = diag(smoothness_prior(B; p = smoothness_prior_param[2]))
                else
                    throw(ArgumentError("Unknown smoothness_prior"))
                end
            catch e
               throw(error(string(e) * " in julia code location " * string(stacktrace(catch_backtrace()))))
            end

@casv2
Collaborator

casv2 commented Mar 10, 2023

It's the (very) long line above including
ACE1.Transforms.AnalyticTransform(str_f::String, str_finv::String; T::Type) at transforms.jl:277

@cortner could this be related to the warnings?

┌ Warning: automatic inverse not implemented, inverse will return NaN
└ @ ACE1.Transforms ~/.julia/packages/ACE1/G18CB/src/polynomials/transforms.jl:270

@casv2
Collaborator

casv2 commented Mar 10, 2023

Formatting the relevant line here again

julia.core.JuliaError: Exception 'MethodError(Any[MethodInstance for Core.check_top_bit(::Type{UInt64}, ::Int64)], 
(Base.Meta.var"#65345#65346"(), Base.RefValue{Any}), 0x0000000000016516) in julia code location 
Base.StackTraces.StackFrame[Base.RefValue{Base.Meta.var"#65345#65346"}(x::Function) at refvalue.jl:8, 
convert(#unused#::Type{Ref{Base.Meta.var"#65345#65346"}}, x::Function) at refpointer.jl:104, cconvert(T::Type, 
x::Function) at essentials.jl:412, FunctionWrappers.FunctionWrapper{Ret, Args}(obj::objT) where {Ret, Args, objT} at 
FunctionWrappers.jl:106, ACE1.Transforms.AnalyticTransform(str_f::String, str_finv::String; T::Type) at transforms.jl:277, 
AnalyticTransform at transforms.jl:266 [inlined], #agnesi_transform#5 at transforms.jl:225 [inlined], 
agnesi_transform(r0::Float64, p::Int64, q::Int64) at transforms.jl:211, #10 at array.jl:0 [inlined], iterate at generator.jl:47 
[inlined], collect_to!(dest::Matrix{Pair{Tuple{Symbol, Symbol}, ACE1.Transforms.AnalyticTransform{Float64}}}, 
itr::Base.Generator{Base.Iterators.ProductIterator{Tuple{Vector{Symbol}, Vector{Symbol}}}, ACE1x.var"#10#11"
{Dict{Tuple{Symbol, Symbol}, Float64}, Int64, Int64}}, offs::Int64, st::Tuple{Tuple{Symbol, Int64}, Tuple{Symbol, Int64}}) at array.jl:845, collect_to_with_first! at array.jl:823 [inlined], 
collect(itr::Base.Generator{Base.Iterators.ProductIterator{Tuple{Vector{Symbol}, Vector{Symbol}}}, ACE1x.var"#10#11"
{Dict{Tuple{Symbol, Symbol}, Float64}, Int64, Int64}}) at array.jl:797, _transform(kwargs::NamedTuple{(:wL, :rbasis, :Eref, 
:order, :elements, :delete2b, :pair_transform, :rcut, :totaldegree, :pair_degree, :transform, :r0, :pure2b, :pair_rcut, 
:pair_basis, :pair_envelope, :envelope), Tuple{Float64, Symbol, Missing, Int64, Vector{Symbol}, Bool, Tuple{Symbol, Int64, 
Int64}, Float64, Int64, Symbol, Tuple{Symbol, Int64, Int64}, Symbol, Bool, Float64, Symbol, Tuple{Symbol, Int64}, 
Tuple{Symbol, Int64, Int64}}}; transform::Tuple{Symbol, Int64, Int64}) at defaults.jl:168, _transform at defaults.jl:159 
[inlined], _pair_basis(kwargs::NamedTuple{(:wL, :rbasis, :Eref, :order, :elements, :delete2b, :pair_transform, :rcut, 
:totaldegree, :pair_degree, :transform, :r0, :pure2b, :pair_rcut, :pair_basis, :pair_envelope, :envelope), Tuple{Float64,
Symbol, Missing, Int64, Vector{Symbol}, Bool, Tuple{Symbol, Int64, Int64}, Float64, Int64, Symbol, Tuple{Symbol, Int64,
 Int64}, Symbol, Bool, Float64, Symbol, Tuple{Symbol, Int64}, Tuple{Symbol, Int64, Int64}}}) at defaults.jl:235, 
ace_basis(; kwargs::Base.Pairs{Symbol, Any, NTuple{4, Symbol}, NamedTuple{(:elements, :order, :totaldegree, :rcut), 
Tuple{Vector{Symbol}, Int64, Int64, Float64}}}) at defaults.jl:291, (::ACE1x.var"#ace_basis##kw")
(::NamedTuple{(:elements, :order, :totaldegree, :rcut), Tuple{Vector{Symbol}, Int64, Int64, Float64}}, 
::typeof(ACE1x.ace_basis)) at defaults.jl:288, top-level scope at none:10]' occurred while calling julia code:

@bernstei
Collaborator

bernstei commented Mar 10, 2023

Can you paste your default.py here also? I think the issue is in line 10 of the julia code section (end of the very long line is top-level scope at none:10)

@casv2
Collaborator

casv2 commented Mar 10, 2023

Here's my default.py

params = ["elements", "cor_order", "maxdeg", "r_cut", "smoothness_prior"]

source = """using ACE1x
            
            elements = basis_info["elements"]
            cor_order = basis_info["cor_order"]
            maxdeg = basis_info["maxdeg"]
            r_cut = basis_info["r_cut"]
            smoothness_prior_param = basis_info["smoothness_prior"]
            
            try
                global B = ACE1x.ace_basis(elements = Symbol.(elements), 
                            order = cor_order, 
                            totaldegree = maxdeg, 
                            rcut = r_cut)

                global B_length = length(B)
                if isnothing(smoothness_prior_param)
                    global P_diag = nothing
                elseif smoothness_prior_param[1] isa String && smoothness_prior_param[2] isa Number && lowercase(smoothness_prior_param[1]) == "algebraic"
                    global P_diag = diag(smoothness_prior(B; p = smoothness_prior_param[2]))
                else
                    throw(ArgumentError("Unknown smoothness_prior"))
                end
            catch e
               throw(error(string(e) * " in julia code location " * string(stacktrace(catch_backtrace()))))
            end
            """

I think line 10 is the ACE1x.ace_basis call, where something breaks internally, but only after (consistently!?) 36 HAL iterations that perform the exact same call apart from the inputs (which seem very sensible).
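
One quick way to check which statement `none:10` refers to is to index into the source string by line number. The sketch below uses an abbreviated copy of the string above (everything after the start of the `ace_basis` call is omitted), and `julia_source_line` is a hypothetical helper introduced for illustration, not part of ACEHAL.

```python
# Abbreviated copy of the `source` string from default.py: only the first
# ten lines are reproduced, up to the start of the ace_basis call.
source = """using ACE1x
            
            elements = basis_info["elements"]
            cor_order = basis_info["cor_order"]
            maxdeg = basis_info["maxdeg"]
            r_cut = basis_info["r_cut"]
            smoothness_prior_param = basis_info["smoothness_prior"]
            
            try
                global B = ACE1x.ace_basis(elements = Symbol.(elements), 
"""


def julia_source_line(src: str, lineno: int) -> str:
    """Return the 1-indexed line `lineno` of `src`, stripped of whitespace."""
    return src.splitlines()[lineno - 1].strip()


# Line 10 is the line the stack trace reports as `top-level scope at none:10`.
line_10 = julia_source_line(source, 10)
print(line_10)
```

With this layout, line 9 is the `try` and line 10 is the start of the `ACE1x.ace_basis` call, which agrees with the reading above.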

@bernstei
Collaborator

Interesting. I've never run more than 20 iterations in a single run, so maybe that's why I haven't run into this. I think we have to take this up with the ACE/julia experts.

@bernstei
Collaborator

Is it happening right after some particular choice of basis parameters (e.g. new ones chosen by the optimizer)?

@cortner
Member

cortner commented Mar 10, 2023

Ok I understand the cause and can fix it. Well not really the cause but I have a rough idea what might have happened.

How did you transfer the model / basis to different processes?

@bernstei
Collaborator

bernstei commented Mar 10, 2023

I don't think we're doing anything active (about multiple processes) in python. Just calling Main.eval with some julia code to create the basis, which defines some global variables like B, getting a reference to them via B = Main.B, then calling things like E_B = np.array(energy(B, convert(ASEAtoms(at)))) to get rows of the design matrix. energy, convert, and ASEAtoms are defined to python by

from julia import Main
from julia.JuLIP import energy, forces, virial
convert = Main.eval("julip_at(a) = JuLIP.Atoms(a)")
ASEAtoms = Main.eval("ASEAtoms(a) = ASE.ASEAtoms(a)")

@bernstei
Collaborator

I'm setting JULIA_NUM_THREADS=<n_cores>, but @casv2 is the one who's getting this error (running more iterations than I ever have), so he'll have to say what he's doing.

@casv2
Collaborator

casv2 commented Mar 10, 2023

> I'm setting JULIA_NUM_THREADS=<n_cores>

I'm doing this too, might this be the problem?

> Is it happening right after some particular choice of basis parameters (e.g. new ones chosen by the optimizer)?

No they're different.

> How did you transfer the model / basis to different processes?

As @bernstei described above, I don't think we're using multiple processes to call Julia. It seems to break after exactly 536 total calls of that function, all of them apparently made serially.
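
To pin down whether a failure really does land on the same call number every time, one option is to wrap the basis-construction function in a counter that reports the call index and inputs when an exception escapes. This is only a sketch: `count_calls` and `fake_define_basis` are hypothetical names, and the stand-in function simply fails on its 5th call to simulate a sporadic error.

```python
import functools


def count_calls(func):
    """Wrap `func` so that a failure reports the call number and inputs."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.n_calls += 1
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            raise RuntimeError(
                f"{func.__name__} failed on call {wrapper.n_calls} "
                f"with args={args!r}, kwargs={kwargs!r}"
            ) from exc

    wrapper.n_calls = 0
    return wrapper


@count_calls
def fake_define_basis(maxdeg):
    """Hypothetical stand-in for the real basis construction.

    It raises on the 5th call to imitate an error that only shows up
    after a number of successful calls.
    """
    if fake_define_basis.n_calls == 5:
        raise ValueError("simulated sporadic failure")
    return maxdeg


failure_message = None
for i in range(10):
    try:
        fake_define_basis(maxdeg=8 + i)
    except RuntimeError as err:
        failure_message = str(err)
        break

print(failure_message)
```

Logging the counter alongside the inputs makes it easier to tell a failure tied to the number of calls apart from one tied to a particular set of basis parameters.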

@cortner
Member

cortner commented Mar 10, 2023

no - threads shouldn't cause an issue.

I've seen it before when we tried to copy an ACE basis to a new process. Then the anonymous function that defines the Agnesi(p, q) transform gets lost along the way. What I will do now is implement a raw Agnesi(p, q) struct without anonymous functions. I bet this will solve your problem, for now at least. But I still don't understand why the problem occurred in the first place.
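
As a loose analogy in Python (not a description of what Julia or ACE1.jl actually does), the sketch below contrasts a transform held as an anonymous function with one held as a plain object that stores only its parameters. The class name `AgnesiLike` and the formula inside it are placeholders invented for this example; the formula is not the Agnesi transform from ACE1.jl.

```python
import json
import pickle


class AgnesiLike:
    """Transform described only by plain numbers (illustrative stand-in).

    The formula in __call__ is a placeholder chosen for this example; it is
    NOT the Agnesi transform implemented in ACE1.jl. `q` is stored only to
    mirror the (p, q) parametrisation and is unused by the placeholder.
    """

    def __init__(self, r0, p, q):
        self.r0, self.p, self.q = r0, p, q

    def __call__(self, r):
        return 1.0 / (1.0 + (r / self.r0) ** self.p)

    def to_json(self):
        return json.dumps({"r0": self.r0, "p": self.p, "q": self.q})

    @classmethod
    def from_json(cls, text):
        return cls(**json.loads(text))


# Closure-based variant: the behaviour lives in an anonymous function.
r0, p = 1.5, 2
anonymous_transform = lambda r: 1.0 / (1.0 + (r / r0) ** p)

try:
    pickle.dumps(anonymous_transform)
    lambda_serializes = True
except (pickle.PicklingError, AttributeError):
    # The standard pickle module cannot serialize a lambda by name.
    lambda_serializes = False

# Parameter-based variant: only numbers cross the boundary, and the
# receiving side rebuilds the object from them.
original = AgnesiLike(r0=1.5, p=2, q=4)
restored = AgnesiLike.from_json(original.to_json())

print(lambda_serializes, restored(2.0) == original(2.0))
```

The design point is the same as the struct proposed above: if an object is fully described by a few numbers, nothing anonymous has to survive the copy.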

@casv2
Collaborator

casv2 commented Mar 10, 2023

Thank you very much, regarding the warnings could we suppress those or maybe remove? In our current setup we see them hundreds of times which is a bit excessive.

@bernstei
Collaborator

> I've seen it before when we tried to copy an ACE basis to a new process

New python process, or new julia process? Just trying to understand how this could be happening, given that we're not (as far as I know) doing anything with multiple processes.

@wcwitt

wcwitt commented Mar 10, 2023

New julia process. We've seen a (possibly) related issue for distributed assembly, where the core problem is serializing the basis and reconstructing it elsewhere, so maybe the Python interface does something similar. Doesn't yet explain the intermittency though.

@cortner
Member

cortner commented Mar 10, 2023

> New python process, or new julia process?

I've seen it when copying to a new Julia process.

@cortner
Member

cortner commented Mar 10, 2023

> could we suppress those or maybe remove?

Yes, I can remove them. It will just fail by throwing an error.

@casv2
Collaborator

casv2 commented Mar 10, 2023

That'd be great, thanks

@cortner
Member

cortner commented Mar 10, 2023

can you please try ACE1.jl v0.11.4 - see also this PR

@casv2
Collaborator

casv2 commented Mar 10, 2023

Thank you, running a job now. The warnings have disappeared, and I'll get back once I get to 36 HAL iterations, which should take a few hours. That is also pretty much exactly how long it takes to generate a stable ACE potential for a small molecule starting from 1 config :), including running the DFT.

@casv2
Collaborator

casv2 commented Mar 10, 2023

This seems resolved now, thank you!

@casv2 casv2 closed this as completed Mar 10, 2023