
Running parallel scans in Jupyter/IJulia #317

Open
phockett opened this issue Feb 23, 2023 · 11 comments
@phockett

I had a lot of issues dispatching parallel scan jobs from a Jupyter/IJulia notebook (although single-parameter-set runs in a notebook worked fine). This occasionally seemed to work, but generally didn't, and usually crashed out after spawning workers.

As a quick work-around, I wrote a basic job dispatch notebook instead, which basically writes a job script to file then launches the parallel execution via a shell script. In case it's of interest/use to anyone else, it's now at https://github.com/phockett/Luna.jl-jupyterDispatch.

@chrisbrahms
Collaborator

Thank you very much, this is amazing! @nkotsina I think this may solve an issue you've had as well?

One question: when you say it "crashed out", did you get an error within Julia or did Julia itself crash? We've seen some issues running the continuous integration via GitHub actions on parallel scans as well, wondering whether this could be related.

@phockett
Author

phockett commented Mar 22, 2023

Ah, yes, good question - I was a little slap-dash in the above posting!

The behaviour I see when trying to run a parallel job directly from a notebook is that the parallel workers are launched and seem to start up OK, then crash out on a per-worker basis with DomainError with CSpline evaulated out of bounds (see sample output below). As far as I can tell it has nothing to do with the job/parameters themselves, since I tested the same jobs in serial and parallel from the CLI with no issues, so it does seem to be something to do with the environment. My guess is that it's related to how Jupyter Lab launches the independent processes; maybe a shared-memory or permissions issue, or similar, is cropping up here. It doesn't seem out of the realm of possibility that similar issues occur for GH Actions runs.

As a side-note, I did have a couple of successful runs from a notebook at some point, but I've no idea why!


Here's a snippet of the output from an attempted run:

## Launch parallel job
runscan(scan) do scanidx, energy, pressure
    prop_capillary(a, flength, gas, pressure; λ0, τfwhm, energy,
                   λlims, trange, scan, scanidx, filepath=outputdir)
end

## Sample output
From worker 3:	┌ Info: Running scan: paralleltest_010223 (9 points)
      From worker 3:	│ Index: 00001
      From worker 3:	│ Variables:
      From worker 3:	│ 	pressure: 0.6
      From worker 3:	└ 	energy: 5e-05
      From worker 2:	┌ Info: Running scan: paralleltest_010223 (9 points)
      From worker 2:	│ Index: 00002
      From worker 2:	│ Variables:
      From worker 2:	│ 	pressure: 1
      From worker 2:	└ 	energy: 5e-05
      From worker 5:	┌ Info: Running scan: paralleltest_010223 (9 points)
      From worker 5:	│ Index: 00003
      From worker 5:	│ Variables:
      From worker 5:	│ 	pressure: 1.4
      From worker 5:	└ 	energy: 5e-05
      From worker 6:	┌ Info: Running scan: paralleltest_010223 (9 points)
      From worker 6:	│ Index: 00004
      From worker 6:	│ Variables:
      From worker 6:	│ 	pressure: 0.6
      From worker 6:	└ 	energy: 0.0001
      From worker 4:	┌ Info: Running scan: paralleltest_010223 (9 points)
      From worker 4:	│ Index: 00005
      From worker 4:	│ Variables:
      From worker 4:	│ 	pressure: 1
      From worker 4:	└ 	energy: 0.0001
      From worker 3:	[ Info: X+Y polarisation not required.
      From worker 3:	[ Info: Freq limits 0.07 - 3.00 PHz
      From worker 3:	[ Info: Samples needed: 7195.02, samples: 8192, δt = 55.59 as
      From worker 3:	[ Info: Requested time window: 400.0 fs, actual time window: 455.4 fs
      From worker 3:	[ Info: Grid: samples 4096 / 8192, ωmax 2.83e+16 / 5.65e+16
      From worker 3:	[ Info: Using PPT ionisation rate.
      From worker 3:	[ Info: Found cached PPT rate for 24.588208407672 eV, 800.0 nm
      From worker 3:	[ Info: Using mode-averaged propagation.
      From worker 2:	[ Info: X+Y polarisation not required.
      From worker 2:	[ Info: Freq limits 0.07 - 3.00 PHz
      From worker 2:	[ Info: Samples needed: 7195.02, samples: 8192, δt = 55.59 as
      From worker 2:	[ Info: Requested time window: 400.0 fs, actual time window: 455.4 fs
      From worker 2:	[ Info: Grid: samples 4096 / 8192, ωmax 2.83e+16 / 5.65e+16
      From worker 3:	[ Info: Found FFTW wisdom at /home/paul/.luna/FFTWcache_1threads
      From worker 2:	[ Info: Using PPT ionisation rate.
      From worker 2:	[ Info: Found cached PPT rate for 24.588208407672 eV, 800.0 nm
      From worker 3:	[ Info: FFTW wisdom saved to /home/paul/.luna/FFTWcache_1threads
      From worker 5:	[ Info: X+Y polarisation not required.
      From worker 5:	[ Info: Freq limits 0.07 - 3.00 PHz
      From worker 5:	[ Info: Samples needed: 7195.02, samples: 8192, δt = 55.59 as
      From worker 5:	[ Info: Requested time window: 400.0 fs, actual time window: 455.4 fs
      From worker 3:	[ Info: Found FFTW wisdom at /home/paul/.luna/FFTWcache_1threads
      From worker 3:	[ Info: FFTW wisdom saved to /home/paul/.luna/FFTWcache_1threads
      From worker 5:	[ Info: Grid: samples 4096 / 8192, ωmax 2.83e+16 / 5.65e+16
      From worker 2:	[ Info: Using mode-averaged propagation.
      From worker 6:	[ Info: X+Y polarisation not required.
      From worker 6:	[ Info: Freq limits 0.07 - 3.00 PHz
      From worker 6:	[ Info: Samples needed: 7195.02, samples: 8192, δt = 55.59 as
      From worker 6:	[ Info: Requested time window: 400.0 fs, actual time window: 455.4 fs
      From worker 5:	[ Info: Using PPT ionisation rate.
      From worker 5:	[ Info: Found cached PPT rate for 24.588208407672 eV, 800.0 nm
      From worker 6:	[ Info: Grid: samples 4096 / 8192, ωmax 2.83e+16 / 5.65e+16
      From worker 2:	[ Info: Found FFTW wisdom at /home/paul/.luna/FFTWcache_1threads
      From worker 6:	[ Info: Using PPT ionisation rate.
      From worker 6:	[ Info: Found cached PPT rate for 24.588208407672 eV, 800.0 nm
      From worker 2:	[ Info: FFTW wisdom saved to /home/paul/.luna/FFTWcache_1threads
      From worker 5:	[ Info: Using mode-averaged propagation.
      From worker 4:	[ Info: X+Y polarisation not required.
      From worker 4:	[ Info: Freq limits 0.07 - 3.00 PHz
      From worker 4:	[ Info: Samples needed: 7195.02, samples: 8192, δt = 55.59 as
      From worker 4:	[ Info: Requested time window: 400.0 fs, actual time window: 455.4 fs
      From worker 2:	[ Info: Found FFTW wisdom at /home/paul/.luna/FFTWcache_1threads
      From worker 2:	[ Info: FFTW wisdom saved to /home/paul/.luna/FFTWcache_1threads
      From worker 6:	[ Info: Using mode-averaged propagation.
      From worker 5:	[ Info: Found FFTW wisdom at /home/paul/.luna/FFTWcache_1threads
      From worker 4:	[ Info: Grid: samples 4096 / 8192, ωmax 2.83e+16 / 5.65e+16
      From worker 4:	[ Info: Using PPT ionisation rate.
      From worker 6:	[ Info: Found FFTW wisdom at /home/paul/.luna/FFTWcache_1threads
      From worker 4:	[ Info: Found cached PPT rate for 24.588208407672 eV, 800.0 nm
      From worker 5:	[ Info: FFTW wisdom saved to /home/paul/.luna/FFTWcache_1threads
      From worker 6:	[ Info: FFTW wisdom saved to /home/paul/.luna/FFTWcache_1threads
      From worker 5:	[ Info: Found FFTW wisdom at /home/paul/.luna/FFTWcache_1threads
      From worker 5:	[ Info: FFTW wisdom saved to /home/paul/.luna/FFTWcache_1threads
      From worker 4:	[ Info: Using mode-averaged propagation.
      From worker 6:	[ Info: Found FFTW wisdom at /home/paul/.luna/FFTWcache_1threads
      From worker 6:	[ Info: FFTW wisdom saved to /home/paul/.luna/FFTWcache_1threads
      From worker 3:	┌ Warning: Error at scanidx 1:
      From worker 3:	│ DomainError with CSpline evaulated out of bounds, 2.2500203760117746e11 > 2.099287787671748e11:
      From worker 3:	│ 
      From worker 3:	│ Stacktrace:
      From worker 3:	│   [1] (::Luna.Maths.CSpline{Float64, Float64, Vector{Float64}, Vector{Float64}, Luna.Maths.var"#ffast#31"{Int64, Float64, Float64}})(x0::Float64)
      From worker 3:	│     @ Luna.Maths ~/.julia/packages/Luna/cyPdO/src/Maths.jl:761
      From worker 3:	│   [2] ir
      From worker 3:	│     @ ~/.julia/packages/Luna/cyPdO/src/Ionisation.jl:240 [inlined]
      From worker 3:	│   [3] _broadcast_getindex_evalf
      From worker 3:	│     @ ./broadcast.jl:670 [inlined]
      From worker 3:	│   [4] _broadcast_getindex
      From worker 3:	│     @ ./broadcast.jl:643 [inlined]
      From worker 3:	│   [5] getindex
      From worker 3:	│     @ ./broadcast.jl:597 [inlined]
      From worker 3:	│   [6] macro expansion
      From worker 3:	│     @ ./broadcast.jl:961 [inlined]
      From worker 3:	│   [7] macro expansion
      From worker 3:	│     @ ./simdloop.jl:77 [inlined]
      From worker 3:	│   [8] copyto!
      From worker 3:	│     @ ./broadcast.jl:960 [inlined]
      From worker 3:	│   [9] copyto!
      From worker 3:	│     @ ./broadcast.jl:913 [inlined]
      From worker 3:	│  [10] materialize!
      From worker 3:	│     @ ./broadcast.jl:871 [inlined]
      From worker 3:	│  [11] materialize!
      From worker 3:	│     @ ./broadcast.jl:868 [inlined]
      From worker 3:	│  [12] (::Luna.Ionisation.var"#ionrate!#17"{Luna.Ionisation.var"#ir#16"{Float64, Luna.Maths.CSpline{Float64, Float64, Vector{Float64}, Vector{Float64}, Luna.Maths.var"#ffast#31"{Int64, Float64, Float64}}}})(out::Vector{Float64}, E::Vector{Float64})
      From worker 3:	│     @ Luna.Ionisation ~/.julia/packages/Luna/cyPdO/src/Ionisation.jl:242
      From worker 3:	│  [13] PlasmaScalar!(Plas::Luna.Nonlinear.PlasmaCumtrapz{Luna.Ionisation.var"#ionrate!#17"{Luna.Ionisation.var"#ir#16"{Float64, Luna.Maths.CSpline{Float64, Float64, Vector{Float64}, Vector{Float64}, Luna.Maths.var"#ffast#31"{Int64, Float64, Float64}}}}, Vector{Float64}, Vector{Float64}}, E::Vector{Float64})
      From worker 3:	│     @ Luna.Nonlinear ~/.julia/packages/Luna/cyPdO/src/Nonlinear.jl:117
      From worker 3:	│  [14] (::Luna.Nonlinear.PlasmaCumtrapz{Luna.Ionisation.var"#ionrate!#17"{Luna.Ionisation.var"#ir#16"{Float64, Luna.Maths.CSpline{Float64, Float64, Vector{Float64}, Vector{Float64}, Luna.Maths.var"#ffast#31"{Int64, Float64, Float64}}}}, Vector{Float64}, Vector{Float64}})(out::Vector{Float64}, Et::Vector{Float64}, ρ::Float64)
      From worker 3:	│     @ Luna.Nonlinear ~/.julia/packages/Luna/cyPdO/src/Nonlinear.jl:169
      From worker 3:	│  [15] Et_to_Pt!(Pt::Vector{Float64}, Et::Vector{Float64}, responses::Tuple{Luna.Nonlinear.var"#Kerr#1"{Float64}, Luna.Nonlinear.PlasmaCumtrapz{Luna.Ionisation.var"#ionrate!#17"{Luna.Ionisation.var"#ir#16"{Float64, Luna.Maths.CSpline{Float64, Float64, Vector{Float64}, Vector{Float64}, Luna.Maths.var"#ffast#31"{Int64, Float64, Float64}}}}, Vector{Float64}, Vector{Float64}}}, density::Float64)
      From worker 3:	│     @ Luna.NonlinearRHS ~/.julia/packages/Luna/cyPdO/src/NonlinearRHS.jl:124
      From worker 3:	│  [16] (::Luna.NonlinearRHS.TransModeAvg{Float64, FFTW.rFFTWPlan{Float64, -1, false, 1, Int64}, Tuple{Luna.Nonlinear.var"#Kerr#1"{Float64}, Luna.Nonlinear.PlasmaCumtrapz{Luna.Ionisation.var"#ionrate!#17"{Luna.Ionisation.var"#ir#16"{Float64, Luna.Maths.CSpline{Float64, Float64, Vector{Float64}, Vector{Float64}, Luna.Maths.var"#ffast#31"{Int64, Float64, Float64}}}}, Vector{Float64}, Vector{Float64}}}, Luna.Grid.RealGrid, Luna.Interface.var"#11#12"{Float64}, Luna.NonlinearRHS.var"#norm!#18"{Luna.Grid.RealGrid, Luna.LinearOps.var"#βfun!#26"{Vector{Float64}}, Luna.Interface.var"#21#22"{Luna.Capillary.MarcatiliMode{Float64, Luna.Capillary.var"#3#7"{Luna.Capillary.var"#3#4#8"{Luna.PhysData.var"#52#55"{Luna.PhysData.var"#48#49"{Float64, Luna.PhysData.var"#5#6"{Float64, Float64, Float64, Float64, Float64, Float64}}}}}, Luna.Capillary.var"#5#9"{Luna.Capillary.var"#5#6#10"{Luna.PhysData.var"#53#56"{Luna.Maths.CmplxBSpline{Luna.Maths.RealBSpline{Dierckx.Spline1D, Vector{Float64}, Luna.Maths.FastFinder{Vector{Float64}, Float64}}}}}}, Val{true}}}, Vector{ComplexF64}, Vector{Float64}}, Luna.Interface.var"#21#22"{Luna.Capillary.MarcatiliMode{Float64, Luna.Capillary.var"#3#7"{Luna.Capillary.var"#3#4#8"{Luna.PhysData.var"#52#55"{Luna.PhysData.var"#48#49"{Float64, Luna.PhysData.var"#5#6"{Float64, Float64, Float64, Float64, Float64, Float64}}}}}, Luna.Capillary.var"#5#9"{Luna.Capillary.var"#5#6#10"{Luna.PhysData.var"#53#56"{Luna.Maths.CmplxBSpline{Luna.Maths.RealBSpline{Dierckx.Spline1D, Vector{Float64}, Luna.Maths.FastFinder{Vector{Float64}, Float64}}}}}}, Val{true}}}})(nl::Vector{ComplexF64}, Eω::Vector{ComplexF64}, z::Float64)
      From worker 3:	│     @ Luna.NonlinearRHS ~/.julia/packages/Luna/cyPdO/src/NonlinearRHS.jl:359
      From worker 3:	│  [17] (::Luna.RK45.var"#fbar!#14"{Luna.NonlinearRHS.TransModeAvg{Float64, FFTW.rFFTWPlan{Float64, -1, false, 1, Int64}, Tuple{Luna.Nonlinear.var"#Kerr#1"{Float64}, Luna.Nonlinear.PlasmaCumtrapz{Luna.Ionisation.var"#ionrate!#17"{Luna.Ionisation.var"#ir#16"{Float64, Luna.Maths.CSpline{Float64, Float64, Vector{Float64}, Vector{Float64}, Luna.Maths.var"#ffast#31"{Int64, Float64, Float64}}}}, Vector{Float64}, Vector{Float64}}}, Luna.Grid.RealGrid, Luna.Interface.var"#11#12"{Float64}, Luna.NonlinearRHS.var"#norm!#18"{Luna.Grid.RealGrid, Luna.LinearOps.var"#βfun!#26"{Vector{Float64}}, Luna.Interface.var"#21#22"{Luna.Capillary.MarcatiliMode{Float64, Luna.Capillary.var"#3#7"{Luna.Capillary.var"#3#4#8"{Luna.PhysData.var"#52#55"{Luna.PhysData.var"#48#49"{Float64, Luna.PhysData.var"#5#6"{Float64, Float64, Float64, Float64, Float64, Float64}}}}}, Luna.Capillary.var"#5#9"{Luna.Capillary.var"#5#6#10"{Luna.PhysData.var"#53#56"{Luna.Maths.CmplxBSpline{Luna.Maths.RealBSpline{Dierckx.Spline1D, Vector{Float64}, Luna.Maths.FastFinder{Vector{Float64}, Float64}}}}}}, Val{true}}}, Vector{ComplexF64}, Vector{Float64}}, Luna.Interface.var"#21#22"{Luna.Capillary.MarcatiliMode{Float64, Luna.Capillary.var"#3#7"{Luna.Capillary.var"#3#4#8"{Luna.PhysData.var"#52#55"{Luna.PhysData.var"#48#49"{Float64, Luna.PhysData.var"#5#6"{Float64, Float64, Float64, Float64, Float64, Float64}}}}}, Luna.Capillary.var"#5#9"{Luna.Capillary.var"#5#6#10"{Luna.PhysData.var"#53#56"{Luna.Maths.CmplxBSpline{Luna.Maths.RealBSpline{Dierckx.Spline1D, Vector{Float64}, Luna.Maths.FastFinder{Vector{Float64}, Float64}}}}}}, Val{true}}}}, Luna.RK45.var"#prop!#12"{Vector{ComplexF64}}, Vector{ComplexF64}})(out::Vector{ComplexF64}, ybar::Vector{ComplexF64}, t1::Float64, t2::Float64)
      From worker 3:	│     @ Luna.RK45 ~/.julia/packages/Luna/cyPdO/src/RK45.jl:304
      From worker 3:	│  [18] Luna.RK45.PreconStepper(f!::Luna.NonlinearRHS.TransModeAvg{Float64, FFTW.rFFTWPlan{Float64, -1, false, 1, Int64}, Tuple{Luna.Nonlinear.var"#Kerr#1"{Float64}, Luna.Nonlinear.PlasmaCumtrapz{Luna.Ionisation.var"#ionrate!#17"{Luna.Ionisation.var"#ir#16"{Float64, Luna.Maths.CSpline{Float64, Float64, Vector{Float64}, Vector{Float64}, Luna.Maths.var"#ffast#31"{Int64, Float64, Float64}}}}, Vector{Float64}, Vector{Float64}}}, Luna.Grid.RealGrid, Luna.Interface.var"#11#12"{Float64}, Luna.NonlinearRHS.var"#norm!#18"{Luna.Grid.RealGrid, Luna.LinearOps.var"#βfun!#26"{Vector{Float64}}, Luna.Interface.var"#21#22"{Luna.Capillary.MarcatiliMode{Float64, Luna.Capillary.var"#3#7"{Luna.Capillary.var"#3#4#8"{Luna.PhysData.var"#52#55"{Luna.PhysData.var"#48#49"{Float64, Luna.PhysData.var"#5#6"{Float64, Float64, Float64, Float64, Float64, Float64}}}}}, Luna.Capillary.var"#5#9"{Luna.Capillary.var"#5#6#10"{Luna.PhysData.var"#53#56"{Luna.Maths.CmplxBSpline{Luna.Maths.RealBSpline{Dierckx.Spline1D, Vector{Float64}, Luna.Maths.FastFinder{Vector{Float64}, Float64}}}}}}, Val{true}}}, Vector{ComplexF64}, Vector{Float64}}, Luna.Interface.var"#21#22"{Luna.Capillary.MarcatiliMode{Float64, Luna.Capillary.var"#3#7"{Luna.Capillary.var"#3#4#8"{Luna.PhysData.var"#52#55"{Luna.PhysData.var"#48#49"{Float64, Luna.PhysData.var"#5#6"{Float64, Float64, Float64, Float64, Float64, Float64}}}}}, Luna.Capillary.var"#5#9"{Luna.Capillary.var"#5#6#10"{Luna.PhysData.var"#53#56"{Luna.Maths.CmplxBSpline{Luna.Maths.RealBSpline{Dierckx.Spline1D, Vector{Float64}, Luna.Maths.FastFinder{Vector{Float64}, Float64}}}}}}, Val{true}}}}, linop::Vector{ComplexF64}, y0::Vector{ComplexF64}, t::Float64, dt::Float64; rtol::Float64, atol::Float64, safety::Float64, max_dt::Float64, min_dt::Int64, locextrap::Bool, norm::Function)
      From worker 3:	│     @ Luna.RK45 ~/.julia/packages/Luna/cyPdO/src/RK45.jl:157
      From worker 3:	│  [19] #solve_precon#2
      From worker 3:	│     @ ~/.julia/packages/Luna/cyPdO/src/RK45.jl:22 [inlined]
      From worker 3:	│  [20] run(Eω::Vector{ComplexF64}, grid::Luna.Grid.RealGrid, linop::Vector{ComplexF64}, transform::Luna.NonlinearRHS.TransModeAvg{Float64, FFTW.rFFTWPlan{Float64, -1, false, 1, Int64}, Tuple{Luna.Nonlinear.var"#Kerr#1"{Float64}, Luna.Nonlinear.PlasmaCumtrapz{Luna.Ionisation.var"#ionrate!#17"{Luna.Ionisation.var"#ir#16"{Float64, Luna.Maths.CSpline{Float64, Float64, Vector{Float64}, Vector{Float64}, Luna.Maths.var"#ffast#31"{Int64, Float64, Float64}}}}, Vector{Float64}, Vector{Float64}}}, Luna.Grid.RealGrid, Luna.Interface.var"#11#12"{Float64}, Luna.NonlinearRHS.var"#norm!#18"{Luna.Grid.RealGrid, Luna.LinearOps.var"#βfun!#26"{Vector{Float64}}, Luna.Interface.var"#21#22"{Luna.Capillary.MarcatiliMode{Float64, Luna.Capillary.var"#3#7"{Luna.Capillary.var"#3#4#8"{Luna.PhysData.var"#52#55"{Luna.PhysData.var"#48#49"{Float64, Luna.PhysData.var"#5#6"{Float64, Float64, Float64, Float64, Float64, Float64}}}}}, Luna.Capillary.var"#5#9"{Luna.Capillary.var"#5#6#10"{Luna.PhysData.var"#53#56"{Luna.Maths.CmplxBSpline{Luna.Maths.RealBSpline{Dierckx.Spline1D, Vector{Float64}, Luna.Maths.FastFinder{Vector{Float64}, Float64}}}}}}, Val{true}}}, Vector{ComplexF64}, Vector{Float64}}, Luna.Interface.var"#21#22"{Luna.Capillary.MarcatiliMode{Float64, Luna.Capillary.var"#3#7"{Luna.Capillary.var"#3#4#8"{Luna.PhysData.var"#52#55"{Luna.PhysData.var"#48#49"{Float64, Luna.PhysData.var"#5#6"{Float64, Float64, Float64, Float64, Float64, Float64}}}}}, Luna.Capillary.var"#5#9"{Luna.Capillary.var"#5#6#10"{Luna.PhysData.var"#53#56"{Luna.Maths.CmplxBSpline{Luna.Maths.RealBSpline{Dierckx.Spline1D, Vector{Float64}, Luna.Maths.FastFinder{Vector{Float64}, Float64}}}}}}, Val{true}}}}, FT::FFTW.rFFTWPlan{Float64, -1, false, 1, Int64}, output::Luna.Output.HDF5Output{Luna.Output.GridCondition, Luna.Stats.var"#collect_stats#44"{Luna.Stats.var"#analytic!#43"{AbstractFFTs.ScaledPlan{ComplexF64, FFTW.cFFTWPlan{ComplexF64, 1, false, 1, Int64}, Float64}, CartesianIndices{0, Tuple{}}, 
Vector{ComplexF64}}, Vector{ComplexF64}, Tuple{Luna.Stats.var"#addstat!#1"{Vector{Float64}}, Luna.Stats.var"#addstat!#3"{Luna.Fields.var"#energy_ω#29"{Luna.Grid.RealGrid, Float64}}, Luna.Stats.var"#addstat!#10", Luna.Stats.var"#addstat!#19"{Luna.Grid.RealGrid}, Luna.Stats.var"#47#48"{Float64}, Luna.Stats.var"#addstat!#31"{Luna.Interface.var"#11#12"{Float64}}, Luna.Stats.var"#addstat!#32"{Luna.Interface.var"#11#12"{Float64}, Symbol}, Luna.Stats.var"#addstat!#14"{Luna.Interface.var"#21#22"{Luna.Capillary.MarcatiliMode{Float64, Luna.Capillary.var"#3#7"{Luna.Capillary.var"#3#4#8"{Luna.PhysData.var"#52#55"{Luna.PhysData.var"#48#49"{Float64, Luna.PhysData.var"#5#6"{Float64, Float64, Float64, Float64, Float64, Float64}}}}}, Luna.Capillary.var"#5#9"{Luna.Capillary.var"#5#6#10"{Luna.PhysData.var"#53#56"{Luna.Maths.CmplxBSpline{Luna.Maths.RealBSpline{Dierckx.Spline1D, Vector{Float64}, Luna.Maths.FastFinder{Vector{Float64}, Float64}}}}}}, Val{true}}}}, Luna.Stats.var"#addstat!#27"{Int64, Luna.Grid.RealGrid, Luna.Interface.var"#11#12"{Float64}, Luna.Interface.var"#21#22"{Luna.Capillary.MarcatiliMode{Float64, Luna.Capillary.var"#3#7"{Luna.Capillary.var"#3#4#8"{Luna.PhysData.var"#52#55"{Luna.PhysData.var"#48#49"{Float64, Luna.PhysData.var"#5#6"{Float64, Float64, Float64, Float64, Float64, Float64}}}}}, Luna.Capillary.var"#5#9"{Luna.Capillary.var"#5#6#10"{Luna.PhysData.var"#53#56"{Luna.Maths.CmplxBSpline{Luna.Maths.RealBSpline{Dierckx.Spline1D, Vector{Float64}, Luna.Maths.FastFinder{Vector{Float64}, Float64}}}}}}, Val{true}}}, Vector{Float64}, Luna.Stats.var"#ionfrac!#26"{Luna.Ionisation.var"#ionrate!#17"{Luna.Ionisation.var"#ir#16"{Float64, Luna.Maths.CSpline{Float64, Float64, Vector{Float64}, Vector{Float64}, Luna.Maths.var"#ffast#31"{Int64, Float64, Float64}}}}, Float64}}, typeof(Luna.Stats.zdz!)}}}; min_dz::Int64, max_dz::Float64, init_dz::Float64, z0::Float64, rtol::Float64, atol::Float64, safety::Float64, norm::Function, status_period::Int64)
      From worker 3:	│     @ Luna ~/.julia/packages/Luna/cyPdO/src/Luna.jl:367

... and so on with similar errors for other scanidx and workers.

I can share a full example notebook and outputs if useful.

@chrisbrahms
Collaborator

A full (non)working example would definitely be helpful. What's confusing here is that the error you're getting, DomainError with CSpline evaulated out of bounds, occurs when you try to evaluate the accelerated (pre-calculated and then interpolated) PPT ionisation rate at too high a field strength. Which is to say, it should very much depend on your input parameters and is essentially telling you that the model doesn't work for the intensity you're creating. If it turns out that this runs in serial but not in parallel, something serious must be wrong with the way we dispatch the jobs.

@phockett
Author

Hmmm, that is curious.

I initially tested for parameters quite similar to the Luna docs (these are as currently set in the demo notebook at https://github.com/phockett/Luna.jl-jupyterDispatch/blob/main/scan_parallel_Luna_template_160223.ipynb).

a = 150e-6 
flength = 2.5
gas = :Ar
λ0 = 800e-9
τfwhm = 35e-15
λlims = (100e-9, 4e-6)
trange = 400e-15
energies = collect(50e-6:50e-6:150e-6)
pressures = collect(0.6:0.4:1.4)

Physically I think these parameters should be OK, but I also don't have that much experience here. All that said, I've since encountered some of these types of errors running in parallel from shell too, so it's not impossible that there is something else going on here! I can certainly test a little more carefully given your comments, perhaps you could suggest some "safe" and "not safe" parameter sets for testing to see if the errors are encountered reliably and when expected?

@chrisbrahms
Collaborator

chrisbrahms commented Mar 30, 2023

In this particular case, the issue isn't in the parallel or serial execution, but in a small difference between your scan scripts. The order in which you add variables to the scan matters! In the one which fails you have

scan = Scans.Scan("paralleltest_010223")
addvariable!(scan,:pressure, pressures)
addvariable!(scan, :energy, energies)

but in the one which works you have

scan = Scan("pressure_energy_test_300123"; energy=energies)
addvariable!(scan, :pressure, pressures)

I.e. in the failing case you add pressure, then energy to the scan, rather than the other way around. But in both cases your do-block starts with

runscan(scan) do scanidx, energy, pressure

which assumes energy comes first. So in the failing case you are running a scan over 0.6 to 1.4 joules of pulse energy so 16 petawatts of peak power, which is why the PPT rate fails.
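For concreteness, here is a sketch (using only the snippets already quoted above, names unchanged) of the two consistent ways to fix the failing script: either the addvariable! order or the do-block argument order has to change so that they agree.

```julia
# Option 1: add energy first, matching the do-block signature below.
scan = Scans.Scan("paralleltest_010223")
addvariable!(scan, :energy, energies)      # added first -> first scan variable
addvariable!(scan, :pressure, pressures)   # added second -> second scan variable

runscan(scan) do scanidx, energy, pressure # order matches the addvariable! calls
    # ... run prop_capillary etc. here ...
end

# Option 2: keep the original addvariable! order (pressure first) and
# swap the do-block arguments instead:
# runscan(scan) do scanidx, pressure, energy
```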

It does say this in the manual, but from looking at this it needs to be far more obvious. I'll turn it into a big flashing warning box. Sorry for the confusion!

I don't have a properly working Julia+Jupyter setup right now, so could you check whether fixing this just fixes everything?

@phockett
Author

phockett commented Mar 30, 2023

Ah, the meat-space "stupid-user" debug was required! 😜 Thanks for the gentle correction.

It certainly looks like mea culpa, with apologies... I do vaguely recall playing around with the ordering of the parameter passing when trying to get the parallel scans running, and had indeed probably read about this in the docs, but I clearly got in a mess there regardless and may have tricked myself into thinking it was not the issue. It would certainly explain the inconsistencies I was seeing. (As a side-note, I probably hadn't quite appreciated the Julia ! notation at the time either, nor various other subtle differences from Python.)

In any case, I will have a more careful/rigorous play starting with a clean notebook and see if things make more sense, or if there are any other issues that crop up.

@phockett
Author

phockett commented Mar 31, 2023

The plot thickens: after a bit more fiddling I now recall that I was playing with the parameter ordering because creating a Scan() object was buggy in IJulia, giving ArgParse errors. At some point in testing, adding variables as different types, and/or in a different order, seemed to work for mysterious reasons. Today I can't get it to work at all! See https://github.com/phockett/Luna.jl-jupyterDispatch/blob/main/demo/Luna_IJulia_Scan_tests_310323.ipynb for today's example (versions at the end of the doc).

I think this is a more fundamental issue, presumably related to how Jupyter/IJulia is wrapping these calls (since it seems OK from CLI usually). The crashes discussed above are, presumably, as you mention, a different issue related to the actual code I ended up with at the times it worked.

@chrisbrahms
Collaborator

Aha! Now we're getting somewhere. The problem here seems to come from the fact that IJulia internally runs Julia with some added command-line arguments. The Scan constructor in Luna reacts to these (to enable running scan files in different modes from the command line, which is useful for execution on clusters or in a screen session over SSH), but of course IJulia's arguments don't match the signature the Scan constructor expects. That's why everything works fine on the command line (where ARGS, the global variable which holds the command-line arguments, is usually empty) but not in IJulia.

In a classic case of one nice feature causing really hard bugs, command-line arguments also override any Execs you define inside your script, so there's basically no way of fixing this by changing the way you create the scan. I'm not sure how it would sometimes work, but my guess is that somewhere along the line it got to the point of parsing the arguments once, at which point it deletes the contents of ARGS to avoid infinite recursion.

One thing to try is to print the command-line arguments with println(ARGS) and if it's not empty, add the following line just before you make your Scan object:

[pop!(ARGS) for _ in eachindex(ARGS)]

This will remove any command-line arguments currently present and should hopefully enable the Scan creation to work fine. If this does not work, it's possible IJulia injects these arguments somewhere along the line before sending the function call to the Julia kernel.
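As a side note, Base Julia also offers an equivalent one-liner (not something the comprehension above depends on; since ARGS is an ordinary Vector{String}, it can be cleared in place):

```julia
# Clears the global ARGS vector in place; equivalent to popping every element.
empty!(ARGS)
```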

I will try to think of a better way of dealing with this. Hopefully there is a way of fixing this issue while still keeping all of the functionality.

@michaelhemsworth

michaelhemsworth commented Nov 28, 2023

Hi @chrisbrahms, perhaps you can check whether isdefined(Main, :IJulia) && Main.IJulia.inited returns true to identify that Julia is running in a notebook, and then pop the ARGS automatically. (From https://julialang.github.io/IJulia.jl/stable/manual/usage/#Julia-and-IPython-Magics, which @phockett pointed me towards.)

@phockett
Author

phockett commented Nov 29, 2023

Thanks for the suggestion @chrisbrahms , and the further reminder/prod to look at this again @michaelhemsworth!

On the IJulia point, it seems like ARGS contains the session handle by default, e.g. in a new IJulia notebook I get something like:

println(ARGS)
["~/.local/share/jupyter/runtime/kernel-5715b518-55f2-42c9-aac0-c4f3cd6fd3c9.json"]

So it makes sense this is messing up the general arg passing!

Adding [pop!(ARGS) for _ in eachindex(ARGS)] before Scan creation does, indeed, fix this issue for Scan creation, and I can then run serial or parallel scans happily from the notebook, huzzah. (I will post a demo notebook soon - just need to tidy up my tests!)

As @michaelhemsworth mentioned, wrapping this with an IJulia check should suffice for general use, e.g. I quickly tested this (in a notebook only), and it seemed to work as expected:

if isdefined(Main, :IJulia) && Main.IJulia.inited
    [pop!(ARGS) for _ in eachindex(ARGS)]
end

It might be germane to stash the session handle somewhere too; I've no idea whether deleting it will/can cause other issues!

(Tested in Julia 1.9.3/IJulia 1.24.2, Luna 0.4.0.)
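One possible way to address the session-handle concern above (a sketch only; whether anything downstream actually reads ARGS after startup is an open question): keep a copy of the IJulia arguments before clearing them.

```julia
if isdefined(Main, :IJulia) && Main.IJulia.inited
    # Stash the IJulia arguments (e.g. the kernel-*.json connection file)
    # in case they're needed later, then clear ARGS so Luna's Scan
    # constructor doesn't try to parse them.
    ijulia_args = copy(ARGS)
    empty!(ARGS)
end
```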

