
adjoint simulation crash upon calling "run" #2309

Open
yugeniom opened this issue Nov 13, 2022 · 10 comments
Comments

@yugeniom

yugeniom commented Nov 13, 2022

Hello,
I am experiencing a reproducible core dump with the latest MPI-enabled Conda pymeep package: the adjoint simulation crashes immediately when its "run" method is called. The crash does not happen if I tweak the adjoint simulation to have no active sources (by replacing "if np.any(dJ)" with a constant False); in that case the adjoint simulation starts and runs correctly, which first made me suspect a field divergence. But the crash is immediate on the run call and independent of resolution, so it does not look like a divergence during timestepping.
Could this somehow be related to the calculation of the adjoint source itself?

Here is the code:

import math
import meep as mp
import meep.adjoint as mpa
import autograd.numpy as npa

class iterazione(object):
    def J(self, alpha):
        sourcealpha = 1
        return npa.square(npa.divide(npa.abs(alpha), npa.abs(sourcealpha)))

    def make_sdf(self, stepvar):
        
        
        Len = 29
        
        #design_region_resolution = 500
        Nx = int(45e3)
        x = npa.linspace(0, 30,num=Nx)
        y = npa.linspace(-0.5,0.5,1)
        xv, yv = npa.meshgrid(x,y)
               
        x0=0    
        sgfun = 0
        i=0
        while x0 < Len:
            x01 = npa.add(x0,stepvar[0]/2)
            ordsg = 32 #order of supergaussian function
            sgfac = 2 * (2 * npa.log(2))**(1/ordsg) #supergaussian fwhm to sigma
            sguide = npa.divide(stepvar[2*i],sgfac) #sigma of supergaussian
            sgfun = npa.add(npa.exp(npa.divide(-npa.power(npa.add(xv,-x01),ordsg),(2*npa.power(sguide,ordsg)))),sgfun)
            x0 = npa.add(x0,npa.add(stepvar[2*i +1],stepvar[2*i]))
            i += 1
        
        return sgfun.flatten()

    def __init__(self, x):
        resolution = 50
       
                        
        PMLthk = 1.0
        Sx = 45 + 30
        Sy = 7.0
        Len = 45

        Nx = int(45e3)
                        

        Sz = 3.6071

        WGwidth = 1
        lambda_c = 0.815
        frequencies = 1/npa.linspace(0.814,0.816,1)
        fcen = 1/lambda_c
        width = 0.2
        src_xshift = 10.83
        src_waist_radius = 10
        src_focus = 0
        src_rot_angle = -8  # CCW rotation angle about z axis (0: +y axis)
        grating_etch_depth = 0.0723
        grating_length = 45
        F0 = 0.2
        Fmax = 0.618
        R = 48685
        self.cell_size = mp.Vector3(Sx, Sy)
        pml_layers = [mp.PML(PMLthk)]

        fwidth = width * fcen
        source_center  = mp.Vector3(src_xshift - Len/2, 2)
        source_size = mp.Vector3(35, 0)

        src = mp.GaussianSource(frequency=fcen,fwidth=fwidth)
        beam_x0 = mp.Vector3(0,0)    # beam focus (relative to source center)

        beam_kdir = mp.Vector3(0,-1,0).rotate(mp.Vector3(0,0,1),math.radians(src_rot_angle))  # beam propagation direction
        beam_w0 = src_waist_radius  # beam waist radius
        beam_E0 = mp.Vector3(0,0,1)
        self.source = [mp.GaussianBeamSource(src,
                                         center=source_center,
                                         size=source_size,
                                         beam_x0=beam_x0,
                                         beam_kdir=beam_kdir,
                                         beam_w0=beam_w0,
                                         beam_E0=beam_E0)]

        


        
        
        
        
        
        ALGAAS02 = mp.Medium(index=3.42241)
        ALGAAS04 = mp.Medium(index=3.30241)
        ALGAAS08 = mp.Medium(index=3.06241)
        GAAS = mp.Medium(index=3.5457)
        AIR = mp.Medium(index=1)
        
        thk_clad = Sy/2
        thk_barr_bot02 = 0.01
        thk_barr_bot04 = 0.12
        thk_qw =  0.02
        thk_barr_top04 = 0.12
        thk_barr_top02 = 0.02
        thk_air = Sy - thk_clad - thk_barr_bot02 - thk_barr_bot04 - thk_qw - thk_barr_top04 - thk_barr_top02
        etchdepth = 0.03
        
        
        
        self.design_params1 = self.make_sdf(x)
        design_variables1 = mp.MaterialGrid(mp.Vector3(Nx, 1), AIR, ALGAAS02,
                                            weights=self.design_params1.reshape(Nx, 1),
                                            grid_type='U_MEAN', do_averaging=True, beta=1)
        self.design_region1 = mpa.DesignRegion(
            design_variables1, volume=mp.Volume(center=mp.Vector3(x= 0,y=-Sy/2 + thk_clad + thk_barr_bot02 + thk_barr_bot04 + thk_qw + thk_barr_top04 +thk_barr_top02/2), size=mp.Vector3(45, thk_barr_top02))
        )

        
        
        self.geometry = [
            mp.Block(
                center=mp.Vector3(y=-Sy/2 + thk_clad/2), material=ALGAAS08, size=mp.Vector3(Sx, thk_clad)
            ),  # clad
            mp.Block(
                center=mp.Vector3(y=-Sy/2 + thk_clad + thk_barr_bot02/2), material=ALGAAS02, size=mp.Vector3(Sx,thk_barr_bot02)
            ),  # bottom barrier02
            mp.Block(
                center=mp.Vector3(y=-Sy/2 + thk_clad + thk_barr_bot02 + thk_barr_bot04/2), material=ALGAAS04, size=mp.Vector3(Sx, thk_barr_bot04)
            ),  # bottom barrier04
            mp.Block(
                center=mp.Vector3(y=-Sy/2 + thk_clad + thk_barr_bot02 + thk_barr_bot04 + thk_qw/2), material=GAAS, size=mp.Vector3(Sx,thk_qw)
            ),  # qw
            mp.Block(
                center=mp.Vector3(y=-Sy/2 + thk_clad + thk_barr_bot02 + thk_barr_bot04 + thk_qw + thk_barr_top04/2), material=ALGAAS04, size=mp.Vector3(Sx,thk_barr_top04)
            ),  # top barrier04

            mp.Block(
                center=mp.Vector3(y=-Sy/2 + thk_clad + thk_barr_bot02 + thk_barr_bot04 + thk_qw + thk_barr_top04 +thk_barr_top02/2), material=ALGAAS02, size=mp.Vector3(Sx,thk_barr_top02)
            ),  # top barrier02
            mp.Block(
                center=self.design_region1.center, size=self.design_region1.size, material=design_variables1
            ),  # design region
            mp.Block(
                center=mp.Vector3(y=-Sy/2 + thk_clad + thk_barr_bot02 + thk_barr_bot04 + thk_qw + thk_barr_top04 + thk_barr_top02 + thk_air/2), material=AIR, size=mp.Vector3(Sx,thk_air)
            ),
        ]
      
        
        
        self.sim = mp.Simulation(
            cell_size=self.cell_size,
            boundary_layers=pml_layers,
            geometry=self.geometry,
            sources=self.source,
            eps_averaging=True,
            resolution=resolution,
        )
    
        
        mode_chosen = 1
        self.TE_left = mpa.EigenmodeCoefficient(
            self.sim, mp.Volume(center=mp.Vector3(-Sx/2 + 2, -Sy/2 + thk_clad + thk_barr_bot02 + thk_barr_bot04 + thk_qw/2,0), size=mp.Vector3(y=2)),kpoint_func=lambda a,b: mp.Vector3(x=-1),mode=mode_chosen)
        self.ob_list = [self.TE_left]





        self.opt = mpa.OptimizationProblem(
            simulation=self.sim,
            objective_functions=self.J,
            objective_arguments=self.ob_list,
            design_regions=[self.design_region1],
            fcen=fcen,
            df=0,
            nf=1,
            remapping=True
        )

If I use plot2D on the adjoint simulation object before calling run, I can see that the adjoint source is correctly placed where the EigenmodeCoefficient monitor was:

# here I initialize a grating in the design region
from meep.adjoint import utils

Sx = 45 + 30
Sy = 3
Nx = int(45e3)
x1 = npa.array([0.1275] * Nx)

iterazione0 = iterazione(x1.flatten())
iterazione0.opt.update_design([iterazione0.make_sdf(x1)])

vals = []

def get_slice(sim):
    vals.append(sim.get_array(center=mp.Vector3(0, 0, 0), size=mp.Vector3(Sx, 7, 0), component=mp.Ez))

iterazione0.opt.forward_run()
print(iterazione0.opt.f0)
iterazione0.opt.prepare_adjoint_run()
if iterazione0.opt.sim.k_point:
    iterazione0.opt.sim.change_k_point(-1 * iterazione0.opt.sim.k_point)
iterazione0.opt.adjoint_design_region_monitors = []
for ar in range(len(iterazione0.opt.objective_functions)):
    # Reset the fields
    iterazione0.opt.sim.restart_fields()
    iterazione0.opt.sim.clear_dft_monitors()

    # Update the sources
    iterazione0.opt.sim.change_sources(iterazione0.opt.adjoint_sources[ar])

    # register design dft fields
    iterazione0.opt.adjoint_design_region_monitors.append(utils.install_design_region_monitors(
        iterazione0.opt.sim, iterazione0.opt.design_regions, iterazione0.opt.frequencies,
        iterazione0.opt.decimation_factor))
    iterazione0.opt.sim._evaluate_dft_objects()
    iterazione0.opt.sim.plot2D()
    # Adjoint run
    #iterazione0.opt.sim.run(until=200)

(screenshot: plot2D of the adjoint simulation showing the adjoint source placement)

@yugeniom yugeniom changed the title adjoint solver crash with autograd numpy adjoint solver crash at run call Nov 13, 2022
@yugeniom yugeniom changed the title adjoint solver crash at run call adjoint simulation crash upon calling "run" Nov 13, 2022
@smartalecH
Collaborator

I think this was fixed in #2251. Can you try this using master?

@oskooi it's probably a good time to push another release (ideally we'd reinstate the nightly builds)

@yugeniom
Author

BTW, I forgot to mention: the Conda pymeep version which produces the error is Linux mpich 1.24.0 with Python 3.10. Is master already part of the latest Conda nightly build, or should I compile it myself?

@oskooi
Collaborator

oskooi commented Nov 14, 2022

The Conda package is for version 1.24 (July 2022). Currently, there is no Conda package for nightly builds.

We will do a version 1.25 tarball release this week which includes #2251. This will be followed shortly thereafter by an update to the Conda package.

@yugeniom
Author

Hello there,
I just retested this on the new release 1.25 with #2251 included (Conda package 1.25 MPI on Python 3.11); unfortunately, the problem I described above remains: the adjoint solver still crashes immediately upon calling the "run" method.

@yugeniom
Author

Hello,
I was running this with Ipython previously.
If I run from shell with plain python, I get some more information on the crash:

python3: step_db.cpp:40: void meep::fields::step_db(meep::field_type): Assertion `changed_materials' failed.
Aborted

I hope this can help...

@smartalecH
Collaborator

python3: step_db.cpp:40: void meep::fields::step_db(meep::field_type): Assertion `changed_materials' failed.
Aborted

This means the simulation wasn't initialized properly. You have to be careful with how you set up the adjoint run "manually".

If you're trying to do this yourself, I suggest you carefully go through the adjoint_run() code and ensure you are setting things up properly. From your snippet above, it doesn't look right.

@yugeniom
Author

I'll take a look, thanks. In any case, the problem already appears when simply calling the OptimizationProblem object ("opt" here) with the initial parameters, which from what I see starts a forward and an adjoint run under the hood; the crash occurs at this very first adjoint run. The manual setup in the snippet above was just a debugging attempt.

@smartalecH
Collaborator

The tests on master are passing, and all the tests use the __call__() method.

I would start with one of those tests, and gradually modify it to your current example. This way you can systematically identify the issue.
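For reference, the tests drive everything through __call__() rather than the manual setup above; adapted to this example, the whole forward + adjoint pipeline would be (a sketch, reusing the iterazione class and x1 array from the snippets above — the tuple return of f0 and the gradient is how the meep.adjoint OptimizationProblem is normally used):

```python
# Sketch: let OptimizationProblem run the forward and adjoint
# simulations itself instead of wiring them up manually.
iterazione0 = iterazione(x1.flatten())

# __call__ with the design parameters runs the forward simulation,
# then the adjoint simulation, and returns (objective, gradient).
f0, dJ_du = iterazione0.opt([iterazione0.make_sdf(x1)])
```

Starting from this entry point and a known-passing test makes it easier to bisect which change in the setup triggers the crash.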

@smartalecH
Collaborator

smartalecH commented Dec 19, 2022

Ok this is definitely a bug. I encountered it myself while running something similar to the above. There's an easy (temporary) fix though.

The problem is that the forward run, which is when the simulation object is generated, only allocates certain DFT field components because of the polarization in 2D. When you move on to an adjoint simulation, the mode source that's generated isn't always perfectly polarized (there's some noise) unless you specify a parity (e.g. mp.EVEN_Y + mp.ODD_Z). Specifying a parity forces the adjoint source to match the polarization of the forward source, and all internal checks should pass. All the tests use a parity, which is why this went undetected...
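Concretely, the parity would be specified on the mode monitor in the example above; a minimal sketch, assuming the eig_parity keyword of mpa.EigenmodeCoefficient is the right knob here and with y_mon standing in for the monitor's y coordinate from the original code:

```python
# Sketch: constrain the eigenmode monitor to a fixed parity so the
# generated adjoint mode source is purely polarized (2D cell assumed).
self.TE_left = mpa.EigenmodeCoefficient(
    self.sim,
    mp.Volume(center=mp.Vector3(-Sx/2 + 2, y_mon, 0), size=mp.Vector3(y=2)),
    mode=1,
    eig_parity=mp.EVEN_Y + mp.ODD_Z,  # parity suggested in the comment above
    kpoint_func=lambda freq, n: mp.Vector3(x=-1),
)
```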

In theory, it shouldn't be a problem to run an adjoint source with additional field components... it's not clear to me why the changed_materials assertion is what's failing, so I'm worried something else is going on (another consequence of the problematic #1855 PR...). Either way, we need to look into it eventually.

@yugeniom
Author

Hi @smartalecH,
I don't know if this additional piece of information helps while you debug the root cause: the apparent workaround I found was to set force_all_components = True in the Simulation class.
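For reference, the workaround amounts to one extra keyword in the constructor from the original snippet (a sketch based on the setup above):

```python
# Sketch: force Meep to allocate every field component so the forward
# and adjoint runs agree on which DFT components exist.
self.sim = mp.Simulation(
    cell_size=self.cell_size,
    boundary_layers=pml_layers,
    geometry=self.geometry,
    sources=self.source,
    eps_averaging=True,
    resolution=resolution,
    force_all_components=True,  # workaround for the adjoint crash
)
```

Note this allocates all field components everywhere, so it trades some memory and runtime for correctness until the underlying bug is fixed.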
