
adjoint simulation crash upon calling "run" #2309

Open
yugeniom opened this issue Nov 13, 2022 · 10 comments
Comments

@yugeniom

yugeniom commented Nov 13, 2022

Hello,
I am experiencing a reproducible core dump with the latest MPI-enabled Conda pymeep package: the adjoint simulation crashes immediately when its "run" method is called. The crash does not happen if I tweak the adjoint simulation to have no active sources (by replacing "if np.any(dJ)" with a constant False); in that case the adjoint simulation starts and runs correctly, which first made me suspect a field divergence. But the crash is immediate on the run call and independent of resolution, so it does not look like a divergence during timestepping.
Could this somehow be related to the calculation of the adjoint source itself?

Here is the code:

import math
import meep as mp
import meep.adjoint as mpa
import autograd.numpy as npa

class iterazione(object):
    def J(self, alpha):
        sourcealpha = 1
        return npa.square(npa.divide(npa.abs(alpha), npa.abs(sourcealpha)))

    def make_sdf(self, stepvar):
        
        
        Len = 29
        
        #design_region_resolution = 500
        Nx = int(45e3)
        x = npa.linspace(0, 30,num=Nx)
        y = npa.linspace(-0.5,0.5,1)
        xv, yv = npa.meshgrid(x,y)
               
        x0=0    
        sgfun = 0
        i=0
        while x0 < Len:
            x01 = npa.add(x0,stepvar[0]/2)
            ordsg = 32 #order of supergaussian function
            sgfac = 2 * (2 * npa.log(2))**(1/ordsg) #supergaussian fwhm to sigma
            sguide = npa.divide(stepvar[2*i],sgfac) #sigma of supergaussian
            sgfun = npa.add(npa.exp(npa.divide(-npa.power(npa.add(xv,-x01),ordsg),(2*npa.power(sguide,ordsg)))),sgfun)
            x0 = npa.add(x0,npa.add(stepvar[2*i +1],stepvar[2*i]))
            i += 1
        
        return sgfun.flatten()

    def __init__(self, x):
        resolution = 50
       
                        
        PMLthk = 1.0
        Sx = 45 + 30
        Sy = 7.0
        Len = 45

        Nx = int(45e3)
                        

        Sz = 3.6071

        WGwidth = 1
        lambda_c = 0.815
        frequencies = 1/npa.linspace(0.814,0.816,1)
        fcen = 1/lambda_c
        width = 0.2
        src_xshift = 10.83
        src_waist_radius = 10
        src_focus = 0
        src_rot_angle = -8  # CCW rotation angle about z axis (0: +y axis)
        grating_etch_depth = 0.0723
        grating_length = 45
        F0 = 0.2
        Fmax = 0.618
        R = 48685
        self.cell_size = mp.Vector3(Sx, Sy)
        pml_layers = [mp.PML(PMLthk)]

        fwidth = width * fcen
        source_center  = mp.Vector3(src_xshift - Len/2, 2)
        source_size = mp.Vector3(35, 0)

        src = mp.GaussianSource(frequency=fcen,fwidth=fwidth)
        beam_x0 = mp.Vector3(0,0)    # beam focus (relative to source center)

        beam_kdir = mp.Vector3(0,-1,0).rotate(mp.Vector3(0,0,1),math.radians(src_rot_angle))  # beam propagation direction
        beam_w0 = src_waist_radius  # beam waist radius
        beam_E0 = mp.Vector3(0,0,1)
        self.source = [mp.GaussianBeamSource(src,
                                         center=source_center,
                                         size=source_size,
                                         beam_x0=beam_x0,
                                         beam_kdir=beam_kdir,
                                         beam_w0=beam_w0,
                                         beam_E0=beam_E0)]

        


        
        
        
        
        
        ALGAAS02 = mp.Medium(index=3.42241)
        ALGAAS04 = mp.Medium(index=3.30241)
        ALGAAS08 = mp.Medium(index=3.06241)
        GAAS = mp.Medium(index=3.5457)
        AIR = mp.Medium(index=1)
        
        thk_clad = Sy/2
        thk_barr_bot02 = 0.01
        thk_barr_bot04 = 0.12
        thk_qw =  0.02
        thk_barr_top04 = 0.12
        thk_barr_top02 = 0.02
        thk_air = Sy - thk_clad - thk_barr_bot02 - thk_barr_bot04 - thk_qw - thk_barr_top04 - thk_barr_top02
        etchdepth = 0.03
        
        
        
        self.design_params1 = self.make_sdf(x)
        design_variables1 = mp.MaterialGrid(mp.Vector3(Nx, 1), AIR, ALGAAS02,
                                            weights=self.design_params1.reshape(Nx, 1),
                                            grid_type='U_MEAN', do_averaging=True, beta=1)
        self.design_region1 = mpa.DesignRegion(
            design_variables1, volume=mp.Volume(center=mp.Vector3(x= 0,y=-Sy/2 + thk_clad + thk_barr_bot02 + thk_barr_bot04 + thk_qw + thk_barr_top04 +thk_barr_top02/2), size=mp.Vector3(45, thk_barr_top02))
        )

        
        
        self.geometry = [
            mp.Block(
                center=mp.Vector3(y=-Sy/2 + thk_clad/2), material=ALGAAS08, size=mp.Vector3(Sx, thk_clad)
            ),  # clad
            mp.Block(
                center=mp.Vector3(y=-Sy/2 + thk_clad + thk_barr_bot02/2), material=ALGAAS02, size=mp.Vector3(Sx,thk_barr_bot02)
            ),  # bottom barrier02
            mp.Block(
                center=mp.Vector3(y=-Sy/2 + thk_clad + thk_barr_bot02 + thk_barr_bot04/2), material=ALGAAS04, size=mp.Vector3(Sx, thk_barr_bot04)
            ),  # bottom barrier04
            mp.Block(
                center=mp.Vector3(y=-Sy/2 + thk_clad + thk_barr_bot02 + thk_barr_bot04 + thk_qw/2), material=GAAS, size=mp.Vector3(Sx,thk_qw)
            ),  # qw
            mp.Block(
                center=mp.Vector3(y=-Sy/2 + thk_clad + thk_barr_bot02 + thk_barr_bot04 + thk_qw + thk_barr_top04/2), material=ALGAAS04, size=mp.Vector3(Sx,thk_barr_top04)
            ),  # top barrier04

            mp.Block(
                center=mp.Vector3(y=-Sy/2 + thk_clad + thk_barr_bot02 + thk_barr_bot04 + thk_qw + thk_barr_top04 +thk_barr_top02/2), material=ALGAAS02, size=mp.Vector3(Sx,thk_barr_top02)
            ),  # top barrier02
            mp.Block(
                center=self.design_region1.center, size=self.design_region1.size, material=design_variables1
            ),  # design region
            mp.Block(
                center=mp.Vector3(y=-Sy/2 + thk_clad + thk_barr_bot02 + thk_barr_bot04 + thk_qw + thk_barr_top04 + thk_barr_top02 + thk_air/2), material=AIR, size=mp.Vector3(Sx,thk_air)
            ),
        ]
      
        
        
        self.sim = mp.Simulation(
            cell_size=self.cell_size,
            boundary_layers=pml_layers,
            geometry=self.geometry,
            sources=self.source,
            eps_averaging=True,
            resolution=resolution,
        )
    
        
        mode_chosen = 1
        self.TE_left = mpa.EigenmodeCoefficient(
            self.sim, mp.Volume(center=mp.Vector3(-Sx/2 + 2, -Sy/2 + thk_clad + thk_barr_bot02 + thk_barr_bot04 + thk_qw/2,0), size=mp.Vector3(y=2)),kpoint_func=lambda a,b: mp.Vector3(x=-1),mode=mode_chosen)
        self.ob_list = [self.TE_left]





        self.opt = mpa.OptimizationProblem(
            simulation=self.sim,
            objective_functions=self.J,
            objective_arguments=self.ob_list,
            design_regions=[self.design_region1],
            fcen=fcen,
            df=0,
            nf=1,
            remapping=True
        )

If I use plot2D on the adjoint simulation object before calling run, I can see that the adjoint source is correctly placed where the EigenmodeCoefficient monitor was:

# here I initialize a grating in the design region
from meep.adjoint import utils

Sx = 45 + 30
Sy = 3
Nx = int(45e3)
x1 = npa.array([0.1275] * Nx)

iterazione0 = iterazione(x1.flatten())
iterazione0.opt.update_design([iterazione0.make_sdf(x1)])

vals = []

def get_slice(sim):
    vals.append(sim.get_array(center=mp.Vector3(0, 0, 0), size=mp.Vector3(Sx, 7, 0), component=mp.Ez))

iterazione0.opt.forward_run()
print(iterazione0.opt.f0)
iterazione0.opt.prepare_adjoint_run()
if iterazione0.opt.sim.k_point:
    iterazione0.opt.sim.change_k_point(-1 * iterazione0.opt.sim.k_point)
iterazione0.opt.adjoint_design_region_monitors = []
for ar in range(len(iterazione0.opt.objective_functions)):
    # Reset the fields
    iterazione0.opt.sim.restart_fields()
    iterazione0.opt.sim.clear_dft_monitors()

    # Update the sources
    iterazione0.opt.sim.change_sources(iterazione0.opt.adjoint_sources[ar])

    # register design dft fields
    iterazione0.opt.adjoint_design_region_monitors.append(utils.install_design_region_monitors(
        iterazione0.opt.sim, iterazione0.opt.design_regions, iterazione0.opt.frequencies,
        iterazione0.opt.decimation_factor))
    iterazione0.opt.sim._evaluate_dft_objects()
    iterazione0.opt.sim.plot2D()
    # Adjoint run
    #iterazione0.opt.sim.run(until=200)

(screenshot: plot2D of the adjoint simulation showing the adjoint source placement)

@yugeniom yugeniom changed the title adjoint solver crash with autograd numpy adjoint solver crash at run call Nov 13, 2022
@yugeniom yugeniom changed the title adjoint solver crash at run call adjoint simulation crash upon calling "run" Nov 13, 2022
@smartalecH
Collaborator

I think this was fixed in #2251. Can you try this using master?

@oskooi it's probably a good time to push another release (ideally we'd reinstate the nightly builds)

@yugeniom
Author

BTW, I forgot to mention: the Conda pymeep version which produces the error is Linux mpich 1.24.0 with Python 3.10. Is master already part of the latest Conda nightly build, or should I compile it myself?

@oskooi
Collaborator

oskooi commented Nov 14, 2022

The Conda package is for version 1.24 (July 2022). Currently, there is no Conda package for nightly builds.

We will do a version 1.25 tarball release this week which includes #2251. This will be followed shortly thereafter by an update to the Conda package.

@yugeniom
Author

Hello there,
I just retested this on the new release 1.25 with #2251 included (Conda package 1.25 MPI on Python 3.11); unfortunately, the problem I described above remains: the adjoint solver still crashes immediately upon calling the "run" method.

@yugeniom
Author

Hello,
I was running this with Ipython previously.
If I run from shell with plain python, I get some more information on the crash:

python3: step_db.cpp:40: void meep::fields::step_db(meep::field_type): Assertion `changed_materials' failed.
Aborted

I hope this can help...

@smartalecH
Collaborator

python3: step_db.cpp:40: void meep::fields::step_db(meep::field_type): Assertion `changed_materials' failed.
Aborted

This means the simulation wasn't initialized properly. You have to be careful with how you set up the adjoint run "manually".

If you're trying to do this yourself, I suggest you carefully go through the adjoint_run() code and ensure you are setting things up properly. From your snippet above, it doesn't look right.

@yugeniom
Author

I'll take a look, thanks. In any case, the problem already appears when simply calling the OptimizationProblem object ("opt" here) with the initial parameters, which from what I see starts a forward and an adjoint run under the hood; the crash occurs at this very first adjoint run. The manual setup in the snippet above was just a debugging attempt.

@smartalecH
Collaborator

The tests on master are passing, and all the tests use the __call__() method.

I would start with one of those tests, and gradually modify it to your current example. This way you can systematically identify the issue.
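For reference, the tests drive everything through __call__() rather than the manual setup above; adapted to this example, the whole forward + adjoint pipeline would be (a sketch, reusing the iterazione class and x1 array from the snippets above — the tuple return of f0 and the gradient is how the meep.adjoint OptimizationProblem is normally used):

```python
# Sketch: let OptimizationProblem run the forward and adjoint
# simulations itself instead of wiring them up manually.
iterazione0 = iterazione(x1.flatten())

# __call__ with the design parameters runs the forward simulation,
# then the adjoint simulation, and returns (objective, gradient).
f0, dJ_du = iterazione0.opt([iterazione0.make_sdf(x1)])
```

Starting from this entry point and a known-passing test makes it easier to bisect which change in the setup triggers the crash.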

@smartalecH
Collaborator

smartalecH commented Dec 19, 2022

Ok this is definitely a bug. I encountered it myself while running something similar to the above. There's an easy (temporary) fix though.

The problem is that the forward run, which is when the simulation object is generated, only allocates certain DFT field components because of the polarization in 2D. When you move on to an adjoint simulation, the mode source that's generated isn't always perfectly polarized (there's some noise) unless you specify a parity (e.g. mp.EVEN_Y + mp.ODD_Z). Specifying a parity forces the adjoint source to match the polarization of the forward source, and all internal checks should pass. All the tests use a parity, which is why this went undetected...
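Concretely, the parity would be specified on the mode monitor in the example above; a minimal sketch, assuming the eig_parity keyword of mpa.EigenmodeCoefficient is the right knob here and with y_mon standing in for the monitor's y coordinate from the original code:

```python
# Sketch: constrain the eigenmode monitor to a fixed parity so the
# generated adjoint mode source is purely polarized (2D cell assumed).
self.TE_left = mpa.EigenmodeCoefficient(
    self.sim,
    mp.Volume(center=mp.Vector3(-Sx/2 + 2, y_mon, 0), size=mp.Vector3(y=2)),
    mode=1,
    eig_parity=mp.EVEN_Y + mp.ODD_Z,  # parity suggested in the comment above
    kpoint_func=lambda freq, n: mp.Vector3(x=-1),
)
```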

In theory, it shouldn't be a problem to run an adjoint source with additional field components... it's not clear to me why the changed_materials assertion is what's failing, so I'm worried something else is going on (another consequence of the problematic #1855 PR...). Either way, we need to look into it eventually.

@yugeniom
Author

Hi @smartalecH,
I don't know if this additional piece of information helps while you debug the root cause: the apparent workaround I found was to set force_all_components = True in the Simulation class.
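For reference, the workaround amounts to one extra keyword in the constructor from the original snippet (a sketch based on the setup above):

```python
# Sketch: force Meep to allocate every field component so the forward
# and adjoint runs agree on which DFT components exist.
self.sim = mp.Simulation(
    cell_size=self.cell_size,
    boundary_layers=pml_layers,
    geometry=self.geometry,
    sources=self.source,
    eps_averaging=True,
    resolution=resolution,
    force_all_components=True,  # workaround for the adjoint crash
)
```

Note this allocates all field components everywhere, so it trades some memory and runtime for correctness until the underlying bug is fixed.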
