Disappearing cells in merged layout #589

Closed
lukasc-ubc opened this issue Jun 18, 2020 · 35 comments

@lukasc-ubc

lukasc-ubc commented Jun 18, 2020

Dear @klayoutmatthias

I had the unfortunate experience of receiving a chip that was manufactured with thousands of components missing. I traced the problem to the merging of layouts I did using KLayout 0.26.5: in some cases (not always), KLayout removed array instances and kept only one of the elements in the array.

Example GDS file:

Here is the original GDS: https://www.dropbox.com/s/njerj6fvpjl01n3/EBeam_LukasChrostowski_20200307_2340.gds?dl=1

and a screenshot:

[Screenshot: original layout]

Here are the steps to reproduce the problem, which results in the screenshot below:

  • open the above GDS
  • In the Cell window, right click and Copy, then Deep.
  • create a new layout
  • Paste
  • then notice that one of the Instances disappears.

[Screenshot: layout after pasting, with one of the instances missing]

The disappearing isn't consistent. When I merged thousands of circuits, approximately 50% of them failed.

Also, I just downloaded the latest KLayout (LW-klayout-0.26.6-macOS-Catalina-1-qt5Ana3-Rana3Pana3.dmg), and the problem persists there too. This is a pretty big problem...

I can also confirm that this wasn't an issue in KLayout 0.26.3 (the Python2 version), as I am fixing my layout using the older working version...

Thank you
Lukas

@klayoutmatthias
Collaborator

klayoutmatthias commented Jun 18, 2020

Hi Lukas,

I'm sorry to hear this.

But I'm even more sorry that I can't reproduce it on either Windows or Linux (0.26.6). I have created a short video showing what I did: https://youtu.be/p39kwyZygYs. I followed your instructions without being able to reproduce the problem.

As these cells were probably PCells, maybe the problem is related to this. Are you sure you can reproduce the problem with the static cells of the file you uploaded?

Otherwise the problem might be related to Mac build (again?).

Matthias

@lukasc-ubc
Author

lukasc-ubc commented Jun 19, 2020

Dear @klayoutmatthias

Thank you for investigating. Too bad this problem only occurs on the Mac version.

I just tried on another Mac computer, running Catalina 10.15.5, using the same Anaconda3 build, LW-klayout-0.26.6-macOS-Catalina-1-qt5Ana3-Rana3Pana3.dmg. I get the same problem.

The previous computer (MacBook Pro) was running the older Mac OSX, Mojave, 10.14.6.

Note that the GDS I uploaded above does not contain PCells, and is regular GDS.

I saved the GDS as a GDS TXT file, turning off the PCell option: https://www.dropbox.com/s/0siychmpzimvqji/EBeam_LukasChrostowski_20200307_2340.txt?dl=1

I get the same problem.

Digging further, after copying the layout, I saved it as a TXT GDS. https://www.dropbox.com/s/6a76659ih4ke8vw/EBeam_LukasChrostowski_20200307_2340_copied.txt?dl=1

Here is the (problematic) difference between the two files:

AREF 
SNAME TE1550_SubGC_neg31_oxide_WB90
COLROW 1 2 
XY 138000: 103000
138000: 103000
138000: 357000
ENDEL 

AREF 
SNAME TE1550_SubGC_neg31_oxide_WB90
COLROW 1 2 
XY 38000: 103000
38000: 103000
38000: 357000
ENDEL 
AREF 
SNAME TE1550_SubGC_neg31_oxide_WB90
COLROW 1 2 
XY 138000: 103000
138000: 103000
138000: 357000
ENDEL 
SREF 
SNAME TE1550_SubGC_neg31_oxide_WB90
XY 38000: 103000
ENDEL 

For some reason, it is dropping the AREF and turning it into an SREF.

@Kazzz-S: could you please check on your Mac computer if you get this problem?

I also turned on "noisy" debug level while doing the copy and paste operation:

Renamed layout from  to EBeam_LukasChrostowski_20200307_2340.gds
Created layout EBeam_LukasChrostowski_20200307_2340.gds
Loading: started
Loading file: /Users/lukasc/Downloads/EBeam_LukasChrostowski_20200307_2340.gds with technology: EBeam
File read: started
File read: 0 (user) 0 (sys) 0.002 (wall) 0.00M (mem)
Sorting: started
Updating relations: started
Updating relations: 0 (user) 0 (sys) 0 (wall) 0.00M (mem)
Topological sort: started
Topological sort: 0 (user) 0 (sys) 0 (wall) 0.00M (mem)
Updating bounding boxes: started
Updating bounding boxes: 0 (user) 0 (sys) 0 (wall) 0.00M (mem)
Sorting shapes: started
Sorting shapes: 0 (user) 0 (sys) 0 (wall) 0.00M (mem)
Sorting instances: started
Sorting instances: 0 (user) 0 (sys) 0 (wall) 0.00M (mem)
Sorting: 0 (user) 0 (sys) 0 (wall) 0.00M (mem)
Memory usage per master category:
  Layout info    : 7735 (used) 8132 (reqd)
  Cell info      : 1828 (used) 1828 (reqd)
  Instances      : 608 (used) 608 (reqd)
  Shapes info    : 5556 (used) 7444 (reqd)
  Total          : 15727 (used) 18012 (reqd)
Loading: 0 (user) 0 (sys) 0.006 (wall) 0.00M (mem)
PaintEvent: started
Preparing to draw
Preparing to draw: started
Preparing to draw: 0 (user) 0 (sys) 0 (wall) 0.00M (mem)
Redrawing: started
Drawing decorations
Drawing decorations: started
Drawing decorations: 0 (user) 0 (sys) 0 (wall) 0.00M (mem)
Drawing layer: 
Drawing layer: started
Drawing layer: 0.01 (user) 0 (sys) 0.002 (wall) 0.00M (mem)
Cell cache: TE1550_SubGC_neg31_oxide_WB90 (1:r0 *0.00228106852 0,0) 77 x 49 -> 4 hits
Cell cache: ROUND_PATH$3 (1:r0 *0.00228106852 0,0) 53 x 157 -> 2 hits
Cell cache: ROUND_PATH$2 (1:r0 *0.00228106852 0,0) 133 x 293 -> 2 hits
Cell cache: Compact_YBranch_open (1:r180 *0.00228106852 0,0) 37 x 16 -> 2 hits
Cell cache: Compact_YBranch_open (1:r0 *0.00228106852 0,0) 37 x 16 -> 2 hits
Cell cache: EBeam_LukasChrostowski_v5 (2:r0 *0.00228106852 0,0) 1383 x 938 -> 1 hits
Cell cache: _EBeam_LukasChrostowski_v5 (3:r0 *0.00228106852 0,0) 1383 x 938 -> 1 hits
Cell cache: EBeam_LukasChrostowski.gds_20200307_2340 (4:r0 *0.00228106852 0,0) 1383 x 938 -> 1 hits
Drawing layer: 
Drawing layer: started
Drawing layer: 0 (user) 0 (sys) 0 (wall) 0.00M (mem)
Cell cache: EBeam_LukasChrostowski.gds_20200307_2340 (4:r0 *0.00228106852 0,0) 1383 x 938 -> 1 hits
Drawing layer: 
Drawing layer: started
Drawing layer: 0 (user) 0 (sys) 0 (wall) 0.00M (mem)
Cell cache: EBeam_LukasChrostowski_v5 (2:r0 *0.00228106852 0,0) 1383 x 938 -> 1 hits
Cell cache: _EBeam_LukasChrostowski_v5 (3:r0 *0.00228106852 0,0) 1383 x 938 -> 1 hits
Cell cache: EBeam_LukasChrostowski.gds_20200307_2340 (4:r0 *0.00228106852 0,0) 1383 x 938 -> 1 hits
Drawing frames and guiding shapes
Drawing frames and guiding shapes: started
Drawing frames and guiding shapes: 0 (user) 0 (sys) 0 (wall) 0.00M (mem)
Redrawing: 0.01 (user) 0 (sys) 0.004 (wall) 0.00M (mem)
PaintEvent: 0.01 (user) 0 (sys) 0.015 (wall) 0.00M (mem)
PaintEvent: started
PaintEvent: 0 (user) 0 (sys) 0.001 (wall) 0.00M (mem)
PaintEvent: started
PaintEvent: 0 (user) 0 (sys) 0.001 (wall) 0.00M (mem)
PaintEvent: started
PaintEvent: 0 (user) 0 (sys) 0.001 (wall) 0.00M (mem)
PaintEvent: started
PaintEvent: 0 (user) 0 (sys) 0.002 (wall) 0.00M (mem)
PaintEvent: started
PaintEvent: 0 (user) 0 (sys) 0.001 (wall) 0.00M (mem)
PaintEvent: started
PaintEvent: 0.01 (user) 0 (sys) 0.001 (wall) 0.00M (mem)
Created layout L3
Sorting: started
Updating relations: started
Updating relations: 0 (user) 0 (sys) 0 (wall) 0.00M (mem)
Topological sort: started
Topological sort: 0 (user) 0 (sys) 0 (wall) 0.00M (mem)
Sorting instances: started
Sorting instances: 0 (user) 0 (sys) 0 (wall) 0.00M (mem)
Sorting: 0 (user) 0 (sys) 0 (wall) 0.00M (mem)
PaintEvent: started
Preparing to draw
Preparing to draw: started
Preparing to draw: 0 (user) 0 (sys) 0 (wall) 0.00M (mem)
Redrawing: started
Drawing decorations
Drawing decorations: started
Drawing decorations: 0 (user) 0 (sys) 0 (wall) 0.00M (mem)
Drawing frames and guiding shapes
Drawing frames and guiding shapes: started
Drawing frames and guiding shapes: 0 (user) 0 (sys) 0 (wall) 0.00M (mem)
Redrawing: 0 (user) 0 (sys) 0 (wall) 0.00M (mem)
PaintEvent: 0.01 (user) 0 (sys) 0.005 (wall) 0.00M (mem)
Sorting: started
Updating relations: started
Updating relations: 0 (user) 0 (sys) 0 (wall) 0.00M (mem)
Topological sort: started
Topological sort: 0 (user) 0 (sys) 0 (wall) 0.00M (mem)
Updating bounding boxes: started
Updating bounding boxes: 0 (user) 0 (sys) 0 (wall) 0.00M (mem)
Sorting shapes: started
Sorting shapes: 0 (user) 0 (sys) 0 (wall) 0.00M (mem)
Sorting instances: started
Sorting instances: 0 (user) 0 (sys) 0 (wall) 0.00M (mem)
Sorting: 0 (user) 0 (sys) 0 (wall) 0.00M (mem)
Sorting: started
Updating relations: started
Updating relations: 0 (user) 0 (sys) 0 (wall) 0.00M (mem)
Topological sort: started
Topological sort: 0 (user) 0 (sys) 0 (wall) 0.00M (mem)
Updating bounding boxes: started
Updating bounding boxes: 0 (user) 0 (sys) 0 (wall) 0.00M (mem)
Sorting shapes: started
Sorting shapes: 0 (user) 0 (sys) 0 (wall) 0.00M (mem)
Sorting instances: started
Sorting instances: 0 (user) 0 (sys) 0.001 (wall) 0.00M (mem)
Sorting: 0 (user) 0 (sys) 0.001 (wall) 0.00M (mem)
PaintEvent: started
Preparing to draw
Preparing to draw: started
Preparing to draw: 0 (user) 0 (sys) 0 (wall) 0.00M (mem)
Redrawing: started
Drawing decorations
Drawing decorations: started
Drawing decorations: 0 (user) 0 (sys) 0 (wall) 0.00M (mem)
Drawing layer: 
Drawing layer: started
Drawing layer: 0 (user) 0 (sys) 0 (wall) 0.00M (mem)
Cell cache: EBeam_LukasChrostowski.gds_20200307_2340 (1:r0 *0.00228106852 0,0) 1383 x 938 -> 1 hits
Drawing layer: 
Drawing layer: started
Drawing layer: 0 (user) 0 (sys) 0 (wall) 0.00M (mem)
Cell cache: EBeam_LukasChrostowski.gds_20200307_2340 (1:r0 *0.00228106852 0,0) 1383 x 938 -> 1 hits
Drawing layer: 
Drawing layer: started
Drawing layer: 0 (user) 0 (sys) 0 (wall) 0.00M (mem)
Cell cache: EBeam_LukasChrostowski.gds_20200307_2340 (1:r0 *0.00228106852 0,0) 1383 x 938 -> 1 hits
Drawing frames and guiding shapes
Drawing frames and guiding shapes: started
Drawing frames and guiding shapes: 0 (user) 0 (sys) 0 (wall) 0.00M (mem)
Redrawing: 0 (user) 0 (sys) 0.001 (wall) 0.00M (mem)
PaintEvent: 0.01 (user) 0.01 (sys) 0.013 (wall) 0.00M (mem)
PaintEvent: started
PaintEvent: 0 (user) 0 (sys) 0.004 (wall) 0.00M (mem)
PaintEvent: started
Preparing to draw
Preparing to draw: started
Preparing to draw: 0 (user) 0 (sys) 0 (wall) 0.00M (mem)
Redrawing: started
Drawing decorations
Drawing decorations: started
Drawing decorations: 0 (user) 0 (sys) 0 (wall) 0.00M (mem)
Drawing layer: 
Drawing layer: started
Drawing layer: 0 (user) 0 (sys) 0.002 (wall) 0.00M (mem)
Cell cache: Compact_YBranch_open (1:r180 *0.00228106852 0,0) 37 x 16 -> 2 hits
Cell cache: Compact_YBranch_open (1:r0 *0.00228106852 0,0) 37 x 16 -> 2 hits
Cell cache: ROUND_PATH$2 (1:r0 *0.00228106852 0,0) 133 x 293 -> 2 hits
Cell cache: ROUND_PATH$3 (1:r0 *0.00228106852 0,0) 53 x 157 -> 2 hits
Cell cache: TE1550_SubGC_neg31_oxide_WB90 (1:r0 *0.00228106852 0,0) 77 x 49 -> 3 hits
Cell cache: EBeam_LukasChrostowski_v5 (2:r0 *0.00228106852 0,0) 1383 x 938 -> 1 hits
Cell cache: _EBeam_LukasChrostowski_v5 (3:r0 *0.00228106852 0,0) 1383 x 938 -> 1 hits
Cell cache: EBeam_LukasChrostowski.gds_20200307_2340 (4:r0 *0.00228106852 0,0) 1383 x 938 -> 1 hits
Drawing layer: 
Drawing layer: started
Drawing layer: 0 (user) 0 (sys) 0 (wall) 0.00M (mem)
Cell cache: EBeam_LukasChrostowski.gds_20200307_2340 (4:r0 *0.00228106852 0,0) 1383 x 938 -> 1 hits
Drawing layer: 
Drawing layer: started
Drawing layer: 0 (user) 0 (sys) 0.001 (wall) 0.00M (mem)
Cell cache: EBeam_LukasChrostowski_v5 (2:r0 *0.00228106852 0,0) 1383 x 938 -> 1 hits
Cell cache: _EBeam_LukasChrostowski_v5 (3:r0 *0.00228106852 0,0) 1383 x 938 -> 1 hits
Cell cache: EBeam_LukasChrostowski.gds_20200307_2340 (4:r0 *0.00228106852 0,0) 1383 x 938 -> 1 hits
Drawing frames and guiding shapes
Drawing frames and guiding shapes: started
Drawing frames and guiding shapes: 0 (user) 0 (sys) 0 (wall) 0.00M (mem)
Redrawing: 0 (user) 0.01 (sys) 0.003 (wall) 0.00M (mem)
PaintEvent: 0.01 (user) 0.01 (sys) 0.013 (wall) 0.00M (mem)
PaintEvent: started
PaintEvent: 0 (user) 0 (sys) 0.002 (wall) 0.00M (mem)
PaintEvent: started
PaintEvent: 0 (user) 0 (sys) 0.001 (wall) 0.00M (mem)
PaintEvent: started
PaintEvent: 0 (user) 0 (sys) 0.002 (wall) 0.00M (mem)

Nothing stands out here, except for the "Cell cache: TE1550_SubGC_neg31_oxide_WB90" lines, which show 4 hits at first and then only 3 hits.

@lukasc-ubc
Author

lukasc-ubc commented Jun 19, 2020

@klayoutmatthias @Kazzz-S

My student (Hossam) and I found that the problem does not exist on the version HW-klayout-0.26.6-macOS-Catalina-1-qt5Brew-RsysPhb37.dmg (Catalina (10.15) HomeBrew-Python3 included - experimental).

The version that I found the problem in is LW-klayout-0.26.6-macOS-Catalina-1-qt5Ana3-Rana3Pana3.dmg (Catalina (10.15) Anaconda3-environment based - experimental / needs the corresponding development environment).

I haven't tried the other Mac OSX builds, especially not the Python 2 ones, but I suspect that they work.

@Kazzz-S
Contributor

Kazzz-S commented Jun 19, 2020

Dear @lukasc-ubc,

> Here are the steps to reproduce the problem, which results in the screenshot below:
>
>   • open the above GDS
>   • In the Cell window, right click and Copy, then Deep.
>   • create a new layout
>   • Paste
>   • then notice that one of the Instances disappears.

I have followed the above steps with EBeam_LukasChrostowski_20200307_2340.gds using the four packages below.

  1. HW-klayout-0.26.6-macOS-Catalina-1-qt5Brew-RsysPhb37.dmg (embedded Homebrew Python 3.x)
  2. LW-klayout-0.26.6-macOS-Catalina-1-qt5Ana3-Rana3Pana3.dmg (shares Anaconda3 env.)
  3. LW-klayout-0.26.6-macOS-Catalina-1-qt5Brew-Rhb27Phb37.dmg (shares Homebrew env.)
  4. ST-klayout-0.26.6-macOS-Catalina-1-qt5MP-RsysPsys.dmg (OS-bundled Python 2.7)

I repeated this several times but, unfortunately, could not reproduce the problem with any of the four.
The reason could be that I used the same machine on which I built those packages.

Regards,
Kazzz-S

@Kazzz-S
Contributor

Kazzz-S commented Jun 19, 2020

Dear @lukasc-ubc,

I've tested the two packages below on virtual machines different from the real machine used to build the packages. The problem did not reproduce there either.

  • LW-klayout-0.26.6-macOS-Catalina-1-qt5Ana3-Rana3Pana3.dmg (shares Anaconda3 env.)
  • LW-klayout-0.26.6-macOS-Catalina-1-qt5Brew-Rhb27Phb37.dmg (shares Homebrew env.)

Regards,
Kazzz-S

@klayoutmatthias
Collaborator

klayoutmatthias commented Jun 19, 2020

@klayoutmatthias
Collaborator

klayoutmatthias commented Jun 19, 2020

Update: I can reproduce the problem with the downloaded LW-klayout-0.26.6-macOS-Catalina-1-qt5Ana3-Rana3Pana3.dmg

Next step is to produce my own binary with Anaconda.

BTW: I faced some installation issues because Anaconda installs into /Users/matthias/opt/anaconda3 by default, whereas the KLayout binary expects it at /Applications/anaconda3. Creating a symbolic link solved the issue. I'm noting this for the record in case someone else runs into it.

Matthias

@Kazzz-S
Copy link
Contributor

Kazzz-S commented Jun 19, 2020

Dear @klayoutmatthias,

> Update: I can reproduce the problem with the downloaded LW-klayout-0.26.6-macOS-Catalina-1-qt5Ana3-Rana3Pana3.dmg
> Next step is to produce my own binary with Anaconda.

Thank you for testing with Anaconda3.

> BTW: I faced some installation issues because Anaconda by default will install into /Users/matthias/opt/anaconda3 and the KLayout binary expects it at /Applications/anaconda3. Creating a link solved the issue. This is for the records in case someone else faces this issue.

Yes, this symbolic link is required to build the tool, too.
I'll update the associated document accordingly.

Regards,
Kazzz-S

@lukasc-ubc
Author

lukasc-ubc commented Jun 20, 2020

@klayoutmatthias
Collaborator

klayoutmatthias commented Jun 20, 2020

Sure .. AREFs have to work.

It's never a person causing an issue; it's always the conditions. I'm just facing too many combinations to stay in control of them. With Linux I can deploy a hive of Docker images for testing; with Windows there are fewer variations. macOS is dark matter to me.

I'm trying to debug now, but Anaconda is not providing debug versions of the Qt libraries ...

Matthias

@lukasc-ubc
Author

lukasc-ubc commented Jun 20, 2020

Thanks @klayoutmatthias, I am grateful for your support for Mac OSX. I can appreciate it is not easy especially now with the many versions such as LW, HW, Brew, Anaconda3.

FYI, in my edX course we had about 100 design submissions: about 20% on Mac OSX, 80% on Windows, and a couple on Linux.

@klayoutmatthias
Collaborator

klayoutmatthias commented Jun 20, 2020

Dear @lukasc-ubc and @Kazzz,

I have found the spot, but it's not easy.

It looks like one of those weird RTTI issues. This is the line that fails:

dbArray.h, 268++:

    basic_repository::iterator f = r->find ((ArrayBase *) &base);
    if (f != r->end ()) {
      return dynamic_cast <basic_array<Coord> *> (*f);    //  <--- "dynamic_cast" returns 0 but it must not
    } else {
      basic_array<Coord> *bb = base.clone ();
      bb->in_repository = true;
      r->insert (bb);
      return bb;
    }
  }

RTTI in shared objects is an old nemesis of mine.

C++ keeps type information in a structure called the vtable to implement polymorphism. The vtable tells an object where to find the actual implementation of a virtual method. This way, derived classes can provide different implementations of a virtual method by employing a different version of the vtable.

As the vtable is never modified, it's usually not created at run time but statically at compile time. Each C++ object keeps a pointer to this static table and uses it to find the function pointers it needs when a virtual method is called. This works nicely so far.

But the vtable is also used to identify a type. The value of the vtable pointer can basically be used as an efficient class identifier: to find out whether an object is of a particular type, you just need to compare its vtable pointer against the one of the class you are looking for. This is a very efficient operation and one of the implementation details that make C++ so efficient. This concept is called RTTI (run-time type information).

dynamic_cast takes this concept somewhat further: by checking not only for one specific vtable pointer but also for those of derived classes, dynamic_cast can decide whether an object is an instance of the requested class or one of its subclasses, and either cast the object pointer properly or return 0 if this is not the case.

The whole thing works well in single-binary scenarios. In shared object scenarios however, the C++ runtime needs to play dirty tricks. The reason is that in these scenarios, the same class instantiated in different shared objects has different vtable pointers. So just comparing vtable pointers for the purpose of identifying classes isn't sufficient anymore. To maintain the efficiency, the compilers have to do some magic and this is when trouble starts. In our case this magic somehow fails and the dynamic_cast delivers crap.

I've had such cases in the past more than once. Without deep assembler-level debugging it's difficult to pinpoint the true cause, but one big issue is ABI incompatibility of shared objects. This means that when mixing shared objects from different compilers, there is a certain chance that the RTTI tables of the different compile units are not compatible and this problem happens. Qt libraries in particular are susceptible to this issue, and I suspect that taking Qt from Anaconda is causing the problem here. If Anaconda's Qt is built with a different compiler or compiler version, the ABI might not be compatible. This would explain why other combinations work.

For the solution, I'm not sure what to propose. It may be possible to fix this particular line by somehow avoiding the dynamic_cast. But RTTI is so fundamental that we will surely see other issues as well. So right now, my only conclusion is to discourage use of the Anaconda Qt variant. Still, we don't yet have proof that there is a stable package.

I've tried running the test suite, with many failures as a result (331 failing tests, versus 313 on my good variant with Homebrew/system Ruby+Python). There seem to be some systematic issues in the hierarchical data processing domain, which are responsible for the majority of the failures, rather than the problem we are discussing here. I need to see if I can fix this (#493).

Once I have fixed the systematic issues, I can check whether the test suite is able to detect our issue here. If it is, continuous integration with automatic test suite runs would be a way to indicate the quality of a package, at least on the same level as for Linux or Windows.

Travis offers macOS support for continuous integration, but the usable variant (unlimited in terms of CPU) isn't free, and I don't have the resources to support another platform myself.

I have to say that debugging on macOS is really painful, especially with the VirtualBox VM. It only gives me a 640x480 screen and the keyboard and mouse mapping is driving me mad (Xcode won't give me the backslash character, for example; no chance). I don't want to continue this longer than necessary.

Matthias

@lukasc-ubc
Author

lukasc-ubc commented Jun 20, 2020

@klayoutmatthias

I would be happy to send you hardware (laptop with a normal screen), and pay for the Travis CI (assuming it is not prohibitive).

@klayoutmatthias
Collaborator

klayoutmatthias commented Jun 20, 2020

Thanks. I'd happily accept this proposal if you could include a device which adds a couple of hours to my days :)

With my current budget of 24 hours I simply don't have the time to add another platform to my support list.

I wonder why no one is picking up the business of professionalization. The GPL actually allows everyone to sell the software. Granted, no one would pay for something they get for free. But if the deal includes stability, maintenance and support, many probably would. Basically anyone on earth could provide this service for KLayout. It's the business model Red Hat turned into a billion-dollar company. And there is increasing business interest in open source EDA. So where are the entrepreneurs?

Regarding the technical issues (@Kazzz you might be interested):

  • I have created a branch "macos_fixes" based on 0.26
  • #493 (LVS examples fail on macOS when using deep mode) is fixed there, along with a couple of other nasty issues (unrelated to Lukas's disappearing cells issue)
  • With these fixes I can run the test suite without a lot of noise generated by these problems

Here are my results:

  • On the Qt5HomeBrew/rsys/psys build I am down to a couple of compatibility issues and pymod fails (apparently due to dynamic linker security settings). But nothing serious so far. So I'd consider this combination a fairly stable one.
  • On the Qt5Anaconda/rana3/pana3 build the testsuite fails reporting exactly the issue of this bug (vanishing array) and a couple more. So it's fair to say that this combination isn't stable and a fix will require more changes than the RTTI issue found so far.

So my conclusions are:

  • QA based on the test suite appears to be feasible
  • At least the Qt5Anaconda/rana3/pana3 combination is not production stable

In order to run the test suite, one needs a build. The test suite binary is called "ut_runner". To run the test suite within the build I change to the build folder and use:

TESTSRC=.. TESTMP=testtmp DYLD_LIBRARY_PATH=$(pwd):$(pwd)/db_plugins ./ut_runner -c

I'll try to rectify the remaining compatibility issues and try other combinations. My prime suspect is still Qt5Anaconda.

Best regards,

Matthias

@klayoutmatthias
Collaborator

klayoutmatthias commented Jun 20, 2020

@Kazzz ... many thanks by the way. The build4mac.py script is really useful!

Best regards,

Matthias

@Kazzz-S
Contributor

Kazzz-S commented Jun 20, 2020

Dear @klayoutmatthias

Thanks a lot for the very educational and detailed explanations.

As you pointed out, the Qt5 from Anaconda3 is suspicious because it is a bit old (5.9.7); by comparison, Homebrew has 5.15.0 (recently updated) and MacPorts has 5.14.2.

Also, thanks for the new macos_fixes branch. I'm going to test it and get back to you if I find anything.

Regards,
Kazzz-S

@Kazzz-S
Contributor

Kazzz-S commented Jun 21, 2020

Dear @klayoutmatthias,

> Regarding the technical issues (@Kazzz you might be interested):
>
> I have created a branch "macos_fixes" based on 0.26
> #493 is fixed there along with a couple of nasty other issues (but unrelated to Lukas's disappearing cells issue)
> With this fixes I can run the testsuite without a lot of noise generated by these problems

I have built four different versions using the macos_fixes branch and run ut_runner.
The results are stored under Dropbox: 0.26.6-Tests/ut_runner.
Please refer to "ReadMe.txt" there.

I would like to know how to diagnose the outputs of ut_runner (or would it be better to open a new ticket?).

Regards,
Kazzz-S

@klayoutmatthias
Collaborator

klayoutmatthias commented Jun 23, 2020

Dear @Kazzz,

thanks for the logs.

I'm working on reducing the fails for MacOS.

  • Some tests fail because of sensitivity to architecture or STL details. Those will need updated golden data.
  • Other tests fail because they require test data which I cannot make public for confidentiality reasons. I need to make these tests skip instead of fail.
  • Finally, the pymod tests fail on macOS because of a runtime linker error I don't quite understand so far.

Before a diagnosis is possible, I want to bring down the number of spurious failures.

I have also performed some experiments building against Anaconda Qt5, but without success. I basically can't figure out which compiler the Anaconda people are using. I think a stable, Anaconda-based build should be possible if we used the same compiler as they do.

Best regards,

Matthias

@klayoutmatthias
Collaborator

klayoutmatthias commented Jun 26, 2020

Dear @Kazzz,

I have updated the macos_fixes branch once again.

The pymod tests are still failing. Apparently macOS does not allow loading dylibs with relative rpaths into "restricted" binaries, which in this case is the Python interpreter. To me this looks like bad design: isn't the Python interpreter supposed to be able to load module dylibs? For application isolation this might make sense, but Python should be open by design.

Anyway, I don't know how to fix them, and they don't seem to be required anyway. I'd suggest excluding them for now with this option:

ut_runner -x pymod ... 

I have fixed the netlisting fails which brought me to some deeper refactoring of this code while debugging some unexpected deltas. In the end I hope the netlist compare / LVS feature benefits from this activity too.

From your logs I see that some tests with private data fail; they should actually be skipped, not fail. I assume there is a directory called "private" in the top-level folder. If you remove it, these tests should pass too.

Best regards,

Matthias

@Kazzz-S
Contributor

Kazzz-S commented Jun 27, 2020

Dear @klayoutmatthias,

Thank you so much for pushing a4c0235. I have built and tested it with ut_runner -x pymod ... .
The log files are available under maxos-fixes-a4c02357/ of Dropbox: 0.26.6-Tests/ut_runner.
Now the outputs are much cleaner :-).

> I have fixed the netlisting fails which brought me to some deeper refactoring of this code while debugging some unexpected deltas. In the end I hope the netlist compare / LVS feature benefits from this activity too.

I hope so, too. Even though my main work environment is Linux, I'm supporting Mac expecting this kind of potential improvement.

I'm also going to study how to run ut_runner with pymod.

With warm regards,
Kazzz-S

@klayoutmatthias
Collaborator

klayoutmatthias commented Jun 27, 2020

Dear @Kazzz,

Many thanks for the logs. The builds 1 + 2 look very good now.

I wonder why the Homebrew tests stall. That is not a particularly critical test. I'm using Qt5Brew/Rsys/Psys. Is your Homebrew build using Python and Ruby from Homebrew? Maybe I need to test this combination too.

Even the Anaconda3 log looks good. Wasn't this the combination which caused the initial problems? The rba:basic fail is ugly, but not critical - garbage collection is pretty unpredictable on some Ruby versions, hence cleanup does not always take place as the tests expect.

Thanks and best regards,

Matthias

@klayoutmatthias
Copy link
Collaborator

klayoutmatthias commented Jun 27, 2020

Update: for me, the combination of Qt5+Ruby+Python taken from Anaconda3 still gives a lot of fails. I cannot recommend this combination, but I'm very curious what your "Anaconda3" combination is :)

@Kazzz-S
Contributor

Kazzz-S commented Jun 27, 2020

Dear @klayoutmatthias,

Thank you for checking the logs.

> I wonder why the Homebrew tests stall. That is not a particularly critical test. I'm using Qt5Brew/Rsys/Psys. The Homebrew build of yours - is it using Python and Ruby from Homebrew? Maybe I need to test this combination too.

Yes. As the long log file name below implies, all modules (Qt5, Ruby, Python) are from Homebrew.

  • QATest_a4c02357_2020_0627_1425__qt5Brew.build.macos-Catalina-release-Rhb27Phb37.log

More precisely,

(base) MacBookPro2{sekigawa}(1)$ brew info qt5
qt: stable 5.15.0 (bottled), HEAD [keg-only] <== This is the newest among all packages.

(base) MacBookPro2{sekigawa}(3)$ brew info ruby
ruby: stable 2.7.1 (bottled), HEAD [keg-only]

(base) MacBookPro2{sekigawa}(4)$ brew info python
python: stable 3.7.7 (bottled), HEAD

Regarding Anaconda3: QATest_a4c02357_2020_0627_1438__qt5Ana3.build.macos-Catalina-release-Rana3Pana3.log

(base) MacBookPro2{sekigawa}(1)$ conda info

     active environment : base
    active env location : /Applications/anaconda3 <== I have installed anaconda3 here; not under $HOME/opt/anaconda3/
            shell level : 1
       user config file : /Users/sekigawa/.condarc
 populated config files : 
          conda version : 4.8.3
    conda-build version : 3.18.11
         python version : 3.7.7.final.0
       virtual packages : __osx=10.15.5
       base environment : /Applications/anaconda3  (writable)
           channel URLs : https://repo.anaconda.com/pkgs/main/osx-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/osx-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /Applications/anaconda3/pkgs
                          /Users/sekigawa/.conda/pkgs
       envs directories : /Applications/anaconda3/envs
                          /Users/sekigawa/.conda/envs
               platform : osx-64
             user-agent : conda/4.8.3 requests/2.24.0 CPython/3.7.7 Darwin/19.5.0 OSX/10.15.5
                UID:GID : 501:20
             netrc file : None
           offline mode : False
----
(base) MacBookPro2{sekigawa}(2)$ conda list | grep qt
pyqt                      5.9.2            py37h655552a_2  
qt                        5.9.7                h468cd18_1  <=== This is the oldest among all packages.
qtawesome                 0.7.2                      py_0  
qtconsole                 4.7.5                      py_0  
qtpy                      1.9.0                      py_0  
sphinxcontrib-qthelp      1.0.3                      py_0 
----
(base) MacBookPro2{sekigawa}(3)$ conda list | grep ruby
ruby                      2.5.1                h7107397_0 <===
----
(base) MacBookPro2{sekigawa}(4)$ conda list | grep python
ipython                   7.15.0                   py37_0  
ipython_genutils          0.2.0                    py37_0  
msgpack-python            1.0.0            py37h04f5b5a_1  
python                    3.7.7           hc70fcce_0_cpython   <===
python-dateutil           2.8.1                      py_0  
python-jsonrpc-server     0.3.4                      py_0  
python-language-server    0.31.10                  py37_0  
python-libarchive-c       2.9                        py_0  
python.app                2                       py37_10 

Regarding MacPorts: QATest_a4c02357_2020_0627_1216__qt5MP.build.macos-Catalina-release-Rmp26Pmp37.log

(base) MacBookPro2{sekigawa}(1)$ port installed qt5
The following ports are currently installed:
  qt5 @5.14.2_0 (active) <===

(base) MacBookPro2{sekigawa}(2)$ port installed ruby*
The following ports are currently installed:
  ruby26 @2.6.6_0 (active) <===
  ruby_select @1.1_0 (active)

(base) MacBookPro2{sekigawa}(3)$ port installed python*
The following ports are currently installed:
  python2_select @0.0_3 (active)
  python3_select @0.0_1 (active)
  python27 @2.7.18_1 (active)
  python37 @3.7.7_0 (active) <=== 
  python38 @3.8.3_0 (active)
  python_select @0.3_8 (active)

Regards,
Kazzz-S

@klayoutmatthias
Collaborator

klayoutmatthias commented Jul 4, 2020

Hi @Kazzz,

My Anaconda installation is pretty similar:

$ conda info 

     active environment : base
    active env location : /Users/matthias/opt/anaconda3
            shell level : 1
       user config file : /Users/matthias/.condarc
 populated config files : /Users/matthias/.condarc
          conda version : 4.8.3
    conda-build version : 3.18.11
         python version : 3.7.6.final.0
       virtual packages : __osx=10.15.1
       base environment : /Users/matthias/opt/anaconda3  (writable)
           channel URLs : https://repo.anaconda.com/pkgs/main/osx-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/osx-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /Users/matthias/opt/anaconda3/pkgs
                          /Users/matthias/.conda/pkgs
       envs directories : /Users/matthias/opt/anaconda3/envs
                          /Users/matthias/.conda/envs
               platform : osx-64
             user-agent : conda/4.8.3 requests/2.22.0 CPython/3.7.6 Darwin/19.0.0 OSX/10.15.1
                UID:GID : 501:20
             netrc file : None
           offline mode : False

$ conda list | grep "qt "
pyqt                      5.9.2            py37h655552a_2  
qt                        5.9.7                h468cd18_1     (same as yours)

$ conda list | grep "ruby "
ruby                      2.5.1                h74228e1_0  

$ conda list | grep "python "
ipython                   7.12.0           py37h5ca1d4c_0  
msgpack-python            0.6.1            py37h04f5b5a_1  
python                    3.7.6                h359304d_2  

This combination is giving me the test failures that led to this issue initially. As your log does not show these failures, my hypothesis now is that the Anaconda version makes a difference. I'll try to upgrade.
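The comparison in this exchange is between `conda list` outputs pasted by hand. As a minimal sketch (not part of conda or KLayout; `parse_conda_list` and `diff_packages` are hypothetical helper names), the same comparison can be made mechanically in Python, assuming the default whitespace-separated `name version build` columns. The sample data is copied from the two listings above. It only shows which packages differ, not which difference matters.

```python
def parse_conda_list(text):
    """Map package name -> (version, build string) from `conda list` output.

    Assumes the default layout: name, version, build, optional channel.
    Hand-written annotations such as "<===" are stripped first.
    """
    packages = {}
    for line in text.splitlines():
        line = line.split("<")[0].strip()  # drop "<==" style annotations
        if not line or line.startswith("#"):
            continue  # skip blank lines and the "# packages in ..." header
        fields = line.split()
        if len(fields) >= 3:
            packages[fields[0]] = (fields[1], fields[2])
    return packages


def diff_packages(a, b):
    """Return {name: (entry_in_a, entry_in_b)} for packages that differ.

    A package missing from one side appears with None for that side.
    """
    names = sorted(set(a) | set(b))
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}


# Listings copied from this thread: Matthias' VM (test failures) and
# Kazzz-S' machine (no failures in the QATest log).
matthias_env = parse_conda_list("""
qt                        5.9.7                h468cd18_1
ruby                      2.5.1                h74228e1_0
python                    3.7.6                h359304d_2
""")
kazzz_env = parse_conda_list("""
qt                        5.9.7                h468cd18_1
ruby                      2.5.1                h7107397_0
python                    3.7.7           hc70fcce_0_cpython
""")

for name, (left, right) in diff_packages(matthias_env, kazzz_env).items():
    print(name, left, "->", right)
```

Applied to these listings, the diff flags `python` and `ruby` (the Ruby build string differs even though the version is the same), while `qt` is identical.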

@Kazzz-S
Contributor

Kazzz-S commented Jul 4, 2020

Hi @klayoutmatthias,

Thanks for the info.
My real machine runs 10.15.5, the latest version, with some security patches applied.
Your virtual machine appears to be on 10.15.1. That difference could be one of the reasons.

Regards,
Kazzz-S

@klayoutmatthias
Collaborator

klayoutmatthias commented Jul 4, 2020

Dear @Kazzz,

In the meantime I updated to 10.15.5 and updated conda too. I just built a4c0235 with Qt5Ana3, Python and Ruby from Anaconda. The result: the problem is gone! The tests also mostly pass. Qt is still 5.9.7, but Python changed to 3.7.7 after "conda update".

So the root cause seems to be an incompatibility between the system and Anaconda, or between the Anaconda packages themselves. The solution seems to be building and running on 10.15.5 with the most recent Anaconda, or using the Homebrew or MacPorts packages.

Another lesson learned is that the test suite results are important. I'll try to fix the remaining failure in rba:basic; after that, with the pymod tests excluded, all tests should pass.

@lukasc-ubc: My only remaining question is whether you were running a macOS version older than 10.15.5, and whether upgrading alone would fix this issue without recompiling. I can't roll back (I did not make a VM snapshot), so I can't (and don't want to) debug this issue further.

@klayoutmatthias
Collaborator

klayoutmatthias commented Jul 4, 2020

Update: I just got a memory corruption issue I need to debug :(

@Kazzz-S
Contributor

Kazzz-S commented Jul 4, 2020

Dear @klayoutmatthias,

> In the meantime I updated to 10.15.5 and updated conda too. I just built a4c0235 with Qt5Ana3, Python and Ruby from Anaconda. The result: the problem is gone! The tests also mostly pass. Qt is still 5.9.7, but Python changed to 3.7.7 after "conda update".

Great!

> @lukasc-ubc: My only remaining question is whether you were running a macOS version older than 10.15.5, and whether upgrading alone would fix this issue without recompiling. I can't roll back (I did not make a VM snapshot), so I can't (and don't want to) debug this issue further.

Me, too

Kazzz-S

@lukasc-ubc
Author

lukasc-ubc commented Jul 5, 2020

@klayoutmatthias

I experienced the problem on Catalina 10.15.5.

     active environment : base
    active env location : /Users/lukasc/opt/anaconda3
            shell level : 1
       user config file : /Users/lukasc/.condarc
 populated config files : /Users/lukasc/.condarc
          conda version : 4.8.3
    conda-build version : 3.18.9
         python version : 3.7.4.final.0
       virtual packages : __osx=10.15.5
       base environment : /Users/lukasc/opt/anaconda3  (writable)
           channel URLs : https://repo.anaconda.com/pkgs/main/osx-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/osx-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /Users/lukasc/opt/anaconda3/pkgs
                          /Users/lukasc/.conda/pkgs
       envs directories : /Users/lukasc/opt/anaconda3/envs
                          /Users/lukasc/.conda/envs
               platform : osx-64
             user-agent : conda/4.8.3 requests/2.22.0 CPython/3.7.4 Darwin/19.5.0 OSX/10.15.5
                UID:GID : 501:20
             netrc file : None
           offline mode : False

$ conda list | grep "qt "
pyqt                      5.9.2            py37h655552a_2  
qt                        5.9.7                h468cd18_1  

$ conda list | grep "ruby "
ruby                      2.5.1                h74228e1_0  

$ conda list | grep "python "
ipython                   7.8.0            py37h39e3cac_0  
msgpack-python            0.6.1            py37h04f5b5a_1  
python                    3.7.4                h359304d_1  


@klayoutmatthias
Collaborator

klayoutmatthias commented Jul 5, 2020

Dear @Kazzz,

In a lengthy session, I managed to debug and fix the memory corruption issue I mentioned. It's not related to the initial problem, but the fix should make Ruby/Python scripting more stable. The bug might also be responsible for the stuck test you reported.

With this fix, the test suite now passes for the Anaconda-only builds (except for the pymod tests, which I excluded). Both the macos-fixes and the 0.26 branches are updated accordingly.

I don't think there is anything more I can do about the initial problem, but the macOS enhancements in general have been worth the effort. I suggest releasing the next minor version with these updates. If you aren't doing so already, please do a full rebuild on 10.15.5. If Lukas' problem still persists on 10.15.5 after that, my options are exhausted.

Best regards,

Matthias

@Kazzz-S
Contributor

Kazzz-S commented Jul 5, 2020

Dear @klayoutmatthias,

Thank you very much for your efforts.
I have just pulled both the macos-fixes and the 0.26 branches.
I'm going to full-rebuild after merging some macOS-specific minor improvements.
Once ready, I'll let you know.

With warm regards,
Kazzz-S

@Kazzz-S
Contributor

Kazzz-S commented Jul 8, 2020

Dear @klayoutmatthias,

Thanks a lot for the maintenance release version 0.26.7 (28bf525).
I have merged my changes (7d460cd) and built it for Catalina.
As usual, the different DMGs are stored in my Dropbox:

| No. | DMG name | Remarks |
|---|---|---|
| 1 | ST-klayout-0.26.7-macOS-Catalina-1-qt5MP-RsysPsys.dmg | shares OS-bundled Ruby & Python |
| 2 | LW-klayout-0.26.7-macOS-Catalina-1-qt5MP-Rmp26Pmp37.dmg | shares MacPorts env. New!!! |
| 3 | LW-klayout-0.26.7-macOS-Catalina-1-qt5Brew-Rhb27Phb37.dmg | shares Homebrew env. |
| 4 | LW-klayout-0.26.7-macOS-Catalina-1-qt5Ana3-Rana3Pana3.dmg | shares Anaconda3 env. |

I've stored the QATest log files under the ut_runner-7d460cdc/ directory.
In all four builds, the status is "All tests passed."
Once again, thank you very much for your effort.

Building for the other OS versions will take time because the virtual machines are too slow :-(
(For comparison, each build took about 30 minutes on my real machine with an SSD.)
Once they are ready, I'll open a related ticket as usual.

Warm regards,
Kazzz-S

@klayoutmatthias
Collaborator

klayoutmatthias commented Jul 12, 2020

Hi @Kazzz,

Many thanks for your continuous support! I have published these new DMGs.

With my VirtualBox Catalina 10.15.5 and the Anaconda3 package, I can no longer reproduce the issue. According to the logs, all builds pass the relevant tests, so I'm confident this release is of much better quality than the previous macOS releases.

This is my system configuration and the image I used:

[Screenshot: system configuration and image used]

@lukasc-ubc: I'm keeping my fingers crossed that this release will now work flawlessly on your machine too.

Best regards,

Matthias

@lukasc-ubc
Author

lukasc-ubc commented Aug 1, 2020

@klayoutmatthias
I tried LW-0.26.7 Ana3, and I still have the same issue.
[Screenshot]

@lukasc-ubc
Author

lukasc-ubc commented Aug 1, 2020

I have tested it in the HW build, and there is no problem there.

Question: for the versions that have Python built in, is it possible to install additional packages by running pip from the terminal? The way I currently do it in the HW version is:

    import pip
    pip.main(['install', 'scipy'])

This is not recommended (see pypa/pip#5599), but it works.
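For reference, pip's user guide recommends running pip in a separate process with `sys.executable -m pip` rather than calling `pip.main()` from inside a program. The sketch below follows that pattern. `pip_install_command` and `pip_install` are hypothetical helper names, and the sketch assumes you can pass the path of a real Python interpreter, because inside an embedded interpreter such as KLayout's, `sys.executable` may not point at a standalone Python binary.

```python
import subprocess
import sys


def pip_install_command(package, python=None):
    """Build a command that runs pip as a module of a Python interpreter.

    Assumption: in an embedded interpreter, sys.executable may point at the
    host application, so pass the interpreter path explicitly there.
    """
    return [python or sys.executable, "-m", "pip", "install", package]


def pip_install(package, python=None):
    """Run pip in a child process; raises CalledProcessError if pip fails."""
    subprocess.check_call(pip_install_command(package, python))


# Usage (hypothetical interpreter path): build the command without running it.
print(pip_install_command("scipy", "/usr/local/bin/python3"))
```

For the installed package to be importable inside KLayout, that interpreter presumably has to share its site-packages with the Python the KLayout build uses; this is an assumption, not something confirmed in this thread.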

Specifically, could you include the pip binary in the HW package?

Labels
Projects
None yet
Development

No branches or pull requests

3 participants