Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gdstk - 2 #165

Closed
wants to merge 9 commits into from
Closed

Gdstk - 2 #165

wants to merge 9 commits into from

Conversation

tvt173
Copy link

@tvt173 tvt173 commented Sep 20, 2022

@amccaugh , this PR builds on #163 and #164.

I got practically all the tests running, though not passing regressions. However, I currently hack it into working by disabling garbage collection in the tests. There appears to be an issue with gdstk's c++ based objects getting garbage collected by python prematurely. I haven't pinned down where or why this is happening though. I'm suspicious of the types which are getting inherited by phidl objects though.

@joamatab
@heitzmann
@alexsludds
@jenshnielsen
@Dominik-Vogel
@flaport
@SkandanC

@amccaugh
Copy link
Owner

Thanks for taking a look at this @tvt173 ! Just as a warning, I'm not likely to have a lot of time to work on this myself in the near future. That being said, I am very interested in this overall. Two questions immediately jump to mind that will tell whether this is a good investment of time or not:

  • Are the functions sufficiently similar + bug-free to prevent major errors from occurring in most (>99%) existing phidl layouts? Nothing more frustrating than when one of the packages you use gets an upgrade that breaks a bunch of your code.
  • Are the performance gains sufficiently large to warrant dealing with potential bugs? When using gdstk directly (as in, writing C++ code) I could easily believe they are. However, when you add on the overhead of Python->C function calling, what kind of performance increase could you expect for a real layout? See for example the discussion here, showing a fairly large overhead for a simple operation (~60ns of Python->C->Python overhead just for accessing a single variable)

What I'd like to focus on with these questions is the real layout aspect, and avoiding unrealistic examples/benchmarks. For instance, it won't matter to most users if you can generate 100k ellipses much quicker but that's swamped by slowdowns in useful functions like connect(). I'm guessing there should be decent performance increases -- especially with bounding box computations which are used a LOT in phidl -- but I'd really like to see examples before anyone spends a lot of time on this

@tvt173
Copy link
Author

tvt173 commented Sep 21, 2022

hi @amccaugh, no problem-- most of the thanks should go to @alexsludds for getting this started. i agree with your concerns, and I don't yet have good answers for them... i was interested in getting this to work though so that we could start to benchmark. i can say for sure that I see huge speedups in time spent reading/writing GDS files and with post-processing operations when using gdstk vs gdspy. so if you have a design which spends a significant amount of time importing other gds files or performing booleans, etc., I think you could safely expect a significant speedup ( i rewrote a postprocessing script in gdstk and it took less time to run the whole thing than to just open the input gds in gdspy!). it's probably hard to make blanket generalizations for everything else though.

i'd argue one more point though that gdspy is deprecated and gdstk is still alive and active. i'd be a bit concerned making all of my designs with a deprecated tool at the core...

anyways i'm a bit concerned with the memory access errors we were seeing in the tests (not sure if @heitzmann has seen this before or has any comment)-- as mentioned above, i had to disable garbage collecting completely to get them to run without getting "fatal errors". the interface between the c++ and python code is definitely not my forte...

i think it would be wise though to move from inheritance of these base gdspy/gdstk types to something like an adapter pattern so we don't have such a tight coupling between the libraries. would make the interface cleaner and easier to migrate like this if ever necessary again in the future!

@alexsludds
Copy link

Thanks @tvt173.
I'm curious to try some benchmarking. This is basic benchmarking stuff to do (draw 10,000 triangles etc), but I wanted to try something more realistic (full reticle tapeout of test structures or something).

Does anyone have an example reticle layout lying around that they made using phidl? It would be great to benchmark both version (gdspy vs gdstk) side by side for both speed and accuracy.

Best,
-Alex

@tvt173
Copy link
Author

tvt173 commented Sep 21, 2022

don't know about a reticle, but gdsfactory has a fairly large library of cells we can benchmark if we can get this branch of phidl to also work with gdsfactory
@joamatab

@amccaugh
Copy link
Owner

That's good to hear about the reading/writing GDSII files, I could definitely imagine that being a whole lot faster. What's really interesting to me though is the possibility of speeding up boolean operations -- I'm a little surprised actually, considering they're done in Clipper which is already C++. I wonder where the performance gain coming from, is it just that Python is pretty slow about recursively collecting all the polygons?

And w.r.t. reticles, I have some die designs which might serve as adequate benchmarks, I'll need to think about which ones might be suitable

@tvt173
Copy link
Author

tvt173 commented Sep 22, 2022

was also a bit surprised to see that, but it seems to be true. i actually found it to be much faster in one instance to copy the polygons from gdspy to gdstk, do the boolean, and copy back to gdspy than it was to just do the boolean in gdspy. if you look at the code a bit though, polygons in gdstk are fully defined in c++ and then the booleans directly operate on them (still in c++), which is quite a bit less convoluted than what was happening in gdspy. so i suppose it's reasonable to expect

@heitzmann
Copy link

anyways i'm a bit concerned with the memory access errors we were seeing in the tests (not sure if @heitzmann has seen this before or has any comment)-- as mentioned above, i had to disable garbage collecting completely to get them to run without getting "fatal errors". the interface between the c++ and python code is definitely not my forte...

There could be issues in the gdstk backend I'd have to look into more carefully (reference counting can be trick with so many references), although I haven't caught these sorts of issues with gdstk for a while now.

The problem is that lately I haven't had much free time to look into gdstk development, so I can't promise when I'll be able to follow up on this. In any case, having a concise example where the segfault happens would help a lot in getting started.

@joamatab
Copy link
Contributor

Hi Lucas,

my gdstk branch for phidl causes many segmentation faults when running pytest

see https://github.com/joamatab/phidl/tree/gdstk

@tvt173
Copy link
Author

tvt173 commented Sep 25, 2022

Also, in the current branch, you will get errors if you comment out instances of gc.disable() and run the tests

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants