Improve the IronPython interpreter's re-execution performance #6879
The aim of this PR is to improve the IronPython interpreter's performance by using a static container that acts as a cache, storing a copy of the previously executed and compiled script. If the script doesn't change between runs, the previously compiled copy is retrieved from the cache and reused.
This is a very common scenario, for example when you combine a custom node containing a Python script with a List.Map or one of the List.Combine nodes, or when the node has defined inputs and can be used straight away without any function-application nodes.
Currently, the script is rebuilt every time the ipy method is called, and that penalty quickly adds up for large data sets. The net benefit will greatly depend on the code complexity and on the size and structure of the data set. Here are some of my test results:
two flat lists; the node has undefined inputs but is programmed to work at a list-wide level, so the ipy interpreter is called only twice. The code is relatively simple, and we should expect virtually identical execution times between the two:
Edit: Case 2 was being bottlenecked by the Revit API. It was making too many element collector calls, which were slowing things down. I refactored the code and now the difference in performance is clearer.
one flat list of ~100 items fed through a List.Map node. The ipy interpreter is called once per item, for a total of a hundred executions. The code is slightly more complex than in the first example. With the change, we manage to halve the execution time:
two flat lists, one with 10 views and the other with 100 lines. The node is defined to work on a singleton level:
My take from this is that Python scripts set up to work at a singleton level will get the biggest performance uplift. This would also allow people to write simpler nodes and rely on Dynamo's native node lacing.
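The effect described above can be reproduced with a rough, self-contained timing sketch: when a node is mapped over a list, the same script runs once per item, so without a cache the compile cost is paid N times instead of once. (The script and call count here are stand-ins, not the actual test cases.)

```python
import time

# Stand-in for a moderately sized Python-node script.
source = "\n".join(f"x{i} = {i} * 2" for i in range(300))
N = 100  # e.g. one interpreter call per item of a 100-item list

start = time.perf_counter()
for _ in range(N):
    exec(compile(source, "<script>", "exec"), {})   # rebuilt every call
uncached = time.perf_counter() - start

start = time.perf_counter()
code = compile(source, "<script>", "exec")          # compiled once, reused
for _ in range(N):
    exec(code, {})
cached = time.perf_counter() - start

print(f"uncached: {uncached:.4f}s  cached: {cached:.4f}s")
```

On a singleton-level node the per-call work is small, so compilation dominates and the cached variant wins by the largest margin, matching the results above.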
@aparajit-pratap do you think storing the compiled code like that will affect anything negatively? Do you have any other ideas on how to improve performance?
@dimven disk IO is a big problem for Python. I think your reuse of the script engine caches the imported modules. This is a big saving in time, as modules like the entire Revit API and ProtoGeometry do not need to be reloaded from disk every time the Python node is called. This is why your third test case shows big improvements.
I think we'll need to carefully consider stateful bugs this could cause.
@mjkkirschner, thanks for looking into this. I revised the code in case 2, and now the difference is more apparent. I suspect that the additional compile step at line 50 helps. I also had to cache the code as a string, because after the compile step I could no longer extract it from the cached engine.
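The two-part cache entry described above can be sketched as follows (names are illustrative, not the PR's actual identifiers): since the raw source can no longer be read back out of the compiled engine, the cache keeps the source string alongside the compiled object and recompiles only when the incoming script differs from the stored copy.

```python
# Hypothetical two-part cache: the source text is stored next to the
# compiled code so the "has the script changed?" check stays possible.
_cached_source = None
_cached_code = None

def get_compiled(source):
    global _cached_source, _cached_code
    if source != _cached_source:             # script changed (or first run)
        _cached_code = compile(source, "<script>", "exec")
        _cached_source = source              # remember the text itself
    return _cached_code

first = get_compiled("OUT = 1")
second = get_compiled("OUT = 1")   # unchanged script: same object reused
print(first is second)  # True
```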
@dimven this is a clever fix indeed! Off the top of my head I can't think of any cases where this could lead to bugs due to maintaining state. It would be good to run the test suite on your branch to begin with, to ensure there are no regressions. Once we are confident in the changes, it would be good to merge this in. Will keep you posted. Thanks again for the fix, and keep them coming :)