Python Execution Speed & Debugging Project
Python is a core language supported by QuantConnect, yet Python algorithms run at roughly 1/10th the speed of their C# counterparts. We would like to bring the two languages to near parity through an in-depth analysis of each source of the slowdown and point-by-point improvements.
LEAN is a C# algorithmic trading engine which is able to import Python algorithms thanks to our fork of PythonNet. PythonNet translates the C# classes into the Python domain and carries across all the financial data needed. Although a single interaction with PythonNet is relatively fast, triggering it millions of times is slow.
Over 6 months the LEAN core team has investigated various technologies to achieve better debugging and better execution speeds. These designs included:
- Writing LEAN as a C++ library, imported to a C# and Python runner.
- Re-writing LEAN in CPython.
- Improving the PythonNet implementation.
The core requirements were: #1 Debugging. #2 Best Possible Speeds. Over time we tested each technology by implementing hello-world examples and benchmarking them. We dove into the Python-C API and discovered that PythonNet was invoking the same calls we would need for a native C++ implementation, and that a CPython implementation was much slower than invoking the C-API directly.
Given these lessons, we have narrowed our focus to improving and optimizing the LEAN-PythonNet interaction. The nice side effect is that we don't require a massive code rewrite of LEAN; it is a fairly low-risk way to make LEAN and Python better. There are three core concepts we can focus on for the most immediate results, similar to recycling! Reduce-Reuse-Recycle! :)
Find or create Python-native translations of C# types to improve the conversion speed. For example, instead of using a C# Decimal in Python, convert it to a Python float (a double-precision value). Instead of translating the CoarseFundamental object millions of times, create an equivalent in Python and transmit the simple type to the algorithm.
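As a rough illustration of this first idea, the sketch below uses Python's own decimal.Decimal as a stand-in for a C# Decimal arriving from the engine, and a hypothetical CoarseRow class (not a LEAN type) as the lightweight Python-side equivalent. It assumes the small precision loss from converting to float is acceptable for the algorithm.

```python
from decimal import Decimal


class CoarseRow:
    """Hypothetical Python-native stand-in for a coarse fundamental record."""

    __slots__ = ("symbol", "price", "dollar_volume")

    def __init__(self, symbol, price, dollar_volume):
        self.symbol = symbol
        # Convert once at the boundary so later arithmetic uses native floats.
        self.price = float(price)
        self.dollar_volume = float(dollar_volume)


def to_native(rows):
    """Translate (symbol, Decimal, Decimal) tuples into CoarseRow objects."""
    return [CoarseRow(symbol, price, volume) for symbol, price, volume in rows]


# Decimal values play the role of C# Decimals arriving from the engine.
incoming = [
    ("AAA", Decimal("10.50"), Decimal("1000000")),
    ("BBB", Decimal("20.25"), Decimal("2500000")),
]
native_rows = to_native(incoming)

# Filtering now touches only plain Python attributes.
liquid = [row.symbol for row in native_rows if row.dollar_volume > 2_000_000]
```

The point of the design is that the conversion cost is paid once per record, after which the algorithm works only with native Python values.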
Reuse "border crossings": the points where C# objects are translated into Python ones, or where a Python method is invoked. These points can often be optimized to reuse the object instead of re-creating it each time. The expensive reflection process of obtaining a reference to OnData, for example, can easily be cached and reused forever. When the AlgorithmManager invokes "algorithm.OnData", a heavy reflection process is started under the surface, which we should avoid (e.g. Reuse method).
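The caching itself would live on the C# side of LEAN, but the principle can be sketched in pure Python as an analogy. The Algorithm class and both runner functions below are hypothetical; the only point is that the handler is resolved by name once rather than on every call.

```python
class Algorithm:
    """Hypothetical algorithm exposing an OnData handler."""

    def __init__(self):
        self.calls = 0

    def OnData(self, data):
        self.calls += 1


def run_uncached(algorithm, slices):
    # Resolves the handler by name on every slice (repeated lookup cost).
    for data in slices:
        getattr(algorithm, "OnData")(data)


def run_cached(algorithm, slices):
    # Resolves the handler once, then reuses the same bound method.
    on_data = getattr(algorithm, "OnData")
    for data in slices:
        on_data(data)


slices = list(range(1000))
uncached, cached = Algorithm(), Algorithm()
run_uncached(uncached, slices)
run_cached(cached, slices)
```

Both runners deliver the same data to the handler; only the number of lookups differs.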
Reduce border crossings by re-engineering the class/eventing call architecture. The Python method invocation overhead is currently being paid millions of times. There could be substantial improvements from re-architecting parts of LEAN to process some data in batch form, potentially sending over larger "chunks" of data. This would reduce the number of border crossings (e.g. Reduce crossings).
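A minimal sketch of the batching idea, assuming a hypothetical handler that can accept either one value or a chunk of values. The handler counts its own invocations, which stand in here for border crossings; none of these names come from LEAN.

```python
class CountingHandler:
    """Hypothetical handler that counts how often it is invoked."""

    def __init__(self):
        self.crossings = 0
        self.total = 0.0

    def on_item(self, price):
        self.crossings += 1
        self.total += price

    def on_batch(self, prices):
        self.crossings += 1
        self.total += sum(prices)


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


prices = [float(i) for i in range(10_000)]

# One invocation per data point.
per_item = CountingHandler()
for price in prices:
    per_item.on_item(price)

# One invocation per chunk of 500 data points.
batched = CountingHandler()
for chunk in chunked(prices, 500):
    batched.on_batch(chunk)
```

With 10,000 data points and chunks of 500, the batched handler is invoked 20 times instead of 10,000 while processing the same data.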
Install LEAN with Python
- Install LEAN.
- Get LEAN working with Python.
- Get a set of sample data provided through the QuantConnect website.
- Draft a test algorithm exploiting the QC API in Python.
- Benchmark your performance - always record 3 values.
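For the benchmarking step, the text does not say which three values are meant; the sketch below assumes it means three repeated timings of the same run, so that a single noisy measurement is not mistaken for a real change. The workload function is a hypothetical placeholder for launching a backtest.

```python
import statistics
import timeit


def workload():
    """Hypothetical stand-in for one backtest run."""
    return sum(i * i for i in range(10_000))


# Three independent timings of the same workload, in seconds.
timings = timeit.repeat(workload, repeat=3, number=10)

summary = {
    "min": min(timings),
    "median": statistics.median(timings),
    "max": max(timings),
}
print(summary)
```

Comparing the same three figures before and after a change gives a rough sense of whether an improvement is larger than the run-to-run noise.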
Install PythonNet Fork
Find Improvements - Make PR
- Look deeply into how LEAN works.
- Analyze the calls to PythonNet and find hot spots with your favorite profiling tool.
- Reduce Python calls as much as possible; reuse objects.
- Test your fix. Submit a PR to LEAN or QuantConnect/PythonNet.
Pull requests must pass the regression tests. Before merging we will also run them through the full cloud regression test to make sure they do not introduce a breaking change.
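For the hot-spot analysis step, one possible starting point on the Python side is the standard-library cProfile module. The sketch below profiles a hypothetical workload (convert, on_data and backtest are placeholders, not LEAN functions) and sorts by call count to surface the functions invoked most often. It only sees Python-level calls; the C# side would need a separate .NET profiler.

```python
import cProfile
import io
import pstats


def convert(value):
    """Hypothetical stand-in for a per-value boundary conversion."""
    return float(value)


def on_data(values):
    return [convert(v) for v in values]


def backtest():
    for _ in range(200):
        on_data(range(500))


profiler = cProfile.Profile()
profiler.enable()
backtest()
profiler.disable()

# Sort by call count to surface the functions invoked most often.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("ncalls").print_stats(5)
report = buffer.getvalue()
print(report)
```

Functions with very high call counts and small per-call cost are the natural candidates for the reuse and batching ideas described earlier.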