Slow simulation in PySD #374
Comments
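Hi @jmf119
We have worked a lot on improving the performance of PySD. Nevertheless, it can still be far slower than Vensim, especially for models with many variables and subscripts.
Some suggestions (they can be applied when calling the run method, https://pysd.readthedocs.io/en/master/python_api/model_class.html#pysd.py_backend.model.Model.run):
Use the return_columns argument to select only the variables you need.
If you don't need to save variables for each step, make sure that you have a saveper value greater than time_step, or select the returned values with return_timestamps.
Note that you can also run parts of your model: if you split it by modules, you can run just some views:
https://pysd.readthedocs.io/en/master/advanced_usage.html#selecting-and-running-a-submodel
You can also run the model until a given time and then save all the states to run the simulation from that time on (this saves time if your model integrates a historic frame that is always the same):
https://pysd.readthedocs.io/en/master/advanced_usage.html#starting-simulations-from-an-end-state-of-another-simulation
We have improved PySD performance a lot in the last two years, but we still have a lot of work to do. We are planning to move the xarray.DataArray backend for subscripted variables to a numpy.ndarray backend, see #373. We also have several other open issues related to performance improvements: https://github.com/SDXorg/pysd/issues?q=is%3Aissue+is%3Aopen+label%3Aperformance
I would also like to invite you to help in the development of performance improvements if you are available. Any contribution is welcome! |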
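A minimal sketch of how these suggestions might look in code. It assumes `model` is the object returned by `pysd.read_vensim` and that `run` accepts `return_columns`, `return_timestamps`, `final_time` and `initial_condition`, and that `export` writes the current states to a file, as described in the linked documentation. The helper names, file names and variable names are hypothetical.

```python
def run_lean(model, variables, timestamps=None):
    """Run `model`, returning only `variables` (and only at `timestamps`, if given)."""
    kwargs = {"return_columns": list(variables)}
    if timestamps is not None:
        # Only these time values are returned, instead of every saved step.
        kwargs["return_timestamps"] = list(timestamps)
    return model.run(**kwargs)


def save_state(model, until, state_file):
    """Integrate up to time `until`, then store all model states in `state_file`."""
    model.run(final_time=until)
    model.export(state_file)


def run_from_state(model, state_file, variables, timestamps):
    """Continue a simulation from the states previously saved in `state_file`."""
    return model.run(
        initial_condition=state_file,
        return_columns=list(variables),
        return_timestamps=list(timestamps),
    )


# Hypothetical usage (file and variable names are placeholders):
#   import pysd
#   model = pysd.read_vensim("my_model.mdl")
#   df = run_lean(model, ["Population", "Births"], timestamps=range(2000, 2051, 10))
#   save_state(model, 2020, "state_2020.pic")
#   df = run_from_state(model, "state_2020.pic", ["Population"], [2030, 2040])
```

Restricting the returned columns and timestamps reduces how much output has to be stored, while restarting from a saved state skips re-integrating a historic period that never changes.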
Thank you for the suggestions Eneko, these do help improve performance speed.
Excited for what is in the works and appreciate all your helpful advice.
Honored to be invited to contribute at some point in the future.
… On Oct 2, 2022, at 1:42 PM, Eneko Martin-Martinez ***@***.***> wrote:
Hi @jmf119 <https://github.com/jmf119>
We have worked a lot on improving the performance of PySD. Nevertheless, it can still be far slower than Vensim, especially for models with many variables and subscripts.
Some suggestions would be (they can be applied when calling the run method <https://pysd.readthedocs.io/en/master/python_api/model_class.html#pysd.py_backend.model.Model.run>):
Use the return_columns argument to select only the variables you need.
If you don't need to save variables for each step, make sure that you have a saveper value greater than time_step or that you select the returning values with return_timestamps.
Note that you can also run parts of your model, if you split it by modules you can just run some views:
https://pysd.readthedocs.io/en/master/advanced_usage.html#selecting-and-running-a-submodel
You can also run the model until a given time and then save all the states to run the simulation from that time on (you could save time if your model integrates a historic frame that is always the same):
https://pysd.readthedocs.io/en/master/advanced_usage.html#starting-simulations-from-an-end-state-of-another-simulation
We have improved PySD performance a lot in the last two years, but we still have a lot of work to do. We are planning to move the xarray.DataArray backend for subscripted variables to a numpy.ndarray backend, see #373. We also have several other open issues related to performance improvements: https://github.com/SDXorg/pysd/issues?q=is%3Aissue+is%3Aopen+label%3Aperformance
I would also like to invite you to help in the development of performance improvements If you are available. Any contribution is welcome!
|
This is something I've been fighting with for most of the last week. I added a couple dozen new components to a model that used to run in 5-6s; after adding them, the model now takes ~38s to run. The underlying performance issue affects models that have a large number of components or a large number of time steps. The existing approach (as of pysd==3.12.0, pandas==2.1.1, and python==3.11.6) generates a huge number of fragmentation/performance warnings from pandas.
I profiled the run by wrapping it with cProfile:
    import cProfile
    import pstats

    with cProfile.Profile(builtins=False) as pr:
        df = model.run()
    stats = pstats.Stats(pr)
    stats.sort_stats('cumulative')
    stats.print_stats(0.1)  # show the top 10% of entries
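For anyone who wants to try the profiling step without a model at hand, here is a self-contained sketch of the same cProfile/pstats pattern. The helper `profile_call` and the stand-in workload `slow_sum` are hypothetical names used only for illustration; in practice the profiled call would be `model.run`. Using `cProfile.Profile` as a context manager requires Python 3.8 or later.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Profile one call to `func`; return (result, report text)."""
    with cProfile.Profile(builtins=False) as pr:
        result = func(*args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(pr, stream=buffer)
    # Sort by cumulative time and keep the first `top` entries.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def slow_sum(n):
    # Stand-in workload; replace with model.run in practice.
    return sum(i * i for i in range(n))


result, report = profile_call(slow_sum, 10_000)
print(report)
```

Passing an integer to `print_stats` limits the report to that many entries, whereas a fraction such as 0.1 keeps the top 10% of entries, which can be empty when only a handful of functions were profiled.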
I ended up getting a 30x speed-up by monkey patching ModelOutput.__init__ so that it uses a custom output handler:
    from collections import defaultdict
    from unittest.mock import patch

    import pandas as pd
    import pysd

    class FastDataFrameHandler(pysd.py_backend.output.OutputHandlerInterface):
        def process_output(self, out_file):
            # Only handle in-memory output (no output file requested).
            if out_file is None:
                return self

        def initialize(self, model):
            self.length = 0  # number of saved time steps
            self.ds = defaultdict(list)  # column name -> list of values

        def update(self, model):
            # Append to plain lists instead of growing a DataFrame each step.
            for key in self.capture_elements_step:
                component = getattr(model.components, key)
                self.ds[component.name].append(model.time.round() if key == 'time' else component())
            self.length += 1

        def postprocess(self, **kwargs):
            # Build the DataFrame once, at the end of the run.
            df = pd.DataFrame.from_dict(self.ds)
            df.set_index('Time', inplace=True)
            return df

        def add_run_elements(self, model):
            # Run-constant elements are repeated for every saved step.
            for key in self.capture_elements_run:
                component = getattr(model.components, key)
                self.ds[component.name] = [component()] * self.length

    def ModelOutput_init(self, *args, **kwargs):
        self.handler = FastDataFrameHandler()

    # Load/setup model here
    with patch('pysd.py_backend.output.ModelOutput.__init__', ModelOutput_init):
        df = model.run() |
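The monkey patch above relies on `unittest.mock.patch` swapping a constructor only for the duration of a `with` block. A small standalone sketch of that mechanism follows; the class `Recorder` and the function `fast_init` are hypothetical stand-ins and are not part of PySD.

```python
from unittest.mock import patch


class Recorder:
    """Hypothetical stand-in for a class whose constructor picks a handler."""

    def __init__(self):
        self.handler = "default"


def fast_init(self, *args, **kwargs):
    # Replacement constructor: install a different handler.
    self.handler = "fast"


with patch.object(Recorder, "__init__", fast_init):
    inside = Recorder().handler  # constructed while the patch is active

outside = Recorder().handler  # original __init__ is restored on exit

print(inside, outside)
```

Because the original constructor is restored when the block exits, the replacement only affects objects created inside the `with` block, which keeps the workaround local to a single run.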
Thanks a lot, @easyas314159, for checking that! |
Hi @easyas314159 are you working on the implementation? Otherwise I could implement it on your behalf. Thanks a lot! |
I already added the improvements in the branch |
@enekomartinmartinez Thanks for taking this. I had grant funding nonsense land in my lap shortly after my initial post and didn't have a chance to post an update. |
How do I speed up simulations in PySD? The model I have translated directly from Vensim takes about 10 minutes to run once using PySD but simulates in milliseconds in Vensim. What are some tips for improving run time in PySD?