Cache certain computations using joblib #124

Closed
thesamovar opened this Issue Sep 20, 2013 · 6 comments

thesamovar commented Sep 20, 2013

Some of the code generation stuff can end up actually taking a while. Solution: cache it!

thesamovar commented Mar 13, 2014

I'm wondering if we could formalise the conditions which require recomputation. That way, we could quickly check whether a recomputation is required. For example, a Resetter CodeObject depends on the reset code string, variables and namespace. If any of those have changed, the CodeObject has to be regenerated, otherwise not.

We could either do this joblib-style, with a hashing mechanism for everything the object depends on, or by tracking for each object type whether it has changed. The former is safer and more general, the latter is probably quicker. My guess is that hashing would be quick enough though, because I don't think we would ever actually have to hash the values of arrays.

The general mechanisms here might end up being related to #92.
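
A minimal sketch of the hashing variant, assuming the dependencies can be reduced to a code string plus two mappings (variables and namespace) whose entries have a stable `repr`. The names `dependency_key`, `get_code_object` and `generate` are hypothetical and only serve the illustration; they are not existing Brian functions.

```python
import hashlib

_cache = {}  # hypothetical in-memory cache: key -> generated object


def dependency_key(code, variables, namespace):
    """Hash everything the generated code depends on.

    Array contents are deliberately left out: only the names and a short
    description (here the repr) of each entry are hashed.
    """
    h = hashlib.sha1()
    h.update(code.encode("utf-8"))
    for mapping in (variables, namespace):
        h.update(b"|")
        for name in sorted(mapping):
            h.update(f"{name}={mapping[name]!r};".encode("utf-8"))
    return h.hexdigest()


def get_code_object(code, variables, namespace, generate):
    """Return a cached result, calling generate() only on a cache miss."""
    key = dependency_key(code, variables, namespace)
    if key not in _cache:
        _cache[key] = generate(code, variables, namespace)
    return _cache[key]
```

With this shape, changing the reset string, a variable description or a namespace value produces a different key and therefore triggers a regeneration, while repeated calls with identical inputs reuse the stored object.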

mstimberg commented Mar 13, 2014

> For example, a Resetter CodeObject depends on the reset code string, variables and namespace. If any of those have changed, a regeneration of the CodeObject is required, otherwise not.

This should be true for any kind of CodeObject, I think. In fact, if we redo the namespace resolution every time (and I don't really see a way around it without coding for a lot of special cases), then it is even simpler: if the abstract code and the abstract namespace (i.e. the Variable objects) are unchanged, then we can re-use the CodeObject. For most of the abstract namespace, simple hashing in the sense of using the object's id should be fine (it's OK if the content of an array changed; only the reference to the array and the meta-information have to be the same). The only exceptions are Constant objects, which will be generated on the fly for external variables: for them, the hash has to be based on the actual value. The way we currently use Variable objects, they are immutable; whenever we change something (e.g. when "translating" subexpressions for use in a different context), we create a new object. I wonder whether we should enforce the immutability more strongly though ("private" attributes + getter methods/properties).
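
As a rough sketch of that key construction: the `Variable` and `Constant` classes below are simplified stand-ins written for this example (assumptions, not Brian's actual classes), and `variable_key` / `cache_key` are hypothetical helper names.

```python
class Variable:
    """Hypothetical stand-in for a variable description."""

    def __init__(self, name):
        self.name = name


class Constant(Variable):
    """Hypothetical stand-in for a constant created on the fly."""

    def __init__(self, name, value):
        super().__init__(name)
        self.value = value


def variable_key(var):
    # Constants are recreated on every resolution, so compare them by
    # value; everything else is compared by object identity.
    if isinstance(var, Constant):
        return ("const", var.name, var.value)
    return ("id", id(var))


def cache_key(abstract_code, variables):
    """Build a hashable key from the code and a name -> Variable mapping."""
    return (abstract_code,
            tuple((name, variable_key(var))
                  for name, var in sorted(variables.items())))
```

One caveat with identity-based keys: `id()` values can be reused once an object has been garbage collected, so a cache built this way would need to hold a reference to the keyed objects for as long as the entry exists.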

Finally, I think we should do at least some coarse benchmarking: if for example the namespace resolution takes most of the time and codegen+template filling is quick, then not much is gained by the approach described above (since the lengthy weave compilation is already cached, anyway).
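
A coarse benchmark of this kind could look roughly like the following, assuming each preparation stage can be wrapped in a zero-argument callable. `time_stages` and the stage labels are hypothetical; nothing here calls into Brian itself.

```python
import time


def time_stages(stages, repeats=10):
    """Return the best-of-`repeats` wall-clock time in seconds per stage.

    `stages` maps a label to a zero-argument callable.
    """
    timings = {}
    for label, func in stages.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            func()
            best = min(best, time.perf_counter() - start)
        timings[label] = best
    return timings


# Placeholder workloads standing in for the real stages.
example = time_stages({
    "namespace resolution": lambda: sorted(range(10000)),
    "codegen + templates": lambda: "".join(str(i) for i in range(10000)),
})
for label, seconds in example.items():
    print(f"{label}: {seconds * 1e3:.3f} ms")
```

Comparing the per-stage numbers would show whether caching the generated code is worth it, or whether most of the time is spent in a stage that the cache would not skip.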

thesamovar commented Mar 13, 2014

One of the key situations is when you might do something like:

for ...:
   G.I = str_expr_that_depends_on_loop_variable
   ...

This ought to be fast I think, as I expect it to be a relatively common use case.

thesamovar commented Nov 25, 2015

Could also use memoize (in Python 3 there is functools.lru_cache). See also the way sympy uses it: https://github.com/sympy/sympy/blob/master/sympy/core/cache.py
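
For illustration, memoizing an expensive step with `functools.lru_cache` might look like this. The `translate` function and its arguments are hypothetical; the only real API used is `lru_cache` with its `cache_info()` and `cache_clear()` helpers.

```python
from functools import lru_cache


@lru_cache(maxsize=128)
def translate(code, variable_names):
    """Hypothetical expensive step; all arguments must be hashable."""
    return f"// generated from: {code} using {', '.join(variable_names)}"


first = translate("v = v_reset", ("v", "v_reset"))
second = translate("v = v_reset", ("v", "v_reset"))  # served from the cache
print(translate.cache_info())
```

Note that `lru_cache` requires hashable arguments, so a dict-based namespace would first have to be converted into something like a tuple of items or a precomputed key.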

mstimberg commented Dec 3, 2015

I think with all the improvements to the preparation time merged recently, maybe we should postpone this to post-2.0? A quick and dirty solution might do more harm than good...

thesamovar commented Dec 3, 2015

Yep!

@thesamovar thesamovar removed this from the 2.0 milestone Dec 3, 2015

mstimberg added a commit that referenced this issue Apr 5, 2017

@mstimberg mstimberg closed this in 011820d Sep 8, 2017
