New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cache scope #129
Cache scope #129
Conversation
Couple new features added this morning: Functions wrapped with |
(All UrbanSim tests are passing, the failure is because sanfran_urbansim uses table_source. That's fixed by UDST/sanfran_urbansim#7.) |
Can the 'iter' option be extended to use specific simulation years? For example, something like:
All simulation years not specified would be cached. This would be nice for models that run at different time scales rather than having to put the logic into the model itself. |
@bridwell, I don't think we can support a feature like that. It implies there's a single, easy to identify variable that the simulation is looping over so that there can be a comparison. That variable exists now (as To accomplish what you want I'd make a "model" like this: @sim.table(cache=True, scope='forever')
def my_table():
# do stuff
return df
@sim.model()
def clear_my_table_cache(year, my_table):
if year in [2010, 2020, 2030]:
my_table.clear_cached() |
30c9818
to
3b37158
Compare
In anticipation of fully implementing cache scope controls this removes the table_source decorator and makes a namedtuple container object for cached items so that items can be cached along with the information about their cache scope.
Table, column, and injectable decorators now take a cache_scope argument that can be 'step', 'iter', or 'forever'. Thing with a scope of 'step' are cleared after each model, things with a scope of 'iter' are cleared at the end of each model year, and things with scope 'forever' are never automatically cleared.
It occurred to me that one of the benefits of table_source was that it converted the registered thing to a DataFrameWrapper that allows you to access the underlying DataFrame. To keep the same benefit now that table_source is gone I've converted the simulation framework so that the injection and get_table always result in a wrapped DataFrame. This could result in a little more computation while evaluating dependencies, but caching still works the same so things that are cached will only be built once (or until they are removed from the cache).
There are situations when you need to directly access the DataFrame underlying a DataFrameWrapper, especially when trying to shadow that column with a computed column. Instead of forcing people to use the semi-private ._frame now it's .local.
3b37158
to
dbb2cd6
Compare
@daradib The |
The |
Yeah the idea is now that users should never come in contact with |
A couple of the tests in test_table_copy were testing the behaviour of TableFuncWrapper, but those don't enter user space anymore, get_table always returns a DataFrameWrapper.
get_table now returns a DataFrameWrapper, which will not be re-evaluated with __getitem__. Instead, call get_table twice. to_frame() always returns copies, so avoid using it when testing other behavior. Rename table_source since it has been replaced by cache=True. Use a for loop to reuse code when creating tables of computed columns.
Looks good. I added a commit cleaning up |
@fscottfoti, any objection to merging this and making it part of the next release? |
Set scope on cached data The user can set a scope on cached data so that the simulation framework can take care of expiring cached items on a schedule, saving the user from having to do manual cache management. Removes the "table source", which is replaced by cache mechanics.
Sorry, I missed this comment. So this is not a reverse compatible change right? I have to update bayarea_urbansim when I update urbansim? |
Yup, now this is breaking. Should be easy enough to fix. |
I ended up making two releases today. Version 1.2 has the accounts and supply/demand stuff, but not this. Version 1.3 has this. So you can pick one of those releases to work with for now. |
sounds good |
This adds a
cache_scope
option to thetable
,column
, andinjectable
decorators that allow results to be automatically cleared from the cache at set points. The options are'step'
,'iter'
, and'forever'
. Items with scope'step'
are cleared after every model is run, items with scope'iter'
are cleared at the end of every model year, and things with scope'forever'
are never automatically cleared.This removes the need for
table_source
so that has been removed. The equivalent is now to usecache_scope='forever'
.I waffled a little bit on using 'forever' or 'simulation' for the long-lived scope. With a scope of 'simulation' it'd make sense to clear it at the end of the simulation, with 'forever' it'd be best to never clear it. I'm up for either one, anyone have an opinion?