Cache scope #129

jiffyclub · 2014-11-05T00:08:18Z

This adds a cache_scope option to the table, column, and injectable decorators that allow results to be automatically cleared from the cache at set points. The options are 'step', 'iter', and 'forever'. Items with scope 'step' are cleared after every model is run, items with scope 'iter' are cleared at the end of every model year, and things with scope 'forever' are never automatically cleared.

This removes the need for table_source so that has been removed. The equivalent is now to use cache_scope='forever'.

I waffled a little bit on using 'forever' or 'simulation' for the long-lived scope. With a scope of 'simulation' it'd make sense to clear it at the end of the simulation, with 'forever' it'd be best to never clear it. I'm up for either one, anyone have an opinion?

jiffyclub · 2014-11-05T20:44:57Z

Couple new features added this morning:

Functions wrapped with sim.table() will now be evaluated before injection, and a DataFrameWrapper will be injected, so that other functions can directly access the underlying DataFrame if necessary. And on that topic the attribute name of the underlying DataFrame has been changed from ._frame to .local so that accessing it doesn't look so sketchy.

jiffyclub · 2014-11-05T22:04:17Z

(All UrbanSim tests are passing, the failure is because sanfran_urbansim uses table_source. That's fixed by UDST/sanfran_urbansim#7.)

bridwell · 2014-11-05T22:34:16Z

Can the 'iter' option be extended to use specific simulation years? For example, something like:

@sim.column(iter=[2010, 2020, 2030])

All simulation years not specified would be cached. This would be nice for models that run at different time scales rather than having to put the logic into the model itself.

jiffyclub · 2014-11-05T22:48:24Z

@bridwell, I don't think we can support a feature like that. It implies there's a single, easy to identify variable that the simulation is looping over so that there can be a comparison. That variable exists now (as year), but that's something we're planning to change soon to generalize the simulation framework to uses beyond UrbanSim.

To accomplish what you want I'd make a "model" like this:

@sim.table(cache=True, scope='forever')
def my_table():
    # do stuff
    return df

@sim.model()
def clear_my_table_cache(year, my_table):
    if year in [2010, 2020, 2030]:
        my_table.clear_cached()

In anticipation of fully implementing cache scope controls this removes the table_source decorator and makes a namedtuple container object for cached items so that items can be cached along with the information about their cache scope.

Table, column, and injectable decorators now take a cache_scope argument that can be 'step', 'iter', or 'forever'. Thing with a scope of 'step' are cleared after each model, things with a scope of 'iter' are cleared at the end of each model year, and things with scope 'forever' are never automatically cleared.

It occurred to me that one of the benefits of table_source was that it converted the registered thing to a DataFrameWrapper that allows you to access the underlying DataFrame. To keep the same benefit now that table_source is gone I've converted the simulation framework so that the injection and get_table always result in a wrapped DataFrame. This could result in a little more computation while evaluating dependencies, but caching still works the same so things that are cached will only be built once (or until they are removed from the cache).

There are situations when you need to directly access the DataFrame underlying a DataFrameWrapper, especially when trying to shadow that column with a computed column. Instead of forcing people to use the semi-private ._frame now it's .local.

[ci-skip]

jiffyclub · 2014-12-02T19:54:09Z

@daradib The test_table_copy test is failing and it looks to me like the test setup is wrong, but I wanted to check with you. In particular this table is set up with copy_col=False but with the name test_funcd_copied2. Then it fails on test assert table['a'] is not table['a'] because copy_col is False. Should that function be named uncopied or have copy_col=True?

daradib · 2014-12-02T20:19:17Z

The test_funcd_copied2 function (should be better named, sorry about that) is supposed to test that we still have a copy if the input table (test_frame_copied) evaluates as a copy, even if the function doesn't directly copy the input table. If I'm understanding correctly, this PR evaluates the input table before injection so you'll get the same copy as input each time, and the test fails. If that's now intended behavior, maybe we can get rid of the function?

jiffyclub · 2014-12-02T20:34:42Z

Yeah the idea is now that users should never come in contact with TableFuncWrapper, injection and get_table will always hand you a DataFrameWrapper so that things are more predictable. I'll see about cleaning this up in accordance.

A couple of the tests in test_table_copy were testing the behaviour of TableFuncWrapper, but those don't enter user space anymore, get_table always returns a DataFrameWrapper.

jiffyclub · 2014-12-02T20:44:56Z

@daradib Can you confirm 5f34076 looks like the right thing to do?

get_table now returns a DataFrameWrapper, which will not be re-evaluated with __getitem__. Instead, call get_table twice. to_frame() always returns copies, so avoid using it when testing other behavior. Rename table_source since it has been replaced by cache=True. Use a for loop to reuse code when creating tables of computed columns.

daradib · 2014-12-02T22:45:13Z

Looks good. I added a commit cleaning up test_table_copy to call get_table twice, avoid calling to_frame, etc.

jiffyclub · 2014-12-02T22:47:58Z

@fscottfoti, any objection to merging this and making it part of the next release?

Set scope on cached data The user can set a scope on cached data so that the simulation framework can take care of expiring cached items on a schedule, saving the user from having to do manual cache management. Removes the "table source", which is replaced by cache mechanics.

fscottfoti · 2014-12-03T20:50:31Z

Sorry, I missed this comment. So this is not a reverse compatible change right? I have to update bayarea_urbansim when I update urbansim?

fscottfoti · 2014-12-03T21:04:20Z

Yup, now this is breaking. Should be easy enough to fix.

jiffyclub · 2014-12-03T21:06:32Z

I ended up making two releases today. Version 1.2 has the accounts and supply/demand stuff, but not this. Version 1.3 has this. So you can pick one of those releases to work with for now.

fscottfoti · 2014-12-03T21:10:38Z

sounds good

jiffyclub mentioned this pull request Nov 5, 2014

table_source is going away, switch to cached table UDST/sanfran_urbansim#7

Merged

jiffyclub force-pushed the cache-scope branch from 30c9818 to 3b37158 Compare November 10, 2014 23:28

daradib mentioned this pull request Nov 25, 2014

spandex sim support #133

Merged

jiffyclub added 7 commits December 2, 2014 11:12

fix another usage of table_source

e69aa5c

Change ._frame attribute to .local

3b390d4

There are situations when you need to directly access the DataFrame underlying a DataFrameWrapper, especially when trying to shadow that column with a computed column. Instead of forcing people to use the semi-private ._frame now it's .local.

Add caching section to simulation docs

ae1f92f

[ci-skip]

fixes from rebase on top of copy_col changes

dbb2cd6

jiffyclub force-pushed the cache-scope branch from 3b37158 to dbb2cd6 Compare December 2, 2014 19:47

Remove copy tests related to TableFuncWrapper

5f34076

A couple of the tests in test_table_copy were testing the behaviour of TableFuncWrapper, but those don't enter user space anymore, get_table always returns a DataFrameWrapper.

jiffyclub merged commit 8264b2c into master Dec 3, 2014

jiffyclub deleted the cache-scope branch December 3, 2014 20:28

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cache scope #129

Cache scope #129

jiffyclub commented Nov 5, 2014

jiffyclub commented Nov 5, 2014

jiffyclub commented Nov 5, 2014

bridwell commented Nov 5, 2014

jiffyclub commented Nov 5, 2014

jiffyclub commented Dec 2, 2014

daradib commented Dec 2, 2014

jiffyclub commented Dec 2, 2014

jiffyclub commented Dec 2, 2014

daradib commented Dec 2, 2014

jiffyclub commented Dec 2, 2014

fscottfoti commented Dec 3, 2014

fscottfoti commented Dec 3, 2014

jiffyclub commented Dec 3, 2014

fscottfoti commented Dec 3, 2014

Cache scope #129

Cache scope #129

Conversation

jiffyclub commented Nov 5, 2014

jiffyclub commented Nov 5, 2014

jiffyclub commented Nov 5, 2014

bridwell commented Nov 5, 2014

jiffyclub commented Nov 5, 2014

jiffyclub commented Dec 2, 2014

daradib commented Dec 2, 2014

jiffyclub commented Dec 2, 2014

jiffyclub commented Dec 2, 2014

daradib commented Dec 2, 2014

jiffyclub commented Dec 2, 2014

fscottfoti commented Dec 3, 2014

fscottfoti commented Dec 3, 2014

jiffyclub commented Dec 3, 2014

fscottfoti commented Dec 3, 2014