Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow the "run" function to store local columns only #23

Merged
merged 3 commits into from
Apr 6, 2017

Conversation

hanase
Copy link
Contributor

@hanase hanase commented Apr 4, 2017

This addresses the issue discussed here (by @janowicz on Feb 21st). Currently the run function writes out all variables regardless if they are used or not which can result in huge files. This happens for both, the base year data as well as every orca iteration, because the write_tables() function simply calls to_frame().

This PR adds two boolean arguments to the run function (out_base_local and out_run_local). If True, only local columns are stored for out_base_tables and out_run_tables, respectively. The write_tables function gets a boolean argument called "local". (Note that after PR #22 of @bridwell is accepted, the write_tables function can pass an expression to to_frame() for obtaining local columns only.)

In our case this change reduces the output file size more than 4 times.

Please feel free to rename the new arguments if needed.

@coveralls
Copy link

coveralls commented Apr 4, 2017

Coverage Status

Coverage increased (+0.005%) to 96.536% when pulling c80130b on hanase:write_local_columns_only into 8e7b87b on UDST:master.

@hanase
Copy link
Contributor Author

hanase commented Apr 4, 2017

Note that as suggested by Eddie, the default behavior now is to save local columns only in the base year, and all columns in run years. This changes the original behavior of the run function which saved all columns in the base year.

@janowicz
Copy link
Contributor

janowicz commented Apr 4, 2017

This is great, thanks @hanase! Appreciate the adding of docstrings for certain of the run parameters that weren't documented before too.

Any thoughts from people about the appropriate defaults for out_base_local or out_run_local? It may make sense to keep the defaults for both of these as False for the moment so as to not break compatibility for anyone who is using computed base-year variables that come from using data_out. For me, I'll usually set them both to True when using :).

@bridwell
Copy link
Contributor

bridwell commented Apr 4, 2017

I kind of like idea of setting them both to True by default, but am OK with whatever.

@hanase
Copy link
Contributor Author

hanase commented Apr 5, 2017

I agree with @bridwell that it would make sense to set both arguments to True by default. Having either of them False might be useful for debugging and exploration, which should not be the default. But I'm also OK with whatever you guys decide.

@janowicz
Copy link
Contributor

janowicz commented Apr 5, 2017

Agreed that the most sensible default is setting both arguments to True. Lets go with that. @hanase could you please set the default for out_run_local to True in this PR?

@coveralls
Copy link

coveralls commented Apr 6, 2017

Coverage Status

Coverage increased (+0.005%) to 96.536% when pulling 580a682 on hanase:write_local_columns_only into 8e7b87b on UDST:master.

@hanase
Copy link
Contributor Author

hanase commented Apr 6, 2017

Done. Sorry - my commit message is wrong - the argument is set to True now.

@janowicz janowicz merged commit fbc4983 into UDST:master Apr 6, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants