Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix writing of default values #216

Open
wants to merge 18 commits into
base: master
Choose a base branch
from
Open

Fix writing of default values #216

wants to merge 18 commits into from

Conversation

trevorb1
Copy link
Member

@trevorb1 trevorb1 commented Feb 1, 2024

Description

In this PR I have fixed the writing out of default values through the existing --write_defaults flag. Accompanying tests have been added.

This functionality existed (and worked) for writing out parameter defaults, but did not for writing out result defaults. This was due to the input data not being passed into the WriteStrategy._expand_defaults(...) function. I have addressed this, however, naming of variables is not super clear now in the WriteStrategy class.

Within WriteStrategy, data to be written out is now called inputs (this can be set/param data or result data), while input data for the model (only param/set data) is now called input_data. idk, we may want to clarify this before merged, or it may just be fine since its only in one place and documented in the docstring of _expand_defaults(...) and at variable definition.

Issue Ticket Number

My suggestion is for this PR to replace PR #215

Documentation

The functionality was already documented, but it didnt work!

@willu47
Copy link
Member

willu47 commented Feb 7, 2024

@trevorb1 - we have a performance problem. Creating new pandas dataframes with default values, concatenating the read-in values, and deleting duplicate values is slow and memory intensive. We need to consider alternatives.

  1. Do not store the entire dictionary of expanded dataframes in memory, but only generate them one-at-a-time upon writing out.
  2. Use xarray, which is a much more efficient way of storing data in-memory; and with very fast and easy methods to expand or fill multi-dimensional data. It also provides very easy conversion to and from pandas DataFrames e.g. see Return OSeMOSYS data as an xarray.DataSet #184

@willu47
Copy link
Member

willu47 commented Feb 7, 2024

@trevorb1 see 7849353 which refactors your expand_defaults code, ensuring that only one dataframe at a time is expanded and then immediately written out, minimising memory use. Note this breaks a lot of tests...

@trevorb1
Copy link
Member Author

Thanks for the feedback, @willu47! I looked at this today and think your option 2 of using xarray will be the way forward.

I refactored the current code a little, just to make use of instance variables. My first thought of using pandas update actually made it slower, though. So that was a bummer haha. The implementation you suggested, @willu47, was faster than the current code!

image

This method still wont work with bigger models, though :( I was using OSeMSOYS Global to generate generic input data for these tests. What I found was that anything past a medium size model, and my computer just killed it after like 45sec (I have a mid-range laptop). imo, this makes the --write_defaults flag not that useful, as for more realistic models it probably wont work. (However, to be fair, I probably wouldnt run a large model on my laptop)

I will look at the xarray implementation to see if that works better! Thanks for the idea on that!

@trevorb1
Copy link
Member Author

trevorb1 commented Mar 31, 2024

With the latest updates I have (following @willu47's suggestion here):

Still to do before reviewing/merging this:

  • Results expansion of defaults does not work right now. Also write tests to flag this.
  • Expansion of defaults does not work for datafiles. Also write tests to flag this.

I guess the scope of this PR has changed from "fixing" writing defaults to just refactoring it in otoole to address issue #217. Based on my comment above, seems like the actual fixing will really only come with the xarray implementation, as mentioned by @willu47 here (which relates to the open PR #184)

@@ -152,8 +152,7 @@ def _write_parameter(
default : int
"""

if not self.write_defaults:
df = self._form_parameter(df, default)
df = self._form_parameter(df, default)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is where datafile defaults are being filtered out, even if write_defaults is passed. The issue is that write_defaults is no longer exposed to the write strategy, so we need a way to address this conditional.

@trevorb1
Copy link
Member Author

trevorb1 commented Apr 1, 2024

Okay, I think writing defaults has been moved to ReadStrategy and works now!

A few notes:

@HauHe
Copy link
Contributor

HauHe commented Apr 2, 2024

Hi @trevorb1, I just ran a test with Utopia without having the DiscountRateIdv in the config file, but the default in the model file pointing to the DiscountRate I get results for CapitalInvestment. I haven't done any other testing (yet). But I'd say this should fix #220
Thanks a lot for your work!!

@trevorb1
Copy link
Member Author

trevorb1 commented Apr 2, 2024

@HauHe Great! To confirm, your test was for a multi-year model, right? I cant remember if another issue arose for a single year model that relates to this. But maybe since that is more of an edge case, deserves its own issue ticket

@HauHe
Copy link
Contributor

HauHe commented Apr 2, 2024

@trevorb1, yes the model I used for testing (UTOPIA) has multiple years.

@robertodawid
Copy link

robertodawid commented Jul 30, 2024

I ran a test with one of my models, and there is no data in the Capitalnvestment.csv. In this case, I have declared the DiscountRateIdv for one of the technologies. No matter the combination I did, for example, defaults to DiscountRate in the model file or the default value in the data file, there are no results for capital investment. We have similar difficulties (Shravan and I) in calculating the CapitalInvestment in Osemosys_Pul.p. However, we solved the issue with Sharavan much simpler; and maybe it can be simplified/enhanced. Happy to share it with you!. @trevorb1

@robertodawid
Copy link

I ran a test with one of my models, and there is no data in the Capitalnvestment.csv. In this case, I have declared the DiscountRateIdv for one of the technologies. No matter the combination I did, for example, defaults to DiscountRate in the model file or the default value in the data file, there are no results for capital investment. We have similar difficulties (Shravan and I) in calculating the CapitalInvestment in Osemosys_Pul.p. However, we solved the issue with Sharavan much simpler; maybe it can be simplified/enhanced. Happy to share it with you! @trevorb1

@trevorb1, thanks for yesterday's discussion. I tried to install the version from this branch without success; the CapitalInvestment was empty. The DiscountRateIdv in the data file looks like this (using otoole):

param default 0.089 : DiscountRate :=
RE1 0.05
;
param default 0.089 : DiscountRateIdv :=
RE1 CKGELCurb 0.02
; 

For now, the easiest solution is to un-hash CC1 and the var CapitalInvestment, and the calculated investments are ok compared to my manual calculation.

@trevorb1
Copy link
Member Author

Thanks for the the update @robertodawid! Do you have a minimal datafile + model you would be able to attach that I would be able to recreate this issue with?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants