Add method to load data sources into the model. #532
The method for loading them in isn't dirty, but aligning YAML definitions with those from file is.
Dummy test model isn't well-formatted to pass the carrier in/out checks. We _do_ check these in the YAML schema so that should be sufficient.
Codecov Report
Additional details and impacted files: coverage diff, main vs. #532

| | main | #532 | +/- |
|---|---|---|---|
| Coverage | 95.19% | 95.65% | +0.46% |
| Files | 24 | 25 | +1 |
| Lines | 3306 | 3450 | +144 |
| Branches | 706 | 683 | -23 |
| Hits | 3147 | 3300 | +153 |
| Misses | 92 | 85 | -7 |
| Partials | 67 | 65 | -2 |
src/calliope/example_models/national_scale/data_sources/time_varying_params.csv
This seems like a pretty confusing approach that will rapidly become error-prone and difficult to manage with larger models. We should at least try to show an example where YAML and CSV are mixed and not highlight this as the "default" approach?
@sjpfenninger for
| techs | carrier_in | carrier_out |
|---|---|---|
| supply_tech | foo | |
| supply_tech | bar | |
| demand_tech | baz | |
| conversion_tech | [foo, bar] | baz |
We would then enforce that when loading the data and process it back into a dictionary to merge into the traditional model definition.
Multiple carriers in a list would need special parsing as they would likely be loaded in as strings ("[foo, bar]") and would need processing back to lists of strings.
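A minimal sketch of the parsing step described above, assuming the table has already been read into a list of row dictionaries (for example via `csv.DictReader`). The helper names `parse_carrier_cell` and `rows_to_tech_dict` are hypothetical and are not part of Calliope.

```python
def parse_carrier_cell(cell: str) -> list[str]:
    """Turn a cell such as "foo" or "[foo, bar]" into a list of carrier names."""
    text = (cell or "").strip()
    if not text:
        return []
    if text.startswith("[") and text.endswith("]"):
        # Items are unquoted, so split manually rather than evaluating the string.
        return [item.strip() for item in text[1:-1].split(",") if item.strip()]
    return [text]


def rows_to_tech_dict(rows: list[dict]) -> dict:
    """Collect carrier columns into a nested dict keyed by tech (hypothetical layout)."""
    techs: dict = {}
    for row in rows:
        tech = techs.setdefault(row["techs"], {})
        for key in ("carrier_in", "carrier_out"):
            for carrier in parse_carrier_cell(row.get(key) or ""):
                carriers = tech.setdefault(key, [])
                if carrier not in carriers:
                    carriers.append(carrier)
    return techs


rows = [
    {"techs": "conversion_tech", "carrier_in": "[foo, bar]", "carrier_out": "baz"},
    {"techs": "demand_tech", "carrier_in": "baz", "carrier_out": ""},
]
tech_defs = rows_to_tech_dict(rows)
```

The resulting dictionary could then be merged into the YAML-derived definition; how that merge is done is left open here.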
`to`/`from`
As with `carrier_in`/`_out`, we let users define it as they would in YAML. However, there is the added step that we would need to identify transmission technologies once all data files are loaded and a dummy "traditional" model definition has been created. Then we would go back to the loaded data files and make a check that no parameters were defined over the `nodes` dimension for transmission technologies. This would then emulate the YAML loading checks, which do not allow a transmission technology to be defined at a node.
The loop back to do the `nodes` check could be a pain, but manageable I think.
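The loop back could look roughly like the sketch below. It assumes transmission technologies are identified by `base_tech: transmission` in the merged definition and that each loaded parameter is tracked as a record with its tech and dimensions; the record layout and the function name are hypothetical.

```python
def find_transmission_node_clashes(tech_defs: dict, loaded_params: list[dict]) -> list[tuple]:
    """Return (tech, param) pairs where a transmission tech has data over `nodes`."""
    # Assumption: transmission techs are flagged via their `base_tech` entry.
    transmission = {
        name for name, definition in tech_defs.items()
        if definition.get("base_tech") == "transmission"
    }
    return [
        (record["tech"], record["param"])
        for record in loaded_params
        if record["tech"] in transmission and "nodes" in record["dims"]
    ]


tech_defs = {
    "ac_line": {"base_tech": "transmission"},
    "ccgt": {"base_tech": "supply"},
}
loaded_params = [
    {"tech": "ac_line", "param": "flow_cap_max", "dims": ["techs"]},
    {"tech": "ac_line", "param": "flow_out_eff", "dims": ["nodes", "techs"]},
    {"tech": "ccgt", "param": "flow_cap_max", "dims": ["nodes", "techs"]},
]
clashes = find_transmission_node_clashes(tech_defs, loaded_params)
```

A non-empty result would be the point at which to raise an error, mirroring the YAML loading check.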
pros/cons
pros
- it maps to what the user does in YAML
- it allows users to define link to/from nodes in a way that is clearer than defining them via `carrier_in` and `carrier_out`

cons
- it's more work on our side and therefore a higher maintenance burden
- might be confusing to the user that `carrier_in`/`_out` are pivoted in `model.inputs` to have the values as one of the dimensions and the values being binary.
- It means that you can't use a pre-built calliope model as the data source to your model. Imagine a "vanilla" calliope model being loaded in as a data source and then a set of YAML-defined overrides to tweak that data. It wouldn't work because Calliope would see a malformed `carrier_in`/`_out` and would be missing `to` and `from`.
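To illustrate the pivoting mentioned in the cons, here is a small sketch in plain Python. It only shows the idea of carrier names becoming a dimension with binary values; the exact layout of `model.inputs` is not reproduced, and `pivot_carriers` is a hypothetical helper.

```python
def pivot_carriers(tech_carriers: dict) -> dict:
    """Map {tech: [carriers]} to {tech: {carrier: 0 or 1}} over all known carriers."""
    all_carriers = sorted({c for carriers in tech_carriers.values() for c in carriers})
    return {
        tech: {carrier: int(carrier in carriers) for carrier in all_carriers}
        for tech, carriers in tech_carriers.items()
    }


# User-facing definition: carrier names are the values.
carrier_in = {"conversion_tech": ["foo", "bar"], "demand_tech": ["baz"]}
# Pivoted form: carrier names are keys along a "carriers" dimension.
pivoted = pivot_carriers(carrier_in)
```

A pre-built model exported in the pivoted form would therefore not match the list-of-names form that users write in YAML.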
Not failing locally...
@sjpfenninger the solution I opted for was to allow
Docs added in #538
A few minor comments/suggestions.
Besides the remaining issue of dealing with `carrier_in` and `carrier_out`, which probably takes a bit more thinking to resolve (if it doesn't just stay as-is), I think this looks good now and ready to merge in for the beta.
Co-authored-by: Stefan Pfenninger <stefan@pfenninger.org>
* Add `data-sources` top-level key to load tabular data into the model from file or pandas.DataFrame.
* Remove `file/df=` functionality.
* Fix resampler; add parameter dtype casting.
* parent -> base_tech.
* Add data source schema validation.
* Add tutorial to docs.

---------

Co-authored-by: Stefan Pfenninger <stefan@pfenninger.org>
Fixes #92
Summary of changes in this pull request:
- Add `data_sources` top-level key to allow loading arbitrary data from file.
- `parent` param -> `base_tech`.
- Remove `file=/df=` functionality.

This implementation has required some code in `calliope.preprocess.model_data` to align data loaded from file with data from YAML. This would ideally be even cleaner than it is, but it works for now. The approach I'm taking is:

`base_tech` and `carrier_in`/`out` - if included in data from file) which goes at the bottom of the tech inheritance chain. I didn't put this in (2) as I don't want it to override some user YAML definition (e.g., `carrier_in` is changed by a YAML override compared to what is loaded from file).

## REMAINING ISSUES
- Currently, it is very difficult to ensure any amount of YAML definition can be handled. If there is minimal info provided in YAML (e.g., one specific parameter override for one tech) then you have no info available about which techs exist at which nodes except for what you have provided in `data_sources`. For the national scale example I've set up, inferring which techs are defined at which nodes gets messed up by array broadcasting of `flow_cap_max`, making the model think that all techs are defined at all nodes. EDIT: I think this is fixed.
- ~~We probably don't want the national scale example data duplicated in CSV in the calliope module itself. Perhaps we move this to tests?~~
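The broadcasting problem described above can be reproduced in isolation with NumPy. This is only an illustration with made-up values, assuming techs are inferred to exist at a node wherever some parameter is non-NaN; it is not the code in `calliope.preprocess.model_data`.

```python
import numpy as np

# Parameter defined per (node, tech); NaN means "tech not defined at this node".
node_tech_param = np.array([[1.0, np.nan],
                            [np.nan, 2.0]])

# Parameter defined per tech only (no node dimension), like a tech-level flow_cap_max.
tech_only_param = np.array([30.0, 10.0])

# Inference from the node-indexed parameter alone gives the intended pattern.
defined_direct = ~np.isnan(node_tech_param)

# Broadcasting the tech-only parameter over nodes fills every (node, tech) cell,
# so a naive "any non-NaN value" check now marks every tech at every node.
broadcasted = np.broadcast_to(tech_only_param, node_tech_param.shape)
defined_after_broadcast = defined_direct | ~np.isnan(broadcasted)
```

One way around this is to infer the node/tech pattern only from parameters that carry a node dimension in their original definition.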
TODO
- [ ] Order of overrides (YAML > data sources or data sources > YAML) and exception behaviour on clashes between data sources and between YAML and data sources (both currently set to silently override) should be configurable. EDIT: leaving "YAML > data sources" order as-is and non-configurable.

`sheet_name` param).

Reviewer checklist: