Add method to load data sources into the model. #532

brynpickering · 2024-01-06T20:41:25Z

Fixes #92

Summary of changes in this pull request:

- `data_sources` top-level key to allow loading arbitrary data from file.
- Example of use with the national-scale model.
- `parent` param -> `base_tech`.
- Remove `file=`/`df=` functionality.
- Tutorial notebook showing different use-cases on a simple model.

This implementation has required some code in calliope.preprocess.model_data to align data loaded from file and those from YAML. This would ideally be even cleaner than it is, but it works for now. The approach I'm taking is:

1. Load all data from file into one dataset.
2. Create a dummy dict from this with empty tech definitions and empty node definitions with relevant techs attached to those nodes (based on what data is defined from file). Both of these are applied to the user-defined YAML to not mess up the YAML->dataset code.
3. Create another dummy dict with base tech data (base_tech and carrier_in/out - if included in data from file) which goes at the bottom of the tech inheritance chain. I didn't put this in (2) as I don't want it to override some user YAML definition (e.g., carrier_in is changed by a YAML override compared to what is loaded from file).
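
The three steps above can be sketched with plain dictionaries. This is only an illustration, not the code in `calliope.preprocess.model_data`: the flat `file_records` layout and the helper names (`dummy_definitions`, `base_definitions`, `deep_merge`) are assumptions made for the example.

```python
from copy import deepcopy

# Hypothetical stand-in for step 1 ("all data from file in one dataset"):
# one record per value; node=None means the value is not node-specific.
file_records = [
    {"tech": "ccgt", "node": "region1", "param": "flow_cap_max", "value": 30000},
    {"tech": "ccgt", "node": None, "param": "base_tech", "value": "supply"},
    {"tech": "ccgt", "node": None, "param": "carrier_out", "value": "power"},
]

BASE_PARAMS = {"base_tech", "carrier_in", "carrier_out"}


def dummy_definitions(records):
    """Step 2: empty tech and node definitions implied by the file data."""
    techs, nodes = {}, {}
    for rec in records:
        techs.setdefault(rec["tech"], {})
        if rec["node"] is not None:
            node = nodes.setdefault(rec["node"], {"techs": {}})
            node["techs"].setdefault(rec["tech"], {})
    return {"techs": techs, "nodes": nodes}


def base_definitions(records):
    """Step 3: base tech data that sits at the bottom of the inheritance chain."""
    techs = {}
    for rec in records:
        if rec["param"] in BASE_PARAMS:
            techs.setdefault(rec["tech"], {})[rec["param"]] = rec["value"]
    return {"techs": techs}


def deep_merge(low, high):
    """Return a copy of `low` recursively updated by `high` (`high` wins)."""
    merged = deepcopy(low)
    for key, value in high.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# The user YAML overrides carrier_out; it must survive the merge.
user_yaml = {"techs": {"ccgt": {"carrier_out": "heat"}}}

# Lowest priority first: base tech data, then dummy definitions, then user YAML.
model_def = deep_merge(base_definitions(file_records), dummy_definitions(file_records))
model_def = deep_merge(model_def, user_yaml)
```

Because the base tech data is merged first, the YAML value for `carrier_out` replaces the one loaded from file, which is the reason given above for keeping it out of step 2.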

~~## REMAINING ISSUES~~
- Currently, it is very difficult to ensure any amount of YAML definition can be handled. If there is minimal info provided in YAML (e.g., one specific parameter override for one tech) then you have no info available about which techs exist at which nodes except for what you have provided in data_sources. For the national scale example I've set up, inferring which techs are defined at which nodes gets messed up by array broadcasting of flow_cap_max, making the model think that all techs are defined at all nodes. EDIT: I think this is fixed.
~~- We probably don't want the national scale example data duplicated in CSV in the calliope module itself. Perhaps we move this to tests?~~

TODO

- [ ] Order of overrides (YAML > data sources or data sources > YAML) and exception behaviour on clashes between data sources and between YAML and data sources (both currently set to silently override) should be configurable. EDIT: leaving "YAML > data sources" order as-is and non-configurable.

Would be easy enough to extend the loading from file to read directly from NetCDF and/or Excel (just add a sheet_name param).
tests
docs

Reviewer checklist:

Test(s) added to cover contribution
Documentation updated
Changelog updated
Coverage maintained or improved

The method for loading them in isn't dirty, but aligning YAML definitions with those from file is.

Dummy test model isn't well-formatted to pass the carrier in/out checks. We _do_ check these in the YAML schema so that should be sufficient.

codecov · 2024-01-07T15:27:06Z

Codecov Report

Attention: 4 lines in your changes are missing coverage. Please review.

Comparison is base (33b0672) 95.19% compared to head (e4a7137) 95.65%.

❗ Current head e4a7137 differs from pull request most recent head 15f102c. Consider uploading reports for the commit 15f102c to get more accurate results

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #532      +/-   ##
==========================================
+ Coverage   95.19%   95.65%   +0.46%     
==========================================
  Files          24       25       +1     
  Lines        3306     3450     +144     
  Branches      706      683      -23     
==========================================
+ Hits         3147     3300     +153     
+ Misses         92       85       -7     
+ Partials       67       65       -2

| Files | Coverage Δ | |
|---|---|---|
| src/calliope/attrdict.py | `96.48% <100.00%> (ø)` | |
| src/calliope/backend/backend_model.py | `97.67% <100.00%> (+0.01%)` | ⬆️ |
| src/calliope/core/io.py | `94.79% <100.00%> (+0.34%)` | ⬆️ |
| src/calliope/core/model.py | `94.76% <100.00%> (+0.05%)` | ⬆️ |
| src/calliope/examples.py | `100.00% <100.00%> (ø)` | |
| src/calliope/postprocess/postprocess.py | `90.32% <100.00%> (ø)` | |
| src/calliope/preprocess/data_sources.py | `100.00% <100.00%> (ø)` | |
| src/calliope/util/schema.py | `90.32% <100.00%> (ø)` | |
| src/calliope/preprocess/model_data.py | `99.34% <97.77%> (-0.66%)` | ⬇️ |
| src/calliope/preprocess/time.py | `95.78% <95.23%> (+7.21%)` | ⬆️ |

... and 1 file with indirect coverage changes


src/calliope/attrdict.py

src/calliope/config/config_schema.yaml

src/calliope/config/model_data_checks.yaml

src/calliope/example_models/national_scale/data_sources/time_varying_params.csv

src/calliope/example_models/national_scale/scenarios.yaml

src/calliope/core/model.py

src/calliope/example_models/national_scale/scenarios.yaml

sjpfenninger · 2024-01-09T12:08:50Z

src/calliope/example_models/national_scale/data_sources/techs_carriers_at_nodes.csv

This seems like a pretty confusing approach that will rapidly become error-prone and difficult to manage with larger models. We should at least try to show an example where YAML and CSV are mixed and not highlight this as the "default" approach?

brynpickering · 2024-01-10T18:53:45Z

@sjpfenninger for carrier_in and carrier_out as well as to and from, perhaps we can let them work in a special way...

`carrier_in`/`_out`:

The user could provide it without the carriers dimension as they do in YAML. E.g.:

| techs | carrier_in | carrier_out |
|---|---|---|
| supply_tech | | foo |
| supply_tech | | bar |
| demand_tech | baz | |
| conversion_tech | [foo, bar] | baz |

We would then enforce that when loading the data and process it back into a dictionary to merge into the traditional model definition.

Multiple carriers in a list would need special parsing as they would likely be loaded in as strings ("[foo, bar]") and would need processing back to lists of strings.
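
As a rough sketch of that parsing step (not an existing Calliope function), the cell strings could be converted back into lists as follows. The helper names `parse_carriers` and `carriers_from_csv` are hypothetical, and treating repeated rows for the same tech as accumulating carriers is an assumption about how the table above would be read.

```python
import csv
import io

# The example table as CSV text; the list-valued cell has to be quoted
# because it contains a comma.
CSV_TEXT = '''techs,carrier_in,carrier_out
supply_tech,,foo
supply_tech,,bar
demand_tech,baz,
conversion_tech,"[foo, bar]",baz
'''


def parse_carriers(cell):
    """Turn one cell into a list of carrier names.

    An empty cell gives [], "foo" gives ["foo"] and "[foo, bar]" gives
    ["foo", "bar"].
    """
    text = cell.strip()
    if text.startswith("[") and text.endswith("]"):
        text = text[1:-1]
    return [item.strip() for item in text.split(",") if item.strip()]


def carriers_from_csv(text):
    """Build {tech: {"carrier_in": [...], "carrier_out": [...]}} from CSV text."""
    techs = {}
    for row in csv.DictReader(io.StringIO(text)):
        tech = techs.setdefault(row["techs"], {})
        for key in ("carrier_in", "carrier_out"):
            for carrier in parse_carriers(row[key]):
                # Assumption: repeated rows for one tech accumulate carriers.
                if carrier not in tech.setdefault(key, []):
                    tech[key].append(carrier)
    return techs


tech_carriers = carriers_from_csv(CSV_TEXT)
```

The resulting dictionary has the same shape as a YAML tech definition, so it could be merged into the traditional model definition.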

`to`/`from`

As with carrier_in/_out, we let users define it as they would in YAML. However, there is the added step that we would need to identify transmission technologies once all data files are loaded and a dummy "traditional" model definition has been created. Then we would go back to the loaded data files and make a check that no parameters were defined over the nodes dimension for transmission technologies. This would then emulate the YAML loading checks, which do not allow a transmission technology to be defined at a node.

The loop back to do the nodes check could be a pain, but manageable I think.
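
A minimal sketch of that loop-back check, assuming the dummy model definition marks transmission techs with `base_tech: transmission` and that the loaded file data can be summarised as one record per tech and parameter with the dimensions it was defined over. Both data layouts and both function names are hypothetical.

```python
# Hypothetical dummy "traditional" model definition, built once all
# data files have been loaded.
tech_defs = {
    "ac_line": {"base_tech": "transmission", "from": "region1", "to": "region2"},
    "ccgt": {"base_tech": "supply"},
}

# Hypothetical summary of the loaded file data: the dimensions each
# parameter was defined over, per technology.
file_records = [
    {"tech": "ccgt", "param": "flow_cap_max", "dims": ("nodes", "techs")},
    {"tech": "ac_line", "param": "flow_cap_max", "dims": ("nodes", "techs")},
    {"tech": "ac_line", "param": "flow_out_eff", "dims": ("techs",)},
]


def node_level_transmission_data(tech_defs, records):
    """List (tech, param) pairs where a transmission tech has data over nodes."""
    transmission = {
        name
        for name, definition in tech_defs.items()
        if definition.get("base_tech") == "transmission"
    }
    return sorted(
        (rec["tech"], rec["param"])
        for rec in records
        if rec["tech"] in transmission and "nodes" in rec["dims"]
    )


def check_transmission_nodes(tech_defs, records):
    """Raise if any transmission tech defines data over the nodes dimension."""
    offending = node_level_transmission_data(tech_defs, records)
    if offending:
        raise ValueError(
            f"Transmission techs cannot define data over nodes: {offending}"
        )
```

The check can only run after the transmission techs are known, which is why it has to go back over the loaded data instead of validating each file as it is read.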

pros/cons

pros

- It maps to what the user does in YAML.
- It allows users to define link to/from nodes in a way that is clearer than defining them via carrier_in and carrier_out.

cons

- It's more work on our side and therefore a higher maintenance burden.
- It might confuse users that carrier_in/_out are pivoted in model.inputs, so that the carrier names become one of the dimensions and the values become binary.
- It means that you can't use a pre-built calliope model as the data source to your model. Imagine a "vanilla" calliope model being loaded in as a data source and then a set of YAML-defined overrides to tweak that data. It wouldn't work because Calliope would see a malformed carrier_in/_out and would be missing to and from.
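
To make the pivoting mentioned in the cons concrete, here is a small sketch of turning YAML-style carrier lists into binary values indexed over techs and carriers. The function `pivot_carriers` and the dictionary layout are assumptions for illustration; they are not how `model.inputs` is actually built.

```python
def pivot_carriers(tech_carriers):
    """Pivot {tech: [carrier, ...]} into {(tech, carrier): bool}.

    Every tech gets an entry for every carrier seen anywhere, so the
    carrier names act as a dimension and the values are binary.
    """
    carriers = sorted({c for names in tech_carriers.values() for c in names})
    return {
        (tech, carrier): carrier in names
        for tech, names in tech_carriers.items()
        for carrier in carriers
    }


# YAML-style definition: carrier_in given as a list per tech.
carrier_in = {"conversion_tech": ["foo", "bar"], "demand_tech": ["baz"]}
pivoted = pivot_carriers(carrier_in)
```

Here `pivoted[("conversion_tech", "foo")]` is `True` and `pivoted[("conversion_tech", "baz")]` is `False`, which is the form a pre-built model would hand back and the reason it cannot be re-read as YAML-style lists.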

Not failing locally...

brynpickering · 2024-01-17T10:30:24Z

@sjpfenninger the solution I opted for was to allow to/from to be defined in text format in file, but to keep carrier_in/out boolean and to raise an error if a transmission technology defines data at nodes in the data loaded from file. This seems to me like a reasonable compromise that saves us from having a separate method to load data from file (i.e., YAML-esque data that would need to be processed separately into a dictionary).

brynpickering · 2024-01-19T11:12:54Z

Docs added in #538

sjpfenninger

A few minor comments/suggestions.

Besides the remaining issue of dealing with carrier_in and carrier_out, which probably takes a bit more thinking to resolve (if it doesn't just stay as-is), I think this looks good now and is ready to merge in for the beta.

docs/examples/calliope_model_object.py

src/calliope/config/model_data_checks.yaml

src/calliope/config/protected_parameters.yaml

tests/common/national_scale_from_data_sources/model.yaml

docs/hooks/macros.py

docs/examples/urban_scale/index.md

Co-authored-by: Stefan Pfenninger <stefan@pfenninger.org>

* Add `data-sources` top-level key to load tabular data into the model from file or pandas.DataFrame.
* Remove `file/df=` functionality.
* Fix resampler; add parameter dtype casting.
* parent -> base_tech.
* Add data source schema validation.
* Add tutorial to docs.

---------

Co-authored-by: Stefan Pfenninger <stefan@pfenninger.org>

Add (dirty?) method to load data sources into the model.

9df079d

The method for loading them in isn't dirty, but aligning YAML definitions with those from file is.

brynpickering requested a review from sjpfenninger January 6, 2024 20:41

Fix dimension broadcasting issue

42d54cc

brynpickering marked this pull request as ready for review January 7, 2024 13:06

brynpickering mentioned this pull request Jan 7, 2024

Increasingly masked rolling horizon #127

Open

2 tasks

Revert some checks to have tests pass

1794a9c

Dummy test model isn't well-formatted to pass the carrier in/out checks. We _do_ check these in the YAML schema so that should be sufficient.

brynpickering added 2 commits January 8, 2024 15:08

Merge branch 'main' into feature-load-from-data-sources

2e0e6c8

Update urban scale example to use data sources; ignore -> drop; add s…

c922c2a

…el_drop

sjpfenninger reviewed Jan 9, 2024


brynpickering added 6 commits January 9, 2024 15:46

Changes following review.

5992bc3

Merge branch 'main' into feature-load-from-data-sources

d0932c9

Merge branch 'main' into feature-load-from-data-sources

58f3641

Update math doc generation to rely on data from file

7f9fff0

Merge branch 'main' into feature-load-from-data-sources

9f8baa8

fix missing "math" in math docs headings

176eb99

brynpickering added 13 commits January 11, 2024 16:01

Restructure data source files

9d7d5a5

Clean up loading config files

f4a8a67

Another refactor; remove file/df= functionality.

f97c3cb

Catch missed file= removal

d16bab9

Fixes

a10c070

Fix resampler; add dtype casting

18ec071

Merge branch 'main' into feature-load-from-data-sources

2b185a9

parent -> base_tech

402405a

Minor fixes

69acdc3

Add data source schema validation

6a45df3

Debug failing test.

7280600

Not failing locally...

Do not compare simple model solutions in one test

b5d8abc

Update data sources to dict of dicts

e41aff4

brynpickering added 3 commits January 15, 2024 20:42

Handle loading to/from from file

677fbb3

Add tutorial to docs; timeseries clean-up

0f1d6ae

Clean up tutorial; increase coverage

fb421ef

brynpickering added 3 commits January 17, 2024 11:38

Stop "to" and "from" being defined by non-transmission techs

96d2128

Clean up tests; add coverage

042cba7

Update changelog

17447e9

brynpickering requested a review from sjpfenninger January 17, 2024 11:10

sjpfenninger requested changes Jan 22, 2024


brynpickering and others added 6 commits January 22, 2024 14:05

Apply suggestions from code review

964e16a

Co-authored-by: Stefan Pfenninger <stefan@pfenninger.org>

Update src/calliope/config/protected_parameters.yaml

f715036

Co-authored-by: Stefan Pfenninger <stefan@pfenninger.org>

Update kernelspec

e4a7137

Fix indentation

b5ae99e

Notebook text suggestions

d3ed4a8

Fix line length

15f102c

brynpickering merged commit 60b0df8 into main Jan 22, 2024
7 of 8 checks passed

brynpickering deleted the feature-load-from-data-sources branch January 22, 2024 19:06
