Load non-timeseries data from file #92

brynpickering · 2018-04-04T17:53:26Z

Problem description

Models with many locations become very cumbersome to write in YAML, particularly for links. It would be good to be able to load that information from a CSV file or a pandas DataFrame.

This partially overlaps with issue #91, which covers loading from a DataFrame.

Steps to introduce functionality

- Decide on syntax within YAML environment
    - for links, could we just say links: file=link_matrix.csv?
- Decide on required CSV syntax
    - A matrix or single column?
- Implement file loading in creation of model._model_run
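
To make the "matrix or single column" question concrete, here is a minimal sketch of both candidate layouts, parsed with the standard library only. The layouts, the node names and the helper functions are assumptions for illustration; nothing here is an agreed format.

```python
import csv
import io

# Hypothetical layout 1: square matrix, a non-empty cell marks a link
# from the row node to the column node.
MATRIX_CSV = """,AUT,CHE,HUN
AUT,,1,
CHE,,,
HUN,1,,
"""

# Hypothetical layout 2: one link per row.
LIST_CSV = """from,to
HUN,AUT
AUT,CHE
"""


def links_from_matrix(text):
    """Return the set of (from, to) pairs encoded in a link matrix."""
    rows = list(csv.reader(io.StringIO(text)))
    targets = rows[0][1:]  # column labels, skipping the empty corner cell
    links = set()
    for row in rows[1:]:
        source = row[0]
        for target, cell in zip(targets, row[1:]):
            if cell.strip():
                links.add((source, target))
    return links


def links_from_list(text):
    """Return the set of (from, to) pairs encoded one per row."""
    return {(r["from"], r["to"]) for r in csv.DictReader(io.StringIO(text))}


matrix_links = links_from_matrix(MATRIX_CSV)
list_links = links_from_list(LIST_CSV)
```

Both layouts carry the same information here. The matrix grows with the square of the number of nodes, while the one-link-per-row layout leaves room for extra per-link columns.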


sjpfenninger · 2021-07-29T16:00:41Z

Possible approach:

We introduce a new top-level item called data_sources. This is a list of dicts which specify the CSV files from which we read data. By specifying parameter, rows, and columns, we can define exactly what we want to read from file, and how the file is structured. This keeps the format flexible enough to cope with new dimensions being added in the future.

add_dimensions allows the user to give a single value for a dimension which is not in the file itself. This allows us to have a file with costs without the need to have cost class in the columns or the index. It also allows us to replicate the functionality that currently exists when reading in resources from file: use one file that doesn't specify its technology to define the resource for multiple technologies.

data_sources:
    # A simple per-technology, per-node parameter
    - parameter: energy_cap_max
      rows: [nodes]
      columns: [techs]
      file: energy_cap_max.csv

    # A cost that changes through time
    - parameter: cost_om_prod
      file: cost_om_prod.csv
      add_dimensions:
            costs: monetary
      rows: [timesteps]
      columns: [techs, nodes]

    # Resource read from the same file for two separate techs
    # Note: existing way to read a resource from file would be
    # removed, so there is just one way to read in CSV files
    - parameter: resource
      file: resource_pv.csv
      add_dimensions:
            techs: pv_rooftop
      rows: [timesteps]
      columns: [nodes]
    - parameter: resource
      file: resource_pv.csv
      add_dimensions:
            techs: pv_largescale
      rows: [timesteps]
      columns: [nodes]
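
As a rough sketch of how rows, columns and add_dimensions could be interpreted, the following flattens a CSV into records keyed by dimension labels. It assumes one header row per entry in columns and one leading index column per entry in rows; the function name read_table, the sample values and the tech/node names are hypothetical.

```python
import csv
import io

# Made-up stand-in for cost_om_prod.csv: two header rows (techs, nodes),
# one index column (timesteps).
COST_CSV = """,ccgt,ccgt,pv_rooftop
,region1,region2,region1
2005-01-01 00:00,0.1,0.2,0
2005-01-01 01:00,0.1,0.3,0
"""


def read_table(text, rows, columns, add_dimensions=None):
    """Flatten a CSV into {((dim, label), ...): value} records."""
    add_dimensions = add_dimensions or {}
    grid = list(csv.reader(io.StringIO(text)))
    n_idx, n_hdr = len(rows), len(columns)
    header, body = grid[:n_hdr], grid[n_hdr:]
    records = {}
    for line in body:
        row_labels = list(zip(rows, line[:n_idx]))
        for j in range(n_idx, len(line)):
            if line[j] == "":
                continue  # empty cell: no value defined
            col_labels = [(dim, header[k][j]) for k, dim in enumerate(columns)]
            key = dict(row_labels + col_labels)
            key.update(add_dimensions)  # dimensions not present in the file
            records[tuple(sorted(key.items()))] = float(line[j])
    return records


records = read_table(
    COST_CSV,
    rows=["timesteps"],
    columns=["techs", "nodes"],
    add_dimensions={"costs": "monetary"},
)
```

Each record ends up with four dimensions (costs, nodes, techs, timesteps) even though the file only contains three of them.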

    - parameter: energy_cap_max
      rows: [nodes]
      columns: [techs]
      file: energy_cap_max.csv
    - parameter: resource
      file: resource_pv.nc
      add_dimensions:
            techs: pv_rooftop
    - parameter: resource
      file: resource_pv.nc
      add_dimensions:
            techs: pv_largescale

Consider NetCDF

With NetCDF, would you still need to explicitly list each parameter that you want to read from file? For the first two parameters from the above example:

data_sources:
    - parameter: energy_cap_max
      file: model_data.nc  # has a variable energy_cap_max, with nodes and techs as dims

    - parameter: cost_om_prod
      file: model_data.nc  # also has a variable cost_om_prod, with timesteps, techs, nodes as dims
      add_dimensions:
            costs: monetary  # can still add a dim

Suggestion on how to use the same resource data for two techs:

data_sources:
    - parameter: resource
      file: resource_pv.nc  # has a variable resource, with timesteps and nodes as dims
      add_dimensions:
            techs: [pv_rooftop, pv_openfield]
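
Allowing a list under add_dimensions amounts to copying the same data once per listed value. A minimal sketch of that broadcasting, using plain dicts as records; the helper expand_dimensions and the sample values are assumptions, not part of any existing API.

```python
from itertools import product


def expand_dimensions(records, add_dimensions):
    """Copy every record once per combination of the added dimension values."""
    dims = list(add_dimensions)
    # Treat a scalar as a one-item list so both forms are accepted.
    values = [v if isinstance(v, list) else [v] for v in add_dimensions.values()]
    expanded = []
    for record in records:
        for combo in product(*values):
            expanded.append({**record, **dict(zip(dims, combo))})
    return expanded


# Made-up resource data with timesteps and nodes as its only dimensions.
resource = [
    {"timesteps": "2005-01-01 00:00", "nodes": "region1", "value": 0.0},
    {"timesteps": "2005-01-01 12:00", "nodes": "region1", "value": 0.6},
]

expanded = expand_dimensions(resource, {"techs": ["pv_rooftop", "pv_openfield"]})
```

With this reading, the two separate resource entries in the CSV example collapse into a single entry, and a scalar value behaves as before.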

brynpickering · 2022-04-04T10:00:48Z

Based on offline discussions, the above structure seems to make sense. One change would be to make 'parameter' its own dimension, so that parameters can be defined within the CSV or can be added as dimensions, e.g.:

data_sources:
    - rows: [techs]
      columns: [parameter]
      file: tech_params.csv
    - rows: [nodes]
      columns: [techs]
      add_dimensions:
        parameter: energy_cap_max
      file: energy_cap_max.csv
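
To illustrate the first entry, where parameter is a column dimension, here is a small sketch that reads a made-up tech_params.csv into one mapping per parameter. It assumes the index column is labelled techs in the header; the lifetime column, the tech names and the values are hypothetical.

```python
import csv
import io

# Made-up stand-in for tech_params.csv: one row per tech, one column per parameter.
TECH_PARAMS_CSV = """techs,energy_cap_max,lifetime
ccgt,1000,25
pv_rooftop,200,
"""


def by_parameter(text, index_name):
    """Return {parameter: {index label: value}}, skipping empty cells."""
    params = {}
    for row in csv.DictReader(io.StringIO(text)):
        label = row.pop(index_name)
        for parameter, cell in row.items():
            if cell != "":
                params.setdefault(parameter, {})[label] = float(cell)
    return params


params = by_parameter(TECH_PARAMS_CSV, "techs")
```

The second entry in the YAML above would produce the same kind of mapping for energy_cap_max, with the parameter name coming from add_dimensions instead of from a column header.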

brynpickering · 2022-04-04T10:01:30Z

The format might still need refinement with respect to links and carriers. The way in which we may now define links and techs as lists of dictionaries (#324, #362) might fit well. E.g., for links, we might want to define all of them in a CSV:

	from	to
HUN_to_AUT	HUN	AUT
AUT_to_CHE	AUT	CHE

Which replaces:

links:
    - id: HUN_to_AUT
      from: HUN
      to: AUT
    - id: AUT_to_CHE
      from: AUT
      to: CHE

with:

data_sources:
    # Link definitions (from/to nodes), one link per row
    - rows: [links]
      columns: [parameter]
      file: links.csv
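
A quick sketch of the equivalence, assuming links.csv is comma-separated and leaves the header cell of the id column empty, as in the table above. It rebuilds the same list of dicts that the YAML links definition contains.

```python
import csv
import io

# Stand-in for links.csv, mirroring the table above.
LINKS_CSV = """,from,to
HUN_to_AUT,HUN,AUT
AUT_to_CHE,AUT,CHE
"""

reader = csv.DictReader(io.StringIO(LINKS_CSV))
# The unnamed first column holds the link id.
links = [{"id": row[""], "from": row["from"], "to": row["to"]} for row in reader]
```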

But, how do we handle the following:

links:
    - id: HUN_to_AUT
      from: HUN
      to: AUT
      link_techs:
        - id: ac_transmission
          distance: 10
          energy_cap_max: 20
    - id: AUT_to_CHE
      from: AUT
      to: CHE
      link_techs:
        - id: ac_transmission
          energy_cap_max: 5
        - id: dc_transmission
          energy_cap_max: 10

It seems we would need two separate files: one to define the nodes connected by each link (the table above) and one to define the technologies and their capacities within links:

	ac_transmission	ac_transmission	dc_transmission
	distance	energy_cap_max	energy_cap_max
HUN_to_AUT	10	20
AUT_to_CHE		5	10

then use the data source definition:

data_sources:
    # Per-link, per-technology parameters
    - rows: [links]
      columns: [link_techs, parameter]
      file: link_params.csv
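
A minimal sketch of reading such a two-header file back into the nested link_techs structure, assuming a comma-separated link_params.csv in which empty cells mean "not defined for this link". The function name read_link_techs is hypothetical.

```python
import csv
import io

# Stand-in for link_params.csv, mirroring the table above.
LINK_PARAMS_CSV = """,ac_transmission,ac_transmission,dc_transmission
,distance,energy_cap_max,energy_cap_max
HUN_to_AUT,10,20,
AUT_to_CHE,,5,10
"""


def read_link_techs(text):
    """Return {link: {link_tech: {parameter: value}}}, skipping empty cells."""
    grid = list(csv.reader(io.StringIO(text)))
    techs, params, body = grid[0], grid[1], grid[2:]
    result = {}
    for line in body:
        link_id = line[0]
        for tech, param, cell in zip(techs[1:], params[1:], line[1:]):
            if cell == "":
                continue  # this tech/parameter is not defined for this link
            result.setdefault(link_id, {}).setdefault(tech, {})[param] = float(cell)
    return result


link_techs = read_link_techs(LINK_PARAMS_CSV)
```

Skipping empty cells is what keeps dc_transmission out of HUN_to_AUT, matching the YAML version.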

brynpickering · 2022-04-20T10:42:56Z

One addition: there should be the ability to ignore rows/columns that define comments/references/units that aren't relevant to model dimensions.
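
One possible shape for this, as a sketch only: drop lines that start with a comment marker before parsing, then drop named columns afterwards. The "#" marker, the column names and the helper read_ignoring are all assumptions.

```python
import csv

# Made-up file with a comment line and two columns that are not model data.
RAW_CSV = """# illustrative comment line
techs,energy_cap_max,units,reference
ccgt,1000,kW,internal estimate
pv_rooftop,200,kW,internal estimate
"""


def read_ignoring(text, ignore_columns, comment_prefix="#"):
    """Parse a CSV, skipping comment lines and the listed columns."""
    kept_lines = [ln for ln in text.splitlines() if not ln.startswith(comment_prefix)]
    reader = csv.DictReader(kept_lines)
    return [
        {k: v for k, v in row.items() if k not in ignore_columns}
        for row in reader
    ]


cleaned = read_ignoring(RAW_CSV, ignore_columns={"units", "reference"})
```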

brynpickering · 2022-04-20T10:43:56Z

And maybe the possibility for a direct SQL connection
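
For what a SQL source could look like, here is a sketch using an in-memory SQLite database from the standard library. The table layout (one column per dimension plus a value column) and all names and values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical long-format table: one column per dimension, plus the value.
conn.execute("CREATE TABLE energy_cap_max (nodes TEXT, techs TEXT, value REAL)")
conn.executemany(
    "INSERT INTO energy_cap_max VALUES (?, ?, ?)",
    [("region1", "ccgt", 1000.0), ("region2", "pv_rooftop", 200.0)],
)

rows = conn.execute(
    "SELECT nodes, techs, value FROM energy_cap_max ORDER BY nodes"
).fetchall()
energy_cap_max = {(node, tech): value for node, tech, value in rows}
conn.close()
```

A long-format table like this needs no rows/columns specification, since every dimension is already a named column.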

brynpickering self-assigned this Apr 4, 2018

brynpickering added enhancement labels Apr 4, 2018

brynpickering removed their assignment Apr 9, 2018

sjpfenninger added this to the 0.6.x milestone Jan 8, 2019

sjpfenninger removed prio-low labels Feb 22, 2019

brynpickering added this to To do in v0.7.0 May 20, 2021

brynpickering moved this from To do to Highest priority changes in v0.7.0 May 20, 2021

brynpickering moved this from Sharing and visualising data to Refactoring user-facing in v0.7.0 May 20, 2021

brynpickering mentioned this issue Apr 4, 2022

Better handle carrier definitions in YAML #377

Closed

brynpickering mentioned this issue Apr 19, 2023

Separate top-level config and parameters #452

Closed

brynpickering modified the milestones: 0.7.0, 0.7.0.b1 Sep 27, 2023

irm-codebase mentioned this issue Oct 25, 2023

Handling year-specific data #504

Closed

sjpfenninger mentioned this issue Oct 30, 2023

Rip out lots of time-adjustment functionality #507

Merged

4 tasks

brynpickering mentioned this issue Nov 10, 2023

New model definition structure #518

Merged

4 tasks

brynpickering mentioned this issue Jan 6, 2024

Add method to load data sources into the model. #532

Merged

7 tasks

brynpickering closed this as completed in #532 Jan 22, 2024

v0.7.0 automation moved this from Refactoring user-facing to Done Jan 22, 2024
