Move time series aggregation to an external module #356

Closed
8 tasks
sjpfenninger opened this issue May 20, 2021 · 2 comments

Comments

@sjpfenninger
Member

sjpfenninger commented May 20, 2021

Problem description

To reduce complexity of Calliope's core code, we only want a hook for time series aggregation and resampling, rather than actually doing it ourselves.

The external module could be:

  • Our own current code moved out of the Calliope core
  • tsam

TODO:

  • Remove all complex clustering algorithms from core (inc. masking).
  • Move time resampling to model.resample_time.
  • Make it possible to cluster the timeseries using a user-defined set of cluster IDs (the functionality already exists, we just need to move the definition to model.cluster_time).
  • Keep a config option to enable inter-cluster storage when using clustering (e.g. model.include_inter_cluster_storage, default is True).
  • Update docs to tell people to prepare cluster IDs themselves using e.g. tsam.
  • Make hardcoded sum/mean of data on resampling explicit for every input parameter.
  • Move hardcoded sum/mean of data on resampling (calliope/time/funcs.py:294 ea89a66) to a model_data variable attribute (ideally, this would be encoded in the typedconfig rules).
  • Document justification for sum/mean of input parameters on resampling.
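
To illustrate the last three items, here is a minimal sketch of what "explicit sum/mean for every input parameter" could look like. It is not Calliope's implementation: the parameter names, the `RESAMPLE_METHOD` table and both functions are hypothetical, and plain Python lists stand in for the real model data.

```python
from statistics import fmean

# Hypothetical table: parameter name -> aggregation used when resampling.
# The names and choices are illustrative, not Calliope's actual rules.
RESAMPLE_METHOD = {
    "demand_energy": "sum",      # a quantity per timestep: add up
    "capacity_factor": "mean",   # a fraction of capacity: average
}

AGGREGATORS = {"sum": sum, "mean": fmean}


def resample(values, steps_per_block, method):
    """Aggregate consecutive blocks of `steps_per_block` values."""
    if len(values) % steps_per_block != 0:
        raise ValueError("series length must be a multiple of the block size")
    aggregate = AGGREGATORS[method]
    return [
        aggregate(values[i : i + steps_per_block])
        for i in range(0, len(values), steps_per_block)
    ]


def resample_all(data, steps_per_block):
    """Resample every parameter, failing loudly if no method is declared."""
    out = {}
    for name, values in data.items():
        if name not in RESAMPLE_METHOD:
            raise KeyError(f"no resampling method declared for {name!r}")
        out[name] = resample(values, steps_per_block, RESAMPLE_METHOD[name])
    return out
```

Raising on an undeclared parameter, rather than falling back to a default, is one way to force the sum/mean decision to be made and documented for each input.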
@brynpickering
Member

In the context of #452, we could now have config.init.time_resample alongside config.init.time_subset.

We could also move these two configuration items to config.build, so that a user resamples/slices data only when they build the optimisation problem.

As I see it, advantages:

  • Quicker initialisation of the model as we aren't doing any timeseries manipulation
  • Ability to test different extents of resampling / time subsetting on-the-fly
  • Can save the initialised model to file and load it later to do different timeseries operations

Disadvantages:

  • Larger model in memory when the input data is long. time_resample would make no difference here, because when we resample we currently keep a copy of the original timeseries in memory anyway.
  • Odd output timeseries / possible clashes in output. If resampling, one would get gaps between timesteps. If subsetting, one would get gaps either side of the subset.
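
A rough sketch of the build-time idea, to make the trade-offs concrete. Everything here is hypothetical: `InitialisedModel` and its `build` method are stand-ins, not Calliope classes, and blocks are aggregated by summing purely for illustration.

```python
from datetime import datetime, timedelta


class InitialisedModel:
    """Hypothetical stand-in for a model holding the full-resolution series."""

    def __init__(self, timesteps, values):
        self.timesteps = list(timesteps)
        self.values = list(values)

    def build(self, time_subset=None, resample_steps=1):
        """Return (timesteps, values) after subsetting, then summing blocks.

        The stored series is left untouched, so build() can be called
        repeatedly with different settings.
        """
        pairs = list(zip(self.timesteps, self.values))
        if time_subset is not None:
            start, end = time_subset
            pairs = [(t, v) for t, v in pairs if start <= t <= end]
        out_t, out_v = [], []
        for i in range(0, len(pairs), resample_steps):
            block = pairs[i : i + resample_steps]
            out_t.append(block[0][0])  # label each block by its first timestep
            out_v.append(sum(v for _, v in block))
        return out_t, out_v


start = datetime(2005, 1, 1)
model = InitialisedModel(
    (start + timedelta(hours=h) for h in range(12)), range(12)
)

# Two builds from the same initialised model, with different time handling.
coarse_t, coarse_v = model.build(resample_steps=3)
sliced_t, sliced_v = model.build(
    time_subset=(start + timedelta(hours=2), start + timedelta(hours=5))
)
```

The resampled result has timesteps three hours apart while the stored input stays hourly, which is the mismatch between input and output timeseries noted under the disadvantages.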

@sjpfenninger
Member Author

We have decided not to provide clustering code for now, and leave it up to users to do clustering as per their requirements. As of 0.7, it's possible to supply user-defined clustering: e.g. config.init.time_cluster: cluster_days.csv
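
For anyone preparing such a file by hand, a minimal sketch follows. The file layout Calliope expects is not described in this thread, so the two-column `date,cluster` layout below is an assumption to check against the 0.7 documentation. The ranking-based grouping is a deliberately naive placeholder for a proper method (e.g. from tsam), and both function names are made up for this example.

```python
import csv
import io
from datetime import date, timedelta


def cluster_days_by_mean(daily_profiles, n_clusters):
    """Assign each day a cluster ID by ranking days on their mean value.

    A naive stand-in for a real clustering method: days are sorted by
    mean and split into `n_clusters` groups of roughly equal size.
    """
    ranked = sorted(
        daily_profiles,
        key=lambda day: sum(daily_profiles[day]) / len(daily_profiles[day]),
    )
    return {
        day: (rank * n_clusters) // len(ranked)
        for rank, day in enumerate(ranked)
    }


def write_cluster_csv(cluster_ids, handle):
    """Write one row per day. The column names are an assumption."""
    writer = csv.writer(handle)
    writer.writerow(["date", "cluster"])
    for day in sorted(cluster_ids):
        writer.writerow([day.isoformat(), cluster_ids[day]])


# Four days of made-up data, two values per day for brevity.
first_day = date(2005, 1, 1)
profiles = {
    first_day + timedelta(days=i): values
    for i, values in enumerate([[1, 1], [5, 5], [2, 2], [6, 6]])
}
cluster_ids = cluster_days_by_mean(profiles, n_clusters=2)

buffer = io.StringIO()  # swap for open("cluster_days.csv", "w", newline="")
write_cluster_csv(cluster_ids, buffer)
```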

v0.7.0 automation moved this from Cleaner internals to Done Jan 24, 2024