
Support for sparse xarrays #248

Open
staadecker opened this issue Mar 7, 2024 · 26 comments

Comments

@staadecker
Contributor

First off, great work on the library so far!! It is significantly better than what currently exists out there.

Many energy systems models can be extremely sparse. For example, in one recent case, a linear expression is composed of the sum of only 0.1% of the coordinates of an xarray. The lack of sparsity support would hence increase our model size by ~900x, which effectively prohibits us from using linopy as is.
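To make the scale concrete, here is a back-of-the-envelope sketch (the dimensions and density below are made up for illustration, not taken from the actual model) comparing dense storage against COO-style sparse storage:

```python
import math

# Hypothetical coordinate space: generators x time steps x scenarios.
shape = (5_000, 8_760, 20)            # made-up sizes for illustration
n_cells = math.prod(shape)            # total dense cells
density = 0.001                       # only 0.1% of coordinates carry a term
nnz = int(n_cells * density)          # number of nonzero entries

dense_bytes = n_cells * 8             # one float64 coefficient per cell
# COO-style storage: one float64 value + three int64 indices per nonzero
sparse_bytes = nnz * (8 + 3 * 8)

print(f"dense:  {dense_bytes / 1e9:.1f} GB")   # 7.0 GB
print(f"sparse: {sparse_bytes / 1e9:.3f} GB")  # 0.028 GB
print(f"ratio:  {dense_bytes / sparse_bytes:.0f}x")  # 250x
```

With COO overhead included the blow-up factor is roughly 1/density divided by the per-entry overhead, so the exact multiplier depends on the number of dimensions, but the order of magnitude matches the ~900x figure above.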

Suggested solution

Ideally, linopy could support xarrays backed by the sparse library. I've tried it, but it doesn't work with linopy, particularly when combining sparse and non-sparse xarrays.

Rejected solutions

  1. I tried using MultiIndexes, which are natively supported in xarray; however, broadcasting doesn't work when combining on a sub-dimension of a multi-index. In general, my gut tells me this is the wrong direction for the library.

  2. I'm aware of the mask parameter for creating constraints and variables; however, this (a) still requires a large memory allocation and (b) doesn't help during the computation of linear expressions (it only helps at the variable/constraint creation stage).

@staadecker
Contributor Author

(I might be able to submit a pull request for the proposed solution but I first wanted to hear what you thought @FabianHofmann)

@FabianHofmann
Collaborator

Hey @staadecker, nice to hear from you and that you like the package :) I would very much like to see sparse object support in linopy, though I really cannot say how invasive the required changes would be. But you are most welcome to give it a try! PRs are always welcome.

@staadecker
Contributor Author

Update on investigation
I took a look today at adding support for sparse; however, the changes seem to be quite invasive, for two reasons:

  1. xarray is still developing its support for sparse, and simple operations like creating a sparse DataArray without first creating a DataFrame don't exist, afaik.
  2. sparse DataArrays don't mix well with non-sparse DataArrays, so a lot of operations (e.g. concat) are broken. I think this means much of the code would need to be changed to make everything sparse.

General thoughts + question
My feeling is that, instead of xarray, one could simply use a pandas DataFrame with a multi-index to do what linopy does. Instead of broadcasting, one simply does a join on the index columns that are in common. I'm curious: what were the initial reasons for using xarray?
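The join-instead-of-broadcast idea can be sketched with pandas (the table and column names here are hypothetical, not linopy's API). Only coordinate combinations that actually occur are stored as rows, so nothing is ever densified:

```python
import pandas as pd

# Two sparse "tables" of model terms, keyed by the index columns they share.
coeffs = pd.DataFrame({
    "gen":   ["g1", "g1", "g2"],
    "time":  [0, 1, 0],
    "coeff": [1.0, 2.0, 3.0],
})
costs = pd.DataFrame({
    "gen":  ["g1", "g2"],
    "cost": [10.0, 20.0],
})

# "Broadcasting" becomes an inner join on the shared index columns:
# coordinate combinations absent from either table never materialise.
joined = coeffs.merge(costs, on="gen", how="inner")
joined["term"] = joined["coeff"] * joined["cost"]
print(joined)
```

The memory footprint then scales with the number of nonzero terms rather than with the full Cartesian product of the coordinates.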

For more context: the lack of sparsity means that a 200 MB data table becomes a 200 GB DataArray, which results in an out-of-memory error.

@koen-vg

koen-vg commented Apr 19, 2024

Just to add another perspective to the discussion, I believe the non-sparse nature of xarray is also impacting the memory footprint of solving pypsa-eur networks. Here is an example memory usage profile while solving a medium/high-resolution (35 clusters, 3000 time steps, 4th step in myopic foresight pathway optimisation) sector-coupled pypsa-eur network:

[image: memory usage profile]

This is after applying an ad-hoc memory optimisation to drop the model from memory during solving and reading it back in from a netcdf file afterwards (456052e) as suggested in #219.

During the above optimisation, Gurobi reported using about 16GB of memory, so the mentioned optimisation still leaves a memory overhead of about 9-10GB.

But the maximum memory usage spikes to above 100GB during model preparation. And this was only measured at 30 second intervals, so it's possible that the real maximum was even higher.

For me this has been especially bothersome since I'm working on a SLURM cluster where jobs have to be allocated a certain amount of memory, and will be killed if they exceed it. In an ideal world (in order to run as many jobs in parallel as possible) I'd allocate ~16GB to the above optimisation, but more than 100GB is actually needed... It's possible to group optimisations in a single job and start them in a staggered manner, but this adds quite a bit of complexity.

@fneum
Member

fneum commented Apr 19, 2024

Hmm, thanks @koen-vg for pointing this out. Memory spikes in the building phase seem to be a recurring issue, but they should be avoidable. New versions of xarray sometimes cause memory leaks.

Last September, after #161, I had a memory log for PyPSA-Eur sector-coupled, 100 nodes, 4-hourly resolution that looked like this.

[image: memory log]

I'll check how this particular test scenario looks for me with current PyPSA, linopy, xarray versions.

@koen-vg

koen-vg commented Apr 19, 2024

Thanks for the quick response! Reassuring to know that a low memory footprint is actually possible. I might try with a few different xarray versions then; the above was using version 2024.2.0. Do you know what version you used when things were working? (Or even just the whole software environment)

@FabianHofmann
Collaborator

FabianHofmann commented Apr 19, 2024

@koen-vg that's quite weird; could you mention which linopy version you were using?

@FabianHofmann
Collaborator

Could it have something to do with the IO to the temporary netcdf? Did you try it without that?

@koen-vg

koen-vg commented Apr 19, 2024

It was based on linopy 0.3.8. When testing out 456052e it seemed like it reduced the overall memory footprint, but I'm about to do some more detailed testing (master branch linopy; xarray version) to see if I can pin the problem down to anything in particular.

Hopefully I'll soon be able to either point at a faulty xarray/linopy version or just give you a pypsa network which consistently leads to a big memory spike.

@FabianHofmann
Collaborator

Thanks for your deep-dive @koen-vg, this is quite important.

@koen-vg

koen-vg commented Apr 19, 2024

Some very preliminary investigation seems to indicate that myopic foresight optimisation is really the culprit here.

This is the result from even a fairly low-resolution network:
[image: memory usage plot]

I've also seen much more extreme spikes, but only for "later" networks, not overnight optimisations.

Consider this work in progress. I'm currently replicating it with upstream pypsa-eur and linopy versions so I'll be able to provide a complete minimal working example.

@koen-vg

koen-vg commented Apr 19, 2024

Regardless of memory spikes, I will just note that, to my best understanding, there is no reason in principle why the optimisation at 2050 should take more time and memory than the one at 2025. The reason is presumably that the later network contains many copies of non-extendable generators, links, etc. with different build years and capacities but otherwise identical function and capacity factors. For solving purposes, those should probably be aggregated and then disaggregated again (since information about how much capacity is built in each year is needed for phase-out purposes). But that's really a separate issue from the memory spikes and should be discussed over at pypsa-eur.

@koen-vg

koen-vg commented Apr 19, 2024

Last thing before logging off for the weekend:

Here is the memory spike replicated with upstream pypsa-eur and latest version of linopy:
[image: memory usage plot]

This is using pypsa-eur at commit 95805a8d and a freshly created conda env, resulting in linopy 0.3.8 and xarray 2024.2.0.

The configuration file used to produce this example is based on the myopic test config, as follows:

run:
  name: "mem-test"
  disable_progressbar: true
  shared_resources: false
  shared_cutouts: true

enable:
  retrieve_cutout: false

foresight: myopic

scenario:
  ll:
  - v1.5
  clusters:
  - 70
  sector_opts:
  - ""
  planning_horizons:
  - 2020
  - 2025
  - 2030
  - 2035
  - 2040
  - 2045
  - 2050

countries: ["DE", "NL", "BE"]

clustering:
  temporal:
    resolution_sector: "336H"

sector:
  central_heat_vent: true

electricity:
  extendable_carriers:
    Generator: [OCGT]
    StorageUnit: [battery]
    Store: [H2]
    Link: [H2 pipeline]

  renewable_carriers: [solar, onwind, offwind-ac, offwind-dc]

renewable:
  solar:
    cutout: "europe-2013-era5"

industry:
  St_primary_fraction:
    2030: 0.6
    2040: 0.5
    2050: 0.4

solving:
  solver:
    name: "gurobi"
  mem: 4000

Some observations:

  • It's easiest to reproduce this at relatively high spatial resolution and low temporal resolution.
  • On the other hand, anecdotally speaking the memory does also spike at higher temporal resolution. I'll run some optimisations over the weekend to confirm this.
  • The memory during model preparation doesn't exceed the solver memory on the first time horizon, but in this example it already does at the second time horizon (2025). Then the problem grows more severe in subsequent horizons. However, the memory usage does "level off" at some point, presumably as components past their lifetime are phased out.

A little bit of debugging

I tried using ulimit to figure out where exactly the problem is. The results are mixed: you can see where too much memory is allocated, but ulimit limits only virtual memory, whereas it is the RSS that counts. It's not clear to me whether the two are correlated enough. But, for example, you get something like this:

> (ulimit -v $((3000*1024)); snakemake --use-conda --configfile config/config.mem-test.yaml -j6 -- solve_sector_networks)
[...]
localrule solve_sector_network_myopic:
    input: results/mem-test/prenetworks-brownfield/elec_s_50_lv1.5___2030.nc, resources/mem-test/costs_2030.csv
    output: results/mem-test/postnetworks/elec_s_50_lv1.5___2030.nc, results/mem-test/configs/config.elec_s_50_lv1.5___2030.yaml
    log: results/mem-test/logs/elec_s_50_lv1.5___2030_solver.log, results/mem-test/logs/elec_s_50_lv1.5___2030_memory.log, results/mem-test/logs/elec_s_50_lv1.5___2030_python.log
    jobid: 74
    benchmark: results/mem-test/benchmarks/solve_sector_network/elec_s_50_lv1.5___2030
    reason: Input files updated by another job: results/mem-test/prenetworks-brownfield/elec_s_50_lv1.5___2030.nc
    wildcards: simpl=, clusters=50, ll=v1.5, opts=, sector_opts=, planning_horizons=2030
    threads: 4
    resources: tmpdir=/tmp, mem_mb=30000, mem_mib=28611, runtime=360

Your conda installation is not configured to use strict channel priorities. This is however crucial for having robust and correct environments (for details, see https://conda-forge.org/docs/user/tipsandtricks.html). Please consider to configure strict priorities by executing 'conda config --set channel_priority strict'.
INFO:pypsa.io:Imported network elec_s_50_lv1.5___2030.nc has buses, carriers, generators, global_constraints, lines, links, loads, shapes, stores
WARNING:__main__:Existing capacities larger than technical potential for Index(['BE0 0 onwind-2030', 'BE0 2 offwind-ac-2030', 'BE0 2 offwind-dc-2030',
       'BE0 2 onwind-2030', 'DE0 10 offwind-dc-2030', 'DE0 16 offwind-ac-2030',
       'DE0 20 onwind-2030', 'DE0 31 offwind-ac-2030',
       'DE0 32 offwind-ac-2030', 'DE0 32 offwind-dc-2030',
       'DE0 32 onwind-2030', 'DE0 5 offwind-ac-2030'],
      dtype='object', name='Generator'),                        adjust technical potential to existing capacities
/home/koen/Dokument/UiT/Research/Modelling/pypsa-eur/.snakemake/shadow/tmp3yjpi434/.snakemake/scripts/tmpx1p14x3_.solve_network.py:156: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  n.generators.p_nom_max.clip(lower=0, inplace=True)
WARNING:pypsa.components:The following buses have no attached components, which can break the lopf:
{'EU'}
WARNING:pypsa.components:The following buses have no attached components, which can break the lopf:
{'EU'}
INFO:linopy.model: Solve problem using Gurobi solver
INFO:linopy.model:Solver logs written to `results/mem-test/logs/elec_s_50_lv1.5___2030_solver.log`.
INFO:linopy.model:Solver options:
 - threads: 4
 - method: 2
 - crossover: 0
 - BarConvTol: 1e-06
 - Seed: 123
 - AggFill: 0
 - PreDual: 0
 - GURO_PAR_BARDENSETHRESH: 200
Set parameter Username
Academic license - for non-commercial use only - expires 2024-12-04
INFO:linopy.io:Writing objective.
Writing constraints.:  64%|████████████████████████▉              | 23/36 [00:01<00:00, 20.66it/s]
ERROR:root:Uncaught exception
Traceback (most recent call last):
  File "/home/koen/Dokument/UiT/Research/Modelling/pypsa-eur/.snakemake/shadow/tmp3yjpi434/.snakemake/scripts/tmpx1p14x3_.solve_network.py", line 982, in <module>
    n = solve_network(
        ^^^^^^^^^^^^^^
  File "/home/koen/Dokument/UiT/Research/Modelling/pypsa-eur/.snakemake/shadow/tmp3yjpi434/.snakemake/scripts/tmpx1p14x3_.solve_network.py", line 924, in solve_network
    status, condition = n.optimize(**kwargs)
                        ^^^^^^^^^^^^^^^^^^^^
  File "/home/koen/Dokument/UiT/Research/Modelling/pypsa-eur/.snakemake/conda/0d18979ece638ab905bf948360770a8a_/lib/python3.11/site-packages/pypsa/optimization/optimize.py", line 604, in __call__
    return optimize(self._parent, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/koen/Dokument/UiT/Research/Modelling/pypsa-eur/.snakemake/conda/0d18979ece638ab905bf948360770a8a_/lib/python3.11/site-packages/pypsa/optimization/optimize.py", line 584, in optimize
    status, condition = m.solve(solver_name=solver_name, **solver_options, **kwargs)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/koen/Dokument/UiT/Research/Modelling/pypsa-eur/.snakemake/conda/0d18979ece638ab905bf948360770a8a_/lib/python3.11/site-packages/linopy/model.py", line 1060, in solve
    result = func(
             ^^^^^
  File "/home/koen/Dokument/UiT/Research/Modelling/pypsa-eur/.snakemake/conda/0d18979ece638ab905bf948360770a8a_/lib/python3.11/site-packages/linopy/solvers.py", line 615, in run_gurobi
    problem_fn = model.to_file(problem_fn)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/koen/Dokument/UiT/Research/Modelling/pypsa-eur/.snakemake/conda/0d18979ece638ab905bf948360770a8a_/lib/python3.11/site-packages/linopy/io.py", line 278, in to_file
    constraints_to_file(m, f, log=log, batch_size=batch_size)
  File "/home/koen/Dokument/UiT/Research/Modelling/pypsa-eur/.snakemake/conda/0d18979ece638ab905bf948360770a8a_/lib/python3.11/site-packages/linopy/io.py", line 124, in constraints_to_file
    df = m.constraints[name].flat
         ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/koen/Dokument/UiT/Research/Modelling/pypsa-eur/.snakemake/conda/0d18979ece638ab905bf948360770a8a_/lib/python3.11/site-packages/linopy/constraints.py", line 477, in flat
    df = to_dataframe(ds, mask_func=mask_func)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/koen/Dokument/UiT/Research/Modelling/pypsa-eur/.snakemake/conda/0d18979ece638ab905bf948360770a8a_/lib/python3.11/site-packages/linopy/common.py", line 242, in to_dataframe
    data = {k: np.ravel(v) for k, v in data.items()}
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/koen/Dokument/UiT/Research/Modelling/pypsa-eur/.snakemake/conda/0d18979ece638ab905bf948360770a8a_/lib/python3.11/site-packages/linopy/common.py", line 242, in <dictcomp>
    data = {k: np.ravel(v) for k, v in data.items()}
               ^^^^^^^^^^^
  File "/home/koen/Dokument/UiT/Research/Modelling/pypsa-eur/.snakemake/conda/0d18979ece638ab905bf948360770a8a_/lib/python3.11/site-packages/numpy/core/fromnumeric.py", line 1874, in ravel
    return asanyarray(a).ravel(order=order)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 254. MiB for an array with shape (33353694,) and data type int64
RuleException:
CalledProcessError in file /home/koen/Dokument/UiT/Research/Modelling/pypsa-eur/rules/solve_myopic.smk, line 147:
Command 'source /home/koen/.local/opt/mambaforge/bin/activate '/home/koen/Dokument/UiT/Research/Modelling/pypsa-eur/.snakemake/conda/0d18979ece638ab905bf948360770a8a_'; set -euo pipefail;  python /home/koen/Dokument/UiT/Research/Modelling/pypsa-eur/.snakemake/shadow/tmp3yjpi434/.snakemake/scripts/tmpx1p14x3_.solve_network.py' returned non-zero exit status 1.
[Fri Apr 19 16:44:15 2024]
Error in rule solve_sector_network_myopic:
    jobid: 0
    input: results/mem-test/prenetworks-brownfield/elec_s_50_lv1.5___2030.nc, resources/mem-test/costs_2030.csv
    output: results/mem-test/postnetworks/elec_s_50_lv1.5___2030.nc, results/mem-test/configs/config.elec_s_50_lv1.5___2030.yaml
    log: results/mem-test/logs/elec_s_50_lv1.5___2030_solver.log, results/mem-test/logs/elec_s_50_lv1.5___2030_memory.log, results/mem-test/logs/elec_s_50_lv1.5___2030_python.log (check log file(s) for error details)
    conda-env: /home/koen/Dokument/UiT/Research/Modelling/pypsa-eur/.snakemake/conda/0d18979ece638ab905bf948360770a8a_

Exiting because a job execution failed. Look above for error message
WorkflowError:
At least one job did not complete successfully.
[Fri Apr 19 16:44:15 2024]
Error in rule solve_sector_network_myopic:
    jobid: 74
    input: results/mem-test/prenetworks-brownfield/elec_s_50_lv1.5___2030.nc, resources/mem-test/costs_2030.csv
    output: results/mem-test/postnetworks/elec_s_50_lv1.5___2030.nc, results/mem-test/configs/config.elec_s_50_lv1.5___2030.yaml
    log: results/mem-test/logs/elec_s_50_lv1.5___2030_solver.log, results/mem-test/logs/elec_s_50_lv1.5___2030_memory.log, results/mem-test/logs/elec_s_50_lv1.5___2030_python.log (check log file(s) for error details)
    conda-env: /home/koen/Dokument/UiT/Research/Modelling/pypsa-eur/.snakemake/conda/0d18979ece638ab905bf948360770a8a_

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-04-19T164322.590433.snakemake.log
WorkflowError:
At least one job did not complete successfully.
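As a side note on the virtual-memory vs. RSS distinction above: peak RSS can be checked from inside the process with the standard library alone. This is a Unix-only sketch (the `ru_maxrss` units differ between Linux and macOS), and the tag names are just illustrative:

```python
import resource
import sys

def log_memory(tag: str) -> int:
    """Print peak RSS so far (what actually occupies RAM), in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":   # ru_maxrss is bytes on macOS, KiB on Linux
        peak //= 1024
    mib = peak // 1024
    print(f"[{tag}] peak RSS: {mib} MiB")
    return mib

log_memory("before allocation")
data = bytearray(50 * 1024 * 1024)  # allocate (and zero) ~50 MiB
log_memory("after allocation")
```

Sprinkling such calls around the model-building and LP-writing phases would show where the resident peak actually occurs, independently of what `ulimit -v` reports.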

@koen-vg

koen-vg commented Apr 23, 2024

I'm not sure exactly how much time I'll have to look into this the coming week or two, so if anyone else has the time/opportunity, feel free to dig in.

Just some quick (almost administrative) notes:

  • One issue is the memory spike in the above set-up, which seems (at least on the surface) to be a linopy issue. Whether it really has to do with sparsity I'm not actually 100% sure, so I could also set up a separate issue for it to keep things organised.
  • A rather separate issue is the substantial overall memory and time impact of myopic foresight in pypsa-eur. This is obviously pypsa-eur-related and not a linopy problem, and I'll see if I can open an issue about it there. On the other hand, it might be that streamlining myopic foresight optimisations in pypsa-eur (i.e. doing some aggregation of components) would also eliminate the memory spike observed here.

@FabianHofmann
Collaborator

Hey @koen-vg, thanks again for the update. I will try to catch up next week (this week is too busy)

@koen-vg

koen-vg commented May 9, 2024

As you can see in the PR mentioning this issue, I've implemented a kind of aggregation over build years that solves pypsa-eur's performance problems. But it doesn't rule out the potential for memory spikes. Again, it wasn't entirely clear whether the memory spike was really coming from linopy or pypsa-eur, so the above PR isn't necessarily a final solution for the kind of memory spike I showed.

@FabianHofmann
Collaborator

@koen-vg some news here:

Linopy now has a new LP-writing backend based on polars. It is very fast and memory-efficient. Could you have another try with the latest master, setting io_api="lp-polars"?

@koen-vg

koen-vg commented May 22, 2024

Unfortunately, that doesn't look like it fixes the issue.

Here is a comparison with and without lp-polars:
[image: memory usage comparison]

The lp-polars option does seem faster, but it actually uses more memory in total; I guess some memory isn't being released at some point?

Anyway, the big memory spike at the beginning is identical with and without lp-polars, so I don't think it's due to the actual writing of the LP file. Again, it might actually be on pypsa-eur and how it defines the constraints? Not exactly sure. I'm sure it would be possible to dig in and figure out where exactly in the code this memory spike happens, but I don't quite have time right now.

With a slightly larger model I can again replicate the memory spike overshadowing the maximum memory usage during solving, also while using the lp-polars option:

[image: memory usage plot]

As an aside, I will just note that there also seems to be a lot of potential for cutting down total memory usage by releasing as much memory as possible before solving. This goes for both linopy and pypsa-eur. I've consistently seen a 100%+ memory overhead (across different model sizes) on top of the memory usage of Gurobi. So for anyone eager to contribute, this seems like low-hanging fruit :)
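The release-before-solving idea can be sketched in plain Python (the container and its contents here are hypothetical; in pypsa-eur/linopy this would correspond to dropping large model objects before the solver's own memory peak, as in the commit linked earlier):

```python
import gc

# Hypothetical sketch: park large intermediates in one container so they
# can all be dropped in a single place before invoking the solver.
intermediates = {"nodal_balance_lhs": list(range(1_000_000))}

def release(container: dict) -> int:
    """Drop the only references to the big objects and force a GC pass.

    Refcounted objects are freed immediately on clear(); gc.collect()
    additionally sweeps any reference cycles and returns the count of
    cyclic objects it collected.
    """
    container.clear()
    return gc.collect()

released = release(intermediates)
print(f"container empty: {not intermediates}")
```

The key point is that peak process memory is roughly (model memory still held) + (solver memory), so anything released before the solver starts comes straight off the peak.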

@koen-vg

koen-vg commented May 22, 2024

In fairness, it does look like lp-polars solves one of two memory spikes; I wasn't careful enough in spotting and pointing this out earlier. It's a bit more obvious in the following plot (higher spatial resolution brings out the memory spikes):
[image: memory usage comparison]
You can see that the secondary spike (with the default setting) also results in higher memory usage than during solving itself, and that this spike is resolved by the lp-polars option. So it looks to me like, if linopy can be made to release some more memory to bring the total down (to match the default setting), lp-polars is half the solution (the other half possibly belonging to pypsa-eur).

@FabianHofmann
Collaborator

I'm starting to understand now...

There are two major memory peaks:

  1. in linopy itself, when the model is created, in particular when the nodal balance constraints are defined (many NaNs, large expressions);
  2. when the nodal balance constraints are written to file, as pointed out in your previous comment here: Support for sparse xarrays #248 (comment).

The polars implementation addresses the second peak, but it does not do proper garbage collection, leading to a stronger increase of memory over time. And unfortunately, the polars implementation is not much faster; for smaller networks it had looked like a great improvement, see #294 (comment).

So, I think there are some possible partial solutions:

  1. improve the constraint building for the nodal balance constraint in the myopic code: avoid parallel non-extendable components and instead aggregate previously built capacities (I don't know how to track phase-out dates though) -> addresses both peaks
  2. release memory as you did -> both peaks
  3. try lazy polars writing -> second peak
    ...
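A minimal sketch of option 1, aggregating components that differ only in build year while keeping the per-year breakdown aside for later disaggregation (the table and column names are hypothetical, loosely modelled on PyPSA's generator frame, not the actual implementation):

```python
import pandas as pd

# Hypothetical generator table: identical components differing only in
# build year, as produced by a myopic foresight pathway.
generators = pd.DataFrame({
    "bus":        ["DE0", "DE0", "DE0", "NL0"],
    "carrier":    ["onwind", "onwind", "onwind", "onwind"],
    "build_year": [2020, 2025, 2030, 2030],
    "p_nom":      [100.0, 50.0, 25.0, 40.0],
})

keys = ["bus", "carrier"]

# Keep the per-build-year breakdown so capacities can be disaggregated
# again after solving (needed for phase-out bookkeeping).
breakdown = generators.set_index(keys + ["build_year"])["p_nom"]

# One representative component per (bus, carrier) enters the optimisation.
aggregated = generators.groupby(keys, as_index=False)["p_nom"].sum()
print(aggregated)
```

The hard parts the sketch omits are exactly the ones discussed below: attributes that change with build year (efficiencies, marginal costs) cannot be summed and need a weighting or separation rule.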

@koen-vg

koen-vg commented May 22, 2024

As mentioned briefly before in this thread, I have been working (quite a bit) on alternative 1 in this PR: PyPSA/pypsa-eur#1056. It's non-trivial, but the results are worth it. Not only are the memory spikes entirely resolved (no need for polars either), but the LP passed to Gurobi is simply much smaller, leading to much-improved memory and time footprints overall.

However, some information is always lost. In particular, changing efficiencies and marginal costs are a challenge. I think the build-year aggregation is nearly always worth it, but it doesn't produce quite identical results, and since the implementation isn't trivial, there are chances for unexpected interactions with custom constraints etc. that I might have missed in my testing.

Still, I will be using the build year aggregation for myopic foresight in pypsa-eur regardless of the memory footprint of model building with linopy :)

@FabianHofmann
Collaborator

I really like the result of your aggregation strategy. I am just wondering if we could generalise it a bit more.

Perhaps we could finally create a new component column "active" in PyPSA, which allows components to be ignored in the optimisation. Instead of removing the components to be aggregated, we would simply set their activity status to False and add one representative component instead, which is dissolved and removed after the optimisation.
This approach would lead to less intrusive changes to the network. But I don't know if it helps with the other issues.

@koen-vg

koen-vg commented May 22, 2024

Glad to hear it's appreciated :)

Yeah, actually I can see the appeal of such an active attribute! In my current implementation, quite a bit of effort goes into carefully storing certain attributes of to-be-aggregated components in extra dataframe columns (by build year) so that they can later be disaggregated again. This could be simplified by an active attribute. Of course, some work would still have to be done to actually aggregate correctly (one has to carefully set p_nom_min, p_nom_max, etc.) and to disaggregate output variables after optimisation.

Overall, I guess this build-year aggregation business is just inherently a bit complex once you put together all the little details, so there's a limit to how much things can be simplified with the right abstractions.

(P.S. I have some mixed experience with boolean values in PyPSA networks (looking at you, reversed), which netCDF seems to complain about in a very inconsistent way. Hopefully that wouldn't be a big problem for a potential boolean active column.)

@ollie-bell

ollie-bell commented Sep 13, 2024

Apologies for jumping into an existing issue, but I have some thoughts related to sparsity, so I thought I'd add to this discussion rather than create a new one for now.

I've just started using linopy (not from a linear programming background either, though I have used xarray extensively in other contexts) and came across behaviour I found unintuitive: the default behaviour of LinearExpression.where. I found I needed to pass drop=True to get rid of "null" terms in the equations (i.e. to significantly reduce the size of the _term dimension)... but my intuition as a new user would have expected this to be the default.

So the question / discussion point is: what are the reasons for drop=True not being the default in LinearExpression.where (and presumably Variable.where as well, though I've not thought about that as much yet)? Are there situations where a user would want to keep entirely null terms in their equations? Or is it simply to align with the xarray defaults?

EDIT

On further thought, I see why drop=False remains the default: in application code I now need to deal with drop=True resulting in misaligned / missing coordinates relative to other data. However, I think the documentation would benefit from a small example of using drop=True. Happy to open a PR if that is of interest?

Perhaps also some indication of the sparsity in the __repr__ of LinearExpression. I think most of my confusion came from the fact that the __repr__ hides all the NaN coeffs, but they are still present in the _term dimension of the underlying data.
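For readers unfamiliar with the keep-vs-drop distinction: pandas' where has the same shape-preserving default, so it makes a handy analogy (this is only an analogy with a plain Series, not linopy's LinearExpression.where itself):

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0], index=list("abcd"))
mask = s > 2

# Default: shape and coordinates are preserved; masked-out entries -> NaN.
kept = s.where(mask)

# Explicit drop: only the "active" entries remain, shrinking the object
# but leaving it misaligned with other data on the original index.
dropped = s.where(mask).dropna()

print(len(kept), len(dropped))  # 4 2
```

The trade-off is exactly the one described above: keeping NaNs preserves alignment for later operations, while dropping them shrinks the data but forces the caller to handle missing coordinates.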

@FabianHofmann
Collaborator

@ollie-bell thanks for the explanation. Indeed, this behaviour is a bit confusing, and together with the representation you easily lose track. An example would be nice; I would be more than happy to review a PR.
Perhaps the explanation of the representation could also be added quite early to one of the existing examples (perhaps here: https://linopy.readthedocs.io/en/latest/creating-expressions.html#Using-.where-to-select-active-variables-or-expressions?)

@ollie-bell

> @ollie-bell thanks for the explanation. Indeed, this behaviour is a bit confusing, and together with the representation you easily lose track. An example would be nice; I would be more than happy to review a PR. Perhaps the explanation of the representation could also be added quite early to one of the existing examples (perhaps here: https://linopy.readthedocs.io/en/latest/creating-expressions.html#Using-.where-to-select-active-variables-or-expressions?)

Great. I probably won't get to it until next month now as I'm about to go on holiday but I'll set myself a reminder for when I'm back.
