Skip to content

Conversation

@Manangka
Copy link
Collaborator

Fixes #

Description

Checklist

  • Links to correct issue
  • Update changelog, if changes affect users
  • PR title starts with Issue #nr, e.g. Issue #737
  • Unit tests were added
  • If feature added: Added/extended example
  • If feature added: Added feature to API documentation
  • If pixi.lock was changed: Ran pixi run generate-sbom and committed changes

@sonarqubecloud
Copy link

@Manangka Manangka closed this Oct 20, 2025
JoerivanEngelen added a commit that referenced this pull request Oct 30, 2025
Fixes #1698


# Description
This is part 2 of fixing the performance issues with large model.
In part 1 #1693 the modelsplitter has been optimized. In this PR the
focus is on wiring the partitioned model.

As @Huite pointed out in #1686 the performance bottleneck had to do with
the fact that the same package had to be loaded from file multiple times
while only a part of the file is actually needed.

After digging around for a while i discovered that this had to do with
the fact how we open de the dataset.
`dataset = xr.open_dataset(path, **kwargs)`

In the line above we don't specify anything chunk related. That has as a
result that when you access the dataset the entire file has to be loaded
from disk. By simply adding `chunks="auto"` this is no longer the case
and a huge performance gain is achieved.

There are some other changes related to setting chunking to auto. There
are some parts of the code that don't expect to receive dask arrays. For
instance you can use .item() on a dask array. Instead i now use
.values[()].

I was also getting some errors when the to_netcdf method were called on
the package. All of them had something to do with wrong/unsupported
datatypes. In this PR you will find that an encoding is added for
float16 types. And that in some packages the from_file method has been
updated to ensure that he loaded type is converted to a supported type

An unrelated change but performance wise significant change has been
applied to the `_get_transport_models_per_flow_model` method. This
method is used to match gwf models to gwt models so that gwfgwt
exchanges can be created. This method was doing a full comparison of
domains, which is expensive. There is also a method available that does
the comparison on domain level. By switching to this method the matching
algorithm becomes almost instantaneously.

**NOTE**
This PR has issue #1699 as a base. The base needs to altered to master
once that PR is in

**NOTE**
This PR also improves the `dump` method

**NOTE**
some timmings:

<img width="833" height="739" alt="image"
src="https://github.com/user-attachments/assets/974c841c-0413-4433-8486-1abe98dc0715"
/>

<img width="843" height="215" alt="image"
src="https://github.com/user-attachments/assets/c7082975-af35-4143-a6f9-860557b3eb09"
/>

<img width="842" height="705" alt="image"
src="https://github.com/user-attachments/assets/383bf1a6-f028-4cb4-aa72-48ab95e84e3d"
/>



<!---
Before requesting review, please go through this checklist:
-->

- [x] Links to correct issue
- [ ] Update changelog, if changes affect users
- [x] PR title starts with ``Issue #nr``, e.g. ``Issue #737``
- [ ] Unit tests were added
- [ ] **If feature added**: Added/extended example
- [ ] **If feature added**: Added feature to API documentation
- [ ] **If pixi.lock was changed**: Ran `pixi run generate-sbom` and
committed changes

---------

Co-authored-by: JoerivanEngelen <joerivanengelen@hotmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants