Issue #1698 Model write optimization #1700
Conversation
…eading op the nc and zarr files
…al applicable and some unittests are failing
# Conflicts: # pixi.lock
… the way the solution is updated in the split model
…idn't expect to receive dask objects. Optimize the flow-transport model matcher (cherry picked from commit 9c5134a)
# Conflicts: # pixi.lock
like = ones_like(active)
bottom = like * bottom
top_2d = (like * top).sel(layer=1)
top_3d = bottom.shift(layer=1).fillna(top_2d)
Interesting that this is necessary! I think we can probably load top and bottom in the beginning of the function? Then it is more explicit and performance for consecutive computations will be faster.
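The suggestion could look like the sketch below: load `top` and `bottom` once at the start, so the consecutive computations run on in-memory arrays. This is a hypothetical sketch, not the actual imod code; the function and argument names are assumptions.

```python
import xarray as xr

def top_per_layer(active: xr.DataArray, top: xr.DataArray, bottom: xr.DataArray) -> xr.DataArray:
    # Load once up front (the reviewer's suggestion): subsequent
    # computations then operate on in-memory arrays instead of
    # repeatedly pulling dask chunks from disk.
    top = top.load()
    bottom = bottom.load()
    like = xr.ones_like(active)
    bottom_3d = like * bottom
    top_2d = (like * top).sel(layer=1)
    # Each layer's top is the bottom of the layer above; layer 1
    # falls back to the model top.
    return bottom_3d.shift(layer=1).fillna(top_2d)
```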
@dispatch
def is_equal(array1: xu.UgridDataArray, array2: xu.UgridDataArray) -> bool:
Nice optimization shortcut!
imod/mf6/simulation.py
Outdated
  path = f"(unknown).nc"
- exchange_package.dataset.to_netcdf(directory / path)
+ exchange_package.dataset.to_netcdf(
+     directory / path, format="NETCDF4"
+ )
I'm not sure whether we should hardcode the "NETCDF4" format. Maybe we should add a netcdf_kwargs argument to dump, which accepts a dictionary with settings for to_netcdf? In pkgbase I see we forward kwargs; I think we can do that here as well?
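One possible shape for this suggestion is sketched below. The names `dump` and `netcdf_kwargs` come from the comment itself, but the body is a hypothetical illustration, not the actual imod implementation:

```python
from pathlib import Path

import xarray as xr

def dump(dataset, directory, name, netcdf_kwargs=None):
    # Hypothetical sketch: keep a sane default format, but let
    # callers override how the dataset is written by forwarding
    # keyword arguments to Dataset.to_netcdf.
    kwargs = {"format": "NETCDF4"}
    if netcdf_kwargs is not None:
        kwargs.update(netcdf_kwargs)
    dataset.to_netcdf(Path(directory) / f"{name}.nc", **kwargs)
```

A caller who prefers, say, the scipy backend could then pass `netcdf_kwargs={"format": "NETCDF3_CLASSIC", "engine": "scipy"}` instead of being locked into the netcdf4 library.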
imod/mf6/rch.py
Outdated
vars = [
    "species",
]
for var in vars:
    if var in instance.dataset:
        instance.dataset[var] = instance.dataset[var].astype(str)

return instance
I think this string conversion should be part of the pkgbase class. Or BoundaryCondition class, as species can also be part of other packages.
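Moved into a shared base class, the conversion could be sketched roughly as below. `Package` stands in for the pkgbase class and the helper name is an assumption; only the `species`/`astype(str)` logic comes from the diff:

```python
import xarray as xr

class Package:
    # Hypothetical sketch of the reviewer's suggestion: perform the
    # string coercion once in the shared base class instead of in
    # each package's from_file.
    _string_variables = ("species",)

    def __init__(self, dataset):
        self.dataset = dataset

    def _coerce_string_variables(self):
        # Convert known string-valued variables so to_netcdf does not
        # choke on unsupported dtypes.
        for var in self._string_variables:
            if var in self.dataset:
                self.dataset[var] = self.dataset[var].astype(str)
```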
imod/mf6/hfb.py
Outdated
| """ | ||
| kwargs.update({"encoding": self._netcdf_encoding()}) | ||
| kwargs.update({"format": "NETCDF4"}) |
See my comment about the format in pkgbase
imod/mf6/pkgbase.py
Outdated
| """ | ||
| kwargs.update({"encoding": self._netcdf_encoding()}) | ||
| kwargs.update({"format": "NETCDF4"}) |
We should probably add a netcdf_kwargs argument to the dump method so that users can specify how datasets should be written themselves. It could be they don't want to write with the netcdf4 library but with one of the many other options.
JoerivanEngelen
left a comment
I've fixed my comments, thanks for creating this. This indeed speeds up write times considerably for the example!
Fixes #1698
Description
This is part 2 of fixing the performance issues with large models.
In part 1 (#1693) the model splitter was optimized. In this PR the focus is on writing the partitioned model.
As @Huite pointed out in #1686, the performance bottleneck had to do with the fact that the same package had to be loaded from file multiple times, while only a part of the file is actually needed.
After digging around for a while I discovered that this was caused by how we open the dataset:
dataset = xr.open_dataset(path, **kwargs)

In the line above we don't specify anything chunk-related. As a result, accessing the dataset loads the entire file from disk. By simply adding chunks="auto" this is no longer the case, and a huge performance gain is achieved.
There are some other changes related to setting chunking to "auto". Some parts of the code didn't expect to receive dask arrays. For instance, you can't use .item() on a dask array; instead I now use .values[()].
I was also getting some errors when the to_netcdf method was called on a package. All of them had to do with wrong or unsupported datatypes. In this PR you will find that an encoding is added for float16 types, and that in some packages the from_file method has been updated to ensure that the loaded type is converted to a supported type.
An unrelated but performance-wise significant change has been applied to the _get_transport_models_per_flow_model method. This method is used to match gwf models to gwt models so that gwfgwt exchanges can be created. It was doing a full comparison of domains, which is expensive. There is also a method available that does the comparison at domain level; by switching to it, the matching algorithm becomes almost instantaneous.

NOTE: This PR has issue #1699 as a base. The base needs to be altered to master once that PR is in.
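The cheap, domain-level comparison described above could be sketched as follows. `domains_match` is a hypothetical name and the body is an assumption, not the actual matcher: it compares dimensions and coordinates instead of every cell, so full (possibly dask-backed) arrays never need to be computed.

```python
import numpy as np
import xarray as xr

def domains_match(domain_a: xr.DataArray, domain_b: xr.DataArray) -> bool:
    # Hypothetical sketch: a cell-by-cell comparison of two domains
    # forces a full load/compute. Comparing dims, shape, and
    # coordinate labels is enough to match flow and transport models
    # and is essentially free.
    if domain_a.dims != domain_b.dims or domain_a.shape != domain_b.shape:
        return False
    return all(
        np.array_equal(domain_a[dim].values, domain_b[dim].values)
        for dim in domain_a.dims
        if dim in domain_a.coords and dim in domain_b.coords
    )
```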
NOTE: This PR also improves the dump method.

NOTE: Some timings: