Conversation

@Manangka Manangka commented Oct 20, 2025

Fixes #1698

Description

This is part 2 of fixing the performance issues with large models.
In part 1 (#1693) the model splitter was optimized. In this PR the focus is on writing the partitioned model.

As @Huite pointed out in #1686, the performance bottleneck came from the same package being loaded from file multiple times while only a part of the file is actually needed.

After digging around for a while, I discovered that this is caused by how we open the dataset:
dataset = xr.open_dataset(path, **kwargs)

In the line above we don't specify anything chunk-related. As a result, accessing the dataset loads the entire file from disk. By simply adding chunks="auto" this is no longer the case, and a huge performance gain is achieved.
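A minimal sketch of the change (the wrapper function name is hypothetical; in the PR the argument is added at the existing call site):

import xarray as xr

def open_dataset_lazily(path, **kwargs):
    # chunks="auto" backs the variables with dask arrays, so accessing
    # a slice of the dataset only reads that part of the file.
    return xr.open_dataset(path, chunks="auto", **kwargs)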

There are some other changes related to setting chunking to auto. Some parts of the code don't expect to receive dask arrays. For instance, you can't use .item() on a dask array; instead I now use .values[()].
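A small illustration of the difference (not code from the PR):

import dask.array
import xarray as xr

lazy = xr.DataArray(dask.array.ones((4, 4), chunks=2))
# lazy[0, 0].item() is not reliable on a dask-backed array;
# .values computes the 0-d result and [()] extracts the scalar.
value = lazy[0, 0].values[()]
assert value == 1.0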

I was also getting some errors when the to_netcdf method was called on the package. All of them had something to do with wrong/unsupported datatypes. In this PR an encoding is added for float16 types, and in some packages the from_file method has been updated to ensure that the loaded type is converted to a supported type.
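A minimal sketch of such an encoding, assuming float16 variables are upcast to float32 on write (the exact mapping in the PR may differ):

import numpy as np
import xarray as xr

def float16_encoding(dataset: xr.Dataset) -> dict:
    # NetCDF4 has no float16 type, so write those variables as float32.
    return {
        name: {"dtype": "float32"}
        for name, var in dataset.variables.items()
        if var.dtype == np.float16
    }

# usage: dataset.to_netcdf(path, encoding=float16_encoding(dataset))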

An unrelated but performance-wise significant change has been applied to the _get_transport_models_per_flow_model method. This method is used to match GWF models to GWT models so that GWF-GWT exchanges can be created. It was doing a full comparison of domains, which is expensive. There is also a method available that does the comparison at the domain level. By switching to this method, the matching algorithm becomes almost instantaneous.
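A hypothetical sketch of the idea (names and checks are illustrative, not the actual imod-python method): comparing grid metadata is far cheaper than an element-wise comparison of the full arrays.

import xarray as xr

def domains_match(domain_a: xr.DataArray, domain_b: xr.DataArray) -> bool:
    # Comparing dims and coordinates is cheap; an element-wise
    # comparison of the full domain arrays forces a compute.
    if domain_a.dims != domain_b.dims or domain_a.shape != domain_b.shape:
        return False
    return domain_a.coords.to_dataset().equals(domain_b.coords.to_dataset())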

NOTE
This PR has issue #1699 as its base. The base needs to be altered to master once that PR is merged.

NOTE
This PR also improves the dump method

NOTE
Some timings:

[timing screenshots omitted]
  • Links to correct issue
  • Update changelog, if changes affect users
  • PR title starts with Issue #nr, e.g. Issue #737
  • Unit tests were added
  • If feature added: Added/extended example
  • If feature added: Added feature to API documentation
  • If pixi.lock was changed: Ran pixi run generate-sbom and committed changes

@Manangka Manangka changed the base branch from master to issue_#1698_model_splitter_optimization October 20, 2025 07:25
@Manangka Manangka changed the title from Issue #1698 Model write optimization to Issue #1696 Model write optimization Oct 20, 2025
@Manangka Manangka changed the title from Issue #1696 Model write optimization to Issue #1698 Model write optimization Oct 20, 2025
@Manangka Manangka marked this pull request as ready for review October 20, 2025 07:48
@Manangka Manangka mentioned this pull request Oct 20, 2025
@Manangka Manangka changed the base branch from issue_#1698_model_splitter_optimization to model_splitter_optimization October 20, 2025 08:22
@FransRoelofsen FransRoelofsen moved this to 🏗 In Progress in iMOD Suite Oct 20, 2025
Base automatically changed from model_splitter_optimization to master October 29, 2025 11:08
@JoerivanEngelen JoerivanEngelen changed the base branch from master to model_splitter_optimization October 29, 2025 14:47
@JoerivanEngelen JoerivanEngelen changed the base branch from model_splitter_optimization to master October 29, 2025 14:50
like = ones_like(active)
bottom = like * bottom
top_2d = (like * top).sel(layer=1)
top_3d = bottom.shift(layer=1).fillna(top_2d)
Contributor
Interesting that this is necessary! I think we can probably load top and bottom at the beginning of the function? Then it is more explicit, and consecutive computations will be faster.
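A sketch of that suggestion (the helper name is hypothetical):

import xarray as xr

def _stack_top_bottom(active, top, bottom):
    # Load the (possibly dask-backed) arrays once, up front: more
    # explicit, and the consecutive computations below stay in memory.
    top, bottom = top.load(), bottom.load()
    like = xr.ones_like(active)
    bottom = like * bottom
    top_2d = (like * top).sel(layer=1)
    return bottom.shift(layer=1).fillna(top_2d)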



@dispatch
def is_equal(array1: xu.UgridDataArray, array2: xu.UgridDataArray) -> bool:
Contributor
Nice optimization shortcut!

path = f"{filename}.nc"
exchange_package.dataset.to_netcdf(
    directory / path, format="NETCDF4"
)
Contributor
I'm not sure whether we should hardcode the "NETCDF4" format? Maybe we should add a netcdf_kwargs argument to dump, which accepts a dictionary with settings for to_netcdf? In pkgbase I see we forward kwargs, I think we can do that here as well?
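Something like this hypothetical signature (netcdf_kwargs, the filename scheme, and self._pkg_id are illustrative of the proposal, not the merged API):

def dump(self, directory, netcdf_kwargs=None, **kwargs):
    # Default to NETCDF4, but let the caller override format/engine/etc.
    settings = {"format": "NETCDF4", **(netcdf_kwargs or {})}
    path = directory / f"{self._pkg_id}.nc"  # hypothetical filename scheme
    self.dataset.to_netcdf(path, **settings, **kwargs)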

imod/mf6/rch.py Outdated
Comment on lines 177 to 184
vars = [
"species",
]
for var in vars:
if var in instance.dataset:
instance.dataset[var] = instance.dataset[var].astype(str)

return instance
Contributor
I think this string conversion should be part of the pkgbase class. Or BoundaryCondition class, as species can also be part of other packages.
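A hypothetical sketch of hoisting the conversion into a shared base class (class, attribute, and loader names are illustrative; imod's actual class layout may differ):

class PackageBase:
    _string_vars: tuple = ()  # e.g. ("species",) on BoundaryCondition

    @classmethod
    def from_file(cls, path, **kwargs):
        instance = cls._load_dataset(path, **kwargs)  # hypothetical loader
        # Convert variables that must be strings, for every package type.
        for var in cls._string_vars:
            if var in instance.dataset:
                instance.dataset[var] = instance.dataset[var].astype(str)
        return instance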

imod/mf6/hfb.py Outdated
"""
kwargs.update({"encoding": self._netcdf_encoding()})
kwargs.update({"format": "NETCDF4"})
Contributor
See my comment about the format in pkgbase

"""
kwargs.update({"encoding": self._netcdf_encoding()})
kwargs.update({"format": "NETCDF4"})
Contributor
We should probably add a netcdf_kwargs argument to the dump method so that users can specify themselves how datasets should be written. It could be that they don't want to write with the netcdf4 library but with one of the many other options.

@JoerivanEngelen JoerivanEngelen moved this from 🏗 In Progress to 🧐 In Review in iMOD Suite Oct 29, 2025
Contributor
@JoerivanEngelen JoerivanEngelen left a comment
I've fixed my comments, thanks for creating this. This indeed speeds up write times considerably for the example!

@JoerivanEngelen JoerivanEngelen merged commit ae8ff7c into master Oct 30, 2025
7 checks passed
@JoerivanEngelen JoerivanEngelen deleted the issue_#1698_model_write_optimization branch October 30, 2025 15:34
@github-project-automation github-project-automation bot moved this from 🧐 In Review to ✅ Done in iMOD Suite Oct 30, 2025
Development

Successfully merging this pull request may close these issues: Improve write operation for large models.