Model split perf #1686
Conversation
In a nutshell: make sure DIS is in RAM, avoid re-loading data over and over, avoid unnecessary work.

Refactoring of the model splitter: create a class to re-use data where possible.

* DO NOT PURGE AFTER PARTITION MODEL CREATION. If data is spatially unchunked, this forces loading the entire dataset into memory. Performance then degrades linearly with the number of partitions (since each partition requires a separate load), unless all data is loaded into memory at the start (not feasible for large models).
* This implementation instead runs a groupby on the unpartitioned data (i.e. a single load) and counts the number of active elements per partition. If a package has zero active elements in a partition, it is omitted there. Some dispatching for point and line data (matches the clipping logic).
* Avoid searching for paired transport models; the mapping is fully known a priori since they share the same partitioning labels. Keep them together in a NamedTuple (with some helpers).
* Added some trailing returns (matter of taste).
* Seems to improve performance from 2 hours (Joost) to 1 minute (me locally).

The performance unfortunately still gets worse with each partition when writing the model. In principle, chunking fixes this (but it requires data locality of unstructured meshes; this can be done, see the xugrid issues). Alternatively, we might get dask to optimize the writing. Needs research.
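The groupby-based activity count described above can be sketched as follows. This is a minimal illustration with made-up arrays, not the actual imod implementation: one pass over the unpartitioned data counts active elements per partition label, so partitions where a package has zero active elements can skip that package entirely.

```python
import numpy as np
import pandas as pd

# Hypothetical sketch: one partition label and one active-mask entry per cell.
labels = np.array([0, 0, 1, 1, 2, 2])         # partition label per cell
active = np.array([1, 0, 1, 1, 0, 0], bool)   # package active mask per cell

# Single groupby over the unpartitioned data instead of one load per partition.
counts = pd.Series(active.astype(int)).groupby(labels).sum()

# Partitions with zero active elements can omit the package entirely.
empty = counts[counts == 0].index.tolist()
print(empty)  # → [2]
```

The point is that the expensive data is touched once, regardless of the number of partitions.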
This script will re-order the data to a locality-preserving order, and then dump everything to TOML + Zarr:

```python
# %%
import numpy as np
import xarray as xr

import imod


# %%
def morton_encode_2d(x, y):
    """
    Encode 2D coordinates to Morton codes.

    Uses 32 bits per dimension for 64-bit Morton codes.

    Parameters
    ----------
    x: np.ndarray of floats
    y: np.ndarray of floats

    Returns
    -------
    codes: np.ndarray
        Morton codes as uint64
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    min_x, max_x = np.min(x), np.max(x)
    min_y, max_y = np.min(y), np.max(y)

    # Add small padding to avoid edge cases
    x_range = max_x - min_x
    y_range = max_y - min_y
    padding = 1e-12  # Smaller padding for maximum resolution
    if x_range == 0:
        x_range = 1.0
        padding = 0.5
    if y_range == 0:
        y_range = 1.0
        padding = 0.5
    min_x -= x_range * padding
    max_x += x_range * padding
    min_y -= y_range * padding
    max_y += y_range * padding

    # Scale coordinates to [0, 2^32 - 1] for maximum resolution
    max_coord = 0xFFFFFFFF  # 2^32 - 1
    x_scaled = ((x - min_x) / (max_x - min_x) * max_coord).astype(np.uint64)
    y_scaled = ((y - min_y) / (max_y - min_y) * max_coord).astype(np.uint64)

    # Clamp to valid range
    x_scaled = np.clip(x_scaled, 0, max_coord)
    y_scaled = np.clip(y_scaled, 0, max_coord)

    def dilate_bits_32(vals):
        """Dilate 32-bit values by inserting zeros between bits."""
        vals = vals & 0xFFFFFFFF  # Ensure 32-bit
        vals = (vals | (vals << 16)) & 0x0000FFFF0000FFFF
        vals = (vals | (vals << 8)) & 0x00FF00FF00FF00FF
        vals = (vals | (vals << 4)) & 0x0F0F0F0F0F0F0F0F
        vals = (vals | (vals << 2)) & 0x3333333333333333
        vals = (vals | (vals << 1)) & 0x5555555555555555
        return vals

    # Dilate and interleave: x gets the even bit positions, y the odd ones.
    x_dilated = dilate_bits_32(x_scaled)
    y_dilated = dilate_bits_32(y_scaled)
    morton_codes = x_dilated | (y_dilated << 1)
    return morton_codes


def reorder_morton(data):
    grid = data.ugrid.grid
    x, y = grid.centroids.T
    codes = morton_encode_2d(x, y)
    order = np.argsort(codes)
    return data.isel({grid.face_dimension: order})


def reorder_and_rechunk_simulation(
    sim: imod.mf6.Modflow6Simulation,
    dimension: str = "mesh2d_nFaces",
    chunks: dict[str, int] | None = None,
):
    if chunks is None:
        chunks = {"time": 10, dimension: 20_000}
    for model in sim.values():
        if isinstance(model, imod.mf6.model.Modflow6Model):
            for package in model.values():
                dims = package.dataset.dims
                if dimension in dims:
                    package_chunks = {k: v for k, v in chunks.items() if k in dims}
                    package.dataset = reorder_morton(package.dataset)
                    # Set encoding chunks which will end up in the zarr file.
                    if package_chunks:
                        package.dataset.encoding = {"chunks": package_chunks}
    return


# %%
path = "Z:/tmp/joost/v5_composiet_v1b_v2_nodsp_aut/v5_composiet_v1b_v2_nodsp.toml"
sim = imod.mf6.Modflow6Simulation.from_file(path)
reorder_and_rechunk_simulation(sim)
sim.dump("rechunked_zarrsim", validate=False, engine="zarr")
```
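A quick standalone sanity check of the bit interleaving behind the Morton encoding: `x` occupies the even bit positions and `y` the odd ones, so neighbouring cells get nearby codes. The `dilate_bits_32` and `morton` helpers below are a self-contained re-statement of the logic for single values, not the functions from the script.

```python
import numpy as np

def dilate_bits_32(vals):
    # Insert a zero between every bit of a 32-bit value, yielding 64 bits.
    vals = np.asarray(vals, dtype=np.uint64) & np.uint64(0xFFFFFFFF)
    for shift, mask in [
        (16, 0x0000FFFF0000FFFF),
        (8, 0x00FF00FF00FF00FF),
        (4, 0x0F0F0F0F0F0F0F0F),
        (2, 0x3333333333333333),
        (1, 0x5555555555555555),
    ]:
        vals = (vals | (vals << np.uint64(shift))) & np.uint64(mask)
    return vals

def morton(x, y):
    # x on even bit positions, y on odd bit positions.
    return dilate_bits_32(x) | (dilate_bits_32(y) << np.uint64(1))

print(morton(0, 1), morton(1, 1), morton(2, 2))  # → 2 3 12
```

For example, `morton(2, 2)` dilates `0b10` to `0b100` for both inputs, then OR-s `0b0100` with `0b1000`, giving 12.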
Annoyingly, I get an error inside the function calls that …

```python
# %%
import imod


# %%
def load_non_spatial_data(sim, dimension: str = "mesh2d_nFaces"):
    for model in sim.values():
        if isinstance(model, imod.mf6.model.Modflow6Model):
            for package in model.values():
                if dimension not in package.dataset.dims:
                    package.dataset.load()
                for var in package.dataset.data_vars:
                    if dimension not in package.dataset[var].dims:
                        package.dataset[var].load()
        else:
            model.dataset.load()
    return


# %%
path = "rechunked-simulation/v5_composiet_v1b_v2_nodsp.toml"
sim = imod.mf6.Modflow6Simulation.from_file(path)
load_non_spatial_data(sim)
sim._validation_context = imod.mf6.ValidationSettings(
    strict_well_validation=False, ignore_time=True
)

# %%
# Serial test
# sim.write("serial_simulation", validate=False)
submodel_labels = sim.create_partition_labels(npartitions=8)
sim_partitions = sim.split(submodel_labels, ignore_time_purge_empty=True)
sim_partitions._validation_context = imod.mf6.ValidationSettings(
    strict_well_validation=False, ignore_time=True
)
sim_partitions.write("partitioned_simulation", validate=False)
# %%
```

For future to-do's: the Morton curve ordering should become a part of xugrid, see Deltares/xugrid#389. It may make sense to wrap the ordering at the simulation level, as a method instead of a free function.
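The selection logic in `load_non_spatial_data` above boils down to a filter on variable dimensions: anything without the face dimension is small (options, scalars) and safe to load eagerly, while the large spatial variables stay lazy. A minimal sketch of that filter, with an illustrative (made-up) variable-to-dims mapping:

```python
# Hypothetical mapping of variable name -> dimensions of a package dataset.
dims = {
    "save_flows": (),                          # scalar option
    "conductance": ("time", "mesh2d_nFaces"),  # large spatial variable
    "print_input": (),                         # scalar option
}

# Only variables lacking the face dimension are loaded eagerly.
non_spatial = [name for name, d in dims.items() if "mesh2d_nFaces" not in d]
print(non_spatial)  # → ['save_flows', 'print_input']
```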
Some general thoughts (I've had a little time to mull it over): there is a minor optimization possible in the … Secondly, in `get_unfiltered_options` we should probably use …
This issue has been addressed in #1693.
Accidentally closed this PR.
The second part of this issue has been addressed in #1700 |
Fixes #1698

# Description

This is part 2 of fixing the performance issues with large models. In part 1 (#1693) the model splitter was optimized. In this PR the focus is on writing the partitioned model.

As @Huite pointed out in #1686, the performance bottleneck was that the same package had to be loaded from file multiple times while only a part of the file is actually needed. After digging around for a while I discovered that this has to do with how we open the dataset:

`dataset = xr.open_dataset(path, **kwargs)`

In the line above we don't specify anything chunk-related. As a result, when you access the dataset, the entire file has to be loaded from disk. By simply adding `chunks="auto"`, this is no longer the case and a huge performance gain is achieved.

There are some other changes related to setting chunking to auto. Some parts of the code don't expect to receive dask arrays. For instance, you can't use `.item()` on a dask array; instead I now use `.values[()]`. I was also getting some errors when the `to_netcdf` method was called on a package. All of them had something to do with wrong/unsupported datatypes. In this PR you will find that an encoding is added for float16 types, and that in some packages the `from_file` method has been updated to ensure that the loaded type is converted to a supported type.

An unrelated but performance-wise significant change has been applied to the `_get_transport_models_per_flow_model` method. This method is used to match GWF models to GWT models so that GWF-GWT exchanges can be created. It was doing a full comparison of domains, which is expensive. There is also a method available that does the comparison on domain level. By switching to this method, the matching becomes almost instantaneous.

**NOTE** This PR has issue #1699 as a base. The base needs to be altered to master once that PR is in.

**NOTE** This PR also improves the `dump` method.

**NOTE** Some timings:

<img width="833" height="739" alt="image" src="https://github.com/user-attachments/assets/974c841c-0413-4433-8486-1abe98dc0715" />
<img width="843" height="215" alt="image" src="https://github.com/user-attachments/assets/c7082975-af35-4143-a6f9-860557b3eb09" />
<img width="842" height="705" alt="image" src="https://github.com/user-attachments/assets/383bf1a6-f028-4cb4-aa72-48ab95e84e3d" />

<!--- Before requesting review, please go through this checklist: -->
- [x] Links to correct issue
- [ ] Update changelog, if changes affect users
- [x] PR title starts with ``Issue #nr``, e.g. ``Issue #737``
- [ ] Unit tests were added
- [ ] **If feature added**: Added/extended example
- [ ] **If feature added**: Added feature to API documentation
- [ ] **If pixi.lock was changed**: Ran `pixi run generate-sbom` and committed changes

---------

Co-authored-by: JoerivanEngelen <joerivanengelen@hotmail.com>
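The float16 issue mentioned above can be illustrated in isolation: netCDF has no float16 type, so float16 data must be stored as a supported type such as float32. The cast below stands in for the encoding added in the PR; the variable names are illustrative, not the actual imod code.

```python
import numpy as np

# Illustrative float16 data as might be loaded from an older file.
data = np.arange(4, dtype=np.float16)

# Convert to a netCDF-supported dtype before writing; a no-op otherwise.
stored = data.astype(np.float32) if data.dtype == np.float16 else data
print(stored.dtype)  # → float32
```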
… in mf6 model (#1706) Fixes #1683

# Description

This PR salvages the logic from #1686 to dump and import files from zarr and zip store, and adds unit tests.

# Checklist
- [x] Links to correct issue
- [x] Update changelog, if changes affect users
- [x] PR title starts with ``Issue #nr``, e.g. ``Issue #737``
- [x] Unit tests were added
- [x] **If feature added**: Added/extended example
- [x] **If feature added**: Added feature to API documentation
- [ ] **If pixi.lock was changed**: Ran `pixi run generate-sbom` and committed changes

---------

Co-authored-by: Sunny Titus <sunny.titus@deltares.nl>



For @jdelsman
Sample script:
The `sim.split()` call takes about 70 seconds here, which seems reasonable (compared to 2 hours before?).