Speed up the loading of large tables#2026

Merged: visr merged 1 commit into main from overly-broad-cast on Feb 3, 2025

visr commented Jan 31, 2025

This fixes a performance issue that @rbruijnshkv encountered when trying to initialize a model with a Basin / time column of 6 million rows, spread over 1000 Basin nodes. Initialization spent around 1-2 seconds per Basin node on this line. `time` is a `StructVector`, which stores each column as a separate vector. By broadcasting `getfield` over it, we iterated over the rows, constructing a `BasinTime` struct for each row and then taking a single field from it. That works, but it is much slower than directly taking the column, which already exists as a vector.
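To illustrate the difference, here is a minimal, dependency-free sketch. It is not the Ribasim code: `Row` and `ColumnStore` are hypothetical stand-ins for `BasinTime` and `StructVector`, and the fields are made up. The point is only that broadcasting `getfield` over a column-oriented container builds one row struct per element, while the column itself is already available as a vector.

```julia
# Hypothetical row type; the real BasinTime in Ribasim has different fields.
struct Row
    node_id::Int32
    time::Float64
    level::Float64
end

# Hypothetical stand-in for a StructVector: one vector per field,
# presented to callers as a vector of Row.
struct ColumnStore <: AbstractVector{Row}
    node_id::Vector{Int32}
    time::Vector{Float64}
    level::Vector{Float64}
end

Base.size(c::ColumnStore) = size(getfield(c, :time))
Base.IndexStyle(::Type{ColumnStore}) = IndexLinear()

# Indexing assembles a Row from the three columns on every access.
function Base.getindex(c::ColumnStore, i::Int)
    Row(getfield(c, :node_id)[i], getfield(c, :time)[i], getfield(c, :level)[i])
end

n = 1_000
store = ColumnStore(Int32.(1:n), collect(1.0:n), fill(0.5, n))

# Slow pattern: iterates over rows, builds a Row each time, keeps one field,
# and allocates a new result vector.
slow = getfield.(store, :time)

# Fast pattern: take the column that is already stored as a vector.
# No row structs are built and nothing is copied.
fast = getfield(store, :time)
```

Both patterns give the same values, but the second one returns the stored vector itself. With StructArrays.jl the equivalent column access is property access such as `table.time`.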

The general recommendation for such large tables is not to store them in the model database but in a separate Arrow file, as done here: https://github.com/Deltares/Ribasim/blob/v2025.1.0/python/ribasim_testmodels/ribasim_testmodels/basic.py#L210. Doing this shrank the database from 400 to 100 MB, and also sped up initialization. The fix in this PR should help with both formats, though.

evetion left a comment

I like these kinds of improvements 🎉

visr merged commit c75888a into main on Feb 3, 2025
visr deleted the overly-broad-cast branch on February 3, 2025 09:23