use xugrid in mesh methods #535

Merged
merged 23 commits into from
Oct 6, 2023

hboisgon
Contributor

@hboisgon hboisgon commented Sep 29, 2023

Issue addressed

Fixes #420

Explanation

Replaced the zonal stats used in the mesh2d generic setup methods with the xugrid Regridder. Also moved the functions to the mesh workflows for easy reuse.

Checklist

  • Updated tests or added new tests
  • Branch is up to date with main
  • Tests & pre-commit hooks pass
  • Updated documentation if needed
  • Updated changelog.rst if needed

@hboisgon hboisgon self-assigned this Sep 29, 2023
@hboisgon
Contributor Author

Hi @Huite, I started using the xugrid Regridder to resample regular gridded data to a mesh (before, I converted the mesh to a geopandas.GeoDataFrame and then used zonal stats). With the Regridder, our tests now fail on timeout because they run for too long. I did notice that the first time I use any of the regrid methods it takes very long, but if I run the tests a second or third time it goes much faster. Is that something you recognize?

@Huite
Contributor

Huite commented Sep 29, 2023

Hi @hboisgon,

Yes, I recognize this. There are two reasons for latency:

  • Behind the scenes in xugrid, I use numba_celltree as the spatial index. As the name suggests, numba_celltree uses numba to do just-in-time (JIT) compilation to speed up functions. Numba produces good machine code, but it is definitely NOT a fast compiler. If you check numba_celltree, you will find that I'm caching all functions. In normal use (on your local machine), this means it might take a minute or so to compile the first time around. All subsequent calls are then fast (even after restarting Python), because it can use the cache (which contains the machine code).
  • I don't think this is what you mean, but the regridder itself also pre-computes all weights once (during initialization). All regrid calls after that should be quite fast, because they just apply the regridding weights (lazily with dask arrays).
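The second point (compute the weights once, then apply them cheaply) can be illustrated with a toy example. This is a minimal pure-Python sketch for 1D cells, not xugrid's implementation; the class name `ToyRegridder1D` and its methods are hypothetical.

```python
from typing import List, Tuple


class ToyRegridder1D:
    """Overlap-weighted mean regridding between two 1D cell layouts (illustration only)."""

    def __init__(self, source_edges: List[float], target_edges: List[float]) -> None:
        # Expensive step, done once: overlap length for every (target, source) cell pair.
        self.weights: List[List[Tuple[int, float]]] = []
        for t in range(len(target_edges) - 1):
            t_lo, t_hi = target_edges[t], target_edges[t + 1]
            row = []
            for s in range(len(source_edges) - 1):
                overlap = min(t_hi, source_edges[s + 1]) - max(t_lo, source_edges[s])
                if overlap > 0.0:
                    row.append((s, overlap))
            self.weights.append(row)

    def regrid(self, values: List[float]) -> List[float]:
        # Cheap step, repeatable: weighted mean using the stored weights.
        out = []
        for row in self.weights:
            total = sum(w for _, w in row)
            out.append(sum(values[s] * w for s, w in row) / total if total else float("nan"))
        return out


# Four source cells of width 1 are averaged onto two target cells of width 2.
regridder = ToyRegridder1D([0.0, 1.0, 2.0, 3.0, 4.0], [0.0, 2.0, 4.0])
coarse = regridder.regrid([1.0, 3.0, 5.0, 7.0])
print(coarse)  # [2.0, 6.0]
```

Calling `regrid` again with other values reuses the stored weights, which is why only the first step is slow in this pattern.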

So I'm almost certain you're running into the first issue. How long does it take, exactly? And how long are you allowing it?
Secondly, how big are the datasets you're regridding? If your datasets are small(ish), you could run everything without numba by setting an environment variable (NUMBA_DISABLE_JIT=1). E.g. see the tox workflow for xugrid: https://github.com/Deltares/xugrid/blob/main/tox.ini
(Not really recommended, but this can also be done in an ad hoc manner using import os; os.environ["NUMBA_DISABLE_JIT"] = "1". But it has to be set before numba is imported.)
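As a rough sketch of the ad hoc variant, the variable can be set at the very top of a test entry point such as `conftest.py`, together with a check that numba has not been imported yet. The helper name `disable_numba_jit` is hypothetical; only the `NUMBA_DISABLE_JIT` variable comes from the comment above.

```python
import os
import sys


def disable_numba_jit() -> bool:
    """Set NUMBA_DISABLE_JIT=1 and report whether it was set early enough.

    Returns False when numba was already imported, in which case the
    setting comes too late to take effect for this process.
    """
    too_late = "numba" in sys.modules
    os.environ["NUMBA_DISABLE_JIT"] = "1"
    return not too_late


# Call this before any import that pulls in numba (directly or indirectly).
jit_disabled_in_time = disable_numba_jit()
```

Setting the variable in the CI configuration instead avoids the ordering problem altogether, which is presumably why the tox.ini route is suggested first.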

There's another option, but I don't think it does any good. Numba can also do fully ahead of time compilation (AOT), if you specify all function signatures. I can do this for numba_celltree, because I know exactly what will end up going in every function (basically only 64-bit integers and 64-bit floats). However, this just moves the compilation to the setup phase, rather than during runtime. I don't think your total CI time will improve.

Another option is just using Numba as an AOT compiler and providing wheels. But in that case, I (almost) might as well use C++ or Fortran (or Rust) instead. For me, the primary benefit of numba is not having to distribute binaries!

Alternatively, you could speed up testing a lot (potentially), if you would be able to re-use environments in CI workflows. I've investigated in the past, and it was hopeless, but maybe things have changed.

@Huite
Contributor

Huite commented Sep 29, 2023

There's another option, which is that I tell Numba to optimize less. I'm currently inlining relatively many functions in numba_celltree. This allows the compiler to optimize more aggressively and results in relevant speedups (e.g. 30-40%), but also doubles compilation time.

As mentioned, in "normal use", this seems worth it since most users will incur the compilation time only once every few months or so, and then in their day to day use everything just goes 30% faster. That seems like it's well worth it -- but for CI it's different, because it's incurring the compilation cost every time.

@Huite
Contributor

Huite commented Sep 29, 2023

Best to be concrete of course...
In my experience, compiling takes around a minute or so. 120s is in the same order of magnitude. The reason it's failing is because of the pytest plugin:

image

@Hofer-Julian had a good suggestion here: use a fixture to force the compilation beforehand. That way, you still get the useful timeout messages from pytest-timeout. It will still increase your CI time notably of course, which is kinda meh, but installing always takes at least a few minutes anyway...

You can get a session fixture as follows:

@pytest.fixture(scope="session", autouse=True)

See e.g.: https://github.com/Deltares/imod_coupler/blob/9f4ec17b77add4452b6d1cbc564e61c9f15e8d9d/tests/fixtures/fixture_paths.py#L24
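A session fixture along these lines might look as follows in a `conftest.py`. This is only a sketch: `warm_up_jit` is a hypothetical helper, and its body is a placeholder where a real suite would run the smallest regridding call its tests rely on.

```python
import pytest

_warmed_up = False  # module-level flag so the warm-up body runs at most once


def warm_up_jit() -> bool:
    """Hypothetical helper: run each JIT-compiled code path once.

    Assumption: in a real test suite the placeholder below would be
    replaced by a tiny regridding call that triggers the compilation.
    """
    global _warmed_up
    if not _warmed_up:
        # Placeholder for e.g. regridding a tiny raster onto a tiny mesh.
        _warmed_up = True
    return _warmed_up


@pytest.fixture(scope="session", autouse=True)
def compiled_jit_functions():
    """Set up once per test session, before the first test runs."""
    warm_up_jit()
```

Because the fixture is `autouse`, no test has to request it explicitly.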

@savente93
Contributor

I'm working on a better CI setup that can make better use of the caches. This might help resolve this, though nothing is guaranteed. Just FYI.

@savente93
Contributor

@Huite Do you know where numba_celltree stores its artifacts? (So I can add them to the CI cache.)

@Huite
Contributor

Huite commented Oct 2, 2023

I've never looked at it very carefully, but I think it's stored in the pycache for the package installation.

On my Windows machine, that's: c:\Users\bootsma\.conda\envs\main\Lib\site-packages\numba_celltree\__pycache__\

There are a number of files there with the suffixes nbi and nbc (suggesting numba). At first glance, some of the sizes seem plausible for executable programs.
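To check what would need to go into a CI cache, the files with those suffixes can be listed with the standard library. A minimal sketch, assuming the cache sits in the package's `__pycache__` directory as described above; both helper names are hypothetical.

```python
import importlib.util
from pathlib import Path
from typing import List, Optional


def package_pycache(package: str) -> Optional[Path]:
    """Return the __pycache__ directory of an installed package, or None if not found."""
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).parent / "__pycache__"


def list_numba_cache_files(pycache_dir: Path) -> List[Path]:
    """Return the files with an .nbi or .nbc suffix in the given directory."""
    if not pycache_dir.is_dir():
        return []
    return sorted(p for p in pycache_dir.iterdir() if p.suffix in {".nbi", ".nbc"})


pycache = package_pycache("numba_celltree")
if pycache is not None:
    for path in list_numba_cache_files(pycache):
        print(path.name, path.stat().st_size)
```

Numba also documents a `NUMBA_CACHE_DIR` environment variable for redirecting the cache to a chosen directory, which may be easier to cache in CI than a path inside site-packages.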

@savente93 savente93 mentioned this pull request Oct 3, 2023
5 tasks
@DirkEilander
Contributor

DirkEilander commented Oct 6, 2023

Thanks for the input @Huite and tests @savente93

I've tried a couple of options. By forcing the compilation beforehand with @pytest.fixture(scope="session", autouse=True) all tests still ran into the timeout limit, see 03b9034 and test logs

I see two possible solutions:

  • running with NUMBA_DISABLE_JIT=1 gives the fastest results. As long as we keep the tests small and don't call the jitted functions repeatedly, we don't really see the benefit of using JIT anyway, so this is probably the best option, see 5c12ede
  • increasing the timeout on the specific tests that use xugrid regridding. This is currently the only option that results in passing tests, see e3de517
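For the second option, pytest-timeout provides a `timeout` marker that overrides the global limit for a single test. The sketch below uses a hypothetical test name and a hypothetical limit of 300 seconds; the body is a placeholder for a test that triggers JIT compilation.

```python
import pytest


@pytest.mark.timeout(300)  # hypothetical limit in seconds, for this test only
def test_mesh_regridding_placeholder():
    # Placeholder body standing in for a test that calls the regridder.
    result = sum(range(10))
    assert result == 45
```

Tests without the marker keep the global timeout, so slow regressions elsewhere are still caught.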

For the first solution we currently run into an issue with pyflwdir which requires some maintenance, see test log. I'll see if I can get that sorted in Deltares/pyflwdir#36 after which we can decide for the best solution here.

@DirkEilander
Contributor

DirkEilander commented Oct 6, 2023

@savente93 can you review my proposed solution for the test timeout using NUMBA_DISABLE_JIT=1 and see if you agree? This is about a factor of 2 faster for our tests, in which case all tests remain well within the 120 sec timeout.

The new code from @hboisgon looks good to me.

Contributor

@savente93 savente93 left a comment

If this solves the problem, I'm very happy with the solution. I have one small comment which may or may not be valid. If it isn't, feel free to just ignore it and merge. Thank you for solving this :)

pyproject.toml (outdated review comment, resolved)
@DirkEilander DirkEilander merged commit a26e620 into main Oct 6, 2023
8 checks passed
@DirkEilander DirkEilander deleted the xugrid_regrid branch October 6, 2023 14:00
Development

Successfully merging this pull request may close these issues.

Use xugrid regridding in generic mesh model setup methods
4 participants