Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Numba for speed up #27

Closed
wants to merge 10 commits into from
Closed

Numba for speed up #27

wants to merge 10 commits into from

Conversation

amoodie
Copy link
Member

@amoodie amoodie commented May 7, 2020

Highlights

I was messing about and tried to see what kind of speed-up we could get with just using Numba.

I profiled the following loop on develop (i.e., the tests/time_water.py script):

delta = pyDeltaRCM()
start = time.time()
for _t in range(0, 10):
    print("timestep:", _t)
    delta.update()
end = time.time()

for an elapsed time = 1.2779 min.

With some really simple changes to the water routing routines (only changed one of three list(map(lambda(... calls) and change of random picking in a few places, I get an elapsed time 0.5052 min.

This is really promising for #1, I think.

Question

How much do we care about exact consistency, down to the random number sequences?

These changes use the Numba random number generator (rather than the numpy rng), and so they produce different results. For example, the tests/test_output_consistency.py test fails with the Numba implementation. The expected bed elevation at a small slice after one update():

what is returned with Numba:

array([[ 0.        , -1.        , -1.        , -1.        ,  0.        ],
       [-0.92157906, -0.9095978 , -0.8295175...99838   , -0.99919   ],
       [-1.        , -1.        , -1.        , -1.        , -1.        ]],

what is expected:

array([[ 0.        , -1.        , -1.        , -1.        ,  0.        ],
       [-0.8469125 , -0.82371545, -0.8295175...   , -0.9981775 , -0.9989875 , -0.9997975 ],
       [-1.        , -1.        , -1.        , -1.        , -1.        ]]))

This ultimately would be an issue for any C implementations too. My feeling is that it's fine to break exact consistency between versions, as long as we note it in a changelog. I think breaking behavior changes of the model are much more important, and require a higher-level need to allow.

Anyway, I don't want to merge this now, just looking for some disucssion. @elbeejay @ericbarefoot @ammilten

Also, sorry the commit history will shorten when #26 is merged.

@amoodie amoodie marked this pull request as ready for review May 7, 2020 20:01
@amoodie amoodie marked this pull request as draft May 7, 2020 20:01
@amoodie
Copy link
Member Author

amoodie commented May 8, 2020

On second thought, reproducible runs are important, so I'll look at ways to make sure the runs are deterministic from a seed.

@amoodie
Copy link
Member Author

amoodie commented May 8, 2020

okay, well turns out that was pretty easy, just have to make sure the rng seed is set in a jitted function, and then only and always call np.random from jitted functions.

import time
import numba

# utilities used in various places in the model and docs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems very similar in intent to the shared_tools.py file we have. Any reason they can't be combined? (I have no attachment to either name)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah this would eliminate the shared_tools file. I have this done in another commit, but haven't pushed it. I'll do some of the jitting again, on top of Eric's changes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out that numba doesn't support jitting classes in the way that we would like to use them (for example, they don't import inheritance) and so the way you've done it seems like the simplest way forward for us.

@elbeejay
Copy link
Member

Idea is quite nice, that is a pretty significant speed up indeed.

ericbarefoot added a commit to ericbarefoot/pyDeltaRCM that referenced this pull request May 22, 2020
@ericbarefoot ericbarefoot mentioned this pull request May 22, 2020
ericbarefoot added a commit to ericbarefoot/pyDeltaRCM that referenced this pull request May 26, 2020
This makes them available outside the object, which is important in some
cases, and the class just gets them from here.

feat: make shared_tools into a module, not a class.

This is in preparation for jitting, since we want them to be standalone
functions.

feat: break out step that updates water parcel positions

These functions now become helper utility functions in shared_tools, and
are wrapped by update_Q, which handles them all, and passes them what
they need.

refactor: break out sed functions into smaller functions

This modularizes the code and makes it easier to compile.

feat: jit the functions

apply jit to functions, and fix some type problems that came along with
refactoring.

refactor: move next step calculation to jitted functions

The next slowest operation was getting the indexes of the next cells, so
I moved that to a jitted function too.

refactor: jit the calculation of weight array for the water

fix: move random number generation into numba

Copy of what @amoodie did in DeltaRCM#27

fix: some broken tests to test the shared tools portion.

fix: add numba to requirements

fix: move all inlet picking into the random_pick function

There's essentially no difference between the random_pick function and
the random_pick_inlet function other than there is no preference given
for the inlet cells, so you can pick whichever one, so I just adjust the
mechanism to use the existing random pick machinery.

fix: water and sediment routing bugs that caused inconsistent behavior

There were two issues with the refactoring. One was that water routing
developed a flat flow field because I wasn't adding each passing water
parcel to the Q field. The other was related to sediment parcel routing,
where I had transcribed the i and j of stepping incorrectly in the
jitted functions.
@ericbarefoot
Copy link
Collaborator

Since we merged #36, I'm going to go ahead and close this. Thanks for the great idea and all the good investigation and development that this PR motivated.

amoodie pushed a commit to amoodie/pyDeltaRCM that referenced this pull request Feb 25, 2021
This makes them available outside the object, which is important in some
cases, and the class just gets them from here.

feat: make shared_tools into a module, not a class.

This is in preparation for jitting, since we want them to be standalone
functions.

feat: break out step that updates water parcel positions

These functions now become helper utility functions in shared_tools, and
are wrapped by update_Q, which handles them all, and passes them what
they need.

refactor: break out sed functions into smaller functions

This modularizes the code and makes it easier to compile.

feat: jit the functions

apply jit to functions, and fix some type problems that came along with
refactoring.

refactor: move next step calculation to jitted functions

The next slowest operation was getting the indexes of the next cells, so
I moved that to a jitted function too.

refactor: jit the calculation of weight array for the water

fix: move random number generation into numba

Copy of what @amoodie did in DeltaRCM#27

fix: some broken tests to test the shared tools portion.

fix: add numba to requirements

fix: move all inlet picking into the random_pick function

There's essentially no difference between the random_pick function and
the random_pick_inlet function other than there is no preference given
for the inlet cells, so you can pick whichever one, so I just adjust the
mechanism to use the existing random pick machinery.

fix: water and sediment routing bugs that caused inconsistent behavior

There were two issues with the refactoring. One was that water routing
developed a flat flow field because I wasn't adding each passing water
parcel to the Q field. The other was related to sediment parcel routing,
where I had transcribed the i and j of stepping incorrectly in the
jitted functions.
@amoodie amoodie deleted the numba branch February 25, 2021 17:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants