
Performance of interp_like when converting dfsu to dfs2 for large files #413

Open · Labels: enhancement (New feature or request)
rywm-dhi opened this issue Sep 5, 2022 · 17 comments

rywm-dhi commented Sep 5, 2022

Is your feature request related to a problem? Please describe.
Converting large dfsu files to dfs2 seems to take a long time (for a 120 MB item, it took roughly 6 hours). I used the following code:

import mikeio

ds = mikeio.read(dfsu_filename, items=item, time=time)  # read the selected item/timestep from the dfsu
da = ds[item]
g = da.geometry.get_overset_grid(dx=dxdy)                # regular grid covering the flexible mesh
da_grid = da.interp_like(g)                              # interpolate mesh data onto the grid

The PC has 64 GB of memory and its CPU is an Intel Xeon E5-2650 v4 @ 2.20 GHz.

Describe the solution you'd like
Is there anything I could do to make this run faster?

Describe alternatives you've considered
I haven't tried anything yet. One idea is to vectorize the for loop in "_interp_itemstep".


@ecomodeller (Member)

😟6h!

More details needed:

  • Number of timesteps
  • Number of elements in the dfsu file
  • Output grid resolution (dxdy)

@jsmariegaard Do we really need a loop here?

for j in range(ni):
    idat[j] = np.dot(data[elem_ids[j]], weights[j])
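
For reference, a vectorized form of that loop might look like the sketch below (an assumption-laden sketch: data is a 1-D array of element values and elem_ids/weights are rectangular (n_points, n_nearest) arrays, i.e. a fixed number of nearest neighbours per point; if they are ragged lists they would need padding first):

import numpy as np

def interp_itemstep_vectorized(data, elem_ids, weights):
    # data:     (n_elements,) values on the source mesh elements
    # elem_ids: (n_points, n_nearest) indices of the nearest elements per target point
    # weights:  (n_points, n_nearest) IDW weights per target point
    # Fancy indexing gathers all neighbour values at once; the weighted sum
    # over axis 1 replaces the per-point np.dot in the loop above.
    return np.sum(data[elem_ids] * weights, axis=1)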

rywm-dhi (Author) commented Sep 5, 2022

Haha, yeah, it's a long time to match a big file, but hopefully it can be improved nonetheless :) .

  • Number of timesteps: 1 (it's a statistics dfsu file of model results)
  • Number of elements in the dfsu file: 10 871 367
  • Output grid resolution (dxdy): 1 m x 1 m

@jsmariegaard (Member)

Bummer. It would be very helpful if you could profile it e.g. with cProfile and snakeviz. But that would of course require a smaller file or that you only consider a subset using the area argument...

> pip install snakeviz   

> python -m cProfile -o report.prof myscript.py
> snakeviz report.prof
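
To profile on a subset, the area argument mentioned above could be used along these lines (a sketch: the bounding-box coordinates are placeholders, and mikeio.read is assumed to accept area as an [x0, y0, x1, y1] bounding box or a polygon):

import mikeio

# read only the part of the mesh inside a bounding box so the
# profiling run works on a much smaller problem
ds = mikeio.read(
    dfsu_filename,
    items=item,
    time=time,
    area=[x0, y0, x1, y1],  # placeholder coordinates of a small sub-area
)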

But you're probably right about the element-loop. It doesn't look healthy.

@jsmariegaard (Member)

You could also try with

da_grid = da.interp_like(g, n_nearest=1)

which uses nearest-point instead of IDW interpolation.

rywm-dhi (Author) commented Sep 5, 2022

Here's the profile for testing it on a much smaller dfsu (report attached as zip). I'll try a test on a larger file overnight to see how scale might change that.
[screenshot: snakeviz profile of the smaller dfsu test]
report.zip

@jsmariegaard (Member)

Hmm, yes, the contains() method (to find out if points are inside the mesh) is actually what takes the most time in this case, closely followed by the dot product. It may be better to set extrapolate=True and then remove points outside the domain afterwards:

> da_grid = da.interp_like(g, extrapolation=True)   # will not call contains()

rywm-dhi (Author) commented Sep 6, 2022

The results from the larger dfsu file are in (no. elements = 3 760 848). It looks like the contains() method is the main culprit. The block taking ~15% is the cKDTree, not _interp_itemstep (which is 3% here).
[screenshot: snakeviz profile of the larger dfsu test]
report.zip

@jsmariegaard (Member)

Did you try

 da_grid = da.interp_like(g, extrapolation=True) 

rywm-dhi (Author) commented Sep 6, 2022

Did you try

 da_grid = da.interp_like(g, extrapolation=True) 

This definitely helped a lot, thanks! The final result still needs masking, though; I'll try it with the radius next to see if I can avoid that.

FYI, this is the same example as above but with extrapolation set to True (5x faster):

[screenshot: snakeviz profile with extrapolation enabled]

ecomodeller added the enhancement label on Sep 7, 2022
cethceth commented Nov 1, 2022

I have a similarly sized dfsu file and the same issue, but I get an error when I try to set 'extrapolation=True':
[screenshot: error traceback]
Any idea how to solve this?

cethceth commented Nov 1, 2022

... and can it really be true that this process takes that long in Python when it is quite fast in MIKE Zero?

@jsmariegaard (Member)

@cethceth, which version of MIKE IO are you using?

cethceth commented Nov 1, 2022

Solved it with 'extrapolate' instead of 'extrapolation' :)

@ecomodeller (Member)

... and can it really be true that this process takes that long in Python when it is quite fast in MIKE Zero?

Are you referring to "Create new Grid series - From Dfsu file" (with the less-than-intuitive grid specification)?
[screenshot: MIKE Zero "Create new Grid series - From Dfsu file" dialog]

cethceth commented Nov 3, 2022

Yes. But it works fine with extrapolate now. Any idea how to mask the output grid (fast)? I would prefer not to have to load shapefiles.

@ecomodeller (Member)

@cethceth maybe you can tell us a bit more about your use case. Not sure we can do anything about it right now, but it is useful to know. I suppose converting the dfsu to dfs2 is simply an intermediate step?

cethceth commented Nov 4, 2022

@ecomodeller we want an automatic tool for converting dfsu files to tiff files, which we use for reporting, calculation of damage cost, etc. We have that now, but since the output tiff files are not masked, this has to be done as a post-processing step in GIS. I was hoping to gather all the processing in Python and also avoid having to load shapefiles for masking the output. If there was a way to get the extent of the dfsu file which could then be used when masking the dfs2/tiff file in Python, that would be really nice :)
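
One way to build such a mask directly from the dfsu mesh, without shapefiles, might be to test the grid cell centres against the flexible-mesh geometry. A minimal sketch, assuming the overset-grid workflow from the top of the thread, that Grid2D exposes its cell-centre coordinates as g.x/g.y, and that geometry.contains() accepts an (n, 2) array of points; contains() is the same call that makes interp_like slow, but here it runs only once and the boolean mask can be reused for every item and timestep:

import numpy as np
import mikeio

ds = mikeio.read(dfsu_filename, items=item, time=time)
da = ds[item]
g = da.geometry.get_overset_grid(dx=dxdy)
da_grid = da.interp_like(g, extrapolate=True)  # fast: skips contains()

# build an inside/outside mask from the grid cell centres
xx, yy = np.meshgrid(g.x, g.y)
points = np.column_stack([xx.ravel(), yy.ravel()])
inside = np.asarray(da.geometry.contains(points)).reshape(xx.shape)

# blank everything outside the dfsu domain (assumes the gridded values
# have y, x as their last two dimensions) before exporting to dfs2/tiff
da_grid.values[..., ~inside] = np.nan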
