
memory usage in scaloa #10

Open
dksasaki opened this issue Aug 11, 2021 · 5 comments

Comments

dksasaki commented Aug 11, 2021

Hey guys,

I've used your objective mapping function scaloa and noticed there is a simple way to reduce memory usage.

The variables d2 and dc2 can occupy a huge amount of memory, so deleting them after the correlation and cross-correlation matrices (A and C, respectively) have been defined, and before the matrix inversion, is useful. In one of my cases it freed up a few GB of memory (of course, this depends on both the grid and the data).

(...)
    d2 = ((np.tile(x, (n, 1)).T - np.tile(x, (n, 1))) ** 2 +
          (np.tile(y, (n, 1)).T - np.tile(y, (n, 1))) ** 2)
    nv = len(xc)
    xc, yc = np.reshape(xc, (1, nv)), np.reshape(yc, (1, nv))
    # Squared distance between the observations and the grid points.
    dc2 = ((np.tile(xc, (n, 1)).T - np.tile(x, (nv, 1))) ** 2 +
           (np.tile(yc, (n, 1)).T - np.tile(y, (nv, 1))) ** 2)
    # Correlation matrix between stations (A) and cross-correlation
    # between stations and grid points (C).
    A = (1 - err) * np.exp(-d2 / corrlen ** 2)
    C = (1 - err) * np.exp(-dc2 / corrlen ** 2)
    if 0:  # NOTE: if the parameter zc is used (`scaloa2.m`)
        A = (1 - d2 / zc ** 2) * np.exp(-d2 / corrlen ** 2)
        C = (1 - dc2 / zc ** 2) * np.exp(-dc2 / corrlen ** 2)

    # here!  <-- free the large intermediates before inverting A
    del d2, dc2

(...)
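As a side note, a minimal sketch of how NumPy broadcasting could avoid the tiled intermediates entirely, allocating only the final distance matrix. This assumes flat coordinate arrays, and `squared_distances` is a hypothetical helper, not part of scaloa:

```python
import numpy as np

def squared_distances(x, y, xc, yc):
    """Squared distances between points (x, y) and points (xc, yc).

    Broadcasting the (n, 1) and (1, nv) views avoids materialising the
    np.tile copies, so only the final (n, nv) matrix is allocated.
    """
    x, y = np.ravel(x), np.ravel(y)
    xc, yc = np.ravel(xc), np.ravel(yc)
    # (n, 1) against (1, nv) broadcasts to the (n, nv) result.
    return (x[:, None] - xc[None, :]) ** 2 + (y[:, None] - yc[None, :]) ** 2
```

With float inputs this gives the same values as the tiled version, without the temporary tile copies ever existing.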
iuryt (Member) commented Aug 17, 2021

Hi @dksasaki,

Thanks for raising this issue.

Do you mean there is a memory leak after running the function, or that cleaning these variables before running the rest of the interpolation reduces the peak memory usage?

We could check how other packages usually deal with this problem and see whether del is the best solution.

@dantecn and I were also thinking about adding an option to break the grid points into blocks, trading some performance for lower memory usage.

We could also simply add an example of this to the documentation.

dksasaki (Author) commented
Hi @iuryt,

There is no memory leak. When the method runs, these extra matrices can contribute significantly to the memory usage, raising the peak even further. The del call was just a quick fix I added, but given the simplicity of this solution I wonder what problems could arise from it.

Breaking the grid into chunks is a good idea, although the whole process gets slower due to the multiple matrix inversions. Let me know if you plan to implement it; I have written a few lines that may help.

iuryt (Member) commented Aug 17, 2021

If you want to implement breaking into blocks, go ahead. You could add an argument like nblocks=None to scaloa and vectoa.
Despite losing some performance, I believe this is a nice way to avoid memory overload. You could also add a verbose=False argument that activates a progress bar during the block-by-block interpolation.
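To illustrate, the block-splitting idea could be sketched roughly like this (names are hypothetical: `map_in_blocks` is not part of scaloa, and `interp_block` stands in for the actual OA step applied to one chunk of grid points):

```python
import numpy as np

def map_in_blocks(grid_points, interp_block, nblocks=None):
    """Split the grid points into nblocks chunks and interpolate each
    chunk separately, so the cross-correlation matrix C only ever has
    shape (len(chunk), n_obs) instead of (n_grid, n_obs)."""
    if nblocks is None:
        # Fall back to the current single-pass behaviour.
        return interp_block(grid_points)
    chunks = np.array_split(grid_points, nblocks)
    return np.concatenate([interp_block(chunk) for chunk in chunks])
```

Since the observation-observation matrix A depends only on the data, it could in principle be factorised once and reused for every block.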

Could you check how other packages, such as xarray, deal with freeing memory?
I believe @Ryukamusa may be the best person in the group to check that as well.

Once you have made some of the modifications in your forked repo, you can open a pull request and link it to this issue.

Please let me know if you have any questions; we just started the group and are still learning how to manage the development process here.

iuryt (Member) commented Jul 18, 2023

It turns out that I came back here for some reason. I think we could make this package better by making it work with xarray; that would make it easy to parallelize, or to run it lazily with dask when needed.

dksasaki (Author) commented
Sorry for not replying, I also forgot about this issue. I developed a way to make this step faster without using as much memory: for each grid point, I only consider observations within a certain distance range. I'm not sure how to use dask and xarray with it, though, but we can give it a try.
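For reference, the distance-cutoff selection can be sketched in plain NumPy (`local_obs` is a hypothetical helper; for many grid points a KD-tree such as scipy.spatial.cKDTree would scale better):

```python
import numpy as np

def local_obs(xg, yg, xo, yo, radius):
    """Indices of the observations (xo, yo) lying within radius of the
    grid point (xg, yg); only these feed the local A and C matrices."""
    d2 = (np.asarray(xo) - xg) ** 2 + (np.asarray(yo) - yg) ** 2
    return np.flatnonzero(d2 <= radius ** 2)
```

Because each grid point then only sees a handful of nearby observations, the local correlation matrices stay small regardless of the total data size.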
