Conversation

@pwolfram (Contributor) commented Mar 29, 2017

This specifies a maximum chunk size for each multifile dataset that is opened to avoid dask out of memory errors.
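The mechanism can be sketched as follows (a minimal illustration, not code from this PR; names like `limit_chunks` and the dimension sizes are hypothetical): build a dask chunk specification in which no dimension's chunk length exceeds the configured maximum, so no single chunk can exhaust memory.

```python
def limit_chunks(dim_sizes, max_chunk_size):
    """Build a dask chunk spec capping each dimension at max_chunk_size."""
    return {dim: min(size, max_chunk_size) for dim, size in dim_sizes.items()}

# Dimensions of a hypothetical MPAS-Ocean multifile dataset:
dims = {'Time': 12, 'nVertLevels': 100, 'nCells': 235160}
print(limit_chunks(dims, 2000))
# {'Time': 12, 'nVertLevels': 100, 'nCells': 2000}
```

A mapping like this could then be passed as the `chunks` argument when the multifile dataset is opened (e.g. to `xarray.open_mfdataset`), so dask operates on bounded pieces rather than whole variables.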

@pwolfram (Contributor, Author)

@xylar and @milenaveneziani, I went back and dug into the code. I need to do some more work on this, but this is the correct approach.

@pwolfram (Contributor, Author)

This gets us a short-term fix before #164 can be more fully resolved, performance errors and all. It should explicitly resolve the error in #161.

@pwolfram pwolfram force-pushed the fix_moc_memory_error_simple branch 2 times, most recently from e390db0 to d643154 Compare March 29, 2017 23:18
@pwolfram (Contributor, Author)

@xylar and @milenaveneziani, this fix should solve #161 for the target case. Please let me know if this works for you. It is working for me, and it is fairly fast too; I don't think you'll be as disappointed as in #164.

Note that we still need some type of fix in #164 to prevent loading of computed results into memory, which could produce a memory error at large enough scale. But if we stay with xarray datasets, they can be larger than memory and still be written to disk, which is a huge benefit.
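A sketch of why staying lazy matters (my own illustration, not code from this PR): dask evaluates a chunked reduction one chunk at a time, so the full array is never materialized in memory, and the same machinery lets a larger-than-memory dataset be streamed to disk chunk by chunk.

```python
import dask.array as da

# 10 chunks of ~16 MB each; the full ~160 MB array is never allocated at once
x = da.ones((20_000, 1_000), chunks=(2_000, 1_000))

total = x.sum()           # lazy: this only builds a task graph
print(total.compute())    # evaluation touches roughly one chunk at a time
```

The same principle applies to `Dataset.to_netcdf` on a dask-backed xarray dataset: chunks are computed and written incrementally rather than gathered into memory first.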

@milenaveneziani (Collaborator)

@pwolfram: can you please explain a bit more what this does? Is it going to solve the issue of loading the big 3-D data of the kind we need to load for the MOC calculation?
Also, I wasn't disappointed by #164: I thought it was great that you were able to use xarray for all the computations. We just need to solve all the problems involved in computing the MOC (which, by the way, has always been a heavy computation for other models as well; a good reason for accelerating the testing and turn-on of the MOC AM).

@xylar (Collaborator) commented Mar 30, 2017

@pwolfram, this is great! I'm still testing the EC60to30 on my laptop now with maxChunkSize = 2000, but it seems like the QU240 case ran great and the memory consumption for both seems totally reasonable.

I'll run a longer test on Edison and then I'm happy to merge.

@xylar (Collaborator) commented Mar 30, 2017

@pwolfram, I'm going to remove the mpas_xarray flag because I think we only want to use it when we explicitly modify mpas_xarray and not the generalized reader. If you disagree, say so, and we can discuss.

@xylar xylar removed the mpas_xarray label Mar 30, 2017
@xylar (Collaborator) commented Mar 30, 2017

@pwolfram, I'm currently trying to test this on Edison but am running into trouble with batch jobs (unrelated to this PR). Specifically, when I try to access ncremap, it's using the default python environment from when I launch a new shell, not the one I specifically loaded with the following module commands in the job script:

module unload python
module unload python_base
module unload cray-netcdf-hdf5parallel
module unload cray-hdf5-parallel
module load cray-hdf5
module load cray-netcdf
module use /global/project/projectdirs/acme/software/modulefiles/all
module load python/anaconda-2.7-climate

Normally, my default python environment is one set up for POPSICLES, not MPAS, so NCO is not available there.

I tried on an interactive job and everything worked fine, as expected, so I don't know what is going wrong with my batch jobs. I'm trying again with ... python run_analysis.py ... instead of just ... ./run_analysis.py ... in case that makes a difference.

@pwolfram (Contributor, Author)

@milenaveneziani,

@pwolfram: can you please explain a bit more what this does? Is it going to solve the issue of loading big 3d data of the kind we need to load in for the MOC calculation?

This essentially ensures that calculations done with xarray and dask don't overflow memory. It turns out that, for now, this is the user's responsibility (pydata/xarray#1338). This PR solves the immediate issue but is incomplete.
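A back-of-envelope way to see why the chunk size is the quantity the user must bound (my own illustration; the dimension sizes are hypothetical): the peak memory of a chunked dask computation scales with the per-chunk footprint, which is just the product of the chunk shape times the element size.

```python
from math import prod

def chunk_nbytes(chunk_shape, itemsize=8):
    """Approximate bytes held by one chunk (float64 by default)."""
    return prod(chunk_shape) * itemsize

# One (Time=1, nVertLevels=100, nCells=235160) float64 chunk:
print(chunk_nbytes((1, 100, 235160)))  # 188128000 bytes, about 0.19 GB
```

Without a cap, a dimension like `nCells` can end up in a single chunk, and several such chunks held simultaneously during a computation is enough to trigger the out-of-memory errors seen in #161.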

Also, I wasn't disappointed by #164: I thought it was great that you were able to use xarray for all the computations. We just need to solve all the problems involved in computing the MOC (which, by the way, has always been a heavy computation for other models as well; a good reason for accelerating the testing and turn-on of the MOC AM).

We also need #164 for even larger datasets, but it is less time sensitive than this PR.

# select only the data in the specified range of dates
ds = ds.sel(Time=slice(startDate, endDate))

# limit chunk size to prevent memory error
Contributor Author:

@xylar, I'm thinking this should be moved, as a function with a maxChunkSize argument, into mpas_xarray's preprocess, because this is a problem that needs to be handled for general use of xarray. That seems better organized.
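Such a preprocess step might look roughly like this (a sketch only; the function and argument names are hypothetical, not necessarily what landed in mpas_xarray):

```python
import numpy as np
import xarray as xr

def limit_chunk_sizes(ds, maxChunkSize):
    """Rechunk ds so no dimension's chunk length exceeds maxChunkSize."""
    chunks = {dim: min(size, maxChunkSize)
              for dim, size in ds.sizes.items()}
    return ds.chunk(chunks)

ds = xr.Dataset({'temperature': (('Time', 'nCells'),
                                 np.zeros((3, 2500)))})
ds = limit_chunk_sizes(ds, maxChunkSize=1000)
print(ds.chunks['nCells'])  # (1000, 1000, 500)
```

Applying this once inside preprocess means every dataset opened through the reader gets bounded chunks, instead of each analysis task re-implementing the cap.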

Collaborator:

I'm good with that change. In that case, add the mpas_xarray tag back.

@pwolfram pwolfram force-pushed the fix_moc_memory_error_simple branch 2 times, most recently from 3bc1713 to affae3f Compare March 30, 2017 14:36
@pwolfram pwolfram force-pushed the fix_moc_memory_error_simple branch from affae3f to 3b08774 Compare March 30, 2017 14:38
@pwolfram (Contributor, Author)

@xylar, I'm satisfied with this PR; if you agree that it fixes #161, please feel free to merge at your convenience.

@xylar (Collaborator) commented Mar 30, 2017

@pwolfram, I still haven't been able to run any of the analysis that uses ncremap on the compute nodes, which I would have preferred to be able to do. But I'm running MOC on its own right now. As soon as it's done and I have a chance to make sure the results look reasonable, I'll merge.

@xylar xylar merged commit 3b08774 into MPAS-Dev:develop Mar 30, 2017
@xylar (Collaborator) commented Mar 30, 2017

Okay, everything seems to be behaving well both on Edison and on my laptop. Nice work!

@pwolfram pwolfram deleted the fix_moc_memory_error_simple branch March 30, 2017 15:58
@xylar xylar mentioned this pull request Mar 30, 2017
@milenaveneziani (Collaborator)

@pwolfram, @xylar: thanks!
