Conversation

@pwolfram (Contributor) commented Mar 29, 2017

This specifies a maximum chunk size for each multifile dataset that is opened to avoid dask out of memory errors.
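The mechanism can be sketched as follows (a minimal illustration, not code from this PR; names like `limit_chunks` and the dimension sizes are hypothetical): build a dask chunk specification in which no dimension's chunk length exceeds the configured maximum, so no single chunk can exhaust memory.

```python
def limit_chunks(dim_sizes, max_chunk_size):
    """Build a dask chunk spec capping each dimension at max_chunk_size."""
    return {dim: min(size, max_chunk_size) for dim, size in dim_sizes.items()}

# Dimensions of a hypothetical MPAS-Ocean multifile dataset:
dims = {'Time': 12, 'nVertLevels': 100, 'nCells': 235160}
print(limit_chunks(dims, 2000))
# {'Time': 12, 'nVertLevels': 100, 'nCells': 2000}
```

A mapping like this could then be passed as the `chunks` argument when the multifile dataset is opened (e.g. to `xarray.open_mfdataset`), so dask operates on bounded pieces rather than whole variables.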

@pwolfram (Contributor, Author)

@xylar and @milenaveneziani, I went back and dug into the code. I need to do some more work on this, but this is the correct approach.

@pwolfram (Contributor, Author)

This gets us a short-term fix before #164 can be more fully resolved, performance errors and all. It should explicitly resolve the error in #161.

@pwolfram pwolfram force-pushed the fix_moc_memory_error_simple branch 2 times, most recently from e390db0 to d643154 Compare March 29, 2017 23:18
@pwolfram (Contributor, Author)

@xylar and @milenaveneziani, this fix should solve #161 for the target case. Please let me know if this works for you. It is working for me, and it is fairly fast too; I don't think you'll be as disappointed as in #164.

Note that we still need some type of fix in #164 to prevent loading of computed results into memory, which could produce a memory error at large enough scale. But if we stay with xarray datasets, they can be larger than memory and still be written to disk, which is a huge benefit.
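A sketch of why staying lazy matters (my own illustration, not code from this PR): dask evaluates a chunked reduction one chunk at a time, so the full array is never materialized in memory, and the same machinery lets a larger-than-memory dataset be streamed to disk chunk by chunk.

```python
import dask.array as da

# 10 chunks of ~16 MB each; the full ~160 MB array is never allocated at once
x = da.ones((20_000, 1_000), chunks=(2_000, 1_000))

total = x.sum()           # lazy: this only builds a task graph
print(total.compute())    # evaluation touches roughly one chunk at a time
```

The same principle applies to `Dataset.to_netcdf` on a dask-backed xarray dataset: chunks are computed and written incrementally rather than gathered into memory first.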

@milenaveneziani (Collaborator)

@pwolfram: can you please explain a bit more what this does? Is it going to solve the issue of loading the big 3-D data of the kind we need to load for the MOC calculation?
Also, I wasn't disappointed by #164: I thought it was great that you were able to use xarray for all the computations. We just need to solve all the problems involved in computing the MOC (which, by the way, has always been a heavy computation for other models as well; a good reason for accelerating the testing and turn-on of the MOC AM).

@xylar (Collaborator) commented Mar 30, 2017

@pwolfram, this is great! I'm still testing the EC60to30 on my laptop now with maxChunkSize = 2000, but it seems like the QU240 case ran great and the memory consumption for both seems totally reasonable.

I'll run a longer test on Edison and then I'm happy to merge.

@xylar (Collaborator) commented Mar 30, 2017

@pwolfram, I'm going to remove the mpas_xarray flag because I think we only want to use it when we explicitly modify mpas_xarray and not the generalized reader. If you disagree, say so, and we can discuss.

@xylar xylar removed the mpas_xarray label Mar 30, 2017
@xylar (Collaborator) commented Mar 30, 2017

@pwolfram, I'm currently trying to test this on Edison but am running into trouble with batch jobs (unrelated to this PR). Specifically, when I try to access ncremap, it's using the default python environment from when I launch a new shell, not the one I specifically loaded with the following module commands in the job script:

module unload python
module unload python_base
module unload cray-netcdf-hdf5parallel
module unload cray-hdf5-parallel
module load cray-hdf5
module load cray-netcdf
module use /global/project/projectdirs/acme/software/modulefiles/all
module load python/anaconda-2.7-climate

Normally, my default python environment is one set up for POPSICLES, not MPAS, so NCO is not available there.

I tried on an interactive job and everything worked fine, as expected, so I don't know what is going wrong with my batch jobs. I'm trying again with ... python run_analysis.py ... instead of just ... ./run_analysis.py ... in case that makes a difference.

@pwolfram (Contributor, Author)

@milenaveneziani,

@pwolfram: can you please explain a bit more what this does? Is it going to solve the issue of loading big 3d data of the kind we need to load in for the MOC calculation?

This essentially ensures that calculations done with xarray and dask don't overflow memory. It turns out that, for now, this is the user's responsibility (pydata/xarray#1338). This PR solves the immediate issue but is incomplete.
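A back-of-envelope way to see why the chunk size is the quantity the user must bound (my own illustration; the dimension sizes are hypothetical): the peak memory of a chunked dask computation scales with the per-chunk footprint, which is just the product of the chunk shape times the element size.

```python
from math import prod

def chunk_nbytes(chunk_shape, itemsize=8):
    """Approximate bytes held by one chunk (float64 by default)."""
    return prod(chunk_shape) * itemsize

# One (Time=1, nVertLevels=100, nCells=235160) float64 chunk:
print(chunk_nbytes((1, 100, 235160)))  # 188128000 bytes, about 0.19 GB
```

Without a cap, a dimension like `nCells` can end up in a single chunk, and several such chunks held simultaneously during a computation is enough to trigger the out-of-memory errors seen in #161.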

Also, I wasn't disappointed by #164: I thought it was great that you were able to use xarray for all the computations. We just need to solve all the problems involved in computing the MOC (which, by the way, has always been a heavy computation for other models as well; a good reason for accelerating the testing and turn-on of the MOC AM).

We also need #164 for even larger datasets, but it is less time sensitive than this PR.

# select only the data in the specified range of dates
ds = ds.sel(Time=slice(startDate, endDate))

# limit chunk size to prevent memory error
Contributor Author:

@xylar, I'm thinking this should be moved, as a function with a maxChunkSize argument, into mpas_xarray's preprocess, because this is a problem that needs to be handled for general use of xarray. That seems better organized.
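Such a preprocess step might look roughly like this (a sketch only; the function and argument names are hypothetical, not necessarily what landed in mpas_xarray):

```python
import numpy as np
import xarray as xr

def limit_chunk_sizes(ds, maxChunkSize):
    """Rechunk ds so no dimension's chunk length exceeds maxChunkSize."""
    chunks = {dim: min(size, maxChunkSize)
              for dim, size in ds.sizes.items()}
    return ds.chunk(chunks)

ds = xr.Dataset({'temperature': (('Time', 'nCells'),
                                 np.zeros((3, 2500)))})
ds = limit_chunk_sizes(ds, maxChunkSize=1000)
print(ds.chunks['nCells'])  # (1000, 1000, 500)
```

Applying this once inside preprocess means every dataset opened through the reader gets bounded chunks, instead of each analysis task re-implementing the cap.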

Collaborator:

I'm good with that change. In that case, add the mpas_xarray tag back.

@pwolfram pwolfram force-pushed the fix_moc_memory_error_simple branch 2 times, most recently from 3bc1713 to affae3f Compare March 30, 2017 14:36
@pwolfram pwolfram force-pushed the fix_moc_memory_error_simple branch from affae3f to 3b08774 Compare March 30, 2017 14:38
@pwolfram (Contributor, Author)

@xylar, I'm satisfied with this PR; if you agree that it fixes #161, please feel free to merge at your convenience.

@xylar (Collaborator) commented Mar 30, 2017

@pwolfram, I still haven't been able to run any of the analysis that uses ncremap on the compute nodes, which I would have preferred to be able to do. But I'm running MOC on its own right now. As soon as it's done and I have a chance to make sure the results look reasonable, I'll merge.

@xylar xylar merged commit 3b08774 into MPAS-Dev:develop Mar 30, 2017
@xylar (Collaborator) commented Mar 30, 2017

Okay, everything seems to be behaving well both on Edison and on my laptop. Nice work!

@pwolfram pwolfram deleted the fix_moc_memory_error_simple branch March 30, 2017 15:58
@xylar xylar mentioned this pull request Mar 30, 2017
@milenaveneziani (Collaborator)

@pwolfram, @xylar: thanks!
