Conversation

@pwolfram
Contributor

This removes a memory error resulting from the conversion of an xarray dataset to numpy,
e.g., when attempting to load a dataset larger than on-node memory. Using xarray with
dask avoids this problem via memory chunking.
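A minimal NumPy sketch of the chunked-reduction idea being described (the PR itself gets this for free from dask-backed xarray; the helper below is illustrative only):

```python
import numpy as np

def chunked_sum(values, chunk_size):
    # Accumulate the sum one chunk at a time, so only chunk_size elements
    # need to be resident in memory at once (dask does this automatically
    # for chunked xarray datasets).
    total = 0.0
    for start in range(0, len(values), chunk_size):
        total += values[start:start + chunk_size].sum()
    return total

data = np.arange(1000000, dtype=np.float64)
assert chunked_sum(data, chunk_size=4096) == data.sum()
```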

@pwolfram
Contributor Author

This PR is to resolve #161 and is pending further testing.

@xylar
Collaborator

xylar commented Mar 28, 2017

@pwolfram, I'm happy to test this on all the runs that support the MOC (i.e. where we have appropriate mask files). Is it at the point where I should do that or are you still verifying that the end result isn't changed from before?

@pwolfram
Contributor Author

Thanks @xylar, I have a long test running and am working on getting a fast test completed to do the verification.

Can you please take a quick look at the code to make sure you are ok with the changes so far? One thing I did not do here was remove the reading and writing of the cached files via netCDF4. That may be something we want to do; however, it doesn't seem high priority, so I've skipped it for now.

@xylar
Collaborator

xylar commented Mar 28, 2017

@pwolfram, we definitely want to keep the reading and writing of the cache files. As you see, the computation takes a long time so we definitely only want to do it once per result, even if the analysis gets run multiple times as the job progresses.

If you want to convert the code to use xarray to do that I/O, that's fine. But please don't get rid of it entirely. And please do make sure that it can successfully pick up where it left off by, say, requesting a time series of 2 years and then one of 3 years.

I'll take a look at the code and see whether I have any specific comments.

@pwolfram
Contributor Author

Sorry-- I meant switch out netCDF4 for xarray, not remove the computation. I didn't do that refactor.

transportZ = (refLayerThickness*horizontalVel.where(transect)
              *transectEdgeMaskSigns.where(transect)
              *dvEdge.where(transect)
              ).sum('nEdges')
Collaborator

@pwolfram, very nice! I had seen that this could be vectorized and I appreciate you doing it.
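For readers skimming the thread, here is a toy NumPy version of the mask-multiply-sum pattern in the snippet above (all values are made up; `np.where(mask, x, 0)` stands in for xarray's `x.where(mask)` followed by a NaN-skipping sum):

```python
import numpy as np

# Four edges, three of which lie on the transect of interest.
transect = np.array([True, False, True, True])
refLayerThickness = 2.0
horizontalVel = np.array([1.0, 5.0, -2.0, 0.5])
transectEdgeMaskSigns = np.array([1.0, 1.0, -1.0, 1.0])
dvEdge = np.array([10.0, 10.0, 10.0, 10.0])

# Vectorized transport: mask off-transect edges, multiply, and reduce.
transportZ = np.where(transect,
                      refLayerThickness * horizontalVel
                      * transectEdgeMaskSigns * dvEdge,
                      0.0).sum()
assert transportZ == 70.0   # 20 + 40 + 10 from the three transect edges
```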

Contributor Author

Thanks @xylar!

Collaborator

very nice @pwolfram! I spent some time trying to get this right without the loop, alas unsuccessfully. Thanks for taking care of it.

@xylar
Collaborator

xylar commented Mar 28, 2017

I haven't taken a look at the code in a great deal of detail but what I see looks great!

mocTop = xr.DataArray(np.zeros([len(nz), len(latBins)]),
                      coords=[nz, latBins])
mocTop[1:, 0] = transportZ.cumsum(axis=0)
for iLat in np.arange(1, len(latBins)):
Contributor Author

We should be able to remove this loop once pydata/xarray#924 is realized.

Contributor Author
@pwolfram pwolfram Mar 28, 2017

        cellbins = np.logical_and(latCell >= latBins[iLat-1],
                                  latCell < latBins[iLat])
        mocTop[:, iLat] = mocTop[:, iLat-1] + velArea.where(cellbins).sum('nCells')

is just clunky and probably slow.

Collaborator

Yeah, I saw your comment. But I don't think that loop is a problem. There's a lot of computation going on in there so I can't imagine vectorizing it will make a huge difference.

Collaborator

I doubt the line you pointed out is that slow. I'm sure it's negligible in comparison to the sum below.

Contributor Author
@pwolfram pwolfram Mar 28, 2017

Things aren't always intuitive when it comes to dask because some optimizations can be performed, e.g., groupby_bins was faster than a by-hand equivalent. But you are probably right, unless there is some fancy graph-based optimization that could be used in the generalized, multi-index groupby_bins approach.

Collaborator

Just be careful about not over-optimizing code that isn't actually a bottleneck...

Contributor Author

Agreed

Collaborator

@pwolfram, in order to use groupby_bins here, couldn't you just mask out the full dataset much earlier on using regionCellMask (probably by using isel), and then use groupby_bins using latitude bins? That way, you don't need 2 arguments to groupby.

Contributor Author

The problem here is that mocTop needs to end up as a 2D array. groupby_bins will collapse it down to a single binned dimension. In the example at http://xarray.pydata.org/en/stable/groupby.html, lat gets reduced, and we need it to stay unreduced.
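To illustrate the point, here is a toy NumPy analogue with made-up sizes (np.digitize plays the role of the binning inside groupby_bins): the reduction over nCells is what we want, but the latitude-bin axis must survive as a second dimension, which the explicit iLat loop preserves.

```python
import numpy as np

nz, nCells = 3, 6
latCell = np.array([-60., -30., -10., 10., 30., 60.])
latBins = np.array([-90., 0., 90.])        # edges of two latitude bins
velArea = np.ones((nz, nCells))

# Reduce cells into latitude bins (the groupby_bins-like step)...
binIndex = np.digitize(latCell, latBins) - 1
binned = np.stack([velArea[:, binIndex == b].sum(axis=1)
                   for b in range(len(latBins) - 1)], axis=1)

# ...then accumulate across bins, keeping the result 2D (nz x nBins),
# which is what the iLat loop in the PR builds one column at a time.
mocTop = binned.cumsum(axis=1)
assert mocTop.shape == (nz, 2)
assert (mocTop[:, -1] == velArea.sum(axis=1)).all()
```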

@xylar
Collaborator

xylar commented Mar 28, 2017

@pwolfram, I couldn't resist testing. On the QU240 test (on my laptop), the climatology part worked fine but the time-series part produced this error:

  Compute and/or plot post-processed Atlantic MOC time series...
   Load data...
   Process and save MOC time series
     date: 0001-01
Traceback (most recent call last):
  File "./run_analysis.py", line 243, in <module>
    analysis(config)
  File "./run_analysis.py", line 182, in analysis
    moc_streamfunction(config)
  File "/home/xylar/code/mpas-work/analysis/fix_moc_memory_error/mpas_analysis/ocean/meridional_overturning_circulation.py", line 137, in moc_streamfunction
    regionNames, dictTseries, mocDictClimo, dictRegion)
  File "/home/xylar/code/mpas-work/analysis/fix_moc_memory_error/mpas_analysis/ocean/meridional_overturning_circulation.py", line 458, in _compute_moc_time_series_postprocess
    mocAtlantic26 = np.amax(mocTop[:, indlat26])
  File "/home/xylar/anaconda2/envs/default/lib/python2.7/site-packages/xarray/core/dataarray.py", line 469, in __getitem__
    return self.isel(**self._item_key_to_dict(key))
  File "/home/xylar/anaconda2/envs/default/lib/python2.7/site-packages/xarray/core/dataarray.py", line 657, in isel
    ds = self._to_temp_dataset().isel(drop=drop, **indexers)
  File "/home/xylar/anaconda2/envs/default/lib/python2.7/site-packages/xarray/core/dataset.py", line 1119, in isel
    new_var = var.isel(**var_indexers)
  File "/home/xylar/anaconda2/envs/default/lib/python2.7/site-packages/xarray/core/variable.py", line 547, in isel
    return self[tuple(key)]
  File "/home/xylar/anaconda2/envs/default/lib/python2.7/site-packages/xarray/core/variable.py", line 377, in __getitem__
    values = self._indexable_data[key]
  File "/home/xylar/anaconda2/envs/default/lib/python2.7/site-packages/xarray/core/indexing.py", line 467, in __getitem__
    key = self._convert_key(key)
  File "/home/xylar/anaconda2/envs/default/lib/python2.7/site-packages/xarray/core/indexing.py", line 454, in _convert_key
    key = orthogonal_indexer(key, self.shape)
  File "/home/xylar/anaconda2/envs/default/lib/python2.7/site-packages/xarray/core/indexing.py", line 79, in orthogonal_indexer
    key = list(canonicalize_indexer(key, len(shape)))
  File "/home/xylar/anaconda2/envs/default/lib/python2.7/site-packages/xarray/core/indexing.py", line 67, in canonicalize_indexer
    return tuple(canonicalize(k) for k in expanded_indexer(key, ndim))
  File "/home/xylar/anaconda2/envs/default/lib/python2.7/site-packages/xarray/core/indexing.py", line 67, in <genexpr>
    return tuple(canonicalize(k) for k in expanded_indexer(key, ndim))
  File "/home/xylar/anaconda2/envs/default/lib/python2.7/site-packages/xarray/core/indexing.py", line 56, in canonicalize
    raise ValueError('orthogonal array indexing only supports '
ValueError: orthogonal array indexing only supports 1d arrays

@pwolfram
Contributor Author

@xylar, is the QU240 test on IC? It looks like I flipped an index.

@xylar
Collaborator

xylar commented Mar 28, 2017

@pwolfram, yep, just look in the configs/lanl folder.

@pwolfram
Contributor Author

If the QU240 isn't on IC would it be possible to get more information so that I can reproduce the error?

@pwolfram
Contributor Author

Thanks @xylar!

@xylar
Collaborator

xylar commented Mar 28, 2017

Sorry, let me amend that to say, look in configs/lanl in #163. I've fixed the path to the mapping and mask files in that PR.

@pwolfram
Contributor Author

Gotcha, just reproduced the error.

@pwolfram pwolfram force-pushed the fix_moc_memory_error branch from 3586d8a to d64f127 Compare March 28, 2017 21:07
@xylar
Collaborator

xylar commented Mar 28, 2017

So my tests suggest that the climatology calculation is, indeed, using reasonable amounts of memory. It's super slow so far on the EC60to30 grid but maybe you can't get the one without the other?

@xylar
Collaborator

xylar commented Mar 28, 2017

Rerunning the QU240 case with your latest push.

@pwolfram
Contributor Author

pwolfram commented Mar 28, 2017

It's super slow so far on the EC60to30 grid but maybe you can't get the one without the other?

I'm not sure I understand what you mean-- are you saying that now that the memory is under control the runtime is slow?

@xylar
Collaborator

xylar commented Mar 28, 2017

I'm not sure I understand what you mean-- are you saying that now that the memory is under control the runtime is slow?

Yep, seems that way.

@xylar
Collaborator

xylar commented Mar 28, 2017

QU240 is still crashing. Let me know if you're not seeing that on your side, and I'll post details.

@pwolfram
Contributor Author

I've got it too-- debugging now

@pwolfram pwolfram force-pushed the fix_moc_memory_error branch from d64f127 to 1e4f925 Compare March 28, 2017 21:40
@pwolfram
Contributor Author

@xylar, the code completed and the results are visually identical. They aren't strictly bit-for-bit (BFB), but are practically:

└─▪ ~/src/scrIpT/test_zeros_diff_netcdf.py output_trusted/timeseries/mocTimeSeries.nc output_candidate/timeseries/mocTimeSeries.nc 
Determining if fields are the same for output_trusted/timeseries/mocTimeSeries.nc and output_candidate/timeseries/mocTimeSeries.nc:
 
testing Time -- Time is bit identical.
testing mocAtlantic26 -- ERROR mocAtlantic26 field is not bit-zero for min!
min   -0.343137635398 -0.343137635398
max   7.77140371566 7.77140371566
mean  2.55768124242 2.55768124242
min  diff  -1.7763568394e-15
max  diff  9.15933995316e-16
mean diff  -3.58509518369e-17
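For reference, the "practically BFB" criterion above amounts to a tolerance check; a self-contained sketch with made-up values of the same magnitude as the reported diffs:

```python
import numpy as np

trusted = np.array([-0.343137635, 2.557681242, 7.771403716])
candidate = trusted + np.array([2e-16, -9e-16, 1e-15])  # ~machine-epsilon noise

assert not np.array_equal(trusted, candidate)                 # not bit-identical...
assert np.allclose(trusted, candidate, rtol=0.0, atol=1e-13)  # ...but practically BFB
```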

@pwolfram
Contributor Author

I'll also run a quick performance comparison between the two, although to be fair this isn't as representative as a comparison on a full production run.

load = partial(preload, maxgb=config.getfloat('input',
                                              'reasonableArraySizeGB'))
horizontalVel = load(ds.avgNormalVelocity.isel(Time=tIndex))
velArea = load(ds.avgVertVelocityTop.isel(Time=tIndex) * areaCell)
Collaborator

Wouldn't it actually be clearer to read if you just cached the value of reasonableArraySizeGB and had 2 calls to preload without the need to use partial? If we want future developers to use preload, I think we want it to be clear how to use it.

Contributor Author

The reason I did it this way is to parallel the xarray load method. Also, it seems like reasonableArraySizeGB isn't really something we need to change per call to preload. But partial is confusing for first-time users. I also thought about making reasonableArraySizeGB a global but decided against it for obvious reasons.

Do you think this needs to be changed?

Collaborator

It's kind of a style choice but I'd prefer having it be as straightforward as possible for someone who might use this code as a starting point to figure out how to do something similar in another analysis.
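The two styles under discussion, sketched with a stand-in preload (the function body here is hypothetical; only the calling conventions matter):

```python
from functools import partial

def preload(data, maxgb):
    # Stand-in: pretend this eagerly loads `data` if it fits in maxgb GB.
    return list(data)

maxgb = 10.0  # would come from config.getfloat('input', 'reasonableArraySizeGB')

# Style 1 (the PR): bind the config value once with partial, so call sites
# read like xarray's .load().
load = partial(preload, maxgb=maxgb)
a = load(range(3))

# Style 2 (the suggestion): cache the value and pass it explicitly each time.
b = preload(range(3), maxgb=maxgb)

assert a == b == [0, 1, 2]
```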

@xylar
Collaborator

xylar commented Mar 30, 2017

@pwolfram, I noticed an irritating bug where the figure title and caption are using startYearClimo and endYearClimo instead of the potentially updated values in dictClimo['startYearClimo'] and dictClimo['endYearClimo']. I thought I had fixed this in #157 but obviously not. Would you be willing to replace these on lines 166 and 169, or would you like me to submit a separate PR once this is merged?

@xylar
Collaborator

xylar commented Mar 30, 2017

@pwolfram, so it's still quite slow in my testing. Suppose we want to do a 50-year time series, this is going to take hours and hours. What kind of performance were you getting?

@pwolfram
Contributor Author

@vanroekel, I don't think we're at the point where it makes sense to test on such a big problem yet. In fact, I'm not sure post-processing the MOC will ever be possible with such a high-res grid. We would really need a big parallel job to do it, and I don't think we're heading that direction with the analysis.
I'll be happy to make you a region mask for the 18to6 grid once there is another (more computationally feasible) task to use it for, though.

@xylar, can you please get @vanroekel set up with the files needed to test the 18to6? We should be set up from #164 for this to work, and if it doesn't we should know sooner rather than later. Someone will eventually want to run the MOC on this case, and we need to make sure it works before a user hits a bug we missed. I'm very grateful you found #161 before it caused a bigger problem. Thank you!

@pwolfram
Contributor Author

@xylar, is f847c07 ok?

@xylar
Collaborator

xylar commented Mar 30, 2017

@xylar, is f847c07 ok?

Could you make the same change on line 169?

@pwolfram pwolfram force-pushed the fix_moc_memory_error branch 2 times, most recently from 3e9797b to c2ddc7c Compare March 30, 2017 21:31
@pwolfram
Contributor Author

Sorry @xylar, thanks for catching that!

@xylar
Collaborator

xylar commented Mar 30, 2017

@xylar, can you please get @vanroekel set up with the files needed to test the 18to6? We should be set up from #164 for this to work, and if it doesn't we should know sooner rather than later. Someone will eventually want to run the MOC on this case, and we need to make sure it works before a user hits a bug we missed. I'm very grateful you found #161 before it caused a bigger problem. Thank you!

I'm working on it now. I'm skeptical that the performance will be good enough to be practical. The RRS18to6 mesh has about 24 times as many edges and cells as the EC60to30, so each computation will take at least that much longer. With each month of the MOC currently taking several minutes, I don't think the current performance is even practical for the EC60to30 runs, let alone for the bigger ones.

@xylar
Collaborator

xylar commented Mar 30, 2017

@vanroekel, if you look on LANL turquoise for the file

/usr/projects/climate/SHARED_CLIMATE/mpas_analysis/region_masks/RRS18to6_region_masks_and_southern_transect.nc

this is the RRS18to6 version of the regional mask file for the Atlantic. This should hopefully be all you need to run the streamfunctionMOC analysis. Let me know if you have trouble.

@vanroekel
Collaborator

@xylar thanks! I'm trying now. I'm also testing this branch on the core2 G-case (6030 out 110 years now).

@xylar
Collaborator

xylar commented Mar 30, 2017

So to give you guys a sense of the performance I'm seeing on Edison:

  • To compute a 10-year MOC climatology on the EC60to30 grid took about 1 hour
  • 4.5 years of the Atlantic MOC time series has taken 2 hours. (That's a lot of computing time for essentially a single number every 2.5 minutes.)

Multiply that by 24 and I would expect it to take at least a full day to compute a 10-year MOC climatology on the 18to6 and about an hour to compute a single Atlantic MOC time point on the RRS18to6 grid. Like I said, I don't really think this postprocessing of the MOC is going to be practical for the big grids.

@vanroekel
Collaborator

Update on my testing. I've been running the 18to6 for almost 9 hours and it has not completed. This is on 1 year of data. Also, I'm running the 100 years of CORE2 data on blues; it has also not completed (nearly 9 hours as well). I see this output thus far:

Plotting streamfunction of Meridional Overturning Circulation (MOC)...

  List of files for climatologies:
/lcrc/group/acme/jwolfe/acme_scratch/20170320.beta0.GMPAS-IAF.T62_oECv3.anvil/run/./mpaso.hist.am.timeSeriesStatsMonthly.0001-12-01.nc through /lcrc/group/acme/jwolfe/acme_scratch/20170320.beta0.GMPAS-IAF.T62_oECv3.anvil/run/./mpaso.hist.am.timeSeriesStatsMonthly.0096-01-01.nc

  List of files for timeSeries:
/lcrc/group/acme/jwolfe/acme_scratch/20170320.beta0.GMPAS-IAF.T62_oECv3.anvil/run/./mpaso.hist.am.timeSeriesStatsMonthly.0001-12-01.nc through /lcrc/group/acme/jwolfe/acme_scratch/20170320.beta0.GMPAS-IAF.T62_oECv3.anvil/run/./mpaso.hist.am.timeSeriesStatsMonthly.0096-01-01.nc

  Reading region and transect mask for Atlantic...

  Compute and/or plot post-processed MOC climatological streamfunction...
   Load data...
/lcrc/group/acme/lvanroe/conda/lib/python2.7/site-packages/dask/array/rechunk.py:347: RuntimeWarning: invalid value encountered in double_scalars
  key=lambda k: np.log(graph_size_effect[k]) / np.log(block_size_effect[k]),

Is there a way to check if this is working? Am I missing a necessary setting? My config file is

/home/lvanroe/MPAS-Analsysis/configLuke

I was able to run 70 years through the MOC in 2.5 hours before without issue, so I feel like I'm not setting this up right.

@xylar
Collaborator

xylar commented Mar 31, 2017

@vanroekel, you can take a look at the cache files in clim/mpas/moc*.nc and timeSeries/moc*.nc. The time series file gets updated once per simulation year, while the climatology cache file gets written only once.

As I have commented above, this move to purely xarray seems to have slowed things down a lot (maybe by a factor of 10) from what is now in develop. I think this is too much of a performance hit to be practical so I think we should hold off on merging this PR and try to make some more optimizations.

I don't think I have access to the machine where you're running to look at your scripts. But (other than the weird "invalid value encountered" warning) everything sounds like it's behaving as expected and you didn't configure anything wrong.

@pwolfram
Contributor Author

@xylar, I agree. @vanroekel, can you confirm whether develop crashes on the 18to6 if that is easy to try? The overarching goal here is to make sure the analysis is robust on the production runs and that should give us a good indication.

@pwolfram pwolfram force-pushed the fix_moc_memory_error branch from 0f85336 to 43f3712 Compare April 2, 2017 23:04
@pwolfram
Contributor Author

pwolfram commented Apr 3, 2017

Just talked with @vanroekel-- even develop doesn't finish for the 18to6 (it didn't finish loading the data). The issue is that loading the data is too slow and is the bottleneck. The 60to30 works with the current version of develop.

Do we want/need to address this issue now, @xylar? We could do so with xarray + dask.distributed, using multiple nodes to load data in parallel. But that is a level of complexity it sounds like we currently don't want or need to support, especially since the MOC should be computed online. In my estimation, we aren't at that point right now.

I'm going to close this PR now because it sounds like it is out of scope for our needs, and we solved issue #161 via #164, which supersedes this PR. Please reopen if you disagree.

@pwolfram pwolfram closed this Apr 3, 2017
@xylar
Collaborator

xylar commented Apr 3, 2017

Agreed. I hope this work can be used later on, but for now I do think we need to explore other approaches. I don't think we necessarily want to support offline computation of the MOC for grids bigger than EC60to30. I would think that's what analysis mode in MPAS itself could be used for.

