Fields have incorrect sizes causing array out of bounds #44

douglasjacobsen · 2014-09-25T19:06:56Z

I only ran into this issue when using KPP, but the problem ends up being a combination of lines:
451 and 600-602 in https://github.com/CVMix/CVMix-src/blob/master/src/shared/cvmix_kpp.F90

Basically, the arrays new_Mdiff (and such) are allocated using nlev, which is the number of active levels in a column, not necessarily the number of total levels in the column. Later a direct copy happens moving old_Mdiff into Mdiff_out, but they have different sizes (one is active, one is total) which causes an array out of bounds with some compilers. Ideally, the arrays would be allocated using the total number of levels whether or not they are used.

mnlevy1981 · 2014-09-25T19:28:00Z

I'm not sure I see the problem... as far as CVMix is concerned, the number of active levels and the total number of levels in the column are the same. That is to say, if CVmix_vars%nlev = 10, then CVmix_vars%Mdiff_iface should be size 11 (and clearly new_Mdiff is declared to be size 11 in line 451).

If you are using pointers into an nlev x ncol matrix, then you want to set up CVmix_vars%Mdiff_iface to be something like

CVmix_vars(col_index)%Mdiff_iface => global_Mdiff(1:CVMix_vars(col_index)%nlev+1,col_index)

douglasjacobsen · 2014-09-25T19:37:44Z

So, in MPAS we allocate the Mdiff arrays once (again to remove a performance from allocating inside an nCell loop). This means that Mdiff is allocated with a size of nVertLevels (total number of cells in a column).

Then we set nlev = maxLevelCell(iCell) where maxLevelCell is the number of active cells in a column (different from the total number). However, when you statically allocate the new_Mdiff arrays, you use nlev instead of whatever the size of the cvmix_vars % Mdiff_iface arrays are, then when you do the copy it gives an out of bounds error.

When using these models, I don't necessarily think it's a safe assumption that the number of active levels are the number of total levels. I thought that was the whole point of passing in nlev, was that each column could have more levels than they have active, but you would only loop over nlev levels. One possible fix would be adding a cvmix_var % maxnlev variable, and then using that for these statically allocated arrays. That would allow models to provide columns with only some active levels in them.

mnlevy1981 · 2014-09-25T19:58:15Z

The fix I'm leaning towards is changing lines 612 - 613 to read

nlev_p1 = size(Mdiff_out)
nlev = nlev_p1 - 1

Because nlev and nlev_p1 should be the number of active levels and interfaces (respectively). If I do that immediately after the variable declarations in the subroutine, I can fix the bug you've described by initializing Mdiff_out to just the first nlev_p1 elements in the old_Mdiff array:

 Mdiff_out = old_Mdiff(1:nlev_p1)
Tdiff_out = old_Tdiff(1:nlev_p1)
Sdiff_out = old_Sdiff(1:nlev_p1)

How does that sound to you? Unfortunately, I don't have a good idea of how widespread this bug is... but much of the coding is done with the assumption that active levels = total levels. Please let me know if you run into this issue elsewhere and I'll keep working on 'em!

As an aside, there is a max_nlev attribute in the cvmix_global_params datatype, but it isn't very widely used (only in background and tidal mixing, I believe).

douglasjacobsen · 2014-09-25T19:59:48Z

I can definitely give that one a try. I tried adding the maxnlev bit, and that seemed to fix the issues I'm running into. I'll post back soon with results from testing the fix you're proposing.

douglasjacobsen · 2014-09-25T20:03:33Z

That fixed it for me, so I would be happy with that fix.

Do you think you could push it and make a new tag? Then I'll point MPAS at that tag.

Doug Jacobsen points out that MPAS-O differentiates between total number of vertical levels and active number of vertical levels... so the array CVmix_vars%Mdiff_iface may have more than CVmix_vars%nlev+1 elements. The subroutine cvmix_coeffs_kpp_low did not consider the situation where Mdiff_out was a smaller array than old_Mdiff.

mnlevy1981 · 2014-09-25T20:36:15Z

This is fixed in v0.55-beta

douglasjacobsen · 2014-09-25T22:37:00Z

It actually seems like the issue is in more places than just KPP. I just ran into the same issue in convection (though the code is different).

How difficult would it be to make use of the global_params type in all of the parameterizations (to get at max_nlev)?

mnlevy1981 · 2014-09-25T22:48:48Z

I'm not sure I see how to use max_nlev to fix this problem. Correct me if I'm wrong, but the issue boils down to the following:

when using any of the cvmix_coeffs_*_wrap routines, the output is expected to be size nlev+1 (where nlev is the active number of levels in a column) and the input is either nlev+1 or max_nlev+1 (POP uses the former, MPAS-O uses the latter)

Forcing the output to be max_nlev+1 would require forcing the input to be max_nlev+1 as well, and I don't think we want to make that choice for users. It seems to me like the best fix is to keep setting

Mdiff_out = old_Mdiff(1:nlev+1)

rather than assuming that Mdiff_out and old_Mdiff are the same size. I'll go through the rest of the mixing modules and see what I can do, but I suspect you'll be pointing out a few more instances of this bug in future beta tags.

douglasjacobsen · 2014-09-25T22:54:32Z

I think assuming input and output arrays are the same size would be a good assumption. Then a model could chose what array size they want to input.

I think the current problem ends up being that previously (before the allocate statements were removed) the arrays were assumed to be the same size, but after that performance change the sizes could be different (i.e. the input size isn't actually used, instead nlev is used).

I'm wondering if there isn't an easier way to fix it than going through and trying to copy selected portions of arrays.

mnlevy1981 · 2014-09-25T23:07:00Z

Another alternative would be to pass only a portion of CVmix_vars%Mdiff_iface to cvmix_coeffs_*_low. For example, change lines 474-484 of cvmix_kpp.F90 to

call cvmix_coeffs_kpp(new_Mdiff, new_Tdiff, new_Sdiff, &
                      CVmix_vars%zw_iface(1:CVmix_vars%nlev+1), &
                      CVmix_vars%zt_cntr(1:CVmix_vars%nlev), &
                      CVmix_vars%Mdiff_iface(1:CVmix_vars%nlev+1), &
                      CVmix_vars%Tdiff_iface(1:CVmix_vars%nlev+1), &
                      CVMix_vars%Sdiff_iface(1:CVmix_vars%nlev+1), &
                      CVmix_vars%BoundaryLayerDepth, &
                      CVmix_vars%kOBL_depth, &
                      CVmix_vars%kpp_Tnonlocal_iface, &
                      CVmix_vars%kpp_Snonlocal_iface, &
                      CVmix_vars%SurfaceFriction, &
                      CVmix_vars%SurfaceBuoyancyForcing, &
                      CVmix_kpp_params_user)

This might still cause problems for users trying to skip the wrapped interface, but we can deal with that issue if it ever comes up. (Maybe nlev should be an optional input to the low level interface rather than assuming Mdiff_out is the "correct" size?)

douglasjacobsen · 2014-09-25T23:15:59Z

I want to recommend changing the wrap routine to have an optional max_nlev argument, but that won't fix the problem unless it's required and not optional.

I was thinking we could put a pointer to the global type in each of the subtypes, and somehow associate it with a global type. Then when these arrays are statically allocated you could use cvmix_vars%global_params%max_nlev instead of cvmix_vars%nlev.

I really think if the sizes of those arrays are changed, that fixes all the problems. It really just ends up being that those arrays are not allocated by the "user" and their size is assumed to be the same as the user defined arrays, which is not necessarily true.

The convection problem I ran into was on line 299, but in that case it seems like the sizes of the arrays (Mdiff and something else) are assumed to be the same, which at that point isn't true again since it's using the smaller Mdiff array, and a larger other array.

douglasjacobsen · 2014-09-25T23:30:36Z

I guess the solution that seems the best to me (right now at least) would be adding max_nlev into the cvmix_data_type. And then insteaed of using cvmix_vars % nlev for the definition of these Mdiff arrays, use cvmix_vars % max_nlev. That allows them to be different, without changing any other code in cvmix.

I'll push a branch up and then you can look at it and let me know what you think.

douglasjacobsen · 2014-09-25T23:35:51Z

I just pushed to this branch:
https://github.com/CVMix/CVMix-src/tree/fix_array_mismatch

Feel free to merge it if you're happy with it, or we can discuss possible solutions more if you don't think this is sufficient.

As far as my tests go, this fixes the issues I've been running into (both in kpp and convection).

In cvmix_coeffs_kpp_wrap, the size of Mdiff_out, Tdiff_out, and Sdiff_out depend on the size of CVmix_vars%zw_iface rather than CVmix_vars%nlev+1 (to account for models that allocate memory for vertical data before knowing exactly how many levels each column will use. (Without explicitly storing max_nlev anywhere, some columns create arrays of size max_nlev and max_nlev+1 in CVmix_vars but then set CVmix_vars%nlev to a smaller number.) Two comments: 1) I don't like the kludgy nature of changing CVmix_vars%nlev (twice) in the wrapped interface; one option would be to add an option nlev argument to cvmix_put_real_1D. 2) This needs to be done in all the other modules as well.

mnlevy1981 · 2014-10-27T16:28:41Z

I think 42d33fb fixed this for real

douglasjacobsen added the bug label Sep 25, 2014

mnlevy1981 closed this as completed Sep 25, 2014

douglasjacobsen reopened this Sep 25, 2014

mnlevy1981 self-assigned this Oct 27, 2014

mnlevy1981 added this to the First public release milestone Oct 27, 2014

mnlevy1981 closed this as completed Oct 27, 2014

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fields have incorrect sizes causing array out of bounds #44

Fields have incorrect sizes causing array out of bounds #44

douglasjacobsen commented Sep 25, 2014

mnlevy1981 commented Sep 25, 2014

douglasjacobsen commented Sep 25, 2014

mnlevy1981 commented Sep 25, 2014

douglasjacobsen commented Sep 25, 2014

douglasjacobsen commented Sep 25, 2014

mnlevy1981 commented Sep 25, 2014

douglasjacobsen commented Sep 25, 2014

mnlevy1981 commented Sep 25, 2014

douglasjacobsen commented Sep 25, 2014

mnlevy1981 commented Sep 25, 2014

douglasjacobsen commented Sep 25, 2014

douglasjacobsen commented Sep 25, 2014

douglasjacobsen commented Sep 25, 2014

mnlevy1981 commented Oct 27, 2014

Fields have incorrect sizes causing array out of bounds #44

Fields have incorrect sizes causing array out of bounds #44

Comments

douglasjacobsen commented Sep 25, 2014

mnlevy1981 commented Sep 25, 2014

douglasjacobsen commented Sep 25, 2014

mnlevy1981 commented Sep 25, 2014

douglasjacobsen commented Sep 25, 2014

douglasjacobsen commented Sep 25, 2014

mnlevy1981 commented Sep 25, 2014

douglasjacobsen commented Sep 25, 2014

mnlevy1981 commented Sep 25, 2014

douglasjacobsen commented Sep 25, 2014

mnlevy1981 commented Sep 25, 2014

douglasjacobsen commented Sep 25, 2014

douglasjacobsen commented Sep 25, 2014

douglasjacobsen commented Sep 25, 2014

mnlevy1981 commented Oct 27, 2014