Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fields have incorrect sizes causing array out of bounds #44

Closed
douglasjacobsen opened this issue Sep 25, 2014 · 14 comments
Closed

Fields have incorrect sizes causing array out of bounds #44

douglasjacobsen opened this issue Sep 25, 2014 · 14 comments
Assignees
Labels

Comments

@douglasjacobsen
Copy link
Member

I only ran into this issue when using KPP, but the problem ends up being a combination of lines:
451 and 600-602 in https://github.com/CVMix/CVMix-src/blob/master/src/shared/cvmix_kpp.F90

Basically, the arrays new_Mdiff (and such) are allocated using nlev, which is the number of active levels in a column, not necessarily the number of total levels in the column. Later a direct copy happens moving old_Mdiff into Mdiff_out, but they have different sizes (one is active, one is total) which causes an array out of bounds with some compilers. Ideally, the arrays would be allocated using the total number of levels whether or not they are used.

@mnlevy1981
Copy link
Contributor

I'm not sure I see the problem... as far as CVMix is concerned, the number of active levels and the total number of levels in the column are the same. That is to say, if CVmix_vars%nlev = 10, then CVmix_vars%Mdiff_iface should be size 11 (and clearly new_Mdiff is declared to be size 11 in line 451).

If you are using pointers into an nlev x ncol matrix, then you want to set up CVmix_vars%Mdiff_iface to be something like

CVmix_vars(col_index)%Mdiff_iface => global_Mdiff(1:CVMix_vars(col_index)%nlev+1,col_index)

@douglasjacobsen
Copy link
Member Author

So, in MPAS we allocate the Mdiff arrays once (again to remove a performance from allocating inside an nCell loop). This means that Mdiff is allocated with a size of nVertLevels (total number of cells in a column).

Then we set nlev = maxLevelCell(iCell) where maxLevelCell is the number of active cells in a column (different from the total number). However, when you statically allocate the new_Mdiff arrays, you use nlev instead of whatever the size of the cvmix_vars % Mdiff_iface arrays are, then when you do the copy it gives an out of bounds error.

When using these models, I don't necessarily think it's a safe assumption that the number of active levels are the number of total levels. I thought that was the whole point of passing in nlev, was that each column could have more levels than they have active, but you would only loop over nlev levels. One possible fix would be adding a cvmix_var % maxnlev variable, and then using that for these statically allocated arrays. That would allow models to provide columns with only some active levels in them.

@mnlevy1981
Copy link
Contributor

The fix I'm leaning towards is changing lines 612 - 613 to read

nlev_p1 = size(Mdiff_out)
nlev = nlev_p1 - 1

Because nlev and nlev_p1 should be the number of active levels and interfaces (respectively). If I do that immediately after the variable declarations in the subroutine, I can fix the bug you've described by initializing Mdiff_out to just the first nlev_p1 elements in the old_Mdiff array:

 Mdiff_out = old_Mdiff(1:nlev_p1)
Tdiff_out = old_Tdiff(1:nlev_p1)
Sdiff_out = old_Sdiff(1:nlev_p1)

How does that sound to you? Unfortunately, I don't have a good idea of how widespread this bug is... but much of the coding is done with the assumption that active levels = total levels. Please let me know if you run into this issue elsewhere and I'll keep working on 'em!

As an aside, there is a max_nlev attribute in the cvmix_global_params datatype, but it isn't very widely used (only in background and tidal mixing, I believe).

@douglasjacobsen
Copy link
Member Author

I can definitely give that one a try. I tried adding the maxnlev bit, and that seemed to fix the issues I'm running into. I'll post back soon with results from testing the fix you're proposing.

@douglasjacobsen
Copy link
Member Author

That fixed it for me, so I would be happy with that fix.

Do you think you could push it and make a new tag? Then I'll point MPAS at that tag.

mnlevy1981 added a commit that referenced this issue Sep 25, 2014
Doug Jacobsen points out that MPAS-O differentiates between total number of
vertical levels and active number of vertical levels... so the array
CVmix_vars%Mdiff_iface may have more than CVmix_vars%nlev+1 elements. The
subroutine cvmix_coeffs_kpp_low did not consider the situation where Mdiff_out
was a smaller array than old_Mdiff.
@mnlevy1981
Copy link
Contributor

This is fixed in v0.55-beta

@douglasjacobsen
Copy link
Member Author

It actually seems like the issue is in more places than just KPP. I just ran into the same issue in convection (though the code is different).

How difficult would it be to make use of the global_params type in all of the parameterizations (to get at max_nlev)?

@mnlevy1981
Copy link
Contributor

I'm not sure I see how to use max_nlev to fix this problem. Correct me if I'm wrong, but the issue boils down to the following:

  • when using any of the cvmix_coeffs_*_wrap routines, the output is expected to be size nlev+1 (where nlev is the active number of levels in a column) and the input is either nlev+1 or max_nlev+1 (POP uses the former, MPAS-O uses the latter)

Forcing the output to be max_nlev+1 would require forcing the input to be max_nlev+1 as well, and I don't think we want to make that choice for users. It seems to me like the best fix is to keep setting

Mdiff_out = old_Mdiff(1:nlev+1)

rather than assuming that Mdiff_out and old_Mdiff are the same size. I'll go through the rest of the mixing modules and see what I can do, but I suspect you'll be pointing out a few more instances of this bug in future beta tags.

@douglasjacobsen
Copy link
Member Author

I think assuming input and output arrays are the same size would be a good assumption. Then a model could chose what array size they want to input.

I think the current problem ends up being that previously (before the allocate statements were removed) the arrays were assumed to be the same size, but after that performance change the sizes could be different (i.e. the input size isn't actually used, instead nlev is used).

I'm wondering if there isn't an easier way to fix it than going through and trying to copy selected portions of arrays.

@mnlevy1981
Copy link
Contributor

Another alternative would be to pass only a portion of CVmix_vars%Mdiff_iface to cvmix_coeffs_*_low. For example, change lines 474-484 of cvmix_kpp.F90 to

call cvmix_coeffs_kpp(new_Mdiff, new_Tdiff, new_Sdiff, &
                      CVmix_vars%zw_iface(1:CVmix_vars%nlev+1), &
                      CVmix_vars%zt_cntr(1:CVmix_vars%nlev), &
                      CVmix_vars%Mdiff_iface(1:CVmix_vars%nlev+1), &
                      CVmix_vars%Tdiff_iface(1:CVmix_vars%nlev+1), &
                      CVMix_vars%Sdiff_iface(1:CVmix_vars%nlev+1), &
                      CVmix_vars%BoundaryLayerDepth, &
                      CVmix_vars%kOBL_depth, &
                      CVmix_vars%kpp_Tnonlocal_iface, &
                      CVmix_vars%kpp_Snonlocal_iface, &
                      CVmix_vars%SurfaceFriction, &
                      CVmix_vars%SurfaceBuoyancyForcing, &
                      CVmix_kpp_params_user)

This might still cause problems for users trying to skip the wrapped interface, but we can deal with that issue if it ever comes up. (Maybe nlev should be an optional input to the low level interface rather than assuming Mdiff_out is the "correct" size?)

@douglasjacobsen
Copy link
Member Author

I want to recommend changing the wrap routine to have an optional max_nlev argument, but that won't fix the problem unless it's required and not optional.

I was thinking we could put a pointer to the global type in each of the subtypes, and somehow associate it with a global type. Then when these arrays are statically allocated you could use cvmix_vars%global_params%max_nlev instead of cvmix_vars%nlev.

I really think if the sizes of those arrays are changed, that fixes all the problems. It really just ends up being that those arrays are not allocated by the "user" and their size is assumed to be the same as the user defined arrays, which is not necessarily true.

The convection problem I ran into was on line 299, but in that case it seems like the sizes of the arrays (Mdiff and something else) are assumed to be the same, which at that point isn't true again since it's using the smaller Mdiff array, and a larger other array.

@douglasjacobsen
Copy link
Member Author

I guess the solution that seems the best to me (right now at least) would be adding max_nlev into the cvmix_data_type. And then insteaed of using cvmix_vars % nlev for the definition of these Mdiff arrays, use cvmix_vars % max_nlev. That allows them to be different, without changing any other code in cvmix.

I'll push a branch up and then you can look at it and let me know what you think.

@douglasjacobsen
Copy link
Member Author

I just pushed to this branch:
https://github.com/CVMix/CVMix-src/tree/fix_array_mismatch

Feel free to merge it if you're happy with it, or we can discuss possible solutions more if you don't think this is sufficient.

As far as my tests go, this fixes the issues I've been running into (both in kpp and convection).

mnlevy1981 added a commit that referenced this issue Sep 26, 2014
In cvmix_coeffs_kpp_wrap, the size of Mdiff_out, Tdiff_out, and Sdiff_out
depend on the size of CVmix_vars%zw_iface rather than CVmix_vars%nlev+1 (to
account for models that allocate memory for vertical data before knowing
exactly how many levels each column will use. (Without explicitly storing
max_nlev anywhere, some columns create arrays of size max_nlev and max_nlev+1
in CVmix_vars but then set CVmix_vars%nlev to a smaller number.)

Two comments:

1) I don't like the kludgy nature of changing CVmix_vars%nlev (twice) in the
wrapped interface; one option would be to add an option nlev argument to
cvmix_put_real_1D.

2) This needs to be done in all the other modules as well.
@mnlevy1981 mnlevy1981 self-assigned this Oct 27, 2014
@mnlevy1981 mnlevy1981 added this to the First public release milestone Oct 27, 2014
@mnlevy1981
Copy link
Contributor

I think 42d33fb fixed this for real

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants