Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ns flag of WB output causing crashes with vector-based setups #22

Closed
mee067 opened this issue Nov 26, 2023 · 15 comments
Closed

ns flag of WB output causing crashes with vector-based setups #22

mee067 opened this issue Nov 26, 2023 · 15 comments
Assignees
Labels
bug Unexpected or incorrect behaviour observed in a component of MESH

Comments

@mee067
Copy link

mee067 commented Nov 26, 2023

This seems to be a weird error as it does not occur in all cases. For example, I ran the same setup successfully using 25 soil layers. When I switched to 4 layers, I got the error. The crash is thrown by this line 1376 (in save_basin_output.f90)

ina = min(ina, shd%NAA)

because ina got assigned some strangely large number for the 3rd sub-basin and instead of being assigned 3.
This routine is called from line 766 (writing the WB output):

       !> Daily: IKEY_DLY
        if (ic%now%day /= ic%next%day) then
            if (btest(bnoflg%wb%t, 0)) then
!-                call save_water_balance(shd, 86400, IKEY_DLY)
                call write_water_balance(fls, shd, fls%fl(mfk%f900)%iun, 86400, shd%NAA, out%d)
                if (allocated(bnoflg%wb%ns)) then
                    do n = 1, size(bnoflg%wb%ns)
                        if (bnoflg%wb%ns(n) > 0) then
                            call write_water_balance(fls, shd, (fls%fl(mfk%f900)%iun*1000 + n), 86400, &
                                                     fms%stmg%meta%rnk(bnoflg%wb%ns(n)), out%d)
                        end if
                    end do
                end if
!-                call reset_water_balance(IKEY_DLY)
            end if

`

I printed bnoflg%wb%ns(n) upon assignment and it is always identical to "n", the loop counter. I printed it before calling the "write_water_balance" routine to and it came out as:
1
2
1112486707

before it crashed at the added print line, simply because there is no rank for such subbasin - it should be 3.

As I mentioned, I had no issue with this before the recent changes and I have no issues for grid-based setup (as far as I see). The same setup ran with 25 soil layers without crashes.

if bnoflg%wb%ns(n) = n; why do we need to save it?

I also noticed options nr and n for WB and these are not documented - we talked about "nr" before but I did not know it got implemented. I could not figure out the "n" option.

@dprincz
Copy link
Contributor

dprincz commented Nov 27, 2023

It's a generic routine to write the output provided any n. The check is meant to ensure 'n' is within the range from 1:NAA. Maybe before that line of ina = min(ina, NAA), you could print the two values to see which or if both are causing the issue in that instance. Functionally, there --should-- be no difference between the indices of vector v. grid-based setups...

The option of 'n' is to print the specified grid cell(s). 'nr' is a shortcut to automatically print those associated with reservoirs. Similar 'ns' is a shortcut to automatically print those associated with outlet locations. While they might exist in part, I don't know if the other options have been completely implemented.

@mee067
Copy link
Author

mee067 commented Nov 27, 2023

I understand the routine is generic for both grid-based and vector-based setups. As I said, it ran fine for the same vector-based setup when I used 25 soil layers (with a dp-compiled version). It looks like a weird error - at one point it picks some wrong number from the memory causing a crash.

I printed NAA and ina upon the routine call, and it gave an error for ina because the index went wrong (from the routine call). I stepped the printing up to the calling routine and got the numbers given above. For n = 3, something went wrong, and bnoflg%wb%ns(3) = 1112486707. I printed the values upon assignment in the initialization routine, and bnoflg%wb%ns(3)=3.

obviously, 1112486707 is out of range for the array and thus it crashes.

Is "nr" an option for the WB flag? or is it activated via "REACHOUTFLAG"?

How to use the "n" option? do we specify a list of ranks after it?

@mee067
Copy link
Author

mee067 commented Nov 27, 2023

seems it has something to do with the recent fixes - older compiled code works. That's weird because I did not touch this module in the recent fixes and bnoflg%wb%ns is kind of local to the module, although it is saved between the function calls as per line 133 before any subroutine definition:

type(BasinOutputConfig), save :: bnoflg

should we try to make it a global variable?

@mee067
Copy link
Author

mee067 commented Nov 27, 2023

I cannot understand it but it does have something to do with the LongSimFix. I reversed the fix and the code works in such a case. There is a hidden link between the rte variables (changing the 999999 to 1) and bnoflg%wb%ns.

@mee067
Copy link
Author

mee067 commented Nov 27, 2023

I have 38 gauges in the streamflow file. Printing after the parsing call, I get:

            call parse_basin_output_flag(shd, BASINAVGWBFILEFLAG, bnoflg%wb)
			do n = 1, size(bnoflg%wb%ns)
				write(*, *) n, bnoflg%wb%ns(n)				! ME 2023/11/27
			end do
       1           1
       2           2
       3           3
       4           4
       5           5
       6           6
       7           7
       8           8
       9           9
      10          10
      11          11
      12          12
      13          13
      14          14
      15          15
      16          16
      17          17
      18          18
      19          19
      20          20
      21          21
      22          22
      23          23
      24          24
      25          25
      26          26
      27          27
      28          28
      29          29
      30          30
      31          31
      32          32
      33          33
      34          34
      35          35
      36          36
      37          37
      38          38

printing before the "write_water_balance" call

        !> Daily: IKEY_DLY
        if (ic%now%day /= ic%next%day) then
            if (btest(bnoflg%wb%t, 0)) then
!-                call save_water_balance(shd, 86400, IKEY_DLY)
                call write_water_balance(fls, shd, fls%fl(mfk%f900)%iun, 86400, shd%NAA, out%d)
                if (allocated(bnoflg%wb%ns)) then
                    do n = 1, size(bnoflg%wb%ns)
						write(*, *) n, bnoflg%wb%ns(n), size(bnoflg%wb%ns)				! ME 2023/11/27
                        if (bnoflg%wb%ns(n) > 0) then							
                            call write_water_balance(fls, shd, (fls%fl(mfk%f900)%iun*1000 + n), 86400, &
                                                     fms%stmg%meta%rnk(n), out%d)
                        end if
                    end do
                end if
!-                call reset_water_balance(IKEY_DLY)
            end if

produces:
1 1 38
2 2 38
3 1112486707 38
4 -1082130432 38
5 -1082130432 38
6 1116536832 38
7 -1082130432 38
8 1095237632 38
9 -1082130432 38
10 -1082130432 38
11 1116458189 38
12 -1082130432 38
13 -1082130432 38
14 -1082130432 38
15 -1082130432 38
16 -1082130432 38
17 -1082130432 38
18 -1082130432 38
19 -1082130432 38
20 -1082130432 38
21 -1082130432 38
22 -1082130432 38
23 -1082130432 38
24 -1082130432 38
25 -1082130432 38
26 -1082130432 38
27 -1082130432 38
28 -1082130432 38
29 -1082130432 38
30 -1082130432 38
31 -1082130432 38
32 -1082130432 38
33 -1082130432 38
34 1132494848 38
35 -1082130432 38
36 -1082130432 38
37 -1082130432 38
38 -1082130432 38

the condition ">0" makes replacing bnoflg%wb%ns(n) with n fails for most gauges. What made the variable bnoflg%wb%ns change values?

@mee067
Copy link
Author

mee067 commented Nov 27, 2023

the only place where this variable is assigned is in the parsing step during initialization - I am puzzled and it is indeed looking like a weird memory thing.

@dprincz
Copy link
Contributor

dprincz commented Nov 28, 2023

Did you rebuild this code in its entirety since you made the 'LongSimFix' changes?

@mee067
Copy link
Author

mee067 commented Nov 28, 2023

yes
using intel 2018 and intel 2021 - I got the same problem so it is not a compiler issue.

@mee067
Copy link
Author

mee067 commented Nov 28, 2023

we need to make the bnoflg variable resilient

@mee067
Copy link
Author

mee067 commented Dec 1, 2023

@kasra-keshavarz has just reported the same issue for the Fraser setup using a serially-complied version with intel (2020) on Graham

@mee067
Copy link
Author

mee067 commented Dec 20, 2023

Zelalem reported it too for his setup. He did not get a crash but got output for some sub-basins and no output for others.

@mee067
Copy link
Author

mee067 commented Jan 31, 2024

This seems to have been forgotten. I have a fix but I did not upload it. The fix eliminates the use of "bnoflg%wb%ns(n)" and uses n directly - in all the cases I have, n = bnoflg%wb%ns(n) at the assignment time. As long as MESH prints WB for all sub-basins, the flag will be 1 for all. There is currently no option to select but maybe there was an idea to allow selecting.

@sujata91
Copy link

sujata91 commented Mar 26, 2024

Just wondering if there is any update on this. As I am unable to produce the water balance for the subbasin using control flag BASINAVGWBFILEFLAG daily ns from the model (both 1860_ME and 1860_ME_ZT version) @mee067 @dprincz

@dprincz
Copy link
Contributor

dprincz commented Aug 8, 2024

@mee067 Is this a duplicate of #56, or #56 of this?

@dprincz dprincz added the bug Unexpected or incorrect behaviour observed in a component of MESH label Aug 8, 2024
@dprincz dprincz self-assigned this Aug 8, 2024
@dprincz
Copy link
Contributor

dprincz commented Aug 13, 2024

Should be resolved with dprincz/MESH-Dev@5fc2d3b, as documented in #56.

@dprincz dprincz closed this as completed Aug 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Unexpected or incorrect behaviour observed in a component of MESH
Projects
None yet
Development

No branches or pull requests

3 participants