Feature/mpi pio read2ndfile #145

ShervanGharari · 2020-08-09T02:15:29Z

This breaks the work in pull 129 into two steps. In the first step (this pull), data structures are changed and introduced for reading the second nc file, which provides data on the water management component, such as fluxes from/to river segments and target volumes for lakes. Most of the changes in this pull are in the standalone part.

ShervanGharari · 2020-08-12T17:44:00Z

The standalone driver, get basin runoff, and model setup are changed so that another nc file can be read and its data sorted based on nSeg for abstraction/injection or target volume.
The general changes are as follows:
1- A new data structure is added in DataType.
2- Read runoff and read metadata are generalized so that they do not rely on the runoff data structure and can read more general input/output.
3- If the second nc file is provided, a data structure infield_info is populated.
4- The start and end of the files for the second file are corrected based on the first time step of the runoff input nc files, so the second file can be read based on iTime and iTime_local_wm (the local iTime for the second file).
5- Two more variables, flux and target volume, are added in get basin runoff.
The code compiles but needs to be tested.

ShervanGharari · 2020-08-12T22:22:13Z

I get a segmentation fault when reading the runoff data for the global HDMA case with CLM input.

[gra807:20491:0:20491] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
==== backtrace ====
 0 0x0000000000010e90 __funlockfile()  ???:0
 1 0x00000000005abb41 read_runoff_mp_read_2d_runoff_()  /home/shg096/mizuRoute/route/build/src/standalone/read_runoff.f90:406
 2 0x00000000005a94d0 read_runoff_mp_read_runoff_data_()  /home/shg096/mizuRoute/route/build/src/standalone/read_runoff.f90:303
 3 0x00000000005ac471 get_runoff_mp_get_hru_runoff_()  /home/shg096/mizuRoute/route/build/src/standalone/get_basin_runoff.f90:67
 4 0x0000000000775666 MAIN__()  /home/shg096/mizuRoute/route/build/src/standalone/route_runoff.f90:88
 5 0x0000000000411b8e main()  ???:0
 6 0x00000000000202e0 __libc_start_main()  ???:0
 7 0x0000000000411aaa _start()  /tmp/nix-build-glibc-2.24.drv-0/glibc-2.24/csu/../sysdeps/x86_64/start.S:120
===================

It comes from the line that passes the read and populated dummy array to the sim or sim2D variable:

https://github.com/ShervanGharari/mizuRoute/blob/feature/mpi-pio-read2ndfile/route/build/src/standalone/read_runoff.f90#L406

To support reading into multiple data structures rather than runoff only, I generalized this part so that the read function passes a 2D array out.

ShervanGharari · 2020-08-12T22:59:35Z

The segmentation fault occurred because sim or sim2d was not allocated. I have changed the argument to inout so that the already allocated runoff_data%sim or runoff_data%sim2d can be passed to the subroutines and filled with values.

ShervanGharari · 2020-08-13T20:58:00Z

The code results in streamflow simulation for the HDMA, CLM case on 16 CPUs that is identical to the branch NCAR/feature/mpi-pio.
The code is able to read the flux passed to it. It reads the correct abstraction/injection nc files and their local time (local_iTime_wm)...
A more rigorous test is needed once the data read for the river segments, in addition to runoff, is distributed across CPUs. This will be done in a new pull in mpi_process after this pull is merged.

nmizukami · 2020-08-13T23:07:55Z

route/build/src/standalone/model_setup.f90

-   else
-     infileinfo_data(iFile)%unit = trim(time_units)
-   end if
+   call get_var_attr(trim(dir_name)//trim(inputfileinfo(iFile)%infilename), &


Users are allowed to specify the time units (e.g., days since 1980-01-01 00:00:00) and the calendar in the control file (overwriting the ones from the netcdf). This is because we need to enforce the time unit format (some formats are not recognized in mizuRoute) and the calendar name (only standard, noleap, gregorian, etc. are allowed). If the netcdf has a different time unit format or calendar name, it will cause problems later (maybe a run time error). To overwrite, I put a time unit check here: https://github.com/NCAR/mizuRoute/blob/62837b893cf9831d955875fc61c9471d8ac5661a/route/build/src/standalone/model_setup.f90#L218

The question is that we now have multiple input files. I am not sure whether the code needs to check that the calendar and time unit are the same for both input files. This is a usability issue (would users be annoyed by enforcing the same time unit and calendar?).

ShervanGharari · 2020-08-17

I have added the calendar and time step from the control file, assuming that they should be the same as the water management calendar and time step.

nmizukami · 2020-08-13T23:19:28Z

route/build/src/standalone/model_setup.f90

+ ! private subroutine: get the two infiledata and convert the iTimebound of
+ ! the input_info_wm to match the input_info
+ ! *********************************************************************
+ SUBROUTINE inFile_corr_time(inputfileinfo,      & ! input: the structure of simulated runoff, evapo and


Hmm... I am not sure what this subroutine is actually doing. Does it just compare time bounds between the two file streams (runoff vs. reach fluxes)? Hasn't inputfileinfo_wm(:)%iTimebound(:) already been computed when inFile_pop is called (in init_inFile_pop)?

ShervanGharari · 2020-08-15

init_infile_pop populates iTimebound(:) for both the runoff netcdf and the abstraction/injection netcdf.
The second file might have different start and end points in time (it might start earlier, later, or at the same time as the first, and might end earlier or later). To have a common reference for iTime and the local iTime, I decided to correct this before going into the model, which makes infile_name simpler.
Examples:
1) If the first of the second files starts earlier than the runoff files, its initial iTimebound will be negative (like -10000 days, meaning the file starts about 30 years earlier).
2) If the first of the second files starts later than the runoff files, its iTimebound will be positive (for example, it starts 1000 days after the runoff file).
In the first example the model should not have an issue: iTime_local_wm starts from 10000 while iTime and iTime_local are actually 1. In the second example, iTime_local_wm will be -1000 if the model starts from the first iTime (the first runoff time step), and the model stops. However, if the starting point is set to some time after 1000 days, the model works because it can read the second file values...
Alternatively, we can add this step into infile_name if that would be cleaner...
What are your suggestions on that?
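
The index bookkeeping in the two examples can be sketched as follows. This is a minimal Python illustration, not the Fortran in this pull: the function name `local_wm_index` and the `offset_steps` convention are hypothetical, and it assumes both files use the same time step.

```python
def local_wm_index(i_time, offset_steps, n_time_wm):
    """Map a 1-based runoff time index to a 1-based index in the second (wm) file.

    offset_steps is the number of steps by which the wm file starts EARLIER
    than the runoff file (negative if it starts later). This sign convention
    is an assumption made for the sketch.
    Returns None when the requested step falls outside the wm file.
    """
    i_local = i_time + offset_steps
    if 1 <= i_local <= n_time_wm:
        return i_local
    return None


# Example 1: wm file starts 10000 steps earlier, so step 1 is deep inside it.
early = local_wm_index(1, 10000, 20000)
# Example 2: wm file starts 1000 steps later, so step 1 has no wm data yet.
late = local_wm_index(1, -1000, 5000)
# Starting the simulation after the wm file begins makes the lookup valid.
late_ok = local_wm_index(1001, -1000, 5000)
```

Returning `None` instead of a negative index makes the "model stops" case explicit for the caller.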

ShervanGharari · 2020-08-25T16:56:42Z

I have added checks for the timing of the second netcdf files. The checks compare the start and end of the simulation to the start and end of the second netcdf files. There is no need to check the start and end of the runoff netcdf files against the second files, because the start and end of the simulation are updated if they are earlier or later than the runoff files.
Consider the following example:
runoff files 1980-01-01 to 1984-12-31
water management files 1975-01-01 to 1979-12-31
start_sim 1975-01-01
end_sim 1985-12-31
First, init_time resets the start from 1975-01-01 to the start of the runoff file, which is 1980-01-01. Then the code checks start_sim against the water management file and, as it is past the last water management time step, the simulation stops. Similarly, if the water management file starts after the runoff file ends, the simulation stops because the first time step of the water management file will be past the last time step of the runoff file (which is the actual or updated sim_end).
The code compiles but needs further checks.
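
The example can be traced with a short Python sketch. The helper names `clip_to_runoff` and `wm_covers` are hypothetical, and requiring the water management files to span the whole clipped simulation period is an assumption of the sketch, not a statement of what the Fortran check does.

```python
from datetime import date


def clip_to_runoff(sim_start, sim_end, ro_start, ro_end):
    """Shrink the requested simulation period to the runoff file period."""
    return max(sim_start, ro_start), min(sim_end, ro_end)


def wm_covers(sim_start, sim_end, wm_start, wm_end):
    """True if the wm files span the whole (already clipped) simulation period."""
    return wm_start <= sim_start and sim_end <= wm_end


# Dates taken from the example in the comment above.
start, end = clip_to_runoff(date(1975, 1, 1), date(1985, 12, 31),
                            date(1980, 1, 1), date(1984, 12, 31))
ok = wm_covers(start, end, date(1975, 1, 1), date(1979, 12, 31))
# ok is False here: the clipped start (1980-01-01) is past the last wm step.
```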

ShervanGharari · 2020-08-25T23:37:01Z

The scatter_wm subroutine is added to mpi_process.f90. The subroutine is not called yet. The code compiles.
The next step is to call scatter_wm and pass the distributed target volume and fluxes for the main and tributary reaches to main_route.

ShervanGharari · 2020-08-26T17:59:03Z

More checks are added after scatter runoff for evaporation and precipitation in case the is_lake_sim flag is true. The code compiles.

ShervanGharari · 2020-08-26T19:36:25Z

The variables of the second file are passed all the way down to main_route and are distributed to RCHFLX_OUT based on segment order (this needs checking in main_route). The code compiles but needs to be tested more generally.
One test would be to feed the model's own simulation back to the model as the second file. If the printed RCHFLX_OUT%REACH_WM_FLUX is identical, reach by reach, to the simulated discharge in the river segment, then the scattering and ordering can be assumed to be correctly implemented.

ShervanGharari · 2020-09-01T03:59:23Z

I have performed the proposed test:
1- Simulated the runoff (HDMA + CLM)
2- Fed the simulated runoff (which is in time*seg dimensions) to the model as the second nc file
3- Checked whether the passed RCHFLX_OUT%REACH_WM_FLUX is the same as RCHFLX_OUT%REACH_Q_IRF
To ensure that scatter_wm in mpi_process.f90 is working as it should, I did the first step with 16 CPUs and the second step with 10.
The result shows that the simulated reach runoff and the reach runoff read from the second file are very similar but not identical... A snippet can be seen here... Is this the result of the precision of reading and writing the files, or of the single-precision format of the output files?

  SIM                          READ
  7.558259815608805E-005       7.558259676443413E-005
  9.010879396723562E-007       9.010879580273468E-007
  7.421825730101835E-007       7.421825785058900E-007
  9.062744859441921E-002       9.062744677066803E-002
  7.998593531490339E-002       7.998593896627426E-002
  5.284206748944350E-002       5.284206569194794E-002
  0.258958966906890            0.258958965539932     
  0.222698873564474            0.222698867321014     
  0.318773752833959            0.318773746490479     
  5.824489644434995E-005       5.824489562655799E-005
  4.921001208789678E-005       4.921001163893379E-005
  2.38470290873135             2.38470292091370     
  1.42609475347486             1.42609477043152     
  2.31175813896434             2.31175804138184     
  2.65326062034029             2.65326070785522     
  2.63590238006958             2.63590240478516     
  2.04380923184174             2.04380917549133     
  2.06412614986773             2.06412625312805     
  0.253412337054753            0.253412336111069     
  0.166440462818452            0.166440457105637     
  0.148173535820387            0.148173540830612

nmizukami · 2020-09-01T13:47:48Z

Yes, the output (write_simoutput_pio.f90) is in single precision (https://github.com/NCAR/mizuRoute/blob/62837b893cf9831d955875fc61c9471d8ac5661a/route/build/src/write_simoutput_pio.f90#L426). Variables in the output netcdf lose some precision. If you change ncd_float -> ncd_double, the output will be in double precision, and when you read it in, it should keep the precision (I think).
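
The size of the mismatch is consistent with a double value being stored as a 4-byte float and read back. A minimal Python sketch of that round trip, using the first SIM value from the table above (the helper name `to_single` is hypothetical, and this does not touch netcdf at all):

```python
import struct


def to_single(x):
    """Round a Python float (double) to the nearest 4-byte float and back."""
    return struct.unpack("f", struct.pack("f", x))[0]


sim = 7.558259815608805e-05           # double-precision value from the SIM column
read = to_single(sim)                 # what survives a single-precision write
rel_err = abs(read - sim) / abs(sim)  # bounded by 2**-24 for round-to-nearest
```

Single precision keeps roughly seven significant digits, which matches the SIM and READ columns agreeing to about that many digits.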

ShervanGharari · 2020-09-01T15:42:34Z

Thank you Naoki. The next test I will run is the case where only a handful of segments are available in the second nc file. We can later check the result with double precision instead. For now I think it is pretty clear where the difference comes from.

ShervanGharari · 2020-09-01T20:18:13Z

I have created a new wm input data set that has values for only two segments (sample files are attached in wm.zip). The simulation terminates shortly after it starts. The error message is shown below. When the second netcdf is the model output, which includes values for all segments, the code runs without any issues; the problem arises when the wm file has only the two segments identified. Interestingly, the code crashes in a place where I cannot really understand why... It also never gets to get_basin_runoff.f90. What can it be?

srun: error: gra823: task 0: Floating point exception (core dumped)
srun: Terminating job step 37847655.0
slurmstepd: error: *** STEP 37847655.0 ON gra823 CANCELLED AT 2020-09-01T14:57:24 ***
forrtl: error (78): process killed (SIGTERM)
Image              PC                Routine            Line        Source             
route_runoff.mpi-  0000000000932B4E  Unknown               Unknown  Unknown
libpthread-2.24.s  00002AE574946E90  Unknown               Unknown  Unknown
hmca_bcol_basesmu  00002AE58333CBB0  hmca_bcol_basesmu     Unknown  Unknown
libhcoll.so.1      00002AE5847222B5  hmca_coll_ml_barr     Unknown  Unknown
mca_coll_hcoll.so  00002AE5842BDF1A  mca_coll_hcoll_ba     Unknown  Unknown
libmpi.so.40.10.2  00002AE573E25C21  PMPI_Barrier          Unknown  Unknown
libmpi_mpifh.so.4  00002AE573B7E553  MPI_Barrier_f08       Unknown  Unknown
route_runoff.mpi-  0000000000465B89  mpi_mod_mp_shr_mp         748  mpi_utils.f90
route_runoff.mpi-  00000000006FA65C  mpi_routine_mp_sc        1172  mpi_process.f90
route_runoff.mpi-  00000000006F0CD4  mpi_routine_mp_mp         775  mpi_process.f90
route_runoff.mpi-  0000000000780F1F  MAIN__                     94  route_runoff.f90
route_runoff.mpi-  0000000000411B8E  Unknown               Unknown  Unknown
libc-2.24.so       00002AE574D772E0  __libc_start_main     Unknown  Unknown
route_runoff.mpi-  0000000000411AAA  Unknown               Unknown  Unknown

I am just checking: can it be the result of the allocation of wm_data starting from line 869 in model_setup.f90? There the size is only 2 instead of the total number of reachID. We take care of the missing values in sort; however, I am wondering whether this causes the problem for MPI.

   ! allocate the hru_ix based on number of hru_id presented in the
   allocate(wm_data_in%seg_ix(size(wm_data_in%seg_id)), stat=ierr)
   if(ierr/=0)then; message=trim(message)//'problem allocating runoff_data_in%hru_ix'; return; endif

   ! get indices of the seg ids in the input file in the routing layer
   call get_qix(wm_data_in%seg_id,  &    ! input: vector of ids in mapping file
                reachID,            &    ! input: vector of ids in the routing layer
                wm_data_in%seg_ix,  &    ! output: indices of hru ids in routing layer
                ierr, cmessage)          ! output: error control
   if(ierr/=0)then; message=trim(message)//trim(cmessage); return; endif
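
The lookup that the snippet above relies on can be illustrated in Python. This is only a sketch of the idea (for each id in the input file, find its position in the routing-layer id vector); it is not the implementation of get_qix, and the function name `get_index` and the `missing` sentinel are hypothetical.

```python
def get_index(ids_in, ids_ref, missing=-999):
    """Return, for each id in ids_in, its 1-based position in ids_ref.

    Ids that do not occur in ids_ref get the `missing` sentinel
    (an assumed convention for this sketch).
    """
    position = {seg_id: i for i, seg_id in enumerate(ids_ref, start=1)}
    return [position.get(seg_id, missing) for seg_id in ids_in]


reach_id = [10, 20, 30, 40]               # ids in the routing layer
seg_ix = get_index([30, 10], reach_id)    # a wm file holding only two segments
```

Note that the result has the length of the input id vector (2 here), not of the routing layer (4), which is the size mismatch raised in the question above.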

ShervanGharari · 2020-09-02T21:44:04Z

It seems that the previous issue can be solved if the second input is given for the full set of segments that exist in the river network topology. The input should be given as such:

    seg  1    2         3    4
time  
1        3.0  missing   5    missing
2        3.2  missing   5.2  missing
.
.
.
5        6.1  missing   5.9  missing

instead of

    seg  1     3    
time
1        3     5    
2        3.2   5.2  
.
.
.
5        6.1   5.9

I will still need to prepare a case for this and check...
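
Padding a sparse table out to the full segment list, as in the two layouts above, could look like the following Python sketch. The function name `expand_to_full_network` and the fill value `-9999.0` are hypothetical; a real file would use whatever missing value the netcdf variable declares.

```python
def expand_to_full_network(seg_ids_in, rows, all_seg_ids, missing=-9999.0):
    """Re-lay rows (one per time step) onto the full segment list.

    rows[t][j] is the value for seg_ids_in[j] at time step t. Segments that
    are absent from the input receive `missing` (an assumed fill value).
    """
    column = {seg_id: j for j, seg_id in enumerate(seg_ids_in)}
    return [
        [row[column[s]] if s in column else missing for s in all_seg_ids]
        for row in rows
    ]


# Values for segments 1 and 3 only, expanded to the four-segment network.
full = expand_to_full_network([1, 3],
                              [[3.0, 5.0], [3.2, 5.2]],
                              [1, 2, 3, 4])
```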

ShervanGharari · 2020-10-20T23:13:10Z

I have added the abstraction/injection to the irf. The idea is as follows:
if reach streamflow - abstraction > 0 then
actual_abs = abstraction and reach streamflow = reach streamflow - abstraction
else
actual_abs = reach streamflow and reach streamflow = 0
I have compiled the code, but I ran into a problem with srun on Graham in the submitted jobs, so I have not yet tested the code.
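
The rule above can be written as a small Python sketch. The function name `apply_abstraction` is hypothetical, and treating a negative abstraction as an injection is an assumption about the sign convention.

```python
def apply_abstraction(streamflow, abstraction):
    """Return (remaining streamflow, actual abstraction) for one reach.

    The demand is met in full only if it leaves a positive flow; otherwise
    everything in the reach is taken and the flow drops to zero. A negative
    abstraction acts as an injection (assumed sign convention).
    """
    if streamflow - abstraction > 0:
        actual_abs = abstraction
    else:
        actual_abs = streamflow
    return streamflow - actual_abs, actual_abs


remaining, taken = apply_abstraction(1.0, 2.0)   # demand exceeds the flow
balance = 1.0 - remaining - taken                # water balance residual
```

By construction the initial flow minus the remaining flow minus the actual abstraction is zero, which is the water balance check mentioned later in this thread.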

ShervanGharari · 2020-10-21T00:09:41Z

The code compiles. The water balance of initial streamflow, streamflow after abstraction, and actual abstraction is checked to sum to zero. It needs more rigorous testing.
For now, pull 145 can be merged into the main code. Additionally, before the merge, I can make the actual abstraction be written to the output file.

ShervanGharari · 2020-10-23T04:01:32Z

code compiles successfully...!

ShervanGharari and others added 11 commits August 8, 2020 21:30

new strcuture, global varibaribles and modified model set up in stand…

7a0137e

… alone to read the second netcdf file that inclue flux to or from river segment or target volume for lakes

changes in global data

912bfd6

change to wm_data strcuture

6e043da

Update remap.f90

685020c

Merge pull request #34 from NCAR/feature/mpi-pio

56b75d5

Feature/mpi pio

Merge branch 'feature/mpi-pio-read2ndfile-v2' into feature/mpi-pio

4d2ca06

Merge pull request #36 from ShervanGharari/feature/mpi-pio

98d2ae2

Feature/mpi pio

Merge pull request #37 from ShervanGharari/feature/mpi-pio-read2ndfil…

00d057e

…e-v2 Feature/mpi pio read2ndfile v2

minor changes in the model_setup

3961e58

the read_runoff_metadata and read_runoff subroutins are generalized s…

6133205

…o different type of data strcuture such as runoff and water management can be pass in model setup

get basin runoff reads the extra files

49d44e4

ShervanGharari marked this pull request as ready for review August 12, 2020 17:53

ShervanGharari added 2 commits August 12, 2020 17:11

changes in read runoff to make it similar to the origin

761b44d

changes in read runoff to make it similar to the origin

bf2e96a

ShervanGharari added 2 commits August 12, 2020 20:16

the model setup is fixed so that the seg_id length is similar to the …

f578fc1

…seg length in the river network topology

print statments and fixing of time correction for second file

c6c4aa9

nmizukami reviewed Aug 14, 2020

ShervanGharari and others added 6 commits August 17, 2020 14:12

minor changes in commnets and also reading calendar and time step in …

bbc6db7

…model_setup.f90

cleaing of get basin runoff

dace0e0

the checks for start and end of the second nc file is incorporated in…

c8e8542

… the model setup

chekcing if nc files do not have gap or overlaps

3e25ec7

few changes

8e571b7

small changes

db80cdc

scatter_wm is implemented in mpi_process.f90

1bfc0ca

the checks for scatter runoff, evaporation and precipitation are adde…

d9734f6

…d to the mpi_process.f90

ShervanGharari added 3 commits August 26, 2020 14:32

scatter_wm and its check is called in the mpi_process/f90

3fb572e

the scattered flux and target volumes are passed to main_route.f90 an…

085e2d8

…d main_route subroutine

passing and allocating the flux and target volume to the RCHFLX_OUT i…

b778495

…n main_route

ShervanGharari added 2 commits August 31, 2020 22:36

small change of nTime to nTime_wm in model setup also print statement…

c536336

…s int irf route and get basin runoff

minor changed, commenting the print statments in irt_rote

15bd6d3

print statements are added

dd07ebd

the abstraction or injection to the river segment is added to irf; at…

21f1f11

… this moment the is not abstraction or injection or target volume to the lake as there is no lake module

water balance is caluclated for each reach, so the inital streamflow,…

f0261d8

… the streamflow after abstration and actual abstration are compared to give a sum of zero

nmizukami reviewed Oct 22, 2020

route/build/src/irf_route.f90
route/build/src/read_control.f90
route/build/src/standalone/model_setup.f90

ShervanGharari added 2 commits October 22, 2020 23:56

the commnets where fixed

3fb6039

print statement is removed

a06954b

nmizukami merged commit 06655b2 into ESCOMP:feature/mpi-pio Oct 27, 2020
