
Reproducibly set random number generators #528

Merged: 5 commits, merged Sep 19, 2022

dschlaep (Member) commented Sep 6, 2022


- pull in SOILWAT2 branch "feature_pcg_seeding"
* now, a random number sequence can be exactly reproduced ("initstate" and "initseq"), see DrylandEcology/SOILWAT2#327
* see also DrylandEcology/SOILWAT2#326
* function `RandSeed()` now has two arguments, "initstate" and "initseq"
-> update STEPWAT2 calls to `RandSeed()` with the new arguments
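The branch name "feature_pcg_seeding" suggests that SOILWAT2's generator is PCG32. The bash sketch below is written from the published pcg_basic reference algorithm, not from SOILWAT2's source, and illustrates why the pair ("initstate", "initseq") pins down a sequence exactly: "initseq" selects the stream (via the odd increment) and "initstate" the starting point within that stream. The names `pcg32_seed` and `pcg32_next` are hypothetical and are not SOILWAT2's `RandSeed()`.

```shell
#!/usr/bin/env bash
# Illustrative PCG32 (pcg_basic reference algorithm) in bash arithmetic.
# Hypothetical helper names; this is NOT SOILWAT2's RandSeed().
# Bash integers are signed 64-bit and wrap on overflow in practice, which
# stands in for C's uint64_t; logical right shifts are emulated with masks.

PCG_MULT=6364136223846793005
pcg_state=0   # 64-bit internal state
pcg_inc=0     # odd increment, selects the stream
pcg_out=0     # most recent 32-bit output

# pcg32_next: advance the LCG state and store the permuted output in pcg_out
pcg32_next() {
  local old=$pcg_state xorshifted rot
  pcg_state=$(( old * PCG_MULT + pcg_inc ))
  # ((old >>> 18) ^ old) >>> 27, truncated to 32 bits
  xorshifted=$(( ((((old >> 18) & ((1 << 46) - 1)) ^ old) >> 27) & 0xFFFFFFFF ))
  rot=$(( (old >> 59) & 31 ))   # top 5 bits choose the rotation
  pcg_out=$(( ((xorshifted >> rot) | (xorshifted << ((-rot) & 31))) & 0xFFFFFFFF ))
}

# pcg32_seed INITSTATE INITSEQ: the same pair always yields the same sequence
pcg32_seed() {
  pcg_state=0
  pcg_inc=$(( ($2 << 1) | 1 ))     # initseq picks the stream (increment must be odd)
  pcg32_next
  pcg_state=$(( pcg_state + $1 ))  # initstate picks the start within the stream
  pcg32_next
}

# Usage: seeding twice with the same pair reproduces the draws exactly
pcg32_seed 42 54; pcg32_next; first=$pcg_out
pcg32_seed 42 54; pcg32_next; again=$pcg_out
printf 'first draw: 0x%08x, repeated: 0x%08x\n' "$first" "$again"
```

Changing only "initseq" moves the generator to a different stream, which is what allows several RNGs with the same "initstate" to produce non-coinciding sequences.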

- new `set_all_rngs()` to set each STEPWAT2 random number generator to produce sequences of random numbers that are reproducible (if user-provided "seed" is non-zero) and
* unique among RNGs, iterations, years, and grid cells (most RNGs)
* unique among RNGs, iterations, and years but identical among grid cells (weather generator RNG).
* A user-provided "seed" of zero produces non-reproducible random number sequences which are non-coinciding among RNGs, iterations, and grid cells.
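A minimal bash sketch of the seeding properties listed above. The formula is an assumption chosen for readability, not the scheme that `set_all_rngs()` actually uses: the user seed, iteration, and year are packed into "initstate", the RNG identifier and grid cell into "initseq", and the weather generator RNG ignores the cell so that all cells share its sequence. The function name `derive_seed` and its numeric limits are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical per-RNG seeding in the spirit of set_all_rngs().
# The packing formula is an ASSUMPTION for illustration, NOT STEPWAT2's code.
# Uniqueness holds if iteration < 10000, year < 10000, and cell < 100000.

# derive_seed SEED RNG_ID ITERATION YEAR CELL IS_WEATHER
# Prints "initstate initseq" for one RNG.
derive_seed() {
  local seed=$1 rng_id=$2 iter=$3 year=$4 cell=$5 is_weather=$6
  local initstate initseq

  # The weather generator RNG ignores the cell so that, for a non-zero seed,
  # all cells see identical weather for a given iteration and year.
  if [[ $is_weather -eq 1 ]]; then
    cell=0
  fi

  if [[ $seed -eq 0 ]]; then
    # seed == 0: draw fresh entropy, so runs are not reproducible
    initstate=$(od -An -N4 -tu4 /dev/urandom | tr -d ' ')
  else
    # distinct (seed, iteration, year) triples give distinct initstate values
    initstate=$(( seed * 100000000 + iter * 10000 + year ))
  fi

  # distinct (rng_id, cell) pairs give distinct stream selectors
  initseq=$(( rng_id * 100000 + cell ))

  echo "$initstate $initseq"
}

# Usage: seed 7, RNG 1, iteration 1, year 5, cell 3, not the weather RNG
derive_seed 7 1 1 5 3 0   # prints "700010005 100003"
```

With seed 0, the weather RNG still differs among cells because every call draws a new "initstate", which matches the expectation that weather differs among cells when seed == 0.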

- both non-gridded and gridded modes exactly reproduce weather among runs if seed != 0; weather is not reproduced among runs if seed == 0
- note: gridded mode reproduces weather almost, but not exactly, among cells (this is not the intended behavior and requires further investigation)
dschlaep (Member Author) commented Sep 6, 2022

@kpalmqui I finally got around to implementing the updates to the random number generators that we discussed a couple of weeks ago! It appears to work, except that in gridded mode grid cells reproduce weather almost, but not exactly.

- SOILWAT2 commit 947ae2f1b8a67119ff70762be5115d0ada60d15c updated "weathsetup.in"
- the changes are SOILWAT2-standalone, but STEPWAT2 also requires that file to be read correctly
This includes
- `SW_WTH_init_run()` now also initializes yesterday's weather values
The problem was that gridded mode used one cell's weather values from the last day of a year as the "yesterday" values for the first day of the next cell. This went mostly unnoticed, but when that last day has precipitation, it affects the weather generator's behavior on the first day of the next cell.

This is fixed by addressing two causes:
(i) SOILWAT2's `SW_WTH_init_run()` did not zero out yesterday's weather values;
this was an issue only for STEPWAT2's gridded mode, which does not deconstruct/construct each SOILWAT2 run (fixed with commit 8a0a754).
(ii) STEPWAT2's `load_cell()` did not call `SW_CTL_init_run()` (and `SW_WTH_init_run()`) or an equivalent to prevent the carry-over of values from one cell to the next (fixed with this commit).
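The carry-over can be reproduced with a toy first-order Markov precipitation model in bash, in which the chance of a wet day depends on whether yesterday was wet. The globals and functions (`yesterday_wet`, `init_run`, `simulate_year`) and the 60 %/20 % probabilities are hypothetical stand-ins for SOILWAT2's weather generator and `SW_WTH_init_run()`: if the "yesterday" state is reset before each cell, two cells seeded identically produce identical weather; without the reset, the first day of the second cell depends on how the previous cell's year ended.

```shell
#!/usr/bin/env bash
# Toy Markov-chain weather generator illustrating state carry-over between cells.
# Hypothetical names and probabilities; NOT SOILWAT2 code.

yesterday_wet=0   # shared state, analogous to "yesterday's" weather values
result=""         # daily sequence of the last simulated year (W = wet, D = dry)

# init_run: reset per-run state, analogous to what SW_WTH_init_run() now does
init_run() { yesterday_wet=0; }

# simulate_year SEED NDAYS: fills $result; called without command substitution
# so that yesterday_wet persists across calls, as the shared state did
simulate_year() {
  local seed=$1 ndays=$2 day p
  RANDOM=$seed    # reseeding bash's RNG makes the draws reproducible
  result=""
  for (( day = 0; day < ndays; day++ )); do
    if (( yesterday_wet )); then p=60; else p=20; fi   # P(wet | yesterday)
    if (( RANDOM % 100 < p )); then
      yesterday_wet=1; result+=W
    else
      yesterday_wet=0; result+=D
    fi
  done
}

# Two cells with the same seed, resetting state between them (the fix)
init_run; simulate_year 7 365; cell1=$result
init_run; simulate_year 7 365; cell2=$result
[[ $cell1 == "$cell2" ]] && echo "with reset: cells identical"

# Without the reset, cell 2 starts from cell 1's last day and may differ
# if cell 1 ended on a wet day
init_run; simulate_year 7 365
simulate_year 7 365; cell2_no_reset=$result
echo "last day of cell 1: ${cell1: -1}"
if [[ $cell1 == "$cell2_no_reset" ]]; then echo "without reset: identical"; else echo "without reset: cells differ"; fi
```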

This now works as expected, i.e.:
* if seed != 0 (output is reproduced among runs)
** weather is exactly identical among runs and cells
** weather is different among years, iterations and seeds
* if seed == 0 (output cannot be reproduced among runs)
** weather is different among cells, years, iterations and runs

However, ideally a grid cell should continue from the state in which it ended the previous year (rather than being zeroed out).
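Continuing each cell from its own end-of-year state amounts to storing per-cell state instead of one shared value. A minimal bash sketch, assuming a single piece of state per cell (whether yesterday was wet) kept in an associative array; `save_cell_state` and `load_cell_state` are hypothetical names, and STEPWAT2's `load_cell()` may organize this quite differently.

```shell
#!/usr/bin/env bash
# Hypothetical per-cell persistence of weather-generator state (bash >= 4).
declare -A cell_yesterday_wet   # cell id -> last day's wet flag
yesterday_wet=0                 # state used by the generator while a cell runs

# load_cell_state CELL: restore the cell's state, or start fresh (0) on first use
load_cell_state() { yesterday_wet=${cell_yesterday_wet[$1]:-0}; }

# save_cell_state CELL: remember where this cell ended its year
save_cell_state() { cell_yesterday_wet[$1]=$yesterday_wet; }

# Year 1: cell 0 ends wet, cell 1 ends dry
load_cell_state 0; yesterday_wet=1; save_cell_state 0
load_cell_state 1; yesterday_wet=0; save_cell_state 1

# Year 2: each cell resumes from its own state, not from the previous cell's
load_cell_state 0; echo "cell 0 starts with yesterday_wet=$yesterday_wet"   # 1
load_cell_state 1; echo "cell 1 starts with yesterday_wet=$yesterday_wet"   # 0
```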
dschlaep (Member Author) commented:
This now works as expected, i.e.:

* if seed != 0 (output is reproduced among runs)
** weather is exactly identical among runs and cells
** weather is different among years, iterations and seeds
* if seed == 0 (output cannot be reproduced among runs)
** weather is different among cells, years, iterations and runs

Script to check these expectations (the seed in model.in must be adjusted manually during interactive use):

make clean

#-------------------------------------------------------------------------------
#--- Test nongridded mode ------------------------------------------------------

#-------------------------------------------------------------------------------
#--- * if seed != 0 (output is reproduced among runs) ------
# Set model.in: 3 100 7 (niter, nyrs, seed)
make bint_testing_nongridded
cp -r testing.sagebrush.master/Stepwat_Inputs/Output testing.sagebrush.master/Stepwat_Inputs/Output_i3y100s7_r1
make bint_testing_nongridded


#--- ** weather is exactly identical among runs ------
# Expect no differences
diff -aqr testing.sagebrush.master/Stepwat_Inputs/Output testing.sagebrush.master/Stepwat_Inputs/Output_i3y100s7_r1


#--- ** weather is different among years ------
Rscript -e 'x <- read.csv("testing.sagebrush.master/Stepwat_Inputs/Output/bmassavg.csv")[, c("PPT", "Temp")]; apply(x, 2, sd) > 0'

#--- ** weather is different among iterations ------
Rscript -e 'x <- read.csv("testing.sagebrush.master/Stepwat_Inputs/Output/bmassavg.csv")[, c("StdDev", "StdDev.1")]; apply(x, 2, sd) > 0'


#--- ** weather is different among seeds ------
# Set model.in: 3 100 6 (niter, nyrs, seed) # or any seed other than 0 and 7
make bint_testing_nongridded

# Expect differences
Rscript -e 'x1 <- read.csv("testing.sagebrush.master/Stepwat_Inputs/Output/bmassavg.csv")[, c("PPT", "Temp")]; x0 <- read.csv("testing.sagebrush.master/Stepwat_Inputs/Output_i3y100s7_r1/bmassavg.csv")[, c("PPT", "Temp")]; !isTRUE(all.equal(x1, x0))'




#-------------------------------------------------------------------------------
#--- * if seed == 0 (output cannot be reproduced among runs) ------
# Set model.in: 3 100 0 (niter, nyrs, seed)
make bint_testing_nongridded
cp -r testing.sagebrush.master/Stepwat_Inputs/Output testing.sagebrush.master/Stepwat_Inputs/Output_i3y100s0_r1
make bint_testing_nongridded


#--- ** weather is different among years ------
Rscript -e 'x <- read.csv("testing.sagebrush.master/Stepwat_Inputs/Output/bmassavg.csv")[, c("PPT", "Temp")]; apply(x, 2, sd) > 0'

#--- ** weather is different among iterations ------
Rscript -e 'x <- read.csv("testing.sagebrush.master/Stepwat_Inputs/Output/bmassavg.csv")[, c("StdDev", "StdDev.1")]; apply(x, 2, sd) > 0'


#--- ** weather is different among runs ------
Rscript -e 'x1 <- read.csv("testing.sagebrush.master/Stepwat_Inputs/Output/bmassavg.csv")[, c("PPT", "Temp")]; x0 <- read.csv("testing.sagebrush.master/Stepwat_Inputs/Output_i3y100s0_r1/bmassavg.csv")[, c("PPT", "Temp")]; !isTRUE(all.equal(x1, x0))'




#-------------------------------------------------------------------------------
#--- Test gridded mode ---------------------------------------------------------

#-------------------------------------------------------------------------------
#--- * if seed != 0 (output is reproduced among runs) ------
# Set model.in: 3 100 7 (niter, nyrs, seed)
make bint_testing_gridded
cp -r testing.sagebrush.master/Output testing.sagebrush.master/Output_i3y100s7_r1
make bint_testing_gridded


#--- ** weather is exactly identical among runs and cells ------
# Expect no differences among runs
diff -aqr testing.sagebrush.master/Output testing.sagebrush.master/Output_i3y100s7_r1

# Expect no differences among cells
Rscript -e 'x <- lapply(seq_len(4), function(k) read.csv(paste0("testing.sagebrush.master/Output/g_bmassavg", k - 1, ".csv"))[, c("PPT", "Temp")]); sapply(seq_len(4), function(k) isTRUE(all.equal(x[[k]], x[[1]])))'


#--- ** weather is different among years ------
Rscript -e 'x <- read.csv("testing.sagebrush.master/Output/g_bmass_cell_avg.csv")[, c("PPT", "Temp")]; apply(x, 2, sd) > 0'

#--- ** weather is different among iterations ------
Rscript -e 'x <- lapply(seq_len(4), function(k) read.csv(paste0("testing.sagebrush.master/Output/g_bmassavg", k - 1, ".csv"))[, c("StdDev", "StdDev.1")]); sapply(x, function(xk) apply(xk, 2, sd) > 0)'


#--- ** weather is different among seeds ------
# Set model.in: 3 100 6 (niter, nyrs, seed) # or any seed other than 0 and 7
make bint_testing_gridded

# Expect differences
Rscript -e 'x1 <- read.csv("testing.sagebrush.master/Output/g_bmass_cell_avg.csv")[, c("PPT", "Temp")]; x0 <- read.csv("testing.sagebrush.master/Output_i3y100s7_r1/g_bmass_cell_avg.csv")[, c("PPT", "Temp")]; !isTRUE(all.equal(x1, x0))'



#-------------------------------------------------------------------------------
#--- * if seed == 0 (output cannot be reproduced among runs) ------
# Set model.in: 3 100 0 (niter, nyrs, seed)
make bint_testing_gridded
cp -r testing.sagebrush.master/Output testing.sagebrush.master/Output_i3y100s0_r1
make bint_testing_gridded


#--- ** weather is different among cells ------
Rscript -e 'x <- lapply(seq_len(4), function(k) read.csv(paste0("testing.sagebrush.master/Output/g_bmassavg", k - 1, ".csv"))[, c("PPT", "Temp")]); sapply(seq_len(4)[-1], function(k) !isTRUE(all.equal(x[[k]], x[[1]])))'


#--- ** weather is different among years ------
Rscript -e 'x <- read.csv("testing.sagebrush.master/Output/g_bmass_cell_avg.csv")[, c("PPT", "Temp")]; apply(x, 2, sd) > 0'

#--- ** weather is different among iterations ------
Rscript -e 'x <- lapply(seq_len(4), function(k) read.csv(paste0("testing.sagebrush.master/Output/g_bmassavg", k - 1, ".csv"))[, c("StdDev", "StdDev.1")]); sapply(x, function(xk) apply(xk, 2, sd) > 0)'


#--- ** weather is different among runs ------
Rscript -e 'x1 <- read.csv("testing.sagebrush.master/Output/g_bmass_cell_avg.csv")[, c("PPT", "Temp")]; x0 <- read.csv("testing.sagebrush.master/Output_i3y100s0_r1/g_bmass_cell_avg.csv")[, c("PPT", "Temp")]; !isTRUE(all.equal(x1, x0))'
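The script above requires editing model.in by hand between runs. One way to script that step is sketched below. It assumes, based only on the "Set model.in: 3 100 7 (niter, nyrs, seed)" comments, that model.in stores these three values as the first line whose first three fields are integers; the function name `set_model_params` and that file layout are assumptions to verify against the actual input file.

```shell
#!/usr/bin/env bash
# set_model_params FILE NITER NYRS SEED  (hypothetical helper)
# ASSUMPTION: FILE stores "niter nyrs seed" on the first line whose first three
# fields are integers; any trailing comment on that line is kept.
set_model_params() {
  local file=$1 niter=$2 nyrs=$3 seed=$4
  awk -v a="$niter" -v b="$nyrs" -v c="$seed" '
    !replaced && $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ {
      $1 = a; $2 = b; $3 = c   # awk rebuilds the line with single spaces
      replaced = 1
    }
    { print }
  ' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Usage (path is a placeholder, not the repository's actual location):
# set_model_params path/to/model.in 3 100 7 && make bint_testing_nongridded
```

Only the first matching line is changed, so later lines that happen to start with three integers are left alone.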

dschlaep merged commit 7dce3c7 into master on Sep 19, 2022
dschlaep deleted the reproducible_rngs branch on September 19, 2022 02:33
dschlaep added a commit that referenced this pull request Oct 5, 2022
- SOILWAT2 branch "feature_read_weather" isolated the handling of daily weather data and moved it from within the simulation loop to the overall setup process

- STEPWAT2 now needs to handle daily weather data itself; there are two basic options:
i) follow SOILWAT2's new approach and generate daily weather for all years of a simulation run (for each grid cell and iteration); this would require that each grid cell store and handle a local copy of `SW_Weather`
ii) stick with the previous approach, which generated daily weather for each year

- this commit follows option (ii), i.e., generate daily weather for each year
** new `_sxw_generate_weather()` handles the generation of daily weather for the current year
** `Env_Generate()` now calls `_sxw_generate_weather()` before running SOILWAT2 for the current year
** non-gridded mode needed to set RNGs for each year (so that `markov_rng` gets updated with fresh values for each year)
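Option (ii) can be pictured as a loop in which each simulated year first reseeds the weather RNG and generates that year's daily weather, then runs the model for that year. The bash sketch below mimics only that control flow; `generate_year_weather`, its seeding formula, and the five toy "daily" values are hypothetical stand-ins for `_sxw_generate_weather()`, `markov_rng`, and SOILWAT2. Because the generator is reseeded from (seed, iteration, year) at the start of every year, a year's weather does not depend on how many draws earlier years consumed.

```shell
#!/usr/bin/env bash
# Control-flow sketch of per-year weather generation (hypothetical names).

weather=""   # toy "daily" values generated for the current year

# generate_year_weather SEED ITER YEAR: reseed, then draw this year's weather
generate_year_weather() {
  local seed=$1 iter=$2 year=$3 d
  RANDOM=$(( seed * 1000000 + iter * 1000 + year ))  # assumed per-year seed
  weather=""
  for (( d = 0; d < 5; d++ )); do
    weather+="$(( RANDOM % 10 )) "   # toy daily precipitation amounts
  done
}

seed=7 niter=2 nyrs=3
for (( iter = 1; iter <= niter; iter++ )); do
  for (( year = 1; year <= nyrs; year++ )); do
    generate_year_weather "$seed" "$iter" "$year"   # like _sxw_generate_weather()
    echo "iter $iter year $year: $weather"          # the model run would go here
  done
done
```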

-> this commit satisfies the expectations (checked with a script based on #528 (comment)), i.e.,
    - if seed != 0 (output is reproduced among runs)
    ** weather is exactly identical among runs and cells
    ** weather is different among years, iterations and seeds
    - if seed == 0 (output cannot be reproduced among runs)
    ** weather is different among cells, years, iterations and runs
dschlaep added a commit that referenced this pull request Jun 30, 2023
STEPWAT2 should meet the reproducibility expectations formulated in
PR #528, which was merged into the main branch on Sep 18, 2022 with commit
7dce3c7 "Exactly reproduce random number sequences".

This new bash script automatically executes gridded and non-gridded STEPWAT2 example runs with different seeds and uses 'diff' and 'Rscript' to check the following:

* if seed != 0 (output is reproduced among runs)
** weather is exactly identical among runs and cells
** weather is different among years, iterations and seeds

* if seed == 0 (output cannot be reproduced among runs)
** weather is different among cells, years, iterations and runs

Note that the 'master' and 'Seed_Dispersal' branches use different naming schemes for output files; these need to be adjusted manually via the variables `tag_gridded_biomass_cellk` (line 30) and `fname_gridded_biomass_meancell` (line 35), which are currently set to the 'Seed_Dispersal' scheme.
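The automated script itself is not shown here. As a hedged sketch of how such a script can turn the `diff` and `Rscript` checks into pass/fail results, the helpers below (`expect_same_dirs`, `expect_different_dirs`, `expect_all_true`) are hypothetical and rely only on `diff -aqr` and R's printed logical output. R's `all.equal()` returns a character description rather than `FALSE` when objects differ, so a text-based check like `expect_all_true` is only reliable if each expression wraps `all.equal()` in `isTRUE()`.

```shell
#!/usr/bin/env bash
# Hypothetical pass/fail helpers for reproducibility checks.
nfail=0

# expect_same_dirs A B: pass if diff reports no differences
expect_same_dirs() {
  if diff -aqr "$1" "$2" > /dev/null; then
    echo "PASS: $1 and $2 are identical"
  else
    echo "FAIL: $1 and $2 differ"; nfail=$(( nfail + 1 ))
  fi
}

# expect_different_dirs A B: pass if diff reports any difference
expect_different_dirs() {
  if diff -aqr "$1" "$2" > /dev/null; then
    echo "FAIL: $1 and $2 are identical"; nfail=$(( nfail + 1 ))
  else
    echo "PASS: $1 and $2 differ"
  fi
}

# expect_all_true LABEL: read R's printed output on stdin; pass if it
# contains TRUE and contains neither FALSE nor a standalone NA
expect_all_true() {
  local out bad_re='FALSE|(^|[[:space:]])NA([[:space:]]|$)'
  out=$(cat)
  if [[ $out == *TRUE* && ! $out =~ $bad_re ]]; then
    echo "PASS: $1"
  else
    echo "FAIL: $1 ($out)"; nfail=$(( nfail + 1 ))
  fi
}

# Usage with commands from the #528 script (process substitution keeps
# nfail in the current shell, unlike a pipe):
# expect_same_dirs testing.sagebrush.master/Output testing.sagebrush.master/Output_i3y100s7_r1
# expect_all_true "weather differs among years" < <(Rscript -e '...')
# (( nfail == 0 )) || exit 1
```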