Suffix assignment to sGLFluxPartition output is not complete and leads to overwriting #17

lsigut · 2018-08-16T13:26:12Z

Hi,
I am trying to do a workaround for the behaviour that is not allowing me to store all the parameters of sGLFluxPartition when run over multiple suffixes. If that is not too problematic due to the internal structure, it would be better if it can be corrected in the first place. If I use suffixes c('uStar', 'U05', 'U50', 'U95', 'UNone'), I will get some of the columns with suffixes after sExportResults, while some not. Some of them are identical for different runs, some of them not (PotRad, FP_Temp should not get suffixes as they are identical, also in sMRFluxPartition). Those variables without suffix will be overwritten, I will get only the results for 'UNone' and Warning message is shown:

Warning message:
In EC$sGLFluxPartition(TempVar.s = paste(FP_temp, "_f", sep = ""),  :
  replacing existing output columnsFP_VARnight, FP_VARday, NEW_FP_Temp, NEW_FP_VPD, FP_RRef_Night, FP_qc, FP_dRecPar, FP_errorcode, FP_GPP2000, FP_OPT_VPD, FP_OPT_NoVPD, FP_k, FP_beta, FP_alpha, FP_RRef, FP_E0, FP_k_sd, FP_beta_sd, FP_alpha_sd, FP_RRef_sd, FP_E0_sd, Reco_DT_uStar, GPP_DT_uStar, Reco_DT_uStar_SD, GPP_DT_uStar_SD

When I go through the suffix loop, resave the object and run sGLFluxPartition for 'uStar' suffix again I can check that content of these columns is not identical and thus should have respective suffix:

 [1] "Reco_DT_uStar_SD" "GPP_DT_uStar_SD"  "FP_VARnight"      "FP_VARday"        "FP_RRef_Night"   
 [6] "FP_qc"            "FP_dRecPar"       "FP_errorcode"     "FP_GPP2000"       "FP_k"            
[11] "FP_beta"          "FP_alpha"         "FP_RRef"          "FP_E0"            "FP_k_sd"         
[16] "FP_beta_sd"       "FP_alpha_sd"      "FP_RRef_sd"       "FP_E0_sd"

Please ignore "Reco_DT_uStar_SD", "GPP_DT_uStar_SD". These are probably result of some random number generation during SD reestimation for the second time. Also the warning message is complaining about rewriting of variables with 'uStar' suffix which is correct behaviour. You can notice that warning message is also mentioning collumns that are not finally exported.
I would suggest to include suffix support for the above vector of variable names and relax the expectations for the unexported ones. I assume it will not affect the correct working of the method. Maybe you could also include suffixes in the internal collumns and get rid of warning() whatsoever - replacing original content with almost identical one should not cause issues.
Thanks!

The text was updated successfully, but these errors were encountered:

bgctw · 2018-08-17T07:20:03Z

Originally, REddyProc has been designed for a defined workflow.
It is not recommended to run gapfilling or partitioning more than once with the same class, instead, instantiate a new object by sEddyProc$new from the same data. Currently, the code
adds new columns to the results data.frame. This may cause conflicts when columns with the same name are existing. At some instances, the code checks for existing columns, deletes those, and issues the mentioned warning to notify the user that this is probably not what the user wanted.
A redesign would require quite some work and needs to be thoroughly tested.

Hence, for repeated processing, please, instantiate a new object before, or use the functions that are not part of sEddyProc class. In the second case you need to take care yourself for storing the outputs. In your case you would use function partitionNEEGL.

Regarding the additional output columns:
The many GPP_suffix and Reco_suffix columns, and the many columns of parameters and intermediate results are already confusing to some users. Hence, we decided to restrict the suffix to the final results.
If you need quantiles of the intermediate output, you need to take care for the output yourself for now between runs with different suffixes.

If you make a strong case for a common use case, that requires the distribution of some specific intermediate results, we could negotiate to add suffixes for some of additional output columns.

lsigut · 2018-08-20T07:27:55Z

Thank you for explanations! For me the most important are the parameters of the light response curve as they can be used for further analysis. Though I do not need the quantiles of the parameters, I want to make sure that the one I want to use are not overwritten. I see that it could be difficult to implement. I add my workaround:

# Save colnames to a list after sMRFluxPartition method
colnames <- list()
colnames$FP_MR05 <- colnames(EddyProc.C$sExportResults())

# Lasslop et al. (2010)
suffixes <- c('uStar', 'U05', 'U50', 'U95', 'UNone')
FP_temp <- 'Tsoil'
saveRDS(EddyProc.C, file = "EddyProc.C.rds")
FP_GL10_out <- EddyProc.C$sExportResults()[, 0]
for (i in suffixes) {
  rm(EddyProc.C)
  EddyProc.C <- readRDS("EddyProc.C.rds")
  EddyProc.C$sGLFluxPartition(
    TempVar.s = paste(FP_temp, "_f", sep = ""), 
    QFTempVar.s = paste(FP_temp, "_fqc", sep = ""),
    Suffix.s = i)
  if (i == 'uStar') saveRDS(EddyProc.C, file = "EddyProc.C_GL10_uStar.rds")
  out <- EddyProc.C$sExportResults()
  cols <- colnames(out)
  cols_out <- cols[!cols %in% colnames$FP_MR05]
  out <- out[cols_out]
  corr_filter <- !cols_out %in% grep(paste(suffixes, collapse = "|"), 
                                     cols_out, value = TRUE)
  names(out)[corr_filter] <- paste(cols_out[corr_filter], i, sep = '_')
  FP_GL10_out <- cbind(FP_GL10_out, out)
}
rm(EddyProc.C)
EddyProc.C <- readRDS("EddyProc.C_GL10_uStar.rds")
colnames$FP_GL10 <- colnames(FP_GL10_out)

The output that I want is then kept in FP_GL10_out. I also did not notice before that the active binding in EddyProc.C leads to "cloning" when the object is resaved under different name. The results of computations done on the new object will be also transferred to the original one. Thus I better save the object as a file and reload it after each iteration. I hope it could also help to others.

bgctw · 2019-05-10T09:16:39Z

Hello @lsigut.
Have a look at pull request #35.
I added argument uStarScenKeep to sEddyProc_sApplyUStarScen and the bootstrapped partitioning methods.
By default results for the first uStar threshold scenario (usually the non-bootstrapped) are kept.
Please, comment on if this resolves your needs and on the interface (argument naming ...).

lsigut · 2019-05-10T10:56:01Z

Hello @progtw,
thanks for having a look into it! I was just expecting that the solution would be adequate to nighttime-based partitioning where all scenario-specific variables are renamed using the suffix system. I was expecting that this would be the only needed step - to evaluate which columns are scenario specific, attach suffix and let those that are the same for all scenarios to rewrite without warning. All such outputs would be then possible to get when extracting results. But of course I do not know the internal design of sGLFluxPartition. I am not sure if it is not too complicated to use it now. The test_that example is not well readable for me. Some example in vignette could help!

bgctw · 2019-05-10T21:28:17Z

The complications for this proposal for the design are one issue, but the main I repeat:

The many GPP_suffix and Reco_suffix columns, and the many columns of parameters and intermediate results are already confusing to some users. Hence, we decided to restrict the suffix to the final results.

In hindsight, it might have been better also for the night-time partitioning to report only the gross fluxes with suffixes and the others outputs only for one scenario, but I do not want to change the interface now.

In your workaround you demonstrate how to take care of this using the Suffix argument for other ouptuts in your user-loop:

for (suffix in scenarios) {
  EProc$sGLFluxPartition(..., Suffix = suffix)
  myStorageOfCols[,suffix] <- EProc$sExportResults()[[interestingColName]]
}

But for your purpose, as I understood, you need only the output of a single specific uStar threshold scenario and this is now supported with the suggested PR directly.

In the PR (pull request) #35 I extended the example in the vignette uStarCases.Rmd by daytime partitioning and added some explanation.

lsigut · 2019-05-13T07:56:04Z

Great, thanks for the explanations, also for the vignette!

bgctw mentioned this issue May 10, 2019

select uStar Scenrio for which to keep parameters #35

Merged

bgctw closed this as completed in #35 May 23, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Suffix assignment to sGLFluxPartition output is not complete and leads to overwriting #17

Suffix assignment to sGLFluxPartition output is not complete and leads to overwriting #17

lsigut commented Aug 16, 2018

bgctw commented Aug 17, 2018

lsigut commented Aug 20, 2018 •

edited

Loading

bgctw commented May 10, 2019

lsigut commented May 10, 2019

bgctw commented May 10, 2019

lsigut commented May 13, 2019

Suffix assignment to sGLFluxPartition output is not complete and leads to overwriting #17

Suffix assignment to sGLFluxPartition output is not complete and leads to overwriting #17

Comments

lsigut commented Aug 16, 2018

bgctw commented Aug 17, 2018

lsigut commented Aug 20, 2018 • edited Loading

bgctw commented May 10, 2019

lsigut commented May 10, 2019

bgctw commented May 10, 2019

lsigut commented May 13, 2019

lsigut commented Aug 20, 2018 •

edited

Loading