Join GitHub today
GitHub is home to over 50 million developers working together to host and review code, manage projects, and build software together.
Sign upInconsistency with cv_blup function #3
Comments
|
Hi, thank you for your report. You can use the package reprex to report the results here |
|
@billh2006 I closed accidentally this issue. I added a commit that improves the check of data before computing the cross-validation. Please, install the development version and check with your data. |
|
@TiagoOlivoto- Thanks for your quick response. Yes-my data set is unbalanced. Certain genotypes are not represented in all environments and 8 (thank you for the new error message) of the genotype-environment interactions are missing data for one or more of the reps. Is a complete, balanced dataset a necessary condition for cv_blup and downstream functions (e.g. waasb)? |
|
@billh2006 - You can have unbalanced data but it is necessary that all genotype-environment combinations have the same number of replicates (complete dataset). If you assume to have three replicates, for each cross-validation procedure the function will select randomly two replicates for each genotype-environment combination to serve as training set. The other replicate will serve as validation set. If for some GEI combinations you have only two reps, the model cannot be computed because there will be no validation data for such a combination. Please, note a toy example bellow with the example data library(metan)
#> Registered S3 method overwritten by 'GGally':
#> method from
#> +.gg ggplot2
#> []=====================================================[]
#> [] Multi-Environment Trial Analysis (metan) v1.4.0.9000[]
#> [] Author: Tiago Olivoto []
#> [] Type citation('metan') to know how to cite metan []
#> [] Type vignette('metan_start') for a short tutorial []
#> [] Visit http://bit.ly/2TIq6JE for a complete tutorial []
#> []=====================================================[]
df <- data_ge
df[1, 4] <- NA
df
#> # A tibble: 420 x 5
#> ENV GEN REP GY HM
#> <fct> <fct> <fct> <dbl> <dbl>
#> 1 E1 G1 1 NA 44.9
#> 2 E1 G1 2 2.50 46.9
#> 3 E1 G1 3 2.43 47.8
#> 4 E1 G2 1 3.21 45.2
#> 5 E1 G2 2 2.93 45.3
#> 6 E1 G2 3 2.56 45.5
#> 7 E1 G3 1 2.77 46.7
#> 8 E1 G3 2 3.62 43.2
#> 9 E1 G3 3 2.28 47.8
#> 10 E1 G4 1 2.36 47.9
#> # ... with 410 more rows
cv_blup(df, env = ENV, gen = GEN, rep = REP, resp = GY, nboot = 10)
#> ENV GEN n
#> 1 E1 G1 2
#> Error: Combinations of genotype and environment with different number of replication than observed in the trial (3)
# An unbalanced data
df2 <- data_ge
df2[1:3, 4] <- NA
df2
#> # A tibble: 420 x 5
#> ENV GEN REP GY HM
#> <fct> <fct> <fct> <dbl> <dbl>
#> 1 E1 G1 1 NA 44.9
#> 2 E1 G1 2 NA 46.9
#> 3 E1 G1 3 NA 47.8
#> 4 E1 G2 1 3.21 45.2
#> 5 E1 G2 2 2.93 45.3
#> 6 E1 G2 3 2.56 45.5
#> 7 E1 G3 1 2.77 46.7
#> 8 E1 G3 2 3.62 43.2
#> 9 E1 G3 3 2.28 47.8
#> 10 E1 G4 1 2.36 47.9
#> # ... with 410 more rows
cv_blup(df2, env = ENV, gen = GEN, rep = REP, resp = GY, nboot = 10)
#> $RMSPD
#> MODEL RMSPD
#> 1 BLUP_g_RCBD 0.3552628
#> 2 BLUP_g_RCBD 0.3763954
#> 3 BLUP_g_RCBD 0.4266109
#> 4 BLUP_g_RCBD 0.4532448
#> 5 BLUP_g_RCBD 0.4043090
#> 6 BLUP_g_RCBD 0.4221250
#> 7 BLUP_g_RCBD 0.4243280
#> 8 BLUP_g_RCBD 0.3874620
#> 9 BLUP_g_RCBD 0.3697422
#> 10 BLUP_g_RCBD 0.3845615
#>
#> $RMSPDmean
#> # A tibble: 1 x 6
#> MODEL mean sd se Q2.5 Q97.5
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 BLUP_g_RCBD 0.400 0.0308 0.00972 0.359 0.447
#>
#> attr(,"class")
#> [1] "cvalidation"Created on 2020-04-07 by the reprex package (v0.3.0) You can use both unbalanced and incomplete data with library(metan)
#> Registered S3 method overwritten by 'GGally':
#> method from
#> +.gg ggplot2
#> []=====================================================[]
#> [] Multi-Environment Trial Analysis (metan) v1.4.0.9000[]
#> [] Author: Tiago Olivoto []
#> [] Type citation('metan') to know how to cite metan []
#> [] Type vignette('metan_start') for a short tutorial []
#> [] Visit http://bit.ly/2TIq6JE for a complete tutorial []
#> []=====================================================[]
df <- data_ge
df[1, 4] <- NA
df
#> # A tibble: 420 x 5
#> ENV GEN REP GY HM
#> <fct> <fct> <fct> <dbl> <dbl>
#> 1 E1 G1 1 NA 44.9
#> 2 E1 G1 2 2.50 46.9
#> 3 E1 G1 3 2.43 47.8
#> 4 E1 G2 1 3.21 45.2
#> 5 E1 G2 2 2.93 45.3
#> 6 E1 G2 3 2.56 45.5
#> 7 E1 G3 1 2.77 46.7
#> 8 E1 G3 2 3.62 43.2
#> 9 E1 G3 3 2.28 47.8
#> 10 E1 G4 1 2.36 47.9
#> # ... with 410 more rows
mod <- waasb(df, ENV, GEN, REP, GY)
#> Warning: Row(s) 1 with NA values deleted.
#> Method: REML/BLUP
#> Random effects: GEN, GEN:ENV
#> Fixed effects: ENV, REP(ENV)
#> Denominador DF: Satterthwaite's method
#> ---------------------------------------------------------------------------
#> P-values for Likelihood Ratio Test of the analyzed traits
#> ---------------------------------------------------------------------------
#> model GY
#> COMPLETE NA
#> GEN 1.19e-05
#> GEN:ENV 2.02e-11
#> ---------------------------------------------------------------------------
#> All variables with significant (p < 0.05) genotype-vs-environment interaction
gmd(mod)
#> Class of the model: waasb
#> Variable extracted: genpar
#> # A tibble: 9 x 2
#> Parameters GY
#> <chr> <dbl>
#> 1 Phenotypic variance 0.181
#> 2 Heritability 0.154
#> 3 GEIr2 0.313
#> 4 Heribatility of means 0.814
#> 5 Accuracy 0.902
#> 6 rge 0.370
#> 7 CVg 6.24
#> 8 CVr 11.6
#> 9 CV ratio 0.537Created on 2020-04-07 by the reprex package (v0.3.0) |
|
Thank you! |
Hi,
I am testing a sample dataset that has the same format as the oat dataset as demonstrated. This was a RCBD design with 3 replicates
str(dataset)
'data.frame': 791 obs. of 4 variables:
$ ENV : Factor w/ 13 levels "2017_P","2017_S",..: 10 10 10 10 10 10 10 10 10 10 ...
$ GEN : Factor w/ 30 levels "8xp","XR_A0",..: 23 24 30 2 10 26 11 3 12 27 ...
$ BLOCK: int 1 1 1 1 1 1 1 1 1 1 ...
$ GY : num 233 225 248 259 255 ...
When I use cv_blup, I get the following error message:
BLUP_model = cv_blup(dataset,
resp = GY,
gen = GEN,
env = ENV,
rep = BLOCK,
nboot = 200,
random="gen")
Error: Column
errormust be length 262 (the number of rows) or one, not 264In addition: Warning message:
In pred - testing$Y :
longer object length is not a multiple of shorter object length
When I rerun it the command multiple times without changing anything else, the first line of the error message changes each time. For example:
Error: Column
errormust be length 261 (the number of rows) or one, not 265Error: Column
errormust be length 262 (the number of rows) or one, not 266Error: Column
errormust be length 262 (the number of rows) or one, not 263Error: Column
errormust be length 262 (the number of rows) or one, not 266Error: Column
errormust be length 262 (the number of rows) or one, not 265Error: Column
errormust be length 261 (the number of rows) or one, not 267Error: Column
errormust be length 262 (the number of rows) or one, not 265Do you know what is happening here?
Thanks!