/
ParameterEstimation_HTP.Rmd
290 lines (230 loc) · 13.7 KB
/
ParameterEstimation_HTP.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
---
title: "statgenHTP tutorial: 6. Estimation of parameters from time courses"
author: "Emilie Millet, Bart-Jan van Rossum, Martin Boer, Fred van Eeuwijk, Diana M. Pérez-Valencia, María Xosé Rodríguez-Álvarez"
date: "`r Sys.Date()`"
output:
rmarkdown::html_vignette:
number_sections: yes
toc: no
bibliography: bibliography.bib
link-citations: yes
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
options(rmarkdown.html_vignette.check_title = FALSE)
library(statgenHTP)
library(ggplot2)
```
# Introduction
This document presents the final step of the HTP data analysis: extracting interesting parameters from the modeled time courses [@Brien2020]. For example, in the second data set from the PhenoArch platform, the maximum leaf area (from the P-splines) or the maximum leaf growth rate (from the first derivatives) are relevant parameters (see figure below). We could assess their variability in the genotypes and the difference between treatments, *e.g.* has the water scenario decreased the maximum leaf area?
```{r PSplineDerivArchFig, out.width='80%', fig.pos='c', echo=FALSE}
knitr::include_graphics("figures/psplinederiv_Arch.png", auto_pdf = TRUE)
```
It is also possible to specify a period for the parameter estimation. For example, in the first data set from the Phenovator platform, we can select the period with the high light intensity (see figure below) and estimate the maximum slope during that period (from the first derivatives). This could be interpreted as a recovery rate of the photosystem II efficiency.
```{r PSplineDerivVatorFig, out.width='80%', fig.pos='c', echo=FALSE}
knitr::include_graphics("figures/psplinederiv_Vator.png", auto_pdf = TRUE)
```
These parameters could then be further analyzed, for example in a GxE analysis (see [*statgenGxE*](https://biometris.github.io/statgenGxE/index.html)), or a genetic analysis (see [*statgenGWAS*](https://biometris.github.io/statgenGWAS/index.html)).
The functions described in this tutorial can be applied to corrected data, genotypic means (BLUEs or BLUPS) (see [**statgenHTP tutorial: 3. Correction for spatial trends**](SpatialModel_HTP.html)), curves obtained from the P-splines hierarchical data model (see [**statgenHTP tutorial: 5. Modelling the temporal evolution of the genetic signal**](HierarchicalDataModel_HTP.html)), or on raw data. It allows estimating maximum, minimum, mean, area under the curve (auc) or percentile using predicted values, first or second derivative, during a given period or for the whole time course.
------------------------------------------------------------------------
# Estimation of parameters from curves
## Example 1
We will use the `fit.splineNumOut` previously created (see [**statgenHTP tutorial: 4. Outlier detection for series of observations**](OutlierSerieObs_HTP.html)). It contains the P-spline prediction on a subset of plants without the time course outliers. We will estimate the area under the curve of the trait:
```{r fitSplineVatorNum, echo=FALSE, message=FALSE, warning=FALSE}
subGenoVator <- c("G070", "G160", "G151", "G179", "G175", "G004", "G055")
fit.splineNum <- fitSpline(inDat = spatCorrectedVator,
trait = "EffpsII_corr",
genotypes = subGenoVator,
knots = 50,
useTimeNumber = TRUE,
timeNumber = "timeNumHour")
predDat <- fit.splineNum$predDat
coefDat <- fit.splineNum$coefDat
outVator <- detectSerieOut(corrDat = spatCorrectedVator,
predDat = predDat,
coefDat = coefDat,
trait = "EffpsII_corr",
genotypes = subGenoVator,
thrCor = 0.9,
thrPca = 30)
fit.splineNumOut <- removeSerieOut(fitSpline = fit.splineNum,
serieOut = outVator)
```
```{r paramVator, fig.height=2, fig.width=4, message=FALSE, warning=FALSE}
subGenoVator <- c("G070", "G160", "G151", "G179", "G175", "G004", "G055")
paramVator1 <-
estimateSplineParameters(x = fit.splineNumOut,
estimate = "predictions",
what = "AUC",
timeMin = 330,
timeMax = 432,
genotypes = subGenoVator)
plot(paramVator1, plotType = "box")
```
For this subset of genotypes, there is a variability in AUC of the psII efficiency. This could be used in genetic analysis and maybe to perform a GWAS.
Another example is using the derivative during the recovery period (at the end of the time course, so after the light change) to get the maximum slope, or the maximum rate of the psII per time unit during this period.
```{r paramVator2, fig.height=2, fig.width=4, message=FALSE, warning=FALSE}
paramVator2 <-
estimateSplineParameters(x = fit.splineNumOut,
estimate = "derivatives",
what = "max",
timeMin = 210,
timeMax = 312,
genotypes = subGenoVator)
plot(paramVator2, plotType = "box")
```
> Note: when "min" or "max" is selected, the output also contains the parameter occurence time point, as numerical `timeNumber` and date `timePoint`. See in the table below:
```{r headParam2, echo=FALSE, message=FALSE}
knitr::kable(head(paramVator2), align=c('c','c'), booktabs = TRUE)
```
## Example 2
For this example, we will use the genotypic prediction (BLUPs, see [**statgenHTP tutorial: 3. Correction for spatial trends**](SpatialModel_HTP.html)) available in the `spatPredArch` data set. We will fit P-splines at the genotypic level using the `geno.decomp` levels defined in the spatial model.
```{r fitSplineArch, fig.height=3, fig.width=6, message=FALSE, warning=FALSE}
data(spatPredArch)
fit.splineGenoArch <- fitSpline(inDat = spatPredArch,
trait = "predicted.values",
knots = 15,
minNoTP = 18)
plot(fit.splineGenoArch,
genotypes = "GenoA36")
```
We can estimate the maximum value of the leaf area from the predicted P-splines:
```{r paramArch1, message=FALSE, warning=FALSE}
paramArch1 <-
estimateSplineParameters(x = fit.splineGenoArch,
estimate = "predictions",
what = "max")
```
```{r paramArch1fig, fig.height=2, fig.width=6, message=FALSE, warning=FALSE}
plot(paramArch1, plotType = "hist")
```
## Example 3
For this example, we will use curves obtained from the P-splines hierarchical data model (see [**statgenHTP tutorial: 5. Modelling the temporal evolution of the genetic signal**](HierarchicalDataModel_HTP.html)). We can use `psHDM` objects (that is, objects obtained from `fitSplineHDM()` (fitted curves) or `predict.psHDM()` (predicted curves)). For this example, we will use predicted curves.
```{r fitPredPheno, message=FALSE, warning=FALSE}
## The data from the Phenovator platform have been corrected for spatial
## trends and outliers for single observations have been removed.
## We need to specify the genotype-by-treatment interaction.
## Treatment: water regime (WW, WD).
spatCorrectedArch[["treat"]] <- substr(spatCorrectedArch[["geno.decomp"]],
start = 1, stop = 2)
spatCorrectedArch[["genoTreat"]] <-
interaction(spatCorrectedArch[["genotype"]],
spatCorrectedArch[["treat"]], sep = "_")
## Fit P-Splines Hierarchical Curve Data Model for all genotypes.
fit.psHDM <- fitSplineHDM(inDat = spatCorrectedArch,
trait = "LeafArea_corr",
pop = "geno.decomp",
genotype = "genoTreat",
plotId = "plotId",
difVar = list(geno = FALSE, plot = FALSE),
smoothPop = list(nseg = 5, bdeg = 3, pord = 2),
smoothGeno = list(nseg = 5, bdeg = 3, pord = 2),
smoothPlot = list(nseg = 5, bdeg = 3, pord = 2),
weights = "wt",
trace = FALSE,
useTimeNumber = FALSE)
## Predict the P-Splines Hierarchical Curve Data Model on a dense grid
## Only predictions (and standard errors) are obtained
## at the population and genotype levels
pred.psHDM <- predict(object = fit.psHDM,
newtimes = seq(min(fit.psHDM$time[["timeNumber"]]),
max(fit.psHDM$time[["timeNumber"]]),
length.out = 100),
pred = list(pop = TRUE, geno = TRUE, plot = TRUE),
se = list(pop = TRUE, geno = TRUE, plot = FALSE),
trace = FALSE)
```
Although we have available information at population, genotype and plot levels, this function only extracts information at genotype and plot levels. Nevertheless, we are generally interested in the genotype level. For example, in the paper by @Perez2022, they extracted three features:
* The maximum spatially corrected leaf area (from estimated genotype-specific trajectories)
```{r paramPhenoGenoPred1, fig.height=3, fig.width=4, message=FALSE, warning=FALSE}
## From estimated genotype-specific trajectories
plot(pred.psHDM, plotType = "popGenoTra", themeSizeHDM = 4)
```
```{r paramPhenoGenoPred2, fig.height=3, fig.width=4, message=FALSE, warning=FALSE}
## Estimate maximum spatially corrected leaf area.
paramArch1 <- estimateSplineParameters(x = pred.psHDM,
what = "max",
fitLevel = "geno",
estimate = "predictions")
## Create a boxplot of the estimates.
plot(paramArch1, plotType = "box")
```
* The maximum speed rate (from the first derivative of the estimated genotype-specific trajectories)
```{r paramPhenoGenoDeriv1, fig.height=3, message=FALSE, warning=FALSE}
## From the first derivative of the estimated genotype-specific trajectories
plot(pred.psHDM, plotType = "popGenoDeriv", themeSizeHDM = 4)
```
```{r paramPhenoGenoDeriv2, fig.height=3, fig.width=4, message=FALSE, warning=FALSE}
## Estimate maximum speed rate
## We are interested on a local maximum (before timeNumber 2500)
paramArch2 <- estimateSplineParameters(x = pred.psHDM,
what = "max",
fitLevel = "geno",
estimate = "derivatives",
timeMax = 2500)
## Create a boxplot of the estimates.
plot(paramArch2, plotType = "box")
```
* The area under the estimated genotype-specific deviations, as follows
```{r paramPhenoGenoDev1, fig.height=3, message=FALSE, warning=FALSE}
## From the estimated genotype-specific deviations
plot(pred.psHDM, plotType = "genoDev", themeSizeHDM = 4)
```
```{r paramPhenoGenoDev2, fig.height=3, fig.width=4, message=FALSE, warning=FALSE}
## Estimate area under the curve (AUC).
paramArch3 <- estimateSplineParameters(x = pred.psHDM,
what = "AUC",
fitLevel = "genoDev",
estimate = "predictions")
## Create a boxplot of the estimates.
plot(paramArch3, plotType = "box")
```
## Example 4
For this example, we will use the raw data from the RootPhAir, corrected for individually outlying observations (see [**statgenHTP tutorial: 2. Outlier detection for single observations**](OutlierSingleObs_HTP.html)) and time course outliers (see [**statgenHTP tutorial: 4. Outlier detection Time course**](OutlierSerieObs_HTP.html)). We will fit P-splines at the plant level on a subset of genotypes.
```{r rmOutRoot, echo=FALSE, message=FALSE, warning=FALSE}
subGenoRoot <- c( "2", "6", "8", "9", "10", "520", "522")
fit.splineRoot <- fitSpline(inDat = noCorrectedRoot,
trait = "tipPos_y",
knots = 10,
genotypes = subGenoRoot,
minNoTP = 0,
useTimeNumber = TRUE,
timeNumber = "thermalTime")
predDatRoot <- fit.splineRoot$predDat
coefDatRoot <- fit.splineRoot$coefDat
outRoot <- detectSerieOut(corrDat = noCorrectedRoot,
predDat = predDatRoot,
coefDat = coefDatRoot,
trait = "tipPos_y",
genotypes = subGenoRoot,
thrCor = 0.9,
thrPca = 25)
outRootFinal <- outRoot[outRoot$plotId %in% c("A_03_5","A_29_2"),]
noCorrectedRootOut <- removeSerieOut(dat = noCorrectedRoot,
serieOut = outRootFinal)
noCorrectedRootOut <- noCorrectedRootOut[!is.na(noCorrectedRootOut$tipPos_y),]
noCorrectedRootOut <- droplevels(noCorrectedRootOut)
```
```{r fitSplineRoot, message=FALSE, warning=FALSE}
subGenoRoot <- c( "2","6","8","9","10","520","522")
fit.splineRootOut <- fitSpline(inDat = noCorrectedRootOut,
trait = "tipPos_y",
knots = 10,
genotypes = subGenoRoot,
minNoTP = 0,
useTimeNumber = TRUE,
timeNumber = "thermalTime")
```
We will then estimate the mean growth rate using `what = "mean"` for the subset of genotypes:
```{r paramRoot1, fig.height=2, fig.width=4, message=FALSE, warning=FALSE}
paramRoot1 <-
estimateSplineParameters(x = fit.splineRootOut,
estimate = "derivatives",
what = "mean")
plot(paramRoot1, plotType = "box")
```
------------------------------------------------------------------------
## References