-
Notifications
You must be signed in to change notification settings - Fork 6
/
trainTP.R
298 lines (260 loc) · 12.4 KB
/
trainTP.R
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
#' Train the emulator on a set of input ESM temperature and precipitation data.
#'
#' This function runs all of the steps in the analysis that depend only on the
#' input data. The results are saved in a structure of class \code{fldgen}.
#' This structure contains everything the emulator has learned about the model.
#' Therefore, it can be saved and reloaded in future sessions to bypass the
#' analysis step.
#'
#' The steps run by this procedure are \code{\link{read.general}},
#' \code{\link{pscl_analyze}}, and \code{\link{eof_analyze}}. Additionally,
#' the global mean temperature, global mean precipitation, and the FFT of the
#' EOF decomposition are calculated and stored. For backward compatibility,
#' all of those functions remain part of the public interface.
#'
#' At present the field generation functions still take as arguments the
#' individual components of the structure; however, in the next version they
#' will expect a \code{fldgen} object, which can be constructed manually using
#' \code{\link{fldgen_object}}.
#'
#' The names of the input files are recorded in the \code{infiles} field of the
#' returned object. This is for the your information only; none of the code
#' here does anything with the recorded filenames. You have the choice of
#' recording either absolute filenames (useful for determining exactly which
#' dataset was used, if you have a lot of similarly named files) or relative
#' filenames (useful if you might distribute a saved model to other users who
#' might not have the same directory structure as you).
#'
#' @section Supplying global mean temperatures:
#'
#' In some cases it may be desirable to supply global mean temperatures, rather
#' than calculating them from the grid. For example, the grid may cover only
#' part of the grid, so that the average of the values would not be a true
#' global average. In such cases the true global mean temperatures can be
#' supplied in a separate file.
#'
#' If this input file is used, it should be a plain text file with one
#' temperature value per line, each line corresponding to one time slice in the
#' corresponding netCDF file containing the grid data. The name of the file
#' will be constructed by adding a suffix (specified in the
#' \code{globalAvg_file} argument) to the name of the netCDF file. For example,
#' if the netCDF file is called \code{larry.nc}, and the suffix is
#' \code{curly.txt}, the name of the global mean temperature file will be
#' \code{larry.nccurly.txt}. If multiple netCDF files are passed in, each one
#' will have its own file of global mean temperatures, constructed in the same
#' way.
#'
#' @param dat A single directory name, or a list of netCDF files. If a
#' directory name is given, all netCDF files in the directory will be used.The
#' pairing of temperature and precipitation netCDF files in the directory
#' relies on the CMIP5 file naming conventions. Other naming conventions are
#' not currently supported.
#' @param tvarname Name of the temperature variable in the temperature netCDF
#' @param tlatvar Name of the latitude coordinate variable in the temperature
#' netCDF files.
#' Normally this is \code{'lat'}, but occasionally it will be something
#' different, such as \code{'lat_2'}. Can vary between paired T and P.
#' @param tlonvar Name of the longitude coordinate variable in the temperature
#' netCDF files. Can vary between paired T and P.
#' @param tvarconvert_fcn The function used to transform the T variable prior
#' to training so that it has support on -infinity to infinity. Defaults to
#' NULL, as Temperature is effectively already supported on this range.
#' @param pvarname Name of the precipitation variable in the precipitation
#' netCDF.
#' @param platvar Name of the latitude coordinate variable in the precipitation
#' netCDF files.
#' Normally this is \code{'lat'}, but occasionally it will be something
#' different, such as \code{'lat_2'}. Can vary between paired T and P.
#' @param plonvar Name of the longitude coordinate variable in precipitation
#' netCDF files. Can vary between paired T and P.
#' @param pvarconvert_fcn The function used to transform the P variable prior
#' to analyis so that it has support on -infinity to infinity. Defaults to
#' log() as precipitation values cannot be less than 0.
#' @param meanfield Function to compute the mean temperature response field.
#' The default is a linear pattern scaling algorithm.
#' @param globalAvg_file Optional string to use in constructing the name of the
#' file containing global mean temperatures, Tgav. If omitted, calculate Tgav
#' as the mean of the grid cells in each frame. (See "Supplying global mean
#' temperatures" for more information.)
#' @param record_absolute If \code{TRUE}, record absolute paths for the input
#' files; otherwise, record relative paths.
#' @return A \code{fldgen} object.
#' @export
trainTP <- function(dat,
tvarname = "tas", tlatvar='lat', tlonvar='lon',
tvarconvert_fcn = NULL,
pvarname = "pr", platvar='lat', plonvar='lon',
pvarconvert_fcn = logPfloor,
meanfield=pscl_analyze,
globalAvg_file = NULL,
record_absolute=FALSE)
{
## silence package checks
value <- var <- timestep <- time <- startyr <- stopyr <-
tfilename <- pfilename <- NULL
## Make sure working with list of file names
if(length(dat) == 1 && file.info(dat)$isdir) {
## This is a directory. Replace with the list of netCDF files contained
## within.
dat <- list.files(dat, '\\.nc$', full.names=TRUE)
}
## pair the T and P files
if(!is.null(dim(dat))){
if(ncol(dat) == 2){
## This is a paired list of input files.
## It should follow the formalism
## column1 = temperature,
## column2 = precipitation file names.
## Entry should each be a netcdf name.
paireddat <- tibble::tibble(tfilename = dat[,1],
pfilename = dat[,2])
}
else{
stop("Unrecognized input structure for training data.")
}
}
else{
## Otherwise files should be name according to the CMIP5 convention so
## that they may be paired.
paireddat <- file.pairer(dat, tvarname = tvarname, pvarname = pvarname)
}
## Prepare the list of input files that contribute to the training for output
infiles <- as.vector(as.matrix(paireddat))
if(record_absolute) {
infiles <- normalizePath(infiles)
}
### Train
# read the relevant Temperature and Precipitation inputs across all
# scenarios and concatenate
readinT <- function(fn) {read.general(fn, varname = tvarname,
latvar=tlatvar, lonvar=tlonvar)}
readinP <- function(fn) {read.general(fn, varname = pvarname,
latvar=platvar, lonvar=plonvar)}
griddataT <- concatGrids.general(lapply(paireddat$tfilename, readinT))
griddataP <- concatGrids.general(lapply(paireddat$pfilename, readinP))
# make sure supported on -infinity to infinity, add some output data to
# facilitate the reversing later
if(!is.null(tvarconvert_fcn)){
griddataT$vardata_raw <- griddataT$vardata
griddataT$vardata <- tvarconvert_fcn(griddataT$vardata)
} else {
griddataT$vardata_raw <- NULL
}
griddataT$tvarconvert_fcn <- tvarconvert_fcn
if(!is.null(pvarconvert_fcn)){
griddataP$vardata_raw <- griddataP$vardata
griddataP$vardata <- pvarconvert_fcn(griddataP$vardata)
} else {
griddataP$vardata_raw <- NULL
}
griddataP$pvarconvert_fcn <- pvarconvert_fcn
# Remove the gridcells that only contain NA values.
griddataT <- drop_NAs(griddataT)
griddataP <- drop_NAs(griddataP)
# calculate global average T
if(is.null(globalAvg_file)){
# Calculate the global mean Temperature internally
tgav <- griddataT$vardata %*% griddataT$globalop
} else {
tgav <- read_globalAvg(paireddat$tfilename,
globalAvg_file,
griddataT$vardata,
paireddat)
}
# Use this to calculate the mean field for T and the mean field for P
# separately. We want the joint variability on the residuals, but we
# want those residuals
meanfldT <- meanfield(griddataT$vardata, tgav)
meanfldP <- meanfield(griddataP$vardata, tgav)
# characterize the distribution in each grid cell for each variable by
# generating an empirical CDF function and an empirical quantile function
# for each variable in each grid cell.
tfuns <- characterize.emp.dist(meanfldT$r)
pfuns <- characterize.emp.dist(meanfldP$r)
# Use the empirical cdf to normalize residuals for both T and P
normresidsT <- normalize.resids(inputresids = meanfldT$r,
empiricalcdf = tfuns$cdf)
normresidsP <- normalize.resids(inputresids = meanfldP$r,
empiricalcdf = pfuns$cdf)
# bind columns of the matrices to create joint matrix of residuals
joint_residuals <- as.matrix(cbind(normresidsT$rn, normresidsP$rn))
# EOF decomposition
reof <- eof_analyze(joint_residuals, Ngrid = ncol(normresidsT$rn),
globop = griddataT$globalop)
# split the EOFs by ESM run.
# tags should be shared by T and P so doesn't matter which griddata you use
reof_l <- split_eof(reof, griddataT)
psd <- psdest(reof_l)
# output a generalized fldgen object
fldgen_object_TP(griddataT, griddataP,
tgav, meanfldT, meanfldP,
tfuns, pfuns,
reof, sqrt(psd$psd), psd$phases, infiles)
}
#' Create a \code{fldgen} object from constituent parts
#'
#' Normally this code will be called internally, but it is available for
#' external use, so that data created using the old interface can be converted.
#'
#' If there is only a single ESM run in the input data set, then \code{fxphase}
#' can be passed as a matrix of phases (instead of a list of matrices). It will
#' be converted into a list automatically.
#'
#' @param griddataT An object returned from \code{\link{read.temperatures}} or
#' \code{\link{concatGrids}}.
#' @param griddataP An object returned from \code{\link{read.temperatures}} or
#' \code{\link{concatGrids}}.
#' @param tgav Global mean temperature for the grids in griddata.
#' @param meanfldT Object returned from whatever mean field analysis is done,
#' currently \code{\link{pscl_analyze}}.
#' @param meanfldP Object returned from whatever mean field analysis is done,
#' currently \code{\link{pscl_analyze}}.
#' @param tfuns The empirical cdf and empirical quantile functions for each
#' grid cell, characterizing the distribution of residuals resulting from
#' meanfldT.
#' @param pfuns The empirical cdf and empirical quantile functions for each
#' grid cell, characterizing the distribution of residuals resulting from
#' meanfldP.
#' @param reof Object returned from \code{\link{eof_analyze}}.
#' @param fxmag The magnitude of the Fourier transform of the EOF projection
#' coefficients. This should be a matrix [Ntime x NEOF]. If using
#' \code{\link{psdest}}, note this is the square root of the power spectral
#' density returned by that function.
#' @param fxphase List of matrices [Ntime x NEOF] of phases of the Fourier
#' transform. There should be one element in the list for each input ESM run.
#' @param infiles Names of input files used to construct the data.
#' @export
#' @keywords internal
fldgen_object_TP <- function(griddataT, griddataP,
tgav, meanfldT, meanfldP,
tfuns, pfuns,
reof, fxmag, fxphase, infiles)
{
if(is.matrix(fxphase)) {
phases <- list(fxphase)
}
else {
phases <- fxphase
}
Fx <- list(mag = fxmag,
phases = phases)
fg <- list(griddataT=griddataT,
griddataP=griddataP,
tgav=tgav,
meanfldT=meanfldT,
meanfldP=meanfldP,
tfuns = tfuns,
pfuns = pfuns,
reof=reof,
fx=Fx,
infiles=infiles)
class(fg) <- 'fldgen'
fg
}
logPfloor <- function(x)
{
## Log transformation of precipitation, with a floor of 1e-9 (slightly less than
## 0.1mm for the year) on the pre-transform values
x <- pmax(x, 1e-9)
log(x)
}