/
fChange_vignette.Rmd
411 lines (337 loc) · 27.5 KB
/
fChange_vignette.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
---
title: 'R package: fChange'
author: "Ozan Sonmez"
output:
github_document:
toc: true
toc_depth: 2
bibliography: refff.bib
---
# Introduction
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(fChange)
library(reshape2)
library(fda)
```
The goal of this documentation is to illustrate the use of the R package ``fChange``. The package is readily available in CRAN for download. This package is an comprehensive implementation of the structural break analysis in functional data where the methodologies are inherited from @ozan, @Ber, @Aue, and @Aston2012a. The main functions of the ```fChange``` package will be discussed individually and the usage of these functions are illustrated by examples.
We will start with discussing some of the functional data generation processes such as simulating independent and heavy tailed functional data. Then the dependent functional data generations are discussed such as functional auto-regressive process (FAR) and functional moving average (FMA) processes. Most of the functional data analysis techniques requires an initial dimension reduction by projecting the functional variation onto smaller number of of leading eigenfunctions of the covariance operator (or long run covariance function in the case of dependent data), therefore we will next discuss covariance function (long run covariance function) estimation, and how to choose its tuning parameters. This will be followed by discussing the two major functions in the ``fChange``package, i.e., detecting and dating functional structural breaks (1) with and (2) without dimension reduction. Finally, the functions in this package will be applied to real world data sets.
# Data Generation Processes
In this section some of the basic functional data generating processes are discussed.
## Independent Functional Data
The implementations here follows closely the DGP discussed in @ozan, i.e., $n$ functional data objects are generated using $D$ basis functions $v_1, v_2, \dots, v_D$ on the unit interval $[0,1]$. The independent curves are then generated according to the linear combinations of the chosen basis functions where the linear combination parameters are drawn from independent normal random variables with standard deviations $\sigma = (\sigma_1,\dots,\sigma_D)$ which mimics the decay for the eigenvalues of the covariance operator.
```{r, eval=FALSE}
fun_IID(n, nbasis, Sigma = NULL, basis = NULL, rangeval = c(0, 1), ...)
```
The above function generates an independent functional data of sample size $n$ using `nbasis` number of basis with an eigenvalue decay of the covariance operator. The basis functions can be defined using basis functions in the ``fda`` package. The default is the Fourier basis. Below is an example of two independent data simulation with slow and fast decay of the igenvalues of the covariance operator, respectively.
```{r, warning=FALSE, fig.width=11, fig.height=6, message=FALSE}
set.seed(123)
N=100
D=21
Sigma_fast = 3^-(1:D)
Sigma_slow = (1:D)^-1
fdata_slow = fun_IID(n = N, nbasis = D, Sigma = Sigma_slow)
fdata_fast = fun_IID(n = N, nbasis = D, Sigma = Sigma_fast)
par(mfrow=c(1,2))
invisible(plot(fdata_slow, main="Slow Decay"))
invisible(plot(fdata_fast, main="Fast Decay"))
```
In order to check whether we really simulated an independent functional data, one can plot the auto-correlation and partial auto-correlation functions of the scores of the principle components. Note that when we have a fast decay the smaller nuymber of prinicple components will explain most of the functional variation than the slow decay counter-part. This is also reflected in the plot of the functional data and how smooth the curves are, i.e., when the eigenvalue decay of the covariance function is faster, then one would expect that the generated cureves to be smaller than when the functional data has a slower eigenvalue decay. Assuming that we have the fast decay of the eigenvalues, one or two principle components would explain most of the variation in the data, therefore if there is any depedency that would also appear in the ACF plots of the scores:
```{r, warning=FALSE, fig.width=11, fig.height=6, message=FALSE}
scores = pca.fd(fdata_fast, nharm = D, centerfns = T)$scores
par(mfrow=c(2,2))
acf(scores[,1], main="First principle component, ACF")
pacf(scores[,1], main="First principle component, PACF")
acf(scores[,2], main="Second principle component, ACF")
pacf(scores[,2], main="Second principle component, PACF")
```
The ACF and PACF plots of the first two principe components are evidence of the independence of the functional data that is generated here. The basis functions can be generated using the R package ``fda`` which is, without a doubt, one of the most popular basic tools to deal with functional data analysis in R programming environment. Some of the basis functions that can be used here are:
```{r, eval=FALSE}
library(fda)
create.bspline.basis(...)
create.fourier.basis(...)
create.constant.basis(...)
create.exponential.basis(...)
create.power.basis()
create.monomial.basis(...)
create.polygonal.basis(...)
```
as a default, in all functional data generation processes, we use Fourier basis, however one can define any particular basis using the above ``fda`` functions.
## Heavy Tailed Functional Data
It is often the case that the financial data is highly heavy tailed, and these can be, by nature, considered a functional data. For example, a particular Stock price is observed in one minute resolution from the opening to the closing of the Stock market. One can consider these as the discrete observations of the daily price function of that particular Stock price. Therefore, it might be useful to be able to mimic these realistic situations by simulating heavy tailed functional data.
The data generation process is very similar to the `fun_IID` function above, except that the linear combination (of the basis functions) coefficients are no longer follow a normal distribution but they are drawn from a t-distribution with a user defined degrees of freedom.
```{r, eval=FALSE}
fun_heavy_tailed(n, nbasis, df = 3, basis = NULL, rangeval = c(0, 1), ...)
```
here is a simple example of heavy tailed functional data generation using B-spline basis basis functions
```{r, warning=FALSE, fig.width=11, fig.height=6, message=FALSE}
D = 21
range_val = c(0, 1)
bbasis = create.bspline.basis(rangeval = range_val, nbasis = D)
fdata_heavy_tailed = fun_heavy_tailed(n = 100, nbasis = D, basis = bbasis, rangeval = range_val)
invisible(plot(fdata_heavy_tailed, main="Heavy Tailed Functional Data"))
```
## Functional Auto Reggresive (FAR) Process
The FAR data generation process is inherited from @ozan. Once the independent curves are formed as in above, the functional auto-reggresive curves are generated using the appropriate recursion and the order. The functional autoreggresive curves of order $p$ are generated using the autoreggresive recursions where the FAR operator was set up as $\Psi_k=\kappa_k\Psi_0$, and the random operator $\Psi_0$ is represented by a $D\times D$ matrix whose entries consist of independent, centered normal random variables with standard deviations given by $\sigma\times\sigma'$. A scaling was applied to achieve $||\Psi_0||=1$. The constant $\kappa_k,k=1,\dots,p$ can then be used to adjust the strength of the temporal dependence. To ensure stationarity of the time series, the absolute value of the sum of the $\kappa_k$'s must be smaller than 1.
```{r, eval=FALSE}
fun_AR(n, nbasis, order = NULL, kappa = NULL, Sigma = NULL, basis = NULL, rangeval = c(0, 1), ...)
```
The above function generates FAR(p) process where `order` defines the order of the FAR operator $p$, where the the lag dependencies are defined by the `kappa` vector whose length must be same as $p$. The other arguments are similar to the one `fun_IID` function.
```{r, warning=FALSE, fig.width=11, fig.height=6, message=FALSE}
set.seed(12)
FAR1_fast = fun_AR(n = N, nbasis = D, order=1, kappa=0.75, Sigma = Sigma_fast)
scores_FAR1 = pca.fd(FAR1_fast, nharm = D, centerfns = T)$scores
par(mfrow=c(1,3))
invisible(plot(FAR1_fast, main="Fast Decay"))
acf(scores_FAR1[,1], main="ACF of the first PC")
pacf(scores_FAR1[,1], main="PACF of the first PC")
```
Above we generated a functional time series which follows FAR(1) process where the norm of the FAR operator is 0.75, which measures the dependency strength. Note that the `Sigma` is chosen in a way that the eigenvalue decay of the covariance operator is "fast", which results in one or two principle components explaining most of the functional variation. To check the validity of the dependency structure of the functional data, we have also plotted the ACF and PACF of the scores associated with the leading principal component. ACF decaying exponentially and PACF having only one significant lag, is an evidence of the FAR(1) structure in the functional data.
## Functional Moving Average (FMA) Process
Again, following the data generation process in @ozan, below function generates functional moving average process of order $q$. The implimentation is very similar to the `fun_AR` with an appropriate recursion.
```{r, eval=FALSE}
fun_MA(n, nbasis, order = NULL, kappa = NULL, Sigma = NULL, basis = NULL, rangeval = c(0, 1), ...)
```
Here is an example of FMA(1) data generation process:
```{r, warning=FALSE, fig.width=11, fig.height=6, message=FALSE}
set.seed(1)
FMA1_fast = fun_MA(n = N, nbasis = D, order=1, kappa=0.9, Sigma = Sigma_fast)
scores_FMA1 = pca.fd(FMA1_fast, nharm = D, centerfns = T)$scores
par(mfrow=c(1,3))
invisible(plot(FMA1_fast, main="Fast Decay"))
acf(scores_FMA1[,1], main="ACF of the first PC")
pacf(scores_FMA1[,1], main="PACF of the first PC")
```
Once again, exponential decay of the pacf and ACF having only one significant lag is an evidence of the MA structure in the functional data.
## Inserting Artificial Changes
The main focus of the ``fChange`` package is to date and detect the structural breaks (specially in the mean function) in the functional data, therefore for illustrative puposes, it is important to be able to insert artifical changes to do functional data with some specifications. This will allow us to compare and contrast different methods (will be discussed later) in a controled environment.
```{r, eval=FALSE}
insert_change(fdobj, change_fun = NULL, k = NULL, change_location, SNR, plot = TRUE, ...)
```
This function will instert a change to the functional data `fdobj` where the inserted change can be given in two ways: (1) a custom change function `change_fun`, (2) change function which is the mean of first `k` basis functions (see @ozan). The location of the change is defined by `change_location` which is scaled by the sample size, hence this needs to be in $[0,1]$, for example $0.5$ inserts the change in the middle of the data. The magnitude of the change is defined by the signal to noise ration `SNR`. If `plot=TRUE` then the plot of the resulted functional data is given colored by before (blue) and after (red) the change.
```{r, warning=FALSE, fig.width=11, fig.height=6, message=FALSE}
set.seed(1)
par(mfrow=c(1,4))
Fdata0 = insert_change(FAR1_fast, k=5, change_location = 0.5, SNR=0)
Fdata1 = insert_change(FAR1_fast, k=5, change_location = 0.5, SNR=1)
Fdata5 = insert_change(FAR1_fast, k=5, change_location = 0.5, SNR=5)
Fdata10 = insert_change(FAR1_fast, k=5, change_location = 0.5, SNR=10)
```
Above we inserted a the mean of the first 5 basis functions as an artificial change in the middle of the data generated that was generated before (FAR(1) data) with 4 different signal to noise ratio, i.e., 0, 1, 5 and 10.
# Functional Data Analysis Essentials:
Majority of the functional data analysis techniques depends on the famous dimension reduction technique called "functional Principal Component Analysis" (fPCA), where the main idea is to project the functional data into small number of principle components. In other words the functional data is projected on to finite number of eigenfunctions of the covariance operator. One of the challanges in fPCA is to pick the right dimension, $d$, of the subspace that the functional data is projected onto. The rule of thumb, is picking the subspace dimension based on the Total Variation Explained (TVE) by the first $d$ principal components, for example picking $d$ when TVE exceeds 90 or 95 percent of the variation.
```{r, eval=FALSE}
pick_dim(fdobj, TVE)
```
The above function takes the functional data `fdobj` and gives the number of principle components `d` that will explain at least `TVE` percent of the functional variation. It will also output the TVE for each additional principal components.
```{r, warning=FALSE, fig.width=11, fig.height=6, message=FALSE}
d_fast = pick_dim(fdata_fast, 0.95)
d_slow = pick_dim(fdata_slow, 0.95)
par(mfrow=c(1,2))
plot(d_fast$TVEs, main="Fast decay TVE", type="o")
plot(d_slow$TVEs, main="Slow decay TVE", type="o")
c(d_fast$d, d_slow$d)
```
As it is expected the fast decay functional data requires 2 principle components to reach 95% TVE, while the slow decay functional data requires at least 7 principle components. This function is essential to pick the dimension of the projection space whenever the functional principle component analysis (fPCA) based methodolgy is applied.
Another esstential tool for functional data is to center the data by subtracting the mean function from each observation. However if there is a change in the mean function, one needs to estimate the change location and compute the mean function before and after the chnage and then subtract them from the data before and after the change, respectively. We extend the ``fda`` function `center.fd()` by also incorporating the possible change in the mean function.
```{r, eval=FALSE}
center_data(fdobj, change = TRUE)
```
For an example lets compare theis function with `center.fd()`. Lets use the function `Fdata5` which has an inserted change with SNR=5.
```{r, warning=FALSE, fig.width=11, fig.height=6, message=FALSE}
par(mfrow=c(1,2))
invisible(plot(center.fd(Fdata5$fundata), main="center.fd()"))
invisible(plot(center_data(Fdata5$fundata), main="center_data()"))
```
As it is obvious from above example that `center.fd()` does not take into account any changes in the mean function when centering the data.
# Covariance Function Estimation
In arenas of application including environmental science, economics, and medicine, it is increasingly common to consider time series of curves or functions (functional time series). Many inferential procedures employed in the analysis of such data involve the long run covariance function or operator, which is analogous to the long run covariance matrix familiar to finite dimensional time series analysis and econometrics.
The long run covariance function operator alsp plays a crucial role in dating and detecting structural breaks in functional data. Following the notation and implementation of @ozan the long run covariance function is estimated using the below function in ``fChange``.
```{r, eval=FALSE}
LongRun(fdobj, h, kern_type = "BT", is_change = TRUE, ...)
```
which take s functional data obsject `fdobj` and estimates the long run covariance function using the bandwidth (window) parameter `h` and the kernel `kernel_type` by centering the data. If `is_change=TRUE` then the centering is done considering the change in the mean function (see `center_data`). The function will return to the eigenfunctions and eigenvalues of the long run covariance function along with the coefficinets of the spectral decomposition of the long run covariance function. The plot of the long run covariance function surface $C(t,s)$ is also given. Note that $h=0$ is used when the data is independent.
Here is an example for the independent functional data,
```{r, warning=FALSE, fig.width=11, fig.height=6, message=FALSE}
fdata_iid = fun_IID(n=100, nbasis=21)
C_iid = LongRun(fdata_iid, h=0)
```
here is the eigen values
```{r}
lambda_hat = C_iid$e_val
lambda_hat
```
and the plots of the eigenfunctions :
```{r, warning=FALSE, fig.width=11, fig.height=6, message=FALSE}
v_hat = C_iid$e_fun
invisible(plot(v_hat, main="eigen functions"))
```
Moreover the surface plot of the covariance function is given as:
```{r, warning=FALSE, fig.width=11, fig.height=6, message=FALSE}
C_iid$p
```
One can also see how the surface of the long run covariance function is changing based on the decay rate of the eigenvalues of the long run covariance operator. Below, two FAR(1) processes are generated and with slow and fast eigenvalue decay and the surface plots of the estimated long run covariance functions are displayed.
```{r, warning=FALSE, fig.width=11, fig.height=6, message=FALSE}
fdata_slow = fun_AR(n=100, nbasis=21, order = 1, kappa = 0.8) # slow decay
fdata_fast = fun_AR(n=100, nbasis=21, order = 1, kappa = 0.8, Sigma = 3^-(1:21)) # fast decay
C1 = LongRun(fdata_slow, h=2)
C2 = LongRun(fdata_fast, h=2)
C1$p # Slow decay C(t,s)
C2$p # Fast decay C(t,s)
```
The question of how to pick the optimal bandwidth `h` is investigated by @greg1 proposing a methid for a plug-in bandwidth selection procedure for long run covariance estimation with stationary functional time series.
```{r, eval=FALSE}
opt_bandwidth(fdobj, kern_type, kern_type_ini, is_change = TRUE, ...)
```
For example, for the FAR(1) process generated above we used `h=2` but the optimal bandwith is estimated as:
```{r}
OPT = opt_bandwidth(fdata_fast, "BT", "BT")
h_opt = floor(OPT$hat_h_opt)
h_opt
```
# Structural Break Analysis
Early work in functional structural break analysis dealt primarily with random samples of independent curves, the question of interest being whether all curves have a common mean function or whether there are two or more segments of the data that are homogeneous within but heterogeneous without. @Ber developed statistical methodology to test the null hypothesis of no structural break against the alternative of a (single) break in the mean function assuming that the error terms are independent and identically distributed curves. @Aue quantified the large-sample behavior of a break date estimator under a similar set of assumptions. The work in these two papers was generalized by @Aston2012a and @Aston:2012b to include functional time series exhibiting weak dependence into the modeling framework. Al these aforementioned papers makes use of an initial dimension reduction via fPCA, and @ozan developed methodology to uncover structural breaks in functional data that is “fully functional” in the sense that it does not rely on dimension reduction techniques.
## fPCA Based Methods
Most of the procedures in FDA, such as those presented in the above cited papers, are based on dimension reduction techniques, primarily using the widely popular functional principal components analysis (fPCA), by which the functional variation in the data is projected onto the directions of a small number of principal curves, and multivariate techniques are then applied to the resulting sequence of score vectors. This is also the case in functional structural break detection, in which after an initial fPCA step multivariate structural break theory is utilized. Despite the fact that functional data are, at least in principle, infinite dimensional, the state of the art in FDA remains to start the analysis with an initial dimension reduction procedure. The fPCA based change point detector is based on standardized cumulative sum (CUSUM) statistics of the score vectors.
```{r, eval=FALSE}
change_fPCA(fdobj, d, M = 1000, h = 0, plot = FALSE, ...)
```
The above function tests whether there is a significant change in the mean function of functional data, and it gives an estimate of the location of the change. The procedure will reduce the dimension of the functional data using functional principal component analysis and will use `d` leading principal curves to carry out the change point analysis. The projection dimension `d` can be chosen via total variation explained (TVE) using the function `pick_dim`.
```{r, warning=FALSE, fig.width=11, fig.height=6, message=FALSE}
set.seed(123)
datf = fun_AR(n = 100, nbasis = 21, order = 1, kappa = 0.75, Sigma = Sigma_fast)
dat_change = insert_change(datf, k=5, change_location = 0.5, SNR = 2.5, plot = FALSE)
fdata = dat_change$fundata
d_hat = pick_dim(fdata, 0.90)$d
fPCA_result = change_fPCA(fdata, d=d_hat, plot = TRUE)
fPCA_result$pvalue # estimated p-value
fPCA_result$change # estimated change point location
```
## Fully Functional Method
Dimension reduction approaches, however, automatically incur a loss of information, namely all informa-
tion about the functional data that is orthogonal to the basis onto which it is projected. This weakness is easily illustrated in the context of detecting and dating structural breaks in the mean function: if the function representing the mean break is orthogonal to the basis used for dimension reduction, there cannot be a consistent
test or estimator for the break date in that basis.
@ozan develop methodology for detecting and dating structural breaks in functional data without the application of dimension reduction techniques. In their work, fully functional test statistics and break date estimators are studied, and their asymptotic theory is developed under the assumption that the model errors satisfy a general weak dependence condition. This theory illuminates a number of potential advantages of the fully functional procedures. For example, it is shown that when the direction of the break is orthogonal to the leading principal components of the data, the estimation of the mean break is asymptotically improved when using the fully functional estimator compared to mean breaks of the same size that are contained in the leading principal components. This contrasts with fPCA based techniques in which such mean breaks are more difficult, if not impossible, to detect, even given arbitrarily large sample sizes. In addition, the assumptions required for the fully functional theory are weaker than the ones used in @Aue and @Aston2012a, as convergence of the eigenvalues and eigenfunctions of the empirical covariance operator to the eigenvalues of the population covariance operator do not have to be accounted for. These assumptions are typically formulated as finiteness of fourth moment conditions.
```{r, eval=FALSE}
change_FF(fdobj, M = 1000, h = 0, plot = FALSE, ...)
```
This function tests whether there is a significant change in the mean function of the functional data, and it will give an estimate for the location of the change. The procedure is based on the standard $L^2$ norm and hence does not depend on any dimension reduction technique such as fPCA.
```{r, warning=FALSE, fig.width=11, fig.height=6, message=FALSE}
LRCV = opt_bandwidth(fdata, "BT", "BT")
h_opt = floor(LRCV$hat_h_opt)
FF_result = change_FF(fdata, plot = TRUE)
FF_result$pvalue # estimated p-value
FF_result$change # estimated change point location
```
The advantage of fully functional method is apperent, then most the change is in the later eigendirections but the fPCA is performed with few principal curves.
```{r, warning=FALSE, fig.width=11, fig.height=6, message=FALSE}
set.seed(123)
datf2 = fun_AR(n = 100, nbasis = 21, order = 1, kappa = 0.75, Sigma = Sigma_fast)
dat_change2 = insert_change(datf2, k=21, change_location = 0.5, SNR = 1, plot = FALSE)
fdata2 = dat_change2$fundata
d_hat2 = pick_dim(fdata2, 0.90)$d
fPCA_result2 = change_fPCA(fdata2, d=d_hat2)
LRCV2 = opt_bandwidth(fdata2, "BT", "BT")
h_opt2 = floor(LRCV2$hat_h_opt)
FF_result2 = change_FF(fdata2)
p_fPCA = fPCA_result2$pvalue
c_fPCA = fPCA_result2$change
p_FF = FF_result2$pvalue
c_FF = FF_result2$change
res = data.frame(method=c("fPCA", "FF"), p_value = c(p_fPCA, p_FF), location = c(c_fPCA, c_FF))
res
```
Note that in the above example, the change is spread over $D=21$ eigen directions but the fPCA-based method is perfomed using $d=1$, therefore the fully functional method is over performing the fPCA based method. Hence, whenever the signal of interest is not dominant or is “sparse”, in the sense that it is not entirely contained in the leading principal components, then alternatives to dimension reduction based methods should be considered and are likely more effective.
## Confidence Interval
The mathematical details of the confidence interval implimentation is given by @ozan,
```{r, eval=FALSE}
Conf_int(fdobj, h = 0, kern_type = "BT", alpha = 0.05, ...)
```
A change point in the mean function of functional data is estimated, and for a user selected level, aconfidence interval is computed using the fully functional procedure introduced @ozan.
```{r}
Conf_int(fdata, h=h_opt)
```
One can also illustrates that the confidence interval gets shrunken as the SNR gets larger:
```{r, warning=FALSE, fig.width=11, fig.height=6, message=FALSE}
set.seed(123)
fdata_ci = fun_AR(n = 100, nbasis = 21, order = 1, kappa = 0.75, Sigma = Sigma_fast)
CI_SNR = function(SNR){
fdata_CI = insert_change(fdata_ci, k=21, change_location = 0.5, SNR = SNR, plot = F)$fundata
LRCV = opt_bandwidth(fdata_CI, "BT", "BT")
h_opt = floor(LRCV$hat_h_opt)
Conf_int(fdata_CI, h = h_opt)
}
snr = c(0.25, 0.5, 0.75, 1, 3)
results = sapply(snr, function(k) CI_SNR(k))
snr_name = paste0("SNR=", snr)
colnames(results) = snr_name
t(results)
```
# Data Sets
## Australian Temperature Data
Australian daily minimum temperature climate data for 8 different stations are provided. The data is taken from Australian Government Bureau of Meteorology. The daily observations are available from http://www.bom.gov.au/climate/data. Copyright Commonwealth of Australia 2010, Bureau of Meteorology. Definitions adapted from http://www.bom.gov.au/climate/dwo/IDCJDW0000.shtml. The stations that the daily temperature data is taken are:
* Sydney
* Melbourne
* Boulia
* Cape_Otway
* Gayndah
* Gunnedah
* Hobart
* Robe
For example the first couple rows the Sydney temerature data is given below:
```{r}
head(Australian_Temp$Sydney)
```
we convert the above data into functional data by removing the `NA`'s first:
```{r}
fun_data_S = Australian_Temp$Sydney
fun_data_S$md = with(fun_data_S, paste(Month, Day, sep="_"))
nas = which(is.na(fun_data_S$Days.of.accumulation.of.minimum.temperature))
fun_data_S = fun_data_S[-nas, ]
dataS = dcast(fun_data_S, formula = md~ Year, value.var = "Minimum.temperature..Degree.C.")[,-1]
D=21
basis = create.fourier.basis(rangeval = c(0, 1), nbasis = D)
mat.S = matrix(0, D, ncol(dataS))
for (i in 1:ncol(dataS)){
vec = dataS[,i][!(is.na(dataS[,i]))]
f_Obs = Data2fd(argvals=seq(0, 1, length = length(vec)) , vec, basisobj = basis)
mat.S[, i] = f_Obs$coefs
}
fdata_S = fd(mat.S, basis)
invisible(plot(fdata_S, main="Sydney temp Data"))
h_opt = round(opt_bandwidth(fdata_S, "BT", "PR")$hat_h_opt)
change_FF(fdata_S, h=h_opt, plot = T)$change
Conf_int(fdata_S, h=h_opt)
```
## Stock Price Data
Stock prices in one minute resolution are recorded for
* Bank of America
* Walmart
* Disney
* Chevron
* IBM
* Microsoft
* CocaCola
* Exon mobile
* McDonalds
* Microsoft.
The time period that the observations span is given in the column names of each stock data. During each day, stock price values were recorded in one-minute intervals from 9:30 AM to 4:00 PM EST. The columns display the day, and the rows display the time at which the stock price is recorded.
```{r, warning=FALSE, fig.width=11, fig.height=6, message=FALSE}
# transform first 100 days of Chevron data into functional data object
data = as.matrix(Stock$Chevron[,1:100])
# First need to transform the data to obtain the log returns
temp1=data
for(j in c(1:dim(data)[1])){
for(k in c(1:dim(data)[2])){
data[j,k]=100*(log(temp1[j,k])-log(temp1[1,k]))
}
}
# Transform the Log return data into functional data
#using 21 bspline basis functions on [0,1]
D=21
basis = create.bspline.basis(rangeval = c(0, 1), nbasis = D)
Domain = seq(0, 1, length = nrow(data))
f_data = Data2fd(argvals = Domain , data, basisobj = basis)
plot(f_data)
```
# References