-
Notifications
You must be signed in to change notification settings - Fork 98
/
TK00_Time_Series_Coercion.Rmd
422 lines (289 loc) · 15.2 KB
/
TK00_Time_Series_Coercion.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
---
title: "Time Series Class Conversion"
subtitle: "Between ts, xts, zoo, and tbl"
author: "Matt Dancho"
date: "`r Sys.Date()`"
output:
rmarkdown::html_vignette:
toc: true
toc_depth: 2
vignette: >
%\VignetteIndexEntry{Time Series Class Conversion}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, echo = FALSE, message = FALSE, warning = FALSE}
knitr::opts_chunk$set(
# message = FALSE,
# warning = FALSE,
fig.width = 8,
fig.height = 4.5,
fig.align = 'center',
out.width='95%',
dpi = 100
)
# devtools::load_all() # Travis CI fails on load_all()
```
This vignette covers __time series class conversion__ to and from the many time series classes in R including the general data frame (or tibble) and the various time series classes (`xts`, `zoo`, and `ts`).
# Introduction
The time series landscape in R is vast, deep, and complex causing many inconsistencies in data attributes and formats ultimately making it difficult to coerce between the different data structures. The `zoo` and `xts` packages solved a number of the issues in dealing with the various classes (`ts`, `zoo`, `xts`, `irts`, `msts`, and the list goes on...). However, because these packages deal in classes other than data frame, the issues with conversion between `tbl` and other time series object classes are still present.
The `timetk` package provides tools that solve the issues with conversion, maximizing attribute extensibility (the required data attributes are retained during the conversion to each of the primary time series classes). The following tools are available to coerce and retrieve key information:
* __Conversion functions__: `tk_tbl`, `tk_ts`, `tk_xts`, `tk_zoo`, and `tk_zooreg`. These functions coerce time-based tibbles `tbl` to and from each of the main time-series data types `xts`, `zoo`, `zooreg`, `ts`, maintaining the time-based index.
* __Index function__: `tk_index` returns the index. When the argument, `timetk_idx = TRUE`, A time-based index (non-regularized index) of `forecast` objects, models, and `ts` objects is returned if present. Refer to `tk_ts()` to learn about non-regularized index persistence during the conversion process.
This vignette includes a brief case study on conversion issues and then a detailed explanation of `timetk` function conversion between time-based `tbl` objects and several primary time series classes (`xts`, `zoo`, `zooreg` and `ts`).
# Prerequisites
Before we get started, load the following packages.
```{r, message=FALSE, warning = FALSE}
library(dplyr)
library(timetk)
```
# Data
We'll use the "Q10" dataset - The first ID from a sample a quarterly datasets (see `m4_quarterly`) from the [M4 Competition](https://mofc.unic.ac.cy/m4/). The return structure is a `tibble`, which is not conducive to many of the popular time series analysis packages including `quantmod`, `TTR`, `forecast` and many others.
```{r}
q10_quarterly <- m4_quarterly %>% dplyr::filter(id == "Q10")
q10_quarterly
```
# Case Study: Conversion issues with ts()
The `ts` object class has roots in the `stats` package and many popular packages use this time series data structure including the popular `forecast` package. With that said, the `ts` data structure is the most difficult to coerce back and forth because by default it does not contain a time-based index. Rather it uses a regularized index computed using the `start` and `frequency` arguments. Conversion to `ts` is done using the `ts()` function from the `stats` library, which results in various problems.
## Problems
First, only numeric columns get coerced. If the user forgets to add the `[,"pct"]` to drop the "date" column, `ts()` returns dates in numeric format which is not what the user wants.
```{r}
# date column gets coerced to numeric
ts(q10_quarterly, start = c(2000, 1), freq = 4) %>%
head()
```
The correct method is to call the specific column desired. However, this presents a new issue. The date index is lost, and a different "regularized" index is built using the `start` and `frequency` attributes.
```{r}
q10_quarterly_ts <- ts(q10_quarterly$value, start = c(2000, 1), freq = 4)
q10_quarterly_ts
```
We can see from the structure (using the `str()` function) that the regularized time series is present, but there is no date index retained.
```{r}
# No date index attribute
str(q10_quarterly_ts)
```
We can get the index using the `index()` function from the `zoo` package. The index retained is a regular sequence of numeric values. In many cases, the regularized values cannot be coerced back to the original time-base because the date and date time data contains significantly more information (i.e. year-month-day, hour-minute-second, and timezone attributes) and the data may not be on a regularized interval (frequency).
```{r}
# Regularized numeric sequence
zoo::index(q10_quarterly_ts)
```
## Solution
The `timetk` package contains a new function, `tk_ts()`, that enables maintaining the original date index as an attribute. When we repeat the `tbl` to `ts` conversion process using the new function, `tk_ts()`, we can see a few differences.
First, only numeric columns get coerced, which prevents unintended consequences due to R conversion rules (e.g. dates getting unintentionally converted or characters causing the homogeneous data structure converting all numeric values to character). If a column is dropped, the user gets a warning.
```{r}
# date automatically dropped and user is warned
q10_quarterly_ts_timetk <- tk_ts(q10_quarterly, start = 2000, freq = 4)
q10_quarterly_ts_timetk
```
Second, the data returned has a few additional attributes. The most important of which is a numeric attribute, "index", which contains the original date information as a number. The `ts()` function will not preserve this index while `tk_ts()` will preserve the index in numeric form along with the time zone and class.
```{r}
# More attributes including time index, time class, time zone
str(q10_quarterly_ts_timetk)
```
## Advantages of conversion with tk_tbl()
Since we used the `tk_ts()` during conversion, we can extract the original index in date format using `tk_index(timetk_idx = TRUE)` (the default is `timetk_idx = FALSE` which returns the default regularized index).
```{r}
# Can now retrieve the original date index
timetk_index <- q10_quarterly_ts_timetk %>%
tk_index(timetk_idx = TRUE)
head(timetk_index)
class(timetk_index)
```
Next, the `tk_tbl()` function has an argument `timetk_idx` also which can be used to select which index to return. First, we show conversion using the default index. Notice that the index returned is "regularized" meaning its actually a numeric index rather than a time-based index.
```{r}
# Conversion back to tibble using the default index (regularized)
q10_quarterly_ts_timetk %>%
tk_tbl(index_rename = "date", timetk_idx = FALSE)
```
We can now get the original date index using the `tk_tbl()` argument `timetk_idx = TRUE`.
```{r}
# Conversion back to tibble now using the timetk index (date / date-time)
q10_quarterly_timetk <- q10_quarterly_ts_timetk %>%
tk_tbl(timetk_idx = TRUE) %>%
rename(date = index)
q10_quarterly_timetk
```
We can see that in this case (and in most cases) you can get the same data frame you began with.
```{r}
# Comparing the coerced tibble with the original tibble
identical(q10_quarterly_timetk, q10_quarterly %>% select(-id))
```
# Conversion Methods
Using the `q10_quarterly`, we'll go through the various conversion methods using `tk_tbl`, `tk_xts`, `tk_zoo`, `tk_zooreg`, and `tk_ts`.
## From tbl
The starting point is the `q10_quarterly`. We will coerce this into `xts`, `zoo`, `zooreg` and `ts` classes.
```{r}
# Start:
q10_quarterly
```
### to xts
Use `tk_xts()`. By default "date" is used as the date index and the "date" column is dropped from the output. Only numeric columns are coerced to avoid unintentional conversion issues.
```{r}
# End
q10_quarterly_xts <- tk_xts(q10_quarterly)
head(q10_quarterly_xts)
```
Use the `select` argument to specify which columns to drop. Use the `date_var` argument to specify which column to use as the date index. Notice the message and warning are no longer present.
```{r}
# End - Using `select` and `date_var` args
tk_xts(q10_quarterly, select = -(id:date), date_var = date) %>%
head()
```
Also, as an alternative, we can set `silent = TRUE` to bypass the warnings since the default dropping of the "date" column is what is desired. Notice no warnings or messages.
```{r}
# End - Using `silent` to silence warnings
tk_xts(q10_quarterly, silent = TRUE) %>%
head()
```
### to zoo
Use `tk_zoo()`. Same as when coercing to xts, the non-numeric "date" column is automatically dropped and the index is automatically selected as the date column.
```{r}
# End
q10_quarterly_zoo <- tk_zoo(q10_quarterly, silent = TRUE)
head(q10_quarterly_zoo)
```
### to zooreg
Use `tk_zooreg()`. Same as when coercing to xts, the non-numeric "date" column is automatically dropped. The regularized index is built from the function arguments `start` and `freq`.
```{r}
# End
q10_quarterly_zooreg <- tk_zooreg(q10_quarterly, start = 2000, freq = 4, silent = TRUE)
head(q10_quarterly_zooreg)
```
The original time-based index is retained and can be accessed using `tk_index(timetk_idx = TRUE)`.
```{r}
# Retrieve original time-based index
tk_index(q10_quarterly_zooreg, timetk_idx = TRUE) %>%
str()
```
### to ts
Use `tk_ts()`. The non-numeric "date" column is automatically dropped. The regularized index is built from the function arguments.
```{r}
# End
q10_quarterly_ts <- tk_ts(q10_quarterly, start = 2000, freq = 4, silent = TRUE)
q10_quarterly_ts
```
The original time-based index is retained and can be accessed using `tk_index(timetk_idx = TRUE)`.
```{r}
# Retrieve original time-based index
tk_index(q10_quarterly_ts, timetk_idx = TRUE) %>%
str()
```
## To tbl
Going back to tibble is just as easy using `tk_tbl()`.
### From xts
```{r}
# Start
head(q10_quarterly_xts)
```
Notice no loss of data going back to `tbl`.
```{r}
# End
tk_tbl(q10_quarterly_xts)
```
### From zoo
```{r}
# Start
head(q10_quarterly_zoo)
```
Notice no loss of data going back to `tbl`.
```{r}
# End
tk_tbl(q10_quarterly_zoo)
```
### From zooreg
```{r}
# Start
head(q10_quarterly_zooreg)
```
Notice that the index is a regularized numeric sequence by default.
```{r}
# End - with default regularized index
tk_tbl(q10_quarterly_zooreg)
```
With `timetk_idx = TRUE` the index is the original date sequence. The result is the original `tbl` that we started with!
```{r}
# End - with timetk index that is the same as original time-based index
tk_tbl(q10_quarterly_zooreg, timetk_idx = TRUE)
```
### From ts
```{r}
# Start
q10_quarterly_ts
```
Notice that the index is a regularized numeric sequence by default.
```{r}
# End - with default regularized index
tk_tbl(q10_quarterly_ts)
```
With `timetk_idx = TRUE` the index is the original date sequence. The result is the original `tbl` that we started with!
```{r}
# End - with timetk index
tk_tbl(q10_quarterly_ts, timetk_idx = TRUE)
```
# Testing if an object has a timetk index
The function `has_timetk_idx()` can be used to test whether toggling the `timetk_idx` argument in the `tk_index()` and `tk_tbl()` functions will have an effect on the output. Here are several examples using the ten year treasury data used in the case study:
## tk_ts()
The `tk_ts()` function returns an object with the "timetk index" attribute.
```{r}
# Data coerced with tk_ts() has timetk index
has_timetk_idx(q10_quarterly_ts)
```
If we toggle `timetk_idx = TRUE` when retrieving the index with `tk_index()`, we get the index of dates rather than the regularized time series.
```{r}
tk_index(q10_quarterly_ts, timetk_idx = TRUE)
```
If we toggle `timetk_idx = TRUE` during conversion to `tbl` using `tk_tbl()`, we get the index of dates rather than the regularized index in the returned `tbl`.
```{r}
tk_tbl(q10_quarterly_ts, timetk_idx = TRUE)
```
## Testing other data types
The `timetk_idx` argument will only have an effect on objects that use regularized time series. Therefore, `has_timetk_idx()` returns `FALSE` for other object types (e.g. `tbl`, `xts`, `zoo`) since toggling the argument has no effect on these classes.
```{r}
has_timetk_idx(q10_quarterly_xts)
```
Toggling the `timetk_idx` argument has no effect on the output. Output with `timetk_idx = TRUE` is the same as with `timetk_idx = FALSE`.
```{r}
tk_index(q10_quarterly_xts, timetk_idx = TRUE)
```
# Working with zoo::yearmon and zoo::yearqtr index
The `zoo` package has the `yearmon` and `yearqtr` classes for working with regularized monthly and quarterly data, respectively. The "timetk index" tracks the format during conversion. Here's and example with `yearqtr`.
```{r}
yearqtr_tbl <- q10_quarterly %>%
mutate(date = zoo::as.yearqtr(date))
yearqtr_tbl
```
We can coerce to `xts` and the `yearqtr` class is intact.
```{r}
yearqtr_xts <- tk_xts(yearqtr_tbl)
yearqtr_xts %>% head()
```
We can coerce to `ts` and, although the "timetk index" is hidden, the `yearqtr` class is intact.
```{r}
yearqtr_ts <- tk_ts(yearqtr_xts, start = 1997, freq = 4)
yearqtr_ts %>% head()
```
Coercing from `ts` to `tbl` using `timetk_idx = TRUE` shows that the original index was maintained through each of the conversion steps.
```{r}
yearqtr_ts %>% tk_tbl(timetk_idx = TRUE)
```
# Learning More
<p>
<iframe width="100%" height="450" src="https://www.youtube.com/embed/elQb4VzRINg" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen="" style="box-shadow: 0 0 5px 2px rgba(0, 0, 0, .5);"><span id="selection-marker-1" class="redactor-selection-marker"></span><span id="selection-marker-1" class="redactor-selection-marker"></span><span id="selection-marker-1" class="redactor-selection-marker"></span><span id="selection-marker-1" class="redactor-selection-marker"></span>
</iframe>
</p>
_My Talk on High-Performance Time Series Forecasting_
Time series is changing. __Businesses now need 10,000+ time series forecasts every day.__
__High-Performance Forecasting Systems will save companies MILLIONS of dollars.__ Imagine what will happen to your career if you can provide your organization a "High-Performance Time Series Forecasting System" (HPTSF System).
I teach how to build a HPTFS System in my __High-Performance Time Series Forecasting Course__. If interested in learning Scalable High-Performance Forecasting Strategies then [take my course](https://university.business-science.io/p/ds4b-203-r-high-performance-time-series-forecasting). You will learn:
- Time Series Machine Learning (cutting-edge) with `Modeltime` - 30+ Models (Prophet, ARIMA, XGBoost, Random Forest, & many more)
- NEW - Deep Learning with `GluonTS` (Competition Winners)
- Time Series Preprocessing, Noise Reduction, & Anomaly Detection
- Feature engineering using lagged variables & external regressors
- Hyperparameter Tuning
- Time series cross-validation
- Ensembling Multiple Machine Learning & Univariate Modeling Techniques (Competition Winner)
- Scalable Forecasting - Forecast 1000+ time series in parallel
- and more.
<p class="text-center" style="font-size:30px;">
<a href="https://university.business-science.io/p/ds4b-203-r-high-performance-time-series-forecasting">Unlock the High-Performance Time Series Forecasting Course</a>
</p>