---
title: "Hour Distances and Daylight Savings"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Hour Distances and Daylight Savings}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  chunk_output_type: console
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
```{r setup}
library(warp)
```
With `period = "hour"`, `warp_distance()` behaves as expected in time zones that do not observe daylight saving time (DST), such as UTC or EST. In a time zone with DST, such as America/New_York, the results need some additional explanation, especially when `every > 1`.

## Spring Forward Gap
In the America/New_York time zone, as the clock was about to reach `1970-04-26 02:00:00`, daylight saving time began and the clock shifted forward 1 hour, so the next time was actually `1970-04-26 03:00:00`.
```{r}
before_dst <- as.POSIXct("1970-04-26 01:59:59", tz = "America/New_York")
before_dst
before_dst + 1
```
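The gap can also be confirmed by measuring elapsed time. A small base R sketch (the variable names are illustrative only): the clock readings `01:00` and `03:00` on this date are two hours apart on the clock but only one hour apart in elapsed time.

```{r}
# Illustrative names; not part of warp
gap_start <- as.POSIXct("1970-04-26 01:00:00", tz = "America/New_York")
gap_end <- as.POSIXct("1970-04-26 03:00:00", tz = "America/New_York")

# Elapsed time between the two clock readings, in hours
difftime(gap_end, gap_start, units = "hours")
```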
`warp_distance()` treats hours 1 and 3 as adjacent, since hour 2 never existed. This means that hours (0, 1) and (3, 4) are grouped together in the example below.
```{r}
x <- as.POSIXct("1970-04-26 00:00:00", tz = "America/New_York") + 3600 * 0:7
data.frame(
  x = x,
  hour = warp_distance(x, "hour", every = 2)
)
```
Because `period = "hour"` simply counts the running number of 2 hour periods from the `origin`, this pattern carries forward into the next day to produce a contiguous stream of values. This can be confusing, since hours 0 and 1 are not grouped together on the 27th.
```{r}
y <- as.POSIXct("1970-04-26 22:00:00", tz = "America/New_York") + 3600 * 0:5
data.frame(
  y = y,
  hour = warp_distance(y, "hour", every = 2)
)
```
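The running count described above can be approximated in base R. This is a minimal sketch, not warp's implementation: `hour_distance_sketch()` is a hypothetical helper, and it assumes the default `origin` is `1970-01-01 00:00:00` in the time zone of the input.

```{r}
# Hypothetical helper; a sketch of the idea, not warp's actual code
hour_distance_sketch <- function(x, every = 2) {
  tz <- attr(x, "tzone")
  if (is.null(tz)) tz <- ""

  # Assumption: the default origin is the epoch in x's own time zone
  origin <- as.POSIXct("1970-01-01 00:00:00", tz = tz)

  # Elapsed (not clock) hours since the origin, then bucket by `every`
  elapsed_hours <- (as.numeric(x) - as.numeric(origin)) / 3600
  floor(elapsed_hours / every)
}

sketch_x <- as.POSIXct("1970-04-26 22:00:00", tz = "America/New_York") + 3600 * 0:5

data.frame(
  clock_hour = format(sketch_x, "%H"),
  group = hour_distance_sketch(sketch_x)
)
```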
One way to partially work around this is to use lubridate's `force_tz()` function to force a UTC time zone with the same clock time as the original date-time. I've mocked up a minimal version of that function below.
```{r}
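# Or call `lubridate::force_tz(x, "UTC")`
force_utc <- function(x) {
  # Convert to POSIXlt to get each element's UTC offset in seconds (`gmtoff`)
  x_lt <- as.POSIXlt(x)
  x_lt <- unclass(x_lt)

  # Strip the class and time zone so `x` is plain seconds since the epoch
  attributes(x) <- NULL

  # Shift by the offset so the UTC clock time matches the original clock time
  out <- x + x_lt$gmtoff

  as.POSIXct(out, tz = "UTC", origin = "1970-01-01")
}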
x_utc <- force_utc(x)
y_utc <- force_utc(y)
x_utc
```
In UTC, hour 2 exists, so groups are created as (0, 1), (2, 3), and so on, even though hour 2 does not exist in America/New_York because of the DST gap. This has the effect of limiting the (2, 3) group to a maximum size of 1, since only hour 3 is possible in the data.
```{r}
data.frame(
  x_utc = x_utc,
  hour = warp_distance(x_utc, "hour", every = 2)
)
data.frame(
  y_utc = y_utc,
  hour = warp_distance(y_utc, "hour", every = 2)
)
```
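The group sizes can be checked with integer arithmetic alone. In UTC a day always holds 24 hours, an even number, so with `every = 2` the grouping within a day reduces to `clock_hour %/% 2`. The vector below is typed in by hand from the clock hours that exist on 1970-04-26 in America/New_York; it is an illustration, not warp output.

```{r}
# Clock hours present on 1970-04-26 from hour 0 to hour 8 (hour 2 is missing)
utc_clock_hours <- c(0, 1, 3, 4, 5, 6, 7, 8)

# Observations per 2-hour bucket; bucket 1 is the (2, 3) group
table(bucket = utc_clock_hours %/% 2)
```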
## Fall Backwards Overlap
In the America/New_York time zone, as the clock was about to reach `1970-10-25 02:00:00`, daylight saving time ended and the clock shifted back 1 hour, so the next time was actually `1970-10-25 01:00:00`. This means there are 2 full hours with an hour value of 1 on that day.
```{r}
before_fallback <- as.POSIXct("1970-10-25 01:00:00", tz = "America/New_York")
before_fallback
# add 1 hour of seconds
before_fallback + 3600
```
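The string `"1970-10-25 01:00:00"` names a clock time that occurs twice, so which instant it parses to may depend on the platform. A sketch that sidesteps the ambiguity by starting from midnight and adding elapsed seconds (the variable names are illustrative):

```{r}
fallback_midnight <- as.POSIXct("1970-10-25 00:00:00", tz = "America/New_York")

# One and two elapsed hours after midnight
both_ones <- fallback_midnight + 3600 * 1:2

# Same clock hour, different UTC offsets (EDT first, then EST)
format(both_ones, "%H:%M %z")
```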
Because these are two distinct hours, `warp_distance()` treats them as such, so in the example below a group of (1 EDT, 1 EST) is created. Since daylight saving time is still in effect at the start of this day, we also have the situation described above where hour 0 and hour 1 are not grouped together.
```{r}
x <- as.POSIXct("1970-10-25 00:00:00", tz = "America/New_York") + 3600 * 0:7
x
data.frame(
  x = x,
  hour = warp_distance(x, "hour", every = 2)
)
```
This fall-back adjustment realigns hours 0 and 1 on the next day, since the 25th has 25 hours.
```{r}
y <- as.POSIXct("1970-10-25 22:00:00", tz = "America/New_York") + 3600 * 0:5
y
data.frame(
  y = y,
  hour = warp_distance(y, "hour", every = 2)
)
```
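The realignment follows from the day lengths. A quick base R check, using a hypothetical helper named `day_length_hours()`, shows that the spring-forward day has 23 elapsed hours and the fall-back day has 25. With `every = 2`, each odd-length day shifts the pairing of clock hours by one, so the second shift undoes the first.

```{r}
# Hypothetical helper: elapsed hours between local midnight and the next local midnight
day_length_hours <- function(date, tz = "America/New_York") {
  start <- as.POSIXct(paste(date, "00:00:00"), tz = tz)
  end <- as.POSIXct(paste(as.Date(date) + 1, "00:00:00"), tz = tz)
  as.numeric(difftime(end, start, units = "hours"))
}

day_length_hours("1970-04-26")
day_length_hours("1970-10-25")
```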
As before, one way to sort of avoid this is to force a UTC time zone.
```{r}
x_utc <- force_utc(x)
x_utc
```
The consequence is that you have two date-times with an hour value of 1. When forced to UTC, they look identical. The groups are as you would probably expect, with buckets of hours (0, 1), (2, 3), and so on, but the two date-times with an hour value of 1 are now identical, so they fall in the same hour group.
```{r}
data.frame(
  x_utc = x_utc,
  hour = warp_distance(x_utc, "hour", every = 2)
)
```
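The collision can be made explicit with `duplicated()`. The sketch below forces UTC by round-tripping through a formatted string, which is a different technique from `force_utc()` but keeps the same clock time; the variable names are illustrative.

```{r}
overlap <- as.POSIXct("1970-10-25 00:00:00", tz = "America/New_York") + 3600 * 0:3

# Drop the offset by formatting the clock time and re-parsing it as UTC
overlap_utc <- as.POSIXct(format(overlap, "%Y-%m-%d %H:%M:%S"), tz = "UTC")

# The original instants are all distinct
duplicated(overlap)

# After forcing, the second 01:00 repeats the first
duplicated(overlap_utc)
```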
## Conclusion
While the implementation of `period = "hour"` is _technically_ correct, I recognize that it isn't the most intuitive operation. More intuitive would be a period value of `"dhour"`, which would correspond to the "hour of the day". This would count the number of hour groups from the origin, like `"hour"` does, but it would reset the `every`-hour counter every time you enter a new day. This has proved challenging to code up, but I hope to incorporate it eventually.
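
To make the idea concrete, here is one hypothetical way such a counter could be computed in base R. `dhour_sketch()` is not part of warp and is only an assumption about what `"dhour"` might mean: it numbers the groups by local calendar day and clock hour, ignores the `origin` argument, and does not distinguish the two 01:00 hours on a fall-back day.

```{r}
# Hypothetical sketch of a "dhour"-style counter; not a warp function
dhour_sketch <- function(x, every = 2) {
  # Local calendar day as days since 1970-01-01, and local clock hour
  day <- as.integer(as.Date(format(x, "%Y-%m-%d")))
  hour <- as.integer(format(x, "%H"))

  # The group counter restarts at the beginning of every local day
  groups_per_day <- ceiling(24 / every)
  day * groups_per_day + hour %/% every
}

dhour_x <- as.POSIXct("1970-04-26 22:00:00", tz = "America/New_York") + 3600 * 0:5

data.frame(
  clock_hour = format(dhour_x, "%H"),
  group = dhour_sketch(dhour_x)
)
```

Under these assumptions, hours 0 and 1 on the 27th share a group again, because the grouping depends only on the clock hour within each day.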