Reference: [R for Data Science: Dates and Times](https://r4ds.had.co.nz/dates-and-times.html#dates-and-times)

In [1]:
library(lubridate)

"package 'lubridate' was built under R version 3.6.3"
Attaching package: 'lubridate'

The following objects are masked from 'package:base':

    date, intersect, setdiff, union



# 1. Introduction

# 2. Creating Dates/Times

There are three types of date/time data that refer to an instant in time:

- A date. Tibbles print this as `<date>`.

- A time within a day. Tibbles print this as `<time>`.

- A date-time is a date plus a time: it uniquely identifies an instant in time (typically to the nearest second). Tibbles print this as `<dttm>`. Elsewhere in R these are called `POSIXct`, but I don’t think that’s a very useful name.

You should always use the simplest possible data type that works for your needs. That means if you can use a date instead of a date-time, you should. Date-times are substantially more complicated because of the need to handle time zones, which we’ll come back to at the end of the chapter.

In [3]:
# current date
today()

# current datetime

now()

[1] "2020-12-01 14:06:26 +07"

Otherwise, there are three ways you’re likely to create a date/time:

- From a string.
- From individual date-time components.
- From an existing date/time object.

### 2.1. From strings

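Lubridate's parsing helpers turn strings into dates or date-times. The function name gives the order of the components in the string (for example `ymd()`); append `_hms` to parse a time as well.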

In [10]:
# Parsing strings to Date -------------------
ymd('2001/10/06')

# an unquoted number also works
ymd(20011006)

# You can also force the creation of a date-time from a date by supplying a timezone
ymd(20170131, tz = "UTC")

[1] "2017-01-31 UTC"

In [11]:
# Parsing strings to Date-time -------------
ymd_hms('2001/10/06 20:31:29')


[1] "2001-10-06 20:31:29 UTC"
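The same naming pattern covers other component orders. The sketch below assumes an English locale for the month-name inputs; the numeric strings do not depend on locale.

```r
library(lubridate)

# The letters in the function name follow the order of the components.
from_mdy <- mdy("01/31/2017")
from_dmy <- dmy("31-01-2017")

# Month names are parsed too (assumes an English locale).
mdy("January 31st, 2017")
dmy("31-Jan-2017")

# Add _h, _hm, or _hms to parse a time as well.
from_mdy_hm <- mdy_hm("01/31/2017 08:01")

from_mdy
from_dmy
from_mdy_hm
```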

### 2.2. From individual components

Sometimes the individual components of a date-time are stored as separate values or columns. To create a date/time from this sort of input, use `make_date()` for dates, or `make_datetime()` for date-times:

In [17]:
# all arguments are optional
args(make_date)

make_date(2001, 10, 06)

make_date(month = 10, day = 6, year = 2001)

In [18]:
# all arguments are optional

args(make_datetime)

make_datetime(2001, 10, 06,  20, 31, 29)

make_datetime(hour = 20, min = 31, sec = 29, day = 6, month = 10, year = 2001)

[1] "2001-10-06 20:31:29 UTC"

[1] "2001-10-06 20:31:29 UTC"
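Both functions are vectorised, so they can be applied to whole columns at once. A minimal sketch, using a small made-up data frame (`parts` and its column names are illustrative only):

```r
library(lubridate)

# Hypothetical component columns, one row per observation.
parts <- data.frame(
  year   = c(2001, 2016),
  month  = c(10, 7),
  day    = c(6, 8),
  hour   = c(20, 12),
  minute = c(31, 34)
)

# Build one date and one date-time per row.
parts$date <- make_date(parts$year, parts$month, parts$day)
parts$when <- make_datetime(parts$year, parts$month, parts$day,
                            parts$hour, parts$minute)

parts
```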

### 2.3. From other types

You may want to switch between a date-time and a date. That’s the job of `as_datetime()` and `as_date()`:

In [25]:
as_date(now())

# 1 day since 1/1/1970
as_date(1)

as_datetime(today())

# 1 second since 1/1/1970 00:00:00
as_datetime(1)

[1] "2020-12-01 UTC"

[1] "1970-01-01 00:00:01 UTC"

# 3. Date-time components

This section covers the accessor functions that let you get and set individual components of a date-time.

### 3.1 Getting components

You can pull out individual parts of the date with the accessor functions `year()`, `month()`, `mday()` (day of the month), `yday()` (day of the year), `wday()` (day of the week), `hour()`, `minute()`, and `second()`.

In [29]:
datetime <- ymd_hms("2016-07-08 12:34:56")

In [30]:
year(datetime)

In [32]:
semester(datetime)
semester(datetime, with_year = TRUE)

In [33]:
quarter(datetime)
quarter(datetime, with_year = TRUE)

In [34]:
# show month as number
month(datetime)

# show abbreviated month name
month(datetime, label = TRUE)

# show full month name
month(datetime, label = TRUE, abbr = FALSE)

In [35]:
week(datetime)

In [37]:
# day of month
day(datetime)

# day of week (number)
wday(datetime)

# day of week (abbreviated name)
wday(datetime, label = TRUE)

# day of week (full name)
wday(datetime, label = TRUE, abbr = FALSE)

In [40]:
hour(datetime)

minute(datetime)

second(datetime)

tz(datetime)

### 3.2. Rounding

An alternative approach to plotting individual components is to round the date to a nearby unit of time, with `floor_date()`, `round_date()`, and `ceiling_date()`. Each function takes a vector of dates to adjust and then the name of the unit to round down to (floor), round up to (ceiling), or round to the nearest of.
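As a small sketch of the three rounding functions, here they are applied to the same date-time used in the accessor examples, rounding to the month and to the hour:

```r
library(lubridate)

datetime <- ymd_hms("2016-07-08 12:34:56")

# Round down, to the nearest, and up to a month boundary.
down    <- floor_date(datetime, "month")    # start of July 2016
nearest <- round_date(datetime, "month")    # July 8 is closer to July 1 than to August 1
up      <- ceiling_date(datetime, "month")  # start of August 2016

down
nearest
up

# Other units work the same way.
floor_date(datetime, "hour")
```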

### 3.3 Setting components

You can also use each accessor function to set the components of a date/time:

In [42]:
(datetime <- ymd_hms("2016-07-08 12:34:56"))

year(datetime) <- 2020

datetime

[1] "2016-07-08 12:34:56 UTC"

[1] "2020-07-08 12:34:56 UTC"

Alternatively, rather than modifying in place, you can create a new date-time with `update()`. This also allows you to set multiple values at once.

In [43]:
update(datetime, year = 2001, month = 10, day = 6)

[1] "2001-10-06 12:34:56 UTC"

# 4. Timespan

Next you’ll learn about how arithmetic with dates works, including subtraction, addition, and division. Along the way, you’ll learn about three important classes that represent time spans:

- **durations**, which represent an exact number of seconds.
- **periods**, which represent human units like weeks and months.
- **intervals**, which represent a starting and ending point.

### 4.1 Duration

In R, when you subtract two dates, you get a difftime object:

In [47]:
(my_age <- today() - ymd(20011006))

Time difference of 6996 days

A difftime class object records a time span of seconds, minutes, hours, days, or weeks. This ambiguity can make difftimes a little painful to work with, so lubridate provides an alternative which always uses seconds: the **duration**.

In [49]:
as.duration(my_age)

Durations come with a bunch of convenient constructors:

In [51]:
dyears(1)

dmonths(1)

dweeks(1)

ddays(1)

dminutes(1)

or a general constructor:

In [57]:
duration(2, 'years')

duration(1, 'month')

duration(3, 'weeks')

duration(30, 'seconds')

duration(days = 30, hours =  12)

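Durations always record the time span in seconds. Larger units are created by converting minutes, hours, days, and weeks to seconds at the standard rate (60 seconds in a minute, 60 minutes in an hour, 24 hours in a day, 7 days in a week). A year is converted at a fixed average length: 365.25 days in the lubridate version used here, which is why `today() + dyears(1)` further down lands at 06:00. Older lubridate versions used 365 days.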
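To see the conversion rates directly, a duration can be coerced to its underlying number of seconds. This is only a sketch; the last line prints the length of a duration year in days instead of hard-coding it, since that value depends on the installed lubridate version.

```r
library(lubridate)

# Each constructor stores its span as a plain count of seconds.
secs_minute <- as.numeric(dminutes(1))  # 60
secs_hour   <- as.numeric(dhours(1))    # 60 * 60 = 3600
secs_day    <- as.numeric(ddays(1))     # 24 * 3600 = 86400
secs_week   <- as.numeric(dweeks(1))    # 7 * 86400 = 604800

c(secs_minute, secs_hour, secs_day, secs_week)

# Length of a duration year, expressed in days (version dependent).
dyears(1) / ddays(1)
```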

You can add and multiply durations:

In [58]:
2 * dyears(1)

ddays(1) + dhours(1)

You can add and subtract durations to and from days:

In [60]:
today() + ddays(1)

today() + dyears(1)

[1] "2021-12-01 06:00:00 UTC"

However, because durations represent an exact number of seconds, sometimes you might get an unexpected result:

In [61]:
one_pm <- ymd_hms("2016-03-12 13:00:00", tz = "America/New_York")

one_pm

one_pm + ddays(1)

[1] "2016-03-12 13:00:00 EST"

[1] "2016-03-13 14:00:00 EDT"

Why is one day after 1pm on March 12, 2pm on March 13?! If you look carefully at the date you might also notice that the time zones have changed. Because DST began on March 13, that day has only 23 hours, so if we add a full day's worth of seconds we end up with a different clock time.

To solve this problem, lubridate provides **periods**. Periods are time spans that don't have a fixed length in seconds; instead they work with “human” times, like days and months. That allows them to work in a more intuitive way:

### 4.2 Period

In [63]:
one_pm

one_pm + days(1)

[1] "2016-03-12 13:00:00 EST"

[1] "2016-03-13 13:00:00 EDT"
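The difference between the two approaches shows up when the elapsed time is measured. A minimal sketch, reusing the same `one_pm` value (the names `via_duration` and `via_period` are illustrative only):

```r
library(lubridate)

one_pm <- ymd_hms("2016-03-12 13:00:00", tz = "America/New_York")

via_duration <- one_pm + ddays(1)  # exactly 86400 seconds later
via_period   <- one_pm + days(1)   # same clock time on the next calendar day

# Physical time that actually elapsed in each case.
hours_duration <- as.numeric(difftime(via_duration, one_pm, units = "hours"))
hours_period   <- as.numeric(difftime(via_period, one_pm, units = "hours"))

hours_duration  # 24
hours_period    # 23, because the clocks jumped forward overnight
```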

Like durations, periods can be created with a number of friendly constructor functions or with a general constructor:

In [65]:
days(1)

years(3)

period(3, 'years')

period(year = 3, month = 6)

You can add and multiply periods:

In [68]:
10 * (months(6) + days(1))

days(50) + hours(25) + minutes(2)

And of course, add them to dates. Compared to durations, periods are more likely to do what you expect:

In [70]:
# duration
ymd(20011006) + dyears(1)

# period
ymd(20011006) + years(1)

[1] "2002-10-06 06:00:00 UTC"

### 4.3 Intervals

An interval is a duration with a starting point:

In [72]:
# interval from my birth date until today
ymd(20011006) %--% today()

It’s obvious what `dyears(1) / ddays(365.25)` should return: one, because durations are always represented by a number of seconds, and a duration of a year is defined as 365.25 days' worth of seconds in the lubridate version used here (older versions used 365 days).

What should `years(1) / days(1)` return? Well, if the year was 2015 it should return 365, but if it was 2016, it should return 366! There’s not quite enough information for lubridate to give a single clear answer. What it does instead is give an estimate, with a warning:

In [74]:
years(1) / days(1)

If you want a more accurate measurement, you’ll have to use an interval. An interval is a duration with a starting point: that makes it precise so you can determine exactly how long it is:

In [75]:
next_year <- today() + years(1)
(today() %--% next_year) / ddays(1)


To find out how many periods fall into an interval, you need to use integer division:

In [76]:
(today() %--% next_year) %/% days(1)
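Because an interval is anchored to real dates, it can tell a common year from a leap year. A short sketch with fixed dates, so the result does not depend on when it is run (`common_year` and `leap_year` are illustrative names):

```r
library(lubridate)

common_year <- ymd("2015-01-01") %--% ymd("2016-01-01")
leap_year   <- ymd("2016-01-01") %--% ymd("2017-01-01")

# Exact length in days: interval divided by a one-day duration.
days_common <- common_year / ddays(1)  # 365
days_leap   <- leap_year / ddays(1)    # 366

days_common
days_leap

# Whole one-day periods that fit into the interval.
leap_year %/% days(1)
```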

### 4.4 Summary

How do you pick between durations, periods, and intervals? As always, pick the simplest data structure that solves your problem. If you only care about physical time, use a duration; if you need to add human times, use a period; if you need to figure out how long a span is in human units, use an interval.

Figure 16.1 summarises permitted arithmetic operations between the different data types.  
![](https://d33wubrfki0l68.cloudfront.net/0020136325ea844476bc61eb7e95d2ac5aeebf00/893e9/diagrams/datetimes-arithmetic.png)

# 5. Timezones

Time zones are an enormously complicated topic because of their interaction with geopolitical entities. Fortunately we don’t need to dig into all the details as they’re not all important for data analysis, but there are a few challenges we’ll need to tackle head on.

The first challenge is that everyday names of time zones tend to be ambiguous. For example, if you’re American you’re probably familiar with EST, or Eastern Standard Time. However, both Australia and Canada also have EST! To avoid confusion, R uses the international standard IANA time zones. These use a consistent naming scheme “<area>/<location>”, typically in the form “<continent>/<city>” (there are a few exceptions because not every country lies on a continent). Examples include “America/New_York”, “Europe/Paris”, and “Pacific/Auckland”.

You might wonder why the time zone uses a city, when typically you think of time zones as associated with a country or region within a country. This is because the IANA database has to record decades' worth of time zone rules. In the course of decades, countries change names (or break apart) fairly frequently, but city names tend to stay the same. Another problem is that the name needs to reflect not only the current behaviour, but also the complete history. For example, there are time zones for both “America/New_York” and “America/Detroit”. These cities both currently use Eastern Standard Time, but in 1969-1972 Michigan (the state in which Detroit is located) did not follow DST, so it needs a different name. It’s worth reading the raw time zone database (available at http://www.iana.org/time-zones) just to read some of these stories!

You can find out what R thinks your current time zone is with `Sys.timezone()`:

In [77]:
Sys.timezone()

(If R doesn’t know, you’ll get an NA.)

And see the complete list of all time zone names with `OlsonNames()`:

In [79]:
OlsonNames()

In R, the time zone is an attribute of the date-time that only controls printing. For example, these three objects represent the same instant in time:

In [80]:
(x1 <- ymd_hms("2015-06-01 12:00:00", tz = "America/New_York"))
(x2 <- ymd_hms("2015-06-01 18:00:00", tz = "Europe/Copenhagen"))
(x3 <- ymd_hms("2015-06-02 04:00:00", tz = "Pacific/Auckland"))


[1] "2015-06-01 12:00:00 EDT"

[1] "2015-06-01 18:00:00 CEST"

[1] "2015-06-02 04:00:00 NZST"

You can verify that they’re the same time using subtraction:

In [81]:
x2 - x1

x3 - x2

Time difference of 0 secs

Time difference of 0 secs
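Another way to check this, sketched below, is to compare the underlying numeric values (seconds since 1970-01-01 UTC) and look at the time zone attribute separately:

```r
library(lubridate)

x1 <- ymd_hms("2015-06-01 12:00:00", tz = "America/New_York")
x2 <- ymd_hms("2015-06-01 18:00:00", tz = "Europe/Copenhagen")

# The stored instant is identical...
same_instant <- as.numeric(x1) == as.numeric(x2)
same_instant

# ...only the time zone attribute used for printing differs.
tz(x1)
tz(x2)
```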

Unless otherwise specified, lubridate always uses UTC. UTC (Coordinated Universal Time) is the standard time zone used by the scientific community and is roughly equivalent to its predecessor GMT (Greenwich Mean Time). It does not have DST, which makes it a convenient representation for computation. Operations that combine date-times, like `c()`, will often drop the time zone. In that case, the date-times will display in your local time zone:

In [82]:
c(x1, x2, x3)

[1] "2015-06-01 12:00:00 EDT" "2015-06-01 12:00:00 EDT"
[3] "2015-06-01 12:00:00 EDT"

You can change the time zone in two ways:

- Keep the instant in time the same, and change how it’s displayed. Use this when the instant is correct, but you want a more natural display.

In [84]:
# what time is it now in London?

with_tz(now(), tz = 'Europe/London')

[1] "2020-12-01 07:42:18 GMT"

- Change the underlying instant in time. Use this when you have an instant that has been labelled with the incorrect time zone, and you need to fix it.

In [85]:
instant_time <- now() 

instant_time - force_tz(instant_time, 'Europe/London')

Time difference of -7 hours
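The two approaches are easier to compare on a fixed timestamp than on `now()`. A minimal sketch, assuming a value recorded at noon UTC (`noon_utc`, `relabelled`, and `moved` are illustrative names):

```r
library(lubridate)

noon_utc <- ymd_hms("2015-06-01 12:00:00", tz = "UTC")

# with_tz(): same instant, displayed on a London clock (UTC+1 in June).
relabelled <- with_tz(noon_utc, "Europe/London")
relabelled
gap_with <- as.numeric(difftime(relabelled, noon_utc, units = "hours"))
gap_with   # 0: nothing moved, only the display changed

# force_tz(): keep the clock reading 12:00 but treat it as London time,
# which is a different instant.
moved <- force_tz(noon_utc, "Europe/London")
moved
gap_force <- as.numeric(difftime(noon_utc, moved, units = "hours"))
gap_force  # 1: noon in London is one hour before noon UTC in June
```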