# Reading data from Excel: The `readxl` package
R can read in data from Excel with the `readxl` package.

The commands below load the package and read an Excel file into an R object:

In [61]:
library(readxl)

In [62]:
runaways <- read_excel("./data/runaways_exampledata.xlsx")

The data is now loaded into R as a data frame (more precisely, a tibble):

In [63]:
dim(runaways)
head(runaways)

name,id,registered,escaped,returned
Peter vesmand,590,21/03/1712,29/10/1716,03/11/1716
Thomas Petersen,591,21/03/1712,17/10/1713,26/10/1713
Niels Jensen Skaaning,592,31/03/1712,25/08/1716,02/09/1716
Niels Jensen Skaaning,592,31/03/1712,21/09/1716,13/11/1716
[ ] Jens Sönner,594,26/04/1712,00/00/1716,
Magnus Bendixsen / Mogens,599,08/03/1713,09/07/1719,11/11/1720
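
By default `read_excel()` reads the first sheet of a workbook. As a hedged aside, the sketch below uses the example workbook that ships with `readxl` (not the runaways file) to show how to list the sheets and pick one by name with the `sheet` argument. Setting `col_types = "text"` is an optional choice made here for illustration; it imports every column as character.

```r
library(readxl)

# Path to an example workbook bundled with readxl (not the runaways data)
example_path <- readxl_example("datasets.xlsx")

# List the sheet names in the workbook
sheets <- excel_sheets(example_path)
sheets

# Read one sheet by name; col_types = "text" imports every column as character
iris_text <- read_excel(example_path, sheet = "iris", col_types = "text")

# Check the dimensions and the class of each column
dim(iris_text)
sapply(iris_text, class)
```

Checking the column classes right after import is a quick way to spot columns, such as dates, that arrived as text.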


What are the "registered", "escaped" and "returned" columns showing? What would be interesting to do with them?

In [64]:
head(runaways)

name,id,registered,escaped,returned
Peter vesmand,590,21/03/1712,29/10/1716,03/11/1716
Thomas Petersen,591,21/03/1712,17/10/1713,26/10/1713
Niels Jensen Skaaning,592,31/03/1712,25/08/1716,02/09/1716
Niels Jensen Skaaning,592,31/03/1712,21/09/1716,13/11/1716
[ ] Jens Sönner,594,26/04/1712,00/00/1716,
Magnus Bendixsen / Mogens,599,08/03/1713,09/07/1719,11/11/1720


In [65]:
class(runaways$registered)

In [66]:
# Stored as character - R does not know how to subtract text values!
runaways$escaped[1] - runaways$registered[1]

ERROR: Error in runaways$escaped[1] - runaways$registered[1]: non-numeric argument to binary operator


# BREAK?

![lion_chill](https://i.pinimg.com/736x/66/70/75/6670750ccf134bb4d0de4eb726a396e2.jpg)

# Working with dates: The `lubridate` package
We have worked with numeric and text classes before, but R also has a `Date` class.

Working with dates in base R can be a bit tricky, but `lubridate` makes it much simpler!
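
For comparison, here is a minimal sketch of the base R route, using two values copied from the first row of the runaways data. `as.Date()` needs a format string built from codes such as `%d`, `%m` and `%Y`, which is the part that tends to be fiddly. The variable names are chosen for illustration only.

```r
# Two date strings in the day/month/year layout used in the runaways data
registered <- "21/03/1712"
escaped    <- "29/10/1716"

# Base R needs an explicit format string:
# %d = day, %m = month, %Y = four-digit year
registered_date <- as.Date(registered, format = "%d/%m/%Y")
escaped_date    <- as.Date(escaped, format = "%d/%m/%Y")

class(registered_date)

# Subtraction now works because both values are Dates, not character strings
escaped_date - registered_date
```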

*Install and load the `lubridate` package.*

In [67]:
library(lubridate)


Attaching package: 'lubridate'

The following object is masked from 'package:base':

    date



In [68]:
# Some example dates - all stored as character
date1 <- "29 aug 1876"
date2 <- "1770-11-26"
date3 <- "12.26.1810"

lapply(c(date1, date2, date3), class)

Converting to dates with `lubridate` is very simple! You just need to know the order of the information in the date (year, month, day).

The main function for converting is `ymd()` (short for year-month-day). It takes a character object and converts it to a date.

There is a function for each ordering of year, month and day, so you just have to shuffle the letters around to fit the format:

In [69]:
date1 <- dmy(date1)
date2 <- ymd(date2)
date3 <- mdy(date3)

lapply(c(date1, date2, date3), class)
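
One thing to watch for, sketched below with hand-typed strings: when a value does not fit the chosen order, or is not a real calendar date (like the `00/00/1716` entry in the runaways data), `lubridate` returns `NA` and emits a warning rather than stopping. `suppressWarnings()` is used here only to keep the output short.

```r
library(lubridate)

# A valid date, an impossible date (day 0, month 0) and a missing value
raw <- c("21/03/1712", "00/00/1716", NA)

# Only the first element can be parsed; the others become NA
parsed <- suppressWarnings(dmy(raw))
parsed
is.na(parsed)

# Using the wrong order also gives NA: there is no month 21
wrong_order <- suppressWarnings(mdy("21/03/1712"))
wrong_order
```

Counting the `NA` values after conversion, for example with `sum(is.na(parsed))`, is a simple way to see how many entries failed to parse.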

Once values are stored as dates, it is easy to extract components with functions such as `year()`, `month()` and `day()`:

In [70]:
print(date1)
year(date1)
month(date1)
day(date1)
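# Locale names are platform-specific: "English" is a Windows name
# (on Linux/macOS, a name such as "en_US.UTF-8" may be needed instead)
wday(date1, label = TRUE, locale = "English")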

[1] "1876-08-29"


Date objects allow us to calculate time differences:

In [71]:
date1 - date2

Time difference of 38627 days

Subtracting one date from another creates a `difftime` object by default. `difftime` objects work well for shorter differences measured in fixed units such as days or hours. For longer differences, where months and years vary in length, it is more useful to work with `interval()`.

With an interval, `as.period()` expresses the span in days, months or years.

An interval can also be coerced directly to a number in a chosen unit with `as.numeric()`.

In [72]:
# Create time difference as interval
time_int <- interval(date2, date1)

# Display time differences with different units
as.period(time_int, unit = "days")
as.period(time_int, unit = "months")
as.period(time_int, unit = "years")

# Numeric coercion
as.numeric(time_int, "years")
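
As a small sketch of how these representations differ, using the same two dates as above: the `difftime` holds a fixed number of days, while the interval remembers its start and end dates and so can count whole calendar years. `time_length()` and the `%/%` operator for intervals are both part of `lubridate`; the variable names are chosen for illustration.

```r
library(lubridate)

start <- ymd("1770-11-26")
end   <- ymd("1876-08-29")

# difftime: a plain count of days
diff_days <- end - start
as.numeric(diff_days, units = "days")

# interval: anchored to the actual start and end dates
span <- interval(start, end)

# Length of the interval in years, as a decimal number
time_length(span, "years")

# Number of complete years that fit inside the interval
whole_years <- span %/% years(1)
whole_years
```

Integer division by `years(1)` is handy when only completed years matter, for example when computing an age.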

The `lubridate` functions are vectorized, so they also work on whole vectors and on data frame columns:

In [73]:
dates <- c("1876-12-21", "1873-11-01", "1885-01-30", "1842-06-10")
dates <- ymd(dates)
dates

date_ints <- interval(ymd("1800-01-01"), dates)
as.numeric(date_ints, "years")

# EXERCISE 6: WORKING WITH DATES

Make sure you have the runaways data loaded.

1. Convert the columns containing dates to the `Date` class using the proper variation of `ymd()`
2. Create a new column calculating the time difference in *days* between `registered` and `escaped`. Use `interval()` and `as.numeric()`
3. Determine the shortest and longest stays (you can use the `arrange()` command)

# Saving files with `readr`

Base R can already write files (for example with `write.csv()`), but the `readr` package is often more intuitive to use.

Check your working directory with `getwd()`. It can be changed with `setwd()`.

We can save a .csv file (comma-separated values) that opens correctly in Excel with `readr` using `write_excel_csv()`:

In [77]:
library(readr)
write_excel_csv(runaways, file = "my_data.csv", delim = ";", col_names = TRUE)

**Code breakdown:**

| Code | Description |
|:-----|:------------|
|`runaways` | The object we want to save |
|`file = "my_data.csv"` | The filename, ending in .csv (this argument was called `path` in `readr` versions before 1.4.0) |
|`delim = ";"` | Sets the separator between values to a semicolon instead of the default comma |
|`col_names = TRUE` | Writes the column names as the first row of the file |
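
To check what was actually written, the file can be read back in. The sketch below uses a small made-up data frame and a temporary file (both are assumptions for illustration, not part of the course data) and reads the result with `read_delim()`, telling it to expect semicolons. The filename is passed by position, which avoids depending on the argument name.

```r
library(readr)

# A small made-up data frame standing in for the real data
toy <- data.frame(
  name = c("Person A", "Person B"),
  id = c(1, 2)
)

# Write to a temporary file so nothing in the working directory is touched
tmp <- tempfile(fileext = ".csv")
write_excel_csv(toy, tmp, delim = ";")

# Read the file back with the same delimiter
# col_types = "cd" declares one character and one double column
toy_back <- read_delim(tmp, delim = ";", col_types = "cd")
toy_back
```

`readr` also provides `write_excel_csv2()`, which uses semicolons as the separator and a comma as the decimal mark, a convention common in many European Excel installations.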