# Section 05: Utilities
### `01-Mathematical utilities`

- `abs()`: Calculate the absolute value.
- `sum()`: Calculate the sum of all the values in a data structure.
- `mean()`: Calculate the arithmetic mean.
- `round()`: Round the values to 0 decimal places by default. Run `?round` in the console for variations of `round()` and ways to change the number of digits to round to.
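
As a quick illustration of the four functions above, the sketch below applies them to a small made-up vector `x` (an assumption for this example, not part of the exercise data). The `digits` argument of `round()` controls the number of decimal places.

```r
# Hypothetical example vector, chosen only for illustration
x <- c(1.234, -5.678, 9.999)

abs_x <- abs(x)                # 1.234 5.678 9.999
total <- sum(x)                # 5.555
avg   <- mean(x)               # about 1.8517
r0    <- round(x)              # 1 -6 10
r1    <- round(x, digits = 1)  # 1.2 -5.7 10.0

print(r1)
```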


Calculate the sum of the absolute rounded values of the training errors. You can work in parts, or with a single one-liner. There's no need to store the result in a variable, just have R print it.




In [1]:
# The errors vector has already been defined for you
errors <- c(1.9, -2.6, 4.0, -9.5, -3.4, 7.3)

# Sum of absolute rounded values of errors
sum(abs(round(errors)))

### `02-Find the error`





- Fix the error by including code on the last line. Remember: you want to call `mean()` only once!




In [2]:
# Don't edit these two lines
vec1 <- c(1.5, 2.5, 8.4, 3.7, 6.3)
vec2 <- rev(vec1)

# Fix the error
mean(c(abs(vec1), abs(vec2)))

### `03-Data Utilities`
- `seq()`: Generate sequences, by specifying the from, to, and by arguments.
- `rep()`: Replicate elements of vectors and lists.
- `sort()`: Sort a vector in ascending order. Works on numerics, but also on character strings and logicals.
- `rev()`: Reverse the elements in a data structure for which reversal is defined.
- `str()`: Display the structure of any R object.
- `append()`: Merge vectors or lists.
- `is.*()`: Check for the class of an R object.
- `as.*()`: Convert an R object from one class to another.
- `unlist()`: Flatten (possibly embedded) lists to produce a vector.
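
A minimal sketch of the utilities listed above, using small made-up inputs (all values here are assumptions chosen for illustration):

```r
s      <- seq(from = 2, to = 10, by = 4)   # 2 6 10
r_all  <- rep(c(1, 2), times = 2)          # 1 2 1 2
r_each <- rep(c(1, 2), each = 2)           # 1 1 2 2
sorted <- sort(c("b", "c", "a"))           # "a" "b" "c"
revd   <- rev(c(TRUE, FALSE, FALSE))       # FALSE FALSE TRUE
joined <- append(c(1, 2), c(3, 4))         # 1 2 3 4
is_num <- is.numeric("1")                  # FALSE: "1" is a character string
as_num <- as.numeric("1")                  # 1
flat   <- unlist(list(1, list(2, 3)))      # 1 2 3

# str() prints a compact summary of an object's structure
str(list(a = 1, b = "x"))
```

Note the difference between `times` (repeat the whole vector) and `each` (repeat every element in place) in `rep()`.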


#### `Question :`
- Convert both `linkedin` and `facebook` lists to a vector, and store them as `li_vec` and `fb_vec` respectively.
- Next, append `fb_vec` to the `li_vec` (Facebook data comes last). Save the result as `social_vec`.
- Finally, sort `social_vec` from high to low. Print the resulting vector.

In [3]:
# The linkedin and facebook lists have already been created for you
linkedin <- list(16, 9, 13, 5, 2, 17, 14)
facebook <- list(17, 7, 5, 16, 8, 13, 14)

# Convert linkedin and facebook to a vector: li_vec and fb_vec
li_vec <- unlist(linkedin)
fb_vec <- unlist(facebook)
# Append fb_vec to li_vec: social_vec
social_vec <- append(li_vec, fb_vec)

# Sort social_vec
sort(social_vec, decreasing = TRUE)

### `04-Find the error (2)`
- Correct the expression. Make sure that your fix still uses the functions `rep()` and `seq()`.




In [4]:
# Fix me
rep(seq(1, 7, by = 2), times = 7)

### `05-Beat Gauss using R`

- Using the function `seq()`, create a sequence that ranges from 1 to 500 in increments of 3. Assign the resulting vector to a variable `seq1`.
- Again with the function `seq()`, create a sequence that ranges from 1200 to 900 in increments of -7. Assign it to a variable `seq2`.
- Calculate the total sum of the sequences, either by using the `sum()` function twice and adding the two results, or by first concatenating the sequences and then using the `sum()` function once. Print the result to the console.

In [5]:
# Create first sequence: seq1
seq1 <- seq(1, 500, by = 3)

# Create second sequence: seq2
seq2 <- seq(1200, 900, by = -7)

# Calculate total sum of the sequences
sum(seq1, seq2)
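
The exercise title alludes to the closed-form sum of an arithmetic series, n * (first + last) / 2. As a sanity check, the sketch below compares that formula with `sum()`. The helper `arith_sum()` is a hypothetical name introduced here, not part of the exercise.

```r
# Hypothetical helper: closed-form sum of an arithmetic sequence
arith_sum <- function(s) length(s) * (s[1] + s[length(s)]) / 2

seq1 <- seq(1, 500, by = 3)
seq2 <- seq(1200, 900, by = -7)

formula_total <- arith_sum(seq1) + arith_sum(seq2)
direct_total  <- sum(seq1, seq2)

formula_total == direct_total  # TRUE
direct_total                   # 87029
```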

### `06-grepl & grep`

- Use `grepl()` to generate a vector of logicals that indicates whether these email addresses contain `"edu"`. Print the result to the output.
- Do the same thing with `grep()`, but this time save the resulting indexes in a variable `hits`.
- Use the variable `hits` to select from the `emails` vector only the emails that contain `"edu"`.

In [6]:
# The emails vector has already been defined for you
emails <- c("john.doe@ivyleague.edu", "education@world.gov", "dalai.lama@peace.org",
            "invalid.edu", "quant@bigdatacollege.edu", "cookie.monster@sesame.tv")

# Use grepl() to match for "edu"
grepl(pattern = "edu", x = emails)

# Use grep() to match for "edu", save result to hits
hits <- grep(pattern = "edu", x = emails)

# Subset emails using hits
emails[hits]
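
`grep()` can also return the matching elements directly through its `value` argument, and `which(grepl(...))` yields the same indices as `grep()`. A small sketch on the same `emails` vector follows. Note that the plain pattern `"edu"` also matches `"education@world.gov"` and `"invalid.edu"`, neither of which is a valid .edu address.

```r
emails <- c("john.doe@ivyleague.edu", "education@world.gov", "dalai.lama@peace.org",
            "invalid.edu", "quant@bigdatacollege.edu", "cookie.monster@sesame.tv")

hits    <- grep("edu", emails)                # 1 2 4 5
same    <- which(grepl("edu", emails))        # same indices as hits
matched <- grep("edu", emails, value = TRUE)  # the matching strings themselves

identical(matched, emails[hits])              # TRUE
```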

### `07-grepl & grep (2)`

You can use the caret, `^`, and the dollar sign, `$`, to match content located at the start and end of a string, respectively. This takes us one step closer to a pattern that matches only the ".edu" email addresses in our list of emails. But more can be added to make the pattern more robust:

- `@`, because a valid email must contain an at-sign.
- `.*`, which matches any character (.) zero or more times (*). Both the dot and the asterisk are metacharacters. You can use them to match any character between the at-sign and the ".edu" portion of an email address.
- `\\.edu$`, to match the ".edu" part of the email at the end of the string. The `\\` part escapes the dot: it tells R that you want to use the `.` as an actual character.
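
To see what each piece contributes, the sketch below builds the pattern up step by step on four made-up strings (the vector `x` is an assumption for illustration only):

```r
x <- c("a@b.edu", "a@bedu", "edu@b.org", "b.edu")

any_edu   <- grepl("edu", x)          # TRUE  TRUE  TRUE  TRUE
end_edu   <- grepl("edu$", x)         # TRUE  TRUE  FALSE TRUE
start_edu <- grepl("^edu", x)         # FALSE FALSE TRUE  FALSE
loose_dot <- grepl(".edu$", x)        # TRUE  TRUE  FALSE TRUE  (unescaped dot matches any character)
real_dot  <- grepl("\\.edu$", x)      # TRUE  FALSE FALSE TRUE
full      <- grepl("@.*\\.edu$", x)   # TRUE  FALSE FALSE FALSE
```

Only the last pattern requires both an at-sign and a literal ".edu" at the end of the string.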

#### `Questions :`
- Use `grepl()` with the more advanced regular expression to return a logical vector. Simply print the result.
- Do a similar thing with `grep()` to create a vector of indices. Store the result in the variable `hits`.
- Use `emails[hits]` again to subset the emails vector.

In [7]:
# The emails vector has already been defined for you
emails <- c("john.doe@ivyleague.edu", "education@world.gov", "dalai.lama@peace.org",
            "invalid.edu", "quant@bigdatacollege.edu", "cookie.monster@sesame.tv")

# Use grepl() to match for .edu addresses more robustly
grepl(pattern = "@.*\\.edu$", x = emails)

# Use grep() to match for .edu addresses more robustly, save result to hits
hits <- grep(pattern = "@.*\\.edu$", x = emails)

# Subset emails using hits
emails[hits]

### `08-sub & gsub`
- With the advanced regular expression `"@.*\\.edu$"`, use `sub()` to replace the match with `"@datacamp.edu"`. Since there will only be one match per character string, `gsub()` is not necessary here. Inspect the resulting output.

In [8]:
# The emails vector has already been defined for you
emails <- c("john.doe@ivyleague.edu", "education@world.gov", "global@peace.org",
            "invalid.edu", "quant@bigdatacollege.edu", "cookie.monster@sesame.tv")

# Use sub() to convert the email domains to datacamp.edu
sub(pattern = "@.*\\.edu$", replacement = "@datacamp.edu", x = emails)
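
The difference between the two functions only shows up when a string contains several matches: `sub()` replaces the first match, `gsub()` replaces all of them. A minimal sketch on a made-up string:

```r
x <- "banana"

first_only <- sub("a", "o", x)    # "bonana"
all_hits   <- gsub("a", "o", x)   # "bonono"
```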

### `09-sub & gsub (2)`
- `.*`: A usual suspect! It can be read as "any character that is matched zero or more times".
- `\\s`: Match a whitespace character. The "s" is normally a literal character; escaping it (`\\`) makes it a metacharacter.
- `[0-9]+`: Match one or more (`+`) digits from 0 to 9.
- `([0-9]+)`: The parentheses make part of the matched string available to the replacement. The `\\1` in the `replacement` argument of `sub()` is set to the string captured by `[0-9]+`.
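
A small sketch of these building blocks on made-up strings (the vector `x` is an assumption for illustration). Because `.*` is greedy, the capture group ends up holding the last number that is surrounded by whitespace, and strings without a match are returned unchanged:

```r
x <- c("order 66 shipped", "a 1 b 2 c", "no digits here")

captured <- sub(".*\\s([0-9]+)\\s.*", "\\1", x)
captured   # "66" "2" "no digits here"
```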


In [9]:
awards <- c("Won 1 Oscar.",
  "Won 1 Oscar. Another 9 wins & 24 nominations.",
  "1 win and 2 nominations.",
  "2 wins & 3 nominations.",
  "Nominated for 2 Golden Globes. 1 more win & 2 nominations.",
  "4 wins & 1 nomination.")

sub(".*\\s([0-9]+)\\snomination.*$", "\\1", awards)

### `10-Right here, right now`

- Ask R for the current date, and store the result in a variable `today`.
- To see what `today` looks like under the hood, call `unclass()` on it.
- Ask R for the current time, and store the result in a variable, `now`.
- To see the numerical value that corresponds to `now`, call `unclass()` on it.

In [10]:
# Get the current date: today
today <- Sys.Date()

# See what today looks like under the hood
unclass(today)

# Get the current time: now
now <- Sys.time()

# See what now looks like under the hood
unclass(now)
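
`unclass()` reveals that a `Date` is stored as the number of days since 1 January 1970, and a `POSIXct` as the number of seconds since the start of that day in UTC. The sketch below uses fixed dates (chosen here for illustration) so the numbers are reproducible:

```r
day_zero <- unclass(as.Date("1970-01-01"))   # 0
day_ten  <- unclass(as.Date("1970-01-11"))   # 10

# One minute after the epoch, expressed in UTC
one_minute <- as.POSIXct("1970-01-01 00:01:00", tz = "UTC")
secs <- as.numeric(one_minute)               # 60
```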

### `11-Create and format dates`

To create a `Date` object from a simple character string in R, you can use the `as.Date()` function. The character string has to obey a format that can be defined using a set of symbols (the examples correspond to 13 January, 1982):

- `%Y`: 4-digit year (1982)
- `%y`: 2-digit year (82)
- `%m`: 2-digit month (01)
- `%d`: 2-digit day of the month (13)
- `%A`: weekday (Wednesday)
- `%a`: abbreviated weekday (Wed)
- `%B`: month (January)
- `%b`: abbreviated month (Jan)
    

In [11]:
as.Date("1982-01-13")
as.Date("Jan-13-82", format = "%b-%d-%y")
as.Date("13 January, 1982", format = "%d %B, %Y")
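
The same symbols work in the other direction with `format()`. The sketch below sticks to the numeric symbols, because the names parsed or produced by `%a`, `%A`, `%b` and `%B` depend on the session's locale. It also shows that a string that does not fit the supplied format yields `NA`. The input strings are made up for illustration.

```r
d <- as.Date("13/01/1982", format = "%d/%m/%Y")

iso   <- format(d, "%Y-%m-%d")   # "1982-01-13"
short <- format(d, "%d.%m.%y")   # "13.01.82"

# Month 13 does not exist, so parsing fails and returns NA
bad <- as.Date("1982-13-01", format = "%Y-%m-%d")
is.na(bad)                       # TRUE
```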

- Three character strings representing dates have been created for you. Convert them to dates using `as.Date()`, and assign them to `date1`, `date2`, and `date3` respectively. The code for `date1` is already included.
- Extract useful information from the dates as character strings using `format()`. From the first date, select the weekday. From the second date, select the day of the month. From the third date, you should select the abbreviated month and the 4-digit year, separated by a space.

In [12]:
# Definition of character strings representing dates
str1 <- "May 23, '96"
str2 <- "2012-03-15"
str3 <- "30/January/2006"

# Convert the strings to dates: date1, date2, date3
date1 <- as.Date(str1, format = "%b %d, '%y")
date2 <- as.Date(str2)
date3 <- as.Date(str3, format = "%d/%B/%Y")


# Convert dates to formatted strings
format(date1, "%A")
format(date2, "%d")
format(date3, "%b %Y")

### `12-Create and format times`
Similar to working with dates, you can use `as.POSIXct()` to convert from a character string to a `POSIXct` object, and `format()` to convert from a `POSIXct` object to a character string. Again, you have a wide variety of symbols:

- `%H`: hours as a decimal number (00-23)
- `%I`: hours as a decimal number (01-12)
- `%M`: minutes as a decimal number
- `%S`: seconds as a decimal number
- `%T`: shorthand notation for the typical format %H:%M:%S
- `%p`: AM/PM indicator
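
A minimal sketch of these symbols on a fixed timestamp. The time zone is set to `"UTC"` here as an assumption, so that the result does not depend on the machine's local zone. `%p` is left out because the AM/PM text it produces depends on the locale.

```r
stamp <- as.POSIXct("2012-03-12 14:23:08", format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

h24  <- format(stamp, "%H:%M")   # "14:23"
h12  <- format(stamp, "%I:%M")   # "02:23"
full <- format(stamp, "%T")      # "14:23:08"
secs <- format(stamp, "%S")      # "08"
```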

For a full list of conversion symbols, consult the `strptime` documentation by running `?strptime` in the console.

- Convert two strings that represent timestamps, `str1` and `str2`, to `POSIXct` objects called `time1` and `time2`.
- Using `format()`, create a string from `time1` containing only the minutes.
- From `time2`, extract the hours and minutes as "hours:minutes AM/PM". Refer to the assignment text above to find the correct conversion symbols!




In [13]:
# Definition of character strings representing times
str1 <- "May 23, '96 hours:23 minutes:01 seconds:45"
str2 <- "2012-3-12 14:23:08"

# Convert the strings to POSIXct objects: time1, time2
time1 <- as.POSIXct(str1, format = "%B %d, '%y hours:%H minutes:%M seconds:%S")

time2 <- as.POSIXct(str2, format = "%Y-%m-%d %H:%M:%S")


# Convert times to formatted strings
format(time1, "%M")
format(time2, "%I:%M %p")

### `13-Calculations with Dates`
- Calculate the number of days that passed between the last and the first day you ate pizza. Print the result.
- Use the function `diff()` on `pizza` to calculate the differences between consecutive pizza days. Store the result in a new variable `day_diff`.
- Calculate the average period between two consecutive pizza days. Print the result.

In [14]:
day1 <- as.Date("2022-05-19")
day2 <- as.Date("2022-05-21")
day3 <- as.Date("2022-05-26")
day4 <- as.Date("2022-06-01")
day5 <- as.Date("2022-06-06")

In [15]:
# day1, day2, day3, day4 and day5 are already available in the workspace

# Difference between last and first pizza day
day5 - day1

# Create vector pizza
pizza <- c(day1, day2, day3, day4, day5)

# Create differences between consecutive pizza days: day_diff
day_diff <- diff(pizza)
day_diff

# Average period between two consecutive pizza days
mean(day_diff)

Time difference of 18 days

Time differences in days
[1] 2 5 6 5

Time difference of 4.5 days
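
Subtracting two `Date` objects returns a `difftime` object. If a plain number or a different unit is needed, `as.numeric()` and `difftime()` with an explicit `units` argument are one way to get there. A small sketch using the first and last pizza days:

```r
day1 <- as.Date("2022-05-19")
day5 <- as.Date("2022-06-06")

span   <- day5 - day1
n_days <- as.numeric(span)                          # 18

in_weeks <- difftime(day5, day1, units = "weeks")
n_weeks  <- as.numeric(in_weeks)                    # about 2.571
```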

### `14-Calculations with Times`

- Calculate the difference between the two vectors `logout` and `login`, i.e. the time the user was online in each independent session. Store the result in a variable `time_online`.
- Inspect the variable `time_online` by printing it.
- Calculate the total time that the user was online. Print the result.
- Calculate the average time the user was online. Print the result.

In [16]:
dt_conv <- function(x) {
    as.POSIXct(x, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
}

# Logins
str1 <- "2022-05-23 10:18:04"
str2 <- "2022-05-28 09:14:18"
str3 <- "2022-05-28 12:21:51"
str4 <- "2022-05-28 12:37:24"
str5 <- "2022-05-30 21:37:55"
login <- c(str1, str2, str3, str4, str5)
login <- dt_conv(login)

# Logouts
s1 <- "2022-05-23 10:56:29"
s2 <- "2022-05-28 09:14:52"
s3 <- "2022-05-28 12:35:48"
s4 <- "2022-05-28 13:17:22"
s5 <- "2022-05-30 22:08:47"
logout <- c(s1, s2, s3, s4, s5)
logout <- dt_conv(logout)


In [17]:
# login and logout are already defined in the workspace
# Calculate the difference between login and logout: time_online
time_online <- logout - login

# Inspect the variable time_online
time_online

# Calculate the total time online
sum(time_online)

# Calculate the average time online
mean(time_online)

Time differences in secs
[1] 2305   34  837 2398 1852

Time difference of 7426 secs

Time difference of 1485.2 secs
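
When two `POSIXct` vectors are subtracted, R picks the unit automatically (seconds here, because the shortest session lasted under a minute). To control the unit, one option is `difftime()` with an explicit `units` argument, or `as.numeric()` with `units` to convert an existing `difftime`. A minimal sketch, reusing the second session and the total from the output above:

```r
login  <- as.POSIXct("2022-05-28 09:14:18", tz = "UTC")
logout <- as.POSIXct("2022-05-28 09:14:52", tz = "UTC")

# Ask for seconds explicitly instead of relying on the automatic choice
session      <- difftime(logout, login, units = "secs")
session_secs <- as.numeric(session)                    # 34

# Convert the total of 7426 seconds to minutes
total      <- as.difftime(7426, units = "secs")
total_mins <- as.numeric(total, units = "mins")        # about 123.77
```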

### `15-Time is of the essence`


In [18]:
astro <- c("20-Mar-2015", "25-Jun-2015", "23-Sep-2015", "22-Dec-2015")
names(astro) <- c("spring", "summer", "fall", "winter")
str(astro)

 Named chr [1:4] "20-Mar-2015" "25-Jun-2015" "23-Sep-2015" "22-Dec-2015"
 - attr(*, "names")= chr [1:4] "spring" "summer" "fall" "winter"


In [21]:
meteo <- c("March 1, 15", "June 1, 15", "September 1, 15", "December 1, 15")
names(meteo) <- c("spring", "summer", "fall", "winter")
str(meteo)

 Named chr [1:4] "March 1, 15" "June 1, 15" "September 1, 15" ...
 - attr(*, "names")= chr [1:4] "spring" "summer" "fall" "winter"


- Use `as.Date()` to convert the `astro` vector to a vector containing `Date` objects. You will need the `%d`, `%b` and `%Y` symbols to specify the `format`. Store the resulting vector as `astro_dates`.
- Use `as.Date()` to convert the `meteo` vector to a vector with `Date` objects. This time, you will need the `%B`, `%d` and `%y` symbols for the `format` argument. Store the resulting vector as `meteo_dates`.
- With a combination of `max()`, `abs()` and `-`, calculate the maximum absolute difference between the astronomical and the meteorological beginnings of a season, i.e. `astro_dates` and `meteo_dates`. Simply print this maximum difference to the console output.

In [22]:
# Convert astro to vector of Date objects: astro_dates
astro_dates <- as.Date(astro, format = "%d-%b-%Y")

# Convert meteo to vector of Date objects: meteo_dates
meteo_dates <- as.Date(meteo, format = "%B %d, %y")


# Calculate the maximum absolute difference between astro_dates and meteo_dates
max(abs(astro_dates - meteo_dates))

Time difference of 24 days

### `The End `