In this notebook, we will cover:

* [String basics and length](#String-basics-and-length)
* [Combining strings](#Combining-strings)
* [Subsetting strings](#Subsetting-strings)
* [Locales](#Locales)

In [4]:
library(tidyverse)
library(stringr)

Loading tidyverse: ggplot2
Loading tidyverse: tibble
Loading tidyverse: tidyr
Loading tidyverse: readr
Loading tidyverse: purrr
Loading tidyverse: dplyr
Conflicts with tidy packages ---------------------------------------------------
filter(): dplyr, stats
lag():    dplyr, stats


# String basics and length

In [5]:
(mystring <- "STATS 306")

In [6]:
(mystring2 <- 'STATS 306')

You can create strings using double quotes or single quotes; there is no difference. However, for consistency, you might want to stick to double quotes except in cases when your string itself contains double quotes.

In [9]:
(mystring3 <- '"MLE" stands for "Maximum Likelihood Estimate"')

You can also create a string that contains double quotes while using double quotes to delimit it, but you will have to *escape* every inner double quote with a backslash!

In [11]:
(mystring3 <- "\"MLE\" stands for \"Maximum Likelihood Estimate\"")
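As a quick check, the sketch below compares the two ways of writing the same string. The variable names `single_quoted`, `double_quoted`, and `same_string` are made up for this illustration.

```r
# Same text written with single-quote delimiters and with escaped double quotes
single_quoted <- '"MLE" stands for "Maximum Likelihood Estimate"'
double_quoted <- "\"MLE\" stands for \"Maximum Likelihood Estimate\""

# The delimiters and escapes are only source-code syntax; the stored strings match
same_string <- identical(single_quoted, double_quoted)
same_string  # TRUE
```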

Because the backslash is the escape character, a literal backslash in a string must itself be escaped.

In [12]:
(mystring4 <- "\\ is the backslash character")

In [21]:
(mystrings <- c("\"", '"', '\'', "'", "\\", "\\/")) # note the use of c() to create a character vector

The printed representation of strings shows the escapes.

In [14]:
print(mystrings)

[1] "\""  "\""  "'"   "'"   "\\"  "\\/"


Use `writeLines()` to see the raw contents of the string. 

In [15]:
writeLines(mystrings)

"
"
'
'
\
\/


It is good to know about a few more escape sequences.

In [16]:
writeLines("First line\nSecond line") # newline

First line
Second line


In [17]:
writeLines("Text\n\tIndented Text\nText") # newlines and tab

Text
	Indented Text
Text


You can also write a character by its Unicode code point using `\u`. For example, the copyright sign has the hexadecimal code point `00A9`. Wikipedia has [a complete list](https://en.wikipedia.org/wiki/List_of_Unicode_characters).

In [22]:
writeLines("\u00A9")

©
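If you want to go back and forth between a character and its code point, base R's `utf8ToInt()` and `intToUtf8()` can help. This is a small sketch; the variable names are chosen only for this example.

```r
# Code point of the copyright sign, as an integer (hex 00A9 is decimal 169)
code_point <- utf8ToInt("\u00A9")
code_point  # 169

# Convert the integer code point back into a one-character string
copyright <- intToUtf8(0x00A9)
writeLines(copyright)
```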


Base R has string functions, but we will avoid them and use the ones in the `stringr` package instead. They all start with `str_`.

In [23]:
str_length(c("a", "character", "vector"))
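Note that `str_length()` counts the characters actually stored in the string, not the characters typed in the source code, so an escape sequence such as `\\` or `\n` counts as one character. The sketch below also shows that a missing value has missing length. The names `escaped` and `n_chars` are made up for this example.

```r
library(stringr)

# A backslash, a double quote, a three-character string with a newline, and NA
escaped <- c("\\", "\"", "a\nb", NA)

# Each escape sequence is a single character; NA stays NA
n_chars <- str_length(escaped)
n_chars  # 1 1 3 NA
```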

# Combining strings

In [24]:
str_c("Let us con", "catenate strings!")

In [30]:
mystrings <- c("one", "two", "ten")

In [31]:
str_c("*** ", mystrings , " ***") # the length-1 strings are recycled to match the length of mystrings

In [33]:
mystrings_na <- c("one", "two", NA)

In [35]:
str_c("*** ", mystrings_na, " ***") # missingness is contagious!

In [36]:
str_c("*** ", str_replace_na(mystrings_na), " ***") # converts missing values to the string "NA"
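If the literal text "NA" is not what you want in the output, `str_replace_na()` accepts a `replacement` argument. A minimal sketch, where the placeholder text `"(missing)"` and the name `labelled` are arbitrary choices for this example:

```r
library(stringr)

mystrings_na <- c("one", "two", NA)

# Replace missing values with a custom placeholder before concatenating
labelled <- str_c("*** ", str_replace_na(mystrings_na, replacement = "(missing)"), " ***")
labelled
```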

In [47]:
str_c("one", "two", "ten", sep = ", ") # can provide a separator

In [49]:
str_c(mystrings, sep = ", ") # why does this not combine the strings?

In [50]:
str_c(mystrings, collapse = ", ") # use collapse if the strings you want to combine are in a vector
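The difference is that `sep` goes between the *arguments* of `str_c()`, while `collapse` goes between the *elements* of the resulting vector. When `mystrings` is the only argument, there is nothing for `sep` to separate. The sketch below puts the two side by side and then uses both together; the names `unchanged`, `joined`, and `both` are made up for this example.

```r
library(stringr)

mystrings <- c("one", "two", "ten")

# Only one argument, so sep has nothing to do: still a vector of length 3
unchanged <- str_c(mystrings, sep = ", ")

# collapse joins the elements of the result into a single string
joined <- str_c(mystrings, collapse = ", ")

# sep is applied first (between arguments), then collapse (between elements)
both <- str_c("item", mystrings, sep = "-", collapse = ", ")
both  # "item-one, item-two, item-ten"
```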

# Subsetting strings

# Locales