# String Manipulation

## Printing strings

Beware that the printed representation of a string is not the same as the string itself, because the printed representation shows the escapes. To see the raw contents of the string, use `writeLines()`:

```R
print('\\')       # printed representation: the backslash is shown escaped
#> [1] "\\"
writeLines('\\')  # raw contents: a single backslash
#> \
```
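To confirm that the doubled backslash is only a display artifact, one can count the characters in the string and inspect its code point. A minimal sketch using base R's `nchar()` and `utf8ToInt()`:

```R
s <- '\\'        # the literal is written with an escape...
nchar(s)         # ...but the string holds 1 character
utf8ToInt(s)     # 92, the code point of a backslash
```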


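## toString

`toString()` is a helper function for `format()` that produces a single character string describing an R object. It returns a character vector of length 1; the elements of a vector are joined with `", "`.

```R
clans <- c('Dirilis', 'Iron Core')
toString(clans)
#> [1] "Dirilis, Iron Core"

toString(1:10)
#> [1] "1, 2, 3, 4, 5, 6, 7, 8, 9, 10"
```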
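For a plain atomic vector, the result should match `paste()` with `collapse = ", "`. The check below is a quick illustration of that equivalence, not a claim about how `toString()` is implemented internally:

```R
clans <- c('Dirilis', 'Iron Core')
a <- toString(clans)
b <- paste(clans, collapse = ", ")
identical(a, b)   # TRUE
```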

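## Checking strings

**`is.character`** returns `TRUE` if its argument is a character vector and `FALSE` otherwise.

```R
is.character('3fad')
#> [1] TRUE

is.character(3 + 2i)   # a complex number, not a string
#> [1] FALSE

is.character(c('1', 'Hello'))
#> [1] TRUE
```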
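`is.character()` checks the type of an object, not what its values look like. Factors are a common surprise: they display as text but are stored as integer codes with labels, so they fail the test until converted with `as.character()`:

```R
f <- factor(c('low', 'high', 'low'))
is.character(f)                # FALSE: a factor is not a character vector
is.character(as.character(f))  # TRUE after conversion
is.character('42')             # TRUE: digits inside quotes are still a string
```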

## Converting to strings

**`as.character`** converts its argument to a character vector.

```R
as.character(1)
#> [1] "1"

values <- c(31, 35, 30)
# vectorized: each element is converted to character
as.character(values)
#> [1] "31" "35" "30"
```
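The reverse conversion uses `as.numeric()`. A string that does not look like a number becomes `NA` with a warning; the sketch below suppresses the warning only so the example runs quietly:

```R
chars <- as.character(c(31, 35, 30))
nums  <- as.numeric(chars)                     # back to numeric: 31 35 30
bad   <- suppressWarnings(as.numeric("3fad"))  # NA: not a valid number
```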

## Concatenating strings

Concatenate vectors after converting to character.

```R
paste(..., sep = " ", collapse = NULL)
```

Parameters:  
* **`...`**: one or more R objects, converted to character vectors and concatenated term by term.

* **`sep`**: the string placed between the terms. It is optional and defaults to a single space `" "`.

* **`collapse`**: an optional string used to join the elements of the result into a single string. With the default `NULL`, the result stays a character vector as long as the longest argument.

```R
paste('Hello', 'World')
#> [1] "Hello World"

paste('VN', 'Pikachu', sep = '-')
#> [1] "VN-Pikachu"
```

```R
# vectorized: elements are combined position by position
paste(c('1', '2'), c('a', 'b'), sep = '-')
#> [1] "1-a" "2-b"

# vectorized, then collapsed into a single string
paste(c('Hello', 'World'), c('R', 'Programming'), sep = '-', collapse = ', ')
#> [1] "Hello-R, World-Programming"
```
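Base R also provides `paste0()`, which behaves like `paste()` with `sep = ""`. Shorter arguments are recycled, which makes it handy for building labels:

```R
paste0('VN', 'Pikachu')          # "VNPikachu"
labels <- paste0('item', 1:3)    # "item1" "item2" "item3"
identical(labels, paste('item', 1:3, sep = ''))  # TRUE
```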

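## Repeating strings

`rep()` replicates the elements of a vector; `rep.int()` and `rep_len()` are faster, simplified versions for two common cases.
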
```R
rep(x, ...)

rep.int(x, times)

rep_len(x, length.out)
```
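For the default method, the `...` of `rep()` can include `times`, `each` and `length.out`. The sketch below contrasts them on a two-element vector, with the results in the comments:

```R
x <- c('a', 'b')
rep(x, times = 2)   # "a" "b" "a" "b"
rep(x, each = 2)    # "a" "a" "b" "b"
rep_len(x, 5)       # "a" "b" "a" "b" "a"
rep.int(x, 3)       # "a" "b" "a" "b" "a" "b"
```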

```R
rep('abc', 3)
#> [1] "abc" "abc" "abc"
```
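Note that `rep()` returns a vector of copies rather than one longer string. To repeat the characters inside a single string, base R provides `strrep()`; collapsing the output of `rep()` with `paste()` gives the same result:

```R
strrep('abc', 3)                      # "abcabcabc"
paste(rep('abc', 3), collapse = '')   # "abcabcabc"
strrep('-', 1:3)                      # "-" "--" "---": vectorized over times
```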

## Formatting

```R
format(x, digits, nsmall, scientific, width, justify = c("left", "right", "centre", "none")) 
```

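* **`x`** is the object to format, typically a numeric or character vector.

* **`digits`** is how many significant digits to use; the last digit shown is rounded. The default comes from `getOption("digits")`, normally 7.

* **`nsmall`** is the minimum number of digits to the right of the decimal point when formatting real numbers in non-scientific notation.

* **`scientific`** is set to `TRUE` to force scientific notation (or `FALSE` to suppress it).

* **`width`** is the minimum field width; shorter results are padded with blanks. Numbers are padded at the beginning, while strings are padded according to `justify`.

* **`justify`** controls how character strings are placed within the field: `"left"` (the default), `"right"`, `"centre"` or `"none"`. The value can be abbreviated, e.g. `'l'` or `'c'`.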

```R
# `digits` counts significant digits; the last digit is rounded.
format(123.4567, digits = 5)
#> [1] "123.46"

# Display numbers in scientific notation.
format(c(6, 13.14521), scientific = TRUE)
#> [1] "6.000000e+00" "1.314521e+01"

# The minimum number of digits to the right of the decimal point.
format(1.2, nsmall = 5)
#> [1] "1.20000"

# format() always returns a character string.
format(6)
#> [1] "6"

# Numbers are padded with blanks at the beginning to reach `width`.
format(6, width = 5)
#> [1] "    6"

# Left-justify a string within 10 characters.
format('Pikachu', width = 10, justify = 'l')
#> [1] "Pikachu   "

# Center a string within 17 characters.
format('-xXx-', width = 17, justify = 'c')
#> [1] "      -xXx-      "
```
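`format()` has further arguments beyond the ones listed above, such as `big.mark` for thousands separators. For C-style templates, base R's `sprintf()` is a common alternative; the sketch below shows both:

```R
format(1234567, big.mark = ',')   # "1,234,567"
sprintf('%.2f', 3.14159)          # "3.14": two decimal places
sprintf('%5s', 'ab')              # "   ab": right-justified in width 5
sprintf('%-10s|', 'Pikachu')      # "Pikachu   |": left-justified in width 10
```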

## Counting the number of characters

```R
nchar(x)
```

```R
nchar('VN Pikachu')
#> [1] 10

# vectorized: one count per element
nchar(c('VN Pikachu', 'Tank Cao'))
#> [1] 10  8
```
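`nchar()` counts every character, including spaces, and coerces non-character input to character first. Combining it with `trimws()` is one way to ignore leading and trailing blanks:

```R
nchar('  hi  ')          # 6: spaces are counted
nchar(trimws('  hi  '))  # 2: after removing surrounding blanks
nchar(12345)             # 5: the number is converted to "12345" first
```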

## Changing to upper & lower case

**`toupper(x)`**, **`tolower(x)`**

```R
s <- 'i love u'
toupper(s)
#> [1] "I LOVE U"

s <- 'I LOVE U'
tolower(s)
#> [1] "i love u"
```
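Base R has no dedicated function for capitalizing only the first letter, but `toupper()` can be combined with `substr()` and `substring()`. A minimal sketch; `capitalize_first` is a hypothetical helper name, not a base R function:

```R
# Hypothetical helper: upper-case the first character of each string.
capitalize_first <- function(x) {
  paste0(toupper(substr(x, 1, 1)), substring(x, 2))
}

capitalize_first(c('pikachu', 'tank cao'))  # "Pikachu" "Tank cao"
```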

## Substring

Extract or replace substrings in a character vector.

**Usage**:

```R
substr(x, start, stop)
substring(text, first, last = 1000000L)
substr(x, start, stop) <- value
substring(text, first, last = 1000000L) <- value
```

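```R
s <- '12345'
s
#> [1] "12345"

substring(s, 1, 4)
#> [1] "1234"

# `last` defaults to 1000000L, i.e. the end of the string
substring(s, 2)
#> [1] "2345"

# one character per (first, last) pair
substring("abcdef", 1:6, 1:6)
#> [1] "a" "b" "c" "d" "e" "f"

# vectorized first and last: first = c(1, 4), last = c(4, 4)
# result: c(substring('012345', 1, 4), substring('012345', 4, 4))
substring('012345', c(1, 4), c(4, 4))
#> [1] "0123" "3"

# recycling: the single `first` is reused for each `last`; a start of 0 acts as 1
substring('012345', 0, c(2, 4))
#> [1] "01"   "0123"
```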
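Because `last` defaults to the end of the string, taking the final `n` characters only needs a computed `first`. A minimal sketch; `last_n` is a hypothetical helper name:

```R
# Hypothetical helper: the last n characters of each string.
last_n <- function(x, n) {
  substring(x, nchar(x) - n + 1)
}

last_n('VN Pikachu', 3)               # "chu"
last_n(c('Tank Cao', 'Meomeo888'), 3) # "Cao" "888"
```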

```R
# replacement: overwrite characters 3 to 4 of s (s is "12345" from above)
substring(s, 3, 4) <- 'xx'
s
#> [1] "12xx5"
```
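Replacement keeps the string's length. Following the documented rule for `substr<-`, a replacement longer than the selected span is truncated, and a shorter one replaces only as many characters as it has. To change the length of a string, a regex-based function such as `sub()` is needed instead:

```R
x <- '12345'
substr(x, 2, 3) <- 'abcdef'  # longer: only 'ab' is used
x                            # "1ab45"

y <- '12345'
substr(y, 2, 4) <- 'Z'       # shorter: only position 2 changes
y                            # "1Z345"

sub('234', 'Z', '12345')     # "1Z5": sub() can shrink the string
```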

```R
# working with a vector of strings
nicknames <- c('VN Pikachu', 'Tank Cao', 'Meomeo888', 'Morino Nanako', 'quachtinh')
nicknames

# vectorized: `substr` is applied to each string in the vector
substr(nicknames, 2, 4)
#> [1] "N P" "ank" "eom" "ori" "uac"
```
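To test whether strings start or end with a fixed piece of text, comparing a `substr()` slice works, and base R also offers `startsWith()` and `endsWith()` for exactly this. A sketch using a copy of the vector above:

```R
players <- c('VN Pikachu', 'Tank Cao', 'Meomeo888', 'Morino Nanako', 'quachtinh')

# slice-based check: first character equals 'M'
substr(players, 1, 1) == 'M'   # FALSE FALSE TRUE TRUE FALSE

# the same test with startsWith(); endsWith() checks the other end
startsWith(players, 'M')       # FALSE FALSE TRUE TRUE FALSE
endsWith(players, '888')       # FALSE FALSE TRUE FALSE FALSE
```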

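## substr

**`substr(x, start, stop)`** extracts or replaces substrings like `substring()`, but both `start` and `stop` must be given.

```R
s <- '12345'
substr(s, 1, 2)
#> [1] "12"
```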
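One practical difference: `substr()` returns one result per element of `x` (recycling `start` and `stop` along `x`), whereas `substring()` recycles all arguments to the length of the longest one. With a single string and several start positions, the two calls below therefore return results of different lengths:

```R
substr('abcdef', 1:3, 3)      # "abc": length follows x, so only start = 1 is used
substring('abcdef', 1:3, 3)   # "abc" "bc" "c"
```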

## strsplit

```R
strsplit(x, split, fixed = FALSE, perl = FALSE, useBytes = FALSE)
```

The result is a list with one element per element of `x`, each a character vector holding the pieces.

Arguments:

* **`x`**: character vector, each element of which is to be split. Other inputs, including a factor, will give an error.

* **`split`**: character vector (or object which can be coerced to such) containing regular expression(s) (unless `fixed = TRUE`) to use for splitting. If empty matches occur, in particular if `split` has length 0, `x` is split into single characters. If `split` has length greater than 1, it is recycled along `x`.

* **`fixed`**: logical. If `TRUE`, match `split` exactly; otherwise use regular expressions. Has priority over `perl`.

* **`perl`**: logical. Should Perl-compatible regexps be used?

* **`useBytes`**: logical. If `TRUE`, the matching is done byte-by-byte rather than character-by-character, and inputs with marked encodings are not converted. This is forced (with a warning) if any input is found which is marked as `"bytes"` (see `Encoding`).

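```R
# split = NULL (length 0): split into single characters
strsplit('Every single character', NULL)

strsplit('1x2x3x4', 'x')
#> [[1]]
#> [1] "1" "2" "3" "4"

# remember: fixed = FALSE (the default) means `split` is a regular expression,
# and '.' matches any character, so every character acts as a separator
strsplit('1.2.3', '.')
#> [[1]]
#> [1] "" "" "" "" ""

# turn off regex mode to split on a literal dot
strsplit('1.2.3', '.', fixed = TRUE)
#> [[1]]
#> [1] "1" "2" "3"
```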
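Because `strsplit()` always returns a list, a common next step is `unlist()` to flatten it, or `sapply()` to pick one piece from each element. Escaping the dot (`'\\.'`) is an alternative to `fixed = TRUE`:

```R
parts <- strsplit(c('a-b', 'c-d-e'), '-')   # list of length 2
unlist(parts)                               # "a" "b" "c" "d" "e"
sapply(parts, `[`, 1)                       # "a" "c": first piece of each
unlist(strsplit('1.2.3', '\\.'))            # "1" "2" "3": escaped regex dot
```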

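## Pattern matching and replacement (regex)

**`grep`**, **`grepl`**, **`sub`**, **`gsub`**, **`regexpr`**, **`gregexpr`**, **`regexec`**

These functions share one help page. `grep()` and `grepl()` search for matches to `pattern` within each element of a character vector, `sub()` and `gsub()` replace the first and all matches respectively, and `regexpr()`, `gregexpr()` and `regexec()` report where the matches are.

```R
help(sub)
```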

```R
grep(pattern, x, ignore.case = FALSE, perl = FALSE, value = FALSE,
     fixed = FALSE, useBytes = FALSE, invert = FALSE)

grepl(pattern, x, ignore.case = FALSE, perl = FALSE,
      fixed = FALSE, useBytes = FALSE)

sub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE,
    fixed = FALSE, useBytes = FALSE)

gsub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE,
     fixed = FALSE, useBytes = FALSE)

regexpr(pattern, text, ignore.case = FALSE, perl = FALSE,
        fixed = FALSE, useBytes = FALSE)

gregexpr(pattern, text, ignore.case = FALSE, perl = FALSE,
         fixed = FALSE, useBytes = FALSE)

regexec(pattern, text, ignore.case = FALSE, perl = FALSE,
        fixed = FALSE, useBytes = FALSE)
```
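The sketch below exercises `grep()`, `grepl()`, `sub()` and `gsub()` on a small, made-up vector using the default (extended) regex syntax:

```R
x <- c('apple pie', 'banana', 'cherry pie')

grep('pie', x)                # 1 3: indices of matching elements
grep('pie', x, value = TRUE)  # "apple pie" "cherry pie"
grepl('pie', x)               # TRUE FALSE TRUE
sub('a', 'A', 'banana')       # "bAnana": first match only
gsub('a', 'A', 'banana')      # "bAnAnA": every match
```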

```R
help(grep)   # opens the same help page as help(sub)
```
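`gregexpr()` only returns match positions; pairing it with `regmatches()` extracts the matched text. Replacement strings in `sub()` and `gsub()` can also refer back to capture groups with `\\1`, `\\2`, and so on:

```R
s <- 'a1b22c333'
m <- gregexpr('[0-9]+', s)     # positions and lengths of every run of digits
regmatches(s, m)[[1]]          # "1" "22" "333"

# swap two words using capture groups
sub('([a-z]+) ([a-z]+)', '\\2 \\1', 'hello world')  # "world hello"
```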