```r
library(tidyverse)  # attaches stringr and the %>% pipe used below
```

# Control matching behaviour with modifier functions

**fixed**  
Compare literal bytes in the string. This is very fast, but usually not what you want for non-ASCII text, where the same character can be encoded in more than one way.

**coll**  
Compare strings respecting the standard collation rules of a locale.

**regex**  
The default: a bare string pattern is treated as an ICU regular expression.

**boundary**  
Match boundaries between characters, lines, sentences, or words.

```r
fixed(pattern, ignore_case = FALSE)

coll(pattern, ignore_case = FALSE, locale = "en", ...)

regex(
  pattern,
  ignore_case = FALSE,
  multiline = FALSE,
  comments = FALSE,
  dotall = FALSE,
  ...
)

boundary(
  type = c("character", "line_break", "sentence", "word"),
  skip_word_none = NA,
  ...
)
```
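To see how the modifiers change the meaning of the same pattern, here is a minimal sketch that counts matches of one pattern under the default regex interpretation, `fixed()` and `coll()`. The string `x` is an arbitrary example chosen for illustration.

```r
library(stringr)

x <- "a.b a-b"

# Default / regex(): "." is a metacharacter, so both "a.b" and "a-b" match
str_count(x, "a.b")
str_count(x, regex("a.b"))

# fixed(): "." is a literal dot, so only "a.b" matches
str_count(x, fixed("a.b"))

# coll() with ignore_case = TRUE: the uppercase pattern still matches "a.b",
# but "-" and "." remain different characters
str_count(x, coll("A.B", ignore_case = TRUE))
```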

# Examples

**`fixed()`**

```r
# fixed(): "." is a literal dot, so '1a2' does not match
'1a2' %>% str_detect(fixed('1.2'))
#> [1] FALSE
```

```r
# Default regex: "." matches any character, so '1a2' matches
'1a2' %>% str_detect('1.2')
#> [1] TRUE
```
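A common practical reason to reach for `fixed()` is matching characters that are special in regular expressions. The sketch below uses a made-up version string to count literal dots with the default regex, with `fixed()`, and with an escaped regex.

```r
library(stringr)

s <- "v1.2.3"

# As a regex, "." matches any character, so every character is counted
str_count(s, ".")

# fixed() treats "." literally
str_count(s, fixed("."))

# The regex equivalent needs the dot escaped
str_count(s, "\\.")
```

`fixed()` avoids the escaping and is also the faster option when no regex features are needed.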

<hr>

**`coll()`**

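```r
# coll() is useful for locale-aware case-insensitive matching
i <- c("I", "\u0130", "i")
i

# fixed() uses simple case folding, which does not treat "İ" (capital I with dot above)
# as a case variant of "i"
i %>% str_detect(fixed("i", ignore_case = TRUE))

# coll() with the Turkish locale: "İ" is the uppercase of "i",
# while plain "I" is the uppercase of dotless "ı"
i %>% str_detect(coll("i", ignore_case = TRUE, locale = "tr"))
```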
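The description of `fixed()` above warns that byte comparison is often wrong for non-ASCII text. One concrete case is Unicode canonical equivalence: "á" can be stored as a single code point or as "a" followed by a combining accent. A minimal sketch of how `fixed()` and `coll()` treat the two encodings:

```r
library(stringr)

a1 <- "\u00e1"   # precomposed: LATIN SMALL LETTER A WITH ACUTE
a2 <- "a\u0301"  # decomposed: "a" + COMBINING ACUTE ACCENT

a1 == a2                   # different code points, so not equal
str_detect(a1, fixed(a2))  # byte comparison: no match
str_detect(a1, coll(a2))   # collation treats them as equivalent: match
```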

<hr>

**`boundary()`**

```r
args(boundary)
```

**`type`**  
Boundary type to detect.

* `character`: every character is a boundary. "Character" here means a user-perceived character (grapheme cluster), which may span several code points.
* `line_break`: boundaries are places where it is acceptable to have a line break in the current locale.
* `sentence`: the beginnings and ends of sentences are boundaries, using intelligent rules to avoid counting abbreviations.
* `word`: the beginnings and ends of words are boundaries.

**`skip_word_none`**  
Ignore "words" that don't contain any letters or numbers, i.e. punctuation and spaces. The default, `NA`, skips such "words" only when splitting on word boundaries (`type = "word"`).
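To see how `type` and `skip_word_none` change the result, here is a small sketch; the sample strings are made up for illustration, and the exact segmentation follows ICU's default rules.

```r
library(stringr)

# "character" boundaries follow user-perceived characters (grapheme clusters),
# so "e" + combining acute accent counts as one character
e_acute <- "e\u0301"
str_length(e_acute)                        # counts code points: 2
str_count(e_acute, boundary("character"))  # counts characters: 1

# "sentence" boundaries
sentences <- str_split("Hi there. How are you?", boundary("sentence"))[[1]]
sentences

# For type = "word", the default skip_word_none = NA drops punctuation/space "words";
# skip_word_none = FALSE keeps them
words_only <- str_split("Hi, you!", boundary("word"))[[1]]
all_parts  <- str_split("Hi, you!", boundary("word", skip_word_none = FALSE))[[1]]
words_only
all_parts
```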

```r
words <- "These are   some words."
```

```r
# Count the words (segments between word boundaries; punctuation and spaces are skipped)
words %>% str_count(boundary('word'))
#> [1] 4
```

```r
# Splitting on a single space leaves empty strings between repeated spaces
words %>% str_split(' ') %>% print()
#> [[1]]
#> [1] "These"  "are"    ""       ""       "some"   "words."
```
```r
# Split at word boundaries: runs of spaces and punctuation are dropped
words %>% str_split(boundary('word')) %>% print()
#> [[1]]
#> [1] "These" "are"   "some"  "words"
```
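Splitting on word boundaries is not the same as extracting runs of word characters with a regex. A sketch of one difference, assuming ICU's default word-break rules, which keep an apostrophe between letters inside a single word; the input string is a made-up example.

```r
library(stringr)

txt <- "don't stop"

# Word boundaries keep the contraction together
by_boundary <- str_extract_all(txt, boundary("word"))[[1]]
by_boundary

# \\w does not match the apostrophe, so the regex splits the contraction
by_regex <- str_extract_all(txt, "\\w+")[[1]]
by_regex
```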



<hr>

**`regex()`**: build a regular expression with matching options

```r
# ignore_case = TRUE, like Python's re.I
'My name is VN Pikachu' %>% str_extract_all(regex('[a-z]+', ignore_case = TRUE))
```

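```r
# Without multiline, ^ matches only at the start of the whole string
"a11\nb22\nc33" %>% str_extract_all(regex('^[a-z]+', multiline = FALSE))

# multiline = TRUE, like Python's re.MULTILINE: ^ also matches after each newline
"a11\nb22\nc33" %>% str_extract_all(regex('^[a-z]+', multiline = TRUE))
```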

```r
# dotall = TRUE, like Python's re.DOTALL: "." also matches a newline
"a\nb\nc" %>% str_extract(regex('a.', dotall = TRUE))
```

```r
# comments = TRUE, like Python's re.VERBOSE: whitespace and # comments in the pattern are ignored
pattern <- regex("
  \\d+  # one or more digits
", comments = TRUE)

'a322 32ar3 a3' %>% str_extract_all(pattern)
```
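The options to `regex()` can also be written as ICU inline flags at the start of a plain pattern: `(?i)` for `ignore_case`, `(?m)` for `multiline`, `(?s)` for `dotall` and `(?x)` for `comments`, much like Python's inline flags. A minimal sketch checking that both spellings give the same results on the inputs used in the examples above:

```r
library(stringr)

name  <- "My name is VN Pikachu"
lines <- "a11\nb22\nc33"

same_case  <- identical(str_extract_all(name, regex("[a-z]+", ignore_case = TRUE)),
                        str_extract_all(name, "(?i)[a-z]+"))
same_multi <- identical(str_extract_all(lines, regex("^[a-z]+", multiline = TRUE)),
                        str_extract_all(lines, "(?m)^[a-z]+"))
same_dot   <- identical(str_extract("a\nb\nc", regex("a.", dotall = TRUE)),
                        str_extract("a\nb\nc", "(?s)a."))
same_x     <- identical(str_extract_all("a322 32ar3 a3", regex("\\d+  # digits", comments = TRUE)),
                        str_extract_all("a322 32ar3 a3", "(?x)\\d+  # digits"))

c(same_case, same_multi, same_dot, same_x)
```

The `regex()` arguments are usually easier to read; inline flags are handy when a pattern is stored or passed around as a plain string.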