# Working with strings using stringr

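This part is mainly about regular expressions. Regular expressions (regex) are powerful and widely useful: almost every programming language supports them, as do many tools, and Perl arguably has the best support of all. They are also hard to read; if you have never met them, a complex regular expression looks like something a cat produced by walking across the keyboard.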

In [1]:
# Load packages
library(tidyverse)
library(stringr)

─ Attaching packages ──────────────────── tidyverse 1.2.1 ─
✔ ggplot2 3.0.0     ✔ purrr   0.3.3
✔ tibble  2.1.3     ✔ dplyr   0.8.3
✔ tidyr   0.8.1     ✔ stringr 1.3.1
✔ readr   1.3.1     ✔ forcats 0.4.0
─ Conflicts ───────────────────── tidyverse_conflicts() ─
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()


In [2]:
# Use double quotes by default, unless the string itself contains double quotes

string1 <- "This is a string"
string2 <- 'To put a "quote" inside a string, use single quotes'
string1
string2

## Escaping

Some characters are special and must be escaped; escaping is introduced below.

In [3]:
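# A string that contains a single quote or a double quote
(double_quote <- "\"")
# or '"'

(single_quote <- '\'')
# or "'"
writeLines(single_quote) # show the raw contents of the string

x <- c("\"", "\\")
x
writeLines(x)

# There are also the newline and tab characters
y <- c("a","\n","b","\t","c")

y
writeLines(y)
# The output does not render well here
# Other special characters are rarely needed, so they are skipped.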

'


"
\


a


b
	
c
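
The difference between what you type and what the string actually holds can be checked directly. The sketch below uses only base R; the variable names `s` and `n_chars` are made up for illustration.

```r
# Compare the printed (escaped) form of a string with its raw contents.
# s holds: a, TAB, b, backslash, c, double quote, d
s <- "a\tb\\c\"d"

print(s)       # printed representation, with escapes shown
writeLines(s)  # raw contents, as the string is actually stored

# nchar() counts characters in the raw contents, not in the source code.
n_chars <- nchar(s)
n_chars
```

Each escape sequence such as `\t` or `\\` counts as a single character, which is why the length is shorter than the source text suggests.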


In [4]:
# String length

# The author recommends the stringr functions over their base R equivalents,
# and we follow that advice.
# All stringr functions start with str_ and are used in similar ways
str_length(c("a", "R for data science", NA))

# Combining strings
str_c("x", "y")
str_c("x", "y", "z")

# Use the sep argument to control how the pieces are separated
str_c("x", "y", "z", sep = ",")

# As with most R functions, missing values are contagious
# To print them as "NA" instead, use str_replace_na()
x <- c("abc", NA)
str_c("|", x, "|")%>% t()
str_c("|", str_replace_na(x), "|")%>% t()

# str_c() is vectorized and automatically recycles shorter vectors
str_c("prefix-", c("a", "b", "c"), "-|") %>% t()

# Objects of length 0 are silently dropped, which is especially useful with if:

name <- "Hadley"
time_of_day <- "morning"
birthday <- FALSE
str_c("Good ", time_of_day, " ", name, 
     if(birthday) " and HAPPY BIRTHDAY", "-")
      
str_c("Good ", time_of_day, " ", name, 
     if(!birthday) " and HAPPY BIRTHDAY", "-")
      
# To collapse a character vector into a single string, use the collapse argument:
str_c(c("x", "y", "z"), collapse = ", ")

0,1
|abc|,


0,1
|abc|,|NA|


0,1,2
prefix-a-|,prefix-b-|,prefix-c-|


## Subsetting strings

Use str_sub() to extract parts of a string.

In [5]:
x <- c("Apple", "Banana", "Pear")
x %>% t()
str_sub(x, 1, 3)
str_sub(x, -3, -1)

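# str_sub() does not fail if the string is too short;
# it returns as many characters as it can:
str_sub("a", 1, 5) 

# The assignment form of str_sub() modifies strings in place
str_sub(x, 1, 1) <- str_to_lower(str_sub(x, 1, 1))
x %>% t()

# str_to_upper() converts to upper case; see the help page for str_to_title().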

0,1,2
Apple,Banana,Pear


0,1,2
apple,banana,pear


In [6]:
# Compare these to see the differences
str_c("a","-","b",NA )
str_c(str_replace_na(c("a", NA, "b")), "-d")
paste("a","-","b",NA)
paste0("a","-","b",NA)
stringi::stri_join(str_replace_na(c("a", NA, "b")), "-d")

paste("1st", "2nd", "3rd", collapse = ", ")
paste0(1:12, c("st", "nd", "rd", rep("th", 9))) %>% t()

0,1,2,3,4,5,6,7,8,9,10,11
1st,2nd,3rd,4th,5th,6th,7th,8th,9th,10th,11th,12th


The difference between the sep and collapse arguments of str_c() should be clear after looking at the examples:

example(str_c)
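
As a minimal side-by-side sketch of the two arguments (the vector `letters3` and the result names are invented for this illustration):

```r
library(stringr)

letters3 <- c("a", "b", "c")

# sep is placed between the arguments, element by element;
# the result still has one string per element.
with_sep <- str_c("x", letters3, sep = "-")

# collapse joins the elements of the result into one string.
with_collapse <- str_c(letters3, collapse = "-")

# Both together: first pair up with sep, then join with collapse.
with_both <- str_c("x", letters3, sep = "-", collapse = ", ")

with_sep
with_collapse
with_both
```

In short, sep works across arguments and keeps the vector length, while collapse works along the vector and reduces it to length one.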

In [7]:
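x <- c("abcde","abc")
y <- c("abcd", "abef")
# Extract the middle character(s) of each string.
# The original if() tested only the first element of the vector;
# this version is vectorized, so strings of mixed length are handled too.
middle <- function(s) {
    len <- str_length(s)
    # Odd length: start and end coincide on the single middle character.
    # Even length: start and end span the two middle characters.
    str_sub(s, ceiling(len / 2), floor(len / 2) + 1)
}
middle(x)
middle(y)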

In [8]:
# str_wrap() reflows text for formatted output
cat(str_wrap(str_c(rep(letters, 5), collapse = "\n"), 
             width = 52, indent = 0, exdent = 0),"\n")

a b c d e f g h i j k l m n o p q r s t u v w x y z
a b c d e f g h i j k l m n o p q r s t u v w x y z
a b c d e f g h i j k l m n o p q r s t u v w x y z
a b c d e f g h i j k l m n o p q r s t u v w x y z
a b c d e f g h i j k l m n o p q r s t u v w x y z 


In [9]:
# Collapse a character vector into a single string
str_c(c("a",  "b",  "c"), collapse = "、")

In [10]:
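# Three functions
# str_trim(): removes leading and trailing whitespace; side is "left", "right", or "both" (default "both")
# str_squish(): also collapses repeated whitespace inside the string
# str_pad(): pads a string with spaces or another character; see the help page for its arguments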
example(str_trim)
example(str_pad)


str_tr> str_trim("  String with trailing and leading white space\t")
[1] "String with trailing and leading white space"

str_tr> str_trim("\n\nString with trailing and leading white space\n\n")
[1] "String with trailing and leading white space"

str_tr> str_squish("  String with trailing,  middle, and leading white space\t")
[1] "String with trailing, middle, and leading white space"

str_tr> str_squish("\n\nString with excess,  trailing and leading white   space\n\n")
[1] "String with excess, trailing and leading white space"

str_pd> rbind(
str_pd+   str_pad("hadley", 30, "left"),
str_pd+   str_pad("hadley", 30, "right"),
str_pd+   str_pad("hadley", 30, "both")
str_pd+ )
     [,1]                            
[1,] "                        hadley"
[2,] "hadley                        "
[3,] "            hadley            "

str_pd> # All arguments are vectorised except side
str_pd> str_pad(c("a", "abc", "abcdef"), 10)
[1] "         a" "       abc" "    abcdef"

str_pd> str_pad("a", c(5
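
The three whitespace helpers can also be tried on one small input. This is only a sketch; the string `s` and the result names are hypothetical.

```r
library(stringr)

s <- "  too   many   spaces  "

# Remove whitespace on the left side only.
left_trimmed <- str_trim(s, side = "left")

# Remove leading/trailing whitespace and collapse inner runs to one space.
squished <- str_squish(s)

# Pad on the left with a custom character up to a total width of 3.
padded <- str_pad("7", width = 3, side = "left", pad = "0")

left_trimmed
squished
padded
```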

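# Regular expressions

Regular expressions are a large topic; this section covers only what the author mentions.

Regex dialects differ between programming languages, but not by much, so what you learn in one carries over to the others. They are well worth studying properly. As the author puts it:

> Regular expressions are a very terse language that lets you describe patterns in strings. Understanding them takes some effort, but once you do, you will find them remarkably powerful.

We learn regular expressions here through str_view() and str_view_all(), which take a character vector and a regular expression.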

In [11]:
# Jupyter renders this differently from RStudio

# Exact match
x <- c("apple", "banana", "pear")
str_view(x, "an")

# Use "." to match any character (except a newline)
str_view(x, ".a.")

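I will not paraphrase this; here is how the author explains it:

> But if . matches any character, how do you match the character . itself? You need an "escape" to tell the regular expression that you want to match . literally rather than use its special behaviour. Like strings, regular expressions use the backslash to strip characters of their special meaning. So to match a ., you need the regular expression \\.. Unfortunately this creates a problem: we use strings to represent regular expressions, and \\ is also the escape character in strings. So the string form of the regular expression \\. is "\\\\.".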

In [12]:
# A regular expression that matches a literal dot (.)
(dot <- "\\.")
# Remember writeLines()? It shows the expression itself
writeLines(dot)
# An example
str_view(c("abc", "a.c", "bef"), "a\\.c")
# A bit complicated, isn't it?

\.


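One more passage, also quoted as is; it deserves careful reading:

> If \\ is used as an escape character in regular expressions, how do you match a literal \\? You need to escape it, creating the regular expression \\\\. To create that regular expression you need a string, which also needs to escape \\. That means that to match a literal \\ you need to write "\\\\\\\\": you need four backslashes to match one!

In fact, to display four backslashes in markdown, I had to type eight.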

In [13]:
# example
x <- "a\\b"
writeLines(x)
str_view(x, "\\\\")

a\b


This sentence is worth understanding too:

> This book writes regular expressions as \\. and the strings that represent them as "\\\\.".
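
If the doubled backslashes are hard to read, raw strings may help. The sketch below assumes R 4.0.0 or later (raw string literals of the form `r"(...)"` do not exist in older versions, and the notes above may have been written on an older one); the variable names are hypothetical.

```r
# Raw strings need R 4.0.0 or later. Inside r"(...)" a backslash is
# taken literally, so the regex can be typed as it is read.
dot_escaped <- "\\."      # ordinary string holding the regex \.
dot_raw     <- r"(\.)"    # the same regex as a raw string

backslash_escaped <- "\\\\"    # ordinary string holding the regex \\
backslash_raw     <- r"(\\)"   # the same regex as a raw string

identical(dot_escaped, dot_raw)
identical(backslash_escaped, backslash_raw)
writeLines(c(dot_raw, backslash_raw))
```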

In [14]:
writeLines(c("\"","\\","\\\"")) # the actual contents that get matched

"
\
\"


In [15]:
# By default a regular expression matches any part of a string
# "^" anchors the match to the start; "$" anchors it to the end
x <-c("apple", "banana", "pear") 
str_view(x, "^a")
str_view(x, "a$")

In [16]:
# To force a regular expression to match only a complete string, anchor it with both ^ and $:
x <-c("apple pie", "apple", "apple cake") 
str_view(x, "apple")
str_view(x, "^apple$")
# You can also use \b to match the boundary between words
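
A small sketch of the word-boundary anchor, using an invented vector `x`; note that `\b` has to be written as `"\\b"` inside an R string.

```r
library(stringr)

x <- c("sum", "summary", "the sum of", "resume")

# \\b matches the position between a word character and a non-word
# character, so only the standalone word "sum" is detected.
whole_word <- str_detect(x, "\\bsum\\b")
whole_word

# Without the boundaries, "sum" is found inside longer words too.
anywhere <- str_detect(x, "sum")
anywhere
```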

In [17]:
# Exercise
# How do you match the literal string "$^$"?
x <- c("$^$","abc")
str_view(x, "\\$\\^\\$")

In [18]:
# This part displays better in RStudio
str_view(words, "^y", match = T) 
str_view(words, "x$", match = T)
str_view(words, "^...$", match = T)

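Four commonly used character classes:

\d: matches any digit

\s: matches any whitespace (space, tab, newline)

[abc]: matches a, b, or c

[^abc]: matches anything except a, b, or c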
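
The four classes above can be compared on one small input. This is a sketch with a made-up vector; remember that `\d` and `\s` are written `"\\d"` and `"\\s"` in R strings.

```r
library(stringr)

x <- c("abc", "a1c", "a c", "xyz")

has_digit    <- str_detect(x, "\\d")        # contains a digit
has_space    <- str_detect(x, "\\s")        # contains whitespace
has_abc      <- str_detect(x, "[abc]")      # contains a, b, or c
only_non_abc <- str_detect(x, "^[^abc]+$")  # made up entirely of other characters

data.frame(x, has_digit, has_space, has_abc, only_non_abc)
```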

In [19]:
# Use parentheses to make the expression clearer
str_view(c("grey", "gray"), "gr(e|a)y", match = T)

In [20]:
# Some of the exercises

str_view(words, "^[aeiou]", match = T)
str_view(words, "^[^aeiou]", match = T)
str_view(words, "(ed)$", match = T)
str_view(words, "(ing)$|(ize)$", match = T)

Repetition

?: 0 or 1 times.

+: 1 or more times.

*: 0 or more times.

In [21]:
x <-"1888 is the longest 
year in Roman numerals: MDCCCLXXXVIII"

str_view(x, "CC?", match = T)
str_view(x, "CC+", match = T)
str_view(x, "C[LX]+", match = T)

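You can also specify the number of matches precisely.

{n}: exactly n times.

{n,}: n or more times.

{,m}: at most m times (as listed in the book; if your stringr version rejects this form, write {0,m} instead).

{n,m}: between n and m times.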

In [22]:
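str_view(x, "C{2}", match = T)# exactly 2 times
str_view(x, "C{2,}", match = T)# at least 2 times
str_view(x, "C{2,3}", match = T)# 2 to 3 times

# By default matching is greedy: the regular expression matches the longest string possible.
# Putting a ? after the quantifier makes it "lazy", so it matches the shortest string possible.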

str_view(x, "C{2,3}?", match = T)
str_view(x, "C[LX]+?", match = T)
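
Because str_view() output does not render here, the effect of greedy and lazy matching is easier to see with str_extract(), which returns the matched text. A sketch on the Roman numeral alone (the name `roman` is made up):

```r
library(stringr)

roman <- "MDCCCLXXXVIII"

# Greedy: take as many C's as the quantifier allows.
greedy <- str_extract(roman, "C{2,3}")
# Lazy: stop as soon as the minimum is satisfied.
lazy <- str_extract(roman, "C{2,3}?")

# The same contrast with + and +?
greedy_lx <- str_extract(roman, "C[LX]+")
lazy_lx <- str_extract(roman, "C[LX]+?")

c(greedy = greedy, lazy = lazy, greedy_lx = greedy_lx, lazy_lx = lazy_lx)
```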

In [23]:
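# Words that start with 3 consonants
str_view(words, "^[^aeiou]{3}", match = T)
# Words with 3 or more vowels in a row
str_view(words, "[aeiou]{3,}", match = T)
# Words with 2 or more vowel-consonant pairs in a row
# (the group is what gets repeated, so the pair needs parentheses)
str_view(words, "([aeiou][^aeiou]){2,}", match = T)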



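## Grouping and backreferences

Earlier we saw that parentheses disambiguate complex expressions. Parentheses also define "groups", which you can refer to with backreferences such as \\1 and \\2.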

In [24]:
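# Find all fruits whose name contains a repeated pair of letters
str_view(fruit, "(..)\\1", match = T)

# The same character three times in a row
str_view(x, "(.)\\1\\1", match = T)
str_view(x, "([^\\d])\\1\\1", match = T)
# Note that str_view() highlights only the first match

# Match a four-character palindrome
str_view(c("abba", "abccba"), "(.)(.)\\2\\1", match = T)

# Characters 1, 3, and 5 are the same; characters 2 and 4 can be anything
str_view(c("abaca", "acbaa", "aaaaa"), "(.).\\1.\\1", match = T)

# The last three characters mirror the first three; anything may come in between
str_view(c("abcdddcba", "abcdefcba", "abccba"), "(.)(.)(.).*\\3\\2\\1", match = T)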

In [25]:
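# (2) Write regular expressions that match the following words.
# a. Words that start and end with the same letter.
# b. Words that contain a repeated pair of letters (for example, church contains ch twice).
# c. Words that contain one letter repeated at least 3 times (for example, eleven contains three e's).

# My answers; they are not necessarily right

str_view(words, "^(.).*\\1$", match = T)

# The pair may appear anywhere in the word, so no anchors are used
str_view(words, "(..).*\\1", match = T)

str_view(words, "^.*(.).*\\1.*\\1.*$", match = T)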
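
One way to sanity-check patterns of the kind this exercise asks for is to run them on a few words whose answer is known in advance. The sketch below uses a small hand-picked vector, `test_words`, rather than the full words data set; the patterns are one possible set of answers, not the only ones.

```r
library(stringr)

test_words <- c("church", "eleven", "area", "dog")

# a. Same first and last letter.
same_ends <- str_subset(test_words, "^(.).*\\1$")

# b. A pair of letters that appears twice anywhere in the word.
repeated_pair <- str_subset(test_words, "(..).*\\1")

# c. One letter that appears at least three times.
three_times <- str_subset(test_words, "(.).*\\1.*\\1")

same_ends
repeated_pair
three_times
```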

# Tools

## Detecting matches

In [26]:
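# str_detect() returns whether each string matches the pattern


# Check directly for a match
x <- c("apple", "banana", "pear")
str_detect(x, "e")

# Counting matches
# How many common words start with t?
sum(str_detect(words, "^t"))

# What proportion of common words end with a vowel?
mean(str_detect(words, "[aeiou]$"))

# Find all words that contain no vowels

# Find the words that contain at least one vowel, then negate
no_vowels_1 <- !str_detect(words, "[aeiou]")

# Find the words that consist only of consonants (non-vowels)
no_vowels_2 <- str_detect(words, "^[^aeiou]+$")

identical(no_vowels_1, no_vowels_2) # test two objects for exact equality
# Both approaches give the same result; use whichever you find easier to read


# Why "^" and "$" are needed here:
# they force the pattern to cover the whole word; without them it could match just part of a word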

A common use of str_detect() is to select the elements that match a pattern.

In [27]:
# Logical subsetting with str_detect(): words that end with x
words[str_detect(words, "x$")]
# str_subset() does the same in one step
str_subset(words, "x$")

In [28]:
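# Build a data frame and use filter() to keep the words that end with x

df <- tibble(
    word = words,
    i = seq_along(words)
)

# Refer to the column word, not the global vector words
df %>% filter(str_detect(word, "x$"))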



word,i
<chr>,<int>
box,108
sex,747
six,772
tax,841


In [29]:
# str_count() returns the number of matches in each string
x <- c("apple", "banana", "pear")

str_count(x, "a")
# Average number of vowels per word
mean(str_count(words, "[aeiou]"))

# Remember mutate()?
df %>% mutate(
    vowels = str_count(word, "[aeiou]"),
    consonants = str_count(word, "[^aeiou]")
) %>% head()

# This adds new columns holding the number of vowels and consonants in each word

word,i,vowels,consonants
<chr>,<int>,<int>,<int>
a,1,1,0
able,2,2,2
about,3,3,2
absolute,4,4,4
accept,5,2,4
account,6,3,4


In [30]:
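# Note 
# Matches never overlap. For example, how many times does the pattern "aba" match in "abababa"?
# Regular expressions say 2, not 3:
str_count("abababa", "aba")

str_view_all("abababa", "aba", match = T)

str_view("abababa", "aba", match = T)

# Many stringr functions come in pairs: one works with a single match,
# the other with all matches, and the latter has the suffix _all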
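
If overlapping occurrences are what you actually want to count, one common workaround (not covered in the text above) is a lookahead, which checks for the pattern without consuming any characters. A sketch:

```r
library(stringr)

# Ordinary matching consumes the matched text, so matches cannot overlap.
non_overlapping <- str_count("abababa", "aba")

# (?=aba) is a zero-width lookahead: it succeeds at every position
# where "aba" starts, so overlapping occurrences are all counted.
overlapping <- str_count("abababa", "(?=aba)")

c(non_overlapping = non_overlapping, overlapping = overlapping)
```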

## Exercises

In [31]:
# a. Find all words that start or end with x
words[str_detect(words, "^x|x$")]
# No word starting with x shows up because words contains none
# b. Find all words that start with a vowel and end with a consonant
# Solution with a single regular expression
# words[str_detect(words, "^[aeiou][a-z]*[^aeiou]$")] 

# Solution in two steps, using an intermediate variable
# vowels <- words[str_detect(words,"^[aeiou]")]  
# vowels[str_detect(vowels, "[^aeiou]$")]

# Use the pipe to skip the intermediate variable
# words[str_detect(words,"^[aeiou]")] %>% .[str_detect(.,"[^aeiou]$")]

# Use identical() to check that the results agree
identical(words[str_detect(words, "^[aeiou][a-z]*[^aeiou]$")], 
          words[str_detect(words,"^[aeiou]")] %>% .[str_detect(.,"[^aeiou]$")])

# c. Is there any word that contains all the vowels?
words[str_detect(words, "a")] %>% .[str_detect(.,"e")] %>% 
.[str_detect(.,"i")]  %>% .[str_detect(.,"o")] %>% .[str_detect(.,"u")]  
# No, there is none
# I could not come up with a single regular expression for this
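
For part c, one possible single-expression approach is to chain lookaheads, each of which requires one vowel somewhere in the string. This is a sketch that goes beyond what the notes cover; the pattern name `all_vowels` and the test vector `candidates` are invented for illustration.

```r
library(stringr)

# Each (?=.*v) asserts, from the start of the string, that the vowel v
# occurs somewhere later, without consuming any characters.
all_vowels <- "^(?=.*a)(?=.*e)(?=.*i)(?=.*o)(?=.*u)"

candidates <- c("education", "sequoia", "apple", "queue")
contains_all <- str_detect(candidates, all_vowels)
contains_all
```

The chained str_detect() calls in the cell above are arguably easier to read; the lookahead form is mainly useful when a single pattern is required.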



In [32]:
# d. Which word has the most vowels? Which word has the highest proportion of vowels?

temp <- df %>% mutate(
    vowels = str_count(word, "[aeiou]"),
    consonants = str_count(word, "[^aeiou]"),
    long = vowels + consonants,
    per = vowels/long
) 

arrange(temp, desc(vowels)) %>% head(10)

arrange(temp, desc(per)) %>% head(10)

# A data frame makes this easy to answer

word,i,vowels,consonants,long,per
<chr>,<int>,<int>,<int>,<int>,<dbl>
appropriate,48,5,6,11,0.4545455
associate,57,5,4,9,0.5555556
available,62,5,4,9,0.5555556
colleague,166,5,4,9,0.5555556
encourage,268,5,4,9,0.5555556
experience,292,5,5,10,0.5
individual,423,5,5,10,0.5
television,846,5,5,10,0.5
absolute,4,4,4,8,0.5
achieve,7,4,3,7,0.5714286


word,i,vowels,consonants,long,per
<chr>,<int>,<int>,<int>,<int>,<dbl>
a,1,1,0,1,1.0
area,49,3,1,4,0.75
idea,412,3,1,4,0.75
age,22,2,1,3,0.6666667
ago,24,2,1,3,0.6666667
air,26,2,1,3,0.6666667
die,228,2,1,3,0.6666667
due,250,2,1,3,0.6666667
eat,256,2,1,3,0.6666667
europe,278,4,2,6,0.6666667


## Extracting matches

Use str_extract() to extract the text of a match.

In [33]:
# Load the data set
stringr::sentences %>% head()

length(sentences)

In [34]:
# Create a vector of colour names

colors <- c( "red", "orange", "yellow", "green", "blue", "purple" ) 

# Use str_c() to turn the character vector into a single regular expression
color_match <- str_c(colors, collapse = "|")# Join multiple strings into a single string.

color_match

In [35]:

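# First find the sentences that contain a colour
(has_color <- str_subset(sentences, color_match)) %>% head()
# Keep strings matching a pattern, or find positions.

# Extract the colour
(matches <- str_extract(has_color, color_match)) %>% head()

# Some results are wrong: the pattern also matches colour names inside longer words
# Also, str_extract() extracts only the first match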

In [36]:
# Find the sentences with more than one match
more <- sentences[str_count(sentences, color_match) > 1 ]

str_view_all(more, color_match, match = T)

str_extract(more, color_match)
# To get all matches, use str_extract_all(), which returns a list
str_extract_all(more, color_match)

In [37]:
# With simplify = TRUE, str_extract_all() returns a matrix
# in which shorter rows are padded to the length of the longest one:

str_extract_all(more, color_match, simplify =TRUE)

x <-c ("a", "a b", "a b c") 

str_extract_all(x, "[a-z]", simplify =TRUE) 

0,1
blue,red
green,red
orange,red


0,1,2
a,,
a,b,
a,b,c


### Grouped matches



In [38]:

# Fix the colour-matching problem by requiring a space on each side
color_match_fix <- str_c(" ", colors, " " , collapse = "|")
sentences %>% str_subset(color_match_fix) %>% head() %>% 
str_extract_all(color_match_fix, simplify = T) %>% t()

# The first word of each sentence

first_word <- "[a-zA-Z]+( |'[a-z])"
sentences %>% str_subset(first_word) %>% str_extract(first_word) %>%
head() %>% t()


# Words ending in ing
ing <- "([^ ]+)ing"
sentences %>% str_subset(ing) %>% head() %>% 
str_extract_all(ing, simplify =  T) %>% t()

# To be honest, I did not know how to approach these exercises at first;
# the ideas came after reading the next section.
# If something is unclear, read on first.
# The key is constructing the regular expression.

0,1,2,3,4,5
blue,blue,blue,yellow,green,red


0,1,2,3,4,5
The,Glue,It's,These,Rice,The


0,1,2,3,4,5
stocking,spring,evening,morning,winding,living
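
Padding each colour with spaces fixes the false matches, but it also misses a colour that is followed by punctuation or sits at the end of a sentence. An alternative sketch uses word boundaries instead; the name `color_words` and the example sentences are made up for illustration.

```r
library(stringr)

colors <- c("red", "orange", "yellow", "green", "blue", "purple")

# Wrap the alternation in \\b ... \\b so that each colour must be a whole word.
color_words <- str_c("\\b(", str_c(colors, collapse = "|"), ")\\b")

x <- c("The light flickered.", "A red barn.", "Red, then blue.")
found <- str_extract(x, color_words)
found
```

The first sentence yields NA because "red" occurs only inside "flickered", and the capitalised "Red" in the last one is skipped because the pattern is case sensitive.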


In [39]:
# Extract nouns from sentences

noun <- "(a|the) ([^ ]+)" # build the regular expression

(has_noun <- sentences %>% str_subset(noun) %>% head(10))
has_noun %>%  str_extract(noun) # str_extract() gives the complete match
has_noun %>% str_match(noun)# str_match() gives each individual group

0,1,2
the smooth,the,smooth
the sheet,the,sheet
the depth,the,depth
a chicken,a,chicken
the parked,the,parked
the sun,the,sun
the huge,the,huge
the ball,the,ball
the woman,the,woman
a helps,a,helps
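
The matrix returned by str_match() has one column for the complete match followed by one column per group, so individual groups can be picked out by column index. A small sketch on two invented sentences:

```r
library(stringr)

noun <- "(a|the) ([^ ]+)"
s <- c("It is the depth of a well.", "No article here")

m <- str_match(s, noun)
m

full_match <- m[, 1]  # complete match
article    <- m[, 2]  # first group
word_after <- m[, 3]  # second group
```

A string with no match produces a row of NA values, as the second sentence shows.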


In [40]:
# Using tidyr::extract()

tibble(sentence = sentences) %>% tidyr::extract(
        sentence, c("article", "noun"), noun,
        remove = F) %>% head()


sentence,article,noun
<chr>,<chr>,<chr>
The birch canoe slid on the smooth planks.,the,smooth
Glue the sheet to the dark blue background.,the,sheet
It's easy to tell the depth of a well.,the,depth
These days a chicken leg is a rare dish.,a,chicken
Rice is often served in round bowls.,,
The juice of lemons makes fine punch.,,


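Exercises

(1) Find all words that follow a number word (one, two, three, and so on), and extract both the number and the word.

(2) Find all contractions, and separate the parts before and after the apostrophe.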

In [41]:
# (1)

num <- " (one|two|three) ([^ ]+)" 
# The leading space avoids matching words that merely end in "one"; "\\s" would also work

sentences %>% str_subset(num) %>% head() %>%  
str_extract_all(num, simplify = T) %>% t()

# (2)
abb <- " [^ ]+'[^ ]+ "

sentences %>% str_subset(abb) %>% head() %>% str_match(abb)
# str_extract(abb)

0,1,2,3,4,5
two met,two factors,three lists,two when,one war,one button


0
man's
don't
store's
workmen's
sun's
child's
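
To actually separate the parts before and after the apostrophe, as exercise (2) asks, the pattern needs capture groups. The following is one possible sketch on invented sentences; `contraction` and `parts` are hypothetical names.

```r
library(stringr)

# Group 1: letters before the apostrophe; group 2: letters after it.
contraction <- "([A-Za-z]+)'([A-Za-z]+)"

x <- c("It's easy to tell.", "We don't know.", "No apostrophe here.")
parts <- str_match(x, contraction)
parts
```

Column 1 holds the whole contraction, column 2 the part before the apostrophe, and column 3 the part after it.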


### Replacing matches

Use str_replace() and str_replace_all() to replace matches.

In [42]:
x <- c("apple", "pear", "banana") 
str_replace(x, "[aeiou]", "-") # replace the first vowel with -

str_replace_all(x, "[aeiou]", "-") # replace every vowel with -

# Use a named vector to perform several replacements at once
x <- c("1 house", "2 cars", "3 people") 
str_replace_all(x, c("1" = "one", "2" = "two", "3" = "three"))

# Use backreferences to swap the second and third words
sentences %>% str_replace("([^ ]+) ([^ ]+) ([^ ]+)", "\\1 \\3 \\2") %>%
head()
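
The backreference mechanism is easier to follow on a tiny input. A sketch with made-up strings:

```r
library(stringr)

# Swap the first two words; \\1 and \\2 refer to the captured groups.
swapped <- str_replace("one two three", "(\\w+) (\\w+)", "\\2 \\1")

# str_replace_all() applies the same rewrite to every match.
all_swapped <- str_replace_all("a1 b2 c3", "([a-z])(\\d)", "\\2\\1")

swapped
all_swapped
```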

### Splitting

str_split() splits a string into pieces.

In [43]:
sentences %>% head() %>% str_split(" ",simplify = T)

# You can also set a maximum number of pieces:

fields <-c("Name: Hadley", "Country: NZ", "Age: 35") 
fields %>%str_split(": ", n =2, simplify =TRUE)

0,1,2,3,4,5,6,7,8
The,birch,canoe,slid,on,the,smooth,planks.,
Glue,the,sheet,to,the,dark,blue,background.,
It's,easy,to,tell,the,depth,of,a,well.
These,days,a,chicken,leg,is,a,rare,dish.
Rice,is,often,served,in,round,bowls.,,
The,juice,of,lemons,makes,fine,punch.,,


0,1
Name,Hadley
Country,NZ
Age,35


In [44]:
# Besides patterns, you can split by character, line, sentence, or word boundary with boundary():
x <- "This is a sentence. This is another sentence."
str_view_all(x, boundary("word"))

str_split(x, " ", simplify = T) 

str_split(x, boundary("word"), simplify = T) 

0,1,2,3,4,5,6,7
This,is,a,sentence.,This,is,another,sentence.


0,1,2,3,4,5,6,7
This,is,a,sentence,This,is,another,sentence


In [45]:
x <- "apples, pears, and bananas"
str_split(x, " |, ", simplify = T) 
str_split(x, " ", simplify = T)
str_split(x, boundary("word"), simplify = T)

# Compare the three ways of splitting above

0,1,2,3
apples,pears,and,bananas


0,1,2,3
"apples,","pears,",and,bananas


0,1,2,3
apples,pears,and,bananas


This chapter is long. The remaining part is not covered here; see the book if you need it.