In [2]:
library(tidyverse)

# Separate a character column into multiple columns with a regular expression or numeric locations

Given either a regular expression or a vector of character positions, **`separate()`** turns a single character column into multiple columns.

```r
separate(
  data,
  col,
  into,
  sep = "[^[:alnum:]]+",
  remove = TRUE,
  convert = FALSE,
  extra = "warn",
  fill = "warn",
  ...
)
```

# Examples

In [3]:
# If you want to split by any non-alphanumeric value (the default):
df <- data.frame(x = c(NA, "a.b", "a.d", "b.c"))
df

x
""
a.b
a.d
b.c


In [4]:
df %>% separate(col = x, into = c('first column', 'second column'))
# Short form:
#df %>% separate(x, c('first column', 'second column'))

first column,second column
,
a,b
a,d
b,c


In [12]:
# You can choose to keep the input column (remove = FALSE)
df %>% separate(x, into = c('first', 'second'), remove = FALSE)

x,first,second
,,
a.b,a,b
a.d,a,d
b.c,b,c


In [9]:
# split at position c(1, 2)
df %>% separate(x, into = c('first', 'second', 'third'), sep = c(1, 2))

# Negative positions count from the right end of the string
df %>% separate(x, c('first', 'second', 'third'), sep = c(1, -1))

first,second,third
,,
a,.,b
a,.,d
b,.,c


first,second,third
,,
a,.,b
a,.,d
b,.,c

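The position-based split above generalizes to any fixed-width string. Here is a minimal sketch, assuming a hypothetical `date` column of compact `YYYYMMDD` strings (not part of the original data). Positive positions count characters from the left, so `sep = c(4, 6)` cuts after the 4th and 6th characters.

```r
library(tidyr)

# Hypothetical fixed-width data, for illustration only
dates <- data.frame(date = c("20240115", "20231231"))

# Cut after character 4 and after character 6
parts <- separate(dates, date, into = c("year", "month", "day"), sep = c(4, 6))
parts
```

The new columns stay character unless `convert = TRUE` is also passed.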

In [8]:
# If you just want the second variable:
df %>% separate(x, c(NA, 'second column'))

second column
""
b
d
c


In [9]:
# If every row doesn't split into the same number of pieces, use
# the extra and fill arguments to control what happens:
df <- data.frame(x = c("a", "a b", "a b c", NA))

df

x
a
a b
a b c
""


In [12]:
df %>% separate(x, c('A', 'B'))

"Expected 2 pieces. Missing pieces filled with `NA` in 1 rows [1]."

A,B
a,
a,b
a,b
,


In [13]:
# The same behaviour as previous, but drops the c without warnings:
df %>% separate(x, c("a", "b"), extra = "drop", fill = "right")

a,b
a,
a,b
a,b
,


In [15]:
# Opposite of previous, keeping the c and filling left:
df %>% separate(x, c("a", "b"), extra = "merge", fill = "left")

a,b
,a
a,b
a,b c
,


In [16]:
# Or you can keep all three:
df %>% separate(x, c("a", "b", "c"))

"Expected 3 pieces. Missing pieces filled with `NA` in 2 rows [1, 2]."

a,b,c
a,,
a,b,
a,b,c
,,


In [18]:
# To only split a specified number of times use extra = "merge":
df <- data.frame(x = c("x: 123", "y: error: 7"))

df

x
x: 123
y: error: 7


In [19]:
df %>% separate(x, c("key", "value"), ": ", extra = "merge")

key,value
x,123
y,error: 7


In [21]:
# Use regular expressions to separate on multiple characters:
df <- data.frame(x = c(NA, "a?b", "a.d", "b:c"))

df

x
""
a?b
a.d
b:c


In [24]:
df %>% separate(col = x, into = c('first', 'second'), sep = '[?.:]')

first,second
,
a,b
a,d
b,c

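Because a character `sep` is interpreted as a regular expression, a metacharacter such as `.` has to be escaped (or wrapped in a character class, as in the cell above) to be matched literally. A small sketch, assuming a hypothetical `file` column that is not in the original data:

```r
library(tidyr)

# Hypothetical file names, for illustration only
files <- data.frame(file = c("report.csv", "notes.txt"))

# "\\." matches a literal dot; "[.]" would work the same way
split_files <- separate(files, file, into = c("name", "ext"), sep = "\\.")
split_files
```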

In [28]:
# convert = TRUE detects column classes:
df <- data.frame(x = c("a:1", "a:2", "c:4", "d", NA))

df

x
a:1
a:2
c:4
d
""


In [29]:
# convert = FALSE (default)

df %>% separate(x, c("key","value"), ":") %>% str

"Expected 2 pieces. Missing pieces filled with `NA` in 1 rows [4]."

'data.frame':	5 obs. of  2 variables:
 $ key  : chr  "a" "a" "c" "d" ...
 $ value: chr  "1" "2" "4" NA ...


In [30]:
# convert = TRUE
# e.g. converts "100" (character) to 100 (integer)
df %>% separate(x, c("key","value"), ":", convert = TRUE) %>% str

"Expected 2 pieces. Missing pieces filled with `NA` in 1 rows [4]."

'data.frame':	5 obs. of  2 variables:
 $ key  : chr  "a" "a" "c" "d" ...
 $ value: int  1 2 4 NA NA

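According to the tidyr documentation, `convert = TRUE` runs base R's `type.convert()` with `as.is = TRUE` on the new columns. The sketch below applies that function by hand to the character vectors from the example above, to show where the `chr` and `int` classes come from. The variable names are only for illustration.

```r
# The pieces as separate() produces them before any conversion
key   <- c("a", "a", "c", "d", NA)
value <- c("1", "2", "4", NA, NA)

# as.is = TRUE keeps strings as character instead of turning them into factors
key_class   <- class(type.convert(key, as.is = TRUE))    # "character"
value_class <- class(type.convert(value, as.is = TRUE))  # "integer"

c(key = key_class, value = value_class)
```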

### `extra` and `fill` arguments

The `extra` argument tells `separate()` what to do when there are too many pieces.

In [5]:
df <- tibble(x = c("a,b,c", "d,e,f,g", "h,i,j"))
df

x
"a,b,c"
"d,e,f,g"
"h,i,j"


In [7]:
# drop extra pieces with a warning
df %>% separate(x, into = c('first', 'second', 'third'), extra = 'warn') # default

"Expected 3 pieces. Additional pieces discarded in 1 rows [2]."

first,second,third
a,b,c
d,e,f
h,i,j


In [8]:
# drop extra pieces without a warning
df %>% separate(x, into = c('first', 'second', 'third'), extra = 'drop')

first,second,third
a,b,c
d,e,f
h,i,j


In [9]:
# merge extra pieces to the last column
df %>% separate(x, into = c('first', 'second', 'third'), extra = 'merge')

first,second,third
a,b,c
d,e,"f,g"
h,i,j


The `fill` argument tells `separate()` what to do when there are not enough pieces.

In [10]:
# emit a warning and fill from the right
df %>% separate(x, into = c('first', 'second', 'third', 'fourth'), fill = 'warn')

"Expected 4 pieces. Missing pieces filled with `NA` in 2 rows [1, 3]."

first,second,third,fourth
a,b,c,
d,e,f,g
h,i,j,


In [15]:
# fill with missing values on the left
df %>% separate(x, into = c('first', 'second', 'third', 'fourth'), fill = 'left')

first,second,third,fourth
,a,b,c
d,e,f,g
,h,i,j


In [16]:
# fill with missing values on the right
df %>% separate(x, into = c('first', 'second', 'third', 'fourth'), fill = 'right')

first,second,third,fourth
a,b,c,
d,e,f,g
h,i,j,
