```R
library(tidyverse)
```

In Python (pandas), the equivalent is **`DataFrame.drop_duplicates()`**.


# Keep distinct/unique rows

Keep only unique/distinct rows from a data frame. This is similar to `unique.data.frame()` but considerably faster.
```R
distinct(.data, ..., .keep_all = FALSE)
```
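Because `distinct()` is described as similar to `unique.data.frame()`, here is a minimal sketch comparing it with two base R idioms on whole rows. It reuses the `players` data from the examples below; the variable names `via_distinct`, `via_unique` and `via_duplicated` are illustrative only. Base R keeps the original row names (1, 2, 4, 5), so they are reset before comparing.

```R
library(dplyr)

# Same toy data as in the examples below
players <- data.frame(
    name = c('VN Pikachu', 'VN Pikachu', 'VN Pikachu', 'VN Wanie', 'Bac giang vn'),
    clan = c('VNC', 'VN', 'VNC', 'VN', 'VNC'),
    level = c(31, 31, 31, 33, 34)
)

# dplyr: keep the first occurrence of each distinct row
via_distinct <- distinct(players)

# Base R: unique() on a data frame dispatches to unique.data.frame()
via_unique <- unique(players)

# Base R: duplicated() flags every repeat after the first occurrence
via_duplicated <- players[!duplicated(players), ]

# unique() and `[` keep the original row names, so reset them
rownames(via_unique) <- NULL
rownames(via_duplicated) <- NULL

all.equal(via_distinct, via_unique)      # TRUE
all.equal(via_distinct, via_duplicated)  # TRUE
```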

# Examples

```R
players <- data.frame(
    name = c('VN Pikachu', 'VN Pikachu', 'VN Pikachu', 'VN Wanie', 'Bac giang vn'),
    clan = c('VNC', 'VN', 'VNC', 'VN', 'VNC'),
    level = c(31, 31, 31, 33, 34)
)

players
```

| name | clan | level |
|---|---|---|
| VN Pikachu | VNC | 31 |
| VN Pikachu | VN | 31 |
| VN Pikachu | VNC | 31 |
| VN Wanie | VN | 33 |
| Bac giang vn | VNC | 34 |


```R
# Select distinct rows from `players`
distinct(players)
# Equivalent:
# players %>% distinct()
```

| name | clan | level |
|---|---|---|
| VN Pikachu | VNC | 31 |
| VN Pikachu | VN | 31 |
| VN Wanie | VN | 33 |
| Bac giang vn | VNC | 34 |


```R
# Select the distinct values of the `name` column
distinct(players, name)
# Equivalent:
# players %>% distinct(name)
```

| name |
|---|
| VN Pikachu |
| VN Wanie |
| Bac giang vn |
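`distinct(players, name)` returns a one-column data frame, not a vector. A minimal sketch showing how to get a plain character vector instead, either with dplyr's `pull()` or with base R's `unique()` on the column; the names `name_df`, `name_vec` and `base_vec` are illustrative only.

```R
library(dplyr)

players <- data.frame(
    name = c('VN Pikachu', 'VN Pikachu', 'VN Pikachu', 'VN Wanie', 'Bac giang vn'),
    clan = c('VNC', 'VN', 'VNC', 'VN', 'VNC'),
    level = c(31, 31, 31, 33, 34)
)

# distinct() always returns a data frame (here with a single column, `name`)
name_df <- distinct(players, name)

# pull() extracts that column as a character vector
name_vec <- players %>% distinct(name) %>% pull(name)

# Base R on the column itself; both keep first-occurrence order
base_vec <- unique(players$name)

class(name_df)                 # "data.frame"
identical(name_vec, base_vec)  # TRUE
```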


```R
# Keep the first row for each distinct `name`, with all columns
distinct(players, name, .keep_all = TRUE)
```

| name | clan | level |
|---|---|---|
| VN Pikachu | VNC | 31 |
| VN Wanie | VN | 33 |
| Bac giang vn | VNC | 34 |


You can also use `distinct()` on computed variables:

```R
# Distinct values of the first 2 characters of `name`, in a new column `signature`
distinct(players, signature = substr(name, 1, 2))
```

| signature |
|---|
| VN |
| Ba |
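A computed variable can be combined with `.keep_all = TRUE`. A minimal sketch: deduplicate on the computed `signature` while keeping every original column; the computed column is appended after the existing ones, and the first row for each signature is kept. The name `by_signature` is illustrative only.

```R
library(dplyr)

players <- data.frame(
    name = c('VN Pikachu', 'VN Pikachu', 'VN Pikachu', 'VN Wanie', 'Bac giang vn'),
    clan = c('VNC', 'VN', 'VNC', 'VN', 'VNC'),
    level = c(31, 31, 31, 33, 34)
)

# Deduplicate on the first 2 characters of `name`, keeping all columns;
# the result has the columns name, clan, level, signature
by_signature <- distinct(players, signature = substr(name, 1, 2), .keep_all = TRUE)

# First row per signature: "VN Pikachu" (VN) and "Bac giang vn" (Ba)
by_signature
```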


Use `across()` to choose columns with `select()`-style helpers such as `contains()`:

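```R
starwars %>% distinct(across(contains("color")))
```

| hair_color | skin_color | eye_color |
|---|---|---|
| blond | fair | blue |
| NA | gold | yellow |
| NA | white, blue | red |
| none | white | yellow |
| brown | light | brown |
| brown, grey | light | blue |
| brown | light | blue |
| NA | white, red | red |
| black | light | brown |
| auburn, white | fair | blue-gray |

(Only the first 10 rows of the output are shown.)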
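As of dplyr 1.1.0, calling `across()` without `.fns` is deprecated, and `pick()` is the suggested way to select columns with tidyselect helpers (check your installed dplyr version). A minimal sketch, assuming dplyr 1.1.0 or later, showing that `pick()` gives the same result as listing the columns by name; `with_pick` and `by_name` are illustrative names.

```R
library(dplyr)  # pick() needs dplyr >= 1.1.0; `starwars` ships with dplyr

# pick() selects columns with the same helpers as select()
with_pick <- starwars %>% distinct(pick(contains("color")))

# Same columns, named explicitly
by_name <- starwars %>% distinct(hair_color, skin_color, eye_color)

all.equal(with_pick, by_name)  # TRUE
```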


```R
# Grouping -------------------------------------------------
# The same behaviour applies for grouped data frames,
# except that the grouping variables are always included

players %>% group_by(clan) %>% distinct(name)
```

| name | clan |
|---|---|
| VN Pikachu | VNC |
| VN Pikachu | VN |
| VN Wanie | VN |
| Bac giang vn | VNC |
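On grouped data, the grouping column is added to the output even when only `name` is requested, and the result stays grouped. A minimal sketch that checks this with `group_vars()` and removes the grouping with `ungroup()`; the name `by_clan` is illustrative only.

```R
library(dplyr)

players <- data.frame(
    name = c('VN Pikachu', 'VN Pikachu', 'VN Pikachu', 'VN Wanie', 'Bac giang vn'),
    clan = c('VNC', 'VN', 'VNC', 'VN', 'VNC'),
    level = c(31, 31, 31, 33, 34)
)

by_clan <- players %>% group_by(clan) %>% distinct(name)

group_vars(by_clan)  # "clan": still grouped
names(by_clan)       # "name" "clan": grouping column kept in input order

# Drop the grouping if later steps should see the whole table
plain <- ungroup(by_clan)
```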


# Arguments

**`.data`**  	
A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See Methods, below, for more details.

**`...`**  	
<data-masking> Optional variables to use when determining uniqueness. If there are multiple rows for a given combination of inputs, only the first row will be preserved. If omitted, all variables are used.

**`.keep_all`**  	
If `TRUE`, keep all variables in `.data`. If a combination of `...` is not distinct, this keeps the first row of values.
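Since only the first row of each combination is preserved, sorting beforehand controls which row survives. A minimal sketch with hypothetical data (`snapshots` is an invented table, not from the examples above): keep each player's highest level by sorting with `arrange(desc(level))` before `distinct(..., .keep_all = TRUE)`.

```R
library(dplyr)

# Hypothetical data: the same players recorded at several levels
snapshots <- data.frame(
    name = c('VN Pikachu', 'VN Wanie', 'VN Pikachu', 'VN Wanie'),
    level = c(31, 33, 35, 30)
)

# Sort so the highest level comes first, then keep the first row per name
best <- snapshots %>%
    arrange(desc(level)) %>%
    distinct(name, .keep_all = TRUE)

best  # VN Pikachu at 35, VN Wanie at 33
```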

# Value