In [10]:
library(tidyverse)

# Mutating joins

The mutating joins add columns from y to x, matching rows based on the keys:

* `inner_join()`: includes only the rows whose keys appear in both x and y.

* `left_join()`: includes all rows in x.

* `right_join()`: includes all rows in y.

* `full_join()`: includes all rows in x or y.
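
As a quick illustration of the four join types listed above, the sketch below counts the rows each one keeps. It assumes the `band_members` and `band_instruments` example tables that ship with dplyr; the object names `inner`, `left`, `right`, and `full` are only for illustration.

```R
library(dplyr)

# band_members and band_instruments are example datasets bundled with dplyr
inner <- inner_join(band_members, band_instruments, by = "name")
left  <- left_join(band_members, band_instruments, by = "name")
right <- right_join(band_members, band_instruments, by = "name")
full  <- full_join(band_members, band_instruments, by = "name")

# Compare how many rows each join keeps
c(inner = nrow(inner), left = nrow(left), right = nrow(right), full = nrow(full))
```

Only John and Paul appear in both tables, so the inner join keeps 2 rows, the left and right joins keep 3 each, and the full join keeps 4.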

```R
inner_join(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...)

# S3 method for data.frame
inner_join(
  x,
  y,
  by = NULL,
  copy = FALSE,
  suffix = c(".x", ".y"),
  ...,
  na_matches = c("na", "never")
)

left_join(
  x,
  y,
  by = NULL,
  copy = FALSE,
  suffix = c(".x", ".y"),
  ...,
  keep = FALSE
)

# S3 method for data.frame
left_join(
  x,
  y,
  by = NULL,
  copy = FALSE,
  suffix = c(".x", ".y"),
  ...,
  keep = FALSE,
  na_matches = c("na", "never")
)

right_join(
  x,
  y,
  by = NULL,
  copy = FALSE,
  suffix = c(".x", ".y"),
  ...,
  keep = FALSE
)

# S3 method for data.frame
right_join(
  x,
  y,
  by = NULL,
  copy = FALSE,
  suffix = c(".x", ".y"),
  ...,
  keep = FALSE,
  na_matches = c("na", "never")
)

full_join(
  x,
  y,
  by = NULL,
  copy = FALSE,
  suffix = c(".x", ".y"),
  ...,
  keep = FALSE
)

# S3 method for data.frame
full_join(
  x,
  y,
  by = NULL,
  copy = FALSE,
  suffix = c(".x", ".y"),
  ...,
  keep = FALSE,
  na_matches = c("na", "never")
)
```

# Examples

In [24]:
band_members

name,band
Mick,Stones
John,Beatles
Paul,Beatles


In [25]:
band_instruments

name,plays
John,guitar
Paul,bass
Keith,guitar


In [26]:
band_instruments2

artist,plays
John,guitar
Paul,bass
Keith,guitar


In [2]:
setwd('C:/Users/dell/PycharmProjects/MachineLearning/Pandas/datasets')
getwd()

In [4]:
abb <- read.csv('./state-abbrevs.csv')
head(abb)

state,abbreviation
Alabama,AL
Alaska,AK
Arizona,AZ
Arkansas,AR
California,CA
Colorado,CO


In [6]:
areas <- read.csv('./state-areas.csv')
head(areas)

state,area..sq..mi.
Alabama,52423
Alaska,656425
Arizona,114006
Arkansas,53182
California,163707
Colorado,104100


In [7]:
pop <- read.csv('./state-population.csv') 
head(pop)

state.region,ages,year,population
AL,under18,2012,1117489
AL,total,2012,4817528
AL,under18,2010,1130966
AL,total,2010,4785570
AL,under18,2011,1125763
AL,total,2011,4801627


In [12]:
# Inner join `abb` and `areas` by `state`

inner_join(abb, areas, by = 'state') %>% head()

state,abbreviation,area..sq..mi.
Alabama,AL,52423
Alaska,AK,656425
Arizona,AZ,114006
Arkansas,AR,53182
California,CA,163707
Colorado,CO,104100


To join by different variables on x and y, use a named vector. For example, `by = c("a" = "b")` will match `x$a` to `y$b`.

In [21]:
# Inner join, matching `abb$abbreviation` to `pop$state.region`
inner_join(abb, pop, by = c('abbreviation' = 'state.region')) %>% head()

state,abbreviation,ages,year,population
Alabama,AL,under18,2012,1117489
Alabama,AL,total,2012,4817528
Alabama,AL,under18,2010,1130966
Alabama,AL,total,2010,4785570
Alabama,AL,under18,2011,1125763
Alabama,AL,total,2011,4801627


To join by multiple variables, use a vector with length > 1. For example, `by = c("a", "b")` will match `x$a` to `y$a` and `x$b` to `y$b`.
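
A minimal sketch of a multi-key join, using small made-up tables (`sales` and `targets` are hypothetical and not part of the datasets loaded above):

```R
library(dplyr)

# Hypothetical data: both tables are keyed by store AND year
sales <- tibble(
  store   = c("A", "A", "B"),
  year    = c(2020, 2021, 2020),
  revenue = c(10, 12, 7)
)
targets <- tibble(
  store  = c("A", "B", "B"),
  year   = c(2021, 2020, 2021),
  target = c(11, 8, 9)
)

# A row matches only when both store and year agree
matched <- inner_join(sales, targets, by = c("store", "year"))
matched
```

Here only the (A, 2021) and (B, 2020) combinations exist in both tables, so `matched` has two rows.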

Use a named vector to match different variables in x and y. For example, `by = c("a" = "b", "c" = "d")` will match `x$a` to `y$b` and `x$c` to `y$d`.
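
The same idea with differently named key columns can be sketched as follows. The `orders` and `plans` tables are made up for illustration only:

```R
library(dplyr)

# Hypothetical data: the same keys carry different column names in each table
orders <- tibble(
  cust_id    = c(1, 1, 2),
  order_year = c(2020, 2021, 2020),
  amount     = c(50, 75, 20)
)
plans <- tibble(
  customer = c(1, 2),
  year     = c(2021, 2020),
  plan     = c("gold", "basic")
)

# cust_id is matched to customer, and order_year to year
joined <- left_join(
  orders, plans,
  by = c("cust_id" = "customer", "order_year" = "year")
)
joined
```

With the default `keep = FALSE`, the result keeps the key names from `x` (`cust_id`, `order_year`), and the unmatched first order gets `NA` for `plan`.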

In [32]:
# Use `keep = TRUE` to keep the join keys from both `x` and `y`
band_members %>% full_join(band_instruments2, by = c('name' = 'artist'), keep = TRUE)

name,band,artist,plays
Mick,Stones,,
John,Beatles,John,guitar
Paul,Beatles,Paul,bass
,,Keith,guitar


In [33]:
# If a row in `x` matches multiple rows in `y`, all the rows in `y` will be
# returned once for each matching row in `x`
df1 <- tibble(x = 1:3)
df2 <- tibble(x = c(1, 1, 2), y = c("first", "second", "third"))
df1 %>% left_join(df2)

Joining, by = "x"


x,y
1,first
1,second
2,third
3,


In [36]:
# By default, NAs match other NAs, so the NA row in `df1`
# picks up `z = 3` from the NA row in `df2`:
df1 <- data.frame(x = c(1, NA), y = 2)
df2 <- data.frame(x = c(1, NA), z = 3)
left_join(df1, df2)

Joining, by = "x"


x,y,z
1.0,2,3
,2,3


In [37]:
# You can optionally request that NAs don't match, giving
# a result that more closely resembles SQL joins
left_join(df1, df2, na_matches = "never")

Joining, by = "x"


x,y,z
1.0,2,3.0
,2,
