## Keys
There are primary keys and secondary keys. A primary key is a variable (or set of variables) that uniquely identifies each row in a dataset. A secondary key is the matching variable in the second dataset, which you use to combine the two datasets. It needs to hold the same information as the primary key.
<br>Sometimes a single variable is not enough to identify a row; then you use a multi-variable key.
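
One way to check whether a variable can serve as a primary key is to count how often each value occurs: any count above one means the variable does not identify rows uniquely. The sketch below assumes a small made-up `songs` tibble (hypothetical data, not one of the datasets used in this notebook) in which `song` alone repeats, but `song` together with `album` does not.

```r
library(dplyr)

# Hypothetical data: the same song title appears on two albums
songs <- tibble(
  song  = c("Black Dog", "Black Dog", "The Ocean"),
  album = c("Album A", "Album B", "Album A")
)

# Candidate single-variable key: keep values that occur more than once
dup_single <- songs %>% 
  count(song) %>% 
  filter(n > 1)

# Candidate multi-variable key: keep combinations that occur more than once
dup_multi <- songs %>% 
  count(song, album) %>% 
  filter(n > 1)

nrow(dup_single)  # number of repeated `song` values
nrow(dup_multi)   # number of repeated `song` + `album` combinations
```

If the second count is zero, the pair can be used as a multi-variable key, for example with `by = c("song", "album")` in a join.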

## Joins


<b>mutating joins</b>
 * `left_join()`
 * `right_join()`
 * `inner_join()`
 * `full_join()`
 
<b>filtering joins</b>
 * `semi_join()` - keep the rows of one dataset that have a match in another dataset
 * `anti_join()` - show which rows in the primary dataset have no match in the secondary dataset
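
To make the difference between the two families concrete, here is a minimal sketch that applies each join listed above. The `bands` and `instruments` tibbles are hypothetical and exist only for this illustration.

```r
library(dplyr)

# Hypothetical example data sharing the key column `name`
bands <- tibble(
  name = c("Mick", "John", "Paul"),
  band = c("Stones", "Beatles", "Beatles")
)
instruments <- tibble(
  name  = c("John", "Paul", "Keith"),
  plays = c("guitar", "bass", "guitar")
)

# Mutating joins add the columns of the second dataset
left  <- left_join(bands, instruments, by = "name")   # keeps all rows of bands
right <- right_join(bands, instruments, by = "name")  # keeps all rows of instruments
inner <- inner_join(bands, instruments, by = "name")  # keeps rows matched in both
full  <- full_join(bands, instruments, by = "name")   # keeps all rows of both

# Filtering joins only keep or drop rows of the first dataset; no columns are added
semi <- semi_join(bands, instruments, by = "name")  # rows of bands with a match
anti <- anti_join(bands, instruments, by = "name")  # rows of bands without a match
```

Passing `by` explicitly avoids the `Joining, by = ...` message that dplyr prints when it has to guess the key.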

### Set operations
The `union()` function provides an easy way to combine two datasets without duplicating any rows. 
<br>`intersect()` returns the rows that appear in both datasets. You can think of it as the set-operator equivalent of a semi-join. It is what you would use if your datasets contain exactly the same variables. 
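
A short sketch of both set operations, assuming two made-up tibbles `a` and `b` (hypothetical names and values) that share exactly the same variables:

```r
library(dplyr)

# Hypothetical datasets with identical column names and types
a <- tibble(x = c(1, 2, 3), y = c("a", "b", "c"))
b <- tibble(x = c(2, 3, 4), y = c("b", "c", "d"))

# All distinct rows found in either dataset
either <- union(a, b)

# Rows found in both datasets
both <- intersect(a, b)

# The semi-join counterpart, matching on every column
semi <- semi_join(a, b, by = c("x", "y"))

nrow(either)  # distinct rows across a and b
nrow(both)    # rows shared by a and b
```

Here `both` and `semi` hold the same rows, which is the sense in which `intersect()` mirrors a semi-join.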

In [10]:
# Return songs in definitive that are not in complete
definitive %>% 
  anti_join(complete)

Joining, by = c("song", "album")


song,album
Rock and Roll,The Song Remains the Same
Celebration Day,The Song Remains the Same
Black Dog,The Song Remains the Same
Over the Hills and Far Away,The Song Remains the Same
Misty Mountain Hop,The Song Remains the Same
Since I've Been Loving You,The Song Remains the Same
No Quarter,The Song Remains the Same
The Song Remains the Same,The Song Remains the Same
The Rain Song,The Song Remains the Same
The Ocean,The Song Remains the Same


In [11]:
# Return songs in complete that are not in definitive
complete %>% 
  anti_join(definitive)

Joining, by = c("song", "album")


song,album


In [2]:
# Comparing two datasets:
#   setequal()  - checks whether they contain the same rows, in any order
#   identical() - checks whether they contain the same rows, in the same order


`definitive` and `complete` contain the songs that appear in competing Led Zeppelin anthologies: *The Definitive Collection* and *The Complete Studio Recordings*, respectively.

Both anthologies claim to contain the complete studio recordings of Led Zeppelin, but do they contain exactly the same songs?

In [3]:
library(dplyr)
library(readr)


Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union



In [5]:
definitive <- read_csv("definitive.csv")
complete <- read_csv("complete.csv")

Parsed with column specification:
cols(
  song = col_character(),
  album = col_character()
)
Parsed with column specification:
cols(
  song = col_character(),
  album = col_character()
)


In [6]:
# Check if same order: definitive and complete
identical(definitive, complete)

In [7]:
# Check if any order: definitive and complete
setequal(definitive, complete)

FALSE: Different number of rows
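
The difference between the two checks is easiest to see on datasets that hold the same rows in a different order. The following is a minimal sketch with hypothetical tibbles `first` and `second`, not the anthology data:

```r
library(dplyr)

# Hypothetical data: `second` has the same rows as `first`, in reverse order
first  <- tibble(song = c("Money", "Time"), length = c("6:30", "6:53"))
second <- first %>% arrange(desc(song))

same_order <- identical(first, second)  # FALSE: the row order differs
any_order  <- setequal(first, second)   # TRUE: the set of rows is the same
```
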

In [8]:
# Songs in definitive but not complete
setdiff(definitive, complete)

song,album
Rock and Roll,The Song Remains the Same
Celebration Day,The Song Remains the Same
Black Dog,The Song Remains the Same
Over the Hills and Far Away,The Song Remains the Same
Misty Mountain Hop,The Song Remains the Same
Since I've Been Loving You,The Song Remains the Same
No Quarter,The Song Remains the Same
The Song Remains the Same,The Song Remains the Same
The Rain Song,The Song Remains the Same
The Ocean,The Song Remains the Same


In [9]:
# Songs in complete but not definitive
setdiff(complete, definitive)

song,album


## Binds
 * `bind_rows()`
 * `bind_cols()`
 
<br>They are faster than the base R equivalents (`rbind()` and `cbind()`).
<br>They can handle lists of data frames.
<br>They return a tibble.
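
As a minimal sketch of the list handling, the example below binds a named list of data frames and uses the `.id` argument of `bind_rows()` to record which list element each row came from. The tibbles `side_a` and `side_b` and the column name `side` are assumptions made for this illustration.

```r
library(dplyr)

# Hypothetical data frames with the same columns
side_a <- tibble(song = c("Speak to Me", "Breathe"), length = c("1:30", "2:43"))
side_b <- tibble(song = c("Money", "Us and Them"), length = c("6:30", "7:51"))

# bind_rows() accepts a list of data frames; with a named list,
# .id stores each row's list name in a new column
album <- bind_rows(list(one = side_a, two = side_b), .id = "side")

# bind_cols() places columns side by side and matches rows by position,
# so both inputs need the same number of rows
numbered <- bind_cols(tibble(track = 1:4), album)
```

Because `bind_cols()` matches rows purely by position, it is only safe when both inputs are already in the same order.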

In [63]:
library(lubridate)
side_one <- read_csv("side_one.csv", col_types = list(col_character(),
                                                      col_character()))
side_two <- read_csv("side_two.csv", col_types = list(col_character(),
                                                      col_character()))
# Convert the "M:SS" strings to lubridate Periods (minutes and seconds)
side_one <- side_one %>% 
    mutate(length = ms(length))
# Convert side_two the same way so both datasets share a column type
side_two <- side_two %>% 
    mutate(length = ms(length))

In [53]:
# Inspect the structure of side_one
glimpse(side_one)

Observations: 5
Variables: 2
$ song   <chr> "Speak to Me", "Breathe", "On the Run", "Time", "The Great G...
$ length <S4: Period> 1M 30S, 2M 43S, 3M 30S, 6M 53S, 4M 15S


In [26]:
# Bind side_one and side_two into a single dataset
side_one %>% 
  bind_rows(side_two)

song,length
Speak to Me,01:30:00
Breathe,02:43:00
On the Run,03:30:00
Time,06:53:00
The Great Gig in the Sky,04:15:00
Money,06:30:00
Us and Them,07:51:00
Any Colour You Like,03:24:00
Brain Damage,03:50:00
Eclipse,02:03:00
