## Chapter 3: Merging Data
[View this lesson on DataCamp](https://learn.datacamp.com/courses/merging-dataframes-with-pandas)

In [1]:
import pandas as pd

### pd.merge()
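`pd.merge(df1, df2)` is another function for combining DataFrames. Unlike `pd.concat()`, which accepts a list of any length, `pd.merge()` joins exactly two DataFrames at a time, matching rows on the values in one or more shared columns.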

Recall from the last chapter how we combined favourite colour and birth month data from ten people:

In [2]:
fav_colour = pd.read_csv('fav_colour.csv')
birthday_month = pd.read_csv('birthday_months.csv')

pd.concat([fav_colour, birthday_month], axis=1)

Unnamed: 0,Participant num,Fav Colour,Participant num.1,Birthday Month
0,1,blue,1,may
1,2,red,2,june
2,3,green,3,january
3,4,purple,4,february
4,5,red,5,september
5,6,green,6,july
6,7,orange,7,may
7,8,yellow,8,may
8,9,yellow,9,august
9,10,pink,10,december


Doing this with `pd.merge()` is a bit simpler, and has the advantage that it automatically recognizes that both data files have a column called `Participant num`, and merges these into a single column:

In [3]:
pd.merge(fav_colour, birthday_month)

Unnamed: 0,Participant num,Fav Colour,Birthday Month
0,1,blue,may
1,2,red,june
2,3,green,january
3,4,purple,february
4,5,red,september
5,6,green,july
6,7,orange,may
7,8,yellow,may
8,9,yellow,august
9,10,pink,december
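
The automatic matching described above can also be spelled out with the `on=` argument. The following is a minimal sketch rather than part of the original lesson: it rebuilds the first three rows of each table inline (instead of reading the CSV files) and checks that naming the key column explicitly gives the same result as letting `pd.merge()` find it.

```python
import pandas as pd

# Stand-in frames built inline from the first three rows shown above,
# so the example runs without the CSV files
fav_colour = pd.DataFrame({
    'Participant num': [1, 2, 3],
    'Fav Colour': ['blue', 'red', 'green'],
})
birthday_month = pd.DataFrame({
    'Participant num': [1, 2, 3],
    'Birthday Month': ['may', 'june', 'january'],
})

# With no `on=`, pd.merge() joins on every column name the two frames share
implicit = pd.merge(fav_colour, birthday_month)

# Naming the key column explicitly documents the intent
explicit = pd.merge(fav_colour, birthday_month, on='Participant num')

print(explicit)
print(implicit.equals(explicit))
```

Being explicit about `on=` is a reasonable habit when the two DataFrames might share more than one column name, since the default would then match on all of them.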


This works smoothly when both DataFrames have a common column containing the same values (in this case, `Participant num` with values 1–10). But what happens when the data only partly overlap? Here we load eye colour data, which covers only some of the people in the favourite colour and birth month data, plus a few new people who appear in neither:

In [4]:
eye_colour = pd.read_csv('eye_colour.csv')
eye_colour

Unnamed: 0,Participant num,eye_colour
0,1,brown
1,2,blue
2,3,blue
3,4,hazel
4,5,green
5,11,brown
6,12,brown
7,13,blue


When we merge `eye_colour` with one of the other DataFrames, we only get the rows that overlap between the two (based on shared `Participant num` values):

In [5]:
pd.merge(fav_colour, eye_colour)

Unnamed: 0,Participant num,Fav Colour,eye_colour
0,1,blue,brown
1,2,red,blue
2,3,green,blue
3,4,purple,hazel
4,5,red,green
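
To see why only five rows survive, it can help to compare the key values directly. This is a small sketch under the assumption that the data match the tables printed above; the frames are rebuilt inline instead of being read from the CSV files, and `shared` is a name introduced here for illustration.

```python
import pandas as pd

# Inline copies of the two tables shown above
fav_colour = pd.DataFrame({
    'Participant num': list(range(1, 11)),
    'Fav Colour': ['blue', 'red', 'green', 'purple', 'red',
                   'green', 'orange', 'yellow', 'yellow', 'pink'],
})
eye_colour = pd.DataFrame({
    'Participant num': [1, 2, 3, 4, 5, 11, 12, 13],
    'eye_colour': ['brown', 'blue', 'blue', 'hazel', 'green',
                   'brown', 'brown', 'blue'],
})

# Participant numbers that appear in both DataFrames
shared = (set(fav_colour['Participant num'].tolist())
          & set(eye_colour['Participant num'].tolist()))

# The default merge keeps exactly those participants
merged = pd.merge(fav_colour, eye_colour)

print(sorted(shared))
print(merged['Participant num'].tolist())
```

Participants 6 to 10 have no eye colour recorded and participants 11 to 13 have no favourite colour, so neither group appears in the merged result.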


In other words, `pd.merge()` uses an **inner join** by default: it keeps only the rows whose key value appears in both DataFrames. We can override this with the `how=` argument:

In [6]:
pd.merge(fav_colour, eye_colour, how='outer')

Unnamed: 0,Participant num,Fav Colour,eye_colour
0,1,blue,brown
1,2,red,blue
2,3,green,blue
3,4,purple,hazel
4,5,red,green
5,6,green,
6,7,orange,
7,8,yellow,
8,9,yellow,
9,10,pink,
