<img src="https://ga-dash.s3.amazonaws.com/production/assets/logo-9f88ae6c9c3871690e33280fcf557f33.png" style="float: left; margin: 10px;">
# Joins & Pandas
***
Week 2 | Lesson 4.3

### LEARNING OBJECTIVES
*After this lesson, you will be able to:*
- Join data via concat
- Do left, right, inner, and outer joins

### INSTRUCTOR PREP
*Before this lesson, instructors will need to:*
- Read in / Review any dataset(s) & starter/solution code
- Generate a brief slide deck

### LESSON GUIDE
| TIMING  | TYPE  | TOPIC  |
|:-:|---|---|
| 5 min  | [Introduction](#introduction)   | Concatenate & Join  |
| 10 min  | [Demo / Guided Practice](#demo)  | Concatenate  |
| 25 min  | [Demo / Guided Practice](#demo)  | Left and right joins  |
| 25 min  | [Demo / Guided Practice](#demo)  | Outer and inner joins  |
| 20 min  | [Independent Practice](#ind-practice)  |   |
| 5 min  | [Conclusion](#conclusion)  |  |

---

<a name="Concatenate & Join"></a>
## Introduction: Concatenate & Joins (5 mins)

Concatenation combines two or more DataFrames into a single **DataFrame** by stacking them end to end. A common programming task with the same name and a similar idea is string **concatenation**:

> ```python
>    "I never put salt " + "in my eyes"
>
>    # Expected output:
>    "I never put salt in my eyes"
> ```

In pandas, a join combines the columns of two DataFrames by matching rows,
either on the index or on a key column. Here is a representation of left,
right, inner, and outer joins using Venn diagrams.

![](../assets/images/Joins.png)
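
As a minimal sketch of the two matching options described above, the example below matches rows on a key column with `pd.merge` and on the index with `DataFrame.join`. The `people` and `scores` frames are hypothetical data made up for illustration; they are not part of the demo notebook.

```python
import pandas as pd

# Hypothetical example data (not from the lesson's demo notebook).
people = pd.DataFrame({'subject_id': ['1', '2', '3'],
                       'first_name': ['Alex', 'Amy', 'Allen']})
scores = pd.DataFrame({'subject_id': ['2', '3', '4'],
                       'test_id': [15, 15, 61]})

# Join on a key column: rows are matched where subject_id is equal.
on_column = pd.merge(people, scores, on='subject_id', how='inner')

# Join on the index: move the key into the index, then match index labels.
on_index = people.set_index('subject_id').join(
    scores.set_index('subject_id'), how='inner')

print(on_column)
print(on_index)
```

Both results contain the same two matched subjects; the only difference is whether `subject_id` ends up as a regular column or as the index.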

**Check:** In pairs, draw a left, right, inner, and outer join. Then show your drawing to your partner and explain it.

- [Concatenation](http://whatis.techtarget.com/definition/concatenation-concatenate-concatenating)
- [Venn of joins](http://www.bogotobogo.com/php/images/mysql-join/Joins.png)

<a name="Concatenate"></a>
## Demo / Guided Practice: Concatenate (10 mins)

Let's look at a simple example of concatenation. No need to do this yourself,
just follow along.

> [demo code](w2-4.3-demo.ipynb)

In [19]:
import pandas as pd

In [20]:
# Data Frame 1
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']},
                    index=[0, 1, 2, 3])

In [21]:
# Data Frame 2
df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],
                    'B': ['B4', 'B5', 'B6', 'B7'],
                    'C': ['C4', 'C5', 'C6', 'C7'],
                    'D': ['D4', 'D5', 'D6', 'D7'],
                    'E': ['E4', 'E5', 'E6', 'E7']},
                    index=[4, 5, 6, 7])

In [22]:
pd.concat([df1, df2])

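|   | A | B | C | D | E |
|---|---|---|---|---|---|
| 0 | A0 | B0 | C0 | D0 | NaN |
| 1 | A1 | B1 | C1 | D1 | NaN |
| 2 | A2 | B2 | C2 | D2 | NaN |
| 3 | A3 | B3 | C3 | D3 | NaN |
| 4 | A4 | B4 | C4 | D4 | E4 |
| 5 | A5 | B5 | C5 | D5 | E5 |
| 6 | A6 | B6 | C6 | D6 | E6 |
| 7 | A7 | B7 | C7 | D7 | E7 |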


In [23]:
frames = [df1, df2]
frames

[    A   B   C   D
 0  A0  B0  C0  D0
 1  A1  B1  C1  D1
 2  A2  B2  C2  D2
 3  A3  B3  C3  D3,     A   B   C   D   E
 4  A4  B4  C4  D4  E4
 5  A5  B5  C5  D5  E5
 6  A6  B6  C6  D6  E6
 7  A7  B7  C7  D7  E7]

## Concatenation of two DataFrames

In [24]:
combined_dataframe = pd.concat(frames)
combined_dataframe

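|   | A | B | C | D | E |
|---|---|---|---|---|---|
| 0 | A0 | B0 | C0 | D0 | NaN |
| 1 | A1 | B1 | C1 | D1 | NaN |
| 2 | A2 | B2 | C2 | D2 | NaN |
| 3 | A3 | B3 | C3 | D3 | NaN |
| 4 | A4 | B4 | C4 | D4 | E4 |
| 5 | A5 | B5 | C5 | D5 | E5 |
| 6 | A6 | B6 | C6 | D6 | E6 |
| 7 | A7 | B7 | C7 | D7 | E7 |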
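
By default, `pd.concat` keeps every column and fills the gaps with `NaN`, as the `E` column above shows. The sketch below tries two other `pd.concat` options, `join='inner'` and `ignore_index=True`. The `top` and `bottom` frames are hypothetical, smaller stand-ins for df1 and df2, used only to keep the example short.

```python
import pandas as pd

# Hypothetical, smaller stand-ins for df1 and df2 from the demo.
top = pd.DataFrame({'A': ['A0', 'A1'],
                    'B': ['B0', 'B1']},
                   index=[0, 1])
bottom = pd.DataFrame({'A': ['A4', 'A5'],
                       'B': ['B4', 'B5'],
                       'E': ['E4', 'E5']},
                      index=[4, 5])

# Default (join='outer'): keep every column; rows from `top` get NaN in 'E'.
stacked = pd.concat([top, bottom])

# join='inner': keep only the columns that appear in every frame.
shared_only = pd.concat([top, bottom], join='inner')

# ignore_index=True: discard the original row labels and renumber from 0.
renumbered = pd.concat([top, bottom], ignore_index=True)

print(stacked)
print(shared_only)
print(renumbered)
```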


<a name="Left and right joins"></a>
## Demo / Guided Practice: Left and right joins (25 mins)

Let's create a few small DataFrames and try a left, right, inner, and outer join.

## First DataFrame

In [25]:
import pandas as pd
from IPython.display import display
from IPython.display import Image

raw_data = {
        'subject_id': ['1', '2', '3', '4', '5'],
        'first_name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
        'last_name': ['Anderson', 'Ackerman', 'Ali', 'Aoni', 'Atiches']}
df_a = pd.DataFrame(raw_data, columns = ['subject_id', 'first_name', 'last_name'])
df_a

|   | subject_id | first_name | last_name |
|---|---|---|---|
| 0 | 1 | Alex | Anderson |
| 1 | 2 | Amy | Ackerman |
| 2 | 3 | Allen | Ali |
| 3 | 4 | Alice | Aoni |
| 4 | 5 | Ayoung | Atiches |


## Second DataFrame

In [26]:
raw_data = {
        'subject_id': ['4', '5', '6', '7', '8'],
        'first_name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
        'last_name': ['Bonder', 'Black', 'Balwner', 'Brice', 'Btisan']}
df_b = pd.DataFrame(raw_data, columns = ['subject_id', 'first_name', 'last_name'])
df_b

|   | subject_id | first_name | last_name |
|---|---|---|---|
| 0 | 4 | Billy | Bonder |
| 1 | 5 | Brian | Black |
| 2 | 6 | Bran | Balwner |
| 3 | 7 | Bryce | Brice |
| 4 | 8 | Betty | Btisan |


## Third DataFrame

In [27]:
raw_data = {
        'subject_id': ['1', '2', '3', '4', '5', '7', '8', '9', '10', '11'],
        'test_id': [51, 15, 15, 61, 16, 14, 15, 1, 61, 16]}
df_n = pd.DataFrame(raw_data, columns = ['subject_id','test_id'])
df_n

|   | subject_id | test_id |
|---|---|---|
| 0 | 1 | 51 |
| 1 | 2 | 15 |
| 2 | 3 | 15 |
| 3 | 4 | 61 |
| 4 | 5 | 16 |
| 5 | 7 | 14 |
| 6 | 8 | 15 |
| 7 | 9 | 1 |
| 8 | 10 | 61 |
| 9 | 11 | 16 |


Now we will merge with a left join. A left join produces a complete set of
records from df_a, with the matching records (where available) from df_b. If
there is no match, the right side will contain null (shown as NaN in pandas).

In [28]:
pd.merge(df_a, df_b, on='subject_id', how='left')

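|   | subject_id | first_name_x | last_name_x | first_name_y | last_name_y |
|---|---|---|---|---|---|
| 0 | 1 | Alex | Anderson | NaN | NaN |
| 1 | 2 | Amy | Ackerman | NaN | NaN |
| 2 | 3 | Allen | Ali | NaN | NaN |
| 3 | 4 | Alice | Aoni | Billy | Bonder |
| 4 | 5 | Ayoung | Atiches | Brian | Black |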
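
The third DataFrame, df_n, is a natural candidate for the same kind of left join. As a sketch (this step is an illustration, not part of the original demo), the example below attaches each subject's test_id to df_a. The frames are rebuilt here so the block runs on its own.

```python
import pandas as pd

# Same data as df_a and df_n in the demo, rebuilt so this block is self-contained.
df_a = pd.DataFrame({
    'subject_id': ['1', '2', '3', '4', '5'],
    'first_name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'last_name': ['Anderson', 'Ackerman', 'Ali', 'Aoni', 'Atiches']})
df_n = pd.DataFrame({
    'subject_id': ['1', '2', '3', '4', '5', '7', '8', '9', '10', '11'],
    'test_id': [51, 15, 15, 61, 16, 14, 15, 1, 61, 16]})

# Left join: every row of df_a is kept, and test_id is pulled from df_n
# wherever subject_id matches. Subjects that appear only in df_n are dropped.
a_with_tests = pd.merge(df_a, df_n, on='subject_id', how='left')
print(a_with_tests)
```

Because the two frames share no column names other than the key, pandas adds no `_x` or `_y` suffixes here.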


**Check:** What do you think a right join will result in?

A merge with a right join produces a complete set of records from
df_b, with the matching records (where available) from df_a. If there is no
match, the left side will contain **null**.



In [29]:
pd.merge(df_a, df_b, on='subject_id', how='right')

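|   | subject_id | first_name_x | last_name_x | first_name_y | last_name_y |
|---|---|---|---|---|---|
| 0 | 4 | Alice | Aoni | Billy | Bonder |
| 1 | 5 | Ayoung | Atiches | Brian | Black |
| 2 | 6 | NaN | NaN | Bran | Balwner |
| 3 | 7 | NaN | NaN | Bryce | Brice |
| 4 | 8 | NaN | NaN | Betty | Btisan |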
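
The `_x` and `_y` suffixes in these outputs are what pandas appends when both frames have a non-key column with the same name. A minimal sketch of replacing them with the `suffixes` parameter of `pd.merge` follows. The `left_frame` and `right_frame` names and the `_a` / `_b` suffixes are choices made for this illustration, and the data is a trimmed copy of df_a and df_b.

```python
import pandas as pd

# Trimmed, illustrative copies of df_a and df_b (first_name only).
left_frame = pd.DataFrame({'subject_id': ['3', '4', '5'],
                           'first_name': ['Allen', 'Alice', 'Ayoung']})
right_frame = pd.DataFrame({'subject_id': ['4', '5', '6'],
                            'first_name': ['Billy', 'Brian', 'Bran']})

# Right join: keep every row of right_frame. The suffixes argument labels
# the clashing first_name columns by their source frame.
right_joined = pd.merge(left_frame, right_frame, on='subject_id',
                        how='right', suffixes=('_a', '_b'))
print(right_joined)
```

Subject 6 exists only in the right frame, so its `first_name_a` value is `NaN`.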


Further reading: [left and right join](http://chrisalbon.com/python/pandas_join_merge_dataframe.html)

<a name="Outer and inner joins"></a>
## Demo / Guided Practice: Outer and inner joins (25 mins)

An outer join produces the set of all records in df_a and df_b, with
matching records from both sides where available. If there is no match, the
missing side will contain null.


In [30]:
pd.merge(df_a, df_b, on='subject_id', how='outer')

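|   | subject_id | first_name_x | last_name_x | first_name_y | last_name_y |
|---|---|---|---|---|---|
| 0 | 1 | Alex | Anderson | NaN | NaN |
| 1 | 2 | Amy | Ackerman | NaN | NaN |
| 2 | 3 | Allen | Ali | NaN | NaN |
| 3 | 4 | Alice | Aoni | Billy | Bonder |
| 4 | 5 | Ayoung | Atiches | Brian | Black |
| 5 | 6 | NaN | NaN | Bran | Balwner |
| 6 | 7 | NaN | NaN | Bryce | Brice |
| 7 | 8 | NaN | NaN | Betty | Btisan |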
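
In an outer join it can be hard to tell at a glance which side each row came from. As a sketch, the `indicator=True` option of `pd.merge` adds a `_merge` column that labels each row `left_only`, `right_only`, or `both`. The frames below are trimmed copies of df_a and df_b (first_name only) so the block runs on its own.

```python
import pandas as pd

# Trimmed copies of df_a and df_b from the demo (first_name only).
df_a = pd.DataFrame({'subject_id': ['1', '2', '3', '4', '5'],
                     'first_name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung']})
df_b = pd.DataFrame({'subject_id': ['4', '5', '6', '7', '8'],
                     'first_name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty']})

# indicator=True adds a '_merge' column recording where each row matched.
outer = pd.merge(df_a, df_b, on='subject_id', how='outer', indicator=True)
print(outer[['subject_id', '_merge']])

# Count how many rows fall into each category.
merge_counts = outer['_merge'].astype(str).value_counts().to_dict()
print(merge_counts)
```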


**Check:** What do you think an inner join will produce?

An inner join produces only the set of records that match in both df_a and df_b.

In [31]:
pd.merge(df_a, df_b, on='subject_id', how='inner')

|   | subject_id | first_name_x | last_name_x | first_name_y | last_name_y |
|---|---|---|---|---|---|
| 0 | 4 | Alice | Aoni | Billy | Bonder |
| 1 | 5 | Ayoung | Atiches | Brian | Black |
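
One way to connect these results back to the Venn diagrams is to compare the keys directly. The sketch below, using trimmed copies of df_a and df_b, checks that the inner join keeps the intersection of the two sets of subject_id values and the outer join keeps their union. The `keys_a` and `keys_b` names are introduced only for this illustration.

```python
import pandas as pd

# Trimmed copies of df_a and df_b from the demo (first_name only).
df_a = pd.DataFrame({'subject_id': ['1', '2', '3', '4', '5'],
                     'first_name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung']})
df_b = pd.DataFrame({'subject_id': ['4', '5', '6', '7', '8'],
                     'first_name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty']})

keys_a = set(df_a['subject_id'])
keys_b = set(df_b['subject_id'])

inner = pd.merge(df_a, df_b, on='subject_id', how='inner')
outer = pd.merge(df_a, df_b, on='subject_id', how='outer')

# Inner join keys are the overlap of the two circles in the Venn diagram.
print(set(inner['subject_id']) == (keys_a & keys_b))

# Outer join keys are everything covered by either circle.
print(set(outer['subject_id']) == (keys_a | keys_b))
```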


Further reading: [outer and inner join](http://chrisalbon.com/python/pandas_join_merge_dataframe.html)

<a name="ind-practice"></a>
## Independent Practice: Joins (20 mins)

Here are two simple DataFrames:


In [32]:
df1 = pd.DataFrame([[1, 2, 3]])
df2 = pd.DataFrame([[1, 7, 8],[4, 9, 9]], columns=[0, 3, 4])

In [33]:
df1

|   | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 1 | 2 | 3 |


In [34]:
df2

|   | 0 | 3 | 4 |
|---|---|---|---|
| 0 | 1 | 7 | 8 |
| 1 | 4 | 9 | 9 |


Do a left, right, inner, and outer join of df1 and df2 on the column they share, 0.

**Bonus:** If you've used SQL before, joins are probably old hat to you. If so, and you finish
early, check how your neighbor is doing. Remember, you might be a whiz at joins, but
they might be a whiz at math.