___

<a href='http://www.pieriandata.com'><img src='../Pierian_Data_Logo.png'/></a>
___
<center><em>Copyright by Pierian Data Inc.</em></center>
<center><em>For more information, visit us at <a href='http://www.pieriandata.com'>www.pieriandata.com</a></em></center>

# Combining DataFrames

## Full Official Guide (Lots of examples!)

### https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html

-------
-------

In [None]:
import numpy as np
import pandas as pd

## Concatenation

Use `pd.concat()` to directly "glue" DataFrames together.

In [None]:
data_one = {'A': ['A0', 'A1', 'A2', 'A3'],'B': ['B0', 'B1', 'B2', 'B3']}

In [None]:
data_two = {'C': ['C0', 'C1', 'C2', 'C3'], 'D': ['D0', 'D1', 'D2', 'D3']}

In [None]:
one = pd.DataFrame(data_one)

In [None]:
two = pd.DataFrame(data_two)

In [None]:
one

In [None]:
two

## Axis = 0 

### Concatenate along rows

In [None]:
axis0 = pd.concat([one,two],axis=0)

In [None]:
axis0
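The sketch below is a quick sanity check on what `axis=0` produced. It rebuilds `one` and `two` from the data above and inspects the result. The two DataFrames share no column names, so the stacked result has the union of columns, and each row is NaN in the columns its source DataFrame did not have.

```python
import pandas as pd

one = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'], 'B': ['B0', 'B1', 'B2', 'B3']})
two = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'], 'D': ['D0', 'D1', 'D2', 'D3']})

axis0 = pd.concat([one, two], axis=0)

# 8 rows (4 + 4) and the union of columns A, B, C, D
print(axis0.shape)  # (8, 4)
# Every column is missing for the 4 rows that came from the other DataFrame
print(axis0.isna().sum())
# The original index labels are kept, so 0-3 appear twice
print(axis0.index.tolist())  # [0, 1, 2, 3, 0, 1, 2, 3]
```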

## Axis = 1

### Concatenate along columns

In [None]:
axis1 = pd.concat([one,two],axis=1)

In [None]:
axis1
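Concatenating along columns lines rows up by index label, not by position. The example above lines up neatly because both DataFrames use the default 0-3 index. The sketch below uses a hypothetical `two_shifted` with index 2-5 to show what happens when the labels only partly overlap. With the default `join='outer'`, every label from either side is kept. With `join='inner'`, only the shared labels are kept.

```python
import pandas as pd

one = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'], 'B': ['B0', 'B1', 'B2', 'B3']})
# Hypothetical variant of `two` whose index labels are shifted to 2..5
two_shifted = pd.DataFrame(
    {'C': ['C0', 'C1', 'C2', 'C3'], 'D': ['D0', 'D1', 'D2', 'D3']},
    index=[2, 3, 4, 5],
)

# Default join='outer': keep every index label from either DataFrame
outer_cols = pd.concat([one, two_shifted], axis=1)
print(outer_cols)  # labels 0-5; NaN where a label exists on only one side

# join='inner': keep only the labels present in both (2 and 3)
inner_cols = pd.concat([one, two_shifted], axis=1, join='inner')
print(inner_cols)
```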

### Axis = 0, but with matching column names
**If the two DataFrames share column names, the rows stack cleanly without NaNs:**

In [None]:
two.columns = one.columns

In [None]:
pd.concat([one,two])
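`pd.concat` keeps each DataFrame's original index, so the result above repeats the labels 0-3. The sketch below shows two standard `pd.concat` options for handling this. `ignore_index=True` renumbers the rows 0-7. `keys=` instead tags each block with an outer index level, so you can still tell which DataFrame a row came from.

```python
import pandas as pd

one = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'], 'B': ['B0', 'B1', 'B2', 'B3']})
two = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'], 'D': ['D0', 'D1', 'D2', 'D3']})
two.columns = one.columns  # same renaming as above, so both have A and B

# Renumber the result 0..7 instead of repeating 0..3
stacked = pd.concat([one, two], ignore_index=True)
print(stacked.index.tolist())  # [0, 1, 2, 3, 4, 5, 6, 7]

# Alternatively, tag each block with an outer index level
labelled = pd.concat([one, two], keys=['one', 'two'])
print(labelled.loc['two'])  # just the rows that came from `two`
```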

# Merge

## Data Tables

In [None]:
registrations = pd.DataFrame({'reg_id':[1,2,3,4],'name':['Andrew','Bobo','Claire','David']})
logins = pd.DataFrame({'log_id':[1,2,3,4],'name':['Xavier','Andrew','Yolanda','Bobo']})

In [None]:
registrations

In [None]:
logins

# pd.merge()

Merge pandas DataFrames based on key columns, similar to a SQL join. The **how** parameter controls which keys are kept in the result.

In [None]:
help(pd.merge)

-----

# Inner, Left, Right, and Outer Joins

## Inner Join

**Match up rows where the key is present in BOTH tables. The join itself introduces no NaNs, since by definition a row is only part of the inner join if it has info in both tables.**
**Only Andrew and Bobo both registered and logged in.**

In [None]:
# Unlike pd.concat, pd.merge takes the two DataFrames as separate arguments, not as a list
pd.merge(registrations,logins,how='inner',on='name')

In [None]:
# If "on" is omitted, pandas joins on every column name the two DataFrames share (here, only 'name')
pd.merge(registrations,logins,how='inner')

In [None]:
# Pandas raises an error if the "on" key column isn't in both DataFrames ('reg_id' is missing from logins)
# pd.merge(registrations,logins,how='inner',on='reg_id')
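The following self-contained sketch uses the same data to confirm that the inner join keeps only the shared keys. It also shows the equivalent method form, `DataFrame.merge`, where the calling DataFrame is the left table.

```python
import pandas as pd

registrations = pd.DataFrame({'reg_id': [1, 2, 3, 4],
                              'name': ['Andrew', 'Bobo', 'Claire', 'David']})
logins = pd.DataFrame({'log_id': [1, 2, 3, 4],
                       'name': ['Xavier', 'Andrew', 'Yolanda', 'Bobo']})

inner = pd.merge(registrations, logins, how='inner', on='name')
print(inner)  # only Andrew and Bobo, no NaNs

# Same join written as a method call on the left DataFrame
inner_method = registrations.merge(logins, how='inner', on='name')
print(inner.equals(inner_method))  # True
```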

---

## Left Join

**Match up AND include all rows from the Left Table.**
**Show everyone who registered (Left Table). Anyone without login info gets NaN in the login columns.**

In [None]:
pd.merge(registrations,logins,how='left',on='name')
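A common use of the left join is finding rows that have no match on the right, sometimes called an anti-join. The sketch below left-joins the tables, then keeps the rows where a right-side column is NaN. Note that `log_id` becomes a float column, because the default `int64` dtype cannot hold NaN.

```python
import pandas as pd

registrations = pd.DataFrame({'reg_id': [1, 2, 3, 4],
                              'name': ['Andrew', 'Bobo', 'Claire', 'David']})
logins = pd.DataFrame({'log_id': [1, 2, 3, 4],
                       'name': ['Xavier', 'Andrew', 'Yolanda', 'Bobo']})

left = pd.merge(registrations, logins, how='left', on='name')

# Registered users with no login have NaN in the right-side column
never_logged_in = left[left['log_id'].isna()]
print(never_logged_in['name'].tolist())  # ['Claire', 'David']

# log_id is upcast from int64 to float64 to make room for NaN
print(left['log_id'].dtype)  # float64
```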

## Right Join
**Match up AND include all rows from the Right Table.**
**Show everyone who logged in (Right Table). Anyone without registration info gets NaN in the registration columns.**

In [None]:
pd.merge(registrations,logins,how='right',on='name')
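A right join gives the same rows as a left join with the tables swapped. The sketch below checks this on the same data using `pd.testing.assert_frame_equal`. Both results are sorted by name before the comparison so that row order does not matter, and `check_dtype=False` limits the check to values.

```python
import pandas as pd

registrations = pd.DataFrame({'reg_id': [1, 2, 3, 4],
                              'name': ['Andrew', 'Bobo', 'Claire', 'David']})
logins = pd.DataFrame({'log_id': [1, 2, 3, 4],
                       'name': ['Xavier', 'Andrew', 'Yolanda', 'Bobo']})

right = pd.merge(registrations, logins, how='right', on='name')
swapped = pd.merge(logins, registrations, how='left', on='name')

# Select the same column order and sort rows by name on both sides
cols = ['name', 'reg_id', 'log_id']
pd.testing.assert_frame_equal(
    right[cols].sort_values('name').reset_index(drop=True),
    swapped[cols].sort_values('name').reset_index(drop=True),
    check_dtype=False,  # compare values only
)
print("right join matches the swapped left join")

# Logged-in users with no registration get NaN in reg_id
print(right.loc[right['reg_id'].isna(), 'name'].tolist())
```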

## Outer Join

**Match up on all info found in either the Left or Right Table.**
**Show everyone who is in either the logins table or the registrations table. Fill any missing info with NaN.**

In [None]:
pd.merge(registrations,logins,how='outer')
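In an outer join, it is easy to lose track of which table each row came from. `pd.merge` accepts `indicator=True`, which adds a `_merge` column holding `'left_only'`, `'right_only'`, or `'both'`. The sketch below uses the same data and filters on `_merge` rather than relying on row position.

```python
import pandas as pd

registrations = pd.DataFrame({'reg_id': [1, 2, 3, 4],
                              'name': ['Andrew', 'Bobo', 'Claire', 'David']})
logins = pd.DataFrame({'log_id': [1, 2, 3, 4],
                       'name': ['Xavier', 'Andrew', 'Yolanda', 'Bobo']})

outer = pd.merge(registrations, logins, how='outer', on='name', indicator=True)
print(outer)

# How many rows came from each side
print(outer['_merge'].value_counts())

# Registered but never logged in
print(outer.loc[outer['_merge'] == 'left_only', 'name'].tolist())
# Logged in but never registered
print(outer.loc[outer['_merge'] == 'right_only', 'name'].tolist())
```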

## Join on Index or Column

**Use combinations of left_on, right_on, left_index, and right_index to match a column on one side against an index (or a column) on the other.**

In [None]:
registrations

In [None]:
logins

In [None]:
registrations = registrations.set_index("name")

In [None]:
registrations

In [None]:
pd.merge(registrations,logins,left_index=True,right_on='name')

In [None]:
pd.merge(logins,registrations,right_index=True,left_on='name')
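When both tables carry the key in their index, set `left_index=True` and `right_index=True` together. The sketch below assumes both DataFrames are re-indexed by `name`; the names `reg_by_name` and `log_by_name` are just for illustration. `DataFrame.join` aligns on the index by default, so an inner `join` returns the same rows here.

```python
import pandas as pd

registrations = pd.DataFrame({'reg_id': [1, 2, 3, 4],
                              'name': ['Andrew', 'Bobo', 'Claire', 'David']})
logins = pd.DataFrame({'log_id': [1, 2, 3, 4],
                       'name': ['Xavier', 'Andrew', 'Yolanda', 'Bobo']})

# Illustrative names: both tables indexed by the key column
reg_by_name = registrations.set_index('name')
log_by_name = logins.set_index('name')

# Index on both sides, so no `on`, `left_on`, or `right_on` is needed
by_index = pd.merge(reg_by_name, log_by_name,
                    left_index=True, right_index=True, how='inner')
print(by_index)  # index: Andrew, Bobo; columns: reg_id, log_id

# DataFrame.join aligns on the index by default (how='left' unless told otherwise)
joined = reg_by_name.join(log_by_name, how='inner')
print(joined)
```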

### Dealing with differing key column names in joined tables

In [None]:
registrations = registrations.reset_index()

In [None]:
registrations

In [None]:
logins

In [None]:
registrations.columns = ['reg_name','reg_id']

In [None]:
registrations

In [None]:
# Raises a MergeError: the DataFrames share no column names, so there is no key to join on
# pd.merge(registrations,logins)
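The commented-out call fails because no `on`, `left_on`, or `right_on` is given, so pandas looks for shared column names and finds none. The sketch below reproduces the error with the renamed columns and catches it as `pandas.errors.MergeError`.

```python
import pandas as pd
from pandas.errors import MergeError

registrations = pd.DataFrame({'reg_name': ['Andrew', 'Bobo', 'Claire', 'David'],
                              'reg_id': [1, 2, 3, 4]})
logins = pd.DataFrame({'log_id': [1, 2, 3, 4],
                       'name': ['Xavier', 'Andrew', 'Yolanda', 'Bobo']})

raised = False
try:
    pd.merge(registrations, logins)  # no shared column names to use as a key
except MergeError as err:
    raised = True
    print("MergeError:", err)
print(raised)
```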

In [None]:
pd.merge(registrations,logins,left_on='reg_name',right_on='name')

In [None]:
pd.merge(registrations,logins,left_on='reg_name',right_on='name').drop('reg_name',axis=1)
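Instead of using `left_on`/`right_on` and then calling `.drop()`, you can rename the key column first so that both tables share a name, then merge with `on`. The sketch below does this with `DataFrame.rename`. The result holds the same data; only the column order differs.

```python
import pandas as pd

registrations = pd.DataFrame({'reg_name': ['Andrew', 'Bobo', 'Claire', 'David'],
                              'reg_id': [1, 2, 3, 4]})
logins = pd.DataFrame({'log_id': [1, 2, 3, 4],
                       'name': ['Xavier', 'Andrew', 'Yolanda', 'Bobo']})

# Option 1 (as above): different key names, then drop the duplicate key column
via_left_right = (pd.merge(registrations, logins, left_on='reg_name', right_on='name')
                  .drop('reg_name', axis=1))

# Option 2: rename the key so both sides call it 'name'
via_rename = pd.merge(registrations.rename(columns={'reg_name': 'name'}),
                      logins, on='name')

print(via_left_right.columns.tolist())  # ['reg_id', 'log_id', 'name']
print(via_rename.columns.tolist())      # ['name', 'reg_id', 'log_id']
```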

### Pandas automatically adds suffixes to duplicate column names

In [None]:
registrations.columns = ['name','id']

In [None]:
logins.columns = ['id','name']

In [None]:
registrations

In [None]:
logins

In [None]:
# Overlapping non-key columns get suffixes: _x for the left DataFrame, _y for the right
pd.merge(registrations,logins,on='name')

In [None]:
pd.merge(registrations,logins,on='name',suffixes=('_reg','_log'))
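Only overlapping columns that are not join keys get suffixes. Here `name` is the key, so it appears once, without a suffix. `suffixes` also accepts `None` for one side, which leaves that side's column name unchanged. The sketch below uses the renamed tables from above.

```python
import pandas as pd

registrations = pd.DataFrame({'name': ['Andrew', 'Bobo', 'Claire', 'David'],
                              'id': [1, 2, 3, 4]})
logins = pd.DataFrame({'id': [1, 2, 3, 4],
                       'name': ['Xavier', 'Andrew', 'Yolanda', 'Bobo']})

with_defaults = pd.merge(registrations, logins, on='name')
print(with_defaults.columns.tolist())  # ['name', 'id_x', 'id_y']

custom = pd.merge(registrations, logins, on='name', suffixes=('_reg', '_log'))
print(custom.columns.tolist())  # ['name', 'id_reg', 'id_log']

# None keeps the left column name as-is; only the right one is suffixed
keep_left = pd.merge(registrations, logins, on='name', suffixes=(None, '_log'))
print(keep_left.columns.tolist())  # ['name', 'id', 'id_log']
```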

-----------
----------