# Merging/Joining

Pandas provides a single function, __merge__, as the entry point for all standard database join operations between DataFrame objects:<br>

Signature (abridged):
`pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False)`

The parameters are:<br>
- left − A DataFrame object.
- right − Another DataFrame object.
- on − Column name(s) to join on. Must be found in both the left and right DataFrame objects. If on is None and the merge is not on indexes, the columns common to both DataFrames are used.
- left_on − Columns from the left DataFrame to use as keys. Can be either column names or arrays with length equal to the length of the DataFrame.
- right_on − Columns from the right DataFrame to use as keys. Can be either column names or arrays with length equal to the length of the DataFrame.
- left_index − If True, use the index (row labels) from the left DataFrame as its join key(s).
- right_index − Same usage as left_index, for the right DataFrame.
- how − One of 'left', 'right', 'outer', 'inner', or 'cross'. Defaults to 'inner'.
- sort − If True, sort the result DataFrame by the join keys in lexicographical order. Defaults to False, in which case the order of the keys depends on the join type.
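The key-selection parameters above can be illustrated with a small sketch. The frames `students` and `scores` and their column names are made up for this example; they are not part of the notebook's data. It shows `left_on`/`right_on` for key columns with different names, and `right_index=True` for joining against an index.

```python
import pandas as pd

# Hypothetical tables whose key columns have different names.
students = pd.DataFrame({'student_id': [1, 2, 3],
                         'name': ['Alex', 'Amy', 'Allen']})
scores = pd.DataFrame({'sid': [2, 3, 4],
                       'marks': [90, 87, 69]})

# left_on / right_on name the key column in each frame.
# Both key columns are kept in the result.
by_columns = pd.merge(students, scores, left_on='student_id', right_on='sid')
print(by_columns)

# right_index=True uses the index of the right frame as its join key.
scores_indexed = scores.set_index('sid')
by_index = pd.merge(students, scores_indexed,
                    left_on='student_id', right_index=True)
print(by_index)
```

Both calls use the default inner join, so only the keys present in both frames (2 and 3) survive.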


In [None]:
import pandas as pd

# two sample dataframes
left = pd.DataFrame({ 'roll no':[1,2,3,4,5], 'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],'subject':['sub1','sub2','sub4','sub6','sub5']})
right = pd.DataFrame({ 'roll no':[1,2,3,4,5], 'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],'subject':['sub2','sub4','sub3','sub6','sub5']})
print ('left:\n', left)
print ('right:\n', right)

In [None]:
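# merge on the 'roll no' column; the overlapping columns 'Name' and 'subject'
# appear twice in the result, with the default suffixes _x (left) and _y (right)
m1 = pd.merge(left, right, on='roll no')
print(m1)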

## Merge Using the 'how' Argument
The how argument to merge specifies which keys are included in the resulting table.<br>
If a key appears in only one of the two tables, the columns that come from the other table are filled with NaN for that row.<br>
Here is a summary of the how options and their SQL equivalent names:

1. left (LEFT OUTER JOIN) − Use keys from the left object only
2. right (RIGHT OUTER JOIN) − Use keys from the right object only
3. outer (FULL OUTER JOIN) − Use the union of keys
4. inner (INNER JOIN) − Use the intersection of keys
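To see the four options side by side, here is a minimal sketch on two small frames invented for this example (`left_small` and `right_small`). It passes `indicator=True`, which adds a `_merge` column recording whether each row's key was found in the left frame only, the right frame only, or both.

```python
import pandas as pd

# Hypothetical frames: 'sub2' and 'sub4' are shared, 'sub1' is left-only,
# 'sub3' is right-only.
left_small = pd.DataFrame({'subject': ['sub1', 'sub2', 'sub4'],
                           'left_name': ['Alex', 'Amy', 'Allen']})
right_small = pd.DataFrame({'subject': ['sub2', 'sub4', 'sub3'],
                            'right_name': ['Billy', 'Brian', 'Bran']})

row_counts = {}
for how in ['left', 'right', 'outer', 'inner']:
    result = pd.merge(left_small, right_small, on='subject',
                      how=how, indicator=True)
    row_counts[how] = len(result)
    print(f"how={how!r}")
    print(result, end="\n\n")

print(row_counts)
```

The left and right joins each keep three rows, the outer join keeps four (the union of keys), and the inner join keeps two (the intersection).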

In [None]:
# left join on a different column, 'subject'
m2 = pd.merge(left, right, on='subject', how='left')
print (m2)

In [None]:
# right join on the 'subject' column
m3 = pd.merge(left, right, on='subject', how='right')
print (m3)

In [None]:
# outer join
m4 = pd.merge(left, right, how='outer', on='subject')
print (m4)

In [None]:
# inner join (this is the default)
m5 = pd.merge(left, right, on='subject', how='inner')
print (m5)

# `concat` function for concatenation 

Pandas provides various facilities for easily __combining Series and DataFrame objects.__<br>

Signature (abridged):
`pd.concat(objs, axis=0, join='outer', ignore_index=False, keys=None)`

- objs − A sequence or mapping of Series or DataFrame objects.
- axis − {0, 1}, default 0. The axis to concatenate along.
- join − {‘inner’, ‘outer’}, default ‘outer’. How to handle indexes on the other axis: outer for union, inner for intersection.
- ignore_index − boolean, default False. If True, do not use the index values on the concatenation axis. The resulting axis will be labeled 0, ..., n - 1.
- keys − sequence, default None. If given, builds a hierarchical index with the passed keys as the outermost level.
- join_axes − Removed in pandas 1.0. Older versions accepted a list of Index objects to use for the other axes instead of performing inner/outer set logic; in current versions, reindex the result instead.
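As a rough illustration of `join` and `ignore_index`, the sketch below concatenates two small frames that share only one column. The frames `a` and `b` are invented for this example.

```python
import pandas as pd

# Hypothetical frames: both have 'Name', but 'Marks' and 'subject'
# each exist in only one of them.
a = pd.DataFrame({'Name': ['Alex', 'Amy'], 'Marks': [98, 90]}, index=[1, 2])
b = pd.DataFrame({'Name': ['Billy', 'Brian'], 'subject': ['sub2', 'sub4']},
                 index=[1, 2])

# Default join='outer': keep the union of columns, filling gaps with NaN.
# ignore_index=True replaces the repeated labels 1, 2, 1, 2 with 0..3.
stacked = pd.concat([a, b], ignore_index=True)
print(stacked)

# join='inner': keep only the columns present in every input.
shared = pd.concat([a, b], join='inner', ignore_index=True)
print(shared)
```

Without `ignore_index=True` the result would carry duplicate index labels, which is easy to miss and can make later label-based lookups return several rows.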


> Note: `DataFrame.append` offered similar behavior, but it was deprecated in pandas 1.4 and removed in pandas 2.0; use `pd.concat` instead.

In [None]:
# sample dataframes
one = pd.DataFrame({'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'], 'subject':['sub1','sub2','sub4','sub6','sub5'], 'Marks':[98,90,87,69,78]}, index=[1,2,3,4,5])
two = pd.DataFrame({'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'], 'subject':['sub2','sub4','sub3','sub6','sub5'],'Marks':[89,80,79,97,88]},index=[1,2,3,4,5])
print('one', one)
print('two', two)

In [None]:
c1 = pd.concat([one, two])
print (c1)

In [None]:
c1.index

In [None]:
# pass keys to label each input frame (the result gets a multi-level index)
c2 = pd.concat([one,two],keys=['x','y'])
print(c2)

In [None]:
print('index:\n', c2.index)

### Axis=1

`axis=1` concatenates the frames side by side: rows are aligned on the index and the columns of each frame are added next to each other.
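When the inputs share column names, side-by-side concatenation produces duplicate column labels. One possible way to keep them apart, sketched here on two small frames invented for this example, is to pass `keys` together with `axis=1`, which puts a label for each input on an outer column level.

```python
import pandas as pd

# Hypothetical frames with identical column names and the same index.
one_small = pd.DataFrame({'Name': ['Alex', 'Amy'], 'Marks': [98, 90]},
                         index=[1, 2])
two_small = pd.DataFrame({'Name': ['Billy', 'Brian'], 'Marks': [89, 80]},
                         index=[1, 2])

# keys become the outer level of a two-level column index.
side = pd.concat([one_small, two_small], axis=1, keys=['one', 'two'])
print(side)

# Select everything that came from the first frame ...
print(side['one'])

# ... or a single column from the second frame.
print(side[('two', 'Marks')])
```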

In [None]:
# axis=1 works on the columns
# note how the column names are not unique
c3 = pd.concat([one,two],axis=1)
print (c3)

### Good overview

Check out the first answer at https://stackoverflow.com/questions/49620538/what-are-the-levels-keys-and-names-arguments-for-in-pandas-concat-functio
    