### SESSION 20 - MERGING, JOINING & CONCATENATION

In [22]:
import pandas as pd
import numpy as np

In [23]:
# Datasets
courses = pd.read_csv('DATASETS/S20/courses.csv')
students = pd.read_csv('DATASETS/S20/students.csv')
nov = pd.read_csv('DATASETS/S20/reg-month1.csv')
dec = pd.read_csv('DATASETS/S20/reg-month2.csv')
matches = pd.read_csv('DATASETS/S20/matches.csv')
delivery = pd.read_csv('DATASETS/S20/deliveries.csv')

**pd.concat():**
- `pd.concat()` is a top-level pandas function that concatenates pandas objects such as DataFrames and Series along a chosen axis.
- Its parameters control the direction of concatenation, how the other axis is aligned, and how the resulting index is built.
- **Syntax: concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)**
    - **objs:** Series or DataFrame objects
    - **axis:** axis to concatenate along; default = 0
    - **join:** way to handle indexes on other axis; default = ‘outer’
    - **ignore_index:** if True, do not use the index values along the concatenation axis; default = False
    - **keys:** sequence to add an identifier to the result indexes; default = None
    - **levels:** specific levels (unique values) to use for constructing a MultiIndex; default = None
    - **names:** names for the levels in the resulting hierarchical index; default = None
    - **verify_integrity:** check whether the new concatenated axis contains duplicates; default = False
    - **sort:** sort non-concatenation axis if it is not already aligned when join is ‘outer’; default = False
    - **copy:** if False, do not copy data unnecessarily; default = True
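
A minimal sketch of how `join`, `ignore_index` and `verify_integrity` change the result. The frames `a` and `b` and the `fee` column are hypothetical toy data, not the course datasets loaded above.

```python
import pandas as pd

# Hypothetical toy frames (assumption: not part of the S20 datasets)
a = pd.DataFrame({"student_id": [1, 2], "course_id": [10, 20]})
b = pd.DataFrame({"student_id": [3, 4], "fee": [500, 700]})

# join='outer' (default): union of the columns, missing cells become NaN
outer = pd.concat([a, b], ignore_index=True)

# join='inner': keep only the columns present in every frame
inner = pd.concat([a, b], join="inner", ignore_index=True)

# verify_integrity=True raises ValueError if the new index has duplicates;
# here both frames carry the index labels 0 and 1
try:
    pd.concat([a, b], verify_integrity=True)
    duplicate_error = False
except ValueError:
    duplicate_error = True

print(outer)
print(inner)
print(duplicate_error)
```

With `ignore_index=True` the result gets a fresh 0..n-1 index, which is why the first two calls do not run into the duplicate-label problem that the third call checks for.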

In [40]:
# concatenate the DataFrames vertically, i.e. stack their rows (default axis=0)
registered = pd.concat([nov,dec], ignore_index=True)
registered

Unnamed: 0,student_id,course_id
0,23,1
1,15,5
2,18,6
3,23,4
4,16,9
5,18,1
6,1,1
7,7,8
8,22,3
9,15,1


In [44]:
import pandas as pd 

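# d1 holds list values (two rows); d2 holds scalars, so df2 needs an explicit index
d1 = {"Name": ["Pankaj", "Lisa"], "ID": [1, 2]}
d2 = {"Name": "David", "ID": 3}

df1 = pd.DataFrame(d1, index=[1, 2])
df2 = pd.DataFrame(d2, index=[3])

# Stack df2 below df1; the original index labels (1, 2, 3) are kept
df3 = pd.concat([df1, df2])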

print(df3)

     Name  ID
1  Pankaj   1
2    Lisa   2
3   David   3


**DataFrame.append() function**

**Note: Don't use this method. It was deprecated in pandas 1.4 and removed in pandas 2.0; use pd.concat() instead.**

- Pandas dataframe.append() function is used to append rows of other data frames to the end of the given data frame, returning a new data frame object. 

- Columns that are not present in the original DataFrame are added as new columns, and the cells with no data are filled with NaN.

- **Syntax: DataFrame.append(other, ignore_index=False, verify_integrity=False, sort=False)**

    - **other:** DataFrame, Series/dict-like object, or a list of these; the data to append.
    - **ignore_index:** if True, do not use the index labels.
    - **verify_integrity:** if True, raise ValueError on creating an index with duplicates.
    - **sort:** sort columns if the columns of self and other are not aligned; default = False
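
Since `append()` is no longer available in recent pandas, the same result can be obtained with `pd.concat()`. The sketch below uses hypothetical stand-ins (`nov_demo`, `dec_demo`, and an invented `mode` column) rather than the real registration files.

```python
import pandas as pd

# Hypothetical stand-ins for the nov and dec tables (assumption)
nov_demo = pd.DataFrame({"student_id": [23, 15], "course_id": [1, 5]})
dec_demo = pd.DataFrame(
    {"student_id": [3, 16], "course_id": [5, 7], "mode": ["online", "offline"]}
)

# Replacement for nov_demo.append(dec_demo, ignore_index=True)
combined = pd.concat([nov_demo, dec_demo], ignore_index=True)

# Replacement for appending one dict-like row:
# wrap the dict in a one-row DataFrame first
new_row = pd.DataFrame([{"student_id": 7, "course_id": 2}])
combined = pd.concat([combined, new_row], ignore_index=True)

# 'mode' exists only in dec_demo, so the other rows hold NaN there
print(combined)
```

Collecting all pieces in a list and calling `pd.concat()` once is generally preferable to concatenating inside a loop, because every call builds a new DataFrame.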
 

    

In [46]:
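# example (runs only on pandas < 2.0; on newer versions use
# pd.concat([nov, dec], ignore_index=True) instead)
nov.append(dec, ignore_index=True)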



Unnamed: 0,student_id,course_id
0,23,1
1,15,5
2,18,6
3,23,4
4,16,9
5,18,1
6,1,1
7,7,8
8,22,3
9,15,1


In [52]:
# Multiindex dataframe (keep original index as it is)
multi = pd.concat([nov, dec], keys=['Nov', 'Dec'])

# accessing each month
multi.loc['Nov']
multi.loc['Dec']

# accessing the items
multi.loc[('Nov',0)]

student_id    23
course_id      1
Name: (Nov, 0), dtype: int64
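
A small sketch of the same `keys` idea on toy data, additionally using `names` to label the two index levels. The frames and the level names `month` and `row` are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for the nov and dec tables (assumption)
nov_demo = pd.DataFrame({"student_id": [23, 15], "course_id": [1, 5]})
dec_demo = pd.DataFrame({"student_id": [3, 16], "course_id": [5, 7]})

# keys adds an outer index level; names labels both levels
multi_demo = pd.concat(
    [nov_demo, dec_demo], keys=["Nov", "Dec"], names=["month", "row"]
)

nov_rows = multi_demo.loc["Nov"]        # DataFrame with the original index 0, 1
first_nov = multi_demo.loc[("Nov", 0)]  # Series holding a single row

# Turn the outer level back into an ordinary column
flat = multi_demo.reset_index(level="month")
print(flat)
```

Keeping the month as an index level makes per-month slicing cheap, while `reset_index` is convenient when the source label is needed as a regular column.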

In [53]:
# concatenate the DataFrames horizontally (side by side, aligned on the index)
pd.concat([nov,dec], axis=1)

Unnamed: 0,student_id,course_id,student_id.1,course_id.1
0,23.0,1.0,3,5
1,15.0,5.0,16,7
2,18.0,6.0,12,10
3,23.0,4.0,12,1
4,16.0,9.0,14,9
5,18.0,1.0,7,7
6,1.0,1.0,7,2
7,7.0,8.0,16,3
8,22.0,3.0,17,10
9,15.0,1.0,11,8
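
With `axis=1`, rows are matched by index label rather than by position. The sketch below, on hypothetical frames of unequal length, shows that unmatched labels produce NaN (which turns integer columns into floats, a plausible reason for the `23.0`-style values above if `nov` has fewer rows than `dec`), and that `join` and `keys` work here as well.

```python
import pandas as pd

# Hypothetical frames of different length (assumption)
left = pd.DataFrame({"student_id": [23, 15]}, index=[0, 1])
right = pd.DataFrame({"student_id": [3, 16, 12]}, index=[0, 1, 2])

# Default outer join on the index: label 2 is missing in 'left', so NaN
wide = pd.concat([left, right], axis=1)

# Inner join keeps only the index labels present in both frames
wide_inner = pd.concat([left, right], axis=1, join="inner")

# keys disambiguates the duplicated column names with an outer column level
wide_keyed = pd.concat([left, right], axis=1, keys=["Nov", "Dec"])

print(wide)
print(wide_inner)
print(wide_keyed)
```

Without `keys`, the result carries two columns that are both named `student_id`, so selecting by that name returns both of them.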
