# Level 9: Merging & Reshaping Data

Combining data from different sources and reshaping it into a tidy format are essential data manipulation tasks. This level covers concatenation, database-style joins, and methods for pivoting data between wide and long formats.

In [1]:
import pandas as pd

## 9.1 Concatenation (`pd.concat()`)

`pd.concat()` stacks multiple DataFrames either vertically (row-wise, `axis=0`, the default) or horizontally (column-wise, `axis=1`).

In [2]:
df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
df2 = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']})
df3 = pd.DataFrame({'C': ['C0', 'C1'], 'D': ['D0', 'D1']})

### Vertical Stacking

In [3]:
pd.concat([df1, df2])

,A,B
0,A0,B0
1,A1,B1
0,A2,B2
1,A3,B3


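Notice that each DataFrame keeps its original index, so the labels 0 and 1 each appear twice. Pass `ignore_index=True` to get a fresh sequential index instead.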

In [4]:
pd.concat([df1, df2], ignore_index=True)

,A,B
0,A0,B0
1,A1,B1
2,A2,B2
3,A3,B3
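To see why the repeated labels matter, here is a minimal sketch that reuses `df1` and `df2` from above. The names `stacked` and `rows_at_zero` are chosen for illustration only. It shows that label-based lookup on the stacked frame returns two rows, and that `verify_integrity=True` makes `pd.concat()` raise instead of silently duplicating labels.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
df2 = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']})

# Without ignore_index, both inputs contribute the labels 0 and 1.
stacked = pd.concat([df1, df2])

# Label 0 now matches two rows, so .loc[0] returns a two-row DataFrame.
rows_at_zero = stacked.loc[0]
print(rows_at_zero)

# verify_integrity=True turns the duplicated labels into an error.
try:
    pd.concat([df1, df2], verify_integrity=True)
    integrity_error = None
except ValueError as exc:
    integrity_error = exc
    print(f"concat refused: {exc}")
```

If the original labels carry no meaning, `ignore_index=True` is usually the simpler choice.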


### Horizontal Stacking

In [5]:
pd.concat([df1, df3], axis=1)

,A,B,C,D
0,A0,B0,C0,D0
1,A1,B1,C1,D1
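Horizontal stacking aligns rows by index label, not by position. The example above lines up neatly only because `df1` and `df3` share the labels 0 and 1. The following sketch uses a hypothetical frame, `df_shifted`, whose labels only partly overlap, to show how the `join` parameter controls the result.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
# Hypothetical frame: its labels (1, 2) only partly overlap df1's (0, 1).
df_shifted = pd.DataFrame({'C': ['C1', 'C2']}, index=[1, 2])

# Default join='outer': keep the union of labels, fill the gaps with NaN.
outer = pd.concat([df1, df_shifted], axis=1)
print(outer)

# join='inner': keep only the labels present in every input.
inner = pd.concat([df1, df_shifted], axis=1, join='inner')
print(inner)
```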


## 9.2 Merging & Joining (`pd.merge()`)

`pd.merge()` is used for combining data based on common columns or indices, similar to SQL joins.

In [6]:
left = pd.DataFrame({'key': ['K0', 'K1', 'K2'], 'A': ['A0', 'A1', 'A2']})
right = pd.DataFrame({'key': ['K0', 'K1', 'K3'], 'B': ['B0', 'B1', 'B3']})

### Inner Join (default)
Returns only the rows where the key exists in **both** DataFrames.

In [7]:
pd.merge(left, right, on='key', how='inner')

,key,A,B
0,K0,A0,B0
1,K1,A1,B1


### Left Join
Returns all rows from the **left** DataFrame, along with the matching rows from the right. Left rows with no match get `NaN` in the columns that come from the right DataFrame.

In [8]:
pd.merge(left, right, on='key', how='left')

,key,A,B
0,K0,A0,B0
1,K1,A1,B1
2,K2,A2,NaN


### Right Join
Returns all rows from the **right** DataFrame, along with the matching rows from the left. Right rows with no match get `NaN` in the columns that come from the left DataFrame.

In [9]:
pd.merge(left, right, on='key', how='right')

,key,A,B
0,K0,A0,B0
1,K1,A1,B1
2,K3,NaN,B3


### Outer Join
Returns all rows from **both** DataFrames. Where a key has no match on the other side, the columns from that side are filled with `NaN`.

In [10]:
pd.merge(left, right, on='key', how='outer')

,key,A,B
0,K0,A0,B0
1,K1,A1,B1
2,K2,A2,NaN
3,K3,NaN,B3
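When auditing an outer join, it can help to know which side each row came from. A minimal sketch using the `indicator` parameter of `pd.merge()`, which adds a `_merge` column with the values `both`, `left_only`, or `right_only` (the names `merged` and `origin` are illustrative):

```python
import pandas as pd

left = pd.DataFrame({'key': ['K0', 'K1', 'K2'], 'A': ['A0', 'A1', 'A2']})
right = pd.DataFrame({'key': ['K0', 'K1', 'K3'], 'B': ['B0', 'B1', 'B3']})

# indicator=True adds a categorical '_merge' column describing each row's origin.
merged = pd.merge(left, right, on='key', how='outer', indicator=True)
print(merged)

# Map each key to its origin for a quick check.
origin = dict(zip(merged['key'], merged['_merge'].astype(str)))
print(origin)
```

Filtering on `_merge == 'left_only'` is one way to find left rows that have no partner on the right.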


### Handling Suffixes
If both DataFrames have non-key columns with the same name, use the `suffixes` parameter to control how they are renamed in the result.

In [11]:
left_s = pd.DataFrame({'key': ['K0'], 'data': [1]})
right_s = pd.DataFrame({'key': ['K0'], 'data': [2]})
pd.merge(left_s, right_s, on='key', suffixes=('_left', '_right'))

,key,data_left,data_right
0,K0,1,2
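For comparison, here is a short sketch of what happens when `suffixes` is left out: pandas falls back to its defaults, `_x` for the left frame and `_y` for the right. The name `default_merge` is illustrative.

```python
import pandas as pd

left_s = pd.DataFrame({'key': ['K0'], 'data': [1]})
right_s = pd.DataFrame({'key': ['K0'], 'data': [2]})

# No suffixes argument: overlapping columns become data_x (left) and data_y (right).
default_merge = pd.merge(left_s, right_s, on='key')
print(list(default_merge.columns))
```

Explicit suffixes such as `('_left', '_right')` tend to be easier to read later on than the defaults.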


## 9.3 Reshaping Data

### `.melt()` (Wide to Long)
Unpivots a DataFrame from a wide format to a long format. This is useful for making data 'tidy'.

In [12]:
df_wide = pd.DataFrame({
    'student': ['Alice', 'Bob'],
    'test1': [85, 90],
    'test2': [88, 92]
})
df_wide

,student,test1,test2
0,Alice,85,88
1,Bob,90,92


In [13]:
pd.melt(df_wide, id_vars=['student'], value_vars=['test1', 'test2'], var_name='test', value_name='score')

,student,test,score
0,Alice,test1,85
1,Bob,test1,90
2,Alice,test2,88
3,Bob,test2,92
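The same reshaping is available as a DataFrame method, and `value_vars` is optional: when it is omitted, every column not listed in `id_vars` is unpivoted. A small sketch comparing the two spellings (the names `explicit` and `implicit` are illustrative):

```python
import pandas as pd

df_wide = pd.DataFrame({
    'student': ['Alice', 'Bob'],
    'test1': [85, 90],
    'test2': [88, 92]
})

# Function form, naming the columns to unpivot explicitly.
explicit = pd.melt(df_wide, id_vars=['student'], value_vars=['test1', 'test2'],
                   var_name='test', value_name='score')

# Method form, letting pandas unpivot every column that is not an id column.
implicit = df_wide.melt(id_vars='student', var_name='test', value_name='score')

print(explicit.equals(implicit))
```

Listing `value_vars` explicitly is the safer option when the wide frame may gain unrelated columns later.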


### `.pivot()` (Long to Wide)
Pivots a DataFrame from a long format to a wide format.

In [14]:
df_long = pd.DataFrame({
    'student': ['Alice', 'Alice', 'Bob', 'Bob'],
    'test': ['test1', 'test2', 'test1', 'test2'],
    'score': [85, 88, 90, 92]
})
df_long

,student,test,score
0,Alice,test1,85
1,Alice,test2,88
2,Bob,test1,90
3,Bob,test2,92


In [15]:
df_long.pivot(index='student', columns='test', values='score')

test,test1,test2
student,,
Alice,85,88
Bob,90,92
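`.pivot()` only reshapes; it does not aggregate. If an index/column pair occurs more than once, it raises a `ValueError`. The sketch below uses a hypothetical frame, `df_dupes`, in which Alice has two `test1` scores, and shows `pivot_table()` with an aggregation function as one possible way to handle that case.

```python
import pandas as pd

# Hypothetical data: Alice has two scores recorded for test1.
df_dupes = pd.DataFrame({
    'student': ['Alice', 'Alice', 'Bob'],
    'test': ['test1', 'test1', 'test1'],
    'score': [85, 95, 90],
})

# pivot() cannot decide which of the two values to keep, so it raises.
try:
    df_dupes.pivot(index='student', columns='test', values='score')
    pivot_failed = False
except ValueError as exc:
    pivot_failed = True
    print(f"pivot failed: {exc}")

# pivot_table() resolves the duplicates by aggregating them (here: the mean).
averaged = df_dupes.pivot_table(index='student', columns='test',
                                values='score', aggfunc='mean')
print(averaged)
```

Whether the mean is the right aggregation depends on the data; `aggfunc` also accepts other reducers such as `'max'` or `'first'`.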


### `.stack()` / `.unstack()` (Recap)
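`.stack()` moves column labels into the innermost level of the row index, and `.unstack()` moves an index level back out to the columns. They resemble melt/pivot, but operate on index levels rather than on column values.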

In [16]:
df_pivoted = df_long.pivot(index='student', columns='test', values='score')
df_pivoted

test,test1,test2
student,,
Alice,85,88
Bob,90,92


In [17]:
# Move the 'test' column labels into the row index, giving a Series with a (student, test) MultiIndex
df_pivoted.stack()

student  test 
Alice    test1    85
         test2    88
Bob      test1    90
         test2    92
dtype: int64
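To complete the recap, here is a minimal sketch of `.unstack()` undoing the stack. By default it moves the innermost index level (`test` here) back to the columns, and the `level` argument selects a different level. The names `stacked`, `restored`, and `by_student` are illustrative.

```python
import pandas as pd

df_long = pd.DataFrame({
    'student': ['Alice', 'Alice', 'Bob', 'Bob'],
    'test': ['test1', 'test2', 'test1', 'test2'],
    'score': [85, 88, 90, 92]
})
df_pivoted = df_long.pivot(index='student', columns='test', values='score')

# stack(): columns -> innermost index level, giving a Series.
stacked = df_pivoted.stack()

# unstack(): innermost index level ('test') -> columns again.
restored = stacked.unstack()
print(restored.equals(df_pivoted))

# Unstacking a different level puts the students in the columns instead.
by_student = stacked.unstack(level='student')
print(by_student)
```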