# 🐼 Pandas Handbook

## 07 - Data Combining - Concatenation, Merging & Joining

Check out the official [Pandas documentation](https://pandas.pydata.org/pandas-docs/stable/)  

This notebook uses the [Titanic - Machine Learning from Disaster dataset](https://www.kaggle.com/competitions/titanic/data) from Kaggle to demonstrate how to combine data with pandas.

## 📚 Table of Contents

---

🔀 **Concatenation**  
🔗 **Merging**  
🔄 **Join Types**  
🧩 **Joining**  
👉 **Next Topic: Data Analyzing**  

---

In [1]:
import pandas as pd
import os

In [2]:
data_processed = "../data/processed/"
csv_file = "clean_titanic.csv"
import_path = os.path.join(data_processed, csv_file)
df = pd.read_csv(import_path, index_col="PassengerId")

### 🔀 Concatenation

```pd.concat([df1, df2])``` – Stacks the DataFrames in the list vertically (row-wise, the default ```axis=0```), in the order given.  
```pd.concat([df1, df2], axis=1)``` – Places the DataFrames side by side (column-wise), aligning rows on their index labels.  

In [3]:
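# Split the data into two row blocks
df1 = df.iloc[:450]
df2 = df.iloc[450:]

# df2 is listed first, so its rows (PassengerId 451 onward) come first in the result
concat_df = pd.concat([df2, df1])
concat_df.head()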

| PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 451 | 0 | 2 | West, Mr. Edwy Arthur | male | 36.0 | 1 | 2 | C.A. 34651 | 27.75 | Unknown | Southampton |
| 452 | 0 | 3 | Hagland, Mr. Ingvald Olai Olsen | male | 28.0 | 1 | 0 | 65303 | 19.9667 | Unknown | Southampton |
| 453 | 0 | 1 | Foreman, Mr. Benjamin Laventall | male | 30.0 | 0 | 0 | 113051 | 27.75 | C111 | Cherbourg |
| 454 | 1 | 1 | Goldenberg, Mr. Samuel L | male | 49.0 | 1 | 0 | 17453 | 89.1042 | C92 | Cherbourg |
| 455 | 0 | 3 | Peduzzi, Mr. Joseph | male | 28.0 | 0 | 0 | A/5 2817 | 8.05 | Unknown | Southampton |


In [4]:
names = df[['Name']]
ages = df[['Age']]
concat_df = pd.concat([names, ages], axis=1)
concat_df.head()

| PassengerId | Name | Age |
|---|---|---|
| 1 | Braund, Mr. Owen Harris | 22.0 |
| 2 | Cumings, Mrs. John Bradley (Florence Briggs Th... | 38.0 |
| 3 | Heikkinen, Miss. Laina | 26.0 |
| 4 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | 35.0 |
| 5 | Allen, Mr. William Henry | 35.0 |
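The two cells above keep the original index labels and rely on both inputs sharing the same index. As a minimal sketch of what happens otherwise, the toy frames below (made-up names and ages, not rows from the Titanic file) show ```ignore_index```, ```keys```, and how ```axis=1``` aligns on index labels rather than on row position:

```python
import pandas as pd

# Hypothetical toy frames standing in for slices of the Titanic data.
first = pd.DataFrame({"Name": ["Ann", "Bob"]}, index=pd.Index([1, 2], name="PassengerId"))
second = pd.DataFrame({"Name": ["Cid", "Dee"]}, index=pd.Index([3, 4], name="PassengerId"))

# Row-wise: the original index labels are kept unless ignore_index=True.
stacked = pd.concat([first, second])
renumbered = pd.concat([first, second], ignore_index=True)

# keys= adds an outer index level recording which input each row came from.
labelled = pd.concat([first, second], keys=["first", "second"])

# Column-wise: rows are matched on index labels, not on position.
ages = pd.DataFrame({"Age": [30.0, 41.0]}, index=pd.Index([2, 3], name="PassengerId"))
side_by_side = pd.concat([stacked, ages], axis=1)               # default join="outer"
only_shared = pd.concat([stacked, ages], axis=1, join="inner")  # keep shared labels only

print(side_by_side)
print(only_shared)
```

With the default outer alignment, passengers 1 and 4 get ```NaN``` for ```Age``` because ```ages``` has no row for them; ```join="inner"``` drops those rows instead.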


### 🔗 Merging

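```pd.merge(df, new_df, on='INDEX_COLUMN')``` – Merges the original and new DataFrame on a key they share. ```on``` accepts a column name or, as in the cell below, the name of an index level present in both DataFrames.  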

In [5]:
# Flag passengers whose fare was above 100; the new frame reuses df's PassengerId index
extra_info = pd.DataFrame(data={'Overpaid': df['Fare'] > 100}, index=df.index)
merged_df = pd.merge(df, extra_info, on='PassengerId')
merged_df.head()

| PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | Overpaid |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | Unknown | Southampton | False |
| 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | Cherbourg | False |
| 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | Unknown | Southampton | False |
| 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | Southampton | False |
| 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | Unknown | Southampton | False |
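The cell above passes the index name to ```on```. The sketch below, using small made-up frames rather than the Titanic file, compares that with two other ways of naming the key: ```left_index```/```right_index```, and an ordinary column after ```reset_index()```. The ```validate``` argument is an optional safety check and is not part of the original notebook.

```python
import pandas as pd

# Hypothetical toy data; names and fares are invented for illustration.
passengers = pd.DataFrame(
    {"Name": ["Ann", "Bob", "Cid"], "Fare": [7.25, 120.0, 53.1]},
    index=pd.Index([1, 2, 3], name="PassengerId"),
)
flags = pd.DataFrame(
    {"Overpaid": [False, True, False]},
    index=pd.Index([1, 2, 3], name="PassengerId"),
)

# on= may name an index level when both frames carry that level.
by_name = pd.merge(passengers, flags, on="PassengerId")

# Same key, stated explicitly as "use the index on both sides".
by_index = pd.merge(passengers, flags, left_index=True, right_index=True)

# With the key as an ordinary column, the result gets a fresh 0..n-1 index.
# validate="one_to_one" raises MergeError if a key repeats on either side.
by_column = pd.merge(
    passengers.reset_index(),
    flags.reset_index(),
    on="PassengerId",
    validate="one_to_one",
)

print(by_index)
print(by_column)
```

Which form to use is mostly a matter of where the key lives: keep it in the index when later steps look rows up by ID, or in a column when the ID is just another field.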


### 🔄 Join Types: ```inner```, ```outer```, ```left```, ```right```

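- `inner`: Only keys present in both DataFrames
- `outer`: All keys from both DataFrames; values missing on either side are filled with NaN
- `left`: All keys from left_df, with matching data from right_df (NaN where there is no match)
- `right`: All keys from right_df, with matching data from left_df (NaN where there is no match)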

```pd.merge(left_df, right_df, on='INDEX_COLUMN', how='inner')``` – Performs an inner merge, keeping only key values found in both DataFrames. This is the default when ```how``` is omitted.  
```pd.merge(left_df, right_df, on='INDEX_COLUMN', how='outer')``` – Performs an outer merge, keeping all key values from both DataFrames.  
```pd.merge(left_df, right_df, on='INDEX_COLUMN', how='left')``` – Performs a left merge, keeping all key values from the left DataFrame.  
```pd.merge(left_df, right_df, on='INDEX_COLUMN', how='right')``` – Performs a right merge, keeping all key values from the right DataFrame.  
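To see the four options side by side before running them on the Titanic data, here is a small sketch with two invented frames whose keys only partly overlap. The frame contents are assumptions made for illustration.

```python
import pandas as pd

# Made-up stand-ins for left_df and right_df, keyed by an ordinary column.
left = pd.DataFrame({"PassengerId": [1, 2, 3], "Name": ["Ann", "Bob", "Cid"]})
right = pd.DataFrame({"PassengerId": [2, 3, 4], "Age": [30.0, 41.0, 19.0]})

# Collect the keys that survive each join type.
keys_by_how = {
    how: pd.merge(left, right, on="PassengerId", how=how)["PassengerId"].tolist()
    for how in ("inner", "outer", "left", "right")
}

for how, keys in keys_by_how.items():
    print(f"{how:>5}: {keys}")
```

Here ```inner``` keeps keys 2 and 3, ```outer``` keeps 1 through 4, ```left``` keeps 1 through 3, and ```right``` keeps 2 through 4.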

In [6]:
# Two small slices whose IDs only partly overlap: 1-5 on the left, 4-8 on the right
left_df = df[['Name']].iloc[:5]
right_df = df[['Age']].iloc[3:8]

Inner merge: only the IDs present in both DataFrames (4 and 5) are returned.

In [7]:
inner_merge_df = pd.merge(left_df, right_df, on='PassengerId', how='inner')
inner_merge_df.head()

| PassengerId | Name | Age |
|---|---|---|
| 4 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | 35.0 |
| 5 | Allen, Mr. William Henry | 35.0 |


Outer merge: all IDs from both DataFrames (1 to 8) are returned; ```head()``` shows only the first five rows.

In [8]:
outer_merge_df = pd.merge(left_df, right_df, on='PassengerId', how='outer')
outer_merge_df.head()

| PassengerId | Name | Age |
|---|---|---|
| 1 | Braund, Mr. Owen Harris | NaN |
| 2 | Cumings, Mrs. John Bradley (Florence Briggs Th... | NaN |
| 3 | Heikkinen, Miss. Laina | NaN |
| 4 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | 35.0 |
| 5 | Allen, Mr. William Henry | 35.0 |


Left merge: all IDs from the left DataFrame (1 to 5) are returned; ```Age``` is missing where the right DataFrame has no match.

In [9]:
left_merge_df = pd.merge(left_df, right_df, on='PassengerId', how='left')
left_merge_df.head()

| PassengerId | Name | Age |
|---|---|---|
| 1 | Braund, Mr. Owen Harris | NaN |
| 2 | Cumings, Mrs. John Bradley (Florence Briggs Th... | NaN |
| 3 | Heikkinen, Miss. Laina | NaN |
| 4 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | 35.0 |
| 5 | Allen, Mr. William Henry | 35.0 |


Right merge: all IDs from the right DataFrame (4 to 8) are returned; ```Name``` is missing where the left DataFrame has no match.

In [10]:
right_merge_df = pd.merge(left_df, right_df, on='PassengerId', how='right')
right_merge_df

| PassengerId | Name | Age |
|---|---|---|
| 4 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | 35.0 |
| 5 | Allen, Mr. William Henry | 35.0 |
| 6 | NaN | 28.0 |
| 7 | NaN | 54.0 |
| 8 | NaN | 2.0 |
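When a merge produces missing values like the ones above, it can help to know which side each row came from. One way to check, sketched here on invented data, is the ```indicator=True``` argument of ```pd.merge```, which adds a ```_merge``` column. The frames and the variable names ```audit``` and ```unmatched``` are illustrative only.

```python
import pandas as pd

# Invented frames with partly overlapping keys.
names = pd.DataFrame({"PassengerId": [1, 2, 3], "Name": ["Ann", "Bob", "Cid"]})
ages = pd.DataFrame({"PassengerId": [2, 3, 4], "Age": [30.0, 41.0, 19.0]})

# indicator=True adds a categorical "_merge" column with the values
# "left_only", "right_only" or "both" for every row of the result.
audit = pd.merge(names, ages, on="PassengerId", how="outer", indicator=True)

# Count rows per origin, then isolate the rows that found no partner.
counts = audit["_merge"].value_counts()
unmatched = audit[audit["_merge"] != "both"]

print(counts)
print(unmatched)
```

Running the audit as an outer merge first, then choosing ```inner```, ```left``` or ```right``` once the unmatched rows are understood, is one possible workflow rather than a required step.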


### 🧩 Joining

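```df.join(df_new, on='INDEX_COLUMN')``` – Joins the new DataFrame to the original. ```on``` names a column or index level in the original that is matched against the index of the new DataFrame; when both already share the same index, ```df.join(df_new)``` is enough. The default is a left join.  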

In [11]:
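# Extract the honorific (Mr., Mrs., Miss., Lady.) from each name; names with none of these become NaN
titles = df['Name'].str.extract(r'(Mr\.|Mrs\.|Miss\.|Lady\.)')
titles.columns = ['Title']

# str.extract keeps df's PassengerId index, so titles can be joined on it directly
joined_df = df.join(titles, on='PassengerId')
joined_df.head()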

| PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | Title |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | Unknown | Southampton | Mr. |
| 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | Cherbourg | Mrs. |
| 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | Unknown | Southampton | Miss. |
| 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | Southampton | Mrs. |
| 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | Unknown | Southampton | Mr. |
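Because ```join``` matches on the index by default, the same result can be reached without ```on```. The sketch below rebuilds a two-row version of the example (the two names are taken from the output above; the frame ```people``` itself is a stand-in) and shows the index-based form, the equivalent ```pd.merge``` call, and the suffixes needed when column names collide. The named regex group is a variation on the original cell, not the notebook's own code.

```python
import pandas as pd

# Two-row stand-in for df, indexed by PassengerId.
people = pd.DataFrame(
    {"Name": ["Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina"]},
    index=pd.Index([1, 3], name="PassengerId"),
)

# A named group gives the extracted column its name directly.
titles = people["Name"].str.extract(r"(?P<Title>Mr\.|Mrs\.|Miss\.|Lady\.)")

# join() defaults to a left join on the index of both frames.
joined = people.join(titles)

# The same operation written with merge.
same = pd.merge(people, titles, left_index=True, right_index=True, how="left")

# Overlapping column names need suffixes; without them join raises ValueError.
renamed = people.join(people, lsuffix="_left", rsuffix="_right")

print(joined)
print(renamed.columns.tolist())
```

```join``` is the shorter spelling when the key is already the index on both sides; ```merge``` is the more general tool when the key is a column or differs between the two frames.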


### 👉 Next Topic: [Data Analyzing](./08-data-analyzing.ipynb)

Learn how to analyze data with pandas through filtering, sorting, grouping, aggregating, and pivoting.