# **Merging, Concatenating and Joining**

```python
import numpy as np
import pandas as pd
```

```python
df_products = pd.DataFrame({
    'product_id': [1, 2, 3, 4],
    'product_name': ['Laptop', 'Mouse', 'Keyboard', 'Monitor']
})
```

```python
df_sales = pd.DataFrame({
    'sale_id': [101, 102, 103, 104],
    'product_id': [2, 4, 1, 3],
    'quantity': [5, 2, 8, 3]
})
```

```python
print("Product dataframe")
df_products
```

Product dataframe

|   | product_id | product_name |
|---|------------|--------------|
| 0 | 1 | Laptop |
| 1 | 2 | Mouse |
| 2 | 3 | Keyboard |
| 3 | 4 | Monitor |


```python
print("Sales dataframe")
df_sales
```

Sales dataframe

|   | sale_id | product_id | quantity |
|---|---------|------------|----------|
| 0 | 101 | 2 | 5 |
| 1 | 102 | 4 | 2 |
| 2 | 103 | 1 | 8 |
| 3 | 104 | 3 | 3 |


## **Concatenation**

Use `pd.concat()` to stack DataFrames, either vertically (adding rows) or horizontally (adding columns).

Vertical concatenation (`axis=0`): This is the default. It is useful for combining data from different time periods or sources that share the same columns.

```python
df_sales_2 = pd.DataFrame({
    'sale_id': [105, 106],
    'product_id': [1, 4],
    'quantity': [10, 6]
})

# vertical concatenation
combined_data = pd.concat([df_sales, df_sales_2])
combined_data
```

|   | sale_id | product_id | quantity |
|---|---------|------------|----------|
| 0 | 101 | 2 | 5 |
| 1 | 102 | 4 | 2 |
| 2 | 103 | 1 | 8 |
| 3 | 104 | 3 | 3 |
| 0 | 105 | 1 | 10 |
| 1 | 106 | 4 | 6 |
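The output keeps each frame's original index labels, so 0 and 1 appear twice. A minimal sketch of two standard `pd.concat` options for dealing with this: `ignore_index=True` builds a fresh 0..n-1 index, and `keys=` adds an outer index level recording which frame each row came from. The labels `"batch_1"` and `"batch_2"` are hypothetical.

```python
import pandas as pd

# Same data as df_sales and df_sales_2 above
df_sales = pd.DataFrame({
    "sale_id": [101, 102, 103, 104],
    "product_id": [2, 4, 1, 3],
    "quantity": [5, 2, 8, 3],
})
df_sales_2 = pd.DataFrame({
    "sale_id": [105, 106],
    "product_id": [1, 4],
    "quantity": [10, 6],
})

# ignore_index=True discards the original labels and builds a new 0..n-1 index
combined = pd.concat([df_sales, df_sales_2], ignore_index=True)
print(combined.index.tolist())  # [0, 1, 2, 3, 4, 5]

# keys=... keeps the original labels but adds an outer level naming the source
labelled = pd.concat([df_sales, df_sales_2], keys=["batch_1", "batch_2"])
print(labelled.loc["batch_2"])  # only the rows that came from df_sales_2
```

Use `ignore_index=True` when the old labels carry no meaning; use `keys=` when you still need to know where each row came from.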


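Horizontal concatenation (`axis=1`): places the columns side by side, matching rows by index label.

```python
# horizontal concatenation
combined_data = pd.concat([df_products, df_sales], axis=1)
combined_data
```

|   | product_id | product_name | sale_id | product_id | quantity |
|---|------------|--------------|---------|------------|----------|
| 0 | 1 | Laptop | 101 | 2 | 5 |
| 1 | 2 | Mouse | 102 | 4 | 2 |
| 2 | 3 | Keyboard | 103 | 1 | 8 |
| 3 | 4 | Monitor | 104 | 3 | 3 |

Horizontal concatenation aligns rows by index, not by key. Row 0 pairs Laptop with sale 101, which actually sold product 2 (Mouse), and `product_id` appears twice. To line rows up by `product_id`, use a merge instead.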
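Because `axis=1` aligns on index labels, frames whose indexes only partly overlap leave gaps. A small sketch with two hypothetical frames, `left` and `right`, comparing the default `join="outer"` (union of labels, NaN for gaps) with `join="inner"` (only labels present in every frame):

```python
import pandas as pd

# Hypothetical frames whose index labels only partly overlap
left = pd.DataFrame({"a": [1, 2, 3]}, index=[0, 1, 2])
right = pd.DataFrame({"b": [10, 20]}, index=[1, 2])

# Default join="outer": union of index labels; label 0 has no "b", so it gets NaN
outer = pd.concat([left, right], axis=1)
print(outer)

# join="inner": keep only the index labels present in both frames
inner = pd.concat([left, right], axis=1, join="inner")
print(inner)
```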


## **Merging**

Use `pd.merge()` to combine DataFrames on the values of one or more common key columns. It is the most flexible way to perform joins in pandas.

The `on` argument names the key column that links the DataFrames. Here, it is `product_id`, which appears in both.

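* Works like `JOIN` in SQL
* Flexible: keys can be columns (`on`, or `left_on`/`right_on` when the names differ) or indexes (`left_index`/`right_index`)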
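When the key column has a different name in each DataFrame, `on` cannot be used; `left_on` and `right_on` name the key on each side instead. A sketch assuming a hypothetical copy of the sales data whose key column is called `prod_id`:

```python
import pandas as pd

df_products = pd.DataFrame({
    "product_id": [1, 2, 3, 4],
    "product_name": ["Laptop", "Mouse", "Keyboard", "Monitor"],
})
# Hypothetical variant of df_sales whose key column is named "prod_id"
df_sales_alt = pd.DataFrame({
    "sale_id": [101, 102, 103, 104],
    "prod_id": [2, 4, 1, 3],
    "quantity": [5, 2, 8, 3],
})

# left_on / right_on name the key column separately for each side (inner join by default)
merged = pd.merge(df_sales_alt, df_products, left_on="prod_id", right_on="product_id")
print(merged)
```

Both key columns are kept in the result; drop the redundant one with `merged.drop(columns="prod_id")` if it is not needed.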

### **Join Types (the `how` argument)**

* **`inner` (default):** Keeps only the rows whose keys appear in both DataFrames.
* **`outer`:** Keeps all rows from both DataFrames, filling NaN where there is no match.
* **`left`:** Keeps all rows from the left DataFrame and fills in matching values from the right (NaN where there is no match).
* **`right`:** Keeps all rows from the right DataFrame and fills in matching values from the left (NaN where there is no match).
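In the example data every `product_id` has a match, so the four join types are hard to tell apart. A sketch with hypothetical frames whose keys only partly overlap (product 5 is sold but missing from the catalog; product 3 is never sold). `indicator=True` adds a `_merge` column showing whether each row came from `left_only`, `right_only` or `both`:

```python
import pandas as pd

# Hypothetical data: keys only partly overlap
products = pd.DataFrame({
    "product_id": [1, 2, 3],
    "product_name": ["Laptop", "Mouse", "Keyboard"],
})
sales = pd.DataFrame({
    "sale_id": [101, 102, 103],
    "product_id": [2, 1, 5],
    "quantity": [5, 8, 1],
})

results = {}
for how in ["inner", "outer", "left", "right"]:
    # indicator=True records where each row's key was found
    results[how] = pd.merge(sales, products, on="product_id", how=how, indicator=True)
    print(f"--- how={how!r}: {len(results[how])} rows")
    print(results[how])
```

Here `inner` returns 2 rows (products 1 and 2), `outer` 4 (products 1, 2, 3 and 5), `left` 3 (product 5 gets NaN for `product_name`) and `right` 3 (product 3 gets NaN for `sale_id` and `quantity`).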

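```python
merged_data = pd.merge(df_sales, df_products, on="product_id", how="left")
merged_data
```

|   | sale_id | product_id | quantity | product_name |
|---|---------|------------|----------|--------------|
| 0 | 101 | 2 | 5 | Mouse |
| 1 | 102 | 4 | 2 | Monitor |
| 2 | 103 | 1 | 8 | Laptop |
| 3 | 104 | 3 | 3 | Keyboard |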


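```python
merged_data = pd.merge(df_sales, df_products, on="product_id", how="right")
merged_data
```

|   | sale_id | product_id | quantity | product_name |
|---|---------|------------|----------|--------------|
| 0 | 103 | 1 | 8 | Laptop |
| 1 | 101 | 2 | 5 | Mouse |
| 2 | 104 | 3 | 3 | Keyboard |
| 3 | 102 | 4 | 2 | Monitor |

Every `product_id` appears in both DataFrames, so the left and right merges return the same four rows. Only the row order differs: a left merge keeps the order of the left keys, a right merge the order of the right keys.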


## **Joining**

The `.join()` method is similar to `merge()`, but it is called on a DataFrame, joins on the index by default, and performs a left join unless `how` says otherwise.

```python
df_sales_indexed = df_sales.set_index("product_id")
df_products_indexed = df_products.set_index("product_id")

joined_data = df_sales_indexed.join(df_products_indexed)
joined_data
```

| product_id | sale_id | quantity | product_name |
|------------|---------|----------|--------------|
| 2 | 101 | 5 | Mouse |
| 4 | 102 | 2 | Monitor |
| 1 | 103 | 8 | Laptop |
| 3 | 104 | 3 | Keyboard |
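`join` can also match a column of the calling DataFrame against the index of the other through its `on` parameter, which avoids re-indexing `df_sales` first. A sketch reusing the example data, with the equivalent `pd.merge` call (`left_on` plus `right_index=True`) for comparison:

```python
import pandas as pd

df_products = pd.DataFrame({
    "product_id": [1, 2, 3, 4],
    "product_name": ["Laptop", "Mouse", "Keyboard", "Monitor"],
})
df_sales = pd.DataFrame({
    "sale_id": [101, 102, 103, 104],
    "product_id": [2, 4, 1, 3],
    "quantity": [5, 2, 8, 3],
})
df_products_indexed = df_products.set_index("product_id")

# on="product_id": match this column of df_sales against the index of the other
# frame; df_sales keeps its own index and row order (how="left" by default)
joined = df_sales.join(df_products_indexed, on="product_id")
print(joined)

# The same join expressed with pd.merge: column key on the left, index key on the right
merged = pd.merge(df_sales, df_products_indexed,
                  left_on="product_id", right_index=True, how="left")
print(joined.reset_index(drop=True).equals(merged.reset_index(drop=True)))
```

If both frames share a non-key column name, `join` raises an error unless `lsuffix` and/or `rsuffix` are given to disambiguate the overlapping columns.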
