## DATA CONCATENATION AND MERGING
Think of it like joining tables in Excel or SQL. pandas gives you two main tools, `pd.concat()` and `pd.merge()`, and which one you use depends on how the datasets are related.

```python
import pandas as pd
import numpy as np
```

```python
df_us = pd.DataFrame({
    "country": ["USA", "USA", "USA"],
    "city": ["New York", "Los Angeles", "Chicago"],
    "temperature_c": [28, 24, 20],
    "humidity_percent": [75, 40, 34],
    "wind_speed_kmh": [14, 16, 22]
})
df_us
```

|   | country | city | temperature_c | humidity_percent | wind_speed_kmh |
|---|---|---|---|---|---|
| 0 | USA | New York | 28 | 75 | 14 |
| 1 | USA | Los Angeles | 24 | 40 | 16 |
| 2 | USA | Chicago | 20 | 34 | 22 |


```python
df_aus = pd.DataFrame({
    "country": ["Australia", "Australia", "Australia"],
    "city": ["Sydney", "Melbourne", "Perth"],
    "temperature_c": [25, 22, 30],
    "humidity_percent": [65, 44, 50],
    "wind_speed_kmh": [20, 18, 39]
})
df_aus
```


|   | country | city | temperature_c | humidity_percent | wind_speed_kmh |
|---|---|---|---|---|---|
| 0 | Australia | Sydney | 25 | 65 | 20 |
| 1 | Australia | Melbourne | 22 | 44 | 18 |
| 2 | Australia | Perth | 30 | 50 | 39 |


### Data Concatenation - `pd.concat()`
- Stacks datasets together, either on top of each other or side by side
- Works when the datasets have the same structure
- Like adding more rows (or more columns) to a table

**Real-world use case**
- Monthly sales files → combine into one yearly dataset
- Logs collected in parts → combine into one DataFrame
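
As a small sketch of the monthly-sales case: the three tables below are made up and built in memory (a real workflow might load each month with `pd.read_csv`). The point is that `pd.concat()` accepts a list of any length, so a whole year can be stacked in a single call.

```python
import pandas as pd

# Hypothetical monthly sales tables (made-up data); in practice each one
# might come from its own file.
jan = pd.DataFrame({"product": ["A", "B"], "units": [10, 4]})
feb = pd.DataFrame({"product": ["A", "C"], "units": [7, 12]})
mar = pd.DataFrame({"product": ["B", "C"], "units": [3, 9]})

# One call stacks every month; ignore_index=True renumbers the rows 0..5
yearly = pd.concat([jan, feb, mar], ignore_index=True)
print(yearly)
print(yearly["units"].sum())  # 45
```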

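**Rules for concat:**
1. ✔ Same columns: for row-wise stacking the frames should share column names. If they don't, pandas keeps every column by default and fills the missing values with `NaN`.
2. ✔ Same structure: each column holds the same kind of data in every frame.
3. ✔ You just want to append or stack data, not match rows on a shared key column (that is what merging is for).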
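
To see what the first rule means in practice, here is a sketch with two hypothetical frames that share only the `city` column. With the default `join="outer"`, pandas keeps the union of the columns and fills the gaps with `NaN`; `join="inner"` keeps only the columns both frames have.

```python
import pandas as pd

# Two hypothetical frames whose columns only partly overlap
df_a = pd.DataFrame({"city": ["Oslo"], "temperature_c": [12]})
df_b = pd.DataFrame({"city": ["Cairo"], "humidity_percent": [30]})

# Default join="outer": union of columns, NaN where a frame had no such column
outer = pd.concat([df_a, df_b], ignore_index=True)
print(outer)

# join="inner": keep only the columns present in every frame
inner = pd.concat([df_a, df_b], ignore_index=True, join="inner")
print(inner)
```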

- **Syntax:**
    1. Row-wise concatenation: `pd.concat([df1, df2])`
    2. Column-wise concatenation: `pd.concat([df1, df2], axis=1)`
- **Key parameters:**
    1. `axis=0` → stack rows, one frame below the other (default)
    2. `axis=1` → place frames side by side as extra columns, aligned on the index
    3. `ignore_index=True` → discard the original index labels and renumber the result from 0
    4. `keys=['A', 'B']` → label each input frame to track where each row came from
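
The cells below only demonstrate row-wise stacking, so here is a minimal sketch of `axis=1`. The `air_quality_index` column and its values are hypothetical. Note that pandas aligns rows by index label, not by position, so frames whose labels differ produce `NaN` instead of lining up.

```python
import pandas as pd

weather = pd.DataFrame({"city": ["New York", "Los Angeles", "Chicago"],
                        "temperature_c": [28, 24, 20]})
# Hypothetical extra column for the same three cities (made-up values)
extra = pd.DataFrame({"air_quality_index": [42, 55, 38]})

# axis=1 places the frames side by side, matching rows on index labels 0, 1, 2
side_by_side = pd.concat([weather, extra], axis=1)
print(side_by_side)

# Same data, but with index labels 1, 2, 3: rows no longer line up
shifted = extra.set_axis([1, 2, 3])
misaligned = pd.concat([weather, shifted], axis=1)
print(misaligned)  # 4 rows: NaN wherever a label exists in only one frame
```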

```python
df = pd.concat([df_us, df_aus])
df
```

|   | country | city | temperature_c | humidity_percent | wind_speed_kmh |
|---|---|---|---|---|---|
| 0 | USA | New York | 28 | 75 | 14 |
| 1 | USA | Los Angeles | 24 | 40 | 16 |
| 2 | USA | Chicago | 20 | 34 | 22 |
| 0 | Australia | Sydney | 25 | 65 | 20 |
| 1 | Australia | Melbourne | 22 | 44 | 18 |
| 2 | Australia | Perth | 30 | 50 | 39 |
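
The repeated labels 0, 1, 2 are more than cosmetic: label-based lookups become ambiguous. A trimmed-down sketch, keeping only the `city` column of the two frames above:

```python
import pandas as pd

df_us = pd.DataFrame({"city": ["New York", "Los Angeles", "Chicago"]})
df_aus = pd.DataFrame({"city": ["Sydney", "Melbourne", "Perth"]})

stacked = pd.concat([df_us, df_aus])
print(stacked.index.is_unique)  # False: labels 0, 1, 2 appear twice
print(stacked.loc[0])           # returns TWO rows (New York and Sydney)

renumbered = pd.concat([df_us, df_aus], ignore_index=True)
print(renumbered.loc[0])        # a single row (New York)
```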


`pd.concat([df1, df2], ignore_index=True)`
- Use `ignore_index=True` when you don't care about the original index and want a clean, continuous one.
- Compare the resulting index with and without this parameter:
- Without it: 0, 1, 2, 0, 1, 2
- With it: 0, 1, 2, 3, 4, 5

```python
df = pd.concat([df_us, df_aus], ignore_index=True)
df
```

|   | country | city | temperature_c | humidity_percent | wind_speed_kmh |
|---|---|---|---|---|---|
| 0 | USA | New York | 28 | 75 | 14 |
| 1 | USA | Los Angeles | 24 | 40 | 16 |
| 2 | USA | Chicago | 20 | 34 | 22 |
| 3 | Australia | Sydney | 25 | 65 | 20 |
| 4 | Australia | Melbourne | 22 | 44 | 18 |
| 5 | Australia | Perth | 30 | 50 | 39 |


`pd.concat([df1, df2], keys=['A', 'B'])`
- `keys` takes a list with one label per input frame and uses it to track where each row came from.
- The labels become an extra outer level of the index (a MultiIndex), so every row is identified by both its source label and its original index.
- This tells you at a glance whether a row belongs to dataset A or dataset B.

```python
df = pd.concat([df_us, df_aus], keys=["US dataframe", "Australia dataframe"])
df
```

|   |   | country | city | temperature_c | humidity_percent | wind_speed_kmh |
|---|---|---|---|---|---|---|
| US dataframe | 0 | USA | New York | 28 | 75 | 14 |
| US dataframe | 1 | USA | Los Angeles | 24 | 40 | 16 |
| US dataframe | 2 | USA | Chicago | 20 | 34 | 22 |
| Australia dataframe | 0 | Australia | Sydney | 25 | 65 | 20 |
| Australia dataframe | 1 | Australia | Melbourne | 22 | 44 | 18 |
| Australia dataframe | 2 | Australia | Perth | 30 | 50 | 39 |
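
Once the keys are in the index, they can be used to pull one source back out, or turned into an ordinary column. A sketch using a trimmed copy of the two frames; `names=` is an optional `pd.concat()` parameter that names the index levels, and `source` and `row` are our own choice of names.

```python
import pandas as pd

df_us = pd.DataFrame({"city": ["New York", "Los Angeles", "Chicago"],
                      "temperature_c": [28, 24, 20]})
df_aus = pd.DataFrame({"city": ["Sydney", "Melbourne", "Perth"],
                       "temperature_c": [25, 22, 30]})

df = pd.concat([df_us, df_aus],
               keys=["US dataframe", "Australia dataframe"],
               names=["source", "row"])  # names for the two index levels

# Select every row that came from one source via the outer index level
aus_rows = df.loc["Australia dataframe"]
print(aus_rows)

# Or move the source label out of the index into a regular column
flat = df.reset_index(level="source")
print(flat)
```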
