
# **Concat**

### `pd.concat()`: **Stack or append data**

### 👉 Important Parameters

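| Parameter      | Default    | Meaning                                                                   |
| -------------- | ---------- | ------------------------------------------------------------------------- |
| `objs`         | (required) | Sequence (e.g. a list) of Series or DataFrames to combine                 |
| `axis`         | `0`        | `0` → stack rows, `1` → place columns side by side                        |
| `ignore_index` | `False`    | If `True`, discard the existing labels on the concat axis and renumber from 0 |
| `join`         | `'outer'`  | `'outer'` (union) or `'inner'` (intersection) of the labels on the other axis |
| `keys`         | `None`     | One label per input; creates a hierarchical index                         |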


In [2]:
import pandas as pd

In [3]:
ser1=pd.Series([1,2,3,4])
ser2=pd.Series([11,12,13,14])

In [4]:
pd.concat([ser1,ser2])

0     1
1     2
2     3
3     4
0    11
1    12
2    13
3    14
dtype: int64
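
Note that the result above keeps each Series' original labels, so 0 to 3 appear twice. A minimal sketch of how `ignore_index=True` changes that, reusing the two Series from the cell above (the names `stacked` and `renumbered` are only illustrative):

```python
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4])
ser2 = pd.Series([11, 12, 13, 14])

# Default: original labels are kept, so the index contains duplicates
stacked = pd.concat([ser1, ser2])
print(stacked.index.is_unique)   # False
print(stacked.loc[0].tolist())   # [1, 11] -> one label, two rows

# ignore_index=True discards the old labels and renumbers from 0
renumbered = pd.concat([ser1, ser2], ignore_index=True)
print(list(renumbered.index))    # [0, 1, 2, 3, 4, 5, 6, 7]
```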

In [5]:
df1=pd.DataFrame({'Name':['Haroon','Moiz','Moin','Sami'],'Dept':['CS','IT','SE','IT']})
df2=pd.DataFrame({ 'Name':['Ali','Jawad'],'Dept':['HR','FINANCE']})

In [6]:
df1=pd.DataFrame({ 'Name':['Ali','Jawad']})
df2=pd.DataFrame({'Name':['Haroon','Moiz','Moin','Sami'],'Dept':['CS','IT','SE','IT']})

pd.concat([df1,df2])
pd.concat([df1,df2],axis=1)


Unnamed: 0,Name,Name,Dept
0,Ali,Haroon,CS
1,Jawad,Moiz,IT
2,,Moin,SE
3,,Sami,IT


In [7]:
df1=pd.DataFrame({ 'B':[1,2,3,4,5]})
df2=pd.DataFrame({'A':[11,22,33,44],'C':[55,66,77,88]})
pd.concat([df1,df2])
pd.concat([df1,df2],axis=1)


Unnamed: 0,B,A,C
0,1,11.0,55.0
1,2,22.0,66.0
2,3,33.0,77.0
3,4,44.0,88.0
4,5,,


In [8]:
# Row Wise Concatenation
pd.concat([df1,df2])

Unnamed: 0,B,A,C
0,1.0,,
1,2.0,,
2,3.0,,
3,4.0,,
4,5.0,,
0,,11.0,55.0
1,,22.0,66.0
2,,33.0,77.0
3,,44.0,88.0


### `axis = 0/1` 

In [9]:
# Column Wise Concatenation
pd.concat([df1,df2],axis=1)

Unnamed: 0,B,A,C
0,1,11.0,55.0
1,2,22.0,66.0
2,3,33.0,77.0
3,4,44.0,88.0
4,5,,


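### `join='inner'` / `join='outer'`: **Parameter**
`join` controls the labels on the *other* axis: with `axis=1` these are the row index labels, with `axis=0` they are the column names.

**Outer: union (keep every label and fill the gaps with NaN)**

**Inner: intersection (keep only the labels common to all inputs)**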


In [10]:
pd.concat([df1,df2],axis=1,join='inner')

Unnamed: 0,B,A,C
0,1,11,55
1,2,22,66
2,3,33,77
3,4,44,88


In [11]:
pd.concat([df1,df2],axis=1,join='outer')


Unnamed: 0,B,A,C
0,1,11.0,55.0
1,2,22.0,66.0
2,3,33.0,77.0
3,4,44.0,88.0
4,5,,


### `keys=["labels"]`: **Parameter, creates hierarchical labels**

In [12]:
pd.concat([df1,df2],axis=1,keys=['Df1','Df2'])


Unnamed: 0_level_0,Df1,Df2,Df2
Unnamed: 0_level_1,B,A,C
0,1,11.0,55.0
1,2,22.0,66.0
2,3,33.0,77.0
3,4,44.0,88.0
4,5,,


In [13]:
pd.concat([df1,df2],axis=0,keys=['Df1','Df2'])


Unnamed: 0,Unnamed: 1,B,A,C
Df1,0,1.0,,
Df1,1,2.0,,
Df1,2,3.0,,
Df1,3,4.0,,
Df1,4,5.0,,
Df2,0,,11.0,55.0
Df2,1,,22.0,66.0
Df2,2,,33.0,77.0
Df2,3,,44.0,88.0
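
One reason to add `keys` is that each source stays addressable afterwards. A sketch, assuming the same `df1` and `df2` as above; the level names `'source'` and `'row'` are chosen here for illustration and passed through the `names` parameter:

```python
import pandas as pd

df1 = pd.DataFrame({'B': [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({'A': [11, 22, 33, 44], 'C': [55, 66, 77, 88]})

labelled = pd.concat([df1, df2], axis=0, keys=['Df1', 'Df2'],
                     names=['source', 'row'])

# Select every row that came from df2 via the outer index level
from_df2 = labelled.loc['Df2']
print(from_df2[['A', 'C']])

# Turn the index levels back into ordinary columns
flat = labelled.reset_index()
print(flat.columns.tolist())   # ['source', 'row', 'B', 'A', 'C']
```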


---
# **Practice Problems**

In [14]:
jan = pd.DataFrame({
    'Product': ['Pen', 'Book'],
    'Units': [100, 150]
})
feb = pd.DataFrame({
    'Product': ['Pen', 'Book'],
    'Units': [120, 130]
})


### 🧠 Problem 1

Combine January and February sales into **one DataFrame**:

* Row-wise
* Reset index properly

In [15]:
pd.concat([jan,feb],axis=0,ignore_index=True)

Unnamed: 0,Product,Units
0,Pen,100
1,Book,150
2,Pen,120
3,Book,130


### 🧠 Problem 2

Add a column `Month` **before concatenation** so final data shows:

> Product | Units | Month


In [16]:
feb['Month']='Feb'
jan['Month']='Jan'
pd.concat([jan,feb],axis=0,ignore_index=True)

Unnamed: 0,Product,Units,Month
0,Pen,100,Jan
1,Book,150,Jan
2,Pen,120,Feb
3,Book,130,Feb


### 🧠 Problem 3

What happens if you concat `jan` and `feb` **column-wise**?
Try it and **explain the result**.


In [17]:
pd.concat([jan,feb],axis=1)

Unnamed: 0,Product,Units,Month,Product,Units,Month
0,Pen,100,Jan,Pen,120,Feb
1,Book,150,Jan,Book,130,Feb


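**Ans: Both frames share the index labels 0 and 1, so their rows line up and the columns are glued side by side. The column names are not made unique, so `Product`, `Units` and `Month` each appear twice.**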
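
Repeated column names make later selection awkward. A sketch of the effect and of one possible workaround with `keys=`; for brevity it rebuilds `jan` and `feb` without the `Month` column:

```python
import pandas as pd

jan = pd.DataFrame({'Product': ['Pen', 'Book'], 'Units': [100, 150]})
feb = pd.DataFrame({'Product': ['Pen', 'Book'], 'Units': [120, 130]})

side_by_side = pd.concat([jan, feb], axis=1)
print(side_by_side.columns.tolist())   # ['Product', 'Units', 'Product', 'Units']

# Selecting a duplicated name returns a DataFrame with both columns
print(type(side_by_side['Units']))

# keys= adds an outer column level, so each block can be selected by name
labelled = pd.concat([jan, feb], axis=1, keys=['Jan', 'Feb'])
print(labelled[('Feb', 'Units')].tolist())   # [120, 130]
```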

---
# 🔥 **10 HARD & TRICKY CONCAT PROBLEMS**

In [18]:
# 📘 Dataset 1: Schema Mismatch
df1 = pd.DataFrame({
    'ID': [1, 2],
    'Name': ['Ali', 'Sara'],
    'Salary': [50000, 60000]
})

df2 = pd.DataFrame({
    'ID': [3, 4],
    'Name': ['Ahmed', 'Zara']
})

### 🧩 Problem 1

Concat `df1` and `df2` row-wise.

* Observe missing values
* Explain **why NaN appears**


In [51]:
pd.concat([df1,df2],axis=0,ignore_index=True)

Unnamed: 0,ID,Name,Salary
0,1,Ali,50000.0
1,2,Sara,60000.0
2,3,Ahmed,
3,4,Zara,


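**Ans: Because `df2` has no `Salary` column. The default `join='outer'` keeps the union of columns and fills the missing cells with NaN, which is also why `Salary` is shown as a float.**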
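
The dtype change can be checked directly. A sketch using the same two frames; converting to the nullable `Int64` dtype is one optional way to keep whole numbers, not something the original notebook does:

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2], 'Name': ['Ali', 'Sara'], 'Salary': [50000, 60000]})
df2 = pd.DataFrame({'ID': [3, 4], 'Name': ['Ahmed', 'Zara']})

combined = pd.concat([df1, df2], ignore_index=True)
print(combined['Salary'].dtype)              # float64, because NaN is a float
print(int(combined['Salary'].isna().sum()))  # 2

# Optional: the nullable integer dtype stores missing values as <NA>
combined['Salary'] = combined['Salary'].astype('Int64')
print(combined['Salary'].dtype)              # Int64
```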

In [20]:
# 📘 Dataset 2: Duplicate Index Trap
df3 = pd.DataFrame({
    'Product': ['Pen', 'Book'],
    'Units': [100, 150]
}, index=[0, 0])

### 🧩 Problem 2

Concat `df3` with itself.

* Observe index
* Fix index issue properly


In [21]:
pd.concat([df3,df3],ignore_index=True)

Unnamed: 0,Product,Units
0,Pen,100
1,Book,150
2,Pen,100
3,Book,150
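
To observe the problem before fixing it, here is a sketch that inspects the un-reset index and shows `verify_integrity=True`, which makes `pd.concat` raise instead of silently keeping duplicate labels (the variable `duplicate_error` is only for illustration):

```python
import pandas as pd

df3 = pd.DataFrame({'Product': ['Pen', 'Book'], 'Units': [100, 150]}, index=[0, 0])

raw = pd.concat([df3, df3])
print(raw.index.tolist())   # [0, 0, 0, 0]
print(len(raw.loc[0]))      # 4 -> a single label selects every row

# verify_integrity=True turns duplicate labels into an error
try:
    pd.concat([df3, df3], verify_integrity=True)
    duplicate_error = None
except ValueError as exc:
    duplicate_error = exc
print(duplicate_error is not None)   # True
```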


In [22]:
# 📘 Dataset 3: Column Order Confusion
df4 = pd.DataFrame({
    'A': [1, 2],
    'B': [3, 4]
})

df5 = pd.DataFrame({
    'B': [5, 6],
    'A': [7, 8]
})

### 🧩 Problem 3

Concat `df4` and `df5` row-wise.

* Does Pandas respect column order?
* Why?

In [59]:
pd.concat([df4,df5],ignore_index=True)

Unnamed: 0,A,B
0,1,3
1,2,4
2,7,5
3,8,6


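**Ans: Pandas aligns by column name, not by position. The values of `df5` land under the matching names, and the result keeps the column order of the first DataFrame (`A`, `B`).**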
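
If positional stacking really is what you want, the columns have to be relabelled first. A sketch of both behaviours; the relabelling step with `set_axis` is an illustration for a hypothetical case, not part of the original exercise:

```python
import pandas as pd

df4 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df5 = pd.DataFrame({'B': [5, 6], 'A': [7, 8]})

# Default: values follow their column names
by_name = pd.concat([df4, df5], ignore_index=True)
print(by_name['A'].tolist())       # [1, 2, 7, 8]

# Positional stacking: give df5 the same labels as df4, in df4's order
df5_relabelled = df5.set_axis(df4.columns, axis=1)
by_position = pd.concat([df4, df5_relabelled], ignore_index=True)
print(by_position['A'].tolist())   # [1, 2, 5, 6]
```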

In [24]:
# 📘 Dataset 4: Outer vs Inner

df6 = pd.DataFrame({
    'A': [1, 2],
    'B': [3, 4]
})

df7 = pd.DataFrame({
    'B': [5, 6],
    'C': [7, 8]
})

### 🧩 Problem 4

Concat row-wise using:

1️⃣ default behavior

2️⃣ `join='inner'`

Explain difference.

In [62]:
pd.concat([df6,df7],ignore_index=True)
pd.concat([df6,df7],join='inner',ignore_index=True)


Unnamed: 0,B
0,3
1,4
2,5
3,6


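**Ans: The default (`join='outer'`) keeps the union of the columns (`A`, `B`, `C`) and fills the gaps with NaN. `join='inner'` keeps only the columns common to both frames, here just `B`.**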
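
A notebook cell only displays its last expression, so the default result is not visible above. A sketch that keeps both results and compares them:

```python
import pandas as pd

df6 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df7 = pd.DataFrame({'B': [5, 6], 'C': [7, 8]})

outer = pd.concat([df6, df7], ignore_index=True)
inner = pd.concat([df6, df7], join='inner', ignore_index=True)

print(outer.columns.tolist())          # ['A', 'B', 'C']
print(inner.columns.tolist())          # ['B']
print(int(outer.isna().sum().sum()))   # 4 cells had no source value
```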

In [26]:
# 📘 Dataset 5: Hierarchical Index

q1 = pd.DataFrame({'Sales': [100, 200]}, index=['Jan', 'Feb'])
q2 = pd.DataFrame({'Sales': [150, 250]}, index=['Mar', 'Apr'])


### 🧩 Problem 5

Concat while keeping **month source info** in index.

(Hint: `keys=`)

In [27]:
pd.concat([q1,q2],keys=['q1','q2'])

Unnamed: 0,Unnamed: 1,Sales
q1,Jan,100
q1,Feb,200
q2,Mar,150
q2,Apr,250


In [28]:
# 📘 Dataset 6: Column-wise Alignment Trap

left = pd.DataFrame({'A': [1, 2]}, index=[0, 1])
right = pd.DataFrame({'B': [3, 4]}, index=[1, 2])



### 🧩 Problem 6

Concat column-wise.

* Explain missing values
* Fix alignment issue


In [64]:
pd.concat([left.reset_index(drop=True),right.reset_index(drop=True)],axis=1)

Unnamed: 0,A,B
0,1,3
1,2,4
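
The cell above shows only the fix. For the "explain missing values" part, here is a sketch of what happens without `reset_index`: the frames share only the label 1, so the outer join produces three rows with NaN wherever a frame has no row for that label.

```python
import pandas as pd

left = pd.DataFrame({'A': [1, 2]}, index=[0, 1])
right = pd.DataFrame({'B': [3, 4]}, index=[1, 2])

misaligned = pd.concat([left, right], axis=1)
print(sorted(misaligned.index))             # [0, 1, 2]
print(int(misaligned.isna().sum().sum()))   # 2 missing cells
print(misaligned.loc[1].tolist())           # [2.0, 3.0] -> the only complete row
```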


In [30]:
# 📘 Dataset 7: Real-World File Combine
files = [jan, feb]

### 🧩 Problem 7

Simulate combining **multiple monthly files** safely.

* Avoid index problems
* Maintain row order


In [31]:
pd.concat(files,ignore_index=True)


Unnamed: 0,Product,Units,Month
0,Pen,100,Jan
1,Book,150,Jan
2,Pen,120,Feb
3,Book,130,Feb
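
With real files the same pattern applies: read each one, tag it with its source, collect the frames in a list, and concatenate once at the end. A minimal sketch in which in-memory `io.StringIO` objects stand in for CSV files; `fake_files` and its contents are made up for illustration:

```python
import io
import pandas as pd

# Stand-ins for monthly CSV files (hypothetical contents)
fake_files = {
    'Jan': io.StringIO("Product,Units\nPen,100\nBook,150\n"),
    'Feb': io.StringIO("Product,Units\nPen,120\nBook,130\n"),
}

frames = []
for month, handle in fake_files.items():
    frame = pd.read_csv(handle)
    frame['Month'] = month   # record the source before combining
    frames.append(frame)

# One concat call at the end; ignore_index gives a clean 0..n-1 index
sales = pd.concat(frames, ignore_index=True)
print(sales)
```

Dictionaries preserve insertion order, so the rows come out in the order the files were listed.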


In [32]:
# 📘 Dataset 8: Performance Trap
dfs = [df1, df2, df4, df5]

### 🧩 Problem 8

Why is this BAD?

```python
result = pd.DataFrame()
for d in dfs:
    result = pd.concat([result, d])
```

Rewrite correctly.


In [71]:
result=pd.DataFrame()
for d in dfs:
    result=pd.concat([result,d ])
result

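# Bad: every iteration copies all rows accumulated so far,
# so the total work grows quadratically with the number of frames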

Unnamed: 0,ID,Name,Salary,A,B
0,1.0,Ali,50000.0,,
1,2.0,Sara,60000.0,,
0,3.0,Ahmed,,,
1,4.0,Zara,,,
0,,,,1.0,3.0
1,,,,2.0,4.0
0,,,,7.0,5.0
1,,,,8.0,6.0


In [73]:
result1=pd.concat(dfs,ignore_index=True)
result1

Unnamed: 0,ID,Name,Salary,A,B
0,1.0,Ali,50000.0,,
1,2.0,Sara,60000.0,,
2,3.0,Ahmed,,,
3,4.0,Zara,,,
4,,,,1.0,3.0
5,,,,2.0,4.0
6,,,,7.0,5.0
7,,,,8.0,6.0


## 📘 Dataset 9: Mixed Axis Confusion

---

### 🧩 Problem 9

Explain difference between:

```python
pd.concat([df4, df5], axis=0)
pd.concat([df4, df5], axis=1)
```

Without running code.

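`axis=0` stacks the frames row-wise: the result has 4 rows, and the columns are aligned by name (`A`, `B`).

`axis=1` places the frames side by side: rows are aligned on the index, giving 2 rows and 4 columns (`A`, `B`, `B`, `A`).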
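
The prediction can be confirmed afterwards by checking the shapes. A short sketch:

```python
import pandas as pd

df4 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df5 = pd.DataFrame({'B': [5, 6], 'A': [7, 8]})

rows = pd.concat([df4, df5], axis=0)
cols = pd.concat([df4, df5], axis=1)

print(rows.shape)              # (4, 2)
print(cols.shape)              # (2, 4)
print(cols.columns.tolist())   # ['A', 'B', 'B', 'A']
```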

## 📘 Dataset 10: Debugging Case

---

### 🧩 Problem 10

You expected **4 rows** but got **6 rows** after concat.

* What caused this?
* How do you fix it?


In [76]:
df_1=pd.DataFrame({'Name':['Haroon','Laiba','Ali']},index=[1,2,1])
df_2=pd.DataFrame({'Name':['Ali','Ali','Sara']},index=[1,1,2])
concated=pd.concat([df_1,df_2],ignore_index=True)
concated=concated.drop_duplicates().reset_index(drop=True)
concated

Unnamed: 0,Name
0,Haroon
1,Laiba
2,Ali
3,Sara
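
As for the cause: `pd.concat` never removes rows. It returns every row of every input, so 3 + 3 rows give 6 even when some rows are identical. A sketch of diagnosing that on the same data before dropping the duplicates (`stacked` and `deduped` are illustrative names):

```python
import pandas as pd

df_1 = pd.DataFrame({'Name': ['Haroon', 'Laiba', 'Ali']}, index=[1, 2, 1])
df_2 = pd.DataFrame({'Name': ['Ali', 'Ali', 'Sara']}, index=[1, 1, 2])

stacked = pd.concat([df_1, df_2], ignore_index=True)
print(len(stacked))                      # 6 = 3 + 3
print(int(stacked.duplicated().sum()))   # 2 repeated 'Ali' rows

# Drop the repeats and renumber in one step
deduped = stacked.drop_duplicates(ignore_index=True)
print(deduped['Name'].tolist())          # ['Haroon', 'Laiba', 'Ali', 'Sara']
```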
