# **Melt Function**

* Reshapes data from **wide format** to **long format**.
* Collapses several value columns into two: `variable` (the original column names) and `value` (the cell values).
* Useful for plotting and statistical analysis, where many tools expect one observation per row.
* `melt` is the "un-pivoter": it converts **wide format (easy for people to read)** into **long format (easy for machines and plotting libraries)**.
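
As a minimal sketch of the wide-to-long idea (the `scores` frame and its column names are made up for illustration, not taken from the notebook), melting 2 rows with 3 value columns gives 2 x 3 = 6 rows, and `pivot` can reverse the operation:

```python
import pandas as pd

# Hypothetical wide table: one row per student, one column per subject
scores = pd.DataFrame({
    "Student": ["A", "B"],
    "Eng": [70, 65],
    "Maths": [80, 60],
    "Science": [90, 75],
})

# Wide -> long: one row per (student, subject) pair
long_scores = pd.melt(
    scores, id_vars=["Student"], var_name="Subject", value_name="Score"
)
print(long_scores.shape)  # (6, 3)

# Long -> wide: pivot undoes the melt
wide_again = (
    long_scores.pivot(index="Student", columns="Subject", values="Score")
    .reset_index()
)
wide_again.columns.name = None  # drop the leftover "Subject" axis label
print(wide_again)
```

The round trip only works cleanly here because each (student, subject) pair appears exactly once in the long table.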

In [None]:
import pandas as pd

In [None]:
df = pd.DataFrame({
    "Days": [1, 2, 3, 4, 5, 6],
    "Eng": [12, 34, 23, 53, 64, 63],
    "Maths": [33, 52, 51, 20, 83, 24],
    "Science": [53, 45, 24, 64, 22, 74],
})
df

## `melt()`

**Basic syntax:**
```python
pd.melt(
    df,
    id_vars=['id_column1', 'id_column2'],   # columns to keep fixed
    value_vars=['col1','col2'],             # columns to unpivot
    var_name='Variable',                    # name for new column
    value_name='Value'                      # name for values
)
```


| Parameter      | Type         | Default      | What it Does                                     | When to Use                                 |
| -------------- | ------------ | ------------ | ------------------------------------------------ | ------------------------------------------- |
| `frame`        | DataFrame    | (required)   | The DataFrame to reshape                         | Always required                             |
| `id_vars`      | list / tuple | `None`       | Columns to **keep fixed** (identifier columns)   | IDs like `UserID`, `Date`, `Product`        |
| `value_vars`   | list / tuple | `None`       | Columns to **unpivot**                           | When you want to melt only selected columns |
| `var_name`     | str          | `'variable'` | Name of the column storing original column names | Use meaningful names like `"Month"`         |
| `value_name`   | str          | `'value'`    | Name of the column storing values                | Rename to `"Sales"`, `"Revenue"`            |
| `col_level`    | int / str    | `None`       | Which level to melt when columns are MultiIndex  | Advanced / hierarchical columns             |
| `ignore_index` | bool         | `True`       | Reset index after melting                        | Set `False` to keep original index          |


In [None]:
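pd.melt(df)                                                # no id_vars: every column is unpivoted
pd.melt(df, id_vars=['Days'])                              # keep Days fixed, unpivot the rest
pd.melt(df, id_vars=['Days'], value_vars=['Maths', 'Eng'])  # unpivot only Maths and Eng
pd.melt(df, id_vars=['Days'], var_name='Python', value_name='Java')  # rename the two new columns
pd.melt(df, id_vars=['Days'], value_vars=['Maths', 'Eng'], var_name='Python', value_name='Java')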
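
Since the cell above shows no output, the following sketch prints the shape and column names each kind of call produces. It rebuilds the same `df`; the names `Subject` and `Marks` are illustrative choices, not part of the original cell.

```python
import pandas as pd

df = pd.DataFrame({
    "Days": [1, 2, 3, 4, 5, 6],
    "Eng": [12, 34, 23, 53, 64, 63],
    "Maths": [33, 52, 51, 20, 83, 24],
    "Science": [53, 45, 24, 64, 22, 74],
})

all_cols = pd.melt(df)                        # 4 columns x 6 rows -> 24 rows
keep_days = pd.melt(df, id_vars=["Days"])     # 3 columns x 6 rows -> 18 rows
two_subjects = pd.melt(df, id_vars=["Days"], value_vars=["Maths", "Eng"])
renamed = pd.melt(df, id_vars=["Days"], var_name="Subject", value_name="Marks")

for label, result in [
    ("all_cols", all_cols),
    ("keep_days", keep_days),
    ("two_subjects", two_subjects),
    ("renamed", renamed),
]:
    print(label, result.shape, list(result.columns))
# all_cols (24, 2) ['variable', 'value']
# keep_days (18, 3) ['Days', 'variable', 'value']
# two_subjects (12, 3) ['Days', 'variable', 'value']
# renamed (18, 3) ['Days', 'Subject', 'Marks']
```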


---
# **Practice Problems**

In [None]:
sales = pd.DataFrame({
    'Product': ['Pen', 'Book', 'Bag'],
    'Jan': [100, 150, 80],
    'Feb': [120, 130, 90],
    'Mar': [110, 140, 100]
})

### 🧠 Problem 11

Convert `sales` into **long format** with columns:

> Product | Month | Units


In [None]:
pd.melt(sales,id_vars=['Product'],var_name='Month',value_name='Units')

### 🧠 Problem 12

Rename the melted columns to:

* `Month`
* `Units_Sold`


In [None]:
renamed_df=pd.melt(sales,id_vars=['Product'],var_name='Month',value_name='Units_Sold')
renamed_df

### 🧠 Problem 13

Filter melted data to show **only February sales**.



In [None]:
melted=pd.melt(sales,id_vars=['Product'],var_name='Month',value_name='Units')
melted[melted['Month']=='Feb']   # filter the melted rows, as the problem asks

---
# **Hard Practice Problems**

## 🔹 Problem 1: Selective Melt with Extra Columns

```python
df = pd.DataFrame({
    'OrderID': [101, 102],
    'Customer': ['Ali', 'Sara'],
    'Jan_Sales': [200, 300],
    'Feb_Sales': [250, 350],
    'Region': ['North', 'South']
})
```

👉 Melt **only the sales columns**; keep `OrderID`, `Customer`, and `Region` as identifiers.

Expected output columns:

```
OrderID | Customer | Region | Month | Sales
```

In [10]:
df = pd.DataFrame({
    'OrderID': [101, 102],
    'Customer': ['Ali', 'Sara'],
    'Jan_Sales': [200, 300],
    'Feb_Sales': [250, 350],
    'Region': ['North', 'South']
})

pd.melt(df,id_vars=['OrderID','Customer','Region'],var_name='Month',value_name='Sales')

|   | OrderID | Customer | Region | Month | Sales |
| --- | --- | --- | --- | --- | --- |
| 0 | 101 | Ali | North | Jan_Sales | 200 |
| 1 | 102 | Sara | South | Jan_Sales | 300 |
| 2 | 101 | Ali | North | Feb_Sales | 250 |
| 3 | 102 | Sara | South | Feb_Sales | 350 |
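
The `Month` column in this output still holds the full column names (`Jan_Sales`, `Feb_Sales`). One possible follow-up, sketched here with an explicit `value_vars` list and a literal suffix replacement (the variable names `orders` and `long_orders` are illustrative), reduces them to plain month labels:

```python
import pandas as pd

orders = pd.DataFrame({
    "OrderID": [101, 102],
    "Customer": ["Ali", "Sara"],
    "Jan_Sales": [200, 300],
    "Feb_Sales": [250, 350],
    "Region": ["North", "South"],
})

long_orders = pd.melt(
    orders,
    id_vars=["OrderID", "Customer", "Region"],
    value_vars=["Jan_Sales", "Feb_Sales"],   # name the melted columns explicitly
    var_name="Month",
    value_name="Sales",
)

# Strip the literal "_Sales" suffix so Month holds "Jan" / "Feb"
long_orders["Month"] = long_orders["Month"].str.replace("_Sales", "", regex=False)
print(long_orders)
```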



## 🔹 Problem 2: Extract Meaning from Column Names

```python
df = pd.DataFrame({
    'Product': ['Pen', 'Book'],
    'Sales_2023_Jan': [100, 200],
    'Sales_2023_Feb': [150, 250]
})
```

👉 Melt and then **extract Month** from column names.

Expected:

```
Product | Month | Sales
```

⚠️ Hint: use `str.split()` or a regex **after melting**


In [103]:
df = pd.DataFrame({
    'Product': ['Pen', 'Book'],
    'Sales_2023_Jan': [100, 200],
    'Sales_2023_Feb': [150, 250]
})
melted=pd.melt(df,id_vars=['Product'],var_name='Month',value_name='Sales')
melted['Month']=melted['Month'].str.split('_').str[-1]
melted

|   | Product | Month | Sales |
| --- | --- | --- | --- |
| 0 | Pen | Jan | 100 |
| 1 | Book | Jan | 200 |
| 2 | Pen | Feb | 150 |
| 3 | Book | Feb | 250 |
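
The hint also mentions a regex. A possible variant, assuming every melted column name follows the `Sales_<year>_<month>` pattern, uses `str.extract` with named groups to recover both the year and the month (the intermediate column name `Column` is an illustrative choice):

```python
import pandas as pd

df = pd.DataFrame({
    "Product": ["Pen", "Book"],
    "Sales_2023_Jan": [100, 200],
    "Sales_2023_Feb": [150, 250],
})

melted = pd.melt(df, id_vars=["Product"], var_name="Column", value_name="Sales")

# Named groups become the column names of the extracted frame
parts = melted["Column"].str.extract(r"^Sales_(?P<Year>\d{4})_(?P<Month>[A-Za-z]+)$")

result = pd.concat([melted[["Product"]], parts, melted[["Sales"]]], axis=1)
print(result)
```

Note that `Year` comes back as text; convert it with `astype(int)` if numeric years are needed.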


## 🔹 Problem 3: Ignore Index = False (Tricky)

```python
df = pd.DataFrame({
    'ID': [1, 2],
    'A': [10, 20],
    'B': [30, 40]
})
```

👉 Melt while **preserving the original index**.


In [110]:
df = pd.DataFrame({
    'ID': [1, 2],
    'A': [10, 20],
    'B': [30, 40]
})
pd.melt(df,ignore_index=False)

|   | variable | value |
| --- | --- | --- |
| 0 | ID | 1 |
| 1 | ID | 2 |
| 0 | A | 10 |
| 1 | A | 20 |
| 0 | B | 30 |
| 1 | B | 40 |
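
To make the effect of `ignore_index` easier to see, this sketch gives the frame a custom index (`"x"`, `"y"`, an assumption added for illustration) and keeps `ID` as an identifier column:

```python
import pandas as pd

df = pd.DataFrame(
    {"ID": [1, 2], "A": [10, 20], "B": [30, 40]},
    index=["x", "y"],   # hypothetical labels, so the repeated index stands out
)

reset = pd.melt(df, id_vars=["ID"])                     # default: ignore_index=True
kept = pd.melt(df, id_vars=["ID"], ignore_index=False)  # original labels repeat

print(reset.index.tolist())  # [0, 1, 2, 3]
print(kept.index.tolist())   # ['x', 'y', 'x', 'y']
```

With `ignore_index=False` the index labels are no longer unique, which is worth keeping in mind before using `.loc` on the result.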



## 🔹 Problem 4: Partial `value_vars`

```python
df = pd.DataFrame({
    'EmpID': [1, 2],
    'Name': ['Ali', 'Sara'],
    'Basic': [50000, 60000],
    'Bonus': [5000, 6000],
    'Tax': [8000, 9000]
})
```

👉 Melt **only `Basic` and `Bonus`**; keep the other columns unchanged.


In [50]:
df = pd.DataFrame({
    'EmpID': [1, 2],
    'Name': ['Ali', 'Sara'],
    'Basic': [50000, 60000],
    'Bonus': [5000, 6000],
    'Tax': [8000, 9000]
})
pd.melt(df,id_vars=['EmpID','Name','Tax'],value_vars=['Basic','Bonus'],var_name='Type',value_name='Amount',ignore_index=False)

|   | EmpID | Name | Tax | Type | Amount |
| --- | --- | --- | --- | --- | --- |
| 0 | 1 | Ali | 8000 | Basic | 50000 |
| 1 | 2 | Sara | 9000 | Basic | 60000 |
| 0 | 1 | Ali | 8000 | Bonus | 5000 |
| 1 | 2 | Sara | 9000 | Bonus | 6000 |
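
One detail worth checking for this problem: columns named in neither `id_vars` nor `value_vars` are dropped from the result. The sketch below, using the same data, contrasts passing `value_vars` alone with also listing the columns to keep:

```python
import pandas as pd

df = pd.DataFrame({
    "EmpID": [1, 2],
    "Name": ["Ali", "Sara"],
    "Basic": [50000, 60000],
    "Bonus": [5000, 6000],
    "Tax": [8000, 9000],
})

# value_vars alone: EmpID, Name and Tax are not carried over
only_values = pd.melt(df, value_vars=["Basic", "Bonus"])
print(list(only_values.columns))  # ['variable', 'value']

# Listing the other columns in id_vars keeps them next to each melted row
with_ids = pd.melt(
    df,
    id_vars=["EmpID", "Name", "Tax"],
    value_vars=["Basic", "Bonus"],
    var_name="Type",
    value_name="Amount",
)
print(list(with_ids.columns))  # ['EmpID', 'Name', 'Tax', 'Type', 'Amount']
```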



## 🔹 Problem 5: Multi-ID Melt (Common Mistake)

```python
df = pd.DataFrame({
    'Country': ['PK', 'PK', 'IN'],
    'Year': [2023, 2024, 2023],
    'Q1': [100, 120, 90],
    'Q2': [110, 130, 95]
})
```

👉 Melt with **both `Country` and `Year` as identifiers**.


In [112]:
df = pd.DataFrame({
    'Country': ['PK', 'PK', 'IN'],
    'Year': [2023, 2024, 2023],
    'Q1': [100, 120, 90],
    'Q2': [110, 130, 95]
})
pd.melt(df,id_vars=['Country','Year'],var_name='Quarter',value_name='Revenue',ignore_index=False)


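|   | Country | Year | Quarter | Revenue |
| --- | --- | --- | --- | --- |
| 0 | PK | 2023 | Q1 | 100 |
| 1 | PK | 2024 | Q1 | 120 |
| 2 | IN | 2023 | Q1 | 90 |
| 0 | PK | 2023 | Q2 | 110 |
| 1 | PK | 2024 | Q2 | 130 |
| 2 | IN | 2023 | Q2 | 95 |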
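
The "common mistake" in the title is presumably leaving `Year` out of `id_vars`. A short sketch of what happens in that case, using the same data (the names `wrong` and `right` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "Country": ["PK", "PK", "IN"],
    "Year": [2023, 2024, 2023],
    "Q1": [100, 120, 90],
    "Q2": [110, 130, 95],
})

# Mistake: Year is not an identifier, so it is melted like a quarter
wrong = pd.melt(df, id_vars=["Country"], var_name="Quarter", value_name="Revenue")
print(wrong["Quarter"].unique().tolist())  # ['Year', 'Q1', 'Q2']
print(len(wrong))                          # 9

# Both identifiers listed: only Q1 and Q2 are melted
right = pd.melt(df, id_vars=["Country", "Year"], var_name="Quarter", value_name="Revenue")
print(right["Quarter"].unique().tolist())  # ['Q1', 'Q2']
print(len(right))                          # 6
```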



## 🔹 Problem 6: Wide → Long → Filter

```python
df = pd.DataFrame({
    'Student': ['A', 'B'],
    'Math': [80, 60],
    'Physics': [90, 70],
    'Chemistry': [85, 75]
})
```

👉 Melt, then **keep only rows where score ≥ 80**.


In [68]:
df = pd.DataFrame({
    'Student': ['A', 'B'],
    'Math': [80, 60],
    'Physics': [90, 70],
    'Chemistry': [85, 75]
})
melted=pd.melt(df,id_vars=['Student'],var_name='Subject',value_name='Score')
melted[melted['Score']>=80]

|   | Student | Subject | Score |
| --- | --- | --- | --- |
| 0 | A | Math | 80 |
| 2 | A | Physics | 90 |
| 4 | A | Chemistry | 85 |



## 🔹 Problem 7: Duplicate Column Names (Edge Case)

```python
df = pd.DataFrame(
    [[10, 20], [30, 40]],
    columns=['Score', 'Score']
)
```

👉 Melt this DataFrame safely.

⚠️ Why is this tricky? The column names are duplicated.


In [114]:
df = pd.DataFrame(
    [[10, 20], [30, 40]],
    columns=['Score', 'Score']
)
df.columns=['Score_1','Score_2']
pd.melt(df,var_name='Score',ignore_index=False)

|   | Score | value |
| --- | --- | --- |
| 0 | Score_1 | 10 |
| 1 | Score_1 | 30 |
| 0 | Score_2 | 20 |
| 1 | Score_2 | 40 |
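
The solution above renames the columns by hand. For wider frames, one possible generalisation (a sketch, not the only approach) detects duplicates with `Index.duplicated()` and numbers every column by position before melting:

```python
import pandas as pd

df = pd.DataFrame(
    [[10, 20], [30, 40]],
    columns=["Score", "Score"],
)

# Confirm that at least one column name is repeated
print(df.columns.duplicated().any())  # True

# Append a 1-based position to each name so every column is unique
df.columns = [f"{name}_{pos}" for pos, name in enumerate(df.columns, start=1)]

melted = pd.melt(df, var_name="Score", value_name="value")
print(melted)
```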



## 🔹 Problem 8: Melt + Sorting Logic

```python
df = pd.DataFrame({
    'City': ['Karachi', 'Lahore'],
    'Temp_Mon': [30, 25],
    'Temp_Tue': [32, 26]
})
```

👉 Melt and **sort by temperature descending**.


In [79]:
df = pd.DataFrame({
    'City': ['Karachi', 'Lahore'],
    'Temp_Mon': [30, 25],
    'Temp_Tue': [32, 26]
})
pd.melt(df,id_vars=['City'],ignore_index=False).sort_values(by='value',ascending=False)

|   | City | variable | value |
| --- | --- | --- | --- |
| 0 | Karachi | Temp_Tue | 32 |
| 0 | Karachi | Temp_Mon | 30 |
| 1 | Lahore | Temp_Tue | 26 |
| 1 | Lahore | Temp_Mon | 25 |


## 🔹 Problem 9: Melt with Missing Values

```python
df = pd.DataFrame({
    'Product': ['Pen', 'Book'],
    'Jan': [10, None],
    'Feb': [None, 20]
})
```

👉 Melt and **remove missing values after melting**.


In [87]:
df = pd.DataFrame({
    'Product': ['Pen', 'Book'],
    'Jan': [10, None],
    'Feb': [None, 20]
})
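melted=pd.melt(df,id_vars=['Product'],var_name='Month',value_name='Units')   # keep Product as the identifier
melted.dropna()


|   | Product | Month | Units |
| --- | --- | --- | --- |
| 0 | Pen | Jan | 10.0 |
| 3 | Book | Feb | 20.0 |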
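
A plain `dropna()` removes a row if any column is missing. When an identifier column can also contain gaps, `dropna(subset=...)` limits the check to the value column. The sketch below adds a hypothetical `Supplier` column (not part of the original problem) to show the difference:

```python
import pandas as pd

df = pd.DataFrame({
    "Product": ["Pen", "Book"],
    "Supplier": [None, "Acme"],   # hypothetical identifier with a gap
    "Jan": [10, None],
    "Feb": [None, 20],
})

melted = pd.melt(
    df, id_vars=["Product", "Supplier"], var_name="Month", value_name="Units"
)

# Drops every row with any missing cell, including the valid Pen/Jan sale
strict = melted.dropna()
print(len(strict))  # 1

# Drops only rows whose Units value is missing
by_units = melted.dropna(subset=["Units"]).reset_index(drop=True)
print(by_units)
```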


## 🔹 Problem 10: Advanced – MultiIndex Columns

```python
arrays = [
    ['Sales', 'Sales', 'Profit', 'Profit'],
    ['Jan', 'Feb', 'Jan', 'Feb']
]
cols = pd.MultiIndex.from_arrays(arrays)

df = pd.DataFrame(
    [[100, 120, 20, 30]],
    columns=cols
)
```

👉 Melt **only the `Sales` level**, keeping month info.

⚠️ Uses `col_level`


In [119]:
arrays = [
    ['Sales', 'Sales', 'Profit', 'Profit'],
    ['Jan', 'Feb', 'Jan', 'Feb']
]
cols = pd.MultiIndex.from_arrays(arrays)

df = pd.DataFrame(
    [[100, 120, 20, 30]],
    columns=cols
)
df

|   | Sales | Sales | Profit | Profit |
| --- | --- | --- | --- | --- |
|   | Jan | Feb | Jan | Feb |
| 0 | 100 | 120 | 20 | 30 |


In [127]:
melted=df['Sales'].melt(var_name='Month',value_name='Sales')
melted

|   | Month | Sales |
| --- | --- | --- |
| 0 | Jan | 100 |
| 1 | Feb | 120 |
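
The solution above selects `df['Sales']` before melting and does not use `col_level`. For comparison, here is a sketch of what `col_level` does on the same frame: it chooses which header level supplies the names in the variable column. Melting without `col_level` keeps both levels, which can then be filtered. The column names `Metric`, `Month` and `Amount` are illustrative.

```python
import pandas as pd

arrays = [
    ["Sales", "Sales", "Profit", "Profit"],
    ["Jan", "Feb", "Jan", "Feb"],
]
cols = pd.MultiIndex.from_arrays(arrays)
df = pd.DataFrame([[100, 120, 20, 30]], columns=cols)

# col_level=0: variable names come from the top header level
by_metric = pd.melt(df, col_level=0, var_name="Metric", value_name="Amount")
print(by_metric["Metric"].tolist())  # ['Sales', 'Sales', 'Profit', 'Profit']

# col_level=1: variable names come from the second header level
by_month = pd.melt(df, col_level=1, var_name="Month", value_name="Amount")
print(by_month["Month"].tolist())    # ['Jan', 'Feb', 'Jan', 'Feb']

# No col_level: one variable column per header level, then filter to Sales
both = pd.melt(df)
both.columns = ["Metric", "Month", "Amount"]
sales_only = both[both["Metric"] == "Sales"]
print(sales_only)
```

Neither `col_level` value alone separates `Sales` from `Profit` while keeping the month, which is why selecting `df['Sales']` first (or filtering the full melt) is the more direct route here.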
