## Pandas `melt`

In many data analysis projects, you may find that your data is in a *wide format*, where several columns hold the same kind of measurement (for example, one column per subject or per month). The **`pandas.melt`** function converts such data into a *long (tidy) format*. The identifier columns are kept, one new column stores the former column names, and another stores the corresponding values. This layout is especially useful when preparing data for visualization or grouped analysis.


### Key Concepts and Parameters

- **`id_vars`**: Columns to keep as identifiers. They are repeated on every output row and are not unpivoted.
- **`value_vars`**: Columns to "unpivot" into a single column. If omitted, every column not listed in `id_vars` is used.
- **`var_name`**: Name of the new column that stores the former column names (optional; defaults to `frame.columns.name` if it is set, otherwise `"variable"`).
- **`value_name`**: Name of the new column that holds the corresponding values (optional; defaults to `"value"`).
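Here is a minimal sketch of these defaults. It uses a small made-up DataFrame (`wide`, with hypothetical columns `id`, `a`, `b`). Only `id_vars` is given, so every other column is melted, and the new columns fall back to the names `variable` and `value`.

```python
import pandas as pd

# A tiny made-up wide DataFrame: one identifier column and two measurement columns
wide = pd.DataFrame({"id": [1, 2], "a": [10, 20], "b": [30, 40]})

# No value_vars, var_name or value_name given:
# all non-id columns are melted into "variable"/"value"
long = pd.melt(wide, id_vars=["id"])
print(long)
```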

The basic syntax is:
```python
pd.melt(frame, id_vars=None, value_vars=None, var_name=None, value_name="value", col_level=None, ignore_index=True)
```
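In this syntax, `frame` is your DataFrame, and the remaining parameters are optional. Besides the four described above, `col_level` selects which level to melt when the columns are a `MultiIndex`. `ignore_index` (default `True`) controls whether the original index is discarded in favor of a new integer index.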
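The sketch below shows what `ignore_index` changes. It uses a made-up DataFrame with string index labels (`wide`, `r1`, `r2` are illustrative names). Note that the `ignore_index` parameter requires pandas 1.1 or newer.

```python
import pandas as pd

# A small made-up DataFrame with a non-default index
wide = pd.DataFrame({"x": [1, 2], "y": [3, 4]}, index=["r1", "r2"])

# Default ignore_index=True: the result gets a fresh integer index
print(pd.melt(wide).index.tolist())  # [0, 1, 2, 3]

# ignore_index=False: original labels are kept, repeated once per melted column
print(pd.melt(wide, ignore_index=False).index.tolist())  # ['r1', 'r2', 'r1', 'r2']
```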

## A Basic Example

Let’s start with a simple example. Imagine you have a DataFrame of student scores in several subjects:

```python
import pandas as pd

# Create a simple DataFrame in wide format
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Maths': [90, 80, 70],
    'English': [85, 75, 95],
    'Science': [88, 82, 89]
})
df
```

|   | Name    | Maths | English | Science |
|---|---------|-------|---------|---------|
| 0 | Alice   | 90    | 85      | 88      |
| 1 | Bob     | 80    | 75      | 82      |
| 2 | Charlie | 70    | 95      | 89      |


```python
# Melt the DataFrame to convert subject columns into rows
melted_df = pd.melt(
    df,
    id_vars=['Name'],  # Keep the 'Name' column as is
    value_vars=['Maths', 'English', 'Science'],  # Columns to melt
    var_name='Subject',  # New column to store the subject names
    value_name='Score'  # New column to store the scores
)
melted_df
```

|   | Name    | Subject | Score |
|---|---------|---------|-------|
| 0 | Alice   | Maths   | 90    |
| 1 | Bob     | Maths   | 80    |
| 2 | Charlie | Maths   | 70    |
| 3 | Alice   | English | 85    |
| 4 | Bob     | English | 75    |
| 5 | Charlie | English | 95    |
| 6 | Alice   | Science | 88    |
| 7 | Bob     | Science | 82    |
| 8 | Charlie | Science | 89    |


**Output Explanation:**

The resulting DataFrame, `melted_df`, replaces the three subject columns with two new columns. `Subject` holds the former column names, and `Score` holds the corresponding values. Each student now has three rows, one per subject, so the original 3 × 3 grid of scores becomes 9 rows.
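`melt` is also available as a `DataFrame` method that takes the same parameters (without `frame`). Each output row corresponds to one (row, melted column) pair, so the result should have `len(df) * len(value_vars)` rows. The sketch below rebuilds the student data and checks both points.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Maths': [90, 80, 70],
    'English': [85, 75, 95],
    'Science': [88, 82, 89]
})
subjects = ['Maths', 'English', 'Science']

# Function form, as used above
melted_df = pd.melt(df, id_vars=['Name'], value_vars=subjects,
                    var_name='Subject', value_name='Score')

# Method form: DataFrame.melt accepts the same keyword arguments
melted_method = df.melt(id_vars=['Name'], value_vars=subjects,
                        var_name='Subject', value_name='Score')

# Both forms give the same result
print(melted_df.equals(melted_method))  # True

# One row per (student, subject) pair: 3 students x 3 subjects = 9 rows
print(melted_df.shape)  # (9, 3)
```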

## Example: Sales Data Transformation

Imagine you work with monthly sales data for different regions, with each month's sales figures stored in a separate column. To analyze trends over time, you might want to convert the DataFrame into a long format:

```python
# Create a DataFrame representing sales data across three months
df_sales = pd.DataFrame({
    'Region': ['North', 'South', 'East', 'West'],
    'Jan_Sales': [200, 150, 300, 250],
    'Feb_Sales': [220, 160, 310, 260],
    'Mar_Sales': [210, 155, 305, 255]
})
df_sales
```

|   | Region | Jan_Sales | Feb_Sales | Mar_Sales |
|---|--------|-----------|-----------|-----------|
| 0 | North  | 200       | 220       | 210       |
| 1 | South  | 150       | 160       | 155       |
| 2 | East   | 300       | 310       | 305       |
| 3 | West   | 250       | 260       | 255       |


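```python
# Melt the DataFrame to create a long-format version
melted_sales = pd.melt(
    df_sales,
    id_vars=['Region'],  # 'Region' is the identifier
    # value_vars is omitted, so every column not in id_vars is melted
    var_name='Month',  # New column to hold month names
    value_name='Sales'  # New column to hold the sales figures
)
melted_sales
```

|    | Region | Month     | Sales |
|----|--------|-----------|-------|
| 0  | North  | Jan_Sales | 200   |
| 1  | South  | Jan_Sales | 150   |
| 2  | East   | Jan_Sales | 300   |
| 3  | West   | Jan_Sales | 250   |
| 4  | North  | Feb_Sales | 220   |
| 5  | South  | Feb_Sales | 160   |
| 6  | East   | Feb_Sales | 310   |
| 7  | West   | Feb_Sales | 260   |
| 8  | North  | Mar_Sales | 210   |
| 9  | South  | Mar_Sales | 155   |
| 10 | East   | Mar_Sales | 305   |
| 11 | West   | Mar_Sales | 255   |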
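Because `value_vars` was left out here, pandas melted every column not listed in `id_vars`. The quick check below uses the same `df_sales` data and confirms that this matches listing the month columns explicitly.

```python
import pandas as pd

df_sales = pd.DataFrame({
    'Region': ['North', 'South', 'East', 'West'],
    'Jan_Sales': [200, 150, 300, 250],
    'Feb_Sales': [220, 160, 310, 260],
    'Mar_Sales': [210, 155, 305, 255]
})

# value_vars omitted: all non-id columns are melted, in their original order
implicit = pd.melt(df_sales, id_vars=['Region'],
                   var_name='Month', value_name='Sales')

# The same melt with the month columns listed explicitly
explicit = pd.melt(df_sales, id_vars=['Region'],
                   value_vars=['Jan_Sales', 'Feb_Sales', 'Mar_Sales'],
                   var_name='Month', value_name='Sales')

print(implicit.equals(explicit))  # True
print(len(implicit))  # 12 (4 regions x 3 months)
```

Listing `value_vars` explicitly is still worthwhile when the DataFrame contains other columns that should not be melted.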


### Tidying Up the New “Month” Column

If you want to remove the `_Sales` suffix from the month names, you can do so with simple string manipulation:

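```python
# Remove the '_Sales' suffix to tidy up the month names
melted_sales['Month'] = melted_sales['Month'].str.replace(
    '_Sales', '', regex=False)
melted_sales
```

|    | Region | Month | Sales |
|----|--------|-------|-------|
| 0  | North  | Jan   | 200   |
| 1  | South  | Jan   | 150   |
| 2  | East   | Jan   | 300   |
| 3  | West   | Jan   | 250   |
| 4  | North  | Feb   | 220   |
| 5  | South  | Feb   | 160   |
| 6  | East   | Feb   | 310   |
| 7  | West   | Feb   | 260   |
| 8  | North  | Mar   | 210   |
| 9  | South  | Mar   | 155   |
| 10 | East   | Mar   | 305   |
| 11 | West   | Mar   | 255   |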
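On pandas 1.4 or newer, `Series.str.removesuffix` expresses the same intent more precisely, because it strips the text only at the end of each value. In contrast, `str.replace` removes the substring wherever it appears. The sketch below compares the two on sample labels. The second label, `'Jan_Sales_Adj'`, is a hypothetical column name used only to show the difference.

```python
import pandas as pd

# Sample labels; 'Jan_Sales_Adj' is a hypothetical extra column name
labels = pd.Series(['Jan_Sales', 'Jan_Sales_Adj'])

# str.replace removes '_Sales' anywhere in the string
print(labels.str.replace('_Sales', '', regex=False).tolist())  # ['Jan', 'Jan_Adj']

# str.removesuffix removes '_Sales' only when it is at the end
print(labels.str.removesuffix('_Sales').tolist())  # ['Jan', 'Jan_Sales_Adj']
```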


In long format, each row is a single region, month, and sales observation, which makes trends over time easier to visualize. For example, you could draw one line per region to compare monthly sales.
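There is one practical detail to handle before plotting or sorting. The month labels are plain strings, so an alphabetical sort puts `Feb` before `Jan`. Converting `Month` to an ordered categorical keeps sorts in calendar order. The sketch below rebuilds the cleaned `melted_sales` data so that it runs on its own. The explicit category list is an assumption that matches the three months in this example.

```python
import pandas as pd

df_sales = pd.DataFrame({
    'Region': ['North', 'South', 'East', 'West'],
    'Jan_Sales': [200, 150, 300, 250],
    'Feb_Sales': [220, 160, 310, 260],
    'Mar_Sales': [210, 155, 305, 255]
})
melted_sales = df_sales.melt(id_vars=['Region'], var_name='Month', value_name='Sales')
melted_sales['Month'] = melted_sales['Month'].str.replace('_Sales', '', regex=False)

# Declare the calendar order explicitly (assumed here to be just these three months)
melted_sales['Month'] = pd.Categorical(
    melted_sales['Month'], categories=['Jan', 'Feb', 'Mar'], ordered=True)

# Sorting now follows the category order rather than the alphabet
sorted_months = melted_sales.sort_values('Month')['Month'].drop_duplicates().tolist()
print(sorted_months)  # ['Jan', 'Feb', 'Mar']
```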

## Example: Handling Compound Column Names

Sometimes your column names may encode multiple pieces of information. For example, consider a DataFrame where each column contains both the measurement type (e.g., Score, Grade) and the subject:

```python
# Create a DataFrame with compound column names
df_combined = pd.DataFrame({
    'Student': ['Alice', 'Bob', 'Charlie'],
    'Score_Maths': [90, 80, 70],
    'Score_English': [85, 75, 95],
    'Grade_Maths': ['A', 'B', 'C'],
    'Grade_English': ['B', 'C', 'A']
})
df_combined
```

|   | Student | Score_Maths | Score_English | Grade_Maths | Grade_English |
|---|---------|-------------|---------------|-------------|---------------|
| 0 | Alice   | 90          | 85            | A           | B             |
| 1 | Bob     | 80          | 75            | B           | C             |
| 2 | Charlie | 70          | 95            | C           | A             |


### Step 1. Melting Scores and Grades Separately

The score columns hold numbers and the grade columns hold letters, so it is best to melt each group of similar columns separately. First, melt the scores:
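To see why the groups are kept apart, the sketch below melts all four measurement columns into one value column. Integers and letters then share a single column, which forces pandas to fall back to the generic `object` dtype. Melting only the score columns keeps an integer dtype. The column name `Value` is illustrative.

```python
import pandas as pd

df_combined = pd.DataFrame({
    'Student': ['Alice', 'Bob', 'Charlie'],
    'Score_Maths': [90, 80, 70],
    'Score_English': [85, 75, 95],
    'Grade_Maths': ['A', 'B', 'C'],
    'Grade_English': ['B', 'C', 'A']
})

# Melt every non-id column at once: numbers and letters end up in one column
everything = pd.melt(df_combined, id_vars=['Student'],
                     var_name='Subject_Info', value_name='Value')
print(everything['Value'].dtype)  # object

# Melt only the score columns: the values stay integers
scores_only = pd.melt(df_combined, id_vars=['Student'],
                      value_vars=['Score_Maths', 'Score_English'],
                      var_name='Subject_Info', value_name='Score')
print(pd.api.types.is_integer_dtype(scores_only['Score']))  # True
```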

```python
# Melt only the score columns
melted_scores = pd.melt(
    df_combined,
    id_vars=['Student'],
    value_vars=['Score_Maths', 'Score_English'],
    var_name='Subject_Info',
    value_name='Score'
)
melted_scores
```

|   | Student | Subject_Info  | Score |
|---|---------|---------------|-------|
| 0 | Alice   | Score_Maths   | 90    |
| 1 | Bob     | Score_Maths   | 80    |
| 2 | Charlie | Score_Maths   | 70    |
| 3 | Alice   | Score_English | 85    |
| 4 | Bob     | Score_English | 75    |
| 5 | Charlie | Score_English | 95    |


Then, melt the grade columns:

```python
# Melt the grade columns
melted_grades = pd.melt(
    df_combined,
    id_vars=['Student'],
    value_vars=['Grade_Maths', 'Grade_English'],
    var_name='Subject_Info',
    value_name='Grade'
)
melted_grades
```

|   | Student | Subject_Info  | Grade |
|---|---------|---------------|-------|
| 0 | Alice   | Grade_Maths   | A     |
| 1 | Bob     | Grade_Maths   | B     |
| 2 | Charlie | Grade_Maths   | C     |
| 3 | Alice   | Grade_English | B     |
| 4 | Bob     | Grade_English | C     |
| 5 | Charlie | Grade_English | A     |


### Step 2. Splitting the Compound Column Names

Both melted DataFrames now have a `Subject_Info` column with values such as `"Score_Maths"` or `"Grade_English"`. To extract the subject, split each value on the underscore and keep the second part:

```python
# Extract the subject part from 'Subject_Info'
melted_scores['Subject'] = melted_scores['Subject_Info'].str.split('_').str[1]
melted_grades['Subject'] = melted_grades['Subject_Info'].str.split('_').str[1]
```

```python
print("Scores with Extracted Subject:")
melted_scores
```

Scores with Extracted Subject:

|   | Student | Subject_Info  | Score | Subject |
|---|---------|---------------|-------|---------|
| 0 | Alice   | Score_Maths   | 90    | Maths   |
| 1 | Bob     | Score_Maths   | 80    | Maths   |
| 2 | Charlie | Score_Maths   | 70    | Maths   |
| 3 | Alice   | Score_English | 85    | English |
| 4 | Bob     | Score_English | 75    | English |
| 5 | Charlie | Score_English | 95    | English |

```python
print("Grades with Extracted Subject:")
melted_grades
```

Grades with Extracted Subject:

|   | Student | Subject_Info  | Grade | Subject |
|---|---------|---------------|-------|---------|
| 0 | Alice   | Grade_Maths   | A     | Maths   |
| 1 | Bob     | Grade_Maths   | B     | Maths   |
| 2 | Charlie | Grade_Maths   | C     | Maths   |
| 3 | Alice   | Grade_English | B     | English |
| 4 | Bob     | Grade_English | C     | English |
| 5 | Charlie | Grade_English | A     | English |
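You might want to keep the measurement type (`Score` or `Grade`) instead of discarding it. In that case, `str.split` with `n=1` and `expand=True` returns both parts as separate columns. Here is a sketch on two sample labels. The column names `Measure` and `Subject` are chosen for illustration.

```python
import pandas as pd

info = pd.Series(['Score_Maths', 'Grade_English'])

# n=1 splits at the first underscore only; expand=True returns a DataFrame
parts = info.str.split('_', n=1, expand=True)
parts.columns = ['Measure', 'Subject']  # illustrative names for the two parts
print(parts)
```

Splitting only at the first underscore means a subject name that itself contains an underscore would stay intact in the `Subject` column.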


### Step 3. Merging the Melted DataFrames

Finally, you may want to merge the two DataFrames on `Student` and `Subject` so that you have one combined DataFrame with both the score and grade:

```python
# Merge scores and grades on 'Student' and 'Subject'
merged_df = pd.merge(
    melted_scores[['Student', 'Subject', 'Score']],
    melted_grades[['Student', 'Subject', 'Grade']],
    on=['Student', 'Subject']
)
merged_df
```

|   | Student | Subject | Score | Grade |
|---|---------|---------|-------|-------|
| 0 | Alice   | Maths   | 90    | A     |
| 1 | Bob     | Maths   | 80    | B     |
| 2 | Charlie | Maths   | 70    | C     |
| 3 | Alice   | English | 85    | B     |
| 4 | Bob     | English | 75    | C     |
| 5 | Charlie | English | 95    | A     |
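When merging long tables like these, `pd.merge` can also check your assumptions. The sketch below uses small made-up frames shaped like `melted_scores` and `melted_grades`, with one grade deliberately missing. `validate='one_to_one'` raises `pandas.errors.MergeError` if a (Student, Subject) key is duplicated on either side. `how='outer'` with `indicator=True` adds a `_merge` column that flags rows found on only one side.

```python
import pandas as pd

# Made-up frames in the same shape as melted_scores / melted_grades;
# the English grade is deliberately missing
scores = pd.DataFrame({'Student': ['Alice', 'Alice'],
                       'Subject': ['Maths', 'English'],
                       'Score': [90, 85]})
grades = pd.DataFrame({'Student': ['Alice'],
                       'Subject': ['Maths'],
                       'Grade': ['A']})

checked = pd.merge(
    scores, grades,
    on=['Student', 'Subject'],
    how='outer',             # keep unmatched rows from both sides
    validate='one_to_one',   # fail loudly if a key appears twice on either side
    indicator=True           # add a '_merge' column: 'both', 'left_only' or 'right_only'
)
print(checked)
```

With the default inner join used above, the unmatched English row would silently disappear, so this check can be useful on real data.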


This example shows one way to handle compound column names. You melt each group of related columns separately, use string operations to extract the shared key (the subject), and then merge the results into a single tidy DataFrame.
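Here the column names follow a regular `stub_suffix` pattern, so `pd.wide_to_long` can replace the melt, split, and merge sequence with a single call. The sketch below assumes the same `df_combined` data. `suffix` defaults to matching digits only, so it is widened to `r'\w+'` here to match subject names such as `Maths`.

```python
import pandas as pd

df_combined = pd.DataFrame({
    'Student': ['Alice', 'Bob', 'Charlie'],
    'Score_Maths': [90, 80, 70],
    'Score_English': [85, 75, 95],
    'Grade_Maths': ['A', 'B', 'C'],
    'Grade_English': ['B', 'C', 'A']
})

tidy = pd.wide_to_long(
    df_combined,
    stubnames=['Score', 'Grade'],  # column-name prefixes that become value columns
    i='Student',                   # column that uniquely identifies each wide row
    j='Subject',                   # name of the new column built from the suffixes
    sep='_',                       # separator between stub and suffix
    suffix=r'\w+',                 # suffixes are words, not the default digits
).reset_index()                    # move Student and Subject out of the index
print(tidy)
```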

## Tips

- **Identify Identifier Columns Early:** Decide which columns serve as identifiers (repeated on every output row) and which ones will be melted.
- **Name Your Output Columns Meaningfully:** Use the `var_name` and `value_name` parameters to give the new columns clear, descriptive names instead of the generic defaults.
- **Leverage String Operations:** After melting, use string methods (`str.split`, `str.replace`, etc.) to clean up the new column.
- **Check Your Data After Melting:** Inspect the output, for example its shape and a few rows, to verify that the transformation meets your needs.
- **Remember the Counterpart, `pivot`:** `melt` reshapes data from wide to long. To go back from long to wide, use `pivot`, or use `pivot_table` when duplicate entries need to be aggregated.
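The sketch below shows the round trip on the student data, where `pivot` turns `melted_df` back into one row per student. `pivot` sorts the new columns alphabetically, so they are reordered to match the original layout. The leftover `Subject` label on the column axis is also cleared.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Maths': [90, 80, 70],
    'English': [85, 75, 95],
    'Science': [88, 82, 89]
})
melted_df = df.melt(id_vars=['Name'], var_name='Subject', value_name='Score')

# Long -> wide: one row per Name, one column per Subject
wide_again = (
    melted_df.pivot(index='Name', columns='Subject', values='Score')
    [['Maths', 'English', 'Science']]  # restore the original column order
    .reset_index()                     # turn the Name index back into a column
)
wide_again.columns.name = None  # drop the 'Subject' label left on the column axis
print(wide_again)
```

`pivot` raises an error if a (Name, Subject) pair occurs more than once. That is the situation in which `pivot_table` with an aggregation function is the better choice.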