#Melt and Pivot

In [1]:
import pandas as pd

In [2]:
#simple dataframe

data = {'Name':['Alice','Bob','Charlie'],
       'Math':[85,78,92],
       'Science':[90,82,89],
       'English':[88,85,94]}

df = pd.DataFrame(data)

#display dataframe
print(df)

      Name  Math  Science  English
0    Alice    85       90       88
1      Bob    78       82       85
2  Charlie    92       89       94


#melt() — Wide to Long

The melt() method in pandas unpivots a DataFrame from wide format to long format. It takes several columns that each hold a measurement and stacks them into two columns: one holding the original column name and one holding the value (i.e., long-form key-value pairs).

When to Use melt():

Use melt() when each row of the DataFrame is an observation and each measurement sits in its own column, and you want to reshape the data into one row per observation and measurement for easier analysis or visualization.

#Parameters:

id_vars: The columns that you want to keep fixed (these columns will remain as identifiers).

value_vars: The columns you want to unpivot (the ones you want to "melt" into a single column). If omitted, every column not listed in id_vars is melted.

var_name: The name to use for the new column that will contain the names of the melted columns (default is 'variable', unless the column index itself has a name, in which case that name is used).

value_name: The name to use for the new column that will contain the values from the melted columns (default is 'value').

col_level: Only relevant when the columns are a MultiIndex; it selects which column level to melt on.
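
As a small sketch of the defaults described above, the cell below melts the same scores while passing only id_vars. The name `scores` is just a stand-in for the `df` built earlier, rebuilt here so the cell runs on its own.

```python
import pandas as pd

# Same data as the df defined earlier, rebuilt so this cell is self-contained.
scores = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Math": [85, 78, 92],
    "Science": [90, 82, 89],
    "English": [88, 85, 94],
})

# Only id_vars is given: every other column is melted, and the two new
# columns fall back to the default names "variable" and "value".
long_default = scores.melt(id_vars="Name")

print(long_default.columns.tolist())
print(long_default.shape)
```

Passing var_name and value_name, as in the next cell, only changes the labels of those two columns; the rows are the same.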

In [11]:
df2 = df.melt(id_vars=["Name"], value_vars=["Math","Science","English"],
              var_name="Subject", value_name="Score")
df2

      Name  Subject  Score
0    Alice     Math     85
1      Bob     Math     78
2  Charlie     Math     92
3    Alice  Science     90
4      Bob  Science     82
5  Charlie  Science     89
6    Alice  English     88
7      Bob  English     85
8  Charlie  English     94


#pivot() — Long to Wide

The pivot() method in pandas reshapes long-format data into wide-format data. It is the reverse operation of melt().

How it works:

pivot() takes a long-format DataFrame and returns a wide-format one. You tell it which column supplies the row labels, which column supplies the new column headers, and which column supplies the cell values.

In [13]:
df2.pivot(index="Name", columns="Subject", values="Score")

Subject  English  Math  Science
Name
Alice         88    85       90
Bob           85    78       82
Charlie       94    92       89


#Parameters:

index: The column whose unique values will become the rows of the new DataFrame.

columns: The column whose unique values will become the columns of the new DataFrame.

values: The column whose values fill the cells of the new DataFrame.
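
To illustrate that pivot() undoes melt(), here is a minimal round-trip sketch. The names `scores`, `long_df`, `wide` and `restored` are illustrative and not part of the notebook above; the data is the same three-student table.

```python
import pandas as pd

scores = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Math": [85, 78, 92],
    "Science": [90, 82, 89],
    "English": [88, 85, 94],
})

# Wide -> long -> wide again.
long_df = scores.melt(id_vars="Name", var_name="Subject", value_name="Score")
wide = long_df.pivot(index="Name", columns="Subject", values="Score")

# pivot() leaves "Name" as the index and labels the column axis "Subject".
# Move "Name" back to a regular column and drop the axis label.
restored = wide.reset_index()
restored.columns.name = None

# The subject columns come back in alphabetical order, so reorder them
# to match the original table.
restored = restored[list(scores.columns)]
print(restored)
```

The reset_index() step is optional; keep the pivoted form if you want to look rows up by name.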

#pivot_table()

Important Notes:

Duplicate Entries: If several rows share the same combination of index and columns values, pivot() raises a ValueError because it cannot tell which value belongs in the cell. In such cases, use pivot_table(), which handles duplicates by aggregating them (with the mean by default).
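
The duplicate-entry behaviour can be sketched as follows. The `dupes` frame is made up for illustration: Alice has two Math scores, so the (Alice, Math) cell is ambiguous for pivot().

```python
import pandas as pd

# Hypothetical long-format data with a repeated (Name, Subject) pair.
dupes = pd.DataFrame({
    "Name": ["Alice", "Alice", "Bob"],
    "Subject": ["Math", "Math", "Math"],
    "Score": [85, 99, 78],
})

# pivot() refuses to guess which of Alice's two scores to keep.
try:
    dupes.pivot(index="Name", columns="Subject", values="Score")
    pivot_failed = False
except ValueError as err:
    pivot_failed = True
    print("pivot() failed:", err)

# pivot_table() resolves the clash by aggregating the duplicates.
averaged = dupes.pivot_table(index="Name", columns="Subject",
                             values="Score", aggfunc="mean")
print(averaged)
```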

#pivot_table() example

In [21]:
#dataframe with a repeated name (Alice appears twice)

data1 = {'Name':['Alice','Bob','Charlie', 'Alice'],
       'Math':[85,78,92,99],
       'Science':[90,82,89,44]}

df3 = pd.DataFrame(data1)

#display dataframe
print(df3)

      Name  Math  Science
0    Alice    85       90
1      Bob    78       82
2  Charlie    92       89
3    Alice    99       44


In [22]:
df4 = df3.melt(id_vars=["Name"], value_vars=["Math","Science"],
               var_name="Subject", value_name="Score")

In [23]:
df4

      Name  Subject  Score
0    Alice     Math     85
1      Bob     Math     78
2  Charlie     Math     92
3    Alice     Math     99
4    Alice  Science     90
5      Bob  Science     82
6  Charlie  Science     89
7    Alice  Science     44


In [24]:
df4.pivot_table(index="Name", columns="Subject", values="Score", aggfunc="mean")

Subject  Math  Science
Name
Alice    92.0     67.0
Bob      78.0     82.0
Charlie  92.0     89.0
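
The mean is only one possible way to combine duplicates. As a hedged sketch, the cell below rebuilds the same long-format scores (under the illustrative name `long_scores`) and swaps in two other aggregations, "max" and "count".

```python
import pandas as pd

# Same rows as df4 above, rebuilt so this cell runs on its own.
long_scores = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Alice"] * 2,
    "Subject": ["Math"] * 4 + ["Science"] * 4,
    "Score": [85, 78, 92, 99, 90, 82, 89, 44],
})

# Keep each student's best score per subject.
best = long_scores.pivot_table(index="Name", columns="Subject",
                               values="Score", aggfunc="max")

# Count how many scores went into each cell.
attempts = long_scores.pivot_table(index="Name", columns="Subject",
                                   values="Score", aggfunc="count")

print(best)
print(attempts)
```

Checking the counts first is a quick way to see which cells actually contain duplicates before deciding how to aggregate them.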
