# Using pandas.DataFrame.pivot
The `pandas.DataFrame.pivot` method reshapes a DataFrame from long format (one row per observation) to wide format (one column per category). The wide layout is often easier to read and compare, which makes it useful for data analysis and reporting.

In this notebook, we will explore how to use `pivot` to reorganize data. We will cover basic usage, pivoting several value columns, handling missing data, reshaping data back to long format, using a MultiIndex, and how `pivot` differs from `pivot_table`.

## Basic Concept

`pandas.DataFrame.pivot` rearranges a DataFrame by turning the unique values of one column into new column labels, using another column (or the existing index) as the row labels, and filling each cell with the matching entry from the values column(s).

It only reorganizes data; unlike `pivot_table`, it does not aggregate or summarize it.
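To make the idea concrete, here is a minimal sketch on a small illustrative table. Assuming every (`Date`, `City`) pair is unique, the same wide table can also be built by moving both key columns into the index with `set_index` and then unstacking `City` into columns. The comparison is meant as an intuition aid, not as a description of pandas internals.

```python
import pandas as pd

# Illustrative long-format table: one row per (Date, City) observation
long_df = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02'],
    'City': ['New York', 'Los Angeles', 'New York', 'Los Angeles'],
    'Temperature': [32, 75, 30, 78],
})

# pivot: Date values become row labels, City values become column labels
wide = long_df.pivot(index='Date', columns='City', values='Temperature')

# The same reshape spelled out: index on both keys, then move City to the columns
manual = long_df.set_index(['Date', 'City'])['Temperature'].unstack('City')

print(wide)
print(wide.equals(manual))
```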

## Syntax

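`DataFrame.pivot(*, columns, index=<default>, values=<default>)`

In pandas 2.x all arguments are keyword-only and `columns` is required; older releases document the signature as `DataFrame.pivot(index=None, columns=None, values=None)`.

- **index**: Column name (or, since pandas 1.1, a list of names) to use as the new DataFrame’s index. If omitted, the existing index is used.
- **columns**: Column name (or list of names) whose unique values become the columns of the pivoted DataFrame.
- **values**: Column name (or list of names) used to fill the new DataFrame. If omitted, all remaining columns are used and the result gets hierarchically indexed columns.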
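A minimal sketch of the two defaults described above, on illustrative data. When `index` is left out, the frame's current index supplies the row labels, so `Date` is moved into the index first with `set_index`. When `values` is left out, every remaining column is pivoted. Arguments are passed by keyword, which works on both older and current pandas releases.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02'],
    'City': ['New York', 'Los Angeles', 'New York', 'Los Angeles'],
    'Temperature': [32, 75, 30, 78],
    'Humidity': [80, 20, 85, 18],
})

# index omitted: the current index (Date, set beforehand) supplies the row labels
by_existing_index = df.set_index('Date').pivot(columns='City', values='Temperature')

# values omitted: both remaining columns (Temperature, Humidity) are pivoted,
# which produces a two-level column index
all_values = df.pivot(index='Date', columns='City')

print(by_existing_index)
print(all_values.columns.tolist())
```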

## Basic Example

Let's start with a simple example to understand how pivot works.

```python
import pandas as pd

# Create a simple DataFrame
data = {
    'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02'],
    'City': ['New York', 'Los Angeles', 'New York', 'Los Angeles'],
    'Temperature': [32, 75, 30, 78],
    'Humidity': [80, 20, 85, 18]
}

df = pd.DataFrame(data)
print("Original DataFrame:")
df
```

```
Original DataFrame:
         Date         City  Temperature  Humidity
0  2023-01-01     New York           32        80
1  2023-01-01  Los Angeles           75        20
2  2023-01-02     New York           30        85
3  2023-01-02  Los Angeles           78        18
```


```python
# Pivot the DataFrame
pivot_df = df.pivot(index='Date', columns='City', values='Temperature')
print("\nPivoted DataFrame:")
pivot_df
```

```
Pivoted DataFrame:
City        Los Angeles  New York
Date
2023-01-01           75        32
2023-01-02           78        30
```
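In the pivoted output, `Date` is the name of the row index and `City` is the name of the column axis. If you need a flat table, for example to export to CSV, one option (a sketch on the same illustrative data) is to drop the column-axis name with `rename_axis` and move `Date` back into an ordinary column with `reset_index`.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02'],
    'City': ['New York', 'Los Angeles', 'New York', 'Los Angeles'],
    'Temperature': [32, 75, 30, 78],
})

pivot_df = df.pivot(index='Date', columns='City', values='Temperature')

# Remove the 'City' label from the column axis, then turn the Date index into a column
flat = pivot_df.rename_axis(columns=None).reset_index()

print(flat)
print(flat.columns.tolist())
```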


## Pivot with Multiple Values

You can also pivot several columns at once by passing a list of column names to the `values` parameter. The result then has two column levels: the value name on top and the `City` values below it.

```python
# Pivot with multiple values
pivot_df = df.pivot(index='Date', columns='City', values=['Temperature', 'Humidity'])
print("\nPivoted DataFrame with multiple values:")
pivot_df
```

```
Pivoted DataFrame with multiple values:
            Temperature               Humidity
City        Los Angeles  New York  Los Angeles  New York
Date
2023-01-01           75        32           20        80
2023-01-02           78        30           18        85
```
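Because this result has two column levels (measurement, then city), you can select one measurement with ordinary column indexing, or join the two levels into single strings when a flat header is needed. The underscore-joined names below are just one possible convention, shown on the same illustrative data.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02'],
    'City': ['New York', 'Los Angeles', 'New York', 'Los Angeles'],
    'Temperature': [32, 75, 30, 78],
    'Humidity': [80, 20, 85, 18],
})

pivot_df = df.pivot(index='Date', columns='City', values=['Temperature', 'Humidity'])

# Selecting a top-level label returns an ordinary DataFrame: one column per city
humidity = pivot_df['Humidity']

# Flatten the two-level header into names such as 'Temperature_New York'
flat = pivot_df.copy()
flat.columns = [f'{measure}_{city}' for measure, city in flat.columns]

print(humidity)
print(flat.columns.tolist())
```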


## Handling Missing Data

If some combinations of index and column values do not occur in the original DataFrame, the corresponding cells in the pivoted DataFrame are filled with NaN.

```python
# Add a 2023-01-03 reading for New York only (no Los Angeles row for that date)
data_with_missing = {
    'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02', '2023-01-03'],
    'City': ['New York', 'Los Angeles', 'New York', 'Los Angeles', 'New York'],
    'Temperature': [32, 75, 30, 78, 28],
    'Humidity': [80, 20, 85, 18, 90]
}

df_missing = pd.DataFrame(data_with_missing)
print("\nOriginal DataFrame with missing data:")
df_missing
```

```
Original DataFrame with missing data:
         Date         City  Temperature  Humidity
0  2023-01-01     New York           32        80
1  2023-01-01  Los Angeles           75        20
2  2023-01-02     New York           30        85
3  2023-01-02  Los Angeles           78        18
4  2023-01-03     New York           28        90
```


```python
# Pivot the DataFrame
pivot_df_missing = df_missing.pivot(index='Date', columns='City', values='Temperature')
print("\nPivoted DataFrame with missing data:")
pivot_df_missing
```

```
Pivoted DataFrame with missing data:
City        Los Angeles  New York
Date
2023-01-01         75.0      32.0
2023-01-02         78.0      30.0
2023-01-03          NaN      28.0
```
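Notice that the temperatures changed from integers to floats: NaN is a floating-point value, so the pivoted values are upcast to `float64`. A short sketch on the same illustrative data shows how you might locate the gaps and, if you prefer whole numbers, convert to pandas' nullable `Int64` dtype, which marks missing entries as `<NA>`. Whether to fill the gaps at all depends on your analysis, so no fill value is applied here.

```python
import pandas as pd

df_missing = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02', '2023-01-03'],
    'City': ['New York', 'Los Angeles', 'New York', 'Los Angeles', 'New York'],
    'Temperature': [32, 75, 30, 78, 28],
})

wide = df_missing.pivot(index='Date', columns='City', values='Temperature')

# Both city columns are float64 because of the NaN for Los Angeles on 2023-01-03
print(wide.dtypes)

# Number of missing (Date, City) combinations per city
print(wide.isna().sum())

# Nullable integer dtype: whole numbers again, with <NA> marking the gap
wide_int = wide.astype('Int64')
print(wide_int)
```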


## Reshaping Back: From Pivoted DataFrame to Original

To reshape the pivoted DataFrame back into long format, move the index back into a column with `reset_index` and then call `DataFrame.melt`.

```python
# Reshape back using melt
melted_df = pivot_df_missing.reset_index().melt(id_vars='Date', value_name='Temperature')
print("\nMelted DataFrame:")
print(melted_df)
```

```
Melted DataFrame:
         Date         City  Temperature
0  2023-01-01  Los Angeles         75.0
1  2023-01-02  Los Angeles         78.0
2  2023-01-03  Los Angeles          NaN
3  2023-01-01     New York         32.0
4  2023-01-02     New York         30.0
5  2023-01-03     New York         28.0
```
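The melted result is long again, but it is not identical to the input: the NaN that `pivot` introduced for Los Angeles on 2023-01-03 is now a row of its own, the temperatures are floats, the rows are grouped by city, and `Humidity` is absent because only `Temperature` was pivoted. A sketch of one way to tidy this up, assuming the original data had no missing readings of its own (otherwise `dropna` would also remove genuine gaps):

```python
import pandas as pd

df_missing = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02', '2023-01-03'],
    'City': ['New York', 'Los Angeles', 'New York', 'Los Angeles', 'New York'],
    'Temperature': [32, 75, 30, 78, 28],
})
pivot_df_missing = df_missing.pivot(index='Date', columns='City', values='Temperature')

long_again = (
    pivot_df_missing.reset_index()
    .melt(id_vars='Date', value_name='Temperature')  # 'City' comes from the column-axis name
    .dropna(subset=['Temperature'])                   # drop combinations that never existed
    .sort_values(['Date', 'City'])                    # restore date order
    .reset_index(drop=True)
)
print(long_again)
```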


## Pivoting with MultiIndex

You can create a pivoted DataFrame with multiple index levels (a MultiIndex) by passing a list of column names to the `index` parameter (supported since pandas 1.1).

```python
# Create a long-format DataFrame with a 'Type' column
data_multi = {
    'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02', '2023-01-03'],
    'City': ['New York', 'Los Angeles', 'New York', 'Los Angeles', 'New York'],
    'Type': ['Temperature', 'Temperature', 'Temperature', 'Temperature', 'Temperature'],
    'Value': [32, 75, 30, 78, 28]
}

df_multi = pd.DataFrame(data_multi)
print("\nOriginal long-format DataFrame:")
print(df_multi)

# Pivot with a two-level (Date, Type) row index
pivot_df_multi = df_multi.pivot(index=['Date', 'Type'], columns='City', values='Value')
print("\nPivoted DataFrame with MultiIndex:")
pivot_df_multi
```

```
Original long-format DataFrame:
         Date         City         Type  Value
0  2023-01-01     New York  Temperature     32
1  2023-01-01  Los Angeles  Temperature     75
2  2023-01-02     New York  Temperature     30
3  2023-01-02  Los Angeles  Temperature     78
4  2023-01-03     New York  Temperature     28

Pivoted DataFrame with MultiIndex:
City                    Los Angeles  New York
Date       Type
2023-01-01 Temperature         75.0      32.0
2023-01-02 Temperature         78.0      30.0
2023-01-03 Temperature          NaN      28.0
```
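In the example above `Type` has only one value, so the second index level adds little. A sketch with an illustrative table holding two measurement types shows the MultiIndex doing real work: `xs` pulls one type back out as an ordinary `Date` index, and passing a list to `columns` moves the hierarchy to the column axis instead.

```python
import pandas as pd

# Illustrative long-format data with two measurement types per (Date, City)
long_df = pd.DataFrame({
    'Date': ['2023-01-01'] * 4 + ['2023-01-02'] * 4,
    'City': ['New York', 'Los Angeles'] * 4,
    'Type': ['Temperature', 'Temperature', 'Humidity', 'Humidity'] * 2,
    'Value': [32, 75, 80, 20, 30, 78, 85, 18],
})

# Rows keyed by (Date, Type); one column per city
by_date_type = long_df.pivot(index=['Date', 'Type'], columns='City', values='Value')

# Select one level value to get back a single-level Date index
temps = by_date_type.xs('Temperature', level='Type')

# A list for columns gives two-level columns (Type, City) instead
by_date = long_df.pivot(index='Date', columns=['Type', 'City'], values='Value')

print(by_date_type)
print(temps)
print(by_date)
```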


## Pivoting without Aggregation: Difference from pivot_table

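Unlike `pivot_table`, which aggregates values (for example with a sum or mean), `pivot` performs no aggregation; it only reshapes the data. This means each combination of `index` and `columns` values must occur at most once. If a pair appears more than once, `pivot` has no rule for choosing between the entries and raises a `ValueError`; in that case, use `pivot_table` and choose an `aggfunc`.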
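A minimal sketch of that difference, using illustrative readings in which New York appears twice on the same date: `pivot` refuses to reshape, while `pivot_table` combines the duplicates with the chosen `aggfunc`. The mean is an arbitrary choice here; any aggregation that fits your data works.

```python
import pandas as pd

readings = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-01', '2023-01-01'],
    'City': ['New York', 'New York', 'Los Angeles'],
    'Temperature': [32, 34, 75],
})

# pivot cannot decide which of the two New York readings to keep
pivot_error = None
try:
    readings.pivot(index='Date', columns='City', values='Temperature')
except ValueError as err:
    pivot_error = err
print('pivot raised:', pivot_error)

# pivot_table aggregates duplicate (Date, City) pairs instead
agg = readings.pivot_table(index='Date', columns='City', values='Temperature', aggfunc='mean')
print(agg)
```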

## Summary

The `pandas.DataFrame.pivot` function is used to reshape DataFrames by transforming data into a wide format, making it easier to analyze and report. Key points include:

- **Parameters**:
  - **index**: Determines the new index for the pivoted DataFrame.
  - **columns**: Defines the new columns based on unique values from this column.
  - **values**: Specifies which column's values to use for filling the new DataFrame.
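- **Handling Missing Data**: Index/column combinations absent from the input appear as NaN in the result, which upcasts integer values to float.
- **No Aggregation**: Each (index, columns) pair must be unique; duplicates raise a `ValueError`, so use `pivot_table` when you need to aggregate.
- **Reshaping**: Use `reset_index` followed by `melt` to reshape data back to long format.
- **MultiIndex**: Pass lists to `index` or `columns` to create hierarchical row or column labels.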

Understanding how to use `pivot` effectively can help you organize and analyze data more efficiently, especially when dealing with complex datasets.
