# pandas.melt

pandas.melt reshapes a DataFrame from a wide format to a long format. It is commonly used to normalize data for easier analysis, or to prepare it for visualizations and operations that expect long-format input.

## 1. Basic Concept of pandas.melt

* **Wide format:** Data is spread across multiple columns. Each column represents a different variable.
* **Long format:** Data is condensed into fewer columns, with one column identifying the variable type and another column holding the value.

pandas.melt essentially unpivots the DataFrame, making it longer by turning multiple columns into rows.
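A quick way to see the unpivot is to compare shapes. The minimal sketch below reuses the temperature data from the examples further down (the variable names `wide` and `long` are illustrative). It checks that melting n rows with k value columns gives n × k rows, with one identifier column, one variable column, and one value column.

```python
import pandas as pd

# Illustrative wide table: one identifier column and three value columns
wide = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-02'],
    'New York': [32, 30],
    'Los Angeles': [75, 78],
    'Chicago': [28, 27],
})

long = pd.melt(wide, id_vars=['Date'])

n_rows = len(wide)                # 2 rows in the wide table
k_value_cols = wide.shape[1] - 1  # 3 columns are unpivoted
print(wide.shape, '->', long.shape)  # (2, 4) -> (6, 3)
```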

## 2. Syntax

```python
pandas.melt(frame, id_vars=None, value_vars=None, var_name=None, value_name='value', col_level=None, ignore_index=True)
```

* **frame:** The DataFrame to melt.
* **id_vars:** Columns to use as identifiers (i.e., columns that should remain fixed).
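* **value_vars:** Columns to unpivot (i.e., columns to convert into rows). If not specified, all columns not listed in id_vars are used.
* **var_name:** Name to use for the variable column. If not specified, pandas uses the name of the column axis (frame.columns.name), or 'variable' if that name is not set. For MultiIndex columns with distinct level names, the level names are used.
* **value_name:** Name to use for the value column (default 'value').
* **col_level:** If the columns are a MultiIndex, the level to melt.
* **ignore_index:** If True (the default), the original index is discarded and the result gets a new default integer index. If False, the original index is kept, with labels repeated as needed.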
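The var_name fallback is easy to check directly. This minimal sketch uses a small illustrative table. Setting `columns.name` is just one way to name the column axis.

```python
import pandas as pd

wide = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-02'],
    'New York': [32, 30],
    'Chicago': [28, 27],
})

# The column axis has no name, so the variable column falls back to 'variable'
default_cols = pd.melt(wide, id_vars=['Date']).columns.tolist()
print(default_cols)  # ['Date', 'variable', 'value']

# Naming the column axis changes the default var_name
wide.columns.name = 'City'
named_cols = pd.melt(wide, id_vars=['Date']).columns.tolist()
print(named_cols)  # ['Date', 'City', 'value']
```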

## 3. Basic Example of pandas.melt

```python
import pandas as pd

# Create a simple DataFrame
data = {
    'Date': ['2023-01-01', '2023-01-02'],
    'New York': [32, 30],
    'Los Angeles': [75, 78],
    'Chicago': [28, 27]
}

df = pd.DataFrame(data)
print('Original DataFrame:\n')
print(df)
```

```
Original DataFrame:

         Date  New York  Los Angeles  Chicago
0  2023-01-01        32           75       28
1  2023-01-02        30           78       27
```


```python
# Melt the DataFrame
melted_df = pd.melt(df, id_vars=['Date'], value_vars=['New York', 'Los Angeles', 'Chicago'],
                    var_name='City', value_name='Temperature')
print('\nMelted DataFrame:')
print(melted_df)
```

```
Melted DataFrame:
         Date         City  Temperature
0  2023-01-01     New York           32
1  2023-01-02     New York           30
2  2023-01-01  Los Angeles           75
3  2023-01-02  Los Angeles           78
4  2023-01-01      Chicago           28
5  2023-01-02      Chicago           27
```


* **Explanation:**
  * **id_vars:** The Date column remains fixed.
  * **value_vars:** The New York, Los Angeles, and Chicago columns are unpivoted.
  * **var_name:** The unpivoted column names are stored in the City column.
  * **value_name:** The values from the unpivoted columns are stored in the Temperature column.
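Melting can be reversed with DataFrame.pivot, which turns the City values back into columns. The sketch below is one way to do the round trip, not the only one. pivot sorts the new columns alphabetically, so the original column order is restored explicitly.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-02'],
    'New York': [32, 30],
    'Los Angeles': [75, 78],
    'Chicago': [28, 27],
})
melted_df = pd.melt(df, id_vars=['Date'], var_name='City', value_name='Temperature')

# pivot reverses the melt: each City value becomes a column again
wide_again = (
    melted_df.pivot(index='Date', columns='City', values='Temperature')
    .rename_axis(columns=None)  # drop the 'City' label from the column axis
    .reset_index()              # turn Date back into a regular column
)
wide_again = wide_again[df.columns]  # restore the original column order
print(wide_again.equals(df))  # True
```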

## 4. Melt with Default Parameters

```python
# Melt without specifying value_vars
melted_df_default = pd.melt(df, id_vars=['Date'])
print('\nMelted DataFrame with default value_vars:\n')
print(melted_df_default)
```

```
Melted DataFrame with default value_vars:

         Date     variable  value
0  2023-01-01     New York     32
1  2023-01-02     New York     30
2  2023-01-01  Los Angeles     75
3  2023-01-02  Los Angeles     78
4  2023-01-01      Chicago     28
5  2023-01-02      Chicago     27
```


* **Explanation:**
  * By default, melt uses all columns except id_vars as value_vars.
  * The variable column (default name) holds the column names, and the value column holds the data.
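The opposite case is also worth seeing. When value_vars is given explicitly, columns listed in neither id_vars nor value_vars (here Los Angeles) are left out of the result. A minimal sketch with the same illustrative data:

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-02'],
    'New York': [32, 30],
    'Los Angeles': [75, 78],
    'Chicago': [28, 27],
})

# Only New York and Chicago are unpivoted; Los Angeles is dropped
subset = pd.melt(df, id_vars=['Date'], value_vars=['New York', 'Chicago'])
print(subset['variable'].unique().tolist())  # ['New York', 'Chicago']
print(len(subset))  # 4
```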

## 5. Changing var_name and value_name

```python
# Melt with custom var_name and value_name
melted_df_custom = pd.melt(df, id_vars=['Date'], var_name='Location', value_name='Temp')
print('\nMelted DataFrame with custom var_name and value_name:\n')
print(melted_df_custom)
```

```
Melted DataFrame with custom var_name and value_name:

         Date     Location  Temp
0  2023-01-01     New York    32
1  2023-01-02     New York    30
2  2023-01-01  Los Angeles    75
3  2023-01-02  Los Angeles    78
4  2023-01-01      Chicago    28
5  2023-01-02      Chicago    27
```


* **Explanation:**
  * The variable column is renamed to Location.
  * The value column is renamed to Temp.
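pandas also exposes the same operation as the DataFrame.melt method, which takes the same parameters without the frame argument. The sketch below checks that both forms give the same result. It also passes id_vars as a single string, which pandas accepts in place of a one-element list.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-02'],
    'New York': [32, 30],
    'Los Angeles': [75, 78],
    'Chicago': [28, 27],
})

via_function = pd.melt(df, id_vars=['Date'], var_name='Location', value_name='Temp')
via_method = df.melt(id_vars='Date', var_name='Location', value_name='Temp')
print(via_method.equals(via_function))  # True
```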

## 6. Using pandas.melt with MultiIndex Columns

```python
import pandas as pd

# Create a DataFrame with MultiIndex columns
arrays = [['Temperature', 'Temperature', 'Humidity', 'Humidity'],
          ['New York', 'Los Angeles', 'New York', 'Los Angeles']]
index = pd.MultiIndex.from_arrays(arrays, names=('Type', 'City'))

df_multi = pd.DataFrame([[32, 75, 80, 20], [30, 78, 85, 18]], columns=index, index=pd.to_datetime(['2023-01-01', '2023-01-02']))
print('Original DataFrame with MultiIndex columns:\n')
print(df_multi)

# Reset index and check column names
df_reset = df_multi.reset_index()
print('\nDataFrame after reset_index():\n')
print(df_reset.head())
print('Column names after reset_index():', df_reset.columns)

# Extract the name of the index column
index_column = df_reset.columns[0]
print('Index column name:', index_column)

# Melt the DataFrame without col_level
melted_df_multi = pd.melt(df_reset, id_vars=[index_column])
print('\nMelted DataFrame:\n')
print(melted_df_multi)
```

```
Original DataFrame with MultiIndex columns:

Type       Temperature             Humidity            
City          New York Los Angeles New York Los Angeles
2023-01-01          32          75       80          20
2023-01-02          30          78       85          18

DataFrame after reset_index():

Type      index Temperature             Humidity            
City               New York Los Angeles New York Los Angeles
0    2023-01-01          32          75       80          20
1    2023-01-02          30          78       85          18
Column names after reset_index(): MultiIndex([(      'index',            ''),
            ('Temperature',    'New York'),
            ('Temperature', 'Los Angeles'),
            (   'Humidity',    'New York'),
            (   'Humidity', 'Los Angeles')],
           names=['Type', 'City'])
Index column name: ('index', '')

Melted DataFrame:

   (index, )         Type         City  value
0 2023-01-01  Temperature     New York     32
1 2023-01-02  Temperature     New York     30
2 2023-01-01  Temperature  Los Angeles     75
3 2023-01-02  Temperature  Los Angeles     78
4 2023-01-01     Humidity     New York     80
5 2023-01-02     Humidity     New York     85
6 2023-01-01     Humidity  Los Angeles     20
7 2023-01-02     Humidity  Los Angeles     18
```

**Explanation:**

### 1. Create a MultiIndex for columns
```python
arrays = [['Temperature', 'Temperature', 'Humidity', 'Humidity'],
          ['New York', 'Los Angeles', 'New York', 'Los Angeles']]
index = pd.MultiIndex.from_arrays(arrays, names=('Type', 'City'))
```
- Here, arrays is a list of lists that defines two levels of a MultiIndex for the columns. The first list defines the type (e.g., Temperature, Humidity), and the second list defines the city (e.g., New York, Los Angeles).
- pd.MultiIndex.from_arrays creates a MultiIndex object using the arrays. The levels of the MultiIndex are named 'Type' and 'City'.
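Because every Type is paired with every City here, the same column index can also be built with pd.MultiIndex.from_product, which spells out each level only once. This is an alternative construction, not what the example above uses. The sketch checks that both give the same result.

```python
import pandas as pd

arrays = [['Temperature', 'Temperature', 'Humidity', 'Humidity'],
          ['New York', 'Los Angeles', 'New York', 'Los Angeles']]
from_arrays = pd.MultiIndex.from_arrays(arrays, names=('Type', 'City'))

# Cartesian product of the two levels; the first level varies slowest
from_product = pd.MultiIndex.from_product(
    [['Temperature', 'Humidity'], ['New York', 'Los Angeles']],
    names=('Type', 'City'),
)
print(from_product.equals(from_arrays))  # True
```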

### 2. Create a DataFrame with MultiIndex columns
```python
df_multi = pd.DataFrame([[32, 75, 80, 20], [30, 78, 85, 18]], columns=index, index=pd.to_datetime(['2023-01-01', '2023-01-02']))
```
- This creates a DataFrame df_multi with two rows and MultiIndex columns.
- The data records the temperature and humidity for each city on two dates (2023-01-01 and 2023-01-02).
- The index of the DataFrame is a datetime index representing the dates.

### 3. Print the original DataFrame
```python
print('Original DataFrame with MultiIndex columns:\n')
print(df_multi)
```
- This prints the DataFrame with its MultiIndex columns, showing how the data is structured.

### 4. Reset the index
```python
df_reset = df_multi.reset_index()
print('\nDataFrame after reset_index():\n')
print(df_reset.head())
print('Column names after reset_index():', df_reset.columns)
```
- reset_index moves the index (the dates) into a regular column and replaces it with a default integer index.
- The columns are still a MultiIndex, so the new column needs a label at every level. Because the index had no name, its label becomes ('index', ''), with the second level filled by an empty string.
- Printing df_reset.columns confirms this structure.

### 5. Extract the name of the index column
```python
index_column = df_reset.columns[0]
print('Index column name:', index_column)
```
- The first column label (the former index) is extracted and stored in index_column. Because the columns are a MultiIndex, this label is the tuple ('index', '') rather than the string 'index'.
- This tuple is passed to id_vars in the melt step so that the dates are kept as an identifier.
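A variation on steps 4 and 5 is to name the index before resetting it. The id column then gets a readable label instead of ('index', ''), and it can be renamed to a plain string after melting. The sketch below assumes the label 'Date', which does not appear in the original example.

```python
import pandas as pd

arrays = [['Temperature', 'Temperature', 'Humidity', 'Humidity'],
          ['New York', 'Los Angeles', 'New York', 'Los Angeles']]
columns = pd.MultiIndex.from_arrays(arrays, names=('Type', 'City'))
df_multi = pd.DataFrame([[32, 75, 80, 20], [30, 78, 85, 18]], columns=columns,
                        index=pd.to_datetime(['2023-01-01', '2023-01-02']))

# Name the index 'Date' (an assumed label) before moving it into a column
df_named = df_multi.rename_axis('Date').reset_index()
print(df_named.columns[0])  # ('Date', '')

melted = pd.melt(df_named, id_vars=[('Date', '')])
# Replace the tuple label with a plain string for easier access later
melted = melted.rename(columns={('Date', ''): 'Date'})
print(melted.columns.tolist())  # ['Date', 'Type', 'City', 'value']
```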

### 6. Melt the DataFrame
```python
melted_df_multi = pd.melt(df_reset, id_vars=[index_column])
print('\nMelted DataFrame:\n')
print(melted_df_multi)
```
- pd.melt unpivots the DataFrame from wide format to long format.
- id_vars=[index_column] keeps the original index (the dates) as an identifier column instead of melting it.
- Because var_name is not given and both column levels are named, pandas names the variable columns after the levels, Type and City. Each row of the result holds one date, type (temperature or humidity), and city, together with the corresponding value.

**Summary:**
- The code demonstrates how to create a DataFrame with MultiIndex columns, reset the index to make the data more accessible, and then melt the DataFrame to transform it from wide to long format. This process is useful when you need to reshape data for analysis or visualization.
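The example above melts without col_level. For comparison, here is a minimal sketch of col_level on a small illustrative frame with two unnamed column levels (A/B/C on top, D/E/F below). Only the chosen level survives as labels, which is the trade-off to keep in mind. When both Type and City matter, as in the weather data, melting without col_level keeps both.

```python
import pandas as pd

# Illustrative frame; assigning a list of lists creates MultiIndex columns
df = pd.DataFrame({'A': ['a', 'b', 'c'], 'B': [1, 3, 5], 'C': [2, 4, 6]})
df.columns = [list('ABC'), list('DEF')]

# Address columns by the top level: 'A' is the identifier, 'B' is unpivoted
top = pd.melt(df, col_level=0, id_vars=['A'], value_vars=['B'])
print(top)

# Address the same columns by the bottom level: 'D' sits under 'A', 'E' under 'B'
bottom = pd.melt(df, col_level=1, id_vars=['D'], value_vars=['E'])
print(bottom)
```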




## 7. Ignoring Index with ignore_index

```python
# Melt with ignore_index
melted_df_ignore_index = pd.melt(df, id_vars=['Date'], ignore_index=False)
print('\nMelted DataFrame with original index retained:')
print(melted_df_ignore_index)
```

```
Melted DataFrame with original index retained:
         Date     variable  value
0  2023-01-01     New York     32
1  2023-01-02     New York     30
0  2023-01-01  Los Angeles     75
1  2023-01-02  Los Angeles     78
0  2023-01-01      Chicago     28
1  2023-01-02      Chicago     27
```


* **Explanation:**
  * With ignore_index=False, the original index is retained in the melted DataFrame, so each label (0 and 1) repeats once per melted column.
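The retained labels make it possible to trace each melted row back to its source row. The sketch below, using illustrative variable names, shows two properties. First, dropping the retained index gives the same frame as the default ignore_index=True. Second, .loc[0] collects every melted value that came from original row 0.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-02'],
    'New York': [32, 30],
    'Los Angeles': [75, 78],
    'Chicago': [28, 27],
})

kept = pd.melt(df, id_vars=['Date'], ignore_index=False)
reset = pd.melt(df, id_vars=['Date'])  # ignore_index=True is the default

# Discarding the retained labels reproduces the default result
print(kept.reset_index(drop=True).equals(reset))  # True

# All melted values that originated in row 0 of df
print(kept.loc[0, 'value'].tolist())  # [32, 75, 28]
```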

## 8. When to Use pandas.melt

* **Normalization:** If you have data in a wide format (multiple columns for variables) and you need to normalize it for analysis.
* **Visualization:** Certain visualizations or statistical analyses require data in a long format.
* **Data Preparation:** Prepares data for certain types of operations, like grouping, merging, or applying functions that require long-format data.
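As an example of the grouping case, once the cities are stored in a single column, a per-city aggregate is a single groupby call. The sketch below reuses the temperature data from the earlier examples. The mean is just one illustrative aggregation.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-02'],
    'New York': [32, 30],
    'Los Angeles': [75, 78],
    'Chicago': [28, 27],
})

long_df = pd.melt(df, id_vars=['Date'], var_name='City', value_name='Temperature')

# One groupby over the City column replaces three separate column operations
mean_by_city = long_df.groupby('City')['Temperature'].mean()
print(mean_by_city)
```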

## Summary

* **pandas.melt** is used to reshape DataFrames from wide to long format.
* **Key Concepts:**
  * **id_vars:** Columns that remain fixed in the output DataFrame.
  * **value_vars:** Columns to unpivot into rows.
  * **var_name** and **value_name:** Custom names for the resulting columns.
  * **col_level:** Used for MultiIndex columns to specify which level to melt.
  * **ignore_index:** Determines whether to reset the index in the resulting DataFrame.
* **Applications:**
  * Normalizing data.
  * Preparing data for analysis, visualization, or further processing.

Understanding how to use pandas.melt effectively can help you manipulate and analyze your data more efficiently, particularly when dealing with complex datasets that need to be reshaped for specific tasks.